I have some text containing HTML tags and attributes. I need to strip out the HTML and leave plain text. I'm using the HTMLStripper, which does the job pretty well, but it replaces accented letters with unaccented ones followed by question marks.
How can this behaviour be fixed? Text example (before and after):
<p>Iš liet. <em>šveñtas</em>, <em>-à </em>‘<span class="g7"><span class="g7">pagarbiai saugomas, laikomas; iš pagarbos neliečiamas</span></span>’.</p>
Iš liet. šven?tas, -a? ‘pagarbiai saugomas, laikomas; iš pagarbos neliečiamas’.
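Judging only from the output above, the affected letters look like they are stored as a base letter plus a combining mark (for example `n` followed by U+0303), while precomposed letters such as `š` and `ė` survive. That is a guess, not something confirmed for the HTMLStripper in question. A minimal sketch of one way to test the guess, assuming Python and the standard library only: strip the tags, then normalize the text to NFC so base letter and accent are merged into one code point where Unicode has one. `TagStripper` and `strip_html` are hypothetical names made up for this illustration.

```python
import unicodedata
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text nodes and discards all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags and attributes never reach here.
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)


def strip_html(html):
    """Return the plain text of `html`, normalized to NFC."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    # Assumption: the accents are combining marks. NFC merges a base letter
    # and its mark into a single precomposed character when one exists.
    return unicodedata.normalize("NFC", stripper.text())


# Escapes make the combining marks explicit: U+0303 (tilde), U+0300 (grave).
sample = "<p>I\u0161 liet. <em>\u0161ven\u0303tas</em>, <em>-a\u0300 </em></p>"
plain = strip_html(sample)
print(plain)
```

If the question marks only appear when the result is written out or displayed, the output encoding is the more likely culprit, and writing the text as UTF-8 would be the thing to check first. NFC also cannot help for letter and accent pairs that have no precomposed form in Unicode.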