Fix for HTML numeric entities in descriptions

git-svn-id: svn://svn.berlios.de/gpodder/trunk@287 b0d088ad-0a06-0410-aad2-9ed5178a7e87
Thomas Perl 2007-03-21 20:34:09 +00:00
parent 571c20d03c
commit a366ea1582
2 changed files with 11 additions and 1 deletions

@@ -1,3 +1,9 @@
+Wed, 21 Mar 2007 21:32:43 +0100 <thp@perli.net>
+* src/gpodder/libpodcasts.py: Convert numeric HTML entities to
+symbolic entities so these will be replaced with the right unicode
+character (for podcast descriptions); thanks to Gerrit Sangel
+(newsletter sangel.eu) for reporting this bug on gpodder-devel
+
 Tue, 20 Mar 2007 19:47:14 +0100 <thp@perli.net>
 * data/po/nl.po: Added Dutch translation from Pieter De Decker
 (pdedecker gmail com)

@@ -554,10 +554,14 @@ def channelsToModel( channels):
 def stripHtml( html):
     # strips html from a string (fix for <description> tags containing html)
-    dict = htmlentitydefs.entitydefs
     rexp = re.compile( "<[^>]*>")
     stripstr = rexp.sub( "", html)
+    # replaces numeric entities with entity names
+    dict = htmlentitydefs.codepoint2name
+    for key in dict.keys():
+        stripstr = stripstr.replace( '&#'+str(key)+';', '&'+unicode( dict[key], 'iso-8859-1')+';')
     # strips html entities
+    dict = htmlentitydefs.entitydefs
     for key in dict.keys():
         stripstr = stripstr.replace( '&'+unicode(key,'iso-8859-1')+';', unicode(dict[key], 'iso-8859-1'))
     return stripstr
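
For readers on Python 3, the same two-pass idea (numeric entity to named entity, then named entity to character) can be sketched roughly as below. This is only an illustration and not the code from this commit: it assumes the Python 3 module `html.entities` in place of `htmlentitydefs`, and the names `strip_html` and `TAG_RE` are made up for the example.

```python
import re
from html.entities import codepoint2name, entitydefs

# Hypothetical helper name; matches anything that looks like an HTML tag.
TAG_RE = re.compile(r"<[^>]*>")


def strip_html(text):
    """Strip tags, then resolve decimal numeric and named entities (sketch)."""
    stripped = TAG_RE.sub("", text)

    # Pass 1: rewrite decimal numeric entities as named ones,
    # e.g. "&#233;" becomes "&eacute;".
    for codepoint, name in codepoint2name.items():
        stripped = stripped.replace("&#%d;" % codepoint, "&%s;" % name)

    # Pass 2: replace each named entity with its character.
    # In Python 3 the values of entitydefs are already str, so no
    # explicit ISO-8859-1 decoding is needed here.
    for name, char in entitydefs.items():
        stripped = stripped.replace("&%s;" % name, char)

    return stripped


print(strip_html("<p>Caf&#233; &amp; K&#228;se</p>"))
```

Like the original, this sketch only covers decimal entities whose code points have a named equivalent; hexadecimal forms such as `&#x20AC;` pass through unchanged. On Python 3.4 and later, the standard library's `html.unescape` handles decimal, hexadecimal, and named references in one call and is probably the simpler choice when it is available.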