00235: rss feed doesn't work with utf-8

Summary: rss feed doesn't work with utf-8
Created: 2004-12-16 09:44
Status: Closed
Category: Bug
From: shouda?
Assigned:
Priority: 444
Version: 2.0.beta10
OS:

Description: The RSS feed doesn't produce valid titles, descriptions, or links for pages with UTF-8 characters. See http://www.pmwiki.org/wiki/UTF8/RecentChanges?action=rdf and http://www.pmwiki.org/wiki/UTF8/RecentChanges?action=rss


Non-ASCII characters are just a pain for RSS in general, so it may be a short while before I can resolve this one. Does anyone have any useful references about placing UTF-8 into RSS feeds?

Some of the problems here may also be related to PITS:00129. --Pm


Chinese blogs that use UTF-8 encoding produce the line '<?xml version="1.0" encoding="utf-8"?>' at the top of the RSS feed. --shouda


2.0.beta15 changes the rss script a fair bit -- does this problem still exist? --Pm

Still not working. --shouda

From: Hermann Hartenthaler
Version: 2.0beta17

Perhaps I have the same problem. The RSS feed looks fine, but the Abilon RSS reader crashes when it tries to open the content. An analysis of the feed reports:

 This feed does not validate.
 Your feed appears to be encoded as "UTF-8", but your server is reporting "US-ASCII"
 line 4, column 52: 'utf8' codec can't decode bytes in position 153-158: unsupported Unicode code range (maybe a high-bit character?)
 <title>Um- und Ausbauvorhaben der TU Berlin f? Deutschen Telekom Labor ...
 ^
 Source: …/pmwiki.php/Main/AllRecentChanges?action=rss
 <?xml version="1.0"?>
 <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
 <channel>
 <title>Um- und Ausbauvorhaben der TU Berlin f? Deutschen Telekom Laboratories im Telefunken-Hochhaus - Main.AllRecentChanges</title>
 …

The cause seems to be the umlaut "ü" (shown as "?" above).
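
One possible explanation for the validator message, assuming the title is stored as ISO-8859-1 while the feed is read as UTF-8 (nothing in this thread confirms that): in ISO-8859-1, "ü" is the single byte 0xFC, which is never valid in UTF-8. A small Python sketch of the mismatch follows. The sample word and the helper name `is_valid_utf8` are made up for illustration, and Python is used only because it is easy to try out; PmWiki itself is PHP.

```python
# Hypothetical sample: a word containing "ü", encoded as ISO-8859-1.
latin1_bytes = "für".encode("iso-8859-1")  # b'f\xfcr'


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ISO-8859-1 bytes checked as if they were UTF-8.
print(is_valid_utf8(latin1_bytes))
# The same word, actually encoded as UTF-8.
print(is_valid_utf8("für".encode("utf-8")))
```

The first check fails and the second passes, which would match a feed whose text is ISO-8859-1 but whose consumer expects UTF-8.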


The problem on pmwiki.org has to do with the fact that it's a mixed character set environment: some pages are UTF-8, others are ISO-8859-1, and still others use the character sets of other languages. That makes it very difficult to get the AllRecentChanges page to display the mixed character sets correctly.

It's not normally a problem for other sites that are running pmwiki in a single character set, so if there's a problem outside of what I've mentioned here then let's go ahead and open a new PITS entry for it.

--Pm
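
For a mixed character set site like the one described above, one common workaround is to convert every string to UTF-8 before writing the feed, and to declare that encoding in the XML prolog. The following is a minimal Python sketch, not PmWiki's actual code. The helpers `to_utf8_text`, `rss_item`, and `rss_feed` are hypothetical, the channel is trimmed (a real one also needs its own title, link, and description), and the fallback to ISO-8859-1 is an assumption that only holds if that is the only other encoding in use.

```python
from xml.sax.saxutils import escape


def to_utf8_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode raw as UTF-8; if that fails, assume the fallback charset."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


def rss_item(title: bytes, link: str) -> str:
    """Build one <item> element with XML-escaped text."""
    return "<item><title>%s</title><link>%s</link></item>" % (
        escape(to_utf8_text(title)),
        escape(link),
    )


def rss_feed(items: list) -> bytes:
    """Serialize the feed as UTF-8 bytes with a matching XML declaration."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<rss version="2.0"><channel>%s</channel></rss>' % "".join(items)
    )
    return body.encode("utf-8")


# Two titles for the same word, stored in different character sets.
feed = rss_feed([
    rss_item("für".encode("iso-8859-1"), "http://example.org/a"),
    rss_item("für".encode("utf-8"), "http://example.org/b"),
])
print(feed.decode("utf-8"))
```

This try-then-fall-back detection is a heuristic: some ISO-8859-1 byte sequences are also valid UTF-8 and would be misread. The validator output earlier also complains about the charset the server reports, so the HTTP Content-Type header would presumably need to say UTF-8 as well.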