00588: URL-encoding of non-ASCII characters in file links

Summary: URL-encoding of non-ASCII characters in file links
Created: 2005-11-11 11:27
Status: Open
Category: Bug
From: Henning
Assigned:
Priority: 5
Version: 2.1.27
OS: Windows XP/Apache 2

Description: (Originally: Version: 2.0.11)

After moving the Wiki server from Linux/Apache 2 to Windows XP/Apache 2, file names with %-encoded Umlauts yield a 403 error.

If I manually type the Umlaut into the MS Explorer address field, avoiding the %-encoding, the file is served.

This problem appears to be the mirror image of 00053.

In the Apache Bugzilla database, I found the following entry dealing with the problem:

http://issues.apache.org/bugzilla/show_bug.cgi?id=24333

Summary (according to my understanding, which is confused): %-encoding as currently implemented does not yield a defined Unicode character because no character set is specified for the encoded bytes. However, Windows XP apparently needs this Unicode character to find the file in question.

(I'm not using UTF-8 encoding in my wiki, and enabling it seems to break Umlaut support for wiki page name handling, so I didn't experiment with it any further.)

I'm not sure whether this is a bug. I suspect that I have misconfigured something, but I can't figure out what exactly.

--Henning November 11, 2005, at 11:30 AM


The Apache error.log doesn't seem to log the occurrence of the problem.

Here is what access.log provides for a file "Großgut.xls" with an "szlig" = "ß" character:

  • Uploading:
    • GET /pmwiki/pmwiki.php/More/Kommunikation?action=upload&upname=Teinehmer_Kom_Gro\xdfgut.xls
    • GET /pmwiki/pmwiki.php/More/Kommunikation?action=upload&uprname=Teinehmer_Kom_Gro\xdfgut.xls&upresult=success
  • Downloading attempt with PmWiki generated URL:
    • URL: http://ddd.tl-home.direct.de.danzas.com/pmwiki1/pub/uploads/More.Kommunikation/Teinehmer_Kom_Gro%dfgut.xls
    • Apache log: GET /pmwiki1/pub/uploads/More.Kommunikation/Teinehmer_Kom_Gro%dfgut.xls - fails
  • Downloading attempt with manually typed URL in browser address window
    • URL: http://ddd.tl-home.direct.de.danzas.com/pmwiki1/pub/uploads/More.Kommunikation/Teinehmer_Kom_Großgut.xls
    • Apache log: GET /pmwiki1/pub/uploads/More.Kommunikation/Teinehmer_Kom_Gro%C3%9Fgut.xls - succeeds
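
The two request paths above differ only in which bytes the "ß" was turned into before percent-encoding. The following is a minimal sketch in Python (used purely for illustration; PmWiki itself is PHP), assuming the wiki generates the link from ISO-8859-1 bytes and the browser sends a typed character as UTF-8. The variable names are hypothetical.

```python
from urllib.parse import quote

name = "Teinehmer_Kom_Großgut.xls"

# Percent-encode the ISO-8859-1 (Latin-1) bytes: "ß" is the single byte 0xDF.
latin1_url = quote(name, encoding="latin-1")

# Percent-encode the UTF-8 bytes: "ß" is the two bytes 0xC3 0x9F.
utf8_url = quote(name, encoding="utf-8")

print(latin1_url)  # Teinehmer_Kom_Gro%DFgut.xls
print(utf8_url)    # Teinehmer_Kom_Gro%C3%9Fgut.xls
```

The hex digits in a percent escape are case-insensitive, so `%DF` here and `%df` in the log denote the same byte.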

The site search just found me an older contribution about Encoding URIs that suggests the problem I'm observing might result from a lack of standardization of the char set used in browser-server negotiation. Doesn't help me though, other than it's better to feel misunderstood by the world than vice versa ;-) Henning November 14, 2005, at 12:01 PM


I just found that by setting $EnableDirectDownloads = 0, I can make downloads work again.

As a side effect, I had to change the markup for attached HTML files to uploadpath_intermap_entry:path/name as they would not link to other attachments with direct downloads enabled.

As this is a bit of a workaround because direct downloads might not be desired by everyone, I'll leave this issue open, but I have reduced my priority vote because right now, everything appears to be working fine for me.

--Henning November 14, 2005, at 12:46 PM


Now I discovered that using uploadpath_intermap_entry:path/name to link other pages' attachments does not benefit from the direct download functionality, so there is still a danger of authors creating supposedly correct links that won't work.

It appears that the following feature does not work at all in my installation:

To link to attachments on another page or WikiGroup, use Attach:PageName/file.ext or Attach:Group.PageName/file.ext

Is it necessary to enable it somehow, or should it work right out of the box?

--Henning November 15, 2005, at 06:39 AM


I have found still other problems with my workaround, for example 00597 and 00599, so my priority estimate is back up to 5.

--Henning November 18, 2005, at 10:17 AM


Wow, after looking into it, this bug really really bites.

First, thanks for the link to the Apache bug report. After reading it, I'm convinced that while Andre Malo may be correct according to one of the standards, he's wrong from an implementation perspective.

If I do as recommended there (i.e. generate utf8-encoded urls), then it'll work fine for Apache 2 on Windows, but break everywhere else (Apache 2 on Unix, Apache 1 on Windows, other servers). That's just wrong.

It looks like the only fix for this will be to provide a switchable option on PmWiki 2 to generate utf8-encoded urls for attachments. Ugh.

I'll have to see what I can come up with...

--Pm
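
For illustration only, the switchable option described above could look roughly like the sketch below. It is written in Python, not PHP, and is not PmWiki's code: the function name, the `utf8_urls` flag, and the `site_charset` default are all assumptions made for the example.

```python
from urllib.parse import quote, unquote

def attachment_url_path(path, utf8_urls=False, site_charset="latin-1"):
    """Return the attachment path, optionally re-encoded with UTF-8 escapes.

    `path` is assumed to be percent-encoded in the wiki's own charset.
    Both parameters are hypothetical stand-ins for a site-level setting.
    """
    if not utf8_urls:
        # Default behaviour: leave the link exactly as generated.
        return path
    # Undo the escapes using the wiki's charset, then escape the UTF-8 bytes.
    decoded = unquote(path, encoding=site_charset)
    return quote(decoded, encoding="utf-8")

old = "/pmwiki1/pub/uploads/More.Kommunikation/Teinehmer_Kom_Gro%dfgut.xls"
print(attachment_url_path(old, utf8_urls=True))
```

With the flag off, the path is returned unchanged, which matches the concern that UTF-8 links should not be forced on servers where the current links already work.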


Thanks! That one really had me worried :-)

Now that I'm sure it's not a configuration problem, I guess I'm going to try to find an interim solution and set up mod_rewrite to catch the 7 German special characters ("ÄÖÜäöüß") so that I can use direct downloads again. (I'm Apache-ignorant, but I'm confident that I can figure it out :-)

I suppose that, given the size of the Unicode character set, relying on mod_rewrite would not be a good option for a definitive solution to the problem?

--Henning November 22, 2005, at 06:59 AM


Oh, I think I found a PmWiki bug that may resolve this issue! Is there some way that I could send you a new version of upload.php to test on your site?

--Pm November 22, 2005, at 02:43 PM


My e-mail address is ... :-)

--Henning November 22, 2005, at 05:02 PM


Thanks again for the improved version of upload.php, but I'm afraid it did not fix the bug. I'll try the mod_rewrite workaround as soon as I get the necessary access rights for my server.

--Henning January 20, 2006, at 11:00 AM


Hm, it seems with UploadVariables#UploadNameChars there now is a way to prevent new uploads from using problematic characters. Not pretty but effective :-) Still, I've got about 4000 already uploaded files using Umlauts, so I'm still interested in a fix. (I haven't been able to try mod_rewrite yet due to server access rights issues.)

--Henning February 21, 2006, at 07:09 AM
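
The two halves of that situation, restricting new names and finding the files already uploaded, can be sketched as follows. This is a Python illustration under stated assumptions: the ASCII-only whitelist and both function names are hypothetical and do not reproduce how PmWiki actually applies $UploadNameChars.

```python
import re

# Hypothetical whitelist: ASCII letters, digits, underscore, hyphen, dot, space.
# The pattern matches every character that is NOT in that set.
UNSAFE_CHARS = re.compile(r"[^-\w. ]", flags=re.ASCII)

def sanitize_upload_name(name):
    """Drop every character outside the assumed whitelist."""
    return UNSAFE_CHARS.sub("", name)

def names_needing_attention(names):
    """Return the names that contain at least one non-ASCII character."""
    return [n for n in names if not n.isascii()]

print(sanitize_upload_name("Teinehmer_Kom_Großgut.xls"))
print(names_needing_attention(["Großgut.xls", "plan.xls"]))
```

Dropping characters changes the visible name ("Großgut" becomes "Grogut"), which is why this only helps for new uploads and not for links to existing files.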


Just an update: The problem still exists (in pmwiki-2.1.27), and as my server is having some performance problems currently, I'm more worried about it than ever. Would it be worth a try to change to a 2.2 version to see if it improves things? --Henning October 11, 2007, at 11:58 AM


Here is an example of what happens on a download attempt for a file with Umlauts in its name (if direct downloads are enabled):

  • "GET /pmwiki1/pub/uploads/SalesMarketing/L%fcdwigs%e4u.jpg HTTP/1.1" 403 357

I'm not sure why the result is 403 Forbidden ...

I tried entering the file system path with the exact %-markup as shown above (replacing only the forward slashes by backslashes to keep Windows happy), and the file system found me the file just fine.

--Henning October 11, 2007, at 12:55 PM
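
One possible reading of that request, offered as a guess and not a confirmed diagnosis: the escapes are ISO-8859-1 bytes, and a server that interprets request paths as UTF-8 (as the Apache report linked earlier suggests for Apache 2 on Windows) cannot decode them. The Python sketch below only checks the byte sequences; it says nothing about why Apache answers with 403 specifically.

```python
from urllib.parse import unquote, unquote_to_bytes

logged = "L%fcdwigs%e4u.jpg"

# Interpreted as ISO-8859-1, every byte maps to exactly one character.
as_latin1 = unquote(logged, encoding="latin-1")

# Interpreted strictly as UTF-8, the byte 0xFC cannot start a valid sequence.
raw = unquote_to_bytes(logged)
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(as_latin1)   # Lüdwigsäu.jpg
print(valid_utf8)  # False
```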