00053: Support for URL-encoding of spaces and non-ASCII characters in file links

Summary: Support for URL-encoding of spaces and non-ASCII characters in file links
Created: 2004-09-23 09:51
Status: Closed - fixed in 1.0.12
Category: Bug / Feature request
From: Henning
Assigned:
Priority: 44
Version: 1.0.5
OS: Linux/Apache 2

Description:


To let naive authors upload files easily, PmWiki should ensure that any file name they might choose leads to a valid download link.

Current show-stoppers:

  1. Spaces are usually valid in operating-system file names, but they break PmWiki file links.
  2. Umlauts are usually valid in operating-system file names and in PmWiki page names, but they break PmWiki file links.
  3. If per-page uploads are enabled, valid PmWiki page names containing Umlauts lead to invalid file URLs, even for file names without Umlauts.

Feature suggestion:

It would be desirable to add a filter that URL-encodes the offending characters in the links of PmWiki's HTML output.

Background:

My bunch of naive authors loves uploading MS Office documents (and overwriting them with updated versions), so this filter would get considerable mileage ;-) Roughly 80% of the PmWiki-related phone calls I get are of the "Upload didn't work" variety.

Could you upload files and/or create pages demonstrating these problems at http://www.pmwiki.org/wiki/Test/Test ? (Create new pages there as needed.) Either that, or point me to a URL that shows the problem. I believe there is indeed a problem, but I can't easily see what in the code might be breaking, and some real examples would really help. --Pm

Sorry, poor choice of words on my part. The code doesn't actually break anything. It's more that it fails to fix something :-)

Here's an example:

  • Source of page: [[Attach:filenäme.txt]]
  • Current URL: http://www.somewiki.com/pub/uploads/somegroup/filenäme.txt (<= doesn't work)
  • Desired URL: http://www.somewiki.com/pub/uploads/somegroup/filen%e4me.txt (<= would work)

Example for page name complication if save-per-page is used:

  • Page name source: {{Page Näme}}
  • Page file name: somegroup.PageNäme
  • Upload file source: [[Attach:filename.txt]] (<= no Umlauts here)
  • Current URL: http://www.somewiki.com/pub/uploads/somegroup.PageNäme/filename.txt (<= doesn't work due to Umlauts in page name)
  • Desired URL: http://www.somewiki.com/pub/uploads/somegroup.PageN%e4me/filename.txt (<= would work)
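To make the "desired URL" rule concrete, here is a minimal sketch (in Python, not PmWiki's own PHP) of percent-encoding each path segment separately. The helper name `encode_upload_url` is hypothetical. It assumes the wiki stores names as ISO-8859-1 (Latin-1), which is what the `%e4` for "ä" in the desired URLs implies; a UTF-8 setup would produce `%C3%A4` instead. Python emits uppercase hex (`%E4`), which URL syntax treats as equivalent to `%e4`.

```python
from urllib.parse import quote


def encode_upload_url(base: str, *segments: str, encoding: str = "latin-1") -> str:
    """Join path segments onto base, percent-encoding each segment.

    Hypothetical helper for illustration. Each segment is encoded on its
    own with safe="" so the "/" separators added here stay literal, while
    spaces and non-ASCII characters inside a name are escaped. `encoding`
    is an assumption: it must match how the server's file system stores
    names (Latin-1 turns "ä" into %E4).
    """
    encoded = [quote(segment, safe="", encoding=encoding) for segment in segments]
    return base.rstrip("/") + "/" + "/".join(encoded)


base = "http://www.somewiki.com/pub/uploads"
# Umlaut in the file name
print(encode_upload_url(base, "somegroup", "filenäme.txt"))
# Umlaut in the per-page directory name
print(encode_upload_url(base, "somegroup.PageNäme", "filename.txt"))
```

Encoding per segment, rather than the whole URL at once, keeps the scheme and slashes intact while still escaping a space as `%20`.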

(Note that the space in the page name is already handled by stripping it, but other characters that aren't URL-safe are not handled at all.) --Henning


I've created a demo page at http://www.pmwiki.org/wiki/Test/PageNäme . Due to the setup of the Test area, it can only demonstrate one of the three forms of the problem though.

--Henning


Copy of discussion from 00024:

Umlauts work for me in page names, but not in the URLs of uploaded files. (I think you have to use ?action=upload to generate file names with umlauts.) This applies to other characters as well, most notably the space.

Since upload URLs seem to be treated like links rather than like page names, is my Umlaut problem the same issue as the one described here? --Henning

Yes, special characters outside of the ASCII character set aren't allowed in URLs (and by extension, attach-links) unless they're url-encoded. There are a couple of reasons for this: first, that's what the URL standard says, and second, it's not clear to me that it's "safe" to allow non-ASCII characters in filenames under the various operating systems. But point me to a page where the umlauts aren't working and I'll see what I can do. --Pm

Today I noticed that switching to per-page file storage (Cookbook/UploadGroups) has added to the problem. Many of the pages naturally feature Umlauts in their names, making the corresponding file URLs invalid even after a successful upload.

(Actually, spaces in file names have the same effect as Umlauts, invalidating the URL. Per-page storage is not affected by spaces in page names because they are stripped automatically.)

For my users, the file name issue is a major source of frustration, so I'm somewhat desperate for a technical solution.

It would be perfect if PmWiki URL-encoded the file links in the HTML it generates. I can't say much about operating-system standards, but at least Windows and Linux handle Umlauts well.

A conservative solution would be to simply strip the Umlauts (and spaces) from file names on upload. (The only downside: users would then have to take care to avoid accidental overwrites when a stripped file name matches an existing one.)
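The conservative "strip on upload" idea, including the overwrite risk, can be sketched as follows. This is an illustration in Python, not PmWiki code; the function name `strip_upload_name` and the whitelist of ASCII letters, digits, dot, underscore, and hyphen are assumptions chosen for the example.

```python
import re

# Hypothetical whitelist: anything other than ASCII letters, digits,
# dot, underscore, or hyphen is removed from the uploaded name.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def strip_upload_name(name: str, existing: set[str]) -> tuple[str, bool]:
    """Strip unsafe characters from an uploaded file name.

    Returns the stripped name and a flag telling whether it collides
    with an already uploaded file, i.e. whether saving it would
    silently overwrite another document.
    """
    stripped = UNSAFE.sub("", name)
    return stripped, stripped in existing


uploads = {"filename.txt"}
print(strip_upload_name("filenäme.txt", uploads))   # Umlaut dropped, no clash
print(strip_upload_name("file name.txt", uploads))  # space dropped, clashes
```

The collision flag is the point of the sketch: stripping is simple, but two distinct names can map to the same result, so an upload handler would need to warn or rename rather than overwrite.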

Maybe I should post this somewhere else as a feature request? :-) -Henning

Yes, create a new issue for this one, since it really is different from the original issue listed here. --Pm

I ran into this problem two days ago, when I enabled uploads in my PmWiki (1.0.8, WinXP, PHP 4.3.1). I have a group named in Russian (win-1251 encoding), so when I upload a file (e.g. techdesc.mht), its path turns out to be /wiki/uploads/<russian-group-name>/techdesc.mht. The group name is not URL-encoded, so the file can't be accessed.

On the other hand, many pages have Russian names and are accessed correctly. I guess the solution would be to process attached-file URLs the same way as page-name URLs. Probably applying urlencode() would do the trick. --Zverik
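One detail matters when encoding a path rather than a query string. PHP's `urlencode()` turns a space into `+` (form encoding), whereas `rawurlencode()` produces `%20`. In a URL path, a web server typically reads `+` as a literal plus sign. The Python sketch below shows the same distinction with `quote` (roughly like `rawurlencode`) and `quote_plus` (roughly like `urlencode`). It assumes the group name is stored in cp1251, as in the report above; the sample group name and the spaced file name are hypothetical.

```python
from urllib.parse import quote, quote_plus

group = "Группа"             # hypothetical Russian group name, stored as cp1251
file_name = "tech desc.mht"  # hypothetical file name containing a space

# Path-style encoding: each segment encoded separately, space -> %20.
path = (
    "/wiki/uploads/"
    + quote(group, safe="", encoding="cp1251")
    + "/"
    + quote(file_name, safe="")
)

# Form-style encoding: space -> "+", which is meant for query strings,
# not for file paths.
form_style = quote_plus(file_name)

print(path)
print(form_style)
```

For file links, the path-style variant is the safer choice, since the encoded bytes map back exactly to the name on disk.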


The status of this PITS still is "awaiting feedback". Was that aimed at the need for a demo page (which can be found here: http://www.pmwiki.org/wiki/Test/PageNäme)? Or is feedback from more users required to confirm the priority of the issue? (Zverik, get moving! ;-) --Henning October 15, 2004, at 08:45 AM


I've changed Status to "Open" now, figuring that's what I'm supposed to do after providing feedback. Sorry if I misunderstood the procedure! Henning October 19, 2004, at 06:36 AM


No, the delay was simply because it has taken me this long to figure out the many sources of the problem and come up with a worthy fix. I think I have it fixed now for 1.0.12 and 2.0.devel16; we'll see whether the fixes work!

Please let me know if these fix the problems; if so, feel free to close this PITS. Thanks!

Pm


Thanks a lot! :-)

I hope it didn't seem as if I were trying to rush you on a complicated issue; it's just that I was afraid I had somehow become the bottleneck here!

The above test page works fine now, but it seems 1.0.12 isn't available for download yet so I couldn't test it on my installation right away.

Thanks again! That feature will help greatly to give my authors better confidence in their wiki skills :-)

--Henning October 25, 2004, at 06:58 AM


I've now found 1.0.13 (the "latest version" link still seems to point to 1.0.11, so I had to guess the URL to find 1.0.13), installed it, and ran a couple of tests. Everything works great now! I'm going to close the PITS right away :-) --Henning October 28, 2004, at 09:54 AM