00932: Make UTF8 the default encoding

Summary: Make UTF8 the default encoding
Created: 2007-05-26 15:16
Status: Open
Category: Feature
Assigned:
Priority: 5553
Version: Every
OS: n/a

Description: I think UTF-8 should be the default encoding used by PmWiki. It is well supported by every modern OS and browser and not using it seems to cause a lot of bugs (looking at the issues list).

Did you know, for instance, that ISO-8859-1 doesn't even include the euro symbol? This causes all sorts of weird problems, e.g. http://www.wikipublisher.org/wiki/index.php?n=Issues.00104

PmWiki has shipped for years (since 2014 or earlier? I could not find a mention in the release notes) with a sample-config.php that enables UTF-8 by default through

include_once("scripts/xlpage-utf-8.php");

Is it time to move this to stdconfig.php and just have an optional switch in config.php to turn it off? Something like:

if (IsEnabled($EnableUTF8, 1))
  include_once("$FarmD/scripts/xlpage-utf-8.php");

- HansB June 09, 2017, at 10:45 AM

As Hans noted, UTF-8 has been the default encoding for new wikis since version 2.2.1 (2009-03-28). But enabling it on existing wikis is not trivial. Moving it to stdconfig.php would be far too late, and enabling it by default is not as easy as it seems. (1) xlpage-utf-8.php (or another encoding script) must be included before any changes to the PageStore class, and before authuser.php, which can happen in config.php. (2) Pages and uploaded files that were created under the ISO encoding with international characters in their names currently cannot be recovered to UTF-8 automatically. If we (I?) write a function to address and fix (2), then we may be able to include it before config.php. --Petko June 18, 2017, at 02:12 AM
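
To illustrate the ordering constraint in point (1), here is a minimal sketch of a local/config.php, assuming a wiki that uses both a custom PageStore and AuthUser. The PageStore line is only a commented placeholder for "any change to the PageStore class", not a recommendation.

```php
<?php if (!defined('PmWiki')) exit();
# local/config.php (ordering sketch only)

# 1. Load the encoding script first, before anything touches PageStore.
include_once("scripts/xlpage-utf-8.php");

# 2. Only afterwards: a custom PageStore, if the wiki uses one.
#    (Placeholder; the path pattern is an assumption for illustration.)
# $WikiDir = new PageStore('wiki.d/{$FullName}');

# 3. authuser.php comes after the encoding script as well.
include_once("$FarmD/scripts/authuser.php");
```

This is why an include placed in stdconfig.php would come too late: stdconfig.php is processed after config.php, and therefore after steps 2 and 3.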

I forgot (3): wikis whose local configuration sets variables to values containing international characters, e.g. $WikiTitle, $MakePageNamePatterns, $UploadNameChars or $XL, will need their files edited and saved in UTF-8 encoding without a BOM. These values cannot be assumed to be in one particular encoding and blindly converted to UTF-8; PmWiki does not even have a way to list the variables that have been modified in local config. --Petko June 19, 2017, at 11:13 PM
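
For point (3), an administrator could at least check whether a local configuration file is already safe before switching. The following is a rough sketch, not part of PmWiki: the function name checkConfigEncoding() and the path in the usage comment are hypothetical. It only reports the state of the file. It deliberately does not convert anything, since the original encoding cannot be guessed reliably.

```php
<?php
// Hypothetical helper (not a PmWiki function): classify the bytes of a
// local configuration file with respect to UTF-8.
function checkConfigEncoding(string $bytes): string {
    // A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'utf-8-with-bom';
    }
    // Pure ASCII is identical in ISO-8859-1 and UTF-8: nothing to convert.
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ascii';
    }
    // With the /u modifier, preg_match() returns 1 only if the subject is
    // valid UTF-8, and false otherwise.
    return preg_match('//u', $bytes) === 1 ? 'utf-8' : 'not-utf-8';
}

// Usage (path is an assumption):
// echo checkConfigEncoding(file_get_contents('local/config.php')), "\n";
```

A result of 'not-utf-8' or 'utf-8-with-bom' means the file needs to be re-saved by hand in an editor. The check cannot tell which legacy encoding was used.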