UTF-8

Summary: A collection of UTF-8 related tips and fixes
Maintainer: Petko
Categories: CMS, International, Layout

This page concerns PmWiki versions earlier than 2.2.30. For the current version, see the page PmWiki.UTF-8.

Description

A collection of UTF-8 related tips and fixes for PmWiki

  • Please add your comments/questions about a section at the end of that section (to keep them together).
  • Feel free to contribute with other tips or propose improvements to the existing ones.

How to enable UTF-8 support in PmWiki

Note that the 2.2.x versions of PmWiki have had much better Unicode support for more than a year now. It is highly recommended to use a recent stable release for an international UTF-8 wiki.

Inside your (farm)config.php add the following line:

 include_once($FarmD.'/scripts/xlpage-utf-8.php');

It is recommended to enable UTF-8 for the entire site, since this encoding allows any language or alphabet to be used. Enabling it for some groups only is discouraged, because cross-group links may then not work properly.

Internationalization and XLPage

Add in (farm)config.php:

 include_once($FarmD.'/scripts/xlpage-utf-8.php');
 XLPage('fr','Site.XLPage-fr');

See PmWiki.Internationalizations. Basically, you create an "XLPage" (eXtra Language page) containing the strings to translate (Edit - History - Print - Recent Changes - Search). You can use Localization.XLPageTemplate as a base for your translations. Copy this page, for example as Site.XLPage-fr, then fill in the translations.

You can use some ready translations from other PmWiki users, see PmWiki.OtherLanguages.

To display dates, numbers and other localized strings properly, make sure your Site.XLPage-fr sets the correct locale with the .UTF-8 suffix, for example:

 'Locale' => 'fr_FR.UTF-8',

Otherwise some of the accented letters may disappear, e.g. in French: février. (On some systems the suffix is .utf8 rather than .UTF-8; try both.)
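
To find out which spelling of the locale name a given server accepts, one can probe it with PHP's setlocale(), which takes a list of candidate names and returns the one it managed to set, or false if none is installed. The following is a minimal sketch to run from the command line; it is not part of PmWiki, and the candidate list is an assumption for a French wiki.

```php
<?php
# Sketch: probe which UTF-8 locale name this system accepts.
# The candidates below are assumptions for a French wiki; adjust as needed.
$candidates = array('fr_FR.UTF-8', 'fr_FR.utf8');

# setlocale() tries each name in turn and returns the new current locale,
# or false if none of the candidates is installed on this system.
$chosen = setlocale(LC_ALL, $candidates);

if ($chosen === false) {
  echo "None of the candidate locales is installed.\n";
} else {
  echo "Candidate value for 'Locale' in the XLPage: $chosen\n";
}
```

If the script reports that no candidate is installed, the locale probably has to be added at the operating-system level before the XLPage setting can take effect.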

Broken {$Namespaced}, {$Titlespaced}, {$Groupspaced}

This was fixed in PmWiki version 2.2.0-beta30 (2007-02-09), but if you cannot upgrade, or wish to keep the latest stable version, take a look here:

Using a different PageStore object (PerGroupSubDirectories...)

If you are using an alternative page storage format, function or filename scheme (examples: SQLite, PerGroupSubDirectories, CompressedPageStore), you must call include_once($FarmD.'/scripts/xlpage-utf-8.php'); and XLPage() after the declaration of the alternative PageStore object (new PageStore() or include_once(recipe)).

Page names not properly resolved (pages disappear, titles break...)

Any call to ResolvePageName() must be made after the include_once($FarmD.'/scripts/xlpage-utf-8.php'); and XLPage() calls. This function may be called by you in config.php and also by some recipes, so you should include any recipes after the xlpage-utf-8.php include and the XLPage() call.

Order: ideally, you should

  1. first declare the PageStore object (or recipe that declares it),
  2. next set internationalizations (xlpage-utf-8.php and XLPage()), and
  3. then all other recipes.

This tip will save you days of headaches determining why your pages disappear and titles break!!!
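
The order described above could look like this in config.php. This is only a sketch: the two cookbook file names are hypothetical placeholders for whatever PageStore recipe and other recipes your wiki actually uses.

```php
<?php if (!defined('PmWiki')) exit();

# 1. Declare the alternative PageStore first.
#    'some-pagestore.php' is a hypothetical recipe file name.
include_once("$FarmD/cookbook/some-pagestore.php");

# 2. Then enable UTF-8 and load the translations.
include_once("$FarmD/scripts/xlpage-utf-8.php");
XLPage('fr', 'Site.XLPage-fr');

# 3. Only now include the remaining recipes and make any direct
#    ResolvePageName() call, since both depend on the settings above.
include_once("$FarmD/cookbook/some-other-recipe.php");  # hypothetical
$pagename = ResolvePageName($pagename);
```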

Note: some recipes (re)defining $PageNameChars, $NamePattern, $AsSpacedFunction may not work with UTF-8, for example SubgroupMarkup.

Page Encoding

NOTE: The encoding in which config.php is saved matters if you convert between character encodings on your wiki. Refer to page encoding for more details. If you are not using international characters, you do not need to be concerned about this.

SELinux may forbid Apache from reading scripts/xlpage-utf-8.php, thus preventing characters from displaying correctly. You may want to reset the security policy on this file (using for instance restorecon /path/to/scripts/xlpage-utf-8.php).

Alternatively, you may add the following line to config.php:

$HTTPHeaders['utf-8'] = 'Content-type: text/html; charset=UTF-8';

Other comments

UTF8 should be enabled when installing PmWiki, to avoid page content problems. This is not said in the install instructions AFAIK - jdd

If you have problems with old pages written in another encoding after enabling UTF-8, use the programs 'recode' or 'iconv' on GNU/Linux distributions to migrate the pages. The same applies to the i18n packages from PmWiki.
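
For those who prefer to stay in PHP, the same kind of conversion can be sketched with PHP's built-in iconv() function instead of the command-line tools named above. The sketch below assumes the old text was written in ISO-8859-1 and works on a plain string only; applying it to actual page files is left to the reader and is best tried on a backup copy first.

```php
<?php
# Sketch: convert a legacy-encoded string to UTF-8 with PHP's iconv().
# Assumption: the old text was written in ISO-8859-1.
$old = "f\xE9vrier";   # "février" as ISO-8859-1 bytes

# Leave text alone if it is already valid UTF-8, to avoid converting it twice.
if (mb_check_encoding($old, 'UTF-8')) {
  $new = $old;
} else {
  $new = iconv('ISO-8859-1', 'UTF-8', $old);
}

# Show the resulting bytes in hexadecimal for inspection.
echo bin2hex($new), "\n";
```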

jesus2099 (2008-10-13) : This is not enough. You still have to save your local/config.php file in UTF-8 without BOM (if it contains strings like titles etc.) and to add the following line to the same config file (Cookbook.ContentType) :
$HTTPHeaders[] = 'Content-type: text/html; charset=utf-8;';
This should be $HTTPHeaders['utf-8'] = 'Content-type: text/html; charset=UTF-8';, and it is not needed in the 2.2 beta version. --Petko

Pagenames didn't look right after redirecting out of action=edit session, so I included "Charset=$Charset" to headers and set $Charset variable as global and also included a meta tag to the redirection page html to indicate charset as well. But there are still some problems for me because it looks like my filesystem is running on ISO-8859-1 and kde in utf-8 and after a redirection pagenames still look odd in the browser's get/url box, but removing the header field "location:" everything looks right.

 function Redirect($pagename, $urlfmt='$PageUrl') {
   # redirect the browser to $pagename
   global $EnableRedirect, $RedirectDelay, $EnableStopWatch, $Charset;
   SDV($RedirectDelay, 0);
   clearstatcache();
   $pageurl = FmtPageName($urlfmt, $pagename);
   if (IsEnabled($EnableRedirect, 1) &&
       (!isset($_REQUEST['redirect']) || $_REQUEST['redirect'])) {
     header("Location: $pageurl");
     # added: declare the charset in the HTTP header ...
     header("Content-type: text/html; charset=$Charset");
     # added: ... and in a meta tag of the redirection page
     echo "<html><head>
       <meta http-equiv='content-type' content='text/html; charset=$Charset' />
       <meta http-equiv='Refresh' content='$RedirectDelay; URL=$pageurl' />
       <title>Redirect</title></head><body>".$pageurl."</body></html>";
     exit;
   }
   # redirects disabled: show a plain link instead
   echo "<a href='$pageurl'>Redirect to $pageurl</a>";
   if (@$EnableStopWatch && function_exists('StopWatchHTML'))
     StopWatchHTML($pagename, 1);
   exit;
 }

CarlosAB

I feel ugly urls can happen with redirects disabled, but even then, the link should point to the right destination. On most UTF-8 wikis, the 2.2 version should work out of the box, without the need to modify core scripts. --Petko

Another tip: if you are using KDE on OpenBSD, export KDE_UTF8_FILENAMES=1 in your environment so that you can see UTF-8 filenames.

CarlosAB December 22, 2008, at 05:00 PM

Notify may give problems though. I added the following to my config.php to get utf-8 in the mail body working:

  $NotifyHeaders = "Content-Type: text/plain; charset=utf-8\n" .
                   "Content-Transfer-Encoding: 8bit\n";

Utf-8 in the subject (as the website's title, for example) still is a problem. Something like mime_header_encode would need to be used.

wvengen February 11, 2009, at 10:54 AM

Since 2007, I have used a patched version of Notify which encodes UTF-8 titles correctly. You can get it here. I place it in my /cookbook/ directory, add to config.php
 $NotifyHeaders = 'Content-type: text/plain; charset=UTF-8';
 include_once("$FarmD/cookbook/notify.php");
and remove existing $EnableNotify. I'll see about adding this to the core, possibly controlled by a variable. --Petko February 11, 2009, at 04:38 AM
Recent PmWiki versions can encode the e-mail subject header, see $EnableNotifySubjectEncode. --Petko
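
For readers curious what encoding a UTF-8 subject header involves, here is a minimal sketch using PHP's mb_encode_mimeheader() from the mbstring extension. It is an illustration only and not necessarily how PmWiki or the patched Notify does it; the subject text is a made-up example.

```php
<?php
# Sketch: encode a UTF-8 e-mail subject as a MIME "encoded-word".
# The subject text is a made-up example; this file must be saved as UTF-8.
mb_internal_encoding('UTF-8');
$subject = 'Modifications récentes';

# 'B' selects Base64; the result contains only ASCII characters,
# so it can be placed in a mail header.
$encoded = mb_encode_mimeheader($subject, 'UTF-8', 'B');

echo "Subject: $encoded\n";
```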

See Also

Contributors