01556: Two bugs in pm_json_encode()
Description: This bug has existed since pmwiki-2.3.36 and only occurs if the wiki uses an encoding OTHER than UTF-8!
Effect of the error:
If the "config.php" file contains the following entries: "$EnablePmSyntax = 1;" and "XLPage('de','PmWikiDe.XLPage');" the "Einfärbung" (Highlight) button will be missing.
The problem is that the configuration is passed to JavaScript from "config.php" via JSON, and this can only be done using UTF-8 encoding. In my opinion, the PHP function json_encode() is very unreliable, as it fails at the slightest glitch.
The problem can be fixed in "pmwiki.php" using the "pm_json_encode()" function. Lines 952–953 (pmwiki-2.6.0) read:
if (function_exists('json_encode'))
$out = json_encode($x);
and should be changed to:
if (function_exists('json_encode') and
$out = json_encode($x));
Then, right below that, on line 956, another error occurs:
$out = '"'.preg_replace_callback("/[\x00-\x1f\\/\\\\\"]/",'cb_rfc8259',$x).'"';
Inside the double-quoted string, PHP expands "\x00" into a literal null byte, which must not appear in a regular expression. This is easily fixed by using single quotes:
$out = '"'.preg_replace_callback('/[\x00-\x1f\\/\\\\"]/','cb_rfc8259',$x).'"';
That should fix the bugs, and the feature should now work as intended. Michael Engelke
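To illustrate the quoting difference behind the second fix, here is a minimal sketch. The variable names are only for illustration and do not come from pmwiki.php. As far as I know, PHP versions before 8.2 reject a pattern that contains a real null byte, which is why the double-quoted form triggers the warning on some systems and not on others.

```php
<?php
// Double quotes: PHP itself expands \x00 and \x1f, so the pattern
// string handed to PCRE contains a real null byte.
$double = "/[\x00-\x1f\\/\\\\\"]/";

// Single quotes: \x00 and \x1f stay as literal text, and PCRE
// interprets the escapes on its own.
$single = '/[\x00-\x1f\\/\\\\"]/';

$double_has_nul = strpos($double, "\0") !== false; // true
$single_has_nul = strpos($single, "\0") !== false; // false

// The single-quoted pattern still matches control characters,
// slashes, backslashes and double quotes.
$matches_ctrl  = preg_match($single, "\x01"); // 1
$matches_quote = preg_match($single, 'a"b');  // 1
$matches_plain = preg_match($single, 'abc');  // 0
```

Both strings describe the same character class; only the single-quoted one is free of the raw null byte.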
Thanks for this report, yes it is a known problem if the wiki has not enabled UTF-8. I think you mean "$EnablePmSyntax = 2;" not "1". I think we should try to recode the JS arrays to UTF-8. --Petko
I have prototyped an automatic recoding to UTF-8 and tested it a little. You can get the pre-release from ChangeLog or Subversion and report whether it works. --Petko
Yes, I meant "$EnablePmSyntax = 2;".
Regarding your solution: No, it only works to a limited extent. It outputs "Einf?rbung", meaning the "ä" is converted to a "?". However, if the PHP extension "mbstring" is installed, it is output correctly (encoded as "Einf\u00e4rbung").
I propose a compromise: If the PHP extension "mbstring" is not available, the PHP function json_encode() is NOT used, and pm_json_encode() leaves the special characters as they are.
PS: The "JSON_INVALID_UTF8_IGNORE" flag only causes non-UTF-8 characters to be omitted. And the second error is still there: "Warning: preg_replace_callback(): Null byte in regex in /var/www/tests/pmwiki/pmwiki.php on line 997"
I cannot recommend re-encoding to UTF-8. For one thing, because the wiki’s content type would be, for example, "ISO-8859-x" (x for 1, 2, 9), and UTF-8 encoding could lead to further problems. And secondly, ONLY the JSON format requires UTF-8 encoding, but JavaScript itself doesn’t care about the encoding as long as it is interpreted by the code parser. Another problem would be mixed encoding: if "config.php" were in UTF-8 and the wiki runs on "ISO-8859-1"—that can only go wrong. The pm_json_encode() function works quite well with “ISO-8859-1” encoding, and JavaScript has no issues with it either. That’s why I’d go for the simple solution—i.e., not using json_encode() or checking whether json_encode() failed. Michael Engelke
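The behaviours discussed above can be reproduced with a short sketch. It assumes PHP 7.2 or later (for the JSON_INVALID_UTF8_IGNORE flag) and uses a hand-written ISO-8859-1 byte string; the variable names are hypothetical and not taken from PmWiki. The mbstring branch only runs where that extension is installed.

```php
<?php
// "Einfärbung" as ISO-8859-1 bytes: the "ä" is the single byte 0xE4,
// which is not valid UTF-8 on its own.
$latin1 = "Einf\xe4rbung";

// Plain json_encode() refuses the string and returns false.
$plain = json_encode($latin1);
$error = json_last_error(); // JSON_ERROR_UTF8

// With JSON_INVALID_UTF8_IGNORE the offending byte is silently dropped.
$ignored = json_encode($latin1, JSON_INVALID_UTF8_IGNORE);

// With mbstring available, recoding to UTF-8 first gives the escaped form.
$recoded = null;
if (function_exists('mb_convert_encoding')) {
  $utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
  $recoded = json_encode($utf8);
}
```

Here $ignored comes out as "Einfrbung" (the "ä" is gone), while $recoded, where available, is "Einf\u00e4rbung".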
Apparently there is something different in your installation, because I tested the prototype and I don't see the warnings, I see "Einfärbung". I don't see warnings with preg_replace either but no problem wrapping it in single quotes.
I think we may revert to the previous version (which worked with UTF-8 wikis, without json_encode or mb_*), and add a variable to configure a custom json_encode override function. Some people may want to do this rather than enable json_encode or UTF-8.
Note: Implementing and maintaining a pure‑PHP converter (independent of mbstring/iconv/json_encode) that reliably converts arbitrary encodings into JSON Unicode escapes is beyond the current scope and my available time. Documenting the known issue, and linking to a future Cookbook recipe that helps, is not a problem. --Petko
Re 1: PHP installations vary depending on the operating system. This is even more of an issue with NAS systems.
I run PmWiki on a wide variety of systems. On top of that, there are the customer installations. The error where the "ä" is replaced by a "?" occurs on an OpenWrt system that we use for internal purposes within the "Linux User Group." (There is apparently no multibyte extension there.)
The problem with `preg_replace` occurs on a client system where either the "json_encode()" function is not available or has been disabled for some reason. (There is a similar case on line 80, where the variable "$ChangeSummary" is defined. — Only there did they get it right twice ;-)
Re 2: I have no objection to that. But why not check whether json_encode() executed successfully? After all, there’s a fallback function right below it that can take over if necessary. (And as a bonus, it ignores the character encoding.) Especially since this only affects those who don’t have a UTF-8 wiki.
Re 3: In my opinion, it would make more sense to develop an extension that converts an existing wiki installation into a UTF-8 wiki. (There might be some issues with the history, so at least we should be careful there.) This would only work for file-based wikis, since SQL wikis (e.g., using the SQLite plugin) would require a separate converter. But that can wait—a long time...
Thank you for your efforts Michael Engelke
These are good points. I have removed the JSON_INVALID_UTF8_IGNORE flag so that the conversion fails, and it is then retried with the core function. This works for me with a basic installation in iso8859-1 and the German language pack.
3. There is already Cookbook:MigrateUTF8 --Petko
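The "let it fail, then retry" approach can be sketched roughly as follows. This is not the actual PmWiki code: both function names are hypothetical, and the fallback here only handles strings, escaping the same characters as the regular expression quoted earlier in this report.

```php
<?php
// Hypothetical stand-in for the pure-PHP fallback (NOT the PmWiki function).
// It escapes control characters, "/", "\" and '"' and leaves every other
// byte untouched, so non-UTF-8 text passes through as it is.
function sketch_fallback_encode($s) {
  $escaped = preg_replace_callback('/[\x00-\x1f\\/\\\\"]/', function ($m) {
    return sprintf('\\u%04x', ord($m[0]));
  }, $s);
  return '"' . $escaped . '"';
}

// Try json_encode() first; if it is unavailable or reports failure,
// retry with the fallback above.
function sketch_json_encode($s) {
  if (function_exists('json_encode')) {
    $out = json_encode($s);
    if ($out !== false) return $out;
  }
  return sketch_fallback_encode($s);
}

$ok     = sketch_json_encode('ok');            // handled by json_encode()
$latin1 = sketch_json_encode("Einf\xe4rbung"); // json_encode() fails, fallback is used
$quoted = sketch_fallback_encode('a"b');
```

The strict `!== false` comparison is a deliberate choice in this sketch: json_encode(0) returns the string "0", which PHP treats as falsy, so a plain truthiness check would send a valid result to the fallback.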
I just tested it on all PmWiki instances and there are no errors. (Now I just have to wait for the next release so I can test it on my own website too.)
Re 3: Oh, thanks for the tip! — I just noticed there’s an article about it too. (Maybe I should order one of those "RTFM" T-shirts...)
I hereby request that the status of this bug report be set to "closed." Thanks again for all your help. Michael Engelke