01556: Two bugs in pm_json_encode()

Summary: Two bugs in pm_json_encode()
Created: 2026-05-25 17:10
Status: Closed, fixed for 2.6.1
Category: Bug
Assigned:
Priority: 1
Version: 2.6.0
OS: 8.2

Description: This bug has existed since pmwiki-2.3.36 and only occurs if the wiki uses an encoding OTHER than UTF-8!

Effect of the error: If the "config.php" file contains the following entries: "$EnablePmSyntax = 1;" and "XLPage('de','PmWikiDe.XLPage');" the "Einfärbung" (Highlight) button will be missing.

The problem is that the configuration is passed to JavaScript from "config.php" via JSON, and this can only be done using UTF-8 encoding. In my opinion, the PHP function json_encode() is very unreliable, as it fails at the slightest glitch.

The problem can be fixed in "pmwiki.php" using the "pm_json_encode()" function. Lines 952–953 (pmwiki-2.6.0) read:

  if (function_exists('json_encode'))
    $out = json_encode($x);

and should be changed to:

  if (function_exists('json_encode') and
    $out = json_encode($x));
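
The intent of this change is to fall through to the hand-written encoder whenever json_encode() is missing or fails. The following is only a minimal sketch of that control flow, not the PmWiki code: demo_json_encode() and demo_fallback_encode() are hypothetical names, the fallback handles strings only, and the result is compared against false explicitly rather than relying on truthiness.

```php
<?php
// Hypothetical stand-in for the string branch of the fallback encoder:
// escapes quote, backslash and control characters, leaves all other bytes untouched.
function demo_fallback_encode($s) {
  return '"' . preg_replace_callback('/[\x00-\x1f"\\\\]/', function ($m) {
    return sprintf('\\u%04x', ord($m[0]));
  }, $s) . '"';
}

// Hypothetical wrapper: use json_encode() only if it exists AND succeeds.
function demo_json_encode($x) {
  // json_encode() returns false for strings that are not valid UTF-8.
  if (function_exists('json_encode') && ($out = json_encode($x)) !== false)
    return $out;
  return demo_fallback_encode($x);
}

$latin1 = "Einf\xe4rbung";            // "Einfärbung" as ISO-8859-1 bytes
$result = demo_json_encode($latin1);  // falls back, keeps the raw \xe4 byte
$plain  = demo_json_encode("abc");    // valid UTF-8, handled by json_encode()
```

With this shape, a non-UTF-8 wiki still gets a quoted string for JavaScript instead of an empty result.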

Then, right below that, on line 956, another error occurs:

    $out = '"'.preg_replace_callback("/[\x00-\x1f\\/\\\\\"]/",'cb_rfc8259',$x).'"';

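Inside a double-quoted PHP string, "\x00" is expanded to a literal null byte, which must not appear in a regular expression. This is easily fixed by using single quotes, so that the escape sequence reaches the regex engine unexpanded: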

    $out = '"'.preg_replace_callback('/[\x00-\x1f\\/\\\\"]/','cb_rfc8259',$x).'"';
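
The difference between the two spellings can be checked without running the problematic pattern at all. A small sketch, using the two pattern strings from above and a made-up test string:

```php
<?php
// Double quotes: PHP itself expands \x00 into a real NUL byte before PCRE sees the pattern.
$double = "/[\x00-\x1f\\/\\\\\"]/";
// Single quotes: the characters \x00 reach PCRE as text, and PCRE interprets the escape.
$single = '/[\x00-\x1f\\/\\\\"]/';

$has_nul_double = strpos($double, "\0") !== false;  // true
$has_nul_single = strpos($single, "\0") !== false;  // false

// The single-quoted pattern matches control characters, slash, backslash and quote.
// The sample string contains one TAB, one quote, one slash and one backslash.
$hits = preg_match_all($single, "a\tb\"c/d\\e");    // 4
```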

That should fix the bugs, and the feature should now work as intended. Michael Engelke

Thanks for this report. Yes, it is a known problem if the wiki has not enabled UTF-8. I think you mean "$EnablePmSyntax = 2;" not "1". I think we should try to recode the JS arrays to UTF-8. --Petko

I have prototyped an automatic recoding to UTF-8 and tested it a little. You can get the pre-release from the ChangeLog or Subversion; please report whether it works. --Petko

Yes, I meant "$EnablePmSyntax = 2;". Regarding your solution: no, it only works to a limited extent. It outputs "Einf?rbung", meaning the "ä" is converted to a "?". However, if the PHP extension "mbstring" is installed, it is output correctly (encoded as "Einf\u00e4rbung"). I propose a compromise: if the PHP extension "mbstring" is not available, the PHP function json_encode() is NOT used, and pm_json_encode() leaves the special characters as they are.
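
The proposed compromise could look roughly like the sketch below. It is an illustration only: the source charset is assumed to be ISO-8859-1, and the else branch uses addcslashes() as a simplified stand-in for the string handling in pm_json_encode().

```php
<?php
$latin1 = "Einf\xe4rbung";  // ISO-8859-1 bytes for "Einfärbung"

if (function_exists('mb_convert_encoding')) {
  // Recode to UTF-8 first; json_encode() then emits a \u escape for the "ä".
  $utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
  $json = json_encode($utf8);
} else {
  // Compromise: skip json_encode(), escape only quote and backslash,
  // and keep all other bytes exactly as they are.
  $json = '"' . addcslashes($latin1, "\"\\") . '"';
}
```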

PS: The "JSON_INVALID_UTF8_IGNORE" flag only causes non-UTF-8 characters to be omitted. And the second error is still there: "Warning: preg_replace_callback(): Null byte in regex in /var/www/tests/pmwiki/pmwiki.php on line 997"
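
The effect of the flag can be reproduced in isolation. A short sketch, assuming PHP 7.2 or later (where JSON_INVALID_UTF8_IGNORE is available) and the same ISO-8859-1 sample string:

```php
<?php
$latin1 = "Einf\xe4rbung";  // ISO-8859-1 bytes, not valid UTF-8

// Without the flag the whole conversion fails and json_encode() returns false.
$strict = json_encode($latin1);

// With the flag the invalid byte is silently dropped, so the "ä" is lost.
$ignored = json_encode($latin1, JSON_INVALID_UTF8_IGNORE);
```

Because the flag hides the failure, a caller can no longer detect it and switch to a fallback.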

I cannot recommend re-encoding to UTF-8. For one thing, because the wiki’s content type would be, for example, "ISO-8859-x" (x for 1, 2, 9), and UTF-8 encoding could lead to further problems. And secondly, ONLY the JSON format requires UTF-8 encoding, but JavaScript itself doesn’t care about the encoding as long as it is interpreted by the code parser. Another problem would be mixed encoding: if "config.php" were in UTF-8 and the wiki runs on "ISO-8859-1"—that can only go wrong. The pm_json_encode() function works quite well with “ISO-8859-1” encoding, and JavaScript has no issues with it either. That’s why I’d go for the simple solution—i.e., not using json_encode() or checking whether json_encode() failed. Michael Engelke

Apparently there is something different in your installation, because I tested the prototype and I don't see the warnings, I see "Einfärbung". I don't see warnings with preg_replace either but no problem wrapping it in single quotes.

I think we may revert to the previous version (which worked with UTF-8 wikis, without json_encode or mb_*), and add a variable to configure a custom json_encode override function. Some people may want to do this rather than enable json_encode or UTF-8.

Note: Implementing and maintaining a pure‑PHP converter (independent of mbstring/iconv/json_encode) that reliably converts arbitrary encodings into JSON Unicode escapes is beyond the current scope and my available time. Documenting the known issue, and linking to a future Cookbook recipe that helps, is not a problem. --Petko

Re 1: PHP installations vary depending on the operating system. This is even more of an issue with NAS systems. I run PmWiki on a wide variety of systems. On top of that, there are the customer installations. The error where the "ä" is replaced by a "?" occurs on an OpenWrt system that we use for internal purposes within the "Linux User Group." (There is apparently no multibyte extension there.) The problem with `preg_replace` occurs on a client system where either the "json_encode()" function is not available or has been disabled for some reason. (There is a similar case on line 80, where the variable "$ChangeSummary" is defined. — Only there did they get it right twice ;-)

Re 2: I have no objection to that. But why not check whether json_encode() executed successfully? After all, there’s a fallback function right below it that can take over if necessary. (And as a bonus, it ignores the character encoding.) Especially since this only affects those who don’t have a UTF-8 wiki.

Re 3: In my opinion, it would make more sense to develop an extension that converts an existing wiki installation into a UTF-8 wiki. (There might be some issues with the history, so at least we should be careful there.) This would only work for file-based wikis, since SQL wikis (e.g., using the SQLite plugin) would require a separate converter. But that can wait—a long time...

Thank you for your efforts Michael Engelke

These are good points. The JSON_INVALID_UTF8_IGNORE flag has been removed so that the conversion is allowed to fail, and the code then retries with the core function. This works for me with a basic installation in ISO-8859-1 and the German language pack.

Re 3: There is already Cookbook:MigrateUTF8. --Petko

I just tested it on all PmWiki instances and there are no errors. (Now I just have to wait for the next release so I can test it on my own website too.)

Re 3: Oh, thanks for the tip! — I just noticed there’s an article about it too. (Maybe I should order one of those "RTFM" T-shirts...)

I hereby request that the status of this bug report be set to "closed." Thanks again for all your help. Michael Engelke