Dash-Pagenames

Summary: URLs and page names with dashes for word spacing, UTF-8 friendly
Version: 2023-02-17
Prerequisites:
Status: new
Maintainer: HansB
License: GPL

Description

Page naming with dashes/hyphens replacing spaces between words, with UTF-8 characters and upper/lower case preserved, for more readable page names and URLs.

This script changes how spaces in links are treated when creating or linking to pages. Very readable page names can easily be created through link markup, or through a 'new page' form like New Page Box Plus. In many cases this makes a Title directive unnecessary, even when using international characters in a UTF-8 enabled wiki.

CamelCase page names are no longer necessary. Page names, file names and URLs are created with words separated by dashes/hyphens, and with international characters if UTF-8 is enabled.

In links, spaces are converted to dashes/hyphens instead of the words being joined into WikiWords. If UTF-8 is enabled, UTF-8 characters are preserved. Letter case is preserved.

Links in groups PmWiki, Site and SiteAdmin, as well as WikiWord links keep their functionality, and cross-linking pages between groups needs no special attention.

Installation

Download dash-pagenames.php, copy it to the cookbook folder and include it in config.php, after PmWiki's UTF-8 script, with

include_once("$FarmD/cookbook/dash-pagenames.php");

Configuration

The default configuration allows UTF-8 encoded characters in page names and URLs and preserves upper and lower case (where the language has both). This presumes that UTF-8 is enabled on the wiki.

Some other options are available, to be added to config.php before including the script:

1. To force only ASCII characters in page names and URLs:

$ForcePageNamesToASCII = 1;

This strips accents from characters of languages based on the Latin alphabet. The PHP Intl module needs to be enabled on the server. This will not change non-Latin words into ASCII words.

2. To force all characters to lower case (language permitting):

$ForcePageNamesToLowerCase = 1;

3. To add more groups that need to follow PmWiki's native $MakePageNamePatterns, add them to $CamelCaseGroups.
For example, to add 'MyCamelGroup':

$CamelCaseGroups['MyCamelGroup'] = 1;
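
As a rough sketch of how the options above fit together, a config.php excerpt might look like the following. It assumes UTF-8 is enabled through PmWiki's standard scripts/xlpage-utf-8.php include. The extension_loaded('intl') guard is an illustrative addition, not part of the recipe, and 'MyCamelGroup' is the placeholder group name from the example above. Drop any setting you do not need.

```php
<?php if (!defined('PmWiki')) exit();
# local/config.php (excerpt)

# Enable UTF-8 first; the recipe must be included after this script.
include_once("scripts/xlpage-utf-8.php");

# Optional settings go before the recipe is included.

# Only request ASCII page names when the PHP Intl module is loaded,
# since the recipe's accent stripping depends on it.
# (This guard is an illustrative assumption, not part of the recipe.)
if (extension_loaded('intl')) {
  $ForcePageNamesToASCII = 1;
}

# Uncomment to force lower-case page names.
# $ForcePageNamesToLowerCase = 1;

# Keep PmWiki's native naming in an additional (placeholder) group.
$CamelCaseGroups['MyCamelGroup'] = 1;

# Finally, load the recipe.
include_once("$FarmD/cookbook/dash-pagenames.php");
```

The ordering is the point here: the UTF-8 script comes first, the option variables next, and the recipe last, so that the recipe sees the settings when it is loaded.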

Usage

URL examples

Easier to read, easier to edit, possibly better for SEO:

  • mydomain.org/music/workshop-notes/hab-Spaß-mit-Flöten
  • mydomain.org/art/expositions/l’exposition-rétrospective-de-l’art-français

Pagelink examples:

With UTF-8 enabled, page names can be created which are much easier to read.

  • [[número de teléfono]] links to, or creates, a page named número-de-teléfono.
  • [[CamelCaseWords]] links to CamelCaseWords.
  • [[Camel Case Words]] links to Camel-Case-Words, but in groups PmWiki, Site and SiteAdmin it still links to CamelCaseWords.
  • [[wiki blog/]] links to group wiki-blog.
  • [[workshop notes/ hab Spaß mit Flöten]] links to workshop-notes.hab-Spaß-mit-Flöten.
  • [[Calvin & Hobbs]] links to page Calvin-and-Hobbs (customisable replacement of a solitary '&').
  • [['''Calvin''' & ''Hobbs'']] links to page Calvin-and-Hobbs (some inline markup can be used in the link, but is stripped for the page name).
  • [[l'exposition rétrospective de l'art français]] links to l’exposition-rétrospective-de-l’art-français (custom replacement of "'" with "’", enabling use of the apostrophe).

Title and other PageVariables

PageVariables work as normal: {$FullName}, {$Name} and {$Group} show dash-spaced names, whereas {$Title} and {$Titlespaced} show the page name with spaces instead of dashes.

Special URL characters

As with PmWiki's native $MakePageNamePatterns scheme, some characters need special attention:
the dot '.', slash '/', question mark '?', hash '#', percent '%', dollar '$' and colon ':'.
The dot '.' and slash '/' separate the group from the name part in a link. Anything from '?' or '#' onwards is stripped, as these characters begin the query string or fragment of a URL.
The dash/hyphen '-' substitutes for the space as word separator; extra dashes are stripped, as is anything that is not a page name character.

Using Quotation marks

"Universal" quotation marks, and anything following them, are stripped. This is standard PmWiki behaviour. But one can use some UTF-8 alternatives, like English “double” quote marks, or other language-specific quotation marks. The single quote mark ' , the Apostrophe, will be replaced by ’ , the Right Single Quotation Mark, rather than being stripped. English ‘Single’ quote marks could also be used.
Some possibilities with UTF-8:

  ’   Apostrophe’s single use.
  “ ” English “double”.
  ‘ ’ English ‘Single’.
  « » French «Christmas trees».
  „ “ German „paws“.
  „ ” Polish „paws”.
  » « Swedish »reverse«. 

Using inline markup / HTML entities

HTML entities in text entered for links will be stripped. This means link text can contain and display HTML entities and simple wiki inline markup (like bold and italics), but they will not be part of the page name and URL.

Using En– and Em–dashes

As a dash/hyphen substitutes for the space between words, an en–dash can be used instead of a hyphen as a word–connector or for number ranges, like 65–75. En– and em–dashes are not converted to spaces by {$Title}, {$Titlespaced} or {$Namespaced}, so if the skin displays the page name as title, the en–dash punctuation remains, whereas any dashes in the page name are displayed as spaces. An en–dash can be typed in Windows with Alt+0150 (hold down the Alt key while typing 0150 on the numeric keypad), and on a Mac with Option+- (hold down Option while typing - ). For an em–dash, use Alt+0151 in Windows or Shift+Option+- on a Mac.

Notes

PmWiki uses page names as file names. This should not be a problem when using UTF-8. You may wish to read more about UTF-8 use in page names here: PmWiki.UTF-8#toc-4

The script uses an alternate version of $MakePageNamePatterns to create page names with dashes/hyphens, and an alternate version of the function MakePageName. The latter switches automatically between the original $MakePageNamePatterns, which are essential for groups PmWiki, Site and SiteAdmin, and the patterns for dash-spaced names, which apply to all other groups.

The array $MPN_ReplacePatterns adds some extra replacement patterns, which get processed first. You can add your own replacement rules to this array, or disable any that are included. Some characters with special use in URLs are not permitted in page names, so some patterns are included to substitute certain special characters. An addition in config.php would look like this:

   $MPN_ReplacePatterns["/\&/"] = ' et ';   //'et' gets substituted for '&' 
   $MPN_ReplacePatterns["/\@/"] = ' à ';   //'à' gets substituted for '@'

To do / some day / maybe

Explore ways to preserve the original UTF-8 character input in the Title directly at the time of page creation, when page names are forced to ASCII. (A new page form from Fox PageManagement, using Fox, can add a (:title ...:) directive with the submitted name already populated.)


Change log / Release notes

  • 2023-02-17: Changed the script to use two new configuration variables: $ForcePageNamesToASCII and $ForcePageNamesToLowerCase. Removed $EnableUTF8PageNames and $MPN_UTF8_ReplacePatterns. Modified the script to optionally force page names to lower case (no letter case preservation). Modified the function which forces page names to ASCII.
  • 2023-02-15: Added $EnableUTF8PageNames, set to 0 to force ASCII in page names. Added $CamelCaseGroups so other groups can be added to be exempt from dashed name patterns. Added $MPN_UTF8_ReplacePatterns.
  • 2023-02-14: Modified spacing function to use not just dashes, but the native AsSpaced or UTF8 AsSpaced as needed.
  • 2023-02-13: Fixed name and group patterns for when utf-8 is not enabled.
  • 2023-02-12: Initial release.


See also

Cookbook /
AlternateNamingScheme  Use other naming schemes for PmWiki pages
ISO8859MakePageNamePatterns  How to convert ISO 8859 character input for page names to unaccented ASCII equivalents
Router  Router allows a website's url structure to be different from PmWiki's group/page structure. (beta)
UnaccentUTF8  Diacritics-insensitive page index and searches (Beta)

Contributors

Comments

See discussion at Dash-Pagenames-Talk
