PmWiki could be reprogrammed in PHP without regex

Reprogramming PmWiki, a lightweight wiki engine written in PHP, to avoid regular expressions (regex) entirely is a complex task. PmWiki relies on regex for many core functions, such as parsing wiki markup, handling page links, and processing directives, so removing it completely would require significant changes to the parsing and processing logic and could hurt performance and maintainability. Instead of rewriting the entire codebase, I’ll take a focused approach and modify one key component, the markup processing, so that it uses string functions such as strpos(), substr(), and explode() in place of regex.

PmWiki’s markup processing uses the Markup() function to define rules that transform wiki text (e.g., bold, italic, links) into HTML, and many of these rules rely on regex patterns. Below, I’ll rewrite a simplified version of the markup processing system that handles this common syntax with string-based parsing instead. The example is a single file, wiki.php, which processes wiki text and avoids regex entirely.

Approach

    Identify Markup Patterns: Focus on common PmWiki markup like bold ('''text'''), italic (''text''), and links ([[target]] or [[target|text]]).
    Replace Regex with String Functions: Use strpos(), substr(), and explode() to locate and process markup patterns.
    Preserve PmWiki’s Structure: Maintain a similar function structure to PmWiki’s MarkupToHTML() for compatibility.
    Handle Nested Markup: Implement a basic parser to process nested markup without regex.
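
The second step above can be sketched in isolation. This is a minimal illustration, not PmWiki code: wrapDelimited() is a hypothetical helper name, and it assumes the opening and closing delimiters are fixed strings and that an unmatched opener should be left as literal text.

```php
<?php
// Hypothetical helper (not part of PmWiki): wraps every $open...$close span
// in the given HTML tag using only strpos() and substr().
function wrapDelimited(string $text, string $open, string $close, string $tag): string {
    $result = '';
    $pos = 0;
    $openLen = strlen($open);
    $closeLen = strlen($close);

    // Each iteration handles one complete open/close pair.
    while (($start = strpos($text, $open, $pos)) !== false) {
        $end = strpos($text, $close, $start + $openLen);
        if ($end === false) {
            break; // unmatched opener: the remainder is copied unchanged below
        }
        $result .= substr($text, $pos, $start - $pos);
        $inner = substr($text, $start + $openLen, $end - $start - $openLen);
        $result .= "<$tag>$inner</$tag>";
        $pos = $end + $closeLen;
    }

    // Append whatever follows the last processed pair.
    return $result . substr($text, $pos);
}

echo wrapDelimited("a '''b''' c", "'''", "'''", 'b'), "\n"; // prints: a <b>b</b> c
```

The same loop shape serves bold, italic, and link brackets; only the delimiters and the generated tag change.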

Assumptions

    This is a simplified version focusing on a subset of PmWiki’s markup.
    The code processes a single page’s text at a time.
    External dependencies (e.g., PmWiki’s configuration, page storage) are unchanged.
    No regex functions (preg_match, preg_replace, etc.) are used.

Artifact: Modified PmWiki Markup Processor (wiki.php, PHP; listed at the end)

Explanation of the Code

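    MarkupToHTML: The main function that processes wiki text. It calls one processing function per markup type in a fixed order: bold, then italic, then links. Bold must run first because the ''' delimiter contains '', so an italic pass run first would consume part of every bold delimiter. This order lets simple nesting such as '''bold ''italic'' text''' come out right.
    ProcessBold: Finds the opening and closing triple apostrophes (''') with strpos(), extracts the content between them with substr(), and wraps it in <b> tags. An opener with no matching closer is left as literal text.
    ProcessItalic: Works like ProcessBold, but for ''text'', wrapping the content in <i> tags.
    ProcessLinks: Handles [[target]] and [[target|text]] by finding the double brackets, splitting the content on | to separate the target from the display text, and generating an <a> tag.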
    No Regex: All parsing uses strpos(), substr(), and explode() instead of preg_match or preg_replace.
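
To make the link step concrete, here is a small sketch of splitting the inside of a [[...]] pair with strpos() and substr() alone. splitLinkBody() is a hypothetical name, not a PmWiki function. It assumes PmWiki's two documented link forms, [[target | text]] and [[text -> target]], and ignores everything else PmWiki links can do (anchors, InterMap prefixes, suffixes).

```php
<?php
// Hypothetical helper: split the text between [[ and ]] into [target, text].
function splitLinkBody(string $body): array {
    $bar = strpos($body, '|');
    if ($bar !== false) {
        // [[target | text]]: target comes first
        return [trim(substr($body, 0, $bar)), trim(substr($body, $bar + 1))];
    }

    $arrow = strpos($body, '->');
    if ($arrow !== false) {
        // [[text -> target]]: target comes last
        return [trim(substr($body, $arrow + 2)), trim(substr($body, 0, $arrow))];
    }

    // [[target]]: the page name doubles as the link text
    $page = trim($body);
    return [$page, $page];
}

[$target, $text] = splitLinkBody('Main Page | Home');
echo "$target / $text\n"; // prints: Main Page / Home
```

Splitting on the first | only means any later | stays in the link text instead of making the whole body fall through to the single-part case.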

Limitations

    This is a simplified version and doesn’t cover all PmWiki markup (e.g., tables, headings, directives).
    Nested markup of the same type (e.g., '''bold '''nested bold''' text''') isn’t handled, because each opener is simply paired with the nearest closer; handling it requires more complex parsing logic.
    Performance may be slower than regex for large texts due to repeated string searches.
    To fully replace PmWiki’s regex-based system, you’d need to extend this approach to handle all markup rules, directives, and edge cases, which would require a larger codebase.
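
One edge case worth a sketch is PmWiki's combined bold-italic form, '''''text'''''. With a plain bold-then-italic pass, the first three apostrophes pair with the first three closing ones and the resulting tags come out misnested. A possible workaround, shown here under the assumption that the longest delimiter should always win, is to run an extra pass for the five-apostrophe form first. replacePairs() and processEmphasis() are hypothetical names, not PmWiki functions.

```php
<?php
// Hypothetical helper: replace each $delim...$delim span with $openTag...$closeTag.
function replacePairs(string $text, string $delim, string $openTag, string $closeTag): string {
    $out = '';
    $pos = 0;
    $n = strlen($delim);
    while (($start = strpos($text, $delim, $pos)) !== false) {
        $end = strpos($text, $delim, $start + $n);
        if ($end === false) {
            break; // no closer: leave the rest as literal text
        }
        $out .= substr($text, $pos, $start - $pos)
              . $openTag
              . substr($text, $start + $n, $end - $start - $n)
              . $closeTag;
        $pos = $end + $n;
    }
    return $out . substr($text, $pos);
}

// Longest delimiter first: ''''' (bold italic), then ''' (bold), then '' (italic).
function processEmphasis(string $text): string {
    $text = replacePairs($text, "'''''", '<b><i>', '</i></b>');
    $text = replacePairs($text, "'''", '<b>', '</b>');
    return replacePairs($text, "''", '<i>', '</i>');
}

echo processEmphasis("'''''both''''' and '''bold''' and ''italic''"), "\n";
// prints: <b><i>both</i></b> and <b>bold</b> and <i>italic</i>
```

This still does not cover every combination. An italic span that ends right where a bold span ends also produces five apostrophes in a row and would be misread, which is one reason a full replacement needs a real tokenizer.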

How to Use

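    Save the code as wiki.php.
    Run it standalone (for example, php wiki.php), or load it from another script. PmWiki already defines its own MarkupToHTML(), and PHP does not allow a function to be declared twice, so rename the functions before including the file in a PmWiki installation.
    Call MarkupToHTML($pagename, $text) with your wiki text to convert it to HTML. The $pagename argument is accepted to mirror PmWiki's signature but is not used.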
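
One thing the steps above leave open is untrusted input: the processor copies page text into its output without escaping HTML. A minimal sketch of one option, assuming the text is escaped before the markup pass runs; escapeForMarkup() is a hypothetical helper, not a PmWiki function. It uses ENT_COMPAT so that &, <, > and double quotes are escaped while apostrophes are left alone, since the ''' and '' delimiters must still match afterwards.

```php
<?php
// Hypothetical pre-processing step: neutralize raw HTML in the wiki text
// while keeping the apostrophe-based markup delimiters intact.
function escapeForMarkup(string $text): string {
    // ENT_COMPAT escapes double quotes but not single quotes.
    return htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
}

echo escapeForMarkup("<script>alert(1)</script> '''bold'''"), "\n";
// prints: &lt;script&gt;alert(1)&lt;/script&gt; '''bold'''
```

The escaped string would then be passed as $text to MarkupToHTML().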

If you need a specific part of PmWiki (e.g., page storage, authentication) rewritten without regex, or if you want to expand this to handle more markup types, please let me know!

<?php

# Simplified PmWiki markup processor without regex
# Processes bold ('''text'''), italic (''text''), and links ([[target]] or [[target|text]])

function MarkupToHTML($pagename, $text) {
    # Initialize output
    $out = $text;

    # Process markup in order: bold, italic, links (to handle nesting correctly)
    $out = ProcessBold($out);
    $out = ProcessItalic($out);
    $out = ProcessLinks($out);

    return $out;
}

function ProcessBold($text) {
    $result = '';
    $pos = 0;
    $len = strlen($text);

    while ($pos < $len) {
        # Find opening '''
        $start = strpos($text, "'''", $pos);
        if ($start === false) {
            # No more bold markup, append rest of text
            $result .= substr($text, $pos);
            break;
        }

        # Append text before the markup
        $result .= substr($text, $pos, $start - $pos);

        # Find closing '''
        $end = strpos($text, "'''", $start + 3);
        if ($end === false) {
            # No closing tag, treat as literal
            $result .= substr($text, $start, 3);
            $pos = $start + 3;
            continue;
        }

        # Extract content between ''' and '''
        $content = substr($text, $start + 3, $end - $start - 3);
        $result .= "<b>$content</b>";
        $pos = $end + 3;
    }

    return $result;
}

function ProcessItalic($text) {
    $result = '';
    $pos = 0;
    $len = strlen($text);

    while ($pos < $len) {
        # Find opening ''
        $start = strpos($text, "''", $pos);
        if ($start === false) {
            $result .= substr($text, $pos);
            break;
        }

        # Append text before the markup
        $result .= substr($text, $pos, $start - $pos);

        # Find closing ''
        $end = strpos($text, "''", $start + 2);
        if ($end === false) {
            $result .= substr($text, $start, 2);
            $pos = $start + 2;
            continue;
        }

        # Extract content between '' and ''
        $content = substr($text, $start + 2, $end - $start - 2);
        $result .= "<i>$content</i>";
        $pos = $end + 2;
    }

    return $result;
}

function ProcessLinks($text) {
    $result = '';
    $pos = 0;
    $len = strlen($text);

    while ($pos < $len) {
        # Find opening [[
        $start = strpos($text, "[[", $pos);
        if ($start === false) {
            $result .= substr($text, $pos);
            break;
        }

        # Append text before the markup
        $result .= substr($text, $pos, $start - $pos);

        # Find closing ]]
        $end = strpos($text, "]]", $start + 2);
        if ($end === false) {
            $result .= substr($text, $start, 2);
            $pos = $start + 2;
            continue;
        }

        # Extract content between [[ and ]]
        $content = substr($text, $start + 2, $end - $start - 2);

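        # Split on the first | only; PmWiki's order is [[target|text]]
        $parts = explode("|", $content, 2);
        if (count($parts) == 2) {
            $target = trim($parts[0]);
            $display = trim($parts[1]);
        } else {
            $display = trim($content);
            $target = $display;
        }

        # Generate HTML link; escape the target for use inside an attribute
        # (double_encode is off so already-escaped entities are left alone)
        $href = htmlspecialchars($target, ENT_QUOTES, 'UTF-8', false);
        $result .= "<a href=\"$href\">$display</a>";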
        $pos = $end + 2;
    }

    return $result;
}

# Example usage
$pageText = "This is '''bold text''' and ''italic text''. See [[Main Page|Home]] or [[WikiPage]].";
echo MarkupToHTML("TestPage", $pageText);
?>