PrintGroup

Summary: Export WikiGroup pages to one large HTML file, and/or to PDF
Version: 20210822
Prerequisites: PmWiki 2.2.56 or newer, wkhtmltopdf
Status: Experimental
Maintainer: Petko
License: GPLv3+
Discussion: PrintGroup-Talk

Questions answered by this recipe

  1. I have a wiki containing critically important information that I may need at any time, within 5 minutes. Even with a nightly backup system in place, in case of a server failure I must be able to access the information without first reinstalling the operating system, the web server and the wiki.
  2. How can I easily export all pages from a WikiGroup to a single large HTML page, and/or to a PDF file?
  3. How can I create PDF snapshots of specific groups in my wiki, manually or automatically?

Description

Export a whole group to HTML or to PDF.

This recipe shares goals with BackupHTMLZip (Export your wiki to static HTML then optionally compress/zip it), but instead of a static HTML export it creates one HTML file per WikiGroup and then optionally converts it to PDF. Attached files are not included in the HTML export, but remain accessible from the running wiki. Embedded pictures are included in the PDF export.

Pages in the exported group are converted to sections: HomePage and SideBar come first, followed by the other pages in alphabetical order. In-group links between these pages work both in the HTML export and in the PDF file. Links to pages in other WikiGroups, to attached files, or to external sites open in the browser.

The recipe can also export a single page to PDF, and can provide a dynamic list of URLs of the groups to be exported.

Installation

  1. Place printgroup.php in your pmwiki/cookbook directory.
  2. Add a snippet like this to config.php:
    if($action=='printgroup' || $action=='pdfgroup' || $action=='pdfpage' || $action=='groupurls') {
      include_once("$FarmD/cookbook/printgroup.php");
    }
  3. Download the stable version of wkhtmltopdf for your system and place it where PrintGroup can find it.
    • On GNU/Linux, the distribution repositories usually contain either an old version or a beta version; the stable binary from the official website may work better. Place the bin/wkhtmltopdf file and the lib/ files somewhere in your $PATH, or in any directory close to the wiki, e.g. pmwiki/utilities. In the latter case, you'll need to configure the $PrintGroup['pdfbinary'] variable.
    • On Windows, you can extract all files from the bin and lib directories to a directory close to the wiki, e.g. pmwiki/utilities. Then you'll need to configure the $PrintGroup['pdfbinary'] variable, see below.
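For instance, with the binary copied into pmwiki/utilities as described above, the setting in config.php, placed before the include_once line, might look like one of the following. The paths are examples only; point them at wherever wkhtmltopdf actually ended up on your server.

```php
# Example paths only: adjust to where wkhtmltopdf was actually unpacked.
# Put this in config.php BEFORE the include_once line for printgroup.php.

# GNU/Linux: bin/wkhtmltopdf copied into pmwiki/utilities
# ($FarmD is set by PmWiki to the directory containing pmwiki.php)
$PrintGroup['pdfbinary'] = "$FarmD/utilities/wkhtmltopdf";

# Windows: contents of bin/ and lib/ extracted into pmwiki/utilities.
# Forward slashes work here; backslashes would need doubling inside "...".
# $PrintGroup['pdfbinary'] = "C:/xampp/htdocs/pmwiki/utilities/wkhtmltopdf.exe";
```

On GNU/Linux, the file also needs execute permission for the user the web server runs as (e.g. chmod +x).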

Configuration

The following variables can be configured before the include_once line:

$PrintGroup['dir'] = "$WorkDir/.printgroup"; # the directory where the exports are saved.
$PrintGroup['datefmt'] = "%Y%m%d_%H%M%S"; # the timestamp, part of the filename, per https://php.net/strftime.
$PrintGroup['pdfbinary'] = "D:/xampp/htdocs/pmwiki/utilities/wkhtmltopdf.exe"; # the path to the wkhtmltopdf binary.
$PrintGroup['pdfargs'] = ''; # additional arguments to the wkhtmltopdf command call, default none.
$PrintGroup['outpdf'] = true; # whether to send the PDF file to the browser (true) or only store it on the server (false).
$PrintGroup['adminperms'] = false; # whether to export the page with admin permissions; may be set to true if pages or sections are read-protected with (:if auth edit|admin:) permissions and you need the export for reference and disaster recovery.
$PrintGroup['keeplatest'] = 10; # how many exports to keep; only the latest N (here 10) are kept, older ones are deleted when new ones are added. Set to 1 to keep only the last export.
$PrintGroup['enableallpages'] = true; # when set to true, a list of links to all pages in the group is inserted after the SideBar page, like a table of contents; set to false to disable this feature.
$PrintGroup['keeptemphtml'] = true; # whether to keep the HTML file after conversion to PDF, set to false to delete it.
$PrintGroup['allgroupspattern'] = '*.*,-PmWiki*.*,-Site.*,-SiteAdmin.*,-*.WikiSandbox'; # page pattern from which the names of the groups to export are deduced; used by the action groupurls.
$PrintGroup['pageskippattern'] = '-*.RecentChanges,-*.GroupHeader,-*.GroupFooter,-*.GroupAttributes'; # pages to leave out of the export; note the 'minus' before every page pattern.
$PrintGroup['GroupTemplate'] = "(your custom html template)"; # the full HTML template, including any CSS; see the default one defined in printgroup.php; the variable {GroupText} will be replaced with the list of page sections.
$PrintGroup['PageTemplate'] = "(your custom html template)"; # the HTML snippet for a page section, see the default one defined in printgroup.php; the variable {PageText} will be replaced with the HTML output of the page.
$PrintGroup['customcss'] = ''; # additional CSS to be injected in the Group template; can be set in a local/Group.php file.
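As an illustration, the snippet below passes a few standard wkhtmltopdf page options through $PrintGroup['pdfargs'] and adds CSS for one group only. The option values, the CSS rule and the group name Main are arbitrary examples, and it is assumed here that the recipe appends 'pdfargs' unchanged to the wkhtmltopdf command line. Per-group files such as local/Main.php are loaded after config.php, so the sketch sets a single array key there instead of reassigning the whole $PrintGroup array.

```php
# In config.php, before the include_once line:
# A4 pages with 15mm top/bottom margins (standard wkhtmltopdf options)
$PrintGroup['pdfargs'] = '--page-size A4 --margin-top 15mm --margin-bottom 15mm';

# In local/Main.php (applies only when exporting the Main group):
# set just this one key; writing $PrintGroup = array(...) here would
# discard every PrintGroup setting made earlier in config.php
$PrintGroup['customcss'] = 'body { font-family: serif; font-size: 11pt; }';
```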

Automatic nightly export to PDF

The main reason for writing this recipe is to keep critically important wiki content accessible in case of a server crash, the same goal as BackupHTMLZip. The other features are happy side effects.

The goal is to let the task scheduler run the nightly exports automatically, without any action from the user or admin. In this case, the configuration could look like this:

$PrintGroup = array(
  'pdfbinary' => "C:\\xampp\\htdocs\\pmwiki\\util\\wkhtmltopdf\\bin\\wkhtmltopdf.exe",
  'outpdf' => false,
  'adminperms' => true,
  'dir' => 'MyWiki-backup/MYWIKI_PDF',
  'keeplatest' => 1,
);

This Windows server also needs the Wget binaries, here installed in C:\xampp\htdocs\pmwiki\util\wget.

The batch file launched by the Task Scheduler looks like this:

@echo off
rem   This tool is launched by the Task Scheduler every night
rem   Note, the exports are in the directory
rem     C:\xampp\htdocs\pmwiki\MyWiki-backup
rem   Written by Petko Yotov pmwiki.org/petko

cd C:\xampp\htdocs\pmwiki\util

rem   Get the list of groups
C:\xampp\htdocs\pmwiki\util\wget\wget.exe -O grouplist.txt -q http://192.168.111.3/pmwiki/pmwiki.php?action=groupurls

rem   Start the PDF exports
C:\xampp\htdocs\pmwiki\util\wget\wget.exe -O trash.txt -q -i grouplist.txt

rem   Launch a static HTML export to ZIP
rem C:\xampp\htdocs\pmwiki\util\wget\wget.exe -O trash.txt -q http://192.168.111.3/pmwiki/pmwiki.php?action=bhzip

The last line, commented out with "rem", would launch a BackupHTMLZip export; remove the "rem" at the start of the line to enable it.

Internationalization

The following strings can be translated in an XLPage:

  # link to top of page
  "top" => "",
  # link to Table of Contents (SideBar)
  "toc" => "",

  # Footer of each page: "Last modified by NAME on DATE"
  "Last modified by" => "",
  "on" => "",
  "Original URL" => "",

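If you only need to change the English labels rather than provide a full translation, one possible alternative is PmWiki's XLSDV() function, called in config.php before the include_once line. This is a sketch, assuming the wiki uses the default English language key 'en'; the replacement labels are arbitrary examples. XLSDV() only sets keys that are not already defined, so calling it before the recipe is included should keep your labels even in case the recipe defines its own defaults.

```php
# In config.php, before the include_once line for printgroup.php.
# XLSDV() sets each key only if it is not already defined for 'en'.
XLSDV('en', array(
  'top'              => 'Back to top',
  'toc'              => 'Contents',
  'Last modified by' => 'Last edited by',
  'Original URL'     => 'Online version',
));
```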
Usage

In your browser, open a page with the action printgroup or pdfgroup or pdfpage, for example [[Main.WikiSandbox?action=pdfgroup]].

This will create a large HTML page with the content of all pages in the Main group, convert it to PDF and send it to the browser for download. The PDF also stays stored in the $PrintGroup['dir'] directory.

The action printgroup creates an HTML export of all pages in the current group; pdfgroup creates a PDF file with all pages in the group; pdfpage creates a PDF from the current page only. The exports use the templates defined in $PrintGroup['GroupTemplate'], $PrintGroup['PageTemplate'], and $PrintGroup['customcss'].

If the variable $PrintGroup['outpdf'] is true then the PDF export will be sent to the browser, otherwise it will only be stored on the server and the browser will display a redirect link to the current wiki page.

Notes

Change log / Release notes

  • 20210822 Update for PHP 8. Clear unneeded output before the actual PDF that could break on-the-fly generation. Add $PrintGroup['datefmt'].
  • 20170711 first public release after about 2 months of usage on an intranet server.

See also

Cookbook /
BackupHTMLZip  Export your wiki to static HTML then optionally compress/zip it (Experimental)
EPUBCreation  Assembles wiki page collections into an EPUB e-book file (EPUB output is ready for beta testing)
ExportHTML  Export PmWiki pages as "static" HTML pages
ExportHTMLWiki  Create a wiki using static HTML pages as front end for visitors (stable)
GeneratePDF  Generate PDF versions of pages (?action=pdf) (Stable)
PmWiki2PDF-v2  Generate a PDF (Stable)
PublishPDF  Typesets wiki page collections into PDF (finalist: New Zealand open source awards 2008) (Stable, reliable and substantially complete, php 5.5 compliant)
PublishWikiTrail  Provide the ability to publish the pages of a wiki trail as a single web page, formatted for printing
SiteDump  creates a .tar.gz file of the complete site for download (stable)
WikiConversion  Recipes for converting other wikis from and to PmWiki
ZipExport  Export the content of PmWiki-formatted page files in a zip archive (Beta)

Contributors

Recipe written and maintained by Petko.

Comments

See discussion at PrintGroup-Talk.

User notes: If you use, have used, or have reviewed this recipe, you can add your name. These statistics appear in the Cookbook listings and will help newcomers browsing through the wiki.