ExportHTML

Summary: Export PmWiki pages as "static" HTML pages

Version: 2007-01-12

Prerequisites: any

Status:

Maintainer: Petko (original author: Pm)

Discussion: ExportHTML-Talk

Categories: SystemTools, Administration, PHP72

Questions answered by this recipe

Can I convert PmWiki pages into static HTML?
Can I export my site to a set of static HTML pages?

Description

PmWiki does not come with a built-in export-to-HTML feature, but other tools can be used to grab a "static" copy of PmWiki pages. Here are four options.

1. Export using Wget

Wget recursively from a start page

The following wget(1) command grabs every page of a site that is linked in some way from the starting page, fixing up links to be relative as needed.

    wget -r -k -np -p -l 0 http://example.org/wiki --html-extension

If running with $EnablePathInfo set, then it's possible to get just a single wikigroup using

    wget -r -k -np -p -l 0 http://example.org/wiki/SomeGroup --html-extension

Here's the meaning of the options to wget:

    -r    Recursive retrieval.  Wget will follow any links it finds in
          the document.

    -k    Link adjustment.  After retrieving the pages, wget will convert
          all of the downloaded files to have relative links instead
          of absolute ones.

    -np   No parent.  Wget will restrict itself to the path
          given on the command line (in the above case, it would
          only download the pages of SomeGroup).

    -p    Page requisites.  Wget will retrieve local copies of any .css
          files or images needed to display the static copy of the
          page locally.

    -l 0  Infinite depth.  Wget will follow all of the links it encounters
          (subject to the -np restriction above), so that it will completely
          spider the group.

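    --html-extension    Adds the extension .html to each file to allow for local viewing.
          Browsers on Windows, such as Internet Explorer and Firefox, rely on the
          file extension and do not recognize HTML files by content alone.
          Newer Wget releases name this option --adjust-extension (-E).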
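
For readers who find the short flags hard to remember, the same export can be spelled with Wget's long option names. The sketch below only builds and prints the command so it can be reviewed first; WIKI_URL and EXPORT_CMD are illustrative names, and the address is the placeholder used above.

```shell
#!/bin/sh
# WIKI_URL is a placeholder; point it at your own wiki.
WIKI_URL="http://example.org/wiki"

# Long-option spelling of: wget -r -k -np -p -l 0 URL --html-extension
EXPORT_CMD="wget --recursive --convert-links --no-parent --page-requisites --level=0 --html-extension $WIKI_URL"

# Print the command for review; paste the output into a shell to run it.
echo "$EXPORT_CMD"
```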

Since Wget doesn't respect the rel="nofollow" attribute, it will also follow the action links. To avoid this:

Add $EnableRobotCloakActions = 1; to your config file and
add Wget to $RobotPattern, or add '-U HTTrack' to the Wget command line, so that PmWiki recognizes Wget as a robot.
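
As a rough sketch of the command-line variant, the export command with the user-agent override might look like the following. It assumes that HTTrack matches an entry in your $RobotPattern and that $EnableRobotCloakActions is already set in the config file; ROBOT_AGENT and CLOAKED_CMD are illustrative names. The command is only printed here, not executed.

```shell
#!/bin/sh
# Placeholder address; replace it with your own wiki.
WIKI_URL="http://example.org/wiki"
# User-agent string assumed to match an entry in $RobotPattern.
ROBOT_AGENT="HTTrack"

# -U (--user-agent) makes Wget identify itself with the given string.
CLOAKED_CMD="wget -r -k -np -p -l 0 -U $ROBOT_AGENT --html-extension $WIKI_URL"

# Print the command for review; paste the output into a shell to run it.
echo "$CLOAKED_CMD"
```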

Wget plus page list

If you want to include pages not reached by a path from the starting page, or you want finer control over which pages are downloaded, you can create a list of pages to download using a pagelist, e.g. (:pagelist group=*,-PmWiki,-Site*:).

To hide unwanted links, use (:noaction:) (:noleft:) (:noheader:) (:nofooter:).

Wget should be called with -l 1 so only the pages listed are downloaded.
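
A minimal sketch of such a call follows, assuming the pagelist lives on a hypothetical page named Main/ExportList. The -np option is left out here on the assumption that the listed pages sit in other groups, which -np would otherwise exclude; adjust this to your own layout. The command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical page holding the (:pagelist ...:) directive.
LIST_URL="http://example.org/wiki/Main/ExportList"

# -l 1 limits recursion to the pages linked directly from the list page.
LIST_CMD="wget -r -k -p -l 1 --html-extension $LIST_URL"

# Print the command for review; paste the output into a shell to run it.
echo "$LIST_CMD"
```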

Notes

If you want the retrieved pages to have .html extensions automatically added to them, see the HtmlUrls recipe.

How can I login using AuthUser and wget?

PmWiki doesn't use HTTP Basic authentication (i.e., wget --http-user and --http-password), but expects the data to be provided via POST requests. Try
--post-data='authid=USER&authpw=PASSWD'
The quotes keep the shell from interpreting the & character.
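
One possible way to combine the login with a recursive export is the two-step pattern described in the Wget manual: post the credentials once, keep the session cookie, then reuse it. The sketch below assumes that your wiki accepts the login on a page opened with ?action=login and keeps the login in a session cookie; the URLs, the cookie file name and the run helper are illustrative. By default it only prints the commands (without the shell quoting); set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# All values below are placeholders for illustration.
LOGIN_URL="http://example.org/wiki/Main/HomePage?action=login"
WIKI_URL="http://example.org/wiki"
COOKIE_FILE="cookies.txt"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Step 1: post the AuthUser credentials and keep the session cookie.
run wget --save-cookies "$COOKIE_FILE" --keep-session-cookies \
    --post-data 'authid=USER&authpw=PASSWD' --delete-after "$LOGIN_URL"

# Step 2: reuse the cookie for the recursive export.
run wget --load-cookies "$COOKIE_FILE" -r -k -np -p -l 0 --html-extension "$WIKI_URL"
```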

Caution: on Linux systems, it is unwise to type commands that contain sensitive data such as passwords into the terminal, as they will be stored as plain text in your ~/.bash_history or equivalent file.

In order to avoid this, unset the local HISTFILE variable:
$ unset HISTFILE

Then enter the desired command:
$ command --containing --sensitive --data

The only side effect is that no terminal history will be saved in ~/.bash_history until you next log in (restoring the HISTFILE value after the command doesn't work).

2. HTTrack

Use HTTrack.

3. ExportHTMLWiki

Use the ExportHTMLWiki recipe script, which includes a batch export command.

4. BackupHTMLZip

Use the BackupHTMLZip recipe script, which exports a static copy of your wiki.

Contributors

Comments

See discussion at ExportHTML-Talk
