Questions answered by this recipe
- Can I convert PmWiki pages into static HTML?
- Can I export my site to a set of static HTML pages?
PmWiki does not come with a built-in export-to-HTML feature, but other tools can be used to grab a "static" copy of PmWiki pages. Here are at least four options.
1. Export using Wget
Wget recursively from a start page
The following wget(1) commands grab every page of a site (or of a single group) that can be reached by following links from the starting page, converting links to relative ones as needed.
wget -r -k -np -p -l 0 http://example.org/wiki --html-extension
wget -r -k -np -p -l 0 http://example.org/wiki/SomeGroup --html-extension
Here's the meaning of the options to wget:
-r  Recursive retrieval. Wget will follow any links it finds in the document.
-k  Link adjustment. After retrieving the pages, wget will convert all of the downloaded files to have relative links instead of absolute ones.
-np  No parent. Wget will restrict itself to the path given on the command line (in the second command above, it would only download the pages of SomeGroup).
-p  Prerequisites. Wget will retrieve local copies of any .css files or images needed to display the static copy of the page locally.
-l 0  Infinite depth. Wget will follow all of the links it encounters (subject to the -np restriction above), so that it will completely spider the group.
--html-extension  Adds the extension .html to each file to allow local viewing. Internet Explorer and Firefox on Windows rely on the file extension and do not recognize HTML files by their content alone. (Wget 1.12 and later call this option --adjust-extension; the old name is still accepted.)
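For use in scripts, the same invocation can be spelled with Wget's long option names, which are easier to read later. This is a minimal sketch rather than part of the original recipe: the function name mirror_wiki is made up for illustration, and the URL is the example address used above.

```shell
# Hypothetical helper: mirror a wiki (or a single group) starting at the URL in $1.
# Long-option equivalents of: -r -k -np -p -l 0 --html-extension
mirror_wiki() {
    wget --recursive \
         --convert-links \
         --no-parent \
         --page-requisites \
         --level=0 \
         --html-extension \
         "$1"
}

# Usage (uncomment and point at your own site):
# mirror_wiki http://example.org/wiki/SomeGroup
```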
Since Wget doesn't respect the rel="nofollow" attribute, it will also follow the action links (edit, history, and so on). To avoid this:
- Add $EnableRobotCloakActions = 1; to your config file and
- add Wget to $RobotPattern, or add '-U HTTrack' to the Wget command line, so that PmWiki recognizes Wget as a robot.
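The user-agent variant might look like the sketch below. Wget's user-agent option is the capital-letter -U (long form --user-agent). The function name is hypothetical, and the sketch assumes that $EnableRobotCloakActions = 1; is already set in the wiki's config file and that "HTTrack" matches your $RobotPattern.

```shell
# Hypothetical helper: recursive mirror with the User-Agent set to "HTTrack",
# so that PmWiki treats Wget as a robot and hides action links from it.
# Assumes robot cloaking is enabled on the wiki side (see the steps above).
mirror_wiki_as_robot() {
    wget -r -k -np -p -l 0 -U HTTrack --html-extension "$1"
}

# Usage (uncomment and point at your own site):
# mirror_wiki_as_robot http://example.org/wiki
```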
Wget plus page list
If you want to include pages that cannot be reached by a path from the starting page, or you want finer control over which pages are downloaded, you can create a page that lists the pages to download using a pagelist, e.g. (:pagelist group=*,-PmWiki,-Site*:).
To hide unwanted links, use (:noaction:) (:noleft:) (:noheader:) (:nofooter:).
Wget should be called with -l 1 so only the pages listed are downloaded.
If you want the retrieved pages to have .html extensions automatically added to them, see the HtmlUrls recipe.
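The page-list approach can be sketched as follows. The function name and the list page's name (Main/ExportList) are made up for illustration; substitute the page that holds your own (:pagelist:) directive.

```shell
# Hypothetical helper: download only the pages linked from a list page.
# $1 is the URL of the wiki page containing the (:pagelist ...:) directive.
fetch_listed_pages() {
    # -l 1 follows links one level deep, i.e. only the pages in the list.
    # -np is left out on the assumption that the listed pages live in other
    # groups (other URL directories) than the list page itself.
    wget -r -l 1 -k -p --html-extension "$1"
}

# Usage (uncomment and adjust the page name):
# fetch_listed_pages http://example.org/wiki/Main/ExportList
```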
How can I log in using AuthUser and wget?
PmWiki doesn't use HTTP Basic authentication (i.e., wget's --http-user and --http-passwd options), but expects the credentials to be provided via a POST request.
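A POST-based login could be sketched as below: one request submits the credentials and saves the session cookie, and a second request reuses that cookie for the recursive download. Treat the details as assumptions to check against your own wiki: the form field names authid and authpw, the ?action=login URL, and the function name are not taken from this page.

```shell
# Hypothetical helper: log in via POST, then mirror using the session cookie.
# Usage: login_and_mirror USER PASSWORD LOGIN_URL START_URL
# Assumptions: the login form fields are named authid/authpw, and USER and
# PASSWORD contain no characters that would need URL-encoding.
login_and_mirror() {
    cookies=$(mktemp) || return 1
    # Step 1: submit the credentials and keep the session cookie.
    wget --save-cookies "$cookies" --keep-session-cookies \
         --post-data "authid=$1&authpw=$2" \
         -O /dev/null "$3" || { rm -f "$cookies"; return 1; }
    # Step 2: reuse the cookie for the recursive download.
    wget --load-cookies "$cookies" -r -k -np -p -l 0 --html-extension "$4"
    status=$?
    rm -f "$cookies"
    return $status
}

# Usage (uncomment and fill in your own values):
# login_and_mirror alice secret 'http://example.org/wiki/Main/HomePage?action=login' http://example.org/wiki
```

The --keep-session-cookies option matters here because session cookies have no expiry date and Wget would otherwise not write them to the cookie file.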
Caution: on Linux systems, it is unwise to type commands containing sensitive data such as passwords into the terminal, as they will be stored as plain text in your ~/.bash_history or equivalent file.
To avoid this, unset the shell's HISTFILE variable for the current session:
$ unset HISTFILE
Then enter the desired command:
$ command --containing --sensitive --data
The only side effect is that no terminal history will be saved in ~/.bash_history until you next log in (restoring the HISTFILE value after the command doesn't work).
Use the ExportHTMLWiki recipe script, which includes a batch export command.
Use the BackupHTMLZip recipe script, which exports a static copy of your wiki.
- BackupHTMLZip Export your wiki to static HTML then optionally compress/zip it. (Experimental)
- ExportHTMLWiki Create a wiki using static HTML pages as front end for visitors (stable)
- FastCache Caches complete wiki pages for very fast retrieval (beta)
- HtmlUrls Add ".html" to the end of page urls (Core)
See discussion at ExportHTML-Talk