Questions answered by this recipe
PmWiki does not come with a built-in export to HTML feature, but other tools can be used to grab a "static" copy of PmWiki pages. Here are at least two options.
1. Export using Wget
Wget recursively from a start page
The following wget(1) commands grab every page that is linked, directly or indirectly, from the starting page, and rewrite the links to be relative as needed. The first starts at the wiki's front page; the second starts at a single group.
wget -r -k -np -p -l 0 http://example.org/wiki --html-extension
wget -r -k -np -p -l 0 http://example.org/wiki/SomeGroup --html-extension
Here's the meaning of the options to wget:
-r Recursive retrieval. Wget follows any links it finds in the document.
-k Link conversion. After the download finishes, Wget rewrites the links in the downloaded files so that they point to the local copies with relative links instead of absolute URLs.
-np No parent. Wget restricts itself to the path given on the command line (in the second command above, it downloads only the pages of SomeGroup).
-p Page requisites. Wget retrieves local copies of the .css files and images needed to display each page locally.
-l 0 Infinite recursion depth. Wget follows every link it encounters (subject to the -np restriction above), so it spiders the whole group.
--html-extension Appends .html to the names of downloaded pages so they open correctly when viewed locally; browsers on Windows, such as Internet Explorer and Firefox, decide how to open a local file by its extension rather than its content. Since Wget 1.12 this option is called --adjust-extension (-E); the old name is still accepted.
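The same command can be wrapped in a small shell function and spelled with GNU Wget's long option names, which makes each flag self-describing (-r is --recursive, -k is --convert-links, -np is --no-parent, -p is --page-requisites, -l 0 is --level=0). This is only a sketch assuming GNU Wget and a POSIX shell; the function name mirror_wiki and the example URL are hypothetical.

```shell
#!/bin/sh
# mirror_wiki: hypothetical wrapper around the recipe's wget command,
# written with GNU Wget's long option names.
# Usage: mirror_wiki START_URL
mirror_wiki() {
  wget --recursive \
       --convert-links \
       --no-parent \
       --page-requisites \
       --level=0 \
       --html-extension \
       "$1"
}

# Example (hypothetical URL; same effect as the second command above):
# mirror_wiki http://example.org/wiki/SomeGroup
```

Keeping the flags in one function means later variants (extra reject rules, cookies) only have to be added in one place.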
Since Wget doesn't respect the rel="nofollow" attribute, it will also follow PmWiki's action links (edit, history, and so on). To avoid this:
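One possible approach, not necessarily the one this recipe originally gave, is to reject every URL whose query string contains action=, since PmWiki action links such as ?action=edit and ?action=diff are built that way. The sketch assumes GNU Wget 1.14 or later, the release that added --reject-regex (which is matched against the complete URL); ACTION_RE and mirror_without_actions are hypothetical names.

```shell
#!/bin/sh
# Sketch: mirror a wiki while skipping PmWiki action links.
# Assumes GNU Wget 1.14+ (which added --reject-regex).
# Action links carry an "action=" query parameter, e.g.
#   http://example.org/wiki/Main/HomePage?action=edit
#   http://example.org/pmwiki.php?n=Main.HomePage&action=diff
ACTION_RE='[?&]action='

# mirror_without_actions: hypothetical helper; $1 is the start URL.
mirror_without_actions() {
  wget -r -k -np -p -l 0 --html-extension \
       --reject-regex "$ACTION_RE" \
       "$1"
}

# Example (hypothetical URL):
# mirror_without_actions http://example.org/wiki
```

The pattern works for both "clean" URLs (?action=...) and pmwiki.php?n=Group.Page&action=... URLs, because it accepts either ? or & before action=.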
Wget plus page list
If you want to include pages that cannot be reached by following links from the starting page, or you want finer control over which pages are downloaded, create a page that lists the pages to download using a pagelist, e.g. (:pagelist group=*,-PmWiki,-Site*:).
To hide unwanted links on that page, add (:noaction:) (:noleft:) (:noheader:) (:nofooter:).
Then call Wget on that page with -l 1, so that only the listed pages are downloaded.
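A sketch of this approach follows, assuming the list lives on a page called Main.ExportList (a hypothetical name) and that the wiki uses clean URLs. The -np option is left out on purpose: the listed pages belong to other groups, so they would lie outside the list page's "parent" path and -np would skip them. The function name export_pagelist is also hypothetical.

```shell
#!/bin/sh
# Sketch: download only the pages named on a page-list page.
# ASSUMPTION: the hypothetical page Main.ExportList contains, e.g.:
#   (:noaction:) (:noleft:) (:noheader:) (:nofooter:)
#   (:pagelist group=*,-PmWiki,-Site*:)
# -l 1 stops after the pages linked directly from the list page;
# -p still fetches the stylesheets and images those pages need.
export_pagelist() {
  # $1 = URL of the page-list page
  wget -r -k -p -l 1 --html-extension "$1"
}

# Example (hypothetical URL):
# export_pagelist http://example.org/wiki/Main/ExportList
```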
If you want the retrieved pages to have .html extensions automatically added to them, see the HtmlUrls recipe.
How can I login using AuthUser and wget?
PmWiki doesn't use HTTP Basic authentication (i.e.,
Caution: on Linux systems, it is unwise to type commands containing sensitive data such as passwords into the terminal, because the shell stores them as plain text in your ~/.bash_history or an equivalent file.
To avoid this, unset the HISTFILE variable in the current shell:
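In bash this is a single built-in command (other shells may use a different variable); it affects only the shell you run it in:

```shell
# Stop this shell from writing its history file when it exits (bash).
unset HISTFILE
```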
Then enter the desired command:
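The only side effect is that no history from this shell session will be saved to ~/.bash_history. Restoring the HISTFILE value after the command does not help, because bash writes its whole in-memory history, including the command with the password, to $HISTFILE when the shell exits.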
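One possible approach is sketched below: log in once by posting the login form, keep the resulting session cookie, then reuse that cookie for the mirroring run. It rests on assumptions you should check against your own site's login form: that the AuthUser form posts the fields authid and authpw, and that posting them to a page with ?action=login starts a session. It also assumes GNU Wget 1.14 or later for --reject-regex, which here additionally keeps Wget from following the ?action=logout link. The function names and the cookies.txt file name are hypothetical. Reading the password from standard input keeps it off the command line and out of the shell history.

```shell
#!/bin/sh
# Sketch: log in to an AuthUser-protected PmWiki site, then mirror it.
# ASSUMPTIONS: the login form posts "authid" (user) and "authpw"
# (password), and posting them to ?action=login starts a session.
# The password is NOT URL-encoded here; passwords containing & or =
# would need extra handling.

COOKIES=cookies.txt   # hypothetical cookie-jar file

# wiki_login LOGIN_URL USER; password is read from stdin.
# The subshell body keeps pw/postfile/status out of the caller's shell.
wiki_login() (
  IFS= read -r pw || exit 1
  postfile=$(mktemp) || exit 1   # created readable only by you
  printf 'authid=%s&authpw=%s' "$2" "$pw" > "$postfile"
  wget --keep-session-cookies --save-cookies "$COOKIES" \
       --post-file="$postfile" -O /dev/null "$1"
  status=$?
  rm -f "$postfile"
  exit "$status"
)

# wiki_mirror START_URL: mirror using the saved session cookie,
# skipping action links (edit, diff, logout, ...).
wiki_mirror() {
  wget --load-cookies "$COOKIES" -r -k -np -p -l 0 --html-extension \
       --reject-regex '[?&]action=' "$1"
}

# Example session (hypothetical URLs); stty -echo hides the password:
# stty -echo; wiki_login 'http://example.org/wiki/Main/HomePage?action=login' alice; stty echo
# wiki_mirror http://example.org/wiki
```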
See discussion at ExportHTML-Talk