PmDocConvert

Summary: PmDocConvert makes it easy to upload and display formats that can be handled by OpenOffice.org on a PmWiki page.
Version: 20171013
Prerequisites: pmwiki 2.2.58
Status: Alpha
Maintainer: Ccox
Categories: Uploads Images
Discussion: PmDocConvert-Talk

How can I easily embed/convert foreign documents inside of my wiki?

How about?

 
(:docconvert file=MyPresentation.ppt display=inline iborder=0:)
(:docconvert file=MyDocument.doc display=inline iwidth=100%:)
(:docconvert file=MySpreadsheet.xls display=inline headingrows=1,2:)
 

Description

PmDocConvert makes it easy to upload and display formats that can be handled by OpenOffice.org/LibreOffice on a PmWiki page. This includes popular Microsoft formats like PowerPoint®, Excel® and Word®. Uploaded documents can be displayed inline, or converted to another format and shown as a link to the converted file.

PmDocConvert uses a program called DC.py to talk to OpenOffice.org running in daemon mode with PyUNO (Python/UNO) support. If you are using OpenOffice.org 3.0 or higher, chances are this support is already built in. Getting OpenOffice.org to run in daemon mode is left up to the implementer, but I share my soffice-daemon init script here. The init script was written for SUSE Enterprise Linux 10, but you can easily adapt it to your own Linux or Unix distribution. Note: The DC.py provided here is NOT the DocumentConverter.py script found at http://www.artofsolving.com/opensource/pyodconverter. I used that script as the basis for DC.py.

A typical headless invocation of LibreOffice looks like:

 /usr/lib64/libreoffice/program/soffice.bin  --headless --accept='socket,port=8100;urp;'
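
Before going further it can help to confirm that something is actually listening on the port given to --accept. The following is a minimal sketch, assuming the daemon listens on localhost port 8100 as in the command above; the function name listener_is_up is made up for this illustration and is not part of DC.py or the recipe. It only checks that a TCP connection is accepted, not that the peer really is soffice.

```python
import socket


def listener_is_up(host="127.0.0.1", port=8100, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    Assumption: the port matches the one passed to soffice via --accept.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling listener_is_up() from a Python prompt should return True once the daemon is running and False if the port is closed.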

Installation

  1. Assume uploads are using the default style where they are placed in a directory named by Group (if not, scripts will have to be altered).
  2. Assume the ability to reference PmWiki uploads by URL is enabled and working.
  3. Assume you have Python.
  4. Assume you have OpenOffice.org 3.0 or higher with Python/UNO support.
  5. Get DC.py (remove the .txt extension).
  6. Get and modify soffice-daemon (remove the .txt extension).
  7. Make alterations where necessary to DC.py and soffice-daemon.
  8. Attempt to get the OpenOffice.org binary to start and stay daemonized (it should run as the same user as your web server... sorry).
  9. Attempt to connect to it by running DC.py (e.g. DC.py test.doc test.html). Note: you must run it as the same user that owns the daemon.
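
The manual check in step 9 can be wrapped in a small script so it is easy to repeat after each change. This is only a sketch under stated assumptions: the helper names build_dc_command and smoke_test are invented here, the command line is limited to the "DC.py input output" form shown above, and it is assumed (not confirmed) that DC.py exits with status 0 on success. Run it as the same user that owns the daemon.

```python
import subprocess
from pathlib import Path


def build_dc_command(dc_path, source, target):
    """Build the argument list for the manual test 'DC.py test.doc test.html'."""
    return [str(dc_path), str(source), str(target)]


def smoke_test(dc_path, source, target, timeout=120):
    """Run one conversion and report (ok, stderr_text).

    Assumption: a successful run exits with status 0 and leaves a
    non-empty output file at `target`.
    """
    target = Path(target)
    result = subprocess.run(
        build_dc_command(dc_path, source, target),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    ok = (
        result.returncode == 0
        and target.is_file()
        and target.stat().st_size > 0
    )
    return ok, result.stderr.strip()
```

For example, smoke_test("./DC.py", "test.doc", "test.html") returns a pair whose first element says whether the conversion produced output.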

That is part one. If you did not make it this far, then you will not make it any further. Feel free to look at the http://www.artofsolving.com/opensource/pyodconverter page for additional hints in trying to make this work. I did NOT try to test this under Windows.

  1. Get pmdocconvert.php and install it into your cookbook area.
  2. Add include_once("$FarmD/cookbook/pmdocconvert.php"); to your local/config.php.

This recipe now writes HTML conversion trees inside your upload dir under pmdocconvert/Group/Page.

If you have protected uploads, put in the necessary web server allowances for that pmdocconvert directory.

As of 1.0 PmDocConvert tries to allow for per page uploads.

Usage

Run DC.py --help to see the many conversion options you can pass to it. The conversion from one format to another is performed by the OpenOffice.org filters, and DC.py selects the filter by looking at the file extensions.

The general markup format is (all on one line):

 
 (:pmdocconvert file=upload-file 
    [convertto=convert-type] default=pdf
    [display=display-type] default=link
    [headingrows=csv-inline-heading-rows] default=1, can use commas to do multiple heading rows
    [startrow=what-row-to-start-from-for-csv-inline] default=1
    [showfilelink=true/false] default=false, allows showing link to original uploaded file
    [popiframe=true/false] default=false,  if Cookbook/PopupIFrame included and true, it will display converted file or html in a popup modal iframe.
    [iborder=iframe-border]
    [iscroll=iframe-scrollbars]
    [iwidth=iframe-width]
    [iheight=iframe-height]
    [ialign=iframe-alignment]
    [dcopts=options-to-DC.py] :)
 
Where:

convertto =

  • html
  • txt
  • png
  • jpg
  • gif
  • csv
  • pdf (default)
  • (etc...)

Converted files, apart from HTML conversion, are named using the basename of the upload-file followed by an extension appropriate to the conversion (e.g. file=upload.ppt convertto=jpg would result in a file called upload.jpg). Converting to HTML creates output in a directory in the uploads area named by upload-file.dir which contains the converted content.
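
The naming rule for non-HTML conversions can be sketched as follows. This is an illustration of the rule as described, not the recipe's actual code; converted_name is a hypothetical helper, and the HTML case is deliberately left out because its output is a directory rather than a single file.

```python
from pathlib import PurePosixPath


def converted_name(upload_file, convertto="pdf"):
    """Return the expected name of the converted file for non-HTML targets."""
    convertto = convertto.lower()
    if convertto == "html":
        # HTML output lands in a directory, so this rule does not apply
        raise ValueError("HTML output goes to a directory, not a single file")
    # basename without its extension, plus the target extension
    return PurePosixPath(upload_file).stem + "." + convertto
```

With this sketch, converted_name("upload.ppt", "jpg") gives "upload.jpg", matching the example in the text.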

display =

  • inline (attempt to show converted file inline. For html, display inside of an IFRAME.)
  • attach (display as an Attach:converted-file)
  • link (default) (display as a link to the attachment [[Attach:converted-file|converted-file]])
  • imagelink (display as [[Attach:upload-file|Attach:converted-file]]) For convertto= some image type.

iborder,iscroll,iwidth,iheight,ialign

  • iborder = size of IFRAME border
  • iscroll = true/false for scrollbars on IFRAME
  • iwidth = width of IFRAME
  • iheight = height of IFRAME
  • ialign = alignment left/right/center for IFRAME

dcopts =

You can include DC.py options through this. You can type DC.py --help for a full list. For example, dcopts='--Format=2 --BackColor=0xff0000'

Other Options

  • quiet = true/false (if true, do not display anything, just convert)
  • headingrows = #,#,... (set to the row numbers to highlight when converting to csv inline display)
  • forceconvert = true/false (if true, always convert; otherwise conversion does not happen unless the source is newer)
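
The forceconvert rule amounts to a timestamp comparison. Below is a minimal sketch of that logic, assuming "newer" means a later file modification time and that a missing converted file always triggers a conversion; needs_conversion is a made-up name and the recipe's own check may differ.

```python
import os


def needs_conversion(source, target, forceconvert=False):
    """Decide whether `source` should be converted again.

    Assumptions: "newer" is judged by modification time, and a target
    that does not exist yet always needs converting.
    """
    if forceconvert or not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)
```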

Examples

 
 (:docconvert file=MyPresentation.ppt convertto=pdf:)
 

Since pdf is the default, you don't need convertto in this case. The result is a link (the default display) to the converted pdf file of the presentation.

 
 (:docconvert file=MyDocument.doc convertto=html iwidth=100%:)
 

Since convertto=html, display=inline is assumed, so iwidth=100% makes the IFRAME use the full width of the browser space.

 
 (:docconvert file=MySpreadsheet.csv convertto=csv headingrows=1,2,8:)
 

The default display for convertto=csv is an inline PmWiki table with rows 1, 2 and 8 designated as PmWiki table row headings (bolded by default in PmWiki).
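
To make the effect of headingrows and startrow concrete, here is a rough sketch that turns CSV text into PmWiki simple-table markup, where a leading "!" marks a heading cell. It is not the recipe's implementation: csv_to_pmwiki_table is a hypothetical helper, and it assumes row numbers are 1-based and counted from the first row of the CSV file.

```python
import csv
import io


def csv_to_pmwiki_table(csv_text, headingrows=(1,), startrow=1):
    """Render CSV text as PmWiki simple-table markup.

    Assumptions: rows are numbered from 1 starting at the first CSV row,
    rows before `startrow` are skipped, and rows listed in `headingrows`
    become heading cells.
    """
    rows = csv.reader(io.StringIO(csv_text))
    lines = ["||border=1"]
    for number, row in enumerate(rows, start=1):
        if number < startrow:
            continue
        mark = "!" if number in headingrows else ""
        # use a single space for empty cells so "||||" is not read as a span
        cells = [mark + (cell if cell else " ") for cell in row]
        lines.append("||" + "||".join(cells) + "||")
    return "\n".join(lines)
```

For a two-row CSV with a header line, the first row comes out as heading cells and the second as ordinary cells.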

 
 (:docconvert file=QuietDocument.doc quiet=true:)
 

Converts QuietDocument.doc to QuietDocument.pdf, but does not display anything.

 
 (:docconvert file=SpecialDrawing.odp convertto=html iborder=0 dcopts='--BackColor=0xffffff
    --LinkColor=0xffffff --ALinkColor=0xffffff --VLinkColor=0xffffff --UseButtonSet=-1':)
 

Yuk! What is that? Well, in OpenOffice.org there is little difference between an Impress presentation and a drawing, so instead of using Draw, I used Impress. Why? Impress lets us create object actions, for example going to a different page by clicking on an object. It turns out that HTML conversion creates an ImageMap to handle the hot spot areas. By using the Text (non-graphical) UseButtonSet and making the background, text and links all white, the result looks like an image but still has clickable hot areas defined.

 
 (:docconvert file=virtualization_whitepaper.doc popiframe=true convertto=html:)
 

Display converted HTML inside of a popup iframe (PopupIFrame required).
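
If pages are generated by a script, markup like the examples above can be assembled programmatically. The sketch below is only an illustration: build_markup is a made-up helper, the directive name is a parameter because this page shows both docconvert and pmdocconvert, and values containing spaces (such as dcopts) are wrapped in single quotes as in the examples.

```python
def build_markup(file, directive="docconvert", **options):
    """Assemble a one-line conversion directive from keyword options.

    Assumption: options are written as key=value in the order given,
    with single quotes around any value that contains a space.
    """
    parts = ["file=" + file]
    for key, value in options.items():
        value = str(value)
        if " " in value:
            value = "'" + value + "'"
        parts.append(key + "=" + value)
    return "(:" + directive + " " + " ".join(parts) + ":)"
```

For instance, build_markup("MyDocument.doc", display="inline", iwidth="100%") produces the second line of the introductory example.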

Contributors

Ccox

Comments

See discussion at PmDocConvert-Talk
