|
Cookbook / ConvertTABLE
Summary: How to convert HTML pages with tables full of data into advanced tables
Version: 17 September 2005
Prerequisites:
Status:
Maintainer: Brooks Kelley
Discussion: ConvertTABLE-Talk
Note: Cookbook.ConvertHTML converts HTML to PmWiki markup "on the fly"
Question
How can I make it easier to convert HTML pages with tables full of data into advanced tables?

Answer
I had this problem with AzRepeaters.Net. I needed to convert about 20 pages filled with tables of data into PmWiki's advanced table markup. Currently, my solution is a Linux one using the Bash command line and not a PmWiki cookbook recipe. So, I ran this command line in a Bash console on the HTML pages I needed to convert:

linuxmachine> cat filecontainingtabledata.html | sed '/^$/d' |sed 's/[ \t]*$//' | tr -d [:cntrl:] | tr -s [:blank:] |sed 's/</\n</g' | sed 's#^<[Tt][Rr].*>#(:cellnr:)#g' |sed 's#^<[Tt][Dd].*>#(:cell:)#g' |sed -e :a -e 's/<[^>]*>//g;/</N;//ba' | tr -d [:cntrl:] |sed 's/(:cell/\n(:cell/g' | grep "^(:cell" > data_in_pmwiki_markup.txt
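For reference, the same pipeline can be written one stage per line, which makes it easier to read and to splice extra stages in. This is only a sketch under a few assumptions: the function name html_table_to_pmwiki is made up for illustration, the character classes are quoted so the shell cannot expand them as filename patterns, and GNU sed is assumed, because other sed implementations may not treat \n in the replacement text as a newline.

```shell
#!/usr/bin/env bash
# Sketch of the one-liner above, one stage per line (assumes GNU sed).
# html_table_to_pmwiki is a hypothetical name; it reads HTML on stdin
# and writes PmWiki cell directives on stdout.
html_table_to_pmwiki() {
  sed '/^$/d' |                             # delete empty lines
  sed 's/[ \t]*$//' |                       # strip trailing spaces and tabs
  tr -d '[:cntrl:]' |                       # delete control characters, line breaks included
  tr -s '[:blank:]' |                       # squeeze runs of spaces and tabs
  sed 's/</\n</g' |                         # start a new line in front of every tag
  sed 's#^<[Tt][Rr].*>#(:cellnr:)#g' |      # a row tag becomes (:cellnr:)
  sed 's#^<[Tt][Dd].*>#(:cell:)#g' |        # a cell tag becomes (:cell:)
  sed -e :a -e 's/<[^>]*>//g;/</N;//ba' |   # strip every remaining tag
  tr -d '[:cntrl:]' |                       # join everything into one line again
  sed 's/(:cell/\n(:cell/g' |               # put each cell directive on its own line
  grep '^(:cell'                            # keep only lines that start with a directive
}

# Usage (file names taken from the command above):
#   html_table_to_pmwiki < filecontainingtabledata.html > data_in_pmwiki_markup.txt
```

Reading from standard input instead of starting with cat is a small design choice: it lets the same function be reused in a loop over many pages.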
Let me explain what is going on with this command line. First, be very careful with the syntax. The symbol you see above, " | ", is the key on your keyboard just above the Enter key. It creates a pipe that streams data from one command to another until you get a final output. In other words, it is a necessary part of the command string.

I start the stream by using cat filecontainingtabledata.html, which writes the HTML file to standard output. After creating the data stream, I then pipe it into sed '/^$/d', which deletes empty lines. Then it is piped into sed 's/[ \t]*$//', which strips trailing spaces and tabs from each line.

Then I pipe it into tr -d [:cntrl:] and tr -s [:blank:], which delete the control characters (line breaks included) and squeeze runs of blanks, and then into sed 's/</\n</g', which starts a new line in front of every tag. The reason it might be considered clever is that sed works on one line at a time. Even though a lot of data is being shipped through the pipe, it is still parsed one line at a time. This makes it easier to do line-by-line editing with a streaming editor like sed. And, as you will see later, I use it to do little but important tricks as the data streams through each pipe.

Then I pipe it again into sed 's#^<[Tt][Rr].*>#(:cellnr:)#g' and sed 's#^<[Tt][Dd].*>#(:cell:)#g', which turn each row tag into (:cellnr:) and each cell tag into (:cell:). O.K., now that is done, we can pipe it again into sed -e :a -e 's/<[^>]*>//g;/</N;//ba', which strips all the remaining HTML tags.

Then, I want to run everything together as one line. You may see why in a moment. I continue piping and use tr -d [:cntrl:] a second time, which deletes the line breaks. Now that everything is together, I can pipe it into the next step and set up each occurrence of (:cell to start a new line. I do that with sed 's/(:cell/\n(:cell/g'. Now, I finally pipe it for the last time, where I just grep "^(:cell" to keep the lines that begin with a cell directive and redirect them into data_in_pmwiki_markup.txt.

I then edited it in a text editor. I know that Patrick talked about making this into another PmWiki recipe, since most of the cookbook is that. But this will give you a start until that is done.

Neat Little Add On to make it work better

By the way, in case you do run this script, you will find that you create an extra (:cell:) just after the (:cellnr:). The way to get rid of that extra (:cell:) is to pipe the stream through one more sed substitution at the point where everything is on one line. You will have to adjust the spacing of the blanks in sed 's/(:cellnr:) (:cell:)/(:cellnr:)/g' to get it to delete the extra (:cell:).

Caveat and work around!

My script does remove HTML tags that you might want to include in the final data. I had that problem too.
I adjusted the script with an extra substitution step. This means you can change tags like <A HREF=http://www.somedomain.com to [[http://www.somedomain.com. Then, on the next-to-final command, you can change the final bit of the tag left over, the >, to ]]. I had to do this also because AzRepeaters.Net has links showing where repeaters are on a map.

Notes and Comments

Versions
date of publication : 2005-09-02 : name of the cookbook - version 00007
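The add-on and the link workaround described above might be sketched as three small filters that can be spliced into the pipeline. Everything here is an illustration rather than the original script: the function names are hypothetical, the " *" in the first filter is a variation that accepts any number of blanks so the spacing does not have to be adjusted by hand, and the link filters assume that HREF is the only attribute of the tag, with or without double quotes around the address.

```shell
#!/usr/bin/env bash
# Hypothetical helper filters; each reads stdin and writes stdout.

# Remove the extra (:cell:) that follows (:cellnr:).
# Place it after the second tr -d [:cntrl:], while the data is one long line.
drop_extra_cell() {
  sed 's/(:cellnr:) *(:cell:)/(:cellnr:)/g'
}

# Turn the start of a link tag, <A HREF=, into [[ so the address survives.
# Place it after the <td> substitution and before the tag-stripping stage.
open_links() {
  sed 's#<[Aa] [Hh][Rr][Ee][Ff]="\{0,1\}#[[#g'
}

# Turn the leftover > of a rewritten link into ]].
# Place it near the end of the pipeline, after the tags have been stripped.
close_links() {
  sed 's#\(\[\[[^">]*\)"\{0,1\}>#\1]]#g'
}
```

Because close_links only touches a > that follows a [[ address, a stray > elsewhere in the data is left alone.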
See Also
Contributors
Brooks Kelley 17 September 2005

Comments
See discussion at ConvertTABLE-Talk
|