Scraper

Summary: Add a markup for retrieving a portion of the content from an external webpage (screen-scraping).
Version:
Prerequisites: PmWiki 2.2beta, PHP.
Status: Project
Maintainer: CRAjr
Categories: Includes, RSS

Questions answered by this recipe


How can I have PmWiki check the content of an external web page?

Description

The goal of this recipe is to create a markup, possibly named (:Scrape :) although the name is not yet decided, that captures the content of an external web page. The captured result could then be examined and used with conditional markups to determine what is displayed on a wiki page.

As an example, the organization where I work has an automatic mechanism for blocking rogue workstations from the network. If a machine makes too many network requests or shows other rogue activity (usually because of malware), it is blocked automatically. Unfortunately, nobody is notified when a block happens. Instead, a web page lists the IP and MAC addresses of the misbehaving systems. With this recipe, I should be able to create a wiki page that checks the content of that master block page for my group's IP range. If none of our machines are found, the page displays a green box with a message that all is well. If one or more of my group's machines has been blocked, it displays a red box containing only the IDs and other information for my group's systems, as captured from the master block page.
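The IP-range check described above can be sketched as follows. This is a minimal Python illustration of the logic only (the recipe itself would be written in PHP). It assumes the block page is plain text with IPv4 addresses written in dotted-quad form; the function name `blocked_in_range` and the range `10.20.30.0/24` are hypothetical.

```python
import ipaddress
import re

# Loose dotted-quad pattern; each match is validated by ipaddress below.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def blocked_in_range(page_text, network):
    """Return the distinct IPv4 addresses in page_text that fall inside network."""
    net = ipaddress.ip_network(network)
    found = []
    for match in IPV4.findall(page_text):
        try:
            addr = ipaddress.ip_address(match)
        except ValueError:
            continue  # not a valid address, e.g. 999.1.1.1
        if addr in net and match not in found:
            found.append(match)
    return found


if __name__ == "__main__":
    # Hypothetical excerpt of the master block page.
    sample = "Blocked: 10.20.30.41 aa:bb:cc:dd:ee:ff\nBlocked: 192.168.5.7 11:22:33:44:55:66"
    hits = blocked_in_range(sample, "10.20.30.0/24")
    # An empty list would mean the green "all is well" box; otherwise red.
    print("red" if hits else "green", hits)
```

Validating each match with `ipaddress` rather than relying on a stricter regular expression keeps the pattern simple and makes the range test (`addr in net`) exact.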

Notes

Your web host will most likely need to allow outgoing HTTP requests.

I've only just begun creating and testing this recipe. Nothing is ready for posting yet.

My primary goal is to produce a true/false result indicating whether a predefined string is found in the content of the scraped web page. A secondary goal, when the result is true, is to capture all the content between a second and a third predefined string. This subset of the full content would then be available for display on the wiki page.
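A minimal Python sketch of these two goals, operating on page content that has already been fetched. The function name `scrape` and its parameter names are hypothetical, and the actual recipe would implement this in PHP. The sketch assumes the excerpt runs from the first occurrence of the start marker to the next occurrence of the end marker after it.

```python
def scrape(content, trigger, start, end):
    """Return (found, excerpt) for already-fetched page content.

    found   -- True if trigger occurs anywhere in content.
    excerpt -- text between the first start marker and the next end marker
               after it, or None if trigger is absent or a marker is missing.
    """
    if trigger not in content:
        return False, None
    i = content.find(start)
    if i == -1:
        return True, None
    i += len(start)          # skip past the start marker itself
    j = content.find(end, i)  # look for the end marker only after it
    if j == -1:
        return True, None
    return True, content[i:j]


if __name__ == "__main__":
    html = "<h1>Blocked hosts</h1><table><tr><td>10.20.30.41</td></tr></table>"
    print(scrape(html, "Blocked", "<table>", "</table>"))
```

Returning the flag and the excerpt separately lets a wiki page branch on the true/false result alone, even when the markers are missing.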

The markup's functionality will be built on class_http.php, an excellent PHP screen-scraping utility written by Troy Wolf.

Installation

  • Nothing available yet!

Usage

Release Notes


See Also

Contributors

Comments