Been playing around with an alternative API for parsing XML which may appeal if you hate SAX and DOM. 1)

XML Pull

The XML Pull API is another way of parsing XML, similar to the SAX API (provided by the native XML extension) but perhaps easier to use: there's no need for callback functions.

As with SAX, the API has been defined by developers rather than being a W3C standard like DOM. The reference implementation is in Java and can be found here: http://www.xmlpull.org/.

While playing around with SaxFilters, I realised it’s pretty easy to “invert” the SAX API into an XML Pull API.
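Roughly what "inverting" means here, as a minimal sketch rather than the code in CVS: the SAX callbacks only push events onto a queue, and parse() feeds the expat parser one chunk at a time until there is an event to hand back. SketchPullParser and the array-shaped events are made-up names for illustration; only the xml_* functions come from the native XML extension.

```php
<?php
// Hypothetical sketch: SketchPullParser is NOT the class from the xmlpull module.
class SketchPullParser
{
    private $sax;           // expat parser from the native XML extension
    private $stream;        // open stream resource supplying the XML
    private $queue = [];    // events buffered by the SAX callbacks
    private $done = false;  // true once the final chunk has been fed in

    public function __construct($stream)
    {
        $this->stream = $stream;
        $this->sax = xml_parser_create();
        xml_parser_set_option($this->sax, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($this->sax, [$this, 'onStart'], [$this, 'onEnd']);
        xml_set_character_data_handler($this->sax, [$this, 'onText']);
    }

    // SAX callbacks: record what happened instead of acting on it.
    public function onStart($parser, $tag, $attribs)
    {
        $this->queue[] = ['type' => 'start', 'tag' => $tag, 'attribs' => $attribs];
    }

    public function onEnd($parser, $tag)
    {
        $this->queue[] = ['type' => 'end', 'tag' => $tag];
    }

    public function onText($parser, $text)
    {
        $this->queue[] = ['type' => 'text', 'text' => $text];
    }

    // Pull side: return the next event, or false when the document is finished.
    public function parse()
    {
        while (!$this->queue && !$this->done) {
            $chunk = fread($this->stream, 4096);
            $this->done = ($chunk === false || $chunk === '' || feof($this->stream));
            if (!xml_parse($this->sax, (string) $chunk, $this->done)) {
                throw new RuntimeException(
                    xml_error_string(xml_get_error_code($this->sax))
                );
            }
        }
        return $this->queue ? array_shift($this->queue) : false;
    }
}

// Usage: an in-memory stream stands in for a file here.
$stream = fopen('php://memory', 'r+');
fwrite($stream, '<a x="1"><b>hi</b></a>');
rewind($stream);

$pull = new SketchPullParser($stream);
$events = [];
while ($event = $pull->parse()) {
    $events[] = $event;
}
fclose($stream);
```

The queue is what lets the caller own the loop: expat may fire several callbacks for one chunk, but the caller still sees them one at a time.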

So enter XML Pull in PHP, the prototype for which you can get via CVS;

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/htmlsax login

cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/htmlsax co xmlpull

What does this mean? Parsing an XML document can look like this;

<?php
/* $Id: index.php,v 1.1.1.1 2003/08/08 21:31:44 harryf Exp $ */
if (!defined('XMLPull')) {
    define('XMLPull', '../../');
}
require_once(XMLPull.'PullFactory.php');

$test = <<<EOD
<?xml version="1.0"?>
<config>
    <database host="losthost">
        <dbname>test</dbname>
        <user>HarryF</user>
        <pass>Secret</pass>
    </database>
</config>
EOD;

// $reader = & fopen ('test.xml','r');
$reader = fopen('string://test','r'); // Uses stream_wrapper_register for a string
$parser = & PullFactory::getParser('expat',$reader);

// Here's how you handle events - simpler than normal SAX?

while ( $event = $parser->parse() ) {
    switch ( $type = $event->getType() ) {
        case XML_PULL_START_TAG:
            echo ( '<hr />' );
            echo ( 'Start tag: '.$event->getTag().'<br />' );
            echo ( 'Attributes:' );
            print_r($event->getAttribs() );
        break;
        case XML_PULL_END_TAG:
            echo ( 'End tag: '.$event->getTag().'<br />' );
            echo ( '<hr />' );
        break;
        case XML_PULL_TEXT:
            echo ( 'Text: '.$event->getText().'<br />' );
        break;
    }
}
?>

I’ve only implemented three states right now (start tag, end tag and text) and there’s no recognition of namespaces etc. But all in good time...

The overhead is fairly low. Profiling with Xdebug shows that PHP’s require_once() slows things down the most.

If you’re wondering what this is BTW;

$reader = fopen('string://test','r');

Head to stream_wrapper_register() in the PHP manual.
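For a feel of how a string can be opened like a file, here is a minimal sketch of a read-only wrapper. StringStreamSketch is a made-up name, and looking the string up in $GLOBALS by the "host" part of the URL is an assumption about how such a wrapper could work, not necessarily what the prototype does.

```php
<?php
// Hypothetical sketch of a read-only string:// wrapper.
class StringStreamSketch
{
    public $context;     // populated by PHP if a stream context is passed
    private $data = '';  // the string being read
    private $pos = 0;    // current read offset

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        // "string://test" -> "test", taken as the name of a global variable
        // (an assumption made for this sketch).
        $name = parse_url($path, PHP_URL_HOST);
        if (!is_string($name) || !isset($GLOBALS[$name])) {
            return false;
        }
        $this->data = (string) $GLOBALS[$name];
        $this->pos = 0;
        return true;
    }

    public function stream_read($count)
    {
        $chunk = (string) substr($this->data, $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_eof()
    {
        return $this->pos >= strlen($this->data);
    }
}

stream_wrapper_register('string', 'StringStreamSketch');

// Equivalent to $test = '...' at the top level of a script.
$GLOBALS['test'] = '<config><dbname>test</dbname></config>';

$reader = fopen('string://test', 'r');
$contents = stream_get_contents($reader);
fclose($reader);
```

Anything that accepts a stream resource, a parser included, can then read from the string without knowing it isn't a file.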

For more on XML Pull, try here at JavaWorld

1) So “they” (wife and friend) have gone out on Friday night, leaving me to look after two(!) kids, both of whom are thankfully asleep. Time for action...

blogs/harry_fuecks/xml_pull_in_php.txt · Last modified: 2005/10/15 21:47