The Iterator Pattern provides a standard mechanism to access collections, such as arrays, database result sets and the contents of a file. It can help code using the Iterator interface become reusable; more or less the same code which displays the results of a database query could also be used to display the contents of a directory, for example. There’s also some introduction to the idea of interface classes, coming soon with PHP 5.


Looping the Loop

While loops are so common in PHP that we probably never think twice about how we use them; we write one every time we display some database results in an HTML table.

From most people’s first encounters with PHP they’ll be used to writing control structures like;

while ( $row = mysql_fetch_array($result) ) {
   // Do something here
}

What other things get looped through? The dir API;

$dir = dir('/home/user/files');
while($file = $dir->read()) {
    // Do something
}
$dir->close();

The same goes for walking through a single file e.g;

$fp = fopen ("/home/user/files/somefile.txt", "r");
while (!feof ($fp)) {
    $buffer = fgets($fp, 4096);
    // do something
}
fclose ($fp);

And for any general array we might do;

$someArray = array ('red','blue','green');
while ( $element = each ( $someArray ) ) {
    echo ( $element['value'].'<br />' );
}

Obviously there’s some kind of pattern going on here...

One Loop to Rule them All

Using an Iterator pattern, we can unify all these situations to provide a single API so that stepping through a set of database results can be done in exactly the same way as stepping through the contents of a file (or an array containing an XML-RPC response...). In general the Iterator pattern is intended to provide a simple API for moving through “collections”.

To do so we need an agreement that the “collection”, be it a query result set, an array or the contents of a file, will provide the same methods for accessing the data. The agreement is called an interface, a feature coming with PHP5 as explained in this article on PHP Volcano.

Here’s an interface providing an API which can help unify all the loops above;

2)

<?php
/**
 * Defines the interface for concrete Iterators
 *
 * @interface
 */
class SimpleIterator {
    /**
     * Returns the current element from the collection
     * and moves the internal pointer forward one
     *
     * @return mixed
     */
    function fetch() {
        die ('SimpleIterator::fetch must be implemented');
    }
 
    /**
     * Returns the number of elements in the collection
     *
     * @return int
     */
    function size() {
        die ('SimpleIterator::size must be implemented');
    }
 
    /**
     * Resets the collection pointer to the start
     *
     * @return void
     */
 
    function reset() {
        die ('SimpleIterator::reset must be implemented');
    }
}
?>

The SimpleIterator doesn’t actually get used or even extended by any other classes; it’s simply a contract which other classes that claim to use the SimpleIterator interface must obey, i.e. they must provide the three methods defined.

Note though that as this is still PHP4, we have to “pretend” to implement the interface.
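For comparison, here is a minimal sketch of how the same contract might be declared once the interface keyword is available. It assumes PHP 5 or later; ColourList is a hypothetical class invented purely to show a class being held to the contract, and it is not part of the code download.

```php
<?php
// Sketch only: assumes PHP 5 or later, where "interface" is a language keyword.
interface SimpleIterator
{
    /** Returns the current element and moves the internal pointer forward one. */
    public function fetch();

    /** Returns the number of elements in the collection. */
    public function size();

    /** Resets the collection pointer to the start. */
    public function reset();
}

// Hypothetical implementation, used only to show the contract being enforced.
class ColourList implements SimpleIterator
{
    private $colours;
    private $position = 0;

    public function __construct(array $colours)
    {
        $this->colours = array_values($colours);
    }

    public function fetch()
    {
        if ($this->position >= count($this->colours)) {
            return false; // same "no more elements" signal as mysql_fetch_array()
        }
        return $this->colours[$this->position++];
    }

    public function size()
    {
        return count($this->colours);
    }

    public function reset()
    {
        $this->position = 0;
    }
}

$list = new ColourList(array('red', 'blue', 'green'));
$seen = array();
while ($colour = $list->fetch()) {
    $seen[] = $colour;
}
```

With a real interface, leaving out one of the three methods becomes a fatal error when the class is declared, rather than a die() when the missing method is eventually called.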

So putting this in practice, here’s a MySQLResultIterator which implements the SimpleIterator interface;

<?php
class MySQLResultIterator {
    var $result;
    function MySQLResultIterator (& $result) {
        $this->result=& $result;
    }
    function fetch() {
        return mysql_fetch_array($this->result);
    }
    function size() {
        return mysql_num_rows($this->result);
    }
    function reset() {
        return mysql_data_seek($this->result,0);
    }
}
?>

The above class takes a MySQL query result resource as its constructor’s argument then hides the MySQL functions behind the same API as the simple iterator. Here it is put to simple use;

<?php
require_once('lib/MySQLResultIterator.php');

mysql_connect('localhost', 'user', 'pass') or die(mysql_error()); 
mysql_select_db('database') or die(mysql_error()); 

$sql="SELECT * FROM table";
$result=mysql_query($sql);

$iterator=new MySQLResultIterator($result);

echo ( "<b>Number of results:</b> ".$iterator->size()."<br />\n" );
echo ( "<hr /><b>First iteration</b>\n" );
while ( $element = $iterator->fetch() ) {
    print_r($element);
}

$iterator->reset();

echo ( "<hr /><b>Second iteration</b>\n" );
while ( $element = $iterator->fetch() ) {
    print_r($element);
}
?>

To iterate through the contents of a directory instead, I can use an iterator like;

<?php
class DirectoryIterator {
    var $dir;
    function DirectoryIterator (& $dir) {
        $this->dir=& $dir;
    }
    function fetch() {
        return $this->dir->read();
    }
    function size() {
        $i=0;
        $this->reset();
        while ( $this->fetch() ) {
            $i++;
        }
        $this->reset();
        return $i;
    }
    function reset() {
        $this->dir->rewind();
    }
}
?>

This one takes an instance of PHP’s built-in dir class and wraps it up in the SimpleIterator API. Note the size() method is not so good, since it has to walk the whole directory just to count it, but there’s no alternative I know of for easily counting the number of files in a directory.
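One possible alternative for counting, sketched under assumptions: on PHP 5 and later, scandir() returns all entries of a directory as an array, and the object returned by dir() exposes the directory name in its path property. The helper name countDirectoryEntries() is hypothetical and the throwaway directory exists only for the demonstration.

```php
<?php
// Hypothetical helper, not part of the original classes.
// Assumes PHP 5+ (for scandir) and that $dir came from dir().
function countDirectoryEntries($dir)
{
    $entries = scandir($dir->path);
    // Like read() on the dir object, scandir() includes "." and "..".
    return ($entries === false) ? 0 : count($entries);
}

// Demonstration against a throwaway directory holding two files.
$path = sys_get_temp_dir() . '/iterator_demo_' . uniqid();
mkdir($path);
touch($path . '/a.txt');
touch($path . '/b.txt');

$dir = dir($path);
$entryCount = countDirectoryEntries($dir);
$dir->close();

// Tidy up the throwaway directory.
unlink($path . '/a.txt');
unlink($path . '/b.txt');
rmdir($path);
```

The count includes the "." and ".." entries, which matches what walking the directory with read() would report.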

With some minor modifications to the code, using it is exactly the same as the MySQLResultIterator;

<?php
require_once('lib/DirectoryIterator.php');

$dir = dir('/home/user/files'); 

$iterator=new DirectoryIterator($dir);

echo ( "<b>Number of results:</b> ".$iterator->size()."<br />\n" );
echo ( "<hr /><b>First iteration</b>\n" );
while ( $element = $iterator->fetch() ) {
    print_r($element);
}

$iterator->reset();

echo ( "<hr /><b>Second iteration</b>\n" );
while ( $element = $iterator->fetch() ) {
    print_r($element);
}

$dir->close();
?>

The same goes for looping through the contents of a file or an array (examples are provided with the code download).
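As a rough idea of what the file version might look like, here is a sketch. It is a guess, not the class from the code download: the name FileLineIterator is hypothetical, and it is written with PHP 5+ syntax (__construct, visibility keywords) so that it runs on current PHP.

```php
<?php
// Hypothetical class: a guess at a file-backed iterator with the SimpleIterator API.
class FileLineIterator
{
    private $fp;

    public function __construct($fp)
    {
        $this->fp = $fp; // an open file handle, e.g. from fopen()
    }

    public function fetch()
    {
        // fgets() returns false at end of file, which ends a while loop
        return fgets($this->fp);
    }

    public function size()
    {
        // Same weakness as DirectoryIterator::size(): count by reading everything
        $count = 0;
        $this->reset();
        while (fgets($this->fp) !== false) {
            $count++;
        }
        $this->reset();
        return $count;
    }

    public function reset()
    {
        rewind($this->fp);
    }
}

// Demonstration using a temporary file with three lines.
$fp = tmpfile();
fwrite($fp, "first\nsecond\nthird\n");

$iterator = new FileLineIterator($fp);
$lineCount = $iterator->size();

$lines = array();
while ($line = $iterator->fetch()) {
    $lines[] = rtrim($line);
}
fclose($fp);
```

One caveat with the plain while loop: a final line consisting only of "0" with no trailing newline evaluates to false and would end the loop early.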

Note that the implementations of SimpleIterator above are geared for procedural code. We’d need to write our own for our own database access classes, for example.

A modified version of the DataAccessResult class built in this article might look like;

<?php
/**
 *  Fetches MySQL database rows as objects
 * @implements SimpleIterator
 */
class DataAccessResult {
    /**
    * @var object instance of DataAccess
    * @access private
    */
    var $da;
    /**
    * @var resource MySQL query resource
    * @access private
    */
    var $query;
 
    /**
    * Constructs a new DataAccessResult object
    * @param object instance of DataAccess
    * @param resource MySQL query resource
    */
    function DataAccessResult(& $da,$query) {
        $this->da=& $da;
        $this->query=$query;
    }
    /**
    * Returns an array from query row or false if no more rows
    * @return mixed
    * @access public
    */
    function fetch () {
        if ( $row=mysql_fetch_array($this->query,MYSQL_ASSOC) )
            return $row;
        else
            return false;
    }
    /**
    * Returns the number of rows fetched
    * @return int
    */
    function size () {
        return mysql_num_rows($this->query);
    }
 
    /**
    * Resets the query resource
    * @return void
    */
    function reset () {
        mysql_data_seek($this->query,0);
    }
    /**
    * Returns false if no errors or returns a MySQL error message
    * @return mixed
    */
    function isError () {
        $error=$this->da->isError();
        if (!empty($error))
            return $error;
        else
            return false;
    }
}
?>

Cure for Hair Loss!

Now that’s all well and fine, but a list of files from a directory can be a lot different to rows from a database, right? For the files you may have a single column table but for the query rows you may have many columns. Well, now that we have an API for fetching the results from either, building different tables gets a lot easier, using some more classes (which I guess are perhaps examples of the builder pattern, but that’s another story).

First we need a general class for building tables. This will use the iterator but when it comes to adding rows to the table, it will delegate this job to another class to create the row;

<?php
class TableBuilder {
    var $iterator;
    var $rowBuilder;
    var $table;
    function TableBuilder(& $iterator,$rowBuilder) {
        $this->iterator=& $iterator;
        $this->rowBuilder=& $rowBuilder;
        $this->table='';
        $this->build();
    }
 
    function build() {
        $this->table="<table>\n";
        $this->table.=$this->rowBuilder->columnHeaders();
        while ( $row = $this->iterator->fetch() ) {
            $this->table.=$this->rowBuilder->addRow($row);
        }
        $this->table.="</table>\n";
    }
 
    function getTable() {
        return $this->table;
    }
}
?>

The iterator gets used inside the build method. The TableBuilder doesn’t care what kind of iterator it is so we can happily throw it any type so long as it implements the API of the SimpleIterator interface.

Notice also that there’s no definition anywhere to a specific number of columns; all that is left to the $rowBuilder object passed to the TableBuilder to use.

Note that in practice, we’d probably include more methods for defining the look and feel of the table (e.g. use of CSS) but this demonstrates the idea.

Here’s the base class for row building;

<?php
class RowBuilder {
    function columnHeaders() {
        die ('RowBuilder::columnHeaders abstract');
    }
 
    function addRow($row) {
        die ('RowBuilder::addRow abstract');
    }
}
?>

And here’s a couple of examples, one for displaying a specific query result for some fictional “articles” table;

<?php
class ArticleRowBuilder extends RowBuilder {
    function columnHeaders() {
        return ( "<tr>\n<th>Title</th><th>Author</th>".
                 "<th>Date Published</th>\n</tr>\n" );
    }
 
    function addRow($row) {
        return ( "<tr>\n<td>".$row['title']."</td><td>".$row['author']."</td>\n".
                 "<td>".$row['published']."</td>\n</tr>\n" );
    }
}
?>

Here’s the same again for the contents of a directory;

<?php
class DirectoryRowBuilder extends RowBuilder {
    function columnHeaders() {
        return ( "<tr>\n<th>Filename</th>\n</tr>\n" );
    }
 
    function addRow($file) {
        return ( "<tr>\n<td>".$file."</td>\n</tr>\n" );
    }
}
?>

Putting the former into action we have;

<?php
require_once('lib/MySQLResultIterator.php');
require_once('lib/TableBuilder.php');
require_once('lib/RowBuilder.php');
require_once('lib/ArticleRowBuilder.php');

mysql_connect('localhost', 'user', 'pass') or die(mysql_error()); 
mysql_select_db('database') or die(mysql_error()); 

$sql="SELECT * FROM articles";
$result=mysql_query($sql);

$iterator=new MySQLResultIterator($result);

$rowBuilder=new ArticleRowBuilder;
$tableBuilder=new TableBuilder($iterator,$rowBuilder);

echo ( $tableBuilder->getTable() );
?>

And the latter?

<?php
require_once('lib/DirectoryIterator.php');
require_once('lib/TableBuilder.php');
require_once('lib/RowBuilder.php');
require_once('lib/DirectoryRowBuilder.php');

$dir = dir('/home/user/files');

$iterator=new DirectoryIterator($dir);

$rowBuilder=new DirectoryRowBuilder;
$tableBuilder=new TableBuilder($iterator,$rowBuilder);

echo ( $tableBuilder->getTable() );

$dir->close();
?>

In other words, tons of code re-use. Displaying the results of either a table or a directory is now simply a matter of creating a new subclass of RowBuilder for the data to be displayed then making a few minor alterations to the code that uses all the classes.

Here’s the lot as UML;

The Need for Standards

A while back I was ranting about the need for a PHP Community Process. The Iterator pattern (IMO) demonstrates the point. If, whenever publishing an open source PHP project which includes some form of container class, we implemented the same iterator interface, using the class would be “plug and play”.

If I make a class available, for example, which lists files in a remote FTP directory and implements the SimpleIterator interface, you know what to expect, and your TableBuilder class (or whatever else) can make use of my FtpDirectoryList class without a moment’s hesitation (or any need to re-write it).

Better yet if we all agree on an API for building HTML tables...

It's an Iterator, Jim, but not as we know it

As I mentioned earlier, the SimpleIterator interface is fairly different to the way the Iterator pattern is normally implemented. Vincent Oostind’s Eclipse Library takes the standard approach to implementing an Iterator, the base class looking something like this (note I’ve modified this);

class Iterator {
    /**
     * Create a new iterator that's immediately ready for use.  Normally,
     * the constructor calls reset().
     */
    function Iterator(& $container) {}

    /**
     * Initialize this iterator.
     * @returns void
     */
    function reset() {}

    /**
     * Advance the internal cursor to the next object. The behavior of this
     * method is undefined if isValid() returns false.
     * @returns void
     */
    function next() {}

    /**
     * Check if the iterator is valid
     * @returns bool
     */
    function isValid() {}

    /**
     * Return a reference to the current object. The behavior of this method
     * is undefined if isValid() returns false.
     * @returns mixed
     */
    function &getCurrent() {}
}

In general the way an Iterator is normally implemented is to accept the container as an argument in the Iterator’s constructor (or provide a method for adding containers) then provide an API much like the one above ( e.g. Java's List Iterator ) to access the elements in the container.

I’ve chosen to fly in the face of reason, the Gang of Four and Sun and implement the SimpleIterator interface directly in the container. Also the methods I’ve provided only deal with getting the current value and moving forward one (the fetch() method) or going back to the start (the reset() method), as well as finding out how many values are stored in the container (the size() method). Why?

My guess is 99% of operations that deal with some form of container in PHP simply read through it in sequence and display the results in a web page. Your average PHP script never lives beyond 30 seconds and during that time, a user has no say in what happens. When walking through an array in PHP, for example, how often do you go three steps forward, two back then forward again? My guess is hardly ever, if at all. And most database queries that fetch sets of data will be output straight to a table.

Also having to worry about next(), hasNext() and so on makes looping more complicated, requiring use of for{} or do{}while loops rather than a simple while{} which everyone uses all the time with functions like mysql_fetch_array().
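To make the difference concrete, here is a small side-by-side sketch. ArrayCursor is a hypothetical class invented for the comparison (written for PHP 5+); it offers both the reset()/isValid()/next()/getCurrent() style and a single fetch() method over the same array.

```php
<?php
// Hypothetical class offering both styles over the same array, for comparison only.
class ArrayCursor
{
    private $items;
    private $index = 0;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    // Cursor style: position, validity and access are separate calls
    public function reset()      { $this->index = 0; }
    public function next()       { $this->index++; }
    public function isValid()    { return $this->index < count($this->items); }
    public function getCurrent() { return $this->items[$this->index]; }

    // fetch() style: return the element and advance in one call
    public function fetch()
    {
        if (!$this->isValid()) {
            return false;
        }
        return $this->items[$this->index++];
    }
}

$cursor = new ArrayCursor(array('red', 'blue', 'green'));

// Cursor style needs three different calls spread across a for loop
$viaCursor = array();
for ($cursor->reset(); $cursor->isValid(); $cursor->next()) {
    $viaCursor[] = $cursor->getCurrent();
}

// fetch() style fits the familiar while loop
$cursor->reset();
$viaFetch = array();
while ($colour = $cursor->fetch()) {
    $viaFetch[] = $colour;
}
```

Both loops visit the same elements; the trade-off is that the fetch() loop stops early if an element itself evaluates to false, which the cursor style avoids.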

What’s more, the PHP function library generally hides all issues relating to indexes from us. How often, for example, do you use the function fseek()? Probably nowhere near as often as reading a file from start to finish like this;

while (!feof ($fp)) {
    $buffer = fgets($fp, 4096);
    // do something
}

In other words, most data fetching in PHP is based on sequential access, as with a database cursor, rather than worrying about the current index.

The reason why I’ve implemented the SimpleIterator in the container is again for ease of use in the code that uses the container. Having to first pass the container object to an Iterator instance makes for hard work. It may not always be necessary to implement directly in the container; perhaps use a subclass which implements the SimpleIterator.

Here’s the IndexedArrayIterator by the way, which implements the SimpleIterator;

<?php
class IndexedArrayIterator {
    var $array;
    function IndexedArrayIterator (& $array) {
        $this->array=& $array;
    }
    function fetch() {
        $element=each($this->array);
        return $element['value'];
    }
    function size() {
        return count($this->array);
    }
    function reset() {
        reset ($this->array);
    }
}
?>

This is designed for things like an array of objects and should not be used for associative arrays. The Articles class shown in the Factory Method could easily implement the SimpleIterator in much the same way as the IndexedArrayIterator.

Overall I think the SimpleIterator conforms to the Iterator concept described in the Gang of Four Design Patterns while being an implementation which is better suited to PHP. Anyway, if you don’t agree, flame on...

[UPDATE 7 May 2003] One thing I’ve missed here which Simon Wilson was kind enough to point out - if the IndexedArrayIterator is used to step through an array of boolean values, the moment one of the values is false, it will stop iterating. I haven’t come across the problem before because I use this only on arrays of arrays or arrays of objects rather than arrays of simple typed variables.

Any ideas how to fix that (ideally without adding an isLast() method) - right now can’t see an easy one?
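One possible approach, offered only as a sketch: let the method return TRUE or FALSE and hand the element back through a by-reference argument, so the loop condition never looks at the element itself. The class name ScalarSafeArrayIterator and the method name fetchInto() are hypothetical, and the code assumes PHP 5 or later.

```php
<?php
// Hypothetical variant, not from the original code download.
class ScalarSafeArrayIterator
{
    private $array;

    public function __construct(array $array)
    {
        $this->array = $array;
        reset($this->array);
    }

    /**
     * Copies the current element into $element and moves forward one.
     * Returns false only when the array is exhausted.
     */
    public function fetchInto(&$element)
    {
        // key() returns NULL once the internal pointer has run off the end
        if (key($this->array) === null) {
            return false;
        }
        $element = current($this->array);
        next($this->array);
        return true;
    }

    public function size()
    {
        return count($this->array);
    }

    public function reset()
    {
        reset($this->array);
    }
}

// Every value here except 'last' evaluates to false when tested directly.
$flags = new ScalarSafeArrayIterator(array(true, false, 0, '', 'last'));
$collected = array();
while ($flags->fetchInto($value)) {
    $collected[] = $value;
}
```

The loop stays a plain while and no isLast() method is needed; the cost is that the calling code has to supply a variable for the element instead of using the return value.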

Resources

PHP Iterator - PHP Builder article demonstrating a more standard type of iterator.

API docs for Eclipse Iterators

Update

[20 August 2003] With further thought, I’m now tending to favour one of the two following iterators (depending on the problem). Both are sequential (i.e. they do not need some kind of internal index counter);

This is the simplest, the iterator being implemented as the single method fetch();

class Collection {
    var $struct;
    function Collection(& $struct) {
        $this->struct = & $struct;
    }
    function & fetch() {
        $element = each ( $this->struct );
        if ( $element ) {
            return $element['value'];
        } else {
            reset ($this->struct);
            return false;
        }
    }
}

This works nicely for arrays of arrays or arrays of objects but should not be used to iterate over an array of scalar values (an empty string, NULL, false or 0 will halt the iteration).

The bigger brother of this iterator looks like this;

class Collection {
    var $collection;
    var $record;  // Single element stored here
    var $first = TRUE;
    function Collection (& $collection) {
        $this->collection = & $collection;
    }
    function reset() {
        $this->first = TRUE;
    }
    function next() {
        if ($this->first) {
            $record = reset($this->collection);
            $this->first = FALSE;
        } else {
            $record = next($this->collection);
        }
        if (is_array($record)) {
            $this->record = $record;
            return TRUE;
        } else {
            $this->record = NULL;
            return FALSE;
        }
    }
    function get() {
        return $this->record;
    }
}

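The “magic” above is in the PHP reset() function which, if you examine the manual carefully, returns the first element from the array as well as resetting it. The API of the above class is a little more complex, but the loop condition is now the TRUE or FALSE returned by next() rather than the element itself, so an element that evaluates to false no longer halts the iteration by accident. Note that, as written, the is_array() test means only array records are accepted; iterating over plain scalar values would need a different end-of-collection check. The class still delegates the maintaining of indexes to PHP itself, rather than building its own “array pointer”.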

This comes thanks to Jeff Moore, who’s doing an amazing job with a new framework over at WACT.

One final note - PHP5 looks like it’s going to implement iterators at an engine level - see here for details.

2) If you’ve run into Iterators before, like the Java Iterator interface, wait one moment before flaming; there’s a reason why the SimpleIterator above doesn’t look the same which I’ll explain in a moment.

design/the_iterator_pattern.txt · Last modified: 2005/10/15 21:47