Just dropping a quick example while experimenting with Simple Test’s lexer. If you need a tool to help parse some mini language in PHP, it’s a good place to look.
One of the by-products of Simple Test is a regular expression-based lexer, which lives in the project’s CVS. Marcus uses it to build an HTML parser for Simple Test, but it can be applied to pretty much anything you like. Note there’s a standalone version of the lexer available at Marcus’s SourceForge “code dump” - http://sf.net/projects/lamplib.
Right now I’m driven to explore it out of need. WACT is evolving a template expression / data binding language, use of which is described in the Template Authors Guide. Unfortunately there’s a bug (effects described here) which seems to suggest that PHP’s PCRE syntax changed somewhere between PHP 4.1.2 and 4.3.2.
The regex in question is:
$regex = '/^((?Us).*)'. preg_quote('{$', '/') .
'(([^"\'}]+|(\'|")(?U).*\4)+)' .
preg_quote('}', '/') . '((?s).*)$/';
I’ve found no obvious indication of changes to the syntax in the PHP manual / CVS (any input much appreciated). That regex also makes my mind bend (Jeff’s mind bends to this), so I’m exploring the possibilities of a more manageable approach to parsing the template expression language.
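If you want to poke at that regex on your own PHP version, here’s a minimal sketch that runs it against a sample string and dumps the capture groups I’d expect to matter. The sample input is made up (it isn’t from WACT), and my reading of which group holds what is an assumption rather than anything documented, so treat the comments as a guide only.

```php
<?php
// Same pattern as above, assembled the same way.
$regex = '/^((?Us).*)' . preg_quote('{$', '/') .
    '(([^"\'}]+|(\'|")(?U).*\4)+)' .
    preg_quote('}', '/') . '((?s).*)$/';

// Hypothetical sample input, not taken from WACT.
$sample = 'Hello {$name} world';

if (preg_match($regex, $sample, $m)) {
    // Assumed roles of the groups:
    //   $m[1] - text before the first expression
    //   $m[2] - the expression between {$ and }
    //   $m[5] - everything after the closing brace
    var_dump($m[1], $m[2], $m[5]);
} else {
    echo "no match\n";
}
```

Running the same script on two PHP versions and comparing the dumps is probably the quickest way to see whether the pattern really behaves differently between them.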
Anyway, cutting a long story short, here’s a very basic example of using Simple Test’s lexer as a template engine, which may help someone get started. Further examples are best found by examining the parser test cases.
If I have a template like:
The opening message is {$Greeting}<br>
The final word on the subject is {$Closing}
where {$Greeting} and {$Closing} are template variable references I want replaced with some values I’ve assigned to the template engine, a parser using Simple Test’s lexer might look like:
<?php
// Include the parser
require_once 'path/to/simpletest/parser.php';

class YetAnotherTemplateParser {

    // Template output placed here
    var $output = '';

    // Hash of variables to replace
    var $phpVars = array();

    // Register a variable
    function registerVariable($name,$value) {
        $this->phpVars[$name] = $value;
    }

    // Display the page
    function display() {
        echo $this->output;
    }

    /**
    * Callback function (or mode / state), called by the Lexer. This one
    * deals with text outside of a variable reference.
    * @param string the matched text
    * @param int lexer state (ignored here)
    */
    function writePlainText($match,$state) {
        $this->output .= $match;
        return TRUE;
    }

    /**
    * Callback for template variable references.
    * @param string the matched text
    * @param int lexer state
    */
    function writeVariable($match,$state) {
        switch ($state) {
            // Entering the variable reference
            case LEXER_ENTER:
                // Start of variable reference - nothing to do yet
                break;
            // Contents of the variable reference
            case LEXER_UNMATCHED:
                if ( isset($this->phpVars[$match]) ) {
                    $this->output .= $this->phpVars[$match];
                }
                break;
            // Exiting the variable reference
            case LEXER_EXIT:
                // End of variable reference - nothing to do - finished
                break;
        }
        return TRUE;
    }
}

// Create the template parser
// ("& new" is the PHP 4 reference idiom; on PHP 5 and later use plain "new")
$Parser = & new YetAnotherTemplateParser();

// Register some template variables to be replaced
$Parser->registerVariable('Greeting','Hello World!');
$Parser->registerVariable('Closing','Goodbye World!');

// Create the Lexer
// Second arg: the initial "state" or callback function
// Third arg: case sensitivity ON - not relevant to this example, as the
//            regex patterns are not alphas, but so you know
$Lexer = & new SimpleLexer($Parser,'writePlainText',TRUE);

// Add the variable reference starting regex pattern
// Second arg: the current state to which this pattern applies - prevents
//             template syntax like {$my{$var causing two transitions
// Third arg: the state (callback function) to send further calls to
//            once the pattern has been found
$Lexer->addEntryPattern('\{\$','writePlainText','writeVariable');

// Add the exit pattern for variable references, returning the Lexer to
// its previous state (uses a state stack)
// Second arg: the state in which this pattern applies
$Lexer->addExitPattern('\}','writeVariable');

$template = 'The opening message is {$Greeting}<br>
The final word on the subject is {$Closing}';

$Lexer->parse($template);

$Parser->display();
?>
The comments hopefully explain what’s going on.
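If you don’t have Simple Test to hand, the idea behind the entry / exit patterns and the state stack can be sketched in a few dozen lines. To be clear, this is not Simple Test’s implementation: TinyModeLexer, Collector and the string state names are all made up for illustration, it’s written for current PHP (7.1+) rather than PHP 4, and it only allows one entry or one exit pattern per mode.

```php
<?php
// Hypothetical stand-in for the mode-stack idea; NOT Simple Test's code.
class TinyModeLexer
{
    const ENTER = 'enter';
    const UNMATCHED = 'unmatched';
    const EXIT = 'exit';

    private $handler;
    private $modes;       // stack of mode names, current mode on top
    private $entry = [];  // mode => [pattern, mode to switch to]
    private $exit = [];   // mode => pattern

    public function __construct($handler, $startMode)
    {
        $this->handler = $handler;
        $this->modes = [$startMode];
    }

    public function addEntryPattern($pattern, $mode, $newMode)
    {
        $this->entry[$mode] = [$pattern, $newMode];
    }

    public function addExitPattern($pattern, $mode)
    {
        $this->exit[$mode] = $pattern;
    }

    // Patterns must not contain "/" or match the empty string.
    public function parse($text)
    {
        while ($text !== '') {
            $mode = end($this->modes);
            // Only the pattern registered for the current mode is tried.
            $isExit = isset($this->exit[$mode]);
            $pattern = $isExit ? $this->exit[$mode] : ($this->entry[$mode][0] ?? null);
            if ($pattern === null
                || !preg_match('/' . $pattern . '/', $text, $m, PREG_OFFSET_CAPTURE)) {
                // Nothing left to match: hand over the remainder and stop.
                $this->handler->$mode($text, self::UNMATCHED);
                return;
            }
            [$token, $pos] = $m[0];
            if ($pos > 0) {
                // Text before the token belongs to the current mode.
                $this->handler->$mode(substr($text, 0, $pos), self::UNMATCHED);
            }
            if ($isExit) {
                $this->handler->$mode($token, self::EXIT);
                array_pop($this->modes);   // back to the previous mode
            } else {
                $newMode = $this->entry[$mode][1];
                $this->modes[] = $newMode; // push the new mode
                $this->handler->$newMode($token, self::ENTER);
            }
            $text = substr($text, $pos + strlen($token));
        }
    }
}

// Hypothetical handler: one method per mode, named after the mode.
class Collector
{
    public $output = '';
    public $vars = ['Greeting' => 'Hello World!'];

    public function text($match, $state)
    {
        $this->output .= $match;
    }

    public function variable($match, $state)
    {
        if ($state === TinyModeLexer::UNMATCHED && isset($this->vars[$match])) {
            $this->output .= $this->vars[$match];
        }
    }
}

$handler = new Collector();
$lexer = new TinyModeLexer($handler, 'text');
$lexer->addEntryPattern('\{\$', 'text', 'variable');
$lexer->addExitPattern('\}', 'variable');
$lexer->parse('The opening message is {$Greeting}<br>');
echo $handler->output, "\n";
```

The point of the stack is that the exit pattern doesn’t need to know which mode it is returning to; it just pops, which is what lets modes nest.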
Of course this is a very undemanding language, but from messing around so far, I get the feeling that Simple Test’s lexer could scale nicely to a pretty complex language (HTML, which Marcus uses it for, is none too easy to parse).
Note: this is not to encourage you to write yet more template engines! There are more than enough. Parsing CSS, JavaScript, VBScript (nudge nudge) or SQL, for example, could be worthwhile missions though, as could mini languages like WACT’s template expression language.