Functional Parsers and Their Composition with Examples in C#
Part 2
Let us consider, as an example, the following piece of code that creates a parser for the Fore interface.
_interface =
from tokenInterface in ParseToken(Token.Interface)
from name in ParseToken(Token.Ident)
from baseInterface in (from tokenColon in ParseToken(Token.Colon)
from baseType in Parse.Ref(() => _qualifiedName)
select baseType).OptionalNull()
from members in Parse.Ref(() => _interfaceMembers)
from tokenEnd in ParseToken(Token.End)
from tokenEndInteface in ParseToken(Token.Interface)
from nameEnd in ParseToken(Token.Ident)
from tokenSemicolon in ParseToken(Token.Semicolon)
select new CodeInterface(name, baseInterface, members);
If you look at this code, you can notice that it corresponds to the BNF rule that can be written for the Fore interface.
<interface> ::= "Interface" <ident> [ ":" <ident> ] <interfaceMembers> "End" "Interface" <ident> ";"
Square brackets enclose an optional part, curly braces denote a repeating part (from 0 times to any number of times), and parentheses denote either a choice between several alternatives or a subexpression.
I will slightly modify the BNF rule syntax for convenience. The next few lines represent a modified BNF, where the rules describe themselves.
Module := (Rule)*:Rules;
Rule := Ident:Name ":=" Expression ";";
Expression := ExprVariantBlock ("|" ExprVariantBlock)*;
ExprVariantBlock := (ExprVariant (":" Ident:PropertyName)?)+;
ExprVariant := Token | Ident:RuleRef | "(" Expression ")" (RepeatModifier)?;
RepeatModifier := "?" | "*" | "+" | "{" UInteger:MinCount ("-" UInteger:MaxCount)? "}";
Token := String ("!")?;
An asterisk means repetition starting from 0 times, a plus means repetition starting from 1 time, and a question mark denotes an optional element. A vertical bar separates alternatives. Token is an identifier; an exclamation mark after the identifier means that it is case-sensitive. Ident, Integer, and UInteger are the standard rules for an identifier, a signed integer, and an unsigned integer. A colon after a rule element specifies the name of that element. Element names will be used when generating classes.
The equivalence between the code that constructs parsers and the BNF rules makes it possible to use a parser code generator for a set of BNF rules. The code generator uses prewritten classes that represent the parse tree of the BNF rules. Code generation starts from the first rule, meaning that the first rule represents the root of the parse tree of the text. During code generation, each rule, as well as each expression enclosed in parentheses (except the simplest ones), creates a class (a branch of the parse tree). Variants are generated as a class hierarchy, and repeating elements are generated as collections.
Let us take, for example, a structure of the form (number;number). The rule for this structure looks like this:
Structure := "(" Integer:FirstInteger ";" Integer:SecondInteger ")";
In this case, the generator will create a class for the entire set of rules and one class for the single rule.
public sealed class TestParser
{
private static AR.Parsing.Parser<Structure> _structure;
static TestParser()
{
_structure = from item0 in Parse.Token("(", true).TrimLeft()
from item1 in Parse.Integer.TrimLeft()
from item2 in Parse.Token(";", true).TrimLeft()
from item3 in Parse.Integer.TrimLeft()
from item4 in Parse.Token(")", true).TrimLeft()
select new Structure(item1, item3);
}
public static AR.Parsing.Parser<Structure> StructureParser => _structure;
public sealed class Structure
{
private int _firstInteger;
private int _secondInteger;
public Structure(int firstInteger, int secondInteger)
{
_firstInteger = firstInteger;
_secondInteger = secondInteger;
}
public int FirstInteger => _firstInteger;
public int SecondInteger => _secondInteger;
}
}
The second classical example is parsing the simplest expression consisting of numbers and the operations +, -, *, and /.
Sign := "+" | "-";
FactorSign := "*" | "/";
Expression := Addent (Sign Addent)*;
Addent := Factor (FactorSign Factor)*;
Factor := Ident | Integer | "(" Expression ")";
The third classical example is generating itself, i.e., a BNF parser from a description of BNF rules.




