Functional Parsers and Their Composition with Examples in C#
Part 1
Parsers are used almost everywhere, from converting a string into a number to parsing constructs of programming languages. This is a rather well-worn and thoroughly researched topic. Here, an unusual approach to building parsers is presented—one that is characteristic of functional programming languages. This has now become possible in C# thanks to its mature evolution in the functional direction.
A parser is essentially a function that transforms an input string into a parsed structural tree. What exactly the tree represents depends on what is being parsed. For example, when converting a string to a number, the result will be a number; for a programming language parser, it will be the structural tree of a program or module.
The unusual approach lies in the fact that complex parsers are built from simple parsers by combining them as functions. This is achieved using higher-order functions that take a function (or functions) as input and return another function. When using the LINQ syntax of the C# language, building parsers becomes an incredibly simple and engaging task. This approach fascinated me long ago, but I never had a chance to use it. Recently, I finished writing a library and successfully used it to parse the Fore language.
Now for the examples. A parser is described as a delegate:
public delegate IResult<T> Parser<out T>(Input input);
It takes one argument as input and returns a certain result. The input argument is a structure consisting of a string and the current position in that string. Any string is implicitly converted to this structure using an implicit cast operator.
public class Input
{
private readonly string _source;
private readonly int _position;
private readonly int _line;
private readonly int _column;
public Input(string source);
private Input(string source, int position, int line, int column);
public Input Next();
public Input Next(int skipChars);
public int Column { get; }
public int Line { get; }
public int Position { get; }
public string Source { get; }
public bool Eof { get; }
public char Current { get; }
public bool IsNext(string value, bool ignoreCase);
public static implicit operator Input(string source);
}
The result is also a structure that can hold either a value or parsing errors. There is also a Result class that implements the IResult interface.
public interface IResult<out T>
{
bool Success { get; }
IList<Error> Errors { get; }
T Value { get; }
Input Remainder { get; }
string GetTrace();
}
Next, there is a class with extension functions, which defines simple parsers and higher-order functions for combining parsers. Two parser-combining functions, Select and SelectMany, are required by the C# compiler in order to use LINQ syntax.
The simplest character parser takes a predicate function as an argument, which determines whether the current character matches or not, and a text message describing the error:
public static Parser<char> Chr(char character)
{
return Chr(c => c == character, () => string.Format("'{0}' expected", character));
}
public static Parser<char> Chr(Predicate<char> predicate, Func<string> errorMessage)
{
if (predicate == null)
throw new ArgumentNullException("predicate");
if (errorMessage == null)
throw new ArgumentNullException("errorMessage");
return input => (!input.Eof) && predicate(input.Current) ?
new Result<char>(input.Current, input.Next()) :
new Result<char>(new Error(input, errorMessage(), Severity.Error, "char"));
}
A higher-order function that transforms a parser of a single element into a parser of multiple elements. Then the Select and SelectMany functions.
public static Parser<IEnumerable<T>> Many<T>(this Parser<T> parser)
{
if (parser == null)
throw new ArgumentNullException("parser");
return input =>
{
List<T> values = new List<T>();
IResult<T> result = parser(input);
while (result.Success)
{
values.Add(result.Value);
input = result.Remainder;
result = parser(input);
}
return new Result<IEnumerable<T>>(values, input);
};
}
public static Parser<TResult> Select<T, TResult>(this Parser<T> parser, Func<T, TResult> func)
{
if (parser == null)
throw new ArgumentNullException("parser");
if (func == null)
throw new ArgumentNullException("func");
return input =>
{
IResult<T> result = parser(input);
if (result.Success)
return new Result<TResult>(true, func(result.Value), result.Errors, result.Remainder);
return new Result<TResult>(result.Errors);
};
}
public static Parser<TResult> SelectMany<T, TArg, TResult>(this Parser<T> source, Func<T, Parser<TArg>> selector, Func<T, TArg, TResult> resultSelector)
{
if (source == null)
throw new ArgumentNullException("source");
if (selector == null)
throw new ArgumentNullException("selector");
if (resultSelector == null)
throw new ArgumentNullException("resultSelector");
return input =>
{
IResult<T> result = source(input);
if (result.Success)
{
IResult<TArg> resultArg = new Result<TArg>(selector(result.Value)(result.Remainder), AppendRule(result.Errors, typeof(TResult).Name));
if (resultArg.Success)
return new Result<TResult>(resultSelector(result.Value, resultArg.Value), resultArg.Remainder);
return new Result<TResult>(resultArg.Errors);
}
return new Result<TResult>(result.Errors);
};
}
And this is where all the magic begins. As an example, I will write a parser for a signed integer. For this, we need the Or variant combinator and the optional parsing combinator.
public static Parser<T> Or<T>(this Parser<T> parser, Parser<T> other)
{
if (parser == null)
throw new ArgumentNullException("parser");
if (other == null)
throw new ArgumentNullException("other");
return input =>
{
IResult<T> result = parser(input);
if (result.Success)
return result;
return other(input);
};
}
public static Parser<T> Optional<T>(this Parser<T> parser, T defaultValue)
{
if (parser == null)
throw new ArgumentNullException("parser");
return input =>
{
IResult<T> result = parser(input);
if (result.Success)
return new Result<T>(result.Value, result.Remainder);
return new Result<T>(defaultValue, input);
};
}
A parser for a signed integer. An optional plus or minus before the digits, at least one digit, followed by any number of digits. The code is easy to read.
Parser<int> intParser =
from sign in Chr('+').Or(Chr('-')).Optional('+')
from digit in Chr(c => Char.IsDigit(c), () => "Integer expected")
from digits in Chr(c => Char.IsDigit(c), () => "Integer expected").Many()
select int.Parse(sign + digit + string.Concat(digits));
Usage examples:
Console.WriteLine(intParser("-12345"));
Console.WriteLine(intParser("+*a"));
Result:
-12345
Error(s) in line 1, column 2: Integer expected
Another example is parsing a structure of the form (number;number) consisting of two integers and summing them:
Parser<int> structParser =
from leftBracket in Chr('(')
from number1 in intParser
from semicolon in Chr(';')
from number2 in intParser
from rightBracket in Chr(')')
select number1 + number2;
Usage examples:
Console.WriteLine(structParser("(-5;2)"));
Console.WriteLine(structParser("(123)"));
Result:
-3
Error(s) in line 1, column 5: ';' expected
Sources:
http://en.wikipedia.org/wiki/Parser_combinator
http://blogs.msdn.com/b/lukeh/archive/2007/08/19/monadic-parser-combinators-using-c-3-0.aspx
http://cascadeofinsights.com/post/1629270971/parser-combinators-in-c
http://microparser.codeplex.com/
http://lorgonblog.wordpress.com/2007/12/02/c-3-0-lambda-and-the-first-post-of-a-series-about-monadic-parser-combinators/




