Skip to main content

Command Palette

Search for a command to run...

Functional Parsers and Their Composition with Examples in C#

Part 1

Updated
5 min read

Parsers are used almost everywhere, from converting a string into a number to parsing constructs of programming languages. This is a rather well-worn and thoroughly researched topic. Here, an unusual approach to building parsers is presented—one that is characteristic of functional programming languages. This has now become possible in C# thanks to its mature evolution in the functional direction.

A parser is essentially a function that transforms an input string into a parsed structural tree. What exactly the tree represents depends on what is being parsed. For example, when converting a string to a number, the result will be a number; for a programming language parser, it will be the structural tree of a program or module.

The unusual approach lies in the fact that complex parsers are built from simple parsers by combining them as functions. This is achieved using higher-order functions that take a function (or functions) as input and return another function. When using the LINQ syntax of the C# language, building parsers becomes an incredibly simple and engaging task. This approach fascinated me long ago, but I never had a chance to use it. Recently, I finished writing a library and successfully used it to parse the Fore language.

Now for the examples. A parser is described as a delegate:

public delegate IResult<T> Parser<out T>(Input input);

It takes one argument as input and returns a certain result. The input argument is a structure consisting of a string and the current position in that string. Any string is implicitly converted to this structure using an implicit cast operator.

public class Input
{
    private readonly string _source;
    private readonly int _position;
    private readonly int _line;
    private readonly int _column;

    public Input(string source);
    private Input(string source, int position, int line, int column);
    public Input Next();
    public Input Next(int skipChars);
    public int Column { get; }
    public int Line { get; }
    public int Position { get; }
    public string Source { get; }
    public bool Eof { get; }
    public char Current { get; }
    public bool IsNext(string value, bool ignoreCase);
    public static implicit operator Input(string source);
}

The result is also a structure that can hold either a value or parsing errors. There is also a Result class that implements the IResult interface.

public interface IResult<out T>
{
    bool Success { get; }
    IList<Error> Errors { get; }
    T Value { get; }
    Input Remainder { get; }
    string GetTrace();
}

Next, there is a class with extension functions, which defines simple parsers and higher-order functions for combining parsers. Two parser-combining functions, Select and SelectMany, are required by the C# compiler in order to use LINQ syntax.

The simplest character parser takes a predicate function as an argument, which determines whether the current character matches or not, and a text message describing the error:

public static Parser<char> Chr(char character)
{
    return Chr(c => c == character, () => string.Format("'{0}' expected", character));
}

public static Parser<char> Chr(Predicate<char> predicate, Func<string> errorMessage)
{
    if (predicate == null)
        throw new ArgumentNullException("predicate");
    if (errorMessage == null)
        throw new ArgumentNullException("errorMessage");
    return input => (!input.Eof) && predicate(input.Current) ?
        new Result<char>(input.Current, input.Next()) :
        new Result<char>(new Error(input, errorMessage(), Severity.Error, "char"));
}

A higher-order function that transforms a parser of a single element into a parser of multiple elements. Then the Select and SelectMany functions.

public static Parser<IEnumerable<T>> Many<T>(this Parser<T> parser)
{
    if (parser == null)
        throw new ArgumentNullException("parser");
    return input =>
    {
        List<T> values = new List<T>();
        IResult<T> result = parser(input);
        while (result.Success)
        {
            values.Add(result.Value);
            input = result.Remainder;
            result = parser(input);
        }
        return new Result<IEnumerable<T>>(values, input);
    };
}

public static Parser<TResult> Select<T, TResult>(this Parser<T> parser, Func<T, TResult> func)
{
    if (parser == null)
        throw new ArgumentNullException("parser");
    if (func == null)
        throw new ArgumentNullException("func");
    return input =>
    {
        IResult<T> result = parser(input);
        if (result.Success)
            return new Result<TResult>(true, func(result.Value), result.Errors, result.Remainder);
        return new Result<TResult>(result.Errors);
    };
}

public static Parser<TResult> SelectMany<T, TArg, TResult>(this Parser<T> source, Func<T, Parser<TArg>> selector, Func<T, TArg, TResult> resultSelector)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (selector == null)
        throw new ArgumentNullException("selector");
    if (resultSelector == null)
        throw new ArgumentNullException("resultSelector");
    return input =>
    {
        IResult<T> result = source(input);
        if (result.Success)
        {
            IResult<TArg> resultArg = new Result<TArg>(selector(result.Value)(result.Remainder), AppendRule(result.Errors, typeof(TResult).Name));
            if (resultArg.Success)
                return new Result<TResult>(resultSelector(result.Value, resultArg.Value), resultArg.Remainder);
            return new Result<TResult>(resultArg.Errors);
        }
        return new Result<TResult>(result.Errors);
    };
}

And this is where all the magic begins. As an example, I will write a parser for a signed integer. For this, we need the Or variant combinator and the optional parsing combinator.

public static Parser<T> Or<T>(this Parser<T> parser, Parser<T> other)
{
    if (parser == null)
        throw new ArgumentNullException("parser");
    if (other == null)
        throw new ArgumentNullException("other");
    return input =>
    {
        IResult<T> result = parser(input);
        if (result.Success)
            return result;
        return other(input);
    };
}

public static Parser<T> Optional<T>(this Parser<T> parser, T defaultValue)
{
    if (parser == null)
        throw new ArgumentNullException("parser");
    return input =>
    {
        IResult<T> result = parser(input);
        if (result.Success)
            return new Result<T>(result.Value, result.Remainder);
        return new Result<T>(defaultValue, input);
    };
}

A parser for a signed integer. An optional plus or minus before the digits, at least one digit, followed by any number of digits. The code is easy to read.

Parser<int> intParser = 
    from sign in Chr('+').Or(Chr('-')).Optional('+')
    from digit in Chr(c => Char.IsDigit(c), () => "Integer expected")
    from digits in Chr(c => Char.IsDigit(c), () => "Integer expected").Many()
    select int.Parse(sign + digit + string.Concat(digits));

Usage examples:

Console.WriteLine(intParser("-12345"));
Console.WriteLine(intParser("+*a"));

Result:

-12345
Error(s) in line 1, column 2: Integer expected

Another example is parsing a structure of the form (number;number) consisting of two integers and summing them:

Parser<int> structParser = 
    from leftBracket in Chr('(')
    from number1 in intParser
    from semicolon in Chr(';')
    from number2 in intParser
    from rightBracket in Chr(')')
    select number1 + number2;

Usage examples:

Console.WriteLine(structParser("(-5;2)"));
Console.WriteLine(structParser("(123)"));

Result:

-3
Error(s) in line 1, column 5: ';' expected

Sources:
http://en.wikipedia.org/wiki/Parser_combinator
http://blogs.msdn.com/b/lukeh/archive/2007/08/19/monadic-parser-combinators-using-c-3-0.aspx
http://cascadeofinsights.com/post/1629270971/parser-combinators-in-c
http://microparser.codeplex.com/
http://lorgonblog.wordpress.com/2007/12/02/c-3-0-lambda-and-the-first-post-of-a-series-about-monadic-parser-combinators/