Sunday, April 8, 2012

A Recursive Decent Parser in C# using BNF

Introduction

Carrying on from my last two posts I'll quickly take the Backus Naur Form, or the Extended Backus Naur Form and use that to create a simple Recursive Decent Parser.

A word of caution. My use of BNF is a bit loose. If you were to write a EBNF that was going to be machine read, or if you were working on some serious industrial strength domain language then this should be tight and rigorous so that everyone understood the BNF syntax. Mine is a bit loose, but is fine for holding the grammar and being used to write the parser.

My two previous posts have been on:


The project

This is a simple calculator. So simple that it only takes in single characters, 2+1, 9+5 are valid. 12 + 6 isn't. (12 has two digits).


The BNF for the grammar


<expression>  :=  <number>  |
                   "(" <expression> <operator> <expression> ")" 
                   
<operator>  :=  "+" | "-" | "*" | "/" 
<number> := '0' |'1' |'2' |'3' |'4' |'5' |'6' |'7' |'8' |'9' 


Notice that there are three "production rules", expression, operator and number.
This suggests that there will be tree main methods with the same (or similar names).

Notice that I have not used a termination character. No reason why.
Notice that I have used BNF for assignment ":=" rather than "=". Why? I just prefer it.


The code

I tend to avoid comments in code as I firmly believe that the code should comment itself. However, I have taken the BNF rules and placed them as comments above the methods that implement them.
Notice that <expression> has become a method whose name is ExpressionValue()

Take notice of the Exception class. This is a must have really.


using System;

namespace RecursiveDecentParser
{
public class Calculator
{
private System.IO.StringReader _reader;
public Calculator (string text)
{
_reader = new System.IO.StringReader(text);
}
public double Calculate()
{
return ExpressionValue();
}
//<expression>  ::=  <number>  |
        //                   "(" <expression> <operator> <expression> ")" 
private double ExpressionValue()
{
SkipBlanks();
char ch = (char)_reader.Peek();
int num = -1;
bool chIsNumber = CharIsDigit(ch, out num);
if (chIsNumber)
{
_reader.Read();
return num;
}
else if(NextCharIsOpenBracket())
{
ConsumeNextChar();
double leftExpressionVal = ExpressionValue();
char op = GetOperator();
double rightExpressionVal = ExpressionValue();
SkipBlanks();
if(!NextCharIsClosingBracket())
{
throw new ParseException("Missing closing bracket.");
}
else
{
ConsumeNextChar();
return PerformOperation(leftExpressionVal, op, rightExpressionVal);
}
}
else
{
throw new ParseException("Unexpected character, \"" + _reader.Peek() + "\" in input.");
}
}
//BNF: <operator>  ::=  "+" | "-" | "*" | "/"
private char GetOperator()
{
SkipBlanks();
char op = (char)_reader.Peek();
if(op == '+' || op == '-' || op == '*' || op == '/')
{
_reader.Read();
return op;
}
else if (op == '\n')
throw new ParseException("Missing operator at end of line");
else 
throw new ParseException("Missing operator. Expected +, -, * or /" +
                        " Found " + op ); 
}
// BNF: <number> ::= '0' |'1' |'2' |'3' |'4' |'5' |'6' |'7' |'8' |'9'
// Yes! only one digit at the moment.
private int GetNumber()
{
SkipBlanks();
char ch = (char)_reader.Peek();
int num = -1;
//bool chIsNumber = int.TryParse(CharToString(ch), out num);
bool chIsNumber = CharIsDigit(ch, out num);
if(chIsNumber)
{
ConsumeNextChar();
return num;
}
else if (ch == '\n')
throw new ParseException("Missing operator at end of line");
else 
throw new ParseException("Missing operator. Expected +, -, * or /" +
                        " Found " + ch ); 
}
private double PerformOperation(double leftVal, char op, double rightVal)
{
double result = 0;
switch (op)
{
case '+':
result = leftVal + rightVal;
break;
case '-':
result = leftVal - rightVal;
break;
case '*':
result = leftVal * rightVal;
break;
case '/':
result = leftVal / rightVal;
break;
}
return result;
}
private bool CharIsDigit(char ch, out int num)
{
string s = CharToString(ch);
return int.TryParse(s, out num);
}
private bool NextCharIsOpenBracket()
{
bool res = false;
char ch = (char)_reader.Peek();
if (ch == '(')
res = true;
return res;
}
private bool NextCharIsClosingBracket()
{
bool res = false;
char ch = (char)_reader.Peek();
if (ch == ')')
res = true;
return res;
}
private void ConsumeNextChar()
{
_reader.Read();
}
private void SkipBlanks()
{
while(CharToString((char)_reader.Peek()) == " ")
ConsumeNextChar();
}
private string CharToString(char ch)
        {
            return new string(ch, 1);
        }
}


    [Serializable]
    public class ParseException : Exception
    {
        public ParseException() { }
        public ParseException(string message) : base(message) { }
        public ParseException(string message, Exception inner) : base(message, inner) { }

        protected ParseException(
            System.Runtime.Serialization.SerializationInfo info,
            System.Runtime.Serialization.StreamingContext context)
            : base(info, context) { }
    }
}



Unit tests

Here are some tests that show it's limitations and workings.
Notice that a single digit input will work, if it is enclosed in brackets it will not. This is in line with the BNF grammar.
An expression such as 2+3 must be enclosed in brackets.
As you can see, I have not tested the code for missing brackets, spaces in strange places, odd characters


[TestFixture]
public class SimpleInputTests
{
[Test]
[ExpectedException (typeof(ParseException))]
public void Exception_if_empty()
{
Calculator calculator = new Calculator("");
calculator.Calculate();
}
[Test]
[ExpectedException (typeof(ParseException))]
public void Simple_bracket_and_one_digit()
{
Calculator calculator = new Calculator("(2)");
calculator.Calculate();
}
[Test]
public void Simple_addition_with_brackets()
{
Calculator calculator = new Calculator("(2+2)");
double answer = 0;
answer = calculator.Calculate();
Assert.AreEqual(4, answer);
}
[Test]
public void No_brackets_and_single_input_digit()
{
Calculator calculator = new Calculator("2");
Assert.AreEqual(2, calculator.Calculate());
}
[Test]
public void Simple_multiplication_with_brackets()
{
Calculator calculator = new Calculator("(2*5)");
double answer = 0;
answer = calculator.Calculate();
Assert.AreEqual(10, answer);
}
[Test]
public void Simple_division_with_brackets()
{
Calculator calculator = new Calculator("(8/4)");
double answer = 0;
answer = calculator.Calculate();
Assert.AreEqual(2, answer);
}
[Test]
public void Simple_subtraction_with_brackets()
{
Calculator calculator = new Calculator("(8-5)");
double answer = 0;
answer = calculator.Calculate();
Assert.AreEqual(3, answer);
}
[Test]
public void Single_digit_add_bracketed_addition_clause()
{
Calculator calculator = new Calculator("(3 + (4+1))");
double answer = 0;
answer = calculator.Calculate();
Assert.AreEqual(8, answer);
}
[Test]
public void Bracketed_addition_clause_add_single_digit()
{
Calculator calculator = new Calculator("((4+1) + 3)");
double answer = 0;
answer = calculator.Calculate();
Assert.AreEqual(8, answer);
}
[Test]
public void Random_complex_sum()
{
Calculator calculator = new Calculator("((4+(5-4)) + 2)");
double answer = 0;
answer = calculator.Calculate();
Assert.AreEqual(7, answer);
}
}
}



No comments:

Post a Comment