2014-05-08
md
Adding Boolean Operations
Part 7 – Optimizing by Constant Folding-> <-Part 5 – Constructing a Parse Tree
 Part 6 of Parsing and Evaluating Mathematical Expressions 

discontinuous By building a parse tree, it is now possible to evaluate efficiently mathematical expressions as when plotting the graph of a function. However, all the functions that have been included in the parser are continuous. It would therefore be difficult to draw a discontinuous graph such as shown on the right.

An if function as found in Excel or LibreOffice.Calc with conditional execution makes this relatively easy.

The if function

Formally the function

if(condition, true_expression, false_expression)

returns the value of true_expression when condition is true and returns the value of false_expression if the condition is false. Here is the statement for the figure on the right: if(x < 25, 2*x, 20+2*x).

The condition, x < 25 returns true when the value of the variable x is less than 25 and false otherwise. The exact nature of such Boolean relationships will be examined later. For now, the usual convention is adopted. The logical value false is represented by the number 0 and any other number represents the logical value true.

The straightforward implementation of the if function could be as below.

function _if.Calc: float; begin if args[0] = 0.0 then result := args[2] else result := args[1]; end;

Then the function could be added to the function list,

InternalTargets.Add('if', _if.create(3))

Unfortunately, it is a mistake to proceed in that fashion because both the true and false expressions as well as the condition will be evaluated by TCallNode’s Eval method to fill the args array before _if.Calc is invoked. The problem is that if statements are often used to avoid calculating the value of a function when a variable is not in its domain. If drawing the graph of x·ln(x) over the interval −2 to 2, it would be better to draw if(x <= 0, 0, x*ln(x)).

There is no way to avoid the evaluating the three parameters of the if function when it is implemented as a normal internal function. A solution is to evaluate the if function as a special case in TCallNode.Eval method. First we remove the implementation _if, it is no longer needed. Then if is registered as a function without implementation

InternalTargets.Add('if', nil);

and then TCallNode.Eval is changed.

function TCallNode.Eval: float; var I: Integer; begin if FCount > 0 then begin if assigned(FTarget) then begin for I := 0 to FCount - 1 do FArgs^[I] := Nodes[I].Eval; TFunction(FTarget).ArgArray := FArgs; if FTarget is TVariableFunction then TVariableFunction(FTarget).Count := FCount; end else begin // target = nil : if function if Nodes[0].Eval = 0 then result := Nodes[2].eval else result := Nodes[1].eval; exit; end; end; result := FTarget.Calc; end;

There is a potential problem, should another function without implementation be added to the internal function list later. More egregious, the above method is not the most efficient way since at each evaluation of an internal function an unnecessary test is performed. As usual with a little more coding that test can be performed only once when parsing.

What is needed is a new node for the if function only with a simple constructor and an Eval method that calculates directly only one of the true and false expressions on the basis of the value of the condition.

Type TIfNode = class(TParentNode) public constructor Create(Nodes: PNodeList); function Eval: float; override; end; constructor TIfNode.Create(Nodes: PNodeList); begin FCount := 3; FNodes := Nodes; end; function TIfNode.Eval: float; begin if Nodes[0].Eval = 0.0 then result := Nodes[2].Eval else result := Nodes[1].Eval end;

The parser’s Call method must be modified to create a TIfNode instead of a TCallNode when appropriate.

function Call: TTree; ... begin ... if not ExternalTargets.Find(Tokenizer.Token, Target) then if not InternalTargets.Find(Tokenizer.Token, Target) then RaiseParseException(peUnknownIdentifier); if Target = nil then begin {$IFDEF DEBUG} Assert(uppercase(Tokenizer.Token) = uppercase(_('if')), 'should be "if"'); {$ENDIF} FoundCount := Params(Nodes); if FoundCount <> 3 then begin FreeNodes; if FoundCount < Target.Count then RaiseParseException(peTooFewParams) else {if FoundCount > count then} RaiseParseException(peTooManyParams) end; result := TIfNode.Create(Nodes); Tokenizer.NextTrueToken; // skip ')' exit; end; if (Target.Count = 0) and not (Target is TVariableFunction) then begin Nodes := nil; Tokenizer.NextTrueToken; ...

One side benefit of this approach is that a better test for the could be performed. Instead of an assertion, the test against the token could verify that an if function is invoked. Of course, there is the added verification that the function was called with 3 parameters

Avoiding Boolean Operations

As will become clear later on, it requires considerable work to add Boolean comparisons such as x < 0 to the parser. The grammar must be changed, and quite a few additional classes to represent a slew of new parse tree nodes must be created. All this work can be avoided by adding a few internal functions. Versions 0.92 and earlier of the application Gaston managed with the following set of functions:

lt(x, y) which returns 1 (true) if x < y, 0 (false) otherwise, le(x, y) which returns 1 (true) if x ≤ y, 0 (false) otherwise, eq(x, y) which returns 1 (true) if x = y, 0 (false) otherwise. ne(x, y) which returns 1 (true) if x ≠ y, 0 (false) otherwise.

The if statement to draw the broken line if(x<25, 2*x, 20+2*x) becomes if(lt(x,25),  2*x, 20+2*x). Other functions could be added, such as gt(x, y) and /code>ge(x, y) but these can be avoided since x > yyy < x and xyyx. By the same token one could do without ne since ne(x, y) = 1eq(x, y) = 0 so 1–eq(x, y) is the same as ne(x, y). That equivalence may not be obvious to all, which explains why ne was included in Gaston’s parser.

It may be necessary to combine Boolean values with the usual conjunction and disjunction operators ‘and’ and ‘or’. The set of all x less than 25 or greater than 50 is written x < 25 or x > 50 while the set of all x between 25 and 50 is written x > 25 and x < 50. Again, older versions of Gaston did not implement these operators but relied on their equivalence with the mathematical operators ‘*’ and ‘+’.

x < 25 or x > 50 is lt(x, 25) + lt(50, x) x > 25 and x < 50 is lt(25, x) * lt(x, 50).

Boolean Constructs

There are six infix operators that represent comparisons of numeric values. As with other infix operators, a class descendant of TBinaryNode is created for each of them. To avoid introducing type checking, the comparisons return a number of type float like all other calculations in the parser. Again 1 will represent a true statement and 0 a false one. As usual the actual evaluation is simple.

Operator Node Eval method
< TLessNode if Nodes[0].Eval < Nodes[1].eval
then result := 1.0 else result := 0.0
<= TLessOrEqualNode if Nodes[0].Eval <= Nodes[1].eval
then result := 1.0 else result := 0.0
> TGreaterNode if Nodes[0].Eval > Nodes[1].eval
then result := 1.0 else result := 0.0
>= TGreaterOrEqualNode if Nodes[0].Eval >= Nodes[1].eval
then result := 1.0 else result := 0.0
= (or ==) TEqualNode if Nodes[0].Eval = Nodes[1].eval
then result := 1.0 else result := 0.0
<> (or !=) TNotEqualNode if Nodes[0].Eval <> Nodes[1].eval
then result := 1.0 else result := 0.0

Of course, the tokenizer will have to be modified to recognize these operators. This will be simple, there is no new problem in making those changes. There remains the question of precedence. What is to be made of the expression 5 < 3 + 3? If the comparison is done first the result is 3 since 5 < 3 yields 0. If the addition is done first, the result is 1 because 5 is less than 6. In Pascal and many other languages, comparisons have the lowest precedence so that the correct answer will be deemed to be 1. This requires a change to the grammar as a new syntactic element is added.

The main productions of the grammar, as defined in part 3, were

Exponent = Factor { ("^" | "**") Exponent }. Term = Exponent { ("*" | "/" | ":" | "%" ) Exponent }. Expression = Term { ("-" | "+" | "|" ) Term }. Params = "(" Expression { ListSym, Expression } ")". Call = Identifier [Params]. Factor = Number | "(" Expression ")" | Call | ("-" | "+") Factor.

A new term is needed for the new element. However, it seemed better to rename the production Term { ("-" | "+") Term } to Sum in order to call the new element Expression which is defined as a Sum or a comparison of two Sums:

Expression = Sum [ ("<" | "<=" | "=" | "==" | ">" | ">=" | "<>" | "!=") Sum ];

The reader will notice that this definition is not similar to that of Term or Sum. One could certainly define something similar to the latter. Calling it Expression2 it would be:

Expression2 = Sum { ("<" | "<=" | "=" | "==" | ">" | ">=" | "<>" | "!=") Sum };

An Expression2 is therefore a Sum or a sequences of Sums separated by comparison operators. This definition was not adopted because it is easy to make a mess of such expressions. Just what is meant by −2 < 3 = 3? If evaluated left to right the result is 1 = 3 -> 0 and if evaluated right to left -2 &tl; 1 -> 1. Of course it could be argued that this is exactly the same problem as mentioned in part 2 of this series when Exponent was introduced. Still, Pascal does not allow such constructs, hence the definition adopted above is the one chosen here.

There are three more operators required to complete the handling of logical statements: logical negation (not), conjunction (and), disjunction (or). Exclusive disjunction (xor) was not added. While bitwise xor is useful, the logical version is actually the same as not equal.

Operator Node Eval method
! TNotNode if Nodes[0].Eval = 0 then result := 1.0
else result := 0.0
& (or &&) TAndNode if (Nodes[0].Eval = 0) or (Nodes[1].eval = 0)
then result := 0.0 else result := 1.0
| (or ||) TOrNode if (Nodes[0].Eval = 0) and (Nodes[1].eval = 0) then result := 0.0 else result := 1.0

As discussed in the previous section conjunction is akin to multiplication, it will be handled as a Term. Disjunction on the other hand is like addition, and handled in Sum. The logical negation is just like the unary "-". So here is the complete augmented grammar

DecSym = ? locale dependant character, usually "." in English, "," in French... ? ListSym = ? locale dependant character, usually "," in English, ";" in French... ? Digit = "0" | ... | "9". Letter = "a" | ... | "z" | "A" | ... | "Z". Number = Digit {Digit} [ DecSym [{Digit}] [ ("E" | "e") ["-" | "+"] Digit {Digit}] ]. Identifier = Letter { Letter | Digit }. Exponent = Factor { ("^" | "**") Exponent }. Term = Exponent { ("*" | "/" | ":" | "%" | "&" | "&&" ) Exponent }. Sum = Term { ("-" | "+" | "|" ) Term }. Expression = Sum, [ ("<" | "<=" | "=" | "==" | ">" | ">=" | "<>" | "!="), Sum ]. Params = "(" Expression { ListSym, Expression } ")". Call = Identifier [Params]. Factor = Number | "(" Expression ")" | Call | ("-" | "+" | "!") Factor. ParamList = "(", Identifer { ListSym, Identifer } ")"; Assignment = Identifier, [ParamList], ":=", Expression;

The following graph shows the hierarchy of nodes used to construct a parse tree. There are so many nodes descending from the class TBinaryNode, that they are shown piled one on top of the other in groups corresponding to the syntactic elements Exponent, Term, Sum and Expression left to right along the bottom and the arrow from TBinaryNode to each descendant is not drawn.

Of course the other six classes are used to represent a Factor.

In conclusion, it is tedious but not at all difficult to add logical and Boolean expression.

node 2

Source code

The source code for all units making up the parser is available along with a simple demonstration programme and test code. It is in the archive ParsingMathExpr-6.zip.

The demonstration program compiles with Delphi version 2 and up. It should also compile with Lazarus/FreePascal with minimal changes. As before, unit tests require DUnit.pas which works with newer versions of Delphi only. Unit testing was done with Delphi 2010.

Part 7 – Optimizing by Constant Folding-> <-Part 5 – Constructing a Parse Tree