CS 225 Data Structures and Programming Principles
mp_parse—The Shunting Yard and ASTs
Write your own “circular array” class that provides efficient insert/remove operations at both the beginning and end of the sequence,
Use that circular array class to implement your own stack and queue,
Use that stack and queue to implement Dijkstra’s Shunting Yard Algorithm for parsing infix expressions, and
Write a basic interactive interpreter for executing such expressions.
Getting Started
This will create a new folder in your working directory called “mp_parse”. You can see a list of files you should be aware of.
mp_parse.1
In your mp_parse’s include folder you will find the following files:
circ_array.h: the declaration of the circ_array template class
circ_array.tcc: the implementation file for the circ_array class
We have provided you with a skeleton for the circ_array class, but you will need to fill each function in with a correct body.
Notice: The Doxygen for circ_array contains details about each of the functions we require you to write for your circ_array class. You should ensure that you implement all of the required functions exactly as specified in the Doxygen. You are free to add any additional functions or variables (public and/or private) that you want.
Testing mp_parse.1
Compile your code using the following command:
make circ_array_test
The tests for your circ_array can then be run like so:
./circ_array_test
No output is good! Any output indicates that some assertion (think of it as an unrecoverable exception) failed. You should also run the tests under valgrind (or compile with ASAN) to check for memory errors!
There are also provided tests that you may run under monad. We highly recommend debugging with the provided code (e.g., not under monad) until you are confident that it is working—then use monad to double-check your work.
To run the provided tests for mp_parse.1, you should run the following command from the monad folder:
./monad mp_parse.1 --provided
Notice: The tests provided are deliberately insufficient! You should strongly consider writing your own tests to make sure your code is working.
mp_parse.2
In this part of the mp you will write several classes (but don’t worry, most are simple!):
stack, an implementation of the stack ADT,
queue, an implementation of the queue ADT, and
the node class hierarchy, which represents nodes in an abstract syntax tree (there are 11 node types)
Your stack Implementation
We have given you a skeleton for the class in stack.h and stack.tcc.
Notice: You should refer to the Doxygen for the stack class for details on the required functions of your stack class.
Again, you are free to add any additional functions/variables that you see fit, so long as you provide the required public functions.
You are required to use the circ_array<T> data member to store the elements of the stack.
Your queue Implementation
We have given you a skeleton for the class in queue.h and queue.tcc.
Notice: You should refer to the Doxygen for the queue class for details on the required functions for your queue class.
You are required to use the circ_array<T> data member to store the elements of the queue.
Testing Your stack and queue
Compile your code using the following command:
make stack_queue_test
The tests for your stack and queue can then be run like so:
./stack_queue_test
No output is good! Any output indicates that some assertion (think of it as an unrecoverable exception) failed. You should also run the tests under valgrind (or compile with ASAN) to check for memory errors!
The node Hierarchy
Next, you will be writing the nodes of an Abstract Syntax Tree for simple mathematic expressions. We have not provided you with any skeleton code, but you should be able to understand what classes belong in what files based on the Doxygen we have provided for the node class and its hierarchy.
The node classes in this hierarchy provide nodes for a tree. They all inherit from node, which provides an interface: each node in the tree will have a method called value() that returns a double indicating the value for the node. The notion of the value of a node is different depending on the type of the node, so this method should be pure virtual. Its destructor should be able to be = defaulted, so you should only need a node.h file in the include/ directory.
The terminal Node
Next, we suggest that you write the simplest of the derived node classes: the terminal. The terminal node represents a single number at the “leaves” of our tree. Thus, the “value” of a terminal node is simply the number it contains.
Since this is a concrete class (all pure-virtual methods have been implemented), you will need both a terminal.h in include/ and a terminal.cpp in src/.
Notice: You can refer to the Doxygen for the terminal class for more details.
Once you’ve implemented the terminal class, you should add a helper function called make_term that takes a single argument: the number to place in a new terminal node returned. This function should be declared in terminal.h and defined in terminal.cpp. It is a standalone function (not a member function).
Notice: You should refer to the Doxygen for make_term for more information. (This should be a very simple function.)
The unary_op Hierarchy
Next, you should move on to the simpler of the two operator base classes, the unary_op. This derived class of node represents an internal node in the parse tree that has a single child (hence, it is a “unary operator”). It defines its value as follows:
double unary_op::value() const
{
return compute(arg_->value());
}
Where compute() is itself a pure-virtual function that must be overridden in the derived classes of unary_op.
More information is available in the Doxygen for the unary_op node.
Notice: You should add the following files in this part (you won’t get credit for this part if you forget any of the following files):
include/unary_op.h
src/unary_op.cpp
include/uminus.h
src/uminus.cpp
include/uplus.h
src/uplus.cpp
The uminus node
This concrete class that inherits from unary_op should define compute() as returning the negation of its argument.
Since this class is so simple, and doesn’t require any additional member variable, we can leverage a C++11/14 feature to avoid having to write a redundant constructor. Instead, we can have our header file look something like this:
class uminus : public unary_op
{
public:
using unary_op::unary_op; // <-- "inheriting" constructor
// additional stuff
};
This using clause inside of the class instructs the compiler to “inherit” the constructors of our base class (which is not the default behavior).
Notice: You should refer to the Doxygen for the uminus class while writing your class.
The uplus node
This concrete class that inherits from unary_op should define compute() as simply returning the argument. This class is even more simple that uminus, and you can use the same trick as above to inherit the constructor from unary_op.
Notice: You should refer to the Doxygen for the uplus class while writing your class.
There is also a helper function make_unary_op, which takes two parameters: a string indicating the operator to make (# for plus and ~ for minus; the reason for the weird symbols is to distinguish against binary + and binary -) and the argument to place inside the newly made operator node.
You should declare this function in unary_op.h and implement it in unary_op.cpp.
Notice: You can refer to the Doxygen for make_unary_op for more information. (Don’t overthink this function: you should simply return a unique_ptr to the appropriate derived class based on the first argument.)
The binary_op Hierarchy
Next, you should move on to the more complicated of the two operator base classes, the binary_op. This derived class of node represents an internal node in the parse tree that has a two children (hence, it is a “binary operator”). It defines its value as follows:
double binary_op::value() const
{
return combine(left_->value(), right_->value());
}
Where combine() is a new pure-virtual function declared in binary_op that must be implemented by all deriving classes. This function, then, defines the value of a binary_op by what the derived class combine() does: so a node representing + should add its arguments, - should subtract them, etc.
There are five subclasses of binary_op. You should implement each subclass by defining what combine() does. You should be able to use the same trick as before for uminus and uplus to inherit the constructor of binary_op.
Notice: You should refer to the Doxygen for the binary_op class hierarchy to guide your implementation.
You will end up adding 12 files for this part, but each will be very short.
Notice: You should add the following files in this part (you will not get credit if you forget any of the following files):
include/binary_op.h
src/binary_op.cpp
include/divide.h
src/divide.cpp
include/exponent.h
src/exponent.cpp
include/minus.h
src/minus.cpp
include/plus.h
src/plus.cpp
include/times.h
src/times.cpp
Once you’ve added these nodes, you’ve finished the class hierarchy for the abstract syntax trees we will be using for math expressions!
Finally, you should add a helper function make_binary_op that takes three arguments: the type of operator to make (as a string: +, -, *, /, and ^) and the two children to add for that new operator.
Notice: You should refer to the Doxygen for the make_binary_op function for details. (Again, don’t overthink this function: we simply want to return a unique_ptr to the appropriate derived class based on the first argument.)
Testing Your AST Nodes
Compile your code using the following command:
make ast_test
The tests for your nodes can then be run like so:
./ast_test
No output is good! Any output indicates that some assertion (think of it as an unrecoverable exception) failed. You should also run the tests under valgrind (or compile with ASAN) to check for memory errors!
There are also provided tests that you may run under monad. We highly recommend debugging with the provided code (e.g., not under monad) until you are confident that it is working—then use monad to double-check your work.
To run the provided tests for mp_parse.2, you should run the following command from the monad folder:
./monad mp_parse.2 --provided
Notice: The tests provided are deliberately insufficient! You should strongly consider writing your own tests to make sure your code is working.
Warning: You are adding lots of files for the node hierarchy. Please remember the following while you do so:
Every header file needs to have unique double inclusion guards.
Every class in this MP has been placed inside the cs225 namespace.
Every new file that you add will need to be svn added before you commit.
Double check that your files made it to subversion when you commit by going to your subversion directory in your browser. (You may need to refresh before they show up.) If you forget to add any of the required files, you will get a “silly zero”, so please make sure you’ve added everything.
mp_parse.3
Finally, you will be writing an implementation of Dijkstra’s Shunting Yard Algorithm to parse “infix expressions”, which look like this:
1 + 2 * 5 / 6 ^ 7
These expressions are called “infix” since the operator (e.g., *) appears between the operands (e.g., 2 and 5). The output of your parser should be the root of an abstract syntax tree (AST) consisting of the nodes that you write in mp_parse.2.
Your parser will be defined and implemented in src/parser.cpp, as it is just an implementation detail of a provided parse() function.
Notice: You should refer to the Doxygen for the parser class for more details.
You will also find the Doxygen for the token class hierarchy helpful, as this is the input to your parser.
You should ensure that you throw a std::runtime_error in the following cases:
input has extra operators (e.g., 1 + 1 +)
input has extra numbers (e.g., 1 + 1 1)
input has unbalanced parentheses (e.g., (1 + 1)
Manual Testing
Provided with your code are two testing executables: lex_test and parse_test. Compile them with:
make
lex_test is already written for you and simply shows you what your input is to the parser. For example:
./lex_test
> 1 + 2 * 5 / 6 ^ 7
num(1) op(+) num(2) op(*) num(5) op(/) num(6) op(^) num(7)
>
(You can exit the program by typing a blank line.) The output here shows you what the contents of your queue is as input to the parser. It consists of elements of class token (refer to the Doxygen for class token for more information).
The parse_test executable gives you an interactive interpreter for math expressions. In other words, it’s a calculator! For example:
./parse_test
> 1 + 1
2
> 1 + 2 * 4 / 5 + 2
4.6
>
(You can exit the program by typing a blank line.)
Testing mp_parse.3
You should be able to use the parse_test executable to test lots of edge cases yourself, without having to write any code at all!
You can also use the provided monad tests by running the following from the monad folder: