A peristent data structure is one in which we can make changes (such as inserting and removing elements) while still preseving the original data structure Of course, an easy way to do this is to create a new copy of the entire data structure every time we perform an operation that changes the data structure, but it is much more efficient to only make a copy of the smallest possible portion of the data structure.
First, let's look at one of the easest possible data structures — a stack, implemented using a linked list.
Persistent Stack
Let's say that we have a stack S1, and we want to create a new stack S2, which is the result of pushing a new element X onto S1, without changing S1 at all. This is easy — we just add a new element that points to S1:
Starting with the stack S1:
We push X on S1, to get S2, leaving S1 as it was
What about popping? That works in the same way. Starting with S1, we can pop off the first element (A), to get S2:
What if we want to push something on to this new stack? It works in the same was as before – if we push an X onto S2 above to get S3, we have
Persistent Binary Search Trees
OK, so stacks are easy – what about something more complicated, like a binary search tree? Consider the following BST B1:
Let's say we want to insert a "C" into the tree B1, to get a new tree B2 – but we also want to preseve B1, and we don't want to make a complete copy of B1. What can we do?
We can make a copy of the path from the root to the inserted node, leaving the rest of the tree as it was!
We will follow a similar strategy for deleting from a BST – copying the path from the root to the deleted node, leaving everything else the same. So, to delete the "D" from B1 above, we would get:
Let's look at a more complicated case: Consider removing the "E" from the followng tree (Assuming that when we remove a node with 2 children, we replace the value at that node with the largest value in the left subtree):
We will need to copy the "D" into the "E" node, and then remove the "D" node (making the parent of "D" point to "D"s only child "C"). We just need to make a copy of all the nodes from the root of the original tree to the node that is actually removed from the tree, leaving everything else the same:
JSON
(JavaScript Object Notation) is a file format that is both human and machine readible, and is much less verbose than XML. It has become a fairly common standard for data exchange and data serialization. A JSON data member is one of:
Value, which is either a:
String (like "cat" or "box", the quatation marks are required)
Number (like 32 or 23.5 or 234.32e6)
true or false
null
Object, which is a list of key:value pairs, separated by commas, where key is a string and value is any JSON data member. So the following are all JSON objects
{"pet" : "dog", "days" : 3, "flag" : true }
{"firstName" : "John", "lastName" : "Smith", "age" : 22 }
List, which is a list of zero or more values separated by commas. So the following are all JSON lists
[3, 5, 6]
["cat", "dog"]
[3, "cat", 4.5, true]
{ "class" : "cs112", "students" : [{"firstName" : "Donald", "lastName" : "Knuth"}, {"firstName": "Richard", "lastName" : "Stallman"}, {"firstName" : "Douglas", "lastName" : "Crockford"}]}
[[1, 2],[3,4],[5,6]]
JSON objects can contain other objects and lists, and JSON lists can contain other lists and objects, so the following are also valid JSON data memebrs:
Encoding JSON in strings
How can a string contain the " character? Well, a string can contain any character (including " <-- " -->) but to put a " in a string literal, we need to escape it, using \, as follows:
To create a string representing the JSON object
{ "cat" : "dog" , "cart" : "horse"}
We would use the string literal:
String jsonString = "{ \"cat\" : \"dog\", \"cart\" : \"horse\"}"
The assignment
For Project 4, you will be creating both a Persistent Stack and a Persistent Binary Search Tree. Each data structure will contain a array of all past versions of the data structure, from the initial, empty data structure (at time 0), to the current version of the data structure.
Stack
For a stack, if we were to push "A", push "B", push "C", then pop (the "C"), then push a "D", then push an "E", our internal representation would be:
Stack Methods
PersistentStack() Constructor. Takes no parameters
void push(String elem) Push the string elem onto the current stack, creating a new current stack. If the array contating all of the old stacks is full, create a new one twice as large and copy all the data across.
string pop() Pops the top element from the current stack, creating a new current stack. If the array contating all of the old stacks is full, create a new one twice as large and copy all the data across
int currentTime() Returns the "current time", that is, how many opeations have been done on the stack. Each time a push or pop operation is done, the curentTime is incremented
int size(int time) Returns the size (number of elements in the stack) at a given time
int size() Returns the size (number of elements in the stack) at the current time. Equivalent to size(currentTime()).
String[] getAllElements(int time, boolean reversed) Returns an array of all of the elements at a given time. The length of the returned string array should be the same as the number of elements in the stack at that time. reversed == false, then first element in the return array is the top of the stack, if reversed == true, the last element in the return array is the top of the stack
Binary Search Tree
If you were to start with an empty BST, and insert the keys "C", "A", "D", "E", "B" in order, then remove the "C" and insert an "F", you would get the following tree:
NOTE: We are going to do something a bit more tricky with keys in the final assignment ... We will not be doing normal string comparisons. See below for more.
Binary Search Tree methods
PersistentBST(String key) Create a new BST, using key as the sorting key. More on what we mean by this later, under BST Keys
void insert(String elem) Insert an element into the BST, creating a new tree under a new time step.
delete(String elem) Deletes an element from the BST, creating a new tree under a new time step.
boolean find(String elem, int time) Returns true if elem is in the BST at time "time". This operation does not change any tree and does not change the current time.
boolean find(String elem) Returns true if the element is in the BST. As above, this operation does not change the tree and does not change the current time.
int currentTime() Returns the "current time", that is, how many opeations have been done on the BST. Each time an element is inserted or deleted, the curentTime is incremented.
int size(int time) Returns the size (number of elements in the BST) at a given time
int size() Returns the size (number of elements in the BST) at the current time. Equivalent to size(currentTime()).
String[] getAllElements(int time) Returns a sorted array of all elements in the BST at the given time. The length of the array should be equal to the size of the returned BST
String[] getAllElements() Returns a sorted array of all elements in the BST at the current time. Equivalent to getAllElements(currentTime())
BST Keys
We will be inserting Java Strings into our BST, but they will be string represntations of JSON objects. To make our life a little easier, each JSON object will only contain values that are strings (and not numbers, booleans, lists, or other objects). How can we compare these string? Each BST will be given a key string on construction, and that key will be used to order the elements. So, if we created a BST using the key "firstName", and then inserted a number of JSON elements (all of which have the firstName key), we would sort them by that first name. So, the following code:
PersistentBST t = new PersistentBST("firstName"); t.insert("{ \"firstName\": "Adam", \"lastName\":\"Smith\"}"); t.insert("{ \"firstName\": "John", \"lastName\":\"Adams\"}"); t.insert("{ \"firstName\": "George", \"lastName\":\"Washington\"}"); String [] values = t.getAllElements(); for (int i = 0; i < values.length; i++) System.out.println(values[i]);
Would have the output:
{ "firstName": "Adam", "lastName":"Smith" } { "firstName": "George", "lastName":"Washington"} { "firstName": "John", "lastName":"Adams"}
While the code:
PersistentBST t = new PersistentBST("lastName"); t.insert("{ \"firstName\": "Adam", \"lastName\":\"Smith\"}"); t.insert("{ \"firstName\": "John", \"lastName\":\"Adams\"}"); t.insert("{ \"firstName\": "George", \"lastName\":\"Washington\"}"); String [] values = t.getAllElements(); for (int i = 0; i < values.length; i++) System.out.println(values[i]);
Would have the output:
{ "firstName": "John", "lastName":"Adams"} { "firstName": "Adam", "lastName":"Smith" } { "firstName": "George", "lastName":"Washington"}
Note that all element comparisons are done through the key field. So, if you do a find in the BST, it will return true if there is an object with the given key, and a delete should delete any object that contains the given key, regardless of any other keys that may exist in the object.
Project Restructions
You are NOT ALLOWED to use any classes from the Java Collections framework. ONLY standard arrays and linked structures that you complete enitrely on your own.
NO ArrayList
NO LinkedList
NO HashMap
NO TreeMap
... etc
Program Submission
You need to submit an electronic version of your code. To submit electronically, submit the files PersistentStack.java and PeristentBST.java (plus any other helper classes that you may have created) the subversion repository:
https://www.cs.usfca.edu/svn/<username>/cs245/Project3
You may develop your code in any environment that you like, bu it needs to run under linux in the labs! While I recommend developing under linux, you may develop in Windows if you prefer, as long as your program runs under linux.
Collaboration
It is OK for you to discuss solutions to this program with your classmates. However, no collaboration should
involve looking at one of your classmate's source programs! It is usually extremely easy to determine that someone has copied a program, even when the individual doing the copying has changed identifier names and comments.