CMPT 225 Assignment 5 – Hash Table
In this assignment you are to implement a hash table class to store a set of strings (i.e. not a template class!). Your hash table should use separate chaining to deal with collisions and should therefore be implemented as an array of linked lists. The assignment is worth 7% of your grade.
Please read the requirements carefully, paying particular attention to the names and input and output requirements of the class and its methods. We will be testing your class in our test program, and if you do not follow the requirements this test program may not compile, and your class may therefore not be marked. There is a test program in the Simple Test Function section that you can use to check that your class can be correctly compiled.
You must also test your class and my main function using g++ in Linux ad described in the Linux Test section.
Hash Table Class
Implement a hash table class called HashTable to store strings (don't forget to include the string library) that has the following public methods.
Your insert, search and remove functions must handle the situation where different strings map to the same hash value, i.e. collisionsnote1. This should be handled structurally by inserting (or finding) the value in the linked list at the given index.
Public Methods
§ default constructor – creates a hash table of default size: 101
§ constructor – creates a hash table to store n items where the value of n is given by the constructor's single integer parameter, the size of the underlying array should be a prime number, approximately 2n and at least equal to 2n
§ copy constructor – a constructor that creates a deep copy of its HashTable reference parameter
§ destructor - de-allocates any dynamic memory associated with the object
§ operator= – overloads the assignment operator for HashTable objects – (deep) copies its HashTable reference parameter into the calling object and returns a reference to the calling object after de-allocating any dynamic memory associated with the original contents of the calling object; if the calling object is the same as the parameter the operator should not copy it
§ insert – inserts its single string parameter into the hash table unless the string is already contained in the table; returns true if the insertion was successful, returns false otherwise
§ remove – removes its single string parameter from the hash table; returns true if the parameter value was found and removed, returns false if the parameter value was not found in the hash table
§ search – returns true if its single string parameter is contained in the hash table, returns false otherwise
§ size – returns the number of items stored in the hash table
§ maxSize – returns the size of the hash table’s underlying array
§ loadFactor – returns the load factor of the hash table
§ intersectionnote2 – returns a vector<string>note3 containing the intersection of the calling object's data and its single HashTable reference parameter's data
§ unions – returns a vector containing the union of the calling object's data and its single HashTable reference parameter's data
§ difference – returns a vector containing the set difference of the calling object's data and its single HashTable reference parameter's data
Private Methods and Attributes
I expect that you will need to implement one or more helper methods (which should be private); for example you will want to compute the hash value of a string.
Your class should also have the following private attributes.
§ LinkedList* to refer to an array of linked list objects
§ int to record the size of the underlying array
§ int to record the current number of items in the hash table
Other Requirements
§ Methods and parameters should be made const as appropriate
§ Your class should be implemented in separate .h and .cpp files
§ The set operations should be performed efficiently - in less than O(n2) time
Constructor and Hash Table Size
As noted above the constructor (int n) should create an underlying array of size roughly 2n, where the size is a prime number. This will entail writing a (private) Boolean method to check whether or not a number is prime. There is an obvious (O(n)) algorithm for checking whether or not a number is a prime number.
Hash Function
Insertions should be mapped to array indexes using this hash function: hash_value modulo array_size. To achieve this your functions must first generate a hash value from a given string. This hash value should be calculated by assigning the values 1 to 26 to the letters a to z (regardless of letter case) and concatenating the binary values for each letter. The formula given below results in this value:
ch0 * 32n-1 + ch1 * 32n-2 + ... + chn-2 * 321 + chn-1 * 320
Where ch0 is the left most character of the string and n is the size of the string. For example the string "cat" has this value:
3 * 322 + 1 * 32 + 20 = 3124
Note that using the result of this calculation on large strings will cause overflow. You should therefore use Horner's method to perform the calculation, and apply the mod operator after computing each expression in Horner's method. Horner's method was discussed in class.
To find the "value" of a lower case character at position i in a string str you can do this: int asc = str[i] - 96; //ASCII a = 97
For the purposes of this assignment you can assume that all strings will be lower case. If you want to test your table with strings that contain upper case characters it is your responsibility to deal with them.
Linked List Class
Your hash table should deal with collisions using separate chaining. In other words your hash table should consist of an array of linked lists. This means that you should also implement a linked list class (which I'm assuming is called LinkedList). The LinkedList's public methods are driven by the public methods of the HashTable class.
§ default constructor – creates an empty list
§ copy constructor – a constructor that creates a deep copy of its LinkedList reference parameter
§ destructor - de-allocates any dynamic memory associated with the object
§ operator= – overloads the assignment operator for LinkedList objects – (deep) copies its LinkedList reference parameter into the calling object and returns a reference to the calling object after de-allocating any dynamic memory associated with the original contents of the calling object; if the calling object is the same as the parameter the operator should not copy it
§ insert – inserts its single string parameter into the list unless the string is already contained in the list; returns true if the insertion was successful, returns false otherwise
§ remove – removes its single string parameter from the list; returns true if the parameter value was found and removed, returns false if the parameter value was not found in the list
§ search – returns true if its single string parameter is contained in the list, returns false otherwise
You will also want to be able to get at the entire contents of each list, so I wrote this method:
§ get - returns a vector containing the contents of the list
Other Requirements
§ Methods and parameters should be made const as appropriate
§ Your class should be implemented in separate .h and .cpp files
§ Your linked list must be made up of Nodes, therefore you also require a Node class (or Node struct). This Node class should be defined in the LinkedList.h file (you may define its constructor(s) in LinkedList.cpp if you wish).
Implementation Note 1: Collision Discussion
Dealing with collisions using separate chaining is straightforward since they are dealt with by the structure of the hash table (which is more complex than a hash table that implements open addressing since it consists of an array of linked lists).
It is important to determine which class is responsible for each part of the process of inserting, finding or removing values. As an example, when inserting a string into the hash table these steps should be performed.
1. Calculate which index the string maps to (by running the hash function on the string)
2. Determine whether or not the string has already been stored in the linked list at the given index
3. Either insert the string and return true or return false without inserting the string
It is the responsibility of the HashTable class to compute the index (step 1). Steps 2 and 3 require in-depth knowledge of the contents of a linked list and are therefore the responsibility of theLinkedList class and should both be handled by its insert method. The hash table's insert method can therefore call the insert method of the appropriate linked list to perform the insertion and determine the result.
Implementation Note 2: Set Operations
Your HashTable class has access to a hash table’s underlying array, and can therefore iterate through the contents of the array. Since each array element is a linked list this requires retrieving the contents from each such list in turn.
This suggests the following algorithm as the basis for intersection:
for each string s in the calling object
if s is in the parameter
store s in result
The methods for union and set difference can be implemented in a similar way.
Implementation Note 3: Vectors
A vector is a STL (Standard Template Library) template class that can be found in the <vector> header file. Below, you will find pretty much everything you need to know about vectors for the purposes of this assignment.
Declaring a Vector
Since vectors are a template class then when a vector is declared the type to be stored in it should be specified.
vector<string> v; //creates a new vector of strings called v
Accessing Vector Elements
Individual elements of a vector can be accessed using indexes like an array, or by using the vector's at method.
string s1 = v[2]; //assigns the third element of v to s1
string s2 = v.at(3); //assigns the fourth element of v to s2
Inserting Values in a Vector
There are numerous methods for inserting values in a vector. The simplest way is to insert a value at the end of the vector by calling its push_back method. Note that vectors increase size as necessary.
v.push_back("sif"); //inserts the string sif at the end of v
Warning: inserting a value into a given position in a vector using its insert method is less efficient than using push_back.
Finding the Size of a Vector
The number of items stored in a vector can be found using its size method.
for(int i=0; i < v.size(); ++i) { ... } // iterates through the contents of v
Copying a Vector into another Vector
The insert method can be used to insert the entire contents of a vector at the end of another vector. In the example given below the contents of v2 are inserted at the end of v1.
v1.insert(v1.end(), v2.begin(), v2.end());
Include Statements and Include Guards
Your hash table class is not a template class therefore you should not be including your .cpp files in your .h file as you did with assignments 2 and 4. Instead you can use one of the normal include methodologies for classes. My project include statements and include guards are as follows (with respect to the classes I wrote for the project).
§ LinkedList.h begins with the directive #pragma once (which prevents multiple inclusion)
§ LinkedList.cpp contains the directive #include "LinkedList.h"
§ HashTable.h begins with the directive #pragma once and contains the directive #include "LinkedList.h
§ HashTable.cpp contains the directive #include "HashTable.h"
§ The .cpp file that contains the main function contains the directive #include "HashTable.h" (it does not include LinkedList.h)
Simple Test Function
Before you submit your assignment you should compile and run it with a test file whose contents are set out below. The file should be a separate .cpp file from your HashTable and should contain only the code provided below*. If you are unable to compile and run this program it is very likely that we will not be able to compile and run your submission and will therefore not mark your assignment.
The test function is not in any way intended to be a comprehensive test of your class so successfully running the program given to you below does not mean that you will get full marks for the assignment. However, it will demonstrate that we should be able to compile and run your program, allowing us to test each of your methods in detail.
*I really mean this - you should not even be adding a single line to this program to make it compile!
#include <iostream>
#include <vector>
#include "HashTable.h"
using namespace std;
void simpleTest();
int main()
{
simpleTest();
return 0;
}
void simpleTest()
{
HashTable ht1(9);
ht1.insert("bat");
ht1.insert("cat");
ht1.insert("rhinoceros");
ht1.insert("ocelot");
ht1.insert("elephant");
ht1.insert("hippopotamus");
ht1.insert("giraffe");
ht1.insert("camel");
ht1.insert("lion");
ht1.insert("panther");
ht1.insert("bear");
ht1.insert("wolf");
// search
cout << "search" << endl;
string test1 = "frog";
string test2 = "camel";
cout << test1 << ": " << ht1.search(test1) << endl;
cout << test2 << ": " << ht1.search(test2) << endl;
// copy constructor and remove
HashTable ht2(ht1);
ht2.remove("ocelot");
ht2.remove("camel");
ht2.remove("rhinoceros");
// set difference
cout << endl << "set difference" << endl;
vector<string> difference = ht1.difference(ht2);
for(int i=0; i < difference.size(); ++i){
cout << difference[i] << endl;
}
}
Linux Test
The last thing you should do before submitting your assignment is to test that it can be compiled using g++ and run in Linux. The Linux OS is available in both the CS labs in Surrey, and is the default OS in lab 4080. These instructions assume that you are using a lab computer.
Double click on the Home icon on the left of the screen, and download your HashTable and Linked List header (.h) and implementation (.cpp) files, and a file named a5.cpp (that contains the test program given to you above) into the Documents folder. Also download (into the same folder) this make file (it's in a .zip file).
Open a terminal window (to do this click on the small icon in the top left corner and select Accessories ... Terminal Emulator) and enter the following instructions
cd Documents
This sets the current directory to the Documents folder
make
This compiles the program into an executable file called a5.
./a5
This runs the test program
Assessment
The assignment is out of 56. 15 marks are assigned to coding style related to comments, naming and class design – see the code quality page for more detail.
Submission
You should submit your assignment online to the CoursSys submission server. You must submit four files:
§ HashTable.h
§ HashTable.cpp
§ LinkedList.h
§ LinkedList.cpp
These files should not be enclosed in a .zip file. Please read the documentation on site for further information. The assignment is due by 11:59pm on Tuesday the 29th of July.
If you are unable to complete one of the methods, make sure that any calls to that method will still compile and run by writing a stub for that method that returns a default value of the appropriate type.
John Edgar (johnwill@sfu.ca)