Hash Table Project
Introduction
Hashing is an efficient way of storing and retrieving values or objects when the various key values are not consecutive, and there are less objects than the maximum key. If a hashing technique was not utilised then the amount of elements needed to store these values in an array, would be the same as the maximum key. With hashing, the modular function is used to obtain a number as a result of a particular key.
However it is perfectly possible that collisions will occur when a value is 'hashed' this is a problem which affects the performance of the program. As the number of probes needed to find a key in an array increases, the lower the efficiency and hence the program will take longer to execute. The way around this is to make the size of the array sufficiently large so as to limit the number of collisions which will occur. (NOTE; the size of the array in most cases corresponds to the mod value).
The ratio between the number of elements needing to be stored, and the size of the hash table (or array) is called the 'load factor'. The closer the result of this ratio comes to 1, the higher the probability of collisions (i.e. the nearer the size of the hash table (or array) comes to the number of elements to be stored). The prewritten Java class, 'hashtable' uses a default load factor of 0.75 to determine the size of the hash table.
This coursework initially asks us to implement our own version of the Java hashtable class with a simple collision handling technique, to store an array of nine given integers. We are given the mod value (and hence the size of the hash table for these integers). This value is 11, to be implemented as;
Hash(key) = key MOD 11
The integers to be stored in this array are as follows; 82, 31, 28, 4, 45, 27, 59, 79, 35.
We have also been asked to compare the performance of this kind of hashing technique on the integers above, to that of a binary search. This is to be done by looking at the average number of probes needed for both, to find if a given key is in the table and to determine that it is not. Also a theoretical approach is asked of to compare the two search methods on the given integers if the array is increased in size.
Informal Algorithum Specification
Int modValue Int position set up a normal array hashTable Stores an array of numbers into a hash table (constructor) takes an array of integers get length of that array use that length and the load factor of 0.75 and work out length of hashTable array i.e. 0.75 * length of original array rounds this value up to the next whole integer stores this value in modValue for (every element in original array) position = key MOD(modValue) is position in hashTable array empty while (position is full) position = position + 1 if position is greater than modValue position = 0 end if is position in hashTable array empty if (position is empty) break end if end while loop store element in position in hashTable array end for loop Search key in array (observer) boolean found set to false; int checkPosition; do checkPosition = key MOD(modValue) if (checkPosition is empty) break end if if (the value in checkPosition is the same as the key) set found to true break else checkPosition = checkPosition + 1 if (checkPosition >= modValue) set checkPosition to 0 end if while (the found boolean, found is false); if (found = true) return false; else return true; end if end while Search key in array (observer) for (every element in hash table) if there is an empty cell use X as the element to be printed else print elements in a row end if end for
Java class hash that implements storage
class Hash { int modValue; int position; int[] hashTable ; // constructor to store elements in a hash table, takes an array of integers public Hash(int[] arrayThatWasSentToObject) { //gets the length of the original array and finds out length of new array modValue = (int)((arrayThatWasSentToObject.length / 0.75)-1); // -1 has been //put here for testing purposes as it was // specified that we use Mod11 // System.out.println("modValue is "+modValue); - DEBUGGING CODE // sets the length of the hash table to this value hashTable = new int[modValue]; for (int i = 0; i < arrayThatWasSentToObject.length; i++) { int value = arrayThatWasSentToObject[i]; position = value % modValue; // Runs through hash table to find an empty space for the collisions while (hashTable[position] != 0) { position = position + 1; // wraps around to the beginning if (position >= modValue) { position = 0; } // System.out.println("Position is " + position); - DEBUGGING CODE if (hashTable[position] == 0) { break; } } // Store value in empty position hashTable[position] = value; } /* System.out.println("Debugging information - here is the hashTable"); - DEBUGGING CODE for (int j = 0; j < hashTable.length; j++) { if (hashTable[j] == 0) { System.out.println("" + j + ": x"); } else { System.out.println("" + j + ": " + hashTable[j]); } } */ } // Observer which searches array for a particular element public boolean search (int key) { boolean found = false; int checkPosition; // uses a do loop and a boolean flag called 'found' and breaks when has found the //element or an empty space, // setting the found flag true and false respectively do { // initial probe checkPosition = key % modValue; if (hashTable[checkPosition] == 0) { break; } if (hashTable[checkPosition] == key) { found = true; break; } else { checkPosition = checkPosition + 1; // wraps round if (checkPosition >= modValue) { checkPosition = 0; } // could loop indefinitely if not found and table is full - will never be //full because of the 0.75 ratio } } while (!found); if (!found) { return false; } else { return true; } } // Observer to print hash table public void print() { for (int j = 0; j < hashTable.length; j++) { // prints an 'X' as the element if a cell is null if ((hashTable[j]) == 0) { System.out.print("X, "); } else { System.out.print(hashTable[j] + ", "); }; } } } Test program for the 3 methods class Test { public static void main(String[] args) { int[] array = {82, 31, 28, 4, 45, 27, 59, 79, 35}; //test for implementation of hash storage Hash hash = new Hash(array); //test for search of an item if (hash.search(31)) { System.out.println("the selected number 31 has been found in the array"); } else { System.out.println("the number 31 has not been found"); } if (hash.search(32)) { System.out.println("the selected number 32 has been found in the array"); } else { System.out.println("the number 32 has not been found"); } //test for printing array hash.print(); } }
How the hash table will be executed
A) Performing the function MOD11 on the values 82, 31, 28, 4, 45, 27, 59, 79, 35 and then putting the values against there relevant positions in the hash table, results in 3 collisions in positions 2, 4 and 5.
0 1 2 3 4 5 6 7 8 9 10 45 79 4 82 28 31 35 59 27
B) The first collision (which because of the order of the numbers given) is that of 27 and 82 in the position 5. The position 6 is checked, however this too is full, another probe is done on position 7 this is found to be empty and '27' is moved there
0 1 2 3 4 5 6 7 8 9 10 45 79 4 82 28 27 31 35 59 27
C) The value 59 will be moved to element 8 after places 5, 6, and 7 have been checked.
0 1 2 3 4 5 6 7 8 9 10 45 79 4 82 28 27 59 31 35 59
D) The last collision, that of 79 and 35 in place 2, is resolved by moving the element '35' to the next free position which is 3.
0 1 2 3 4 5 6 7 8 9 10 45 79 35 4 82 28 27 59 31 35
E) The final hash table looks as follows, with the only free spaces now at 0 and 10.
0 1 2 3 4 5 6 7 8 9 10 45 79 35 4 82 28 27 59 31
Average probe needed to find an item in the array
The average number of probes needed to find if an element is in the array is;
(Total number of probes) / (Number of elements or values in the array)
The table below shows the values, their resulting placement when the mod11 functioned is performed, their final position in the hash table after collision handling, and the number of probes needed to find that key in the hash table.
Key Key MOD 11 Final position Probes to find key 82 5 5 1 31 9 9 1 28 6 6 1 4 4 4 1 45 1 1 1 27 5 7 3 59 4 8 5 79 2 2 1 35 2 3 2
Here we can see that the maximum probes needed is 5 (value '59') hence the efficiency order (worst case scenario) here is O(5), however the hash search method was designed as O(1), though unless perfectly hashed, the order of a hashing method is O(n). though even with collisions O(n) is very close to O(1).
Probes needed is 16, so for these 9 elements the average number of probes needed is:
16 / 9 = 1.78 (2dp)
Average probes needed to find that a given item is not in the array
To find that a value is not in the array the hash table is probed at the expected position (mod of the number given, in this case mod 11) then if it is not found immediately, a linear search is performed until the value is found, or an empty space is reached (where the search terminates). There will always be an empty space if a suitable load ratio is used.
The table below shows the initial probe (for any given key, Mod 11 will evaluate to one of these values) and the amount of probes inclusive of the initial probe, needed to reach an empty space in the array (assumes the given key is not located in the table).
Initial probe Probes till stopped 0 1 1 10 2 9 3 8 4 7 5 6 6 5 7 4 8 3 9 2 10 1
The empty cells are 0 and 10 so the average number of probes needed is;
(Probes needed) / (Length of hash table) = 56 / 11 = 5.09 (2dp)
Comparison of the hashing method with a Binary Search
A binary search only works on ordered data, so for instance the integers given for this coursework would look as follows;
0 1 2 3 4 5 6 7 8 4 27 28 31 35 45 59 79 82
The search is performed by probing the midpoint of the array and comparing this value with the one you are trying to locate. The arrays are then split into two sub-arrays and the appropriate one is discarded.
For example, if we were trying to locate the number 45 in the above array. We would start by probing the midpoint value and comparing. So the mid value here is 35, which is less than 45 and since we know the array is sorted in ascending order it is possible to discard the first array. Now left with the array;
4 5 6 7 8 35 45 59 79 82
We apply the same method and see that 59 is bigger than 45 so the second array is now discarded. This results in;
4 5 6 35 45 59
In this case the 3rd probe locates the number 45.
The amount of probes needed to a specific key in both the hashing and the binary search methods, are illustrated below
Key Hashing Binary search 82 1 4 31 1 3 28 1 2 4 1 4 45 1 3 27 3 3 59 5 2 79 1 3 35 2 1
Average 1.78 probes 2.78 probes
The average number probes needed for the binary search to find that a value is not in the table, relates closely to the O-notation (worst case scenario) and maximum probes needed. The maximum probes for the above values is 4. Hence;
Average 5.08 probes 4 probes
In a comparison of a small amount of elements the hashing method appears to be worse. However, if we use O-notation, the worst case scenario for a binary search is of order O(log_n) whereas for hashing, although being of Order O(n) it is very close too O(1) even with collisions.
As the load factor ratio decreases in the hash table (the array increases in size when containing the same amount of elements) collisions are less likely and the order comes closer to O(1) and hence to maximum efficiency. However if the array increases and the load factor ratio remains the same then the results will be constant
On the other hand with a binary search the efficiency would go down the larger the array, as more probes would be needed to find whether the element is or isn't in the array.
Critical review
If I had more time I would like to have implemented a re-hashing system which would create an hash table with perfect hashing and hence of O(1).
I would also have liked to incorporate the console reader class and possibly an applet to create a workable interface which would allow easier interaction with the program.
More observer methods to query the hash table for various information would also have been beneficial
References
Java, How to program - Deitel and Deitel
Java in a Nutshell - David Flanagan