Hash Table Project

Introduction

Hashing is an efficient way of storing and retrieving values or objects when the various key values are not consecutive, and there are less objects than the maximum key. If a hashing technique was not utilised then the amount of elements needed to store these values in an array, would be the same as the maximum key. With hashing, the modular function is used to obtain a number as a result of a particular key.

However it is perfectly possible that collisions will occur when a value is 'hashed' this is a problem which affects the performance of the program. As the number of probes needed to find a key in an array increases, the lower the efficiency and hence the program will take longer to execute. The way around this is to make the size of the array sufficiently large so as to limit the number of collisions which will occur. (NOTE; the size of the array in most cases corresponds to the mod value).

The ratio between the number of elements needing to be stored, and the size of the hash table (or array) is called the 'load factor'. The closer the result of this ratio comes to 1, the higher the probability of collisions (i.e. the nearer the size of the hash table (or array) comes to the number of elements to be stored). The prewritten Java class, 'hashtable' uses a default load factor of 0.75 to determine the size of the hash table.

This coursework initially asks us to implement our own version of the Java hashtable class with a simple collision handling technique, to store an array of nine given integers. We are given the mod value (and hence the size of the hash table for these integers). This value is 11, to be implemented as;

Hash(key) = key MOD 11

The integers to be stored in this array are as follows; 82, 31, 28, 4, 45, 27, 59, 79, 35.

We have also been asked to compare the performance of this kind of hashing technique on the integers above, to that of a binary search. This is to be done by looking at the average number of probes needed for both, to find if a given key is in the table and to determine that it is not. Also a theoretical approach is asked of to compare the two search methods on the given integers if the array is increased in size.

Informal Algorithum Specification

Int modValue
Int position
set up a normal array hashTable 

Stores an array of numbers into a hash table (constructor)
takes an array of integers 
get length of that array
use that length and the load factor of 0.75 and work out length of hashTable array i.e.
  0.75 * length of original array  
rounds this value up to the next whole integer
stores this value in modValue 
for (every element in original array)
        position = key MOD(modValue)
        is position in hashTable array empty
        while (position is full)
                position = position + 1
        if position is greater than modValue 
                position = 0
        end if
            is position in hashTable array empty
        if (position is empty)
            break
                end if
        end while loop
        store element in position in hashTable array
end for loop

Search key in array (observer)
boolean found set to false;
int checkPosition;

do
checkPosition = key MOD(modValue)
if (checkPosition is empty)
    break
end if
if (the value in checkPosition is the same as the key)
    set found to true
    break
else
    checkPosition = checkPosition + 1
if (checkPosition >= modValue)
        set checkPosition to 0
end if
while (the found boolean, found is false);
    if (found = true)
        return false;
    else 
            return true;
    end if
end while

Search key in array (observer)
for (every element in hash table) 
    if there is an empty cell 
        use X as the element to be printed
    else 
        print elements in a row
        end if
end for

Java class hash that implements storage

class Hash
{
    int modValue;
    int position;
    int[] hashTable ;
    
    // constructor to store elements in a hash table, takes an array of integers
    public Hash(int[] arrayThatWasSentToObject)
    {
        //gets the length of the original array and finds out length of new array
        modValue = (int)((arrayThatWasSentToObject.length / 0.75)-1); // -1 has been 
           //put here for testing purposes as it was 
        // specified that we use Mod11
        // System.out.println("modValue is "+modValue); - DEBUGGING CODE
        // sets the length of the hash table to this value
        hashTable = new int[modValue];
        
        for (int i = 0; i < arrayThatWasSentToObject.length; i++)
        {
                int value = arrayThatWasSentToObject[i];
                position = value % modValue;
                
                // Runs through hash table to find an empty space for the collisions
            while (hashTable[position] != 0)
            {
                    position = position + 1;
                    // wraps around to the beginning 
                if (position >= modValue)
                {
                    position = 0;
                }
                // System.out.println("Position is " + position); - DEBUGGING CODE
                if (hashTable[position] == 0)
                {
                     break;
                }
            }
            // Store value in empty position
            hashTable[position] = value;
        }
        
/*      System.out.println("Debugging information - here is the hashTable"); - DEBUGGING CODE
        for (int j = 0; j < hashTable.length; j++) 
        {
            if (hashTable[j] == 0) 
            {
                System.out.println("" + j + ": x");
            } else 
            {
                System.out.println("" + j + ": " + hashTable[j]);
            } 
        } */
    }
    
    // Observer which searches array for a particular element
    public boolean search (int key)
    {
        boolean found = false;
        int checkPosition;
        
        // uses a do loop and a boolean flag called 'found' and breaks when has found the 
        //element or an empty space, 
        // setting the found flag true and false respectively
        do
        {
            // initial probe
            checkPosition = key % modValue;
            if (hashTable[checkPosition] == 0)
            {
                break;
            }
            if (hashTable[checkPosition] == key)
            {
                found = true;
                break;
            }
            else
            {
                checkPosition = checkPosition + 1;
                // wraps round
                if (checkPosition >= modValue)
                {
                    checkPosition = 0;
                }                
                // could loop indefinitely if not found and table is full - will never be 
                //full because of the 0.75 ratio
            }
        } while (!found);
        
        if (!found)
        {
            return false;
        }
        else 
        {
            return true;
        }
    }


// Observer to print hash table
    public void print() 
    {
        for (int j = 0; j < hashTable.length; j++) 
        {
            // prints an 'X' as the element if a cell is null
            if ((hashTable[j]) == 0) 
            {
                System.out.print("X, ");
            } 
            else 
            {
                System.out.print(hashTable[j] + ", "); 
            };
        }
    }
}

Test program for the 3 methods
class Test
{
    public static void main(String[] args) 
    {
        int[] array = {82, 31, 28, 4, 45, 27, 59, 79, 35};
        
        //test for implementation of hash storage
        Hash hash = new Hash(array);
        
        //test for search of an item
        if (hash.search(31)) 
        {
            System.out.println("the selected number 31 has been found in the array");
        } 
        else 
        {
            System.out.println("the number 31 has not been found");
        }
        
        if (hash.search(32)) 
        {
            System.out.println("the selected number 32 has been found in the array");
            
        } 
        else 
        {
            System.out.println("the number 32 has not been found");
        }

        //test for printing array
        hash.print();
    }
}

How the hash table will be executed

A) Performing the function MOD11 on the values 82, 31, 28, 4, 45, 27, 59, 79, 35 and then putting the values against there relevant positions in the hash table, results in 3 collisions in positions 2, 4 and 5.

0       1       2       3       4       5       6       7       8       9       10
        45      79              4       82      28                      31      
                                                                                
                35              59      27                                      

B) The first collision (which because of the order of the numbers given) is that of 27 and 82 in the position 5. The position 6 is checked, however this too is full, another probe is done on position 7 this is found to be empty and '27' is moved there

0       1       2       3       4       5       6       7       8       9       10
        45      79              4       82      28      27              31      
                                                                                
                35              59      27                                      

C) The value 59 will be moved to element 8 after places 5, 6, and 7 have been checked.

0       1       2       3       4       5       6       7       8       9       10
        45      79              4       82      28      27      59      31      
                                                                                
                35              59                                              

D) The last collision, that of 79 and 35 in place 2, is resolved by moving the element '35' to the next free position which is 3.

0       1       2       3       4       5       6       7       8       9       10
        45      79      35      4       82      28      27      59      31      
                                                                                
                35                                                              

E) The final hash table looks as follows, with the only free spaces now at 0 and 10.

0       1       2       3       4       5       6       7       8       9       10
        45      79      35      4       82      28      27      59      31      

Average probe needed to find an item in the array

The average number of probes needed to find if an element is in the array is;

(Total number of probes) / (Number of elements or values in the array)

The table below shows the values, their resulting placement when the mod11 functioned is performed, their final position in the hash table after collision handling, and the number of probes needed to find that key in the hash table.

Key        Key MOD 11   Final position  Probes to find key
 82            5                  5                      1
 31            9                  9                      1
 28            6                  6                      1
 4             4                  4                      1
 45            1                  1                      1
 27            5                  7                      3
 59                4              8                      5
 79            2                  2                      1
 35                2              3                      2

Here we can see that the maximum probes needed is 5 (value '59') hence the efficiency order (worst case scenario) here is O(5), however the hash search method was designed as O(1), though unless perfectly hashed, the order of a hashing method is O(n). though even with collisions O(n) is very close to O(1).

Probes needed is 16, so for these 9 elements the average number of probes needed is:

16 / 9 = 1.78 (2dp)

Average probes needed to find that a given item is not in the array

To find that a value is not in the array the hash table is probed at the expected position (mod of the number given, in this case mod 11) then if it is not found immediately, a linear search is performed until the value is found, or an empty space is reached (where the search terminates). There will always be an empty space if a suitable load ratio is used.

The table below shows the initial probe (for any given key, Mod 11 will evaluate to one of these values) and the amount of probes inclusive of the initial probe, needed to reach an empty space in the array (assumes the given key is not located in the table).

Initial probe           Probes till stopped 
      0                          1
      1                          10
      2                          9
      3                          8
      4                          7
      5                          6
      6                          5
      7                          4
      8                          3
      9                          2
     10                          1

The empty cells are 0 and 10 so the average number of probes needed is;

(Probes needed) / (Length of hash table) = 56 / 11 = 5.09 (2dp)

Comparison of the hashing method with a Binary Search

A binary search only works on ordered data, so for instance the integers given for this coursework would look as follows;

0       1       2       3       4       5       6       7       8
4       27      28      31      35      45      59      79      82

The search is performed by probing the midpoint of the array and comparing this value with the one you are trying to locate. The arrays are then split into two sub-arrays and the appropriate one is discarded.

For example, if we were trying to locate the number 45 in the above array. We would start by probing the midpoint value and comparing. So the mid value here is 35, which is less than 45 and since we know the array is sorted in ascending order it is possible to discard the first array. Now left with the array;

4       5       6       7       8
35      45      59      79      82

We apply the same method and see that 59 is bigger than 45 so the second array is now discarded. This results in;

4       5       6
35      45      59

In this case the 3rd probe locates the number 45.

The amount of probes needed to a specific key in both the hashing and the binary search methods, are illustrated below

Key      Hashing          Binary search 
82          1               4
31          1                   3
28          1               2
4           1               4
45          1               3
27          3               3
59          5               2
79          1               3
35          2               1

Average 1.78 probes 2.78 probes

The average number probes needed for the binary search to find that a value is not in the table, relates closely to the O-notation (worst case scenario) and maximum probes needed. The maximum probes for the above values is 4. Hence;

Average 5.08 probes 4 probes

In a comparison of a small amount of elements the hashing method appears to be worse. However, if we use O-notation, the worst case scenario for a binary search is of order O(log_n) whereas for hashing, although being of Order O(n) it is very close too O(1) even with collisions.

As the load factor ratio decreases in the hash table (the array increases in size when containing the same amount of elements) collisions are less likely and the order comes closer to O(1) and hence to maximum efficiency. However if the array increases and the load factor ratio remains the same then the results will be constant

On the other hand with a binary search the efficiency would go down the larger the array, as more probes would be needed to find whether the element is or isn't in the array.

Critical review

If I had more time I would like to have implemented a re-hashing system which would create an hash table with perfect hashing and hence of O(1).

I would also have liked to incorporate the console reader class and possibly an applet to create a workable interface which would allow easier interaction with the program.

More observer methods to query the hash table for various information would also have been beneficial

References

Java, How to program - Deitel and Deitel
Java in a Nutshell - David Flanagan

Back