Hash Chaining
Hashing Defined
Let S be a sequence of key values {ki} and let HT be a hash table implemented as an array of pointers to records.
ki may be of datatype integer, float, string, etc.
HT[i] points to a record containing the following fields
1. a field named keyvalue that may contain a key value or null,
2. a field named infoptr that is a pointer to other information associated with the key value and
3. a third field named next that points to the next record in a linked list of such records. (The reason for the 3rd field and the linked lists will become apparent later.)
HT is initialized to all null pointers.
Let h be a hash function so that h(kj) = i where i is a valid index for array HT.
Obviously h must convert from ki's data type to an integer within HT's array bounds.
A good hash function scrambles (or hashes) the input keys so that the output indexes are scattered over the full range of HT's array bounds in a, seemingly, random manner. Of course the transformation from key value to index is not random. It must be deterministic so that, later, the input key may be retrieved from the array, HT, using the same transformation (hash function, h).
A key value, ki, and a pointer, pi, to its associated information normally may be efficiently stored in and retrieved from HT using the hash function, h, by storing them in HT[h(ki)].keyvalue and HT[h(ki)].infoptr respectively.
Handling Collisions Using Chaining
Let hash be a hashing function that maps (transforms) key values into array indices.
The flaw in using pure hashing is that two different keys (say, k1 and k2) may be transformed to the same index value of HT. In this case we say that the two keys hash to the same index value.
I.e., if hash(k1) = hash(k2) we are said to have a collision.
Since the two keys cannot be stored in precisely the same location we must employ some method of resolving this conflict. One method is called chaining and resolves the conflict by adding the second key, k2 (and its data pointer), to a linked list whose head is at the same location as the first key. That location is, of course, HT[hash(k1)] = HT[hash(k2)].
Retrieving key values and associated information
In order to determine if a key value, say, k1, has been store in HT or to retrieve the information associated with a key value that has been stored in HT one must compute the hash location, hash(k1), and search the linked list at that location for the record whose keyvalue is k1.
Computing Hash Function Efficiency
If a linked list contains M items then 1 lookup is required to find the 1st item, 2 lookups are required to find the 2nd item, ..., M lookups are required to find the Nth item. Thus, the number of lookups required to find all N items is 1 + 2 + ... + M = M*(M+1)/2.
One may determine the efficiency of a hash function by simulating the placement of a randomly generated list of N items into the hash table. To perform this simulation, instead of using a hash table of pointers one may use a hash table of integers initialized to all zeros. And, instead of placing records into the hash table one may simply increment the value of the location where the record was to be stored to indicate the number of records that should be stored at that location.
I.e., Execute HT[hash(ki)]++ instead of placing key ki into HT.
After all keys have been process then the average number of lookups per item may be computed by first computing the total number of lookups required to retrieve all keys.
total number of lookups = ∑(HT[i]×(HT[i]+1)/2) for 0 <= i < N)
And then by dividing that total by the number of data items, N.
Top of this page
Home page 
