I. Hashing:
1. Hash Table Representation:
- Hash table is a data structure used for storing and retrieving data very quickly. Insertion of data in the hash table is based on the key value. Hence every entry in the hash table is associated with some key.
- Using the hash key the required piece of data can be searched in the hash table by few or more key comparisons. The searching time depends on the size of the hash table.
- The effective representation of dictionary can be done using hash table. We can place the dictionary entries in the hash table using hash function.
2. Hash Function:
- Hash function is a function which is used to put the data in the hash table. Hence one can use the same hash function to retrieve the data from the hash table. Thus hash function is used to implement the hash table.
- An integer value is returned by the hash function, called hash key.
Example:
Consider that we want place some employee records in the hash table The record of employee is placed with the help of key: employee ID. The employee ID is a 7 digit number for placing the record in the hash table. To place the record 7 digit number is converted into 3 digits by taking only last three digits of the key.
If the key is 496700 it can be stored at 0th position. The second key 8421002, the record of those key is placed at 2nd position in the array.
Hence the hash function will be:
H (key) = key%1000
Where key%1000 is a hash function and key obtained by hash function is called hash key.
- Bucket and Home bucket: The hash function H(key) is used to map several dictionary entries in the hash table. Each position of the hash table is called bucket.
The function H(key) is home bucket for the dictionary with pair whose value is key.
3. Types of Hashing Functions:
There are various types of hash functions that are used to place the record in the hash table:
i. Division or Modulus Method:
- The hash function depends upon the remainder of division.
0 |
|
1 |
|
2 | 2 |
3 |
|
4 | 4 |
5 |
|
6 |
|
7 | 7 |
8 |
|
9 | 9 |
- Typically the divisor is table length.
Example: If the record 54, 72, 89, 37 is placed in the hash table and if the table size is 10 then
h(key) = record % table size
4 = 54%10
2 = 72%10
9 = 89%10
7 = 37%10
ii. Mid Square:
In the mid square method, the key is squared and the middle or mid part of the result is used as the index.
If the key is a string, it has to be preprocessed to produce a number.
The formula for computing the hash key is:
h(key) = (key)2 =è and Result = middle digits of squared number or key
Example: Consider that if we want to place a record 3111 then:
(3111)2 = 9678321
For the hash table of size 1000
h(3111) = 783 (the middle 3 digits)
iii. Multiplicative hash function:
The given record is multiplied by some constant value. The formula for computing the hash key is-
h(key) = floor(p *(fractional part of key*A))
Where, p is an integer constant, and A is constant number.
Donald Knuth suggested to use constant A = 0.61803398987
Example: If key 107 and p = 50 then,
h(key) = floor(50*(107*0.61803398987))
= floor(3306.4818458045)
= 3306
At 3306 location in the hash table the record 107 will be placed.
iv. Digit Folding:
The key is divided into separate parts and using some simple operation these parts are combined to produce the hash key.
The formula for computing the hash key is:
h(key) = h(k1) +h(K2) +----
Where, h(k1), h(k2),… are separate parts used for computing key.
Example:
Consider a record 12365412 then it is divided into separate parts as 123, 654, 12 and these are added together
H(key) = 123+654+12
= 789
The record will be placed at location 789.
v. Digit Analysis:
The digit analysis is used in a situation when all the identifiers are known in advance. We first transform the identifiers into numbers using some radix, r. Then examine the digits of each identifier. Some digits having most skewed distributions are deleted. This deleting of digits is continued until the number of remaining digits is small enough to give an address in the range of the hash table. Then these digits are used to calculate the hash address.
4. Collisions In Hash Table:
The hash function is a function that returns the key value using which the record can be placed in the hash table. Thus this function helps us in placing the record in the hash table at appropriate position and due to this we can retrieve the record directly from that location. This function need to be designed very carefully and it should not return the same hash key address for two different records. This is an undesirable situation in hashing.
Definition: The situation in which the hash function returns the same hash key (home bucket) for more than one record is called collision and two same hash keys returned for different records is called synonym.
Similarly when there is no room for a new pair in the hash table then such a situation is called overflow. Sometimes when we handle collision it may lead to overflow conditions. Collision and overflow show the poor hash functions.
Index | Hash key |
0 |
|
1 | 131 |
2 |
|
3 | 43 |
4 | 44 |
5 |
|
6 | 36 |
7 | 57 |
8 | 78 |
9 | 19 |
For example,
Consider a hash function.
h(key) = recordkey%10 having the hash table size of 10
The record keys to be placed are:
131, 44, 43, 78, 19, 36, 57 and 77
Now if we try to place 77 in the hash table then we get the hash key to be 7 and at index 7 already the record key 57 is placed. This situation is called collision. From the index 7 if we look for next vacant position at subsequent indices 8,9 then we find that there is no room to place 77 in the hash table. This situation is called overflow.
5. Collision Resolution Techniques:
If collision occurs then it should be handled by applying some techniques. Such a technique is called collision handling technique.
There are two methods for detecting collisions and overflows in the hash table:
- Chaining
- Open addressing (linear probing)
The two more difficult collision handling techniques are:
- Quadratic probing
- Double hashing
II. CHAINING:
In collision handling method chaining is a concept which introduces an additional field with data i.e. chain. A separate chain table is maintained for colliding data. When collision occurs then a linked list(chain) is maintained at the home bucket.
Example:
Consider the keys to be placed in their home buckets are:
131, 3, 4, 21, 61, 7, 97, 8, 9
Then we will apply a hash function as H(key) = key % D
Where, D is the size of table. The hash table will be:
Here D = 10
0
1
2
3
4
5
6
7
8
9
A chain is maintained for colliding elements. For instance 131 have a home bucket (key) 1. Similarly key 21 and 61 demand for home bucket 1. Hence a chain is maintained at index 1.
III. OPEN ADDRESSING :
i. LINEAR PROBING:
- This is the easiest method of handling collision. When collision occurs i.e. when two records demand for the same home bucket in the hash table then collision can be solved by placing the second record linearly down whenever the empty bucket is found.
- When use linear probing (open addressing), the hash table is represented as a one-dimensional array with indices that range from 0 to the desired table size-1.
- Before inserting any elements into this table, we must initialize the table to represent the situation where all slots are empty.
- This allows us to detect overflows and collisions when we inset elements into the table. Then using some suitable hash function the element can be inserted into the hash table.
For example:
Consider that following keys are to be inserted in the hash table:
131, 4, 8, 7, 21, 5, 9, 29
Initially, we will put the following keys in the hash table.
We will use Division hash function. That means the keys are placed using the formula:
H(key) = key % table-size
H(key) = key % 10
For instance the element 131 can be placed at
H(key) = 131 % 10
= 1
Index 1 will be the home bucket for 131. Continuing in this fashion we will place 4, 8, 7.
Index | Hash key |
0 |
|
1 | 131 |
2 |
|
3 |
|
4 | 4 |
5 |
|
6 |
|
7 | 1 Like 0 Dislike Follow 1 Other Lessons for You DBMS 2Phase Lock in Distributed Database: In this protocol, it is required that all the data items must be reached in a mutually independent manner, i.e. when one transaction is performing, then no other transaction... Explain The Working Of JVM 1. when we execute the java file, JVM is loaded into memory.2. In JVM, first class loader starts which loads the class into memory, i.e. it divides the code intoRuntime Memory Area.3. Runtime memory consists... Introduction to Quantum Computing and Quantum Information - What is a qubit ? The classical two-state system can have two possible states either 1 or 0 whereas a qubit can be in a superposition between 0 and 1 Qubit is generically represented as liner superposition of basis states ... Difference between System Software and Application Software The software is the virtual component of the computer through which we are doing our work. The software is classified as System software and Application Software. System Software provides the platform... Pointers in C/C++ Many students have difficulty in understanding pointers. The best way to understand pointers is through memory representation. Whenever we declare a variable, the computer allocates some amount of memory... R Looking for BTech Tuition ?Learn from Best Tutors on UrbanPro. Are you a Tutor or Training Institute? Join UrbanPro Today to find students near you X Looking for BTech Tuition Classes?The best tutors for BTech Tuition Classes are on UrbanPro
Take BTech Tuition with the Best TutorsThe best Tutors for BTech Tuition Classes are on UrbanPro |