Foreword

At the time of enrollment, the school establishes an archive information for each of our children’s shoes, of course, each archive information corresponds to an archive number, and for example, in the school library, the library has a unique book number for each book, then the problem Here comes, when we need to quickly find the corresponding file information through the file number or quickly find the corresponding book through the secretary number, what kind of data structure can we pass at this time? In the previous sections, we explained ArrayList and LinkedList in detail. We know that the bottom layer of ArrayList is a one-dimensional array, but we do not know the index in the array in advance. At this time, the corresponding file number or book number needs to be traversed. Definitely not O (1), even if we know the index, but if the index key is large, it is no longer suitable as the index of the array at this time. If you query through the LinkedList doubly linked list, it is definitely not O (1) through our analysis. The time complexity obtained by using the hash algorithm is constant time O (1). We used to call it hash, which is actually called hashing. Hashing is a technique used to uniquely identify a specific object from a group of similar objects.

Hash

In hashing, large keys are converted to small keys by using a hash function, and these values ​​are then stored in a data structure called a hash table. The basic idea of ​​hashing is to uniformly allocate entries (key / value pairs) in the array, assign a key (transition key) to each element, and through key search, we can access the corresponding element in O (1) time. Use a hash function to calculate an index that suggests where the element can be found or inserted. Hashing is performed in two steps: the element is converted to an integer by using a hash function. This element can be used as an index to store the original element, which belongs to a hash table. The element is stored in a hash table, which can be quickly retrieved using a hash key.

hash = hashfunc (key) index = hash% array_size

The most important of the above is to obtain the hash value of the key through the hash function, and then demodulate the obtained hash value to the array size and store it in the index address in the hash table. Until in this method, the hash has nothing to do with the size of the array, and then it is reduced to an index (a number between 0 and array_size-1) by using the modulo operator (%). To achieve a good hashing mechanism, a good hash function with the following basic requirements is important:

Easy to calculate: It should be easy to calculate and it cannot be the algorithm itself (not for the sake of algorithm).

Uniform distribution: It should provide uniform distribution in the hash table and should not cause aggregation.

Fewer conflicts: Try to avoid conflicts when different elements are mapped to the same hash value.

Note: No matter how good the hash function is, conflicts are inevitable. Therefore, in order to maintain the performance of the hash table, it is important to manage conflicts through various conflict resolution techniques. To store objects using a hash function: Create an array of size M. Choose a hash function h, which is a mapping from objects to integers 0,1, …, M-1. These objects are put into an array of indices calculated by the hash function index = h (object). This array is called a hash table . So how do we choose a hash function? One way to create a hash function is to use Java’s hashCode () method. The hashCode () method is implemented in the Object class, so every class in Java inherits it. The hash code provides a numeric representation of the object. Let’s take a look at the following code example:

 

 

        String obj1 = String.valueOf (4 );
        String obj2 = String.valueOf (16 );
        String obj3 = String.valueOf (68 );
        String obj4 = String.valueOf (125 );
        String obj5 = String.valueOf (255 );

        System.out.println (obj1.hashCode () % 5 );
        System.out.println (obj2.hashCode () % 5 );
        System.out.println (obj3.hashCode () % 5 );
        System.out.println (obj4.hashCode () % 5 );
        System.out.println (obj5.hashCode () % 5);

 

 

As the size of the hash array is 5, the hash function we created is the hashcode method provided to us in Java. The number printed in the figure above is the index storage address in the hash table. At this time we find that the storage address of obj4 and obj1 in the hash table is the same, which is also what we call a conflict. There are four ways to resolve conflicts in hashing: (1) open addressing or linear detection or closed hashing, (2) rehashing, (3) chain addressing, and (4) establishing a common overflow area . Here I will show you two common types, linear detection or open addressing and chain addressing. First, let ’s take a look at open addressing.

Open addressing

In open addressing, all entry records are stored in the array itself, not in the linked list. When we insert a new element or entry, the hash index of the hash value is first calculated, and then the array is checked (starting from the hash index). If the hash index address is not occupied, the entry record is inserted into the address at the hash index, otherwise it will continue in a certain probe sequence until an unoccupied address is found. The probe sequence is the sequence that is followed when traversing the entries. There may be different intervals between successive inlet slots or probes in different detection sequences. When searching for entries, the array will be scanned in the same order until the target element is found or an unused address is found, which indicates that there is no such key in the hash table. The name “open addressing” means that the element address Determined by its hash value. Linear detection refers to a fixed interval (usually 1) between successive detections. Assume that the hash index of a particular entry is an index. The detection sequence for linear detection will be:

index = index% hashTableSize index = (index + 1)% hashTableSize index = (index + 2)% hashTableSize index = (index + 3)% hashTableSize

…….

The meaning above indicates that when the hash value of the specified key has been occupied, the hash value is incremented at an interval of 1, and this increment is performed until an unoccupied index storage address is found.

 

 

public  class HashTable {

    // Array capacity 
    private  int capacity;

    // Hash key-value pair array 
    private Entry [] entries = ();

    public HashTable ( int capacity) {
         this .capacity = capacity;
        entries = new Entry [ this .capacity];
    }

    // Add key-value pairs 
    public  void put (String key, String value) {
         final Entry hashEntry = new Entry (key, value);
         int hash = getHash (key);
        entries [hash] = hashEntry;
    }

    // Get key hash value 
    private  int getHash (String key) {
         int hashCode = key.hashCode ();
         int hash = hashCode% capacity;
         while (entries [hash]! = Null ) {
            hashCode + = 1 ;
            hash = hashCode% capacity;
        }
        return hash;
    }

    // Get the specified key value 
    public String get (String key) {
         int hashCode = key.hashCode ();
         int hash = hashCode% capacity;
         if (entries [hash]! = Null ) {
             while (! Entries [hash] .key .equals (key))
            {
                hashCode + = 1 ;
                hash = hashCode% capacity;
            }
            return entries [hash] .value;
        }
        return  null ;
    }

    private  class Entry {
        String key;
        String value;

        public Entry (String key, String value) {
             this .key = key;
             this .value = value;
        }
    }
}

 

 

We add the above test data to our custom hash table class in the console, and then query the value of the corresponding key, as follows:

 

 

public  class Main {

    public  static  void main (String [] args) {

        HashTable table = new HashTable (5 );

        table.put (String.valueOf ( 4), String.valueOf (4 ));
        table.put (String.valueOf ( 16), String.valueOf (16 ));
        table.put (String.valueOf ( 68), String.valueOf (68 ));
        table.put (String.valueOf ( 125), String.valueOf (125 ));
        table.put (String.valueOf ( 255), String.valueOf (255 ));

        System.out.println (table.get (String.valueOf ( 4 )));
        System.out.println (table.get (String.valueOf ( 125 )));
    }
}

 

 

Chain address method

We use a single linked list to implement the chain address method. The chain address method is one of the most commonly used conflict resolution techniques. In a single linked list, each element of the hash table is a linked list. To store elements in the hash table, It must be inserted into a specific linked list. If there are any conflicts (i.e. two different elements with the same hash value), the two elements are stored in the same linked list, the cost of the lookup is to scan the entries of the selected linked list to obtain the required key, if the key The distribution of is sufficiently uniform that the average cost of a lookup depends only on the average number of keys per linked list. For the chain address method, the worst case is that all entries are inserted into the same linked list. The search process may have to scan all its entries, so the worst-case cost is proportional to the number of entries (N) in the table. The chain address method is stored in the hash table as shown below:

In the example we gave at the beginning, if the hash values ​​of keys 16 and 125 are the same, we will store them in the same linked list. Next, we use the sample code to implement the chain address method, as follows:

 

 

public  class HashTable {

    // Array capacity 
    private  int capacity;

    // Hash key-value pair array 
    private Entry [] entries = ();

    public HashTable ( int capacity) {
         this .capacity = capacity;
        entries = new Entry [ this .capacity];
    }

    // Add key value pair 
    public  void put (String key, String value) {
         // Get key hash value 
        int hash = getHash (key);

        // The instantiation class stores the key and value 
        final Entry hashEntry = new Entry (key, value);

        // If there are no conflicting keys in the array, then directly store 
        if (entries [hash] == null ) {
            entries [hash] = hashEntry;
        }

        // If a conflicting hash value is found, it will be stored in the next reference in the single linked list 
        else {
            Entry temp = entries [hash];
             while (temp.next! = Null ) {
                temp = temp.next;
            }
            temp.next = hashEntry;
        }
    }


    // Get key hash value 
    private  int getHash (String key) {
         return key.hashCode ()% capacity;
    }

    // Get the specified key value 
    public String get (String key) {

        int hash = getHash (key);

        if (entries [hash]! = null ) {

            Entry temp = entries [hash];

            while (! temp.key.equals (key)
                     && temp.next! = null ) {
                temp = temp.next;
            }
            return temp.value;
        }

        return  null ;
    }

    private  class Entry {
        String key;
        String value;
        Entry next;

        public Entry (String key, String value) {
             this .key = key;
             this .value = value;
             this .next = null ;
        }
    }
}

 

 

The console code is the same as the demo open addressing method and the printed results are consistent, so it will not be given here. So far we have implemented hash algorithms and the most commonly used conflict resolution techniques in hash algorithms: open addressing and chain addressing.

Summary

In this section, we will, as always, first understand the implementation of the corresponding concept of the algorithm to pave the way for our next section to analyze Hashtable in detail. Well, we have stopped this section. Thank you for reading. See you in the next section.

 

Java entry series [1] string creation method

Java entry series [2] string features

Java entry series [3] stringbuilder string buffer

Java entry series [4] packaging class

Java entry series [5] inheritance abstract classes interfaceson

Java entry series [6] Principles of dynamic array

Java entry series [7] collection arraylist

Java entry series [8] principles of doubley linked list alogrithm

Java entry series [9] linkedlist source code analysis

Java entry series [10] Hash algorithm principle

Java entry series [11] Hashtable source code analysis

Java entry series [12] Hashcode and equals

Java entry series [13] Principles of the read-black tree algorithmn method

Java entry series [14] Hashmap source code ananlysis

Orignal link:https://www.cnblogs.com/CreateMyself/p/11509691.html