Foreword

In the previous section, we implemented the hashing algorithm and conflict resolution. We used two methods: open address method and chain address method. In this section, we analyze the source code in detail to see which method is used in the source code for conflicts and comparison. What we have achieved, what can be transformed.

Hashtable source code analysis

We analyze the operations behind the scenes by instantiating Hashtable in the console and adding key-value pair instance code, as follows:

 public  static  void main (String [] args) {

        Hashtable hashtable = new Hashtable ();
        hashtable.put ( -100, "first" );
 }

Next, let’s take a look at what preparations are done behind the scenes when we initialize Hashtable?

 

 

public  class Hashtable <K, V>
     extends Dictionary <K, V>
     implements Map <K, V> , Cloneable, java.io.Serializable {

    // store key-value pair data 
    private  transient Entry <?,?> [] Table;

    // Storage data size 
    private  transient  int count;

    // Threshold: (int) (capacity * loadFactor).) 
    Private  int threshold;

    // Load factor: Considering the tradeoff between time and space cost, the default is 0.75. Because higher values ​​will reduce the space overhead, but increase the time cost of finding elements 
    private  float loadFactor;

    // Specify the capacity and load factor constructor 
    public Hashtable ( int initialCapacity, float loadFactor) {
         if (initialCapacity <0 )
             throw  new IllegalArgumentException ("Illegal Capacity:" +
                                               initialCapacity);
        if (loadFactor <= 0 || Float.isNaN (loadFactor))
             throw  new IllegalArgumentException ("Illegal Load:" + loadFactor);

        if (initialCapacity == 0 )
            initialCapacity = 1 ;
            
        this .loadFactor = loadFactor;
        
        table = new Entry <?,?> [initialCapacity];
        
        // The default threshold is 8 
        threshold = ( int ) Math.min (initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1 );
    }

   // Specify the capacity constructor 
    public Hashtable ( int initialCapacity) {
         this (initialCapacity, 0.75f );
    }

    // The default parameterless constructor (initial capacity is 11, load factor is 0.75f) 
    public Hashtable () {
         this (11, 0.75f );
    }
    
    
    private  static  class Entry <K, V> implements Map.Entry <K, V> {
         final  int hash;
         final K key;
        V value;
        Entry <K, V> next;

        protected Entry ( int hash, K key, V value, Entry <K, V> next) {
             this .hash = hash;
             this .key =   key;
             this .value = value;
             this .next = next;
        }
    }
}

 

 

Hashtable internally stores data through the Entry array. It can be seen from the Entry structure that hash conflicts are resolved using the chain address method. When the Hashtable is initialized without specifying a capacity and a load factor, the default initialization capacity is 11, the load factor is 0.75, and the threshold is 8. If the capacity is less than 0, an exception is thrown. If the capacity is equal to 0, the capacity is 1 and the threshold is 0. Otherwise, the threshold is calculated based on the specified capacity * 0.75 or the specified capacity * specified load factor.

Through the above source code and variable definitions, we can quickly reach the above conclusion. This does not require us to discuss too much. Next, let’s take a look at how we do it internally when we add the key-value pair data as above. ?

 

 

public  synchronized V put (K key, V value) {
         if (value == null ) {
             throw  new NullPointerException ();
        }

        Entry <?,?> Tab [] = table;

        int hash = key.hashCode ();
       
        int index = (hash & 0x7FFFFFFF)% tab.length;
        
        Entry <K, V> entry = (Entry <K, V> ) tab [index];
        
        for (; entry! = null ; entry = entry.next) {
             if ((entry.hash == hash) && entry.key.equals (key)) {
                V old = entry.value;
                entry.value = value;
                 return old;
            }
        }

        addEntry (hash, key, value, index);
        
        return  null ;
    }    

 

 

Let’s analyze step by step. First, if the added value is empty, an exception is thrown, and then the hash value of the added key is obtained. The point is here. What is the role of the following code snippet?

 int index = (hash & 0x7FFFFFFF)% tab.length;

Because the array index cannot be negative, the hash value of the key is converted to a positive value by logical AND operation, which is to ensure that the index is positive, so  int index = (hash & 0x7FFFFFFF)% tab.  How is length calculated? The binary of 0x7FFFFFFF is 1111111111111111111111111111111111, because it is a positive number, the sign is 0, ie 01111111111111111111111111111111111, and for the value we add -100, the binary is 11111111111111111111111111110011100, the two are converted to binary for logical addition operation, the final result is 01111111111111111111111111110011100, converted to decimal The result is 2147483548, which is the principle calculation method we explained. In fact, we can subtract by decimal. The decimal of the above 0x7FFFFFFF is 2147483647. At this time, we directly subtract (100-1), which is 99, and finally get It is also 2147483548. Finally, the modulo result of initial capacity 11 is indexed as 1. If the hash value of the key is positive, then this problem does not exist, that is, the hash value obtained by the logical AND operation is the original value. Next, get the position of the corresponding index in the array, and then perform a loop. The question arises: Why loop the array? This is the following code snippet:

 

 

       for (; entry! = null ; entry = entry.next) {
             if ((entry.hash == hash) && entry.key.equals (key)) {
                V old = entry.value;
                entry.value = value;
                 return old;
            }
        }

 

 

The above is to solve the same key value and overwrite the corresponding value, or can’t understand? We add another line of code to the console:

 

 

public  static  void main (String [] args) {

        Hashtable hashtable = new Hashtable ();
        
        hashtable.put ( -100, "first" );

        hashtable.put ( -100, "second" );
}

 

 

The keys we added above are -100. Through our analysis of the above loop source code, the value first in the above line is replaced with second. In other words, when we add the same key, the value override of the latter will occur at this time. For the former value, we can also learn by returning. If the return value is empty, it means that there is no overwriting, otherwise there is a return value, which means that the same key exists and the value is overwritten . We can get by printing out the data in the Hashtable as follows. This is different from the C # operation Hashtable. If the same key exists, an exception is thrown directly.

 

 

        Enumeration keys = hashtable.keys ();

        while (keys.hasMoreElements ()) {

            Object key =   keys.nextElement ();

            String values = (String) hashtable.get (key);
            System.out.println (key + "------>" + values);
        }

 

 

Before we finish, let’s continue to analyze the following code to add key-value pairs to the array:

 

 

private  void addEntry ( int hash, K key, V value, int index) {
        modCount ++ ;
        
        // Define the stored data variable 
        Entry <?,?> Tab [] = table;
        
        // Expand the array 
        if the elements in the array exceed or equal the threshold if (count> = threshold) {
            rehash ();

            tab = table;
            hash = key.hashCode ();
            index = (hash & 0x7FFFFFFF)% tab.length;
        }

        // Add key-value pairs and hash values ​​to the storage array 
        Entry <K, V> e = (Entry <K, V> ) tab [index];
        tab [index] = new Entry <> (hash, key, value, e);
        count ++ ;
}

 

 

When adding data to the stored array, it is necessary to determine whether the threshold has been exceeded. After all, it is to expand the hash table. Next, let’s see what the specific implementation is.

 

 

protected  void rehash () {

        // Get the current capacity of the storage array 
        int oldCapacity = table.length;
        Entry <?,?> [] OldMap = table;

        // New capacity = current capacity * 2 + 1 
        int newCapacity = (oldCapacity << 1) + 1 ;
        
        // Determine whether the new capacity exceeds the maximum array size. If it exceeds, then the maximum capacity is the defined maximum array size 
        if (newCapacity-MAX_ARRAY_SIZE> 0 ) {
             if (oldCapacity == MAX_ARRAY_SIZE) 
                return ;
            newCapacity = MAX_ARRAY_SIZE;
        } 
        Entry <?,?> [] NewMap = new Entry <?,?> [NewCapacity];

        modCount ++ ;
        
        // Recalculate the threshold 
        threshold = ( int ) Math.min (newCapacity * loadFactor, MAX_ARRAY_SIZE + 1 );
        
        // The expanded storage array 
        table = newMap;

        // cyclically update the currently stored array data into the expanded storage array 
        for ( int i = oldCapacity; i--> 0 ;) {
             for (Entry <K, V> old = (Entry <K, V>) oldMap [i]; old! = null ;) {
                Entry <K, V> e = old;
                old = old.next;

                int index = (e.hash & 0x7FFFFFFF)% newCapacity;
                e.next = (Entry <K, V> ) newMap [index];
                newMap [index] = e;
            }
        }
}

 

 

As explained above, it is very clear. Next, we add the following code to the console:

 

 

public  static  void main (String [] args) {

        Hashtable hashtable = new Hashtable ();

        hashtable.put ( -100, "first" );

        hashtable.put ( -100, "second" );

       hashtable.put ( "Aa", "third");
        hashtable.put ("BB", "fourth" );

        Enumeration keys = hashtable.keys ();

        while (keys.hasMoreElements ()) {

            Object key =   keys.nextElement ();

            String values = (String) hashtable.get (key);
            System.out.println (key + "------>" + values);
        }
}

 

 

When we add the above two lines of code, at this moment we think about what the result data will be printed? as follows:

Java_entry_series_[11]_Hashtable_source_code_analysis_0.png

Alas, there seems to be a problem. The key we added above is Aa. Shouldn’t it be Aa that is printed first? How is the key BB? Not only does it make us wonder, the main reason is that the hash value calculated for key Aa and key BB is the same. If you do n’t believe it, we can print out that the corresponding hash value for both is 2112, as follows:

System.out.println ("Aa" .hashCode ());
 System.out.println ( "BB" .hashCode ());

Next, let’s take a look at how it is finally stored in the array. We excerpt the above code snippet as follows:

  Entry <K, V> e = (Entry <K, V> ) tab [index];
  tab [index] = new Entry <> (hash, key, value, e);

The problem lies in this place. In the previous section, we explained the hashing algorithm to resolve the conflict using the chain address method. We are adding the same hash value calculated by the key to the end of the single-linked list. This is exactly the opposite. Here we take What is added is placed in the head of the singly linked list, and the added is placed in the next reference. Because the above first fetches the index corresponding to the added key Aa, and then re-instantiates the storage of the data of the key BB, its next (next) points to Aa, so the above print result is obtained. Here we need Pay attention. So why do you do this? Compared with our implementation in the previous section, the data structure definition is different. In the previous section, we used the loop traversal method, but the method of assigning the next reference in the constructor in the source code. Of course, the source code method is the best performance because Go for loop traversal. Alright, let’s take a look at the delete method next, we continue to add the following code in the console:

hashtable.remove ("Aa");

We also take a look at how deletion works in the source code. The source code is as follows:

 

 

public  synchronized V remove (Object key) {

    // Define the storage array variable 
    Entry <?,?> Tab [] = table;
    
    // Calculate key hash value 
    int hash = key.hashCode ();
    
    // Get key index 
    int index = (hash & 0x7FFFFFFF)% tab.length;
    
    // Get the key index storage data 
    Entry <K, V> e = (Entry <K, V> ) tab [index];
    
    for (Entry <K, V> prev = null ; e! = null ; prev = e, e = e.next) {
        
        // If the deleted data is at the head of the singly linked list, enter this statement, otherwise continue to the next loop 
        if ((e.hash == hash) && e.key.equals (key)) {
            
            modCount ++ ;
            
            // If the deleted data is not at the head of the singly linked list, enter the statement 
            if (prev! = Null ) {
                prev.next = e.next;
            } else {
                 // If you delete data at the head of the storage array index, enter the statement 
                tab [index] = e.next;
            }
            
            // array size minus 1 
            count-- ;
            
            // Return the deleted value 
            V oldValue = e.value;
            
            // To delete the value, set it to empty 
            e.value = null ;
            
            return oldValue;
        }
    }
    return  null ;
}

 

 

Through the above analysis of the delete operation, at this time we delete the key Aa. At this time, the head key of the single linked list is BB, so the next cycle will be performed. Finally, the second if statement is entered. If the key BB is deleted, because at this time, The head of the single linked list exists, so prev is empty, and the else statement is entered for element deletion. This concludes the analysis of the Hashtable source code. As for other things, such as obtaining the corresponding value of the key or whether the key is contained in the storage array, it is relatively simple, and will not be explained here.

Summary

In this section, we analyze the Hashtable source code in detail. Hashtable uses the chain address method to resolve hash conflicts. At the same time, when a conflict occurs, the conflicting data is stored in the head of the singly linked list, and the existing data is used as the next reference to the header. Hashtable is not allowed to insert any Empty keys and values. The method is modified by the keyword synchronized to know that Hashtable is thread-safe. At the same time, the default initialization capacity is 11, load factor is 0.75f, and the load factor is set to 0.75f. The reason is that if collisions or collisions occur very frequently It will slow down the operation of using elements, because it is not enough to just know the index at this time. At this time, it is necessary to traverse the linked list to find the stored elements. Therefore, it is very important to reduce the number of collisions. The larger the array, the smaller the chance of collision and the load. The factor determines the balance between array size and performance, which means that when 75% of the buckets become empty, the array size will expand. This operation is performed by the rehash () method . In the next section, we will further study the calculation principles of hashCode, equals, and hashCode, and then analyze the HashMap source code. Thank you for reading. See you in the next section.

 

Java entry series [1] string creation method

Java entry series [2] string features

Java entry series [3] stringbuilder string buffer

Java entry series [4] packaging class

Java entry series [5] inheritance abstract classes interfaceson

Java entry series [6] Principles of dynamic array

Java entry series [7] collection arraylist

Java entry series [8] principles of doubley linked list alogrithm

Java entry series [9] linkedlist source code analysis

Java entry series [10] Hash algorithm principle

Java entry series [11] Hashtable source code analysis

Java entry series [12] Hashcode and equals

Java entry series [13] Principles of the read-black tree algorithmn method

Java entry series [14] Hashmap source code ananlysis

Orignal link:https://www.cnblogs.com/CreateMyself/p/11525112.html