Foreword

We know that the red-black tree was introduced into HashMap in Java 8 to improve operation performance. Since we analyzed the red-black tree graphically in the previous section, here we will spend our energy on the design ideas behind HashMap rather than on the tree algorithm itself. HashMap is one of the most frequently used key-value data types in Java, so it is well worth analyzing the concrete implementation behind it. As always, whether the analysis is about C# or Java, we never plan to explain the code line by line; what matters most is the design idea, with a few extra words only where they are really needed.

HashMap Principle Analysis

Let's go from shallow to deep, step by step. First, let's get familiar with a few constants defined in HashMap; later we will explain why these particular values were chosen, so don't worry if they look arbitrary at first.

public class HashMap<K, V> extends AbstractMap<K, V>
    implements Map<K, V>, Cloneable, Serializable {

    // Default initial capacity (16)
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;

    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Threshold at which a bucket's linked list is converted to a red-black tree
    static final int TREEIFY_THRESHOLD = 8;

    // Threshold at which a red-black tree is converted back to a linked list
    static final int UNTREEIFY_THRESHOLD = 6;

    // Minimum table capacity before a bucket may be treeified
    static final int MIN_TREEIFY_CAPACITY = 64;

}

Constructor analysis

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
    }

When we instantiate a HashMap without specifying any parameters, the load factor is simply set to the default value of 0.75f.

    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

When instantiating a HashMap, we can also specify the initial capacity, and the default load factor is still 0.75f.

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

When instantiating a HashMap, we can specify both the initial capacity and the load factor. Obviously the initial capacity cannot be negative, otherwise an exception is thrown. If the initial capacity exceeds the defined maximum capacity, it is clamped to that maximum. The load factor cannot be less than or equal to 0 (or NaN), otherwise an exception is thrown. Finally, the threshold is computed from the supplied initial capacity. Let's look at the implementation of the tableSizeFor method used above.
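As a quick usage sketch of the three constructors (the variable names and the 0.9f load factor below are our own example values):

import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        // no-arg constructor: load factor 0.75f, the 16-slot table is allocated on the first put
        Map<String, Integer> byDefault = new HashMap<>();

        // initial capacity only: the load factor stays at 0.75f
        Map<String, Integer> withCapacity = new HashMap<>(5);

        // both initial capacity and load factor
        Map<String, Integer> withBoth = new HashMap<>(5, 0.9f);

        byDefault.put("a", 1);
        withCapacity.put("b", 2);
        withBoth.put("c", 3);
    }
}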

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

What does this method do? It returns the smallest power of 2 that is greater than or equal to the given initial capacity, and that value becomes the threshold. So far we have met the modulo operator [%], bitwise left shift [<<] and arithmetic right shift [>>]; here we also meet bitwise OR [|] and unsigned right shift [>>>]. Bitwise OR yields 1 in each position where at least one operand has a 1, otherwise 0; unsigned right shift simply fills the vacated high-order bits with zeros, regardless of sign. Don't be thrown by [n |= n >>> 1]: it is just shorthand for [n = n | n >>> 1], exactly like the compound assignment operators we use in ordinary arithmetic, only with a bitwise operator. We use the following example to illustrate the conclusion above, assuming the initial capacity is 5 and performing the operations step by step.

0000 0000 0000 0000 0000 0000 0000 0101    cap = 5
0000 0000 0000 0000 0000 0000 0000 0100    n = cap - 1
0000 0000 0000 0000 0000 0000 0000 0010    n >>> 1
0000 0000 0000 0000 0000 0000 0000 0110    n |= n >>> 1
0000 0000 0000 0000 0000 0000 0000 0001    n >>> 2
0000 0000 0000 0000 0000 0000 0000 0111    n |= n >>> 2
0000 0000 0000 0000 0000 0000 0000 0000    n >>> 4
0000 0000 0000 0000 0000 0000 0000 0111    n |= n >>> 4
0000 0000 0000 0000 0000 0000 0000 0000    n >>> 8
0000 0000 0000 0000 0000 0000 0000 0111    n |= n >>> 8
0000 0000 0000 0000 0000 0000 0000 0000    n >>> 16
0000 0000 0000 0000 0000 0000 0000 0111    n |= n >>> 16

The final result of the calculation above is 7, and 1 is added back at the end (mirroring the initial cap - 1), so for an initial capacity of 5 the smallest power of 2 not less than it is 8, and that becomes the threshold. If the initial capacity is 8, the threshold is also 8. Next comes our focus: the insert operation.
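To see the rule in action, here is a small sketch that copies the same bit trick into a standalone class (for illustration only; the class name is ours) and prints the value chosen for a few capacities:

public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // The same bit trick as HashMap.tableSizeFor, copied here for illustration
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {1, 2, 5, 8, 9, 16, 17}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
        // prints: 1 -> 1, 2 -> 2, 5 -> 8, 8 -> 8, 9 -> 16, 16 -> 16, 17 -> 32
    }
}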

Analysis of Insertion Principle

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

The insert operation above is a single short line of code: it simply delegates to the putVal method. Notice, however, that the hash value of the key is computed first, so let's look at that method's implementation.

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Read literally, the method says: if the given key is null, the hash value is 0; otherwise the key's own hashCode method is called, the result is unsigned-right-shifted by 16 bits, and the two values are XORed (XOR yields 1 only where the bits differ). If the purpose of this is not yet obvious, let's set it aside for a moment and continue with the implementation of the insert method.
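The intent becomes clearer with a small sketch (our own helper class, not JDK code): XORing a hash code with its own upper 16 bits folds the high bits into the low bits, which are the only bits used by the index calculation we will meet shortly.

public class HashSpreadDemo {
    // The same spreading step as HashMap.hash, copied here for illustration
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int h = 0x7A310000;                        // a hash code whose low 16 bits are all zero
        System.out.println(Integer.toBinaryString(h));
        System.out.println(Integer.toBinaryString(spread(h)));
        // the low 16 bits of the result now carry information from the high 16 bits
    }
}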

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K, V>[] tab; Node<K, V> p; int n, i;

    // Step [1]: if the table is null or empty, create it via resize()
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;

    // Step [2]: compute the index; if that bucket is empty, place a new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K, V> e; K k;

        // Step [3]: the first node already holds this key; its value is overwritten below
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;

        // Step [4]: the bucket holds a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K, V>) p).putTreeVal(this, tab, hash, key, value);
        else {

            // Step [5]: the bucket holds a linked list
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);

                    // If the list length reaches 8, convert the bucket to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1)
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;

    // Step [6]: if the new size exceeds the threshold, expand the table
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

Let's look at step [2] first; we will come back to step [1] afterwards. We start by extracting the bucket-lookup logic above:

        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);

Here the hash value of the key is bitwise ANDed with (array length - 1). The hash algorithm directly determines whether keys are spread evenly over the buckets; otherwise collisions occur, which seriously hurt performance, so the expression [(n - 1) & hash] is the key to the collision question. Couldn't we simply call the key's own hashCode method and use that value directly as the index? Definitely not; let's look at an example. Suppose calling hashCode on several keys yields the hash values 31, 63 and 95, and the default initial capacity of 16 is in use. Computing (n - 1) & hash gives the following:

0000 0000 0000 0000 0000 0000 0001 1111 hash = 31
0000 0000 0000 0000 0000 0000 0000 1111 n-1
0000 0000 0000 0000 0000 0000 0000 1111 => 15

0000 0000 0000 0000 0000 0000 0011 1111 hash = 63
0000 0000 0000 0000 0000 0000 0000 1111 n-1
0000 0000 0000 0000 0000 0000 0000 1111 => 15

0000 0000 0000 0000 0000 0000 0111 1111 hash = 95
0000 0000 0000 0000 0000 0000 0000 1111 n-1
0000 0000 0000 0000 0000 0000 0000 1111 => 15

Because the low-order bits of (2^n - 1) are always 1, and a bitwise AND keeps a bit only where both operands have a 1, all three results end in 1111 and the same index 15 comes back every time. So although the keys have different hash values, they are all stored at the same index of the hash bucket array. To address the problem that the high-order bits never take part in the index calculation, the hash method above XORs the hash code with itself shifted right (unsigned) by 16 bits, folding the high bits into the low bits and thereby reducing these collisions and improving performance. We now return to step [1]: when the array is empty, how is the table created and expanded internally? Let's look at the resize method.
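Here is an illustrative check of that point (our own example values, not taken from the JDK): two hash codes that differ only in their upper bits land in the same bucket of a 16-slot table without the spreading step, but in different buckets with it.

public class SpreadCollisionDemo {
    static int spread(int h) {
        return h ^ (h >>> 16);              // the same step used by HashMap.hash
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x10000;                   // 65536: the low 16 bits are zero
        int h2 = 0x50000;                   // 327680: the low 16 bits are also zero

        System.out.println("without spreading: " + ((n - 1) & h1) + ", " + ((n - 1) & h2));
        // both indices are 0 -> a collision

        System.out.println("with spreading:    " + ((n - 1) & spread(h1)) + ", " + ((n - 1) & spread(h2)));
        // the indices differ (1 and 5) -> no collision
    }
}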

final Node<K, V>[] resize() {
    Node<K, V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1;
    }
    else if (oldThr > 0)
        newCap = oldThr;
    else {
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                  (int) ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    ...
}
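To make the sizing branches above concrete, here is a minimal sketch (our own simplification, not the JDK code) that reproduces the three paths, assuming the maximum capacity is never reached and that any table being doubled is already at least the default capacity:

public class ResizeSizingSketch {
    // Returns {newCapacity, newThreshold} for the given old values.
    static int[] nextSize(int oldCap, int oldThr, float loadFactor) {
        int newCap, newThr = 0;
        if (oldCap > 0) {                // an existing table: double capacity and threshold
            newCap = oldCap << 1;
            newThr = oldThr << 1;
        } else if (oldThr > 0) {         // a capacity was given: threshold holds tableSizeFor(initialCapacity)
            newCap = oldThr;
        } else {                         // no-arg constructor: fall back to the defaults
            newCap = 16;
            newThr = (int) (0.75f * 16);
        }
        if (newThr == 0)                 // threshold not set yet: capacity * loadFactor
            newThr = (int) (newCap * loadFactor);
        return new int[] { newCap, newThr };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(nextSize(0, 0, 0.75f)));   // [16, 12]
        System.out.println(java.util.Arrays.toString(nextSize(0, 8, 0.75f)));   // [8, 6]
        System.out.println(java.util.Arrays.toString(nextSize(16, 12, 0.75f))); // [32, 24]
    }
}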

From the above we can tell that when a HashMap is instantiated without parameters, the initial capacity is the default 16, the threshold is 12 and the load factor is 0.75f. When an initial capacity is specified (say 5), the capacity becomes 8 and the threshold field initially holds 8 as well (it is recalculated as capacity * load factor, i.e. 6, once the table is actually created), while the load factor stays 0.75f; if a load factor is also specified, the specified value is used instead. In every case, when the threshold is exceeded the new capacity is twice the old one. Here we notice something: the capacity of a Hashtable can be odd or even, while the capacity of a HashMap is always a power of 2, i.e. always even. Why is it designed this way?

int index = (n - 1) & hash;

This is how HashMap computes the index into the hash bucket array. If the capacity were not a power of 2, say 17, then n - 1 would be 16 (binary 10000) and (n - 1) & hash could only ever be 16 or 0, so far more collisions would occur and performance would suffer: instead of O(1) retrieval we would need O(log n), because once a bucket overflows, its nodes end up in a red-black tree. Hashtable, whose capacity does not have to be a power of 2, instead computes the index storage location like this:

int index = (hash & 0x7FFFFFFF) % tab.length;

Looking at the way HashMap computes the index: when we subtract 1 from a power of 2, we get a number whose trailing binary digits are all 1. For example, the default initial capacity is 16; subtracting 1 gives 15, whose binary representation is 1111. Bitwise ANDing any number with 1111 keeps exactly its last 4 bits, which is equivalent to taking that number modulo 16; but division is usually an expensive operation, so the bitwise operation is more efficient than the modulo operation. So far, then, we know one reason why the capacity of a HashMap is a power of 2: the bucket index can be computed with a bitwise AND instead of a modulo, which is more efficient. After the capacity and threshold have been recalculated during the expansion above, the next step is to re-hash the hash bucket array. Please read on.
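The equivalence is easy to verify with an illustrative check (our own class, not HashMap code): for a power-of-two length n and a non-negative hash, (hash & (n - 1)) equals hash % n.

public class AndVsModDemo {
    public static void main(String[] args) {
        int n = 16;                                   // a power of two
        int[] hashes = {31, 63, 95, 21, 5, 123456789};
        for (int h : hashes) {
            int byAnd = h & (n - 1);
            int byMod = h % n;                        // safe here because the hashes are non-negative
            System.out.println(h + ": " + byAnd + " == " + byMod);
        }
    }
}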

Analysis of factors affecting HashMap performance

Before explaining the re-hashing mentioned above, we need to start from the beginning. By now we know that the default initial capacity of a HashMap is 16. If we put 16 elements into the HashMap and the hashing algorithm spreads them well, each bucket of the hash bucket array holds 1 element, and any element can be found in a single step. If the HashMap holds 256 elements and they are spread well, each bucket holds 16 elements, and any element can be found in at most 16 steps.

So if the number of stored elements keeps growing while the capacity stays at the default 16, each bucket accumulates more and more elements and the lookup performance of the HashMap starts to degrade. For example, with 16 million entries and a good hashing algorithm, each bucket holds one million elements, so finding an element can take up to a million steps. Clearly, growing the number of stored elements without growing the table seriously hurts HashMap performance. What is the remedy?

The remedy is to increase the number of buckets, i.e. the capacity. If the capacity grows as the total number of entries grows, the number of elements per bucket stays roughly constant, and query and insert operations keep their O(1) time complexity. So when should the capacity be increased? That is decided by the load factor, a metric that determines when to grow the bucket array so that O(1) query and insert time is maintained; the moment to grow is given by the product (capacity * load factor). Capacity and load factor are therefore the fundamental factors affecting HashMap performance.

The default load factor is 0.75, i.e. 75 percent, so the point at which the capacity is increased is (16 * 0.75) = 12, which we call the threshold. Up to the 12th key-value pair the capacity stays at 16; when the 13th key-value pair is inserted, the capacity changes from the default 16 to (16 * 2) = 32. Looking at the same formula from the opposite direction: load ratio = number of stored elements / number of buckets. With the default 16 buckets, after inserting the first element the ratio is 1/16 = 0.0625, which is not greater than 0.75, so no expansion is needed; after inserting the 13th element the ratio is 13/16 = 0.8125 > 0.75, so the capacity is increased.
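A tiny arithmetic sketch of this discussion (our own illustration, not HashMap internals): the resize point is capacity * load factor, and each resize doubles the capacity.

public class ThresholdSketch {
    public static void main(String[] args) {
        float loadFactor = 0.75f;
        int capacity = 16;
        for (int i = 0; i < 4; i++) {
            int threshold = (int) (capacity * loadFactor);
            System.out.println("capacity " + capacity + " -> expand when inserting entry #" + (threshold + 1));
            capacity <<= 1;              // each resize doubles the capacity
        }
        // capacity 16 -> entry #13, 32 -> #25, 64 -> #49, 128 -> #97
    }
}
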
Having said all this, let's look at re-hashing itself. Before explaining why it is needed, we need the concept: re-hashing is the process of recalculating the bucket of every element already stored in the hash bucket array once the threshold is reached, and moving the elements into a new, larger hash bucket array. In other words, when the number of stored elements exceeds the limit set by the load factor, the capacity is doubled and the existing entries are re-hashed. Why re-hash at all? Because doubling the capacity would be pointless if the key-value pairs already in the table were not redistributed; only by spreading the elements evenly over all of the buckets can the O(1) time complexity be kept. Next, we continue with the re-hashing source code analysis.

Re-hashing source code analysis

    Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K, V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K, V>) e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K, V> loHead = null, loTail = null;
                    Node<K, V> hiHead = null, hiTail = null;
                    Node<K, V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

Overall, re-hashing after an expansion falls into three cases: ① the bucket holds a single node (no linked list), which is simply placed at its new index; ② the bucket holds a red-black tree, which is split between the two new buckets (and converted back to a linked list if it becomes small enough); ③ the bucket holds a linked list. Cases ① and ② need no further description here; next we focus on the optimization applied to the linked-list case, which is the following piece of code:

                    Node<K, V> loHead = null, loTail = null;
                    Node<K, V> hiHead = null, hiTail = null;
                    Node<K, V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }

Looking at the code above, we can see that two linked lists are declared: a low list (lo) and a high list (hi). Their purpose is not obvious yet, so we step into the do {} while () loop and focus on the expression e.hash & oldCap == 0. What does it do? Depending on this test, a node goes into either the low list or the high list. Let's understand it through an example. Suppose the default initial capacity of 16 is in use and we insert the element 21. Following the earlier description, the hash value is computed first and then the index; for an intuitive picture we do it step by step.

// Substituting key = 21 (boxed as an Integer, whose hashCode() is the value itself) into the hash method:
static final int hash(Object key) {      // key = 21
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);   // 21 ^ (21 >>> 16) = 21 ^ 0 = 21
}

Calling the hash method on the key 21 still yields 21. Next, the index at which it is stored in the hash bucket array is computed by the following AND operation.

i = (16 - 1) & 21

The resulting index i is 5. The initial capacity is 16 and the threshold is 12, so when the 13th element is inserted the table is expanded and the capacity becomes 32. If we recompute the index as above, i = (32 - 1) & 21 gives 21. So when the capacity is 16 the element 21 is stored at index 5, and after the expansion to 32 it is stored at index 21; in other words, [new index after expansion = original index + original capacity]. Similarly, if the inserted element is 5 and the capacity is 16, the index is 5; after expanding to 32 the index is still 5, i.e. [new index = original index].

Because the new capacity is always twice the old one (for example 16 to 32, i.e. 0000 0000 0000 0000 0000 0000 0001 0000 => 0000 0000 0000 0000 0000 0000 0010 0000), exactly one additional high-order bit of the hash starts to participate in the index calculation. So we bitwise AND the hash of the element with the original capacity: if the result is 0, the new index equals the original index; otherwise the new index equals the original index plus the original capacity. That is exactly what e.hash & oldCap == 0 tests: whether the node's index changes after the expansion. Concretely, (5 & 16 = 0), so after expansion the index of element 5 is [new index = original index]; (21 & 16 = 16), so after expansion the index of element 21 is [new index = original index + 16]. Because the capacity is always a power of two, this single bit test saves the cost of recomputing every element's placement from scratch, as earlier versions did, which is an optimization.

At this point we can extend the earlier summary of why the capacity is always a power of two: ① the bucket index can be computed with a bitwise AND instead of a modulo, which is more efficient; ② during an expansion, a single bit test decides each node's new bucket, which saves time. Finally, the list whose index is unchanged (the low list) and the list whose index has moved (the high list) are placed into the new hash bucket array, and the re-hashing process is complete. Next, how is an element put into the red-black tree?
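The rule is easy to check for the two keys used above (an illustrative snippet; the class and variable names are ours):

public class RehashSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = 32;
        for (int hash : new int[] {5, 21}) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            boolean staysPut = (hash & oldCap) == 0;   // the test used in resize()
            System.out.println("hash " + hash + ": old index " + oldIndex
                    + ", new index " + newIndex
                    + (staysPut ? " (= old index)" : " (= old index + " + oldCap + ")"));
        }
    }
}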

Inserting a value into the red-black tree while keeping it balanced

            // Step [4]: the bucket holds a red-black tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K, V>) p).putTreeVal(this, tab, hash, key, value);

Now let's look at putTreeVal, the method called above that puts the value into the red-black tree:

final TreeNode<K, V> putTreeVal(HashMap<K, V> map, Node<K, V>[] tab,
                                int h, K k, V v) {
    Class<?> kc = null;
    boolean searched = false;
    TreeNode<K, V> root = (parent != null) ? root() : this;
    for (TreeNode<K, V> p = root; ; ) {
        int dir, ph; K pk;
        if ((ph = p.hash) > h)
            dir = -1;
        else if (ph < h)
            dir = 1;
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        else if ((kc == null &&
                  (kc = comparableClassFor(k)) == null) ||
                 (dir = compareComparables(kc, k, pk)) == 0) {
            if (!searched) {
                TreeNode<K, V> q, ch;
                searched = true;
                if (((ch = p.left) != null &&
                     (q = ch.find(h, k, kc)) != null) ||
                    ((ch = p.right) != null &&
                     (q = ch.find(h, k, kc)) != null))
                    return q;
            }
            dir = tieBreakOrder(k, pk);
        }

        TreeNode<K, V> xp = p;
        if ((p = (dir <= 0) ? p.left : p.right) == null) {
            Node<K, V> xpn = xp.next;
            TreeNode<K, V> x = map.newTreeNode(h, k, v, xpn);
            if (dir <= 0)
                xp.left = x;
            else
                xp.right = x;
            xp.next = x;
            x.parent = x.prev = xp;
            if (xpn != null)
                ((TreeNode<K, V>) xpn).prev = x;
            moveRootToFront(tab, balanceInsertion(root, x));
            return null;
        }
    }
}
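Before dissecting the method, here is a minimal sketch of the traversal idea it follows (our own simplification, not the JDK code): descend from the root, go left when the new hash is smaller and right when it is larger, and attach the new node where the chosen child slot is empty. Balancing, the equal-hash handling and the linked-list pointers are all omitted.

final class TreeInsertSketch {
    static final class Node {
        final int hash;
        Node left, right;
        Node(int hash) { this.hash = hash; }
    }

    // Returns the (unbalanced) root after inserting a node with the given hash.
    static Node insert(Node root, int hash) {
        if (root == null) return new Node(hash);
        Node p = root;
        for (;;) {
            // dir < 0 means "go left", otherwise "go right"; equal hashes would be
            // resolved via Comparable or tieBreakOrder in the real code.
            int dir = (hash < p.hash) ? -1 : 1;
            Node child = (dir < 0) ? p.left : p.right;
            if (child == null) {
                if (dir < 0) p.left = new Node(hash); else p.right = new Node(hash);
                return root;             // putTreeVal would rebalance and move the root here
            }
            p = child;
        }
    }
}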

What we need to think about is: (1) where exactly does the element to be inserted belong in the red-black tree? (2) once that position is found, should it become the left child or the right child? Following the natural approach, we traverse the tree from the root and compare the hash value of each node with the hash value of the node to be inserted. If the new node belongs to the left of the current node, we check whether a left child already exists; if not, the new node becomes that left child. The same applies on the right side. If both children already exist, we keep descending until we find the position where the new node can be attached. That is the general idea anyone would code up, but we also have to consider the boundary case, otherwise the solution is incomplete. What is the boundary case when inserting into the red-black tree? It is when the hash of the traversed node is equal to the hash of the node to be inserted: how do we then decide an order between the two elements so that the tree stays consistent and balanced? That is what the following code from the method above does:

                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0) {
                    if (!searched) {
                        TreeNode<K, V> q, ch;
                        searched = true;
                        if (((ch = p.left) != null &&
                             (q = ch.find(h, k, kc)) != null) ||
                            ((ch = p.right) != null &&
                             (q = ch.find(h, k, kc)) != null))
                            return q;
                    }
                    dir = tieBreakOrder(k, pk);
                }

So when keys collide and their hash values are equal, how is their order in the red-black tree determined? Two mechanisms solve it: ① the key class implements the Comparable interface; ② a tie-breaking rule is applied. Let's first look at an example class that implements Comparable:

public class PersonComparable implements Comparable<PersonComparable> {
    int age;

    public PersonComparable(int age) {
        this.age = age;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj instanceof PersonComparable) {
            PersonComparable p = (PersonComparable) obj;
            return (this.age == p.age);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return 42;
    }

    @Override
    public int compareTo(PersonComparable o) {
        return this.age - o.age;
    }
}

Then in the console, call the following code for testing:

        HashMap hashMap = new HashMap();

        PersonComparable p1 = new PersonComparable(1);
        PersonComparable p2 = new PersonComparable(2);
        PersonComparable p3 = new PersonComparable(3);
        PersonComparable p4 = new PersonComparable(4);
        PersonComparable p5 = new PersonComparable(5);
        PersonComparable p6 = new PersonComparable(6);
        PersonComparable p7 = new PersonComparable(7);
        PersonComparable p8 = new PersonComparable(8);
        PersonComparable p9 = new PersonComparable(9);
        PersonComparable p10 = new PersonComparable(10);
        PersonComparable p11 = new PersonComparable(11);
        PersonComparable p12 = new PersonComparable(12);
        PersonComparable p13 = new PersonComparable(13);

        hashMap.put(p1, "1");
        hashMap.put(p2, "2");
        hashMap.put(p3, "3");
        hashMap.put(p4, "4");
        hashMap.put(p5, "5");
        hashMap.put(p6, "6");
        hashMap.put(p7, "7");
        hashMap.put(p8, "8");
        hashMap.put(p9, "9");
        hashMap.put(p10, "10");
        hashMap.put(p11, "11");
        hashMap.put(p12, "12");
        hashMap.put(p13, "13");

Looking at the code above: the keys implement the Comparable interface, and hashCode has been hard-wired to a constant, so every insertion collides. All elements put into the HashMap have the same hash value, which means they all land at the same index, and the linked list there is eventually converted into a red-black tree. Since the hash values are identical, how is the order inside the tree determined? At this point we return to the comparableClassFor and compareComparables methods used above (their implementations will not be explained line by line):

// If the class of x implements Comparable<C> where C is x's own class, return that class; otherwise return null
static Class<?> comparableClassFor(Object x) {
    if (x instanceof Comparable) {
        Class<?> c; Type[] ts, as; Type t; ParameterizedType p;
        if ((c = x.getClass()) == String.class) // bypass checks
            return c;
        if ((ts = c.getGenericInterfaces()) != null) {
            for (int i = 0; i < ts.length; ++i) {
                if (((t = ts[i]) instanceof ParameterizedType) &&
                    ((p = (ParameterizedType) t).getRawType() ==
                     Comparable.class) &&
                    (as = p.getActualTypeArguments()) != null &&
                    as.length == 1 && as[0] == c) // type arg is c
                    return c;
            }
        }
    }
    return null;
}

// If x is non-null and of exactly the class kc, compare k with x via its Comparable implementation; otherwise treat them as equal (0)
static int compareComparables(Class<?> kc, Object k, Object x) {
    return (x == null || x.getClass() != kc ? 0 :
            ((Comparable) k).compareTo(x));
}

But if the key class does not implement the Comparable interface (like the Person class we will see shortly), the tie-breaking mechanism is used instead. (We suspect the author named the method after the tiebreaker concept, https://en.wikipedia.org/wiki/Tiebreaker : a playoff system used mainly in baseball and softball, especially in the playoffs and international tournaments, to decide a winner instead of playing on indefinitely.) It corresponds to the following line:

dir = tieBreakOrder(k, pk);

static int tieBreakOrder(Object a, Object b) {
    int d;
    if (a == null || b == null ||
        (d = a.getClass().getName().
         compareTo(b.getClass().getName())) == 0)
        d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
             -1 : 1);
    return d;
}

Because the hash values are equal and the keys do not implement the Comparable interface, yet an order still has to be decided, the code finally falls back to a kind of last resort: it calls System.identityHashCode, which returns a hash value based on the object's identity that stays constant for the lifetime of the object, and uses that to break the tie. OK, so here is a question: what is the difference between inserting keys that implement the Comparable interface into the tree and inserting keys that do not? If the key implements Comparable, a lookup can exploit the ordering of the tree and find the element quickly; if it does not, a lookup has to traverse the tree. The next question: since the order can be decided by a Comparable implementation, why not simply use the tie-breaking mechanism as the comparator in all cases? Let's look at the following example:

Person person1 = new Person(1);
Person person2 = new Person(1);
System.out.println(System.identityHashCode(person1) == System.identityHashCode(person2));

This shows that even two objects that are equal according to equals have different identityHashCode values (the code above prints false), so identityHashCode cannot serve as a general-purpose comparator. Another question follows: if identityHashCode is what decides the order of the elements, lookups end up traversing the tree and its ordering is not exploited at all, so why build a tree in the first place? Because a HashMap may contain keys of different classes at the same time: some may implement the Comparable interface and some may not. Let's look at an example that mixes different classes:

public class Person {
    int age;

    public Person(int age) {
        this.age = age;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj instanceof Person) {
            Person p = (Person) obj;
            return (this.age == p.age);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return 42;
    }
}

        HashMap hashMap = new HashMap();

        PersonComparable p1 = new PersonComparable(1);
        PersonComparable p2 = new PersonComparable(2);
        PersonComparable p3 = new PersonComparable(3);
        PersonComparable p4 = new PersonComparable(4);
        PersonComparable p5 = new PersonComparable(5);
        PersonComparable p6 = new PersonComparable(6);

        Person p7 = new Person(7);
        Person p8 = new Person(8);
        Person p9 = new Person(9);
        Person p10 = new Person(10);
        Person p11 = new Person(11);
        Person p12 = new Person(12);
        Person p13 = new Person(13);

        hashMap.put(p1, "1");
        hashMap.put(p2, "2");
        hashMap.put(p3, "3");
        hashMap.put(p4, "4");
        hashMap.put(p5, "5");
        hashMap.put(p6, "6");
        hashMap.put(p7, "7");
        hashMap.put(p8, "8");
        hashMap.put(p9, "9");
        hashMap.put(p10, "10");
        hashMap.put(p11, "11");
        hashMap.put(p12, "12");
        hashMap.put(p13, "13");


When the two kinds of keys are mixed like this, instances that implement the Comparable interface and instances that do not can still be ordered in the tree by comparing their class names.
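A small sketch of that ordering rule (our own copy of the tieBreakOrder logic, reusing the Person and PersonComparable classes defined above): keys of different classes are ordered by class name, and only identical class names fall back to System.identityHashCode.

public class TieBreakDemo {
    // The same ordering rule as HashMap.TreeNode.tieBreakOrder, copied for illustration
    static int tieBreakOrder(Object a, Object b) {
        int d;
        if (a == null || b == null ||
            (d = a.getClass().getName().compareTo(b.getClass().getName())) == 0)
            d = (System.identityHashCode(a) <= System.identityHashCode(b) ? -1 : 1);
        return d;
    }

    public static void main(String[] args) {
        Object comparableKey = new PersonComparable(1);
        Object plainKey = new Person(7);

        // different classes -> ordered by class name; identityHashCode is not needed
        System.out.println(tieBreakOrder(comparableKey, plainKey));

        // same class -> identityHashCode breaks the tie, so the result is -1 or 1
        System.out.println(tieBreakOrder(new Person(7), new Person(7)));
    }
}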

Summary

In this section we walked through the implementation details of HashMap; some of the simpler parts were not analyzed one by one. If anything in the article is described inappropriately or misunderstood, corrections are welcome. Thank you for reading.

Java entry series [1] String creation methods

Java entry series [2] String features

Java entry series [3] StringBuilder string buffer

Java entry series [4] Wrapper classes

Java entry series [5] Inheritance, abstract classes, interfaces

Java entry series [6] Principles of dynamic arrays

Java entry series [7] Collection ArrayList

Java entry series [8] Principles of the doubly linked list algorithm

Java entry series [9] LinkedList source code analysis

Java entry series [10] Hash algorithm principles

Java entry series [11] Hashtable source code analysis

Java entry series [12] hashCode and equals

Java entry series [13] Principles of the red-black tree algorithm

Java entry series [14] HashMap source code analysis

Original link: https://www.cnblogs.com/CreateMyself/p/11610626.html