Java, Weak References and WeakHashMap

Most Java developers will be familiar with the concept of references, and with the perennial pass-by-reference vs. pass-by-value discussion. (Pointers, now that’s another thing…)

When calling methods, Java always passes arguments by value. For primitive data types, a copy of the value is put on the call stack before the method is invoked. For objects and arrays, the value that gets copied is the reference itself, not the object it points to. This means when you call a method with an object as a parameter, you are merely providing that method a way to access and manipulate the same object via a reference; no copy of the object is made. The method can change the object’s state, but assigning a new object to its parameter has no effect on the caller’s variable.
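A small sketch of what this looks like in practice. The class and method names are made up for illustration: one method mutates the object it was handed, one reassigns its parameter, and one increments a primitive.

```java
// Hypothetical demo class: shows what a method can and cannot change for its caller.
class PassByValueDemo {
    // Mutates the object that the caller's variable also refers to.
    static void append(StringBuilder sb) {
        sb.append(" world");
    }

    // Reassigns only the method's local parameter; the caller's variable is unaffected.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    // Changes only the method's local copy of the primitive value.
    static void increment(int n) {
        n++;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("hello");
        append(text);
        reassign(text);

        int number = 1;
        increment(number);

        System.out.println(text + " " + number); // prints "hello world 1"
    }
}
```

Only the call to append() is visible to the caller, because it is the only one that changes the shared object rather than a local variable.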

In that way, references are somewhat like pointers, though they obviously cannot be manipulated by pointer arithmetic. But what about weak references? What are they, and how do they contrast with strong references?

Weakly understood

Based on my experience, the concept of weak references, or more generally reachability, is not one that is well-understood in the Java world. At least I did not have a good grasp of them until stumbling upon some sample code one day. It may be that the need to utilize them is outside the confines of most day-to-day programming tasks, as the concept is fairly low-level. Nonetheless, it’s an important concept to understand.

Basically, Java specifies five levels of reachability for objects that reflect which state the object is in, in relation to being marked as finalizable, being finalized and being reclaimed. They are, in order of strongest-to-weakest:

  1. Strongly Reachable
  2. Softly Reachable
  3. Weakly Reachable
  4. Phantom Reachable
  5. Unreachable

An object’s normal state, as soon as it has been instantiated and assigned to a variable or field, is strongly reachable. Chances are, these are the only kinds of objects you’ve worked with. We’ll first cover the concept of weakly reachable objects, as I believe it provides a good base for understanding the remainder.

Cleaning out the trash

Going by the API reference, a weakly reachable object is one that can be reached by traversing (i.e. going through) a weak reference. That’s a succinct definition to be sure, but it just raises the next question: What is a weak reference?

Simply put, if an object can only be reached by traversing a weak reference, the garbage collector will not attempt to keep the object in memory any more than it would an object with no references to it, i.e. an object that cannot be accessed. Thus, from the garbage collector’s point of view, a weakly-referenced object will eventually be cleaned from memory the same as an object with no references to it.
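To make this concrete, here is a minimal sketch using java.lang.ref.WeakReference directly. The class and method names are made up for illustration. Note that System.gc() is only a hint, so whether the reference has been cleared by the last line can differ from run to run.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class for illustration only.
class WeakReferenceDemo {
    // While a strong reference exists, the weak reference still yields the object.
    static boolean visibleWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        return weak.get() == strong;
    }

    public static void main(String[] args) {
        System.out.println(visibleWhileStronglyHeld()); // prints "true"

        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);

        // Drop the only strong reference; the object is now only weakly reachable.
        strong = null;
        System.gc(); // a hint only; the JVM is not obliged to collect anything here

        // get() returns null once the collector has cleared the reference.
        System.out.println(weak.get() == null ? "collected" : "not collected yet");
    }
}
```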

So, if weakly-referenced objects are treated the same as completely non-referenced ones, what is the purpose of the weak reference? A good example is the WeakHashMap, a class provided by Java.

WeakHashMap

The best way to describe a WeakHashMap is one where the entries (key-to-value mappings) will be removed when it is no longer possible to retrieve them from the map. For example, say you’ve added an object to the WeakHashMap using a key k1. If you now set k1 to null, and nothing else holds a strong reference to that key object, there is no way left to retrieve the object from the map, since you don’t have the key object around any more to call get() with. This behaviour is possible because WeakHashMap only has weak references to the keys, not strong references like the other Map classes.

This makes it ideal for use as a cache of sorts. A typical use case is to associate keys with some large objects that take up a lot of memory. With only a weak reference to each key, once there is no longer any external reference to a key, its entry will be removed, which also removes the WeakHashMap’s reference to the value object. This can then make the value object eligible for garbage collection, *provided there are no other strong references* to it outside of the map.

Note that for the WeakHashMap to work this way, as it was intended, the key objects must only be considered equal if they are actually the same object – i.e. object identity instead of mere equality. This is the default behaviour for Object.equals() and Object.hashCode(), so if these methods have not been overridden, the object is OK to be used as a key in WeakHashMap. Objects like Integer are not suitable for use in WeakHashMap, because it is possible to create two separate (non-identical) objects that are both equal:

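// new Integer(int) is deprecated since Java 9. It is used here on purpose:
// unlike Integer.valueOf(), it always creates a distinct object.
final Integer i1 = new Integer(4);
final Integer i2 = new Integer(4);
LOGGER.debug("i1.equals(i2): " + i1.equals(i2)); // True.
LOGGER.debug("i1 == i2: " + (i1 == i2)); // False.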
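Here is a minimal sketch of why this matters for a WeakHashMap. The Id class is hypothetical and stands in for any key type that, like Integer, defines equality by value. A freshly created key that merely equals the original still finds the entry, so losing the original key object does not mean the entry can no longer be looked up.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class for illustration only.
class EqualKeysDemo {
    // Hypothetical key type with value-based equals() and hashCode().
    static final class Id {
        final int value;

        Id(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Id && ((Id) o).value == value;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(value);
        }
    }

    static String lookUpWithRecreatedKey() {
        Map<Id, String> map = new WeakHashMap<>();
        Id original = new Id(4);
        map.put(original, "payload");

        Id recreated = new Id(4); // equal to 'original', but a different object
        String found = map.get(recreated);

        // 'original' is still used here, so it stays strongly reachable during the lookup.
        return found + " " + (original == recreated);
    }

    public static void main(String[] args) {
        System.out.println(lookUpWithRecreatedKey()); // prints "payload false"
    }
}
```

With such keys, an entry can vanish once the original key object is collected, even though the program can still build an equal key and would reasonably expect a hit.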

Another point of importance is that String is often not a suitable key for a WeakHashMap either. In addition to String overriding equals() and hashCode(), String literals and compile-time constant strings are interned (i.e. stored) in a pool by the JVM, as is any String on which intern() has been called. Such strings may remain strongly referenced even after you have apparently gotten rid of your reference to them. Because of this, entries that you add to a WeakHashMap using interned String keys may never get dropped, since the keys remain strongly referenced from the string intern pool.

An example of String interning:

final String s1 = "The only thing we have to fear is fear itself.";
final String s2 = "The only thing we have to fear is fear itself.";
LOGGER.debug("s1.equals(s2): " + s1.equals(s2)); // True.
LOGGER.debug("s1 == s2: " + (s1 == s2)); // Also true: identical literals are interned.

String literals are interned for performance reasons: every occurrence of the same literal refers to a single shared String object instead of a fresh copy. Strings created at runtime (for example with new String(...) or by concatenating non-constant values) are not added to the pool automatically. Calling intern() on such a String returns the pooled instance, adding the String to the pool first if no equal one is there yet. This sharing is safe because Strings in Java are immutable, i.e. operations that appear to modify a String (such as concatenation, toUpperCase(), etc.) return their result without ever altering the original object.
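A short sketch of how object identity behaves for literals, for a String built with new, and for a String passed through intern(). The class name is made up for illustration.

```java
// Hypothetical demo class for illustration only.
class StringInternDemo {
    static String compare() {
        String literal = "fear itself";
        String sameLiteral = "fear itself";
        String copy = new String("fear itself"); // explicitly creates a distinct object

        return (literal == sameLiteral) + " "     // same pooled object
                + (literal == copy) + " "         // different objects
                + literal.equals(copy) + " "      // same contents
                + (literal == copy.intern());     // intern() returns the pooled object
    }

    public static void main(String[] args) {
        System.out.println(compare()); // prints "true false true true"
    }
}
```

Only the interned instances are kept alive by the pool. A String created with new and never interned can be collected like any other object, although its value-based equals() still makes it an awkward WeakHashMap key.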

The last usage note is that even though the keys are weakly-referenced by WeakHashMap, the values remain strongly-referenced. Thus, you must take care not to use value objects that strongly reference their own keys, directly or indirectly. If a value does, its key always remains strongly reachable through the map, and the entry will never be dropped automatically. (This can be avoided by wrapping the value object in a WeakReference before putting it in the map, so that both keys and values are only weakly referenced by the WeakHashMap.)
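A minimal sketch of the WeakReference-wrapping approach follows. The Session and SessionData types, and the wrapper class itself, are hypothetical and exist only to show a value that points back at its key.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical wrapper for illustration only.
class WeakValueDemo {
    // Hypothetical key type; it uses identity-based equals() and hashCode().
    static final class Session { }

    // Hypothetical value type that holds a strong reference back to its key.
    static final class SessionData {
        final Session owner;

        SessionData(Session owner) {
            this.owner = owner;
        }
    }

    private final Map<Session, WeakReference<SessionData>> cache = new WeakHashMap<>();

    // Stores the value behind a WeakReference so the map never strongly reaches the key through it.
    void put(Session key, SessionData value) {
        cache.put(key, new WeakReference<>(value));
    }

    // Returns null if there is no entry or if the value has already been collected.
    SessionData get(Session key) {
        WeakReference<SessionData> ref = cache.get(key);
        return ref == null ? null : ref.get();
    }
}
```

The trade-off is that a value can now disappear while its key is still alive, as soon as nothing outside the map strongly references the value, so callers have to handle a null result.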

Example use of WeakHashMap

Here is a brief, albeit contrived example of WeakHashMap at work:

// SampleKey is just an object that holds a single int. (Use instead of
// Integer, since Integer overrides equals() and hashCode())
SampleKey key = new SampleKey(42);
SampleObject value = new SampleObject("Sample Value");

final WeakHashMap<SampleKey, SampleObject> weakHashMap = new WeakHashMap<SampleKey, SampleObject>();
weakHashMap.put(key, value);

// At this point, we still have a strong reference to the key. Thus, even
// though the key is weakly-referenced by the WeakHashMap, nothing will
// be automatically removed even if we give a hint to the GC.
System.gc();

LOGGER.debug(weakHashMap.size()); // Will still be '1'.
LOGGER.debug(weakHashMap.get(key)); // Will still be 'Sample Value'.

// Now, if we set the key to null, the entry in weakHashMap will eventually
// disappear. Note that the number of times we have to 'kick' the GC
// before the entry disappears may be different on each run depending
// on the JVM load, memory usage, etc.
// This could also allow the SampleObject value to be GC'd, provided there
// were no other references to it.
key = null;
value = null;
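int count = 0;
// System.gc() is only a hint, so cap the number of attempts rather than
// risk looping forever on a JVM that never clears the entry.
while (0 != weakHashMap.size() && count < 100)
{
  ++count;
  System.gc();
}
LOGGER.debug("Took " + count + " calls to System.gc() to result in weakHashMap size of : " + weakHashMap.size());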
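The example does not show the SampleKey and SampleObject classes. One possible shape for them, written here as an assumption rather than the original code, is sketched below. The important detail is that SampleKey does not override equals() or hashCode(), so two keys holding the same int are still different keys.

```java
// Assumed definition: holds a single int and keeps Object's identity-based
// equals() and hashCode(), which is what makes it a safe WeakHashMap key.
class SampleKey {
    private final int id;

    SampleKey(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }
}

// Assumed definition: a simple value object whose toString() returns its label,
// so that logging it shows 'Sample Value'.
class SampleObject {
    private final String label;

    SampleObject(String label) {
        this.label = label;
    }

    @Override
    public String toString() {
        return label;
    }
}
```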

Finishing up

In an upcoming article, I plan on covering the other types of references (soft and phantom) as well as the associated Reference classes in Java. I wanted to keep this post brief so that it provided a basic understanding of the situation.

Changes/Fixes

  • 2011-04-10: Fixed numerous incorrect usages of the term “dereference”. Thanks to Ranjit for the explanation.
  • 2017-12-31: Clarified usage of WeakHashMap as a cache.

References

  1. Package java.lang.ref
  2. WeakHashMap
  3. Understanding Weak References

11 Comments

  1. Thanks for writing this up. Very well written and explained. Look forward to reading about your take on SoftReference and PhantomReference, in particular. Coming from a C/C++ background, I carry a different connotation of the word dereference and felt a little thrown off by your use of that word to indicate nullifying a Java reference. Other than that, great job!

    Ranjit

  2. @Ranjit – thanks for the comment. I believe I was mistaken to use “dereference” as originally done. It does not mean “set to null”, even in Java.

    I believe in Java, same as C/C++, “dereferencing” means to get the value pointed at.

    Sorry for the mistake. I will correct/update this post.

  3. Thank you, its a good article, looking for your article on Soft and Phantom references.

  4. Nice article with good explaination on weakHashMap.

  5. Nice article 🙂

  6. Hi Peter, Chng

    Very well written on WeakHashMap.

    Can you please tell me any other way the object can be considered as weak? Like Here you have marked SampleKey explicitly null.

    Also if you can focus like is it good to use WeakHashMap instead HashMap? or there are some disadvantages.

    Thanks
    Mayur Mewada

  7. Really nice article. Well done

  8. Why would I need to set the key to null, normally you would delete the entry from the Hashmap explicitly hashmap.remove(key)

  9. Nice article , eager to see the article about soft and phantom type of reference.

  10. Good one, Thank you 🙂

  11. Very good article.. Simple to understand.
