Custom Implementation of HashSet in Java
Understanding Internal Working and Step-by-Step Construction
Introduction
HashSet is one of Java’s core Collection classes. It is part of the Java Collections Framework and implements the Set interface. A Set is a collection of unique elements — meaning it does not allow duplicate values.
The interesting part is that internally HashSet uses HashMap for storage. This design decision ensures that HashSet provides constant-time performance for basic operations like add, remove, and contains.
In this blog, we will:
- Understand how HashSet works internally.
 - Learn about its relationship with HashMap.
 - Build a custom HashSet from scratch.
 - Analyze its advantages, disadvantages, and real-world use cases.
 
Understanding the Relationship Between HashSet and HashMap
A HashSet in Java is essentially backed by a HashMap.
When you add an element to a HashSet, the element is stored as a key in the internal HashMap.
The value is a constant dummy object (a placeholder).
Example:
HashSet set = new HashSet<>();
set.add("Java");
set.add("Python");
set.add("C++");
Internally:
HashMap:
[Java=null, Python=null, C++=null]
This means:
- The uniqueness guarantee in HashSet comes from the uniqueness of HashMap keys.
 - HashSet relies entirely on HashMap’s hashing mechanism.
 
Core Features of HashSet
- Stores unique elements only.
 - Allows null values (only one null element allowed).
 - Unordered collection — insertion order is not maintained.
 - Backed by a HashMap internally.
 
Designing a Custom HashSet
Since HashSet internally uses HashMap, we will follow a similar approach but keep it simpler.
Step 1: Define a CustomHashSet Class
public class CustomHashSet {
    private static final Object DUMMY = new Object();
    private CustomHashMap map;
    public CustomHashSet() {
        map = new CustomHashMap<>();
    }
}
Here:
- We use our previously implemented CustomHashMap to store elements.
 - A single dummy object is used as the value for all keys.
 
Step 2: Implement the add() Method
This will add elements to the set.
public boolean add(K key) {
    if (map.get(key) == null) {
        map.put(key, DUMMY);
        return true;
    }
    return false; // element already exists
}
Step 3: Implement the contains() Method
Checks whether the element exists.
public boolean contains(K key) {
    return map.get(key) != null;
}
Step 4: Implement the remove() Method
Removes the element from the set.
public boolean remove(K key) {
    return map.remove(key);
}
Step 5: Implement the display() Method
Displays all elements in the set.
public void display() {
    map.display();
}
Step 6: Testing the CustomHashSet
public class TestCustomHashSet {
    public static void main(String[] args) {
        CustomHashSet set = new CustomHashSet<>();
        set.add("Java");
        set.add("Python");
        set.add("C++");
        set.add("Java"); // duplicate element
        set.display();
        System.out.println("Contains Python? " + set.contains("Python"));
        System.out.println("Contains Ruby? " + set.contains("Ruby"));
        set.remove("Python");
        set.display();
    }
}
Expected Output
Bucket 1: [Java]
Bucket 2: [Python]
Bucket 3: [C++]
Contains Python? true
Contains Ruby? false
Bucket 1: [Java]
Bucket 3: [C++]
Got it — you only want the CustomHashSet complete implementation (self-contained, no external CustomHashMap dependency).
CustomHashSet – Complete Code
import java.util.LinkedList;
public class CustomHashSet<K> {
    // Node class for each element in the set
    private class Node {
        K key;
        Node(K key) {
            this.key = key;
        }
    }
    // Default initial capacity
    private int capacity = 10;
    private LinkedList<Node>[] buckets;
    private int size = 0;
    @SuppressWarnings("unchecked")
    public CustomHashSet() {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }
    // Hash function to calculate bucket index
    private int hash(K key) {
        return Math.abs(key.hashCode() % capacity);
    }
    // Add element to set
    public boolean add(K key) {
        int index = hash(key);
        LinkedList<Node> bucket = buckets[index];
        for (Node node : bucket) {
            if (node.key.equals(key)) {
                return false; // already exists
            }
        }
        bucket.add(new Node(key));
        size++;
        // Optional: rehash if load factor exceeds threshold
        if ((1.0 * size) / capacity > 0.7) {
            rehash();
        }
        return true;
    }
    // Check if element exists
    public boolean contains(K key) {
        int index = hash(key);
        LinkedList<Node> bucket = buckets[index];
        for (Node node : bucket) {
            if (node.key.equals(key)) {
                return true;
            }
        }
        return false;
    }
    // Remove element
    public boolean remove(K key) {
        int index = hash(key);
        LinkedList<Node> bucket = buckets[index];
        for (Node node : bucket) {
            if (node.key.equals(key)) {
                bucket.remove(node);
                size--;
                return true;
            }
        }
        return false;
    }
    // Display all elements
    public void display() {
        for (int i = 0; i < capacity; i++) {
            if (!buckets[i].isEmpty()) {
                System.out.print("Bucket " + i + ": ");
                for (Node node : buckets[i]) {
                    System.out.print("[" + node.key + "] ");
                }
                System.out.println();
            }
        }
    }
    // Returns number of elements
    public int size() {
        return size;
    }
    // Internal method to rehash (resize and redistribute keys)
    @SuppressWarnings("unchecked")
    private void rehash() {
        LinkedList<Node>[] oldBuckets = buckets;
        capacity *= 2;
        buckets = new LinkedList[capacity];
        size = 0;
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
        for (LinkedList<Node> bucket : oldBuckets) {
            for (Node node : bucket) {
                add(node.key);
            }
        }
    }
    // Test main method
    public static void main(String[] args) {
        CustomHashSet<String> set = new CustomHashSet<>();
        set.add("Java");
        set.add("Python");
        set.add("C++");
        set.add("Java"); // duplicate
        System.out.println("Initial Set:");
        set.display();
        System.out.println("Contains Python? " + set.contains("Python"));
        System.out.println("Contains Ruby? " + set.contains("Ruby"));
        set.remove("Python");
        System.out.println("\nAfter Removing Python:");
        set.display();
    }
}
Expected Output
Initial Set:
Bucket 1: [Java]
Bucket 2: [Python]
Bucket 3: [C++]
Contains Python? true
Contains Ruby? false
After Removing Python:
Bucket 1: [Java]
Bucket 3: [C++]
Explanation
- Buckets: Array of linked lists used to store elements based on hash codes.
 - Hash Function: Uses hashCode() of the key to determine its bucket index.
 - Add: Checks for duplicates, then adds new elements.
 - Contains: Returns true if the key exists.
 - Remove: Deletes an element if found.
 - Rehashing: Automatically doubles bucket size when load factor exceeds 0.7 to maintain efficiency
 
Internal Working of HashSet
Here’s what happens internally when we perform operations:
- add(element)
- Calls map.put(element, DUMMY)
 - HashMap computes the hash index and stores the element as a key.
 
 - contains(element)
- Calls map.get(element)
 - Returns true if element exists.
 
 - remove(element)
- Calls map.remove(element)
 - Deletes the key from the internal map.
 
 
Time Complexity of HashSet Operations
| Operation | Average Case | Worst Case | 
|---|---|---|
| add() | O(1) | O(n) | 
| contains() | O(1) | O(n) | 
| remove() | O(1) | O(n) | 
The worst-case scenario happens when all elements hash to the same bucket, forming a linked list (or tree in Java 8+).
Differences Between HashSet and HashMap
| Feature | HashSet | HashMap | 
|---|---|---|
| Data Structure | Uses HashMap internally | Uses array of buckets + linked list/tree | 
| Storage | Only keys | Key-value pairs | 
| Null Values | Allows one null value | Allows one null key and multiple null values | 
| Order | Unordered | Unordered | 
| Purpose | Store unique elements | Store key-value pairs | 
Advantages of HashSet
- Provides O(1) average time complexity for add, remove, and contains.
 - Automatically handles uniqueness of elements.
 - Easy to implement using HashMap.
 - Flexible and widely used in applications.
 
Disadvantages of HashSet
- No ordering guarantees (use LinkedHashSet for order preservation).
 - Not synchronized — requires synchronization for multi-threading (use ConcurrentHashSet).
 - Memory overhead because of internal HashMap storage.
 - Poor hashCode implementation can degrade performance.
 
Real-World Applications
- Removing duplicates from a collection.
 - Maintaining a list of unique items (e.g., usernames, email IDs).
 - Storing distinct elements for mathematical operations.
 - Membership checking (fast lookup).
 
Conclusion
In this blog, we’ve dissected the HashSet implementation and learned that it is basically a wrapper around HashMap.
By building a CustomHashSet, we understood:
- How elements are stored as keys in a HashMap.
 - How collisions are handled through hashing and chaining.
 - How core set operations map to HashMap operations.
 
Understanding this relationship not only clarifies HashSet’s internal mechanism but also strengthens your grasp on Java’s Collections Framework.
Next Blog- Custom Implementation of LinkedHashMap in Java
                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                
                                                                