Python Basics October 23, 2025

Python Sets 

Python sets are unordered collections of unique elements. While seemingly simple, sets are highly optimized for membership testing, mathematical operations, and eliminating duplicates. Understanding their internal mechanics and performance characteristics is essential for writing high-performance Python code.

1. Set Internals: How Python Stores Sets

  • Sets are implemented as hash tables.
  • Each element is hashed, and the hash value determines which slot of the table the element is stored in.
  • This allows O(1) average time complexity for membership tests, additions, and deletions.
  • Since sets are unordered, no indexing or slicing is supported.
s = {1, 2, 3, 4}
print(hash(1))  # Hash value used internally for storage
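
Because placement depends on the hash, values that compare equal and share a hash end up as a single element. The short sketch below illustrates this and the lack of indexing; the variable name mixed is only for illustration.

```python
# Equal values share a hash, so a set keeps only one of them.
print(hash(1) == hash(1.0))    # True
mixed = {1, 1.0, True}
print(len(mixed))              # 1

# Indexing is not supported because there is no positional order.
try:
    mixed[0]
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```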

Key Insight:

  • Sets can only store hashable objects: immutable built-in types such as int, float, str, and tuple (a tuple is hashable only if all of its elements are).
  • Mutable objects like lists or dictionaries cannot be elements.
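
A quick way to check whether a value can go into a set is to try hashing it. The helper is_hashable below is a hypothetical name used for this sketch, not a built-in.

```python
def is_hashable(obj):
    """Return True if obj can be hashed (and therefore stored in a set)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

candidates = [42, 3.14, "text", (1, 2), frozenset({1}), [1, 2], {"k": 1}, (1, [2])]
for obj in candidates:
    print(repr(obj), is_hashable(obj))
```

Note that the last candidate, a tuple containing a list, is reported as unhashable even though tuples themselves are immutable.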

2. Creating Sets and Frozensets

Sets in Python are unordered collections of unique elements.
They are primarily used for membership testing, eliminating duplicates, and performing mathematical operations like union, intersection, and difference.

Python provides two main types of sets — mutable sets (set) and immutable sets (frozenset).

Mutable Sets (set)

A mutable set allows adding and removing elements dynamically.
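It is defined using curly braces with at least one element, such as {1, 2, 3}, or the built-in set() constructor. Empty braces {} create a dictionary, so an empty set must be written as set().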

# Creating a set
s = {1, 2, 3}

# Adding an element
s.add(4)

# Removing an element
s.remove(2)

print(s)  # Output: {1, 3, 4}

Key Properties:

  • No duplicates: If you try to add an existing element, it will be ignored.
  • Unordered: The order of elements is not guaranteed.
  • Mutable: Elements can be added or removed after creation.
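
The following sketch checks two of these points: empty braces give a dict rather than a set, and adding an element that is already present changes nothing.

```python
empty_dict = {}      # curly braces alone create a dict
empty_set = set()    # set() creates an empty set
print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set

s = {1, 2, 3}
s.add(2)             # already present, so the set is unchanged
print(len(s))        # 3
```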

Common Use Cases:

  • Removing duplicates from a list.
  • Fast membership checks (in and not in are highly efficient).
  • Performing set operations like union or intersection.

Example:

nums = [1, 2, 2, 3, 3, 3]
unique_nums = set(nums)
print(unique_nums)  # {1, 2, 3}

Immutable Sets (frozenset)

A frozenset is the immutable version of a regular set.
Once created, it cannot be changed — you cannot add, remove, or modify its elements.

fs = frozenset([1, 2, 3])

# fs.add(4)  # ❌ AttributeError: 'frozenset' object has no attribute 'add'

Because of their immutability, frozensets are hashable, which means they can be:

  • Used as keys in dictionaries, or
  • Included as elements inside other sets — something normal sets can’t do.

Example:

fs1 = frozenset([1, 2])
fs2 = frozenset([3, 4])

# Using frozensets as dictionary keys
data = {fs1: "Group 1", fs2: "Group 2"}
print(data[fs1])  # Group 1

This property makes frozensets ideal for representing fixed combinations, frozen states, or unique sets of values where modification should not occur.
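
As a small illustration, two frozensets built from the same elements are equal and hash alike regardless of input order. When a frozenset is mixed with a regular set in a binary operation, the result takes the type of the left operand. Variable names here are illustrative.

```python
fs = frozenset([1, 2])
same = frozenset([2, 1, 1])
print(fs == same, hash(fs) == hash(same))  # True True

left_frozen = fs | {3}     # frozenset on the left
left_mutable = {3} | fs    # set on the left
print(type(left_frozen).__name__)   # frozenset
print(type(left_mutable).__name__)  # set
```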

Comparison: set vs frozenset

| Feature | set | frozenset |
| --- | --- | --- |
| Mutability | Mutable | Immutable |
| Syntax | {1, 2, 3} or set([...]) | frozenset([...]) |
| Can be modified | Yes (add(), remove(), etc.) | No |
| Can be used as dictionary key | No | Yes |
| Can be an element of another set | No | Yes |
| Hashable | No | Yes |

Use Case Example

Imagine you’re tracking unique sets of permissions or combinations of attributes in a system —
you can use a frozenset to ensure those combinations stay fixed and can be used as lookup keys:

permissions = {
    frozenset(['read', 'write']): "Editor",
    frozenset(['read']): "Viewer"
}

print(permissions[frozenset(['read'])])  # Viewer

This pattern is common in configuration systems, caching, and data deduplication logic.

3. Set Operations

Python’s set data structure is designed with mathematical operations in mind.
It supports fast and expressive ways to perform unions, intersections, and differences — all built on a hash table foundation that ensures exceptional performance, even with large datasets.

These operations are widely used in data filtering, membership testing, deduplication, and comparative analytics.

A. Union (| or union())

The union operation combines elements from two or more sets, returning a new set that contains all unique elements from the input sets.

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(a | b)             # {1, 2, 3, 4, 5, 6}
print(a.union(b))        # {1, 2, 3, 4, 5, 6}

Explanation:

  • Duplicates are automatically removed.
  • The result is a new set (original sets remain unchanged).
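
One difference between the two spellings is worth a quick sketch: the union() method accepts any iterables, while the | operator requires sets on both sides.

```python
a = {1, 2, 3, 4}

merged = a.union([4, 5], (6,), "7")   # lists, tuples, and strings all work
print(merged == {1, 2, 3, 4, 5, 6, "7"})  # True

try:
    a | [4, 5]            # the operator needs a set on both sides
except TypeError as exc:
    print("TypeError:", exc)

print(a == {1, 2, 3, 4})  # True: the original set is unchanged
```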

Use Case Example:
Combining two lists of users, IDs, or items to find everyone involved:

all_users = set(team_A) | set(team_B)

B. Intersection (& or intersection())

The intersection operation returns elements that are common to both sets.

print(a & b)             # {3, 4}
print(a.intersection(b)) # {3, 4}

Use Case Example:
Finding common tags, mutual friends, or shared items between groups:

common_tags = set(post1_tags) & set(post2_tags)

Behavior:
Returns a new set containing only overlapping elements.

C. Difference (- or difference())

The difference operation finds elements that are in the first set but not in the second.

print(a - b)              # {1, 2}
print(a.difference(b))    # {1, 2}

Explanation:

  • Order matters: a - b is not the same as b - a.
  • Useful for exclusion operations.

Use Case Example:
Identifying which users unsubscribed or which features are unique to one product version.

unique_to_a = users_A - users_B

D. Symmetric Difference (^ or symmetric_difference())

The symmetric difference operation returns all elements that are in either set, but not in both — i.e., elements that are exclusive to each set.

print(a ^ b)                         # {1, 2, 5, 6}
print(a.symmetric_difference(b))     # {1, 2, 5, 6}

Use Case Example:
Finding items that changed between two datasets or records:

changed_items = old_data ^ new_data

Behavior:
Excludes common elements, focusing only on differences from both sides.
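
The relationship between these operations can be verified directly. This sketch reuses the same example values for a and b.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

sym = a ^ b
print(sym == ((a - b) | (b - a)))   # True: both one-sided differences combined
print(sym == ((a | b) - (a & b)))   # True: union minus intersection
print((a - b) == (b - a))           # False: difference is not symmetric
```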

E. Chaining Operations

You can chain multiple set operations for complex comparisons:

c = {5, 6, 7}
result = (a | b) & c    # Common elements between (a ∪ b) and c
print(result)            # {5, 6}

This chaining pattern is frequently used in filtering pipelines, search indexing, and data reconciliation workflows.

F. Performance Considerations

All fundamental set operations (|, &, -, ^) are implemented using optimized hash table algorithms.

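Average complexity:

At most O(len(s) + len(t)),
where s and t are the two sets involved. Union visits every element of both sets, while intersection (about O(min(len(s), len(t)))) and difference (about O(len(s))) are often cheaper.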

This efficiency makes sets extremely valuable when:

  • Working with large datasets.
  • Performing frequent membership tests (if x in s).
  • Implementing deduplication or category-based filtering.

 

4. Set Methods — Detailed Overview

| Method | Description | Time Complexity |
| --- | --- | --- |
| add(x) | Add element x | O(1) |
| remove(x) | Remove x; raises KeyError if absent | O(1) |
| discard(x) | Remove x; no error if absent | O(1) |
| pop() | Remove and return arbitrary element | O(1) |
| clear() | Remove all elements | O(n) |
| copy() | Shallow copy | O(n) |
| update(*others) | Add elements from other iterable(s) | O(k) |
| intersection_update(*others) | Update with intersection | O(n) |
| difference_update(*others) | Update with difference | O(n) |

Pro Tip: Use discard() instead of remove() when element absence is possible.
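
A minimal sketch of the methods above in action, using small illustrative values:

```python
s = {1, 2, 3}

s.discard(99)          # absent: silently ignored
try:
    s.remove(99)       # absent: raises KeyError
except KeyError:
    print("remove() raised KeyError")

s.update([3, 4], (5,))                   # s is now {1, 2, 3, 4, 5}
s.intersection_update({2, 3, 4, 5, 6})   # keeps {2, 3, 4, 5}
s.difference_update([5])                 # drops 5
print(s == {2, 3, 4})                    # True

item = s.pop()                           # removes an arbitrary element
print(item in {2, 3, 4}, len(s))         # True 2
```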

5. Removing Duplicates from Lists Using Sets

One of the most common and practical uses of sets in Python is removing duplicates from lists or other sequences.
Since sets inherently store only unique elements, converting a list to a set automatically eliminates duplicates in a single, efficient step.

numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = list(set(numbers))
print(unique_numbers)  # e.g. [1, 2, 3, 4, 5] (order is not guaranteed)

Explanation:

  • The set() constructor removes all repeated values.
  • Converting back to a list restores the collection type for further operations.

This approach is extremely fast because set membership and insertion are both average O(1) operations, making it ideal for large datasets.

However, there’s one important consideration.

Caveat — Order Is Lost:
Sets are unordered, so converting a list to a set and back may rearrange elements.

For example:

numbers = [4, 3, 2, 1, 4, 2]
print(list(set(numbers)))  
# Output could be [1, 2, 3, 4] or any order depending on hashing

If the original order matters, you can use a dictionary-based approach, which preserves insertion order in Python 3.7 and later:

unique_ordered = list(dict.fromkeys(numbers))
print(unique_ordered)  # [4, 3, 2, 1]

Or equivalently, using collections.OrderedDict (for older Python versions):

from collections import OrderedDict
unique_ordered = list(OrderedDict.fromkeys(numbers))
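
Another common pattern keeps a set of already-seen values while walking the list, which also makes it possible to deduplicate by a derived key. The function name unique_in_order and its key parameter are assumptions made for this sketch.

```python
def unique_in_order(items, key=None):
    """Return items with duplicates removed, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        marker = item if key is None else key(item)
        if marker not in seen:      # O(1) average membership test
            seen.add(marker)
            result.append(item)
    return result

print(unique_in_order([4, 3, 2, 1, 4, 2]))                          # [4, 3, 2, 1]
print(unique_in_order(["Apple", "apple", "Pear"], key=str.lower))   # ['Apple', 'Pear']
```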

 

6. Nested Sets and Frozensets

Regular sets in Python are mutable and therefore unhashable.
This means you cannot have a set inside another set, because inner sets can change — which breaks the immutability requirement for hash-based structures.

s1 = {1, 2}
s2 = {3, 4}
# nested_set = {s1, s2}  # ❌ TypeError: unhashable type: 'set'

To create nested collections of sets, Python provides frozenset, an immutable and hashable variant of set.

s1 = frozenset([1, 2])
s2 = frozenset([3, 4])
nested_set = {s1, s2}  # ✅ Works

Here:

  • Each frozenset acts as a fixed, unchangeable collection.
  • The outer set (nested_set) can then safely store these immutable inner sets.

Practical Use Cases:

  • Representing unique relationships or connections in graph algorithms.
  • Storing combinations of attributes that should remain fixed.
  • Tracking immutable groups in clustering or dependency analysis.

Example — Representing Relationships:

friendships = {
    frozenset(["Alice", "Bob"]),
    frozenset(["Bob", "Charlie"]),
    frozenset(["Alice", "David"])
}

Here, each pair is unordered (since sets don’t care about order), and no duplicate relationships can exist.
This is perfect for representing bidirectional links or non-hierarchical relationships in graphs and networks.
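
Looking up a relationship then works in either direction. The helper are_friends is a hypothetical name for this sketch.

```python
friendships = {
    frozenset(["Alice", "Bob"]),
    frozenset(["Bob", "Charlie"]),
    frozenset(["Alice", "David"]),
}

def are_friends(a, b):
    """Check a pair regardless of the order the names are given in."""
    return frozenset([a, b]) in friendships

print(are_friends("Bob", "Alice"))      # True
print(are_friends("Alice", "Charlie"))  # False

# Adding the same pair in reverse order does not create a duplicate.
friendships.add(frozenset(["Bob", "Alice"]))
print(len(friendships))                 # 3
```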

7. Advanced Iteration and Set Comprehensions

Set comprehensions are one of the most Pythonic and expressive ways to create and manipulate sets.
They offer a concise syntax for iteration, filtering, and transformation — all while maintaining the key property of sets: uniqueness.

Just like list comprehensions, set comprehensions allow you to define new sets in a single readable line, but they automatically discard duplicates and produce unordered collections.

Set Comprehensions in Action

A set comprehension uses curly braces {} instead of square brackets [] (used by list comprehensions).
Inside the braces, you can include an expression, a loop, and an optional condition.

squares = {x**2 for x in range(10) if x % 2 == 0}
print(squares)
# Output (order may vary): {0, 4, 16, 36, 64}

Explanation:

  • The loop iterates through numbers 0–9.
  • The condition if x % 2 == 0 filters only even numbers.
  • Each selected number is squared (x**2) and added to the set.
  • Duplicate results are automatically removed.

This compact syntax replaces longer for-loops while improving readability and performance.

Filtering and Transforming Data

Set comprehensions are particularly useful when processing collections where you need to:

  • Remove duplicates
  • Apply transformations (like string case changes)
  • Filter based on conditions

Example:

words = ["apple", "banana", "apple"]
unique_upper = {w.upper() for w in words}
print(unique_upper)
# Output: {'APPLE', 'BANANA'}

How it works:

  • Each word is converted to uppercase (w.upper()).
  • Since sets eliminate duplicates, "apple" appears only once in the result.
  • The final set contains unique, transformed elements.

This pattern is common in data cleaning, case normalization, and text processing pipelines.
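
Filtering and transformation can be combined in one comprehension. The sketch below normalizes a list of tags and drops blank entries; the sample data is made up for illustration.

```python
raw_tags = [" Python", "python ", "SETS", "", "sets", "  "]

# Strip whitespace, lowercase, and skip entries that are empty after stripping.
clean_tags = {tag.strip().lower() for tag in raw_tags if tag.strip()}
print(clean_tags == {"python", "sets"})  # True
```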

Multiple Loops and Conditions

You can extend comprehensions with multiple loops and conditions for more complex transformations:

pairs = {(x, y) for x in range(3) for y in range(3) if x != y}
print(pairs)
# Output: {(0, 1), (1, 2), (2, 0), (0, 2), (2, 1), (1, 0)}

Here, the comprehension:

  • Iterates over two ranges.
  • Excludes pairs where x == y.
  • Produces a set of unique coordinate pairs.

This is an elegant, compact way to generate combinatorial datasets, graph edges, or coordinate grids without repetition.

Iteration Order and Hashing

It’s important to remember that sets are unordered collections.
Their iteration order is determined by internal hashing and can vary between runs.

for item in squares:
    print(item)

While the above loop will print all elements, the order is arbitrary — you should never rely on set ordering for consistent results.

If you need a deterministic order, convert the set to a sorted list first:

for item in sorted(squares):
    print(item)

This preserves the benefits of sets (uniqueness) while adding controlled ordering for display or comparison.

8. Set Performance Insights

  • Membership (x in s): O(1) average, O(n) worst-case (the worst case is caused by many hash collisions, which are rare in practice).
  • Adding elements: O(1) amortized.
  • Union/Intersection: O(n + m) average.
  • Memory overhead: Higher than lists due to hash table structure.

Pro Tip:

  • For small datasets, lists may be faster due to lower memory overhead.
  • For large datasets or frequent membership tests, sets outperform lists significantly.
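
To see the difference on your own machine, a rough timing sketch with the standard timeit module can help. The sizes and repeat counts are arbitrary choices, and the exact numbers will vary by system.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1   # worst case for the list: the last element

# Time 100 membership tests against each container.
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)
print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```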

9. Real-World Applications

  1. Removing duplicates from large datasets:
emails = ["a@mail.com", "b@mail.com", "a@mail.com"]
unique_emails = set(emails)
  2. Fast membership tests:
allowed_ips = {"192.168.0.1", "10.0.0.1"}
if client_ip in allowed_ips:
    allow_access()
  3. Mathematical & algorithmic problems:
  • Finding common elements between arrays
  • Implementing graph adjacency sets
  • Calculating unique combinations
  4. Immutable collections with frozensets:
  • Representing unchangeable relationships
  • Using as dict keys in caching or memoization
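
As one illustration of graph adjacency sets, the sketch below stores each node's neighbors in a set and uses a visited set during a breadth-first search. The graph data and the function name reachable are made up for this example.

```python
from collections import deque

# Each node maps to the set of nodes it is directly connected to.
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B"},
    "E": set(),
}

def reachable(start):
    """Return every node reachable from start using breadth-first search."""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Set difference skips neighbors that were already visited.
        for neighbor in graph[node] - visited:
            visited.add(neighbor)
            queue.append(neighbor)
    return visited

print(reachable("A") == {"A", "B", "C", "D"})  # True
print(reachable("E") == {"E"})                 # True
```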

10. Advanced Tips & Best Practices

  • Prefer sets over lists for frequent membership checks.
  • Use frozensets when hashable, immutable sets are needed.
  • Avoid nested mutable sets — use frozensets or alternative structures.
  • For ordered unique collections, use dict.fromkeys() (or OrderedDict.fromkeys() on older Python versions) instead of a plain set.

Conclusion

Python sets are powerful, high-performance collections optimized for uniqueness and fast membership tests. By understanding their hash table implementation, frozensets, advanced operations, and performance characteristics, you can leverage sets for algorithms, data cleaning, and optimization tasks that would be cumbersome with lists or tuples.

 


 

Sanjiv