Iterators and Generators in Python
When programmers first learn Python, looping feels deceptively simple. You write a for loop over a list, and it “just works.” But beneath this simplicity lies one of Python’s most elegant internal systems — a system that powers looping, streaming, file reading, asynchronous processing, and even advanced frameworks.
This system is built on two ideas:
- Iterators — objects that yield values on request
- Generators — functions that create such objects with minimal code
Understanding iterators and generators means understanding how Python moves through data, how it avoids unnecessary memory usage, and how it creates expressive data pipelines that scale from tiny scripts to production systems.
This guide explores them from the ground up — not with short definitions, but with the depth and clarity needed to actually master them.
Table of Contents
- Why Python Uses Iterators Everywhere
- The Iterator Protocol — The Machinery Behind Every Loop
- Creating Your Own Iterator — Understanding State
- Generator Functions — A Simpler Way to Create Iterators
- How Generators Actually Work Internally
- Generator Expressions — Compact and Memory Efficient
- Building Streaming Pipelines with Generators
- yield from — Delegating Work to Subgenerators
- Advanced Generator API — send(), throw(), close()
- Generator Return Values — A Hidden but Important Feature
- Why Generators Replace Custom Iterator Classes
- itertools — Python’s Toolbox of Iterator Recipes
- Powerful Iterator/Generator Patterns
A. Processing data in fixed-size chunks
B. Sliding window over a sequence
C. Merging multiple sorted sequences lazily
- Coroutines with Generators — Before async/await
- Async Generators — The Modern Evolution
- Cleanup and Resource Management in Generators
- Common Pitfalls and Best Practices
- Full-Scale Runnable Examples
- Exercises to Build Mastery
- Final Summary
1. Why Python Uses Iterators Everywhere
To appreciate iterators, imagine what would happen if Python tried to load every possible sequence into memory upfront:
- Reading a 50GB log file?
- Generating the first 10 million primes?
- Streaming live sensor data that never ends?
If Python always constructed full lists before looping, it would run out of memory within seconds.
Instead, Python uses a model where values can be produced when needed, not stored ahead of time.
This concept — lazy production — allows Python to:
- work with huge datasets
- represent infinite sequences
- build efficient pipelines
- avoid unnecessary memory overhead
This is the exact problem iterators and generators solve.
2. The Iterator Protocol — The Machinery Behind Every Loop
When Python sees a for loop, it does not simply index through the object.
Instead, Python asks:
“Can this object give me values, one at a time, until it’s done?”
The only requirement is that the object follows a simple protocol:
- the object implements __iter__, which returns an iterator
- that iterator implements __next__ (and, by convention, __iter__ returning itself)
An object with __iter__ is called an iterable; the object that actually hands out values through __next__ is the iterator.
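You can verify both roles with the abstract base classes in collections.abc; a minimal check looks like this:
from collections.abc import Iterable, Iterator

nums = [10, 20, 30]
print(isinstance(nums, Iterable))   # True: lists implement __iter__
print(isinstance(nums, Iterator))   # False: lists have no __next__

it = iter(nums)                     # ask the list for its iterator
print(isinstance(it, Iterator))     # True: it implements __next__ and __iter__
print(next(it))                     # 10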
Let’s see how the for-loop actually works behind the scenes.
Suppose you write:
for item in [10, 20, 30]:
    print(item)
Python internally does something much more explicit:
it = iter([10, 20, 30])      # call the object’s __iter__()
while True:
    try:
        value = next(it)     # call the iterator’s __next__()
    except StopIteration:
        break                # stop when iterator says “I’m done”
    print(value)
The loop continues until the iterator decides to end by raising a specific exception: StopIteration.
This design gives the iterator complete control.
It decides:
- how values are produced
- when iteration stops
- what internal state is maintained between values
This becomes crucial later when we build our own iterators.
3. Creating Your Own Iterator — Understanding State
To really understand iterators, you need to build one manually.
Here’s a simple iterator that counts from 1 up to a number:
class CountToN:
    def __init__(self, n):
        self.n = n
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.n:
            raise StopIteration
        self.current += 1
        return self.current
How does this work?
- When the loop starts, Python calls __iter__, which returns the iterator object itself.
- Each call to __next__ increases the counter and returns the next value.
- When the counter reaches n, __next__ raises StopIteration and the loop ends.
Why this matters
Writing iterators manually teaches two critical concepts:
- Iteration requires internal state (here: self.current)
- Iterators are exhausted after use
Try this:
it = CountToN(3)
for x in it:
    print(x)

for x in it:
    print("Again:", x)   # prints nothing — iterator already consumed
Iterators do not automatically restart.
If you want the sequence again, you create a new iterator.
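If you want an object that can be looped over repeatedly, the usual pattern is to separate the container from the iterator: the container’s __iter__ builds a fresh iterator on every loop. A minimal sketch, reusing CountToN from above:
class CountToNContainer:
    """Re-iterable: every for-loop gets its own CountToN iterator."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return CountToN(self.n)   # new, independent iterator per loop

nums = CountToNContainer(3)
print(list(nums))   # [1, 2, 3]
print(list(nums))   # [1, 2, 3] again, because __iter__ built a new iterator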
All of this boilerplate and manual state tracking is exactly what generator functions were designed to remove.
4. Generator Functions — A Simpler Way to Create Iterators
Writing __iter__ and __next__ manually quickly becomes tedious.
Most iterators simply:
- loop over some logic
- produce values along the way
- maintain state between productions
This is exactly what Python’s yield keyword was invented for.
The Core Idea
A generator function looks like a normal function, but instead of returning a value and exiting, it can pause and resume.
Here is the simplest possible generator:
def simple_gen():
    yield 1
    yield 2
    yield 3
Now try:
g = simple_gen()
next(g) # 1
next(g) # 2
next(g) # 3
next(g) # raises StopIteration
Each yield:
- produces a value
- pauses the function
- remembers all local variables
- resumes exactly where it left off
This behavior makes generators perfect for tasks that require:
- sequential computation
- maintenance of internal state
- lazy production of values
In other words, generators are iterators without the boilerplate.
5. How Generators Actually Work Internally
When a generator function is called, Python does not run its body.
Instead, it creates a generator object, which contains:
- the function’s code
- the function’s local variables
- an instruction pointer (where to resume)
Only when next() is called does Python run the code until the next yield.
Additionally, yield is not just “give this value.”
It is an expression.
You can do:
value = yield something
When the generator is resumed, the yield expression evaluates to:
- None if resumed with next()
- a value if resumed with send(value)
This makes generators powerful, even capable of implementing coroutines.
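A minimal sketch of that two-way flow (the echo generator here is purely illustrative):
def echo():
    received = None
    while True:
        received = yield received   # pause; what comes back depends on the caller

g = echo()
next(g)             # prime the generator: run to the first yield (produces None)
print(g.send(42))   # 42: send() resumes it, and the yield expression becomes 42
print(next(g))      # None: plain next() makes the yield expression evaluate to None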
6. Generator Expressions — Compact and Memory Efficient
List comprehensions eagerly build entire lists:
[x*x for x in range(1000000)]
Generator expressions produce values one at a time, avoiding memory blowups:
(x*x for x in range(1000000))
You can pass them directly to loops or functions:
total = sum(x*x for x in range(1000000))
Python will never store all one million values — each square is computed and discarded.
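The difference shows up directly in object size (a rough illustration; exact byte counts vary across Python versions):
import sys

squares_list = [x*x for x in range(1000000)]
squares_gen = (x*x for x in range(1000000))

print(sys.getsizeof(squares_list))   # several megabytes: every value is stored
print(sys.getsizeof(squares_gen))    # a few hundred bytes: only generator state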
7. Building Streaming Pipelines with Generators
This is where generators show their true power.
Imagine a huge log file — multiple gigabytes.
You want to:
- read it line by line
- filter only the lines with “ERROR”
- parse them
- send them into a database
Doing all of this with lists is impractical: you cannot comfortably hold a multi-gigabyte file in memory at once.
With generators, it becomes an elegant streaming pipeline.
def read_lines(path):
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

lines = read_lines("huge.log")
errors = (l for l in lines if "ERROR" in l)
parsed = (parse(l) for l in errors)

for entry in parsed:
    store(entry)
Each stage processes one line at a time, never storing more than what is currently needed.
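The pipeline assumes parse() and store() already exist. As a hypothetical stand-in (not part of the pipeline itself), they could be as simple as:
def parse(line):
    # Hypothetical parser: keep the raw text plus a coarse severity field.
    return {"level": "ERROR", "message": line}

def store(entry):
    # Stand-in for a real database or queue write.
    print("storing:", entry)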
This is why generators are standard in:
- log processors
- ETL pipelines
- data science workflows
- machine learning preprocessors
- streaming architectures
- servers and async frameworks
8. yield from — Delegating Work to Subgenerators
When generators grow complex, you may need one generator to call another generator.
Before Python 3.3, this required manual looping.
But yield from changed everything:
def inner():
    yield 1
    yield 2
    return "done inner"

def outer():
    result = yield from inner()
    yield result
When yield from inner() executes:
- all values from inner() are yielded automatically
- send(), throw(), close() pass through
- when inner() returns, its return value is captured
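Driving outer() end to end shows both effects: the delegated values come out first, and the captured return value is yielded last.
for value in outer():
    print(value)
# 1
# 2
# done inner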
This creates cleaner, more modular generator architectures.
9. Advanced Generator API — send(), throw(), close()
Generators are not just producers; they can receive data and handle signals.
send(value)
Resumes the generator, sending a value into the current yield expression.
throw(exc_type)
Injects an exception where the generator paused.
close()
Asks the generator to terminate gracefully by raising GeneratorExit.
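A small sketch of throw() in action (the resilient generator below is illustrative):
def resilient():
    while True:
        try:
            yield "working"
        except ValueError:
            yield "recovered"        # handle the injected exception and keep going

g = resilient()
print(next(g))                       # working
print(g.throw(ValueError))           # recovered: the exception appears at the paused yield
print(next(g))                       # working
g.close()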
Practical Example: A Running Accumulator
def accumulator():
    total = 0
    while True:
        x = yield total   # produce current total, wait for new input
        if x is None:
            break
        total += x
Usage:
g = accumulator()
print(next(g)) # 0
print(g.send(5)) # 5
print(g.send(10)) # 15
g.close()
This pattern is a simplified model of how:
- event processors
- actor systems
- message consumers
operate internally.
10. Generator Return Values — A Hidden but Important Feature
A generator can explicitly return a value.
def g():
    yield 1
    return "finished"
When the generator ends, the return value travels on the StopIteration exception that Python raises:
gen = g()
next(gen)              # 1

try:
    next(gen)          # the generator body returns "finished" here
except StopIteration as e:
    print(e.value)     # finished
Most of the time this is used through yield from:
result = yield from child_generator()
The ability to collect return values allows generators to become building blocks for:
- parser combinators
- state machines
- coroutine-style workflows
11. Why Generators Replace Custom Iterator Classes
Consider an iterator class that yields squares, built in the same style as the CountToN class from earlier.
Class version:
class Squares:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i * self.i
Generator version:
def squares(n):
    for i in range(1, n+1):
        yield i * i
The generator version:
- is shorter
- is easier to read
- avoids manual state tracking
- avoids StopIteration boilerplate
- behaves identically
This is why generator functions are the idiomatic way to implement iterators in Python.
12. itertools — Python’s Toolbox of Iterator Recipes
itertools is a standard library module that provides fast, memory-efficient iterator tools.
All functions work lazily — they return values one by one, not all at once.
Think of it as:
Tools to loop smarter, faster, and without wasting memory
Infinite Iterators
🔹 count()
What it is:
An infinite counter that keeps increasing forever.
itertools.count(start=0, step=1)
What it does:
Generates numbers endlessly.
count(5) → 5, 6, 7, 8, ...
Use when:
- You need continuous numbering
- Replacing while True + counter
🔹 cycle()
What it is:
Repeats the elements of an iterable again and again.
cycle([1, 2, 3]) → 1, 2, 3, 1, 2, 3, ...
Use when:
- Repeating patterns
- Alternating values (A, B, A, B)
🔹 repeat()
What it is:
Repeats the same value multiple times or infinitely.
repeat(7) → 7, 7, 7, ...
repeat(7, 3) → 7, 7, 7
Use when:
- Constant default values
- Filling data
Filtering & Slicing
🔹 islice()
What it is:
Slices an iterator like list slicing, without converting to a list.
islice(iterable, start, stop, step)
Example:
islice(count(10), 5) → 10, 11, 12, 13, 14
Use when:
- Limiting infinite iterators
- Taking a window of data
🔹 takewhile()
What it is:
Takes values while condition is True, then stops.
takewhile(lambda x: x < 5, count())
→ 0, 1, 2, 3, 4
Use when:
- Stop when condition breaks
🔹 dropwhile()
What it is:
Skips values while condition is True, then yields everything after.
dropwhile(lambda x: x < 5, count())
→ 5, 6, 7, 8, ...
Use when:
- Ignoring initial unwanted values
Transformations
🔹 chain()
What it is:
Joins multiple iterables into one continuous iterator.
chain([1, 2], [3, 4]) → 1, 2, 3, 4
Use when:
- Avoiding nested loops
- Combining sequences lazily
🔹 accumulate()
What it is:
Returns running totals (or running results).
accumulate([1, 2, 3, 4])
→ 1, 3, 6, 10
Use when:
- Prefix sums
- Running balances
🔹 groupby()
What it is:
Groups consecutive identical elements.
groupby("AAABBCC")
→ A, A, A | B, B | C, C
⚠️ Important:
- Works only on adjacent items
- Data should usually be sorted first
Use when:
- Chunking repeated data
Utility
🔹 tee()
What it is:
Creates multiple independent iterators from one iterator.
a, b = tee(iterable, 2)
Use when:
- You need to loop over the same iterator twice
- Original iterator can’t be rewound
Uses internal buffering → memory cost
Why itertools is fast
- Written in C
- No intermediate lists
- One value at a time
- Very memory efficient
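A short combined sketch of several of these tools working together:
import itertools

# First five even numbers, sliced lazily out of an infinite counter.
print(list(itertools.islice(itertools.count(0, 2), 5)))   # [0, 2, 4, 6, 8]

# Running totals of a small sequence.
print(list(itertools.accumulate([1, 2, 3, 4])))           # [1, 3, 6, 10]

# Group consecutive identical characters.
for key, group in itertools.groupby("AAABBCC"):
    print(key, list(group))                               # A ['A', 'A', 'A'] ...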
13. Powerful Iterator/Generator Patterns
A. Processing data in fixed-size chunks
import itertools

def chunked(iterable, size):
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            break
        yield chunk
Useful for:
- batching database writes
- chunking files
- doing micro-batch processing
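For example:
for batch in chunked(range(10), 3):
    print(batch)
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]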
B. Sliding Window Over a Sequence
import itertools
from collections import deque

def sliding_window(iterable, n):
    it = iter(iterable)
    window = deque(itertools.islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for x in it:
        window.append(x)
        yield tuple(window)
This is used heavily in:
- moving averages
- NLP token windows
- signal processing
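For example:
print(list(sliding_window([1, 2, 3, 4, 5], 3)))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5)]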
C. Merging Multiple Sorted Sequences Lazily
import heapq

seq1, seq2 = [1, 4, 7], [2, 3, 9]   # example sorted inputs
for value in heapq.merge(seq1, seq2):
    print(value)                     # 1, 2, 3, 4, 7, 9
Efficiently merges without constructing lists.
14. Coroutines with Generators — Before async/await
Before Python introduced async/await, generators served as lightweight coroutines.
Consider:
def grep(pattern):
    print("Searching for:", pattern)
    try:
        while True:
            line = (yield)
            if pattern in line:
                print("Found:", line)
    except GeneratorExit:
        print("Closing grep")
Usage:
g = grep("ERROR")
next(g)
g.send("INFO: All good")
g.send("ERROR: Something failed")
g.close()
Even though async/await replaced many coroutine patterns, generator-based coroutines remain useful in:
- data processing pipelines
- event simulation
- educational examples
15. Async Generators — The Modern Evolution
Python’s async generators extend the same ideas into the asynchronous world.
import asyncio

async def agen():
    for i in range(3):
        await asyncio.sleep(1)
        yield i
Used in:
async for value in agen():
    print(value)
They allow:
- yielding values one at a time
- waiting (await) between yields
Perfect for streaming asynchronous data.
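Note that async for must itself run inside a coroutine. A complete, runnable sketch wraps the loop in a small main() coroutine and hands it to asyncio.run():
import asyncio

async def main():
    async for value in agen():   # agen() as defined above
        print(value)

asyncio.run(main())              # prints 0, 1, 2, one second apart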
16. Cleanup and Resource Management in Generators
Generators can contain try / finally blocks.
These run when:
- the generator finishes naturally
- the generator is closed manually
- an exception stops the generator
Example:
def gen():
    try:
        yield 1
        yield 2
    finally:
        print("Cleaning up resources...")
This is especially valuable when working with:
- file handlers
- database cursors
- network connections
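A quick demonstration of the guarantee, using gen() from above:
g = gen()
print(next(g))   # 1
g.close()        # prints "Cleaning up resources..." even though 2 was never yielded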
17. Common Pitfalls and Best Practices
Pitfalls
- Reusing exhausted iterators
- Expecting list-like behavior
- Using tee on large data streams (can cause hidden memory growth)
- Forgetting to prime a coroutine-style generator before send()
- Mixing blocking code in async generators
Best Practices
- Prefer generator functions over manual iterator classes
- Use yield from to delegate complex work
- Use itertools for optimized performance
- Use generators to build streaming pipelines
- Use try/finally when managing cleanup-sensitive resources
18. Full-Scale Runnable Examples
Reverse Iterator
class Reverse:
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index -= 1
        return self.data[self.index]
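For example:
for char in Reverse("spam"):
    print(char)
# m
# a
# p
# s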
File Reader in Chunks
def file_chunks(path, chunk_size=8192):
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
Flatten Deeply Nested Lists
def flatten(nested):
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item
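For example:
print(list(flatten([1, [2, [3, 4]], (5, 6)])))
# [1, 2, 3, 4, 5, 6]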
19. Exercises to Build Mastery
- Write an iterator class PrimeNumbers(limit) that yields primes up to limit.
- Write a generator sliding_max(iterable, n) that yields the maximum of every sliding window.
- Re-implement itertools.pairwise using your own generator.
- Build a mini data pipeline using generator expressions.
20. Final Summary
Iterators and generators are not small features — they form the foundation of how Python processes sequences. They enable Python to work with huge datasets, infinite sequences, lazy computations, and streaming pipelines. They also open the door to coroutine-style programming, powerful modular designs, and efficient memory usage.
Understanding iterators and generators deeply transforms how you write Python code.
They push you to think:
- not in terms of lists, but in terms of streams
- not in terms of loading, but in terms of producing
- not in terms of memory, but in terms of flow
With these tools mastered, you can build highly scalable Python applications that remain elegant, efficient, and expressive.
Next Blog: Decorators in Python
