Dictionaries
Why Dicts & Sets Matters
The Problem: Looking up data by scanning lists is O(n) — fine for 10 items, disastrous for 10 million.
The Solution: Dicts give O(1) keyed lookup, sets give O(1) membership, and the collections module ships purpose-built variants (Counter, defaultdict, OrderedDict) that eliminate whole classes of bugs.
Real Impact: Switching from list-of-dicts to dict-of-records and from `if x in list` to `if x in set` is often a 100x speedup with no algorithmic effort.
Real-World Analogy
Think of a dict as a filing cabinet and a set as a guest list:
- Key = the label on the cabinet drawer
- Value = what's stored inside
- Set member = a name on the door checked at the entrance
- Counter = the doorman who tallies who arrived how many times
- defaultdict = drawers that auto-materialize when you reach for one
Dicts are mutable key-value mappings — Python's hash map. Keys must be hashable (strings, numbers, tuples of hashables). Since Python 3.7 insertion order is preserved.
user = {"name": "Alice", "age": 30}
empty = {} # or dict()
# Access
user["name"] # 'Alice'
user["email"] # KeyError
user.get("email") # None — safe
user.get("email", "unknown") # 'unknown' — with default
# Set / update / delete
user["email"] = "[email protected]"
user.update({"age": 31, "role": "admin"})
del user["age"]
user.pop("role", None) # remove and return value, or None
# Membership
"name" in user # True
Iteration
for key in user: # keys
...
for key, value in user.items(): # both
...
for value in user.values():
...
Merge with | (Python 3.9+)
defaults = {"timeout": 30, "retries": 3}
user_opts = {"timeout": 60}
final = defaults | user_opts # {'timeout': 60, 'retries': 3}
Dict Comprehensions
# Map of n -> n²
squares = {n: n * n for n in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
# Invert a dict
flipped = {v: k for k, v in original.items()}
# Filter
adults = {name: age for name, age in users.items() if age >= 18}
# From two iterables
keys = ["a", "b", "c"]
values = [1, 2, 3]
d = dict(zip(keys, values))
collections Module
Specialized dict subclasses that solve common problems elegantly.
defaultdict
from collections import defaultdict
# Group items by category — no need to check if key exists
by_first_letter = defaultdict(list)
for word in ["apple", "banana", "avocado", "blueberry"]:
by_first_letter[word[0]].append(word)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry']}
# Count occurrences without checks
tally = defaultdict(int)
for ch in "abracadabra":
tally[ch] += 1
Counter
from collections import Counter
c = Counter("abracadabra")
print(c.most_common(2)) # [('a', 5), ('b', 2)]
print(c["a"]) # 5
print(c["z"]) # 0 — never raises
# Combine counters
c1 = Counter(["a", "b"])
c2 = Counter(["a", "c"])
c1 + c2 # Counter({'a': 2, 'b': 1, 'c': 1})
OrderedDict and ChainMap
from collections import ChainMap
# Layer multiple dicts — first found wins
config = ChainMap(cli_args, env_vars, defaults)
print(config["port"]) # checks cli, then env, then defaults
Sets
Sets are mutable, unordered collections of unique, hashable elements. Use them for membership tests, deduplication, and set algebra.
tags = {"python", "web", "backend"}
empty = set() # NOT {} — that's a dict!
# Add / remove
tags.add("api")
tags.remove("web") # KeyError if missing
tags.discard("missing") # silent if missing
# Membership — O(1) average
if "python" in tags:
...
# Deduplicate a list
unique = list(set([1, 2, 2, 3, 3])) # order not preserved
# For order-preserving dedup: list(dict.fromkeys(items))
Set Algebra
a = {1, 2, 3}
b = {2, 3, 4}
a | b # {1, 2, 3, 4} — union
a & b # {2, 3} — intersection
a - b # {1} — difference
a ^ b # {1, 4} — symmetric difference
a <= b # False — subset check
frozenset — immutable, hashable
permissions = frozenset(["read", "write"])
# frozensets can be dict keys or set members
🎯 Practice Exercises
Exercise 1: Word frequency with Counter
Read a text file and print the 10 most common words (lowercased, punctuation stripped). Use Counter.most_common.
Exercise 2: Group by length
Given a list of words, group them into a dict by word length using defaultdict(list).
Exercise 3: Set operations
Given two lists of email addresses, return: emails in both, emails only in first, emails only in second.
Exercise 4: Invert a dict (handle duplicates)
Write invert(d) where multiple keys can map to the same value. The result should map each value to a list of its original keys.