Dictionaries and Sets

Easy35 min read

Dictionaries

Why Dicts & Sets Matters

The Problem: Looking up data by scanning lists is O(n) — fine for 10 items, disastrous for 10 million.

The Solution: Dicts give O(1) keyed lookup, sets give O(1) membership, and the collections module ships purpose-built variants (Counter, defaultdict, OrderedDict) that eliminate whole classes of bugs.

Real Impact: Switching from list-of-dicts to dict-of-records and from `if x in list` to `if x in set` is often a 100x speedup with no algorithmic effort.

Real-World Analogy

Think of a dict as a filing cabinet and a set as a guest list:

  • Key = the label on the cabinet drawer
  • Value = what's stored inside
  • Set member = a name on the door checked at the entrance
  • Counter = the doorman who tallies who arrived how many times
  • defaultdict = drawers that auto-materialize when you reach for one

Dicts are mutable key-value mappings — Python's hash map. Keys must be hashable (strings, numbers, tuples of hashables). Since Python 3.7 insertion order is preserved.

user = {"name": "Alice", "age": 30}
empty = {}                          # or dict()

# Access
user["name"]                    # 'Alice'
user["email"]                   # KeyError
user.get("email")              # None — safe
user.get("email", "unknown")   # 'unknown' — with default

# Set / update / delete
user["email"] = "[email protected]"
user.update({"age": 31, "role": "admin"})
del user["age"]
user.pop("role", None)            # remove and return value, or None

# Membership
"name" in user                  # True

Iteration

for key in user:                  # keys
    ...
for key, value in user.items():   # both
    ...
for value in user.values():
    ...

Merge with | (Python 3.9+)

defaults = {"timeout": 30, "retries": 3}
user_opts = {"timeout": 60}
final = defaults | user_opts        # {'timeout': 60, 'retries': 3}

Dict Comprehensions

# Map of n -> n²
squares = {n: n * n for n in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Invert a dict
flipped = {v: k for k, v in original.items()}

# Filter
adults = {name: age for name, age in users.items() if age >= 18}

# From two iterables
keys = ["a", "b", "c"]
values = [1, 2, 3]
d = dict(zip(keys, values))

collections Module

Specialized dict subclasses that solve common problems elegantly.

defaultdict

from collections import defaultdict

# Group items by category — no need to check if key exists
by_first_letter = defaultdict(list)
for word in ["apple", "banana", "avocado", "blueberry"]:
    by_first_letter[word[0]].append(word)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry']}

# Count occurrences without checks
tally = defaultdict(int)
for ch in "abracadabra":
    tally[ch] += 1

Counter

from collections import Counter

c = Counter("abracadabra")
print(c.most_common(2))           # [('a', 5), ('b', 2)]
print(c["a"])                    # 5
print(c["z"])                    # 0 — never raises

# Combine counters
c1 = Counter(["a", "b"])
c2 = Counter(["a", "c"])
c1 + c2                             # Counter({'a': 2, 'b': 1, 'c': 1})

OrderedDict and ChainMap

from collections import ChainMap

# Layer multiple dicts — first found wins
config = ChainMap(cli_args, env_vars, defaults)
print(config["port"])              # checks cli, then env, then defaults

Sets

Sets are mutable, unordered collections of unique, hashable elements. Use them for membership tests, deduplication, and set algebra.

tags = {"python", "web", "backend"}
empty = set()                       # NOT {} — that's a dict!

# Add / remove
tags.add("api")
tags.remove("web")                # KeyError if missing
tags.discard("missing")           # silent if missing

# Membership — O(1) average
if "python" in tags:
    ...

# Deduplicate a list
unique = list(set([1, 2, 2, 3, 3]))    # order not preserved
# For order-preserving dedup: list(dict.fromkeys(items))

Set Algebra

a = {1, 2, 3}
b = {2, 3, 4}

a | b               # {1, 2, 3, 4} — union
a & b               # {2, 3} — intersection
a - b               # {1} — difference
a ^ b               # {1, 4} — symmetric difference
a <= b              # False — subset check

frozenset — immutable, hashable

permissions = frozenset(["read", "write"])
# frozensets can be dict keys or set members

🎯 Practice Exercises

Exercise 1: Word frequency with Counter

Read a text file and print the 10 most common words (lowercased, punctuation stripped). Use Counter.most_common.

Exercise 2: Group by length

Given a list of words, group them into a dict by word length using defaultdict(list).

Exercise 3: Set operations

Given two lists of email addresses, return: emails in both, emails only in first, emails only in second.

Exercise 4: Invert a dict (handle duplicates)

Write invert(d) where multiple keys can map to the same value. The result should map each value to a list of its original keys.