Python Interview Questions
45 questions — 13 easy · 22 medium · 10 hard
Fundamentals (10)
Lists are ordered, mutable sequences. Tuples are ordered, immutable sequences. Sets are unordered collections of unique elements. Dictionaries are key-value mappings with O(1) lookup.
Performance characteristics matter when choosing: sets and dicts provide O(1) membership testing via hashing, while lists require O(n) scans. Tuples are hashable (when all elements are hashable) and can serve as dict keys or set members, while lists cannot.
my_list = [1, 2, 3] # ordered, mutable, allows duplicates
my_tuple = (1, 2, 3) # ordered, immutable, allows duplicates
my_set = {1, 2, 3} # unordered, mutable, no duplicates
my_dict = {'a': 1, 'b': 2} # key-value pairs, O(1) lookup
Use lists for ordered collections you need to modify. Use tuples for fixed data (coordinates, return values, dict keys). Use sets for membership testing and deduplication. Use dicts for key-value associations. A list cannot be a dictionary key because lists are mutable and therefore unhashable.
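The hashability rule can be demonstrated directly (a minimal sketch):

```python
# Tuples are hashable (when their elements are), so they work as dict keys.
coords = {(40.7, -74.0): 'New York'}
assert coords[(40.7, -74.0)] == 'New York'

# Lists are mutable and unhashable, so they cannot be keys.
try:
    bad = {[40.7, -74.0]: 'New York'}
except TypeError as e:
    print(e)  # unhashable type: 'list'
```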
Follow-up: Can you use a list as a dictionary key? Why or why not?
== checks value equality via the __eq__ method. is checks identity — whether two references point to the exact same object in memory.
CPython interns small integers (-5 to 256) and certain strings, so is may return True for equal small integers. But this is an implementation detail, not a language guarantee.
a = 256
b = 256
a is b # True in CPython — interned
a = 257
b = 257
a is b # False in most contexts — not interned
a == b # True — value equality
is should only be used for singletons like None, True, and False. The idiomatic pattern is if x is None: rather than if x == None:. Using is for value comparison leads to subtle bugs that depend on CPython internals.
Follow-up: What is CPython's integer interning and why should you never rely on it?
Default arguments are evaluated once at function definition time, not at each call. Mutable defaults like lists or dicts are shared across all calls to the function.
def append_to(item, target=[]):
target.append(item)
return target
append_to(1) # [1]
append_to(2) # [1, 2] — not [2]!
The idiomatic fix uses None as a sentinel:
def append_to(item, target=None):
if target is None:
target = []
target.append(item)
return target
This is one of Python's most well-known gotchas. The behavior exists because default values are attributes of the function object (func.__defaults__). In rare cases, mutable defaults are used intentionally — for example, as a simple cache or memo between calls — but this is generally considered an anti-pattern.
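As an illustration of the intentional case, a mutable default can serve as a simple per-function memo (the fib example here is hypothetical; in modern code functools.lru_cache is the clearer choice):

```python
def fib(n, _memo={0: 0, 1: 1}):
    # _memo is created once at definition time and shared across calls,
    # which is deliberate here: computed values persist between invocations
    if n not in _memo:
        _memo[n] = fib(n - 1) + fib(n - 2)
    return _memo[n]

print(fib(50))  # 12586269025, computed instantly thanks to the shared memo
```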
Follow-up: Is there ever a case where a mutable default argument is intentionally useful?
pass is a no-op placeholder — it does nothing and is used where a statement is syntactically required but no action is needed. continue skips to the next iteration of a loop. break exits the loop entirely.
for i in range(10):
if i == 5:
break # exits the loop
if i % 2 == 0:
continue # skips to next iteration
print(i) # prints 1, 3
Python has an else clause on loops that runs only if the loop completes without hitting break. This is a Python-specific feature most developers from other languages don't know:
for item in items:
if item.is_target():
print('Found it')
break
else:
print('Not found') # runs only if break was never hit
The for...else pattern is a clean alternative to using a boolean flag to track whether a loop completed naturally.
Follow-up: What does the else clause on a for loop do?
*args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a dict. They allow functions to accept arbitrary numbers of arguments.
On the caller side, * unpacks iterables and ** unpacks dicts into function arguments.
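A minimal sketch showing both sides, collecting on the function side and unpacking on the caller side:

```python
def describe(*args, **kwargs):
    # args arrives as a tuple, kwargs as a dict
    return f'args={args}, kwargs={kwargs}'

print(describe(1, 2, flag=True))  # args=(1, 2), kwargs={'flag': True}

nums = [1, 2, 3]
opts = {'flag': True}
# * unpacks the list into positionals, ** unpacks the dict into keywords
print(describe(*nums, **opts))    # args=(1, 2, 3), kwargs={'flag': True}
```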
Python 3.8 (PEP 570) added positional-only parameters (before /); keyword-only parameters (after *) have existed since Python 3.0:
def example(pos_only, /, normal, *, kw_only):
pass
example(1, 2, kw_only=3) # valid
example(pos_only=1, normal=2) # TypeError — pos_only is positional-only
The full parameter order is: positional-only, /, regular, *args, keyword-only, **kwargs:
def full_example(a, /, b, *args, c, **kwargs):
pass
Positional-only parameters prevent callers from using the parameter name, which lets library authors change internal parameter names without breaking compatibility.
Follow-up: What do the / and * separators in a function signature mean?
Python has four string formatting approaches:
f-strings (Python 3.6+) — the preferred modern approach. Evaluated at runtime, can contain arbitrary expressions:
name = 'Alice'
f'Hello, {name.upper()}!' # 'Hello, ALICE!'
f'{3.14159:.2f}' # '3.14'
.format() method — useful when the template is defined separately from the values:
template = 'Hello, {name}!'
template.format(name='Bob') # 'Hello, Bob!'
% formatting — legacy C-style formatting, still seen in older codebases:
'Hello, %s! You are %d.' % ('Carol', 30)
string.Template — safe for user-supplied templates because it doesn't evaluate expressions:
from string import Template
t = Template('Hello, $name!')
t.substitute(name='Dave') # 'Hello, Dave!'
Prefer f-strings for most cases. Use .format() when templates are stored as data. Use Template when the format string comes from untrusted user input. Avoid % formatting in new code.
Type hints add static type information to Python code without affecting runtime behavior. They serve as documentation and enable static analysis tools like mypy to catch type errors before execution.
def greet(name: str, times: int = 1) -> str:
return (f'Hello, {name}! ' * times).strip()
def find_user(user_id: int) -> dict[str, str] | None:
pass
Common type hint constructs:
from typing import Optional, Union, TypeVar, Generic
from collections.abc import Callable, Sequence
x: list[int] = [1, 2, 3] # generic built-in (3.9+)
y: dict[str, list[int]] = {} # nested generics
fn: Callable[[int, str], bool] # function type
opt: str | None = None # union syntax (3.10+)
opt_legacy: Optional[str] = None # equivalent, older syntax
mypy is a static type checker that analyzes code without running it. It catches type mismatches, missing return types, incorrect argument types, and None safety issues:
$ mypy app.py
app.py:5: error: Argument 1 to "greet" has incompatible type "int"; expected "str"
Optional[str] and str | None are semantically identical. The | syntax (PEP 604, Python 3.10+) is preferred in modern code for readability. Type hints are not enforced at runtime — they require tools like mypy, pyright, or IDE integration.
Follow-up: What is the difference between Optional[str] and str | None?
Dataclasses (Python 3.7+) automatically generate __init__, __repr__, __eq__, and other boilerplate methods based on class attributes with type annotations:
from dataclasses import dataclass, field
@dataclass
class Point:
x: float
y: float
label: str = 'origin'
tags: list[str] = field(default_factory=list)
p = Point(1.0, 2.0)
print(p) # Point(x=1.0, y=2.0, label='origin', tags=[])
p == Point(1.0, 2.0) # True — __eq__ compares all fields
Use dataclasses when your class is primarily a data container. Use regular classes when you need complex initialization logic, custom __init__ signatures, or heavy behavioral methods.
Key options:
- frozen=True — makes instances immutable (enables hashing, prevents accidental mutation)
- slots=True (Python 3.10+) — generates __slots__ for lower memory usage
- order=True — generates comparison methods (__lt__, __le__, etc.)
@dataclass(frozen=True, slots=True)
class Coordinate:
lat: float
lon: float
c = Coordinate(52.23, 21.01)
c.lat = 0 # FrozenInstanceError
{c} # works — frozen dataclasses are hashable
field(default_factory=list) avoids the mutable default argument pitfall by creating a new list for each instance.
Follow-up: How do frozen=True and slots=True options change behavior?
The walrus operator (:=), introduced in Python 3.8 (PEP 572), assigns a value to a variable as part of an expression. It eliminates the need for separate assignment and condition lines.
# Without walrus operator
line = input()
while line != 'quit':
process(line)
line = input()
# With walrus operator
while (line := input()) != 'quit':
process(line)
Common use cases:
# Filtering with computed value
results = [
stripped
for line in lines
if (stripped := line.strip())
]
# Regex matching
import re
if m := re.match(r'(\d+)-(\w+)', text):
number, word = m.groups()
# Avoiding redundant function calls
if (n := len(data)) > 10:
print(f'Processing {n} items')
Avoid the walrus operator when:
- The expression is already simple enough without it
- Nesting makes the line hard to read
- It's used in a context where side effects are confusing
# Too clever — and in fact broken: the condition (y := check(x)) is
# evaluated first, before x is assigned, so this raises NameError
result = (x := expensive()) if (y := check(x)) else default
The walrus operator is most valuable in while loops, comprehension filters, and if statements where you need both the test and the value.
Follow-up: When should you avoid using the walrus operator for readability?
Virtual environments isolate project dependencies, preventing conflicts between projects that need different versions of the same package.
Built-in venv:
python -m venv .venv
source .venv/bin/activate # Linux/macOS
.venv\Scripts\activate # Windows
pip install requests
pip freeze > requirements.txt
Modern tools:
- pip — standard package installer, uses requirements.txt
- poetry — dependency management with lock files and pyproject.toml
- uv — ultra-fast Rust-based pip/venv replacement
- pipx — install CLI tools in isolated environments
- conda — popular for data science, manages non-Python dependencies too
Package configuration evolution:
setup.py — legacy, imperative configuration (still used but discouraged for new projects)
requirements.txt — flat list of pinned dependencies, no metadata
pyproject.toml — modern standard (PEP 621), declarative configuration:
[project]
name = "myapp"
version = "1.0.0"
dependencies = [
"requests>=2.28",
"pydantic>=2.0",
]
[project.optional-dependencies]
dev = ["pytest", "mypy"]
Best practices: always use virtual environments, pin dependencies with lock files, prefer pyproject.toml for new projects, and separate production and development dependencies.
Follow-up: What is the difference between requirements.txt, pyproject.toml, and setup.py?
OOP (6)
Python uses C3 linearization to determine the Method Resolution Order — the sequence in which base classes are searched when looking up a method. This algorithm ensures each class appears exactly once and respects the order in which parent classes are listed.
The diamond problem occurs when a class inherits from two classes that share a common ancestor:
class A:
def method(self):
return 'A'
class B(A):
def method(self):
return 'B'
class C(A):
def method(self):
return 'C'
class D(B, C):
pass
D().method() # 'B' — MRO is D -> B -> C -> A
C3 linearization guarantees that B is checked before C (left-to-right) and that A appears only once at the end. You can inspect the MRO using D.__mro__ or D.mro(). Understanding MRO is essential for super() calls in cooperative multiple inheritance — super() follows the MRO, not the direct parent.
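The MRO for the diamond above can be inspected at runtime:

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

# __mro__ lists the classes in lookup order, ending at object
print([cls.__name__ for cls in D.__mro__])
# ['D', 'B', 'C', 'A', 'object']
```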
Follow-up: How can you inspect the MRO of a class at runtime?
Instance methods receive self as the first argument and operate on the instance. Class methods receive cls and operate on the class itself. Static methods receive neither and are essentially namespaced functions.
class Date:
def __init__(self, year, month, day):
self.year = year
self.month = month
self.day = day
def display(self): # instance method
return f'{self.year}-{self.month}-{self.day}'
@classmethod
def from_string(cls, s): # factory method
year, month, day = map(int, s.split('-'))
return cls(year, month, day)
@staticmethod
def is_valid(s): # utility function
parts = s.split('-')
return len(parts) == 3
date = Date.from_string('2026-01-15')
Use @classmethod for factory methods and alternate constructors — they work correctly with inheritance because cls refers to the subclass. Use @staticmethod for utility functions that logically belong to the class but don't need access to instance or class state. Use regular methods for everything that operates on instance data.
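A sketch of why cls matters for inheritance: a subclass reusing the inherited factory gets instances of itself (Date is repeated in compact form; EuropeanDate is a hypothetical subclass):

```python
class Date:
    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    @classmethod
    def from_string(cls, s):
        # cls is whichever class the method was called on
        return cls(*map(int, s.split('-')))

class EuropeanDate(Date):
    def display(self):
        return f'{self.day}.{self.month}.{self.year}'

d = EuropeanDate.from_string('2026-01-15')
print(type(d).__name__)  # EuropeanDate, not Date
print(d.display())       # 15.1.2026
```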
Follow-up: Give an example of a factory method using @classmethod.
Magic methods (dunder methods) are special methods with double-underscore names that Python calls implicitly in response to operations. Key ones include:
- __init__ — constructor
- __repr__ — unambiguous string for developers, used by repr()
- __str__ — readable string for users, used by str() and print()
- __eq__ / __hash__ — equality and hashing
- __len__ — len() support
- __iter__ / __next__ — iteration protocol
- __enter__ / __exit__ — context manager protocol
- __getattr__ / __setattr__ — attribute access hooks
class Point:
def __init__(self, x, y):
self.x, self.y = x, y
def __repr__(self):
return f'Point({self.x}, {self.y})'
def __eq__(self, other):
return self.x == other.x and self.y == other.y
def __hash__(self):
return hash((self.x, self.y))
Defining __eq__ makes instances unhashable by default (Python sets __hash__ to None) unless you also define __hash__. This prevents subtle bugs when putting custom objects in sets or using them as dict keys after overriding equality.
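A quick demonstration of this behavior:

```python
class EqOnly:
    def __init__(self, v):
        self.v = v
    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.v == other.v

assert EqOnly(1) == EqOnly(1)  # equality works
try:
    {EqOnly(1)}                # but hashing is gone: __hash__ was set to None
except TypeError as e:
    print(e)  # unhashable type: 'EqOnly'
```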
Follow-up: What happens to __hash__ when you define __eq__?
Abstract Base Classes enforce interface contracts at instantiation time — you cannot create an instance of a class that doesn't implement all required abstract methods. Duck typing relies on runtime behavior: if an object has the right methods, it works.
from abc import ABC, abstractmethod
class Shape(ABC):
@abstractmethod
def area(self):
pass
@abstractmethod
def perimeter(self):
pass
class Circle(Shape):
def __init__(self, radius):
self.radius = radius
def area(self):
return 3.14159 * self.radius ** 2
def perimeter(self):
return 2 * 3.14159 * self.radius
Shape() # TypeError: Can't instantiate abstract class
Circle(5) # works fine
ABCs are worthwhile in large codebases and public APIs where explicit contracts prevent integration errors. Duck typing is sufficient for smaller codebases and internal code where flexibility matters more than safety. The collections.abc module provides standard ABCs like Iterable, Mapping, and Sequence that you can use for type checking and isinstance() tests.
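A few isinstance() checks against the standard ABCs (a minimal sketch):

```python
from collections.abc import Iterable, Mapping, Sequence

print(isinstance([1, 2], Sequence))   # True
print(isinstance('abc', Sequence))    # True, strings are sequences too
print(isinstance({'a': 1}, Mapping))  # True
print(isinstance(42, Iterable))       # False, ints are not iterable
```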
Follow-up: When are ABCs worthwhile vs when is duck typing sufficient?
Descriptors are objects that define __get__, __set__, or __delete__ methods. They power Python's attribute access protocol and are the mechanism behind property, classmethod, and staticmethod.
There are two types:
- Data descriptors define __set__ or __delete__ (e.g., property)
- Non-data descriptors define only __get__ (e.g., classmethod, staticmethod, regular functions)
The lookup order matters: data descriptors take priority over instance __dict__, which takes priority over non-data descriptors.
class Validated:
def __set_name__(self, owner, name):
self.name = name
def __get__(self, obj, objtype=None):
return obj.__dict__.get(self.name)
def __set__(self, obj, value):
if not isinstance(value, int):
raise TypeError(f'{self.name} must be int')
obj.__dict__[self.name] = value
class Order:
quantity = Validated()
o = Order()
o.quantity = 5 # works
o.quantity = 'x' # TypeError: quantity must be int
Understanding descriptors explains how Python's entire attribute access system works under the hood.
__slots__ replaces the per-instance __dict__ with a fixed set of attribute slots, reducing memory footprint significantly for classes with many instances — roughly 40-50% for simple objects.
class Point:
__slots__ = ('x', 'y')
def __init__(self, x, y):
self.x = x
self.y = y
p = Point(1, 2)
p.z = 3 # AttributeError: 'Point' object has no attribute 'z'
Use __slots__ when you're creating millions of instances of a class and memory is a concern — for example, data points in a scientific application or nodes in a graph.
Trade-offs:
- No dynamic attribute assignment — you can only use declared attributes
- Complications with multiple inheritance — both parent classes need compatible __slots__
- __slots__ doesn't inherit automatically — subclasses get __dict__ unless they also define __slots__
- Cannot use __slots__ with __dict__ unless you explicitly include '__dict__' in slots
- Slightly faster attribute access due to descriptor-based lookup instead of dict lookup
Follow-up: What are the trade-offs and limitations of using __slots__?
Patterns (9)
A decorator is a function that takes a function and returns a modified function. It's syntactic sugar for func = decorator(func).
import functools
import time
def timer(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
start = time.perf_counter()
result = func(*args, **kwargs)
elapsed = time.perf_counter() - start
print(f'{func.__name__} took {elapsed:.4f}s')
return result
return wrapper
@timer
def slow_function():
time.sleep(1)
slow_function() # slow_function took 1.0012s
functools.wraps copies the original function's metadata (__name__, __doc__, __module__, etc.) to the wrapper. Without it, introspection and debugging tools see the wrapper's name instead of the original function's name. This matters for logging, documentation generation, and frameworks that inspect function metadata.
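Comparing a wrapper with and without functools.wraps shows what is lost (the decorator names here are hypothetical):

```python
import functools

def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def with_wraps(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@without_wraps
def first():
    """First docstring."""

@with_wraps
def second():
    """Second docstring."""

print(first.__name__, first.__doc__)    # wrapper None (metadata lost)
print(second.__name__, second.__doc__)  # second Second docstring.
```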
Follow-up: Why is functools.wraps important and what happens if you omit it?
A decorator with arguments requires a three-level nested function pattern. The outermost function accepts the decorator arguments and returns the actual decorator.
import functools
import time
def retry(max_attempts=3, delay=1):
def decorator(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_attempts):
try:
return func(*args, **kwargs)
except Exception:
if attempt == max_attempts - 1:
raise
time.sleep(delay)
return wrapper
return decorator
@retry(max_attempts=5, delay=0.5)
def fetch_data():
pass
The three levels are necessary because @retry(max_attempts=3) first calls retry(max_attempts=3), which returns decorator. Then Python applies decorator to the function, which returns wrapper. The outer call is evaluated at decoration time, the middle level receives the function, and the inner level handles each call. Candidates who implement this without hesitation have strong closure fundamentals.
Follow-up: Why does @retry(max_attempts=3) require three levels of nesting?
List comprehensions build the entire list in memory. Generators yield items lazily, one at a time, using constant memory regardless of the data size.
squares_list = [x**2 for x in range(1_000_000)] # ~8MB in memory
squares_gen = (x**2 for x in range(1_000_000)) # negligible memory
Choose generators when:
- Processing large datasets that don't fit in memory
- Working with pipelines where you only need one item at a time
- The consumer might stop early (short-circuiting)
Choose list comprehensions when:
- You need random access or multiple passes over the data
- The dataset is small enough to fit in memory
- You need len(), indexing, or slicing
Generators are single-use — once exhausted, they cannot be iterated again. Lists support unlimited re-iteration. A common mistake is assigning a generator to a variable and trying to iterate it twice, getting an empty result the second time.
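The single-use behavior in action:

```python
gen = (x**2 for x in range(3))
print(list(gen))  # [0, 1, 4]
print(list(gen))  # [] (exhausted; a second pass yields nothing)

lst = [x**2 for x in range(3)]
print(list(lst))  # [0, 1, 4]
print(list(lst))  # [0, 1, 4] (lists can be re-iterated freely)
```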
yield from delegates to another iterable, forwarding values, send() calls, and exceptions transparently. Without it, you need an explicit loop with manual send()/throw() forwarding.
def flatten(nested):
for item in nested:
if isinstance(item, list):
yield from flatten(item)
else:
yield item
list(flatten([1, [2, [3, 4], 5]])) # [1, 2, 3, 4, 5]
yield from is more than just for x in iterable: yield x. It also:
- Forwards send() values to the sub-generator
- Propagates exceptions into the sub-generator
- Captures the sub-generator's return value (via StopIteration.value)
def accumulate():
total = 0
while True:
value = yield total
if value is None:
return total
total += value
def main():
result = yield from accumulate()
print(f'Total: {result}')
This makes yield from essential for composing complex generator pipelines and coroutine delegation.
The iterator protocol requires two methods: __iter__ (returns the iterator object) and __next__ (returns the next value or raises StopIteration).
An iterable defines __iter__, which returns an iterator. An iterator defines both __iter__ (returning self) and __next__. Because iterators return self from __iter__, for loops work whether you pass them the iterable or the iterator.
class Countdown:
def __init__(self, start):
self.start = start
def __iter__(self):
return CountdownIterator(self.start)
class CountdownIterator:
def __init__(self, current):
self.current = current
def __iter__(self):
return self
def __next__(self):
if self.current <= 0:
raise StopIteration
self.current -= 1
return self.current + 1
for n in Countdown(3):
print(n) # 3, 2, 1
Separating the iterable from the iterator allows multiple independent iterations over the same data. If the class were its own iterator, you couldn't iterate it twice concurrently.
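Because Countdown.__iter__ returns a fresh iterator each time, two loops over the same object don't interfere (the classes are repeated here for completeness):

```python
class Countdown:
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        return CountdownIterator(self.start)  # fresh iterator per loop

class CountdownIterator:
    def __init__(self, current):
        self.current = current
    def __iter__(self):
        return self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

c = Countdown(2)
# Two nested, concurrent iterations over the same Countdown object
pairs = [(a, b) for a in c for b in c]
print(pairs)  # [(2, 2), (2, 1), (1, 2), (1, 1)]
```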
Follow-up: What is the difference between an iterable and an iterator?
Context managers implement the __enter__ and __exit__ protocol, used with with statements for resource management.
Class-based approach:
class ManagedFile:
def __init__(self, path, mode):
self.path = path
self.mode = mode
def __enter__(self):
self.file = open(self.path, self.mode)
return self.file
def __exit__(self, exc_type, exc_val, exc_tb):
self.file.close()
return False # don't suppress exceptions
Decorator-based approach using contextlib:
from contextlib import contextmanager
@contextmanager
def managed_file(path, mode):
f = open(path, mode)
try:
yield f
finally:
f.close()
Returning True from __exit__ suppresses the exception — the with block won't propagate it. This is rarely used but useful for specific error-handling patterns. The try/finally in the generator approach ensures cleanup happens even if the body raises an exception.
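A sketch of a suppressing context manager; note the stdlib already provides contextlib.suppress for exactly this:

```python
class Suppress:
    def __init__(self, *exc_types):
        self.exc_types = exc_types

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Returning True tells Python to swallow the exception;
        # returning False (or None) lets it propagate.
        return exc_type is not None and issubclass(exc_type, self.exc_types)

with Suppress(KeyError):
    {}['missing']  # raises KeyError, swallowed by __exit__
print('still running')
```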
Follow-up: What does returning True from __exit__ do?
BaseException is the root of all exceptions. Exception inherits from it. KeyboardInterrupt and SystemExit inherit directly from BaseException, not Exception.
This design is intentional: except Exception won't catch Ctrl+C (KeyboardInterrupt) or sys.exit() (SystemExit), allowing programs to shut down cleanly.
BaseException
├── SystemExit
├── KeyboardInterrupt
├── GeneratorExit
└── Exception
├── StopIteration
├── ValueError
├── TypeError
├── KeyError
├── OSError
└── ...
Best practices:
- Catch specific exceptions, not broad ones
- Never use bare except: — it catches KeyboardInterrupt and SystemExit
- except Exception: pass silently hides bugs — always log or re-raise
- Use except Exception as e: to capture the exception for logging
try:
result = process(data)
except ValueError as e:
logger.warning(f'Invalid data: {e}')
result = default_value
Writing except Exception without re-raising also catches StopIteration, which can silently break generator-based code.
Follow-up: Why should you almost never write except Exception without re-raising?
Python has four comprehension types, each providing concise syntax for creating collections:
squares = [x**2 for x in range(10)] # list
evens = {x for x in range(10) if x % 2 == 0} # set
mapping = {x: x**2 for x in range(5)} # dict
lazy = (x**2 for x in range(10)) # generator expression
Comprehensions become unreadable when nested or when they combine multiple conditions:
# Hard to read — use a regular loop instead
result = [
transform(x, y)
for x in range(10)
if x > 3
for y in range(x)
if y % 2 == 0
]
Nested comprehensions read inside-out, which confuses most developers. A good rule: if a comprehension needs more than one for clause or more than one condition, use an explicit loop.
Generator expressions use parentheses and are lazy — they don't build the entire collection in memory. When passed directly as the sole argument to a function, the extra parentheses can be omitted: sum(x**2 for x in range(10)).
Python 3.10 introduced structural pattern matching (PEP 634) via match/case statements. Unlike switch in C or Java, Python's match does structural decomposition — it can match and unpack complex data structures.
def handle_command(command):
match command:
case {'action': 'move', 'direction': d}:
print(f'Moving {d}')
case {'action': 'attack', 'target': t, 'weapon': w}:
print(f'Attacking {t} with {w}')
case {'action': 'quit'}:
print('Goodbye')
case _:
print('Unknown command')
Pattern types:
match value:
case 42: # literal pattern
pass
case str(s): # class pattern with capture
pass
case [x, y, *rest]: # sequence pattern with star
pass
case {'key': v}: # mapping pattern
pass
case Point(x=0, y=y): # class pattern — matches if x==0
pass
case x if x > 0: # guard clause
pass
case int() | float(): # OR pattern
pass
Key differences from switch:
- No fall-through — each case is independent
- Patterns destructure and bind variables simultaneously
- Guards (if clauses) enable conditional matching
- Works with custom classes via __match_args__
Pattern matching is most useful for parsing commands, handling protocol messages, and processing ASTs — anywhere you need to match and decompose structured data.
Follow-up: What is structural pattern matching and how does it destructure complex objects?
Concurrency (5)
The GIL is a mutex in CPython that prevents multiple native threads from executing Python bytecode simultaneously. It exists because CPython's memory management — specifically reference counting — is not thread-safe. Without the GIL, simple operations like incrementing a reference count could corrupt memory in a multi-threaded program.
The GIL simplifies CPython's implementation and makes C extension development easier, since extensions don't need to worry about thread-safe reference counting.
The GIL only affects CPU-bound threads. I/O-bound threads release the GIL during system calls (network requests, file reads, sleep), allowing true concurrency for I/O workloads.
import threading
import time
def io_bound():
time.sleep(1) # GIL is released during sleep
threads = [threading.Thread(target=io_bound) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join() # completes in ~1 second, not 10
The GIL does NOT prevent all race conditions. Operations that span multiple bytecodes (like counter += 1, which is LOAD, ADD, STORE) can still be interleaved between threads.
Follow-up: Does the GIL prevent all race conditions in Python?
Each concurrency model maps to a specific workload type:
threading — I/O-bound work where the GIL is released during system calls. Good for network requests, file I/O, database queries. Threads share memory, making data sharing easy but requiring locks for thread safety.
multiprocessing — CPU-bound work that needs true parallelism. Each process has its own GIL, so they run on separate cores. Trade-off: higher memory overhead (full process per worker) and data must be serialized (pickled) to pass between processes.
asyncio — high-concurrency I/O-bound work with a single-threaded event loop. Excellent for thousands of concurrent connections (web servers, crawlers). Lower overhead than threads, but requires async/await syntax throughout, and a single blocking call stalls everything.
# I/O-bound: use threading or asyncio
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor() as pool:
results = pool.map(fetch_url, urls)
# CPU-bound: use multiprocessing
with concurrent.futures.ProcessPoolExecutor() as pool:
results = pool.map(heavy_computation, data)
For most applications, concurrent.futures provides a clean high-level API that works with both threads and processes.
Follow-up: What are the trade-offs of each approach in terms of memory, complexity, and debugging?
import asyncio
import aiohttp
async def fetch_url(session, url):
async with session.get(url) as response:
return await response.text()
async def fetch_all(urls):
async with aiohttp.ClientSession() as session:
tasks = [fetch_url(session, url) for url in urls]
return await asyncio.gather(*tasks)
results = asyncio.run(fetch_all([
'https://example.com',
'https://example.org',
]))
asyncio.gather launches all tasks concurrently and collects results in order. By default, if one task raises an exception, the others continue running but gather propagates the first exception.
asyncio.TaskGroup (Python 3.11+) provides structured concurrency — if any task fails, all remaining tasks in the group are cancelled:
async def fetch_all(urls):
async with aiohttp.ClientSession() as session:
async with asyncio.TaskGroup() as tg:
tasks = [tg.create_task(fetch_url(session, url)) for url in urls]
return [t.result() for t in tasks]
async with is needed because aiohttp.ClientSession is an async context manager. Using a regular with would not properly clean up the session.
Follow-up: What is the difference between asyncio.gather and asyncio.TaskGroup?
A race condition occurs when the outcome of a program depends on the timing of thread execution. Even with the GIL, race conditions exist in Python because the GIL only guarantees atomic bytecode execution, not atomic compound operations.
import threading
counter = 0
def increment():
global counter
for _ in range(100_000):
counter += 1 # NOT atomic: LOAD_GLOBAL, ADD, STORE_GLOBAL
threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter) # usually less than 400,000 — race condition!
counter += 1 involves three bytecodes: LOAD, ADD, STORE. The GIL can release between any of them, causing threads to overwrite each other's increments.
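You can see the separate bytecodes with the dis module (exact opcode names vary across CPython versions, but the load and store are always distinct steps):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# List the opcode names for the function body
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)  # LOAD_GLOBAL ... STORE_GLOBAL appear as separate instructions
```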
Solutions:
- threading.Lock() for mutual exclusion
- queue.Queue for thread-safe producer-consumer patterns
- asyncio.Lock() for async code
- Synchronization primitives from threading (e.g., Event, Semaphore)
- Redesigning with immutable data or message passing
lock = threading.Lock()
def safe_increment():
global counter
with lock:
counter += 1
Follow-up: Give an example of a race condition that exists despite the GIL.
asyncio.run() creates a new event loop, runs a coroutine to completion, and closes the loop. It's the entry point for async programs — typically called once from synchronous code:
async def main():
result = await some_async_function()
print(result)
asyncio.run(main())
asyncio.create_task() schedules a coroutine for concurrent execution within an existing event loop. The task starts running as soon as the current coroutine yields control:
async def main():
task1 = asyncio.create_task(fetch('url1'))
task2 = asyncio.create_task(fetch('url2'))
result1 = await task1
result2 = await task2
await suspends the current coroutine until the awaitable completes. It doesn't create tasks or start concurrent execution — it just waits.
The key distinction: await coroutine() runs it sequentially. create_task(coroutine()) followed by await task runs it concurrently. Without create_task, multiple await calls execute one after another, losing the benefit of async.
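A timing sketch makes the difference concrete (durations are approximate):

```python
import asyncio
import time

async def work():
    await asyncio.sleep(0.1)

async def sequential():
    await work()  # runs to completion...
    await work()  # ...before this one even starts

async def concurrent():
    t1 = asyncio.create_task(work())  # both tasks scheduled immediately
    t2 = asyncio.create_task(work())
    await t1
    await t2

start = time.perf_counter()
asyncio.run(sequential())
print(f'sequential: ~{time.perf_counter() - start:.1f}s')  # ~0.2s

start = time.perf_counter()
asyncio.run(concurrent())
print(f'concurrent: ~{time.perf_counter() - start:.1f}s')  # ~0.1s
```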
Memory & Internals (5)
CPython uses reference counting as its primary garbage collection mechanism. Every object has a count of references pointing to it. When the count drops to zero, the object is immediately deallocated.
Reference counting alone cannot handle circular references — when objects reference each other, their counts never reach zero:
class Node:
def __init__(self):
self.ref = None
a = Node()
b = Node()
a.ref = b
b.ref = a
del a, b # reference counts are still 1 — not collected!
Python's generational garbage collector supplements reference counting by periodically scanning for reference cycles. It uses three generations (0, 1, 2) based on object age. Young objects are scanned more frequently because most objects are short-lived.
You can interact with the collector via the gc module:
import gc
gc.collect() # force a collection
gc.disable() # disable automatic collection
gc.get_threshold() # (700, 10, 10) — default thresholds
gc.get_referrers(obj) # find what references an object
Use gc.collect() explicitly after deleting large data structures in memory-sensitive applications. Use weakref to break cycles without preventing garbage collection.
Follow-up
Follow-up: How can you interact with the garbage collector programmatically?
Interning is an optimization where CPython reuses existing objects instead of creating new ones for certain values. This saves memory and speeds up comparisons (identity checks are faster than value comparisons).
Integer interning: CPython pre-creates and caches integers from -5 to 256. Any variable assigned a value in this range points to the same object:
a = 256
b = 256
a is b # True — same object
a = 257
b = 257
a is b # False (in most contexts) — different objects
String interning: CPython automatically interns strings that look like identifiers (alphanumeric characters and underscores). You can manually intern strings with sys.intern():
import sys
a = 'hello'
b = 'hello'
a is b # True — automatically interned
a = 'hello world'
b = 'hello world'
a is b # False — contains space, not auto-interned
a = sys.intern('hello world')
b = sys.intern('hello world')
a is b # True — manually interned
These are CPython implementation details, not language guarantees. Code should never rely on interning behavior — always use == for value comparison, reserving is for None checks.
Every regular Python object has a __dict__ dictionary that stores its instance attributes. Classes also have their own __dict__ for class-level attributes and methods.
Attribute lookup follows a specific chain:
1. Data descriptors on the class (and its MRO) — objects with __get__ and __set__
2. Instance __dict__ — the object's own attributes
3. Non-data descriptors and class attributes — objects with only __get__, or plain class variables
4. __getattr__ — called as a fallback if defined, only when normal lookup fails
class MyClass:
    class_attr = 'class level'

    def __init__(self):
        self.instance_attr = 'instance level'

    def __getattr__(self, name):
        return f'fallback for {name}'

obj = MyClass()
obj.instance_attr  # 'instance level' — from obj.__dict__
obj.class_attr  # 'class level' — from MyClass.__dict__
obj.anything  # 'fallback for anything' — __getattr__ fallback
Note the distinction: __getattr__ is called only when normal lookup fails, while __getattribute__ is called for every attribute access and can override the entire lookup chain. Understanding this protocol explains how property, classmethod, and custom descriptors work.
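A minimal sketch (the class name Logged is ours) makes the contrast visible:

```python
class Logged:
    def __init__(self):
        self.x = 1

    def __getattribute__(self, name):
        # runs for EVERY attribute access; delegate to the default
        # implementation to avoid infinite recursion
        print(f'accessing {name}')
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # runs only after normal lookup raises AttributeError
        return f'missing: {name}'

obj = Logged()
obj.x        # prints 'accessing x', finds it in obj.__dict__
obj.missing  # prints 'accessing missing', then __getattr__ answers
```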
copy.copy() creates a new object but inserts references to the same nested objects. copy.deepcopy() recursively copies everything, creating fully independent copies at every level.
import copy
original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)
shallow[0].append(5)
print(original[0]) # [1, 2, 5] — shared reference!
print(deep[0]) # [1, 2] — independent copy
Shallow copy is sufficient when:
- The structure is flat (no nested mutables)
- All nested elements are immutable (strings, tuples, frozensets)
Deep copy is needed when:
- Nested mutable objects exist and must be independently modifiable
- You're creating snapshots of complex state
deepcopy handles circular references by maintaining a memo dictionary that tracks already-copied objects, preventing infinite recursion. You can customize copying behavior by defining __copy__ and __deepcopy__ methods on your classes.
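As a sketch of that customization hook (the Config class and its shared resource attribute are invented for illustration), a __deepcopy__ override can deep-copy some attributes while deliberately sharing others:

```python
import copy

class Config:
    def __init__(self, values, resource=None):
        self.values = values      # per-instance data: must be copied
        self.resource = resource  # shared handle: copy the reference only

    def __deepcopy__(self, memo):
        new = Config.__new__(Config)
        # register in memo first so circular references resolve to `new`
        memo[id(self)] = new
        new.values = copy.deepcopy(self.values, memo)
        new.resource = self.resource  # intentionally shared
        return new

original = Config({'a': [1, 2]}, resource=object())
clone = copy.deepcopy(original)
clone.values['a'].append(3)  # does not touch original.values
assert original.values['a'] == [1, 2]
assert clone.resource is original.resource
```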
Follow-up
Follow-up: How does deepcopy handle circular references?
The weakref module provides references that don't prevent garbage collection. A weak reference to an object doesn't increment its reference count, so the object can be collected when no strong references remain.
import weakref
class ExpensiveObject:
    def __init__(self, name):
        self.name = name

obj = ExpensiveObject('data')
weak = weakref.ref(obj)
print(weak())  # <ExpensiveObject ...> — object still alive
del obj
print(weak())  # None — object was garbage collected
Use cases:
- Caching — objects stay cached only while used elsewhere, with WeakValueDictionary
- Observer patterns — observers don't keep subjects alive
- Avoiding circular reference leaks — break cycles without preventing collection
cache = weakref.WeakValueDictionary()
def get_or_create(key):
    obj = cache.get(key)
    if obj is None:
        obj = ExpensiveObject(key)
        cache[key] = obj
    return obj
WeakSet is useful for tracking all instances of a class without preventing their collection. Note that not all objects support weak references — built-in types like int, str, and tuple cannot be weakly referenced.
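A sketch of that instance tracking (the Widget class is invented; the immediate reclamation shown relies on CPython's reference counting):

```python
import gc
import weakref

class Widget:
    # registry that does not keep instances alive
    _instances = weakref.WeakSet()

    def __init__(self, name):
        self.name = name
        Widget._instances.add(self)

a = Widget('a')
b = Widget('b')
print(len(Widget._instances))  # 2
del b
gc.collect()  # immediate in CPython anyway; makes the point portable
print(len(Widget._instances))  # 1: the collected instance dropped out
```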
Coding Challenges
(5)An LRU (Least Recently Used) cache evicts the oldest unused entry when full. Python 3.7+ guarantees dict insertion order, which simplifies the implementation:
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache[key] = self.cache.pop(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.pop(key)
        elif len(self.cache) >= self.capacity:
            oldest = next(iter(self.cache))
            del self.cache[oldest]
        self.cache[key] = value
Both get and put are O(1) operations. The trick is that pop followed by re-insertion moves the key to the end of the ordered dict.
Alternatively, collections.OrderedDict provides move_to_end():
from collections import OrderedDict
class LRUCache(OrderedDict):
    def __init__(self, capacity):
        super().__init__()
        self.capacity = capacity

    def get(self, key):
        if key not in self:
            return -1
        self.move_to_end(key)
        return self[key]

    def put(self, key, value):
        if key in self:
            self.move_to_end(key)
        self[key] = value
        if len(self) > self.capacity:
            self.popitem(last=False)
For thread safety, wrap operations with threading.Lock(). For production use, consider functools.lru_cache which handles all of this.
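That thread-safe wrapping might be sketched like this, guarding the OrderedDict variant with a single lock (class name ours):

```python
import threading
from collections import OrderedDict

class ThreadSafeLRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.lock = threading.Lock()  # one lock guards every operation

    def get(self, key):
        with self.lock:
            if key not in self.cache:
                return -1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]

    def put(self, key, value):
        with self.lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
```

Using composition rather than subclassing OrderedDict avoids exposing unlocked dict methods to callers.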
Follow-up
Follow-up: How would you make this thread-safe? What is the time complexity?
Use sorted characters as a hash key to group words that are anagrams of each other:
from collections import defaultdict
def group_anagrams(words):
    groups = defaultdict(list)
    for word in words:
        key = tuple(sorted(word.lower()))
        groups[key].append(word)
    return list(groups.values())
group_anagrams(['eat', 'tea', 'tan', 'ate', 'nat', 'bat'])
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
The tuple(sorted(word)) approach has O(k log k) per word where k is the word length. For very long strings, a character frequency tuple is O(k):
def group_anagrams_fast(words):
    groups = defaultdict(list)
    for word in words:
        counts = [0] * 26
        for c in word.lower():
            counts[ord(c) - ord('a')] += 1
        groups[tuple(counts)].append(word)
    return list(groups.values())
Using defaultdict over manual key checking is idiomatic Python. The overall complexity is O(n * k log k) for the sorted approach or O(n * k) for the frequency approach, where n is the number of words.
Follow-up
Follow-up: What alternative key strategy would work better for very long strings?
This combines decorators, closures, and caching — three intermediate concepts in one problem:
import functools
import time
def memoize(ttl=None):
    def decorator(func):
        cache = {}
        @functools.wraps(func)
        def wrapper(*args):
            now = time.time()
            if args in cache:
                result, timestamp = cache[args]
                if ttl is None or now - timestamp < ttl:
                    return result
            result = func(*args)
            cache[args] = (result, now)
            return result
        return wrapper
    return decorator

@memoize(ttl=60)
def expensive_query(query_id):
    pass
The TTL expiration check ensures stale entries are recomputed. The cache key is the args tuple, which works because tuples are hashable.
**kwargs makes caching harder because dicts are not hashable and can't be used as cache keys. One solution is to convert kwargs to a frozen set of items: key = (args, frozenset(kwargs.items())). However, this still fails when any keyword argument's value is unhashable (a list or dict, for example).
For production use, functools.lru_cache handles most cases. For TTL support, consider cachetools.TTLCache. For thread safety, add a threading.Lock around cache access.
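A sketch of a kwargs-aware cache key (decorator name ours; assumes every argument value is hashable):

```python
import functools

def memoize_kwargs(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # sorting the items makes f(a=1, b=2) and f(b=2, a=1) hit one entry
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper
```

Note that f(1, 2) and f(a=1, b=2) still produce distinct keys; fully normalizing positional against keyword arguments requires binding them with inspect.signature().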
Follow-up
Follow-up: Why does **kwargs make caching harder?
This tests the context manager pattern applied to temporary state changes — a common real-world pattern:
import os
from contextlib import contextmanager
@contextmanager
def change_dir(path):
    original = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(original)

with change_dir('/tmp'):
    print(os.getcwd())  # /tmp
print(os.getcwd())  # back to original
The key elements are:
- Store original state before making changes
- try/finally guarantees cleanup even if the body raises an exception
- yield without a value, since the user doesn't need a reference
This pattern applies broadly to any temporary state change: environment variables, database transactions, monkey-patching, locale settings. The finally block is critical — without it, an exception in the with body would leave the process in the wrong directory.
A class-based version would store original in __enter__ and restore in __exit__, with the same try/finally guarantee built into the protocol.
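That class-based version might look like this (class name ours):

```python
import os
import tempfile

class ChangeDir:
    def __init__(self, path):
        self.path = path
        self.original = None

    def __enter__(self):
        self.original = os.getcwd()  # store original state
        os.chdir(self.path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        os.chdir(self.original)  # restore even if the body raised
        return False             # never suppress exceptions

with tempfile.TemporaryDirectory() as tmp:
    with ChangeDir(tmp):
        pass  # cwd is tmp in here
# cwd restored here
```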
A pipeline chains functions where each function's output feeds into the next function's input:
from functools import reduce
def pipeline(*funcs):
    def apply(value):
        return reduce(lambda acc, fn: fn(acc), funcs, value)
    return apply

process = pipeline(
    str.strip,
    str.lower,
    lambda s: s.replace(' ', '_'),
)
process('  Hello World  ')  # 'hello_world'
For error handling, wrap each step:
def safe_pipeline(*funcs):
    def apply(value):
        for fn in funcs:
            try:
                value = fn(value)
            except Exception as e:
                raise ValueError(
                    f'Pipeline failed at {fn.__name__}: {e}'
                ) from e
        return value
    return apply
For lazy evaluation, use generators:
def lazy_pipeline(*funcs):
    def apply(iterable):
        for fn in funcs:
            iterable = map(fn, iterable)
        return iterable
    return apply
This pattern appears in data processing frameworks, middleware stacks, and build tools. Understanding function composition and reduce shows comfort with functional programming concepts.
Follow-up
Follow-up: How would you add error handling or lazy evaluation to this pipeline?
Pitfalls
(5)The output is 8, 8, 8, 8, 8 — not 0, 2, 4, 6, 8 as most developers expect.
This is Python's most infamous closure gotcha. The lambdas all capture the same variable i by reference, not by value. By the time any lambda is called, the loop has completed and i has the value 4. So every lambda computes x * 4.
The fix is to capture i as a default argument, which evaluates at definition time:
def create_multipliers():
    return [lambda x, i=i: x * i for i in range(5)]

for multiplier in create_multipliers():
    print(multiplier(2))  # 0, 2, 4, 6, 8
Alternatively, use functools.partial:
from functools import partial
def multiply(x, i):
    return x * i

def create_multipliers():
    return [partial(multiply, i=i) for i in range(5)]
This is the same late-binding closure issue that exists in JavaScript and other languages with closures. Candidates who recognize it instantly have debugged this in production code.
Follow-up
def create_multipliers():
    return [lambda x: x * i for i in range(5)]

for multiplier in create_multipliers():
    print(multiplier(2))
settings is a class attribute, shared across all instances. Mutating it via any instance affects all of them because self.settings[key] = value modifies the existing dict object — it doesn't create a new instance attribute.
a = Config()
b = Config()
a.settings is b.settings # True — same dict object
The fix is to initialize mutable attributes in __init__ so each instance gets its own copy:
class Config:
    def __init__(self):
        self.settings = {}  # instance attribute — unique per instance

    def set(self, key, value):
        self.settings[key] = value

a = Config()
b = Config()
a.set('debug', True)
print(b.settings)  # {} — independent copy
This is the class-level version of the mutable default argument gotcha. The underlying principle is the same: mutable objects defined at the class level (or as default arguments) are shared references. Immutable class attributes (strings, numbers, tuples) don't have this problem because they can't be mutated in place.
Follow-up
class Config:
    settings = {}

    def set(self, key, value):
        self.settings[key] = value

a = Config()
b = Config()
a.set('debug', True)
print(b.settings)  # {'debug': True}
This silently swallows every exception — including unexpected ones like TypeError from a bug in process(), MemoryError, or StopIteration from generator code. The program continues with undefined state, making bugs extremely difficult to diagnose.
Problems with except Exception: pass:
- Hides genuine bugs by catching exceptions you didn't anticipate
- Makes debugging nearly impossible — no error message, no traceback
- Can mask StopIteration, silently breaking generator-based code
- The program continues in an unknown state after the failure
The fix is to catch specific exceptions and handle them explicitly:
import logging
try:
    data = fetch_remote_data()
    result = process(data)
except ConnectionError:
    logging.warning('Failed to fetch data, using cached version')
    result = get_cached_data()
except ValueError as e:
    logging.error(f'Invalid data format: {e}')
    raise
If you genuinely need to catch all exceptions (rare), always log the error:
except Exception:
    logging.exception('Unexpected error during processing')
    raise  # re-raise after logging
Never use bare except: — it catches KeyboardInterrupt and SystemExit, preventing clean shutdown.
Follow-up
try:
    data = fetch_remote_data()
    result = process(data)
except Exception:
    pass
This is a circular import. When module_a starts loading, it tries to import helper_b from module_b. Python starts executing module_b, which tries to import helper_a from module_a. But module_a hasn't finished loading yet — helper_a hasn't been defined — so the import fails with ImportError.
Python's import machinery caches modules in sys.modules as they load. During circular imports, a partially-loaded module is returned, which may not have all its attributes defined yet.
Solutions:
1. Restructure to break the cycle — move shared code to a third module:
# shared.py
def helper_a(): return 'A'
def helper_b(): return helper_a()
2. Use local imports inside functions — defers the import until call time:
# module_b.py
def helper_b():
    from module_a import helper_a
    return helper_a()
3. Import the module, not the name — import module_a instead of from module_a import helper_a. Module-level imports succeed because the module object exists in sys.modules even while partially loaded.
Prefer restructuring over local imports. Circular dependencies usually indicate a design problem.
Follow-up
# module_a.py
from module_b import helper_b

def helper_a():
    return 'A'

# module_b.py
from module_a import helper_a

def helper_b():
    return helper_a()
print(b) # [1, 2, 3]
print(d) # [1, 2, 3, 4, 5]
a = a + [4, 5] creates a new list via list.__add__ and rebinds a to it. b still references the original list.
c += [4, 5] calls list.__iadd__, which mutates c in place by extending it. Since d references the same object as c, d sees the change.
This is a critical distinction:
- __add__ returns a new object (non-mutating)
- __iadd__ modifies the object in place (mutating) for mutable types
For immutable types like tuples and strings, += creates a new object because in-place mutation is impossible:
a = (1, 2)
b = a
a += (3,)
# a is a new tuple, b is unchanged
This catches developers who assume += is always shorthand for = ... +. Understanding the difference between rebinding a name and mutating an object is fundamental to Python's data model.
Follow-up
a = [1, 2, 3]
b = a
a = a + [4, 5]
print(b) # ?
c = [1, 2, 3]
d = c
c += [4, 5]
print(d) # ?