Introduction
Welcome to Machine Learning Engineering, a compendium of notable concepts for developing and understanding the engineering and technical foundations that power machine learning solutions.
This resource is not centered on the mathematical or statistical foundations of Artificial Intelligence. Instead, it highlights the engineering and computer science aspects crucial for building effective machine learning systems.
Who This Resource Is For
This collection assumes prior knowledge of Machine Learning principles. Familiarity with the programming languages used in ML, the ML/DL ecosystem, the foundational concepts that power these models, and an understanding of how Machine Learning addresses business challenges are essential for effectively engaging with and contributing to the content of this resource.
Those who are passionate about developer experience and appreciate the sophistication of tools and software that make ML accessible to organizations will find the content engaging. The goal is to foster a broader vision for creating elegant solutions that others can collaborate on, leveraging the right tools to tackle interesting engineering challenges.
Note: This material provides concise examples, ideas, and implementations rather than exhaustive definitions or in-depth explanations. It references external sources for core concepts, allowing you to revisit them later if needed.
Jupyter Project
Since Python is a multipurpose programming language, its various applications and use cases introduce people to specialized ecosystems and projects the community has built to enhance the development experience for specific workflows. In the Data Science and Machine Learning scene, the Jupyter Project stands out as one of the most influential tools supporting this area of development.
The Jupyter Project is an open-source initiative that defines standards and interfaces for interactive computing across multiple programming languages. It offers web-based tools, including notebooks, to combine code, data, and rich media in a single, interactive environment. While widely used in data science, machine learning, and scientific computing, Jupyter is not limited to these fields. It enables a unified workflow where users can write, run, and visualize code alongside notes, equations, and visualizations—all in one place.
IPython
The origins of the Jupyter Project are closely tied to a now independent program called IPython (Interactive Python). From 2011 to 2014, Jupyter and IPython were developed as a single, monolithic project before they eventually split to serve distinct purposes.
Interpreted languages such as Ruby and JavaScript provide REPLs (Read-Evaluate-Print Loops)—irb and the browser console, respectively—tools that enable rapid code testing and iteration. Python is no exception. Running the python command in a shell launches an interactive environment for executing Python code line by line, enabling a fast feedback loop for experimentation and learning. These REPL experiences vary between languages and implementations, each offering different levels of interactivity and tooling.
IPython goes beyond the default Python REPL implementation by offering a more powerful interactive environment. Key features include:
- Embedded syntax highlighting and auto-completion.
- Magic commands using % and %% syntax for tasks like timing code, running scripts, or managing the environment.
- Automatic storage of output values, allowing reuse of results via special variables like _, __, and numbered outputs (_1, _2, ...).
- Command history navigation across sessions.
IPython can be used as a standalone shell command, just as Jupyter can be used without IPython. However, the two remain connected—IPython is the reference Python kernel used within Jupyter for executing Python code inside notebooks and other Jupyter interfaces.
Kernel
Inspired by the Operating System's definition, the Jupyter ecosystem uses the term kernel to describe a different kind of bridge: one that connects an independent programming language process with Jupyter clients, e.g. user interfaces like Notebook. Using a well-defined messaging protocol, this design enables Jupyter to support multiple languages through interchangeable kernels, while keeping the user experience consistent.
Different programming languages—as well as different versions of the same language—can appear as kernels within a single Jupyter client. Note that Jupyter Lab and Notebook are Python applications, meaning that launching them starts a Python process. That Python interpreter may also act as a kernel, but it does not have to.
The Jupyter Stack can be installed as a standalone tool, independent of the kernels it manages. This separation is encouraged: it allows a single Jupyter installation to serve multiple, isolated environments without duplicating Jupyter for each one. This is especially useful in centralized setups (e.g., IDEs or servers) where Jupyter manages and connects to various kernels across different environments.
Integrated Workflows
While Jupyter can be run as a standalone application with its own user interface, it is also highly modular. Its UI can be replaced or extended—this is exactly what tools like VSCode’s Jupyter extension, Google Colab, and similar solutions do.
Jupyter’s architecture is particularly well-suited for a client–server model. The server handles code execution and computation, while the user interacts with Jupyter through a web browser—effectively abstracting away the underlying infrastructure.
This separation of interface and compute enables powerful, scalable setups—such as delegating workloads to cloud resources. Platforms like Amazon SageMaker, Azure Machine Learning Studio, Databricks Notebooks and others leverage this model to provide a seamless experience for running complex workflows with specialized computing needs.
Design Patterns
Although design patterns are a topic with their own history, intricacies, and opinionated implementations (not to mention that they vary across programming languages), this section focuses on the most common design choices available to developers when integrating user code with the functionality provided by Python libraries.
Most of the time, library code does not require the implementation of a complex design pattern for an object to be compliant. Instead, it provides foundational principles for how an interface can be accepted and how a generic piece of code can modify, attach to, or benefit from its implementation.
Decorator
One of the most popular design choices in Python is the use of decorators, which is striking due to the usage of the @ character in the language, almost exclusive to the implementation of this pattern1.
The definition of a decorator is very concise: a function that modifies the behavior of an object.
Decorators themselves are not complicated, but many examples implement them with poor type hinting, which makes the abstraction visually confusing.
The following is an example of the simplest possible decorator:
import typing as t

def decorator[T, **P](
    func: t.Callable[P, T],
) -> t.Callable[P, T]:
    """
    Args:
        func: any callable

    Returns:
        func's signature, with additional attribute `__func__`
    """
    setattr(func, "__func__", "__func__")
    return func

# Using syntactic sugar, apply on statement
@decorator
def syntactic_sugar() -> None: ...

# Without syntactic sugar, as first-class citizens
def func() -> None: ...

syntactic_salt = decorator(func)
Even though this code snippet is quite simple, it represents a complete decorator implementation. Note that its usage does not involve creating a new callable nor internally consuming the function; it simply returns the function after adding some metadata to it.
Callable
Some definitions are very simple and describe a decorator as a function that accepts and returns a function (sometimes the same function).
These decorators are used for attribute-based modifications and often require the context of another object to embed some logic.
import typing as t

def zero_lvl_func_decorator[T, **P](
    func: t.Callable[P, T],
) -> t.Callable[P, T]:
    print("extend func behavior on DEFINITION")
    setattr(func, "__func__", "__func__")
    return func
For example, the standard library's abc.abstractmethod decorator simply marks a flag inside the decorated methods to determine which methods of the class must be overridden by its children in order to create an instance.
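That flag can be observed directly at runtime. A minimal sketch, using only the standard library:

```python
import abc

class Base(abc.ABC):
    @abc.abstractmethod
    def run(self) -> None: ...

class Child(Base):
    def run(self) -> None: ...

# the decorator only sets a marker attribute on the function object;
# ABCMeta collects these flags and blocks instantiation until overridden
assert Base.run.__isabstractmethod__ is True

try:
    Base()  # TypeError: abstract method not overridden
except TypeError:
    pass

Child()  # overriding satisfies the contract
```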
Another option is a decorator that extends the function at call time. For that, a wrapper callable is required: the decorator returns the wrapper in place of the original function, and when the decorated function is invoked, the wrapper calls the original function and executes additional functionality.
import typing as t
from functools import wraps

def one_lvl_func_decorator[T, **P](
    func: t.Callable[P, T],
) -> t.Callable[P, T]:
    print("extend func behavior ON DEFINITION")

    # although a new callable is being returned, it accepts the same params
    # as func to transparently extend/decorate its functionality
    @wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
        print("extend func behavior ON CALL")
        # returning same value as decorated callable
        return func(*args, **kwargs)

    return wrapper
To allow users to customize the functionality of the decorator or modify certain options, the decorator can accept parameters as well. One might be tempted to modify the decorator’s signature like this:
import typing as t
from functools import wraps

def one_lvl_func_decorator[T, **P](
    func: t.Callable[P, T],
    print_func_return: bool = False,
) -> t.Callable[P, T]:
    print("extend func behavior ON DEFINITION")

    # since a new callable is being returned, respect the old params
    @wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
        print("extend func behavior ON CALL")
        if print_func_return:
            returns = func(*args, **kwargs)
            print(f"{returns=}")
            return returns
        # returning same value as decorated callable
        return func(*args, **kwargs)

    return wrapper
However, this code will fail at runtime:
@one_lvl_func_decorator(print_func_return=True)
def func() -> None: ...
$ uv run main.py
Traceback (most recent call last):
File "/mle/code/python/standards/design-patterns/04/main.py", line 26, in <module>
@one_lvl_func_decorator(print_func_return=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: one_lvl_func_decorator() missing 1 required positional argument: 'func'
This happens because the decorator syntactic sugar implicitly passes the decorated function as the first argument of the decorator. When parentheses are used to specify options, however, the decorator expression is evaluated first with only those options, so the required func argument is never supplied; hence the TypeError.
To enable the use of decorators on callable statements while allowing parameters, a two-level decorator structure can be utilized.
import typing as t
from functools import wraps

def two_lvl_func_decorator[T, **P](
    a: int,
    b: int,
) -> t.Callable[
    [t.Callable[P, T]],
    t.Callable[P, T],
]:
    """
    Parametrized Decorator that extends behavior of common function
    """
    print(f"modify inner behavior based on {a=}")

    def decorator(
        func: t.Callable[P, T],
    ) -> t.Callable[P, T]:
        # since a new callable is being returned, respect the old params
        @wraps(func)
        def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
            print("extend func behavior ON CALL")
            print(f"modify inner behavior based on {b=}")
            return func(*args, **kwargs)

        return wrapper

    return decorator
Throughout these examples, typing.ParamSpec has been included in all wrapper callables using the [**P] notation. The benefit is that, if a user wants or needs to store the functions in a different module—rather than decorating them at the function statement—such as for file conventions or any other reason, they can still take advantage of static type checking.
def func(c: str) -> str:
    return c

two_lvl_func_decorator(a=1, b=2)(func)(c=3)
$ uv run mypy main.py
main.py:36: error: Argument "c" has incompatible type "int"; expected "str" [arg-type]
Found 1 error in 1 file (checked 1 source file)
Class
Simple abstraction for attribute-based modifications:
def zero_lvl_class_decorator[T: (type)](some_class: T) -> T:
    """
    Extend class behavior on DEFINITION
    """
    setattr(some_class, "__type__", "__type__")
    return some_class
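Applied to a class, such a decorator mutates the class object itself at definition time; no wrapper is involved. A minimal self-contained demonstration (repeating an untyped version of the decorator above):

```python
def zero_lvl_class_decorator(some_class: type) -> type:
    setattr(some_class, "__type__", "__type__")
    return some_class

@zero_lvl_class_decorator
class MyClass: ...

# the class object carries the new attribute immediately after definition
assert MyClass.__type__ == "__type__"
```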
To define a class decorator that returns the same object and accepts parameters for customization, the following approach can be used:
import typing as t

def one_lvl_class_decorator[T: (type)](
    some_value: int,
) -> t.Callable[[T], T]:
    print("run operation ON DEFINITION")

    def decorator(some_class: T) -> T:
        print("extend class behavior on DEFINITION")

        # modify methods of original class, e.g.
        def some_method(self, x=some_value) -> int:
            return x

        setattr(some_class, "some_method", some_method)
        return some_class

    return decorator
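Using the decorator shows how the injected method closes over the parameter passed at decoration time. A condensed, self-contained run:

```python
import typing as t

def one_lvl_class_decorator(some_value: int) -> t.Callable[[type], type]:
    def decorator(some_class: type) -> type:
        # inject a method that closes over the decorator's parameter
        def some_method(self, x: int = some_value) -> int:
            return x

        setattr(some_class, "some_method", some_method)
        return some_class

    return decorator

@one_lvl_class_decorator(some_value=10)
class MyClass: ...

MyClass().some_method()  # 10: the default comes from the decorator call
MyClass().some_method(5)  # 5: explicit arguments still win
```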
A two-level decorator to accept parameters is not necessary. Even if the decorator's purpose were to return an object of a different type, the class abstraction itself contains enough information to eliminate the need for a wrapper-like object. However, it may be confusing and is not considered good practice to have a decorator that returns a different object type.
Object-Oriented
Python, by design, is an object-oriented programming language. It does not enforce rigidity in this principle, enabling easy implementation of multiparadigm designs in codebases. However, its nature heavily relies on the concept of objects and classes. Any struct-like interface, trait, or even Enums must be addressed using the class keyword.
dataclasses is a popular example of this. These are commonly used to represent a structure that includes methods without relying on heavy Object-Oriented Programming (OOP) techniques. However, it still must be declared using the class constructor. The same applies to Pydantic base models, which require inheritance from pydantic.BaseModel. Even though these objects are not necessarily meant to incorporate traditional OOP design patterns, they still follow the same construction experience.
Regardless, mid-level Object-Oriented Programming features are part of writing idiomatic Python. Not all libraries enforce this design, but understanding the core concepts most commonly used in the language enhances development:
- Object Instantiation: objects that have a set of methods and attributes required by an operation. This is common with standard library objects like io.BytesIO. The instance of the object will comply with all the attributes/methods required by a different operator.
- Abstract Methods: Inheritance of an object that requires user implementation. The standard library uses this pattern to define concepts like Iterables, Mappings, Awaitables... In third-party libraries, it is more common to find them with a method that must be overridden, respecting some input parameters and returning a specific type.
- Special Methods: Dunder (double underscore) methods, also known as magic methods. These methods are finite, and implementing some of them will have an effect when using built-in keywords, operators, and global functions. Others are more object-centric but are still common and relevant:
  - __init__: Initializes the contents of a class and binds objects to the instance of the type. __init__ does not return.

        class Class:
            def __init__(self) -> None: ...  # signature returns None

  - __new__: Called when a class is constructed and is responsible for creating the instance. Must return an instance of the class.

        class Class:
            def __new__(cls) -> "Class": ...  # modify creational behavior. Must return the type

  - __call__: Accessed when using () on an initialized object. Python functions defined with the def keyword also adhere to this protocol. This is the abstraction that enables collections.abc.Callable. A class that implements __call__ is as callable as a function object defined with the def keyword.

        class Class:
            def __call__(self) -> str:
                return "called"

        my_instance = Class()  # does not run __call__; the object has to be created first
        my_instance()  # 'called'

        def func() -> str:
            return "called"

        func.__call__()  # 'called'

    Accessing the type of a function at runtime returns the function type. However, function is neither a reserved keyword in Python nor a type; it is not possible to use it as a base class in a class definition:

        def func() -> None: ...

        FunctionType = type(func)

        class Function(FunctionType): ...

        Traceback (most recent call last):
          File "/mle/code/python/standards/design-patterns/11/main.py", line 7, in <module>
            class Function(FunctionType): ...
        TypeError: type 'function' is not an acceptable base type

  Even some built-in objects like dict define __dunder__ methods that were not originally intended for the type, enabling more convenient operations. For example, the | (__or__) operator, originally introduced to represent the binary OR operation between numbers, can also be used to merge dictionaries. Libraries like LangChain use this operator in a pipe-centric way to stream the output of Runnables, similar to %>% in the R programming language, inspired by Unix pipelines.
- Metaclasses go beyond the capabilities of methods like __new__. They allow modification of a class's behavior before it is even created, providing access to the __dict__ that contains the entire class body, the base classes it inherits from, and more. Usage of metaclasses is part of what makes possible complex abstractions like sqlalchemy.orm.DeclarativeBase and pydantic.BaseModel in modern Python applications.
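The metaclass mechanism described above can be sketched in a few lines. The names below (Meta, MyModel, registered) are illustrative, not from any library:

```python
class Meta(type):
    def __new__(mcls, name: str, bases: tuple, namespace: dict) -> type:
        # `namespace` is the class body's dict, inspected and modified
        # before the class object even exists
        namespace["registered"] = True
        return super().__new__(mcls, name, bases, namespace)

class MyModel(metaclass=Meta):
    attr: int = 1

assert MyModel.registered is True  # attribute injected during class creation
```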
1. Decorators are not the only grammatical construct in Python that uses the @ character. There is also the @ operator, which can be used when an object defines the __matmul__ magic method.
Typing
Although Python is a dynamically typed programming language, support for function annotations has been around since version 3.0, as introduced in PEP 3107. Extended support, including the typing library, was added to Python's standard library in version 3.5 with the use of type hints (see PEP 484).
Python's type system is not enforced at all; it is completely1 ignored by the interpreter at runtime. This can be both beneficial and disadvantageous for the Developer Experience.
One advantage is that there is no transpiler requirement, unlike languages such as TypeScript in the JavaScript ecosystem. However, Python still offers richer ergonomics and functionality than comment-based types, like the ones used in Lua, with LuaLS type annotations.
Type hinting is built into Python's grammar. Additionally, type hint definitions can be accessed through the object's __annotations__ attribute. Nonetheless, typing lacks strict rules unless continuous integration tooling is implemented in the code's lifecycle, e.g. using type checkers such as mypy.
The recommended resource for learning about Python's type hints is the documentation of the typing library. It covers the fundamentals of applying type annotations to a codebase, including clever examples. Nevertheless, the following sections will focus on commonly used functionalities and types that may be ambiguous or insufficiently covered in the official documentation.
Annotations
Default behavior
Annotating a callable parameter with a type, as in:
def func(some_str: str) -> None: ...
indicates that some_str must be an instance of str or a subclass of it2, not the type per se. Type annotations assume values are instances of the class or any of their children rather than the class itself.
Therefore, the following are completely valid:
class MyInt(int): ...
def func(int_instance: int) -> None: ...
func(int())
func(MyInt())
$ uv run mypy main.py
Success: no issues found in 1 source file
To pass a class as a parameter, the notation requires the use of typing.Type or type. This is because Python's class syntax is syntactic sugar for creating types. The same rule applies: subclasses of the specified class are also valid as function arguments.
import typing as t
class MyInt(int): ...
def func(int_type: t.Type[int]) -> None: ...
func(int)
func(MyInt)
$ uv run mypy main.py
Success: no issues found in 1 source file
Generics
Python does not have generics like those in C++, Java, or Rust, where type-specific versions of a function are resolved at compile time. Instead, generics in Python exist purely for type annotation purposes. The language inherently supports writing functions that operate on multiple types without requiring duplication. However, typing generics allow developers to dynamically support types based on the object's definition, narrowing the function's scope: enabling better static analysis and code reusability.
Both PEP 484 and the typing library documentation provide in-depth guidance on working with generics.
The typing documentation introduces newer syntax for defining generics in both callables and classes. Enhancements in Python 3.12 enable the creation of generics using concise one-liners. Previously, defining generics required explicitly declaring type variables, binding rules, and variance constraints. This is no longer necessary.
For functions:
def func[T: (int, str)](some_generic: T) -> T:
    return some_generic * 2

int_expression = func(1)  # inferred as int
str_expression = func("foo")  # inferred as str
$ uv run mypy main.py
Success: no issues found in 1 source file
For classes:
from dataclasses import dataclass

@dataclass
class MyClass[T]:
    some_attr: T
The documentation explains the implementation of generics under the hood: MyClass silently inherits from typing.Generic. In Python 3.11 and earlier, explicit inheritance was required to use type generics.
An important distinction between class and callable generics, highlighted in PEP 484 but not explicitly mentioned in the typing documentation, is the scoping of type variables. In generic classes, a type variable used in a method or attribute that coincides with a class-level type variable is always bound to the class-level definition. This means generics apply at the class level, rather than being redefined per method or attribute.
import typing as t

T = t.TypeVar("T")

class MyClass(t.Generic[T]):
    def __init__(self, x: T) -> None:
        super(MyClass, self).__init__()
        self.x = x

    def instance_func(self, x: T) -> T:
        return x

my_instance = MyClass(x=1)  # Generic T is now type `int` for the whole instance
my_instance.instance_func("1")
$ uv run mypy main.py
main.py:16: error: Argument 1 to "instance_func" of "MyClass" has incompatible type "str"; expected "int" [arg-type]
Found 1 error in 1 file (checked 1 source file)
For further details, refer to the PEP 484 scoping rules.
Finally, there are some nuances behind the implementation of class generics that can be confusing to the eye. For example:
import typing as t
T = t.TypeVar("T", int, float)
class MyIterable(t.Iterable[T]): ...
Is a compact version of:
import typing as t
T = t.TypeVar("T", int, float)
class MyIterable(t.Iterable, t.Generic[T]): ...
The generic has nothing to do with the Iterable Abstract Base Class. It is a convenient syntax to avoid explicit inheritance of typing.Generic.
Using Python 3.12’s generics syntax and the source import reference3, this can be rewritten as:
from collections.abc import Iterable # same as typing.Iterable
class MyIterable[T: (int, float)](Iterable): ...
Protocol
Trait-like or interface-based programming can be achieved using typing.Protocol. It serves as an alternative to abstract base classes without requiring full-fledged Object-Oriented Programming (OOP) constructs.
import typing as t

class Proto(t.Protocol):
    def must_implement(self) -> None:
        return None

class InheritsProto(Proto): ...  # implements by inheritance

class ImplementsProto:
    def must_implement(self) -> None: ...

# These are equivalent
def func(implements_proto: Proto) -> None: ...
def fn[T: (Proto)](implements_proto: T) -> None: ...

func(InheritsProto())
func(ImplementsProto())
$ uv run mypy main.py
Success: no issues found in 1 source file
Note that the method body inside the protocol explicitly returns None instead of using an ellipsis (...). In Python, the idiomatic way to define an empty function body—such as for an abstract method or a protocol—is by using either the pass keyword or the ellipsis.
In this case, the must_implement method includes a return statement, meaning there is an implementation of the protocol. This allows type checking to pass. However, if the Proto class had an empty function body and none of its subclasses implemented the method, a type-checking error would occur. At least one class in the method resolution order (__mro__) must provide a concrete implementation that matches the protocol's method signature, including its parameters and return type.
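Protocols can also participate in runtime checks. As a side note, typing.runtime_checkable enables structural isinstance checks against a protocol; note that only the presence of the methods is verified, not their signatures:

```python
import typing as t

@t.runtime_checkable
class Proto(t.Protocol):
    def must_implement(self) -> None: ...

class ImplementsProto:
    def must_implement(self) -> None: ...

class DoesNot: ...

# isinstance only verifies that the method exists, not its signature
assert isinstance(ImplementsProto(), Proto)
assert not isinstance(DoesNot(), Proto)
```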
type: function and soft keyword
type is both a function and a keyword in Python, serving four distinct purposes:
- Obtain the type of a variable: type(object)
- Dynamically creating types at runtime by providing a name, base classes, and a namespace:

      def __init__(self, attr: str) -> None:
          super(MyClass, self).__init__()
          self.attr = attr
          return None

      MyClass = type("MyClass", (object,), {"__init__": __init__})

- Defining metaclasses (beyond the scope of this section)
- Optional keyword before a type expression (introduced in Python 3.12):

      # These are equivalent
      type Str = str
      Str = str
This syntax explicitly marks a variable as a type alias for static type checking. Although type checkers will enforce correctness, there are no runtime exceptions when assigning instances to a type alias. The type declaration takes precedence over a normal assignment, creating an unbound type alias, which can lead to unintended behavior if misused.
Enums
The Python standard library has included support for Enums since version 3.4 with the enum library, introduced in PEP 435. While Enums are not part of the typing library, their functionality integrates seamlessly with the typing principles.
Enum classes are not designed to be instantiated, even though this can be done by passing one of the member values as an argument. Instead, the recommended way to use Enums is by accessing their members directly, like class attributes.
from enum import Enum, auto

class MyEnum(Enum):
    FOO = auto()  # Assigns value 1
    BAR = auto()  # Assigns value 2

def func(formatter: MyEnum) -> None: ...

func(MyEnum.FOO)
$ uv run mypy main.py
Success: no issues found in 1 source file
It is important to note that accessing an Enum member does not return its associated value; it returns the member itself. To access the underlying value, use the .value attribute, or to get the member's name as a string literal, use .name.
Therefore:
from enum import Enum

class Formatter(str, Enum):  # since python 3.11, can replace bases with enum.StrEnum
    JSON = "json"
    LOGFMT = "logfmt"

print(Formatter.JSON, f"{type(Formatter.JSON)=}")
print(Formatter.JSON.name, f"{type(Formatter.JSON.name)=}")
print(Formatter.JSON.value, f"{type(Formatter.JSON.value)=}")
$ uv run main.py
Formatter.JSON type(Formatter.JSON)=<enum 'Formatter'>
JSON type(Formatter.JSON.name)=<class 'str'>
json type(Formatter.JSON.value)=<class 'str'>
But for some use cases, operators with Enum members and values are interchangeable if the Enum inherits from int or str, or their equivalents (IntEnum, StrEnum).
from enum import Enum

class TokenStatus(int, Enum):
    ACTIVE = 1
    EXPIRED = 2

def func(some_int: int) -> int:
    if some_int == TokenStatus.ACTIVE:
        return some_int
    elif some_int == TokenStatus.EXPIRED:
        return some_int
    else:
        return 0

func(1)
func(TokenStatus.EXPIRED)
Because TokenStatus inherits from int, its members are themselves int instances, so passing TokenStatus.EXPIRED where an int is expected passes static type checking:
$ uv run mypy main.py
Success: no issues found in 1 source file
However, if func's parameter some_int had a type hint of TokenStatus, a static type checking error would occur. Regardless, the function behavior remains the same. This can lead to unexpected behavior. For that reason, inheriting exclusively from Enum can be beneficial, at the expense of losing convenience methods on the members.
String Enums, in particular, can often be better represented using typing.Literal. This allows users to type the value directly as a raw string without needing to import library-specific Enums, while also improving awareness of possible values.
import typing as t
def func(some_literal: t.Literal["aio", "threading"]) -> None: ...
Casting types at runtime
A function may return typing.Any or an unknown type that cannot be inferred from its signature. This is common in older or poorly documented third-party libraries. In such cases, developers can define type annotations to improve the development experience, at the risk of encountering unexpected behavior if objects do not match the expected signature.
Introduced in Python 3.6 with PEP 526, the following syntax is permitted:
import typing as t

def func() -> t.Any:
    return str()

my_str: str = func()
In this example, it is safe to assume that the function returns a str instance. However, in real-world scenarios, determining the actual return type may be more complex. If previous application logic suggests a predictable set of possible return types, it can be useful, albeit unsafe, to provide an explicit type hint.
It is worth noting that the previous example does not work for instance attributes:
import typing as t

class MyClass:
    def __init__(self, attr: t.Any):
        self.attr = attr

def func(some_instance: MyClass):
    some_instance.attr = str()
    return some_instance

my_instance = MyClass(1)
my_mod_instance = func(my_instance)

# can safely assume that attr is of type str
my_mod_instance.attr: str
$ uv run mypy main.py
main.py:17: error: Type cannot be declared in assignment to non-self attribute [misc]
Found 1 error in 1 file (checked 1 source file)
To address this, the typing library provides the cast function. Replace the last statement with:
my_mod_instance.attr = t.cast(str, my_mod_instance.attr)
The cast function allows explicitly specifying a type with minimal runtime behavior, ensuring type checkers recognize the expected type.
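That "minimal runtime behavior" can be verified directly: cast performs no conversion or validation, it simply returns its second argument unchanged, so only static type checkers see the declared type:

```python
import typing as t

value = t.cast(int, "some text")  # deliberately wrong: no runtime error is raised

# cast returned its argument as-is; the object is still a str
assert value == "some text"
assert type(value) is str
```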
Coroutines
Python coroutines (functions marked with the async keyword) are objects that silently implement the collections.abc.Coroutine protocol. Using the async keyword at the beginning of a function statement is syntactic sugar for inheriting from the mentioned type and implementing the __await__, send, and throw methods.
Therefore, there is no typing.AsyncCallable type built into the typing library; instead, to type-annotate an async function, the return type of a callable has to be wrapped inside collections.abc.Awaitable, which has a generic parameter representing the function's return type. Or it can be annotated as a collections.abc.Coroutine, which is more accurate but more verbose.
import typing as t
type ACallable[T] = t.Callable[..., t.Coroutine[t.Any, t.Any, T]]
Because asynchronous functionality was implemented in the language on top of generators, async def functions are callable objects that return a Coroutine with three generic parameters.
The first two parameters can be safely omitted for these functions when considering type annotations. In fact, they cannot be set when using the async def syntax, as there is no opportunity to modify the YieldType nor the SendType of the object. Attempting to do so would result in creating an AsyncGenerator.
However, the last parameter represents the return type of the Coroutine when awaited, making it ideal for providing developers with precise type hints.
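A short sketch of annotating the coroutine object an async def function produces (fetch is an illustrative name):

```python
import asyncio
import typing as t
from collections.abc import Coroutine

async def fetch() -> int:
    return 42

# calling an `async def` function returns a Coroutine object; only the
# third generic parameter (the awaited return type) is meaningful here
coro: Coroutine[t.Any, t.Any, int] = fetch()

assert asyncio.run(coro) == 42
```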
py.typed and .pyi extension
There are two specific Python-oriented extensions for typing purposes. py.typed is an explicit marker for libraries that include types, allowing type checkers like mypy to effectively look up type definitions for third-party sources. Advances in development tools like Integrated Development Environments (IDEs) with the use of Language Server Protocols (LSP) have made this less necessary. Yet, it remains a good practice to include it in library code.
On the other hand, files ending with .pyi serve as stubs in the programming language—a stub being a file that contains only type information, with no runtime code. It simply reveals the interface of objects so developers can use them safely. With type annotations now built into the language grammar, this may seem unnecessary—similar to TypeScript type definitions with the .d.ts extension. However, .pyi files are extremely useful for codebases written before type hints became standard and, more importantly, for Python interfaces where the code itself is not written in Python, such as built-in global functions or any third-party software built with the C API or PyO3.
Assume `some_package` is a third-party library that cannot be modified directly. Its structure is as follows:
.
├── main.py
└── some_package
├── module.py
└── module.pyi
In this setup, the file `module.py` defines the following function:
def func(a, b):
return a + b
Meanwhile, the interface file `module.pyi` provides:
def func(a: int, b: int) -> int: ...
Language servers and type checkers will examine the declared types and use them for features such as code completion, inline error detection, and improved code navigation.
Thus, if `main.py` calls `func` like:
from some_package.module import func
func("1", "2")
`mypy` will catch the incorrect usage of the callable:
$ uv run mypy main.py
main.py:3: error: Argument 1 to "func" has incompatible type "str"; expected "int" [arg-type]
main.py:3: error: Argument 2 to "func" has incompatible type "str"; expected "int" [arg-type]
Found 2 errors in 1 file (checked 1 source file)
Type-Driven Behavior
Runtime access
The usage of type annotations has become an important element of Python development. Type hints have evolved into something greater than their name suggests: a significant portion of libraries now relies on this aspect of Python's syntax to govern the runtime behavior of applications.
To examine the type hints of an object at runtime, the following can be done:
import inspect
def func() -> None: ...
assert inspect.signature(func).return_annotation is None
This opens up the opportunity to create a lot of hidden magic using a decorator, for example.
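As a minimal sketch of that kind of "magic", here is a hypothetical decorator (the names `coerce_return` and `half` are illustrative, not a real library API) that reads the return annotation at runtime and coerces the function's result to it:

```python
import functools
import inspect

def coerce_return(func):
    # hypothetical decorator: look up the return annotation once at decoration time
    target = inspect.signature(func).return_annotation

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # coerce the result to the annotated type, when one was provided
        if target is inspect.Signature.empty:
            return result
        return target(result)

    return wrapper

@coerce_return
def half(x: int) -> float:
    return x // 2  # floor division returns int; the decorator converts it to float

assert half(5) == 2.0
assert isinstance(half(5), float)
```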
Python's standard library `dataclass` invites the user to declare classes in a type-hinted manner; it uses `typing.ClassVar` to override the default behavior of annotated names becoming instance attributes. Meanwhile, Python libraries like Pydantic take type hints to the next level, making type annotations the absolute source of truth for object instantiation and providing validation and safety at runtime. Not to mention projects like FastAPI, which define the complete HTTP structure of an endpoint by inspecting the type-annotated parameters in a callable's signature.
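A minimal sketch of the `dataclass` behavior mentioned above (the `Config` class is hypothetical): a `typing.ClassVar` annotation keeps a name at the class level instead of turning it into an instance field.

```python
import typing as t
from dataclasses import dataclass, fields

@dataclass
class Config:
    retries: int = 3                     # instance attribute, part of __init__
    env: t.ClassVar[str] = "production"  # stays a class attribute, excluded

# only the plain annotation becomes a dataclass field
assert [f.name for f in fields(Config)] == ["retries"]
assert Config(retries=5).retries == 5
assert Config.env == "production"
```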
Modifiers

- Strings: Forward references, detailed in PEP 484, occur when an object that has not yet been defined is used as a type annotation. For that reason, type hints can be expressed as string literals.

  from dataclasses import dataclass

  @dataclass
  class Tree:
      left: "Tree"
      right: "Tree"

  The benefit is that type checkers, and IDEs with them, will pick up the type during static analysis. However, this can cause unpredictable behavior for type-driven libraries, since the following results in an error:

  import inspect

  # the annotation is the string literal "Tree", not the class itself
  assert Tree is inspect.signature(Tree).parameters["left"].annotation

  Traceback (most recent call last):
    File "/mle/code/python/standards/typing/20/main.py", line 13, in <module>
      assert Tree is inspect.signature(Tree).parameters["left"].annotation
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  AssertionError

- `typing.TYPE_CHECKING`: the source code of the `typing` library describes this constant as:

  # Constant that's True when type checking, but False here.

  Combined with string literals for type annotations, this comes in handy to prevent circular imports: guard the import behind `if typing.TYPE_CHECKING` and reference the type with quotes in source code, since it will not be imported at runtime.

  import typing as t

  if t.TYPE_CHECKING:
      from some_module.that.causes import CircularImport

  def func(some_type: "CircularImport") -> None: ...

  This enables type checking during development but removes the type from the runtime, which can conflict with code that expects to resolve the hint. In contrast with a plain string literal, functions like `typing.get_type_hints` that search for an object's types across the program's enclosing scopes may raise an exception.

- `from __future__ import annotations`: adding this import to a Python file configures the interpreter for that module at runtime. It turns all type hints into string literals, avoiding forward references, exceptions raised by type hints not defined at runtime, and more. It was introduced in Python 3.7, with details in PEP 563. There was some discussion about making this the default in Python 3.10, but it was postponed and reconsidered in PEP 649, due to Python libraries that rely on the current behavior, where type hints are eagerly evaluated at the time the annotated object is bound.
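The interaction between PEP 563 stringification and runtime resolution can be sketched as follows (the `greet` function is illustrative): with the future import active, `__annotations__` holds plain strings, and `typing.get_type_hints` evaluates them back into real objects.

```python
from __future__ import annotations

import typing as t

def greet(name: str) -> str: ...

# under PEP 563, annotations are stored as string literals
assert greet.__annotations__ == {"name": "str", "return": "str"}

# typing.get_type_hints evaluates the strings in the proper scope
assert t.get_type_hints(greet) == {"name": str, "return": str}
```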
Evolution of Typing Notation
The `typing` library has seen continuous extensions since its release. When modern typing features are needed but the project requires an older Python version, the `typing_extensions` library offers compatibility with minimal overhead, adding only a single dependency with no additional requirements.
However, certain notation and syntax changes are not backward compatible. For example, starting from Python 3.9, standard collection types can be parametrized using square-bracket notation. For type checking, the following are equivalent:
import typing as t
def func(some_dict: t.Dict[str, str]) -> None: ...
def func(some_dict: dict[str, str]) -> None: ...
The choice between these notations depends on the context. When developing library code for a broad user base, justifying an upgrade solely for typing improvements may not always be compelling. It is the developer’s responsibility to decide whether to prioritize compatibility over a cleaner syntax or adhere to the more verbose but widely supported options.
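A common pattern for this trade-off, sketched below with `ParamSpec` (added to `typing` in Python 3.10), is a version-guarded import that falls back to the `typing_extensions` backport on older interpreters:

```python
import sys

if sys.version_info >= (3, 10):
    from typing import ParamSpec
else:
    # backport package; a single extra dependency on older Pythons
    from typing_extensions import ParamSpec

P = ParamSpec("P")
```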
Relevant Changes in Typing
| PEP | Title | Python Release |
|-----|-------|----------------|
| 526 | Syntax for Variable Annotations | 3.6 |
| 563 | Postponed Evaluation of Annotations | 3.7 |
| 585 | Type Hinting Generics In Standard Collections | 3.9 |
| 604 | Allow writing union types as `X \| Y` | 3.10 |
| 695 | Type Parameter Syntax | 3.12 |
- Type hints are almost entirely ignored at runtime. There are no automatic runtime validations: no implicit `isinstance` checks are performed on function calls. However, imports must still be resolved, and certain functions, such as `cast`, may have a minimal runtime impact. ↩
- This rule generally applies, with exceptions such as when using `typing.Protocol` or a subclass of `enum.Enum`. ↩
- `Iterable` is an Abstract Base Class (ABC). The `typing` library provides an alias for this and other protocol-based objects specifically for typing purposes. Inheriting from an ABC has runtime implications beyond its "optional" role in type annotations; in this case, it mandates the implementation of the `__iter__` method. ↩
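The runtime implication mentioned in the last note can be sketched as follows (class names here are illustrative): subclassing `collections.abc.Iterable` without implementing `__iter__` fails at instantiation time.

```python
from collections.abc import Iterable

class Numbers(Iterable):
    # satisfies the ABC contract by implementing __iter__
    def __iter__(self):
        return iter([1, 2, 3])

class Broken(Iterable):
    # leaves the abstract method unimplemented
    pass

assert list(Numbers()) == [1, 2, 3]

try:
    Broken()
    abc_enforced = False
except TypeError:  # "Can't instantiate abstract class Broken..."
    abc_enforced = True

assert abc_enforced
```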