Dataclasses help us write data-oriented classes, such as a class representing a vector.
- They are very different from behaviour oriented classes that expose a number of methods.
Example of a builtin dataclass
How do dataclasses help in representing data-oriented classes?
They add convenience mechanisms such as:
- Being able to represent the object as a string easily, since a __repr__ method is already implemented.
- Comparing the objects with other objects.
- Adding an easy initialisation mechanism to the data.
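- Dataclasses automatically generate a number of dunder methods for us in a class, such as:
__init__: The initialisation method for the class.
__repr__: How the object is represented when repr() is called. print() also ends up using it, because no separate __str__ is generated.
__str__: How the object is represented as a string (called with str()). Dataclasses do not generate this one; the inherited default falls back to __repr__.
__eq__: Used when equality operators are used (e.g. ==).
__hash__: The hash for the object (called with hash()). With the default settings __hash__ is set to None, so instances are unhashable; it is generated when the dataclass is frozen (with eq=True) or when unsafe_hash=True is passed.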
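A minimal sketch of the generated methods in action. The Vector class and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vector:
    # Hypothetical example class; the field names are illustrative.
    x: float
    y: float


v1 = Vector(1.0, 2.0)        # generated __init__, positional arguments
v2 = Vector(x=1.0, y=2.0)    # generated __init__, keyword arguments

print(repr(v1))              # generated __repr__: Vector(x=1.0, y=2.0)
print(str(v1) == repr(v1))   # True: str() falls back to __repr__
print(v1 == v2)              # True: generated __eq__ compares field by field
print(Vector.__hash__)       # None: unhashable with the default settings
```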
Dataclasses should be preferred over plain dictionaries and TypedDicts if you want type hinting and are planning to pass the data around.
- Dataclasses can be used for creating Data Transfer Objects in Python.
- DTOs are objects that are used to pass data between functions and are not entities or basic types.
- If we forget to add the @dataclass decorator over the class then we will end up with a bunch of class variables instead of instance variables.
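A small sketch of what happens when the decorator is missing. PlainPoint and Point are hypothetical names used only for this comparison.

```python
from dataclasses import dataclass


class PlainPoint:
    # No @dataclass: x is only an annotation, y is a class variable.
    x: int
    y: int = 0


@dataclass
class Point:
    x: int
    y: int = 0


try:
    PlainPoint(1, 2)
    plain_error = None
except TypeError as exc:
    # No __init__ was generated, so the class accepts no arguments.
    plain_error = exc

p = Point(1, 2)
print(type(plain_error).__name__)  # TypeError
print(PlainPoint.y)                # 0, stored on the class itself
print(p)                           # Point(x=1, y=2)
```

Without the decorator nothing turns the annotations into constructor parameters, so the values live on the class (or nowhere at all, for fields without a default).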
Default values in non mutable objects
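As a rough illustration (the Settings class and its fields are assumptions, not from the original notes), immutable defaults such as str, int, bool and tuple can be assigned directly. Fields without a default have to come before fields with one.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical fields; all defaults below are immutable values.
    name: str                      # no default, so it must come first
    retries: int = 3
    verbose: bool = False
    tags: tuple[str, ...] = ()


a = Settings("first")
b = Settings("second", retries=5)

print(a)          # Settings(name='first', retries=3, verbose=False, tags=())
print(b.retries)  # 5
```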
Default values in mutable objects
    from dataclasses import dataclass, field


    def generate_id() -> str:
        # Placeholder ID generator.
        return ""


    @dataclass
    class Person:
        name: str

        # email_addresses: list[str] = []
        # With a plain list default every instance of Person would have access
        # to the same list, and @dataclass rejects it with a ValueError.
        email_addresses: list[str] = field(default_factory=list)

        # Having randomly generated values as default IDs.
        # init=False means we cannot set the ID while creating an instance.
        id: str = field(init=False, default_factory=generate_id)

        # Create values from other instance variables using __post_init__.
        # init=False keeps search_string out of the generated __init__.
        search_string: str = field(init=False)

        def __post_init__(self) -> None:
            self.search_string = self.name + self.id


    obj1 = Person(name="abc")
    obj2 = Person(name="abc", email_addresses=["firstname.lastname@example.org"])
Assigning a mutable object such as a list directly as a default value will lead to problems, because every instance would share the same object. Read more about Mutable Default Values
We pass functions to default_factory. list is also callable like a function, just like generate_id.
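A short sketch of the mutable default problem and the default_factory fix. PlainInbox, BadInbox and Inbox are hypothetical classes used only for the demonstration.

```python
from dataclasses import dataclass, field


class PlainInbox:
    # Plain class attribute: one list shared by every instance.
    messages: list[str] = []


one, two = PlainInbox(), PlainInbox()
one.messages.append("hello")
print(two.messages)  # ['hello'], both instances see the same list

# A dataclass refuses a list default at class creation time.
bad_error = None
try:
    @dataclass
    class BadInbox:
        messages: list[str] = []
except ValueError as exc:
    bad_error = exc
print(type(bad_error).__name__)  # ValueError


@dataclass
class Inbox:
    # default_factory is called once per instance, giving each its own list.
    messages: list[str] = field(default_factory=list)


first, second = Inbox(), Inbox()
first.messages.append("hello")
print(second.messages)  # []
```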
For protected variables we use _ and for private variables we use __ in front of the variable names.
- If we don't want to print a particular field when we print the object then we can use
some_variable: str = field(repr=False)
- We can freeze the dataclass using @dataclass(frozen=True).
- Frozen means once the dataclass is initialised it is read only.
- From Python 3.10 onwards we can use the following:
@dataclass(kw_only=True) - If we only want to initialise the class by providing keyword arguments.
@dataclass(slots=True) - Faster dataclasses: the class uses __slots__, which speeds up attribute access and reduces memory use.
Pydantic vs Dataclasses
We also have pydantic, which can be used like a dataclass and comes with validators.
- It does type checking at runtime.
Pydantic is an opinionated library built for parsing.
- By default pydantic will try to convert the values to the types mentioned in the class.
All the types are checked at runtime and type conversions take place at runtime.
Checking/converting things at runtime makes a lot of sense when the data we are reading might not be valid or be of the right format. This only makes sense for the specific task of parsing.
Using pydantic is a huge waste of resources and time if parsing is not the goal.
- If some data structure is internal to your application and your application is responsible for creating the data then it would be a huge waste of resources to do runtime checking.
It is better to use static type checking.
- Pydantic is not an alternative to dataclasses.
- It serves a very different purpose: parsing, such as parsing JSON from a web API.
- It is very good at what it does but it is not a general-purpose tool.
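The split between parsing at the boundary and plain typed data inside the application can be sketched with the standard library alone. This is not pydantic's API; User, parse_user and birthday are hypothetical names, and the manual conversion only stands in for the runtime checking a parsing library would do.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Internal data structure: trusted once built, checked statically.
    name: str
    age: int


def parse_user(raw: str) -> User:
    """Convert untrusted JSON once, at the edge of the application."""
    data = json.loads(raw)
    # Runtime conversion happens here and only here.
    return User(name=str(data["name"]), age=int(data["age"]))


def birthday(user: User) -> User:
    # Internal code relies on type hints; no runtime checks are repeated.
    return User(name=user.name, age=user.age + 1)


user = parse_user('{"name": "Ada", "age": "36"}')
print(user)            # User(name='Ada', age=36)
print(birthday(user))  # User(name='Ada', age=37)
```

Everything past parse_user works with an ordinary dataclass, so a static type checker can cover it without any runtime cost.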
Dataclasses comparison chart
- Which Python @dataclass is best? Feat. Pydantic, NamedTuple, attrs... - YouTube
- This Is Why Python Data Classes Are Awesome - YouTube
Last updated: 2022-10-27