Exploring AiiDA’s ORM architecture#
In this post, we’ll take a deep dive into the implementation of AiiDA’s Object-Relational Mapping (ORM) system. We’ll explore how AiiDA leverages SQLAlchemy to create a flexible ORM backend that abstracts database operations and separates concerns between the user-facing Python objects and the underlying database persistence.
Architecture overview: The multi-layer design#
AiiDA’s ORM follows a four-layer architecture that provides clean separation between the user interface, business logic, and data persistence:
┌─────────────────────┐
│ User interface │ ← Node (Python ORM class)
│ (orm/nodes) │
├─────────────────────┤
│ Abstract interface │ ← BackendNode (Abstract base class)
│ (implementation) │
├─────────────────────┤
│ ORM backend │ ← SqlaNode (SQLAlchemy implementation)
│ (psql_dos/sqlite) │
├─────────────────────┤
│ Database models │ ← DBNode (SQLAlchemy models)
│ (models) │
└─────────────────────┘
Each layer serves a distinct purpose:
User interface (
Node
): Provides a clean, Pythonic API that hides database complexityAbstract interface (
BackendNode
): Defines contracts that all backend implementations must followORM backend (
SqlaNode
): Backend connection to the different database systemsDatabase model (
DbNode
): Defines the actual table schemas and relationships using SQLAlchemy’s declarative approach
Importantly, you, the user, will typically not have to interact directly with database-specific code.
Instead, you can work with the high-level Node
class (and its child classes), while AiiDA automatically delegates database operations to the appropriate backend implementation.
The design further allows AiiDA to support multiple database backends (currently PostgreSQL and SQLite), while providing a unified Python interface.
So let’s start from the bottom and work our way up, shall we?
The database models: DbNode
#
AiiDA uses SQLAlchemy’s declarative approach to define database tables.
This means the database schema is defined using Python classes rather than raw SQL.
Here’s how the core
DbNode
model is constructed[1]:
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import relationship
from sqlalchemy.schema import Column, ForeignKey
from sqlalchemy.types import DateTime, Integer, String, Text
class DbNode(Base):
"""Database model to store data for :py:class:`aiida.orm.Node`."""
__tablename__ = 'db_dbnode'
# Primary key and unique identifier
id = Column(Integer, primary_key=True)
uuid = Column(UUID(as_uuid=True), default=get_new_uuid, nullable=False, unique=True)
# Node classification
node_type = Column(String(255), default='', nullable=False, index=True)
process_type = Column(String(255), index=True)
# Metadata
label = Column(String(255), nullable=False, default='', index=True)
description = Column(Text(), nullable=False, default='')
ctime = Column(DateTime(timezone=True), default=timezone.now, nullable=False, index=True)
mtime = Column(DateTime(timezone=True), default=timezone.now, onupdate=timezone.now, nullable=False, index=True)
# JSON storage for flexible data
attributes = Column(JSONB, default=dict)
extras = Column(JSONB, default=dict)
repository_metadata = Column(JSONB, nullable=False, default=dict)
# Foreign key relationships
dbcomputer_id = Column(Integer, ForeignKey('db_dbcomputer.id'), nullable=True, index=True)
user_id = Column(Integer, ForeignKey('db_dbuser.id'), nullable=False, index=True)
# SQLAlchemy relationships
dbcomputer = relationship('DbComputer', backref='dbnodes')
user = relationship('DbUser', backref='dbnodes')
The Base
class from which DbNode
inherits is obtained via SQLAlchemy’s declarative_base()
, which provides the metaclass and base functionality that allows Python classes to be automatically mapped to database tables:
Base = declarative_base(
cls=Model,
name='Model',
metadata=MetaData(naming_convention=dict(naming_convention))
)
This declarative approach means that table schemas, constraints, relationships, and indexes are all defined in Python code rather than separate SQL files, making the database structure self-documenting and version-controllable alongside the application logic.
Some of the key features of AiiDA’s SQLAlchemy models, that can already be seen from the DbNode
definition above, are:
Use of JSON(B) for flexibility: The use of PostgreSQL’s
JSONB
(and SQLite’sJSON
) type forattributes
,extras
, andrepository_metadata
provides schema flexibility while maintaining query performance. This is crucial for scientific computing where one can’t predict what data users will want to store. Traditional relational databases would require creating new columns for every new property, but JSON columns allow storing arbitrary structured data (given that it’s JSON-serializable).UUID-based identity: Each node has both an integer primary key (
id
in the table,pk
in the Python API[2]) for database efficiency and a UUID for global uniqueness and portability. The integerid
is user-friendly, fast for database joins and indexing, while the UUID allows nodes to be moved between different AiiDA installations without conflicts.Automatic timestamps: Creation and modification times are automatically managed through SQLAlchemy’s
default
andonupdate
parameters.Indexing: Important columns like
node_type
,process_type
, and timestamps are indexed for query performance. Without indexes, queries like “find all calculation nodes from last month” would be extremely slow. The indexes make these common queries fast even with millions of nodes.
The abstract interface: BackendNode
#
Above the database layer sits the abstract
BackendNode
class, which defines the interface contract that all database backend implementations must follow:
class BackendNode(BackendEntity, BackendEntityExtrasMixin, metaclass=abc.ABCMeta):
"""Backend implementation for the `Node` ORM class.
A node stores data input or output from a computation.
"""
@abc.abstractmethod
def clone(self: BackendNodeType) -> BackendNodeType:
"""Return an unstored clone of ourselves."""
@property
@abc.abstractmethod
def uuid(self) -> str:
"""Return the node UUID."""
@property
@abc.abstractmethod
def node_type(self) -> str:
"""Return the node type."""
# ... more abstract properties and methods
This abstract base class (ABC) ensures that regardless of which ORM backend is used, the same interface is available to higher-level code (currently, AiiDA uses only SQLAlchemy, but also Django ORM was supported in the past[3]).
It defines essential properties like uuid
, node_type
, process_type
, and methods for managing attributes, links, and storage operations.
The abstract class serves as a contract —any backend implementation must provide all these methods and properties.
The ORM backend: SqlaNode
#
The
SqlaNode
class bridges the abstract BackendNode
interface with the concrete SQLAlchemy models:
class SqlaNode(entities.SqlaModelEntity[models.DbNode], ExtrasMixin, BackendNode):
"""SQLA Node backend entity"""
MODEL_CLASS = models.DbNode
USER_CLASS = SqlaUser
COMPUTER_CLASS = SqlaComputer
LINK_CLASS = models.DbLink
def __init__(self, backend, node_type, user, computer=None, **kwargs):
"""Construct a new `BackendNode` instance wrapping a new `DbNode` instance."""
super().__init__(backend)
arguments = {
'node_type': node_type,
'user': user.bare_model,
'label': kwargs.get('label', ''),
'description': kwargs.get('description', ''),
}
if computer:
arguments['dbcomputer'] = computer.bare_model
self._model = sqla_utils.ModelWrapper(self.MODEL_CLASS(**arguments), backend)
# ... more properties and methods
def store(self, links=None, clean=True):
session = self.backend.get_session()
if clean:
self.clean_values()
session.add(self.model)
if links:
for link_triple in links:
self._add_link(*link_triple)
if not session.in_nested_transaction():
try:
session.commit()
except SQLAlchemyError:
session.rollback()
raise
else:
session.flush()
return self
SqlaNode
wraps a raw DbNode
SQLAlchemy model in a
ModelWrapper
that handles session management automatically.
This wrapping provides two levels of access: node.model
for the wrapped model with automatic session tracking, and node.bare_model
for direct access to the raw SQLAlchemy model when you need to bypass AiiDA’s management.
Beyond model wrapping, SqlaNode
is responsible for actual database storage via its store
method (see above).
The design further uses generic typing to ensure type safety and proper relationships between related objects like users and computers.
The user interface: Node
#
Finally, we reach the top level, the Node
class, which is the main interaction point for users.
The implementation uses composition to wrap an SqlaNode
instance, while providing a Pythonic interface to users:
class Node(Entity['BackendNode', NodeCollection], metaclass=AbstractNodeMeta):
"""Base class for all nodes in AiiDA."""
def __init__(self, backend=None, user=None, computer=None, extras=None, **kwargs):
backend = backend or get_manager().get_profile_storage()
user = user if user else backend.default_user
backend_entity = backend.nodes.create(
node_type=self.class_node_type,
user=user.backend_entity,
computer=computer.backend_entity if computer else None,
**kwargs
)
super().__init__(backend_entity)
The creation magic happens in backend.nodes.create()
which returns an SqlaNode
instance[4]
that gets stored in the backend_entity
attribute.
The Node
class then delegates all database operations to this contained SqlaNode
through the backend_entity
property.
From the user’s perspective, just a Node
is created—the backend selection and delegation are transparent.
Here’s how all of this looks like in action in a verdi shell
:
In [1]: n = Int(1).store() # of type: aiida.orm.nodes.data.int.Int
In [2]: n
Out[2]: <Int: uuid: 2247e32a-c802-449c-9246-35b74e066c3f (pk: 12) value: 1>
In [3]: n.backend_entity
Out[3]: <aiida.storage.psql_dos.orm.nodes.SqlaNode at 0x7d8ac6d53880>
In [4]: n.backend_entity.bare_model
Out[4]: <DbNode id=103522, uuid=UUID('70cd..., node_type='data.core..., process_type=None, label='', description='', ctime=datetime.d..., mtime=datetime.d..., attributes={'value': ..., extras={'_aiida_h..., repository_metadata={}, dbcomputer_id=None, user_id=1,>
In [5]: n.backend_entity.model
Out[5]: <aiida.storage.psql_dos.orm.utils.ModelWrapper at 0x7d8ac6c6fcd0>
Namespacing
AiiDA further uses a namespace pattern on the Node
class to organize functionality:
class Node(Entity['BackendNode', NodeCollection], metaclass=AbstractNodeMeta):
"""Base class for all nodes in AiiDA."""
# ... more properties and methods
@cached_property
def base(self) -> NodeBase:
"""Return the node base namespace."""
return NodeBase(self)
# ... more properties and methods
class NodeBase:
"""A namespace for node related functionality."""
@cached_property
def repository(self) -> 'NodeRepository':
"""Return the repository for this node."""
return NodeRepository(self._node)
@cached_property
def attributes(self) -> 'NodeAttributes':
"""Return an interface to interact with the attributes."""
return NodeAttributes(self._node)
@cached_property
def links(self) -> 'NodeLinks':
"""Return an interface to interact with the links."""
return NodeLinks(self._node)
This namespace pattern was introduced because having all properties and methods directly on the Node
class (as was the case in the past) made the API cluttered and prone to name conflicts.
The namespace approach further groups related functionality together, e.g.:
node.base.attributes -> NodeAttributes
- attribute managementnode.base.repository -> NodeRepository
- file repository operationsnode.base.links -> NodeLinks
- provenance links and the use of@cached_property
ensures that these namespace objects are created only once per node instance[5].
Honorable mentions#
Link management: modeling relationships#
AiiDA models relationships between nodes through a separate DbLink
table:
class DbLink(Base):
"""Database model to store links between nodes."""
__tablename__ = 'db_dblink'
id = Column(Integer, primary_key=True)
input_id = Column(Integer, ForeignKey('db_dbnode.id'), nullable=False, index=True)
output_id = Column(Integer, ForeignKey('db_dbnode.id', ondelete='CASCADE'), nullable=False, index=True)
input = relationship('DbNode', primaryjoin='DbLink.input_id == DbNode.id')
output = relationship('DbNode', primaryjoin='DbLink.output_id == DbNode.id')
label = Column(String(255), nullable=False, index=True)
type = Column(String(255), nullable=False, index=True)
Rather than storing relationships as simple foreign keys, each link is a first-class entity with its own metadata.
The label
field describes the specific role of the link (e.g., “structure”, “parameters”, “output_data”), while the type
field categorizes the kind of relationship (e.g., “input”, “output”, “call”).
This metadata enables complex provenance queries like “find all structures that were relaxed using PBE exchange-correlation functional.”
The ondelete='CASCADE'
on the output relationship ensures that when a node is deleted, all its incoming links are also removed, maintaining referential integrity.
Pydantic models: Modern serialization#
Recently,
pydantic
models were integrated into AiiDA’s ORM for modern data validation and serialization:
class Node(Entity['BackendNode', NodeCollection]):
"""Base class for all nodes in AiiDA with Pydantic model support."""
class Model(Entity.Model):
"""Pydantic model for Node serialization and validation."""
uuid: Optional[str] = MetadataField(None, description='The UUID of the node')
node_type: Optional[str] = MetadataField(None, description='The type of the node')
attributes: Optional[Dict[str, Any]] = MetadataField(
None,
description='The node attributes',
orm_to_model=lambda node, _: node.base.attributes.all,
is_subscriptable=True,
exclude_to_orm=True,
)
repository_content: Optional[dict[str, bytes]] = MetadataField(
None,
description='Dictionary of file repository content encoded as base64',
orm_to_model=lambda node, _: {
key: base64.encodebytes(content)
for key, content in node.base.repository.serialize_content().items()
},
exclude_to_orm=True,
)
def serialize(self, repository_path: Optional[pathlib.Path] = None) -> dict[str, Any]:
"""Serialize the entity instance to JSON."""
if repository_path is None:
repository_path = pathlib.Path(tempfile.mkdtemp()) / f'./aiida_serialization/{self.pk}/'
repository_path.mkdir(parents=True)
return self._to_model(repository_path).model_dump()
@classmethod
def from_serialized(cls, **kwargs: dict[str, Any]) -> 'Node':
"""Construct an entity instance from JSON serialized data."""
return cls._from_model(cls.Model(**kwargs))
The integration of pydantic
brings various additional features and advantages:
Automatic validation: Pydantic automatically validates data types and constraints based on type annotations
Serialization: ORM objects can be automatically converted to/from
JSON
for export/importAPI integration: Pydantic’s automatic serialization and validation capability allows for easy creation standard APIs (e.g., aiida-restapi) and integration with frameworks like FastAPI
The QueryBuilder#
The
QueryBuilder
is AiiDA’s main Python API to retrieve data from the database.
It provides a uniform, backend-agnostic interface:
# Simple node query
qb = QueryBuilder()
qb.append(Node, filters={'ctime': {'>': datetime(2025, 1, 1)}})
results = qb.all()
# Complex relationship query
qb = QueryBuilder()
qb.append(StructureData, tag='structure')
qb.append(CalcJobNode, with_incoming='structure', tag='calc')
qb.append(FolderData, with_incoming='calc', project=['*'])
The QueryBuilder
automatically handles several complex translation tasks behind the scenes.
Most importantly, it converts the high-level query syntax into the appropriate SQL statement (while respecting the specific dialect of the database backend), ensuring that one can write database-agnostic queries while still leveraging each backend’s specific capabilities.
For this, among other operations, it converts ORM classes like Node
to their corresponding database entities, translates relationship specifications such as with_incoming
and with_outgoing
into proper SQL joins, and automatically adds filters for node types and subtypes based on the classes specified in the query.
The case for NodeCollection
#
In addition to the QueryBuilder
, AiiDA also implements the NodeCollection
class which provides a clean interface for managing …wait for it… collections of nodes:
class NodeCollection(EntityCollection[NodeType], Generic[NodeType]):
"""The collection of nodes."""
def delete(self, pk: int) -> None:
"""Delete a Node from the collection with the given id"""
node = self.get(id=pk)
if node.base.links.get_incoming().all():
raise exceptions.InvalidOperation(f'cannot delete Node<{node.pk}> because it has incoming links')
if node.base.links.get_outgoing().all():
raise exceptions.InvalidOperation(f'cannot delete Node<{node.pk}> because it has outgoing links')
self._backend.nodes.delete(pk)
The collection pattern provides several benefits that complement the QueryBuilder
’s capabilities.
On one hand, while the QueryBuilder
is suitable for complex analytical queries and relationship traversal, collections can handle the simpler but essential task of managing individual entities efficiently.
Thus, collections enable efficient bulk operations and batch processing that would be cumbersome with multiple QueryBuilder
calls.
Finally, they also provide essential validation logic, such as preventing deletion of nodes with existing links to maintain provenance integrity, and again hide database-specific operations behind a clean interface.
Key Takeaways#
AiiDA’s ORM architecture provides the foundation for all derived data types, for both, data and processes. More specialized types like orm.Int
, orm.CalcJobNode
, and orm.SinglefileData
all build on Node
and the functionality it provides.
The implementation makes use of several important design principles:
Multi-layer abstraction: The multi-layer design achieves a clean separation and allows for a uniform API, while leveraging backend-specific optimizations.
Multi-database support: The architecture supports both PostgreSQL and SQLite backends through SQLAlchemy’s dialect system.
Use of
JSON
columns: JSON(B) columns provide schema flexibility without sacrificing query performance.Namespace organization: The namespace pattern on the user-facing
Node
class keeps the API clean by structuring it in distinct, nested categories.Modern integration: Pydantic model integration brings modern Python data validation and serialization capabilities to the ORM.
Universal query interface: The
QueryBuilder
allows for identical query syntax across database backends.Collection patterns: The collection system provides a consistent, intuitive interface for data access that complements the
QueryBuilder
.
This architecture allows AiiDA to provide a powerful and flexible ORM that supports different database backends while maintaining a consistent user experience.
Footnotes