Data Nodes
The primitive data type is the Node. It is a container for arbitrary user-supplied data. The API implementation takes care of storing node objects in the database and provides a public web interface for building client applications. Whenever a node is added, a pub/sub event is sent so that client services can take appropriate action. This is typically the role of orchestrators: scheduling tasks in response to particular events.
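As a rough illustration of this flow, the sketch below uses a toy in-process event bus. The `EventBus` and `ToyApi` classes, the `"node"` channel name and the event fields are all assumptions made for the example, not the actual API or its event format.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-process pub/sub bus (hypothetical stand-in for the real one)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: dict) -> None:
        # Deliver the event to every handler registered on this channel.
        for handler in self._subscribers[channel]:
            handler(event)


class ToyApi:
    """Stores nodes in a dict and publishes an event for each new node."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.nodes: Dict[str, dict] = {}

    def add_node(self, node_id: str, name: str) -> dict:
        node = {"id": node_id, "name": name}
        self.nodes[node_id] = node
        # Hypothetical event shape: an operation name and the node id.
        self.bus.publish("node", {"op": "created", "id": node_id})
        return node


scheduled: List[str] = []


def orchestrator(event: dict) -> None:
    # React to new nodes by "scheduling" a task (here, just recording it).
    if event["op"] == "created":
        scheduled.append(f"process-{event['id']}")


bus = EventBus()
bus.subscribe("node", orchestrator)
api = ToyApi(bus)
api.add_node("n1", "checkout")
```

The orchestrator never talks to the code that added the node. It only reacts to the event, which is what keeps the two sides loosely coupled.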
Each node may have a parent to form a directed tree. A node with no parent is called a root. There can of course be many root nodes in the same database, each with its own arborescence. An interesting property is that every node has a single path to its root, which can be found by following the parent references from one node to the next.
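The walk up to the root can be sketched as follows. This is a minimal example using an in-memory dictionary in place of the database, with made-up identifiers and `id` / `parent` keys chosen for illustration. It uses a loop rather than recursion, which visits the same nodes.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the database: node id -> node document.
NODES: Dict[str, dict] = {
    "a1": {"id": "a1", "parent": None, "name": "checkout"},
    "b2": {"id": "b2", "parent": "a1", "name": "build"},
    "c3": {"id": "c3", "parent": "b2", "name": "test"},
}


def ancestors(node_id: str, nodes: Dict[str, dict]) -> List[str]:
    """Return the ids from the given node up to its root, inclusive."""
    chain: List[str] = []
    current: Optional[str] = node_id
    while current is not None:
        chain.append(current)
        # One lookup per level: follow the parent reference.
        current = nodes[current]["parent"]
    return chain


def root_of(node_id: str, nodes: Dict[str, dict]) -> str:
    """Return the id of the root node of the tree containing node_id."""
    return ancestors(node_id, nodes)[-1]
```

For instance, `ancestors("c3", NODES)` visits `c3`, `b2` and `a1` in that order, one lookup per level of the tree.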
Note: Node objects are read-only. Once added to the database, they can't be updated. However, child nodes can be added to grow the related data.
A node may also contain a Task object. This is not required, as not all node objects are created by tasks, in the same way that tasks don't always create nodes. These are loosely coupled concepts. Still, it's a common scenario, and as such it's useful to have traceability between nodes and the tasks that created them whenever applicable.
Object Model
Node objects follow a model defined by the API. The .data field holds
arbitrary content defined by users, the only constraint being that it's a
dictionary whose keys are all strings. Similarly, the .artifacts field is
a dictionary that maps file names to user-provided URLs for accessing them.
All the other fields are managed directly by the API and play a role in how the nodes are used, following certain rules.
Here's a slightly simplified list of the fields found in the Node model:
This is the unique identifier of the Node, basically a MongoDB ObjectId. Please
note that it is only unique within the database of each individual API instance
and not universally unique, unlike Task objects which use UUIDs. To refer to a
Node object outside the API, you may use its URL,
e.g. https://api-hostname.com/latest/node/ID.
To form a tree, each node may have a parent node. This field holds the
parent node identifier. Since it's an internal one, parents and therefore
entire trees have to be contained within a single database. To have a parent
in another API instance or database, the .data field may be used with some
logic on the client side. Additional features may be built into the API to
facilitate this in future versions, say with .parent.api and .parent.node
fields to follow a more federated architecture. Similarly, separate trees in
the same database may be linked via the .data field with some logic in the
client application - for example, to reference a previous version or iteration
of the same node as produced by repeated tasks.
Each node must have a name. This makes it possible to identify it in the tree
other than by its database identifier. The only constraint is that it must be
a string, so for anonymous nodes the identifier may be used again, or just
node or banana. However, as with files and directories in a file system,
meaningful names are important: users will typically be interacting with the
node names directly via a web dashboard or command-line tools.
Since each node has a name and may be in a tree, each node also has a path.
This can be worked out by collecting the names of all the parent nodes
recursively up to the root node which has no parent. However, it's a costly
operation with lots of database lookups and the path is a common way for users
to retrieve nodes in a tree. So instead of computing it many times, it's
stored in the .path field. The current model uses a list of strings. Another
popular approach is the dotted syntax, but this would add some constraints on
the node names and require lots of string parsing operations outside the
database engine.
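One way to picture the stored path is to derive it once, when the node is added, from the parent's own stored path. The sketch below assumes this approach and uses an in-memory dictionary with hypothetical `add_node` and `find_by_path` helpers; the real API may populate the field differently.

```python
from typing import Dict, List, Optional


def add_node(nodes: Dict[str, dict], node_id: str, name: str,
             parent: Optional[str] = None) -> dict:
    """Store a node with its path computed once from the parent's path."""
    # A single lookup of the parent replaces the walk up to the root.
    parent_path: List[str] = nodes[parent]["path"] if parent else []
    node = {
        "id": node_id,
        "name": name,
        "parent": parent,
        "path": parent_path + [name],  # list of strings, root first
    }
    nodes[node_id] = node
    return node


def find_by_path(nodes: Dict[str, dict], path: List[str]) -> List[dict]:
    """Return every node whose stored path matches exactly."""
    return [node for node in nodes.values() if node["path"] == path]


db: Dict[str, dict] = {}
add_node(db, "a1", "checkout")
add_node(db, "b2", "build", parent="a1")
add_node(db, "c3", "test", parent="b2")
```

Because the path is a list, a node name may contain dots or slashes without any escaping, which the dotted syntax would not allow.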
artifacts: Dict[str, AnyHttpUrl] = Field(description="Artifacts associated with the node (binaries, logs...)")
Artifacts are files or, generally speaking, any standalone piece of data that can be retrieved over a stable URL. These will usually be logs from the task that ran and produced the node, or some binary files it generated. Ultimately it's up to the user to upload them to a third-party storage service. Each artifact has a key in the dictionary: a string naming what is to be found at its URL.
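To give an idea of what a valid artifacts dictionary looks like, here is a small check written with the standard library only. The model itself relies on the `AnyHttpUrl` type shown above; the `check_artifacts` helper and the example URLs are assumptions made for illustration and only approximate that validation.

```python
from typing import Dict
from urllib.parse import urlparse


def check_artifacts(artifacts: Dict[str, str]) -> Dict[str, str]:
    """Raise if a key is not a string or a value is not an HTTP(S) URL."""
    for key, url in artifacts.items():
        if not isinstance(key, str):
            raise TypeError(f"artifact key {key!r} is not a string")
        if not isinstance(url, str):
            raise TypeError(f"artifact URL for {key!r} is not a string")
        parsed = urlparse(url)
        # Require an http/https scheme and a host name.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"artifact {key!r} has an invalid URL: {url!r}")
    return artifacts


# Hypothetical artifacts uploaded by the user to a storage service.
artifacts = check_artifacts({
    "build.log": "https://storage.example.com/run-1/build.log",
    "kernel.bin": "https://storage.example.com/run-1/kernel.bin",
})
```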
Users may also provide some arbitrary object data in the form of a dictionary.
The keys need to be strings, and the values can be any object accepted by the
underlying database engine (e.g. MongoDB). This will usually be a mix of
primitive types, lists, dictionaries and some slightly more advanced ones such
as timestamps. No schema is imposed on this data in a classic NoSQL document
database approach, except if one is supplied by the user for enhanced
validation - see the .kind field below.
Users may submit "kinds" of nodes, with a schema to describe the .data field -
see Issue #7 about the ongoing design of this feature. If the .kind field is
set, the idea is then to look up a previously registered schema with this name
and use it to validate the content of .data.
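Since this feature is still being designed, the following is only a sketch of the idea and not the actual mechanism. It assumes a hypothetical registry, `SCHEMAS`, that maps a kind name to the expected type of each field, and a `validate_data` helper that returns a list of problems found.

```python
from typing import Any, Dict, List, Optional

# Hypothetical registry of previously submitted kinds: field name -> type.
SCHEMAS: Dict[str, Dict[str, type]] = {
    "weather-report": {"station": str, "temperature": float},
}


def validate_data(kind: Optional[str], data: Dict[Any, Any]) -> List[str]:
    """Return a list of validation errors, empty if the data is accepted."""
    # Keys must always be strings, whether or not a kind is set.
    errors = [f"key {key!r} is not a string"
              for key in data if not isinstance(key, str)]
    if kind is None:
        return errors  # no schema imposed on the data
    schema = SCHEMAS.get(kind)
    if schema is None:
        return errors + [f"unknown kind {kind!r}"]
    for field, expected in schema.items():
        if field not in data:
            errors.append(f"missing field {field!r}")
        elif not isinstance(data[field], expected):
            errors.append(f"field {field!r} should be {expected.__name__}")
    return errors
```

With no kind set, any dictionary with string keys passes. With `kind="weather-report"`, a missing `temperature` is reported as an error.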
If the node was created by a task, it should become customary practice for the
task to store itself in the .task field as an embedded object. This is
primarily for client-side usage, to keep track of which tasks were run and how
they relate to the nodes.
Each node belongs to a user. This is useful when searching for nodes to avoid
getting data from other users. It's worth mentioning that parent nodes
may belong to a different user since making a node a parent doesn't require
changing the parent itself - only setting its identifier in the child node's
.parent field. This is useful for example when orchestrating tasks that need
to be run when other users have added a node. Say, if a bot user is sending
meteorological data reports, you may run an orchestrator with your personal
user account to run tasks that will process them and generate additional child
nodes, with images as artifacts for instance, using the bot's report node as
their parent.
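The bot scenario can be sketched with a few in-memory nodes. The `owner` key, the user names and the two helper functions are assumptions made for the example; the point is only that a child node may reference a parent owned by someone else.

```python
from typing import Dict, List, Optional

# Hypothetical nodes: two reports from a bot, one image added by "alice".
NODES: List[Dict[str, Optional[str]]] = [
    {"id": "r1", "name": "report", "parent": None, "owner": "bot"},
    {"id": "r2", "name": "report", "parent": None, "owner": "bot"},
    {"id": "i1", "name": "image", "parent": "r1", "owner": "alice"},
]


def owned_by(nodes: List[dict], owner: str) -> List[str]:
    """Return the ids of the nodes belonging to the given user."""
    return [node["id"] for node in nodes if node["owner"] == owner]


def children_of(nodes: List[dict], parent_id: str,
                owner: Optional[str] = None) -> List[str]:
    """Return the ids of child nodes, optionally restricted to one owner."""
    return [
        node["id"] for node in nodes
        if node["parent"] == parent_id
        and (owner is None or node["owner"] == owner)
    ]
```

Here the bot's report `r1` is left untouched: only the child node `i1` records the relationship, through its parent reference.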