Task Scheduling
While Nodes are all about data, Tasks are for actions. The definition of the actual work to accomplish is entirely up to the user; the API only provides the mechanism that enables orchestration.
Tasks are essentially transitory objects. They first get created by sending a
message to a particular queue via the /task/schedule endpoint and they last
until they're done. During this time, they are stored internally in Redis and
may optionally be saved persistently in a Node object by the user.
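As a sketch, scheduling a task could boil down to one HTTP call; the payload shape and field names below are assumptions for illustration, not the documented schema:

```python
import json
import urllib.request

# Hypothetical payload for the /task/schedule endpoint: the queue name
# selects which Scheduler consumes the message, and the attributes
# describe how the task should be run (field names are assumed).
payload = {
    "queue": "gpu-workers",
    "attributes": {"priority": "high", "variant": "fast"},
}

def build_request(base_url: str) -> urllib.request.Request:
    """Build (but don't send) a POST request to the assumed endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/task/schedule",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("http://localhost:8000")
```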
On the receiving end of the message queues are the Schedulers: consumer services in charge of executing the tasks and completing them. As each Scheduler has particular characteristics, with access to specific runtime environments and different provisioning methods, it needs to be registered with the API using a dedicated queue. There can be multiple identical instances of a Scheduler waiting for messages on the same queue, as each message is delivered to only one of them. This is a typical design pattern in distributed systems that makes it easy to scale up or down dynamically.
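This competing-consumers pattern can be sketched with an in-process queue standing in for the message broker; each message is taken by exactly one of the identical consumer threads (the names here are illustrative):

```python
import queue
import threading

broker = queue.Queue()   # stands in for a message queue
results = []
lock = threading.Lock()

def scheduler_instance(name: str) -> None:
    """One of several identical Scheduler instances on the same queue."""
    while True:
        task = broker.get()
        if task is None:  # sentinel telling the worker to stop
            broker.task_done()
            return
        with lock:
            results.append((name, task))
        broker.task_done()

workers = [
    threading.Thread(target=scheduler_instance, args=(f"scheduler-{i}",))
    for i in range(3)
]
for w in workers:
    w.start()

# Publish ten task messages, then one stop sentinel per worker.
for task_id in range(10):
    broker.put(task_id)
for _ in workers:
    broker.put(None)
for w in workers:
    w.join()

# Each message was delivered to exactly one scheduler instance.
delivered = sorted(task for _, task in results)
```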
Object Model
Task objects follow a model defined by the API. Like Node objects, Tasks are read-only: their state changes only as they transition from a message queue to the in-memory database and finally get deleted. Any application data produced by the task should be stored in Node objects.
Let's take a closer look at each field of the Task model:
The task identifier is a classic UUID. That means it won't collide with identifiers generated by other API instances, and it remains agnostic of whatever database engine it may be stored in.
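Generating such an identifier takes one line with the standard library; a random (version 4) UUID is collision-safe across independent API instances without any coordination:

```python
import uuid

task_id = uuid.uuid4()  # random UUID, unique without coordination
canonical = str(task_id)  # 36-character string form, e.g. "xxxxxxxx-xxxx-..."
```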
Attributes are arbitrary, a bit like Node.data. They are set by the user creating the task to describe how it should be run. For example, a Scheduler receiving the task as a message could use its attributes to set the priority, provision a particular type of resource, or run a certain variant of the task.
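As a sketch, a Scheduler might derive its decisions from the attributes like this (the attribute keys are illustrative, not part of the model):

```python
# Hypothetical attributes set by the user when scheduling the task.
attributes = {"priority": 5, "gpu": True, "variant": "full-precision"}

def plan_execution(attrs: dict) -> dict:
    """Derive scheduling decisions from arbitrary task attributes."""
    return {
        # Higher number = more urgent; default to a low priority.
        "priority": attrs.get("priority", 0),
        # Provision a particular type of resource when requested.
        "resource": "gpu-node" if attrs.get("gpu") else "cpu-node",
        # Run a certain variant of the task.
        "variant": attrs.get("variant", "default"),
    }

plan = plan_execution(attributes)
```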
The task also records the name of the message queue where it was posted. In most cases, this won't be used by the Scheduler, since it's already listening on that queue. It's mostly useful for special cases, such as when a task failed to get accepted by a Scheduler, or after the fact, when the task is done and stored in a Node, to keep a trace of where it was run.
timeout: datetime = Field(
    default_factory=DefaultTimeout(hours=6).get_timeout,
    description="Task expiry timestamp",
)
If a task is still not done (i.e. neither aborted nor complete) when the timeout is reached, it gets removed automatically by the Timeout daemon running on the server side of the API.
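One way `DefaultTimeout` could be implemented (a sketch of the idea, not the actual class) is as a small factory whose `get_timeout` method returns a timestamp offset from the current time, evaluated at task-creation time:

```python
from datetime import datetime, timedelta, timezone

class DefaultTimeout:
    """Sketch of a default-timeout factory: expiry = now + offset."""

    def __init__(self, hours: int = 6) -> None:
        self.delta = timedelta(hours=hours)

    def get_timeout(self) -> datetime:
        # Evaluated when the task is created, so each task gets
        # its own expiry timestamp rather than a shared constant.
        return datetime.now(timezone.utc) + self.delta

expiry = DefaultTimeout(hours=6).get_timeout()
```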
Runtime Environments
Work in progress
The Scheduler services are in charge of receiving tasks from message queues and scheduling them in concrete runtime environments. Those are based on the Runtime abstract class; currently only Inline is available, which runs tasks directly as Python methods in the current process. Arbitrary runtimes can be implemented, such as Docker, Kubernetes, on-demand VMs and various kinds of hardware infrastructure.
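The Runtime abstraction could look roughly like this (a sketch; the real interface may differ), with Inline simply invoking the task in the current process:

```python
from abc import ABC, abstractmethod
from typing import Any, Callable

class Runtime(ABC):
    """Abstract runtime environment in which a Scheduler executes tasks."""

    @abstractmethod
    def run(self, task: Callable[..., Any], **attributes: Any) -> Any:
        """Execute the task; concrete subclasses decide how and where."""

class Inline(Runtime):
    """Run the task directly as a Python call in the current process."""

    def run(self, task: Callable[..., Any], **attributes: Any) -> Any:
        return task(**attributes)

# A Docker- or Kubernetes-based Runtime would instead package the task
# and submit it to the corresponding infrastructure.
result = Inline().run(lambda x: x * 2, x=21)
```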