Glossary
Here is an alphabetically sorted list to help you understand every term you might encounter in the ArmoniK project:
ActiveMQ
Open source message broker written in Java used as the tasks queue.
For more information, check ActiveMQ documentation
CLA
Contributor License Agreement, Contribution agreement in the form of a license that everyone contributing to a given project must sign. One CLA is to be signed per repository.
You may read the CLA here
Client
User-developed software that communicates with the ArmoniK Control Plane to submit a list of tasks to be executed (by one or several workers) and retrieves results and error.
Compute Plane
Agreed term designating the set of compute pods, i.e., the pods running a Scheduling Agent + Worker pair within the Kubernetes cluster.
Container
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. (see Docker documentation)
Control Plane
Agreed term designating two different things depending on the context. Kubernetes and ArmoniK each have a Control Plane.
- The Kubernetes Control Plane is a set of components that manage global decision-making for the cluster, as well as the detection and management of cluster events.
- the ArmoniK Control Plane is a set of applications that act as a bridge between the client and the Compute Plane. In particular, it handles communications between certain components such as the queue, Redis and MongoDB.
Data dependency
Input data for a given task that depends on another unique task. Data dependencies formalize dependencies between tasks.
Data Plane
Expression designating the set of software components running the various storage and database systems within ArmoniK.
Fluentbit
Log and metrics monitoring tool, optimized for scalable environments. For more information, check Fluentbit's documentation
HPA
Horizontal Pod Autoscaler : a Kubernetes module to automatically update the amount of resources to allocate based on the current workload.
For more information, check Kubernetes documentation about HPA
Grafana
Data visualization web application. Used to monitor resources and statistics in ArmoniK
For more information, check Grafana's documentation
gRPC
Open source framework capable of running on different platforms, computer or server, and can communicate between these different environments . Remote Procedure Call was made by Google originally.
For more information, check gRPC's website
Ingress
API object that manages external access to the services in a cluster, typically HTTP. Ingress may provide load balancing, SSL termination and name-based virtual hosting.
For more information, check Kubernetes documentation about Ingress
KEDA
Kubernetes Event-driven Autoscaler : this component exposes custom metrics external to Kubernetes. The HPA can have accesss to it
For more information, check KEDA's documentation
Kubernetes
Kubernetes is an open source container orchestration engine for automating deployment, scaling and management of containerized applications. The open source project is hosted by the Cloud Native Computing Foundation (CNCF)
For more information, check Kubernetes documentation
MongoDB
MongoDB is a document database designed for ease of application development and scaling. For more information, check MongoDB's documentation.
Node
In the context of Kubernetes, a node is a machine that runs containerized workloads as part of a Kubernetes cluster. A node can be a physical or virtual machine hosted in the cloud or on-premise.
NuGet
The name can be used to refer to the package manager using the .NET framework or the packages themselves. These packages contain code from other developers that you may download and use. For more information, check the NuGet documentation
Partition
Logical segmentation of the Kubernetes cluster's pool of machines to distribute workloads according to usage. This feature is provided and handled by ArmoniK.
Payload
Input data for a task that does not depend on any other task. This is the actual binary input of a task.
Pod
Pods are the smallest deployable units of computing that one can create and manage in Kubernetes. A Pod is a group of one or more containers, with shared storage and network resources and a specification for how to run the containers. A Pod's contents are always co-located and co-scheduled and run in a shared context. A Pod models an application-specific "logical host": it contains one or more application containers which are relatively tightly coupled (see Kubernetes documentation).
Polling agent
Former name of the scheduling agent.
Prometheus
Toolkit that collects and stores time series data from the different systems. These data can be visualized with Grafana for monitoring. For more information, check Prometheus documentation
Redis
Redis is an open source (BSD licensed), in-memory data structure store used as a database, cache, message broker and streaming engine. Redis provides data structures such as strings, hashes, lists, sets and sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster (see Redis documentation).
In ArmoniK, Redis is used as a key-value cache for task data (such as payloads and results).
Scheduling agent
Containerized software cohabiting with a worker within a pod, running a specific algorithm to determine which tasks "its" worker (the one with which it shares the pod) should perform. It also manages all interactions between the worker and the databases (retrieving/saving data, creating new tasks, etc.), as well as managing worker errors and retrying/resubmitting failed tasks when necessary. A scheduling agent, like a worker, exists within a single partition.
Session
A session is a logical container for tasks and associated data (task statut, results, errors, etc). Every task is submitted within a session. An existing session can be resumed to retrieve data or submit new tasks. When a session is cancelled, all associated executions still in progress are interrupted.
Seq
Logs aggregator software. Used in ArmoniK to identify and aggregate errors and information from the different components.
For more information, check Seq's documentation
Submitter
Containerized software in charge of submitting tasks, i.e., writing the corresponding data to the various databases (queue, Redis and MongoDB).
Task
Atomic computation taking one or several input data and outputting one or several results. A task is launched by a client and processed by a worker. In ArmoniK, tasks cannot communicate with each other. They can, however, depend on each other via their input/output data, known as data dependency.
Worker
User-developed containerized software capable of performing one or several tasks depending on its implementation. A worker can simply take input data and perform calculations on it to return a result. A worker can also submit new tasks that will be self-performed, or by different workers, other instances of itself.
Workload
Set of computational tasks