Glossary

Here is an alphabetically sorted list to help you understand every term you might encounter in the ArmoniK project:

ActiveMQ

Open source message broker, written in Java, used in ArmoniK as the task queue.

For more information, check the ActiveMQ documentation.

CLA

Contributor License Agreement: a contribution agreement in the form of a license that everyone contributing to a given project must sign. One CLA must be signed per repository.

You may read the CLA here

Client

User-developed software that communicates with the ArmoniK Control Plane to submit a list of tasks to be executed (by one or several workers) and to retrieve results and errors.

Compute Plane

Agreed term designating the set of compute pods, i.e., the pods running a Scheduling Agent + Worker pair within the Kubernetes cluster.

Container

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. (see Docker documentation)

Control Plane

Agreed term designating two different things depending on the context. Kubernetes and ArmoniK each have a Control Plane.

  • The Kubernetes Control Plane is a set of components that manage global decision-making for the cluster, as well as the detection and management of cluster events.
  • The ArmoniK Control Plane is a set of applications that act as a bridge between the client and the Compute Plane. In particular, it handles communications between certain components such as the queue, Redis and MongoDB.

Data dependency

Input data for a given task that depends on another unique task. Data dependencies formalize dependencies between tasks.

Data Plane

Expression designating the set of software components running the various storage and database systems within ArmoniK.

Fluentbit

Log and metrics monitoring tool, optimized for scalable environments. For more information, check Fluentbit's documentation.

HPA

Horizontal Pod Autoscaler: a Kubernetes component that automatically adjusts the number of pod replicas allocated to a workload based on the current load.

For more information, check the Kubernetes documentation about HPA.
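
As a rough illustration of the scaling decision, the Kubernetes documentation describes the core HPA rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies that rule in Python. The function name, the min/max bounds and the sample numbers are assumptions made for illustration; the real controller also applies a tolerance, stabilization windows and pod readiness checks, which are omitted here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified HPA rule: scale replicas in proportion to metric / target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds (hypothetical defaults).
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 pods, each seeing 30 queued tasks on average, against a target of 10.
    print(desired_replicas(4, current_metric=30, target_metric=10, max_replicas=20))  # prints 12
```

With the default upper bound of 10, the same call would be capped at 10 replicas.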

Grafana

Data visualization web application, used to monitor resources and statistics in ArmoniK.

For more information, check Grafana's documentation.

gRPC

Open source Remote Procedure Call (RPC) framework, originally developed by Google, that runs on different platforms (computers or servers) and lets applications in these different environments communicate with each other.

For more information, check gRPC's website.

Ingress

API object that manages external access to the services in a cluster, typically HTTP. Ingress may provide load balancing, SSL termination and name-based virtual hosting.

For more information, check the Kubernetes documentation about Ingress.

KEDA

Kubernetes Event-driven Autoscaler: this component exposes custom metrics external to Kubernetes, which the HPA can then access.

For more information, check KEDA's documentation.

Kubernetes

Kubernetes is an open source container orchestration engine for automating deployment, scaling and management of containerized applications. The open source project is hosted by the Cloud Native Computing Foundation (CNCF).

For more information, check the Kubernetes documentation.

MongoDB

MongoDB is a document database designed for ease of application development and scaling. For more information, check MongoDB's documentation.

Node

In the context of Kubernetes, a node is a machine that runs containerized workloads as part of a Kubernetes cluster. A node can be a physical or virtual machine, hosted in the cloud or on-premises.

NuGet

The name can refer either to the package manager for .NET or to the packages themselves. These packages contain code from other developers that you may download and use. For more information, check the NuGet documentation.

Partition

Logical segmentation of the Kubernetes cluster's pool of machines to distribute workloads according to usage. This feature is provided and handled by ArmoniK.

Payload

Input data for a task that does not depend on any other task. This is the actual binary input of a task.

Pod

Pods are the smallest deployable units of computing that one can create and manage in Kubernetes. A Pod is a group of one or more containers, with shared storage and network resources and a specification for how to run the containers. A Pod's contents are always co-located and co-scheduled and run in a shared context. A Pod models an application-specific "logical host": it contains one or more application containers which are relatively tightly coupled (see Kubernetes documentation).

Polling agent

Former name of the scheduling agent.

Prometheus

Toolkit that collects and stores time series data from the different systems. This data can be visualized with Grafana for monitoring. For more information, check the Prometheus documentation.

Redis

Redis is an open source (BSD licensed), in-memory data structure store used as a database, cache, message broker and streaming engine. Redis provides data structures such as strings, hashes, lists, sets and sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions and different levels of on-disk persistence and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster (see Redis documentation).

In ArmoniK, Redis is used as a key-value cache for task data (such as payloads and results).

Scheduling agent

Containerized software cohabiting with a worker within a pod, running a specific algorithm to determine which tasks "its" worker (the one with which it shares the pod) should perform. It also manages all interactions between the worker and the databases (retrieving/saving data, creating new tasks, etc.), as well as managing worker errors and retrying/resubmitting failed tasks when necessary. A scheduling agent, like a worker, exists within a single partition.
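
The retry behaviour mentioned above can be pictured with a minimal Python sketch. This is not the ArmoniK scheduling agent: `run_with_retries`, `make_flaky` and the limit of three attempts are hypothetical names and values chosen only to illustrate the idea of catching a worker error and running the task again.

```python
from typing import Callable, Optional


def run_with_retries(task: Callable[[], str], max_attempts: int = 3) -> Optional[str]:
    """Run `task`, retrying on failure; return its result or None if all attempts fail."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:  # a worker error, in this simplified model
            print(f"attempt {attempt} failed: {exc}")
    return None


def make_flaky(failures: int) -> Callable[[], str]:
    """Build a hypothetical task that fails `failures` times before succeeding."""
    state = {"calls": 0}

    def task() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("worker error")
        return "result"

    return task


if __name__ == "__main__":
    print(run_with_retries(make_flaky(2)))  # succeeds on the third attempt
    print(run_with_retries(make_flaky(5)))  # gives up after three attempts
```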

Session

A session is a logical container for tasks and associated data (task status, results, errors, etc.). Every task is submitted within a session. An existing session can be resumed to retrieve data or submit new tasks. When a session is cancelled, all associated executions still in progress are interrupted.
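
To make the lifecycle concrete, here is a small in-memory model of a session in Python. It is only a sketch under stated assumptions: the `Session` class, the `Status` values and the method names are hypothetical and do not correspond to the ArmoniK API. It shows tasks being submitted within a session and running tasks being interrupted when the session is cancelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Status(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


@dataclass
class Session:
    """Hypothetical in-memory model of a session; not the ArmoniK API."""
    session_id: str
    tasks: Dict[str, Status] = field(default_factory=dict)

    def submit(self, task_id: str) -> None:
        # Every task is submitted within a session.
        self.tasks[task_id] = Status.RUNNING

    def complete(self, task_id: str) -> None:
        if self.tasks[task_id] is Status.RUNNING:
            self.tasks[task_id] = Status.COMPLETED

    def cancel(self) -> None:
        # Only executions still in progress are interrupted.
        for task_id, status in self.tasks.items():
            if status is Status.RUNNING:
                self.tasks[task_id] = Status.CANCELLED


if __name__ == "__main__":
    session = Session("s1")
    session.submit("a")
    session.submit("b")
    session.complete("a")
    session.cancel()
    print(session.tasks["a"].value, session.tasks["b"].value)  # completed cancelled
```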

Seq

Log aggregation software. Used in ArmoniK to identify and aggregate errors and information from the different components.

For more information, check Seq's documentation.

Submitter

Containerized software in charge of submitting tasks, i.e., writing the corresponding data to the various databases (queue, Redis and MongoDB).

Task

Atomic computation taking one or several input data and outputting one or several results. A task is launched by a client and processed by a worker. In ArmoniK, tasks cannot communicate with each other. They can, however, depend on each other via their input/output data; such links are known as data dependencies.
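
One way to picture data dependencies is as a directed acyclic graph in which a task becomes runnable once the tasks producing its inputs have finished. The Python sketch below uses the standard library `graphlib.TopologicalSorter` (Python 3.9+) for this; the task names and the `execution_waves` helper are hypothetical, and this is not how ArmoniK itself schedules tasks.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical task graph: each task maps to the tasks whose results it consumes.
dependencies: Dict[str, Set[str]] = {
    "load": set(),                # only has a payload, no data dependency
    "square": {"load"},
    "double": {"load"},
    "sum": {"square", "double"},
}


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into successive waves; tasks in one wave are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose inputs are available
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock dependants
    return waves


if __name__ == "__main__":
    print(execution_waves(dependencies))
    # [['load'], ['double', 'square'], ['sum']]
```

Tasks in the same wave do not depend on each other, so they could be processed by different workers at the same time.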

Worker

User-developed containerized software capable of performing one or several tasks depending on its implementation. A worker can simply take input data and perform calculations on it to return a result. A worker can also submit new tasks, which will be performed by itself, by different workers, or by other instances of itself.

Workload

Set of computational tasks.