Exploring Cloud Native projects in CNCF Sandbox. Part 6: 9 arrivals of Spring 2025


The second batch of Open Source projects introduced to the CNCF Sandbox in 2025 comprises 9 projects accepted from March to May. They have a lot to do with two topics: AI/ML and alternative approaches and mechanisms for running various workloads. Another prominent feature of this batch is a shortened path to CNCF acceptance for most projects, including a unique, less-than-half-year timeframe for one of them.

Let’s see what all this software brings to the wider Cloud Native community. As usual, the projects below are grouped by their formal CNCF categories, starting with the categories that have the most projects.

Automation & Configuration

1. KitOps

KitOps packages AI/ML models into all-in-one bundles called ModelKits. They include code, model weights, datasets, prompts, environment configurations, and other data that are essential to reproduce, test, and deploy the model. KitOps can work with various types of models, such as LLMs, multimodal models, predictive models, and others.

Each ModelKit is defined in a YAML-formatted manifest called a Kitfile, which describes the model, code (notebooks and scripts), datasets (for training, validation, etc.), prompts, and documentation. Once the Kitfile is ready, you can use the kit CLI to create, manage, run, and deploy your models. You can also run this tool’s commands in CI/CD pipelines, and there are tutorials for popular systems, including Argo CD, Dagger, and GitHub Actions. Another option for managing ModelKits is a Python library.

All assets (models, datasets, configurations, etc.) are stored in separate OCI layers with their own SHA-256 digests, and any OCI 1.1+ registry can be used for storing them.
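To show what per-asset layers with their own SHA-256 digests look like, here is a minimal Go sketch. It assumes each Kitfile asset is read as raw bytes and described the way OCI content descriptors are, with a media type, a "sha256:<hex>" digest, and a size. The Asset type and the media-type strings are hypothetical placeholders. They are not KitOps' actual media types or code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Descriptor mirrors the three core fields of an OCI content descriptor.
type Descriptor struct {
	MediaType string
	Digest    string
	Size      int64
}

// Asset is a hypothetical stand-in for one Kitfile entry (model, dataset, code, ...).
type Asset struct {
	Kind    string
	Content []byte
}

// Describe computes an OCI-style descriptor for a single asset layer.
func Describe(a Asset) Descriptor {
	sum := sha256.Sum256(a.Content)
	return Descriptor{
		// Placeholder media type; KitOps defines its own values.
		MediaType: "application/vnd.example.kit." + a.Kind,
		Digest:    "sha256:" + hex.EncodeToString(sum[:]),
		Size:      int64(len(a.Content)),
	}
}

func main() {
	assets := []Asset{
		{Kind: "model", Content: []byte("fake model weights")},
		{Kind: "dataset", Content: []byte("id,label\n1,cat\n")},
	}
	for _, a := range assets {
		d := Describe(a)
		fmt.Printf("%-40s %s %d\n", d.MediaType, d.Digest, d.Size)
	}
}
```

Because every asset is content-addressed separately, changing only the dataset changes only that layer's digest. In general OCI terms, this lets a registry skip re-uploading or re-storing the unchanged model layer.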

After KitOps was built, its formal specification became a separate project known as ModelPack. Today, it’s a CNCF Sandbox project that defines a vendor-neutral standard for packaging, distributing, and running AI models. We will cover it in more detail in the next article, since it formally joined CNCF in the next batch.

2. OpenTofu

Despite being a recent addition to the CNCF Sandbox, OpenTofu is already well-known. It started in August 2023, shortly after HashiCorp changed Terraform’s license to the BSL (Business Source License), which is not Open Source and therefore met resistance from the community. The reaction was so strong that, within just two weeks after the license change announcement, 100+ companies and hundreds of individuals announced a Terraform fork called OpenTofu (its original name was OpenTF). The following month, it became a project owned by the Linux Foundation.

A few months later (in January 2024), it applied to join the CNCF Sandbox. Why did it take so long to fulfill this request? Because of legal challenges: namely, the Cease and Desist letter from HashiCorp claiming copyright infringement over specific changes to the OpenTofu code. Another issue, smaller but still important, was the project’s MPL license. It required an exception from the CNCF Governing Board, because CNCF projects are normally expected to use the Apache 2.0 license. After almost a year of waiting, the CNCF TOC reviewed the project in January 2025 and formally accepted it within a few months.

Today, OpenTofu is a community-driven Terraform alternative for IaC. It supports the original HCL syntax and the same backends, and its feature set is similar to its parent’s. We already reviewed OpenTofu a while ago, and perhaps its most prominent unique capability is still client-side state encryption. Beyond that, opting for OpenTofu is often less about specific features and more about sticking with an Open Source, vendor-neutral solution.
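Client-side state encryption means the state is encrypted before it leaves the machine running OpenTofu, so the backend only ever stores ciphertext. OpenTofu supports AES-GCM as one of its encryption methods. The Go sketch below shows only that underlying primitive, using the standard library. It assumes a 32-byte key is already available; in OpenTofu, a configured key provider supplies the key. The EncryptState and DecryptState names and the nonce-prefixed output layout are illustrative and are not OpenTofu's on-disk format.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// EncryptState seals plaintext state with AES-GCM and prepends the random nonce.
func EncryptState(key, state []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext plus authentication tag after the nonce.
	return gcm.Seal(nonce, nonce, state, nil), nil
}

// DecryptState splits off the nonce, then authenticates and decrypts the rest.
func DecryptState(key, blob []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // AES-256; a real setup obtains this from a key provider
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	state := []byte(`{"version":4,"resources":[]}`)
	blob, err := EncryptState(key, state)
	if err != nil {
		panic(err)
	}
	plain, err := DecryptState(key, blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

Because GCM is authenticated, a tampered state file fails to decrypt instead of silently producing corrupted state.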

3. kagent

This project provides a framework for building and running AI agents in Kubernetes.

To use it, you’ll first need to define your AI agent with a system prompt, a set of tools, and an LLM configuration. What are those tools? First of all, there are built-in tools that work with numerous Cloud Native software projects, including Kubernetes, Helm, Istio, Cilium, Argo, Grafana, and Prometheus. Secondly, you can use other agents as tools. Finally, kagent supports third-party MCP (Model Context Protocol) tools and even HTTP tools (kagent can discover them if they are OpenAPI-compliant).

As for the LLM configuration, kagent supports various LLM providers: Amazon Bedrock, Anthropic, Azure OpenAI, Gemini, Google Vertex AI, Ollama, and OpenAI. You can also use models from other providers that offer an OpenAI-compatible API, such as Cohere AI.

Returning to the AI agent configuration: once defined, it is stored as a Kubernetes custom resource. There are dedicated custom resources for all the involved parts: the agents themselves, their tools, LLM providers… A Kubernetes controller that watches the kagent custom resources then creates everything the agent needs to run. At this point, you can manage your AI agent via the CLI or the web UI. For example, you can list your agents and send them messages.
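The controller described above follows the usual Kubernetes reconcile pattern: compare what the custom resources declare with what actually exists, then act on the difference. The Go sketch below models that idea with plain structs. The Agent fields and the tool and model names are hypothetical simplifications. They are not kagent's CRD schema or the controller-runtime API.

```go
package main

import (
	"fmt"
	"sort"
)

// Agent is a hypothetical, simplified view of an agent custom resource.
type Agent struct {
	Name         string
	SystemPrompt string
	Tools        []string
	ModelConfig  string // name of a hypothetical LLM provider/model config resource
}

// Reconcile returns which agent runtimes to create and which to delete so that
// the running set matches the declared agents.
func Reconcile(declared []Agent, running map[string]bool) (create, remove []string) {
	want := make(map[string]bool, len(declared))
	for _, a := range declared {
		want[a.Name] = true
		if !running[a.Name] {
			create = append(create, a.Name)
		}
	}
	for name := range running {
		if !want[name] {
			remove = append(remove, name)
		}
	}
	// Map iteration order is random, so sort for deterministic output.
	sort.Strings(create)
	sort.Strings(remove)
	return create, remove
}

func main() {
	declared := []Agent{
		{Name: "k8s-helper", SystemPrompt: "You help operate Kubernetes.", Tools: []string{"get_resources"}, ModelConfig: "default-model"},
		{Name: "helm-helper", SystemPrompt: "You manage Helm releases.", Tools: []string{"list_releases"}, ModelConfig: "default-model"},
	}
	running := map[string]bool{"k8s-helper": true, "old-agent": true}
	create, remove := Reconcile(declared, running)
	fmt.Println("create:", create, "delete:", remove)
}
```

Running the loop repeatedly is safe. Once the running set matches the declared agents, both lists come back empty, which is the idempotence that controllers of this kind rely on.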

Chatting with your agent using the kagent Web UI

kagent supports the Human-in-the-Loop approach: agents can ask users questions, and user approval may be required for specific actions performed by the tools. To improve how agents use tools, kagent supports skills, which let you define specific guidance on how and when to use tools effectively. Skills can be defined as static descriptions (A2A skills, named after the Agent2Agent protocol) or as executable code/modules/functions packaged as container images (container-based skills).
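The approval step can be pictured as a gate in front of tool execution. Here is a minimal Go sketch, assuming each tool carries a flag saying whether a human must confirm the call. The Tool type, the RequiresApproval field, and the approver callback are hypothetical. They illustrate the idea and are not kagent's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Tool is a hypothetical tool definition with an approval requirement.
type Tool struct {
	Name             string
	RequiresApproval bool
	Run              func(args string) (string, error)
}

// ErrRejected is returned when a human declines a tool call.
var ErrRejected = errors.New("tool call rejected by user")

// Invoke runs a tool, asking the approver first when the tool requires it.
func Invoke(t Tool, args string, approve func(tool, args string) bool) (string, error) {
	if t.RequiresApproval && !approve(t.Name, args) {
		return "", ErrRejected
	}
	return t.Run(args)
}

func main() {
	deletePod := Tool{
		Name:             "delete_pod",
		RequiresApproval: true,
		Run:              func(args string) (string, error) { return "deleted " + args, nil },
	}
	// Simulated user who declines every request.
	deny := func(tool, args string) bool {
		fmt.Printf("agent asks: run %s(%s)? -> no\n", tool, args)
		return false
	}
	_, err := Invoke(deletePod, "default/web-0", deny)
	fmt.Println("result:", err)
}
```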

kagent provides capabilities to audit all prompts and agents’ replies, as well as to enable tracing for them (the documentation includes an example for Jaeger). There’s also a separate project called kmcp that simplifies the local development of MCP servers and tools you can use with kagent.

4. Cadence

Cadence is an orchestration engine, or a code platform, for building distributed applications. It allows you to write code following the fault-oblivious stateful programming model and ensures durability, availability, and scalability of the resulting apps.

The main concept of Cadence is the workflow, defined by code that is fault-oblivious (i.e., written as if failures and downtime never happen) and stateful. Another essential abstraction is the activity, which lets you go beyond the deterministic limitations of workflows and, for example, call external APIs directly. (Cadence won’t recover an activity’s in-progress state in case of failures.) Other essential concepts are events, to which workflows can respond, and queries, which expose the internal state of workflows to the outside world.
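Fault-oblivious workflows rely on event sourcing. The service records each activity result in the workflow history. After a failure, the deterministic workflow code is re-run, and already-recorded results are replayed from the history instead of being executed again. The Go sketch below imitates that mechanism in-process, in a simplified way: it re-runs the whole workflow rather than modelling Cadence's retry policies. The History and Ctx types and the ExecuteActivity helper are hypothetical and are not the Cadence Go SDK's API.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPaymentDown simulates a failing external dependency.
var ErrPaymentDown = errors.New("payment service unavailable")

// History is a hypothetical in-memory stand-in for the persisted workflow history.
type History struct{ Results []string }

// Ctx replays recorded activity results before executing new activities.
type Ctx struct {
	h        *History
	pos      int // index of the next recorded result to replay
	Executed int // activities actually run during this attempt
}

func NewCtx(h *History) *Ctx { return &Ctx{h: h} }

// ExecuteActivity returns the recorded result if there is one; otherwise it
// runs fn and appends the result to the history.
func (c *Ctx) ExecuteActivity(fn func() (string, error)) (string, error) {
	if c.pos < len(c.h.Results) {
		r := c.h.Results[c.pos]
		c.pos++
		return r, nil
	}
	r, err := fn()
	if err != nil {
		return "", err // nothing is recorded, so a later attempt runs it again
	}
	c.h.Results = append(c.h.Results, r)
	c.pos++
	c.Executed++
	return r, nil
}

// OrderWorkflow is deterministic: replayed against the same history, it issues
// the same activity calls in the same order.
func OrderWorkflow(ctx *Ctx, pay func() (string, error)) (string, error) {
	id, err := ctx.ExecuteActivity(func() (string, error) { return "order-42", nil })
	if err != nil {
		return "", err
	}
	if _, err := ctx.ExecuteActivity(pay); err != nil {
		return "", err
	}
	return id + " completed", nil
}

func main() {
	h := &History{}
	down := func() (string, error) { return "", ErrPaymentDown }
	up := func() (string, error) { return "paid", nil }

	_, err := OrderWorkflow(NewCtx(h), down) // fails after the first activity
	fmt.Println("attempt 1:", err)

	ctx := NewCtx(h) // recovery: re-run the workflow code against the stored history
	res, err := OrderWorkflow(ctx, up)
	fmt.Println("attempt 2:", res, err, "| activities executed:", ctx.Executed)
}
```

On the second attempt, the order-creation activity is served from the history and is not executed again. This is why the workflow code itself never has to handle the crash.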

Common use cases for Cadence include orchestrating calls to multiple microservices, periodic and batch jobs, polling tasks, event-driven apps, infrastructure provisioning and application deployment, and DSL workflows.

A typical Cadence-based application will include workflows, activities, and external clients, all of which rely on the Cadence service to operate. The application code can be written using the official SDKs, which are available for Go and Java. There have also been community efforts to provide SDKs for Python and Ruby, but both seem to be no longer maintained.

The Cadence service is a scalable multitenant application which exposes its functionality via a strongly typed gRPC API. The platform’s backend is stateless, but the workflow history is stored in a database (Apache Cassandra, MySQL/TiDB, or PostgreSQL/CockroachDB).


Cadence provides a CLI tool for managing workflows and a Web UI for viewing them. The project also offers Helm charts for deploying it in Kubernetes, and pre-configured dashboards make it easy to integrate with Grafana for monitoring.

Container Runtime

5. Hyperlight

This Rust library implements a lightweight virtual machine manager. The main idea is that Hyperlight can spin up microVMs in an impressive 1-2 milliseconds(!):

  • These VMs are micro (i.e., have minimal overhead and startup time) because, unlike traditional virtual machines, they don’t include an entire operating system or a guest kernel. Guests should be implemented specifically for Hyperlight using its guest library (in Rust or C).
  • At the same time, these VMs are hypervisor-isolated, providing a safe way to execute untrusted code. Hyperlight supports Linux KVM, Hyper-V on Windows (WHP), and Hyper-V for Linux (mshv) as hypervisors.
  • Since the VMs are created almost instantly, you can use Hyperlight in scale-to-zero scenarios and for event-driven workloads. While it’s not as fast as native functions, it’s much faster than other virtualisation solutions, such as Firecracker.

Several subprojects make Hyperlight easier to use in different scenarios by running various kinds of workloads within microVMs. For example:

  • Hyperlight-wasm can run Wasm modules and components inside VM-backed sandboxes.
  • Hyperlight-js is a JavaScript runtime executing inside Hyperlight (based on the QuickJS engine).
  • Hyperlight-unikraft runs Unikraft unikernels.

The project also features Hyperlight Sandbox as a sandboxing framework with a unified API across multiple isolation backends and SDKs for various languages, including Python, Rust, and .NET.

6. interLink

interLink aims to take Virtual Kubelet adoption to the next level. Virtual Kubelet is a “kubelet implementation that masquerades as a kubelet for the purposes of connecting Kubernetes to other APIs”, and its main idea is to extend the Kubernetes API into serverless container platforms, such as AWS Fargate. interLink pushes this further by adding an abstraction layer on top of Virtual Kubelet with easy-to-implement plugins, so you can run workloads on a wider variety of providers.

To sum it up, interLink helps you offload the specific tasks executed in containers to numerous remote systems, including other Kubernetes clusters, HPC systems, or even virtual machines. Technically, interLink comprises two main components:

  • Virtual Kubernetes Node (based on Virtual Kubelet), which translates requests for Kubernetes Pod execution into a remote call to the interLink API server.
  • interLink API server, a modular and pluggable REST server with plugins that define how the offloaded containers will run on the remote system.

As mentioned above, the supported remote platforms include external Kubernetes clusters, HPC batch systems (Slurm and HTCondor), regular VMs (running Docker or other container runtimes), and serverless solutions. Importantly, the pluggable architecture means you can create your own plugins to run containers on other platforms.
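To illustrate the plugin idea, here is a Go sketch of a plugin contract and a toy Slurm plugin that turns a pod-like spec into a batch script. The Plugin interface and the PodSpec type are hypothetical, and so is the assumption that the HPC site runs container images with apptainer exec. interLink's real plugins communicate with the API server over REST, and their exact contract and generated scripts may differ. The #SBATCH directives themselves are standard Slurm options.

```go
package main

import (
	"fmt"
	"strings"
)

// PodSpec is a hypothetical, minimal view of an offloaded pod.
type PodSpec struct {
	Name    string
	Image   string
	Command []string
	CPUs    int
	MemMB   int
}

// Plugin is a hypothetical contract: how a remote system runs an offloaded container.
type Plugin interface {
	Create(p PodSpec) (string, error)
}

// SlurmPlugin renders an sbatch script instead of actually submitting it.
type SlurmPlugin struct{}

func (SlurmPlugin) Create(p PodSpec) (string, error) {
	if p.Image == "" {
		return "", fmt.Errorf("pod %q: image is required", p.Name)
	}
	var b strings.Builder
	b.WriteString("#!/bin/bash\n")
	fmt.Fprintf(&b, "#SBATCH --job-name=%s\n", p.Name)
	fmt.Fprintf(&b, "#SBATCH --cpus-per-task=%d\n", p.CPUs)
	fmt.Fprintf(&b, "#SBATCH --mem=%dM\n", p.MemMB)
	// Assumption: the HPC site provides Apptainer to run OCI images.
	fmt.Fprintf(&b, "apptainer exec docker://%s %s\n", p.Image, strings.Join(p.Command, " "))
	return b.String(), nil
}

func main() {
	var plugin Plugin = SlurmPlugin{}
	script, err := plugin.Create(PodSpec{
		Name:    "train-job",
		Image:   "python:3.12",
		Command: []string{"python", "-c", "'print(42)'"},
		CPUs:    4,
		MemMB:   8192,
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(script)
}
```

Supporting another platform, such as HTCondor or a remote Kubernetes cluster, would mean adding another type that satisfies the same interface. The Virtual Kubernetes Node side would stay unchanged.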

interLink architecture and its approach to offload workloads from Kubernetes to other systems

Looking at the currently supported platforms, it’s easy to see that typical use cases for interLink are AI/ML-related workloads (HPC, GPU-intensive tasks), batch processing, and hybrid cloud setups where you want to distribute regular workloads across various providers. It’s also no wonder that the project’s adopters include scientific institutions such as CERN and the Italian National Institute for Nuclear Physics.

7. urunc

Dubbed “runc for unikernels”, this project integrates traditional unikernels with the Cloud Native ecosystem. While urunc acts like other OCI (Open Container Initiative) runtimes, it launches a unikernel rather than a regular process. It supports various unikernel frameworks, Virtual Machine Monitors (VMMs), and similar technologies. For example, you can run Linux-compatible Unikraft (with QEMU or Firecracker), NetBSD-based Rumprun, OCaml-based MirageOS, or Hermit, which is written in Rust.

The unikernels to be run with urunc should be placed in an OCI container image. To simplify this process, the project’s authors developed special tools that build and package a unikernel binary. One of them leverages Docker’s BuildKit (bunny), and another uses Nix packages (bunix).
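As an OCI runtime, urunc receives a container bundle. Instead of starting an entrypoint process, it has to work out which unikernel and which monitor to boot. The Go sketch below shows that dispatch step for QEMU only. The metadata keys are hypothetical placeholders, and urunc's actual image metadata format may differ. The -m, -kernel, -append, and -nographic flags are standard QEMU options.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical metadata keys describing the unikernel packed in the image.
const (
	keyBinary  = "example.unikernel/binary"
	keyCmdline = "example.unikernel/cmdline"
	keyMonitor = "example.unikernel/monitor"
)

// BuildVMMCommand turns image metadata into a monitor command line.
func BuildVMMCommand(meta map[string]string, memMB int) ([]string, error) {
	bin, ok := meta[keyBinary]
	if !ok {
		return nil, errors.New("no unikernel binary in image metadata")
	}
	switch meta[keyMonitor] {
	case "qemu":
		return []string{
			"qemu-system-x86_64",
			"-m", fmt.Sprintf("%dM", memMB),
			"-kernel", bin, // boot the unikernel image directly
			"-append", meta[keyCmdline], // pass its command line
			"-nographic",
		}, nil
	default:
		// A real runtime supports more monitors (e.g. Firecracker); this sketch does not.
		return nil, fmt.Errorf("unsupported monitor %q", meta[keyMonitor])
	}
}

func main() {
	cmd, err := BuildVMMCommand(map[string]string{
		keyBinary:  "/unikernel/app.kernel",
		keyCmdline: "hello",
		keyMonitor: "qemu",
	}, 128)
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd)
}
```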

The authors highlight numerous practical use cases for unikernels, including microservices (thanks to lightweight environments for running apps), serverless and Function as a Service (fast-spawning capabilities), edge computing (minimised resource consumption), and security-sensitive apps (minimised attack surface and VM-based isolation).

API Gateway

8. kgateway

Originally known as Gloo, kgateway is described by the project itself as “the most mature and widely deployed gateway in Kubernetes”. It builds on top of well-known Cloud Native projects: Envoy as the proxy and the Kubernetes Gateway API as the configuration language. Essentially, kgateway serves as a control plane that translates Gateway API resources into configuration that Envoy understands.
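The core of this translation can be illustrated with a simplified Go sketch. It maps the backendRefs of a Gateway API HTTPRoute rule (service name, namespace, port, weight) to an Envoy-style weighted_clusters route action, applying the Gateway API default weight of 1 when none is set. The Go types are hand-written stand-ins, not the official Gateway API or Envoy packages. The namespace_service_port cluster naming is a hypothetical convention, not kgateway's.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BackendRef is a trimmed-down stand-in for a Gateway API HTTPRoute backendRef.
type BackendRef struct {
	Name      string
	Namespace string
	Port      int
	Weight    *int // nil means the Gateway API default weight of 1
}

// WeightedCluster and RouteAction model only the Envoy route fields used here.
type WeightedCluster struct {
	Name   string `json:"name"`
	Weight int    `json:"weight"`
}

type RouteAction struct {
	WeightedClusters struct {
		Clusters []WeightedCluster `json:"clusters"`
	} `json:"weighted_clusters"`
}

// Translate converts backendRefs into an Envoy-style weighted route action.
func Translate(refs []BackendRef) RouteAction {
	var ra RouteAction
	for _, r := range refs {
		w := 1
		if r.Weight != nil {
			w = *r.Weight
		}
		name := fmt.Sprintf("%s_%s_%d", r.Namespace, r.Name, r.Port) // hypothetical naming
		ra.WeightedClusters.Clusters = append(ra.WeightedClusters.Clusters, WeightedCluster{Name: name, Weight: w})
	}
	return ra
}

func intPtr(v int) *int { return &v }

func main() {
	ra := Translate([]BackendRef{
		{Name: "web-v1", Namespace: "shop", Port: 8080, Weight: intPtr(90)},
		{Name: "web-v2", Namespace: "shop", Port: 8080, Weight: intPtr(10)},
	})
	out, err := json.MarshalIndent(map[string]RouteAction{"route": ra}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With the 90/10 weights above, Envoy splits matching requests roughly 90% to web-v1 and 10% to web-v2. This is the kind of canary split that is expressed declaratively in an HTTPRoute.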

kgateway architecture: how its components translate Gateway API resources into the proxy configuration

kgateway is fully compliant with the Kubernetes Gateway API and extends its already impressive functionality through custom extension APIs. They add features that address various needs in:

  • traffic management: weighted routing, route delegation, request and response transformations, direct responses, and external processing;
  • security (access logging);
  • resiliency (traffic mirroring);
  • third-party integrations: AWS ELBs and Lambda, Cloud Native tools (ExternalDNS, cert-manager, Istio, and Argo Rollouts).

For observability, kgateway offers a ready-to-deploy stack that includes OpenTelemetry Collector, Prometheus, Loki, Tempo, and Grafana.

If you’re interested in a proxy for AI-native protocols (MCP, A2A), there is a sister project called agentgateway. Previously, it was a part of kgateway, but starting from v2.3.0 (released in May 2026), AI- and agentic-related features were moved to this separate project/repo.

Current kgateway adopters include numerous well-known companies, such as BMW, Mattel, NTT Communications, Schneider Electric, and T-Mobile.

PaaS/Container Service

9. Cozystack

Cozystack lets you run your own (i.e., self-hosted) cloud platform, similar to what typical cloud providers offer, built entirely on Kubernetes. To make this possible, it provides a Kubernetes distribution for bare-metal servers (based on Talos), deploys lots of ready-to-use modules on it, and offers a convenient interface for operating them.

What are those modules? They cover all the essentials you might think of:

  • networking: CoreDNS, ExternalDNS, Cilium, Kube-OVN, MetalLB, Gateway API, NGINX, and HAProxy;
  • storage: LINSTOR, SeaweedFS, NFS, and Velero;
  • virtual machines powered by KubeVirt;
  • observability: Prometheus, Fluent Bit, Grafana, VictoriaMetrics;
  • security-related tools: OpenBao, Harbor, Keycloak;
  • databases: PostgreSQL, MariaDB, Percona Server for MongoDB, Redis, ClickHouse, OpenSearch;
  • messaging and caching: NATS, RabbitMQ, Apache Kafka.

This variety of services means that Cozystack users get fully managed Kubernetes and managed services to run their software easily. For example, they get managed PostgreSQL instances that are controlled by the CNPG operator and RabbitMQ instances managed by the RabbitMQ Cluster Kubernetes Operator.

The 4-layer diagram of the cloud platform Cozystack implements, from OS to managed services

Platform management itself is automated using a GitOps approach (based on Flux CD and the ControlPlane Flux Operator). Cozystack also comes with a web UI that displays the available modules and their status and lets you deploy new managed services.

You can install Cozystack in regular and air-gapped environments, use Ansible to automate the whole process, and benefit from guides for deploying it on specific infrastructure providers: Hetzner, Oracle Cloud Infrastructure, and Servers.com.

Afterword

Most of the projects joining CNCF in this batch are about 1-2 years old, which is younger than the typical 2-3 years. The two exceptions are Cadence (started in 2016) and kgateway (2018). But the most interesting and exciting case here is kagent. The project was started in January 2025, applied to join CNCF just 4 months later, and was accepted the very next month. The entire path from birth to becoming a CNCF Sandbox project took less than 6 months, which is truly impressive!

Notably, this batch is the second in a row in which Container Runtime is one of the leading categories for new CNCF projects. It definitely shows how diverse the Cloud Native ecosystem is becoming, with more and more platforms, environments, and their combinations for running various workloads. Another obvious trend is AI/ML, though it still looks quite balanced, delivering practical tools and no unnecessary hype.

Among companies contributing their projects to CNCF, the only one that stands out in this batch is Solo.io, which created two projects. Programming languages are, as usual, dominated by Go (8 of 9 projects) with a pinch of Rust.

In the next article of this series, we’ll cover the rest of the CNCF Sandbox additions in 2025, together with a few early 2026 newcomers.

P.S. Other articles in this series

  • Part 5: 13 arrivals of January 2025: Podman Container Tools and Podman Desktop, bootc, composefs, k0s, KubeFleet, SpinKube, container2wasm, Runme Notebooks for DevOps, SlimFaas, Tokenetes, CloudNativePG, and Drasi.
  • Part 4: 13 arrivals of 2024 H2: Ratify, Cartography, HAMi, KAITO, Kmesh, Sermant, LoxiLB, OVN-Kubernetes, Perses, Shipwright, KusionStack, youki, and OpenEBS.
  • Part 3: 14 arrivals of 2024 H1: Radius, Stacker, Score, Bank-Vaults, TrestleGRC, bpfman, Koordinator, KubeSlice, Atlantis, Kubean, Connect, Kairos, Kuadrant, and openGemini.
  • Part 2: 12 arrivals of 2023 H2: Logging operator, K8sGPT, kcp, KubeStellar, Copa, Kanister, KCL, Easegress, Kuasar, krkn, kube-burner, and Spiderpool.
  • Part 1: 13 arrivals of 2023 H1: Inspektor Gadget, Headlamp, Kepler, SlimToolkit, SOPS, Clusternet, Eraser, PipeCD, Microcks, kpt, Xline, HwameiStor, and KubeClipper.
