This article shares our experience of getting rid of continuously running EC2 instances, setting up scalable GitLab Runners in AWS, and significantly cutting CI infrastructure costs. How did it happen?
At first, everything worked like clockwork. Then, at some point, our GitLab Jobs started experiencing delays of 5, 10, and even 15 minutes. Pipeline queues backed up, our DevOps engineers grew nervous, our developers were frustrated, and AWS quietly kept charging us for hundreds of EC2 instance hours.
The “let’s just spin up some EC2s” trick only got us so far. Soon enough, we again found ourselves staring the very same issues in the face: idle instances, money going down the drain, and no real Job isolation. We realized that having to “fix” the issue over and over was a dead end — we had no choice but to set up real autoscaling for our GitLab Runners.
Goals: automated scaling, lower costs
We aimed to implement an approach that would scale CI to accommodate fluctuating workloads and reduce our AWS costs.
Rather than keeping runners on “always-on” EC2s, the idea was to automatically spin up instances for specific Jobs or groups of Jobs. When a new Job came in, an EC2 instance was created. Once the Job was completed, the instance either switched to the next one in the queue or shut down automatically if there was no more work to do.
In this setup, the only “always-on” EC2 we were left with was a managing instance running the GitLab Runner, responsible for picking up Jobs and orchestrating the worker instances. It consumes very modest resources, so keeping it running costs next to nothing compared to the instances that actually run the Jobs.
To sum up, our objectives were to:
- auto-spin up EC2 instances as soon as GitLab CI Jobs come in;
- stop or terminate an instance after it has been idle for N minutes;
- keep Jobs isolated: one VM = one Job (or a small Job batch);
- automate Amazon Machine Image (AMI) builds to quickly pop up identical instances featuring the same software;
- manage the runners using Terraform as part of the infrastructure as code approach.
Workflow and GitLab + AWS setup overview
Here is a helicopter view of our expected workflow:
- GitLab triggers a Job using a specific tag →
- The managing Runner picks it up and uses the Fleeting Plugin to spin up a new EC2 instance →
- The build is run on that instance →
- Once the Job is done, the instance is automatically stopped or deleted.
We have implemented it using GitLab Runner version 15.11+ with Fleet scaling support in our Ubuntu Linux-based setup. Fleet scaling is GitLab’s built-in feature for scaling runners using external resources. That means Jobs don’t run on the runner server, but on ephemeral virtual machines in the cloud (AWS in our case).
Fleeting is the library/plugin that implements this approach by connecting the runner to the cloud. It is responsible for provisioning AWS EC2 instances upon the runner’s request, connecting to them (typically via SSM), offloading the Jobs to them, and terminating or deleting the instances once they are no longer required.
Here are the steps we’ll be following to reach our goals:
- Choose an executor. Figure out how the GitLab Runner will run the Jobs — whether right on the VM, inside a Docker container, or on ephemeral EC2s.
- Install GitLab Runner on managing EC2. This “always-on” runner will accept Jobs from GitLab and use Fleeting to provision temporary instances for them.
- Create an IAM user. Set up a dedicated user (e.g., gitlab-autoscaler); the runner will use its credentials to interact with EC2 and Auto Scaling.
- Configure the IAM policy. Grant the IAM user the exact permissions it needs: creating/deleting instances, interacting with the Auto Scaling Group, and reading resource details.
- Install the Fleeting Plugin. This plugin connects the runner to AWS and allows it to automatically start and stop EC2 instances.
- Configure GitLab Runner. Modify config.toml to set up executors, autoscaler parameters, S3 caching, idle policies (idle_time), and the maximum number of instances.
- Prepare the AMI for worker instances. Build a base image containing all the software we need, such as gitlab-runner, docker, kubectl, helm, etc. We will reference that AMI in the Launch Template.
- Prepare the Auto Scaling Group. Create and configure the ASG and the Launch Template: specifying the instance type, AMI, scaling parameters, and updating the image if necessary.
Implementation
1. Picking the GitLab Runner executor
Before diving into the configs and setup, let’s go over a bit of theory: what GitLab Runner executors are out there and how they differ from one another. Here’s a brief technical comparison of executors:
| Executor | Isolation | Environment | Best suited for |
| --- | --- | --- | --- |
| shell | ❌ | Local VM | Basic scripts, quick tests |
| docker | ✅ | Docker on host | Frontend, unit tests, microservices |
| instance | ✅✅ | Dedicated EC2 instance | Terraform, shell jobs, Ansible, tools |
| docker-autoscaler | ✅✅ | Docker on EC2 | Containerized jobs, builds, frontend CI/CD |
| kubernetes | ✅✅ | Kubernetes Pod | Massive, scalable CI/CD infrastructures |
We will focus on the instance and docker-autoscaler executors, since they:
- can automatically spin up and shut down EC2 instances for specific Jobs;
- offer great isolation: one VM or one container on a dedicated instance per Job;
- use GitLab’s Fleet Scaling API: the native autoscaling mechanism for Runners.
What about the kubernetes executor? While it also provides isolation and scaling, it requires you to already have a Kubernetes cluster and maintain it. That’s a huge topic that deserves its own write-up, so we’ll skip it in this guide for simplicity’s sake.
2. Installing the Runner on the managing instance
Now that we’ve nailed down the components, let’s move ahead to the practical part.
First, we will install the GitLab Runner on the managing EC2 instance. This “always-on” instance is responsible for:
- fetching Jobs from GitLab;
- interacting with the Fleeting plugin;
- provisioning temporary EC2 instances for Jobs.
Note that all further setup steps — installing plugins, tweaking config.toml, and running tests — will be performed on this managing instance specifically.
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" -o gitlab-repo-add.sh
less gitlab-repo-add.sh # check the script contents to be sure nothing bad will happen
sudo bash gitlab-repo-add.sh
sudo apt-get install -y gitlab-runner
3. Creating an IAM user
Before you install the Fleeting Plugin, you should be ready to provide access to AWS. This GitLab plugin needs to:
- connect to instances (usually via SSM/SSH);
- create and delete EC2 instances;
- access the Auto Scaling Group and Launch Template.
All of these are configured via IAM in AWS. The easiest way is to create a separate IAM user, e.g., gitlab-autoscaler. The GitLab Runner will use it to interact with AWS. The user’s credentials back the AWS profile referenced in the plugin_config section of config.toml:
profile = "default"
On the machine running the GitLab Runner, you need to add the user’s keys to the ~/.aws/credentials file:
[default]
aws_access_key_id = ...
aws_secret_access_key = ...
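Before pointing the Runner at this profile, it can help to sanity-check the file. The snippet below is a minimal sketch rather than part of the original setup: check_credentials_file is a hypothetical helper that verifies the file has the profile section, both keys, and owner-only permissions. It assumes GNU stat (as on Ubuntu) and only checks that the keys appear somewhere in the file, not that they belong to the given profile.

```shell
# Sketch: sanity-check an AWS shared credentials file.
# check_credentials_file is a hypothetical helper, not an AWS or GitLab tool.
check_credentials_file() {
  local file="$1"
  local profile="${2:-default}"
  local mode

  [ -r "$file" ] || { echo "missing or unreadable: $file"; return 1; }
  # The profile header, e.g. [default], must be present.
  grep -q "^\[$profile]" "$file" || { echo "no [$profile] section"; return 1; }
  # Both keys must be present with a non-empty value (anywhere in the file).
  grep -q '^aws_access_key_id *= *[^ ]' "$file" || { echo "no aws_access_key_id"; return 1; }
  grep -q '^aws_secret_access_key *= *[^ ]' "$file" || { echo "no aws_secret_access_key"; return 1; }
  # GNU stat: the file should be readable and writable by its owner only.
  mode=$(stat -c '%a' "$file")
  [ "$mode" = "600" ] || { echo "mode is $mode, expected 600"; return 1; }
  echo "ok"
}
```

If the check prints ok, running aws sts get-caller-identity --profile default as the same user should confirm which IAM identity the keys map to (this assumes the AWS CLI is installed on the managing instance).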
4. Granting permissions to the IAM user
For the gitlab-autoscaler user to be able to manage resources, we must grant it the appropriate permissions. Let’s create an IAM policy (e.g., gitlab-runner-autoscaling-policy) and attach it to the user. This policy allows the user to:
- create and delete EC2 instances (through the Auto Scaling Group);
- read instance descriptions, images, and tags;
- use the gitlab-runner-ao-group Auto Scaling Group (i.e. autoScalingGroupName/gitlab-runner-ao-group).
This specific policy allows the Fleeting Plugin to start and stop instances in the designated autoscaling group.
Here’s an example policy in JSON (you’ll need to replace the {$ACCOUNT_ID} placeholder with your actual AWS Account ID):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "arn:aws:autoscaling:eu-central-1:{$ACCOUNT_ID}:autoScalingGroup:...:autoScalingGroupName/gitlab-runner-ao-group"
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "ec2:DescribeInstances"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:GetPasswordData",
        "ec2-instance-connect:SendSSHPublicKey"
      ],
      "Resource": "arn:aws:ec2:eu-central-1:{$ACCOUNT_ID}:instance/*",
      "Condition": {
        "StringEquals": {
          "ec2:ResourceTag/aws:autoscaling:groupName": "gitlab-runner-ao-group"
        }
      }
    }
  ]
}
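One way to apply a policy like this from the command line is sketched below. It assumes the JSON above is saved to a local template file and that the AWS CLI is configured with an identity allowed to manage IAM. The helper names render_policy and apply_policy are hypothetical. The elided "..." segment of the Auto Scaling Group ARN is left untouched by the script and still has to be filled in by hand.

```shell
# Sketch: fill in the {$ACCOUNT_ID} placeholder and register the policy.
# render_policy and apply_policy are hypothetical helper names.
render_policy() {
  # $1 = AWS account ID; policy template on stdin, rendered JSON on stdout.
  case "$1" in
    *[!0-9]* | "") echo "account ID must be numeric" >&2; return 1 ;;
  esac
  sed 's/{\$ACCOUNT_ID}/'"$1"'/g'
}

apply_policy() {
  # $1 = account ID, $2 = policy template file, $3 = IAM user name.
  local rendered
  rendered=$(mktemp) || return 1
  render_policy "$1" < "$2" > "$rendered" || return 1
  # Create the managed policy from the rendered document...
  aws iam create-policy \
    --policy-name gitlab-runner-autoscaling-policy \
    --policy-document "file://$rendered" || return 1
  # ...and attach it to the dedicated user.
  aws iam attach-user-policy \
    --user-name "$3" \
    --policy-arn "arn:aws:iam::$1:policy/gitlab-runner-autoscaling-policy"
}
```

A call would look like apply_policy 123456789012 policy.json gitlab-autoscaler, where the account ID and the file name are placeholders.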
5. Installing the AWS Fleeting Plugin
Now that the IAM user is ready and the keys are set, let’s install the Fleeting Plugin:
# Install Fleeting Plugin for AWS
echo "Installing Fleeting Plugin..."
sudo gitlab-runner fleeting install aws:latest
# Create AWS credentials directory and files
sudo mkdir -p /home/gitlab-runner/.aws
sudo chown gitlab-runner:gitlab-runner /home/gitlab-runner/.aws
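# Create the AWS credentials file (the same keys are used for the S3 cache).
# The ${...} values are template placeholders (e.g. Terraform templatefile) rendered
# before this script runs; the quoted 'EOF' keeps the shell from expanding them.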
sudo tee /home/gitlab-runner/.aws/credentials > /dev/null <<'EOF'
[default]
aws_access_key_id = ${aws_s3_cache_access_key}
aws_secret_access_key = ${aws_s3_cache_secret_key}
EOF
sudo tee /home/gitlab-runner/.aws/config > /dev/null <<'EOF'
[default]
region = eu-central-1
output = json
EOF
sudo chown -R gitlab-runner:gitlab-runner /home/gitlab-runner/.aws
sudo chmod 600 /home/gitlab-runner/.aws/credentials
sudo chmod 600 /home/gitlab-runner/.aws/config
Now with IAM and the Fleeting Plugin in place, your GitLab Runner can:
- provision and delete EC2 instances for Jobs using the IAM user credentials;
- connect to these instances via AWS Systems Manager (SSM), with no need for direct SSH access;
- run Jobs on them using either the instance executor or Docker containers;
- shut down or delete instances based on idle policy or when they hit the max_use_count limit.
Brief note on SSM: Every EC2 instance created by the Fleeting Plugin runs under the AmazonSSMRoleForInstancesQuickSetup role. This role lets the Runner securely connect to the instance via SSM, so you don’t have to deal with SSH or manage public keys.
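If Jobs hang while the Runner tries to connect, one thing worth checking is whether the worker instances have actually registered with SSM. The sketch below compares the instance IDs in an Auto Scaling Group with the IDs known to Systems Manager. The helper names are hypothetical, and the script assumes the AWS CLI is run by an identity allowed to call ssm:DescribeInstanceInformation, which the example policy above does not grant.

```shell
# missing_from_ssm: print IDs that are in the first list but not in the second.
missing_from_ssm() {
  # $1 = whitespace-separated instance IDs in the Auto Scaling Group
  # $2 = whitespace-separated instance IDs registered with SSM
  local id
  for id in $1; do
    case " $(echo $2) " in
      *" $id "*) ;;          # registered, nothing to report
      *) echo "$id" ;;       # in the group but unknown to SSM
    esac
  done
}

check_group() {
  # $1 = Auto Scaling Group name
  local asg_ids ssm_ids
  asg_ids=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --query 'AutoScalingGroups[0].Instances[].InstanceId' --output text) || return 1
  ssm_ids=$(aws ssm describe-instance-information \
    --query 'InstanceInformationList[].InstanceId' --output text) || return 1
  missing_from_ssm "$asg_ids" "$ssm_ids"
}
```

Running check_group gitlab-runner-ao-group would then list any worker that SSM cannot see, which usually points at a missing instance role or SSM agent.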
The resulting workflow is as follows:
- The Runner dynamically creates instances within the specified Auto Scaling Group as Jobs come in →
- These instances are automatically assigned the right permissions and settings →
- Once the Job is done, the instances are gracefully terminated according to the defined policies.
6. Configuring the GitLab Runner
The next step is configuring the GitLab Runner via the /etc/gitlab-runner/config.toml file. This is the central configuration hub in which we define:
- the GitLab URL and the Runner registration tokens;
- the executor type for each Job group (instance, docker-autoscaler, etc.);
- autoscaling parameters: maximum instance count, idle policy, max_use_count, and more;
- cache settings (e.g., an S3 bucket for artifact caching);
- various scaling policies for different time periods (working hours, nights, weekends).
View a sample Runner configuration below:
listen_address = ":9252"
concurrent = 200
check_interval = 0
connection_max_age = "30m0s"
shutdown_timeout = 0
log_level = "info"
log_format = "text"
[session_server]
session_timeout = 1800
# Our instance runner that runs Jobs right on EC2 instances (using shell)
[[runners]]
name = "${runner_name}"
id = 400
output_limit = 50000
url = "${gitlab_url}"
token = "${registration_token}"
token_obtained_at = 2026-02-01T12:42:42Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "instance"
# S3 caching to speed up your builds
[runners.cache]
Type = "s3"
Path = "cache"
Shared = true
MaxUploadedArchiveSize = 0
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
AccessKey = "${aws_s3_cache_access_key}"
SecretKey = "${aws_s3_cache_secret_key}"
BucketName = "walli-gitlab-runner-cache"
BucketLocation = "eu-central-1"
[runners.autoscaler]
capacity_per_instance = ${capacity_per_instance}
max_use_count = ${max_use_count}
max_instances = ${max_instances}
plugin = "aws:latest"
instance_acquire_timeout = "0s"
update_interval = "0s"
update_interval_when_expecting = "0s"
[runners.autoscaler.plugin_config]
name = "gitlab-runner-ao-group2"
profile = "default"
[runners.autoscaler.connector_config]
protocol_port = 0
username = "gitlab-runner"
keepalive = "0s"
timeout = "0s"
[[runners.autoscaler.policy]]
idle_count = ${idle_count}
idle_time = "${idle_time}"
scale_factor = 1.5
scale_factor_limit = 10
[[runners.autoscaler.policy]]
periods = ["* 10-18 * * mon-fri"]
idle_count = 20
idle_time = "${idle_time}"
scale_factor = 1.5
scale_factor_limit = 10
# The docker-autoscaler executor, designed for container-based Jobs
[[runners]]
name = "${runner_name}"
id = 401
url = "${gitlab_url}"
token = "${docker_registration_token}"
token_obtained_at = 2026-02-01T12:43:43Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker-autoscaler"
environment = ["DOCKER_AUTH_CONFIG={\"auths\":{\"${docker_registry_url}\":{\"auth\":\"${docker_registry_auth}\"}}}"]
# S3 caching to speed up your builds
[runners.cache]
Type = "s3"
Path = "cache"
Shared = true
MaxUploadedArchiveSize = 0
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
AccessKey = "${aws_s3_cache_access_key}"
SecretKey = "${aws_s3_cache_secret_key}"
BucketName = "walli-gitlab-runner-cache"
BucketLocation = "eu-central-1"
[runners.docker]
tls_verify = false
image = "ubuntu:24.04"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
extra_hosts = ${jsonencode(extra_hosts)}
shm_size = 0
network_mtu = 0
[runners.autoscaler]
capacity_per_instance = 5
max_use_count = 30
max_instances = 25
plugin = "aws:latest"
update_interval = "0s"
update_interval_when_expecting = "0s"
[runners.autoscaler.plugin_config]
name = "gitlab-runner-ao-group-docker2"
profile = "default"
[runners.autoscaler.connector_config]
username = "gitlab-runner"
keepalive = "0s"
timeout = "0s"
[[runners.autoscaler.policy]]
periods = ["* * * * *"]
idle_count = 1
idle_time = "20m0s"
scale_factor = 1.5
scale_factor_limit = 10
[[runners.autoscaler.policy]]
periods = ["* 10-18 * * mon-fri"]
idle_count = 20
idle_time = "20m0s"
scale_factor = 1.5
scale_factor_limit = 10
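The [runners.autoscaler] sub-section and the [[runners.autoscaler.policy]] items are crucial for controlling autoscaling. They define how many instances may exist at once (max_instances), how many Jobs each instance can handle in parallel (capacity_per_instance) and over its lifetime (max_use_count), how much idle capacity to keep and for how long (idle_count, idle_time), and how to scale during different times of the day (periods).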
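To get a feel for how these numbers interact, here is a small arithmetic sketch using the values from the docker-autoscaler entry in the sample config. It is a simplification: the real autoscaler also takes idle_count and scale_factor into account, and the global concurrent limit is shared by every runner in the file. The function names are made up for illustration.

```shell
# job_slots: upper bound on the Jobs one [[runners]] entry can run at once.
job_slots() {
  # $1 = max_instances, $2 = capacity_per_instance, $3 = global "concurrent"
  local slots=$(( $1 * $2 ))
  if [ "$slots" -gt "$3" ]; then slots=$3; fi
  echo "$slots"
}

# instances_for: instances needed for N queued Jobs, rounded up and capped.
instances_for() {
  # $1 = queued Jobs, $2 = capacity_per_instance, $3 = max_instances
  local need=$(( ($1 + $2 - 1) / $2 ))
  if [ "$need" -gt "$3" ]; then need=$3; fi
  echo "$need"
}

job_slots 25 5 200       # prints 125 (25 instances x 5 Jobs, below concurrent = 200)
instances_for 12 5 25    # prints 3   (12 Jobs at 5 per instance)
```

With these settings the docker-autoscaler runner tops out at 125 parallel Jobs, so raising concurrent beyond that has no effect unless max_instances or capacity_per_instance is raised as well.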
7. Preparing an AMI for GitLab Runner
For the Auto Scaling Group to spin up the proper EC2 instances for your CI Jobs, you’ll need a valid AMI. This is a base image that comes pre-loaded with everything your environment might need. The minimum set of requirements for the AMI includes:
- System utilities required for running the Jobs (docker, kubectl, helm, terraform, etc.).
- Any extra agents or services; however, you don’t need to register a standalone gitlab-runner in the AMI, since the managing instance handles the Runner role.
- If necessary, a pre-configured kubeconfig located at /home/gitlab-runner/.kube/config.
- Pre-pulled Docker images to speed up the first build.
The easiest way is to build the first AMI manually and then automate everything with Packer later. Here’s a typical manual workflow:
- Start with a base image like Ubuntu 22.04 from the AWS Marketplace.
- Spin up a temporary EC2 instance.
- SSH into the instance and install your software (gitlab-runner, docker, kubectl, etc.).
- Create a gitlab-runner user and its home directory.
- Add a kubeconfig if you need it.
- Shut down the instance and create an image via Actions → Create image.
- Use this AMI within your Auto Scaling Group’s Launch Template.
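The software-install steps from the list above can be captured in a script early on, which also makes the later move to Packer easier. The following is a sketch under stated assumptions, not the exact script we used: it installs Docker from the Ubuntu archive (docker.io), creates the gitlab-runner user directly with useradd, and leaves out kubectl, helm, and other tools, which would be added the same way. The run wrapper and the DRY_RUN variable are illustrative. By default, commands are only printed.

```shell
# Sketch of the software-install step for a worker AMI on Ubuntu.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

provision_worker() {
  run apt-get update
  run apt-get install -y docker.io
  # Create the user that the Runner's connector_config logs in as.
  run useradd --create-home --shell /bin/bash gitlab-runner
  # Let that user talk to the Docker daemon.
  run usermod -aG docker gitlab-runner
  # Prepare the directory that will hold the kubeconfig.
  run install -d -o gitlab-runner -g gitlab-runner -m 700 /home/gitlab-runner/.kube
}

provision_worker
```

To apply the steps for real, the script would be run as root on the temporary instance with DRY_RUN=0.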
Once you’ve polished the manual process, move those steps into Packer so you can:
- stop messing with EC2s by hand every time you update something;
- make sure your images are consistent and reproducible;
- have a version-controlled AMI template.
Finally, you just need to feed that AMI into the Launch Template via Terraform.
8. Setting up the Auto Scaling Group
Our configuration uses two AWS Auto Scaling Groups: gitlab-runner-ao-group and gitlab-runner-ao-group-docker. They feature the following parameters:
- Launch Template: gitlab-runner-autoscaler;
- AMI: ami-01f040934be890e5a (the current image at the time of writing; it may be updated in the future);
- Instance Type: c7a.4xlarge (we picked this one based on our workload and the types of Jobs we run).
(As a side note, the GitLab Fleeting library works with other clouds as well. E.g., you can use Virtual Machine Scale Sets in Azure and instance groups in Google Cloud to implement the same approach.)
Here’s how you can update the AMI — for example, to add a new kubeconfig or bump some tool versions:
- Spin up an EC2 instance based on the existing AMI.
- Make your changes, like updating /home/gitlab-runner/.kube/config or installing new software.
- Stop the instance and create a new image from it.
- Go to Launch Templates → gitlab-runner-autoscaler.
- Create a new template version that features your AMI and set it as the default.
After this, the Auto Scaling Group will automatically start using that image for all new EC2 instances, provided the group is configured to launch from the template’s default version.
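The last two console steps can also be done with the AWS CLI. The sketch below creates a new Launch Template version that only swaps the AMI and then makes it the default. The helper names are hypothetical, and the caller passes the version to copy the remaining settings from.

```shell
# is_ami_id: accept "ami-" followed by 8 or 17 lowercase hex digits.
is_ami_id() {
  echo "$1" | grep -Eq '^ami-([0-9a-f]{8}|[0-9a-f]{17})$'
}

roll_launch_template() {
  # $1 = launch template name, $2 = new AMI ID, $3 = version to copy settings from
  local new_version
  is_ami_id "$2" || { echo "not an AMI ID: $2" >&2; return 1; }
  # Create a new version that differs from the source version only in ImageId.
  new_version=$(aws ec2 create-launch-template-version \
    --launch-template-name "$1" \
    --source-version "$3" \
    --launch-template-data "{\"ImageId\":\"$2\"}" \
    --query 'LaunchTemplateVersion.VersionNumber' --output text) || return 1
  # Make the new version the template default.
  aws ec2 modify-launch-template \
    --launch-template-name "$1" \
    --default-version "$new_version"
}
```

For example, roll_launch_template gitlab-runner-autoscaler ami-01f040934be890e5a 1 would base the new version on version 1 of the template. The version number here is a placeholder.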
We used to do this manually for a while, but then we automated the whole process with Packer and Terraform. Now, updating the template is as simple as:
- Rebuilding the AMI with Packer.
- Running terraform plan and terraform apply to update the Launch Template and related resources.
In the end, the GitLab Runner infrastructure can be boiled down to the following:
- two Auto Scaling Groups for different types of Jobs, each with its own runner token;
- one “always-on” Runner Instance that handles scaling and communicates with GitLab;
- a custom AMI, built with Packer, that includes all the required software;
- IAM roles to grant the required permissions;
- CloudWatch Alarms for monitoring cluster state and load.
The only remaining weak point is tweaking the configuration of the managing Runner instance. If we need to modify config.toml or other system settings, we have to either:
- delete the current instance so the Auto Scaling Group can spin up a new one with the updated configuration, or
- carefully apply the changes manually on the running instance.
Luckily, we don’t need to do that very often. If you have a more elegant approach in mind to updating your managing Runner configs (like Ansible, SSM, or configuration drift control), I’d love to hear about it in the comments.
Useful considerations
If your Jobs use kubectl (e.g., for helm install or deployment scripts), the kubeconfig file must be pre-installed in the AMI. Without a valid kubeconfig, a new EC2 instance won’t be able to connect to the Kubernetes cluster.
You can store the config file at the regular location, /home/gitlab-runner/.kube/config, and make it part of the AMI. That way, any new EC2 instance created from that image will be able to use it.
While building your AMI, it’s also a good idea to pre-pull the base container images that are often used in CI. This helps to:
- shorten the first build time on a new instance;
- reduce dependency on external registries.
As a result, an instance with a pre-configured kubeconfig and pre-pulled Docker images will start faster and be ready to handle Kubernetes- and Docker-dependent Jobs right away.
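Pre-pulling is easy to script as part of the AMI build. Here is a minimal sketch: prepull_images is a hypothetical helper that reads image references from standard input, and the PULL_CMD override exists only so the loop can be tried without Docker. The list of images is up to you.

```shell
# prepull_images: read image references from stdin, one per line.
# Blank lines and lines starting with "#" are skipped.
# PULL_CMD defaults to "docker pull"; override it for a dry run.
prepull_images() {
  local image
  while IFS= read -r image; do
    case "$image" in
      "" | \#*) continue ;;
    esac
    # Keep going if one image fails, but report it on stderr.
    ${PULL_CMD:-docker pull} "$image" || echo "failed to pull $image" >&2
  done
}

# Example (ubuntu:24.04 is the default image in the sample config.toml):
#   printf '%s\n' ubuntu:24.04 | prepull_images
```

The images should be pulled by a user with access to the Docker daemon on the instance that becomes the AMI, so that they end up in the image's local Docker storage.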
Key takeaways with pros & cons
An autoscaler is a great option if you want to avoid keeping EC2 instances running 24/7 just for GitLab Runners. Once a Job comes in, an instance spins up, and once the Job is done, the resources are freed. Everything is transparent, manageable, and scales well.
Our approach with scalable GitLab Runners and custom AMIs has the following pros:
- Runners only start up when there are Jobs to run, saving a lot of money on AWS resources.
- Strong isolation and security: you can set up a “1 EC2 = 1 Job” model or assign a small pool of Jobs to each instance.
- AMIs are easy to update and rebuild with Packer, so the infrastructure remains reproducible.
- Great for high-load CIs: when the load spikes, more instances are simply created to handle it.
However, it comes with the following limitations:
- Editing config.toml on the managing instance by hand is a pain. Any change means you either have to recreate the instance or push the configuration into the instance somehow.
- The AMI requires regular updates for the OS, packages, and tools.
- Connecting via SSM requires the correct IAM role and the SSM agent. If the role or its configuration is wrong and you don’t have SSH as a fallback, you can lose access to the instance.
What else can be improved?
- Instead of being embedded in the AMI, kubeconfig can be pulled from AWS Secrets Manager when the instance launches.
- Multiple autoscalers for different GitLab tags: separating frontend, backend, infrastructure jobs, and resource-hungry pipelines.
- Different AMIs for different stacks: dedicated images for Java, Node.js, Python, infrastructure tools, etc.
- Integration with EFS or S3 for caching, configuration of shared volumes, and enabling/disabling shared runners on a schedule (via cron/policy).
- Full automation via Terraform (ASG, Launch Template, IAM) and Packer (AMI builds) to ensure that every change is described as code and undergoes review.
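As an illustration of the first item in this list, the kubeconfig could be fetched at boot (for example, from the instance's user data) instead of being baked into the image. This is a sketch, assuming the kubeconfig is stored as a plain-text secret and that the worker's instance role is allowed secretsmanager:GetSecretValue on it. The secret name and the helper names are hypothetical.

```shell
# install_kubeconfig: write stdin to the target path with owner-only permissions.
install_kubeconfig() {
  local dest="$1"
  mkdir -p "$(dirname "$dest")" || return 1
  # The restrictive umask ensures the file is never world-readable, even briefly.
  ( umask 077 && cat > "$dest" ) || return 1
  chmod 600 "$dest"
}

fetch_kubeconfig() {
  # $1 = secret name (hypothetical), $2 = destination path
  local secret
  secret=$(aws secretsmanager get-secret-value \
    --secret-id "$1" \
    --query SecretString --output text) || return 1
  printf '%s\n' "$secret" | install_kubeconfig "$2"
}

# Example, run as root at boot:
#   fetch_kubeconfig ci/kubeconfig /home/gitlab-runner/.kube/config
#   chown -R gitlab-runner:gitlab-runner /home/gitlab-runner/.kube
```

This keeps cluster credentials out of the AMI, so rotating them no longer requires rebuilding the image.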
Perhaps your specific use case will need some other customizations to benefit from this approach — feel free to share your thoughts and experience in the comments below!