I’d like to share my recent journey in getting the Sentry community chart to run in a Kubernetes cluster for one of our customers. I’ll cover the intricacies we encountered along the way and reveal the parameters we had to add to ensure best practices and bolster the application’s fault tolerance.
But first, what is the chart I’m referring to? Interestingly, there are no official instructions for installing Sentry in Kubernetes. (The only available option is a Docker Compose-based installation.) Luckily, however, there are several community-led Helm charts for Sentry that let you scale the resources and reap the rest of the perks Kubernetes offers. The chart from sentry-kubernetes is the most popular one. From now on, when I mention the “community chart”, assume that chart is the one I’m referring to.
Generally, while we often install and support Sentry for different needs, we don’t use publicly available charts “as is”. What we normally do is use them as the basis to devise our own charts and customize them according to our best practices and the specific customer’s needs.
However, this case was different. Why? Simply because that’s what the customer preferred. The reasoning behind this preference was to make it easier for the customer’s engineering team to navigate the Helm chart and make edits on their own. The second goal was to streamline updates, that is, to minimize the efforts required to modify the chart. Thus, when using the original community chart in this case, we only added a values file. How did that work out? Let’s dive right in!
General view on community charts
To be fully honest, what I mentioned above regarding leveraging the Sentry community chart applies to other Helm charts as well. We rarely rely on off-the-shelf Helm charts in their original forms. The reason is pretty straightforward — in most cases, those charts can’t fully address all our needs, so we had to modify them. We prefer this approach for several reasons:
- Since all the manifests are hosted in our repository, customizing the chart becomes more flexible: adding or removing components is easier than patching a ready-made Helm chart that you first have to pull.
- Readability improves, as everything unnecessary is removed from the manifests.
- The app’s fault tolerance and stability are enhanced. For example, adding a Pod Disruption Budget (assuming the chart doesn’t include one) limits the number of Pods that can be evicted at the same time during voluntary disruptions, while explicitly assigning resources to applications prevents resource contention and ensures those apps get the proper amount of CPU and RAM (see the sketch below).
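For example, a minimal PodDisruptionBudget for the Sentry web component might look like the sketch below. The name and label selector are placeholders and must match the labels your chart actually assigns to the Pods:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sentry-web
spec:
  # Keep at least one sentry-web Pod running during voluntary disruptions
  # (node drains, cluster upgrades, and so on).
  minAvailable: 1
  selector:
    matchLabels:
      app: sentry
      role: web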
This recent audit of 105 popular Kubernetes Helm charts, conducted by Prequel at the end of 2025, found that only 17% of them were reliable. It vividly illustrates the issues we often encounter in community charts. That said, engineering life is not perfect, and this time we had to use a ready-made community chart anyway…
Struggling with the Sentry community chart
You’re probably thinking that all you have to do is download the chart, deploy it, and you’re good to go. Things turned out to be a whole lot trickier in our case.
We originally planned to install a new version of Sentry using the community chart, then feed all the data from the existing version into it. However, we realized that wasn’t going to be possible due to the wide gap between versions. So we decided to set up a new Sentry instance with the same version as the current one, migrate the data to it, and then upgrade.
Unfortunately, it turned out to take about 12 steps(!) to upgrade the chart iteratively. So we decided to set up a new Sentry instance while refraining from data migration to speed up the process. (As a sidenote, data migration in Sentry is by no means trivial and calls for an article in its own right.)
Some of the resources our Sentry instance needed weren’t included in the community chart, so we kept them in our own repository. As a result, the deployment process consisted of a few commands: deploying “our” resources and then deploying the community chart itself. To be more precise, we:
- Deployed the specific resources not present in the community chart, such as a domain certificate and a DexClient for SSO (an example follows below).
- Added the sentry-kubernetes repository and deployed the chart to the cluster.
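To illustrate the first step, here’s a rough sketch of the kind of resource we deployed separately, assuming cert-manager handles the domain certificate. All names and the issuer are placeholders, and the DexClient resource is specific to our SSO setup, so it’s omitted here:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sentry-tls
  namespace: prod-sentry
spec:
  # The Secret where cert-manager will put the issued certificate.
  secretName: sentry-tls
  dnsNames:
    - sentry.example.com
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer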
Here’s what our configuration for GitLab CI/CD looked like:
# Applying custom resources
- werf converge --namespace prod-sentry --values .helm/values.yaml --secret-values .helm/secret-values.yaml
# Deploying the community chart
- werf helm repo add sentry https://sentry-kubernetes.github.io/charts
- werf helm repo update
- werf helm upgrade --install sentry sentry/sentry --version ${SENTRY_CHART_VERSION} --namespace prod-sentry --values .helm/values.yaml --secret-values .helm/secret-values.yaml --wait --timeout=1000s
As you can see in the example, we use werf for deployment in Kubernetes. However, it’s more of a preference than a requirement in this case. Since werf embeds an improved version of Helm under the hood, Helm users will notice familiar commands here (think of helm repo update vs. werf helm repo update). As for werf converge, it’s a more advanced command that both (re)builds the image if needed and deploys it to Kubernetes.
The first deployment was all sunshine and rainbows. However, after making a few changes and trying to deploy Sentry again, the pipeline crashed.
Fixes and best practices we applied to the chart
It turned out that if you don’t specify the PostgreSQL password in the chart values, a new random password is generated in the Secret on every deployment. As a result, the db-init job crashes with a database connection error.
Thus, our first essential fix was adding the following value:
postgresql:
  auth:
    password: my-reliable-password
Still, that was just the beginning of our efforts to ensure Sentry follows best practices and runs reliably. Here’s a list of our further enhancements:
- Adding `--wait --timeout=1000s` to the chart install command, since the default 300 seconds is not enough (as the chart’s README actually states).
- Moving all sensitive variables to `secret-values.yaml`, where they are stored in encrypted form. This step is specific to werf as it leverages one of its features (see the werf documentation). If you rely on pure Helm, there is the helm-secrets plugin.
- Specifying an appropriate retention period for ClickHouse data in ZooKeeper, as well as for Kafka, based on your needs and the amount of incoming data. The default retention in Kafka and ZooKeeper proved to be too long for our needs, and the disks filled up rather quickly. Note that instead of shortening the retention period, you can also enlarge the disk space.
- Specifying limits and requests for all the chart’s components. This ensures that Pods get the right amount of resources and don’t impact or interfere with each other or with third-party services.
- Installing specific versions of all components, rebuilding all images, and pushing them into our own container registry. The original chart relies on the `latest` tag, which might result in unintended consequences. More specifically, we used werf to rebuild the images and added a custom tag to the image name using the `werf build --add-custom-tag "%image%"` command. You could then define them in the values files as follows:
sentry:
  repository: registry.org.com/prod/sentry
  tag: sentry
  pullPolicy: Always
  imagePullSecrets:
    - name: registrysecret
relay:
  repository: registry.org.com/prod/sentry
  tag: relay
  pullPolicy: Always
  imagePullSecrets:
    - name: registrysecret
- Setting `podAntiAffinity` for Pods of the same type (e.g., all sentry-web Pods), so they don’t pile up on a single node. This enhances fault tolerance and prevents a component from becoming completely unavailable when that single node fails.
- Increasing the number of Sentry workers and enabling HPA for them to handle the heavy load. We had to use the CPU metric for the HPA here, since there’s no other way to change that in the community chart. (In our own chart, we used a custom metric that tracked the number of messages in the RabbitMQ queue.) A rough values sketch illustrating these two points follows right after this list.
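To give an idea of what such overrides can look like, here is a minimal sketch of values for the sentry-web and worker components with resources, a `podAntiAffinity` rule, and worker autoscaling. The exact keys depend on the chart version, so double-check them against the chart’s values.yaml before reusing anything from this snippet:

sentry:
  web:
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: "1"
        memory: 2Gi
    affinity:
      podAntiAffinity:
        # A soft rule: prefer spreading sentry-web Pods across nodes.
        # The labels must match those the chart actually assigns to the Pods.
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: sentry
                  role: web
  worker:
    replicas: 3
    autoscaling:
      enabled: true
      minReplicas: 3
      maxReplicas: 10
      # The community chart only exposes CPU-based scaling for the HPA.
      targetCPUUtilizationPercentage: 80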
NB. By the way, if you’re interested in the bigger picture of what to consider when deploying apps in Kubernetes, we have you covered with this K8s best-practices overview.
A few other considerations
The changes listed above were crucial to ensuring Sentry runs efficiently and reliably enough in Kubernetes. Additionally, I’d like to highlight a few other things that may be less critical but are still worth your attention when using this community chart:
- The Kafka values include the `kafka.provisioning.enabled` parameter. When set to `true`, it increased the deployment time by about 10 minutes in our case, and the provisioning job ran even when we only changed something unrelated, such as the sentry-web Pod’s memory. Initially, that made us think turning the job off could be beneficial. However, after playing around with `kafka.provisioning` for a while, we discovered that it defines the `TOPIC_PARTITION_COUNTS` variable in the Snuba configuration. Thus, disabling the job and changing the number of partitions can result in an inconsistency, potentially affecting how the snuba-consumer Pods work with Kafka. In short, if you use Kafka from the subchart (not a standalone installation), don’t turn this job off.
- The next value to keep in mind is `asHook: true`. You only need it for the initial installation; afterwards, it must be turned off, which is explicitly stated in `values.yaml`.
- If a Helm release fails to complete successfully, the following error may pop up when the next release is deployed: `Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress`. To fix it, delete the Secret storing the previous Helm release (see the commands below). Note that you shouldn’t encounter this error when deploying with `werf converge`.
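In case you hit that error with plain Helm, here’s roughly how to find and delete the Secret of the stuck release, assuming the release is named sentry and lives in the prod-sentry namespace as in the CI example above:

# List the Helm release Secrets; the stuck revision usually carries
# a "pending-upgrade" or "pending-install" status label.
kubectl -n prod-sentry get secret -l owner=helm,name=sentry --show-labels

# Delete the Secret of the stuck revision (replace v42 with the actual revision number).
kubectl -n prod-sentry delete secret sh.helm.release.v1.sentry.v42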
Conclusion
Installing the Sentry community chart proved to be a bigger challenge than I initially expected. Providing a sufficient level of reliability involved making several changes. On the positive side, these changes also made the setup easier for other engineers to understand and manage. Hopefully, the steps I’ve outlined here will save you some time and effort in installing Sentry in your Kubernetes clusters!