{"id":294,"date":"2025-09-16T09:50:10","date_gmt":"2025-09-16T09:50:10","guid":{"rendered":"https:\/\/cloudberry360.com\/blog\/?p=294"},"modified":"2025-09-16T10:26:31","modified_gmt":"2025-09-16T10:26:31","slug":"kubernetes-per-container-restart-policies","status":"publish","type":"post","link":"https:\/\/cloudberry360.com\/blog\/kubernetes-per-container-restart-policies\/","title":{"rendered":"Kubernetes Per-Container Restart Policies"},"content":{"rendered":"\n<p>With the release of Kubernetes 1.34, a new alpha feature called Container Restart Policy and Rules has been introduced. This feature provides more precise control over container restarts within a Pod. It allows us to specify a restart policy for each container individually, overriding the Pod\u2019s global restart policy. Additionally, it enables conditional restarts of individual containers based on their exit codes. This feature is accessible through the ContainerRestartRules alpha feature gate.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"alignright size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"412\" height=\"412\" src=\"https:\/\/cloudberry360.com\/blog\/wp-content\/uploads\/2025\/09\/container-restart-policy-1.jpg\" alt=\"\" class=\"wp-image-296\" srcset=\"https:\/\/cloudberry360.com\/blog\/wp-content\/uploads\/2025\/09\/container-restart-policy-1.jpg 412w, https:\/\/cloudberry360.com\/blog\/wp-content\/uploads\/2025\/09\/container-restart-policy-1-300x300.jpg 300w, https:\/\/cloudberry360.com\/blog\/wp-content\/uploads\/2025\/09\/container-restart-policy-1-150x150.jpg 150w\" sizes=\"auto, (max-width: 412px) 100vw, 412px\" \/><\/figure>\n<\/div>\n\n\n<p>This feature, a long-awaited enhancement, offers a range of practical applications. 
Let\u2019s explore its functionality and see how it can benefit us.<\/p>\n\n\n\n<p><strong>The Limitations of a Single Restart Policy<\/strong>:<\/p>\n\n\n\n<p>Before this feature, the restartPolicy was defined only at the Pod level, so all containers shared the same policy (Always, OnFailure, or Never). This approach works for many cases but can be restrictive in others.<\/p>\n\n\n\n<p>For instance, consider a Pod comprising a primary application container and an initialization container that performs initial setup. It may be desirable for the primary container to always restart upon failure, whereas the initialization container should execute only once and never restart. With a single Pod-level restart policy, this configuration was not possible.<\/p>\n\n\n\n<p><strong>The Implementation of Per-Container Restart Policies<\/strong>:<\/p>\n\n\n\n<p>With the ContainerRestartRules feature gate enabled, you can specify a restartPolicy for each container in a Pod\u2019s specification. You can also define restartPolicyRules to control restarts based on exit codes. Together, these give fine-grained control over which containers restart and under what conditions.<\/p>\n\n\n\n<p><strong>Use Cases: <\/strong><\/p>\n\n\n\n<p><strong>In-Place Restarts for Training Jobs:<\/strong><\/p>\n\n\n\n<p>This feature is particularly useful for training jobs, where restarting a failed container in place helps a long-running job reach completion.<\/p>\n\n\n\n<p>In machine learning (ML) research, it is common to orchestrate a large number of long-running AI\/ML training workloads. In these scenarios, workload failures are inevitable. When a workload fails with a retriable exit code, it is desirable for the container to restart promptly without rescheduling the entire Pod, because rescheduling costs substantial time and resources. Restarting the failed container \u201cin-place\u201d is crucial for optimizing the utilization of compute resources. 
The container should restart \u201cin-place\u201d only if it failed due to a retriable error; otherwise, the container and Pod should terminate and potentially be rescheduled.<\/p>\n\n\n\n<p>This can now be achieved with container-level restartPolicyRules. The workload can exit with distinct codes to signify retriable and non-retriable errors. With restartPolicyRules, the workload can be restarted in-place quickly, but only when the error is retriable.<\/p>\n\n\n\n<p><strong>Pods with Multiple Containers:<\/strong><\/p>\n\n\n\n<p>In Pods that run multiple containers, each container may have different restart requirements. Some containers have a clear definition of success and should be restarted only on failure. Others should be restarted whenever they exit.<\/p>\n\n\n\n<p>This is now possible with a container-level restartPolicy, which lets each container have its own restart policy.<\/p>\n\n\n\n<p><strong>Example Configurations: <\/strong><\/p>\n\n\n\n<p><strong>1. 
Restarting on specific exit codes: <\/strong><\/p>\n\n\n\n<p>In this example, the container should restart if and only if it fails with a retriable error, represented by exit code 42.<\/p>\n\n\n\n<p>To achieve this, the container has&nbsp;<code>restartPolicy: Never<\/code>&nbsp;and a restart policy rule that tells Kubernetes to restart the container in-place if it exits with code 42.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><strong>apiVersion<\/strong>: v1\n<strong>kind<\/strong>: Pod\n<strong>metadata<\/strong>:\n  <strong>name<\/strong>: restart-on-exit-codes\n  <strong>annotations<\/strong>:\n    <strong>kubernetes.io\/description<\/strong>: \"This Pod restarts the container only when it exits with code 42.\"\n<strong>spec<\/strong>:\n  <strong>restartPolicy<\/strong>: Never\n  <strong>containers<\/strong>:\n  - <strong>name<\/strong>: restart-on-exit-codes\n    <strong>image<\/strong>: docker.io\/library\/busybox:1.28\n    <strong>command<\/strong>: &#91;'sh', '-c', 'sleep 60 &amp;&amp; exit 0']\n    <strong>restartPolicy<\/strong>: Never     <em># Container restart policy must be specified if rules are specified<\/em>\n    <strong>restartPolicyRules<\/strong>:      <em># Only restart the container if it exits with code 42<\/em>\n    - <strong>action<\/strong>: Restart\n      <strong>exitCodes<\/strong>:\n        <strong>operator<\/strong>: In\n        <strong>values<\/strong>: &#91;42]<\/code><\/pre>\n\n\n\n<p><strong>2. A try-once init container: <\/strong><\/p>\n\n\n\n<p>In this example, the Pod\u2019s main container should always be restarted once the initialization succeeds. However, the initialization should only be tried once.<\/p>\n\n\n\n<p>To achieve this, the Pod has an&nbsp;<code>Always<\/code>&nbsp;restart policy. The&nbsp;<code>init-once<\/code>&nbsp;init container runs only once. If it fails, the Pod will fail. 
This lets the Pod fail if initialization fails, while keeping the main container running once initialization succeeds.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><strong>apiVersion<\/strong>: v1\n<strong>kind<\/strong>: Pod\n<strong>metadata<\/strong>:\n  <strong>name<\/strong>: fail-pod-if-init-fails\n  <strong>annotations<\/strong>:\n    <strong>kubernetes.io\/description<\/strong>: \"This Pod has an init container that runs only once. After initialization succeeds, the main container will always be restarted.\"\n<strong>spec<\/strong>:\n  <strong>restartPolicy<\/strong>: Always\n  <strong>initContainers<\/strong>:\n  - <strong>name<\/strong>: init-once      <em># This init container will only try once. If it fails, the Pod will fail.<\/em>\n    <strong>image<\/strong>: docker.io\/library\/busybox:1.28\n    <strong>command<\/strong>: &#91;'sh', '-c', 'echo \"Failing initialization\" &amp;&amp; sleep 10 &amp;&amp; exit 1']\n    <strong>restartPolicy<\/strong>: Never\n  <strong>containers<\/strong>:\n  - <strong>name<\/strong>: main-container <em># This container will always be restarted once initialization succeeds.<\/em>\n    <strong>image<\/strong>: docker.io\/library\/busybox:1.28\n    <strong>command<\/strong>: &#91;'sh', '-c', 'sleep 1800 &amp;&amp; exit 0']<\/code><\/pre>\n\n\n\n<p><strong>3. Containers with different restart policies: <\/strong><\/p>\n\n\n\n<p>In this example, there are two containers with different restart requirements. 
One should always be restarted, while the other should only be restarted on failure.<\/p>\n\n\n\n<p>This is achieved by using a different container-level&nbsp;<code>restartPolicy<\/code>&nbsp;on each of the two containers.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><strong>apiVersion<\/strong>: v1\n<strong>kind<\/strong>: Pod\n<strong>metadata<\/strong>:\n  <strong>name<\/strong>: on-failure-pod\n  <strong>annotations<\/strong>:\n    <strong>kubernetes.io\/description<\/strong>: \"This Pod has two containers with different restart policies.\"\n<strong>spec<\/strong>:\n  <strong>containers<\/strong>:\n  - <strong>name<\/strong>: restart-on-failure\n    <strong>image<\/strong>: docker.io\/library\/busybox:1.28\n    <strong>command<\/strong>: &#91;'sh', '-c', 'echo \"Not restarting after success\" &amp;&amp; sleep 10 &amp;&amp; exit 0']\n    <strong>restartPolicy<\/strong>: OnFailure\n  - <strong>name<\/strong>: restart-always\n    <strong>image<\/strong>: docker.io\/library\/busybox:1.28\n    <strong>command<\/strong>: &#91;'sh', '-c', 'echo \"Always restarting\" &amp;&amp; sleep 1800 &amp;&amp; exit 0']\n    <strong>restartPolicy<\/strong>: Always<\/code><\/pre>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the release of Kubernetes 1.34, a new alpha feature called Container Restart Policy and Rules has been introduced. This feature provides more precise control over container restarts within a Pod. It allows us to specify a restart policy for each container individually, overriding the Pod\u2019s global restart policy. Additionally, it enables conditional restarts of individual containers based on their exit codes. This feature is accessible through the ContainerRestartRules alpha feature gate. 
This feature, a long-awaited enhancement, offers a range&#8230;<\/p>\n","protected":false},"author":1,"featured_media":296,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17,16],"tags":[33,31],"class_list":["post-294","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-devops","category-kubernetes","tag-containerrestartpolicy","tag-kubernetes"],"acf":[],"_links":{"self":[{"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/posts\/294","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/comments?post=294"}],"version-history":[{"count":3,"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/posts\/294\/revisions"}],"predecessor-version":[{"id":300,"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/posts\/294\/revisions\/300"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/media\/296"}],"wp:attachment":[{"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/media?parent=294"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/categories?post=294"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudberry360.com\/blog\/wp-json\/wp\/v2\/tags?post=294"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}