Kubernetes Per-Container Restart Policies

Posted on Sep 16, 2025

With the release of Kubernetes 1.34, a new alpha feature called Container Restart Policy and Rules has been introduced. This feature provides more precise control over container restarts within a Pod. It allows us to specify a restart policy for each container individually, overriding the Pod’s global restart policy. Additionally, it enables conditional restarts of individual containers based on their exit codes. This feature is accessible through the ContainerRestartRules alpha feature gate.

This long-awaited enhancement has several practical applications. Let's look at how it works and how it can benefit us.

The Limitations of a Single Restart Policy:

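Before this feature, the restartPolicy for regular containers could only be set at the Pod level, so all containers in a Pod shared the same policy (Always, OnFailure, or Never). The only container-level option was restartPolicy: Always on init containers, which marks them as sidecar containers. This approach works for many cases but is restrictive in others.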

For instance, consider a Pod with a main application container and an init container that performs one-time setup. We may want the main container to always be restarted, while the init container runs only once and is never retried. With a single Pod-level restart policy, this was not possible: under Always (or OnFailure), a failed init container is retried, and under Never, the main container is not restarted either.

The Implementation of Per-Container Restart Policies:

With the ContainerRestartRules feature gate enabled, it is now possible to specify a restartPolicy for each container in a Pod's specification. In addition, restartPolicyRules can restart a container based on its exit code. Together, these provide the fine-grained control needed for scenarios like the ones below.
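To try the alpha feature, the gate has to be switched on. The following is a minimal sketch, assuming a local kind cluster whose node image runs Kubernetes v1.34 (the kind cluster itself is an assumption for illustration, not part of the feature). kind's top-level featureGates setting enables a gate on all the Kubernetes components it starts, which matters here because both the API server, which must accept the new fields, and the kubelet, which performs the restarts, need the gate.

```yaml
# Hypothetical local test cluster; assumes kind with a Kubernetes v1.34 node image.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  # Enables the alpha container-level restartPolicy and restartPolicyRules fields
  # on every component kind starts (kube-apiserver, kubelet, and so on).
  ContainerRestartRules: true
nodes:
- role: control-plane
```

The cluster can then be created with kind create cluster --config pointing at this file. On clusters not managed by kind, the equivalent is passing --feature-gates=ContainerRestartRules=true to both kube-apiserver and the kubelet.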

Use Cases:

In-Place Restarts for Training Jobs:

This feature is particularly useful for training jobs, where in-place restarts help long-running training processes run to completion.

In machine learning (ML) research, it is common to orchestrate a large number of long-running AI/ML training workloads. In these scenarios, workload failures are inevitable. When a workload fails with a retriable exit code, the container should restart promptly without rescheduling the entire Pod, because rescheduling costs a substantial amount of time and resources. Restarting the failed container in place is crucial for making good use of compute resources. The container should restart in place only if it failed due to a retriable error; otherwise, the container and Pod should terminate and potentially be rescheduled.

This functionality can now be achieved through container-level restartPolicyRules. The workload can exit with distinct codes to signify retriable and non-retriable errors. With restartPolicyRules, the workload can be restarted in-place swiftly, but only when the error is retriable.
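Example 1 below expresses the retriable case with operator: In, listing the exit codes that should trigger a restart. As a hedged alternative sketch, the rule can also be written the other way round with operator: NotIn, which matches any exit code not in the list. The Pod name, the trainer image, and the convention that exit code 3 signals a non-retriable error are hypothetical assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-worker                          # hypothetical name
spec:
  restartPolicy: Never        # a non-retriable failure ends the Pod, so it can potentially be rescheduled
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest   # hypothetical image
    restartPolicy: Never      # fallback when no rule matches
    restartPolicyRules:
    # Restart in place on any exit code except 0 (success) and 3
    # (assumed here to be the workload's non-retriable error code).
    - action: Restart
      exitCodes:
        operator: NotIn
        values: [0, 3]
```

With In, every new retriable code has to be added to the list; with NotIn, every new fatal code does. Which is safer depends on whether the workload's unexpected failures should default to a restart or to termination.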

Pods with Multiple Containers:

For Pods that execute multiple containers, it may be necessary to establish different restart requirements for each container. Some containers may have a clear definition of success and should only be restarted upon failure. Others may require continuous restarts.

This is now possible with a container-level restartPolicy, which lets each container have its own restart policy.

Example Configurations:

1. Restarting on specific exit codes:

In this example, the container should restart if and only if it fails with a retriable error, represented by exit code 42.

To achieve this, the container has restartPolicy: Never, and a restart policy rule that tells Kubernetes to restart the container in-place if it exits with code 42.

apiVersion: v1
kind: Pod
metadata:
  name: restart-on-exit-codes
  annotations:
    kubernetes.io/description: "This Pod only restarts the container when it exits with code 42."
spec:
  restartPolicy: Never
  containers:
  - name: restart-on-exit-codes
    image: docker.io/library/busybox:1.28
    command: ['sh', '-c', 'sleep 60 && exit 0']
    restartPolicy: Never     # Container restart policy must be specified if rules are specified
    restartPolicyRules:      # Only restart the container if it exits with code 42
    - action: Restart
      exitCodes:
        operator: In
        values: [42]
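Because the command above exits with 0, the rule never fires in this example. To see it in action, a variant whose command exits with 42 can be used; the Pod name, container name, and timing below are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-on-exit-42-demo   # hypothetical name for this variant
spec:
  restartPolicy: Never
  containers:
  - name: retriable-failure
    image: docker.io/library/busybox:1.28
    # Simulates a retriable error: exits with code 42 after a short delay.
    command: ['sh', '-c', 'sleep 10 && exit 42']
    restartPolicy: Never
    restartPolicyRules:           # Restart in place only on exit code 42
    - action: Restart
      exitCodes:
        operator: In
        values: [42]
```

Watching the Pod with kubectl get pod restart-on-exit-42-demo -w should show the RESTARTS count climbing (we assume the usual kubelet restart backoff still applies between attempts). Changing the exit code to anything else, such as 1, should instead leave the container terminated and the Pod Failed, since both the container and Pod policies are Never.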

2. A try-once init container:

In this example, the main container should always be restarted once initialization succeeds, but the initialization itself should be attempted only once.

To achieve this, the Pod has restartPolicy: Always, while the init-once init container sets restartPolicy: Never, so it runs only once. If it fails, the Pod fails; if it succeeds, the main container keeps running and is restarted whenever it exits.

apiVersion: v1
kind: Pod
metadata:
  name: fail-pod-if-init-fails
  annotations:
    kubernetes.io/description: "This Pod has an init container that runs only once. After initialization succeeds, the main container will always be restarted."
spec:
  restartPolicy: Always
  initContainers:
  - name: init-once      # This init container will only try once. If it fails, the Pod will fail.
    image: docker.io/library/busybox:1.28
    command: ['sh', '-c', 'echo "Failing initialization" && sleep 10 && exit 1']
    restartPolicy: Never
  containers:
  - name: main-container # This container will always be restarted once initialization succeeds.
    image: docker.io/library/busybox:1.28
    command: ['sh', '-c', 'sleep 1800 && exit 0']

3. Containers with different restart policies:

In this example, there are two containers with different restart requirements. One should always be restarted, while the other should only be restarted on failure.

This is achieved by using a different container-level restartPolicy on each of the two containers.

apiVersion: v1
kind: Pod
metadata:
  name: on-failure-pod
  annotations:
    kubernetes.io/description: "This Pod has two containers with different restart policies."
spec:
  containers:
  - name: restart-on-failure
    image: docker.io/library/busybox:1.28
    command: ['sh', '-c', 'echo "Not restarting after success" && sleep 10 && exit 0']
    restartPolicy: OnFailure
  - name: restart-always
    image: docker.io/library/busybox:1.28
    command: ['sh', '-c', 'echo "Always restarting" && sleep 1800 && exit 0']
    restartPolicy: Always

Posted in DevOps, Kubernetes

Tagged in containerrestartpolicy, kubernetes
