Failure Handling and Restart Policies

Learn how LeaderWorkerSet handles pod failures with configurable restart policies.

LeaderWorkerSet manages pods in groups, and the restart policy set on the leader-worker template determines whether a pod failure affects only that pod or the whole group.

Restart Policies

RecreateGroupOnPodRestart (Default)

When any pod in a group fails, the entire group is recreated. This ensures all pods start fresh together.

spec:
  leaderWorkerTemplate:
    restartPolicy: RecreateGroupOnPodRestart

Use case: Tightly coupled applications (for example, distributed inference)
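
For context, here is a minimal sketch of a complete manifest showing where restartPolicy sits relative to the rest of the spec. The resource name, container name, image, and the replicas and size values are placeholders chosen for illustration, not values from this page.

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: lws-restart-policy-sample    # hypothetical name
spec:
  replicas: 2                        # number of groups (illustrative value)
  leaderWorkerTemplate:
    size: 4                          # pods per group (illustrative value)
    restartPolicy: RecreateGroupOnPodRestart
    workerTemplate:
      spec:
        containers:
        - name: worker               # placeholder container name
          image: registry.example.com/inference-server:latest  # placeholder image

Setting restartPolicy to None in the same position would switch the groups to per-pod restarts.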

None

Only the failed pod is restarted. Other pods in the group are not affected.

spec:
  leaderWorkerTemplate:
    restartPolicy: None

RecreateGroupAfterStart (Experimental)

When any pod in a group fails, the entire group is recreated if and only if no pods are currently pending. This behavior is experimental in versions 0.8 and later, and is enabled through an annotation on the LeaderWorkerSet.

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: leaderworkerset-sample
  annotations:
    # Annotation values must be strings, so the value is quoted.
    leaderworkerset.sigs.k8s.io/experimental-recreate-group-after-start: "true"

Node Failure Handling

With RecreateGroupOnPodRestart (default): When a node fails, the entire group is recreated on healthy nodes.

With None: Only pods on the failed node are rescheduled. Other pods continue running.

Last modified January 7, 2026: Adding docs for RecreateGroupAfterStart (#729) (8bc4f48)