DEV Community

DevOps Start

Posted on • Originally published at devopsstart.com

Fix CrashLoopBackOff in Kubernetes Pods

What CrashLoopBackOff Means

CrashLoopBackOff is not an error on its own. It is a status that tells you the container in a Pod started, exited, and Kubernetes is now waiting before it tries again. The kubelet restarts a failing container with an exponential backoff delay (10s, 20s, 40s, and so on, capped at 5 minutes). The "BackOff" part is the wait; the "CrashLoop" part is the repeated exit.
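To get a feel for how quickly the wait grows, here is a minimal sketch of that schedule in plain shell arithmetic. The function name backoff_schedule is made up for illustration; it only models the doubling and the cap described above and does not talk to a cluster.

```shell
# Print the delay (in seconds) before each of the first N restarts,
# assuming a 10s base delay that doubles each time and is capped at 300s.
backoff_schedule() {
  restarts=${1:-7}
  delay=10
  cap=300
  out=""
  i=0
  while [ "$i" -lt "$restarts" ]; do
    out="$out${out:+ }$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}
```

Calling backoff_schedule 7 prints 10 20 40 80 160 300 300, so the cap is reached by the sixth restart.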

The key point: the container is doing exactly what you told it to do, then terminating. Your job is to find out why the process exits. The status itself never tells you the cause, so do not waste time staring at it. Go straight to the logs and events.

You can confirm the restart pattern with:

kubectl get pods -A | grep CrashLoop
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].restartCount}'

A restart count climbing every minute or two confirms the loop. The official Pod lifecycle documentation explains how the kubelet drives these restart states.
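Plain grep matches the string anywhere on the line. If you want only the Pods whose STATUS column shows the loop, plus their restart counts, a small awk filter is one option. This is a sketch that assumes the default column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); crashloop_pods is a hypothetical helper name.

```shell
# Read `kubectl get pods -A` output on stdin and print "namespace/pod restarts"
# for every row whose STATUS column contains CrashLoopBackOff.
# The regex match also catches values such as Init:CrashLoopBackOff.
crashloop_pods() {
  awk 'NR > 1 && $4 ~ /CrashLoopBackOff/ { print $1 "/" $2, $5 }'
}

# Usage (needs a reachable cluster):
#   kubectl get pods -A | crashloop_pods
```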

Diagnose the Root Cause

Work through these three signals in order. One of them almost always points at the cause.

Container logs

Start with the current and previous container logs. The previous logs are critical because the running container may have already been killed:

kubectl logs <pod-name>
kubectl logs <pod-name> --previous
kubectl logs <pod-name> -c <container-name> --previous

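The --previous flag shows output from the last crashed instance. Use -c when the Pod has more than one container; without it, kubectl picks a default container (the first one, unless the kubectl.kubernetes.io/default-container annotation says otherwise), which may not be the one that is crashing.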
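When you do not know which container is crashing, you can loop over all of them. The sketch below assumes the Pod's regular containers are the ones of interest (init containers are not covered); previous_logs and the 50-line tail are illustrative choices.

```shell
# Print the last 50 lines from the previous instance of every container in a Pod.
# Arguments: pod name, optional namespace (defaults to "default").
previous_logs() {
  pod=$1
  ns=${2:-default}
  for c in $(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].name}'); do
    echo "=== $c (previous instance) ==="
    # kubectl logs --previous fails if the container has never restarted.
    kubectl logs "$pod" -n "$ns" -c "$c" --previous --tail=50 \
      || echo "(no previous instance for $c)"
  done
}

# Usage (needs a reachable cluster):
#   previous_logs <pod-name> <namespace>
```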

Pod events and state

kubectl describe surfaces scheduling problems, image pull failures, probe failures, and the exact exit reason:

kubectl describe pod <pod-name>

Read the Events section at the bottom and the Last State block under the container status. Last State shows the Reason (for example Error or OOMKilled) and the Exit Code.

Exit codes

The exit code narrows the cause quickly:

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

Common values:

  • 0: the process exited cleanly. The container ran a short task and finished. Use a Job, not a Deployment, or keep the main process running.
  • 1: a generic application error. Check the logs.
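  • 137: the process was killed by SIGKILL (128 + 9), usually because it was OOMKilled or because it did not stop within the termination grace period after a failed liveness probe.
  • 139: a segmentation fault (SIGSEGV, 128 + 11) inside the binary.
  • 143: terminated by SIGTERM (128 + 15), typically during a graceful shutdown.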
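Exit codes above 128 follow the shell convention of 128 plus the signal number, so you can decode ones that are not in the list. A rough sketch, with explain_exit_code as a hypothetical helper and the wording of its messages chosen for illustration:

```shell
# Translate a container exit code into a short explanation.
# Codes above 128 are treated as 128 + signal number.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exit 0: clean exit"
  elif [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # kill -l <number> prints the signal name without the SIG prefix.
    name=$(kill -l "$sig" 2>/dev/null) || name=unknown
    echo "exit $code: killed by signal $sig ($name)"
  else
    echo "exit $code: process exited with an error, check the logs"
  fi
}
```

For example, explain_exit_code 137 reports signal 9 (KILL) and explain_exit_code 143 reports signal 15 (TERM).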

Common Causes and Fixes

Bad command or entrypoint

If there are no application logs at all, the container often never ran your code. A wrong command, args, or a binary that is not on the image PATH produces an immediate exit. Check what the manifest overrides:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].command}{"\n"}{.spec.containers[0].args}{"\n"}'

Fix the command and args in the manifest, or correct the ENTRYPOINT/CMD in the Dockerfile, then rebuild.

Missing config or secret

A container that crashes on startup looking for an environment variable or mounted file is usually missing a ConfigMap or Secret. The Pod events will show CreateContainerConfigError if the reference itself is broken:

kubectl get configmap,secret -n <namespace>
kubectl describe pod <pod-name> | grep -A5 Events

Create the missing object or fix the name in envFrom, valueFrom, or the volume reference.

Failing liveness or readiness probe

When a liveness probe fails failureThreshold times in a row, the kubelet restarts the container, which looks identical to a crash loop. A failing readiness probe does not restart anything; it only removes the Pod from Service endpoints. Look for Liveness probe failed in the events. The usual causes are a probe path that does not exist, a port mismatch, or an initialDelaySeconds that is too short for a slow-starting app.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

For apps with long warmups, add a startupProbe so the liveness probe does not fire until startup completes.
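A startup probe gives the app failureThreshold × periodSeconds to come up before the kubelet restarts it. The sketch below prints a startupProbe fragment together with that budget so you can try different numbers. The path and port reuse the liveness example above; the defaults of 30 and 10 are illustrative assumptions, and startup_probe_fragment is a made-up helper name.

```shell
# Print a startupProbe fragment plus the startup budget it allows.
# Arguments: failureThreshold (default 30), periodSeconds (default 10).
startup_probe_fragment() {
  threshold=${1:-30}
  period=${2:-10}
  cat <<EOF
# allows up to $((threshold * period))s for startup before the container is restarted
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: $period
  failureThreshold: $threshold
EOF
}
```

With the defaults the budget works out to 300 seconds. Paste the printed fragment next to the livenessProbe in the container spec and tune the two values to your app's real warmup time.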

OOMKilled or resource limits

Exit code 137 with Reason: OOMKilled means the container exceeded its memory limit and the kernel killed it. When memory is the trigger, the step-by-step OOMKilled guide covers the full diagnosis. Confirm the reason first:

kubectl describe pod <pod-name> | grep -i oom

Raise resources.limits.memory, or fix the leak in the app. Set a requests value too so the scheduler places the Pod on a node with enough memory:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"

Dependency not ready

If the container exits because a database or upstream service is unreachable at boot, do not let it crash. Use an initContainer to wait for the dependency, or add retry logic in the app. An init container that blocks until the dependency answers keeps the main container from looping:

initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c', 'until nc -z db 5432; do sleep 2; done']
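The loop above waits forever if the dependency never comes up. One possible variation is to give up after a fixed number of attempts so the init container fails visibly instead of hanging. This is a sketch, not the only way to do it: wait_for_port, the attempt limit, and the delay are assumptions, and it relies on the same nc -z check used above.

```shell
# Wait until host:port accepts TCP connections, giving up after max_attempts.
# Arguments: host, port, max_attempts (default 30), delay in seconds (default 2).
wait_for_port() {
  host=$1
  port=$2
  max_attempts=${3:-30}
  delay=${4:-2}
  attempt=1
  until nc -z "$host" "$port"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up on $host:$port after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "$host:$port is reachable"
}

# Usage inside an init container script:
#   wait_for_port db 5432 30 2 || exit 1
```

A non-zero exit makes the init container fail, so the problem shows up in the Pod status and events rather than as a Pod that sits in Init indefinitely.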

Image issues

A wrong tag, a private registry without credentials, or a corrupt image shows as ImagePullBackOff or ErrImagePull rather than CrashLoopBackOff, but the two often get confused. Verify the image and pull secret:

kubectl describe pod <pod-name> | grep -i image

Fix the tag, or attach an imagePullSecrets entry to the Pod or ServiceAccount.

Verify the Fix

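After editing the manifest, apply it. A change to the Pod template triggers a new rollout on its own; if you only changed a ConfigMap or Secret, restart the rollout so the Pods pick it up: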

kubectl apply -f deployment.yaml
kubectl rollout restart deployment/<deployment-name>
kubectl rollout status deployment/<deployment-name>

Watch the new Pods reach Running and confirm the restart count stays flat:

kubectl get pods -w
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].restartCount}'

A restart count that holds steady for longer than the 5-minute backoff cap means the loop is broken. For a deeper walkthrough of the root causes behind a stuck container, see the CrashLoopBackOff root-cause breakdown.
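If you would rather script the check than watch the terminal, you can sample the restart count twice and compare. The sketch below waits 360 seconds by default, a little longer than the 5-minute backoff cap mentioned earlier; restarts_stable and that default are assumptions for illustration, and it only looks at the first container in the Pod.

```shell
# Sample the first container's restart count twice and succeed only if it did not change.
# Arguments: pod name, seconds to wait between samples (default 360).
restarts_stable() {
  pod=$1
  wait_seconds=${2:-360}
  query='{.status.containerStatuses[0].restartCount}'
  before=$(kubectl get pod "$pod" -o jsonpath="$query")
  sleep "$wait_seconds"
  after=$(kubectl get pod "$pod" -o jsonpath="$query")
  echo "restarts: $before -> $after"
  [ "$before" = "$after" ]
}

# Usage (needs a reachable cluster):
#   restarts_stable <pod-name> && echo "loop is broken"
```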

Prevent It

  • Set both requests and limits for CPU and memory so the scheduler and kernel behave predictably.
  • Add a startupProbe for slow-starting apps and keep liveness probes lenient.
  • Test the image locally with docker run before deploying so entrypoint and command errors surface early.
  • Use initContainers to gate startup on real dependencies instead of crashing.
  • Ship a real /healthz endpoint that reflects actual readiness, not just process liveness.
  • Pin image tags to digests in production so a re-tagged image cannot break a running Deployment.
