Kubernetes Security Best Practices: Production Checklist for Real Clusters

Production Kubernetes Security Is Mostly About Restraint

Most Kubernetes incidents do not begin with an exotic exploit. They begin with a cluster that was too open, too trusting, or too hard to reason about under pressure.

That is why production Kubernetes security is less about buying one more tool and more about tightening defaults.

The public lesson has repeated itself for years. In the widely reported 2018 Tesla cryptojacking incident, attackers found a Kubernetes dashboard exposed without authentication and used that access to run mining workloads and reach sensitive cloud resources. The exact environment details were unusual, but the core pattern was familiar: an exposed management surface, weak access boundaries, and too much trust inside the cluster.

If you are running Kubernetes in production, treat this checklist as the baseline.

1. Lock Down Access First

If RBAC is loose, every other control becomes optional for an attacker.

Production checklist:

No human users bound to cluster-admin for day-to-day work
Separate roles for deploy, read-only, break-glass, and cluster operations
Short-lived federated access instead of long-lived local credentials
automountServiceAccountToken: false unless the workload truly needs API access
Periodic review of service accounts, ClusterRoleBindings, and namespace-level roles

Example:

An internal metrics service does not need permission to list secrets, create pods, or read all configmaps in the namespace. It usually needs almost nothing. Yet many teams still deploy it with a broad default service account because it is convenient.
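
A minimal sketch of what "almost nothing" can look like, assuming a hypothetical `metrics-exporter` workload in a hypothetical `monitoring` namespace. The names and the exact resource list are illustrative assumptions, not taken from any real deployment:

```yaml
# Hypothetical names: "metrics-exporter" and "monitoring" are illustrative.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-exporter
  namespace: monitoring
# Pods using this account get no API token unless they opt in explicitly.
automountServiceAccountToken: false
---
# Only needed if the workload must call the Kubernetes API at all.
# Grants read access to pods and endpoints, and nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: metrics-exporter-read
  namespace: monitoring
rules:
  - apiGroups: [""]
    resources: ["pods", "endpoints"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-exporter-read
  namespace: monitoring
subjects:
  - kind: ServiceAccount
    name: metrics-exporter
    namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: metrics-exporter-read
```

If the workload does need API access, a pod can opt back in with `automountServiceAccountToken: true` in its own spec, which overrides the service account setting. Something like `kubectl auth can-i --list --as=system:serviceaccount:monitoring:metrics-exporter -n monitoring` is one way to check what the account can actually do.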

2. Enforce Restricted Pod Security by Default

Production workloads should not run as root, should not add Linux capabilities casually, and should not get writable filesystems without a reason.

Minimum baseline:

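securityContext:            # container-level; repeat for every container
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]           # the Restricted profile requires dropping all capabilities
  seccompProfile:
    type: RuntimeDefault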

Add Pod Security Admission labels to production namespaces and make exceptions rare and explicit.
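
As a sketch, the labels can be applied on the namespace object itself. The namespace name `payments-prod` is a hypothetical placeholder; the label keys are the standard Pod Security Admission ones:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod   # hypothetical namespace name
  labels:
    # Reject pods that violate the Restricted profile.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also record violations in audit logs and warn on kubectl apply.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

A common rollout path is to start with only `audit` and `warn` set to `restricted`, fix what they surface, and add `enforce` afterwards.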

3. Treat the Cluster Network as Hostile

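By default, Kubernetes networking is flat: every pod can reach every other pod in the cluster unless a NetworkPolicy says otherwise. That default is one of Kubernetes' worst habits in production.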

Checklist:

Default-deny ingress and egress at namespace level
Explicit allow rules between frontend, API, data, and observability tiers
No broad east-west access for convenience
Private control plane access where the platform allows it
Internet exposure only through reviewed ingress paths

Case pattern:

One compromised pod should not become a sightseeing pass for the rest of the cluster.
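
A minimal sketch of the first two checklist items, assuming a hypothetical `payments-prod` namespace, hypothetical `tier` labels, and an API listening on port 8080:

```yaml
# Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments-prod   # hypothetical namespace
spec:
  podSelector: {}            # empty selector matches all pods
  policyTypes:
    - Ingress
    - Egress
---
# Then allow only the flows you can name: frontend pods may reach API pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: payments-prod
spec:
  podSelector:
    matchLabels:
      tier: api              # hypothetical label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend # hypothetical label
      ports:
        - protocol: TCP
          port: 8080         # assumed API port
```

Two caveats apply to this sketch. Default-deny egress also blocks DNS, so the frontend needs its own egress rules for the API and for the cluster DNS service. And NetworkPolicy objects only take effect when the cluster's network plugin enforces them.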

4. Stop Storing Secrets Like Configuration Files

Base64 is an encoding, not protection. Anyone who can read a Secret object can decode its values.

In production, prefer:

External Secrets Operator with Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager
Encryption at rest for etcd-backed secrets
Narrow namespace and workload access to secrets
Rotation plans for database credentials, API keys, and signing keys

If an etcd snapshot leak or backup exposure would reveal plaintext credentials, the design is still fragile.
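
For self-managed control planes, encryption at rest is configured through a file passed to the API server with `--encryption-provider-config`. The following is a sketch only: the key name is arbitrary, the key material is a placeholder, and a KMS-backed provider is generally preferable to a static key stored on the control plane host. Managed platforms usually expose this as a cluster setting instead.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider is used for writes.
      - aescbc:
          keys:
            - name: key1                              # arbitrary key name
              secret: <BASE64-ENCODED-32-BYTE-KEY>    # placeholder, generate your own
      # Fallback so secrets written before encryption was enabled stay readable.
      - identity: {}
```

Secrets that already exist are only encrypted when they are next written, so they need to be rewritten once after the configuration is enabled.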

5. Add Admission Control Before Developers Add Exceptions Everywhere

Use Kyverno or OPA Gatekeeper to block the patterns you already know are dangerous.

High-value policies:

Reject privileged containers
Reject :latest image tags
Require image digests for production
Enforce resource limits
Block hostPath, hostPID, and hostNetwork unless approved
Require non-root execution and seccomp profile
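
As an illustration of one of these policies, a Kyverno ClusterPolicy that rejects `:latest` tags might look like the sketch below. The policy and rule names are arbitrary, and field names such as `validationFailureAction` vary between Kyverno versions, so check it against the version you run:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag          # arbitrary policy name
spec:
  # Enforce blocks the request; Audit only reports it.
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-tag-not-latest   # arbitrary rule name
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must not use the :latest tag."
        pattern:
          spec:
            containers:
              # "!" negates the wildcard match.
              - image: "!*:latest"
```

This sketch checks only `containers`. A production policy would also cover `initContainers` and `ephemeralContainers`, and starting in audit mode shows what would break before anything is blocked.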

6. Sign, Scan, and Pin Images

Production clusters should not be pulling whatever tag happens to exist today.

Checklist:

Scan images in CI with Trivy or equivalent
Pin images to immutable digests
Prefer minimal base images
Sign release images with Cosign
Restrict image sources to approved registries

This is also where software supply chain discipline starts paying off.
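
Pinning by digest is a one-line change in the workload spec. In this sketch the registry host, namespace, workload names, and the digest itself are placeholders; the real digest would come from your build pipeline:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                        # hypothetical workload
  namespace: payments-prod         # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          # Reference the image by digest rather than by tag.
          # <digest> is a placeholder for the value your CI produces.
          image: registry.example.com/payments/api@sha256:<digest>
```

A tag can be moved to point at different content after review; a digest cannot, which is what makes the scan and signature results meaningful for the image that actually runs.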

7. Watch Runtime Behavior, Not Just Manifests

Static reviews find misconfigurations. Runtime monitoring catches what changed after deployment or what only becomes visible during an intrusion.

High-signal runtime events:

Shell spawned inside an application container
Unexpected outbound traffic from a normally quiet workload
Reads of sensitive files such as /proc/1/environ
New binaries written to a container filesystem
Container processes contacting mining pools or suspicious control servers

Falco and managed runtime detection tools can help here, but only if somebody owns triage.
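
A sketch of a custom Falco rule for the first event in the list. It assumes the `spawned_process` and `container` macros from Falco's default ruleset are loaded; the rule name and shell list are illustrative, and the stock ruleset already ships a comparable rule you may prefer to tune instead:

```yaml
- rule: Shell spawned in application container   # illustrative rule name
  desc: Detect an interactive shell starting inside a container
  # spawned_process and container are macros from the default Falco rules.
  condition: >
    spawned_process and container
    and proc.name in (bash, sh, zsh, dash)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name
    image=%container.image.repository cmd=%proc.cmdline parent=%proc.pname)
  priority: WARNING
  tags: [container, shell]
```

Expect noise from images whose entrypoints are shell scripts. Excluding those known parents in the condition keeps the alert worth triaging.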

8. Practice Cluster Recovery, Not Just Cluster Creation

Production Kubernetes security is also about how you recover from a bad day.

Make sure you have:

Backup strategy for manifests, state, and supporting data stores
Documented secret rotation path after compromise
A way to isolate namespaces or workloads quickly
Audit logs retained somewhere attackers cannot silently erase
A tested playbook for compromised service account or leaked kubeconfig scenarios
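
For the audit log item, a starting point is an audit policy that captures the events an investigation usually needs. The sketch below is an assumption about reasonable levels, not a complete policy. On self-managed control planes it is passed to the API server with `--audit-policy-file`; managed platforms typically configure audit logging through the provider.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the RequestReceived stage to reduce duplicate entries.
omitStages:
  - RequestReceived
rules:
  # Secrets and ConfigMaps: metadata only, so values never land in the log.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # RBAC changes: keep full request and response bodies.
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
  # Interactive access to pods: record the request.
  - level: Request
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach", "pods/portforward"]
  # Everything else: metadata only.
  - level: Metadata
```

Rules are evaluated in order and the first match wins, so the catch-all stays last. Shipping the resulting log to storage outside the cluster is what keeps an attacker with cluster access from erasing it.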

Production Kubernetes Checklist

No routine use of cluster-admin
Pod Security Admission enforced on production namespaces
Default-deny network policies in place
Secrets externalized or tightly controlled
Admission control blocks known-bad patterns
Images are scanned, signed, and pinned
Runtime detection configured and monitored
Audit logs retained and reviewed
Break-glass access documented and time-bounded
Recovery procedures tested
