Kubernetes Security Best Practices: Production Checklist for Real Clusters
Production Kubernetes Security Is Mostly About Restraint
Most Kubernetes incidents do not begin with an exotic exploit. They begin with a cluster that was too open, too trusting, or too hard to reason about under pressure.
That is why production Kubernetes security is less about buying one more tool and more about tightening defaults.
The public lesson has repeated itself for years. In one widely reported Tesla cryptojacking incident, attackers found an exposed Kubernetes console and used the access to run mining workloads and reach sensitive cloud resources. The exact environment details were unusual, but the core pattern was familiar: exposed management surface, weak access boundaries, and too much trust inside the cluster.
If you are running Kubernetes in production, this checklist is the right baseline.
1. Lock Down Access First
If RBAC is loose, everything else becomes optional.
Production checklist:
- No human users bound to cluster-admin for day-to-day work
- Separate roles for deploy, read-only, break-glass, and cluster operations
- Short-lived federated access instead of long-lived local credentials
- automountServiceAccountToken: false unless the workload truly needs API access
- Periodic review of service accounts, ClusterRoleBindings, and namespace-level roles
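For the read-only role in the list above, one low-effort option is to bind an identity-provider group to Kubernetes' built-in `view` ClusterRole inside a single namespace. This is a sketch: the namespace `payments` and the group `payments-oncall` are hypothetical, and real group names depend on how your federated login maps identity claims into Kubernetes.

```yaml
# Namespace-scoped read-only access for a human on-call group.
# "payments" and "payments-oncall" are hypothetical names.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding              # RoleBinding, not ClusterRoleBinding: access stays in one namespace
metadata:
  name: payments-oncall-view
  namespace: payments
subjects:
  - kind: Group
    name: payments-oncall      # group claim supplied by the identity provider (assumption)
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                   # built-in read-only role
  apiGroup: rbac.authorization.k8s.io
```

The built-in `view` role deliberately excludes Secrets, which is usually what you want for a day-to-day read-only role.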
Example:
An internal metrics service does not need permission to list secrets, create pods, or read all configmaps in the namespace. It usually needs almost nothing. Yet many teams still run it under a broadly privileged service account, often the namespace default with extra bindings added for convenience, and with its API token auto-mounted into every pod.
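A minimal sketch of a tighter setup for that metrics service, assuming hypothetical names (`internal-metrics`, the `observability` namespace, and a ConfigMap called `internal-metrics-config`): a dedicated service account with no auto-mounted token, plus a Role that can read exactly one ConfigMap and nothing else.

```yaml
# Dedicated identity for the metrics service (all names are hypothetical).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: internal-metrics
  namespace: observability
automountServiceAccountToken: false      # no API token in pods unless they opt in
---
# Narrow, namespaced permission: read one named ConfigMap, nothing more.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: internal-metrics-read-config
  namespace: observability
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["internal-metrics-config"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: internal-metrics-read-config
  namespace: observability
subjects:
  - kind: ServiceAccount
    name: internal-metrics
    namespace: observability
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: internal-metrics-read-config
```

If the workload really does need that one API call, its pod spec must opt back in with `automountServiceAccountToken: true`, because the pod-level setting takes precedence over the service account's. If it needs no API access at all, drop the Role and RoleBinding entirely.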
2. Enforce Restricted Pod Security by Default
Production workloads should not run as root, should not add Linux capabilities casually, and should not get writable filesystems without a reason.
Minimum baseline:
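securityContext:              # container-level; some of these fields do not exist at pod level
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]             # the restricted profile requires dropping all capabilities
  seccompProfile:
    type: RuntimeDefault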
Add Pod Security Admission labels to production namespaces and make exceptions rare and explicit.
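Pod Security Admission is driven entirely by namespace labels. A sketch for a hypothetical `payments` namespace that enforces the `restricted` level and also reports violations through client warnings and audit annotations; the pinned version is an example and should match your cluster.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                                     # hypothetical production namespace
  labels:
    # Reject pods that violate the restricted Pod Security Standard.
    pod-security.kubernetes.io/enforce: restricted
    # Pin the policy version so a cluster upgrade does not silently change the rules.
    pod-security.kubernetes.io/enforce-version: v1.30  # example; match your cluster
    # Also surface violations to kubectl users and in the audit log.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Existing namespaces can be labeled in place with `kubectl label`; running it first with `--dry-run=server` reports which already-running pods would violate the new level.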
3. Treat the Cluster Network as Hostile
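By default, every pod can reach every other pod in the cluster, across namespaces, until a NetworkPolicy selects it. That flat network is one of Kubernetes' worst habits in production.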
Checklist:
- Default-deny ingress and egress at namespace level
- Explicit allow rules between frontend, API, data, and observability tiers
- No broad east-west access for convenience
- Private control plane access where the platform allows it
- Internet exposure only through reviewed ingress paths
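A sketch of the first checklist item for a hypothetical `payments` namespace. The empty `podSelector` selects every pod in the namespace, and listing both policy types with no allow rules denies all ingress and egress. Because that also blocks DNS, the second policy re-allows lookups to cluster DNS; the `k8s-app: kube-dns` label is common but not universal, so check what your DNS pods actually carry. NetworkPolicy is only enforced if the cluster's network plugin supports it.

```yaml
# Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments            # hypothetical namespace
spec:
  podSelector: {}                # empty selector = all pods in this namespace
  policyTypes:
    - Ingress
    - Egress
---
# Re-allow DNS lookups to the cluster DNS pods in kube-system.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system   # label set automatically on namespaces
          podSelector:
            matchLabels:
              k8s-app: kube-dns                          # assumption: varies by distribution
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```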
Case pattern:
One compromised pod should not become a sightseeing pass for the rest of the cluster.
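Explicit allow rules are what keep that one compromised pod contained. A sketch, assuming hypothetical `tier: frontend` and `tier: api` pod labels and an API listening on TCP 8080: only frontend pods may reach the API, so a compromised pod in any other tier gets nothing from it.

```yaml
# Only frontend pods may open connections to API pods, and only on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-from-frontend
  namespace: payments            # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      tier: api                  # hypothetical label on API pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: frontend     # hypothetical label on frontend pods
      ports:
        - protocol: TCP
          port: 8080
---
# Under default-deny egress, frontend pods also need a matching egress rule.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-allow-to-api
  namespace: payments
spec:
  podSelector:
    matchLabels:
      tier: frontend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              tier: api
      ports:
        - protocol: TCP
          port: 8080
```

NetworkPolicies are additive: there is no explicit deny rule, so a selected pod is allowed exactly the union of what the policies selecting it permit.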
4. Stop Storing Secrets Like Configuration Files
Base64 is transport encoding, not protection.
In production, prefer:
- External Secrets Operator with Vault, AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager
- Encryption at rest for etcd-backed secrets
- Narrow namespace and workload access to secrets
- Rotation plans for database credentials, API keys, and signing keys
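With External Secrets Operator, the manifest holds only a reference, and the operator syncs the value from the external manager into a regular Kubernetes Secret. A sketch, assuming a hypothetical `SecretStore` named `aws-secrets-manager` that is already configured for AWS Secrets Manager and a hypothetical secret path `prod/orders/db`. The `external-secrets.io/v1beta1` API shown here is the long-standing version; newer operator releases may serve `external-secrets.io/v1`, so match what your installation exposes.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: orders-db
  namespace: orders                # hypothetical namespace
spec:
  refreshInterval: 1h              # re-sync so rotations in the external store propagate
  secretStoreRef:
    name: aws-secrets-manager      # hypothetical, pre-configured SecretStore
    kind: SecretStore
  target:
    name: orders-db-credentials    # Kubernetes Secret the operator creates and owns
    creationPolicy: Owner
  data:
    - secretKey: password          # key inside the generated Kubernetes Secret
      remoteRef:
        key: prod/orders/db        # hypothetical path in AWS Secrets Manager
        property: password         # JSON field within that external secret
```

The synced value still lands in a Kubernetes Secret, which is why encryption at rest and narrow access to Secrets stay on the list.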
If an etcd snapshot leak or backup exposure would reveal plaintext credentials, the design is still fragile.
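On self-managed control planes, encryption at rest is configured with an `EncryptionConfiguration` file passed to kube-apiserver through `--encryption-provider-config`. A minimal sketch using the local `aescbc` provider, with a placeholder key. Kubernetes documentation generally favours a KMS v2 provider backed by an external key service for production, and managed platforms such as EKS, GKE, and AKS expose this as a platform setting instead.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider encrypts new writes; on reads, providers are tried in order.
      - aescbc:
          keys:
            - name: key1
              secret: <BASE64-ENCODED-32-BYTE-KEY>   # placeholder; keep real keys out of git
      # identity (plaintext) keeps existing unencrypted Secrets readable during migration.
      - identity: {}
```

Existing Secrets are only re-encrypted when they are rewritten, for example with `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`.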
5. Add Admission Control Before Developers Add Exceptions Everywhere
Use Kyverno or OPA Gatekeeper to block the patterns you already know are dangerous.
High-value policies:
- Reject privileged containers
- Reject :latest image tags
- Require image digests for production
- Enforce resource limits
- Block hostPath, hostPID, and hostNetwork unless approved
- Require non-root execution and seccomp profile
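As an example of the `:latest` rule, a sketch of a Kyverno policy modeled on the pattern in Kyverno's sample policy library. Where the enforcement mode is declared has shifted between Kyverno releases (newer versions favour a per-rule `failureAction`), so check the documentation for the version you run.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce   # block, rather than only report, violating Pods
  background: true                   # also report on existing resources
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "An explicit image tag or digest is required."
        pattern:
          spec:
            containers:
              - image: "*:*"         # must contain a ':' (tag or digest)
    - name: disallow-latest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The mutable ':latest' tag is not allowed."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```

Kyverno automatically generates equivalent rules for Deployments, StatefulSets, and other Pod controllers, so violations surface when the controller is applied rather than only when its Pods are created.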
6. Sign, Scan, and Pin Images
Production clusters should not be pulling whatever tag happens to exist today.
Checklist:
- Scan images in CI with Trivy or equivalent
- Pin images to immutable digests
- Prefer minimal base images
- Sign release images with Cosign
- Restrict image sources to approved registries
This is also where software supply chain discipline starts paying off.
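Signature checks and digest pinning can be enforced together at admission. A sketch using Kyverno's `verifyImages` rule, assuming a hypothetical registry path `registry.example.com/prod/*` and a Cosign public key from your release pipeline (shown as a placeholder). With digest mutation enabled, Kyverno rewrites a verified tag to its digest so the pod runs exactly the image that was checked.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-release-signatures
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30               # signature checks call out to the registry
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/prod/*"   # hypothetical approved registry path
          mutateDigest: true                  # replace tags with the verified digest
          verifyDigest: true                  # require images to be referenced by digest
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <your Cosign public key>
                      -----END PUBLIC KEY-----
```

Images that do not match `imageReferences` are not checked by this rule, so restricting image sources to approved registries still needs its own validation rule.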
7. Watch Runtime Behavior, Not Just Manifests
Static reviews find misconfigurations. Runtime monitoring catches what changed after deployment or what only becomes visible during an intrusion.
High-signal runtime events:
- Shell spawned inside an application container
- Unexpected outbound traffic from a normally quiet workload
- Reads of sensitive files such as /proc/1/environ
- New binaries written to a container filesystem
- Container processes contacting mining pools or suspicious control servers
Falco and managed runtime detection tools can help here, but only if somebody owns triage.
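A sketch of a custom Falco rule for the first event in the list, built on the `spawned_process` and `container` macros from Falco's default ruleset. The default rules already include a similar shell-in-container detection, so treat this as an illustration of the rule format rather than something to deploy verbatim.

```yaml
- rule: Shell Spawned in Application Container
  desc: A shell process started inside a container
  condition: >
    spawned_process and container
    and proc.name in (sh, bash, ash, dash, zsh)
  output: >
    Shell in container (user=%user.name cmdline=%proc.cmdline
    container=%container.name image=%container.image.repository
    ns=%k8s.ns.name pod=%k8s.pod.name)
  priority: WARNING
  tags: [container, shell, mitre_execution]
```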
8. Practice Cluster Recovery, Not Just Cluster Creation
Production Kubernetes security is also about how you recover from a bad day.
Make sure you have:
- Backup strategy for manifests, cluster state, and supporting data stores
- Documented secret rotation path after compromise
- A way to isolate namespaces or workloads quickly
- Audit logs retained somewhere attackers cannot silently erase
- A tested playbook for compromised service account or leaked kubeconfig scenarios
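Audit logs only help if the policy records the right events. A minimal sketch of an API server audit policy (passed via `--audit-policy-file` on self-managed control planes) that records who touched Secrets without logging their contents and captures full detail for RBAC changes. Rules are evaluated in order and the first match wins, so the specific rules come before the catch-all.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived              # avoid a duplicate event per request
rules:
  # Drop high-volume, low-value watch traffic from kube-proxy.
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
  # Who read or changed Secrets and ConfigMaps; Metadata level never logs the payload.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Full request and response bodies for RBAC changes.
  - level: RequestResponse
    verbs: ["create", "update", "patch", "delete"]
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
  # Everything else, including pods/exec, at Metadata level.
  - level: Metadata
```

Ship the resulting log off the control plane, for example from `--audit-log-path` through a log forwarder or via the audit webhook backend, so it lands somewhere attackers cannot silently erase it.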
Production Kubernetes Checklist
- No routine use of cluster-admin
- Pod Security Admission enforced on production namespaces
- Default-deny network policies in place
- Secrets externalized or tightly controlled
- Admission control blocks known-bad patterns
- Images are scanned, signed, and pinned
- Runtime detection configured and monitored
- Audit logs retained and reviewed
- Break-glass access documented and time-bounded
- Recovery procedures tested
Further Reading
- Kubernetes Security Checklist
- NSA and CISA Kubernetes Hardening Guidance
- CIS Kubernetes Benchmark
- Kubernetes Pod Security Standards
The safest production clusters are usually the least surprising ones. Tight roles, narrow network paths, boring workload defaults, and tested recovery plans still beat cleverness.