A practical explainer for managers planning a move from VMware to KOB (Kubernetes on bare metal, i.e. your Kubernetes platform). It aims to be simple and actionable.
TL;DR (1 minute)
- VMware = runs full virtual machines. Great for legacy apps, Windows workloads, and appliances.
- KOB (Kubernetes) = runs containers on Linux nodes. Great for modern microservices, APIs, batch jobs, and web apps.
- Why switch: reduce hypervisor/licensing spend, better hardware utilization, faster deploys, built‑in automation, consistent dev→prod flow.
- What it demands: new operating model (platform as a product), stronger DevOps discipline, app re‑platforming for some workloads, upgraded observability & backups.
- What not to force: heavily stateful legacy apps, vendor appliances, and workloads tightly coupled to a specific OS may stay on VMware (or move later with a special plan).
Audience & scope
This doc targets IT managers, product owners, and budget owners deciding whether/what/how to migrate from VMware to KOB.
Definitions (plain English)
- Container: a lightweight package that includes an app and its runtime. Starts in seconds. Not a full OS.
- Pod: one or more containers that run together.
- Node: a Linux server where pods run.
- Cluster: a set of nodes managed as one pool.
- Control plane: the brains of the cluster (API server, scheduler, etc.).
- CNI / CSI: plugins for networking (CNI) and storage (CSI).
When KOB makes sense (green flags)
- You run microservices, APIs, web frontends, ETL/ML jobs, or internal tools.
- Apps can run on Linux and don’t require a full Windows desktop/server OS.
- Teams can adopt CI/CD and Infrastructure‑as‑Code.
- You want faster releases, auto‑scaling, and self‑service environments.
When VMware stays (red/yellow flags)
- Vendor appliances that only ship as OVA/VM.
- Windows‑only or GUI‑heavy apps needing a full OS.
- Large stateful monoliths with tight kernel/driver dependencies.
Strategy: hybrid. Move “container‑friendly” apps first; keep the rest on VMware and revisit them later (refactor, replace, or retire).
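The green and red/yellow flags above can be turned into a first-pass triage rule. The sketch below is illustrative only: the `App` fields and the `classify` function are hypothetical names, and treating "no CI/CD yet" as yellow is an assumption, not a rule from this document.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical inventory record; field names are illustrative."""
    name: str
    runs_on_linux: bool
    vm_only_appliance: bool = False      # ships only as OVA/VM
    needs_full_windows_os: bool = False  # Windows-only or GUI-heavy
    stateful_monolith: bool = False      # tight kernel/driver dependencies
    has_ci_cd: bool = False              # team already uses CI/CD


def classify(app: App) -> str:
    """Return 'red', 'yellow', or 'green' as a first-pass migration fit."""
    # Red: workloads that are expected to stay on VMware for now.
    if app.vm_only_appliance or app.needs_full_windows_os or not app.runs_on_linux:
        return "red"
    # Yellow: could move, but needs a special plan or more team readiness
    # (the CI/CD condition is an assumption made for this sketch).
    if app.stateful_monolith or not app.has_ci_cd:
        return "yellow"
    return "green"


inventory = [
    App("orders-api", runs_on_linux=True, has_ci_cd=True),
    App("billing-db", runs_on_linux=True, stateful_monolith=True, has_ci_cd=True),
    App("vendor-firewall", runs_on_linux=False, vm_only_appliance=True),
]
for app in inventory:
    print(f"{app.name}: {classify(app)}")
```

A real inventory would carry more attributes (dependencies, RPO/RTO, compliance needs), so treat the output as a conversation starter rather than a verdict.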
Architecture: how they differ
VMware (classic)
- ESXi hypervisors on hosts
- vCenter manages clusters
- vSwitch/NSX for networking
- vSAN/arrays for storage
- Workload unit = VM (guest OS + app)
KOB (Kubernetes)
- Linux on bare‑metal nodes
- Kubernetes control plane manages scheduling & desired state
- CNI for networking, CSI for storage
- Workload unit = Pod/Container
Key idea: VMware virtualizes hardware; Kubernetes orchestrates applications.
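"Desired state" is the part of this that tends to be new for VMware teams: you declare what should be running, and the platform keeps working toward it. The toy model below is only a sketch of that idea, not Kubernetes controller code; the `reconcile` function and the app names are made up for illustration.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Return the actions needed to move `actual` replica counts to `desired`."""
    actions = []
    for app in sorted(set(desired) | set(actual)):
        want = desired.get(app, 0)  # apps missing from `desired` should not run
        have = actual.get(app, 0)
        if have < want:
            actions.append(f"start {want - have} pod(s) of {app}")
        elif have > want:
            actions.append(f"stop {have - want} pod(s) of {app}")
    return actions


# Declared in Git (desired) vs. what is observed in the cluster (actual).
desired = {"web": 3, "worker": 2}
actual = {"web": 1, "worker": 2, "old-job": 1}

for action in reconcile(desired, actual):
    print(action)
```

The real control plane runs this kind of comparison continuously, which is why a crashed pod or a drained node gets corrected without a ticket.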
Cost & efficiency (high level)
- Licensing: KOB removes hypervisor licensing layers; you still pay for Linux support, container registry, and chosen add‑ons.
- Density: containers share the host OS → more apps per server (case‑by‑case).
- Operations: more automation (declarative configs), fewer “ticket‑driven” handoffs.
- Caveat: savings depend on app fit and team maturity. Budget for training and platform tooling.
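The density argument can be sanity-checked with simple arithmetic before any vendor conversation. The sketch below assumes a single pooled CPU demand and two utilization targets; every number in it is a placeholder chosen for illustration, not a measurement or a benchmark, and `servers_needed` is a hypothetical helper.

```python
import math


def servers_needed(total_core_demand: float, cores_per_server: int,
                   target_utilization: float) -> int:
    """Servers required to serve `total_core_demand` at a given utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_cores = cores_per_server * target_utilization
    return math.ceil(total_core_demand / usable_cores)


# Placeholder inputs: replace with figures from your own capacity reports.
demand_cores = 600
cores_per_server = 64

vm_estate = servers_needed(demand_cores, cores_per_server, target_utilization=0.35)
kob_estate = servers_needed(demand_cores, cores_per_server, target_utilization=0.60)

print(f"Servers at 35% utilization: {vm_estate}")
print(f"Servers at 60% utilization: {kob_estate}")
```

With these placeholder inputs the sketch prints 27 and 16 servers. Whether your estate can actually reach the higher utilization depends on app fit and team maturity, as noted in the caveat above.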
Reliability, HA, and DR
- Kubernetes reschedules failed pods automatically; node failures are tolerated if capacity exists.
- Stateful: use StatefulSets + CSI volumes; design for RPO/RTO via snapshots, replication, and backups.
- DR: replicate data and cluster config (GitOps). Runbooks for cluster re‑creation and data restore.
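"Tolerated if capacity exists" is worth checking explicitly during capacity planning. A minimal sketch, assuming CPU is the only constraint and that workloads can be packed perfectly onto the remaining nodes (both are simplifications); `survives_node_loss` and the node names are hypothetical.

```python
from typing import Dict


def survives_node_loss(node_cores: Dict[str, float], requested_cores: float) -> bool:
    """True if total CPU requests still fit after losing the largest node."""
    if not node_cores:
        return False
    # Worst case for a single failure: the biggest node disappears.
    remaining = sum(node_cores.values()) - max(node_cores.values())
    return requested_cores <= remaining


nodes = {"node-a": 32, "node-b": 32, "node-c": 32}

print(survives_node_loss(nodes, requested_cores=60))  # fits on the two survivors
print(survives_node_loss(nodes, requested_cores=70))  # does not fit
```

A production check would also look at memory, storage attachment, and scheduling constraints, but even this rough version makes the "N+1" headroom question concrete for budgeting.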
Security model
- Shift‑left: image scanning, SBOMs, and policy checks in CI.
- Runtime: sandboxing (container isolation), least‑privilege (RBAC), and network policies (CNI).
- Secrets: use vaults or KMS‑backed secrets.
- Compliance: enforce via admission policies and audit logs.
Networking (simple view)
- VMware: vSwitch/NSX, VLANs, load balancers in front of VM networks.
- KOB: CNI creates pod networks; Services/Ingress expose apps; optional Service Mesh for mTLS and traffic shaping.
Translation map
- VIP / Load balancer → Service (type LoadBalancer; on bare metal this needs a load-balancer implementation such as MetalLB)
- Firewall rules → NetworkPolicies
- NSX features → CNI + (optionally) Service Mesh
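To show what "firewall rules become NetworkPolicies" looks like in practice, the sketch below builds one NetworkPolicy manifest as a Python dictionary and prints it as JSON. The helper name, namespace, and app labels are assumptions for illustration; the field names follow the `networking.k8s.io/v1` NetworkPolicy API.

```python
import json


def allow_ingress_policy(name: str, namespace: str, target_app: str,
                         client_app: str, port: int) -> dict:
    """Allow TCP traffic to `target_app` pods only from `client_app` pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy protects.
            "podSelector": {"matchLabels": {"app": target_app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # Only pods with this label may connect.
                "from": [{"podSelector": {"matchLabels": {"app": client_app}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


policy = allow_ingress_policy(
    name="orders-db-allow-api",   # hypothetical names throughout
    namespace="orders",
    target_app="orders-db",
    client_app="orders-api",
    port=5432,
)
print(json.dumps(policy, indent=2))
```

The rule lives in Git next to the application rather than in a separate firewall change queue. Note that NetworkPolicies are only enforced if the chosen CNI plugin supports them.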
Storage & data
- VMware VMDKs/vSAN → in KOB, CSI drivers provision persistent volumes.
- Prefer managed storage classes with clear IOPS/latency guarantees.
- For databases: start with operator‑managed or vendor‑supported containers; ensure backup/restore and DR are battle‑tested.
Observability & operations
- Metrics: Prometheus + dashboards.
- Logs: centralized (e.g., Loki/ELK).
- Traces: OpenTelemetry.
- GitOps: configs in Git, changes via PRs (Argo CD/Flux). Rollback = revert commit.
- Automation: autoscaling, health probes, self‑healing.
People, skills, and operating model
- Treat the platform as a product with an SLO.
- Upskill teams: containers, CI/CD, IaC, observability, on‑call.
- RACI example:
  - Platform team: cluster lifecycle, security guardrails, shared services.
  - App teams: Dockerfiles, Helm/Manifests, SLOs, on‑call for their apps.
  - Security: policies, scanning, audits.
What migrates easily vs hard
Easier
- 12‑factor web services, APIs
- Batch workers, schedulers, ETL jobs
- Stateless services with external DBs
Harder
- Heavy stateful DBs without container‑ready ops
- Windows‑only workloads
- Vendor black‑box appliances (VM‑only)
Middle ground: KubeVirt (VMs inside Kubernetes) can help short‑term, but adds complexity. Use selectively.
Migration plan (phased)
- Discovery (2–4 weeks)
  - Inventory apps, classify by migration fit (green/yellow/red).
  - Map dependencies, RPO/RTO, compliance needs.
- Pilot (4–8 weeks)
  - Pick 2–3 green‑flag services. Build CI/CD, observability, IaC. Prove deployments, scaling, rollback, and DR.
- Foundation (parallel)
  - Harden platform: RBAC, network policies, backup, logging, monitoring, GitOps.
  - Define golden paths (templates, Helm charts) and dev portal docs.
- Scale‑out (quarterly waves)
  - Migrate app groups by domain. Hold post‑mortems and refine golden paths. Track KPIs.
- Legacy strategy
  - Keep, retire, refactor, or replace. Use KubeVirt sparingly if needed.
KPIs to track (manager view)
- Time‑to‑deploy (code → prod)
- Change failure rate and MTTR
- Resource efficiency (CPU/RAM utilization, cost per service)
- SLO compliance (availability, latency)
- Incident volume before/after migration
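Two of these KPIs, change failure rate and MTTR, are simple ratios once deployments and incidents are recorded consistently. The sketch below assumes hypothetical `Deployment` and `Incident` records and made-up sample data; in practice the inputs would come from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Deployment:
    service: str
    failed: bool  # True if the change caused a rollback or incident


@dataclass
class Incident:
    started: datetime
    resolved: datetime


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Share of deployments that failed (0.0 when there are none)."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(incidents: List[Incident]) -> timedelta:
    """Average time from incident start to resolution."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved - i.started for i in incidents), timedelta(0))
    return total / len(incidents)


# Sample data, invented for illustration.
deployments = [
    Deployment("orders-api", failed=False),
    Deployment("orders-api", failed=True),
    Deployment("web", failed=False),
    Deployment("web", failed=False),
]
incidents = [
    Incident(datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 9, 30)),
    Incident(datetime(2024, 2, 3, 14, 0), datetime(2024, 2, 3, 15, 30)),
]

print(f"Change failure rate: {change_failure_rate(deployments):.0%}")
print(f"MTTR: {mean_time_to_restore(incidents)}")
```

Capturing the same two numbers for the VMware estate before the pilot gives you a baseline, which is what makes the before/after comparison credible.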
Risk register (short list)
- Skill gap → training + pairing + external support for first waves
- Stateful data loss → tested backups, staged DR drills
- Security drift → policy‑as‑code, admission controls, regular audits
- Shadow configs → enforce GitOps only; block manual changes
Budgeting (what to expect)
- One‑off: training, consulting, platform bootstrap, initial hardware/network tweaks.
- Recurring: support for Linux, registries, observability stack, backup tooling, optional enterprise Kubernetes distro/support.
- Offset: reduced hypervisor licensing, better density, faster delivery (productivity gains).
FAQ (for execs)
- Will we shut down VMware? No. Expect hybrid: KOB for container‑friendly apps; VMware for VM‑bound workloads.
- Do we need to rewrite everything? No. Start with low‑risk services. Refactor selectively where ROI is clear.
- Is Kubernetes reliable? Yes, with the right SRE practices and capacity planning.
- What about DR and compliance? Possible and proven, but must be designed and tested—same as VMware.
Decision checklist (yes/no)
- Do we have at least 2–3 services that are container‑friendly and non‑critical for a pilot?
- Do we commit to GitOps, CI/CD, and observability standards?
- Do we have a platform team accountable for SLOs and golden paths?
- Have we budgeted for training and initial platform hardening?
- Do we accept a hybrid footprint for the next 12–24 months?
VMware → KOB feature mapping (quick reference)
- vCenter → Kubernetes API server + GitOps
- ESXi hosts → Worker nodes
- DRS/HA → Scheduler, Pod disruption budgets, nodes spread across failure domains (racks/zones)
- vSwitch/NSX → CNI, NetworkPolicies, Ingress/Service Mesh
- vSAN/VMDK → CSI Persistent Volumes/StorageClasses
- VM templates → Container images, Helm charts
- Snapshots → CSI snapshots / backup tools
- vRealize/vROps → Prometheus/Grafana, Alertmanager
- vRO (automation) → Argo CD/Workflows, Flux, Terraform, Crossplane
- Load balancers → Service (LoadBalancer) / Ingress controllers
Appendix: adoption playbook (one page)
- Principles: platform as product; paved roads; security by default; everything as code.
- Golden path package: base Helm chart; CI pipeline template; logging/metrics sidecars; default network policy; SLO template.
- Governance: tenant quotas; namespace per team; RBAC roles; admission policies; image provenance.
- Runbooks: node failure, image rollback, PVC restore, cluster upgrade, DR failover.
- Cadence: monthly platform review; quarterly migration waves; DR test twice a year.
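The governance items "namespace per team" and "tenant quotas" map onto two standard Kubernetes objects, Namespace and ResourceQuota. The sketch below generates both for one team; the function name, team name, and quota values are assumptions for illustration, and real limits should come from your capacity plan.

```python
import json
from typing import List


def team_manifests(team: str, cpu_requests: str, memory_requests: str,
                   max_pods: int) -> List[dict]:
    """Build a Namespace and a ResourceQuota for one tenant team."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team, "labels": {"team": team}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        "spec": {
            # Caps on what all pods in the namespace may request in total.
            "hard": {
                "requests.cpu": cpu_requests,
                "requests.memory": memory_requests,
                "pods": str(max_pods),
            }
        },
    }
    return [namespace, quota]


# Hypothetical tenant and limits.
for manifest in team_manifests("payments", cpu_requests="16",
                               memory_requests="32Gi", max_pods=50):
    print(json.dumps(manifest, indent=2))
```

Generating these from one template per team keeps onboarding consistent, and committing the output to Git fits the "everything as code" principle above.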
Next steps
- Approve pilot scope (apps + SLOs).
- Stand up golden path and GitOps.
- Schedule training for app teams (containers + CI/CD + observability).
- Define success metrics and QBR cadence.