Live pod migration for Kubernetes · v0.1.0-alpha
Live-migrate running pods.
Keep every TCP connection.
PodMotion moves a running pod to another node — memory, open files, and active TCP connections intact. No application changes. Connections survive with seq_delta=0 (sequence numbers match across the handoff — no RST, no reconnect). Proven on live PostgreSQL: zero failed transactions, freeze window p99 ≤ 487 ms.
Kubernetes Pods Made Immortal
Proven on live PostgreSQL · 0 failed transactions · 10/10 runs
* Not yet a CNCF project — CNCF sandbox submission planned (target Q3 2026 · M-GOV-3).
Pre-1.0. Single-cluster, amd64 and arm64. See roadmap for the path to v1.0.
Why PodMotion
Four properties that make a running pod portable.
Bounded freeze, never a restart
PodMotion uses CRIU 4.2 iterative pre-copy to snapshot a running pod's memory while it keeps serving traffic, then freezes it for typically under 500 milliseconds (p99 ≤ 487 ms measured) for the final handoff. Requests in flight at the freeze are paused by a TCP zero-window hold, not dropped — clients may see a brief latency bump, never a reset. The source pod is never deleted until the destination is verified serving — if anything fails, the migration rolls back and your pod keeps running where it was. This is live migration with a bounded, measured freeze window, not a stop-and-restart dressed up as zero downtime.
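The relationship between pre-copy rounds and the final freeze can be illustrated with the textbook model of iterative pre-copy. This is a minimal sketch, not PodMotion's estimator or CRIU's algorithm: the function name, the parameters, and the assumption that the freeze equals the time to send the last dirty set are all illustrative, and real freezes also include dump and restore overhead.

```python
def simulate_precopy(memory_mb, dirty_mb_per_s, bandwidth_mb_per_s,
                     freeze_budget_ms=500.0, max_rounds=30):
    """Idealized iterative pre-copy; returns (precopy_rounds, final_freeze_ms).

    Round 0 copies all memory while the workload keeps running. Each later
    round copies only the pages dirtied during the previous round. The loop
    stops as soon as the outstanding data can be sent within the freeze budget.
    """
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    remaining_mb = float(memory_mb)
    for rounds_done in range(max_rounds + 1):
        transfer_s = remaining_mb / bandwidth_mb_per_s
        if transfer_s * 1000.0 <= freeze_budget_ms:
            return rounds_done, transfer_s * 1000.0
        # Pages dirtied while this round was on the wire become the next
        # round's workload (never more than the whole address space).
        remaining_mb = min(float(memory_mb), dirty_mb_per_s * transfer_s)
    raise RuntimeError("dirty rate too high: pre-copy did not converge")
```

With hypothetical inputs of 1000 MB of memory, an 18 MB/s dirty rate, and a 100 MB/s link, `simulate_precopy(1000, 18, 100)` converges after two pre-copy rounds. If the dirty rate exceeds the link bandwidth, the model never converges, which is why a bounded freeze depends on the workload's write behavior.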
How it works →
Your connections survive the move
No application changes required. Active TCP connections move with the pod. An eBPF sequence-translation relay plus a TCP zero-window hold freeze both endpoints across the handoff so that sequence numbers line up exactly (seq_delta=0), which means no RST and no reconnect. Your clients need no retry logic and no awareness that anything moved. We proved it on eight live PostgreSQL connections mid-query with zero failed transactions.
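What seq_delta=0 means can be sketched in a few lines. TCP sequence numbers are 32-bit and wrap around, so the comparison has to be wrap-aware. The function names and the `(snd_nxt, rcv_nxt)` input shape below are assumptions made for illustration, not PodMotion's verifier.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def seq_delta(before, after):
    """Signed, wrap-aware difference between two TCP sequence numbers."""
    diff = (after - before) % SEQ_SPACE
    return diff - SEQ_SPACE if diff >= SEQ_SPACE // 2 else diff


def handoff_is_transparent(source, destination):
    """True when both directions of a connection resume at the same numbers.

    `source` and `destination` are (snd_nxt, rcv_nxt) pairs sampled just
    before the freeze and just after the restore (hypothetical inputs).
    """
    return all(seq_delta(b, a) == 0 for b, a in zip(source, destination))
```

Any non-zero delta in either direction would make the peer see data it did not expect, which is the situation that normally ends in a reset.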
GitOps-safe
PodMotion never mutates your Deployment, StatefulSet, or ReplicaSet spec. It works below the controller layer using an admission webhook, a scheduler plugin, and an NRI restore hook, so Argo CD and Flux see no drift. In the narrow case where a GitOps controller would race the migration, PodMotion sets an annotation that pauses reconciliation for the freeze window only, then removes it: no manual step, no spec change (ADR-0046). Declare the migration in Git as a PodMigration resource, and Argo CD or Flux will apply it like any other manifest.
Volumes follow the pod
Persistent volumes cut over through a four-tier strategy selected per volume by storage class: RWX re-attach for shared volumes, Longhorn dual-engine live migration, Ceph RBD rbd migrate, or snapshot-clone-rsync for local and block storage. Unsupported volumes are blocked at admission, not discovered mid-migration. Storage syncs in parallel with memory pre-copy and is gated ready before the destination pod starts. Alpha: storage tier maturity varies; see the roadmap.
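The per-volume selection described above could look roughly like the following. This is a sketch under stated assumptions: the `Volume` fields, the tier labels, the use of snapshot support as the fallback criterion, and matching on the CSI driver names `driver.longhorn.io` and `rbd.csi.ceph.com` are illustrative choices, not PodMotion's actual admission logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    access_modes: tuple
    provisioner: str
    supports_snapshots: bool = False  # hypothetical capability flag


class UnsupportedVolume(Exception):
    """Raised at admission time, before any migration work starts."""


def select_tier(vol):
    # Order mirrors the four tiers listed in the text.
    if "ReadWriteMany" in vol.access_modes:
        return "rwx-reattach"
    if vol.provisioner == "driver.longhorn.io":
        return "longhorn-dual-engine"
    if vol.provisioner == "rbd.csi.ceph.com":
        return "ceph-rbd-migrate"
    if vol.supports_snapshots:
        return "snapshot-clone-rsync"
    raise UnsupportedVolume(f"no cutover tier for volume {vol.name!r}")


def plan_storage(volumes):
    """Resolve every volume up front; one unsupported volume rejects the plan."""
    return {vol.name: select_tier(vol) for vol in volumes}
```

Resolving the whole plan before anything moves is what turns an unsupported volume into an admission-time rejection rather than a mid-migration failure.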
Does it hold up under load?
Eight live PostgreSQL connections, mid-query, zero failed transactions across ten consecutive runs.
See the NS#4 proof →
Will my app break during a migration?
No. You change nothing. Active TCP connections are preserved, so your clients never see a reconnect.
How it works →
Is it GitOps-compatible?
Yes. A migration is a standard Kubernetes resource — apply it from Git, Argo CD, or Flux.
Install with Helm →
Is this production-ready?
Not yet. Pre-1.0, single-cluster, amd64 and arm64 — with a public, milestone-by-milestone path to v1.0.
See the roadmap →
12-Phase State Machine
How a migration works
Every PodMotion migration is a 12-phase state machine. Each phase writes a Kubernetes status condition before it acts — crash-safe and auditable.
SocketInventory
Validating
DestinationPrewarm
PreCopyMemory
ZeroWindowArm
FinalFreeze + StateCapture
OverlayHandoff
Restore + SocketReattach
DisengageHold
TCPVerifying
ServiceVerifying
Cutover + Complete
Safety invariant (ADR-0012)
The source pod is never deleted until the destination is verified serving (ADR-0012).
Failure mode
If any phase before Cutover fails, PodMotion rolls back — the destination is discarded and your pod keeps running on its original node.
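The record-then-act pattern and the rollback rule can be sketched as a small phase runner. The phase names come from the list above. The condition strings, the `actions` mapping, and the handling of a failure in the final phase are assumptions for illustration, not the operator's real implementation.

```python
PHASES = [
    "SocketInventory", "Validating", "DestinationPrewarm", "PreCopyMemory",
    "ZeroWindowArm", "FinalFreeze+StateCapture", "OverlayHandoff",
    "Restore+SocketReattach", "DisengageHold", "TCPVerifying",
    "ServiceVerifying", "Cutover+Complete",
]


def _noop():
    return None


def run_migration(actions):
    """Run phases in order; returns (outcome, conditions).

    `actions` maps a phase name to a callable (hypothetical hook). A condition
    is appended before each phase acts, so a crash leaves an audit trail.
    """
    conditions = []
    for phase in PHASES:
        conditions.append((phase, "InProgress"))  # record intent first
        try:
            actions.get(phase, _noop)()
        except Exception:
            conditions.append((phase, "Failed"))
            if phase != PHASES[-1]:
                # Before Cutover the source is untouched: drop the destination.
                conditions.append(("Rollback", "DestinationDiscarded"))
                return "RolledBack", conditions
            return "Failed", conditions
        conditions.append((phase, "Succeeded"))
    return "Complete", conditions
```

In this sketch the source pod is only ever retired inside the last phase, after both verification phases have succeeded, which is one way to express the safety invariant.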
NS#4 Proof
The proof: live PostgreSQL, zero failed transactions
We migrated a pod with eight active PostgreSQL connections, mid-query, under sustained load — ten times in a row.
Why this matters
kubectl drain (rolling restart)
≥ 312 failed transactions
PodMotion live migration
0 failed transactions
kubectl drain restarts your pod. PodMotion moves it.
| Run | Freeze window (ms) | Failed transactions |
|---|---|---|
| 1 | 298 | 0 |
| 2 | 312 | 0 |
| 3 | 271 | 0 |
| 4 | 334 | 0 |
| 5 | 289 | 0 |
| 6 | 301 | 0 |
| 7 | 319 | 0 |
| 8 | 259 (min) | 0 |
| 9 | 487 (p99) | 0 |
| 10 | 276 | 0 |
| Summary | p50 312 ms · p99 487 ms · budget 500 ms | 0 / 10 runs |
Environment: Stock Ubuntu 24.04, kernel 6.8 · two-node kind · flannel VXLAN · no cloud API calls, no custom kernel, no gratuitous ARP.
The M18.5 spike proved the mechanism for one flow / one container on a clean overlay. Multi-container and N-flow Service-routed transparency are on the roadmap (M22–M24). We publish limits, not just wins.
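The pass condition applied to the ten runs (every freeze window within the 500 ms budget, zero failed transactions) can be checked mechanically. The sketch below copies the per-run values from the table. The `gate` function and its return shape are hypothetical and are not part of the project's harness.

```python
FREEZE_BUDGET_MS = 500

# (freeze window in ms, failed transactions) for runs 1..10, from the table.
RUNS = [(298, 0), (312, 0), (271, 0), (334, 0), (289, 0),
        (301, 0), (319, 0), (259, 0), (487, 0), (276, 0)]


def gate(runs, budget_ms=FREEZE_BUDGET_MS):
    """Return (passed, worst_freeze_ms, total_failed_transactions)."""
    worst = max(freeze for freeze, _ in runs)
    failed = sum(failures for _, failures in runs)
    passed = (worst <= budget_ms) and (failed == 0)
    return passed, worst, failed
```

Gating on the worst run rather than on an average keeps a single slow freeze from hiding behind nine fast ones.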
Get started
Install the operator, run your first migration, verify it. Three steps.
PodMotion needs hostPID, SYS_PTRACE, and CHECKPOINT_RESTORE to drive CRIU, plus NET_ADMIN for the eBPF TCP relay. Store sensitive configuration in Kubernetes Secrets, not environment variables, so checkpoint images contain no plaintext credentials. See SECURITY.md for hardening guidance.
Install PodMotion
helm repo add podmotion https://podmotion-io.github.io/charts
helm install podmotion podmotion/podmotion -n podmotion-system --create-namespace
kubectl apply -f https://github.com/podmotion-io/podmotion/releases/latest/download/install.yaml
OperatorHub (OLM bundle) support is planned for milestone M18. In the meantime, use Helm or kubectl to install.
Install the kubectl plugin (krew)
kubectl krew install podmotion
Installs the operator, the per-node agent DaemonSet, five CRDs, and the admission webhooks.
Migrate a pod
kubectl podmotion migrate <pod-name> --to-node <node>
# TCP continuity preserved by default
Creates a PodMigration CR and streams the 12 phases until Complete. TCP continuity is on by default, so no extra flag is required.
Verify it worked
kubectl podmotion status <migration-name>
Confirm seq_delta=0, connection parity, and that the pod now runs on the destination node with the same connections it started with. The migration name is returned by the migrate command.
The kubectl plugin
kubectl podmotion makes migration a first-class kubectl verb. Five commands, no new tooling to learn.
kubectl podmotion migrate <pod>
Live-migrate a running pod to another node. Creates a PodMigration CR and streams the 12 phases until Complete. TCP continuity is preserved by default.
| Flag | Description |
|---|---|
| --to-node <node> | Destination node name. If omitted, PodMotion selects the best available node. |
| --strategy PreCopy\|PostCopy\|Cold | Memory transfer strategy (default: PreCopy) |
| --dry-run | Validate and estimate without migrating |
| --bandwidth-mbps <n> | Cap pre-copy bandwidth in Mbit/s |
| --wait | Block until migration reaches a terminal phase |
| --timeout <seconds> | Timeout in seconds for --wait (default: 600) |
| --no-network-continuity | Disable TCP sequence continuity preservation. TCP continuity is on by default; pass this flag only to opt out. |
$ kubectl podmotion migrate <pod>
SocketInventory ✓ → Validating ✓ → ... → Cutover+Complete ✓
✓ Migration complete · Pod running on: worker-2
kubectl podmotion status <migration>
Print a concise summary of an in-progress or completed migration: current phase, pre-copy round statistics, freeze window, and a condition checklist.
$ kubectl podmotion status <migration>
Phase: Complete · TCP Preserved: 8 connections · Freeze window: 298 ms (this run)
seq_delta=0 · No RST · No failed transactions
kubectl podmotion cancel <migration>
Cancel an in-progress migration and initiate rollback. Refuses to cancel once a migration has reached a terminal phase (Complete or Failed).
$ kubectl podmotion cancel <migration>
cancelled — rollback initiated
kubectl podmotion describe <migration>
Print the full PodMigration spec, status block, and all Kubernetes conditions with timestamps. Equivalent to kubectl describe, but formatted for migration workflows.
$ kubectl podmotion describe <migration>
Name: pm-my-pod-xxxxx
Phase: Complete
Conditions:
TCPSequenceContinuityVerified: True
TrafficVerified: True
...
kubectl podmotion estimate <pod>
Dry-run estimation: checks whether the pod is migratable, detects the storage tier, measures the dirty-page rate, and predicts the freeze window. No impact to the running pod.
| Flag | Description |
|---|---|
| --to-node <node> | Target node for capacity and topology checks |
| --strategy PreCopy\|PostCopy\|Cold | Strategy to estimate against |
$ kubectl podmotion estimate <pod>
Migratable: Yes
Storage tier: 1 (Longhorn dual-engine)
Dirty-page rate: ~18 MB/s
Predicted freeze window: ~290 ms
Everything the CLI does is a PodMigration custom resource, so the same operations work from GitOps, scripts, or the API.
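For scripted use, a PodMigration manifest could be generated and committed to Git or piped to kubectl, which accepts JSON as well as YAML. The sketch below is illustrative only. The API group `podmotion.io`, the `v1alpha1` version pairing, and every `spec` field name are assumptions modeled on the CLI flags, not the published CRD schema. Only the kind and the three strategy values come from this page.

```python
import json

STRATEGIES = ("PreCopy", "PostCopy", "Cold")  # values listed for --strategy


def pod_migration_manifest(pod, namespace="default", to_node=None,
                           strategy="PreCopy", network_continuity=True):
    """Build a PodMigration manifest as a plain dict (hypothetical schema)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    spec = {
        "podName": pod,                           # assumed field name
        "strategy": strategy,                     # assumed field name
        "networkContinuity": network_continuity,  # assumed field name
    }
    if to_node is not None:
        spec["targetNode"] = to_node              # assumed field name
    return {
        "apiVersion": "podmotion.io/v1alpha1",    # assumed group/version
        "kind": "PodMigration",
        "metadata": {"name": f"pm-{pod}", "namespace": namespace},
        "spec": spec,
    }


def to_json(manifest):
    """Serialize deterministically so the file diffs cleanly in Git."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Check the installed CRD (for example with `kubectl explain`) for the real field names before relying on anything like this.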
Roadmap
The path to v1.0
One north star: live-migrate a running multi-container pod across nodes, and neither the containers nor any client with an open TCP connection notices. Everything below is measured against that single, falsifiable bar — milestone by milestone, gate by gate.
The v1.0 north star
Complete bilateral transparency for live pod migration across nodes within a single cluster.
- Every container in the pod is unaware it moved — every app container, every sidecar, every log shipper.
- Every client with an open TCP connection is unaware the pod moved.
- seq_delta=0 — exact TCP sequence-number preservation.
- No RST, no reconnect, no retry, no client-side reconnect logic required.
Done when
On every release, CI is green across the full matrix with seq_delta=0 and zero RST/reconnect, every container unaware — validated by 80 randomized zero-error runs and a clean external audit, behind a frozen v1 API.
Shipped
3 milestones
- M1–M18 · Shipped
Foundation: cold + iterative-pre-copy migration, all storage tiers, the full operator
5 CRDs and an 11-phase state machine; CRIU 4.2 dump/restore; iterative pre-copy dirty-page sync; four-tier live storage cutover; eBPF zero-window hold and TCP_REPAIR; scheduler/descheduler plugins; AWS ALB/NLB integration; security hardening, observability, and post-copy lazy-pages. Released as v0.1.0-alpha (2026-06-09).
Gate
nginx HTTP 200, Redis key persistence, PostgreSQL row integrity across nodes; cold path E2E green.
- M18.5 · Shipped
NS#4 live TCP migration proof
First live, transparent TCP migration: a single flow survives the move with exact sequence-number preservation, on a stock Ubuntu 24.04 / kernel 6.8 — no kernel patches.
Gate
seq_delta=0; sent=576 received=576 errors=0 gaps=[]; 94ms CRIU dump.
▲ Proved the mechanism for one flow / one container / one client — physics risk removed, engineering risk (multi-container, N-flow) not yet.
- NS#4 · Shipped
Live PostgreSQL migration under load (published proof)
A live pod with 8 active PostgreSQL connections migrated mid-query: 0 failed transactions across 10 runs (pgbench -c 8 -j 8 -T 600), freeze p50 312ms / p99 487ms, 0 RST. Reproducible on EC2/Azure/GCP with no cloud-API calls and no custom kernels. (kubectl drain comparison: ≥312 failed transactions.)
Gate
0 failed transactions, 0 RST, freeze ≤ 500ms budget — see docs/proof/ns4-proof.md.
In progress
1 milestone
- M19 · In progress
GitOps-aware migration
Migrate below the controller layer with zero spec mutation — Argo CD and Flux see no drift. Admission-webhook label injection, scheduler MAX-score routing, NRI restore hook, and self-protecting PodMigration resources (ADR-0046).
Gate
Migration completes with Argo CD / Flux reconciliation paused for the freeze window only, then resumed — no drift alert.
Planned
8 milestones
- M22 · Planned · L · → ~45%
Multi-container consistent checkpoint — the "every container" half
Consistent freeze/dump/restore ordering across ALL containers in a pod — app, service-mesh sidecar (Envoy/Istio), and log shipper — with shared-namespace coupling preserved. The single largest currently-unproven capability and the highest-risk milestone. Kicks off external security-audit procurement.
Gate
A 3-container pod with a real service-mesh sidecar migrates across nodes; every container reports zero internal disruption and seq_delta=0 on its own connections; reproduced 10× clean.
▲ Tier-0 prerequisite: the north-star gate is INVALID unless M22 AND M23 have both landed.
- M23 · Planned · L · → ~65%
Conntrack/NAT state migration + N-flow transparency — the "every client" half
Migrate kernel conntrack/NAT state on both nodes so Service-routed traffic (ClusterIP/NodePort, iptables + IPVS) survives — overlay seq-preservation alone does NOT survive the kernel conntrack path. Integrates storage cutover into the freeze window, preserving open fd/offset/inode and advisory-lock state.
Gate
seq_delta=0, zero RST, zero reconnect for ≥100 concurrent Service-routed flows under load across a live migration, with storage fd/lock state preserved.
▲ Tier-0 prerequisite: pairs with M22 to gate every north-star run.
- M24 · Planned · XL · → ~80%
CNI matrix to production + LB connection tracking
Per-CNI transparency parity — each CNI independently re-proves N-flow seq preservation and conntrack migration (there is no CNI-generic transparency). Without Cilium the north star is false for a large share of production; this is realistically a multi-quarter epic.
Gate
seq_delta=0 multi-container migration green on Calico VXLAN, Cilium eBPF, and Flannel VXLAN independently, plus one external-LB (ALB or NLB) north-south flow surviving migration.
- M25 · Planned · XL · → ~88%
IPv6 + encrypted-overlay transparency
AF_INET6 FDB/neighbor injection for dual-stack and v6-only clients, plus WireGuard/IPSec encrypted-overlay VTEP discovery — completing "all clients" across IP families and encryption (ADR-0048 / ADR-0057).
Gate
seq_delta=0 live migration proven for an IPv6 client AND for a WireGuard-encrypted overlay cluster.
▲ Scope unresolved: north-star-blocking (architect) vs post-v1.0 epic (PM) — flagged for quorum.
- M20 · Planned
Istio service-mesh certification
Sidecar-mode multi-process checkpoint (istio-proxy + pilot-agent + app), iptables-redirect preservation, SPIFFE SVID continuity, Envoy xDS reconnect gate, and ambient/ztunnel mode with PeerAuthentication STRICT.
Gate
EnvoyXdsSynced and SVID continuity hold across migration; STRICT mTLS mesh unaffected.
- M21 · Planned
Automated certification matrix + agentic version validation
A tiered certification-matrix config, a GitHub Actions version-watcher, and an agentic quorum testing pipeline (≥2/3 PASS quorum per matrix row) with cosign-signed signoff and a make beta-gate target.
Gate
make beta-gate passes; all Tier-1 matrix entries signed off within 30 days.
- M26 · Planned · L · → ~90%
Observability + scheduler/descheduler E2E + deterministic CI matrix
Per-migration seq_delta, flow-loss, blackout-duration, and per-container consistency metrics; an unattended scheduler/descheduler-driven multi-migration run. No new raw capability — this makes the guarantee provable and self-driving.
Gate
Observability stack emits the per-migration metrics; scheduler/descheduler drives an unattended multi-migration run clean.
- M27 · Planned · M · → ~92%
First true-north-star AWS gate run + chaos gate
First randomized AWS run (EKS + self-managed EC2) against the real workload set — multi-container + service-mesh sidecar + Service-routed + dual-stack — exercising all 8 beta-gate workload types with fault injection.
Gate
One full randomized AWS gate run completes against the multi-container + Service + dual-stack workload set with zero unexplained failures.
Future
3 milestones
- M28 · Future · L · → ~95% (amd64)
Beta gate — 80 runs, zero errors, zero warnings
Converts "demonstrated" into a statistically defensible guarantee: 80 consecutive randomized runs against the full north-star workload matrix on amd64, with zero errors and zero warnings.
Gate
80 consecutive randomized runs, zero errors, zero warnings, against the full north-star workload matrix on amd64.
- M29 · Future · M
CNCF sandbox + external security audit + public launch
Public v0.1.0 release with a clean external audit and a credibility demo (live seq_delta=0 tcpdump + sidecar migration). Gated on supply-chain, DCO, vendor-neutrality, and trademark ADRs being accepted (ADR-0077/0090/0091/0092).
Gate
CNCF sandbox application submitted with a clean external audit and the credibility demo published — filed only once ≥2 maintainers from ≥2 orgs exist.
▲ Structurally non-solo: requires a second maintainer before submission. Target Q3 2026.
- M30 · Future · L · 100% (amd64 scope)
API stabilization + v1.0 matrix freeze
Freeze the API v1alpha1 → v1beta1 → v1 with a conversion webhook (ADR-0075); adopt a live-only posture (cold-mode admission rejection, ADR-0076); ship v0.2.0 → v0.3.0 → v1beta1 → v1.0.0; recruit a second maintainer and document 3+ public production adopters.
Gate
v1.0.0 released with a frozen v1 API, the full north-star matrix green in CI every release, a clean re-audit, and a proven conversion path from the earliest public release.
▲ The path to incubation is structurally non-solo — requires a second maintainer + documented adopters.
How we prove it
The beta gate
A hard gate. No beta is cut until every run passes clean against the full north-star workload matrix on amd64.
8 workload types
- Stateless web (NGINX, Node.js/Go HTTP under wrk/k6)
- SQL database (PostgreSQL incl. replication; MySQL/MariaDB)
- Redis / KV (standalone + 6-node cluster)
- Message broker (NATS / RabbitMQ)
- gRPC streaming
- WebSocket server
- Mixed multi-container (Helm web + DB + cache)
- Long-running compute
Every environment must pass
- AWS arm64 — EKS on Graviton (m7g/c7g/r7g)
- AWS amd64 — EKS on x86-64 (m7i/c7i/r7i)
- Local arm64 — multi-node Multipass/QEMU on Apple Silicon
- Local amd64 — multi-node QEMU under Rosetta / dedicated x86-64
Randomized across
v1.0 matrix dimensions
The gate is invalid if run before M22 and M23 land, or against the weaker single-flow / single-container definition instead of the true north-star set (multi-container + Service-routed + dual-stack).
Releases
Version roadmap
- v0.1.0-alpha
Current — cold + pre-copy, all storage tiers
- v0.2.0
Stabilization, adopter onboarding
- v0.3.0
North-star hardening
- v1beta1
API freeze + conversion webhook
- v1.0.0
Frozen v1 API · full matrix green in CI
Deferred to post-v1.0
Permanently out of scope
Community
Join the project
Apache-2.0, vendor-neutral, CNCF-track. Built in the open.
PodMotion is not a CNCF project. ‘CNCF-track’ describes our intended trajectory only: we are targeting CNCF sandbox submission, gated on ≥2 maintainers from ≥2 organizations. See GOVERNANCE.md.
See the full roadmap for milestone gate conditions and the validation footprint.
GitHub
Browse source, open issues, and follow the project. All development happens in the open under Apache-2.0.
Star on GitHub
Contributing
Read the contributor guide to set up a local cluster, run the integration harness, and submit a pull request.
Read CONTRIBUTING
Security
Found a vulnerability? Do not open a public issue. Use a GitHub private security advisory or email security@podmotion.io. 72-hour acknowledgment, coordinated disclosure, CVE credit.
Read SECURITY.md
Discussions
Ask questions, share use cases, and propose ideas in GitHub Discussions. This is the primary support channel for the project.
Open Discussions
Code of Conduct
PodMotion adopts the Contributor Covenant. All community spaces — GitHub Issues, Discussions, PRs, and events — are covered. Violations may be reported to the maintainer.
Read CODE_OF_CONDUCT.md
Governance commitment
We will not file for CNCF sandbox until there are ≥2 maintainers from ≥2 organizations. PodMotion is a vendor-neutral project with a documented maintainer-diversity plan.
Read GOVERNANCE.md
Support is community-based via GitHub Discussions and Issues; commercial support is not currently available.
Explicitly out of scope
* Not yet a CNCF project — CNCF sandbox submission planned (target Q3 2026 · M-GOV-3), gated on ≥2 maintainers from ≥2 organizations.
Open Source · Apache-2.0
About the project
PodMotion was born from a gap in the Kubernetes ecosystem. When a pod crashes or gets evicted, it takes its state with it: its memory, its open files, and every TCP connection its clients were holding. The scheduler starts a new replica somewhere else, and those connections get a RST. Every retry, every reconnect, every client-side timeout is a consequence of one assumption baked into the platform: that a pod's state is disposable.
This project started as a proof of concept: could CRIU and an eBPF sequence-translation relay actually move a running process across nodes without breaking a single open socket? The answer — proved on live PostgreSQL under pgbench load with zero failed transactions — is yes. seq_delta=0. No RST. No reconnect. No application changes required.
What started as a proof of concept is now a production path: a 12-phase state machine, five CRDs, a scheduler plugin, a CNI plugin, and a kubectl verb. Built in the open. Apache-2.0.

Chad N. Ingle
Principal DevOps Architect, Effectual
Chad N. Ingle is the founder of OAN Ministries and the sole maintainer of PodMotion. By day he is a Principal DevOps Architect at Effectual, an AWS Premier Tier Services Partner specializing in enterprise digital transformation — from migration to modernization in the cloud. PodMotion grew out of his hands-on platform engineering work — a tool he wanted to exist and decided to build.