PodMotion

Live pod migration for Kubernetes · v0.1.0-alpha

Live-migrate running pods.
Keep every TCP connection.

PodMotion moves a running pod to another node — memory, open files, and active TCP connections intact. No application changes. Connections survive with seq_delta=0 (sequence numbers match across the handoff — no RST, no reconnect). Proven on live PostgreSQL: zero failed transactions, freeze window p99 ≤ 487 ms.

kubectl podmotion
$ kubectl podmotion migrate pgbench-client --to-node worker-2
# TCP continuity preserved by default
Migration complete · seq_delta=0 · freeze 312 ms

Kubernetes Pods Made Immortal

Proven on live PostgreSQL · 0 failed transactions · 10/10 runs

seq_delta=0 · freeze p50 312 ms / p99 487 ms · 0 RST · kernel 6.8, no patches
Apache-2.0 · CRIU 4.2 · KEP-2008 aligned · CNCF-track*

* Not yet a CNCF project. ‘CNCF-track’ describes our intended trajectory only: CNCF sandbox submission is planned (target Q3 2026 · M-GOV-3), gated on ≥2 maintainers from ≥2 organizations. See GOVERNANCE.md.

Pre-1.0. Single-cluster, amd64 and arm64. See roadmap for the path to v1.0.

Why PodMotion

Four properties that make a running pod portable.

Bounded freeze, never a restart

PodMotion uses CRIU 4.2 iterative pre-copy to snapshot a running pod's memory while it keeps serving traffic, then freezes it briefly for the final handoff: typically under 500 ms, with a measured p99 of 487 ms. Requests in flight at the freeze are paused by a TCP zero-window hold, not dropped, so clients may see a brief latency bump but never a reset. The source pod is never deleted until the destination is verified serving; if any phase before Cutover fails, the migration rolls back and your pod keeps running where it was. This is live migration with a bounded, measured freeze window, not a stop-and-restart dressed up as zero downtime.
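The pre-copy idea can be illustrated with a toy simulation. It is a minimal sketch, not PodMotion's CRIU integration: the page counts, the fixed dirty ratio, the freeze threshold, and the round cap are all illustrative assumptions. Each live round re-sends only the pages the workload dirtied during the previous round; whatever is still dirty when the loop stops is what must move inside the freeze.

```go
package main

import "fmt"

// PreCopyResult summarizes one simulated iterative pre-copy run.
type PreCopyResult struct {
	Rounds      int // live copy rounds performed while the pod kept running
	PagesCopied int // total pages sent during those live rounds
	FrozenPages int // pages still dirty when the freeze starts (stop-and-copy)
}

// simulatePreCopy is a toy model, not CRIU: while a round sends n pages,
// the workload dirties n*dirtyNum/dirtyDen pages (its dirty rate as a
// fraction of transfer bandwidth), and those pages must be sent again.
// Live rounds stop once the dirty set is at or below freezeThreshold, or
// after maxRounds, whichever comes first.
func simulatePreCopy(totalPages, dirtyNum, dirtyDen, freezeThreshold, maxRounds int) PreCopyResult {
	var res PreCopyResult
	toSend := totalPages // the first round copies the whole address space
	for res.Rounds < maxRounds && toSend > freezeThreshold {
		res.Rounds++
		res.PagesCopied += toSend
		dirtied := toSend * dirtyNum / dirtyDen
		if dirtied > totalPages {
			dirtied = totalPages // cannot dirty more pages than exist
		}
		toSend = dirtied
	}
	res.FrozenPages = toSend // copied only after the pod is frozen
	return res
}

func main() {
	// 1 GiB of 4 KiB pages, dirtied at a quarter of the transfer rate.
	fmt.Printf("%+v\n", simulatePreCopy(262144, 1, 4, 1024, 10)) // {Rounds:4 PagesCopied:348160 FrozenPages:1024}
	// Dirty rate equal to bandwidth: never converges; the round cap forces the freeze.
	fmt.Printf("%+v\n", simulatePreCopy(262144, 1, 1, 1024, 10)) // {Rounds:10 PagesCopied:2621440 FrozenPages:262144}
}
```

The second call shows why a round cap matters: if the workload dirties memory as fast as it can be sent, the dirty set never shrinks and the freeze has to carry the whole address space.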

How it works →

Your connections survive the move

No application changes required. Active TCP connections move with the pod. An eBPF sequence-translation relay and a TCP zero-window hold pause both endpoints across the handoff so that sequence numbers line up exactly (seq_delta=0). Because the peer sees no sequence discontinuity, there is no RST and no reconnect: your clients need no retry logic and no awareness that anything moved. We proved it on eight live PostgreSQL connections mid-query, with zero failed transactions.
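Sequence translation reduces to modular arithmetic on 32-bit sequence numbers. The Go sketch below models only that arithmetic, assuming a per-flow offset; the flowXlat type and its methods are hypothetical and do not describe the layout of PodMotion's PM_SEQ_XLAT entries. seq_delta=0 corresponds to a zero offset, where the rewrite is the identity.

```go
package main

import "fmt"

// flowXlat is a hypothetical per-flow translation entry. Delta is the
// offset between the sequence numbers the peer expects and the ones the
// restored socket emits. TCP sequence space is modulo 2^32, so plain
// uint32 arithmetic (which wraps) is exactly the right operation.
type flowXlat struct {
	Delta uint32
}

// newFlowXlat derives Delta from the next sequence number the peer expects
// (origSndNxt) and the next one the restored socket will send (restoredSndNxt).
func newFlowXlat(origSndNxt, restoredSndNxt uint32) flowXlat {
	return flowXlat{Delta: origSndNxt - restoredSndNxt}
}

// outboundSeq rewrites the sequence number on a segment leaving the pod.
func (f flowXlat) outboundSeq(seq uint32) uint32 { return seq + f.Delta }

// inboundAck rewrites the acknowledgment number on a segment from the peer.
func (f flowXlat) inboundAck(ack uint32) uint32 { return ack - f.Delta }

func main() {
	exact := newFlowXlat(1_000_000, 1_000_000)
	fmt.Println(exact.Delta, exact.outboundSeq(1_000_123)) // 0 1000123

	wrapped := newFlowXlat(10, 4294967290) // offset spans the 2^32 wrap
	fmt.Println(wrapped.Delta, wrapped.outboundSeq(4294967290)) // 16 10
}
```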

See the NS#4 proof →

GitOps-safe

PodMotion never mutates your Deployment, StatefulSet, or ReplicaSet spec. It works below the controller layer, using an admission webhook, a scheduler plugin, and an NRI restore hook, so Argo CD and Flux see no drift. In the narrow case where a GitOps controller would race the migration, PodMotion sets an annotation that pauses reconciliation for the freeze window only and then removes it: no manual step and no spec change (ADR-0046). To declare a migration in Git, commit a PodMigration resource; Argo CD or Flux applies it like any other manifest.

Architecture overview →

Volumes follow the pod

Persistent volumes cut over through a four-tier strategy, selected per volume by storage class: RWX re-attach for shared volumes, Longhorn dual-engine live migration, Ceph RBD live migration (rbd migration), or snapshot-clone-rsync for local and block storage. Unsupported volumes are blocked at admission, not discovered mid-migration. Storage syncs in parallel with memory pre-copy and must be gated ready before the destination pod starts. Alpha: storage tier maturity varies; see the roadmap.

Storage tiers →
Platform / SRE

Does it hold up under load?

Eight live PostgreSQL connections, mid-query, zero failed transactions across ten consecutive runs.

See the NS#4 proof →
App developer

Will my app break during a migration?

No. You change nothing. Active TCP connections are preserved, so your clients never see a reconnect.

How it works →
DevOps / GitOps

Is it GitOps-compatible?

Yes. A migration is a standard Kubernetes resource — apply it from Git, Argo CD, or Flux.

Install with Helm →
Engineering leadership

Is this production-ready?

Not yet. Pre-1.0, single-cluster, amd64 and arm64 — with a public, milestone-by-milestone path to v1.0.

See the roadmap →

12-Phase State Machine

How a migration works

Every PodMotion migration runs as a 12-phase state machine. Each phase writes a Kubernetes status condition before it acts, so the migration is crash-safe (a restarted operator can see which phase was in progress) and every step is auditable.

Pre-freeze (phases 1–5) · Freeze (phase 6) · Restoration & verify (phases 7–12)

Safety invariant (ADR-0012)

The source pod is never deleted until the destination is verified serving (ADR-0012).

Failure mode

If any phase before Cutover fails, PodMotion rolls back — the destination is discarded and your pod keeps running on its original node.
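The condition-before-act pattern and the ADR-0012 invariant can be sketched as a small Go state machine. Only SocketInventory, Validating, and Cutover are phase names taken from PodMotion's documentation; the remaining names, the Migration type, and the seven-phase list are illustrative stand-ins for the real 12 phases.

```go
package main

import "fmt"

// phases is an illustrative subset. SocketInventory, Validating, and
// Cutover appear in PodMotion's documentation; PreCopy, Freeze, Restore,
// and Verify are placeholder names, and the real machine has 12 phases.
var phases = []string{"SocketInventory", "Validating", "PreCopy", "Freeze", "Restore", "Cutover", "Verify"}

// Migration is a hypothetical stand-in for a PodMigration's status.
type Migration struct {
	Conditions    []string // one condition per phase, written before it acts
	RolledBack    bool     // destination discarded, source left running
	SourceDeleted bool     // set only after the destination is verified
}

func phaseIndex(name string) int {
	for i, p := range phases {
		if p == name {
			return i
		}
	}
	return -1
}

// run walks the phases in order; failAt names a phase to fail ("" for none).
func run(m *Migration, failAt string) error {
	cutover := phaseIndex("Cutover")
	for i, p := range phases {
		// Record the phase first, so a restarted operator knows where it was.
		m.Conditions = append(m.Conditions, p)
		if p == failAt {
			if i < cutover {
				m.RolledBack = true // pre-Cutover failure: roll back
			}
			// Post-Cutover handling is not described here; the sketch
			// only guarantees that the source is not deleted.
			return fmt.Errorf("phase %s failed", p)
		}
	}
	// ADR-0012 invariant: delete the source only after Verify succeeds.
	m.SourceDeleted = true
	return nil
}

func main() {
	failed := &Migration{}
	err := run(failed, "Validating")
	fmt.Println(err, failed.RolledBack, failed.SourceDeleted) // phase Validating failed true false

	ok := &Migration{}
	err = run(ok, "")
	fmt.Println(err, ok.SourceDeleted, len(ok.Conditions)) // <nil> true 7
}
```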

NS#4 Proof

The proof: live PostgreSQL, zero failed transactions

We migrated a pod with eight active PostgreSQL connections, mid-query, under sustained load — ten times in a row.

Why this matters

kubectl drain (evict and reschedule)

≥ 312 failed transactions

PodMotion live migration

0 failed transactions

kubectl drain restarts your pod. PodMotion moves it.

10 consecutive runs · PostgreSQL pgbench -c 8, Ubuntu 24.04 / kernel 6.8, flannel VXLAN, budget 500 ms

Run   Freeze window (ms)   Failed transactions
1     298                  0
2     312                  0
3     271                  0
4     334                  0
5     289                  0
6     301                  0
7     319                  0
8     259 (min)            0
9     487 (p99)            0
10    276                  0
Summary: p50 312 ms · p99 487 ms · budget 500 ms · 0 failed transactions across all 10 runs

seq_delta=0 (all 10 runs) · 0 RST segments · 0 disconnections · 8 PM_SEQ_XLAT entries · seq continuity: Pass · 2% netem loss run: Pass

Environment: Stock Ubuntu 24.04, kernel 6.8 · two-node kind · flannel VXLAN · no cloud API calls, no custom kernel, no gratuitous ARP.

The M18.5 spike proved the mechanism for one flow / one container on a clean overlay. Multi-container and N-flow Service-routed transparency are on the roadmap (M22–M24). We publish limits, not just wins.
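The per-run values can be checked against the 500 ms budget mechanically. This Go sketch uses the ten freeze windows from the table and the nearest-rank percentile method (an assumption; the page does not state which method produced its summary). With only ten samples, nearest-rank p99 is simply the slowest run, so p99 487 ms means every run froze for at most 487 ms.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// freezeMs holds the per-run freeze windows from the NS#4 table (runs 1 to 10).
var freezeMs = []int{298, 312, 271, 334, 289, 301, 319, 259, 487, 276}

// nearestRank returns the p-th percentile (0 < p <= 100) of sorted data
// using the nearest-rank method.
func nearestRank(sorted []int, p float64) int {
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

type budgetReport struct {
	Min, Max, P99 int
	WithinBudget  bool // every run at or under the budget
}

func checkBudget(samples []int, budgetMs int) budgetReport {
	s := append([]int(nil), samples...) // copy so the caller's slice is untouched
	sort.Ints(s)
	r := budgetReport{Min: s[0], Max: s[len(s)-1], P99: nearestRank(s, 99)}
	r.WithinBudget = r.Max <= budgetMs
	return r
}

func main() {
	fmt.Printf("%+v\n", checkBudget(freezeMs, 500)) // {Min:259 Max:487 P99:487 WithinBudget:true}
}
```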

Get started

Install the operator, run your first migration, verify it. Three steps.

Prerequisites: CRIU 4.2, Linux kernel 5.10 or newer (tested on 6.8), and amd64 or arm64 nodes (wider architecture certification planned). Single-cluster only in v0.x.
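Kernel prerequisites: Live process migration requires direct kernel access. The per-node agent DaemonSet uses hostPID, SYS_PTRACE, and CHECKPOINT_RESTORE to drive CRIU, plus NET_ADMIN for the eBPF TCP relay. Store sensitive configuration in Kubernetes Secrets rather than environment variables: a checkpoint captures the process image, environment included, so credentials passed as environment variables end up in the checkpoint in plaintext. Any secret the application has already read into memory is captured too, so treat checkpoint images as sensitive. See SECURITY.md for hardening guidance.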
Network security: Inter-component gRPC (port 9090) uses mTLS when TLS certificates are configured. Without certificates the agent falls back to plaintext. Enable TLS in production — see SECURITY.md for certificate configuration.

1. Install PodMotion

helm repo add podmotion https://podmotion-io.github.io/charts
helm install podmotion podmotion/podmotion -n podmotion-system --create-namespace

Install the kubectl plugin (krew)

kubectl krew install podmotion

The Helm chart installs the operator, the per-node agent DaemonSet, five CRDs, and the admission webhooks; krew installs only the client-side kubectl plugin.

2. Migrate a pod

kubectl podmotion migrate <pod-name> --to-node <node>
# TCP continuity preserved by default

Creates a PodMigration CR and streams the 12 phases until Complete. TCP continuity is on by default — no extra flag required.

3. Verify it worked

kubectl podmotion status <migration-name>

Confirm seq_delta=0 and connection parity, and check that the pod now runs on the destination node with the same connections it started with. Use the migration name returned by the migrate command.
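Connection parity and seq_delta=0 can be expressed as a comparison of per-flow socket state before and after the move. The Go sketch below is hypothetical: the Flow type, the checkParity function, and the assumption that the pod keeps its IP (so each 4-tuple is unchanged) are illustrative and do not describe the output format of kubectl podmotion status.

```go
package main

import "fmt"

// Flow is a hypothetical 4-tuple key; this sketch assumes the pod keeps its
// IP across the move, so the tuple is identical before and after.
type Flow struct {
	PodIP    string
	PodPort  uint16
	PeerIP   string
	PeerPort uint16
}

// checkParity compares each flow's next send sequence number captured at
// freeze (before) with the value on the destination after restore (after).
// It returns flows that vanished and the signed seq_delta of any flow whose
// delta is non-zero; empty results for both mean parity and seq_delta=0.
func checkParity(before, after map[Flow]uint32) (missing []Flow, deltas map[Flow]int32) {
	deltas = make(map[Flow]int32)
	for f, seq := range before {
		got, ok := after[f]
		if !ok {
			missing = append(missing, f)
			continue
		}
		// Subtract mod 2^32, then reinterpret as signed to handle wraparound.
		if d := int32(got - seq); d != 0 {
			deltas[f] = d
		}
	}
	return missing, deltas
}

func main() {
	a := Flow{"10.244.1.7", 5432, "10.244.2.3", 41000}
	b := Flow{"10.244.1.7", 5432, "10.244.2.3", 41002}
	before := map[Flow]uint32{a: 4294967295, b: 700}
	after := map[Flow]uint32{a: 4294967295, b: 700}
	missing, deltas := checkParity(before, after)
	fmt.Println(len(missing), len(deltas)) // 0 0
}
```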

The kubectl plugin

kubectl podmotion makes migration a first-class kubectl verb. Five commands, no new tooling to learn.

kubectl podmotion migrate <pod>

Live-migrate a running pod to another node. Creates a PodMigration CR and streams the 12 phases until Complete. TCP continuity is preserved by default.

Flag                               Description
--to-node <node>                   Destination node name. If omitted, PodMotion selects the best available node.
--strategy PreCopy|PostCopy|Cold   Memory transfer strategy (default: PreCopy)
--dry-run                          Validate and estimate without migrating
--bandwidth-mbps <n>               Cap pre-copy bandwidth in Mbit/s
--wait                             Block until the migration reaches a terminal phase
--timeout <seconds>                Timeout in seconds for --wait (default: 600)
--no-network-continuity            Disable TCP sequence continuity preservation. Continuity is on by default; pass this flag only to opt out.
$ kubectl podmotion migrate <pod>
SocketInventory ✓ → Validating ✓ → ... → Cutover+Complete ✓
✓ Migration complete · Pod running on: worker-2

Everything the CLI does is a PodMigration custom resource — so the same operations work from GitOps, scripts, or the API.

Roadmap

The path to v1.0

One north star: live-migrate a running multi-container pod across nodes, and neither the containers nor any client with an open TCP connection notices. Everything below is measured against that single, falsifiable bar — milestone by milestone, gate by gate.

The v1.0 north star

Complete bilateral transparency for live pod migration across nodes within a single cluster.

  • Every container in the pod is unaware it moved — every app container, every sidecar, every log shipper.
  • Every client with an open TCP connection is unaware the pod moved.
  • seq_delta=0 — exact TCP sequence-number preservation.
  • No RST, no reconnect, no retry, no client-side reconnect logic required.

Done when

On every release, CI is green across the full matrix with seq_delta=0 and zero RST/reconnect, every container unaware — validated by 80 randomized zero-error runs and a clean external audit, behind a frozen v1 API.

Scope of the v1.0 gate: single-cluster · amd64 · no kernel patches.

Shipped

3 milestones
  1. M1–M18 · Shipped

    Foundation: cold + iterative-pre-copy migration, all storage tiers, the full operator

    5 CRDs and an 11-phase state machine; CRIU 4.2 dump/restore; iterative pre-copy dirty-page sync; four-tier live storage cutover; eBPF zero-window hold and TCP_REPAIR; scheduler/descheduler plugins; AWS ALB/NLB integration; security hardening, observability, and post-copy lazy-pages. Released as v0.1.0-alpha (2026-06-09).

    Gate

    nginx HTTP 200, Redis key persistence, PostgreSQL row integrity across nodes; cold path E2E green.

  2. M18.5 · Shipped

    NS#4 live TCP migration proof

    First live, transparent TCP migration: a single flow survives the move with exact sequence-number preservation, on a stock Ubuntu 24.04 / kernel 6.8 — no kernel patches.

    Gate

    seq_delta=0; sent=576 received=576 errors=0 gaps=[]; 94 ms CRIU dump.

    Proved the mechanism for one flow / one container / one client — physics risk removed, engineering risk (multi-container, N-flow) not yet.

  3. NS#4 · Shipped

    Live PostgreSQL migration under load (published proof)

    A live pod with 8 active PostgreSQL connections migrated mid-query: 0 failed transactions across 10 runs (pgbench -c 8 -j 8 -T 600), freeze p50 312 ms / p99 487 ms, 0 RST. Reproducible on EC2/Azure/GCP with no cloud-API calls and no custom kernels. (kubectl drain comparison: ≥312 failed transactions.)

    Gate

    0 failed transactions, 0 RST, freeze ≤ 500 ms budget. See docs/proof/ns4-proof.md.

In progress

1 milestone
  1. M19 · In progress

    GitOps-aware migration

    Migrate below the controller layer with zero spec mutation — Argo CD and Flux see no drift. Admission-webhook label injection, scheduler MAX-score routing, NRI restore hook, and self-protecting PodMigration resources (ADR-0046).

    Gate

    Migration completes with Argo CD / Flux reconciliation paused for the freeze window only, then resumed — no drift alert.

Planned

8 milestones
  1. M22 · Planned · L → ~45%

    Multi-container consistent checkpoint — the "every container" half

    Consistent freeze/dump/restore ordering across ALL containers in a pod (app, service-mesh sidecar such as Envoy/Istio, and log shipper) with shared-namespace coupling preserved. This is the single largest currently unproven capability and the highest-risk milestone. Kicks off external security-audit procurement.

    Gate

    A 3-container pod with a real service-mesh sidecar migrates across nodes; every container reports zero internal disruption and seq_delta=0 on its own connections; reproduced 10× clean.

    Tier-0 prerequisite: the north-star gate is INVALID unless M22 AND M23 have both landed.

  2. M23 · Planned · L → ~65%

    Conntrack/NAT state migration + N-flow transparency — the "every client" half

    Migrate kernel conntrack/NAT state on both nodes so Service-routed traffic (ClusterIP/NodePort, iptables + IPVS) survives — overlay seq-preservation alone does NOT survive the kernel conntrack path. Integrates storage cutover into the freeze window, preserving open fd/offset/inode and advisory-lock state.

    Gate

    seq_delta=0, zero RST, zero reconnect for ≥100 concurrent Service-routed flows under load across a live migration, with storage fd/lock state preserved.

    Tier-0 prerequisite: pairs with M22 to gate every north-star run.

  3. M24 · Planned · XL → ~80%

    CNI matrix to production + LB connection tracking

    Per-CNI transparency parity — each CNI independently re-proves N-flow seq preservation and conntrack migration (there is no CNI-generic transparency). Without Cilium the north star is false for a large share of production; this is realistically a multi-quarter epic.

    Gate

    seq_delta=0 multi-container migration green on Calico VXLAN, Cilium eBPF, and Flannel VXLAN independently, plus one external-LB (ALB or NLB) north-south flow surviving migration.

  4. M25 · Planned · XL → ~88%

    IPv6 + encrypted-overlay transparency

    AF_INET6 FDB/neighbor injection for dual-stack and v6-only clients, plus WireGuard/IPSec encrypted-overlay VTEP discovery — completing "all clients" across IP families and encryption (ADR-0048 / ADR-0057).

    Gate

    seq_delta=0 live migration proven for an IPv6 client AND for a WireGuard-encrypted overlay cluster.

    Scope unresolved: north-star-blocking (architect) vs post-v1.0 epic (PM) — flagged for quorum.

  5. M20 · Planned

    Istio service-mesh certification

    Sidecar-mode multi-process checkpoint (istio-proxy + pilot-agent + app), iptables-redirect preservation, SPIFFE SVID continuity, Envoy xDS reconnect gate, and ambient/ztunnel mode with PeerAuthentication STRICT.

    Gate

    EnvoyXdsSynced and SVID continuity hold across migration; STRICT mTLS mesh unaffected.

  6. M21 · Planned

    Automated certification matrix + agentic version validation

    A tiered certification-matrix config, a GitHub Actions version-watcher, and an agentic quorum testing pipeline (≥2/3 PASS quorum per matrix row) with cosign-signed signoff and a make beta-gate target.

    Gate

    make beta-gate passes; all Tier-1 matrix entries signed off within 30 days.

  7. M26 · Planned · L → ~90%

    Observability + scheduler/descheduler E2E + deterministic CI matrix

    Per-migration seq_delta, flow-loss, blackout-duration, and per-container consistency metrics; an unattended scheduler/descheduler-driven multi-migration run. No new raw capability — this makes the guarantee provable and self-driving.

    Gate

    Observability stack emits the per-migration metrics; scheduler/descheduler drives an unattended multi-migration run clean.

  8. M27 · Planned · M → ~92%

    First true-north-star AWS gate run + chaos gate

    First randomized AWS run (EKS + self-managed EC2) against the real workload set — multi-container + service-mesh sidecar + Service-routed + dual-stack — exercising all 8 beta-gate workload types with fault injection.

    Gate

    One full randomized AWS gate run completes against the multi-container + Service + dual-stack workload set with zero unexplained failures.
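The M21 quorum rule (≥2/3 PASS per matrix row) is easy to get subtly wrong at the boundary. This Go sketch assumes a row's result is a simple tally of independent PASS/FAIL votes; how the votes are produced and signed is outside its scope.

```go
package main

import "fmt"

// quorumPass reports whether a certification-matrix row passes under a
// ">= 2/3 PASS" rule. Cross-multiplying (3*pass >= 2*total) keeps the
// boundary exact; a float comparison against 0.6667 would wrongly fail a
// row with 2 of 3 PASS votes.
func quorumPass(pass, total int) bool {
	if total <= 0 || pass < 0 || pass > total {
		return false // malformed tally
	}
	return 3*pass >= 2*total
}

func main() {
	for _, t := range [][2]int{{2, 3}, {1, 3}, {3, 5}, {4, 6}} {
		fmt.Println(t[0], "of", t[1], "->", quorumPass(t[0], t[1]))
	}
}
```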

Future

3 milestones
  1. M28 · Future · L → ~95% (amd64)

    Beta gate — 80 runs, zero errors, zero warnings

    Converts "demonstrated" into a statistically defensible guarantee: 80 consecutive randomized runs against the full north-star workload matrix on amd64, with zero errors and zero warnings.

    Gate

    80 consecutive randomized runs, zero errors, zero warnings, against the full north-star workload matrix on amd64.

  2. M29 · Future · M

    CNCF sandbox + external security audit + public launch

    Public v0.1.0 release with a clean external audit and a credibility demo (live seq_delta=0 tcpdump + sidecar migration). Gated on supply-chain, DCO, vendor-neutrality, and trademark ADRs being accepted (ADR-0077/0090/0091/0092).

    Gate

    CNCF sandbox application submitted with a clean external audit and the credibility demo published — filed only once ≥2 maintainers from ≥2 orgs exist.

    Structurally non-solo: requires a second maintainer before submission. Target Q3 2026.

  3. M30 · Future · L · 100% (amd64 scope)

    API stabilization + v1.0 matrix freeze

    Freeze the API v1alpha1 → v1beta1 → v1 with a conversion webhook (ADR-0075); adopt a live-only posture (cold-mode admission rejection, ADR-0076); ship v0.2.0 → v0.3.0 → v1beta1 → v1.0.0; recruit a second maintainer and document 3+ public production adopters.

    Gate

    v1.0.0 released with a frozen v1 API, the full north-star matrix green in CI every release, a clean re-audit, and a proven conversion path from the earliest public release.

    The path to incubation is structurally non-solo — requires a second maintainer + documented adopters.

How we prove it

The beta gate

80 consecutive randomized runs · zero errors · zero warnings

A hard gate. No beta is cut until every run passes clean against the full north-star workload matrix on amd64.

8 workload types

  • Stateless web (NGINX, Node.js/Go HTTP under wrk/k6)
  • SQL database (PostgreSQL incl. replication; MySQL/MariaDB)
  • Redis / KV (standalone + 6-node cluster)
  • Message broker (NATS / RabbitMQ)
  • gRPC streaming
  • WebSocket server
  • Mixed multi-container (Helm web + DB + cache)
  • Long-running compute

Every environment must pass

  • AWS arm64 — EKS on Graviton (m7g/c7g/r7g)
  • AWS amd64 — EKS on x86-64 (m7i/c7i/r7i)
  • Local arm64 — multi-node Multipass/QEMU on Apple Silicon
  • Local amd64 — multi-node QEMU under Rosetta / dedicated x86-64

Randomized across

  • Architecture: arm64 + amd64
  • Pod memory: 64MB / 256MB / 1GB / 4GB
  • Write rate: idle / 10 MB/s / 100 MB/s
  • Open TCP connections: 0 / 10 / 1000
  • PVC size: none / 1GB / 100GB
  • Concurrent migrations: 1 / 4 / 8
  • Node count: 3 / 6 / 12

v1.0 matrix dimensions

  • CNIs: Calico VXLAN · Cilium eBPF · Flannel VXLAN
  • Clouds: EKS · self-managed EC2 · kubeadm/kind
  • K8s versions: N · N-1 · N-2
  • IP families and traffic paths: IPv4 + IPv6 · ClusterIP/NodePort · ALB/NLB

The gate is invalid if run before M22 and M23 land, or against the weaker single-flow / single-container definition instead of the true north-star set (multi-container + Service-routed + dual-stack).

Releases

Version roadmap

  1. v0.1.0-alpha

    Current — cold + pre-copy, all storage tiers

  2. v0.2.0

    Stabilization, adopter onboarding

  3. v0.3.0

    North-star hardening

  4. v1beta1

    API freeze + conversion webhook

  5. v1.0.0

    Frozen v1 API · full matrix green in CI

Deferred to post-v1.0

  • Live-storage dual-engine / concurrent: v1.2 epic (ADR-0044)
  • GPU / device-attached migration: permanently feature-gated (ADR-0032)
  • ARM64 production validation: fast-follow after amd64 v1.0
  • flannel host-gw backend: post-v1.0 community candidate
  • Argo Rollouts / Flagger integration: document divergence (ADR-0055)

Permanently out of scope

  • Multi-cluster / cross-cluster migration: separate future product line (ADR-0034)
  • Windows containers / nodes: out of scope
  • Cold-mode as a product posture: live-only; cold is the proven foundation, not a shipped mode (ADR-0076)

Community

Join the project

Apache-2.0, vendor-neutral, CNCF-track*. Built in the open.

Current status: v0.1.0-alpha · single-cluster · amd64 and arm64 · one maintainer · pre-CNCF-sandbox
Path to v1.0 readiness: ~25% · M22 of M30

See the full roadmap for milestone gate conditions and the validation footprint.

Governance commitment

We will not file for CNCF sandbox until there are ≥2 maintainers from ≥2 organizations. PodMotion is a vendor-neutral project with a documented maintainer-diversity plan.

Read GOVERNANCE.md

Support is community-based via GitHub Discussions and Issues; commercial support is not currently available.

Explicitly out of scope

  • Windows nodes: out of scope
  • GPU workloads: deferred
  • Multi-cluster migration: separate future product line

* Not yet a CNCF project — CNCF sandbox submission planned (target Q3 2026 · M-GOV-3), gated on ≥2 maintainers from ≥2 organizations.

Open Source · Apache-2.0

About the project

PodMotion was born from a gap in the Kubernetes ecosystem. When a pod crashes or is evicted, its state dies with it: its memory, its open files, and every TCP connection its clients were holding. Its controller creates a replacement that the scheduler places somewhere else, and clients on those connections get a RST. Every retry, every reconnect, and every client-side timeout is a consequence of the platform's assumption that pods are disposable.

This project started as a proof of concept: could CRIU and an eBPF sequence-translation relay actually move a running process across nodes without breaking a single open socket? The answer — proved on live PostgreSQL under pgbench load with zero failed transactions — is yes. seq_delta=0. No RST. No reconnect. No application changes required.

What started as a proof of concept is now an alpha on a public path to production: a 12-phase state machine, five CRDs, a scheduler plugin, a CNI plugin, and a kubectl verb. Built in the open. Apache-2.0.

Chad N. Ingle

Principal DevOps Architect, Effectual

Chad N. Ingle is the founder of OAN Ministries and the sole maintainer of PodMotion. By day he is a Principal DevOps Architect at Effectual, an AWS Premier Tier Services Partner specializing in enterprise digital transformation — from migration to modernization in the cloud. PodMotion grew out of his hands-on platform engineering work — a tool he wanted to exist and decided to build.