Restore and migration
This page covers restoring from Velero backups and migrating to a new cluster. To restore the Ziti database from a local snapshot (without Velero), use nf-restore-snapshot; see the Backup overview page.
Restoring from a Velero backup
The included restore script walks you through selecting and restoring from a Velero backup:
./velero/velero_restore.sh
The script will:
- Verify AWS credentials are available.
- Install the Velero plugin if not already present.
- Display available backups and prompt you to select one.
- Restore the selected backup.
For Velero to restore the Ziti controller PVC from the backup, it must first delete the existing PVC. The restore script prompts before doing so. If n is selected, the script skips restoring the PVC but restores all other resources. By default, Velero skips restoring any resource that already exists. See the Velero restore reference documentation for more information.
Restores can also be run manually if you need to use specific Velero flags:
velero restore create --from-backup <backup-name>
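For example, Velero's --existing-resource-policy flag (available in recent Velero releases) changes the default skip behavior so that resources which already exist are updated in place rather than skipped; a sketch:

```shell
# Restore from a named backup, updating resources that already exist
# instead of skipping them (skipping is Velero's default behavior).
velero restore create --from-backup <backup-name> \
  --existing-resource-policy=update

# Review the outcome, including any warnings or errors.
velero restore describe <restore-name> --details
```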
Migrating to a new cluster
Migration uses the same backup and restore workflow to move a NetFoundry Self-Hosted installation from one cluster to another.
The controller's advertise address must remain the same after migration. The controller's TLS certificates are issued for this DNS name, and every Ziti client, router, and identity is configured to reach the controller at this address. If the DNS name changes, certificates will be invalid and all clients will lose connectivity.
When migrating, update your DNS records to point the same advertise address at the new cluster's Load Balancer or node IP. Do not change the advertise address itself.
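One way to verify the DNS cutover, using a hypothetical advertise address ctrl.example.com (substitute your own, and your controller's actual port), is to check what the name resolves to and that the same certificate is still served:

```shell
# Confirm the advertise address now resolves to the new cluster's
# Load Balancer or node IP (ctrl.example.com is a placeholder).
dig +short ctrl.example.com

# Confirm the controller still presents a certificate for that DNS name
# (adjust the port to match your controller's advertised port).
openssl s_client -connect ctrl.example.com:443 -servername ctrl.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -enddate
```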
Step 1: Back up the existing cluster
- Ensure AWS credentials are loaded into the environment or saved to the credentials file.
- Install Velero if not already present.

  K3s:

  velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.12.2 \
    --bucket <S3_BUCKET_NAME> --features=EnableRestic --default-volumes-to-fs-backup --use-node-agent \
    --backup-location-config region=us-east-1 --snapshot-location-config region=us-east-1 \
    --secret-file <credentials-file>

  EKS / multi-node:

  velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.12.2 \
    --bucket <S3_BUCKET_NAME> --features=EnableCSI --use-volume-snapshots=true \
    --backup-location-config region=us-east-1 --snapshot-location-config region=us-east-1 \
    --secret-file <credentials-file>

- Back up all resources, including persistent volumes:

  velero backup create <backup-name> --include-cluster-resources

- Destroy the existing cluster.
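Before destroying the old cluster, it is prudent to confirm the backup finished cleanly; one way to check:

```shell
# The backup's STATUS column should read Completed.
velero backup get

# Show details, including any errors and which volumes were captured.
velero backup describe <backup-name> --details
```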
Step 2: Restore to the new cluster
- Create the new cluster.
- Load AWS credentials and install Velero (same commands as above).
- Run the restore script:

  ./velero/velero_restore.sh
- EKS: The new cluster will have new Load Balancer addresses. Update your DNS records so that the existing controller and router advertise addresses point to the new Load Balancer endpoints. Do not change the advertise addresses themselves.
- K3s: Update your DNS records so that the existing controller and router advertise addresses point to the new node's IP. The new cluster should use the same node configuration and default storage class.
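While the restore runs, its progress can be followed with Velero's own commands, for example:

```shell
# List restores and their status (InProgress, Completed, PartiallyFailed).
velero restore get

# Inspect a specific restore, including per-resource results.
velero restore describe <restore-name> --details
```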
Verifying the restore
Run nf-status to confirm all deployments are healthy:
nf-status
All deployments should show the expected replica count in the READY column. For more detail:
kubectl get pods -n ziti
kubectl get pods -n cert-manager
kubectl get pods -n support
Known issues after restore
Common
- The ziti-edge-tunnel deployment in the support namespace may need to be restarted, since the tunneler can come back online before the Ziti controller is ready:

  kubectl rollout restart deployment ziti-edge-tunnel -n support

- If the DNS record for the controller or router advertise address now resolves to a new endpoint, it may take a few minutes for client resources to reconnect. Restarting hosting routers or identities will accelerate recovery.
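As a sketch of that recovery step, assuming the ziti namespace and the ziti-router-1 deployment name used elsewhere on this page:

```shell
# Restart a hosting router so it re-resolves the controller's advertise
# address and reconnects immediately rather than waiting out its backoff.
kubectl rollout restart deployment ziti-router-1 -n ziti
kubectl rollout status deployment ziti-router-1 -n ziti
```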
EKS
- Load Balancer addresses will likely change after restoring from backup. Update the DNS entries for the controller and router advertise addresses. The ziti-router-1 deployment will not come back online until it can reach the controller at its advertise address; this is expected during a restore.
K3s
- The trust-manager deployment in cert-manager can fail with:

  Error: container has runAsNonRoot and image has non-numeric user (cnb), cannot verify user is non-root

  To fix, edit the deployment and add runAsUser: 1000 under the securityContext block:

  kubectl edit deployment/trust-manager -n cert-manager

  securityContext:
    # add this
    runAsUser: 1000

  Then restart:

  kubectl rollout restart deployment trust-manager -n cert-manager

- The elasticsearch-es-elastic-nodes statefulset can fail to start, causing Kibana to show "Kibana server is not ready yet." To fix:

  kubectl rollout restart statefulset elasticsearch-es-elastic-nodes -n support
Stalled restore jobs
If the restore appears to have worked but the restore job seems hung and never completes:
kubectl delete restore -n velero <restore-name>
# If the above command hangs, cancel it and run:
kubectl rollout restart deployment velero -n velero
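Before deleting a hung restore, it can help to see what it is waiting on; for example:

```shell
# Inspect the stalled restore and its logs for the blocking resource.
velero restore describe <restore-name> --details
velero restore logs <restore-name>

# Check the Velero server and node agent pods for crashes or restarts.
kubectl get pods -n velero
```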