Hey hey
Any Teleport users here?
We're running everything ourselves (i.e. not Teleport Cloud), on AWS with the usual HA architecture on EC2 instances in ASGs.
Their upgrade process is abysmal when you're running the auth servers yourself.
Their suggested process is quite manual:
- scale down the auth-server ASGs to one instance,
- upgrade the binary, let it finish any migrations,
- wait for it to start up (no documented health checks, but I've found some in their code)
- scale the auth ASG back up, rinse and repeat for all auth servers
- then proceed to do the rest on all other components (which may live in different AWS accounts) in a similar fashion
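
For the "wait for it to start up" step, here is a minimal Python sketch of a poll-until-ready loop. It assumes the auth server exposes some HTTP endpoint that answers 200 once it is ready (e.g. one of the checks found in the code). `http_ready`, `wait_until` and the URL in the usage note are hypothetical placeholders, not anything taken from Teleport's docs.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: not ready yet.
        return False


def wait_until(
    check: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` every `interval` seconds until it passes or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Usage would be something like `wait_until(lambda: http_ready("http://127.0.0.1:3000/readyz"))`, with the address swapped for whatever endpoint you actually rely on. Only scale the auth ASG back up once this returns `True`.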
The fun part is catering for the ASGs in different accounts, and ensuring that all components are actually updated, but only after the auth servers have finished.
ATM my plan is to have a CI pipeline in place which:
- runs an Ansible playbook which does the scaling, upgrading and waiting on the auth servers
- builds a new AMI with Packer containing the new version of Teleport (used by all components in all the accounts), so the upgrade persists even after a scaling event
- deploys the new AMI through a TF pipeline (in every account where Teleport components run)
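
The "auth servers first, everything else after" ordering could be enforced with a small gate between pipeline stages. A rough Python sketch follows. How the versions get collected (instance tags, `teleport version` over SSH, etc.) is left open, and `Node`, `auth_done`, `stragglers` plus the account names and version strings are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Node:
    account: str  # AWS account alias (hypothetical)
    role: str     # "auth", "proxy", "node", ...
    version: str  # Teleport version the instance currently reports


def auth_done(nodes: Iterable[Node], target: str) -> bool:
    """True only if at least one auth server exists and all of them run `target`."""
    auth = [n for n in nodes if n.role == "auth"]
    return bool(auth) and all(n.version == target for n in auth)


def stragglers(nodes: Iterable[Node], target: str) -> List[Node]:
    """Every instance not yet on `target`, sorted by account and role for reporting."""
    behind = [n for n in nodes if n.version != target]
    return sorted(behind, key=lambda n: (n.account, n.role))


if __name__ == "__main__":
    # Example fleet with invented accounts and versions.
    fleet = [
        Node("core", "auth", "2.0.0"),
        Node("core", "auth", "1.9.0"),
        Node("dev", "node", "1.9.0"),
    ]
    if not auth_done(fleet, "2.0.0"):
        print("auth servers not finished, blocking the AMI rollout stage")
    for n in stragglers(fleet, "2.0.0"):
        print(f"still on {n.version}: {n.account}/{n.role}")
```

The idea would be to run `auth_done` as the gate before the TF stage, and `stragglers` as a final check that fails the pipeline if anything in any account is still on the old version.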
Any better ideas?