We have a somewhat larger workload, and we’ve settled on a 3-tier system: (1) “system”, (2) “shared” and (3) “application”. 1 & 2 are environment- (and AWS-account-) centric, 3 is application-centric. We can ignore system for now; shared is where we provision things like EKS clusters, since we classify them as shared features that applications depend on. Another shared ‘feature’ is state. We don’t allow state persistence in EKS by default; state has to go into S3, RDS, MSK, etc. That makes the application lifecycle much more robust. We also assume an EKS cluster might be lost or compromised, and as such must be replaceable without impacting state. Therefore, we put EKS clusters in their own AWS account and data in a separate AWS account. That does mean that any cluster that uses IRSA needs an identity provider entry for the cluster’s CA in that data environment.
So what we ended up doing to enable this is storing a record of every active cluster in an S3 bucket (pretty easy with Terraform to serialize a map to JSON). This lets you store things like the cluster FQDN, name, environment, CA fingerprint, etc. in a way that can be easily retrieved later. We then have two modules that consume this data. The first is a cluster record module that pulls in all those files and turns them back into a local Terraform map, which you can filter in a for expression to get only the clusters you want (i.e. only ‘development’ or ‘production’). You end up with a list of data about those clusters, which you then feed into a second module: identity provider. You give it the list of client IDs and thumbprints you want it to support, which you get an up-to-date copy of from the first module. It ensures that you can create identity providers anywhere you want, with support for any source cluster you want.
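A rough sketch of both sides of that record mechanism, with hypothetical bucket, variable and key names (ours obviously differ):

```hcl
# Write side: each cluster publishes its own record as a JSON object.
resource "aws_s3_object" "cluster_record" {
  bucket = "org-cluster-records" # assumed shared records bucket
  key    = "clusters/${var.cluster_name}.json"
  content = jsonencode({
    name          = var.cluster_name
    fqdn          = var.cluster_fqdn
    environment   = var.environment # e.g. "development" / "production"
    oidc_issuer   = var.oidc_issuer_url
    ca_thumbprint = var.ca_thumbprint
  })
  content_type = "application/json" # body must be text-readable to parse it back later
}

# Read side: the "cluster record" module pulls all records back into a map.
data "aws_s3_objects" "records" {
  bucket = "org-cluster-records"
  prefix = "clusters/"
}

data "aws_s3_object" "record" {
  for_each = toset(data.aws_s3_objects.records.keys)
  bucket   = "org-cluster-records"
  key      = each.value
}

locals {
  all_clusters = { for k, o in data.aws_s3_object.record : k => jsondecode(o.body) }

  # Filter with a for expression to only the environment you want.
  dev_clusters = {
    for k, c in local.all_clusters : k => c if c.environment == "development"
  }
}
```

Note the `content_type`: the `aws_s3_object` data source only populates `body` for human-readable content types, so tagging the record as `application/json` is what makes the round trip work.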
Operationally, that means you can give any AWS account an identity provider that works with IRSA for the clusters of your choice (since IRSA uses the general AWS STS endpoints anyway). A third module is used for setting up roles with an AssumeRoleWithWebIdentity trust policy. We made it usable at the application level: you provide it with at least the name of your application and the cluster it runs in, and it uses those parameters to generate the subject for the Kubernetes service account and to read the cluster records to verify the client IDs exist. It then generates an AWS IAM role that can be assumed with the web identity, based on the identity provider that was pre-provisioned with the module in that shared stage.
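The core of those two modules boils down to something like this (a simplified sketch; variable names are illustrative, and the real modules loop over the filtered cluster records rather than a single cluster):

```hcl
# Identity provider module: one OIDC provider per source cluster,
# created in whichever account needs to trust that cluster.
resource "aws_iam_openid_connect_provider" "cluster" {
  url             = var.oidc_issuer_url # from the cluster record
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [var.ca_thumbprint]
}

# Role module: trust policy scoped to one service account subject.
data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.cluster.arn]
    }

    # Pin the role to a single namespace/service-account pair,
    # generated from the application name and cluster inputs.
    condition {
      test     = "StringEquals"
      variable = "${replace(var.oidc_issuer_url, "https://", "")}:sub"
      values   = ["system:serviceaccount:${var.namespace}:${var.app_name}"]
    }
  }
}

resource "aws_iam_role" "app" {
  name               = "${var.app_name}-irsa"
  assume_role_policy = data.aws_iam_policy_document.assume.json
}
```

The `:sub` condition is what makes this safe across accounts: only pods running under that exact service account in that exact cluster can assume the role.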
The only thing you need to ensure to make this work in a stable fashion is to have something like a workflow engine or Renovate bot automatically trigger a terraform plan (and apply if needed) when a cluster is added or removed, so identity providers and roles are updated when the infrastructure changes. But that’s something you would need to do anyway whenever you add or remove a cluster, even if it’s only once a year.