Introduction
GitOps promises infrastructure managed through Git, with every change tracked, reviewed, and auditable. Implementing it for AWS infrastructure isn't straightforward, though. I built a hybrid approach that uses:
- Terraform: Foundational infrastructure (VPCs, EKS clusters, IAM)
- ArgoCD: Application deployments and Kubernetes resources
- Crossplane: AWS resources as Kubernetes CRDs (databases, S3, etc.)
This combination gives us the best of each tool while maintaining a single source of truth in Git.
Architecture Overview
flowchart TB
subgraph GitRepos["Git Repositories"]
INFRA_REPO[Infrastructure Repo<br/>Terraform + Crossplane]
APP_REPO[Application Repo<br/>Helm Charts + Kustomize]
CONFIG_REPO[Config Repo<br/>Environment Values]
end
subgraph CICD["CI/CD Pipeline"]
GHA[GitHub Actions]
TF_PLAN[Terraform Plan]
TF_APPLY[Terraform Apply]
end
subgraph EKSCluster["EKS Cluster"]
subgraph ArgoCD["ArgoCD"]
ARGO_SERVER[ArgoCD Server]
APP_CONTROLLER[Application Controller]
REPO_SERVER[Repo Server]
end
subgraph Crossplane["Crossplane"]
XP_CONTROLLER[Crossplane Controller]
AWS_PROVIDER[AWS Provider]
end
subgraph Apps["Applications"]
APP1[App 1]
APP2[App 2]
APP3[App 3]
end
end
subgraph AWSResources["AWS Resources"]
subgraph Foundation["Terraform Managed"]
VPC[VPC]
EKS[EKS Cluster]
IAM[IAM Roles]
end
subgraph Dynamic["Crossplane Managed"]
RDS[(RDS Databases)]
S3[(S3 Buckets)]
SQS[SQS Queues]
SNS[SNS Topics]
end
end
INFRA_REPO --> GHA
GHA --> TF_PLAN --> TF_APPLY
TF_APPLY --> Foundation
APP_REPO --> ARGO_SERVER
CONFIG_REPO --> ARGO_SERVER
ARGO_SERVER --> Apps
INFRA_REPO --> ARGO_SERVER
ARGO_SERVER --> XP_CONTROLLER
XP_CONTROLLER --> AWS_PROVIDER
AWS_PROVIDER --> Dynamic
style GitRepos fill:#1a1a2e,stroke:#00d9ff,stroke-width:2px,color:#fff
style ArgoCD fill:#264653,stroke:#f77f00,stroke-width:2px,color:#fff
style Crossplane fill:#264653,stroke:#9b5de5,stroke-width:2px,color:#fff
style Foundation fill:#2a9d8f,stroke:#fff,stroke-width:2px,color:#fff
style Dynamic fill:#e63946,stroke:#fff,stroke-width:2px,color:#fff
When to Use What
flowchart TD
START[Need to provision<br/>AWS resource?] --> Q1{Foundational<br/>infrastructure?}
Q1 -->|Yes| TF[Terraform]
Q1 -->|No| Q2{Application-specific<br/>resource?}
Q2 -->|Yes| Q3{Lifecycle tied to<br/>Kubernetes workload?}
Q2 -->|No| TF
Q3 -->|Yes| XP[Crossplane]
Q3 -->|No| Q4{Needs K8s-native<br/>management?}
Q4 -->|Yes| XP
Q4 -->|No| TF
TF --> TF_USE["Use Cases:<br/>- VPCs, Subnets<br/>- EKS Clusters<br/>- IAM Roles/Policies<br/>- Transit Gateway<br/>- KMS Keys"]
XP --> XP_USE["Use Cases:<br/>- App databases<br/>- S3 for app data<br/>- SQS/SNS per service<br/>- Secrets per namespace<br/>- Dynamic resources"]
style START fill:#ff6b6b,stroke:#fff,stroke-width:2px,color:#fff
style TF fill:#2a9d8f,stroke:#fff,stroke-width:2px,color:#fff
style XP fill:#9b5de5,stroke:#fff,stroke-width:2px,color:#fff
Repository Structure
platform-infrastructure/
├── terraform/
│   ├── modules/
│   │   ├── vpc/
│   │   ├── eks/
│   │   ├── iam/
│   │   └── networking/
│   ├── environments/
│   │   ├── dev/
│   │   ├── staging/
│   │   └── prod/
│   └── backend.tf
├── crossplane/
│   ├── providers/
│   │   └── aws-provider.yaml
│   ├── compositions/
│   │   ├── database/
│   │   ├── storage/
│   │   └── messaging/
│   └── claims/
│       ├── dev/
│       ├── staging/
│       └── prod/
├── argocd/
│   ├── bootstrap/
│   │   └── argocd-install.yaml
│   ├── projects/
│   │   ├── infrastructure.yaml
│   │   └── applications.yaml
│   ├── applicationsets/
│   │   ├── apps.yaml
│   │   └── crossplane-resources.yaml
│   └── apps/
│       └── root-app.yaml
└── .github/
    └── workflows/
        ├── terraform-plan.yaml
        ├── terraform-apply.yaml
        └── crossplane-validate.yaml
ArgoCD Installation
# argocd/main.tf
resource "helm_release" "argocd" {
name = "argocd"
repository = "https://argoproj.github.io/argo-helm"
chart = "argo-cd"
version = "5.51.6"
namespace = "argocd"
create_namespace = true
values = [
yamlencode({
global = {
domain = "argocd.${var.domain}"
}
configs = {
params = {
"server.insecure" = true # TLS at ALB
}
repositories = {
"platform-infrastructure" = {
url = "git@github.com:company/platform-infrastructure.git"
name = "platform-infrastructure"
type = "git"
sshPrivateKeySecret = {
name = "repo-ssh-key"
key = "sshPrivateKey"
}
}
"application-configs" = {
url = "git@github.com:company/application-configs.git"
name = "application-configs"
type = "git"
sshPrivateKeySecret = {
name = "repo-ssh-key"
key = "sshPrivateKey"
}
}
}
cm = {
"resource.customizations.health.argoproj.io_Application" = <<-EOF
hs = {}
hs.status = "Progressing"
hs.message = ""
if obj.status ~= nil then
if obj.status.health ~= nil then
hs.status = obj.status.health.status
if obj.status.health.message ~= nil then
hs.message = obj.status.health.message
end
end
end
return hs
EOF
# Crossplane health checks, keyed as <group>_<kind>; this matches the Upbound RDS Instance kind used by the composition
"resource.customizations.health.rds.aws.upbound.io_Instance" = <<-EOF
hs = {}
if obj.status ~= nil then
if obj.status.conditions ~= nil then
for i, condition in ipairs(obj.status.conditions) do
if condition.type == "Ready" and condition.status == "True" then
hs.status = "Healthy"
hs.message = "RDS instance is ready"
return hs
end
end
end
end
hs.status = "Progressing"
hs.message = "Waiting for RDS instance"
return hs
EOF
}
}
server = {
replicas = 2
ingress = {
enabled = true
ingressClassName = "alb"
annotations = {
"alb.ingress.kubernetes.io/scheme" = "internal"
"alb.ingress.kubernetes.io/target-type" = "ip"
"alb.ingress.kubernetes.io/listen-ports" = "[{\"HTTPS\":443}]"
"alb.ingress.kubernetes.io/certificate-arn" = var.certificate_arn
}
hosts = ["argocd.${var.domain}"]
}
rbacConfig = {
"policy.csv" = <<-EOF
p, role:platform-admin, applications, *, */*, allow
p, role:platform-admin, clusters, *, *, allow
p, role:platform-admin, repositories, *, *, allow
p, role:platform-admin, projects, *, *, allow
p, role:developer, applications, get, */*, allow
p, role:developer, applications, sync, */*, allow
p, role:developer, logs, get, */*, allow
g, platform-admins, role:platform-admin
g, developers, role:developer
EOF
}
}
controller = {
replicas = 2
metrics = {
enabled = true
serviceMonitor = {
enabled = true
}
}
}
repoServer = {
replicas = 2
}
applicationSet = {
enabled = true
replicas = 2
}
notifications = {
enabled = true
notifiers = {
"service.slack" = {
token = "$slack-token"
}
}
templates = {
"template.app-deployed" = <<-EOF
message: |
  Application {{.app.metadata.name}} has been deployed.
  Sync Status: {{.app.status.sync.status}}
  Health Status: {{.app.status.health.status}}
EOF
"template.app-sync-failed" = <<-EOF
message: |
  Application {{.app.metadata.name}} sync failed.
  Error: {{.app.status.operationState.message}}
EOF
}
triggers = {
"trigger.on-deployed" = <<-EOF
- when: app.status.operationState.phase in ['Succeeded']
  send: [app-deployed]
EOF
"trigger.on-sync-failed" = <<-EOF
- when: app.status.operationState.phase in ['Error', 'Failed']
  send: [app-sync-failed]
EOF
}
}
})
]
}
App of Apps Pattern
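The root Application in this section syncs whatever manifests live under `argocd/apps`, so each child is simply another Application file in that directory. As a rough sketch, a child that installs the Crossplane provider manifests from `crossplane/providers` could look like the following. The file name, the Application name `crossplane-providers`, and the sync-wave value are assumptions for illustration, not part of the original setup.

```yaml
# argocd/apps/crossplane-providers.yaml (hypothetical file name)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: crossplane-providers          # assumed name
  namespace: argocd
  annotations:
    # Assumption: sync providers ahead of resources that need their CRDs
    argocd.argoproj.io/sync-wave: "-1"
spec:
  project: infrastructure
  source:
    repoURL: git@github.com:company/platform-infrastructure.git
    targetRevision: main
    path: crossplane/providers
  destination:
    server: https://kubernetes.default.svc
    namespace: crossplane-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Ordering child Applications by wave depends on Argo CD being able to assess Application health, which is what the `argoproj.io_Application` health customization in the Helm values provides.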
# argocd/apps/root-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: git@github.com:company/platform-infrastructure.git
    targetRevision: main
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
ApplicationSet for Multi-Environment
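The ApplicationSet in this section places every generated Application in the `applications` project, which the repository layout lists as `argocd/projects/applications.yaml` but the post does not show. A minimal sketch of what that AppProject might contain follows. The description, the namespace patterns, and the resource whitelist are assumptions and would need tightening for real use.

```yaml
# argocd/projects/applications.yaml (contents assumed)
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: applications
  namespace: argocd
spec:
  description: Application workloads generated by the ApplicationSet   # assumed
  sourceRepos:
    - git@github.com:company/application-configs.git
  destinations:
    # Assumption: only namespaces shaped like <app>-<env>, as the template generates
    - server: https://kubernetes.default.svc
      namespace: '*-dev'
    - server: https://kubernetes.default.svc
      namespace: '*-staging'
    - server: https://kubernetes.default.svc
      namespace: '*-prod'
  clusterResourceWhitelist:
    # Assumption: permit Namespace objects; other cluster-scoped kinds stay denied
    - group: ''
      kind: Namespace
```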
# argocd/applicationsets/apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: applications
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          # Environment generator
          - list:
              elements:
                - env: dev
                  cluster: https://kubernetes.default.svc
                  values_file: values-dev.yaml
                - env: staging
                  cluster: https://kubernetes.default.svc
                  values_file: values-staging.yaml
                - env: prod
                  cluster: https://kubernetes.default.svc
                  values_file: values-prod.yaml
          # Application generator from Git
          - git:
              repoURL: git@github.com:company/application-configs.git
              revision: main
              directories:
                - path: apps/*
  template:
    metadata:
      name: '{{path.basename}}-{{env}}'
      namespace: argocd
      labels:
        app: '{{path.basename}}'
        env: '{{env}}'
    spec:
      project: applications
      source:
        repoURL: git@github.com:company/application-configs.git
        targetRevision: main
        path: '{{path}}'
        helm:
          valueFiles:
            - '{{values_file}}'
      destination:
        server: '{{cluster}}'
        namespace: '{{path.basename}}-{{env}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
        retry:
          limit: 5
          backoff:
            duration: 5s
            factor: 2
            maxDuration: 3m
Crossplane Setup
# crossplane/main.tf
resource "helm_release" "crossplane" {
name = "crossplane"
repository = "https://charts.crossplane.io/stable"
chart = "crossplane"
version = "1.14.5"
namespace = "crossplane-system"
create_namespace = true
values = [
yamlencode({
provider = {
packages = [] # Installed via ArgoCD
}
resourcesCrossplane = {
requests = {
cpu = "100m"
memory = "256Mi"
}
limits = {
cpu = "500m"
memory = "512Mi"
}
}
})
]
}
AWS Provider Configuration
# crossplane/providers/aws-provider.yaml
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
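  name: provider-aws
spec:
  # The family package only supplies the shared ProviderConfig types. The RDS and EC2
  # kinds used by the composition ship in per-service packages such as
  # provider-aws-rds and provider-aws-ec2, which need their own Provider objects.
  package: xpkg.upbound.io/upbound/provider-family-aws:v1.1.0
  controllerConfigRef:
    name: aws-config
---
# ControllerConfig is deprecated; Crossplane 1.14+ offers DeploymentRuntimeConfig
# (pkg.crossplane.io/v1beta1, referenced through spec.runtimeConfigRef) as its replacement.
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: aws-config
spec:
  podSecurityContext:
    fsGroup: 2000
  serviceAccountName: crossplane-provider-aws
  args:
    - --debug
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: crossplane-provider-aws
  namespace: crossplane-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/CrossplaneProviderAWS
---
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: IRSA
Crossplane Compositions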
flowchart TD
subgraph Composition["Crossplane Composition"]
XRD[CompositeResourceDefinition<br/>XPostgreSQLInstance]
COMP[Composition<br/>aws-postgresql]
end
subgraph ManagedResources["Managed Resources"]
RDS[RDSInstance]
SUBNET_GROUP[DBSubnetGroup]
PARAM_GROUP[DBParameterGroup]
SG[SecurityGroup]
SECRET[Kubernetes Secret]
end
subgraph Claim["Application Claim"]
CLAIM[PostgreSQLInstance<br/>my-app-db]
end
CLAIM --> XRD
XRD --> COMP
COMP --> RDS
COMP --> SUBNET_GROUP
COMP --> PARAM_GROUP
COMP --> SG
COMP --> SECRET
style Composition fill:#9b5de5,stroke:#fff,stroke-width:2px,color:#fff
style ManagedResources fill:#264653,stroke:#2a9d8f,stroke-width:2px,color:#fff
style Claim fill:#1a1a2e,stroke:#f77f00,stroke-width:2px,color:#fff
PostgreSQL Composition
# crossplane/compositions/database/postgresql-composition.yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresqlinstances.database.company.io
spec:
  group: database.company.io
  names:
    kind: XPostgreSQLInstance
    plural: xpostgresqlinstances
  claimNames:
    kind: PostgreSQLInstance
    plural: postgresqlinstances
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    size:
                      type: string
                      enum: ["small", "medium", "large"]
                      default: "small"
                    version:
                      type: string
                      default: "15.4"
                    storageGB:
                      type: integer
                      default: 20
                  required:
                    - size
            status:
              type: object
              properties:
                endpoint:
                  type: string
                port:
                  type: integer
                secretName:
                  type: string
---
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgresql-aws
  labels:
    provider: aws
    database: postgresql
spec:
  compositeTypeRef:
    apiVersion: database.company.io/v1alpha1
    kind: XPostgreSQLInstance
  patchSets:
    - name: common-tags
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.labels
          toFieldPath: spec.forProvider.tags
          policy:
            mergeOptions:
              keepMapValues: true
  resources:
    # Security Group
    - name: security-group
      base:
        apiVersion: ec2.aws.upbound.io/v1beta1
        kind: SecurityGroup
        spec:
          forProvider:
            region: us-east-1
            vpcId: vpc-xxxxxxxx
            description: "PostgreSQL security group"
          providerConfigRef:
            name: default
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.name
          toFieldPath: spec.forProvider.name
          transforms:
            - type: string
              string:
                fmt: "%s-postgresql-sg"
    # Security Group Rule
    - name: security-group-rule
      base:
        apiVersion: ec2.aws.upbound.io/v1beta1
        kind: SecurityGroupRule
        spec:
          forProvider:
            region: us-east-1
            type: ingress
            fromPort: 5432
            toPort: 5432
            protocol: tcp
            cidrBlocks:
              - "10.0.0.0/8"
          providerConfigRef:
            name: default
      patches:
        # Label keys that contain dots or slashes need bracket notation in field paths.
        # The selector matches the claim-name label Crossplane sets on composed resources.
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.labels[crossplane.io/claim-name]
          toFieldPath: spec.forProvider.securityGroupIdSelector.matchLabels[crossplane.io/claim-name]
    # DB Subnet Group
    - name: db-subnet-group
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: SubnetGroup
        spec:
          forProvider:
            region: us-east-1
            subnetIds:
              - subnet-xxxxxxxx
              - subnet-yyyyyyyy
              - subnet-zzzzzzzz
            description: "PostgreSQL subnet group"
          providerConfigRef:
            name: default
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.name
          toFieldPath: metadata.name
          transforms:
            - type: string
              string:
                fmt: "%s-subnet-group"
    # RDS Instance
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: us-east-1
            engine: postgres
            publiclyAccessible: false
            storageEncrypted: true
            storageType: gp3
            autoMinorVersionUpgrade: true
            backupRetentionPeriod: 7
            deletionProtection: true
            skipFinalSnapshot: false
            autoGeneratePassword: true
            passwordSecretRef:
              namespace: crossplane-system
              key: password
          providerConfigRef:
            name: default
          writeConnectionSecretToRef:
            namespace: crossplane-system
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.name
          toFieldPath: spec.forProvider.identifier
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.version
          toFieldPath: spec.forProvider.engineVersion
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.storageGB
          toFieldPath: spec.forProvider.allocatedStorage
        # Size mapping
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.size
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.small
                large: db.t3.medium
        # Connection secret
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.name
          toFieldPath: spec.writeConnectionSecretToRef.name
          transforms:
            - type: string
              string:
                fmt: "%s-connection"
        # Status patches
        - type: ToCompositeFieldPath
          fromFieldPath: status.atProvider.endpoint
          toFieldPath: status.endpoint
        - type: ToCompositeFieldPath
          fromFieldPath: status.atProvider.port
          toFieldPath: status.port
      connectionDetails:
        - name: endpoint
          fromFieldPath: status.atProvider.endpoint
        - name: port
          fromFieldPath: status.atProvider.port
        - name: username
          fromFieldPath: spec.forProvider.username
        - name: password
          fromConnectionSecretKey: password
Using Crossplane Claims
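When the claim in this section becomes ready, Crossplane writes the keys declared under `connectionDetails` (`endpoint`, `port`, `username`, `password`) into the Secret named by the claim's `writeConnectionSecretToRef`, in the claim's namespace. A workload can then read them as environment variables. The sketch below assumes a Deployment called `my-app`, a placeholder image, and environment variable names of my choosing; none of these appear in the original setup.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                        # hypothetical workload
  namespace: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: registry.example.com/my-app:latest   # placeholder image
          env:
            # Each variable reads one key from the claim's connection Secret
            - name: DB_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: my-app-db-credentials
                  key: endpoint
            - name: DB_PORT
              valueFrom:
                secretKeyRef:
                  name: my-app-db-credentials
                  key: port
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: my-app-db-credentials
                  key: username
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: my-app-db-credentials
                  key: password
```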
# crossplane/claims/prod/my-app-database.yaml
apiVersion: database.company.io/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: my-app-db
  namespace: my-app
spec:
  parameters:
    size: medium
    version: "15.4"
    storageGB: 50
  compositionSelector:
    matchLabels:
      provider: aws
      database: postgresql
  writeConnectionSecretToRef:
    name: my-app-db-credentials
GitOps Workflow
sequenceDiagram
participant Dev as Developer
participant Git as GitHub
participant GHA as GitHub Actions
participant Argo as ArgoCD
participant K8s as Kubernetes
participant AWS as AWS
Dev->>Git: Push changes (PR)
Git->>GHA: Trigger workflow
alt Terraform changes
GHA->>GHA: terraform plan
GHA->>Git: Post plan as PR comment
Dev->>Git: Approve & merge
GHA->>AWS: terraform apply
end
alt Application/Crossplane changes
Git->>Argo: Webhook notification
Argo->>Git: Sync manifests
Argo->>K8s: Apply changes
alt Crossplane resource
K8s->>AWS: Provision resource
AWS-->>K8s: Resource ready
end
Argo-->>Dev: Sync notification
end
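The "Approve & merge" branch of the diagram corresponds to `terraform-apply.yaml`, which the repository layout lists but the post does not show. A minimal sketch might look like the following, assuming the workflow triggers on pushes to `main`, reuses the same `GitHubActionsRole`, and applies one environment at a time. The GitHub environment names are also an assumption.

```yaml
# .github/workflows/terraform-apply.yaml (contents assumed)
name: Terraform Apply
on:
  push:
    branches: [main]
    paths:
      - 'terraform/**'
jobs:
  apply:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    strategy:
      max-parallel: 1                 # assumption: never apply two environments at once
      matrix:
        environment: [dev, staging, prod]
    # Assumption: GitHub environments with these names exist and can carry approval rules
    environment: ${{ matrix.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/GitHubActionsRole
          aws-region: us-east-1
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
      - name: Terraform Init
        working-directory: terraform/environments/${{ matrix.environment }}
        run: terraform init
      - name: Terraform Apply
        working-directory: terraform/environments/${{ matrix.environment }}
        run: terraform apply -auto-approve -input=false
```

Re-planning at apply time keeps the sketch short. Applying the exact plan that was reviewed in the PR would require storing the `tfplan` file as an artifact and passing it to `terraform apply`.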
Terraform GitHub Actions
# .github/workflows/terraform-plan.yaml
name: Terraform Plan
on:
  pull_request:
    paths:
      - 'terraform/**'
jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      id-token: write
    strategy:
      matrix:
        environment: [dev, staging, prod]
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/GitHubActionsRole
          aws-region: us-east-1
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
      - name: Terraform Init
        working-directory: terraform/environments/${{ matrix.environment }}
        run: terraform init
      - name: Terraform Plan
        id: plan
        working-directory: terraform/environments/${{ matrix.environment }}
        run: |
          # Without pipefail, tee's exit code would hide a failed plan
          set -o pipefail
          terraform plan -no-color -out=tfplan 2>&1 | tee plan.txt
      - name: Post Plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('terraform/environments/${{ matrix.environment }}/plan.txt', 'utf8');
            // Truncate the plan to stay under GitHub's comment size limit
            const output = `### Terraform Plan - ${{ matrix.environment }}
            <details>
            <summary>Show Plan</summary>

            \`\`\`hcl
            ${plan.substring(0, 65000)}
            \`\`\`
            </details>
            `;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });
Drift Detection
# argocd/apps/drift-detection.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: drift-detector
  namespace: argocd
spec:
  project: infrastructure
  source:
    repoURL: git@github.com:company/platform-infrastructure.git
    targetRevision: main
    path: crossplane/claims/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: crossplane-system
  syncPolicy:
    # No `automated` block here: its presence alone enables auto-sync, even with
    # prune and selfHeal set to false. Omitting it means Argo CD only reports OutOfSync.
    syncOptions:
      - Validate=true
      - CreateNamespace=false
Drift Alert via Notifications
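A trigger only fires for Applications that subscribe to it. For the `on-out-of-sync` trigger defined in this section, that means annotating the `drift-detector` Application. The fragment below is a sketch that assumes the Slack notifier configured during installation and a hypothetical channel named `platform-alerts`.

```yaml
# Fragment of argocd/apps/drift-detection.yaml
metadata:
  name: drift-detector
  namespace: argocd
  annotations:
    # Format: notifications.argoproj.io/subscribe.<trigger>.<service>: <recipient>
    notifications.argoproj.io/subscribe.on-out-of-sync.slack: platform-alerts   # hypothetical channel
```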
# argocd notification trigger for drift
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  trigger.on-out-of-sync: |
    - when: app.status.sync.status == 'OutOfSync'
      send: [drift-detected]
  template.drift-detected: |
    message: |
      Infrastructure drift detected!
      Application: {{.app.metadata.name}}
      Namespace: {{.app.spec.destination.namespace}}
      Resources out of sync:
      {{range .app.status.resources}}
      - {{.kind}}/{{.name}}: {{.status}}
      {{end}}
      Review and sync: {{.context.argocdUrl}}/applications/{{.app.metadata.name}}
Best Practices
| Practice | Why |
|---|---|
| Terraform for foundations | State management, complex dependencies |
| Crossplane for app resources | Lifecycle tied to K8s, self-service |
| App of Apps pattern | Single entry point, easy management |
| PR-based workflows | Review, audit trail |
| Drift detection | Catch manual changes |
| Separate repos by concern | Clear ownership |
Troubleshooting
"ArgoCD sync stuck"
# Check application status
kubectl get app -n argocd my-app -o yaml
# Force refresh
argocd app get my-app --refresh
# Check repo server logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-repo-server
"Crossplane resource not provisioning"
# Check managed resource status
kubectl get managed -A
# Describe the claim
kubectl describe postgresqlinstance my-app-db -n my-app
# Check Crossplane provider logs
kubectl logs -n crossplane-system -l pkg.crossplane.io/provider=provider-aws
"Terraform state drift"
# Import existing resource
terraform import aws_instance.example i-1234567890abcdef0
# Reconcile state with real infrastructure (replaces the deprecated `terraform refresh`)
terraform apply -refresh-only
Conclusion
GitOps for infrastructure isn't about choosing one tool. It's about using the right tool for each layer:
- Terraform excels at foundational, stateful infrastructure
- ArgoCD provides declarative, auditable application delivery
- Crossplane enables self-service cloud resources with Kubernetes-native UX
The combination creates a platform where everything is tracked in Git, changes go through PR review, and drift is automatically detected. Teams get self-service capabilities while platform engineers maintain control over the underlying infrastructure.