Introduction
Financial services need proxy infrastructure that can handle thousands of external connections while enforcing strict IP-level access controls. I built a system in which each listener port on an AWS Network Load Balancer (NLB) has its own whitelist of allowed source IPs, all driven by a single JSON configuration file.
The challenge was creating infrastructure that:
- Supports 100+ ports with individual IP whitelists
- Routes each port range to its own proxy cluster
- Works across multiple environments (dev, staging, prod)
- Integrates with on-premises via Transit Gateway
- Manages state safely with S3/DynamoDB backend
Architecture Overview
flowchart TB
subgraph External["External Traffic"]
CLIENT1[Partner A<br/>203.0.113.10]
CLIENT2[Partner B<br/>198.51.100.20]
CLIENT3[Partner C<br/>192.0.2.30]
end
subgraph AWSCloud["AWS Cloud"]
subgraph PublicSubnets["Public Subnets"]
EXT_NLB[External NLB<br/>Ports 10000-20000]
end
subgraph PrivateSubnets["Private Subnets"]
INT_NLB[Internal NLB<br/>Ports 3128, 8080]
subgraph ProxyCluster1["Proxy Cluster A - Ports 10000-15000"]
PROXY_A1[Squid Proxy 1]
PROXY_A2[Squid Proxy 2]
end
subgraph ProxyCluster2["Proxy Cluster B - Ports 15001-20000"]
PROXY_B1[HAProxy 1]
PROXY_B2[HAProxy 2]
end
end
subgraph SecurityGroups["Security Layer"]
SG_EXT[External SG<br/>Per-port IP whitelist]
SG_INT[Internal SG<br/>VPC + On-prem CIDR]
end
end
subgraph OnPrem["On-Premises"]
INTERNAL_APPS[Internal Applications]
TGW[Transit Gateway]
end
CLIENT1 -->|Port 10001| EXT_NLB
CLIENT2 -->|Port 12500| EXT_NLB
CLIENT3 -->|Port 16000| EXT_NLB
EXT_NLB --> SG_EXT
SG_EXT --> ProxyCluster1
SG_EXT --> ProxyCluster2
INTERNAL_APPS --> TGW
TGW --> INT_NLB
INT_NLB --> SG_INT
SG_INT --> ProxyCluster1
style External fill:#6c757d,stroke:#fff,stroke-width:2px,color:#fff
style PublicSubnets fill:#e63946,stroke:#fff,stroke-width:2px,color:#fff
style PrivateSubnets fill:#264653,stroke:#2a9d8f,stroke-width:2px,color:#fff
style SecurityGroups fill:#1a1a2e,stroke:#f77f00,stroke-width:2px,color:#fff
style OnPrem fill:#3a3a5c,stroke:#fff,stroke-width:2px,color:#fff
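The range-based routing in the diagram reduces to a lookup from port number to cluster. A minimal sketch in Terraform (0.15 or later, for `one()`), assuming hypothetical locals `cluster_ranges` and `example_ports` that do not appear in the modules in this post:

```hcl
locals {
  # Hypothetical range table mirroring the two clusters in the diagram.
  cluster_ranges = {
    "proxy-cluster-a" = { from = 10000, to = 15000 }
    "proxy-cluster-b" = { from = 15001, to = 20000 }
  }

  # Sample ports taken from the partner arrows in the diagram.
  example_ports = [10001, 12500, 16000]

  # For each port, pick the single cluster whose range contains it.
  port_to_cluster = {
    for port in local.example_ports :
    tostring(port) => one([
      for name, r in local.cluster_ranges : name
      if port >= r.from && port <= r.to
    ])
  }
}

output "port_to_cluster" {
  value = local.port_to_cluster
}
```

With these values, 10001 and 12500 resolve to proxy-cluster-a and 16000 to proxy-cluster-b. `one()` yields null when no range matches and fails the plan when ranges overlap, so a misconfigured range surfaces before anything is applied.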
Port Whitelist Configuration
The core of the design is a JSON file that maps each listener port to its allowed source CIDRs and target cluster, plus per-range defaults such as the health check port:
{
"port_whitelists": {
"10001": {
"description": "Partner A - Payment Gateway",
"allowed_ips": ["203.0.113.10/32", "203.0.113.11/32"],
"target_group": "proxy-cluster-a",
"protocol": "TCP"
},
"10002": {
"description": "Partner A - Reconciliation",
"allowed_ips": ["203.0.113.10/32"],
"target_group": "proxy-cluster-a",
"protocol": "TCP"
},
"12500": {
"description": "Partner B - Transaction API",
"allowed_ips": ["198.51.100.0/24"],
"target_group": "proxy-cluster-a",
"protocol": "TCP"
},
"16000": {
"description": "Partner C - Batch Processing",
"allowed_ips": ["192.0.2.0/28", "192.0.2.64/28"],
"target_group": "proxy-cluster-b",
"protocol": "TCP"
}
},
"port_ranges": {
"10000-15000": {
"default_target": "proxy-cluster-a",
"health_check_port": 8080
},
"15001-20000": {
"default_target": "proxy-cluster-b",
"health_check_port": 8081
}
}
}
Terraform Project Structure
nlb-proxy-infrastructure/
├── modules/
│ ├── nlb/
│ │ ├── main.tf
│ │ ├── listeners.tf
│ │ ├── target-groups.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── security-groups/
│ │ ├── main.tf
│ │ ├── per-port-rules.tf
│ │ └── variables.tf
│ └── proxy-cluster/
│ ├── main.tf
│ ├── asg.tf
│ └── variables.tf
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── port-config.json
│ │ └── terraform.tfvars
│ ├── staging/
│ └── prod/
├── config/
│ └── port-whitelists/
│ ├── dev.json
│ ├── staging.json
│ └── prod.json
└── backend.tf
NLB Module
# modules/nlb/main.tf
resource "aws_lb" "external" {
name = "${var.environment}-external-nlb"
internal = false
load_balancer_type = "network"
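subnets = var.public_subnet_ids
# Attach the per-port whitelist security group so its rules apply to NLB traffic
security_groups = [var.security_group_ids.external]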
enable_cross_zone_load_balancing = true
enable_deletion_protection = var.environment == "prod"
tags = merge(var.tags, {
Name = "${var.environment}-external-nlb"
Type = "external"
})
}
resource "aws_lb" "internal" {
name = "${var.environment}-internal-nlb"
internal = true
load_balancer_type = "network"
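subnets = var.private_subnet_ids
# Attach the internal security group (VPC + on-premises CIDRs)
security_groups = [var.security_group_ids.internal]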
enable_cross_zone_load_balancing = true
enable_deletion_protection = var.environment == "prod"
tags = merge(var.tags, {
Name = "${var.environment}-internal-nlb"
Type = "internal"
})
}
# Dynamic listeners based on port configuration
resource "aws_lb_listener" "external" {
for_each = var.port_config.port_whitelists
load_balancer_arn = aws_lb.external.arn
port = each.key
protocol = each.value.protocol
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.proxy[each.value.target_group].arn
}
}
# Target groups for each proxy cluster
resource "aws_lb_target_group" "proxy" {
for_each = toset(distinct([for k, v in var.port_config.port_whitelists : v.target_group]))
name = "${var.environment}-${each.key}"
port = var.proxy_port
protocol = "TCP"
vpc_id = var.vpc_id
target_type = "instance"
health_check {
enabled = true
protocol = "TCP"
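# port_ranges is keyed by range (e.g. "10000-15000"), not by target group name,
# so select the range whose default_target matches this target group.
# one() fails the plan if more than one range points at the same cluster.
port = one([
for range_key, r in var.port_config.port_ranges :
r.health_check_port if r.default_target == each.key
])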
healthy_threshold = 2
unhealthy_threshold = 2
interval = 10
}
stickiness {
enabled = true
type = "source_ip"
}
tags = merge(var.tags, {
Name = "${var.environment}-${each.key}-tg"
})
}
Per-Port Security Group Rules
flowchart TD
subgraph SecurityGroupGeneration["Security Group Rule Generation"]
JSON[port-config.json] --> PARSE[Parse JSON in Terraform]
PARSE --> LOOP[For each port in config]
LOOP --> RULE1["Rule: Port 10001<br/>Allow: 203.0.113.10/32, 203.0.113.11/32"]
LOOP --> RULE2["Rule: Port 10002<br/>Allow: 203.0.113.10/32"]
LOOP --> RULE3["Rule: Port 12500<br/>Allow: 198.51.100.0/24"]
LOOP --> RULEN["Rule: Port N<br/>Allow: ..."]
end
subgraph AppliedRules["Applied to Security Group"]
SG[NLB Security Group]
RULE1 --> SG
RULE2 --> SG
RULE3 --> SG
RULEN --> SG
end
style SecurityGroupGeneration fill:#1a1a2e,stroke:#00d9ff,stroke-width:2px,color:#fff
style AppliedRules fill:#264653,stroke:#2a9d8f,stroke-width:2px,color:#fff
# modules/security-groups/per-port-rules.tf
locals {
# Flatten the port whitelist into individual rules
port_rules = flatten([
for port, config in var.port_config.port_whitelists : [
for cidr in config.allowed_ips : {
port = port
cidr = cidr
description = config.description
}
]
])
}
resource "aws_security_group" "nlb_external" {
name_prefix = "${var.environment}-nlb-external-"
vpc_id = var.vpc_id
description = "Security group for external NLB with per-port IP whitelists"
tags = merge(var.tags, {
Name = "${var.environment}-nlb-external-sg"
})
lifecycle {
create_before_destroy = true
}
}
# Generate ingress rules dynamically from JSON config
resource "aws_security_group_rule" "port_whitelist" {
for_each = {
for idx, rule in local.port_rules :
"${rule.port}-${replace(rule.cidr, "/", "-")}" => rule
}
type = "ingress"
from_port = tonumber(each.value.port)
to_port = tonumber(each.value.port)
protocol = "tcp"
cidr_blocks = [each.value.cidr]
security_group_id = aws_security_group.nlb_external.id
description = each.value.description
}
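# Illustrative addition, not part of the original module: expose the generated
# for_each keys so a reviewer can see exactly which rules a config change adds.
# With the sample config above, keys take the form "10001-203.0.113.10-32".
# The output name is hypothetical.
output "port_whitelist_rule_keys" {
value = keys(aws_security_group_rule.port_whitelist)
}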
# Egress to proxy clusters
resource "aws_security_group_rule" "to_proxy_cluster_a" {
type = "egress"
from_port = 0
to_port = 65535
protocol = "tcp"
source_security_group_id = var.proxy_cluster_a_sg_id
security_group_id = aws_security_group.nlb_external.id
description = "Traffic to Proxy Cluster A"
}
resource "aws_security_group_rule" "to_proxy_cluster_b" {
type = "egress"
from_port = 0
to_port = 65535
protocol = "tcp"
source_security_group_id = var.proxy_cluster_b_sg_id
security_group_id = aws_security_group.nlb_external.id
description = "Traffic to Proxy Cluster B"
}
Internal NLB for On-Premises Traffic
# modules/nlb/internal.tf
resource "aws_lb_listener" "internal_proxy" {
load_balancer_arn = aws_lb.internal.arn
port = 3128
protocol = "TCP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.proxy["proxy-cluster-a"].arn
}
}
resource "aws_lb_listener" "internal_http" {
load_balancer_arn = aws_lb.internal.arn
port = 8080
protocol = "TCP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.proxy["proxy-cluster-a"].arn
}
}
# Security group for internal NLB
resource "aws_security_group" "nlb_internal" {
name_prefix = "${var.environment}-nlb-internal-"
vpc_id = var.vpc_id
description = "Security group for internal NLB"
# Allow from VPC
ingress {
from_port = 3128
to_port = 3128
protocol = "tcp"
cidr_blocks = [var.vpc_cidr]
description = "Proxy from VPC"
}
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
cidr_blocks = [var.vpc_cidr]
description = "HTTP Proxy from VPC"
}
# Allow from on-premises via Transit Gateway
ingress {
from_port = 3128
to_port = 3128
protocol = "tcp"
cidr_blocks = var.onprem_cidrs
description = "Proxy from on-premises"
}
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
cidr_blocks = var.onprem_cidrs
description = "HTTP Proxy from on-premises"
}
tags = merge(var.tags, {
Name = "${var.environment}-nlb-internal-sg"
})
}
Proxy Cluster Auto Scaling Group
# modules/proxy-cluster/asg.tf
resource "aws_launch_template" "proxy" {
name_prefix = "${var.environment}-${var.cluster_name}-"
image_id = var.ami_id
instance_type = var.instance_type
network_interfaces {
associate_public_ip_address = false
security_groups = [aws_security_group.proxy.id]
}
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_size = 50
volume_type = "gp3"
encrypted = true
delete_on_termination = true
}
}
iam_instance_profile {
name = aws_iam_instance_profile.proxy.name
}
metadata_options {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 1
}
user_data = base64encode(templatefile("${path.module}/userdata.sh.tpl", {
cluster_name = var.cluster_name
squid_conf = var.squid_config
cloudwatch_conf = var.cloudwatch_config
}))
tag_specifications {
resource_type = "instance"
tags = merge(var.tags, {
Name = "${var.environment}-${var.cluster_name}"
})
}
}
resource "aws_autoscaling_group" "proxy" {
name = "${var.environment}-${var.cluster_name}-asg"
desired_capacity = var.desired_capacity
max_size = var.max_size
min_size = var.min_size
vpc_zone_identifier = var.private_subnet_ids
target_group_arns = var.target_group_arns
health_check_type = "ELB"
launch_template {
id = aws_launch_template.proxy.id
version = "$Latest"
}
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 75
}
}
tag {
key = "Name"
value = "${var.environment}-${var.cluster_name}"
propagate_at_launch = true
}
lifecycle {
ignore_changes = [desired_capacity]
}
}
# Scaling policies
resource "aws_autoscaling_policy" "scale_up" {
name = "${var.environment}-${var.cluster_name}-scale-up"
scaling_adjustment = 2
adjustment_type = "ChangeInCapacity"
cooldown = 300
autoscaling_group_name = aws_autoscaling_group.proxy.name
}
resource "aws_autoscaling_policy" "scale_down" {
name = "${var.environment}-${var.cluster_name}-scale-down"
scaling_adjustment = -1
adjustment_type = "ChangeInCapacity"
cooldown = 300
autoscaling_group_name = aws_autoscaling_group.proxy.name
}
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
alarm_name = "${var.environment}-${var.cluster_name}-high-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 2
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = 120
statistic = "Average"
threshold = 70
alarm_actions = [aws_autoscaling_policy.scale_up.arn]
dimensions = {
AutoScalingGroupName = aws_autoscaling_group.proxy.name
}
}
Traffic Flow with Port Routing
sequenceDiagram
participant Partner as Partner System
participant NLB as External NLB
participant SG as Security Group
participant Proxy as Proxy Cluster
participant Target as Target System
Partner->>NLB: TCP Connect (Port 10001)
NLB->>SG: Check source IP
alt IP in whitelist for port 10001
SG->>NLB: Allow
NLB->>Proxy: Forward to proxy-cluster-a
Proxy->>Target: Proxy request
Target-->>Proxy: Response
Proxy-->>NLB: Response
NLB-->>Partner: Response
else IP not in whitelist
SG-->>NLB: Deny
NLB-->>Partner: No response (packets dropped, connection times out)
end
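For the forward step in the flow above to succeed, the proxy instances' own security group has to admit traffic from the NLB. The launch template references `aws_security_group.proxy`, but the post does not show it. A minimal sketch for the proxy-cluster module, assuming the NLB has a security group attached and assuming hypothetical inputs `nlb_security_group_id`, `proxy_port`, and `health_check_port` (`environment`, `cluster_name`, and `vpc_id` are the module inputs already passed in the environment configuration):

```hcl
# Hypothetical inputs; the post does not show how the proxy module receives these.
variable "nlb_security_group_id" {
  type        = string
  description = "ID of the security group attached to the NLB"
}

variable "proxy_port" {
  type    = number
  default = 3128
}

variable "health_check_port" {
  type    = number
  default = 8080
}

resource "aws_security_group" "proxy" {
  name_prefix = "${var.environment}-${var.cluster_name}-"
  vpc_id      = var.vpc_id
  description = "Proxy instances behind the NLB"

  lifecycle {
    create_before_destroy = true
  }
}

# Admit proxied traffic and health checks only from the NLB's security group.
# The two ports must differ, otherwise AWS rejects the second rule as a duplicate.
resource "aws_security_group_rule" "proxy_from_nlb" {
  for_each = {
    proxy  = var.proxy_port
    health = var.health_check_port
  }

  type                     = "ingress"
  from_port                = each.value
  to_port                  = each.value
  protocol                 = "tcp"
  source_security_group_id = var.nlb_security_group_id
  security_group_id        = aws_security_group.proxy.id
  description              = "${each.key} traffic from NLB"
}
```

Egress rules from the proxies toward their target systems are left out of this sketch and would need to be added for the proxy request step to work.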
State Management
# backend.tf
terraform {
backend "s3" {
bucket = "company-terraform-state"
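# Backend blocks cannot interpolate variables, so the per-environment key is
# left out here and supplied at init time, for example:
#   terraform init -backend-config="key=nlb-proxy/prod/terraform.tfstate"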
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
# Assume role for cross-account state access
role_arn = "arn:aws:iam::SHARED_SERVICES_ACCOUNT:role/TerraformStateAccess"
}
}
# State locking table
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-state-lock"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
tags = {
Name = "terraform-state-lock"
}
}
Environment Configuration
# environments/prod/main.tf
locals {
environment = "prod"
port_config = jsondecode(file("${path.module}/port-config.json"))
}
module "vpc" {
source = "../../modules/vpc"
environment = local.environment
vpc_cidr = "10.100.0.0/16"
public_subnet_cidrs = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
private_subnet_cidrs = ["10.100.10.0/24", "10.100.11.0/24", "10.100.12.0/24"]
enable_transit_gateway = true
transit_gateway_id = var.transit_gateway_id
}
module "security_groups" {
source = "../../modules/security-groups"
environment = local.environment
vpc_id = module.vpc.vpc_id
vpc_cidr = module.vpc.vpc_cidr
port_config = local.port_config
onprem_cidrs = ["172.16.0.0/12", "192.168.0.0/16"]
proxy_cluster_a_sg_id = module.proxy_cluster_a.security_group_id
proxy_cluster_b_sg_id = module.proxy_cluster_b.security_group_id
}
module "nlb" {
source = "../../modules/nlb"
environment = local.environment
vpc_id = module.vpc.vpc_id
public_subnet_ids = module.vpc.public_subnet_ids
private_subnet_ids = module.vpc.private_subnet_ids
port_config = local.port_config
proxy_port = 3128
security_group_ids = {
external = module.security_groups.nlb_external_sg_id
internal = module.security_groups.nlb_internal_sg_id
}
tags = {
Environment = local.environment
Project = "proxy-infrastructure"
}
}
module "proxy_cluster_a" {
source = "../../modules/proxy-cluster"
environment = local.environment
cluster_name = "proxy-cluster-a"
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
instance_type = "c6i.xlarge"
desired_capacity = 3
min_size = 2
max_size = 10
target_group_arns = [
module.nlb.target_group_arns["proxy-cluster-a"]
]
}
module "proxy_cluster_b" {
source = "../../modules/proxy-cluster"
environment = local.environment
cluster_name = "proxy-cluster-b"
vpc_id = module.vpc.vpc_id
private_subnet_ids = module.vpc.private_subnet_ids
instance_type = "c6i.xlarge"
desired_capacity = 2
min_size = 2
max_size = 6
target_group_arns = [
module.nlb.target_group_arns["proxy-cluster-b"]
]
}
Adding New Ports
flowchart TD
subgraph Process["Adding New Port Whitelist"]
A[Partner requests new port] --> B[Update port-config.json]
B --> C[Create PR]
C --> D[Review changes]
D --> E[Terraform plan]
E --> F{Changes look correct?}
F -->|Yes| G[Merge PR]
F -->|No| H[Fix config]
H --> B
G --> I[Terraform apply]
I --> J[New listener created]
I --> K[Security group rule added]
J & K --> L[Port ready for traffic]
end
style Process fill:#1a1a2e,stroke:#00d9ff,stroke-width:2px,color:#fff
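The "Terraform plan" review step catches more when the module rejects malformed entries outright. A minimal sketch of a typed `port_config` variable with validation rules; the type constraint is an assumption inferred from the sample JSON, not the author's actual variables.tf:

```hcl
variable "port_config" {
  description = "Decoded contents of port-config.json"

  # Assumed shape, inferred from the sample configuration in this post.
  type = object({
    port_whitelists = map(object({
      description  = string
      allowed_ips  = list(string)
      target_group = string
      protocol     = string
    }))
    port_ranges = map(object({
      default_target    = string
      health_check_port = number
    }))
  })

  # Keys must be numeric strings within the valid TCP port range.
  # try() turns a non-numeric key into a failed check instead of an evaluation error.
  validation {
    condition = alltrue([
      for port, cfg in var.port_config.port_whitelists :
      try(tonumber(port) >= 1 && tonumber(port) <= 65535, false)
    ])
    error_message = "Every key in port_whitelists must be a port number between 1 and 65535."
  }

  # cidrhost() fails on anything that is not CIDR notation, such as a bare IP
  # with the /32 missing, so can() reports those entries as invalid.
  validation {
    condition = alltrue(flatten([
      for port, cfg in var.port_config.port_whitelists : [
        for cidr in cfg.allowed_ips : can(cidrhost(cidr, 0))
      ]
    ]))
    error_message = "Every entry in allowed_ips must be a valid CIDR block such as 203.0.113.10/32."
  }
}
```

With rules like these, a PR that adds `"203.0.113.10"` without a prefix length fails at plan time with a readable message rather than at apply time with an AWS API error.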
Example PR to add a new port:
{
"port_whitelists": {
"10001": { ... },
"10002": { ... },
+ "10003": {
+ "description": "Partner A - New Settlement API",
+ "allowed_ips": ["203.0.113.10/32", "203.0.113.12/32"],
+ "target_group": "proxy-cluster-a",
+ "protocol": "TCP"
+ }
}
}
Monitoring and Alerting
# monitoring.tf
resource "aws_cloudwatch_dashboard" "proxy" {
dashboard_name = "${var.environment}-proxy-dashboard"
dashboard_body = jsonencode({
widgets = [
{
type = "metric"
x = 0
y = 0
width = 12
height = 6
properties = {
title = "NLB Active Connections"
region = var.region
metrics = [
["AWS/NetworkELB", "ActiveFlowCount", "LoadBalancer", aws_lb.external.arn_suffix]
]
}
},
{
type = "metric"
x = 12
y = 0
width = 12
height = 6
properties = {
title = "NLB New Connections"
region = var.region
metrics = [
["AWS/NetworkELB", "NewFlowCount", "LoadBalancer", aws_lb.external.arn_suffix]
]
}
},
{
type = "metric"
x = 0
y = 6
width = 12
height = 6
properties = {
title = "Healthy Hosts per Target Group"
region = var.region
metrics = [
for tg_name, tg in aws_lb_target_group.proxy :
["AWS/NetworkELB", "HealthyHostCount", "TargetGroup", tg.arn_suffix, "LoadBalancer", aws_lb.external.arn_suffix]
]
}
}
]
})
}
# Alert on unhealthy targets
resource "aws_cloudwatch_metric_alarm" "unhealthy_hosts" {
for_each = aws_lb_target_group.proxy
alarm_name = "${var.environment}-${each.key}-unhealthy"
comparison_operator = "LessThanThreshold"
evaluation_periods = 2
metric_name = "HealthyHostCount"
namespace = "AWS/NetworkELB"
period = 60
statistic = "Minimum"
threshold = 2
alarm_description = "Less than 2 healthy hosts in ${each.key}"
dimensions = {
TargetGroup = each.value.arn_suffix
LoadBalancer = aws_lb.external.arn_suffix
}
alarm_actions = [var.sns_topic_arn]
ok_actions = [var.sns_topic_arn]
}
Best Practices
| Practice | Why |
|---|---|
| Use JSON for port config | Easy to review in PRs, version controlled |
| Separate target groups | Different proxy clusters for different use cases |
| Enable cross-zone LB | Better distribution, higher availability |
| Use source IP stickiness | Consistent routing for stateful connections |
| Monitor per-port metrics | Identify issues with specific partners |
| Document each port | Know who owns what |
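The "Document each port" practice can lean on the `description` field that the config already carries. A small sketch of an output that summarizes ownership per port; the output name `port_inventory` is hypothetical:

```hcl
# Hypothetical output for any module that receives var.port_config.
output "port_inventory" {
  description = "Human-readable summary of each whitelisted port"
  value = {
    for port, cfg in var.port_config.port_whitelists :
    port => format(
      "%s -> %s (%d source CIDRs)",
      cfg.description,
      cfg.target_group,
      length(cfg.allowed_ips)
    )
  }
}
```

Running `terraform output port_inventory` then gives a quick answer to who owns a given port without opening the JSON file.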
Troubleshooting
"Connection timeout on specific port"
# Check if listener exists
aws elbv2 describe-listeners --load-balancer-arn <nlb-arn> | jq '.Listeners[] | select(.Port == 10001)'
# Check security group rules
aws ec2 describe-security-group-rules --filters "Name=group-id,Values=<sg-id>" | jq '.SecurityGroupRules[] | select(.FromPort == 10001)'
# Check target health
aws elbv2 describe-target-health --target-group-arn <tg-arn>
"Partner IP not whitelisted"
- Verify IP is in port-config.json
- Run terraform plan to see if rule would be created
- Check for CIDR notation errors (missing /32)
"Proxy cluster scaling issues"
- Check ASG desired vs actual capacity
- Review CloudWatch metrics for CPU/memory
- Verify target group health check settings
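One scaling issue worth ruling out: in the code shown earlier, the `scale_down` policy has no alarm attached, so as written the cluster would only ever grow. A minimal sketch of a matching low-CPU alarm for the proxy-cluster module; the threshold and evaluation window are assumed values, not the author's:

```hcl
resource "aws_cloudwatch_metric_alarm" "low_cpu" {
  alarm_name          = "${var.environment}-${var.cluster_name}-low-cpu"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 5 # assumed: longer window than scale-up to avoid flapping
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 30 # assumed value, tune per workload
  alarm_actions       = [aws_autoscaling_policy.scale_down.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.proxy.name
  }
}
```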
Conclusion
This architecture provides proxy infrastructure that scales to hundreds of ports while maintaining strict per-port access controls. The JSON-driven configuration lets teams request new ports through PRs, with a full audit trail and review process.
Together, the following components create a system that handles financial services traffic securely and reliably:
- NLB for high-performance TCP load balancing
- Dynamic security groups for per-port whitelisting
- Auto Scaling Groups for resilient proxy clusters
- Transit Gateway for hybrid connectivity