


Engineering AWS NLB Infrastructure for Financial Services Proxy Networks

Designing a multi-environment AWS NLB infrastructure for financial services with Terraform: dual internal and external load balancers, JSON-driven per-port IP whitelists, port-range routing to dedicated proxy clusters, and Transit Gateway hybrid connectivity.


Milan Dangol

Sr DevOps & DevSecOps Engineer

Jul 27, 2025
12 min read

Introduction

Financial services require proxy infrastructure that can handle thousands of external connections while maintaining strict IP-level access controls. I built a system where each port on the NLB gets its own whitelist of allowed source IPs, all driven by a single JSON configuration file.

The challenge was creating infrastructure that:

  • Supports 100+ ports with individual IP whitelists
  • Routes intelligently to different proxy clusters based on port ranges
  • Works across multiple environments (dev, staging, prod)
  • Integrates with on-premises via Transit Gateway
  • Manages state safely with S3/DynamoDB backend

Architecture Overview

flowchart TB
    subgraph External["External Traffic"]
        CLIENT1[Partner A<br/>203.0.113.10]
        CLIENT2[Partner B<br/>198.51.100.20]
        CLIENT3[Partner C<br/>192.0.2.30]
    end
    subgraph AWSCloud["AWS Cloud"]
        subgraph PublicSubnets["Public Subnets"]
            EXT_NLB[External NLB<br/>Ports 10000-20000]
        end
        subgraph PrivateSubnets["Private Subnets"]
            INT_NLB[Internal NLB<br/>Ports 3128, 8080]
            subgraph ProxyCluster1["Proxy Cluster A - Ports 10000-15000"]
                PROXY_A1[Squid Proxy 1]
                PROXY_A2[Squid Proxy 2]
            end
            subgraph ProxyCluster2["Proxy Cluster B - Ports 15001-20000"]
                PROXY_B1[HAProxy 1]
                PROXY_B2[HAProxy 2]
            end
        end
        subgraph SecurityGroups["Security Layer"]
            SG_EXT[External SG<br/>Per-port IP whitelist]
            SG_INT[Internal SG<br/>VPC + On-prem CIDR]
        end
    end
    subgraph OnPrem["On-Premises"]
        INTERNAL_APPS[Internal Applications]
        TGW[Transit Gateway]
    end
    CLIENT1 -->|Port 10001| EXT_NLB
    CLIENT2 -->|Port 12500| EXT_NLB
    CLIENT3 -->|Port 16000| EXT_NLB
    EXT_NLB --> SG_EXT
    SG_EXT --> ProxyCluster1
    SG_EXT --> ProxyCluster2
    INTERNAL_APPS --> TGW
    TGW --> INT_NLB
    INT_NLB --> SG_INT
    SG_INT --> ProxyCluster1
    style External fill:#6c757d,stroke:#fff,stroke-width:2px,color:#fff
    style PublicSubnets fill:#e63946,stroke:#fff,stroke-width:2px,color:#fff
    style PrivateSubnets fill:#264653,stroke:#2a9d8f,stroke-width:2px,color:#fff
    style SecurityGroups fill:#1a1a2e,stroke:#f77f00,stroke-width:2px,color:#fff
    style OnPrem fill:#3a3a5c,stroke:#fff,stroke-width:2px,color:#fff

Port Whitelist Configuration

The key innovation is a JSON-driven configuration that maps each port to its allowed source IPs:

{
  "port_whitelists": {
    "10001": {
      "description": "Partner A - Payment Gateway",
      "allowed_ips": ["203.0.113.10/32", "203.0.113.11/32"],
      "target_group": "proxy-cluster-a",
      "protocol": "TCP"
    },
    "10002": {
      "description": "Partner A - Reconciliation",
      "allowed_ips": ["203.0.113.10/32"],
      "target_group": "proxy-cluster-a",
      "protocol": "TCP"
    },
    "12500": {
      "description": "Partner B - Transaction API",
      "allowed_ips": ["198.51.100.0/24"],
      "target_group": "proxy-cluster-a",
      "protocol": "TCP"
    },
    "16000": {
      "description": "Partner C - Batch Processing",
      "allowed_ips": ["192.0.2.0/28", "192.0.2.64/28"],
      "target_group": "proxy-cluster-b",
      "protocol": "TCP"
    }
  },
  "port_ranges": {
    "10000-15000": {
      "default_target": "proxy-cluster-a",
      "health_check_port": 8080
    },
    "15001-20000": {
      "default_target": "proxy-cluster-b", 
      "health_check_port": 8081
    }
  }
}
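On the Terraform side, decoding this file is safer with an explicit type constraint, so a malformed entry fails at plan time instead of mid-apply. A minimal sketch of a matching variable declaration for the NLB module (the type itself is my reconstruction of the schema above):

# modules/nlb/variables.tf (sketch — type reconstructed from the JSON above)
variable "port_config" {
  description = "Decoded contents of port-config.json"
  type = object({
    port_whitelists = map(object({
      description  = string
      allowed_ips  = list(string)
      target_group = string
      protocol     = string
    }))
    port_ranges = map(object({
      default_target    = string
      health_check_port = number
    }))
  })
}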

Terraform Project Structure

nlb-proxy-infrastructure/
├── modules/
│   ├── nlb/
│   │   ├── main.tf
│   │   ├── listeners.tf
│   │   ├── target-groups.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── security-groups/
│   │   ├── main.tf
│   │   ├── per-port-rules.tf
│   │   └── variables.tf
│   └── proxy-cluster/
│       ├── main.tf
│       ├── asg.tf
│       └── variables.tf
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── port-config.json
│   │   └── terraform.tfvars
│   ├── staging/
│   └── prod/
├── config/
│   └── port-whitelists/
│       ├── dev.json
│       ├── staging.json
│       └── prod.json
└── backend.tf
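Each environment can consume either its local port-config.json or the shared files under config/port-whitelists/. A sketch of the latter, assuming the path is resolved relative to environments/prod:

# environments/prod/main.tf (sketch — loads the shared per-environment whitelist)
locals {
  port_config = jsondecode(file("${path.module}/../../config/port-whitelists/prod.json"))
}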

NLB Module

# modules/nlb/main.tf

resource "aws_lb" "external" {
  name               = "${var.environment}-external-nlb"
  internal           = false
  load_balancer_type = "network"
  subnets            = var.public_subnet_ids

  enable_cross_zone_load_balancing = true
  enable_deletion_protection       = var.environment == "prod"

  tags = merge(var.tags, {
    Name = "${var.environment}-external-nlb"
    Type = "external"
  })
}

resource "aws_lb" "internal" {
  name               = "${var.environment}-internal-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids

  enable_cross_zone_load_balancing = true
  enable_deletion_protection       = var.environment == "prod"

  tags = merge(var.tags, {
    Name = "${var.environment}-internal-nlb"
    Type = "internal"
  })
}

# Dynamic listeners based on port configuration
resource "aws_lb_listener" "external" {
  for_each = var.port_config.port_whitelists

  load_balancer_arn = aws_lb.external.arn
  port              = each.key
  protocol          = each.value.protocol

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.proxy[each.value.target_group].arn
  }
}
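Because listeners are generated straight from partner-edited JSON, it is worth failing fast on typos inside the NLB module itself. A sketch of a plan-time guard (check blocks require Terraform 1.5+; the check name is my addition):

# Sketch: reject invalid protocols or out-of-range ports at plan time
check "port_config_sanity" {
  assert {
    condition = alltrue([
      for port, cfg in var.port_config.port_whitelists :
      contains(["TCP", "UDP", "TCP_UDP"], cfg.protocol) &&
      tonumber(port) >= 1 && tonumber(port) <= 65535
    ])
    error_message = "port_whitelists contains an invalid protocol or port number."
  }
}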

# Target groups for each proxy cluster
locals {
  # port_ranges is keyed by range ("10000-15000"), but each.key below is a
  # cluster name ("proxy-cluster-a"), so invert the map for the health checks
  health_check_ports = {
    for range_key, range in var.port_config.port_ranges :
    range.default_target => range.health_check_port
  }
}

resource "aws_lb_target_group" "proxy" {
  for_each = toset(distinct([for k, v in var.port_config.port_whitelists : v.target_group]))

  name        = "${var.environment}-${each.key}"
  port        = var.proxy_port
  protocol    = "TCP"
  vpc_id      = var.vpc_id
  target_type = "instance"

  health_check {
    enabled             = true
    protocol            = "TCP"
    port                = local.health_check_ports[each.key]
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
  }

  stickiness {
    enabled = true
    type    = "source_ip"
  }

  tags = merge(var.tags, {
    Name = "${var.environment}-${each.key}-tg"
  })
}

Per-Port Security Group Rules

flowchart TD
    subgraph SecurityGroupGeneration["Security Group Rule Generation"]
        JSON[port-config.json] --> PARSE[Parse JSON in Terraform]
        PARSE --> LOOP[For each port in config]
        LOOP --> RULE1["Rule: Port 10001<br/>Allow: 203.0.113.10/32, 203.0.113.11/32"]
        LOOP --> RULE2["Rule: Port 10002<br/>Allow: 203.0.113.10/32"]
        LOOP --> RULE3["Rule: Port 12500<br/>Allow: 198.51.100.0/24"]
        LOOP --> RULEN["Rule: Port N<br/>Allow: ..."]
    end
    subgraph AppliedRules["Applied to Security Group"]
        SG[NLB Security Group]
        RULE1 --> SG
        RULE2 --> SG
        RULE3 --> SG
        RULEN --> SG
    end
    style SecurityGroupGeneration fill:#1a1a2e,stroke:#00d9ff,stroke-width:2px,color:#fff
    style AppliedRules fill:#264653,stroke:#2a9d8f,stroke-width:2px,color:#fff

# modules/security-groups/per-port-rules.tf

locals {
  # Flatten the port whitelist into individual rules
  port_rules = flatten([
    for port, config in var.port_config.port_whitelists : [
      for cidr in config.allowed_ips : {
        port        = port
        cidr        = cidr
        description = config.description
      }
    ]
  ])
}

resource "aws_security_group" "nlb_external" {
  name_prefix = "${var.environment}-nlb-external-"
  vpc_id      = var.vpc_id
  description = "Security group for external NLB with per-port IP whitelists"

  tags = merge(var.tags, {
    Name = "${var.environment}-nlb-external-sg"
  })

  lifecycle {
    create_before_destroy = true
  }
}

# Generate ingress rules dynamically from JSON config
resource "aws_security_group_rule" "port_whitelist" {
  for_each = {
    for idx, rule in local.port_rules : 
    "${rule.port}-${replace(rule.cidr, "/", "-")}" => rule
  }

  type              = "ingress"
  from_port         = tonumber(each.value.port)
  to_port           = tonumber(each.value.port)
  protocol          = "tcp"
  cidr_blocks       = [each.value.cidr]
  security_group_id = aws_security_group.nlb_external.id
  description       = each.value.description
}
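To see exactly which (port, CIDR) pairs the flatten produced before applying, an output like this helps during review (the output name is my addition):

# Sketch: surface the generated rule keys, e.g. "10001-203.0.113.10-32"
output "port_whitelist_rule_keys" {
  description = "One key per (port, CIDR) whitelist pair"
  value       = keys(aws_security_group_rule.port_whitelist)
}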

# Egress to proxy clusters
resource "aws_security_group_rule" "to_proxy_cluster_a" {
  type                     = "egress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  source_security_group_id = var.proxy_cluster_a_sg_id
  security_group_id        = aws_security_group.nlb_external.id
  description              = "Traffic to Proxy Cluster A"
}

resource "aws_security_group_rule" "to_proxy_cluster_b" {
  type                     = "egress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  source_security_group_id = var.proxy_cluster_b_sg_id
  security_group_id        = aws_security_group.nlb_external.id
  description              = "Traffic to Proxy Cluster B"
}

Internal NLB for On-Premises Traffic

# modules/nlb/internal.tf

resource "aws_lb_listener" "internal_proxy" {
  load_balancer_arn = aws_lb.internal.arn
  port              = 3128
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.proxy["proxy-cluster-a"].arn
  }
}

resource "aws_lb_listener" "internal_http" {
  load_balancer_arn = aws_lb.internal.arn
  port              = 8080
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.proxy["proxy-cluster-a"].arn
  }
}

# Security group for internal NLB
resource "aws_security_group" "nlb_internal" {
  name_prefix = "${var.environment}-nlb-internal-"
  vpc_id      = var.vpc_id
  description = "Security group for internal NLB"

  # Allow from VPC
  ingress {
    from_port   = 3128
    to_port     = 3128
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "Proxy from VPC"
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "HTTP Proxy from VPC"
  }

  # Allow from on-premises via Transit Gateway
  ingress {
    from_port   = 3128
    to_port     = 3128
    protocol    = "tcp"
    cidr_blocks = var.onprem_cidrs
    description = "Proxy from on-premises"
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = var.onprem_cidrs
    description = "HTTP Proxy from on-premises"
  }

  # Inline ingress rules replace Terraform's default allow-all egress,
  # so egress toward the proxy clusters must be declared explicitly
  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "Egress to proxy clusters"
  }

  tags = merge(var.tags, {
    Name = "${var.environment}-nlb-internal-sg"
  })
}
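The Transit Gateway wiring itself lives in the VPC module and isn't shown here; a rough sketch of the attachment and return routes it would need (resource and variable names are assumptions):

# Sketch: attach the VPC to the Transit Gateway so on-prem traffic
# can reach the internal NLB
resource "aws_ec2_transit_gateway_vpc_attachment" "proxy" {
  transit_gateway_id = var.transit_gateway_id
  vpc_id             = var.vpc_id
  subnet_ids         = var.private_subnet_ids
}

# Return routes so responses reach the on-premises networks
resource "aws_route" "to_onprem" {
  for_each = toset(var.onprem_cidrs)

  route_table_id         = var.private_route_table_id
  destination_cidr_block = each.value
  transit_gateway_id     = var.transit_gateway_id
}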

Proxy Cluster Auto Scaling Group

# modules/proxy-cluster/asg.tf

resource "aws_launch_template" "proxy" {
  name_prefix   = "${var.environment}-${var.cluster_name}-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.proxy.id]
  }

  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size           = 50
      volume_type           = "gp3"
      encrypted             = true
      delete_on_termination = true
    }
  }

  iam_instance_profile {
    name = aws_iam_instance_profile.proxy.name
  }

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 1
  }

  user_data = base64encode(templatefile("${path.module}/userdata.sh.tpl", {
    cluster_name    = var.cluster_name
    squid_conf      = var.squid_config
    cloudwatch_conf = var.cloudwatch_config
  }))

  tag_specifications {
    resource_type = "instance"
    tags = merge(var.tags, {
      Name = "${var.environment}-${var.cluster_name}"
    })
  }
}

resource "aws_autoscaling_group" "proxy" {
  name                = "${var.environment}-${var.cluster_name}-asg"
  desired_capacity    = var.desired_capacity
  max_size            = var.max_size
  min_size            = var.min_size
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = var.target_group_arns
  health_check_type   = "ELB"

  launch_template {
    id      = aws_launch_template.proxy.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 75
    }
  }

  tag {
    key                 = "Name"
    value               = "${var.environment}-${var.cluster_name}"
    propagate_at_launch = true
  }

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}

# Scaling policies
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "${var.environment}-${var.cluster_name}-scale-up"
  scaling_adjustment     = 2
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.proxy.name
}

resource "aws_autoscaling_policy" "scale_down" {
  name                   = "${var.environment}-${var.cluster_name}-scale-down"
  scaling_adjustment     = -1
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.proxy.name
}

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "${var.environment}-${var.cluster_name}-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 70
  alarm_actions       = [aws_autoscaling_policy.scale_up.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.proxy.name
  }
}
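As written, nothing ever triggers the scale-down policy; a companion low-CPU alarm closes the loop. A sketch (the thresholds here are my guesses, not the post's values):

# Sketch: drive scale_down when the cluster is idle for a sustained period
resource "aws_cloudwatch_metric_alarm" "low_cpu" {
  alarm_name          = "${var.environment}-${var.cluster_name}-low-cpu"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 4
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 25
  alarm_actions       = [aws_autoscaling_policy.scale_down.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.proxy.name
  }
}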

Traffic Flow with Port Routing

sequenceDiagram
    participant Partner as Partner System
    participant NLB as External NLB
    participant SG as Security Group
    participant Proxy as Proxy Cluster
    participant Target as Target System

    Partner->>NLB: TCP Connect (Port 10001)
    NLB->>SG: Check source IP
    alt IP in whitelist for port 10001
        SG->>NLB: Allow
        NLB->>Proxy: Forward to proxy-cluster-a
        Proxy->>Target: Proxy request
        Target-->>Proxy: Response
        Proxy-->>NLB: Response
        NLB-->>Partner: Response
    else IP not in whitelist
        SG-->>NLB: Deny
        NLB-->>Partner: Connection refused
    end

State Management

# backend.tf

# Backend blocks cannot interpolate variables, so the per-environment
# state key is supplied at init time, e.g.:
#   terraform init -backend-config="key=nlb-proxy/prod/terraform.tfstate"

terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

    # Assume role for cross-account state access
    role_arn = "arn:aws:iam::SHARED_SERVICES_ACCOUNT:role/TerraformStateAccess"
  }
}

# State locking table (bootstrapped separately, since it must exist
# before terraform init can configure the backend that locks on it)
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Name = "terraform-state-lock"
  }
}

Environment Configuration

# environments/prod/main.tf

locals {
  environment = "prod"
  port_config = jsondecode(file("${path.module}/port-config.json"))
}

module "vpc" {
  source = "../../modules/vpc"

  environment = local.environment
  vpc_cidr    = "10.100.0.0/16"
  
  public_subnet_cidrs  = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
  private_subnet_cidrs = ["10.100.10.0/24", "10.100.11.0/24", "10.100.12.0/24"]
  
  enable_transit_gateway = true
  transit_gateway_id     = var.transit_gateway_id
}

module "security_groups" {
  source = "../../modules/security-groups"

  environment         = local.environment
  vpc_id              = module.vpc.vpc_id
  vpc_cidr            = module.vpc.vpc_cidr
  port_config         = local.port_config
  onprem_cidrs        = ["172.16.0.0/12", "192.168.0.0/16"]
  proxy_cluster_a_sg_id = module.proxy_cluster_a.security_group_id
  proxy_cluster_b_sg_id = module.proxy_cluster_b.security_group_id
}

module "nlb" {
  source = "../../modules/nlb"

  environment        = local.environment
  vpc_id             = module.vpc.vpc_id
  public_subnet_ids  = module.vpc.public_subnet_ids
  private_subnet_ids = module.vpc.private_subnet_ids
  port_config        = local.port_config
  proxy_port         = 3128

  security_group_ids = {
    external = module.security_groups.nlb_external_sg_id
    internal = module.security_groups.nlb_internal_sg_id
  }

  tags = {
    Environment = local.environment
    Project     = "proxy-infrastructure"
  }
}

module "proxy_cluster_a" {
  source = "../../modules/proxy-cluster"

  environment       = local.environment
  cluster_name      = "proxy-cluster-a"
  vpc_id            = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids
  
  instance_type    = "c6i.xlarge"
  desired_capacity = 3
  min_size         = 2
  max_size         = 10
  
  target_group_arns = [
    module.nlb.target_group_arns["proxy-cluster-a"]
  ]
}

module "proxy_cluster_b" {
  source = "../../modules/proxy-cluster"

  environment       = local.environment
  cluster_name      = "proxy-cluster-b"
  vpc_id            = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids
  
  instance_type    = "c6i.xlarge"
  desired_capacity = 2
  min_size         = 2
  max_size         = 6
  
  target_group_arns = [
    module.nlb.target_group_arns["proxy-cluster-b"]
  ]
}

Adding New Ports

flowchart TD
    subgraph Process["Adding New Port Whitelist"]
        A[Partner requests new port] --> B[Update port-config.json]
        B --> C[Create PR]
        C --> D[Review changes]
        D --> E[Terraform plan]
        E --> F{Changes look correct?}
        F -->|Yes| G[Merge PR]
        F -->|No| H[Fix config]
        H --> B
        G --> I[Terraform apply]
        I --> J[New listener created]
        I --> K[Security group rule added]
        J & K --> L[Port ready for traffic]
    end
    style Process fill:#1a1a2e,stroke:#00d9ff,stroke-width:2px,color:#fff

Example PR to add a new port:

{
  "port_whitelists": {
    "10001": { ... },
    "10002": { ... },
+   "10003": {
+     "description": "Partner A - New Settlement API",
+     "allowed_ips": ["203.0.113.10/32", "203.0.113.12/32"],
+     "target_group": "proxy-cluster-a",
+     "protocol": "TCP"
+   }
  }
}
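With listeners and security-group rules keyed by port and CIDR, the plan for this PR should show exactly three additions, roughly like this (hypothetical output, reconstructed from the for_each keys above):

# module.nlb.aws_lb_listener.external["10003"] will be created
# module.security_groups.aws_security_group_rule.port_whitelist["10003-203.0.113.10-32"] will be created
# module.security_groups.aws_security_group_rule.port_whitelist["10003-203.0.113.12-32"] will be created

Plan: 3 to add, 0 to change, 0 to destroy.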

Monitoring and Alerting

# monitoring.tf

resource "aws_cloudwatch_dashboard" "proxy" {
  dashboard_name = "${var.environment}-proxy-dashboard"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "NLB Active Connections"
          region = var.region
          metrics = [
            ["AWS/NetworkELB", "ActiveFlowCount", "LoadBalancer", aws_lb.external.arn_suffix]
          ]
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "NLB New Connections"
          region = var.region
          metrics = [
            ["AWS/NetworkELB", "NewFlowCount", "LoadBalancer", aws_lb.external.arn_suffix]
          ]
        }
      },
      {
        type   = "metric"
        x      = 0
        y      = 6
        width  = 12
        height = 6
        properties = {
          title  = "Healthy Hosts per Target Group"
          region = var.region
          metrics = [
            for tg_name, tg in aws_lb_target_group.proxy : 
            ["AWS/NetworkELB", "HealthyHostCount", "TargetGroup", tg.arn_suffix, "LoadBalancer", aws_lb.external.arn_suffix]
          ]
        }
      }
    ]
  })
}

# Alert on unhealthy targets
resource "aws_cloudwatch_metric_alarm" "unhealthy_hosts" {
  for_each = aws_lb_target_group.proxy

  alarm_name          = "${var.environment}-${each.key}-unhealthy"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HealthyHostCount"
  namespace           = "AWS/NetworkELB"
  period              = 60
  statistic           = "Minimum"
  threshold           = 2
  alarm_description   = "Less than 2 healthy hosts in ${each.key}"
  
  dimensions = {
    TargetGroup  = each.value.arn_suffix
    LoadBalancer = aws_lb.external.arn_suffix
  }

  alarm_actions = [var.sns_topic_arn]
  ok_actions    = [var.sns_topic_arn]
}
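Beyond host health, NLB's TCP reset metrics are a useful early signal that a proxy target is misbehaving. A sketch of one more alarm (the threshold is an assumption):

# Sketch: alert on a spike in RSTs sent by proxy targets
resource "aws_cloudwatch_metric_alarm" "target_resets" {
  alarm_name          = "${var.environment}-nlb-target-resets"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "TCP_Target_Reset_Count"
  namespace           = "AWS/NetworkELB"
  period              = 60
  statistic           = "Sum"
  threshold           = 100
  alarm_description   = "Spike in TCP resets from proxy targets"

  dimensions = {
    LoadBalancer = aws_lb.external.arn_suffix
  }

  alarm_actions = [var.sns_topic_arn]
}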

Best Practices

Practice | Why
---------|----
Use JSON for port config | Easy to review in PRs, version controlled
Separate target groups | Different proxy clusters for different use cases
Enable cross-zone LB | Better distribution, higher availability
Use source IP stickiness | Consistent routing for stateful connections
Monitor per-port metrics | Identify issues with specific partners
Document each port | Know who owns what

Troubleshooting

"Connection timeout on specific port"

# Check if listener exists
aws elbv2 describe-listeners --load-balancer-arn <nlb-arn> | jq '.Listeners[] | select(.Port == 10001)'

# Check security group rules
aws ec2 describe-security-group-rules --filters "Name=group-id,Values=<sg-id>" | jq '.SecurityGroupRules[] | select(.FromPort == 10001)'

# Check target health
aws elbv2 describe-target-health --target-group-arn <tg-arn>

"Partner IP not whitelisted"

  • Verify IP is in port-config.json
  • Run terraform plan to see if rule would be created
  • Check for CIDR notation errors (missing /32)

"Proxy cluster scaling issues"

  • Check ASG desired vs actual capacity
  • Review CloudWatch metrics for CPU/memory
  • Verify target group health check settings

Conclusion

This architecture provides enterprise-grade proxy infrastructure that scales to hundreds of ports while maintaining strict per-port access controls. The JSON-driven configuration lets teams request new ports through PRs, with a full audit trail and review process.

The combination of:

  • NLB for high-performance TCP load balancing
  • Dynamic security groups for per-port whitelisting
  • Auto Scaling Groups for resilient proxy clusters
  • Transit Gateway for hybrid connectivity

creates a system that handles financial services traffic securely and reliably.

Tags

#aws #nlb #terraform #networking #load-balancing #fintech
