


Engineering AWS NLB Infrastructure for Financial Services Proxy Networks

Designing a multi-environment AWS NLB infrastructure for financial services with Terraform: dual internal and external load balancers, JSON-driven per-port IP whitelists, port-range routing to dedicated proxy clusters, and Transit Gateway hybrid connectivity.


Milan Dangol

Sr DevOps & DevSecOps Engineer

Jul 27, 2025
12 min read

Introduction

Financial services require proxy infrastructure that can handle thousands of external connections while maintaining strict IP-level access controls. I built a system where each port on the NLB gets its own whitelist of allowed source IPs, all driven by a single JSON configuration file.

The challenge was creating infrastructure that:

  • Supports 100+ ports with individual IP whitelists
  • Routes intelligently to different proxy clusters based on port ranges
  • Works across multiple environments (dev, staging, prod)
  • Integrates with on-premises via Transit Gateway
  • Manages state safely with S3/DynamoDB backend

Architecture Overview

flowchart TB
    subgraph External["External Traffic"]
        CLIENT1[Partner A<br/>203.0.113.10]
        CLIENT2[Partner B<br/>198.51.100.20]
        CLIENT3[Partner C<br/>192.0.2.30]
    end
    subgraph AWSCloud["AWS Cloud"]
        subgraph PublicSubnets["Public Subnets"]
            EXT_NLB[External NLB<br/>Ports 10000-20000]
        end
        subgraph PrivateSubnets["Private Subnets"]
            INT_NLB[Internal NLB<br/>Ports 3128, 8080]
            subgraph ProxyCluster1["Proxy Cluster A - Ports 10000-15000"]
                PROXY_A1[Squid Proxy 1]
                PROXY_A2[Squid Proxy 2]
            end
            subgraph ProxyCluster2["Proxy Cluster B - Ports 15001-20000"]
                PROXY_B1[HAProxy 1]
                PROXY_B2[HAProxy 2]
            end
        end
        subgraph SecurityGroups["Security Layer"]
            SG_EXT[External SG<br/>Per-port IP whitelist]
            SG_INT[Internal SG<br/>VPC + On-prem CIDR]
        end
    end
    subgraph OnPrem["On-Premises"]
        INTERNAL_APPS[Internal Applications]
        TGW[Transit Gateway]
    end
    CLIENT1 -->|Port 10001| EXT_NLB
    CLIENT2 -->|Port 12500| EXT_NLB
    CLIENT3 -->|Port 16000| EXT_NLB
    EXT_NLB --> SG_EXT
    SG_EXT --> ProxyCluster1
    SG_EXT --> ProxyCluster2
    INTERNAL_APPS --> TGW
    TGW --> INT_NLB
    INT_NLB --> SG_INT
    SG_INT --> ProxyCluster1
    style External fill:#6c757d,stroke:#fff,stroke-width:2px,color:#fff
    style PublicSubnets fill:#e63946,stroke:#fff,stroke-width:2px,color:#fff
    style PrivateSubnets fill:#264653,stroke:#2a9d8f,stroke-width:2px,color:#fff
    style SecurityGroups fill:#1a1a2e,stroke:#f77f00,stroke-width:2px,color:#fff
    style OnPrem fill:#3a3a5c,stroke:#fff,stroke-width:2px,color:#fff

Port Whitelist Configuration

The key innovation is a JSON-driven configuration that maps each port to its allowed source IPs:

{
  "port_whitelists": {
    "10001": {
      "description": "Partner A - Payment Gateway",
      "allowed_ips": ["203.0.113.10/32", "203.0.113.11/32"],
      "target_group": "proxy-cluster-a",
      "protocol": "TCP"
    },
    "10002": {
      "description": "Partner A - Reconciliation",
      "allowed_ips": ["203.0.113.10/32"],
      "target_group": "proxy-cluster-a",
      "protocol": "TCP"
    },
    "12500": {
      "description": "Partner B - Transaction API",
      "allowed_ips": ["198.51.100.0/24"],
      "target_group": "proxy-cluster-a",
      "protocol": "TCP"
    },
    "16000": {
      "description": "Partner C - Batch Processing",
      "allowed_ips": ["192.0.2.0/28", "192.0.2.64/28"],
      "target_group": "proxy-cluster-b",
      "protocol": "TCP"
    }
  },
  "port_ranges": {
    "10000-15000": {
      "default_target": "proxy-cluster-a",
      "health_check_port": 8080
    },
    "15001-20000": {
      "default_target": "proxy-cluster-b", 
      "health_check_port": 8081
    }
  }
}
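On the Terraform side, decoding this file is safer with an explicit type constraint, so a malformed entry fails at plan time instead of mid-apply. A minimal sketch of a matching variable declaration for the NLB module (the type itself is my reconstruction of the schema above):

# modules/nlb/variables.tf (sketch — type reconstructed from the JSON above)
variable "port_config" {
  description = "Decoded contents of port-config.json"
  type = object({
    port_whitelists = map(object({
      description  = string
      allowed_ips  = list(string)
      target_group = string
      protocol     = string
    }))
    port_ranges = map(object({
      default_target    = string
      health_check_port = number
    }))
  })
}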

Terraform Project Structure

nlb-proxy-infrastructure/
├── modules/
│   ├── nlb/
│   │   ├── main.tf
│   │   ├── listeners.tf
│   │   ├── target-groups.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── security-groups/
│   │   ├── main.tf
│   │   ├── per-port-rules.tf
│   │   └── variables.tf
│   └── proxy-cluster/
│       ├── main.tf
│       ├── asg.tf
│       └── variables.tf
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── port-config.json
│   │   └── terraform.tfvars
│   ├── staging/
│   └── prod/
├── config/
│   └── port-whitelists/
│       ├── dev.json
│       ├── staging.json
│       └── prod.json
└── backend.tf
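Each environment can consume either its local port-config.json or the shared files under config/port-whitelists/. A sketch of the latter, assuming the path is resolved relative to environments/prod:

# environments/prod/main.tf (sketch — loads the shared per-environment whitelist)
locals {
  port_config = jsondecode(file("${path.module}/../../config/port-whitelists/prod.json"))
}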

NLB Module

# modules/nlb/main.tf

resource "aws_lb" "external" {
  name               = "${var.environment}-external-nlb"
  internal           = false
  load_balancer_type = "network"
  subnets            = var.public_subnet_ids

  enable_cross_zone_load_balancing = true
  enable_deletion_protection       = var.environment == "prod"

  tags = merge(var.tags, {
    Name = "${var.environment}-external-nlb"
    Type = "external"
  })
}

resource "aws_lb" "internal" {
  name               = "${var.environment}-internal-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids

  enable_cross_zone_load_balancing = true
  enable_deletion_protection       = var.environment == "prod"

  tags = merge(var.tags, {
    Name = "${var.environment}-internal-nlb"
    Type = "internal"
  })
}

# Dynamic listeners based on port configuration
resource "aws_lb_listener" "external" {
  for_each = var.port_config.port_whitelists

  load_balancer_arn = aws_lb.external.arn
  port              = each.key
  protocol          = each.value.protocol

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.proxy[each.value.target_group].arn
  }
}
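Because listeners are generated straight from partner-edited JSON, it is worth failing fast on typos inside the NLB module itself. A sketch of a plan-time guard (check blocks require Terraform 1.5+; the check name is my addition):

# Sketch: reject invalid protocols or out-of-range ports at plan time
check "port_config_sanity" {
  assert {
    condition = alltrue([
      for port, cfg in var.port_config.port_whitelists :
      contains(["TCP", "UDP", "TCP_UDP"], cfg.protocol) &&
      tonumber(port) >= 1 && tonumber(port) <= 65535
    ])
    error_message = "port_whitelists contains an invalid protocol or port number."
  }
}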

# Target groups for each proxy cluster
locals {
  # port_ranges is keyed by range ("10000-15000"), but each.key below is a
  # cluster name ("proxy-cluster-a"), so invert the map for the health checks
  health_check_ports = {
    for range_key, range in var.port_config.port_ranges :
    range.default_target => range.health_check_port
  }
}

resource "aws_lb_target_group" "proxy" {
  for_each = toset(distinct([for k, v in var.port_config.port_whitelists : v.target_group]))

  name        = "${var.environment}-${each.key}"
  port        = var.proxy_port
  protocol    = "TCP"
  vpc_id      = var.vpc_id
  target_type = "instance"

  health_check {
    enabled             = true
    protocol            = "TCP"
    port                = local.health_check_ports[each.key]
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
  }

  stickiness {
    enabled = true
    type    = "source_ip"
  }

  tags = merge(var.tags, {
    Name = "${var.environment}-${each.key}-tg"
  })
}

Per-Port Security Group Rules

flowchart TD
    subgraph SecurityGroupGeneration["Security Group Rule Generation"]
        JSON[port-config.json] --> PARSE[Parse JSON in Terraform]
        PARSE --> LOOP[For each port in config]
        LOOP --> RULE1["Rule: Port 10001<br/>Allow: 203.0.113.10/32, 203.0.113.11/32"]
        LOOP --> RULE2["Rule: Port 10002<br/>Allow: 203.0.113.10/32"]
        LOOP --> RULE3["Rule: Port 12500<br/>Allow: 198.51.100.0/24"]
        LOOP --> RULEN["Rule: Port N<br/>Allow: ..."]
    end
    subgraph AppliedRules["Applied to Security Group"]
        SG[NLB Security Group]
        RULE1 --> SG
        RULE2 --> SG
        RULE3 --> SG
        RULEN --> SG
    end
    style SecurityGroupGeneration fill:#1a1a2e,stroke:#00d9ff,stroke-width:2px,color:#fff
    style AppliedRules fill:#264653,stroke:#2a9d8f,stroke-width:2px,color:#fff

# modules/security-groups/per-port-rules.tf

locals {
  # Flatten the port whitelist into individual rules
  port_rules = flatten([
    for port, config in var.port_config.port_whitelists : [
      for cidr in config.allowed_ips : {
        port        = port
        cidr        = cidr
        description = config.description
      }
    ]
  ])
}

resource "aws_security_group" "nlb_external" {
  name_prefix = "${var.environment}-nlb-external-"
  vpc_id      = var.vpc_id
  description = "Security group for external NLB with per-port IP whitelists"

  tags = merge(var.tags, {
    Name = "${var.environment}-nlb-external-sg"
  })

  lifecycle {
    create_before_destroy = true
  }
}

# Generate ingress rules dynamically from JSON config
resource "aws_security_group_rule" "port_whitelist" {
  for_each = {
    for idx, rule in local.port_rules : 
    "${rule.port}-${replace(rule.cidr, "/", "-")}" => rule
  }

  type              = "ingress"
  from_port         = tonumber(each.value.port)
  to_port           = tonumber(each.value.port)
  protocol          = "tcp"
  cidr_blocks       = [each.value.cidr]
  security_group_id = aws_security_group.nlb_external.id
  description       = each.value.description
}
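To see exactly which (port, CIDR) pairs the flatten produced before applying, an output like this helps during review (the output name is my addition):

# Sketch: surface the generated rule keys, e.g. "10001-203.0.113.10-32"
output "port_whitelist_rule_keys" {
  description = "One key per (port, CIDR) whitelist pair"
  value       = keys(aws_security_group_rule.port_whitelist)
}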

# Egress to proxy clusters
resource "aws_security_group_rule" "to_proxy_cluster_a" {
  type                     = "egress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  source_security_group_id = var.proxy_cluster_a_sg_id
  security_group_id        = aws_security_group.nlb_external.id
  description              = "Traffic to Proxy Cluster A"
}

resource "aws_security_group_rule" "to_proxy_cluster_b" {
  type                     = "egress"
  from_port                = 0
  to_port                  = 65535
  protocol                 = "tcp"
  source_security_group_id = var.proxy_cluster_b_sg_id
  security_group_id        = aws_security_group.nlb_external.id
  description              = "Traffic to Proxy Cluster B"
}

Internal NLB for On-Premises Traffic

# modules/nlb/internal.tf

resource "aws_lb_listener" "internal_proxy" {
  load_balancer_arn = aws_lb.internal.arn
  port              = 3128
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.proxy["proxy-cluster-a"].arn
  }
}

resource "aws_lb_listener" "internal_http" {
  load_balancer_arn = aws_lb.internal.arn
  port              = 8080
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.proxy["proxy-cluster-a"].arn
  }
}

# Security group for internal NLB
resource "aws_security_group" "nlb_internal" {
  name_prefix = "${var.environment}-nlb-internal-"
  vpc_id      = var.vpc_id
  description = "Security group for internal NLB"

  # Allow from VPC
  ingress {
    from_port   = 3128
    to_port     = 3128
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "Proxy from VPC"
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "HTTP Proxy from VPC"
  }

  # Allow from on-premises via Transit Gateway
  ingress {
    from_port   = 3128
    to_port     = 3128
    protocol    = "tcp"
    cidr_blocks = var.onprem_cidrs
    description = "Proxy from on-premises"
  }

  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = var.onprem_cidrs
    description = "HTTP Proxy from on-premises"
  }

  # Inline ingress rules replace Terraform's default allow-all egress,
  # so egress toward the proxy clusters must be declared explicitly
  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "Egress to proxy clusters"
  }

  tags = merge(var.tags, {
    Name = "${var.environment}-nlb-internal-sg"
  })
}
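The Transit Gateway wiring itself lives in the VPC module and isn't shown here; a rough sketch of the attachment and return routes it would need (resource and variable names are assumptions):

# Sketch: attach the VPC to the Transit Gateway so on-prem traffic
# can reach the internal NLB
resource "aws_ec2_transit_gateway_vpc_attachment" "proxy" {
  transit_gateway_id = var.transit_gateway_id
  vpc_id             = var.vpc_id
  subnet_ids         = var.private_subnet_ids
}

# Return routes so responses reach the on-premises networks
resource "aws_route" "to_onprem" {
  for_each = toset(var.onprem_cidrs)

  route_table_id         = var.private_route_table_id
  destination_cidr_block = each.value
  transit_gateway_id     = var.transit_gateway_id
}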

Proxy Cluster Auto Scaling Group

# modules/proxy-cluster/asg.tf

resource "aws_launch_template" "proxy" {
  name_prefix   = "${var.environment}-${var.cluster_name}-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  network_interfaces {
    associate_public_ip_address = false
    security_groups             = [aws_security_group.proxy.id]
  }

  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size           = 50
      volume_type           = "gp3"
      encrypted             = true
      delete_on_termination = true
    }
  }

  iam_instance_profile {
    name = aws_iam_instance_profile.proxy.name
  }

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 1
  }

  user_data = base64encode(templatefile("${path.module}/userdata.sh.tpl", {
    cluster_name    = var.cluster_name
    squid_conf      = var.squid_config
    cloudwatch_conf = var.cloudwatch_config
  }))

  tag_specifications {
    resource_type = "instance"
    tags = merge(var.tags, {
      Name = "${var.environment}-${var.cluster_name}"
    })
  }
}

resource "aws_autoscaling_group" "proxy" {
  name                = "${var.environment}-${var.cluster_name}-asg"
  desired_capacity    = var.desired_capacity
  max_size            = var.max_size
  min_size            = var.min_size
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = var.target_group_arns
  health_check_type   = "ELB"

  launch_template {
    id      = aws_launch_template.proxy.id
    version = "$Latest"
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 75
    }
  }

  tag {
    key                 = "Name"
    value               = "${var.environment}-${var.cluster_name}"
    propagate_at_launch = true
  }

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}

# Scaling policies
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "${var.environment}-${var.cluster_name}-scale-up"
  scaling_adjustment     = 2
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.proxy.name
}

resource "aws_autoscaling_policy" "scale_down" {
  name                   = "${var.environment}-${var.cluster_name}-scale-down"
  scaling_adjustment     = -1
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.proxy.name
}

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "${var.environment}-${var.cluster_name}-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 70
  alarm_actions       = [aws_autoscaling_policy.scale_up.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.proxy.name
  }
}
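As written, nothing ever triggers the scale-down policy; a companion low-CPU alarm closes the loop. A sketch (the thresholds here are my guesses, not the post's values):

# Sketch: drive scale_down when the cluster is idle for a sustained period
resource "aws_cloudwatch_metric_alarm" "low_cpu" {
  alarm_name          = "${var.environment}-${var.cluster_name}-low-cpu"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 4
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 25
  alarm_actions       = [aws_autoscaling_policy.scale_down.arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.proxy.name
  }
}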

Traffic Flow with Port Routing

sequenceDiagram
    participant Partner as Partner System
    participant NLB as External NLB
    participant SG as Security Group
    participant Proxy as Proxy Cluster
    participant Target as Target System

    Partner->>NLB: TCP Connect (Port 10001)
    NLB->>SG: Check source IP
    alt IP in whitelist for port 10001
        SG->>NLB: Allow
        NLB->>Proxy: Forward to proxy-cluster-a
        Proxy->>Target: Proxy request
        Target-->>Proxy: Response
        Proxy-->>NLB: Response
        NLB-->>Partner: Response
    else IP not in whitelist
        SG-->>NLB: Deny
        NLB-->>Partner: Connection refused
    end

State Management

# backend.tf

# Backend blocks cannot interpolate variables, so the per-environment
# state key is supplied at init time, e.g.:
#   terraform init -backend-config="key=nlb-proxy/prod/terraform.tfstate"

terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

    # Assume role for cross-account state access
    role_arn = "arn:aws:iam::SHARED_SERVICES_ACCOUNT:role/TerraformStateAccess"
  }
}

# State locking table (bootstrapped separately, since it must exist
# before terraform init can configure the backend that locks on it)
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Name = "terraform-state-lock"
  }
}

Environment Configuration

# environments/prod/main.tf

locals {
  environment = "prod"
  port_config = jsondecode(file("${path.module}/port-config.json"))
}

module "vpc" {
  source = "../../modules/vpc"

  environment = local.environment
  vpc_cidr    = "10.100.0.0/16"
  
  public_subnet_cidrs  = ["10.100.1.0/24", "10.100.2.0/24", "10.100.3.0/24"]
  private_subnet_cidrs = ["10.100.10.0/24", "10.100.11.0/24", "10.100.12.0/24"]
  
  enable_transit_gateway = true
  transit_gateway_id     = var.transit_gateway_id
}

module "security_groups" {
  source = "../../modules/security-groups"

  environment         = local.environment
  vpc_id              = module.vpc.vpc_id
  vpc_cidr            = module.vpc.vpc_cidr
  port_config         = local.port_config
  onprem_cidrs        = ["172.16.0.0/12", "192.168.0.0/16"]
  proxy_cluster_a_sg_id = module.proxy_cluster_a.security_group_id
  proxy_cluster_b_sg_id = module.proxy_cluster_b.security_group_id
}

module "nlb" {
  source = "../../modules/nlb"

  environment        = local.environment
  vpc_id             = module.vpc.vpc_id
  public_subnet_ids  = module.vpc.public_subnet_ids
  private_subnet_ids = module.vpc.private_subnet_ids
  port_config        = local.port_config
  proxy_port         = 3128

  security_group_ids = {
    external = module.security_groups.nlb_external_sg_id
    internal = module.security_groups.nlb_internal_sg_id
  }

  tags = {
    Environment = local.environment
    Project     = "proxy-infrastructure"
  }
}

module "proxy_cluster_a" {
  source = "../../modules/proxy-cluster"

  environment       = local.environment
  cluster_name      = "proxy-cluster-a"
  vpc_id            = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids
  
  instance_type    = "c6i.xlarge"
  desired_capacity = 3
  min_size         = 2
  max_size         = 10
  
  target_group_arns = [
    module.nlb.target_group_arns["proxy-cluster-a"]
  ]
}

module "proxy_cluster_b" {
  source = "../../modules/proxy-cluster"

  environment       = local.environment
  cluster_name      = "proxy-cluster-b"
  vpc_id            = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids
  
  instance_type    = "c6i.xlarge"
  desired_capacity = 2
  min_size         = 2
  max_size         = 6
  
  target_group_arns = [
    module.nlb.target_group_arns["proxy-cluster-b"]
  ]
}

Adding New Ports

flowchart TD
    subgraph Process["Adding New Port Whitelist"]
        A[Partner requests new port] --> B[Update port-config.json]
        B --> C[Create PR]
        C --> D[Review changes]
        D --> E[Terraform plan]
        E --> F{Changes look correct?}
        F -->|Yes| G[Merge PR]
        F -->|No| H[Fix config]
        H --> B
        G --> I[Terraform apply]
        I --> J[New listener created]
        I --> K[Security group rule added]
        J & K --> L[Port ready for traffic]
    end
    style Process fill:#1a1a2e,stroke:#00d9ff,stroke-width:2px,color:#fff

Example PR to add a new port:

{
  "port_whitelists": {
    "10001": { ... },
    "10002": { ... },
+   "10003": {
+     "description": "Partner A - New Settlement API",
+     "allowed_ips": ["203.0.113.10/32", "203.0.113.12/32"],
+     "target_group": "proxy-cluster-a",
+     "protocol": "TCP"
+   }
  }
}
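With listeners and security-group rules keyed by port and CIDR, the plan for this PR should show exactly three additions, roughly like this (hypothetical output, reconstructed from the for_each keys above):

# module.nlb.aws_lb_listener.external["10003"] will be created
# module.security_groups.aws_security_group_rule.port_whitelist["10003-203.0.113.10-32"] will be created
# module.security_groups.aws_security_group_rule.port_whitelist["10003-203.0.113.12-32"] will be created

Plan: 3 to add, 0 to change, 0 to destroy.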

Monitoring and Alerting

# monitoring.tf

resource "aws_cloudwatch_dashboard" "proxy" {
  dashboard_name = "${var.environment}-proxy-dashboard"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "NLB Active Connections"
          region = var.region
          metrics = [
            ["AWS/NetworkELB", "ActiveFlowCount", "LoadBalancer", aws_lb.external.arn_suffix]
          ]
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "NLB New Connections"
          region = var.region
          metrics = [
            ["AWS/NetworkELB", "NewFlowCount", "LoadBalancer", aws_lb.external.arn_suffix]
          ]
        }
      },
      {
        type   = "metric"
        x      = 0
        y      = 6
        width  = 12
        height = 6
        properties = {
          title  = "Healthy Hosts per Target Group"
          region = var.region
          metrics = [
            for tg_name, tg in aws_lb_target_group.proxy : 
            ["AWS/NetworkELB", "HealthyHostCount", "TargetGroup", tg.arn_suffix, "LoadBalancer", aws_lb.external.arn_suffix]
          ]
        }
      }
    ]
  })
}

# Alert on unhealthy targets
resource "aws_cloudwatch_metric_alarm" "unhealthy_hosts" {
  for_each = aws_lb_target_group.proxy

  alarm_name          = "${var.environment}-${each.key}-unhealthy"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HealthyHostCount"
  namespace           = "AWS/NetworkELB"
  period              = 60
  statistic           = "Minimum"
  threshold           = 2
  alarm_description   = "Less than 2 healthy hosts in ${each.key}"
  
  dimensions = {
    TargetGroup  = each.value.arn_suffix
    LoadBalancer = aws_lb.external.arn_suffix
  }

  alarm_actions = [var.sns_topic_arn]
  ok_actions    = [var.sns_topic_arn]
}
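Beyond host health, NLB's TCP reset metrics are a useful early signal that a proxy target is misbehaving. A sketch of one more alarm (the threshold is an assumption):

# Sketch: alert on a spike in RSTs sent by proxy targets
resource "aws_cloudwatch_metric_alarm" "target_resets" {
  alarm_name          = "${var.environment}-nlb-target-resets"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "TCP_Target_Reset_Count"
  namespace           = "AWS/NetworkELB"
  period              = 60
  statistic           = "Sum"
  threshold           = 100
  alarm_description   = "Spike in TCP resets from proxy targets"

  dimensions = {
    LoadBalancer = aws_lb.external.arn_suffix
  }

  alarm_actions = [var.sns_topic_arn]
}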

Best Practices

Practice | Why
---------|----
Use JSON for port config | Easy to review in PRs, version controlled
Separate target groups | Different proxy clusters for different use cases
Enable cross-zone LB | Better distribution, higher availability
Use source IP stickiness | Consistent routing for stateful connections
Monitor per-port metrics | Identify issues with specific partners
Document each port | Know who owns what

Troubleshooting

"Connection timeout on specific port"

# Check if listener exists
aws elbv2 describe-listeners --load-balancer-arn <nlb-arn> | jq '.Listeners[] | select(.Port == 10001)'

# Check security group rules
aws ec2 describe-security-group-rules --filters "Name=group-id,Values=<sg-id>" | jq '.SecurityGroupRules[] | select(.FromPort == 10001)'

# Check target health
aws elbv2 describe-target-health --target-group-arn <tg-arn>

"Partner IP not whitelisted"

  • Verify IP is in port-config.json
  • Run terraform plan to see if rule would be created
  • Check for CIDR notation errors (missing /32)

"Proxy cluster scaling issues"

  • Check ASG desired vs actual capacity
  • Review CloudWatch metrics for CPU/memory
  • Verify target group health check settings

Conclusion

This architecture provides enterprise-grade proxy infrastructure that scales to hundreds of ports while maintaining strict per-port access controls. The JSON-driven configuration lets teams request new ports through PRs, with a full audit trail and review process.

The combination of:

  • NLB for high-performance TCP load balancing
  • Dynamic security groups for per-port whitelisting
  • Auto Scaling Groups for resilient proxy clusters
  • Transit Gateway for hybrid connectivity

creates a system that handles financial services traffic securely and reliably.

Tags

#aws #nlb #terraform #networking #load-balancing #fintech
