Smarter DNS Routing with AWS Route 53 Traffic Policies

If you’ve been following along with the CDN series, you’ll know I’ve been running a multi-node setup across a bunch of providers to get content closer to users. The next natural step was making the DNS layer smarter: instead of just round-robining across all nodes, actually sending traffic to the right place based on where it’s coming from, and failing over gracefully when something dies.

Enter Route 53 Traffic Policies.

What’s the goal?

The short version: geoproximity routing at the top level, weighted pools underneath, with health checks on every endpoint so Route 53 can skip dead nodes automatically.

The topology looks something like this:

A geoproximity rule maps incoming requests to the nearest AWS region anchor
Each region maps to a weighted pool of actual VPS or dedicated server endpoints
Health checks on every node mean if something goes down, Route 53 stops sending traffic there without any manual intervention
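To make the layering concrete, here’s a minimal Python sketch of how a policy document nests: a start rule at the top, pools underneath, endpoints at the bottom. The top-level keys follow the traffic policy document format as I understand it; the pool contents, weights, and the 192.0.2.x / 198.51.100.x addresses are placeholders made up for illustration, and trace_endpoints is a hypothetical helper, not anything Route 53 provides.

```python
# Illustrative policy document as a Python dict. All names and IPs are placeholders.
POLICY = {
    "AWSPolicyFormatVersion": "2015-10-01",
    "RecordType": "A",
    "StartRule": "geo-1",
    "Endpoints": {
        "node-a": {"Type": "value", "Value": "192.0.2.10"},
        "node-b": {"Type": "value", "Value": "192.0.2.20"},
        "node-c": {"Type": "value", "Value": "198.51.100.30"},
    },
    "Rules": {
        "geo-1": {
            "RuleType": "geoproximity",
            # Which region each entry anchors to is left out here for brevity.
            "GeoproximityLocations": [
                {"RuleReference": "us-east-pool", "EvaluateTargetHealth": True},
                {"RuleReference": "eu-london-pool", "EvaluateTargetHealth": True},
            ],
        },
        "us-east-pool": {
            "RuleType": "weighted",
            "Items": [{"Weight": "50", "EndpointReference": "node-a"}],
        },
        "eu-london-pool": {
            "RuleType": "weighted",
            "Items": [
                {"Weight": "40", "EndpointReference": "node-b"},
                {"Weight": "10", "EndpointReference": "node-c"},
            ],
        },
    },
}


def trace_endpoints(policy, rule_name=None):
    """Walk the rule tree from StartRule and collect every reachable endpoint value."""
    rule = policy["Rules"][rule_name or policy["StartRule"]]
    children = rule.get("GeoproximityLocations", []) + rule.get("Items", [])
    found = []
    for child in children:
        if "RuleReference" in child:
            # Another rule (for example a weighted pool): recurse into it.
            found.extend(trace_endpoints(policy, child["RuleReference"]))
        elif "EndpointReference" in child:
            found.append(policy["Endpoints"][child["EndpointReference"]]["Value"])
    return found


if __name__ == "__main__":
    print(trace_endpoints(POLICY))
```

A walk like this is also a cheap sanity check that every pool and endpoint you defined is actually reachable from the start rule.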

The nodes

I’m running a mix of providers across North America and Europe, some unmetered, some on metered bandwidth. That distinction matters when deciding how aggressively to weight things.

For example, the EU London pool heavily favours the nodes sitting on 1Gbps unmetered pipes, with a solid 250Mbps secondary, and a handful of metered nodes tucked away as low-weight backups. No point hammering a metered node when you’ve got a fat pipe available.
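If you’d rather derive weights than eyeball them, something like the following rough sketch works as a starting point. It’s only one possible heuristic: the node names, the bandwidth figures, and the 0.1 penalty for metered nodes are all assumptions picked for illustration, not the values from my real pools.

```python
def suggest_weights(nodes, metered_penalty=0.1):
    """Scale weights by link speed, heavily discounting metered nodes.

    nodes: list of (name, mbps, metered) tuples.
    Returns {name: weight}, with the fastest unpenalised node at 100.
    Route 53 weights are integers from 0 to 255, so this stays in range.
    """
    raw = {
        name: mbps * (metered_penalty if metered else 1.0)
        for name, mbps, metered in nodes
    }
    top = max(raw.values())
    # Never drop below 1, so a backup node still sees a trickle of traffic.
    return {name: max(1, round(value / top * 100)) for name, value in raw.items()}


if __name__ == "__main__":
    london = [
        ("lon-unmetered-1", 1000, False),  # hypothetical 1Gbps unmetered node
        ("lon-secondary", 250, False),     # hypothetical 250Mbps node
        ("lon-metered-1", 1000, True),     # hypothetical metered backup
    ]
    print(suggest_weights(london))
```

Treat the output as a first draft. Real weights should also reflect things this ignores, like CPU headroom and how much a provider charges for overage.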

Regional IP routing

One of the more powerful aspects of this setup is that different parts of the world resolve your domain to entirely different IPs. A user in Sydney gets routed to a different pool than someone in Frankfurt or Toronto, all from the same domain name, transparently.

This is handled at the top level by a geoproximity rule that anchors each pool to an AWS region. Route 53 uses the region as a geographic proxy. us-east-1 covers eastern North America, eu-west-2 covers the UK and western Europe, ap-southeast-2 covers Australia, and so on. When a DNS query comes in, Route 53 picks the closest anchor and hands it off to that region’s pool.

"geo-1": {
  "RuleType": "geoproximity",
  "GeoproximityLocations": [
    {
      "EvaluateTargetHealth": true,
      "Region": "aws:route53:us-east-1",
      "Bias": 0,
      "RuleReference": "us-east-pool"
    },
    {
      "EvaluateTargetHealth": true,
      "Region": "aws:route53:eu-west-2",
      "Bias": 0,
      "RuleReference": "eu-london-pool"
    },
    {
      "EvaluateTargetHealth": true,
      "Region": "aws:route53:ap-southeast-2",
      "Bias": 0,
      "RuleReference": "au-sy-pool"
    }
  ]
}

The Bias field is worth knowing about: it lets you expand or shrink the effective geographic footprint of a region. A positive bias (up to 99) pulls more traffic toward that anchor, and a negative one (down to -99) pushes it away. Useful if your node distribution is uneven and the default boundaries aren’t quite right.
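To get a feel for what bias does, here’s a small Python sketch that picks the nearest anchor by great-circle distance and then applies a bias. The scaling follows the formula given in the Route 53 developer guide as best I can tell (shrink the distance for positive bias, inflate it for negative). The anchor coordinates are rough guesses at where each region sits, and Route 53’s real view of a client’s location comes from the resolver, so treat this as a mental model rather than a reproduction of what AWS does.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate (latitude, longitude) per region anchor. These are assumptions.
ANCHORS = {
    "us-east-1": (38.9, -77.4),        # Northern Virginia
    "eu-west-2": (51.5, -0.1),         # London
    "ap-southeast-2": (-33.9, 151.2),  # Sydney
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


def biased_distance(distance, bias):
    """Positive bias makes an anchor look closer, negative makes it look further away."""
    if bias >= 0:
        return distance * (1 - bias / 100)
    return distance / (1 + bias / 100)


def pick_region(client, biases=None):
    """Return the anchor with the smallest biased distance to the client."""
    biases = biases or {}
    return min(
        ANCHORS,
        key=lambda r: biased_distance(haversine_km(client, ANCHORS[r]), biases.get(r, 0)),
    )


if __name__ == "__main__":
    reykjavik = (64.15, -21.94)
    print(pick_region(reykjavik))                     # nearest anchor with no bias
    print(pick_region(reykjavik, {"us-east-1": 80}))  # same client, us-east-1 biased up
```

The Reykjavik example shows the effect: unbiased, London is far closer, but a large positive bias on us-east-1 is enough to pull that client across the Atlantic.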

Each pool then does its own thing with weighted routing, so within a region you still get load distribution and failover. The geo layer just determines which pool you land in first.

The practical upside is that you can tune each region independently. A pool serving Southeast Asia might weight completely different nodes than one serving North America, reflecting the actual latency and capacity realities of your infrastructure rather than just blindly splitting traffic everywhere equally.

The traffic policy JSON

Route 53 traffic policies are defined in a JSON format that’s… a bit finicky. A few things I ran into:

Health checks don’t go in the Endpoints block. This one got me. The HealthCheck field needs to be on each item inside the weighted rule, not on the endpoint definition itself. Putting it in the wrong place gets you a vague InvalidTrafficPolicyDocument error with no real hint as to why.

EvaluateTargetHealth only works if a health check is actually wired up. Seems obvious in retrospect, but if you set it to true without a HealthCheck UUID on the same item, Route 53 just treats the endpoint as always healthy. Completely defeats the purpose.
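Both of these mistakes are easy to catch before you ever upload the document. Below is a small, hypothetical lint helper in Python (lint_policy is my own name, not an AWS tool) that flags a HealthCheck sitting on an endpoint and any endpoint item that sets EvaluateTargetHealth without a HealthCheck. It assumes the policy has already been loaded into a dict, for example with json.load.

```python
def _is_true(value):
    """Accept both JSON booleans and the string form "true"."""
    return str(value).lower() == "true"


def lint_policy(policy):
    """Return a list of human-readable problems found in a traffic policy dict."""
    problems = []

    # Mistake 1: HealthCheck placed on the endpoint definition.
    for name, endpoint in policy.get("Endpoints", {}).items():
        if "HealthCheck" in endpoint:
            problems.append(f"endpoint '{name}': move HealthCheck onto the rule item")

    # Mistake 2: EvaluateTargetHealth set on an endpoint item with no HealthCheck.
    for rule_name, rule in policy.get("Rules", {}).items():
        for item in rule.get("Items", []):
            target = item.get("EndpointReference")
            # Items that point at another rule are skipped; only endpoints are checked.
            if target and _is_true(item.get("EvaluateTargetHealth")) and not item.get("HealthCheck"):
                problems.append(
                    f"rule '{rule_name}' -> '{target}': EvaluateTargetHealth without HealthCheck"
                )
    return problems


if __name__ == "__main__":
    broken = {
        "Endpoints": {"node-a": {"Type": "value", "Value": "192.0.2.10", "HealthCheck": "abc"}},
        "Rules": {
            "example-pool": {
                "RuleType": "weighted",
                "Items": [{"Weight": "40", "EndpointReference": "node-a", "EvaluateTargetHealth": True}],
            }
        },
    }
    for problem in lint_policy(broken):
        print(problem)
```

This only covers the two gotchas above. It’s not a full validator for the document format.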

Here’s what a pool looks like once it’s properly wired:

"example-pool": {
  "RuleType": "weighted",
  "Items": [
    { "Weight": "40", "EndpointReference": "node-a", "EvaluateTargetHealth": true, "HealthCheck": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" },
    { "Weight": "30", "EndpointReference": "node-b", "EvaluateTargetHealth": true, "HealthCheck": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" },
    { "Weight": "15", "EndpointReference": "node-c", "EvaluateTargetHealth": true, "HealthCheck": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" },
    { "Weight": "7",  "EndpointReference": "node-d", "EvaluateTargetHealth": true, "HealthCheck": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" },
    { "Weight": "4",  "EndpointReference": "node-e", "EvaluateTargetHealth": true, "HealthCheck": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" }
  ]
}

The weights don’t need to add up to 100; Route 53 sends each item a share equal to its weight divided by the pool total (the example above sums to 96). Keeping them roughly percentage-like makes it easier to reason about at a glance.
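If you want the actual percentages, the arithmetic is just each weight over the sum. A quick Python sketch using the same weights as the example pool (traffic_shares is a hypothetical helper name):

```python
EXAMPLE_POOL = [
    {"Weight": "40", "EndpointReference": "node-a"},
    {"Weight": "30", "EndpointReference": "node-b"},
    {"Weight": "15", "EndpointReference": "node-c"},
    {"Weight": "7", "EndpointReference": "node-d"},
    {"Weight": "4", "EndpointReference": "node-e"},
]


def traffic_shares(items):
    """Map each endpoint to its expected fraction of responses (weight / total)."""
    total = sum(int(item["Weight"]) for item in items)
    if total == 0:
        raise ValueError("all weights are zero; shares are undefined in this sketch")
    return {item["EndpointReference"]: int(item["Weight"]) / total for item in items}


if __name__ == "__main__":
    for node, share in traffic_shares(EXAMPLE_POOL).items():
        print(f"{node}: {share:.1%}")
```

So with a total of 96, the weight-40 node ends up with a little under 42% of responses rather than exactly 40%.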

Health checks

Route 53 health checks are set up separately under the health checks console, each polling the node on a specific port and path. Once you have the health check UUIDs, you wire them into the policy JSON as shown above.
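The console works fine for a handful of nodes, but if you’d rather script it, here’s a minimal sketch using boto3’s create_health_check. The HTTPS type, port 443, the /healthz path, and the 30 second interval with a failure threshold of 3 are all assumptions for illustration, so swap in whatever your nodes actually serve. The boto3 import sits inside the function so the config helper can be used without AWS credentials.

```python
import uuid


def health_check_config(ip, port=443, path="/healthz"):
    """Build the HealthCheckConfig dict for one node (values are illustrative)."""
    return {
        "IPAddress": ip,
        "Port": port,
        "Type": "HTTPS",
        "ResourcePath": path,
        "RequestInterval": 30,   # seconds between checks from each checker
        "FailureThreshold": 3,   # consecutive failures before a checker marks it unhealthy
    }


def create_health_check(ip, **kwargs):
    """Create the health check and return its ID for use in the policy JSON."""
    import boto3  # lazy import: only needed when actually talking to AWS

    client = boto3.client("route53")
    response = client.create_health_check(
        CallerReference=str(uuid.uuid4()),  # must be unique per request
        HealthCheckConfig=health_check_config(ip, **kwargs),
    )
    return response["HealthCheck"]["Id"]


if __name__ == "__main__":
    # Prints the config only; call create_health_check() to hit the API.
    print(health_check_config("192.0.2.10"))
```

The ID returned by create_health_check is the UUID that goes into the HealthCheck field on each weighted item.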

The end result is that if your primary node goes dark, Route 53 detects it within a minute or two and redistributes that traffic across the remaining healthy nodes automatically (resolvers that cached the old answer catch up once the record’s TTL expires). No alerting, no manual DNS edits, no 3am pages.
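As a back-of-envelope model of that failover, the Python sketch below estimates how long a dead node might keep receiving traffic and where its share goes afterwards. The 30 second interval, threshold of 3, and 60 second TTL are assumed values, and the timing is a rough estimate rather than a guarantee, since real detection depends on how the distributed checkers line up.

```python
def detection_seconds(request_interval=30, failure_threshold=3, record_ttl=60):
    """Rough estimate: time to fail the check plus time for cached answers to expire."""
    return request_interval * failure_threshold + record_ttl


def shares_after_failure(weights, down):
    """Recompute traffic shares over the nodes that are still healthy."""
    healthy = {name: w for name, w in weights.items() if name not in down}
    total = sum(healthy.values())
    return {name: w / total for name, w in healthy.items()}


if __name__ == "__main__":
    pool = {"node-a": 40, "node-b": 30, "node-c": 15, "node-d": 7, "node-e": 4}
    print(f"worst case roughly {detection_seconds()}s before traffic fully moves")
    for node, share in shares_after_failure(pool, down={"node-a"}).items():
        print(f"{node}: {share:.1%}")
```

Note that the survivors pick up the lost traffic in proportion to their own weights, so losing the biggest node pushes the next one from about 31% to over half of the pool’s responses. Worth checking that it can actually take that.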

Wrapping up

Traffic policies in Route 53 are genuinely powerful once you get past the JSON quirks. The combination of geoproximity at the top and weighted failover pools underneath gives you a lot of control over where traffic lands and what happens when things break. If you’re already running multi-region infrastructure, it’s worth the couple of hours to set up properly.

July 28, 2024

Brett Petch
