Hi, I'm Brett

Marketing guru by day, FOSS dev by night.

Automating the Edge: The Part Nobody Sees


If you read the last post, you saw the fun parts.

OpenResty routing requests.
Redis acting as a control plane.
Traffic getting pushed closer to the edge so it actually feels fast.

That’s the part people like to talk about.

This is the part that makes it work.


The problem you only notice at 2am

At some point, the CDN stopped being “a couple servers” and became infrastructure.

More domains.
More routing rules.
More edge nodes.
More things that can go slightly wrong.

And “slightly wrong” is the dangerous one.

Not broken enough to fail.
Just inconsistent enough to waste your time.

  • One node has a different sysctl value
  • Another is missing a firewall rule
  • Docker is a slightly different version somewhere else

Now you’re SSHing between boxes trying to figure out why the same request behaves differently depending on where it lands.

That’s when you stop managing servers and start managing state.


The decision: boring over clever

There are a lot of ways to solve this.

Most of them involve adding more infrastructure.

Agents. Control planes. Databases. Dashboards.

I didn’t want any of that.

I wanted:

  • Something that works over SSH
  • Something I can read in plain text
  • Something that doesn’t introduce another failure point

So I used Ansible.

Not because it’s exciting — because it isn’t.


What the system actually looks like

At the top level, the entire CDN build process is this:

- hosts: cdn
  become: true
  roles:
    - common
    - sysctl
    - docker
    - crowdsec
    - app

That’s the system.

Five layers, applied in order.

  • common → lock down the box
  • sysctl → make the kernel behave
  • docker → give it a runtime
  • crowdsec → watch traffic
  • app → actually run the CDN

Nothing fancy. Just the correct order.

Because the order is the architecture.


Where things actually get interesting

Not in the playbook.

In what it enforces.


Kernel tuning stops being tribal knowledge

Everyone has a list of sysctl tweaks.

Nobody remembers why they set half of them.

And if you’re doing it manually, they slowly drift over time.

So instead, it’s one template:

  • versioned
  • reviewed
  • deployed everywhere

BBR, TCP Fast Open, buffer sizes, conntrack limits, all locked in.

Not because the values are perfect.

Because they’re consistent.
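
A minimal sketch of that idea, assuming a role laid out roughly like this. The file paths, the sysctl_settings variable name, and every value are illustrative placeholders, not tuned recommendations:

```yaml
---
# roles/sysctl/defaults/main.yml (hypothetical path; values are examples only)
sysctl_settings:
  net.core.default_qdisc: fq               # qdisc commonly paired with BBR
  net.ipv4.tcp_congestion_control: bbr
  net.ipv4.tcp_fastopen: 3                 # 3 = enabled for client and server
  net.core.rmem_max: 16777216
  net.core.wmem_max: 16777216
  net.netfilter.nf_conntrack_max: 262144   # needs the nf_conntrack module loaded

---
# roles/sysctl/tasks/main.yml
# The template itself can be a three-line loop:
#   {% for key, value in sysctl_settings | dictsort %}
#   {{ key }} = {{ value }}
#   {% endfor %}
- name: Render the versioned sysctl file
  ansible.builtin.template:
    src: 99-cdn.conf.j2
    dest: /etc/sysctl.d/99-cdn.conf
    owner: root
    group: root
    mode: "0644"
  notify: Reload sysctl

---
# roles/sysctl/handlers/main.yml
- name: Reload sysctl
  ansible.builtin.command: sysctl --system
```

The values live in one reviewed file, so changing a buffer size is a diff, not an SSH session.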


The firewall becomes intentional

Default drop. Always.

Everything else is explicit.

tcp dport { 80, 443 } accept
udp dport 443 accept

That second line is the one people miss.

No UDP 443 = no real HTTP/3.

Clients try. Fail. Fall back. Add latency.

You don’t notice until you go looking for it.
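
For context, here is roughly how those two lines might sit inside a full default-drop ruleset, shipped from an Ansible task. The table name, the file path, the SSH rule, and the handler name are all assumptions made for the sketch:

```yaml
- name: Install the nftables ruleset (static content, no variables)
  ansible.builtin.copy:
    dest: /etc/nftables.conf            # assumed path; varies by distro
    owner: root
    group: root
    mode: "0644"
    validate: nft -c -f %s              # syntax-check before replacing the file
    content: |
      #!/usr/sbin/nft -f
      # Recreate only this table; leave tables owned by other tools alone.
      table inet edge
      delete table inet edge

      table inet edge {
        chain input {
          type filter hook input priority 0; policy drop;

          iif "lo" accept
          ct state established,related accept
          meta l4proto { icmp, ipv6-icmp } accept
          tcp dport 22 accept           # assumed SSH port; Ansible needs a way in

          tcp dport { 80, 443 } accept  # HTTP/1.1 and HTTP/2 over TCP
          udp dport 443 accept          # QUIC, which HTTP/3 runs on
        }
      }
  notify: Reload nftables               # assumes a handler with this name exists
```

The sketch deliberately avoids flushing the whole ruleset, since that would also remove rules other tools have installed.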


Docker doesn’t get to be “special”

Docker likes to pretend networking just works.

It does — until your firewall says otherwise.

So you allow it explicitly:

iifname "br-*" accept

Now container networking is predictable instead of magical.


The subtle stuff that breaks everything

This is where most people get burned.


Templates vs. static files

It sounds minor. It isn’t.

If a file doesn’t need variables, it shouldn’t be a template.

Because the moment a literal {{ }} lands in a templated file, say a nested table in a Lua script, Jinja2 tries to evaluate it, and you’ve created a problem that doesn’t look like an Ansible problem anymore.

It looks like your app is broken.

Keep dynamic things dynamic. Keep everything else dumb.
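
In task form, the split could look something like this. The file names and destination paths are hypothetical:

```yaml
# No variables: ship the bytes exactly as they are in the repo.
- name: Install Lua routing script
  ansible.builtin.copy:
    src: router.lua
    dest: /opt/cdn/lua/router.lua
    mode: "0644"

# Has variables: render through Jinja2.
- name: Render nginx config
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /opt/cdn/nginx.conf
    mode: "0644"
```

If a file genuinely needs both, wrapping the literal section in Jinja2’s {% raw %} ... {% endraw %} keeps it from being rendered.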


Handlers don’t do what you think

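Handlers run at the end of the play.

Which is great, until a task fails partway through. By default, handlers already notified on that host never run, and the next run sees the file as unchanged, so nothing notifies them again.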

Now your system is half-updated.

Which is worse than fully broken.

The fix is simple:

  • flush handlers when order matters
  • always do a --check --diff dry run before the real one

Nothing clever. Just discipline.
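
A small sketch of flushing at the point where order matters. The task names, handler name, paths, and health-check URL are made up for the example:

```yaml
- name: Render nginx config
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /opt/cdn/nginx.conf
  notify: Reload app

# Run every handler notified so far right now, not at the end of the play.
- name: Apply pending reloads before continuing
  ansible.builtin.meta: flush_handlers

# Only reached once the reload above has actually happened.
- name: Check the edge still answers after the reload
  ansible.builtin.uri:
    url: http://127.0.0.1/
    status_code: 200
```

The other lever is force_handlers: true on the play (or --force-handlers on the command line), which runs notified handlers even when a later task on that host fails.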


Secrets are still a mess

Right now, they live in group_vars.

Yes, in plaintext.

It’s fine for now. It won’t be later.

The real challenge isn’t encrypting secrets.

It’s doing it in a way people won’t work around.
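
One common Ansible Vault layout tries to keep that friction low: a readable vars file that only points at vault-prefixed values, next to an encrypted file that holds them. A sketch, with hypothetical variable names:

```yaml
---
# group_vars/cdn/vars.yml (plain text, safe to read and diff)
redis_password: "{{ vault_redis_password }}"

---
# group_vars/cdn/vault.yml (shown decrypted; on disk it is encrypted with
# `ansible-vault encrypt group_vars/cdn/vault.yml`)
vault_redis_password: "change-me"
```

Roles keep using redis_password as before, and runs pick up the key via --ask-vault-pass or --vault-password-file.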


What I’d change if I did it again

A few scars, learned the normal way.


Pin your packages

Unpinned dependencies are just delayed outages.
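
As a sketch, assuming Debian or Ubuntu nodes and Docker’s docker-ce package. The docker_ce_version variable is hypothetical and would hold the exact apt version string:

```yaml
- name: Install a pinned Docker version
  ansible.builtin.apt:
    name: "docker-ce={{ docker_ce_version }}"
    state: present
    update_cache: true

- name: Hold the package so routine upgrades cannot move it
  ansible.builtin.dpkg_selections:
    name: docker-ce
    selection: hold
```

Upgrading then means changing one variable and reviewing the diff.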


Use linting early

Not for correctness — for consistency.
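
A starting-point config for ansible-lint might be as small as this. The specific choices are assumptions, not a recommendation:

```yaml
---
# .ansible-lint (repo root)
profile: basic          # rule set to enforce; stricter profiles exist
exclude_paths:
  - .git/
warn_list:
  - yaml[line-length]   # report, but do not fail the run
```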


Don’t use /opt

This is mostly personal.

But also correct.

And also too late to fix.


What this actually gives you

This is the part that matters.

One command:

  • every node updated
  • every config identical
  • kernel tuned
  • firewall applied
  • services running

No guessing.

No drift.

No “it works on one node but not the other.”

Scaling becomes boring:

  • add node
  • run playbook
  • done
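
In inventory terms, “add node” can be a one-line change. A sketch of a YAML inventory, with placeholder hostnames:

```yaml
# inventory.yml (hostnames are placeholders)
all:
  children:
    cdn:
      hosts:
        edge-01.example.net:
        edge-02.example.net:
        edge-03.example.net:   # new node: add this line, run the playbook
```

Passing --limit with the new hostname to ansible-playbook builds just that node without touching the rest.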

Which is exactly what you want.


The through-line

The last post was about getting traffic to the edge.

This is about making the edge behave the same everywhere.

Because performance doesn’t just come from proximity.

It comes from consistency.


Final thought

The CDN has some clever parts.

This isn’t one of them.

And that’s the point.

Infrastructure should be:

  • predictable
  • repeatable
  • boring

Because the moment this layer gets interesting, everything above it gets harder.