If you read the last post, you saw the fun parts.
OpenResty routing requests.
Redis acting as a control plane.
Traffic getting pushed closer to the edge so it actually feels fast.
That’s the part people like to talk about.
This is the part that makes it work.
The problem you only notice at 2am
At some point, the CDN stopped being “a couple servers” and became infrastructure.
More domains.
More routing rules.
More edge nodes.
More things that can go slightly wrong.
And “slightly wrong” is the dangerous one.
Not broken enough to fail.
Just inconsistent enough to waste your time.
- One node has a different sysctl value
- Another is missing a firewall rule
- Docker is a slightly different version somewhere else
Now you’re SSHing between boxes trying to figure out why the same request behaves differently depending on where it lands.
That’s when you stop managing servers and start managing state.
The decision: boring over clever
There are a lot of ways to solve this.
Most of them involve adding more infrastructure.
Agents. Control planes. Databases. Dashboards.
I didn’t want any of that.
I wanted:
- Something that works over SSH
- Something I can read in plain text
- Something that doesn’t introduce another failure point
So I used Ansible.
Not because it’s exciting — because it isn’t.
What the system actually looks like
At the top level, the entire CDN build process is this:
- hosts: cdn
  become: true
  roles:
    - common
    - sysctl
    - docker
    - crowdsec
    - app
That’s the system.
Five layers, applied in order.
- common → lock down the box
- sysctl → make the kernel behave
- docker → give it a runtime
- crowdsec → watch traffic
- app → actually run the CDN
Nothing fancy. Just the correct order.
Because the order is the architecture.
Where things actually get interesting
Not in the playbook.
In what it enforces.
Kernel tuning stops being tribal knowledge
Everyone has a list of sysctl tweaks.
Nobody remembers why they set half of them.
And if you’re doing it manually, they slowly drift over time.
So instead, it’s one template:
- versioned
- reviewed
- deployed everywhere
BBR, TCP Fast Open, buffer sizes, conntrack limits, all locked in.
Not because the values are perfect.
Because they’re consistent.
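As a rough sketch of the idea, not the actual role: the play below swaps the single template for looped `ansible.posix.sysctl` tasks that all write to one file, so the example stays self-contained. The variable name `cdn_sysctl`, the file path, and every value are assumptions for illustration only.

```yaml
# Sketch only: variable name, file path, and values are illustrative assumptions.
# Requires the ansible.posix collection.
- hosts: cdn
  become: true
  vars:
    cdn_sysctl:
      net.core.default_qdisc: fq              # pacing-capable qdisc, commonly paired with BBR
      net.ipv4.tcp_congestion_control: bbr    # needs the tcp_bbr kernel module available
      net.ipv4.tcp_fastopen: 3                # 1 = client, 2 = server, 3 = both
      net.core.rmem_max: 16777216             # max receive buffer, in bytes
      net.core.wmem_max: 16777216             # max send buffer, in bytes
  tasks:
    - name: Apply and persist kernel settings
      ansible.posix.sysctl:
        name: "{{ item.key }}"
        value: "{{ item.value }}"
        sysctl_file: /etc/sysctl.d/99-cdn.conf
        state: present
        reload: true
      loop: "{{ cdn_sysctl | dict2items }}"
```

Either way, the values live in one reviewed place, and the "why" can sit next to each key as a comment.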
The firewall becomes intentional
Default drop. Always.
Everything else is explicit.
tcp dport { 80, 443 } accept
udp dport 443 accept
That second line is the one people miss.
No UDP 443 = no real HTTP/3.
Clients try. Fail. Fall back. Add latency.
You don’t notice until you go looking for it.
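For context, here is one way those two lines might sit inside a default-drop ruleset deployed by Ansible. This is a sketch under assumptions: the table name `cdn_filter`, the file path, and the loopback, ICMP, and SSH rules are mine, not taken from the real role.

```yaml
# Sketch only: table name, path, and the lo/ICMP/SSH rules are assumptions.
- hosts: cdn
  become: true
  tasks:
    - name: Install the edge firewall ruleset
      ansible.builtin.copy:
        dest: /etc/nftables.conf
        owner: root
        group: root
        mode: "0600"
        validate: nft -c -f %s    # syntax-check the new file before it replaces the old one
        content: |
          #!/usr/sbin/nft -f
          # Recreate only this table so rules owned by other tools are left alone.
          table inet cdn_filter
          delete table inet cdn_filter
          table inet cdn_filter {
            chain input {
              type filter hook input priority 0; policy drop;
              ct state established,related accept
              iifname "lo" accept
              meta l4proto { icmp, ipv6-icmp } accept
              tcp dport 22 accept            # keep SSH open so Ansible can still reach the node
              tcp dport { 80, 443 } accept   # HTTP and HTTPS
              udp dport 443 accept           # QUIC, which HTTP/3 runs on
            }
          }
      notify: Reload nftables

  handlers:
    - name: Reload nftables
      ansible.builtin.command: nft -f /etc/nftables.conf
```

The `validate` step is the useful part: a ruleset with a syntax error never reaches disk, so a typo cannot lock you out on the next reload.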
Docker doesn’t get to be “special”
Docker likes to pretend networking just works.
It does — until your firewall says otherwise.
So you allow it explicitly:
iifname "br-*" accept
Now container networking is predictable instead of magical.
The subtle stuff that breaks everything
This is where most people get burned.
Templates vs. static files
It sounds minor. It isn’t.
If a file doesn’t need variables, it shouldn’t be a template.
Because the moment you accidentally drop {{ }} into something like a Lua script, you’ve created a problem that doesn’t look like an Ansible problem anymore.
It looks like your app is broken.
Keep dynamic things dynamic. Keep everything else dumb.
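In task form, the split looks roughly like this. File names and destination paths are hypothetical.

```yaml
# Sketch only: file names and destination paths are hypothetical.
- name: Ship the Lua router byte for byte (no Jinja rendering)
  ansible.builtin.copy:
    src: router.lua
    dest: /etc/cdn/lua/router.lua
    mode: "0644"

- name: Render the one file that really needs per-host values
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/cdn/nginx.conf
    mode: "0644"
```

A nested Lua table such as `{{1, 2}, {3, 4}}` is perfectly valid Lua, but `template` would try to evaluate it as a Jinja expression. If a file with literal braces really has to be a template, wrapping that section in `{% raw %}` and `{% endraw %}` keeps Jinja away from it.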
Handlers don’t do what you think
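Handlers run at the end of the play.
Which is great, until a task fails partway through.
By default, handlers that were already notified never run on that host.
The config changed. The service never reloaded. The next run sees nothing to change, so nothing notifies the handler again.
Now your system is half-updated.
Which is worse than fully broken.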
The fix is simple:
- flush handlers (meta: flush_handlers) when order matters
- always check diffs (--check --diff) before running
Nothing clever. Just discipline.
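A minimal sketch of flushing handlers at the point where order matters. The task names, paths, health-check URL, and the container name `cdn-edge` are all hypothetical.

```yaml
# Sketch only: names, paths, URL, and the "cdn-edge" container are hypothetical.
- hosts: cdn
  become: true
  tasks:
    - name: Update the OpenResty config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/cdn/nginx.conf
      notify: Reload openresty

    - name: Run pending handlers now instead of at the end of the play
      ansible.builtin.meta: flush_handlers

    - name: Later step that depends on the reloaded config
      ansible.builtin.uri:
        url: http://127.0.0.1/healthz
        status_code: 200

  handlers:
    - name: Reload openresty
      ansible.builtin.command: docker exec cdn-edge openresty -s reload
```

Running `ansible-playbook site.yml --check --diff` first (with `site.yml` standing in for whatever the playbook is called) shows what would change without touching anything. `--force-handlers` tells Ansible to run notified handlers on a host even after a task on it has failed.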
Secrets are still a mess
Right now, they live in group_vars.
Yes, in plaintext.
It’s fine for now. It won’t be later.
The real challenge isn’t encrypting secrets.
It’s doing it in a way people won’t work around.
What I’d change if I did it again
A few scars, learned the normal way.
Pin your packages
Unpinned dependencies are just delayed outages.
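One way pinning could look, assuming Debian or Ubuntu hosts with Docker's apt repository already configured. The version string is a placeholder to show the shape, not a recommendation.

```yaml
# Sketch only: assumes Debian/Ubuntu hosts; the version string is a placeholder.
- hosts: cdn
  become: true
  vars:
    docker_ce_version: "5:24.0.7-1~debian.12~bookworm"
  tasks:
    - name: Install one exact Docker version
      ansible.builtin.apt:
        name: "docker-ce={{ docker_ce_version }}"
        state: present
        update_cache: true

    - name: Hold the package so a routine upgrade cannot move it
      ansible.builtin.dpkg_selections:
        name: docker-ce
        selection: hold
```

Upgrades then become a one-line change to `docker_ce_version` that goes through review like everything else.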
Use linting early
Not for correctness — for consistency.
Don’t use /opt
This is mostly personal.
But also correct.
And also too late to fix.
What this actually gives you
This is the part that matters.
One command:
- every node updated
- every config identical
- kernel tuned
- firewall applied
- services running
No guessing.
No drift.
No “it works on one node but not the other.”
Scaling becomes boring:
- add node
- run playbook
- done
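In practice, "add node" can be a one-line inventory change. The sketch below uses Ansible's YAML inventory format; the hostnames and file name are hypothetical.

```yaml
# inventory.yml (sketch only: hostnames are hypothetical)
all:
  children:
    cdn:
      hosts:
        edge-ams-01.example.net:
        edge-nyc-01.example.net:
        edge-sgp-01.example.net:   # the new node: one added line
```

Something like `ansible-playbook -i inventory.yml site.yml --limit edge-sgp-01.example.net` would then build only the new node, and a run without `--limit` confirms the rest of the fleet still matches.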
Which is exactly what you want.
The through-line
The last post was about getting traffic to the edge.
This is about making the edge behave the same everywhere.
Because performance doesn’t just come from proximity.
It comes from consistency.
Final thought
The CDN has some clever parts.
This isn’t one of them.
And that’s the point.
Infrastructure should be:
- predictable
- repeatable
- boring
Because the moment this layer gets interesting, everything above it gets harder.
