CrowdSec at the Edge: Securing a CDN That Grew Up
If you’ve been following along, you might remember my posts about building a CDN for self-hosted media. NGINX, Route53 geolocation, getting Plex posters closer to the edge without paying Cloudflare a fortune. Since then, things got a little out of hand. The project changed hands, grew considerably, and last year we pushed nearly 100PB through it.
Today I want to talk about something that’s been doing a lot of quiet work: CrowdSec, and how I wired it into the logging pipeline.
The Setup
The CDN runs on OpenResty, logging every request as structured JSON. At this scale, watching dashboards isn’t enough; you need something that acts on what it sees. CrowdSec reads logs, runs them through parsers and scenarios, and when something looks bad, a bouncer acts on it. The community threat intel means IPs that are already known bad actors get flagged before they do anything on your network.
The first challenge was teaching CrowdSec to understand OpenResty’s JSON log format. The parser normalizes fields into the names that CrowdSec’s enrichment layer expects. Critically, meta: service must be http and meta: log_type must be http_access-log, otherwise the community scenarios in crowdsecurity/base-http-scenarios never fire. The path field also needs to be evt.Parsed.request specifically, so the crowdsecurity/http-logs enrichment stage can split it into http_path and http_args downstream.
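A minimal sketch of what such an s01 parser could look like. This is not the production parser: the parser name, the type label it filters on, and every JSON key passed to JsonExtract are assumptions about the log format, so swap in whatever your OpenResty log_format actually emits.

```yaml
# Hypothetical file: /etc/crowdsec/parsers/s01-parse/openresty-json.yaml
onsuccess: next_stage
# Assumes the acquisition config labels these logs with type: openresty
filter: "evt.Parsed.program == 'openresty'"
name: custom/openresty-json-logs
description: "Parse OpenResty JSON access logs (JSON keys are assumptions)"
statics:
  # Pull fields out of the raw JSON line; the key names are hypothetical
  - parsed: remote_addr
    expression: JsonExtract(evt.Line.Raw, "remote_addr")
  # Must land in evt.Parsed.request so crowdsecurity/http-logs can split it
  - parsed: request
    expression: JsonExtract(evt.Line.Raw, "request_uri")
  - parsed: verb
    expression: JsonExtract(evt.Line.Raw, "request_method")
  - parsed: status
    expression: JsonExtract(evt.Line.Raw, "status")
  - parsed: http_user_agent
    expression: JsonExtract(evt.Line.Raw, "http_user_agent")
  # Timestamp string for the date-parsing enrichment to pick up
  - target: evt.StrTime
    expression: JsonExtract(evt.Line.Raw, "time_iso8601")
  # Meta values the community HTTP parser and scenarios key on
  - meta: service
    value: http
  - meta: log_type
    value: http_access-log
  - meta: source_ip
    expression: evt.Parsed.remote_addr
  - meta: http_status
    expression: evt.Parsed.status
```

The two fixed meta values are the part that matters most here. Everything else can be renamed to match your logs, as long as the request path ends up in evt.Parsed.request.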
The pipeline ends up as:
s01: custom JSON parser (parse, set meta) → s02: crowdsecurity/http-logs (split path/args) → crowdsecurity/base-http-scenarios
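For logs to reach the s01 stage at all, the acquisition config has to tag them with a type label, which the stock s00-raw stage copies into evt.Parsed.program for later parsers to filter on. A sketch, assuming a file-based source; the file path and the openresty label are placeholders, not the real deployment's values.

```yaml
# Hypothetical file: /etc/crowdsec/acquis.d/openresty.yaml
source: file
filenames:
  - /var/log/openresty/access.json.log   # assumed log location
labels:
  type: openresty                        # assumed label; s01 parsers filter on it
```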
Custom Scenarios
Beyond the community scenarios, I wrote a couple targeting auth-related abuse, the kind of patterns you see when someone is probing for credentials or hammering protected endpoints:
type: leaky
name: custom/http-auth-abuse
filter: "evt.Meta.log_type == 'http_access-log' && evt.Meta.http_status in ['401', '403', '407']"
groupby: evt.Meta.source_ip
capacity: 25
leakspeed: "1m"
blackhole: 5m
duration: 15m
labels:
  behavior: "http:bruteforce"
  remediation: true
A flood detector handles raw volume regardless of response code:
type: leaky
name: custom/http-flood
filter: "evt.Meta.log_type == 'http_access-log'"
groupby: "evt.Meta.source_ip"
capacity: 1000
leakspeed: "10s"
blackhole: 5m
labels:
  remediation: true
How It’s Going
Better than expected. The community blocklist is doing real work: a solid chunk of the IPs flagged by the custom scenarios were already known before they did anything interesting on our network. Tuning the leaky bucket values took a week of watching real traffic, and the blackhole setting is essential to avoid re-triggering storms during an active incident.
A few things I’d tell myself to do earlier: run cscli explain against real log samples before touching production, and whitelist your own upstream IPs immediately, because they will trigger flood scenarios.
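A whitelist for your own infrastructure can be sketched as a small enrichment-stage parser. The name, reason, and addresses below are placeholders (the IPs come from documentation ranges), so substitute your actual upstream and origin ranges.

```yaml
# Hypothetical file: /etc/crowdsec/parsers/s02-enrich/upstream-whitelist.yaml
name: custom/upstream-whitelist
description: "Whitelist our own upstream and origin IPs"
whitelist:
  reason: "internal upstream traffic"
  ip:
    - "192.0.2.10"          # placeholder single host
  cidr:
    - "198.51.100.0/24"     # placeholder range
```

Running something like cscli explain --file ./sample.log --type openresty (sample file name and type label assumed) against a log sample that includes one of these addresses is a quick way to see which parsers matched each line before anything ships to production.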
More to come as the scenarios mature.
