I self-host a small static website and a cgit instance on an e2-micro VPS from Google Cloud, and I have gotten around 8.5 million requests combined from OpenAI and Claude over around 160 days. They just crawl the cgit pages forever unless I block them!
So I have lighttpd set up to match "claude|openai" in the user agent string and return a 403 if it matches, and an nftables firewall set up to rate limit spammers, and this seems to help a lot.
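For anyone who wants to try the same user agent check outside of lighttpd, here is a rough Python sketch of the matching logic. It is not the lighttpd config itself: the function name `status_for` is made up for illustration, and case-insensitive matching is my assumption, since the comment only gives the pattern.

```python
import re

# Pattern taken from the comment above. Case-insensitive matching is an
# assumption; the original setup may well match case-sensitively.
BLOCKED_UA = re.compile(r"claude|openai", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return 403 if the user agent matches the blocked pattern, else 200.

    `status_for` is a hypothetical helper for illustration only.
    """
    if BLOCKED_UA.search(user_agent or ""):
        return 403
    return 200


if __name__ == "__main__":
    # Sample user agent strings, for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    ]
    for ua in samples:
        print(status_for(ua), ua)
```

This only catches crawlers that identify themselves honestly, which is why the rate limiting in the firewall is still needed for everything else.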
Yeah, the flood of these Chrome UAs with every version number under the sun, and a really large portion being *.0.0.0 version numbers, that's what I've tended to experience lately. Also just kind of every browser user agent ever:
There were waves of big and sometimes intrusive traffic admitting to being from Amazon, Anthropic, Google, Meta, etc., but those are easy to block or throttle and aren't that big a deal in the scheme of things.
It’s unfortunate that you have to resort to this. OpenAI does publish their bot IP addresses at https://platform.openai.com/docs/bots, but Anthropic doesn’t seem to publish the IP addresses of their bots.
It is https://github.com/silentsoft/hits . It works by loading an SVG "shield" file (like the ones you see at the top of GitHub readmes all the time) from their server from a unique URL (you just choose one when you write/render your HTML). The server, implemented in Java, just counts hits to each URL in a database and sends back the corresponding SVG data. There's also a mini dashboard website where you can check basic stats for a given URL (no login required, everyone's hits-per-day stats are just public) and preview styling options for the SVG. For example, for my most recent blog post https://zahlman.github.io/posts/2025/12/31/oxidation/, I configured it such that you can view the stats via https://hits.sh/zahlman.github.io+oxidation/ (note that the trailing slash is required).
(The about section on GitHub bills the project as "privacy-friendly", which I would say is nonsense as these dashboards are public and their URLs are trivially computed. But it's also hard to imagine caring.)
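To make the counting mechanism described above concrete, here is a toy Python sketch of the idea: count one hit per key and hand back a small SVG showing the total. This is not the actual silentsoft/hits code (which is Java and uses a database); the in-memory `Counter`, the `record_hit` name, and the SVG layout are all assumptions for illustration.

```python
from collections import Counter
from html import escape

# In-memory stand-in for the database; the real service persists its counts.
hits = Counter()


def record_hit(key: str) -> str:
    """Count one hit for `key` and return a minimal SVG badge with the total.

    `record_hit` and the badge layout are hypothetical, not the project's API.
    """
    hits[key] += 1
    label = escape(key)  # keep the key safe to embed in SVG markup
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="160" height="20">'
        f"<title>{label}</title>"
        f'<text x="4" y="14" font-size="11">hits: {hits[key]}</text>'
        "</svg>"
    )


if __name__ == "__main__":
    record_hit("zahlman.github.io+oxidation")
    print(record_hit("zahlman.github.io+oxidation"))
```

Since the key is just whatever appears in the URL, anyone who can guess the key can read or bump the counter, which matches the point about the dashboards being public.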