I self host a small static website and a cgit instance on an e2-micro VPS from G...

dang · 2026-01-14T03:47:45 1768362465

And those are the good actors! We're under a crawlocalpyse from botnets, er, residential proxies.

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36", anyone?

zerocrates · 2026-01-14T06:27:18 1768372038

Yeah the flood of these Chrome UAs with every version number under the sun, and a really large portion being *.0.0.0 version numbers, that's what I've tended to experience lately. Also just kind of every browser user agent ever:

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12 (.NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; .NET CLR 3.5.21022)

There were waves of big and sometimes intrusive traffic admitting to being from Amazon, Anthropic, Google, Meta, etc., but those are easy to block or throttle and aren't that big a deal in the scheme of things.
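To make the pattern above concrete, here is a rough sketch of how one might tally Chrome versions seen in a log. The function names are hypothetical and it assumes you have already pulled the User-Agent strings out of your access log. A single zeroed version proves nothing, since recent Chrome releases report the reduced `X.0.0.0` form themselves; what stands out is one source cycling through a wide spread of major versions.

```python
import re
from collections import Counter

# Matches the four-part Chrome version token, e.g. "Chrome/120.0.0.0".
CHROME_RE = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)\.(\d+)")


def chrome_version(ua):
    """Return the Chrome version as a 4-tuple of ints, or None if absent."""
    m = CHROME_RE.search(ua)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups())


def summarize(user_agents):
    """Count Chrome major versions and how many UAs use the X.0.0.0 form."""
    majors = Counter()
    zeroed = 0
    for ua in user_agents:
        version = chrome_version(ua)
        if version is None:
            continue  # not a Chrome-style UA
        majors[version[0]] += 1
        if version[1:] == (0, 0, 0):
            zeroed += 1
    return majors, zeroed
```

Feeding it the UAs from one IP block or one hour of traffic and looking at `len(majors)` gives a quick feel for how scattered the claimed versions are.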

telotortium · 2026-01-14T07:36:30 1768376190

It’s unfortunate that you have to resort to this. OpenAI does publish their bot IP addresses at https://platform.openai.com/docs/bots, but Anthropic doesn’t seem to publish the IP addresses of their bots.
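For operators that do publish their ranges, checking a client address against the list is simple. A minimal sketch, assuming you have copied the published CIDR blocks into a list yourself: the ranges below are placeholder documentation addresses, not anyone's real bot ranges, and the function names are made up for illustration.

```python
import ipaddress

# Placeholder CIDR blocks from the reserved documentation ranges.
# Replace with the ranges the operator actually publishes.
PUBLISHED_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def load_networks(cidrs):
    """Parse CIDR strings into network objects (IPv4 or IPv6)."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_published_bot(ip, networks):
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr.version == net.version and addr in net
        for net in networks
    )


networks = load_networks(PUBLISHED_RANGES)
```

A request whose User-Agent claims to be a given crawler but whose address is outside the published ranges can then be treated as an impostor.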

zahlman · 2026-01-14T04:19:14 1768364354

The third-party hit-counting service I use implies that I'm not getting any of this bot scraping on my GitHub blog.

Is Microsoft doing something to prevent it? Or am I so uncool that even bots don't want to read my content :(

lelanthran · 2026-01-14T06:41:42 1768372902

I'm interested in that service and how it works. Link?

zahlman · 2026-01-14T07:04:48 1768374288

It is https://github.com/silentsoft/hits . It works by loading an SVG "shield" file (like the ones you see at the top of GitHub readmes all the time) from their server from a unique URL (you just choose one when you write/render your HTML). The server, implemented in Java, just counts hits to each URL in a database and sends back the corresponding SVG data. There's also a mini dashboard website where you can check basic stats for a given URL (no login required, everyone's hits-per-day stats are just public) and preview styling options for the SVG. For example, for my most recent blog post https://zahlman.github.io/posts/2025/12/31/oxidation/, I configured it such that you can view the stats via https://hits.sh/zahlman.github.io+oxidation/ (note that the trailing slash is required).

(The about section on GitHub bills the project as "privacy-friendly", which I would say is nonsense as these dashboards are public and their URLs are trivially computed. But it's also hard to imagine caring.)
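For anyone curious about the mechanism, here is a toy version of the count-and-return-SVG idea. This is my own sketch in Python, not the project's code (which is Java); the table layout, function names, and badge markup are all made up for illustration.

```python
import sqlite3
from xml.sax.saxutils import escape


def open_db(path=":memory:"):
    """Open the (hypothetical) hits database, creating the table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "key TEXT PRIMARY KEY, count INTEGER NOT NULL)"
    )
    return db


def record_hit(db, key):
    """Increment the counter for this key and return the new total."""
    with db:  # commit both statements together
        db.execute("INSERT OR IGNORE INTO hits (key, count) VALUES (?, 0)", (key,))
        db.execute("UPDATE hits SET count = count + 1 WHERE key = ?", (key,))
    (count,) = db.execute(
        "SELECT count FROM hits WHERE key = ?", (key,)
    ).fetchone()
    return count


def render_badge(label, count):
    """Return a bare-bones SVG showing the label and the hit count."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="120" height="20">'
        f'<text x="4" y="14">{escape(label)}: {count}</text></svg>'
    )


def handle_request(db, path):
    """Treat the URL path as the counter key: count the hit, return the SVG."""
    key = path.strip("/")  # assumption: slashes are not part of the key
    return render_badge("hits", record_hit(db, key))
```

Wiring `handle_request` to an HTTP server is all that's left: every fetch of the image URL bumps the counter for that key.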

aembleton · 2026-01-14T11:11:10 1768389070

They're probably not downloading every svg each time they scrape the site. Probably focused on scraping the text.

zahlman · 2026-01-14T17:23:30 1768411410

What? No, I mean the HTML for the SVG contains a custom URL for an API request. There's no scraping involved on either end.