Hacker News

Cloudflare has a service for this now that detects AI scrapers and sends them to a tarpit of infinite AI-generated nonsense pages.


Wow, so to prevent AI scrapers from harvesting my data I need to send all of my traffic through a third party company that gets to decide who gets to view my content. Great idea!


You don’t need to do anything. You can use any number of solutions or roll your own.

Someone shared an alternative. Must everything in AI threads be so negative and condescending?


Yes, they could roll their own, but you have no issues with this being necessary? I think the attitude of "just deal with it" is far more negative than someone expressing they are upset with the state of the internet, its controllers, and its abusers.


There's trillions invested in AI. Don't expect any introspective insight or criticism about it.


This is like saying "let's just get rid of all the guns" to solve gun violence and gun crime in the USA. The cat is out of the bag and no one can put it back. We live in a different world now and we have to figure it out.


> Must everything in AI threads be so negative and condescending?

Because if I own a website or a service and it is being degraded or slowed by some third-party tool that wants to slurp its content for its own profit without sharing anything back, I tend to be irritated. And AI apologists/evangelists don't help when they try to justify the behavior.


You can implement this yourself, who is stopping you?


Citation needed


I use iocaine[0] to generate a tarpit. Yesterday it served ~278k "pages" consisting of ~500MB of gibberish (and that's despite banning most AI scrapers in robots.txt.)

[0] https://iocaine.madhouse-project.org
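For a sense of how a tarpit like this works (this is a toy sketch, not iocaine's actual implementation — the word list and link scheme are invented here): every URL deterministically maps to a page of gibberish containing links to yet more gibberish, so a crawler that ignores robots.txt can wander forever without ever seeing real content.

```python
# Toy tarpit page generator (not iocaine itself). Seeding the RNG from the
# request path makes each URL stable across requests, so the maze looks
# like a real (if nonsensical) site to a crawler.
import random

WORDS = ["lorem", "quantum", "synergy", "aether", "gradient", "tarpit",
         "ontology", "flux", "paradigm", "lattice"]

def gibberish_page(path: str, n_words: int = 200, n_links: int = 10) -> str:
    rng = random.Random(path)  # deterministic per-URL content
    body = " ".join(rng.choice(WORDS) for _ in range(n_words))
    links = "".join(
        f'<a href="/{rng.getrandbits(32):08x}">more</a> '
        for _ in range(n_links)
    )
    return f"<html><body><p>{body}</p>{links}</body></html>"
```

Wiring this into an HTTP handler is left out; the point is that generating the maze is nearly free for the server while each fetched page hands the scraper ten more URLs to crawl.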


Can't seem to access this.

It flashes some text briefly then gives me a 418 TEAPOT response. I wonder if it's because I'm on Linux?

EDIT: Begrudgingly checked Chrome, and it loads. I guess it doesn't like Firefox?


Doesn't work on my Firefox either.

Friendly fire, I suppose.


Works on my Firefox, on both Mac and Linux.


Nor Safari on iOS.


Works fine on my iOS Safari - maybe there's some extension that's tickling it just the wrong way?


It still fails with all of my extensions disabled (wipr, privacy redirect). I just get a download dialog. I don't know what the HTTP status code is, however.

I found a flagged HN submission about it and it has just about the same result for me and for others. My first tap failed in a weird way (showed some text then redirected quickly to its git repo) and all subsequent taps trigger a download.

https://news.ycombinator.com/item?id=44538010


Unfortunately, you kind of have to count this as the cost of the Internet. You've wasted 500MB of bandwidth.

I've had colocation for eight-plus years. My monthly bandwidth is now around 20-30GB a month given to scrapers, where I was only using 1-2GB a month in years prior.

I pay for premium bandwidth (it's a thing) and only get 2TB of usable data. Do I go offline or let it continue?


> You've wasted 500MB of bandwidth.

Yep, it sucks, but on the positive side, I'm feeding 500MB of garbage into them every day, and that feels like enough of a small win for me.

> My monthly bandwidth is now around 20-30GB a month given to scrapers [...] 1-2GB a month

That definitely sucks.

> Do I go offline or let it continue?

Might be time to start blocking entire IP ranges and ASNs and see if that helps.
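Range-based blocking can be sketched roughly like this (the prefixes below are RFC 5737 documentation ranges standing in for real scraper networks, and in practice you'd do this in the firewall or reverse proxy rather than application code):

```python
# Sketch of blocking by CIDR prefix, assuming you've already mapped
# scraper ASNs to their announced prefixes. The ranges here are
# placeholder documentation networks, not real scraper ASNs.
from ipaddress import ip_address, ip_network

BLOCKED_RANGES = [
    ip_network("198.51.100.0/24"),  # hypothetical scraper prefix
    ip_network("203.0.113.0/24"),   # hypothetical scraper prefix
]

def is_blocked(client_ip: str) -> bool:
    addr = ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

As the thread notes further down, this is increasingly a losing game: crawlers that rotate across thousands of ASNs make any static block list rot within days.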


I have no idea what this does because the site is rejecting my ordinary Firefox browser with "Error code: 418 I'm a teapot", even from a private window.

If I hit it with Chrome, now I can see a site.

Seems not ready for prime time, as a lot of my viewers use Firefox.


One of the most popular ones is Anubis. It uses a proof of work and can even do poisoning: https://anubis.techaro.lol/

They even mention iocaine. I know, inconceivable!: https://iocaine.madhouse-project.org/

There's also tons of HN posts on the topic with varying solutions:

https://news.ycombinator.com/item?id=45935729

https://news.ycombinator.com/item?id=45711094

https://news.ycombinator.com/item?id=44142761

https://news.ycombinator.com/item?id=44378127
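The proof-of-work idea behind tools like Anubis can be sketched as follows (this is the general hashcash-style scheme, not Anubis's actual protocol — function names and the challenge format are invented here): the server hands the browser a challenge, the browser must find a nonce whose hash has N leading zero bits, and the server verifies with a single hash. That's cheap for one human page view but expensive at scraper volume.

```python
# Hashcash-style proof of work sketch (not Anubis's real protocol).
# Solving costs ~2**difficulty_bits hash attempts; verifying costs one.
import hashlib

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    value = int.from_bytes(digest, "big")
    return value >> (256 - difficulty_bits) == 0

def solve(challenge: str, difficulty_bits: int) -> int:
    # Brute-force search the client performs (in Anubis's case, in JS).
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

The asymmetry is the whole point: a browser solves one challenge per visit, while a crawler hammering thousands of pages pays the solving cost over and over.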


Anubis is the only tool that claims to have heuristics to identify bots, but my understanding is that it does this by presenting obnoxious challenges to all users, which is not really feasible. Old-school approaches like IP blocking or even ASN blocking are obsolete: these crawlers deliberately spam from thousands of IPs, and if you block them by a common ASN, they come back a few days later from thousands of unique ASNs. So this is not really a "roll your own" situation, especially if you are running off-the-shelf software that doesn't have a straightforward way to build in these various endless-page-maze approaches (which I would still have to serve anyway).


https://forge.hackers.town/hackers.town/nepenthes

> Citation needed

this reply kinda sucks :)


Unfortunately, Cloudflare often destroys the experience for users with shared connections, VPNs, exotic browsers… I had to remove it from my site after too many complaints.


I am sure Cloudflare would have no problem selling you a VPN service.

After all, it's not very far from hosting booters and selling DoS protection.



Also iCloud Private Relay.

Cloudflare is making it impossible to browse privately.


Cloudflare works fine with Private Relay - they and Fastly provide infrastructure for that service (one half of the blinded pair), so it's definitely something they test.


Not sure "TLS added and removed here :)" as a Service is the right tool in the drawer for this.



Cloudflare also blocks my human-driven browser all the time:

"enable javascript and cookies to continue"

It also reports an unsupported browser.


Savvy move by Cloudflare: once they have enough sites behind their service, they can charge the AI companies to access their cached copies on a back channel.


Modern scrapers are using headless chromium which will not see the invisible links, so I'm not sure how long this will be effective.
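A honeypot variant that doesn't depend on CSS invisibility (and so still works against headless Chromium) is to list a trap path as disallowed in robots.txt, never link it for humans, and flag any client that fetches it anyway. This is a generic sketch, not any particular tool's implementation — the path and function names are invented:

```python
# Robots.txt trap sketch: the trap path appears only in robots.txt as
# Disallowed, so only a crawler that reads and deliberately violates
# robots.txt will ever request it. Once flagged, the IP stays blocked.
TRAP_PATH = "/honeypot-do-not-crawl"
flagged_ips: set = set()

def handle_request(client_ip: str, path: str) -> int:
    """Return an HTTP status code; flag clients hitting the trap."""
    if path == TRAP_PATH:
        flagged_ips.add(client_ip)
    if client_ip in flagged_ips:
        return 403
    return 200
```

The obvious caveat is the one raised upthread: crawlers that rotate IPs per request make any per-IP flag short-lived.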


Which is still a far worse experience than if Cloudflare's services weren't needed.


Except for the scrapers that pay cloudflare to exempt them.


The solution, as always, is noise.


Do you have a link to that?




