Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?

canyon289 · 2025-08-08T19:47:42 1754682462

I work at Google on these systems every day (caveat: these are my own words, not my employer's). So I simultaneously can tell you that it's smart people really thinking about every facet of the problem, and I can't tell you much more than that.

However, I can share this, written by my colleagues! You'll find great explanations of accelerator architectures and the considerations that go into making things fast.

https://jax-ml.github.io/scaling-book/

In particular, your questions are about inference, which is the focus of this chapter: https://jax-ml.github.io/scaling-book/inference/

Edit: Another great resource to look at is the unsloth guides. These folks are incredibly good at getting deep into various models and finding optimizations, and they're very good at writing it up. Here's the Gemma 3n guide, and you'll find others as well.

https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-...

KaiserPro · 2025-08-08T21:50:29 1754689829

Same explanation but with less mysticism:

Inference is (mostly) stateless. So unlike training where you need to have memory coherence over something like 100k machines and somehow avoid the certainty of machine failure, you just need to route mostly small amounts of data to a bunch of big machines.

I don't know what the specs of their inference machines are, but where I worked the machines research used were all 8-GPU monsters. So long as your model fitted in (combined) VRAM, the job was a good 'un.

To scale, the secret ingredient was industrial amounts of cash. Sure, we had DGXs (fun fact: Nvidia sent literal gold-plated DGX machines), but they weren't dense, and were very expensive.

Most large companies have robust RPC and orchestration, which means the hard part isn't routing the message, it's making the model fit in the boxes you have. (That's not my area of expertise though.)

zozbot234 · 2025-08-09T05:10:58 1754716258

> Inference is (mostly) stateless. ... you just need to route mostly small amounts of data to a bunch of big machines.

I think this might just be the key insight. The key advantage of doing batched inference at a huge scale is that once you maximize parallelism and sharding, your model parameters and the memory bandwidth associated with them are essentially free (since at any given moment they're being shared among a huge amount of requests!), you "only" pay for the request-specific raw compute and the memory storage+bandwidth for the activations. And the proprietary models are now huge, highly-quantized extreme-MoE models where the former factor (model size) is huge and the latter (request-specific compute) has been correspondingly minimized - and where it hasn't, you're definitely paying "pro" pricing for it. I think this goes a long way towards explaining how inference at scale can work better than locally.

(There are "tricks" you could do locally to try and compete with this setup, such as storing model parameters on disk and accessing them via mmap, at least when doing token gen on CPU. But of course you're paying for that with increased latency, which you may or may not be okay with in that context.)
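A rough way to see the amortization argument above is a back-of-envelope model of one decode step. This is only a sketch under stated assumptions: the step is treated as purely memory-bandwidth bound, compute limits are ignored, and every number (weight size, per-request KV traffic, bandwidth) is hypothetical rather than taken from any real deployment.

```python
def decode_step_seconds(weight_bytes, kv_bytes_per_request, batch_size, mem_bandwidth):
    """Time for one decode step if it is limited only by memory bandwidth.

    The weights are read once per step no matter how many requests are in
    the batch; only the per-request KV/activation traffic scales with it.
    """
    bytes_moved = weight_bytes + batch_size * kv_bytes_per_request
    return bytes_moved / mem_bandwidth


def tokens_per_second(weight_bytes, kv_bytes_per_request, batch_size, mem_bandwidth):
    """Each request in the batch gets one new token per decode step."""
    step = decode_step_seconds(weight_bytes, kv_bytes_per_request, batch_size, mem_bandwidth)
    return batch_size / step


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
WEIGHT_BYTES = 100e9          # e.g. 100B parameters at 1 byte each (assumption)
KV_BYTES_PER_REQUEST = 0.5e9  # per-request traffic per step (assumption)
MEM_BANDWIDTH = 2e12          # bytes/second of aggregate memory bandwidth (assumption)

if __name__ == "__main__":
    for batch in (1, 8, 64):
        rate = tokens_per_second(WEIGHT_BYTES, KV_BYTES_PER_REQUEST, batch, MEM_BANDWIDTH)
        print(f"batch={batch:3d}  ~{rate:7.1f} tokens/s total")
```

With these made-up numbers, batch size 1 gives about 20 tokens/s and batch size 64 gives roughly 970 tokens/s on the same hardware, because the weights are read once per step regardless of how many requests share it.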

patrick451 · 2025-08-09T06:04:26 1754719466

> The key advantage of doing batched inference at a huge scale is that once you maximize parallelism and sharding, your model parameters and the memory bandwidth associated with them are essentially free (since at any given moment they're being shared among a huge amount of requests!)

Kind of unrelated, but this comment made me wonder when we will start seeing side channel attacks that force queries to leak into each other.

jeffrallen · 2025-08-09T21:53:49 1754776429

I asked a colleague about this recently and he explained it away with a wave of the hand saying, "different streams of tokens and their context are on different ranks of the matrices". And I kinda believed him, based on the diagrams I see on Welch Labs YouTube channel.

On the other hand, I've learned that when I ask questions about security to experts in a field (who are not experts in security) I almost always get convincing hand waves, and they are almost always proven to be completely wrong.

Sigh.

saagarjha · 2025-08-09T09:02:50 1754730170

mmap is not free. It just moves bandwidth around.

zozbot234 · 2025-08-09T11:15:51 1754738151

Using mmap for model parameters allows you to run vastly larger models for any given amount of system RAM. It's especially worthwhile when you're running MoE models and parameters for unused "experts" can just be evicted from RAM, leaving room for more relevant data. But of course this applies more generally to, e.g. single model layers, etc.
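A minimal sketch of the mmap idea, using only the Python standard library. The file layout (a flat array of little-endian float32 values) and the helper names are hypothetical, not the format of any real inference engine. The point is just that mapping the file does not read it all into RAM; the OS pages in only the region you touch and is free to evict it later.

```python
import mmap
import struct

FLOAT_SIZE = 4  # bytes per little-endian float32


def write_fake_weights(path, n_floats):
    """Write a stand-in 'weights' file: the values 0.0, 1.0, 2.0, ..."""
    with open(path, "wb") as f:
        for i in range(n_floats):
            f.write(struct.pack("<f", float(i)))


def read_weight_slice(path, start, count):
    """Read `count` floats starting at index `start` through a memory map.

    Only the pages backing the requested byte range need to be resident;
    the rest of the file stays on disk until something touches it.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = start * FLOAT_SIZE
            raw = mm[offset:offset + count * FLOAT_SIZE]  # copies just this range
            return list(struct.unpack(f"<{count}f", raw))


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_fake_weights(path, 1000)
        print(read_weight_slice(path, 10, 3))
```

As the reply above notes, this is not free: each first touch of a cold page is a disk read, so the trade is RAM capacity for latency.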

abdullin · 2025-08-09T13:20:36 1754745636

> Inference is (mostly) stateless

Quite the opposite. Context caching requires state (K/V cache) close to the VRAM. Streaming requires state. Constrained decoding (known as Structured Outputs) also requires state.

KaiserPro · 2025-08-09T20:15:15 1754770515

> Quite the opposite.

Unless something has dramatically changed, the model is stateless. The context cache needs to be injected before the new prompt, but from what I understand (and please do correct me if I'm wrong) the context cache isn't that big, on the order of a few tens of kilobytes. Plus the cache saves seconds of GPU time, so an extra 100ms of latency is nothing compared to a cache miss. So a broad cache is much, much better than a narrow local cache.

But! Even if it's larger, your bottleneck isn't the network, it's waiting on the GPUs to be free[1]. So whilst having the cache really close, i.e. in the same rack or same machine, will give the best performance, it will limit your scale (because the cache is only effective for a small number of users).

[1] 100 megs of data shared over the same datacentre network every 2-3 seconds per node isn't that much, especially if you have a partitioned network (i.e. like AWS, where you have a block network and a "network" network).
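To put the footnote's figure in link-rate terms, here is a small arithmetic sketch. The 100 MB payload and 2-3 second interval come from the footnote; the 10 Gbit/s link speed is purely an assumption for comparison.

```python
def required_mbit_per_s(payload_megabytes, interval_seconds):
    """Sustained rate needed to move a payload once per interval (1 MB = 8 Mbit)."""
    return payload_megabytes * 8 / interval_seconds


def link_utilisation(payload_megabytes, interval_seconds, link_gbit_per_s):
    """Fraction of a link consumed by that sustained rate."""
    return required_mbit_per_s(payload_megabytes, interval_seconds) / (link_gbit_per_s * 1000)


if __name__ == "__main__":
    for interval in (2, 3):
        rate = required_mbit_per_s(100, interval)
        share = link_utilisation(100, interval, link_gbit_per_s=10)  # assumed link speed
        print(f"every {interval}s: {rate:.0f} Mbit/s, {share:.1%} of a 10 Gbit/s link")
```

That works out to roughly 270-400 Mbit/s per node, a few percent of the assumed link.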

spott · 2025-08-11T14:49:12 1754923752

KV cache for dense models is on the order of 50% of the parameters. For sparse MoE models it can be significantly smaller, I believe, but I don’t think it is measured in KB.
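For a sense of scale, the usual way to estimate KV cache size for a standard transformer is two tensors (K and V) per layer, each of size kv_heads × head_dim per token. The sketch below uses hypothetical model dimensions, not those of any specific production model, and assumes 16-bit cache entries.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimated KV cache size for one sequence.

    The factor of 2 is for storing both keys and values at every layer.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * seq_len


if __name__ == "__main__":
    # Hypothetical mid-sized model with grouped-query attention (assumption).
    layers, kv_heads, head_dim = 32, 8, 128
    per_token = kv_cache_bytes(layers, kv_heads, head_dim, seq_len=1)
    full = kv_cache_bytes(layers, kv_heads, head_dim, seq_len=8192)
    print(f"per token: {per_token / 1024:.0f} KiB")
    print(f"8192-token context: {full / 2**30:.0f} GiB")
```

Under these assumptions a single token already costs 128 KiB and an 8192-token context costs 1 GiB, so the total is well beyond tens of kilobytes.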

blibble · 2025-08-08T23:10:34 1754694634

> So I simultaneously can tell you that it's smart people really thinking about every facet of the problem, and I can't tell you much more than that.

"we do 1970s mainframe style timesharing"

there, that was easy

kstrauser · 2025-08-09T01:40:07 1754703607

For real. Say it takes 1 machine 5 seconds to reply, and that a machine can only possibly form 1 reply at a time (which I doubt, but for argument).

If the requests were regularly spaced, and they certainly won’t be, but for the sake of argument, then 1 machine could serve 17,000 requests per day, or 120,000 per week. At that rate, you’d need about 5,800 machines to serve 700M requests. That’s a lot to me, but not to someone who owns a data center.

Yes, those 700M users will issue more than 1 query per week and they won’t be evenly spaced. However, I’d bet most of those queries will take well under 1 second to answer, and I’d also bet each machine can handle more than one at a time.

It’s a large problem, to be sure, but that seems tractable.
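The back-of-envelope estimate above can be written out as a few lines of arithmetic. The 5-second reply time and one-request-at-a-time limit are the comment's own deliberately pessimistic assumptions; the second scenario (1 second per reply, 8 concurrent requests per machine) is a hypothetical added purely for comparison.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800


def machines_needed(requests_per_week, seconds_per_reply, concurrent_per_machine=1):
    """Machines required if requests arrive perfectly evenly spaced."""
    replies_per_machine = SECONDS_PER_WEEK * concurrent_per_machine // seconds_per_reply
    # Ceiling division: a fractional machine still means one more machine.
    return -(-requests_per_week // replies_per_machine)


if __name__ == "__main__":
    print(machines_needed(700_000_000, seconds_per_reply=5))
    print(machines_needed(700_000_000, seconds_per_reply=1, concurrent_per_machine=8))
```

One machine at 5 seconds per reply handles 120,960 requests a week, which puts the fleet at a little under 5,800 machines, in the same ballpark as the estimate above. The faster, batched scenario needs only a few hundred.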

mh- · 2025-08-09T02:01:42 1754704902

Yes. And batched inference is a thing, where intelligent grouping/bin packing and routing of requests happens. I expect a good amount of "secret sauce" is at this layer.

Here's an entry-level link I found quickly on Google, OP: https://medium.com/@wearegap/a-brief-introduction-to-optimiz...

brookst · 2025-08-09T05:01:29 1754715689

But that’s not accurate. There are all sorts of tricks around KV cache where different users will have the same first X bytes because they share system prompts, caching entire inputs / outputs when the context and user data is identical, and more.

Not sure if you were just joking or really believe that, but for other people’s sake, it’s wildly wrong.
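A toy illustration of the shared-prefix idea: if two requests start with the same tokens (for example an identical system prompt), the work for that prefix can be reused and only the differing suffix needs fresh computation. This is a sketch, not how any real serving stack implements it; the class name is hypothetical, and a real system would store KV tensors (typically in fixed-size blocks) rather than just remembering which prefixes it has seen.

```python
class ToyPrefixCache:
    """Remembers which token prefixes have already been processed."""

    def __init__(self):
        self._seen_prefixes = set()

    def insert(self, tokens):
        """Record every prefix of a processed prompt as reusable."""
        for end in range(1, len(tokens) + 1):
            self._seen_prefixes.add(tuple(tokens[:end]))

    def longest_cached_prefix(self, tokens):
        """Length of the longest prefix of `tokens` that is already cached."""
        for end in range(len(tokens), 0, -1):
            if tuple(tokens[:end]) in self._seen_prefixes:
                return end
        return 0

    def tokens_to_compute(self, tokens):
        """How many tokens still need a fresh forward pass."""
        return len(tokens) - self.longest_cached_prefix(tokens)


if __name__ == "__main__":
    system_prompt = [1, 2, 3, 4]           # stand-in token ids (assumption)
    cache = ToyPrefixCache()
    cache.insert(system_prompt + [10, 11])  # first user's request
    second = system_prompt + [20]           # second user, same system prompt
    print(cache.tokens_to_compute(second))  # only the new suffix
```

Here the second request reuses the four system-prompt tokens and only has to process one new token.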

kossTKR · 2025-08-10T15:59:46 1754841586

Really? So the system recognises someone asked the same question and serves the same answer? And who on earth shares the exact same context?

I mean, I get the idea, but it sounds so incredibly rare that it would mean absolutely nothing optimisation-wise.

brookst · 2025-08-11T20:50:11 1754945411

Yes. It is not incredibly rare, it's incredibly common. A huge percentage of queries to retail LLMs are things like "hello" and "what can you do", with static system prompts that make the total context identical.

It's worth maybe a 3% reduction in GPU usage. So call it a half billion dollars a year or so, for a medium to large service.

fc417fc802 · 2025-08-10T21:34:17 1754861657

Even if that were the case you wouldn't be wrong. Adding caching and deduplication (and clever routing and sharding, and ...) on top of timesharing doesn't somehow make it not timesharing anymore. The core observation about the raw numbers still applies.

claytongulick · 2025-08-09T13:31:26 1754746286

I'm pretty sure that's not right.

They're definitely running cluster knoppix.

:-)

rootsudo · 2025-08-09T02:45:06 1754707506

Makes perfect sense, completely understand now!

benreesman · 2025-08-09T04:24:51 1754713491

I don't think it's either useful or particularly accurate to characterize modern disagg racks of inference gear, well-understood RDMA and other low-overhead networking techniques, aggressive MLA and related cache optimizations that are in the literature, and all the other stuff that goes into a system like this as being some kind of mystical thing attended to by a priesthood of people from a different tier of hacker.

This stuff is well understood in public, and where a big name has something highly custom going on? Often as not it's a liability around attachment to some legacy thing. You run this stuff at scale by having the correct institutions and processes in place that it takes to run any big non-trivial system: that's everything from procurement and SRE training to the RTL on the new TPU, and all of the stuff is interesting, but if anyone was 10x out in front of everyone else? You'd be able to tell.

Signed, Someone Who Also Did Megascale Inference for a TOP-5 For a Decade.

tough · 2025-08-08T20:01:03 1754683263

Doesn't Google have TPUs that make inference of their own models much more profitable than, say, having to rent out NVIDIA cards?

Doesn't OpenAI depend mostly on its relationship/partnership with Microsoft to get GPUs to inference on?

Thanks for the links, interesting book!

ActorNightly · 2025-08-08T20:12:39 1754683959

Yes. Google is probably gonna win the LLM game tbh. They had a massive head start with TPUs, which are very energy efficient compared to Nvidia cards.

baxtr · 2025-08-08T21:47:42 1754689662

The only one who can stop Google is Google.

They’ll definitely have the best model, but there is a chance they will f*up the product / integration into their products.

scarface_74 · 2025-08-08T22:12:00 1754691120

It would take talent for them to mess up hosting businesses who want to use their TPUs on GCP.

But then again, even there, their reputation for abandoning products, lack of customer service, and condescension when it came to large enterprises’ “legacy tech” lets Microsoft, who is king of hand-holding big enterprise, and even AWS run roughshod over them.

When I was at AWS ProServe, we didn’t even bother coming up with talking points when competing with GCP except to point out how they abandon services. Was it partially FUD? Probably. But it worked.

serf · 2025-08-08T23:11:32 1754694692

>It would take talent for them to mess up hosting businesses who want to use their TPUs on GCP.

there are few groups as talented at losing a head start as google.

JoshuaDavid · 2025-08-08T22:45:05 1754693105

Google employees collectively have a lot of talent.

bee_rider · 2025-08-09T02:01:57 1754704917

A truly astonishing amount of talent applied to… hosting emails very well, and losing the search battle against SEO spammers.

big_hacker · 2025-08-09T08:17:47 1754727467

Well, Search had no chance when the sites also make money from Google ads. Google fucked their Search by creating themselves incentives for bounce rate.

rlupi · 2025-08-09T10:21:09 1754734869

> It would take talent for them to mess up hosting businesses who want to use their TPUs on GCP.

> But then again even there, their reputation for abandoning products

What are the chances of abandoning TPU-related projects where the company literally invested billions in infrastructure? Zero.

scarface_74 · 2025-08-09T11:12:43 1754737963

Enterprise sales and support takes a lot of people skills, hand holding, showing respect for the current state, being willing to deal with and navigate the internal politics of the customer, etc.

All things that Google is remarkably bad at.

thechao · 2025-08-09T13:23:16 1754745796

I don't know what scale of "billions" you're talking about; but, Intel blew 1–2 billion on Larrabee. Even worse: Intel blew 5+ billion on mobile pre-iPhone. I remember when that team was shown the door — that's when we had to evaluate the early RGX GPUs as a backstop to try to win Apple's business; the RGX's were turds.

Penny-wise pound-foolish.

fc417fc802 · 2025-08-10T22:38:20 1754865500

Bit of an aside but Larrabee didn't fail. Intel inexplicably abandoned the consumer GPU market but the same tech was successfully sold to enterprise customers in the form of Xeon Phi. Several of the largest supercomputing clusters have used them.

https://tomforsyth1000.github.io/blog.wiki.html#%5B%5BWhy%20...

scarface_74 · 2025-08-09T14:36:51 1754750211

Intel also wasted untold billions trying to compete with Qualcomm building cellular chips with lackluster results, and then sold the division to Apple, which has spent billions more just to end up with the lackluster C1 in the SE.

adastra22 · 2025-08-09T02:11:24 1754705484

There is plenty of time left to fumble the ball.

qcnguy · 2025-08-09T13:05:47 1754744747

And they already did many times.

benreesman · 2025-08-09T07:11:49 1754723509

Google will win the LLM game if the LLM game is about compute, which is the common wisdom and maybe true, but not foreordained by God. There's an argument that if compute was the dominant term that Google would never have been anything but leading by a lot.

Personally, right now I see one clear leader and one group going 0-99 like a five sigma cosmic ray: Anthropic and the PRC. But this is because I believe/know that all the benchmarks are gamed as hell; it's like asking if a movie star had cosmetic surgery. On quality, Opus 4 is 15x the cost and sold out / backordered. Qwen 3 is arguably in next place.

In both of those cases, extreme quality expert labeling at scale (assisted by the tool) seems to be the secret sauce.

Which is how it would play out if history is any guide: when compute as a scaling lever starts to flatten, you expert label like it's 1987 and claim it's compute and algorithms until the government wises up and stops treating your success personally as a national security priority. It's the easiest trillion Xi Jinping ever made: pretending to think LLMs are AGI too, fast following for pennies on the dollar, and propping up a stock market bubble to go with the fentanyl crisis? 9-D chess. It's what I would do about AI if I were China.

Time will tell.

0_____0 · 2025-08-09T10:24:03 1754735043

I believe Google might win the LLM game simply because they have the infrastructure to make it profitable - via ads.

All the LLM vendors are going to have to cope with the fact that they're lighting money on fire, and Google have the paying customers (advertisers) and, with the user-specific context they get from their LLM products, one of the juiciest and most targetable ad audiences of all time.

ActorNightly · 2025-08-10T03:33:43 1754796823

Everyone seems to forget about MuZero, which was arguably more important than the transformer architecture.

fakedang · 2025-08-08T21:13:54 1754687634

Yeah honestly. They could just try selling solutions and SLAs combining their TPU hardware with on-prem SOTA models and practically dominate enterprise. From what I understand, that's GCP's gameplay too for most regulated enterprise clients.

ActorNightly · 2025-08-08T21:53:16 1754689996

Google's bread and butter is advertising, so they have a huge interest in keeping things in house. Data is more valuable to them than money from hardware sales.

Even then, I think that their primary use case is going to be consumer-grade good AI on phones. I dunno why the Gemma QAT models fly so low under the radar, but you can basically get full-scale Llama 3-like performance from a single 3090 now, at home.

fakedang · 2025-08-08T22:02:42 1754690562

https://www.cnbc.com/2025/04/09/google-will-let-companies-ru...

Google has already started the process of letting companies self-host Gemini, even on NVidia Blackwell GPUs.

Although imho, they really should bundle it with their TPUs as a turnkey solution for those clients who haven't invested in large scale infra like DCs yet.

to11mtm · 2025-08-09T03:00:47 1754708447

My guess is that either Google wants a high level of physical control over their TPUs, or they have one sort of deal or another with Nvidia and don't want to step on their toes.

And also, Google's track record with hardware.

ActorNightly · 2025-08-10T03:31:47 1754796707

It's the same format as other software - you release the actual software for free but offer managed services that work with that software way better and easier.

fakedang · 2025-08-10T15:13:56 1754838836

Yeah but those are on Google's managed cloud, and not onprem. But that recent announcement has been specifically for Google Distributed Cloud, which is huge.

My point was a bit more specific though. To elaborate, I know of a number of publicly traded companies (USD $200M+ market cap) globally which have identified use cases for onprem AI and want to implement them actively but cannot, because they lack the knowhow to work with onprem, and hiring talent to implement that is just extremely expensive. Google should simply provide it as a turnkey bundle and milk them for it.

klik99 · 2025-08-08T22:25:29 1754691929

It’s my understanding that Google makes the bulk of its ad money from search ads - sure, they harvest a ton of data, but it isn’t as valuable to them as you’d think. I suspect they know that could change, so they’re hoovering up as much as they can to hedge their bets. Meta, on the other hand, is all about targeted ads.

ActorNightly · 2025-08-10T03:30:38 1754796638

Right so keeping things in house and seeing what people are asking Gemini would be probably better for them?

Horos · 2025-08-09T06:23:00 1754720580

Gemma terms of use?

Ericson2314 · 2025-08-09T02:24:12 1754706252

Renting out hardware like that would be such a cleansing old-school revenue stream for Google... just imagine...

stogot · 2025-08-08T22:59:23 1754693963

Hasn’t the Inferentia chip been around long enough to make the same argument? AWS and Google probably have the same order of magnitude of their own custom chips

saagarjha · 2025-08-09T09:13:40 1754730820

Inferentia has a generally worse stack but yes

davedx · 2025-08-08T22:11:48 1754691108

But they’re ASICs so any big architecture changes will be painful for them right?

llm_nerd · 2025-08-09T02:31:27 1754706687

TPUs are accelerators for the common operations found in neural nets. A big part is simply a massive number of matrix FMA units to process enormous matrix operations, which make up the bulk of doing a forward pass through a model. Caching enhancements and massively growing memory were necessary to facilitate transformers, but on the hardware side not a huge amount has changed, and the fundamentals from years ago still power the latest models. The hardware is just getting faster, with more memory and more parallel processing units. And, later, getting more data types to enable hardware-enabled quantization.

So it isn't like Google designed a TPU for a specific model or architecture. They're pretty general purpose in a narrow field (oxymoron, but you get the point).

The set of operations Google designed into a TPU is very similar to what nvidia did, and it's about as broadly capable. But Google owns the IP and doesn't pay the premium and gets to design for their own specific needs.
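To make the "matrix FMA" point concrete, here is a naive matrix multiply that counts its fused multiply-add steps. It is a plain-Python illustration of the operation such hardware parallelizes, not how a TPU or GPU is actually programmed, and the function name is made up for this sketch.

```python
def matmul_with_fma_count(a, b):
    """Multiply two matrices (lists of rows) and count multiply-add steps.

    An (m x k) by (k x n) product needs exactly m * n * k multiply-adds,
    which is why accelerators dedicate so much silicon to doing them in bulk.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    fma_count = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one multiply-add
                fma_count += 1
            out[i][j] = acc
    return out, fma_count


if __name__ == "__main__":
    product, fmas = matmul_with_fma_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, fmas)
```

Every one of those inner-loop steps is independent across output elements, which is what lets a grid of FMA units process them in parallel.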

saagarjha · 2025-08-09T09:10:27 1754730627

There are plenty of matrix multiplies in the backward pass too. Obviously this is less useful when serving but it's useful for training.

edoceo · 2025-08-08T23:47:26 1754696846

I'd think no. They have the hardware and software experience, likely have next and next-next plans in place already. The big hurdle is money, which G has a bunch of.

canyon289 · 2025-08-08T20:15:20 1754684120

I'm a research person building models, so I can't answer your questions well (save for one part).

That is, as a research person using our GPUs and TPUs, I see first hand how choices from the high-level Python layer, through JAX, down to the TPU architecture all work together to make training and inference efficient. You can see a bit of that in the gif on the front page of the book. https://jax-ml.github.io/scaling-book/

I also see how sometimes bad choices by me can make things inefficient. Luckily for me if my code/models are running slow I can ping colleagues who are able to debug at both a depth and speed that is quite incredible.

And because we're on HN, I want to preemptively call out my positive bias for Google! It's a privilege to be able to see all this technology first hand, work with great people, and do my best to ship this at scale across the globe.

ignoramous · 2025-08-08T21:39:27 1754689167

> Another great resource to look at is the unsloth guides.

And folks at LMSys: https://lmsys.org/blog/

  Large Model Systems (LMSYS Corp.) is a 501(c)(3) non-profit focused on incubating open-source projects and research. Our mission is to make large AI models accessible to everyone by co-developing open models, datasets, systems, and evaluation tools. We conduct cutting-edge machine learning research, develop open-source software, train large language models for broad accessibility, and build distributed systems to optimize their training and inference.

hnpolicestate · 2025-08-09T06:38:40 1754721520

This caught my attention "But today even “small” models run so close to hardware limits".

Sounds analogous to the '60s and '70s, i.e. "even small programs run so close to hardware limits". If optimization and efficiency are dead in software engineering, they're certainly alive and well in LLM development.

jackhalford · 2025-08-08T21:27:35 1754688455

Why does the unsloth guide for gemma 3n say:

> llama.cpp an other inference engines auto add a <bos> - DO NOT add TWO <bos> tokens! You should ignore the <bos> when prompting the model!

That makes me want to try exactly that? Weird

nwhnwh · 2025-08-09T19:59:42 1754769582

Nothing smart about making something that is not useful for humans.

revskill · 2025-08-09T08:35:54 1754728554

No, you just over complicate things.

LAC-Tech · 2025-08-09T01:28:55 1754702935

If people at google are so smart why can't google.com get a 100% lighthouse score?

jeltz · 2025-08-09T10:32:09 1754735529

I have met a lot of people at Google; they have some really good engineers and mediocre ones. But most importantly they are just normal engineers dealing with normal office politics.

I don't like how the grandparent mystifies this. This problem is just normal engineering. Any good engineer could learn how to do it.

usr1106 · 2025-08-09T06:54:39 1754722479

Because most smart people are not generalists. My first boss was really smart and managed to found a university institute in computer science. The 3 other professors he hired were, ahem, strange choices. We 28-year-old assistants could only shake our heads. After fighting a couple of years with his own hires, the founder left in frustration to found another institution.

One of my colleagues was only 25, really smart in his field, and became a professor less than 10 years later. But he was incredibly naive in everyday chores. Buying groceries or filing taxes resulted in major screw-ups regularly.

jeltz · 2025-08-09T10:34:54 1754735694

I have met those supersmart specialists but in my experience there are also a lot of smart people who are more generalists.

The real answer is likely internal company politics and priorities. Google certainly has people with the technical skills to solve it but do they care and if they care can they allocate those skilled people to the task?

gregorygoc · 2025-08-09T10:50:13 1754736613

My observation is that in general smart generalists are smarter than smart specialists. I work at Google, and it’s just that these generalists folks are extremely fast learners. They can cover breadth and depth of an arbitrary topic in a matter of 15 minutes, just enough to solve a problem at hand.

It’s quite intimidating how fast they can break down difficult concepts into first principles. I’ve witnessed this first hand and it’s beyond intimidating. Makes you wonder what you’re doing at this company… That being said, the caliber of folks I’m talking about is quite rare, like top 10% of top 1% teams at Google.

jeltz · 2025-08-09T11:13:08 1754737988

That is my experience too. It sometimes seems the supersmart generalists are people whose strongest skill is learning.

ranger_danger · 2025-08-09T21:27:45 1754774865

Pro-tip they're just not. A lot of tech nerds really like to think they're a genius with all the answers ("why don't they just do XX"), but some eventually learn that the world is not so black and white.

The Dunning-Kruger effect also applies to smart people. You don't stop when you are estimating your ability correctly. As you learn more, you gain more awareness of your ignorance and continue being conservative with your self estimates.

catigula · 2025-08-08T23:37:47 1754696267

A lot of really smart people working on problems that don't even really need to be solved is an interesting aspect of market allocation.

YossarianFrPrez · 2025-08-08T23:47:27 1754696847

Can you explain what you mean about 'not needing to be solved'? There are versions of that kind of critique that would seem, at least on the surface, to better apply to finance or flash trading.

I ask because scaling a system that a substantial chunk of the population finds incredibly useful, including for the more efficient production of public goods (scientific research, for example), does seem like a problem that a) needs to be solved from a business point of view, and b) should be solved from a civic-minded point of view.

windexh8er · 2025-08-09T00:53:27 1754700807

I think the problem I see with this type of response is that it doesn't take into context the waste of resources involved. If the 700M users per week is legitimate then my question to you is: how many of those invocations are worth the cost of resources that are spent, in the name of things that are truly productive?

And if AI was truly the holy grail that it's being sold as, then there wouldn't be 700M users per week wasting all of these resources as heavily as we are, because generative AI would have already solved for something better. It really does seem like these platforms aren't, and won't be, anywhere near as useful as they're continuously claimed to be.

Just like Tesla FSD, we keep hearing about a "breakaway" model and the broken record of AGI. Instead of getting anything exceptionally better we seem to be getting models tuned for benchmarks and only marginal improvements.

I really try to limit what I'm using an LLM for these days. And not simply because of the resource pigs they are, but because it's also often a time sink. I spent an hour today testing out GPT-5 and asking it about a specific problem I was solving for using only 2 well documented technologies. After that hour it had hallucinated about a half dozen assumptions that were completely incorrect. One so obvious that I couldn't understand how it had gotten it so wrong. This particular technology, by default, consumes raw SSE. But GPT-5, even after telling it that it was wrong, continued to give me examples that were in a lot of ways worse and kept resorting to telling me to validate my server responses were JSON formatted in a particularly odd way.

Instead of continuing to waste my time correcting the model I just went back to reading the docs and GitHub issues to figure out the problem I was solving for. And that led me down a dark chain of thought: so what happens when the "teaching" mode rethinks history, or math fundamentals?

I'm sure a lot of people think ChatGPT is incredibly useful. And a lot of people are bought into not wanting to miss the boat, especially those who don't have any clue to how it works and what it takes to execute any given prompt. I actually think LLMs have a trajectory that will be similar to social media. The curve is different and I, hopefully, don't think we've seen the most useful aspects of it come to fruition as of yet. But I do think that if OpenAI is serving 700M users per week then, once again, we are the product. Because if AI could actually displace workers en masse today you wouldn't have access to it for $20/month. And they wouldn't offer it to you at 50% off for the next 3 months when you go to hit the cancel button. In fact, if it could do most of the things executives are claiming then you wouldn't have access to it at all. But, again, the users are the product - in very much the same way social media played into.

Finally, I'd surmise that of those 700M weekly users less than 10% of those sessions are being used for anything productive that you've mentioned and I'd place a high wager that the 10% is wildly conservative. I could be wrong, but again - we'd know about that if it were the actual truth.

mlyle · 2025-08-09T03:45:04 1754711104

> If the 700M users per week is legitimate then my question to you is: how many of those invocations are worth the cost of resources that are spent, in the name of things that are truly productive?

Is everything you spend resources on truly productive?

Who determines whether something is worth it? Is price/willingness of both parties to transact not an important factor?

I don't think ChatGPT can do most things I do. But it does eliminate drudgery.

windexh8er · 2025-08-09T13:45:03 1754747103

I don't believe everything in my world is as efficient as it could be. But I genuinely think about the costs involved [0]. When doing automations that are perfectly handled by deterministic systems why would I put the outcomes of those in the hands of a non-deterministic one? And at that cost differential?

We know a few things: LLMs are not efficient, LLMs are consuming more water than traditional compute, we know the providers know but they haven't shared any tangible metrics, and the build process involves, also, an exceptional amount of time, wattage and water.

For me it's: if you have access to a supercomputer do you use it to tell you a joke or work on a life saving medicine?

We didn't have these tools 5 years ago. 5 years ago you dealt with said "drudgery". On the other hand you then say it can't do "most things I do". It seems as though the lines of fatalism and paradox are in full force for a lot of the arguments around AI.

I think the real kicker for me this week (and it changes week-over-week, which is at least entertaining) is when Paul Graham told his Twitter feed [1] a "hotshot" programmer is writing 10k LOC that are not "bug-filled crap" in 12 hours. That's 14 LOC per minute, compared to industry norms of 50-150 LOC per 8-hour day. Apparently, this "hot-shot" is not "naive", though, implying that it's most definitely legit.

[0] https://www.sciencenews.org/article/ai-energy-carbon-emissio... [1] https://x.com/paulg/status/1953289830982664236

mlyle · 2025-08-09T17:11:16 1754759476

> When doing automations that are perfectly handled by deterministic systems why would I put the outcomes of those in the hands of a non-deterministic one?

The stuff I'm punting isn't stuff I can automate. It's stuff like, "build me a quick command line tool to model passes from this set of possible orbits" or "convert this bulleted list to a course articulation in the format preferred by the University of California" or "Tell me the 5 worst sentences in this draft and give me proposed fixes."

Human assistants that I would punt this stuff to also consume a lot of wattage and power. ;)

> We didn't have these tools 5 years ago. 5 years ago you dealt with said "drudgery". On the other hand you then say it can't do "most things I do".

I'm not sure why you think this is paradoxical.

I probably eliminate 20-30% of tasks at this point with AI. Honestly, it probably does these tasks better than I would (not better than I could, but you can't give maximum effort on everything). As a result, I get 30-40% more done, and a bigger proportion of it is higher value work.
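As a rough check on how those two percentages relate, here is a minimal sketch. It assumes every task takes the same time and that an offloaded task costs no time at all; both are simplifications not stated in the comment, and `extra_output` is a made-up helper name.

```python
def extra_output(fraction_offloaded: float) -> float:
    """Extra work completed in the same hours if a fraction of tasks is offloaded.

    Assumes equal-sized tasks and zero time spent on offloaded ones.
    """
    return 1.0 / (1.0 - fraction_offloaded) - 1.0

for f in (0.20, 0.30):
    print(f"offload {f:.0%} of tasks -> about {extra_output(f):.0%} more done")
```

Under those assumptions, offloading 20-30% of tasks lands in the same ballpark as the 30-40% figure.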

And, AI sometimes helps me with stuff that I -can't- do, like making a good illustration of something. It doesn't surpass top humans at this stuff, but it surpasses me and probably even where I can get to with reasonable effort.

Mentlo · 2025-08-11T02:25:35 1754879135

It is absolutely impossible that human assistants being given those tasks would use even remotely within the same order of magnitude the power that LLM’s use.

I am not an anti-LLM’er here but having models that are this power hungry and this generalisable makes no sense economically in the long term. Why would the model that you use to build a command tool have to be able to produce poetry? You’re paying a premium for seldom used flexibility.

Either the power drain will have to come down, prices at the consumer margin significantly up or the whole thing comes crashing down like a house of cards.

mlyle · 2025-08-11T03:35:08 1754883308

> It is absolutely impossible that human assistants being given those tasks would use even remotely within the same order of magnitude the power that LLM’s use.

A human eats 2000 kilocalories of food per day.

Thus, sitting around for an hour to do a task takes 350kJ of food energy. Depending on what people eat, it's 350kJ to 7000kJ of fossil fuel energy in to get that much food energy. In the West, we eat a lot of meat, so expect the high end of this range.

The low end-- 350kJ-- is enough to answer 100-200 ChatGPT requests. It's generous, too, because humans also have an amortized share of sleep and non-working time, other energy inputs/uses to keep them alive, eat fancier food, use energy for recreation, drive to work, etc.

Shoot, just lighting their part of the room they sit in is probably 90kJ.
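For anyone who wants to check the arithmetic, a small sketch follows. The 2000 kcal/day, 350kJ, 100-200 requests and 90kJ figures are the ones above; the kcal-to-kJ and Wh-to-kJ factors are standard conversions, and the function names are made up for illustration.

```python
KJ_PER_KCAL = 4.184  # standard conversion
KJ_PER_WH = 3.6      # 1 Wh = 3600 J

def food_energy_per_hour_kj(kcal_per_day: float = 2000.0) -> float:
    """Food energy a person consumes per hour, in kJ."""
    return kcal_per_day * KJ_PER_KCAL / 24.0

def implied_wh_per_request(budget_kj: float, requests: int) -> float:
    """Energy per request implied by 'budget_kj answers this many requests'."""
    return budget_kj / requests / KJ_PER_WH

hourly_kj = food_energy_per_hour_kj()             # roughly 350 kJ
low_wh = implied_wh_per_request(350.0, 200)       # cheaper-request end
high_wh = implied_wh_per_request(350.0, 100)      # pricier-request end
lighting_watts = 90_000 / 3600                    # 90 kJ spread over one hour

print(f"food energy: {hourly_kj:.0f} kJ per hour")
print(f"implied cost: {low_wh:.2f} to {high_wh:.2f} Wh per request")
print(f"lighting: {lighting_watts:.0f} W for an hour")
```

So the 100-200 figure corresponds to assuming somewhere between half a watt-hour and one watt-hour per request.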

> I am not an anti-LLM’er here but having models that are this power hungry and this generalisable makes no sense economically in the long term. Why would the model that you use to build a command tool have to be able to produce poetry? You’re paying a premium for seldom used flexibility.

Modern Mixture-of-Experts (MoE) models don't activate the parameters/do the math related to poetry, but just light up a portion of the model that the router expects to be most useful.

Of course, we've found that broader training for LLMs increases their usefulness even on loosely related tasks.
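To make the routing idea concrete, here is a toy sketch of top-k expert selection. It is an illustration only: the experts are stand-in functions, the router scores are made up, and real models route per token and per layer with learned routers, so experts do not map cleanly onto topics like "poetry".

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, k=2):
    """Run only the k experts with the highest router scores and mix their outputs."""
    ranked = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Four toy "experts"; a real expert is a feed-forward block, not a lambda.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 1.0]  # hypothetical scores for one token

y, used = moe_forward(3.0, router_logits, experts, k=2)
print("experts that ran:", used)
print("experts skipped:", [i for i in range(len(experts)) if i not in used])
```

Only the selected experts do any math for this input; the other two are never called, which is where the compute savings come from.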

> Either the power drain will have to come down, prices at the consumer margin significantly up

I think we all expect some mixture of these: LLM usefulness goes up, LLM cost goes up, LLM efficiency goes up.

Mentlo · 2025-08-11T14:01:09 1754920869

Reading your two comments in conjunction - I find your take reasonable, so I apologise for jumping the gun and going knee first in my previous comment. It was early where I was, but should be no excuse.

I feel like if you're going to go down the route of the energy consumption needed to sustain the entire human organism, you have to do that on the other side as well - as the actual activation cost of human neurons and articulating fingers to operate a keyboard won't be in that range - but you went for the low ball so I'm not going to argue that, as you didn't argue some of the other stuff that sustains humans.

But I will argue the wider implication of your comment that a like-for-like comparison is easy - it's not, so leaving it in the neuron activation space energy cost would probably be simpler to calculate, and there you'd arrive at a smaller ChatGPT ratio. More like 10-20, as opposed to 100-200. I will concede to you that economies of scale mean that there's an energy efficiency in sustaining a ChatGPT workforce compared to a human workforce, if we really want to go full dystopian, but that there's also outsized energy inefficiency in needing the industry and using the materials to construct a ChatGPT workforce large enough to sustain the economies of scale, compared to humans which we kind of have and are stuck with.

There is a wider point that ChatGPT is less autonomous than an assistant, as no matter the tenure with it, you'll not give it the level of autonomy that a human assistant would have as it would self correct to a level where you'd be comfortable with that. So you need a human at the wheel, which will spend some of that human brain power and finger articulation, so you have to add that to the scale of the ChatGPT workflow energy cost.

Having said all that - you make a good point with MoE - but the router activation is inefficient; and the experts are still outsized to the processing required to do the task at hand - but what I argue is that this will get better with further distillation, specialisation and better routing however only for economically viable task pathways. I think we agree on this, reading between the lines.

I would argue though (but this is an assumption, I haven't seen data on neuron activation at task level) that for writing a command-line tool, the neurons still have to activate in a sufficiently large manner to parse a natural language input, abstract it and construct formal language output that will pass the parsers. So you would be spending a higher range of energy than for an average ChatGPT task.

In the end - you seem to agree with me that the current unit economics are unsustainable, and we'll need three processes to make them sustainable - cost going up, efficiency going up and usefulness going up. Unless usefulness goes up radically (which it won't due to scaling limitations of LLM's), full autonomy won't be possible, so the value of the additional labour will need to be very marginal to a human, which - given the scaling laws of GPU's - doesn't seem likely.

Meanwhile - we're telling the masses at large to get on with the programme, without considering that maybe for some classes of tasks it just won't be economically viable; which creates lock in and might be difficult to disentangle in the future.

All because we must maintain the vibes that this technology is more powerful than it actually is. And that frustrates me, because there are plenty of pathways where it's obvious it will be viable, and instead of doubling down on those, we insist on generalisability.

mlyle · 2025-08-11T15:50:59 1754927459

> There is a wider point that ChatGPT is less autonomous than an assistant, as no matter the tenure with it, you'll not give it the level of autonomy that a human assistant would have as it would self correct to a level where you'd be comfortable with that.

IDK. I didn't give human entry level employees that much autonomy. ChatGPT runs off and does things for a minute or two consuming thousands and thousands of tokens, which is a lot like letting someone junior spin for several hours.

Indeed, the cost is so low -- better to let it "see its vision through" than to interrupt it. A lot of the reason why I'd manage junior employees closely is to A) contain costs, and B) prevent discouragement. Neither of those applies here.

(And, you know -- getting the thing back while I remember exactly what I asked and still have some context to rapidly interpret the result-- this is qualitatively different from getting back work from a junior employee hours later).

> that maybe for some classes of tasks it just won't be economically viable;

Running an LLM is expensive. But it's expensive in the sense "serving a human costs about the same as a long distance phone call in the 90's." And the vast majority of businesses did not worry about what they were expending on long distance too much.

And the cost can be expected to decrease, even though the price will go up from "free." I don't expect it will go up too high; some players will have advantages from scale and special sauce to make things more efficient, but it's looking like the barriers to entry are not that substantial.

og_kalu · 2025-08-11T15:20:09 1754925609

The unit economics is fine. Inference cost has reduced several orders of magnitude over the last couple years. It's pretty cheap.

OpenAI reportedly had a loss of $5B last year. That's really small for a service with hundreds of millions of users (most of whom are free and not monetized in any way). That means OpenAI could easily turn a profit with ads, however they may choose to implement it.

hirvi74 · 2025-08-09T01:28:32 1754702912

> so what happens when the "teaching" mode rethinks history, or math fundamentals?

The person attempting to learn either (hopefully) figures out the AI model was wrong, or sadly learns the wrong material. The level of impact is probably quite relative to how useful the knowledge is in one's life.

The good or bad news, depending on how you look at it, is that humans are already great at rewriting history and believing wrong facts, so I am not entirely sure an LLM can do that much worse.

Maybe ChatGPT might just kill off the ignorant like it already has? GPT already told a user to combine bleach and vinegar, which produces chlorine gas. [1]

[1] https://futurism.com/chatgpt-bleach-vinegar

bawana · 2025-08-09T14:15:50 1754748950

Reminds me of our president

https://www.bbc.com/news/world-us-canada-52407177.amp

catigula · 2025-08-08T23:50:24 1754697024

[flagged]

hattmall · 2025-08-09T00:01:02 1754697662

The only solution to those people starving to death is to kill the people that benefit from them starving to death. It's a solved problem, the solution isn't palatable. No one is starving to death because of a lack of engineering prowess.

AdieuToLogic · 2025-08-09T01:11:43 1754701903

>> People are starving to death ...

> The only solution to those people starving to death is to kill the people that benefit from them starving to death.

There are solutions other than "to kill the people that benefit", such as what have existed for many years, including but not limited to:

  - Efforts such as the recently emasculated USAID[0].
  - Humanitarian NGO's[1] such as the World Central Kitchen[2]
    and the Red Cross[3].
  - The will of those who could help to help those in need[4].

Note that none of the aforementioned require executions nor engineering prowess.

0 - https://en.wikipedia.org/wiki/United_States_Agency_for_Inter...

1 - https://en.wikipedia.org/wiki/Non-governmental_organization

2 - https://wck.org/

3 - https://en.wikipedia.org/wiki/International_Red_Cross_and_Re...

4 - https://en.wikipedia.org/wiki/Empathy

catigula · 2025-08-09T00:07:29 1754698049

Figuring out how to align misaligned incentives is an engineering problem. Obviously I disavow what you said, I reject all forms of advocacy of violence.

AdieuToLogic · 2025-08-09T00:58:54 1754701134

> People are starving to death and the world's brightest engineers are ...

This is a political will, empathy, and leadership problem. Not an engineering problem.

shigawire · 2025-08-09T01:20:45 1754702445

Those problems might be more tractable if all of our best and brightest were working on them.

AdieuToLogic · 2025-08-09T02:00:42 1754704842

>>> People are starving to death and the world's brightest engineers are ...

>> This is a political will, empathy, and leadership problem. Not an engineering problem.

> Those problems might be more tractable if all of our best and brightest were working on them.

The ability to produce enough food for those in need already exists, so that problem is theoretically solved. Granted, logistics engineering[0] is a real thing and would benefit from "our best and brightest."

What is lacking most recently, based on empirical observation, is a commitment to benefiting those in need without expectation of remuneration. Or, in other words, empathetic acts of kindness.

Which is a "people problem" (a.k.a. the trio I previously identified).

0 - https://en.wikipedia.org/wiki/Logistics_engineering

seneca · 2025-08-09T01:04:10 1754701450

Famine in the modern world is almost entirely caused by dysfunctional governments and/or armed conflicts. Engineers have basically nothing to do with either of those.

This sort of "there are bad things in the world, therefore focusing on anything else is bad" thinking is generally misguided.

darth_avocado · 2025-08-09T01:18:41 1754702321

Famine is mostly political but engineers (not all of them) definitely have to do with it. If you’re building powerful AI for corporations that are then involved with the political entities that caused the famine, then you can’t claim to basically have nothing to do with it.

seneca · 2025-08-09T03:38:59 1754710739

I totally disagree. "If A is associated with B, and B is associated with C, and C causes D, then A is responsible for D" is tortured logic.

darth_avocado · 2025-08-09T03:51:05 1754711465

You can disagree all you want but the exact wording used in original comment that I responded to was

> Engineers have basically nothing to do with either of those.

The logic here is “If A is actively working to develop capabilities for B, which B offers up to C who then uses it to do D, then A cannot claim to have nothing to do with D.”

trhway · 2025-08-09T01:07:04 1754701624

the existence of poor hungry people feeds the fear of becoming poor and hungry which drives those brightest engineers. I.e. the things work as intended, unfortunately.

abletonlive · 2025-08-08T23:54:44 1754697284

They won’t be honest and explain it to you but I will. Takes like the one you’re responding to are from loathsome pessimistic anti-llm people that are so far detached from reality they can just confidently assert things that have no bearing on truth or evidence. It’s a coping mechanism and it’s basically a prolific mental illness at this point

ezst · 2025-08-09T04:40:39 1754714439

And what does that make you? A "loathsome clueless pro-llm zealot detached from reality"? LLMs are essentially next word predictors marketed as oracles. And people use them as that. And that's killing them. Because LLMs don't actually "know", they don't "know that they don't know", and won't tell you they are inadequate when they are. And that's a problem left completely unsolved. At the core of very legitimate concerns about the proliferation of LLMs. If someone here sounds irrational and "coping", it very much appears to be you.

jon-wood · 2025-08-09T13:44:48 1754747088

> so far detached from reality they can just confidently assert things that have no bearing on truth or evidence

So not unlike an LLM then?

virgil_disgr4ce · 2025-08-08T23:58:51 1754697531

> working on problems that don't even really need to be solved

Very, very few problems _need_ to be solved. Feeding yourself is a problem that needs to be solved in order for you to continue living. People solve problems for different reasons. If you don't think LLMs are valuable, you can just say that.

crawfordcomeaux · 2025-08-09T04:12:43 1754712763

The few problems humanity has that need to be solved:

1. How to identify humanity's needs on all levels, including cosmic ones...(we're in the Space Age so we need to prepare ourselves for meeting beings from other places)

2. How to meet all of humanity's needs

Pointing this out regularly is probably necessary because the issue isn't why people are choosing what they're doing...it's that our systems actively disincentivize collectively addressing these two problems in a way that doesn't sacrifice people's wellbeing/lives... and most people don't even think about it like this.

catigula · 2025-08-09T00:08:35 1754698115

The notion that simply pretending to not understand that I was making a value judgment about worth is an argument is tiring.

vermilingua · 2025-08-08T23:45:31 1754696731

Well, we all thought advertising was the worst thing to come out of the tech industry, someone had to prove us wrong!

hirvi74 · 2025-08-09T01:28:56 1754702936

Just wait until the two combine.

airhangerf15 · 2025-08-08T19:50:51 1754682651

An H100 is a $20k USD card and has 80GB of vRAM. Imagine a 2U rack server with $100k of these cards in it. Now imagine an entire rack of these things, plus all the other components (CPUs, RAM, passive cooling or water cooling) and you're talking $1 million per rack, not including the costs to run them or the engineers needed to maintain them. Even the "cheaper"

I don't think people realize the size of these compute units.

When the AI bubble pops is when you're likely to be able to realistically run good local models. I imagine some of these $100k servers going for $3k on eBay in 10 years, and a lot of electricians being asked to install new 240v connectors in makeshift server rooms or garages.

semi-extrinsic · 2025-08-08T20:20:46 1754684446

What do you mean 10 years?

You can pick up a DGX-1 on eBay right now for less than $10k. 256 GB vRAM (HBM2, no less), NVLink capability, 512 GB RAM, 40 CPU cores, 8 TB SSD, 100 Gbit HBAs. Equivalent non-Nvidia branded machines are around $6k.

They are heavy, noisy like you would not believe, and a single one just about maxes out a 16A 240V circuit. Which also means it produces 13 000 BTU/hr of waste heat.
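The heat figure follows directly from the circuit rating, since essentially all the electrical power ends up as heat. A quick sketch of the conversion, using the standard watts-per-BTU/hr factor and the usual 12,000 BTU/hr definition of a ton of refrigeration (the function names are made up for illustration):

```python
WATTS_PER_BTU_PER_HOUR = 0.29307107  # standard conversion: 1 BTU/hr in watts
BTU_PER_HOUR_PER_TON = 12_000        # one ton of refrigeration

def circuit_watts(volts: float, amps: float) -> float:
    """Power drawn by a load that maxes out the given circuit."""
    return volts * amps

def watts_to_btu_per_hour(watts: float) -> float:
    """Convert a heat load in watts to BTU/hr."""
    return watts / WATTS_PER_BTU_PER_HOUR

watts = circuit_watts(240, 16)
btu_per_hour = watts_to_btu_per_hour(watts)
tons = btu_per_hour / BTU_PER_HOUR_PER_TON

print(f"{watts:.0f} W is about {btu_per_hour:,.0f} BTU/hr, or {tons:.2f} tons of cooling")
```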

kj4ips · 2025-08-08T21:32:42 1754688762

Fair warning: the BMCs on those suck so bad, and the firmware bundles are painful, since you need a working nvidia-specific container runtime to apply them, which you might not be able to get up and running because of a firmware bug causing almost all the ram to be presented as nonvolatile.

iJohnDoe · 2025-08-09T03:38:52 1754710732

Are there better paths you would suggest? Any hardware people have reported better luck with?

kj4ips · 2025-08-09T06:00:38 1754719238

Honestly, unless you //really// need nvlink/ib (meaning that copies and pcie trips are your bottleneck), you may do better with whatever commodity system with sufficient lanes, slots, and CFM is available at a good price.

ksherlock · 2025-08-08T21:42:00 1754689320

It's not waste heat if you only run it in the winter.

hdgvhicv · 2025-08-08T22:24:44 1754691884

Only if you ignore that both gas furnaces and heat pumps are more efficient than resistive loads.

tgma · 2025-08-08T22:36:48 1754692608

Heat pump sure, but how is gas furnace more efficient than resistive load inside the house? Do you mean more economical rather than more efficient (due to gas being much cheaper/unit of energy)?

meatmanek · 2025-08-08T23:07:06 1754694426

Depends where your electricity comes from. If you're burning fossil fuels to make electricity, that's only about 40% efficient, so you need to burn 2.5x as much fuel to get the same amount of heat into the house.

tgma · 2025-08-09T01:32:02 1754703122

Sure. That has nothing to do with the efficiency of your system though. As far as you are concerned this is about your electricity consumption for the home server vs gas consumption. In that sense resistive heat inside the home is 100% efficient compared to gas furnace; the fuel cost might be lower on the latter.

mlyle · 2025-08-09T03:47:52 1754711272

Sure, it's "equally efficient" if you ignore the inefficient thing that is done outside where you draw the system box, directly in proportion to how much you do it.

Heating my house with a giant diesel-powered radiant heater from across the street is infinitely efficient, too, since I use no power in my house.

tgma · 2025-08-09T05:25:11 1754717111

If you don’t close the box of the system at some point to isolate the input, efficiency would be meaningless. I think in the context of the original post, suggesting running a server in winter would be a zero-waste endeavor if you need the heat anyway, it is perfectly clear that the input is electricity to your home at a certain $/kWh and gas at a certain $/BTU. Under that premise, it is fair to say that would not be true if you have a heat pump deployed but would be true compared to gas furnace in terms of efficiency (energy consumed for unit of heat), although not necessarily true economically.

hdgvhicv · 2025-08-09T08:07:07 1754726827

Generating 1kWh of heat with electric/resistive is more expensive than gas, which itself is more expensive than a heat pump, based on the cost of fuel to go in

If your grid is fossil fuels burning the fuel directly is more efficient. In all cases a heat pump is more efficient.
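A rough way to compare the three options is to count the fuel energy burned per kWh of heat delivered into the house. The sketch below uses the 40% grid efficiency and COP of 3 quoted elsewhere in the thread; the 95% furnace efficiency is my assumption, and the function names are hypothetical.

```python
GRID_EFFICIENCY = 0.40     # fossil fuel to wall socket, figure quoted in the thread
FURNACE_EFFICIENCY = 0.95  # assumption: a modern condensing gas furnace
HEAT_PUMP_COP = 3.0        # COP figure quoted in the thread

def fuel_for_resistive(grid_eff: float = GRID_EFFICIENCY) -> float:
    """kWh of fuel burned at the plant per kWh of resistive heat at home."""
    return 1.0 / grid_eff

def fuel_for_furnace(furnace_eff: float = FURNACE_EFFICIENCY) -> float:
    """kWh of gas burned in the furnace per kWh of heat delivered."""
    return 1.0 / furnace_eff

def fuel_for_heat_pump(cop: float = HEAT_PUMP_COP,
                       grid_eff: float = GRID_EFFICIENCY) -> float:
    """kWh of fuel burned at the plant per kWh of heat moved by the heat pump."""
    return 1.0 / (cop * grid_eff)

for name, fuel in [("resistive (server)", fuel_for_resistive()),
                   ("gas furnace", fuel_for_furnace()),
                   ("heat pump", fuel_for_heat_pump())]:
    print(f"{name:>18}: {fuel:.2f} kWh fuel per kWh heat")
```

With a nuclear or renewable grid the fuel comparison changes entirely, so treat this as a model of the fossil-grid case only.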

mlyle · 2025-08-09T16:44:22 1754757862

I think this is pretty silly either way.

- There's an upstream loss on electricity directly in proportion to how much you use; ignoring this tilts the analysis in favor of electricity.

- You pay more for heat from electricity than gas, in part because of this loss.

devmor · 2025-08-09T00:21:30 1754698890

It’d be fun to actually calculate this efficiency. My local power is mostly nuclear so I wonder how that works out.

fulafel · 2025-08-09T07:37:00 1754725020

You accelerate the climate catastrophe so there's less need for heating in the long run.

Tade0 · 2025-08-08T23:50:56 1754697056

I'm in the market for an oven right now and 230V/16A is the voltage/current the one I'll probably be getting operates under.

At 90°C you can do sous vide, so basically use that waste heat entirely.

For such temperatures you'd need a CO2 heat pump, which is still expensive. I don't know about gas, as I don't even have a line to my place.

_zoltan_ · 2025-08-09T00:17:54 1754698674

90C for sous vide??? You're going to kill any meal at 90.

Tade0 · 2025-08-09T18:39:49 1754764789

Make it "up to 90°C". 5th quarter meats are better done in the higher end of sous vide temperatures.

Point being, you can throttle your equipment to the desired temperature and use that energy effectively.

mewpmewp2 · 2025-08-09T00:32:51 1754699571

How can you bear to eat sous vide though? I've tried it for months and years, and I still find it troublesome. So mushy, nothing to enjoy.

SAI_Peregrinus · 2025-08-09T01:52:49 1754704369

Did you skip searing it after sous vide? Did you sous vide it to the "instantly kill all bacteria" temperature (145°F for steak) thereby overcooking & destroying it, or did you sous vide to a lower temperature (at most 125°F) so that it'd reach a medium-rare 130°F-140°F after searing & carryover cooking during resting? It should have a nice seared crust, and the inside absolutely shouldn't be mushy.

brookst · 2025-08-09T05:04:03 1754715843

Please research this. Done right, sous vide is amazing. But it is almost never the only technique used. Just like when you slow roast a prime rib at 200f, you MUST sear to get Maillard reaction and a satisfying texture.

energy123 · 2025-08-09T00:50:19 1754700619

Seasonality in git commit frequency

eulgro · 2025-08-08T22:06:01 1754690761

> 13 000 BTU/hr

In sane units: 3.8 kW

andy99 · 2025-08-08T22:17:51 1754691471

You mean 1.083 tons of refrigeration

Skunkleton · 2025-08-08T22:41:36 1754692896

> In sane units: 3.8 kW

5.1 Horsepower

amy214 · 2025-08-09T01:51:34 1754704294

> > In sane units: 3.8 kW

> 5.1 Horsepower

0-60 in 1.8 seconds

oblio · 2025-08-09T07:21:54 1754724114

Again, in sane units:

0-100 in 1.92 seconds

_kb · 2025-08-09T00:14:44 1754698484

3.8850 poncelet

ta12653421 · 2025-08-08T23:50:45 1754697045

But ... can it run Crysis?

:D

UnnoTed · 2025-08-09T14:05:37 1754748337

It makes you run into a crysis

markdown · 2025-08-09T01:33:26 1754703206

How many football fields of power?

semi-extrinsic · 2025-08-09T08:10:56 1754727056

The choice of BTU/hr was firmly tongue in cheek for our American friends.

quickthrowman · 2025-08-08T21:11:59 1754687519

You’ll need (2) 240V 20A 2P breakers, one for the server and one for the 1-ton mini-split to remove the heat ;)

Dylan16807 · 2025-08-08T21:22:52 1754688172

Matching AC would only need 1/4 the power, right? If you don't already have a method to remove heat.

quickthrowman · 2025-08-08T21:34:23 1754688863

Cooling BTUs already take the coefficient of performance of the vapor-compression cycle into account. 4w of heat removed for each 1w of input power is around the max COP for an air cooled condenser, but adding an evaporative cooling tower can raise that up to ~7.

I just looked at a spec sheet for a 230V single-phase 12k BTU mini-split and the minimum circuit ampacity was 3A for the air handler and 12A for the condenser, add those together for 15A, divide by .8 is 18.75A, next size up is 20A. Minimum circuit ampacity is a formula that is (roughly) the sum of the full load amps of the motor(s) inside the piece of equipment times 1.25 to determine the conductor size required to power the equipment.

So the condensing unit likely draws ~9.5-10A max and the air handler around ~2.4A, and both will have variable speed motors that would probably only need about half of that to remove 12k BTU of heat, so ~5-6A or thereabouts should do it, which is around 1/3rd of the 16A server, or a COP of 3.

Dylan16807 · 2025-08-08T22:07:41 1754690861

Well I don't know why that unit wants so many amps. The first 12k BTU window unit I looked at on amazon uses 12A at 115V.

quickthrowman · 2025-08-10T14:00:48 1754834448

That is probably just bad data entry at Amazon. I don’t ever trust the specification data on Amazon, I look for the manufacturer’s spec sheet/cutsheet.

In this case, 12A is the maximum continuous load allowed on a 15A breaker. The unit itself probably uses between 900-1000w (7.5A to 8.3A), the spec sheet might say 12A to encourage a dedicated circuit for the A/C unit which then gets added to Amazon’s specs on their website.

Dylan16807 · 2025-08-10T22:33:42 1754865222

I think I finally found an actual product page: https://bdachelp.zendesk.com/hc/en-us/articles/2319602600002...

The amazon page specifically said 1354 watts, but I think that's actually for the 14300BTU model. 12000BTU is 9.72 amps.

Anyway, doesn't this make my actual argument stronger? These units fit even better into a normal circuit than I thought, and make the mini-split look even worse in comparison.

quickthrowman · 2025-08-11T13:15:51 1754918151

4.5-5A at 240V = 9.72A at 120V

It’s the same level of power consumption. I’m not even sure what you’re asking at this point, to be honest.
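The equivalence is just power = volts x amps held constant. A tiny sketch, ignoring power factor (an assumption) and using a hypothetical helper name:

```python
def equivalent_amps(amps: float, from_volts: float, to_volts: float) -> float:
    """Current that delivers the same power at a different voltage."""
    watts = amps * from_volts
    return watts / to_volts

window_unit_watts = 9.72 * 120                      # the 12000 BTU window unit
amps_at_240 = equivalent_amps(9.72, 120, 240)

print(f"{window_unit_watts:.0f} W is 9.72 A at 120 V or {amps_at_240:.2f} A at 240 V")
```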

Dylan16807 · 2025-08-11T22:25:18 1754951118

You were talking about needing a second 240V 20A circuit, and you later backed that up by citing the spec sheet of 230V mini-split with a minimum circuit rating of 15A.

My argument was that you do not need such a circuit.

quickthrowman · 2025-08-11T22:49:22 1754952562

Technically you’re correct, a 12000 BTU minisplit only uses around 1000 watts while running which is just over 4A.

The breaker size being 20A 2P is a consequence of the NEC requiring you to size the wire based off the equipment nameplate rating of 15A, which is based off the full load amps of the motors inside the equipment.

Full load amps is the max amount of current a motor can draw at a specific voltage and is used for sizing wire and overcurrent protection for a piece of equipment. It doesn’t always match up the current a motor draws while it’s running normally. You take full load amps times 1.25 to get minimum circuit ampacity, which you use to size the conductors.

So while you are correct that a 240V 12000 BTU minisplit won't draw anywhere near 20A, the specific minisplit I looked at required a 20A breaker due to the minimum circuit ampacity being 15A. If the MCA was 12A, you could use a 15A breaker; an MCA of 8A would allow using a 10A breaker, and so on.

If you use fuses, you can size the overcurrent protection at 100%, breakers require 125% of the load for a continuous load. So you could use a 30A fusible disconnect switch fused at 15A for a unit with an MCA of 15A.
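The sizing rule as described in this comment can be sketched in a few lines. This is only an illustration of the arithmetic, not a substitute for the NEC or the equipment nameplate; the list of breaker sizes is an assumed one, and the function names are made up.

```python
# Assumed list of available breaker ratings in amps, for illustration only.
BREAKER_SIZES_A = [10, 15, 20, 25, 30, 35, 40, 45, 50]

def min_circuit_ampacity(full_load_amps: list[float]) -> float:
    """MCA as described above: roughly 1.25 x the sum of motor full load amps."""
    return 1.25 * sum(full_load_amps)

def breaker_size(mca_amps: float) -> int:
    """Smallest listed breaker whose 80% continuous rating covers the MCA."""
    required = mca_amps * 1.25  # same as dividing by 0.8
    for size in BREAKER_SIZES_A:
        if size >= required:
            return size
    raise ValueError("load exceeds the largest breaker in the list")

# Mini-split from the spec sheet: 12 A condenser + 3 A air handler = 15 A MCA.
print(breaker_size(12 + 3))   # 15 A / 0.8 = 18.75 A, so the next size up
print(breaker_size(12))       # the 12 A MCA example
print(breaker_size(8))        # the 8 A MCA example
print(f"{min_circuit_ampacity([9.6, 2.4]):.1f} A MCA from 9.6 A + 2.4 A motors")
```

Fuses sized at 100% of the load would skip the extra 1.25 factor in `breaker_size`.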

Dylan16807 · 2025-08-12T00:18:58 1754957938

That's not the angle I'm taking. I'm not saying anything about what the mini-split actually uses. Give it the circuit that the nameplate asks for.

Instead I'm saying that particular minisplit is a lazy design and we can get a 12000 or higher BTU unit with a much smaller nameplate rating. Not only will it only need a single-pole breaker, the required circuit probably already exists.

Scoundreller · 2025-08-08T21:39:31 1754689171

Just air freight them from 60 degrees North to 60 degrees South and vice versa every 6 months.

kelnos · 2025-08-08T21:36:16 1754688976

Well, get a heat pump with a good COP of 3 or more, and you won't need quite as much power ;)

xtiansimon · 2025-08-09T13:27:03 1754746023

> “They are heavy, noisy like you would not believe, … produces … waste heat.”

Haha. I bought a 20 yro IBM server off eBay for a song. It was fun for a minute. Soon became a doorstop and I sold it as pickup-only on eBay for $20. Beast. Never again have one in my home.

yencabulator · 2025-08-09T19:26:34 1754767594

That's about the era my company was an IBM reseller. Once I was kneeling behind 8x1U starting up and all the fans went to max speed for 3 seconds. Never put rackmount hardware in a room that is near anything living.

guenthert · 2025-08-09T14:42:53 1754750573

Get an AS400. Those were actually expected to be installed in an office, rather than a server room. Might still be perceived as loud at home, but won't be deafening and probably not louder than some gaming rigs.

nulltype · 2025-08-13T16:05:59 1755101159

> What do you mean 10 years?

Didn’t the DGX-1 come out 9 years ago?

CamperBob2 · 2025-08-08T21:30:03 1754688603

Are you talking about the guy in Temecula running two different auctions with some of the same photos (356878140643 and 357146508609, both showing a missing heat sink)? Interesting, but seems sketchy.

How useful is this Tesla-era hardware on current workloads? If you tried to run the full DeepSeek R1 model on it at (say) 4-bit quantization, any idea what kind of TTFT and TPS figures might be expected?

oceanplexian · 2025-08-08T23:37:36 1754696256

I can’t speak to the Tesla stuff but I run an Epyc 7713 with a single 3090 and creatively splitting the model between GPU/8 channels of DDR4 I can do about 9 tokens per second on a q4 quant.

CamperBob2 · 2025-08-08T23:44:14 1754696654

Impressive. Is that a distillation, or the real thing?

justincormack · 2025-08-09T13:00:07 1754744407

Tesla doesn't support 4 bit float.

invaliduser · 2025-08-08T20:05:22 1754683522

Even if the AI bubble does not pop, your prediction about those servers being available on eBay in 10 years will likely be true, because some datacenters will simply upgrade their hardware and resell their old ones to third parties.

potatolicious · 2025-08-08T21:32:21 1754688741

Would anybody buy the hardware though?

Sure, datacenters will get rid of the hardware - but only because it's no longer commercially profitable to run them, presumably because compute demands have eclipsed their abilities.

It's kind of like buying a used GeForce 980Ti in 2025. Would anyone buy them and run them besides out of nostalgia or curiosity? Just the power draw makes them uneconomical to run.

Much more likely every single H100 that exists today becomes e-waste in a few years. If you have need for H100-level compute you'd be able to buy it in the form of new hardware for way less money and consuming way less power.

For example, if you actually wanted 980Ti-level compute in a desktop today you can just buy an RTX 5050, which is ~50% faster, consumes half the power, and can be had for $250 brand new. Oh, and it is well-supported by modern software stacks.

CBarkleyU · 2025-08-08T21:49:10 1754689750

Off topic, but I bought my (still in active use) 980ti literally 9 years ago for that price. I know, I know, inflation and stuff, but I really expected more than 50% bang for my buck after 9 whole years…

nucleardog · 2025-08-09T00:53:40 1754700820

> Sure, datacenters will get rid of the hardware - but only because it's no longer commercially profitable to run them, presumably because compute demands have eclipsed their abilities.

I think the existence of a pretty large secondary market for enterprise servers and such kind of shows that this won't be the case.

Sure, if you're AWS and what you're selling _is_ raw compute, then hardware a couple of generations old may not be sufficiently profitable for you anymore... but there are a lot of other places that hardware could be applied to with different requirements or higher margins where it may still be.

Even if they're only running models a generation or two out of date, there are a lot of use cases today, with today's models, that will continue to work fine going forward.

And that's assuming it doesn't get replaced for some other reason that only applies when you're trying to sell compute at scale. A small uptick in the failure rate may make a big dent at OpenAI but not for a company that's only running 8 cards in a rack somewhere and has a few spares on hand. A small increase in energy efficiency might offset the capital outlay to upgrade at OpenAI, but not for the company that's only running 8 cards.

I think there's still plenty of room in the market in places where running inference "at cost" would be profitable that are largely untapped right now because we haven't had a bunch of this hardware hit the market at a lower cost yet.

nullc · 2025-08-09T03:57:59 1754711879

I have around a thousand Broadwell cores in 4-socket systems that I got for ~nothing from these sorts of sources... pretty useful. (I mean, I guess literally nothing, since I extracted the storage backplanes and sold them for more than the systems cost me.) I try to run tasks on zen3/4 during the hours when power is cheap, unless the job is gonna take weeks running on those alone, in which case I crank up the rest of the cores.

And 40 P40 GPUs that cost very little, which are a bit slow, but with 24GB per GPU they're pretty useful for memory-bandwidth-bound tasks (and not horribly noncompetitive in terms of watts per TB/s).

Given highly variable time of day power it's also pretty useful to just get 2x the computing power (at low cost) and just run it during the low power cost periods.

So I think datacenter scrap is pretty useful.
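A minimal sketch of the time-of-day argument, assuming a made-up two-rate tariff and made-up power figures (none of these numbers come from my actual bills): doubling the hardware and running it only off-peak does the same machine-hours of work for a smaller energy bill.

```python
def daily_energy_cost(power_kw, hours_by_price):
    """Energy cost for one day.

    power_kw:       total draw of the machines while they are running
    hours_by_price: list of (hours, price_per_kwh) pairs the machines run
    """
    return sum(power_kw * hours * price for hours, price in hours_by_price)

# Hypothetical tariff: 12 off-peak hours and 12 peak hours per day.
OFF_PEAK = 0.08  # $/kWh, assumed
PEAK = 0.30      # $/kWh, assumed

# Option A: 10 kW of machines running around the clock.
always_on = daily_energy_cost(10, [(12, OFF_PEAK), (12, PEAK)])

# Option B: twice the machines (20 kW) running only off-peak.
# Same machine-hours of compute per day as option A.
off_peak_only = daily_energy_cost(20, [(12, OFF_PEAK)])

print(f"always on:     ${always_on:.2f}/day")
print(f"off-peak only: ${off_peak_only:.2f}/day")
```

This ignores the purchase price of the extra machines, which is the point: it only works out when the second batch of hardware is close to free.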

mindslight · 2025-08-09T02:16:49 1754705809

It's interesting to think about scenarios where that hardware would get used only part of the time, like say when the sun is shining and/or when dwelling heat is needed. The biggest sticking point would seem to be all of the capex for connecting them to do something useful. It's a shame that PLX switch chips are so expensive.

airhangerf15 · 2025-08-09T01:44:23 1754703863

The 5050 doesn't support 32-bit PhysX. So a bunch of games would be missing a ton of stuff. You'd still need the 980 running with it for older PhysX games because nVidia.

belter · 2025-08-08T20:16:42 1754684202

Except their insane electricity demands will still be the same, meaning nobody will buy them. You have plenty of SPARC servers on Ebay.

cicloid · 2025-08-08T21:01:13 1754686873

There is also a community of users known for not making sane financial decisions and keeping older technologies working in their basements.

dijit · 2025-08-08T21:33:12 1754688792

But we are few, and fewer still who will go for high power consumption devices with esoteric cooling requirements that generate a lot of noise.

DecentShoes · 2025-08-09T08:16:09 1754727369

This seems likely. Blizzard even sold off old World of Warcraft servers. You can still get them on eBay.

mattmanser · 2025-08-08T21:21:57 1754688117

Someone's take on AI was that we're collectively investing billions in data centers that will be utterly worthless in 10 years.

Unlike the investments in railways or telephone cables or roads or any other sort of architecture, this investment has a very short lifespan.

Their point was that whatever your take on AI, the present investment in data centres is a ridiculous waste and will always end up as a huge net loss compared to most other investments our societies could spend it on.

Maybe we'll invent AGI and they'll be proven wrong as the data centres pay for themselves many times over, but I suspect they'll ultimately be proved right and it'll all end up as landfill.

toast0 · 2025-08-08T22:07:53 1754690873

The servers may well be worthless (or at least worth a lot less), but that's pretty much true for a long time. Not many people want to run on 10 year old servers (although I pay $30/month for a dedicated server that's dual Xeon L5640 or something like that, which is about 15 years old).

The servers will be replaced, the networking equipment will be replaced. The building will still be useful, the fiber that was pulled to internet exchanges/etc will still be useful, the wiring to the electric utility will still be useful (although I've certainly heard stories of datacenters where much of the floor space is unusable, because power density of racks has increased and the power distribution is maxed out)

hattmall · 2025-08-09T00:31:50 1754699510

I have a server in my office that's from 2009 and is still far more economical to run than buying any sort of cloud compute. By at least an order of magnitude.

alexandre_m · 2025-08-09T03:20:01 1754709601

Perhaps if you only need to run some old PHP app.

What kind of disk and how much memory is in there?

hattmall · 2025-08-11T03:52:28 1754884348

72 gigs of RAM, 4x SCSI 15K drives I think. Yeah, I mean it's not doing anything crazy: running a lot of virtual machines, random servers; probably the most intense thing is video transcoding. It works well though and, like I said, it's way, way cheaper than running the same stuff on cloud infrastructure. I think I bought it for like $500 about 10 years ago. I started saving about $76 a month just from moving virtual desktops off AWS onto it when I got it, so it easily paid for itself in a year.

bespokedevelopr · 2025-08-08T22:46:19 1754693179

If it is all a waste and a bubble, I wonder what the long term impact will be of the infrastructure upgrades around these dcs. A lot of new HV wires and substations are being built out. Cities are expanding around clusters of dcs. Are they setting themselves up for a new rust belt?

thenthenthen · 2025-08-10T07:23:15 1754810595

There are a lot of examples of former industrial sites (rust belts) that are now redeveloped into data center sites because the infra is already partly there and the environment might be beneficial, politically, environmentally/geographically. For example many old industrial sites relied on water for cooling and transportation. This water can now be used to cool data centers. I think you are onto something though, if you depart from the history of these places and extrapolate into the future.

abeyer · 2025-08-08T23:04:46 1754694286

Or early provisioning for massively expanded electric transit and EV charging infrastructure, perhaps.

hirvi74 · 2025-08-09T01:33:33 1754703213

Maybe the dcs could be turned into some mean cloud gaming servers?

dortlick · 2025-08-08T21:39:26 1754689166

Sure, but what about the collective investment in smartphones, digital cameras, laptops, even cars. Not much modern technology is useful and practical after 10 years, let alone 20. AI is probably moving a little faster than normal, but technology depreciation is not limited to AI.

gscott · 2025-08-09T09:49:01 1754732941

If a coal-powered electric plant is next to the data center, you might be able to get electricity cheap enough to keep it going.

Datacenters could go into the business of making personal PCs or workstations using the older NVIDIA cards and sell them.

jonplackett · 2025-08-08T22:06:25 1754690785

They probably are right, but a counter argument could be how people thought going to the moon was pointless and insanely expensive, but the technology to put stuff in space and have GPS and comms satellites probably paid that back 100x

vl · 2025-08-08T22:30:27 1754692227

Reality is that we don’t know how much of a trope this statement is.

I think we would get all this technology without going to the moon or Space Shuttle program. GPS, for example, was developed for military applications initially.

DaiPlusPlus · 2025-08-08T22:22:58 1754691778

I don’t mean to invalidate your point (about genuine value arising from innovations originating from the Apollo program), but GPS and comms satellites (and heck, the Internet) are all products of nuclear weapons programs rather than civilian space exploration programs (ditto the Space Shuttle, and I could go on…).

CamperBob2 · 2025-08-09T01:54:26 1754704466

Yes, and no. The people working on GPS paid very close attention to the papers from JPL researchers describing their timing and ranging techniques for both Apollo and deep-space probes. There was more cross-pollination than meets the eye.

somenameforme · 2025-08-09T03:34:45 1754710485

It's not that going to the Moon was pointless, but stopping after we'd done little more than plant a flag was. Wernher von Braun was the head architect of the Apollo Program and the Moon was intended as little more than a stepping stone towards setting up a permanent colony on Mars. Incidentally this is also the technical and ideological foundation of what would become the Space Shuttle and ISS, which were both also supposed to be little more than small scale tools on this mission, as opposed to ends in and of themselves.

Imagine if Columbus verified that the New World existed, planted a flag, came back - and then everything was cancelled. Or similarly for literally any colonization effort ever. That was the one downside of the space race - what we did was completely nonsensical, and made sense only because of the context of it being a 'race' and politicians having no vision beyond the tip of their nose.

jonplackett · 2025-08-10T10:29:29 1754821769

I’ve been enjoying that Apple TV show with alternative history as if we’d kept going. It’s kinda dumb in parts but still fun to imagine!

somenameforme · 2025-08-10T14:11:33 1754835093

For All Mankind. I tried getting into that, but the identity politics stuff (at least in first season) was way too intense for me. I'm not averse to it at all in practice (Deep Space Nine is one of my favorite series of all time) but, for me, it went way beyond the line from advocacy to preachiness.

pbh101 · 2025-08-09T02:01:09 1754704869

This isn’t my original take but if it results in more power buildout, especially restarting nuclear in the US, that’s an investment that would have staying power.

mensetmanusman · 2025-08-08T21:28:52 1754688532

Utterly? Moore's law per power requirement is dead; lower power units can run electric heating for small towns!

torginus · 2025-08-08T22:05:09 1754690709

My personal sneaking suspicion is that publicly offered models are using way less compute than thought. In modern mixture-of-experts models, the router does top-k expert selection, where only some experts are evaluated for each token, meaning even SOTA models aren't using much more compute than a 70-80B non-MoE model.
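A rough sketch of that arithmetic, assuming a hypothetical MoE configuration (the expert count and parameter sizes below are made-up illustrative numbers, not any real model's, and the per-layer structure is flattened into one total): when only k experts run per token, per-token compute tracks the active parameter count rather than the total.

```python
def moe_param_counts(shared_b, expert_b, n_experts, top_k):
    """Return (total, active) parameter counts in billions.

    shared_b:  parameters every token uses (attention, embeddings, ...)
    expert_b:  parameters in one expert, summed over all layers
    n_experts: experts available in the model
    top_k:     experts the router actually evaluates per token
    """
    if not 0 < top_k <= n_experts:
        raise ValueError("top_k must be between 1 and n_experts")
    total = shared_b + n_experts * expert_b
    active = shared_b + top_k * expert_b
    return total, active

# Hypothetical configuration, chosen only for illustration.
total, active = moe_param_counts(shared_b=20, expert_b=7, n_experts=64, top_k=8)
print(f"total: {total}B params, active per token: {active}B params")
```

Note this only covers compute. All of the experts still have to sit in memory somewhere, so the total count is what decides how many GPUs the model needs.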

ActorNightly · 2025-08-08T20:18:08 1754684288

To piggyback on this: at the enterprise level these days, the question is really not "how are we going to serve all these users". It comes down to the fact that investors believe they will eventually see a return on investment, so they pay whatever is needed to get the infra.

Even if you didn't have optimizations involved in terms of job scheduling, they would just build as many warehouses as necessary filled with as many racks as necessary to serve the required user base.

brikym · 2025-08-09T00:53:23 1754700803

As a non-American the 240V thing made me laugh.

eitally · 2025-08-08T22:23:48 1754691828

What I wonder is what this means for Coreweave, Lambda and the rest, who are essentially just renting out fleets of racks like this. Does it ultimately result in acquisition by a larger player? Severe loss of demand? Can they even sell enough to cover the capex costs?

cootsnuck · 2025-08-09T05:01:21 1754715681

It means they're likely going to be left holding a very expensive bag.

adw · 2025-08-08T22:29:11 1754692151

These are also depreciating assets.

torginus · 2025-08-08T21:29:59 1754688599

I wonder if it's feasible to hook up NAND flash with a high bandwidth link necessary for inference.

Each of these NAND chips has hundreds of dies of flash stacked inside, and they are hooked up to the same data line, so just one of them can talk at a time, and they still achieve >1GB/s bandwidth. If you could hook them up in parallel, you could have 100s of GB/s of bandwidth per chip.

potatolicious · 2025-08-08T21:39:43 1754689183

NAND is very, very slow relative to RAM, so you'd pay a huge performance penalty there. But maybe more importantly my impression is that memory contents mutate pretty heavily during inference (you're not just storing the fixed weights), so I'd be pretty concerned about NAND wear. Mutating a single bit on a NAND chip a million times over just results in a large pile of dead NAND chips.

torginus · 2025-08-08T22:01:08 1754690468

No, it's not slow - a single NAND chip in an SSD offers >1GB/s of bandwidth - inside the chip there are 100+ dies actually holding the data, but in SSDs only one of them is active when reading/writing.

You could probably make special NAND chips where all of them can be active at the same time, which means you could get 100GB/s+ bandwidth out of a single chip.

This would be useless for data storage scenarios, but very useful when you have huge amounts of static data you need to read quickly.

slickytail · 2025-08-08T22:43:22 1754693002

The memory bandwidth on an H100 is 3TB/s, for reference. This number is the limiting factor in the size of modern LLMs. 100GB/s isn't even in the realm of viability.
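To put rough numbers on that, a minimal sketch assuming single-stream decoding where every weight is read from memory once per generated token, and ignoring KV cache traffic and compute time. The 70B-parameter, 8-bit model is a hypothetical example, not a claim about any deployed model.

```python
def weights_gb(params_billion, bits_per_weight):
    """Size of the weights in GB for a given parameter count and precision."""
    return params_billion * bits_per_weight / 8

def max_tokens_per_second(bandwidth_gb_s, gb_read_per_token):
    """Upper bound on decode speed when memory bandwidth is the bottleneck."""
    return bandwidth_gb_s / gb_read_per_token

# Hypothetical dense model: 70B parameters stored at 8 bits each.
model_gb = weights_gb(params_billion=70, bits_per_weight=8)

h100 = max_tokens_per_second(3000, model_gb)  # 3 TB/s, figure quoted above
nand = max_tokens_per_second(100, model_gb)   # 100 GB/s, the proposed NAND chip

print(f"weights: {model_gb:.0f} GB")
print(f"H100-class memory: at most {h100:.1f} tokens/s per stream")
print(f"100 GB/s memory:   at most {nand:.1f} tokens/s per stream")
```

Batching changes the picture for throughput, since one pass over the weights serves many requests at once, but the per-stream ceiling stays where it is.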

torginus · 2025-08-09T08:31:43 1754728303

That bandwidth is for the whole GPU, which has 6 memory chips. But anyways, what I'm proposing isn't for the high-end and training, but for making inference cheap.

And I was somewhat conservative with the numbers, a modern budget SSD with a single NAND can do more than 5GB/s read speed.

RagnarD · 2025-08-09T20:47:58 1754772478

An RTX 6000 Pro (NVIDIA Blackwell GPU) has 96GB of VRAM and can be had for around $7700 currently (at least, the lowest price I've found.) It plugs into standard PC motherboard PCIe slots. The Max Q edition has slightly less performance but a max TDP of only 300W.

dboreham · 2025-08-08T23:42:38 1754696558

They'll be in landfill in 10 years.

neko_ranger · 2025-08-08T20:05:02 1754683502

Four H100 in a 2U rack didn't sound impressive, but that is accurate:

>A typical 1U or 2U server can accommodate 2-4 H100 PCIe GPUs, depending on the chassis design.

>In a 42U rack with 20x 2U servers (allowing space for switches and PDU), you could fit approximately 40-80 H100 PCIe GPUs.

michaelt · 2025-08-08T21:02:18 1754686938

Why stop at 80 H100s for a mere 6.4 terabytes of GPU memory?

Supermicro will sell you a full rack loaded with servers [1] providing 13.4 TB of GPU memory.

And with 132kW of power output, you can heat an olympic-sized swimming pool by 1°C every day with that rack alone. That's almost as much power consumption as 10 mid-sized cars cruising at 50 mph.

[1] https://www.supermicro.com/en/products/system/gpu/48u/srs-gb...
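For anyone who wants to check the numbers above, a quick sketch. It assumes 80 GB per H100 PCIe card, a nominal 50 m x 25 m x 2 m pool, and that every joule the rack draws ends up in the water with no losses; the pool dimensions and the no-loss assumption are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60
WATER_SPECIFIC_HEAT = 4186  # J/(kg*K)
WATER_DENSITY = 1000        # kg/m^3

def gpu_memory_tb(n_gpus, gb_per_gpu):
    """Total GPU memory in (decimal) terabytes."""
    return n_gpus * gb_per_gpu / 1000

def pool_temp_rise_per_day(power_kw, pool_volume_m3):
    """Degrees C gained per day if all the power goes into heating the water."""
    energy_joules = power_kw * 1000 * SECONDS_PER_DAY
    water_mass_kg = pool_volume_m3 * WATER_DENSITY
    return energy_joules / (water_mass_kg * WATER_SPECIFIC_HEAT)

h100_rack_tb = gpu_memory_tb(n_gpus=80, gb_per_gpu=80)
olympic_pool_m3 = 50 * 25 * 2  # assumed nominal dimensions
rise = pool_temp_rise_per_day(power_kw=132, pool_volume_m3=olympic_pool_m3)

print(f"80 H100s: {h100_rack_tb} TB of GPU memory")
print(f"132 kW into {olympic_pool_m3} m^3 of water: {rise:.2f} C per day")
```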