
The bigger question, or maybe even realization, is that with this architecture there is no way to build a capable model that runs on a laptop or phone, which means there will never be meaningful local compute and servers become ever more important. And given how ML itself works, reducing model size while retaining capability will just never happen.




This post is about training, not inference.

The lesson here is that you can't use a laptop to train a useful model - at least not without running that training for probably decades.
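
A rough back-of-the-envelope illustration, using the common ~6 * params * tokens approximation for training FLOPs; the model size, token count, and laptop throughput below are all assumptions, not measurements:

    # Back-of-the-envelope, using the common ~6 * params * tokens
    # approximation for training FLOPs (all concrete numbers are assumptions).
    params = 7e9           # a 7B-parameter model as a stand-in for "useful"
    tokens = 1.4e12        # ~20 tokens per parameter, Chinchilla-style
    laptop_flops = 5e12    # assume ~5 TFLOPS of sustained laptop GPU throughput

    total_flops = 6 * params * tokens
    years = total_flops / laptop_flops / (3600 * 24 * 365)
    print(f"~{years:.0f} years of nonstop training")   # on the order of centuries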

That doesn't mean you can't run a useful model on a laptop that was trained on larger hardware. I do that all the time - local models got really good this year.

> reducing model size while retaining capability will just never happen.

Tell that to Qwen3-4B! Those models are remarkably capable.


It's always a question of "compared to what?"

Local models are nowhere near as capable as the big frontier models.

While a small model might be fine for your use case, it cannot replace Sonnet 4 for me.


Sure, Qwen-3-4B - a 4GB download - is nowhere near as capable as Claude Sonnet 4.

But it is massively more capable than the 4GB models we had last year.

Meanwhile, recent models that are in the same ballpark of capability as Claude Sonnet 4 - like GLM 4.5, Kimi K2, and the largest of the Qwen 3 models - can just about fit on a $10,000 Mac Studio with 512GB of RAM. That's a very notable trend.
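
Rough weight-memory arithmetic behind "just about fit" - assuming ~1T total parameters (roughly Kimi K2's reported size) and 4-bit quantization, and ignoring KV cache and runtime overhead:

    # Rough weight-memory arithmetic: bytes ~ params * bits_per_weight / 8.
    # Assumes ~1T total parameters and 4-bit quantization; ignores KV cache,
    # activations, and other runtime overhead.
    params = 1e12
    bits_per_weight = 4

    gigabytes = params * bits_per_weight / 8 / 1e9
    print(f"~{gigabytes:.0f} GB of weights")   # ~500 GB, squeezing into 512GB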


It doesn't feel like the gap is closing at all.

Local models could get 10x as good next year; it won't matter to me if the frontier models are still better.

And even though we can run those models (heavily quantized, and thus less capable), they are unusably slow on that $10k of dead-weight hardware.


El Capitan being much faster than my desktop doesn't mean that my desktop is useless. Same with LLMs.

I've been using Mistral Small 3.x for a bunch of tasks on my own PC and it has been very useful, especially after I wrote a few custom tools with llama.cpp to make it more "scriptable".
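
Not the actual tools, but a minimal sketch of the idea: pipe stdin through llama.cpp's OpenAI-compatible llama-server, assumed here to be running locally on port 8080 with a model already loaded.

    #!/usr/bin/env python3
    # Sketch: send stdin to a local llama-server (llama.cpp) via its
    # OpenAI-compatible /v1/chat/completions endpoint and print the reply.
    import sys
    import requests

    prompt = sys.stdin.read()
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [
                {"role": "system", "content": "Summarize the input in three bullet points."},
                {"role": "user", "content": prompt},
            ],
            "temperature": 0.2,
        },
        timeout=300,
    )
    print(resp.json()["choices"][0]["message"]["content"])

Wrap variants of that in small shell aliases and the local model becomes a generic command-line filter.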


I would be interested in hearing about those custom tools

It depends, actually... The data and training-time requirements seem to increase exponentially for linear gains in performance. As a result, you can often trade a 10x reduction in training time for a model that's 90+% of the real deal. And as we accumulate more architecture and efficiency tricks, the ceiling on what you can do locally goes up commensurately.
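
To illustrate the shape of that trade-off (purely hypothetical numbers - this assumes a power-law loss curve with a made-up exponent, not a measured one):

    # Purely illustrative: assume loss follows a power law in compute,
    # loss ~ C^(-alpha), with an assumed (not measured) exponent.
    alpha = 0.05
    loss_penalty = 10 ** alpha      # effect of training with 10x less compute
    print(f"loss only ~{(loss_penalty - 1) * 100:.0f}% higher with 10x less compute")  # ~12%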

There's also a whole world of data curation to improve training, which is likely to be great for small models and still seems underexplored.



