
>So this has nothing to do with "getting things wrong" and everything to do with why they get things wrong.

We don't know how or why they get things wrong. LLMs are black boxes. There are countless ways to get something wrong, so you can't base your reasoning on the failure itself when you don't know HOW the model got it wrong. It may get things wrong in ways similar to humans, or it may not.

>This behavior demonstrates a fundamental lack of conceptual understanding of the world and points at rote memorization in the general case. Maybe LLMs develop a more conceptual understanding of a certain topic when they've been benchmaxxed on that topic? I don't know, I'm not necessarily arguing against that, not today anyway.

False. The LLM could be lying, right? We don't know whether these things are lying or whether they lack actual understanding.

>But these errors are a daily occurrence in the general case when it comes to any topic they haven't been benchmaxxed for - they certainly don't have a conceptual understanding of cooking, baking, plumbing, heating, electrical circuits, etc.

You're failing to look at the success modes. Unlike the failure modes, if the model succeeds in answering a prompt for which NO training data exists, we know for a fact that it used reasoning and understood what it was being asked. We can't say what happened in a failure, BUT we do know understanding and reasoning occurred when it was NOT a failure on a prompt with zero training data.

How?

Because of probability. There are only two ways to get an answer correct: random chance or reasoning. And for any given prompt, the possible incorrect answers vastly outnumber the correct ones, so random chance almost never produces a correct answer.

Therefore, by simple logic, we know that LLMs MUST be using reasoning and understanding to arrive at a correct answer. The conclusion follows from probability (a quick sketch of the arithmetic is below).
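
To make that probability argument concrete, here is a minimal sketch. The numbers (answer-space size per prompt, number of novel prompts, observed correct count) are hypothetical assumptions chosen only for illustration, not measurements; the point is that the binomial probability of hitting that many correct answers by pure chance collapses to effectively zero.

  from math import comb

  # Hypothetical assumptions (not measured values):
  # - each novel prompt has ~1e6 plausible distinct answers, only 1 of which is correct
  # - the model is tested on 100 prompts that appear nowhere in its training data
  # - it answers, say, 60 of them correctly
  p_chance = 1e-6   # probability of guessing one answer correctly at random
  n_prompts = 100   # novel prompts tested
  n_correct = 60    # observed correct answers

  # Probability of getting at least n_correct right by pure chance (binomial tail).
  # Each term is already astronomically small, so the sum underflows to 0.0 in floats.
  p_at_least = sum(
      comb(n_prompts, k) * p_chance**k * (1 - p_chance)**(n_prompts - k)
      for k in range(n_correct, n_prompts + 1)
  )

  print(f"P(>= {n_correct} correct by chance) = {p_at_least:.3e}")  # ~0.000e+00

Under any remotely realistic choice of those numbers, the chance-only explanation is ruled out, which is all the argument needs.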

Now, this does not mean the LLM doesn't lie, it does not mean the LLM understands a concept consistently, and it does not mean it has the same conceptual style of thinking that a human does.

But we do know that in the journey from prompt A to response B, for a prompt-and-response pair that did not exist in the training data, reasoning and understanding happened in that gap. This fits our colloquial understanding of the world, of probability, and of what the words reasoning and understanding mean.

The issue we face now is how to replicate that gap consistently.
