86 Comments

Christopher Feola

You are absolutely correct, and the marketing on this stuff is, frankly, criminal. Here's what I wrote two years ago about how LLMs actually operate:

Hasn’t ChatGPT been lying? Making up references? Didn’t it make up and cite a Pew Research Center survey saying 71 percent of Americans believe it would be good for society if computers become more capable and sophisticated?

No. Because that isn’t what it is doing. It is using a matrix to construct the highest-scoring answer.

That matrix shows responses citing studies get more approval than those that don’t. Responses citing prestigious institutions score higher than those that simply say “most people think.” And responses with statistics score higher than vague sayings like “most.”

So ChatGPT puts those things together. A user asks “should AI development continue?” ChatGPT or Google Bard or any of these LLMs can comb through news and blog posts and videos and such, and say most are in favor. But that response won’t score well.

So they work to up the score.

If you remove the stop words (words that only function as grammar, such as “A,” “THE,” “OF,” “AND” and so forth), that query is down to two parts: the subject (AI development) and the trend (continue: yes/no).
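A minimal Python sketch of that stop-word step, purely for illustration. The `STOP_WORDS` set and the `strip_stop_words` helper are hypothetical names, and the tiny word list is an assumption; real stop-word lists are far longer.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "and", "should"}

def strip_stop_words(query: str) -> list[str]:
    """Split a query into words and drop the ones that only carry grammar."""
    tokens = re.findall(r"[A-Za-z]+", query)
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(strip_stop_words("Should AI development continue?"))
# prints ['AI', 'development', 'continue']
```

What is left is the subject ("AI development") and the trend ("continue").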

So the LLM reflects the subject back to you, then adds pieces to increase that score. So SUBJECT (computer development) TREND (should continue) because PRESTIGIOUS INSTITUTION (Pew Research Center/Harvard/Columbia/etc.) did SCIENCE (survey/paper/research) that showed STATISTICS. The LLM is snapping these things together like Legos.

We think “2021 Pew Research Center Study on Computers and Society” is a singular thing. The LLM sees “(YEAR)(PRESTIGIOUS INSTITUTION)(SUBJECT).”
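The Lego idea can be sketched as a toy slot-filler in Python. This is only an illustration of the analogy, not how any real LLM is implemented; the `SLOTS` table, the `snap_together` function, and the random percentage are all assumptions made up for the sketch.

```python
import random

# Hypothetical slot fillers, standing in for the pieces described above.
SLOTS = {
    "year": ["2019", "2020", "2021"],
    "institution": ["Pew Research Center", "Harvard", "Columbia"],
    "science": ["survey", "paper", "study"],
}

def snap_together(subject: str, trend: str, rng: random.Random) -> str:
    """Reflect the subject back, then bolt on a year, institution, study type, and statistic."""
    year = rng.choice(SLOTS["year"])
    institution = rng.choice(SLOTS["institution"])
    science = rng.choice(SLOTS["science"])
    percent = rng.randint(51, 99)  # an invented number, which is the whole point
    return (
        f"{subject} {trend} because a {year} {institution} {science} "
        f"showed {percent} percent of people agree."
    )

print(snap_together("AI development", "should continue", random.Random(0)))
```

Every run produces a confident, citation-shaped sentence, and nothing in the code checks whether such a study exists.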

It is no more lying than children are when you give them a Lego set and they build something different than the picture on the box. And if they do build the Millennium Falcon on the cover correctly, it’s still a Lego model and not an actual spaceship. And they only know how to snap together Legos, not build interstellar craft that can fly to the stars.

Discussed at tedious length here, if you are interested: https://perfectingequilibrium.substack.com/p/are-you-conscious-how-does-that-work?utm_source=publication-search

Hope this helps.

Cjf

Luke

The fundamental problem is that in most professional domains, doing the "correct" thing with an LLM, which means independently verifying all outputs, takes longer than just doing the work yourself. Reviewing and critiquing another's work is almost always more demanding than independent work when you're anything approaching an expert in a field.

I'm in software, and that's the case in my own use and most other engineers' use, despite sky-high claims about the field being dead. The limited research we have shows engineers are slower when using LLMs. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

I'm fully convinced that most LLM hype at this point is a societal-level example of the Gell-Mann amnesia effect, where we see all the errors a journalist makes covering our field, but trust that the next page of the paper on foreign affairs or crime is totally accurate.

If accuracy doesn't exist, what are we left with? Some creative exercises, help with research, and slop-padding writing that would have been better communicated with the original prompt?
