
Generative AI and LLMs continue to provide the least controversial answer to any question I ask them. For my purposes, this makes them little more than a calculator for words, a generator of historical fiction short stories.
As I mentioned two years ago, this doesn’t make LLMs useless, but it does greatly shrink their usefulness – to those places where you want a general idea of the consensus…whether or not that consensus is correct, accurate, or legal. An average, after all, doesn’t necessarily represent any individual datapoint.
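To make that concrete with a trivial, purely hypothetical sketch (the ratings below are invented for illustration):

```python
# Ten hypothetical viewers rate a polarizing title: half love it, half hate it.
ratings = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]

average = sum(ratings) / len(ratings)

# The "consensus" is a 3.0, yet not one viewer actually gave it a 3.
print(average, average in ratings)  # 3.0 False
```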
And the more training data the generative AI providers shovel into their models, the further the generated consensus drifts from credibility toward absurdity.
It’s one thing to train the models on all the scientific research. It’s another to train them on all the books ever published (copyright issues aside for the moment). It’s quite another to train them on Reddit and Twitter. And it’s yet another thing altogether to treat all of that data as equal, regardless of whether it’s parody, satire, or propaganda.
Eighteen years ago, I figured out that a 3 in Netflix’s then five-star rating system meant “looks good on paper, but probably not very good”. The same seems to be true of the nondeterministic responses from LLMs: an avalanche of the Gell-Mann Amnesia effect, or Knoll’s Law of Media Accuracy – “AI is amazing at the things I know nothing about…but it’s absolute garbage at the stuff I’m an expert in.”
Again, there are use cases for this (e.g. getting familiar with the basics of a topic in record time), but the moment you expect quality, credibility, or specifics…it collapses like a toy giraffe.
A toy giraffe that, when a person engages with it, can only – collapse.
As a metaphor for new technologies, this toy giraffe’s message is worth considering: “we break when any pressure is applied.”
General-purpose LLMs will only get worse the more data they digest. Special-purpose LLMs, trained only on a specific context, a specific vertical, a rigidly curated & edited set of sources, may achieve the expert level these applications are hyped up to be.
But we may never know they exist, because the most valuable use cases – national defense, cybersecurity, fraud detection – will never need (or desire) the visibility that general-purpose LLMs require.