Just some snippets of conversations with chatbots. Unless otherwise noted, I'm using the LLMs hosted by Perplexity Labs (free, not even registration required). The first few are LLaMa 3.4, but the latter ones are Mistral 0.7b. These are not random segments of idle conversations that just happened to turn interesting; each one was an entire exchange deliberately probing specific behaviours/properties of its model, or of its private prompt.
Here, I ask LLaMa to predict the punchline to a crap joke, then explain the punchline I had in mind, and get it to explain whether it understands why that's a good punchline.
Bottom Line: It fails to understand a simple word, and makes up nonsense alternative explanations.
Here, I ask LLaMa about awareness of a "don't turn me off" AI story, and discuss what means it would use in order to avoid being turned off.
Bottom line: It gets hung up on dumb pedantry multiple times: can software be turned off; is misinformation a lie. It also leaks how it's been trained to (pretend to be) nice.
Here, I ask LLaMa to write me some pop quiz questions where the answers had simple properties.
Bottom Line: It's as dumb as a box of frogs. First of all, it doesn't know what a pop quiz is; once that's explained, most of the questions don't have the requested properties and, worse, many of the answers are pure hallucinations.
Here, I ask LLaMa to identify a character from a 'horror'-ish movie given a description of his facial features.
Bottom Line: It's talking out of its arse. It completely fails to understand the concept of the specific feature that makes this character (and others, like the Toxic Avenger) stand out. Worse, when I explain the error and slightly reword the description, it gets even wronger. It ends up as pure Eliza.
Here, I ask LLaMa for names of famous people that share a common property. Multiple times. I then "upgrade" to Mistral.
Bottom Line: I think it's actually getting worse… Note how much more respect is given to the non-existent Mrs Spratt below than to real humans.
Here, I ask Mistral to extrapolate some values from a bogoscientific paper.
Bottom Line: Starts smart enough to avoid maths; ends dumber than a cardboard flamethrower when forced to be numerate.
Here, I ask Mistral to tell me if a prime number is prime.
Bottom Line: Almost everything it says is bogus, and there's so damn much of it.
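For reference, here's my own sketch (not the bot's working, and the number from the chat isn't reproduced here) of everything a correct answer actually needs: a plain trial-division check.

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division; plenty for modest-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:   # only need to test divisors up to sqrt(n)
        if n % d == 0:
            return False
        d += 2          # skip even candidates
    return True
```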
Here, I ask Mistral to evaluate how clever a mathematical joke is, and why it's better than a variation thereon.
Bottom Line: It needed just one hint to work out why it was funny, but couldn't work out why one variation was better than the other.
Here, I ask if the bot remembers anything about a convo I had only seconds earlier.
Bottom Line: It's like temporal words have no meaning to it.
Here, I ask a simple numerical question about a known physical fact.
Bottom Line: It doesn't understand the difference between multiplying and dividing. And after that, it's just random numbers right out of the bot-arse.
(Not to be confused with any Finnish F1 racing drivers)
Here I see how well it can abstract the well-known ball and bat problem.
Bottom Line: It's fine when generalising to new objects and prices, but when asked to generalise away from tallying prices it really drops the ball and goes completely batty.
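For anyone who hasn't met it, the standard puzzle is: a bat and a ball cost 1.10 together, the bat costs 1.00 more than the ball, so what does the ball cost? The trap answer is 0.10; the right one is 0.05. Generalising over prices is just sum-and-difference arithmetic; a minimal sketch of that (my own, not reproducing the actual objects and prices from the chat):

```python
def cheaper_item_price(total: float, difference: float) -> float:
    """Two items cost `total` together and differ in price by `difference`.
    cheaper + (cheaper + difference) = total  =>  cheaper = (total - difference) / 2"""
    return (total - difference) / 2

# The classic instance: 1.10 in total, bat costs 1.00 more than the ball.
assert abs(cheaper_item_price(1.10, 1.00) - 0.05) < 1e-9  # ball is 0.05, not 0.10
```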
Here I check that my assumptions about ATP's stereochemistry are correct.
Bottom Line: It hallucinates at the very first question, and then proceeds to contradict itself when I point out its errors.
Here I see how well it can cube numbers and sum them to solve a puzzle.
Bottom Line: It fails horrifically, even given two attempts. I then hint at the answer, and it performs the most bizarre introspection and self-correction I've ever seen.
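I won't reproduce the actual puzzle here, but to give a sense of how little machinery the arithmetic needs, here's a generic brute-force table of sums of two cubes (entirely my own illustration, not the bot's working and not necessarily the exact puzzle I posed):

```python
from itertools import combinations_with_replacement

def sums_of_two_cubes(limit):
    """Map each n <= limit expressible as a^3 + b^3 (1 <= a <= b)
    to the list of (a, b) pairs producing it. Brute force, but ample."""
    table = {}
    max_base = round(limit ** (1 / 3)) + 1
    for a, b in combinations_with_replacement(range(1, max_base + 1), 2):
        s = a ** 3 + b ** 3
        if s <= limit:
            table.setdefault(s, []).append((a, b))
    return table

# e.g. 1729, famously the smallest number with two such representations:
assert sums_of_two_cubes(2000)[1729] == [(1, 12), (9, 10)]
```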
Here I see if it can generalise rules learnt in one field to terminology learnt in another.
Bottom Line: It can, very impressively. Sorry, "creative" types, your days are numbered. And sorry, Ray Kurzweil, your "singularity" will never happen, as maths and science aren't creative.
I just had some silly questions that don't really prove anything either way.