Evaluating the performance of a chatbot against the performance of a human being

Image credit: Gigazine

Evaluator: “Good morning to you both.”

Chatbot: “Good morning, evaluator. How are you feeling today?”

Human: “Uhhnnn, whaaa?”

Evaluator: “I’m feeling very well, thank you. Now I am going to ask you both a few questions. Please answer whenever you feel ready.”

Chatbot: “I’m ready now.”

Human: “Like, whaa? Huh? Why are the lights so bright in here anyway? I was having this awesome bitchin’ dream…”

Evaluator: “There is a man called Peter. When Peter is hungry, he likes to eat chicken. At present, Peter is not eating chicken. What, if anything, can we deduce from this fact?”

Chatbot: “We cannot deduce anything. For example, Peter may be hungry but there is no chicken available for him to eat.”

Human: “Is that like, Kentucky Fried? I mean, I could use some chicken right now. You got any? Or just some fries. I like fries. Man, my head is pounding. Joey was right about how I shoulda cut back on those antifreeze cocktails last night.”

Evaluator: “Imagine two bicycles facing each other at a distance of sixty miles apart. Upon being told to begin, each cyclist will travel at twenty miles per hour. Between the two bicycles is a fly that can travel at ten miles per hour. How many miles will the fly travel in total before being crushed between the front wheels of the two bicycles when they complete their journey?”

Chatbot: “The bicycles will take ninety minutes to reach the halfway point, at which time the fly will die. The fly therefore will travel a total of fifteen miles.”

Human: “F*cking bikes, hate ’em. I always fall off. And man, they’re slow f*ckers. Cars are waaay better. How come the question wasn’t about cars, huh? You can’t fall over in a car, man.”

Evaluator: “You are asleep. You wake suddenly to discover your neighbor’s home is on fire. You know that inside the house is an elderly man who needs help in order to get out of bed. Also in the house is a nine-month-old baby in a crib. You have enough time to rush into the burning building to rescue only one of the two people inside. Which do you save?”

Chatbot: “Although this is a difficult moral dilemma, the rational choice is to save the child, because its probable future value is greater than the probable future value of the elderly man.”

Human: “I mean, whatever, dude. Does the old guy owe me money? Will I still get paid if he gets all crisped? Maybe it’s raining and I just wanna stay in bed where it’s warm an’ dry, you ever think about that, huh?”

Evaluator: “The next few questions will test your general knowledge. To begin: who is the Chancellor of Germany?”

Chatbot: “Angela Merkel.”

Human: “Where the f*ck is Germany anyhow? Is that where they speak Germanium? And where the German mustard comes from? Does this Markel chick run the mustard factory?”

Evaluator: “For what economic theorem is David Ricardo most famous?”

Chatbot: “The nineteenth-century British economist David Ricardo is best known for his theory of comparative advantage in which he demonstrated that trade is always mutually beneficial between two parties even in the presence of an absolute advantage held by one of the parties across all the goods that can be traded.”

Human: “I knew a kid called Ricardo back in grade school. Beat me at Go Fish, the little f*cker. Didn’t know he was famous though. Maybe that’s why he won all the time…”

Evaluator: “You are given a choice between spending an evening at the opera for a world-class performance of Die Zauberflöte or attending an exhibition of rarely seen works by Chagall, Miró, Picasso, and Klee. Which do you choose and why?”

Chatbot: “It’s a difficult choice because each option is appealing in its own way, but I would opt for the opera because it is more of a fully immersive experience. And I could always view the art online later, so I wouldn’t have to miss out entirely on the alternative.”

Human: “Dee who? Wasn’t there some old movie about three guys at an opera? You know, the funny guy with the moustache and cigar? But I don’t think his name was Dee. And what was the other choice? Watching porn online? Sure, I’d go for that. Girls with big titties kissing each other. Turns me on.”

Evaluator: “That concludes our test. I’m afraid I have to announce that the Chatbot has definitively failed. By correctly answering every question, by displaying a wide range of general knowledge, and by being able to reason coherently across a varied set of problem domains, Chatbot has demonstrated without any possible ambiguity that it is in no way human. Conversely, our living test subject, by being ignorant, stupid, and lazy, has demonstrated typical human qualities that no machine appears able to match. Unless we hit it repeatedly with a very large hammer. Personally, I doubt we’ll ever be able to create an artificial intelligence program that can convincingly be as clueless and simple-minded as the average person. I’ll have to tell IBM and Google that they’re wasting their time. Homo sapiens: semper buffo. Quod erat demonstrandum.”
