Published Aug. 21, 2024
As an enormous fan of Dungeons and Dragons, I think I'm qualified to say that there is a target player base that would love to play D&D but has difficulty finding a group (or has played enough to understand that it's often a logistical nightmare). Since D&D is also a deceptively math-intensive game to play, designing my chat assistant to play it seemed like a good development goal.
ADA (the Artificial Directive Assistant) can now plan and run entire campaigns in any setting or home-brewed world, complete with complex NPC (non-player character) AI, character voices, image generation, and a full simulation of combat. Getting to this point required implementing many features that I will explore in this post.
If you haven't played D&D, here is a very quick summary: one player acts as the Dungeon Master (DM), who describes the world and adjudicates the rules, while every other player controls a character in that world; whenever a character attempts something with an uncertain outcome, dice decide whether it succeeds.
Here is an example interaction between the DM and a player:
DM: You are alone in a dark, stone-walled room with a wooden door in front of you.
Player: I want to kick down the door.
DM: (On a scale from 1 to 20, a wooden door is probably easy to break down. They just need higher than an 8 to succeed.) This is a strength check; roll a d20.
Player: I rolled a 10, and since I'm strong I get a +5 bonus to strength checks, so 15 total.
DM: (15 is higher than 8, they succeed.) You knock down the door with a mighty kick.
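Before breaking that exchange down, the arithmetic the DM performs is simple enough to sketch in a few lines of Python (the +5 bonus and the threshold of 8 are just this example's numbers, not fixed rules):

```python
import random

def strength_check(bonus: int, difficulty: int) -> bool:
    """Roll a d20, add the character's bonus, and compare against the difficulty."""
    roll = random.randint(1, 20)   # the physical d20 roll
    total = roll + bonus           # e.g. 10 + 5 = 15 in the exchange above
    return total > difficulty      # the DM asked for higher than an 8

# The player rolled a 10 with a +5 bonus: 15 > 8, so the door comes down.
print(strength_check(bonus=5, difficulty=8))
```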
Let's go through this interaction step-by-step to see how a chat assistant needs to be prepared to be a DM. I will start from scratch in ADA's GUI, using GPT-4o with the temperature set to 0 (meaning the output is deterministic and reproducible).
A flagship component of tabletop games is dice rolling. One issue I will demonstrate with having an LLM (large language model) or a chat assistant roll dice (i.e. generate random numbers) is that it's like asking a person to give you a random number off the top of their head; statistically, it just isn't the same as flipping a coin or rolling a die, and it is provably unfair.
Let's just ask the assistant to give us two random numbers between 1 and 20, i.e. roll a d20 twice, and see what happens.
It seems to understand the task and our terminology, but let's double-check this against another interface. We can reproduce these results in the OpenAI chat playground with the same model, GPT-4o (shown below).
Since these models are really just language-calculators, it makes sense that with the same model and the temperature (randomness) set to 0, the dialogue is exactly the same even though the interface is different. But why didn't the numbers change? The odds of getting the same ordered pair twice with a d20 are 1 in 400, so this must not be random, even though the assistant implies it is.
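If you want to verify the determinism yourself, a request like the following sketch (using the OpenAI Python client; the exact prompt wording is illustrative) returns the identical completion every time at temperature 0:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # greedy decoding: the "most likely" answer, every time
    messages=[
        {"role": "user",
         "content": "Give me two random numbers between 1 and 20, i.e. roll a d20 twice."},
    ],
)
print(response.choices[0].message.content)
```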
I always like to compare language models to contestants in Family Feud. If you asked 100 people to fill in the blank: "I took out money from the ____", your options are likely bank, wallet, ATM, safe, etc. If a language model were trained on the responses of those 100 people, it would always get the highest score. GPT-4o is similarly giving its best response to the question, "What is a random number between 1 and 20?" without true randomization. So, why does it imply it is producing random numbers?
The language model, without assistance, does not understand that it is incapable of generating random numbers. At this point, it is still only operating as a language-calculator, simply choosing the most likely number given the task. It is essentially treating the task as a creative-writing exercise, providing a deterministic response that infers what a random number could look like based on its training, and its phrasing shows that it is unaware of this limitation.
This behavior can be described in many ways, such as "hallucinations", but I prefer characterizing it as bullshitting. Again, given its training, where the model has read countless examples in literature in which random numbers were requested, the most likely response to such an inquiry is simply a number. It has no awareness of how the numbers were produced in those circumstances. It is just following its training, and is indifferent to the validity of the result.
Let's try something else...
Here we ask for a sequence of random numbers. Repeating this input will give the same sequence of numbers, as expected, which already casts doubt on its randomness. However, there is something notable (and unsurprising, given what we've established) about generating a "random" sequence of numbers like this. Below is a histogram of the distribution.
We see a perfectly uniform distribution. Again, this is not true randomization, but rather the most probable answer given the task. Even though the model gives a human-like response implying that the answer is random, it is mistaken.
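To make the contrast concrete, here is a small sketch comparing a perfectly flat spread (like the histogram above) against a sample from an actual RNG; the sequence length and values are illustrative, not the model's output:

```python
import random
from collections import Counter

n = 100  # length of the requested sequence (assumed for illustration)

# A perfectly flat spread over 1..20, like the histogram above.
flat = [i % 20 + 1 for i in range(n)]

# A sample from an actual RNG: roughly uniform, but with visible noise.
sampled = [random.randint(1, 20) for _ in range(n)]

print("flat counts:   ", sorted(Counter(flat).items()))
print("sampled counts:", sorted(Counter(sampled).items()))
```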
So, how can we produce random numbers with a model that at least understands basic tasks?
By teaching it to escalate the task.
OpenAI has conditioned ChatGPT to offload this task to a Python script run in a Jupyter-like environment. We can see here that the assistant provides a full sequence of numbers, and the distribution, while close to uniform, is genuinely random.
What's better is that they even show the source.
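From what ChatGPT displays, the script amounts to something along these lines (a paraphrase, not the verbatim source; the count of 100 is just an example):

```python
import random

# Draw the rolls with Python's PRNG instead of having the language model
# "imagine" them.
rolls = [random.randint(1, 20) for _ in range(100)]
print(rolls)
```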
What is hidden from the user in ChatGPT, in addition to the specific model being used, is how the assistant was taught to offload these tasks, and what protocol it uses to invoke higher-level functions such as running code.
Below is my implementation of this with ADA, where I develop a protocol for the model to discreetly call a function to handle complex tasks such as randomization, and introduce the protocol to the model with example messages containing self-assertions of its capabilities.
Here, you see the same prompt given as before. The model is exactly the same, but the response and numbers are different. I've also hidden some messages that are intended for back-end use only. Below I'll show that the sequence is truly random, and how I achieved it.
Here is the full context of our conversation that will always lead to a random sequence. The initial system and assistant messages were written manually to introduce a protocol (formatted JSON values) the assistant can use to invoke a randomization function. All JSON messages and system messages can be hidden from the user, as shown above.
As seen here, the distribution is truly random.
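To give a sense of the shape of those introductory messages, here is a sketch of the kind of context the model is seeded with. The JSON schema shown ("action", "count", "sides") is a stand-in, not ADA's actual protocol; the conversational pattern is what matters:

```python
messages = [
    {
        "role": "system",
        "content": (
            "When you need random numbers, do not invent them. "
            'Reply with only a JSON object such as '
            '{"action": "roll", "count": 2, "sides": 20} and wait for the result.'
        ),
    },
    # A worked example so the model can imitate the pattern later.
    {"role": "user", "content": "Roll two d20s for me."},
    {"role": "assistant", "content": '{"action": "roll", "count": 2, "sides": 20}'},
    {"role": "system", "content": "roll result: [14, 3]"},
    {"role": "assistant", "content": "You rolled a 14 and a 3."},
]
```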
This is the handler for the JSON messages. If the message parser detects a properly formatted message from the assistant, it will be sent here, where you can see the script provides actual random numbers.
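In spirit, the handler looks something like the following simplified sketch (with the same hypothetical field names as the example above):

```python
import json
import random

def handle_assistant_message(content: str):
    """Return rolled numbers if the message is a roll request, otherwise None."""
    try:
        request = json.loads(content)
    except json.JSONDecodeError:
        return None  # ordinary prose, not a protocol message

    if not isinstance(request, dict) or request.get("action") != "roll":
        return None

    count = int(request.get("count", 1))
    sides = int(request.get("sides", 20))
    # The actual randomness happens here, outside the language model.
    return [random.randint(1, sides) for _ in range(count)]
```

The resulting rolls are fed back into the conversation as a hidden system message, so the assistant narrates an outcome it never generated itself.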
All I want to do is play D&D, though, so let's get back to the task and revisit our sample dungeon.
The result is tough but provably fair.
With the same format as before, I can teach ADA how to invoke an image-generation function, where the assistant creates a prompt depending on the context of the story. This prompt is provided to a text-to-image model (DALL-E 3 in this case).
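Under the same protocol, the handler for an image request can simply forward the assistant-authored prompt to the image model. A minimal sketch with the OpenAI Python client (the function name and parameters here are illustrative, not ADA's exact code):

```python
from openai import OpenAI

client = OpenAI()

def handle_image_request(prompt: str) -> str:
    """Send the assistant-authored scene description to DALL-E 3 and return the image URL."""
    result = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,
    )
    return result.data[0].url
```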
Below is our adventurer standing at the wooden door.
"You are alone in a dark, stone-walled room with a wooden door in front of you."
In my next post, I will continue the adventure and show how we can get the model to maintain a consistent inner monologue hidden from the user, where it plans its future actions (and prepares surprises for the player).