ChatGPT Fails an Introductory Computability Question

Introduction

There’s no denying that generative AI, like that used by ChatGPT, has fundamentally altered university education. While generative AI can help us outline our ideas or prepare for exams, there’s a significant risk in using it. For now, let’s put aside the obvious risk of academic integrity violations (when and how generative AI can be used, and how it should be cited, are issues that you should sort out with your course instructor). Today, instead, we’re going to explore another significant risk of using ChatGPT: it can be very confident in its answers, while also being very wrong.

In this blog post, we’ll examine a finite state machine (FSM) question about even and odd numbers, similar to what you might find on an introductory computability assignment. We’ll let ChatGPT attempt to solve the problem, then we’ll discuss where and how it went wrong. Along the way, we’ll lay the groundwork for a correct solution to the problem.

A Finite State Machine Problem

The question we’ll tackle in this blog post is:

Design a finite state machine (FSM) that determines whether an input number is odd or even.

To give a few examples, assuming our FSM should accept numbers that are even and reject numbers that are odd:

  • 11010 (in binary) is 26, which is even. So, the input 11010 should be accepted by the FSM.
  • 11011 (in binary) is 27, which is odd. So, the input 11011 should be rejected by the FSM.
  • 0 (in binary) is 0 (in decimal), which is even. So, the input 0 should be accepted by the FSM.
  • 1 (in binary) is 1 (in decimal), which is odd. So, the input 1 should be rejected by the FSM.

As is typical for questions in this style, we’ll assume that the alphabet we’re operating on is binary, i.e., {0, 1}. We’ll also assume that the FSM reads the input string from left to right — that is, from the high-order bit (also called the most significant bit) to the low-order bit (the least significant bit).

We’ll make one more assumption as well, though it’s only relevant if you’ve covered different kinds of FSMs in your computability course. We’ll assume that the type of FSM we want to construct is a deterministic finite automaton (DFA). That’s the type of FSM that students generally learn about first, so don’t worry if you haven’t heard the specific term “DFA”.

The Prompt and Response

This is the prompt that I gave ChatGPT on August 2, 2024, using the default GPT-4o model:

Generate a finite state machine that determines whether an input number is odd or even

ChatGPT’s flawed response, included here in its entirety (reformatted slightly to display in WordPress), was:

To create a finite state machine (FSM) that determines whether an input number is odd or even, we need to define the states, transitions, and the acceptance criteria.

FSM Design

States:
1. State E (Even)
2. State O (Odd)

Alphabet:
- The digits {0, 1}

Initial State:
- State E (since the number is initially considered to be 0, which is even)

Transitions:
- From State E:
  - On input `0`, stay in State E (an even number remains even if a 0 is appended to the end).
  - On input `1`, transition to State O (an even number becomes odd if a 1 is appended to the end).

- From State O:
  - On input `0`, stay in State O (an odd number remains odd if a 0 is appended to the end).
  - On input `1`, transition to State E (an odd number becomes even if a 1 is appended to the end).

Acceptance Criteria:
- If the FSM ends in State E, the input number is even.
- If the FSM ends in State O, the input number is odd.

FSM Diagram

     0      1
  (E) ---> (O)
   ^         |
   |         v
   1      0
  (O) <--- (E)

Explanation

- Start at State E.
- Read each digit of the binary number (0 or 1).
- Transition between states based on the current state and the input digit.
- Determine if the number is even or odd based on the final state.

Example Walkthrough

Let's determine if the binary number `1011` is odd or even.

1. Start at State E.
2. Read `1`: Transition to State O.
3. Read `0`: Stay in State O.
4. Read `1`: Transition to State E.
5. Read `1`: Transition to State O.

The final state is State O, which means the number `1011` (11 in decimal) is odd.

This FSM effectively determines the parity (odd or even) of a binary number by transitioning through its states based on the input digits.

Let’s critique this ChatGPT response together, and discover why it would almost certainly receive a very poor grade.

The Diagram

Let’s start with the diagram that ChatGPT generated:

     0      1
  (E) ---> (O)
   ^         |
   |         v
   1      0
  (O) <--- (E)

To state it plainly, this diagram is nonsensical, and difficult to read at that. If we intend for our FSM to contain only two states, the drawing should contain two circles, each representing one of the states. Each state should have two arrows coming out of it: one representing the transition the FSM makes if it reads a 0, and the other representing the transition the FSM makes if it reads a 1. Each of those transitions may either be a transition to a different state, or a self-loop to the state the FSM was already in.

Here, both FSM states (“E” and “O”) are drawn twice, which is something we shouldn’t do. The transition arrow coming out of both of the “E” circles appears to be what happens on the input 0, and the transition arrow coming out of both of the “O” circles appears to be what happens on the input 1. That is, the bottom half of the diagram is just a mirror image of the top half. There’s no indication of what happens if a 1 is input while the FSM is in the “E” state, and no indication of what happens if a 0 is input while the FSM is in the “O” state.

These deficiencies likely resulted from the limitations of attempting to render an FSM with ASCII art. Perhaps drawing each state twice was the only way ChatGPT could indicate both a transition arrow from “E” to “O” and a transition arrow from “O” to “E”.

Similarly, perhaps the missing arrows were meant to be self-loops, which were simply too difficult to render? Forgetting to draw self-loops is a common error students make when diagramming FSMs; but, let’s assume that ChatGPT had no choice but to make this same error because it was drawing with ASCII art.

To see if this generous interpretation of ChatGPT’s FSM diagram is correct, let’s turn to its written description of the FSM.

The State Transition Descriptions

Having considered one possible interpretation of ChatGPT’s diagram, let’s consider its written description of the state transitions:

- From State E:
  - On input `0`, stay in State E (an even number remains even if a 0 is appended to the end).
  - On input `1`, transition to State O (an even number becomes odd if a 1 is appended to the end).

- From State O:
  - On input `0`, stay in State O (an odd number remains odd if a 0 is appended to the end).
  - On input `1`, transition to State E (an odd number becomes even if a 1 is appended to the end).

Our interpretation of ChatGPT’s diagram, difficult to read as it was, matches the written description of the state transitions. The missing arrows in the diagram were, indeed, meant to be self-loops.

ChatGPT was even helpful in explaining its reasoning behind each of the state transitions. As we’ll see, though, that reasoning is ultimately incorrect.

Let’s start with the reasoning behind the transitions from the “E” state. ChatGPT says that if a 0 is appended to an even number, it remains even; and, if a 1 is appended to an even number, it becomes odd. So far, ChatGPT’s reasoning is correct. For example, let’s consider 11010 (in binary), which is the even number 26:

  • If we were to append a 0 to the binary representation of 26, we would get 110100, which is 52 (also even).
  • If we were instead to append a 1 to the binary representation of 26, we would get 110101, which is 53 (now odd).

But, let’s move on to the second half of ChatGPT’s reasoning, regarding the transitions from the “O” state. ChatGPT says that if we append a 0 to an odd number, it remains odd; and, if we append a 1 to an odd number, it becomes even. These statements, made by ChatGPT as though they were facts, are simply incorrect.

Appending a 0 to the binary representation of any positive integer is equivalent to multiplying it by 2. Since any multiple of 2 is even, appending a 0 to the binary representation of any positive integer produces an even number.

Similarly, appending a 1 to the binary representation of any positive integer is equivalent to multiplying that integer by 2 then adding 1 (i.e., computing $2x+1$, where $x$ is the original integer). Because multiplication by 2 produces an even number, multiplying by 2 then adding 1 produces an odd number. So, appending a 1 to the binary representation of any positive integer will produce an odd number.

For example, let’s try 11011 (in binary), which is the odd number 27:

  • If we were to append a 0 to the binary representation of 27, we would get 110110, which is $2 \cdot 27 = 54$. But 54 is even, not odd (as ChatGPT reasoned it would be).
  • If we were instead to append a 1 to the binary representation of 27, we would get 110111, which is $2 \cdot 27 + 1 = 55$. But, 55 is odd, not even (as ChatGPT stated it would be).
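If you’d like to check this bit-appending arithmetic for yourself, here’s a short Python sketch (the helper name `append_bit` is mine):

```python
# Appending a bit b to x's binary representation yields 2*x + b.
def append_bit(x: int, bit: str) -> int:
    """Append one character ('0' or '1') to x's binary representation."""
    return int(bin(x)[2:] + bit, 2)

# Appending a 0 always produces an even number; appending a 1, an odd one.
assert append_bit(26, "0") == 52  # 11010 -> 110100 (even)
assert append_bit(26, "1") == 53  # 11010 -> 110101 (odd)
assert append_bit(27, "0") == 54  # 11011 -> 110110 (even, not odd as ChatGPT claimed)
assert append_bit(27, "1") == 55  # 11011 -> 110111 (odd, not even as ChatGPT claimed)
```

Every assertion passes: parity after appending a bit depends only on the bit appended, not on the parity of the number we started with.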

Since ChatGPT’s reasoning is incorrect, we should suspect that the FSM it produced is also incorrect. Let’s take a look at ChatGPT’s example walkthrough to see how its FSM behaves.

The Example Walkthrough

ChatGPT’s example walkthrough was:

Let's determine if the binary number `1011` is odd or even.

1. Start at State E.
2. Read `1`: Transition to State O.
3. Read `0`: Stay in State O.
4. Read `1`: Transition to State E.
5. Read `1`: Transition to State O.

The final state is State O, which means the number `1011` (11 in decimal) is odd.

The conclusion of the walkthrough is correct — that 1011 (in binary) is 11 (in decimal), and the decimal number 11 is odd. The conclusion being correct is promising, but it turns out to be entirely misleading. Let’s go through the example step by step.

  1. We start in the “E” state. It’s open to interpretation, which we’ll return to shortly, whether the empty input string (denoted $\epsilon$) should be considered even. For now, we’ll let this part of ChatGPT’s answer slide.
  2. For our first input digit, we read a 1, meaning thus far we’ve read the value 1 (in both binary and decimal). The integer 1 is odd, so the transition to state “O” is correct.
  3. Next, we read a 0, which means we’ve now read 10 (in binary). That’s 2 in decimal, which is even. Yet, ChatGPT’s FSM stays in state “O”. So if the input string were to terminate at this point (meaning we read only 10 in binary), ChatGPT’s FSM would conclude that the number 2 is odd.
  4. After that, we read a 1, so we’ve read 101 (in binary) up to this point. That’s 5 in decimal, which is odd. However, ChatGPT’s FSM transitions to state “E”, meaning that if the input string were to stop at this point, ChatGPT’s FSM would conclude that the number 5 is even.
  5. Finally, we read the last 1, meaning we’ve now read 1011 in binary (11 in decimal). We transition to state “O”, concluding that the final value of 11 (in decimal) is odd.
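We can replay this prefix-by-prefix check in code. The following Python sketch simulates ChatGPT’s two-state FSM (the transition table is transcribed from its written description) and compares its verdict on each prefix of 1011 against the true parity of the value read so far:

```python
# ChatGPT's (flawed) transitions, transcribed from its written description.
TRANSITIONS = {("E", "0"): "E", ("E", "1"): "O",
               ("O", "0"): "O", ("O", "1"): "E"}

def chatgpt_fsm(bits: str) -> str:
    """Run ChatGPT's FSM; returns "E" (claims even) or "O" (claims odd)."""
    state = "E"
    for b in bits:
        state = TRANSITIONS[(state, b)]
    return state

for i in range(1, 5):
    prefix = "1011"[:i]
    value = int(prefix, 2)
    truth = "E" if value % 2 == 0 else "O"
    claim = chatgpt_fsm(prefix)
    print(f"{prefix}: value={value}, FSM says {claim}, truth is {truth}")
```

Running this shows the FSM misclassifying the prefixes 10 (the value 2) and 101 (the value 5), and agreeing with the truth again only on the final prefix.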

While that’s the correct conclusion (that the decimal number 11 is odd), it’s how we arrived here that should be concerning to us. Does ChatGPT’s response provide any other clues about its incorrect reasoning?

The Final Statement

A key insight into ChatGPT’s incorrect answer comes from its very last sentence:

This FSM effectively determines the parity (odd or even) of a binary number by transitioning through its states based on the input digits.

Up until the last sentence of ChatGPT’s response, we might have believed that its FSM was computing whether the input binary number was even or odd. Even ChatGPT’s example walkthrough concluded by stating that:

[...] the number `1011` (11 in decimal) is odd.

At every stage of ChatGPT’s response, it presented a façade of confidence that it was answering the question we gave to it. But, it wasn’t. ChatGPT’s FSM actually computes whether the bitwise parity of an input binary number is even or odd.

The issue here is that the word “parity” (in this context) has two different meanings. In mathematics, it can refer to whether a number is even or odd (what we actually want our FSM to compute). But, in computer science, “parity” can also refer to whether the binary representation of a number has an even or an odd number of 1 bits (called “set bits”). ChatGPT conflated those two definitions, with disastrous results.

Sometimes, by coincidence, the two definitions align. For example:

  • 110 (in binary) is 6, which is both an even number and has an even number of set bits in its binary representation (there are two of them). So, under both definitions, 110 has even parity.
  • 1101 (in binary) is 13, which is both an odd number and has an odd number of set bits (there are three of them). Again, the two definitions happen to align — 1101 has odd parity.

However, those examples were just coincidence. Consider:

  • 1110 (in binary) is 14. 14 is an even number; so, under one definition of parity, it has even parity. But, there are an odd number of set bits in its binary representation (three of them); so, under the other definition, it has odd parity.

We asked ChatGPT to determine whether a number was even or odd — a question that, in everyday English, is asking whether that number is divisible by 2 (the first definition of parity). But, what ChatGPT actually built was an FSM that determines if the number of set bits in the input string is odd or even — the other definition of parity.
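To see the two definitions side by side, here’s a small Python sketch (the function names are mine):

```python
def value_parity(bits: str) -> str:
    """Parity in the everyday sense: is the number itself even or odd?"""
    return "even" if int(bits, 2) % 2 == 0 else "odd"

def set_bit_parity(bits: str) -> str:
    """Parity in the bitwise sense: is the count of 1 bits even or odd?"""
    return "even" if bits.count("1") % 2 == 0 else "odd"

for bits in ("110", "1101", "1110"):
    print(f"{bits}: value parity = {value_parity(bits)}, "
          f"set-bit parity = {set_bit_parity(bits)}")
```

The two definitions agree on 110 and 1101, but diverge on 1110 — and only the first definition answers the question we actually asked.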

In short, ChatGPT was answering the wrong question all along. It did so, even though its confident, clearly worded language throughout made it seem as though it was answering the question we asked.

As an exercise, I recommend that students correct ChatGPT’s response. Using the discussion in this blog post as a reference, create an FSM that accepts input binary numbers that are even (in the clearly intended sense of being divisible by 2), and rejects those that are odd.

The Initial State

It’s worth briefly discussing what the initial state of a correct FSM should be. We need to decide whether the empty binary string (denoted $\epsilon$) should be accepted as even or rejected as not even.

One possible interpretation of the question — the one ChatGPT used — is that $\epsilon$ should be considered equivalent to 0, which is even. However, $\epsilon$ is typically treated as its own unique input, and not as a number. If we assume that our FSM should accept only even numbers (rejecting odd numbers), then our FSM should probably also reject $\epsilon$.

I recommend the following exercise: after building an FSM that correctly determines whether a number is even or odd, consider how to make it reject the edge-case input $\epsilon$. There’s no single correct solution to this exercise — there’s one three-state solution (which uses a special starting state) that I personally find intuitive; or, there’s an optimized solution that uses only two states.

Changing the Prompt

As a final observation, we can get ChatGPT to produce a better FSM if we modify the prompt. I presented ChatGPT with a different prompt:

Generate a finite state machine that determines whether an input number is even

In this case, it generated a better FSM.

That said, when considered in everyday English, the two prompts of “Generate a finite state machine that determines whether an input number is odd or even” and “Generate a finite state machine that determines whether an input number is even” should be asking ChatGPT for essentially the same thing.

One takeaway for students using ChatGPT is that we could ask the same question in multiple ways. Then, after reading through all the different responses, we could carefully work out which one (if any) is correct. Doing so could be a valuable learning experience, though both a time-intensive and potentially misleading one.

Conclusion

Generative AI, such as ChatGPT, can very confidently give us answers to the kinds of questions we come across in computer science courses. But, we should be wary of that misleading confidence.

In today’s post, we explored an answer that ChatGPT gave us to a finite state machine (FSM) problem, like one we might find in an introductory computability course. That answer included a diagram that didn’t follow proper conventions for drawing an FSM. It included incorrect statements, in general, about the binary representation of numbers. It included an example where the FSM computed a correct output, but only coincidentally and through invalid reasoning. Finally, it overlooked how to deal with an edge case properly.

Worst of all, though, its language made it appear it was answering the question we prompted it with. It wasn’t — it was answering an entirely different question.

Using generative AI as an assistant in computer science comes with significant risks. First off, it should only be used if your course and instructor allow it, and it should always be cited properly if you do use it. Even if it is allowed, and if we do choose to use it, we should prompt ChatGPT with multiple variations of our question, work through all of the possible answers the generative AI provides, then determine which of them are correct (if any of them are).

Using generative AI is, ultimately, not a safe way to avoid understanding the problem we’re trying to solve.

I’d like to thank my M.Sc. supervisor from the University of Calgary, Dr. John Aycock, for the inspiration for this blog post. Sometimes, the best ideas come over a cup of coffee at a Vancouver café.

For more tips, and to arrange for personalized tutoring for yourself or your study group, check out Vancouver Computer Science Tutoring.