Rishabh Rahul

Reinforcement Learning Feels Bigger Than Games

February 9, 2023
RL

Notes from training a Pac-Man bot for 3,500 games and slowly losing my grip on what "decision-making" even means.

The first time I really sat with reinforcement learning, I was watching a Pac-Man bot die. Not once. Not twice. Hundreds of times in a row.

The setup was super standard. I followed a basic tutorial, started training for 3,500 games, and went to make coffee.

When I came back, the bot was doing better. It was still losing most of the time, but it was surviving longer. Then I started looking at the exact moments right before it died, and something broke in my brain.

Right before a ghost catches Pac-Man, the values the network assigns to each move (its Q-values) crash, and every possible move starts to look equally bad. The network has figured out that there is no good move here. This is basically the math version of hesitation.
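One rough way to put a number on that "every move looks equally bad" moment is to measure the gap between the best and worst action value in a state. This is only a sketch: it assumes the network outputs one value per move (as a DQN does), and the function names, the threshold, and the example numbers are all made up for illustration rather than taken from the actual training run.

```python
def q_value_spread(q_values):
    """Gap between the best and worst action value in one state."""
    return max(q_values) - min(q_values)


def looks_hesitant(q_values, threshold=0.5):
    """Flag states where no action stands out from the rest.

    The threshold is an arbitrary illustrative choice, not a tuned value.
    """
    return q_value_spread(q_values) < threshold


# Hypothetical values for (up, down, left, right); not real training output.
safe_state = [4.2, 1.1, 3.8, 0.5]        # one move clearly looks best
doomed_state = [-9.1, -9.3, -9.0, -9.2]  # every move looks about equally bad

for name, values in (("safe", safe_state), ("doomed", doomed_state)):
    print(name, "hesitant" if looks_hesitant(values) else "decisive")
```

Logging something like this spread for every frame of a game would be one way to check whether the values really do flatten out in the last moments before a death.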

The bot doesn't know what a ghost is. It doesn't know it's about to die. It doesn't even know it's playing a game. But somewhere in the code, a structure is forming that I never actually programmed.

The same pattern is everywhere

Once you see it, you can't unsee it.

  • Observe the state

  • Choose an action

  • See what happens

  • Make better decisions next time

That is reinforcement learning. It is also what a chess player does. What a salesperson does. What you do when you decide to answer a message now or later. The reward changes, but the loop stays the same.
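The four steps of the loop can be sketched in a few lines. This is not the Pac-Man setup: it uses a lookup table instead of a neural network, and a made-up five-cell corridor where the only reward is for reaching the right end. The environment, the function names, and the hyperparameters are all assumptions chosen to keep the example tiny.

```python
import random


def step(state, action):
    """Hypothetical 5-cell corridor: reward 1.0 for reaching the right end."""
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(5)]  # q[state][action]; 0 = left, 1 = right
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # 1. Observe the state (here it is just the cell index).
            # 2. Choose an action: mostly greedy, random on ties or exploration.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # 3. See what happens.
            next_state, reward, done = step(state, action)
            # 4. Make better decisions next time (Q-learning update).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Preferred move in each non-terminal cell after training.
greedy = ["left" if left > right else "right" for left, right in q_table[:4]]
print(greedy)
```

Swap the corridor for a game screen and the table for a network and the shape of the loop stays the same, which is the point of the list above.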

Humans already do this. We just don't call it that. The best example is BODMAS in math. Without an order of operations, 3 + 4 × 2 is ambiguous: add first and you get 14, multiply first and you get 11. BODMAS is just a rule we give kids so they don't have to figure it out from scratch. It is not intelligence. It is just saving time.

Even normal conversation works like this. If someone says "Book me the cheapest flight tomorrow morning," a bad assistant asks fifteen questions. A good one guesses most of them based on common sense. That is a tiny reinforcement learning problem that humans do so well it just feels like having a personality.

What if RL becomes the decision layer?

Most AI products today feel very stiff. They have a strict path for everything. It works, but it is rigid like a flowchart. Every new trick needs a new line of code.

What really excites me is the idea of reinforcement learning quietly becoming the layer that decides how an assistant acts, not just what it says. Things like:

  • Should it ask a clarifying question or just guess?

  • Is the user frustrated? Should it slow down?

  • What is the fastest way through this conversation?

You can hardcode some of this, but eventually you are trying to write rules for thousands of slightly different situations. That is exactly what reinforcement learning is built for.
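As a toy illustration of the "ask or guess" decision, here is a sketch of the simplest reward-driven version I can think of: an epsilon-greedy policy that keeps a running average reward for each (context, action) pair. Everything here is hypothetical, including the two contexts, the reward numbers, and the class name. A real assistant would need a far richer notion of context and a real feedback signal.

```python
import random
from collections import defaultdict


class AskOrGuessPolicy:
    """Epsilon-greedy choice between two hypothetical actions per context."""

    ACTIONS = ("ask", "guess")

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # running mean reward per (context, action)
        self.count = defaultdict(int)

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)  # explore
        return max(self.ACTIONS, key=lambda a: self.value[(context, a)])

    def update(self, context, action, reward):
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


def simulated_reward(context, action):
    """Made-up user feedback: guessing pays off only when the request is clear."""
    table = {
        ("clear", "guess"): 1.0, ("clear", "ask"): 0.2,
        ("vague", "guess"): -1.0, ("vague", "ask"): 0.5,
    }
    return table[(context, action)]


policy = AskOrGuessPolicy()
for i in range(1000):
    context = "clear" if i % 2 == 0 else "vague"
    action = policy.choose(context)
    policy.update(context, action, simulated_reward(context, action))

policy.epsilon = 0.0  # stop exploring and inspect what was learned
print(policy.choose("clear"), policy.choose("vague"))
```

Nobody wrote a rule saying "ask when the request is vague". The policy only ever sees rewards, which is the appeal when the number of situations gets too large to enumerate by hand.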

The part I actually find beautiful

The thing I keep coming back to is this: Nobody told the bot what a ghost was. Nobody labeled the data as "danger" or "safe."

There is just an image, a score, and a lot of math updates. And then, somewhere inside the program, it clearly starts paying attention to where the ghosts are. Not because it was told to, but because paying attention to ghosts helps it get more points.

The structure wasn't designed. It emerged. That doesn't feel like normal AI to me. It feels closer to how nature works. Simple local rules repeated forever accidentally create behavior that looks like it has a real purpose. The bot isn't conscious, but it learned to hesitate without anyone asking it to.

Closing

I am not an AI researcher. I just did this project because I thought it would be fun, and I ended up with more questions than I started with.

The technical takeaway is small: DQNs work, replay buffers help, epsilon decay is fiddly, and conv layers learn things you didn't ask them to.
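For readers who haven't met two of those terms, here is a minimal sketch of a replay buffer and an epsilon decay schedule. It is not the code from the tutorial I followed. The class name, the linear schedule, and every default value are assumptions, and real DQN implementations differ in the details.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random batches break up the correlation between consecutive frames.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_at(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linear decay from `start` to `end`, then held at `end`."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.push(i, 0, 0.0, i + 1, False)  # placeholder transitions
batch = buffer.sample(2)
print(len(buffer), len(batch), epsilon_at(5_000))
```

The "fiddly" part is mostly `decay_steps` and `end`: decay too fast and the agent stops exploring before it has seen much, decay too slowly and it keeps making random moves long after it should know better.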

The non-technical one is bigger, and I think it is the reason reinforcement learning is going to keep eating more of the AI stack than people currently expect:

Simple reward-driven systems can quietly discover structure that nobody bothered to program.

Once you watch that happen, even on a tiny 1980s arcade game, it stops feeling like a niche tech trick. It starts feeling like a whole new way of thinking.

And that, more than the games or the benchmarks or the leaderboards, is what I think reinforcement learning is actually about.