Intelligence from scratch

September 9, 2009

Sometimes I get bored at breakfast and start to derive the first principles of fields I know nothing about. I don’t feel bad about it—I could learn something new, but if I get something wrong or get stuck, no harm done. Today’s topic is intelligence.

What does it mean for a critter to be intelligent? I would start with, “an intelligent critter acts in ways that tend to cause outcomes it likes”.

I want to work on this definition a bit. First, I want to separate out that “likes” bit. I think knowing what “likes” means is a hard and worthy problem, but I’m not interested in it today. So I’ll say instead, “an intelligent critter acts in ways that tend to cause predicted outcomes.” If the critter can predict the outcomes of its actions, then it could choose actions with outcomes it liked.

Now, our critters need to know what’s going on in the world. We all make decisions based on a constant input stream of information from the world. Sights, sounds, smells, and tastes are just the beginning—we can also feel whether our arms are above our heads, whether that leftover Chinese food is sitting properly, and whether that bad knee tells us a storm is coming. If we’re general with these “internal” senses, we might find it useful later to think about our “likes” as a part of that input stream. We might even go so far as to talk about remembering as if it was turning on an internal input stream that we control.

These input streams change all the time and they’re full of rich experience. If we wanted to build a robot critter, it would need to process megabytes and megabytes of data every instant. I’m happy to leave this problem for someone else, and treat our critter’s input as a big list of bits [0, 1, 0, 0, … (millions more) …] that gets updated many times per second. Maybe the first bunch of bits say what color the world is in the top-left corner of your left eye, and the last bunch talk about whether you’re getting a massage behind your right shoulder-blade. It doesn’t really matter.

Real quick now, we apply the same principle to the critter’s output and treat it as a similar big vector of bits. Maybe turning on the first 100 bits means “raise your hand straight up.”

So we’ve got these big vectors of bits, and they change over time. Our little critter finds patterns in the input and predicts how actions on the output bits will change the input patterns. Then it chooses the actions that give it new inputs that it likes.

Let’s say our critter learns that some output bit always affects the same input bits in the next time step. That’s close to learning that a bunch of input bits are related to each other. So I’m not going to worry about finding patterns within one input vector. Instead I want to focus on finding patterns in the inputs as time changes. So, I’ll construct an idealized problem. Every time step, the critter gets to choose a one or zero as its output, and it gets as input a one or a zero. The critter is also allowed to remember its past inputs and outputs, because narrowing the input to a single bit breaks our earlier trick of imagining memory retrieval as just another input stream.
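To make that concrete, here’s a minimal sketch of one turn of this idealized world (in Python; universe_step and critter_step are stand-ins for whatever programs the universe and the critter turn out to be):

input_history = []   # bits the critter has seen so far
output_history = []  # bits the critter has emitted so far

def turn(universe_step, critter_step):
    # The critter picks its next output bit using everything it remembers.
    out_bit = critter_step(input_history, output_history)
    output_history.append(out_bit)
    # The universe answers with the critter's next input bit.
    in_bit = universe_step(output_history)
    input_history.append(in_bit)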

What happens if the critter’s actions have no effect on its inputs? Well, the critter would feel very frustrated and have a low sense of self-efficacy. It might still be able to amuse itself by detecting a pattern in its input, though. We could call this problem “pattern recognition.”

What happens if the critter’s input has no pattern? First, that implies that the critter would be frustrated as above. An input with no pattern would be a world in which every input bit was a coin toss. No critter could do better than 50% for very long. This teaches us that it matters “how much” pattern there is in the input. Just to make something up for now, we’ll call that quantity Q and come back to it later.
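The patternless universe is easy to write down; it’s the one universe program no critter can ever model (the name here is mine):

import random

def coin_toss_universe(output_history):
    # Ignores the critter completely: every input bit is a fresh coin toss,
    # so there is no pattern to learn and no predictor averages better than 50%.
    return random.randint(0, 1)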

With these cases in mind, let’s write down more precisely the problem of intelligence:

Given input [{0,1}, {0,1}, …, {0,1}] and output [{0,1}, {0,1}, …, {0,1}], with n = |input| = |output|, give [input(n+1 | output(n+1) = 0), input(n+1 | output(n+1) = 1)]. That is, given a history of n input bits and n output bits, provide a prediction for input bit n+1 for both possibilities of output bit n+1. (That is, the prediction has two bits.)
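Put as code rather than pseudo-math, the critter’s whole job is one function shaped like this (a sketch; the name predict is mine):

from typing import List, Tuple

def predict(input_so_far: List[int], output_so_far: List[int]) -> Tuple[int, int]:
    # Given n input bits and n output bits of history, return two bits:
    # the predicted input n+1 if we output 0, and the predicted input n+1 if we output 1.
    ...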

In the pattern recognition problem, the two bits of the prediction are always equal, because the output bit has no effect.

Over time, we can measure how well our critter does. Each turn, the critter chooses an output, and then we get to find out if its input prediction given this output was right. (It’s too bad, but we only ever get to test the possibilities the critter chooses.) If the critter always gets the right predictions, we would call it omniscient. If the critter’s predictions are completely random and unintelligent, it would get 50% right. (If it did worse than 50%, say getting 40% right, it could just flip its predictions to get 60% right.) So we can put an objective measure on our critter’s perceived intelligence, which is how often its predictions are right, and the maximum limit of this intelligence depends on Q, “how much pattern” there is.
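Here’s a rough sketch of keeping that score, assuming the predict function above and some universe_step program (how the critter actually picks its output is left as a stub, since that’s the “likes” problem again):

def run_and_score(universe_step, n_turns):
    # Play n_turns of the game and report the fraction of correct predictions.
    inputs, outputs, correct = [], [], 0
    for _ in range(n_turns):
        pred_if_0, pred_if_1 = predict(inputs, outputs)
        out_bit = 0  # a real critter would pick the output whose outcome it likes better
        outputs.append(out_bit)
        in_bit = universe_step(outputs)
        inputs.append(in_bit)
        # We only ever get to check the prediction for the output actually chosen.
        if (pred_if_0 if out_bit == 0 else pred_if_1) == in_bit:
            correct += 1
    return correct / n_turns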

That was too much pseudo-math, so let’s take a walk in the park and look at some ants before talking about what “how much pattern” means.*

An ant can seem to behave intelligently. It leads other ants to food, it carries dead ants away from the ant hill, and fights off mean bugs like spiders. Of course, ants aren’t really intelligent. We find this out when we take an ant away from its environment and it can’t adapt.

The ant isn’t really learning patterns in its environment to act on them. The ant detects and acts on shallow indicators of patterns that its DNA already knows. Evolution executed a learning algorithm, partially evaluating it on the “pattern” of the ant’s environment. It was an engineering tradeoff, sacrificing adaptability to get a simplified design.

From the ant we learn that our critter can have some built-in knowledge, and it behaves just like built-in knowledge in other theories of communication.

Let’s return to discussing “how much pattern.” Kolmogorov teaches us that we can measure the information content of a string by the length of the smallest program that generates that string. There might be other measures we want to use, but I’m satisfied that there is at least one measure that gives us something to work with.
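A toy illustration of Kolmogorov’s idea: a very regular string comes from a tiny program, while a patternless string has no description much shorter than itself.

def regular_string(n):
    # "0101...01" of any length comes from this one-line program,
    # so its Kolmogorov complexity stays small no matter how long the string gets.
    return "01" * n

# A string of fair coin tosses, by contrast, has no generating program
# meaningfully shorter than the string itself.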

The critter’s input is its environment’s output. In our simplified world, we’ll say that the universe is a program that feeds the critter its bits. The critter, wishing to learn the patterns in its input, needs to figure out the universe’s program.

For the sake of the argument, we’ll limit the universe’s program to be finite. If the universe’s program grows infinitely, the critter can never win, and that’s no fun.

Suddenly it makes sense to think of the (infinite) set of all possible finite programs. (Actually, the set of all possible programs that input and output one bit per time-step.) We’ll call it PROGRAMS, because that’s appropriately big and bold. The universe is a finite program, so it’s an element of PROGRAMS. The critter’s learning algorithm is a program, so it’s an element of PROGRAMS. Finally, the critter’s learning algorithm has the unenviable task of searching PROGRAMS for the universe’s program.

Let’s give every program in PROGRAMS a number. I don’t know how to do this, but I’m happy to pretend that I can for now. If you’re feeling really generous, we’ll eliminate programs that don’t terminate with an output each turn, because they’re no fun. Then one possibility for the critter’s learning algorithm looks something like:

program_num = 0;
/* Find a simple model of the universe 
   that's consistent with our memory */
while(!consistent(PROGRAMS[program_num], 
                  input_so_far, 
                  output_so_far)) 
{
    program_num = program_num + 1;
}
/* Generate predictions of what the 
   universe will do next */
/* If I output 0, what will the universe do? */
prediction[0] = PROGRAMS[program_num](output_so_far + 0);
/* If I output 1, what will the universe do? */
prediction[1] = PROGRAMS[program_num](output_so_far + 1);
/* Emotions and feelings go here */
return better_for_me_of(prediction[0], prediction[1]);

I hope you get the idea.
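If it helps, the consistent check in that sketch would mean something like this (my guess, written in Python instead of the pseudo-C above):

def consistent(program, input_so_far, output_so_far):
    # A candidate program models the universe if, fed our outputs so far,
    # it reproduces every input bit we actually observed.
    for t in range(len(input_so_far)):
        if program(output_so_far[:t + 1]) != input_so_far[t]:
            return False
    return True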

This critter is actually very greedy and short-sighted. Another critter might simulate the universe for a few turns ahead before deciding what outcome it likes best. A very clever critter might become curious, and use carefully chosen outputs to strategically eliminate possibilities.
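That less short-sighted critter might do something like this rough sketch with whatever model the search settled on (how_much_i_like is a stand-in for the “likes” problem I’m still ignoring):

def best_plan(model, outputs_so_far, depth):
    # Try every sequence of the next `depth` output bits against our model
    # of the universe and keep the first bit of the sequence we like best.
    best_score, best_first_bit = None, 0
    for plan in range(2 ** depth):
        outputs = list(outputs_so_far)
        imagined_inputs = []
        for step in range(depth):
            outputs.append((plan >> step) & 1)
            imagined_inputs.append(model(outputs))
        score = how_much_i_like(imagined_inputs)
        if best_score is None or score > best_score:
            best_score, best_first_bit = score, plan & 1
    return best_first_bit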

There are lots of interesting things we can think about next. For example, what happens when the universe is a program that deliberately tries to fool the critter’s program? Exactly how much better would a critter do if, each turn, we informed it what would have happened if it had chosen the other output? What is the Q of a neuron? What is the Q of a strand of DNA of some length?

We can also play a fun game to try and develop a critter program that doesn’t require us to enumerate PROGRAMS. I think this would be a fun competition. Let’s see what we can learn right away:

At time 0, the critter knows nothing about the universe. Its choice of output is arbitrary. Call it a.

At time 1, the universe has provided input i, which is either a or ~a. The universe’s program could be: Always output i, or always echo the critter’s output, or something else. I think we can prove, though, that the critter’s optimal strategy is to predict and output i on the next turn, just in case the universe is one of the really simple programs.

After that, things get a little more complicated, and it’s lunch time now.

* This is how far I got before I decided it was time to get dressed and go to school. At school I spoke with Zak, who pointed me at Algorithmic Information Theory, which is a new-ish branch of mathematics and computer science that studies these things.


One Response to “Intelligence from scratch”


  1. I thought about this while on the phone with my insurance guy the other day. When he asked if I had any other questions, I asked him if he ever thought about how we’re all just little prediction machines gambling on the future, and how that basically means that insurance should really be the core business of human existence.

    Then he was like “uhh… yeah, I guess it’s like gambling.”

    Then it was a little awkward.


