Posted by bert hubert Sun, 04 Feb 2007 12:14:00 GMT
Ok, I’m going to lecture a bit, a bad habit of mine. The summary is that an important enhancement of the Linux kernel has been proposed, but in order to understand the significance of this enhancement, you need a lot of theory, which follows below.
I use the word “computer” sometimes when I properly mean “the operating system”. This exposes a problem of this post, I’m trying to explain something deeply theoretical to a general audience. Perhaps it didn’t work. See for yourself.
Doing many things at once
People generally tend not to be very good at doing many thing at once, and surprisingly, computers are not much different in this respect.
First about human beings. We can do one thing at a time, reasonably well. There are people that claim they can multi-task, but if you look into it, that generally means doing one thing that is really simple, while simultaneously talking on the phone.
This is exemplified by how we answer a second phone call, ie, by saying “The other line is ringing, I’ll call you back”, or conversely, telling the other line they’ll have to wait.
We emphatically don’t try to have two conversations at once, and even if we had two mouths, we still wouldn’t attempt it.
Let’s take a look at a web server, the program that makes web pages available to internet browsers. The basic steps are:
- Wait for new connections from the internet
- Once a new connection is in, read from it which page it wants to see (for example, ‘GET http://blog.netherlabs.nl/ HTTP/1.1’).
- Find that page in the computer
- Send it to the web browser that connected to us
- Go to 1.
Compare this to answering a phone call, step 1 is the part where you wait for the phone to ring, and answering it when it does. Step 2 is hearing what the caller wants, step 3 is figuring out the answer to the query, 4 is sharing that answer.
This all seems natural to us, as it is the way we think. And programmers, contrary to what people think, are human beings, too.
Where this simple process breaks down is that, much like a regular phone call, we can only serve a new web page once the old one is done sending.
And here is where things get interesting - although we people have a hard time doing multiple things at once, we can give the problem to the computer.
What is the easiest way of doing so? Well, if we want to increase the capacity of a telephone service we do so.. by adding people. So on the programming side of things, we do the same thing, only virtually: we order the computer (or more exactly, the operating system) to split itself in two!
The new list of steps now becomes:
- Wait for new connections from the internet
- Once a new connection is in, split the computer in two.
- One half of the computer goes back to step 1, the other half continues this list
- (2) Read from it which page it wants to see (for example, ‘GET http://blog.netherlabs.nl/ HTTP/1.1’).
- (2) Find that page
- (2) Send it to the web browser
- (2) Done - remove this “half” of the computer
I’ve prefixed the things the second computer does with ”(2)” . This looks like the best of both worlds. We can “serve” many web pages at the same time, and we didn’t need to do complicated things. In other words, we could continue thinking like human beings, and use our intuition, by thinking of the analogies with answering phone calls.
So, are we done now? Sadly no. What basically has happened is that we have invoked a piece of magic: let’s split the computer in two. That is all fine, but somebody has to do the splitting. This job is farmed out to the CPU (the processor) and the operating system (Windows, Linux etc), and they have to deal with making sure it appears the computer can do two things at the same time.
Because the truth is.. people can’t do it, and neither can computers. They fake it.
This faking comes at a cost, incurred both while splitting the computer (“forking”), and by making the computer juggle all its separate parts. Finally, it turns out that practically speaking, you can divide a computer up into only a limited number of parts before the charade falls down.
Busy websites have tens of millions of visitors, we’d need to be able to split the computer into at least that many parts, while in practice the limit lies at perhaps 100,000 slices, if not less.
Several solutions to this problem have been invented. Some involve not quite splitting up the entire computer and making split parts share more of the resources (like for example, memory). This is called ‘threading’. Perhaps this could be compared with not hiring more people to answer the telephone, but instead giving the people you have more heads, so as to save money.
In the end, all these solutions run into a brick wall: it is hard to maintain the illusion that the computer can do multiple things at the same time, AND have it actually do a million things at the same time.
So in the end, we have to bite the bullet, and just make sure the program itself can handle many many things at once, without needing the magic of pretending the computer can do it for us.
This is where things get hard, and this is to be expected, as it was our basic premise that people can’t do multiple things at the same time, and what’s worse, they have a hard time even thinking about what it would be like.
The new algorithm looks like this:
- Instruct the computer to tell us when “something has happened”
- Figure out what happened:
- If there is a new connection, instruct the computer that from now on, it should tell us if new data arrived on that connection
- If something has happened to one of those connections we’ve told the computer about, read the data sent to us on that connection. Then find the information requested on that connection, and instruct the computer to tell us when there is “room” to send that data
- If the computer told us there was “room”, send the data that was previously requested on that connection. If we are done sending all the data, tell the computer to disconnect, and no longer inform us of the state of the connection.
- Go back to 1.
If this feels complicated, you’d be right. However, this is how all very high performance computer applications work, because the “faking” described above doesn’t really “scale” to tens of thousands of connections.
How does this translate to the telephone situation? It would be like we have lots of small answering machines, that lots of callers can talk to at the same time. Whenever someone has finished a question, the operator would listen to that answering machine, and leave the answer on the machine, and go on to the next machine that has a finished message.
From this description, it is clear it would not work faster that way if you’d try it for real. However, in many countries, if you call a directory service to find a telephone number, you’ll get half of this. Your call is answered by a real human being, who asks you questions to figure out which phone number you are looking for. But once it has been found, the operator presses a button, and the result of your query is sent to a computer, which then reads it to you, allowing the operator to already start answering a new call. Rather smart.
Something in between
If the previous bit was hard to understand, I make no apologies, this is just how complicated things are in the world of computing. However, we programmers also hate to deal with complicated things, so we try to avoid stuff like this.
People have invented many ways of allowing programmers to think ‘linearly’, as if only a single thing is happening at the same time, without having to split the entire computer.
One way of doing this is having a facade that makes things go linearly, until the program has to wait for something (a new connection, “room” to send data etc), and then switch over to processing another connection. Once that connection has to wait for something, chances are that what our earlier ‘wait’ was waiting for has happened, and that program can continue.
This truly offers us the best of both worlds: we can program as if only a single thing is happening at the same time, something we are used to, but the moment the computer has to wait for something, we are switched automatically to another part of the program, that is also written as if it is the only thing happening at the same time.
Actually making this happen is pretty hard however, because traditional computer programming environments don’t clearly separate actions that could lead to “waiting” from actions that should happen instantly.
A prime example of the first kind of action is “waiting for a new connection” - this might in theory take forever, especially if your website is really unpopular.
Things that should happen instantly include for example asking the computer what time it thinks it is.
Traditional operating systems can be instructed to be mindful of new incoming connections, and not keep the program waiting for them. This is what we described in the complicated “if X happened, if Y happened” scenario above.
They can also do the same for reading from the network and writing to the network, both things that might take time. This means you can ask the operating system ‘let me know when I can read so I don’t have to wait for it, and I can process other connections in the meantime’.
Furthermore, there are some limited tricks to do the same for reading a file. The problem is that back in the 1970s when most operating system theory was being invented, disks were considered so fast, nobody thought it possible you’d ever need to meaningfully wait for one. Of course disks weren’t faster back then, but computers were slower, and massively so. So by comparison, disks were really fast.
The upshot is that in most operating systems, disk reads are grouped with “stuff that should happen instantly”, whereas every computer user by now has experienced this is emphatically not the case.
Modern operating systems offer only a limited solution to this problem, called ‘asynchronous input/output’, which allows one to more or less tell the computer to notify us when it has read a certain piece of data from disk.
However, it doesn’t offer the same facility for doing a lot of other things that might take time, like finding the file in the first place, or opening it. Things that in the real world take a lot of time.
So, we can’t truly enjoy the best of both worlds as sketched above, which would mean the programmer could write simple programs, which would be switched every time his program has to wait for something.
Enter ‘Generic AIO’
Zach Brown, who is employed by Oracle to work on Linux, has now dreamed up something that appears to never have been done before: everything can now be considered something that “might take time”.
This means that you can ask Linux to find a certain file for you, and immediately allows you to process other connections that need attention. Once the operating system has found the file for you, it is available for you without waiting.
Although almost every advance in operating system design has at one point been researched already, this approach appears to be rather revolutionary.
It has ignited vigorous discussion within the Linux community about the feasibility of this approach, and if it truly is the dreamt of “best of both worlds”, but to this author, it surely looks like a breakthrough.
Especially since it unites the worlds of “waiting on a read/write from the network” with “waiting for a file to be read from disk”.
Time will tell if “Generic AIO” will become part of Linux. In the meantime, you can read more about it on LWN.