Shopping, pizza, SSHFP

Posted by bert hubert Mon, 03 Apr 2006 20:33:00 GMT

Last week I received an invitation to become a member of a superstore. Due to planning idiocies, large supermarkets aren’t allowed outside of city centers (get this) here in The Netherlands, so you need to be a ‘member’, which is only possible if you have a business. Luckily I have several.

But the odd thing was that I already was a member. But they did send a 20 euro discount voucher! So I went there today to ‘apply’ for membership, and they duly discovered I already was a member. Could I still cash my discount voucher?

Much pondering, calling of supervisors, issuing of stamps later, it was decided I could. 20 euros is no small thing so I immediately splurged lots more on vital stuff like…


.. several kinds of ready-made pizza products. Avid readers of this blog will know that I have a worrying addiction to these ancient flat-breads, so why would I try pre-made stuff? Well, I can be stupendously lazy at times, that’s why. I’ve not yet met a good programmer who isn’t lazy, so this is good news.

On to the vital stats. The ‘Nestle’ perishable pizza bottom looks expensive, well made, and even _is_ expensive. When prepared in my pizza oven, it even looks perfect, very thin, very Neapolitan. It tastes like cardboard though.

I also tried a smaller no-brand pizza bottom which is thicker and looks less professional, but which tasted a lot better. Nothing compared to my home-made dough though.

I bought several bottles of varying kinds of ‘pasta sauce’ and it turns out they are all fine. My new pizzas don’t have a lot of sauce on them, and I find that even the very cheap ready-made sauce is very good.

All in all, a worthwhile experience.


A day that didn’t really go anywhere. Spent quite some time fighting different-endian PCAP files. PowerDNS contains technology to replay recorded DNS streams for verification and analysis purposes, for which it needs to be able to parse tcpdump files. It turns out these come in both big- and little-endian flavours.

Furthermore, Solaris has a 2*64-bit struct timeval, whereas pcap files use regular 32-bit time_t values. So I had to abstract that all out. Didn’t commit it to SVN yet as part of the code doesn’t work yet.
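To make the endianness problem concrete, here is a minimal sketch in Python (not the PowerDNS code, which is C++). It assumes the classic pcap layout: a 24-byte file header starting with the magic number 0xa1b2c3d4, written in the byte order of the capturing machine, and 16-byte per-packet headers with 32-bit seconds and microseconds. The function names are made up for this example.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # magic number of the classic pcap format


def pcap_byte_order(file_header: bytes) -> str:
    """Return the struct prefix ('<' or '>') matching the file's byte order."""
    for prefix in ("<", ">"):
        (magic,) = struct.unpack(prefix + "I", file_header[:4])
        if magic == PCAP_MAGIC:
            return prefix
    raise ValueError("not a classic pcap file")


def parse_packet_header(order: str, data: bytes):
    """Decode a 16-byte packet header: seconds, microseconds,
    captured length, original length (all unsigned 32-bit)."""
    return struct.unpack(order + "IIII", data[:16])


# Two synthetic file headers, as a little- and a big-endian host would write them.
little = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
big = struct.pack(">IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
```

Because the timestamps are decoded into plain integers, whatever struct timeval looks like on the host no longer matters.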

Peter Zijlstra previously educated me on the use of ‘clock algorithms’ for cache pruning. PowerDNS currently prunes based on the TTL of records, which is probably not the best thing to do. A long-lived record has no need to outstay a shorter lived one if it is never queried.

My local sources now put a record in the back of a linked list every time it is accessed or created (many thanks to Joaquín Mª López Muñoz for explaining how this works). When we want to prune, we start with the least used records, which are at the beginning.

When the recursor tries to find a record in the cache and finds it to be expired, it can simply ignore it. It will be refreshed soon anyhow.

It would appear this could speed up PowerDNS a lot, and also enable us to limit ourselves to a fixed amount of memory used (see below).
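The idea can be sketched in a few lines of Python. This is only an illustration: an OrderedDict stands in for the linked list, and the class and method names are invented for the example, not taken from PowerDNS.

```python
from collections import OrderedDict


class PruningCache:
    """Cache that keeps entries ordered by last use, oldest first."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # front = least recently used

    def put(self, key, value, expires_at):
        self.entries[key] = (value, expires_at)
        self.entries.move_to_end(key)  # created: goes to the back
        self.prune()

    def get(self, key, now):
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at <= now:
            return None  # expired: simply ignore it
        self.entries.move_to_end(key)  # accessed: goes to the back
        return value

    def prune(self):
        # Drop entries from the front until we fit the fixed budget.
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)
```

With a fixed max_entries, memory use is bounded no matter how long the TTLs of the stored records are.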

Also, implemented RFC 4255, SSHFP, which took all of 30 minutes, counting the implementation of hex-encoded records. Without that infrastructure, it would’ve taken 3 minutes.

This does not do anything yet - the recursor does not need to know about SSHFP and the authoritative nameserver doesn’t use the innovative ‘MOADNSParser’ yet. I’ll probably change that before 2.9.21.
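For the curious, a rough sketch of what an SSHFP record looks like, in Python rather than the PowerDNS C++. Per RFC 4255 the record data is one octet of key algorithm (1 is RSA, 2 is DSS), one octet of fingerprint type (1 is SHA-1), followed by the raw fingerprint, which is shown as hex in zone files. The function names and the key blob below are placeholders for this example.

```python
import hashlib


def sshfp_rdata_to_text(rdata: bytes) -> str:
    """Wire format to zone file text: 'algorithm fptype hexfingerprint'."""
    algorithm, fp_type = rdata[0], rdata[1]
    return f"{algorithm} {fp_type} {rdata[2:].hex()}"


def sshfp_text_to_rdata(text: str) -> bytes:
    """Zone file text back to wire format; whitespace in the hex is allowed."""
    algorithm, fp_type, hex_fp = text.split(None, 2)
    fingerprint = bytes.fromhex("".join(hex_fp.split()))
    return bytes([int(algorithm), int(fp_type)]) + fingerprint


# Placeholder blob; a real record hashes the server's public key blob.
key_blob = b"hypothetical public key blob"
rdata = bytes([1, 1]) + hashlib.sha1(key_blob).digest()
text = sshfp_rdata_to_text(rdata)
```

Most of the work is the generic hex encoding and decoding, which matches the 30 minutes versus 3 minutes remark above.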


I also spent some time trying to get the Linux implementation of getrusage to fill out the ‘integral’ memory fields. It turns out no OS I have access to uses these fields, as their definition has traditionally been crap. SUSv3 also doesn’t mention these fields at all. So that was some lost effort. It appears you’ll have to do something different on each Unix to discover real memory usage.

In working on this, I managed to get UML compiled and working without too much work, which is a first. The UML defconfig is not very good though, will send Jeff Dike a patch.

I did discover mallinfo(3) today, which appears to be present in all unixes, and provides information on the memory allocation subsystem. The numbers nedmalloc output here appear to be bogus though.

3 comments | no trackbacks

ISP Kart Competition, PowerDNS CPU utilization coolness

Posted by bert hubert Sun, 02 Apr 2006 19:06:00 GMT

Ok, the Dataman ISP Kart Competition was cool. I met many old friends there and also a lot of PowerDNS users, like XS4ALL, True, ISP Services and WideXS (didn’t speak to them, they were too busy karting it appears). Dave Aaldering appears to have done another fine job this year!

I was a bit hammered from a party the previous day and left a bit early. I’ll be there again next year though!

This week is supposedly light on activities, but I will be at the monthly gathering of a number of Delft internet companies.


One big PowerDNS user had some performance complaints and I outfitted them with the previously mentioned ‘cache cache’ to help improve things. A nameserver is a very dynamic system and it is a good idea to monitor its CPU usage very closely. That is easier said than done though, but then I had a very cool idea. Operating systems in general keep a count of how much CPU has been spent in total on a process. This is an ever increasing number, which in itself is sort of interesting but doesn’t tell us a lot.

However, the first derivative of that number is the “instantaneous” CPU load. And taking derivatives is what graphing tools like rrd or mrtg do. So all I had to do was export the number of milliseconds of CPU time the OS has accounted to PowerDNS so far, and graph that.

And it works just fine. This should be a fine tool to determine the cause of PowerDNS problems.
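What the graphing tool does with the counter can be sketched as follows. This is an illustrative Python snippet, not what PowerDNS exports: the function names are invented, and os.times() is just one way for a process to read its own accumulated CPU time.

```python
import os


def cpu_msec_used() -> int:
    """Total CPU time (user + system) spent by this process, in milliseconds."""
    t = os.times()
    return int((t.user + t.system) * 1000)


def cpu_load(samples):
    """Turn (wall_clock_seconds, cpu_milliseconds) samples into the
    fraction of one CPU used during each interval."""
    loads = []
    for (t0, cpu0), (t1, cpu1) in zip(samples, samples[1:]):
        loads.append((cpu1 - cpu0) / 1000.0 / (t1 - t0))
    return loads


# Three samples taken five minutes apart: 15 s of CPU, then 150 s more.
loads = cpu_load([(0, 0), (300, 15000), (600, 165000)])
```

Here the first interval works out to 5% of a CPU and the second to 50%, even though the raw counter only ever goes up.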

On another note, one such cause has been that PowerDNS needs to be able to limit its memory consumption for it to be useful on smaller machines.

The Sun T2000 (Niagara)

Did initial runs of the PowerDNS recursor on the big Sun and it appears some work is needed to make things perform - as expected. At first I wanted to revamp most of the recursor but it now appears I can (almost) get away with calling pthread_create() 24 times and adding some locking.

This will probably speed things up massively as DNS recursing is mostly memory and network bound, which means there are large opportunities to get all the ‘strands’ operating in tandem. I think it might even speed up PowerDNS on SMP Opteron systems.

Stefan Arentz is working on making boost::asio benefit from Solaris ‘completion ports’, and I’m pondering if it would be possible to move the PowerDNS recursor to asio. This would in theory be faster as it would need a few fewer system calls per packet (which we now spend on MTasker).

no comments | no trackbacks

PowerDNS recursor deployments do not age like wine

Posted by bert hubert Fri, 31 Mar 2006 09:22:00 GMT

In a break from tradition, I’ll start with the ‘life’ stuff. It has been very busy lately and I hope to relax some more. I’ve finished some non-PowerDNS work and should be under less pressure the coming weeks.

Tomorrow, Saturday, I’ll be a ‘VIP guest’ (wow) at the Dataman ISP Kart Competition. I hope and expect this to be fun. I’ve heard of a number of PowerDNS users that will be there, it should be good to meet them (once more) in real life!

Ok, PowerDNS. To start with a quote from one of my favorite movies, Pulp Fiction:

The thing is, Butch, right now you got ability. But painful as it may be, ability don’t last.


M.therf.ckers who thought their ass would age like wine. If you mean it turns to vinegar, it does. If you mean it gets better with age, it don’t.

A typical high-load PowerDNS recursor installation does not age like wine.

The past few days I’ve ploughed through the PowerDNS code and found many worthwhile micro-optimizations, totalling up to about a 20% performance increase, at the cost of some code obfuscation. See the PowerDNS timeline for details.

However, 20% is not what I am looking for. Dan Bernstein’s dnscache, part of DJBDNS, is reportedly a lot faster on many loads than PowerDNS. It should be said that the actual user experience of PowerDNS is probably better though, as it is generally quicker to react to broken domains and broken queries.

It turns out that the footprint of a busy recursor is around 2 to 3 million cache entries. Internally, these entries are almost always stored in a tree, although I’m pondering moving back to a hashed list, as such a structure should excel at making lookups for things which aren’t there, which happens a lot in DNS.

Anyhow, currently things are in a red black tree, which will have around 2*log2(2500000) levels, which is in the order of 42. An average lookup can be expected to find its record 35 levels deep or so. I’m no computer science theorist, but these numbers aren’t very far off the mark.

A lookup in a full cache will then take 35 comparisons, one for every level we have to descend. Each DNS query takes at least two lookups, which makes 70 comparisons. A query for which we have no answer (yet) takes at least 6 lookups right now, for a stunning 210 comparisons.
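The arithmetic above is easy to check. A small Python sketch, where the 35 comparisons per lookup is my estimate from the text and not a measured number:

```python
import math

CACHE_ENTRIES = 2_500_000
COMPARISONS_PER_LOOKUP = 35  # estimated average depth, as in the text

# Rough worst-case height of a red-black tree: twice log2 of the entry count.
max_levels = 2 * math.log2(CACHE_ENTRIES)


def comparisons_per_query(lookups: int) -> int:
    """Comparisons spent on cache lookups for one DNS query."""
    return lookups * COMPARISONS_PER_LOOKUP


cached_query = comparisons_per_query(2)    # answer already in the cache
uncached_query = comparisons_per_query(6)  # no answer in the cache yet
```

max_levels comes out at roughly 42.5, and the two query types at 70 and 210 comparisons.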

If we look at these internal lookups, they look something like this: CNAME PTR NS NS NS NS CNAME A

etc. What we find is that the first three queries are very closely alike, and very likely all to return nothing from the cache.

So I’ve now implemented a tiny ‘cache cache’ that should help fold the initial three lookups into one, as well as the last two.

There is some further room for improvement, but measurements already show a large performance jump.

Back to the initial comments regarding wine and vinegar: why is that relevant? When the cache is small, PowerDNS doesn’t have to walk that many levels. What we’ve found is that slowdowns only start to happen after many hours, once the cache has filled up.

Stay tuned for further updates - it’ll take quite some hours before the current PowerDNS users can confirm the hopefully spectacular performance gains :-)

no comments | no trackbacks

Physics, PowerDNS, life

Posted by bert hubert Tue, 28 Mar 2006 20:48:00 GMT

Ok, this blog is not all about PowerDNS. Seriously. So, first some Physics. I used to be a physics student at Delft University of Technology, but I dropped out halfway through. That doesn’t mean I lost interest in hard science though.

I have a strong interest in ‘fringe science’. In my not so humble opinion, quietly shared by scientists I know, physics is focussed too much on confirming current ideas, whereas doing research into ‘interesting’ results is frowned upon.

I previously wrote a bit about this here.

Some of the things I keep an eye on are

  • “Cold fusion”
  • The gravity anomaly described in the link above
  • Gravity shielding

Cold fusion

The cold fusion bit is interesting enough. There are literally thousands of results but none of them has proven able to convince mainstream physics. Partially this has been due to the experimenters, who have sometimes made huge fools out of themselves, or have even committed fraud.

However, even when people do come along with solid results, they are faced with incredible amounts of criticism. You might as well try to convince people child pornography is art. The results on your career in physics are highly similar.

I’m currently of the opinion that there is so much smoke surrounding cold fusion that there is bound to be some fire.

Gravity shielding

Has been interesting too. Realise that nothing, and I mean nothing affects gravity. It goes through everything. We can’t create it, we can’t stop it. The saga started out with measurements by the secretive Evgeny Podkletnov, who claimed to have observed a slight decrease in the force of gravity above a rapidly spinning superconducting disk. High temperature superconductors are excellent at conducting electricity but their mechanical properties are somewhat lacking, and people have had a hell of a time getting such a disk to rotate at speed without disintegrating.

NASA sank a lot of effort into trying to reproduce his results, but sort of failed. The guy in charge, David Noever, is currently nowhere to be found, after he also researched gravity anomalies during solar eclipses.

Then another scientist, Ning Li studied the effect and vanished, as far as I understand it. Popular Mechanics ran an article on her. In the mean time, Podkletnov is now supposed to be part of secret military research in Russia. The stuff of conspiracies!

This strand of interest appeared to be slowly dying off though when suddenly ESA and US Air Force sponsored scientists presented this paper, on the ESA website no less.

In this paper, they report finding a 1-in-10000 change in gravity above a ring of niobium or lead when, cooled to liquid helium temperatures, it is rapidly spun up or down.

They mention that they’ve spent three years trying to spot errors in their experiment, which has been run 250 times.

Well, why is this important? As I described in my own page linked above, quantum mechanics and (general) relativity collide. Gravity is firmly in the relativity domain, superconductivity is as quantum mechanical as it gets. Also, nothing else has ever changed gravity.

This discovery could quite literally put physics on its head - which is high time, things were getting decidedly boring.


Ah, that thing. Well, not a lot to report. Everything ticking over just fine. Did discover that an important part of DNS, the ‘any query’, is completely unspecified by the RFCs. You can try to read what you have to do into the ancient writings of Mockapetris & friends, but I’m not too sure. Decided to emulate BIND instead, which is also the easiest thing to do.

I’m trying to double the recursor performance (again), but this appears to be hard work. Perhaps DTrace on the Niagara can be of some help.


Trying to relax a bit, worked too hard on PowerDNS and other projects. Working too hard makes me unfriendly and irritable, which is not a pleasant thing.

no comments | no trackbacks

Quick update on PowerDNS, 'powertools' musings

Posted by bert hubert Mon, 27 Mar 2006 18:45:00 GMT

Today the big Dutch ISP migrated one of its three recursor nameserver IP addresses to PowerDNS, and at first sight all appears well. In preparation for this event, over a billion packets were retransmitted and answers verified against incumbent nameservers.

One thing we missed is that the verification code uses part of the same code as the nameserver itself. This in turn meant that some malformed packets never were replayed, which hid the fact that

  1. the recursor logged these errors verbosely

  2. these packets are rather common

I’ve made the recursor a little less strict with respect to packets with trailing garbage. This has reduced error reporting a lot and improved general customer satisfaction.

But after these things were addressed, things progressed swimmingly and there were happy faces all round.


Additionally, I’ve made a tiny 61 kilobyte package of just the PowerDNS recursor. I enjoyed the large amount of control a raw Makefile gives one compared to penetrating the layers of cruft of the autotools.

I’ve long had the urge to root out the venerable autotools from my projects, now may be the time. To this end, I’ve started summarising why we actually need ./configure. So far I’ve found a few things.

There are basically three categories:

Where things come from, what we have

  1. Checking dependencies and:
    • informing the user intelligibly of any missing ones
    • making proper use of detected libraries
    • configuring ourselves to work around any missing ones that are not vital
  2. Allow user to specify the non-default location of any dependencies, either to override improper defaults (this should be rare) or to choose between different versions of a dependency.
  3. Choose which capabilities should be compiled into the resulting programs
  4. Make other compile-time choices which cannot easily be changed at runtime.

Where things go

  1. Figure out where programs, documentation, configuration files should be installed, either by
    • determining proper defaults for the target operating system
    • allowing the user to override these sensible defaults, if needed
  2. On install move items to these places.
  3. Make tarballs of the source that contain the files needed to compile.

Build mechanics, dependencies

  1. Allow programmer to easily specify the buildup of binaries (ie, which source files are part of a program), without duplication of work.

  2. Abstract out the mechanics of building shared libraries. This has generally been the domain of libtool. Different operating systems have different rituals for making shared libraries, static executables etc etc.


I’m pretty sure GNU Make, combined with perhaps some bash scripts, contains almost everything needed to implement the above without too much work.

This won’t be a bombastic process, but will probably evolve into enough to make building and releasing the PowerDNS recursor easy.

2 comments | no trackbacks

Updated blog software, final PowerDNS tweaks

Posted by bert hubert Sun, 26 Mar 2006 14:05:00 GMT

Ok, apologies to the people that syndicate me, the URL might have changed. The timestamps on the older posts are also a bit dodgy, and the 2 comments have definitely vanished.

I used to be into ‘layout’ a “lot” so I hope you appreciate the improved appearance of this blog, including the dreaded smart quotes.

Ok, onto the real content.


You may recall the stunning bug I wrote about yesterday, and how I solved it. Later that day I thought of an old adage, “A bug is never alone”, and indeed, it turned out that the negative cache, where we store records that authoritatively don’t exist, was also cleaned in reverse, whereby we continuously removed all new entries.

Fixing that bug raised the steady-state cache hitrate from 80% to 90%, which doesn’t sound like a lot, but means the amount of network traffic generated to the Internet has halved.
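To see why ten percentage points matter so much, count the misses instead of the hits. A tiny Python sketch, where the one million client queries is an arbitrary number picked for the example:

```python
def upstream_queries(client_queries: int, hit_percent: int) -> int:
    """Queries that miss the cache and have to go out to the Internet."""
    return client_queries * (100 - hit_percent) // 100


before = upstream_queries(1_000_000, 80)  # 20% of queries miss
after = upstream_queries(1_000_000, 90)   # 10% of queries miss
```

The miss rate drops from 20% to 10%, so the outgoing traffic is cut in half.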

I did do something controversial and limited this negative caching to at most one hour. I’m pretty sure this is what people want, and it saves heaps of memory anyhow. After an hour PowerDNS will, on getting a new query, verify if the domain name or record exists again. Sue me.
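In code, the cap is about as simple as it sounds. This is a Python sketch of the idea only; the constant and function names are made up and are not the PowerDNS ones.

```python
MAX_NEGATIVE_TTL = 3600  # one hour, in seconds


def negative_cache_expiry(now: int, ttl: int) -> int:
    """Time until which we remember that a name or record does not exist."""
    return now + min(ttl, MAX_NEGATIVE_TTL)
```

A domain that asks for a day of negative caching is remembered for an hour, and one that asks for five minutes still gets five minutes.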

I also moved the negative cache to boost::multi_index_container. I can’t heap enough praise on this container. It slices, it dices.

I also used it to implement user initiated cache deletion, you can now use rec_control wipe-cache to remove this beloved blog from your cache, in case it contained bad content. To study your cache, use rec_control dump-cache filename.


No pizza news today. I’m trying to think of the human angle of this blog but there is not a lot to report :-)

1 comment | no trackbacks

Heading up to big PowerDNS recursor deployment

Posted by bert hubert Sat, 25 Mar 2006 15:02:00 GMT

For far too long now we’ve been working on implementing custom features for a big internet service provider here in The Netherlands and it appears we are almost there.

But then again, I’ve thought so a number of times already. The recursor (or resolver) of a network is one of the most crucial components of providing good service.

Put simply, a broken nameserver is perceived as a broken network. A slow nameserver means a slow network. So providers are understandably nervous about migrating to PowerDNS!

Some events may be forcing their hands however. To help ease migration fears, I’ve written dnsreplay_mindex, a tool that replays recorded DNS traffic (which you should anonymise using dnswasher if you plan on shipping it to me!) against PowerDNS, and shows statistics relative to your original nameserver.

I’m now confident that the PowerDNS recursor performs, in many cases, thousands of times better than the competition. Ok, that sentence has a touch of marketing to it. Just a touch. But I’m currently benchmarking at three times the original speed and dropping 30000 times fewer packets than BIND 8.latest.

That does not mean to say the PowerDNS recursor is perfect. It isn’t, not by a long shot. Even yesterday it turned out one of the more unique features of PowerDNS, the ability to forego hammering broken nameservers with queries that time out, had a cache that was cleaned in reverse: all NEW entries were being removed each minute.

The stunning thing is that it worked fine anyhow, just ate heaps of memory and performed some needless queries - which other nameservers perform all the time in any case.

Furthermore, the recursor carries IP addresses around as full blown strings, for which there is no excuse.

Update: I fixed this here
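To show what the difference amounts to, here is a Python illustration using the standard ipaddress module. It is not the actual fix, and the address is a documentation example.

```python
import ipaddress

text = "192.0.2.53"                  # up to 15 characters for IPv4
address = ipaddress.ip_address(text)

packed = address.packed              # the same address in 4 raw bytes
as_int = int(address)                # or as a single 32-bit number

# Converting back to text is only needed when printing or logging.
round_trip = str(ipaddress.ip_address(as_int))
```

Comparing or hashing a 32-bit number is a lot cheaper than doing the same with a string, and parsing then happens once at the edge instead of all over the place.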

So there is still work to do, but I’m confident we can migrate at least one of the target servers to PowerDNS on Monday.

In other news, it is a bit quiet on the Niagara (Sun T2000) front, I’m mostly reading up on the unique features of its CPU before delving in with code.

My current aims are to make PowerDNS really fast on T2000 and write a HOWTO about the process, allowing you to benefit from this architecture as well.

On the human interest front, it turns out that leaving the dough to rise in the fridge does indeed produce something that is more like the kind of dough I want, but I’m still not there! I think I’ll approach my favorite pizza restaurant soon and hope they are willing to share. I already have a proper pizza oven.

3 comments | no trackbacks

Progress with PowerDNS on Solaris 10, other things

Posted by bert hubert Fri, 24 Mar 2006 15:01:00 GMT

I’ve managed to make the T2000 a comfortable place to live in, for a Linux person like me. John Levon of Sun pointed me towards Blastwave, an OpenSolaris community site that adds apt-get like abilities.

I fixed up the few remaining problems people faced compiling and using PowerDNS on Solaris, and it now works out of the box.

Generally, compared to earlier revisions, things tend to ‘just work’ a lot more on Solaris 10, but there is yet a way to go. For example, a default install won’t allow you to generate new home directories, as /home is under control of an automounter.

But what is very good is that even the notoriously difficult programs autoconf, automake and libtool all function as intended. This is vitally important if you actually want Solaris as your main development platform for Open Source, as these tools generate the ubiquitous ./configure scripts for most projects.

Casper Dik pointed me to the proper place for UltraSPARC T1 (aka Niagara) performance documents, I think I see how I can make the various PowerDNS components shine on this architecture.

The converse of this is that you actually need to work at making this new chip scream - quite a number of unmodified programs do not benefit from all the additional cores and strands.

For the recursor, I can probably get away with removing my beloved MTasker and instead use pthreads or Solaris native threads.

For the bind2 authoritative mode, more work will be needed, where I will be looking at fine-grained locking to make zone loading fast.

In other news, tonight I continued my quest to make perfect pizza dough, but I’m still not there. I think the flour available in shops here is not entirely of the right kind. This is an attempt to add a ‘human interest’ angle to my new blog :-)

no comments | no trackbacks

First impression of T2000

Posted by bert hubert Fri, 24 Mar 2006 14:59:00 GMT

Power maintenance was over earlier than expected so I was able to immediately plug in the ‘try and buy’ Sun Fire T2000. I am typing this from the Mozilla that comes with Solaris 10, it looks good.

I’ll write up more of the experience, I think there are some easy ways in which Sun could improve the first impression this machine makes.

Also, many thanks to my friend Ahhing who supplied vital equipment to make everything work!


no comments | no trackbacks

UDP buffers and nameservers

Posted by bert hubert Fri, 24 Mar 2006 14:45:00 GMT

Learned a ‘big thing’ today - when writing a heavy duty UDP server (like the PowerDNS recursor), you might need to do a very rare thing: tune your kernel.

I found this site to be very helpful ‘UDP buffering background’

I implemented all this here.
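The application side of this is a single socket option. A minimal Python sketch, assuming a plain UDP socket; the function name and the 1 MB figure are arbitrary choices for the example, and the kernel may grant less than you ask for unless its own limits are raised (net.core.rmem_max on Linux, for instance), which is where the kernel tuning comes in.

```python
import socket


def enlarge_receive_buffer(sock: socket.socket, wanted: int) -> int:
    """Ask for a bigger receive buffer and report what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, wanted)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
granted = enlarge_receive_buffer(sock, 1024 * 1024)
sock.close()
```

Always check the value that comes back: the request can be capped silently, and a busy UDP server with a small buffer simply drops packets.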

no comments | no trackbacks
