Some debugging techniques, and "C++ introspection"

Posted by bert hubert Thu, 18 Sep 2008 19:44:00 GMT

After too much posting on IETF mailing lists, and not achieving anything, I’ve gone back to coding a bit more.

There are two things I want to share - the first because I had a devil of a time figuring out how to do something, and I hope that posting here will help fellow-sufferers find the solution via Google. The second thing I want to talk about because, and this is getting to be rare, I programmed something cool, and I just need to tell someone about it.

I pondered explaining it to my lovely son Maurits (4 months old today), but I don’t want to ruin his brain.

Debugging iterators

In most programming languages there are a lot of things that compile just fine, or generate no parsing errors at runtime, but are still accidents waiting to happen.

Tools abound to expose such silent errors, usually at a horrendous performance cost. But this is fine, as errors can be found by the developer, and fixed before release.

As part of our arsenal, we have the veritable Valgrind that detects things such as reading from memory that had not previously been written to. In addition, other tricks are available, such as changing functions that ‘mostly do X, and rarely Y’ so that they always to Y. This quickly finds programs that skipped dealing with Y (which might be a rare error condition, or realloc(2) returning a new memory address for your data).

Finally, many programming environments by default perform very little checking (in the interest of speed) - for example, they will gladly allow you to compare something that points to data in collection A to something that points to collection B - a comparison that never makes sense, classical apples and oranges.

My favorite C++ compiler, G++, comes with so called ‘debugging iterators’ that are very strict - anything that is not absolutely correct becomes an error, sometimes at compile time, sometimes at runtime.

Together with Valgrind, this is one of the techniques I like to whip out when the going gets tough.

Sadly, Debugging iterators (which are turned on by adding -DGLIBCXXDEBUG conflict with one of my favorite C++ libraries, Boost.

To make a long story short, to compile a version of Boost with debugging iterators, issue:

$ bjam define=_GLIBCXX_DEBUG

This single line of text may not look all that important, but it took me half a day of debugging to figure this out. So if you get this error:

dnsgram.o:
(.rodata._ZTVN5boost15program_options11typed_valueISscEE[vtable for 
boost::program_options::typed_value, std::allocator >, char>]+0x18): 
undefined reference to 
`boost::program_options::value_semantic_codecvt_helper::parse(boost::any&, 
std::__debug::vector, 
std::allocator >, std::allocator, std::allocator > > > const&, bool) const'

Then compile your own version of boost as outlined above.

C++ Introspection & Statistics

C++ is an old-school language, perhaps the most modern language of the old school. This means that it sacrifices a lot of things to allow programs to run at stunning ‘near bare metal’ speeds. One of the things that C++ does not offer therefore is ‘introspection’

What this means is that if you have a class called “ImportantClass”, that class does not know its own name at runtime. When a program is running, it is not possible to ask by name for an “ImportantClass” to be instantiated.

If you need this ability, you need to register your ImportantClass manually by its name “ImportantClass”, and store a pointer to a function that creates an ImportantClass for you when you need it.

Doing so manually is usually not a problem, except of course when it is. In PowerDNS, I allocate a heap (or a stack even) of runtime statistics. Each of those statistics is a variable (or a function) with a certain name.

In more modern languages, it would probably be easy to group all these variables together (with names like numQueries, numAnswers, nomUDPQueries etc), and allow these statistics to be queried using their names. So, an external program might call ‘get stat numQueries’, and PowerDNS would look up the numQueries name, and return its value.

No such luck in C or C++!

So - can we figure out something smart, say, with a macro? Yes and no. The problem is that when we declare a variable in C which we want to be accessible from elsewhere in the program, it needs to happen either inside a struct or class, or at global scope. This in turn means that we can’t execute code there. So, what we’d like to do, but can’t is:

struct Statistics {
         uint64_t numPackets;
         registerName(&numPackets, "numPackets");
         uint64_t numAnswers;
         registerName(&numAnswers, "numAnswers");
} stats;

stats.numPackets is indeed available, but the line after its definition will generate an error. This is sad, since the above could easily be generated from a macro, so we could do:

DEFINESTAT(numPackets, “Number of packets received”);

Which would simultaneously define numPackets, as well as make it available as “numPackets”, and store a nice description of it somewhere.

But alas, this is all not possible because of the reasons outlined above.

So - how do we separate the data structure from the ‘registerName()’ calls, while retaining the cool ‘DEFINESTAT’ feature where everything is in one place?

In C++, files can be included using the #include statement. Most of the time, this is used to include so called ‘header’ files - but nothing is stopping us from using this feature for our own purposes.

The trick is to put all the ‘DEFINESTAT’ statements in an include file, and include it not once, but twice!

First we do:

#define DEFINESTAT(name, desc) uint64_t name;
struct Statistics {
#include "statistics.h"
} stats;

This defines the Statistics struct, containing all the variables we want to make available. These are nicely expanded using our DEFINESTAT definition.

Secondly, we #undefine DEFINESTAT again, and redefine it as:

#define DEFINESTAT(name, desc) registerName(&stats.name, #name, desc);

Then we insert this in a function:

void registerAllNames()
{
#include "statistics.h"
}

This will cause the same statistics.h file to be loaded again, with the same DEFINESTAT lines in there, but this time DEFINESTAT expands to a call that registers each variable, its name (#name expands to “name”), and its description.

The rest of our source can now call ‘stats.numPackets++’, and if someone wants to query the “numPackets” variable, it is available easily enough through its name since it has been registered using registerName.

The upshot of this all is that we have gained the ability to ‘introspect’ our Statistics structure, without any runtime overhead nor any further language support.

As stated above, more modern languages make this process easier.. but not as fast!

I hope you enjoyed this arcane coolness as much as I did. But I doubt it :-)

Posted in , ,  | no comments