Defensive Programming
Posted by rue, Fri Sep 12 09:57:00 UTC 2008
In the process of studying the current codebase, I ran into some functions whose existence I knew about: as() and try_as(). Both are essentially cast functions, explicit in the spirit of static_cast() but slightly tailored for our purposes (in the VM, each class contains its type as data explicitly, e.g. ArrayType and so on.) All Ruby classes are subclasses of Object, and the instances are passed around as Object*; the functions take the type to attempt to cast into and the object. as() will raise a TypeError if the cast is not possible, i.e. the real type of the object is not kind_of? the target type. try_as() is meant to be used in conditions and returns NULL for an invalid cast for those cases where the object's type is not always clear. Sounds good, right?
Well, one thing caught my attention. Both of the functions were treating the object already being NULL as a normal, ignorable case. I thought about it for a moment, and could not come up with any valid situation where one would have an unknown completely uninitialised object. In some cases a local object pointer might be NULL until it gets initialised (to Qnil or Qundef if nothing else), but in those cases you also know what the type is already. So this seemed like something that could mask problems.. I asked for opinions on #rubinius, since it is always possible I was just not understanding the valid use case, but no-one else could think of any either so I changed both functions to raise an Assertion if given a NULL.
Aside from fixing some shortcuts in the tests exposed by this, I did actually find a fairly nifty bug! LookupTable, which is used as an extra-lightweight hash, was raising Assertions left and right. Somehow NULLs were getting in the hash buckets. After some research, it turned out that the offender was this line in LookupTable::remove (pseudo for clarity):
link = try_as<Tuple>(bucket.value_or_another_hash_bucket);
The intention is to check if the bucket has a link to another one and (later) recurse further into it if so. The problem is that if the element was not another bucket, i.e. of type Tuple, the link became NULL instead of the value object contained in this bucket. The rest of the code incorrectly assumed that link would always be the Tuple or the value so it was storing the NULLs. A quick fix to correctly handle the value, too, made sure we were not going to lose methods or instance variables and such.
Lesson in this? 1. I should really condense my writing; but also 2. always be as strict as possible. It helps to ensure you weed out stuff that you had not considered when writing the code, whether it turns out to be correct or incorrect behaviour.