Bad habits are hard to break. When I finish a first version of a function, I sometimes just make up an input, call it interactively, and move on if it seems to work, promising myself I’ll get around to properly testing it later. If someone asks me about it, I can only reply honestly: “I haven’t tested it thoroughly, but I’m pretty sure it works.” I do use and appreciate unit testing: when I wrote my first package, I tested it thoroughly, probably too thoroughly. Still, I often yield to the temptation of informal testing, and just as often regret it.
I relearned this lesson recently while writing tests for the image classifier I’ve been working on for several months. It uses the torch package, which contains R bindings for the torch machine learning library. The classifier uses a customized subclass of a class called torch_dataset to implement a pretrained neural network. torch_dataset instances organize data for model training and evaluation. The subclass, called candidate_image_dataset, does this with file paths to images and class labels. Indexing an instance directly yields a tensor containing that image’s data; the instance also exposes an attribute called metadata, a data frame with each image’s file path and class label. The subclass can also be configured to return a randomly selected image when indexed. But if sampling is disabled, indexing the \(i\)th image directly should yield the image found in row \(i\) of metadata.
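To make that concrete, here is a rough sketch of how a class along these lines can be assembled with torch’s dataset() generator. Only the broad strokes match my real candidate_image_dataset: I’ve flattened its private attributes onto self, skipped the preprocessing that compute_sample_weights() does, and leaned on torchvision’s base_loader() and transform_to_tensor() to read images, so treat it as an illustration rather than the actual implementation.

library(torch)
library(torchvision)

candidate_image_dataset <- dataset(
  name = "candidate_image_dataset",

  initialize = function(images, labels, sample_weights = NULL) {
    # metadata records each image's file path and class label
    self$metadata <- data.frame(path = images, label = labels,
                                stringsAsFactors = FALSE)
    # sampling stays off unless weights are supplied; otherwise indexing is direct
    self$uses_sample <- FALSE
    n_weights <- length(sample_weights)
    if (n_weights != 0) {
      self$sample_weights <- sample_weights
      self$uses_sample <- TRUE
    }
  },

  .getitem = function(i) {
    # with sampling enabled, ignore i and draw a weighted random row;
    # otherwise row i of metadata determines which image comes back
    row <- if (self$uses_sample) {
      sample(nrow(self$metadata), 1, prob = self$sample_weights)
    } else {
      i
    }
    img <- base_loader(self$metadata$path[row])  # image file -> array
    list(
      x = transform_to_tensor(img),              # array -> channels-first float tensor
      y = self$metadata$label[row]
    )
  },

  .length = function() nrow(self$metadata)
)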
Almost as an afterthought, I added a test of the subclass’s .getitem method to confirm all this worked as described. All it did was check that, for an instance \(x\) that did not use sampling, x$.getitem(1) returned the image whose file path was stored in x$metadata[1,].
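Against the simplified version of the class sketched above, that check looks roughly like this with testthat; the throwaway PNG fixtures and the way I build the expected tensor are for illustration, not lifted from the real test suite.

library(testthat)
library(torch)
library(torchvision)

test_that("without sampling, .getitem(i) returns the image in row i of metadata", {
  # two tiny throwaway PNGs to stand in for real candidate images
  paths <- replicate(2, tempfile(fileext = ".png"))
  for (p in paths) png::writePNG(array(runif(8 * 8 * 3), dim = c(8, 8, 3)), p)

  # no sample_weights supplied, so indexing should be direct
  x <- candidate_image_dataset(paths, labels = c("cat", "dog"))

  item     <- x$.getitem(1)
  expected <- transform_to_tensor(base_loader(x$metadata$path[1]))

  expect_true(torch_equal(item$x, expected))
  expect_identical(item$y, x$metadata$label[1])
})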
I tend to be pessimistic, but I was still caught off guard when this trivial check failed. Unnerved, I looked through my code for the origin of the bug. The logical flag controlling whether an instance sampled indices did check whether the user had specified sampling weights, as expected:

if (n_weights != 0) {
  private$.sample_weights <- self$compute_sample_weights(
    images,
    sample_weights
  )
  private$.uses_sample <- TRUE
}
That left only one possible culprit: a wrongly set default value. Sure enough, in the list of private attributes:
private$.uses_sample <- TRUE
So the class was sampling whether the user indicated it or not. I changed that TRUE to FALSE and the test passed. Had I not taken a few minutes to construct this seemingly unnecessary unit test, I would not have noticed the bug; I would only have caught it at a less convenient time, possibly after it had compromised results obtained from the classifier. Verifying the assumptions you make about your code, even obvious ones, is never wasted time.
Unit testing, at least for me, has another advantage: it’s tremendously motivating. When the console fills up with a traceback and a cryptic error message, I know I made a mistake somewhere. That makes me responsible for correcting it. I become very dogged when I have a clear goal in mind, so I seldom fail to track down and fix the bug. Pinning down the cause holds some interest, too, since it’s typically a surprise: anything from a misnamed variable to a subtle misunderstanding that compromises the whole algorithm.
Then, having hopefully learned my lesson, it’s on to the next test case.