April 12, 2018
Something that will make life easier in the long run can be the most difficult thing to do today. For coders, prioritising the long term may involve an overhaul of current practice and the learning of a new skill. This can be painful for a number of reasons:
I’m doing a PhD in image analysis, working with a lot of microscopy images of cells, all of which are in the TIFF format. ImageJ is a nice piece of software for image viewing and processing. R is also nice for image processing, but it is not as good as ImageJ for viewing and playing around. For me, as an R enthusiast, it was ideal to do my image processing in R and my viewing in ImageJ.
On CRAN, the tiff and magick packages can both read TIFF files, and on Bioconductor, there’s EBImage. However, all of these packages sometimes struggle with TIFF files written from ImageJ in that they wrongly perceive some images to have only one channel when in fact they have many (channels encode colour information: colour images have 3 channels - red, green and blue - whereas black and white or greyscale images have one channel - grey). Once the images are read into R, I am able to rejig them (with a combination of aperm() and abind()) into the format I want. However, the mistakes that the packages make vary, and thus the images require different rejigs. Every time I wanted to read an image into R, the process was read, check, rejig, check. This is a lot longer than just read. Nonetheless, this lengthy reading process still wasn’t that long in absolute terms: I could read in an image and have it in the format I wanted in about a minute. I processed thousands of images over the following two years (with various image analysis techniques). For each one, I went through read, check, rejig, check . . .
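To give a flavour of what a rejig looks like, here is a minimal sketch using only base R’s aperm() on a made-up array. The dimension orders are assumptions chosen for illustration (a stack that arrives as channel, row, column when row, column, channel is wanted); they are not a claim about what any particular package returns, and a real rejig might also need abind() to glue pieces together.

```r
# Hypothetical misread image: 2 channels, 3 rows, 4 columns,
# stored as (channel, row, column) instead of (row, column, channel).
misread <- array(seq_len(2 * 3 * 4), dim = c(2, 3, 4))

# aperm() permutes the dimensions. With perm = c(2, 3, 1), the new
# first dimension is the old second (row), the new second is the old
# third (column) and the new third is the old first (channel).
rejigged <- aperm(misread, c(2, 3, 1))

# Check: the dimensions are now (row, column, channel).
dim(rejigged)

# Check: each channel holds the same pixels as before the rejig.
stopifnot(all(rejigged[, , 1] == misread[1, , ]))
stopifnot(all(rejigged[, , 2] == misread[2, , ]))
```

The two checks at the end are the "check" steps of read, check, rejig, check: the permutation needed changes from image to image, so it has to be verified every time.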
In 2016, I attended Bioconductor’s CSAMA event in Italy. Jenny Bryan was there advocating the tidyverse, happygitwithr and various other good ideas. At that time, I wasn’t much into the R scene, so I’d never heard of Jenny nor of many of the things she mentioned, but I was very taken by her teaching (she grabbed my attention with the tip that Alt+- in RStudio gets you the assignment operator <-). The advice that resonated most was that NOW is the time to do that thing that you know you ought to do eventually. We all know that this is good advice, but that doesn’t mean we don’t need to be told it frequently.
Jenny advised that every regular R user should write a package containing the functions that they always use (rather than source()ing them from some random place atop every .R file). So I learned git, made a GitHub account and wrote my first R package. I decided to start easy, gathering together all of the functions that I use to rename and organise my files. This became (the terribly named) filesstrings (now on CRAN). The most useful thing it can do is to extract numbers from strings with functions like first_number(). filesstrings used Rcpp, so I got comfortable writing R packages that use C++. Having completed filesstrings, I looked more seriously into writing a package that solved my ImageJ TIFF issue. It was clear that I would require the libtiff C library, and thus would need to use R’s C interface, so I looked for documentation of it. The section in Hadley Wickham’s Advanced R and his GitHub repo r-internals are both good resources, but I still feel that this facet of R is under-documented. As a result, I couldn’t really get to grips with R’s C interface (at least not within a week as I’d hoped), so I gave up and returned to read, check, rejig, check.
In the end, I did write ijtiff, the package I had been longing for, using libtiff and R’s C interface. What happened that pushed me over the edge to learn everything I needed to know about C and R’s interface to it and to toil away for six weeks with clang errors and RStudio crashes and valgrind in Linux virtual machines? Nothing in particular. I was doing yet another read, check, rejig, check and I had had enough. I decided to take another crack at ijtiff. This time, with the help of what I’d learned during all of my previous failed attempts, I made it over the line.
So of course I was able to do it all along, it just took longer than I thought (it’s just reading a TIFF file into an array, right?). Most things take longer than we think, so that shouldn’t discourage us so quickly. Of course I should’ve done it right at the start of my PhD (rather than at the start of my 4th and final year), but for a combination of the silly reasons outlined above, I hesitated. I hesitated for three years. Anyway, it’s done now and I’m very happy with it. My feeling of delight that it’s now done easily trumps the feeling that I wasted time by waiting so long to do it. My R life is more efficient and less frustrating now. To check out ijtiff for yourself, see the CRAN page, the GitHub repository or the vignette.
As mentioned previously, there is a lack of accessible information on R’s C interface. The demand on a few experts like Hadley Wickham, Kevin Ushey, Jim Hester etc. to provide resources for the rest of us is huge. Although these people are very generous and well-appreciated, they only have a finite amount of time. As such, I want to contribute to the documentation on R’s C interface.
ijtiff package. This type of review, where the reviewers actively help you as well as objectively evaluating your work, is a revelation. Getting the advice and help of people like Jon Clayden and Jeroen Ooms was invaluable. Thanks also to editor Scott Chamberlain and to community manager Stefanie Butland, who is helping me with this blog.