Tuesday, June 14, 2016

My knitr LaTeX template: manuscript and supplement interleaved in one source file

Most of the time between starting a manuscript and having it accepted after peer review is spent writing, rewriting and rearranging content. In Word, keeping track of figure numbers is a big pain, even more so when figures are moved between the main manuscript and the supplement. When moving my Word manuscript to knitr, I first had to decide between RMarkdown and LaTeX. Adding citations and figure references in RMarkdown still seems a bit experimental, so I decided to go with LaTeX.

For my template, I implemented two main features. First, thanks to a tip by Iddo Friedberg, supplemental figures are automatically numbered "S1", "S2", etc. Second, I added a bit of LaTeX magic to interleave parts of the main manuscript and the supplement: If I want to move a paragraph or figure to the supplement, I just wrap it with a "\supplement{ }" command. That is, in the source code, the main and supplemental text are right next to each other, and only in the generated PDF are they separated into two parts of the document.

Of course, fulfilling journal requirements once the manuscript is accepted might still take some time, but compared to the time saved in the previous stages this is quite acceptable.

Here is the template, and here an example PDF generated from the template. (For a complete manuscript, see here.)

Tuesday, April 19, 2016

Display element ids for debugging Shiny apps

My current Shiny project contains at least five tables and I constantly forget what they are called. So I whipped up a little bookmarklet that uses jQuery to show the id of each div and input. Some of those can be ignored as they are internal names set by Shiny, but most are the actual names you define in R.

Drag this link to your bookmarks: Show IDs

Friday, March 11, 2016

New R package: a dictionary with arbitrary keys and values

Coming from Python, the absence of a real dictionary in R has annoyed me for quite some time. Now, I actually needed to use vectors as keys in R:

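d <- dict()

# keys can be numbers, vectors or strings
d[[1]] <- 42
d[[c(2, 3)]] <- "Hello!"
d[["foo"]] <- "bar"

# look up a vector key
d[[c(2, 3)]]

# get() takes a default value for unknown keys
d$get("not here", "default")

# [[ ]] gives an error for unknown keys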

Under the hood, separate C++ dictionaries (unordered_map) are created for the different types of keys. Using R's flexible types that are reflected in Rcpp (as SEXP), such a dictionary can store both numbers and strings (and other objects) at the same time.
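To illustrate the idea of one map per key type, here is a minimal sketch in plain C++. It is not the package's actual code: the names TypedKeyDict, VectorHash and find_or are made up for this example, and a template parameter Value stands in for the generic R object (SEXP) that the package stores.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical hash functor: std::unordered_map has no built-in hash for
// std::vector, so the hashes of the elements are combined one by one.
struct VectorHash {
    std::size_t operator()(const std::vector<double>& key) const {
        std::size_t seed = key.size();
        for (double x : key) {
            seed ^= std::hash<double>{}(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// One unordered_map per key type. Value is an assumption of this sketch;
// it plays the role of the generic R object stored by the real package.
template <typename Value>
class TypedKeyDict {
public:
    // Overloads route each key to the map that matches its type.
    void set(double key, const Value& value) { number_keys_[key] = value; }
    void set(const std::string& key, const Value& value) { string_keys_[key] = value; }
    void set(const std::vector<double>& key, const Value& value) { vector_keys_[key] = value; }

    // Lookups return the fallback for unknown keys, similar to d$get(key, default).
    Value get(double key, const Value& fallback) const {
        return find_or(number_keys_, key, fallback);
    }
    Value get(const std::string& key, const Value& fallback) const {
        return find_or(string_keys_, key, fallback);
    }
    Value get(const std::vector<double>& key, const Value& fallback) const {
        return find_or(vector_keys_, key, fallback);
    }

private:
    // Shared lookup helper for all three maps.
    template <typename Map, typename Key>
    static Value find_or(const Map& map, const Key& key, const Value& fallback) {
        auto it = map.find(key);
        return it == map.end() ? fallback : it->second;
    }

    std::unordered_map<double, Value> number_keys_;
    std::unordered_map<std::string, Value> string_keys_;
    std::unordered_map<std::vector<double>, Value, VectorHash> vector_keys_;
};
```

The overloads do the dispatch that the R interface hides: a numeric key, a string key and a vector key each end up in their own map, so keys of different types never collide.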

The package is available on GitHub: https://github.com/mkuhn/dict

Tuesday, March 8, 2016

Avoiding unnecessary memory allocations in R

As a rule, everything I discover in R has already been discussed by Hadley Wickham. In this case, he writes:
"The reason why the C++ function is faster is subtle, and relates to memory management. The R version needs to create an intermediate vector the same length as y (x - ys), and allocating memory is an expensive operation. The C++ function avoids this overhead because it uses an intermediate scalar."
In my case, I want to count the number of items in a vector below a certain threshold. R will allocate a new vector for the result of the comparison, and then sum over that vector. It's possible to speed that up about ten-fold by directly counting in C++:
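A minimal sketch of such a counting loop, written as plain C++ so that it compiles on its own. The function name count_below is an assumption for this example, not the code from the original post. In R, the equivalent is sum(x < threshold), which first builds a logical vector as long as x. With Rcpp, the same loop would take a NumericVector and carry a // [[Rcpp::export]] attribute.

```cpp
#include <cstddef>
#include <vector>

// Count the elements of `values` that are strictly below `threshold`.
// A single pass with a scalar counter: no intermediate vector is allocated
// for the result of the comparison.
std::size_t count_below(const std::vector<double>& values, double threshold) {
    std::size_t count = 0;
    for (double v : values) {
        if (v < threshold) {
            ++count;
        }
    }
    return count;
}
```

For example, count_below({1.0, 2.0, 3.0, 4.0}, 3.0) returns 2, because only 1.0 and 2.0 are strictly below the threshold.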

Often this won't be the bottleneck, but it may be useful at some point.