Wednesday, October 21, 2009

Public Service Announcement: Evolution of the centrosome

Information on the evolution of the centrosome can be found on Wikipedia and in the scientific literature. Not elsewhere.

(I just incorporated part of a grant proposal I sent in last week into Wikipedia. I hope this doesn't lead to any plagiarism charges. ;-) )

Thursday, October 1, 2009

Rebasing in Mercurial

After I used git for my own projects for a while, we switched the development of the STRING and STITCH databases from svn to mercurial (on bitbucket). Coming from git, I found two essential things lacking: (1) automatic coloring and paging of diffs and (2) rebasing.

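Problem 1 is solved by enabling the pager extension and using its "attend" option to specify which commands should send their output to less. You'll also have to set the options "SR" for less globally (e.g. "setenv LESS SR" in csh, or "export LESS=SR" in bash): -R passes the color codes through, and -S chops long lines instead of wrapping them.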
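As a minimal sketch, the relevant .hgrc sections can be generated with Python's configparser. The extension names pager and color and the [pager] options pager and attend existed in Mercurial at the time; the list of attended commands is only an example, and enabling color is an assumption on my part, since the paragraph above only mentions the pager.

```python
import configparser
import io


def hgrc_pager_config(attend=("diff", "log", "status")):
    """Build an .hgrc fragment that pages (and, by assumption, colors) command output."""
    cfg = configparser.ConfigParser()
    # An empty value simply switches on an extension bundled with hg.
    # 'color' is an assumption: the post itself only mentions the pager.
    cfg["extensions"] = {"pager": "", "color": ""}
    # 'attend' lists the commands whose output goes through the pager.
    cfg["pager"] = {"pager": "less", "attend": ",".join(attend)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(hgrc_pager_config())
```

Generating the fragment rather than hand-editing it is just a convenience here; pasting the printed sections into ~/.hgrc has the same effect.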

To rebase means to take code changes that were developed in parallel and make it look as if they had been developed sequentially, which avoids commits whose only purpose is to merge independent changes. Rebasing before pushing also avoids the problem that you can silently drop previous changes by pushing without pulling beforehand:
That is, the branch name is stored in the changeset. The flaw is that it's quite easy to have more than one branch with the same name, and it's difficult to tell when this has happened. This can cause confusion in a team where one is left wondering what changes, exactly, have made it into the "stable" branch when multiple people have reopened and merged the branch on different timelines.
The problem of the "pointless" merges is solved beautifully by the rebase extension, which ships with current versions of hg (it only needs to be enabled in .hgrc). I think this extension is under-advertised.
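To make the idea concrete, here is a toy model of rebasing in Python. It is not how hg implements rebase: commits are reduced to an id, a parent id and the file contents they set, and the conflict rule (refuse if both sides touched the same file) is a deliberate simplification of real content merging. The names Commit and rebase are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    """Toy commit: its id, its parent's id, and the file contents it sets."""
    cid: str
    parent: Optional[str]
    changes: Dict[str, str]


def rebase(local: List[Commit], upstream: List[Commit], fork_point: str) -> List[Commit]:
    """Replay local commits (made on top of fork_point) onto the tip of upstream.

    Both lists are ordered oldest first. The result is one linear history:
    the upstream commits followed by rewritten copies of the local ones,
    so no merge commit is needed.
    """
    # Toy conflict rule: refuse if both sides touched the same file.
    # (A real rebase merges file contents and only stops on overlapping edits.)
    touched_upstream = {path for c in upstream for path in c.changes}
    base = upstream[-1].cid if upstream else fork_point
    rewritten = []
    for commit in local:
        clash = touched_upstream.intersection(commit.changes)
        if clash:
            raise ValueError(f"conflict in {sorted(clash)}")
        # A rebased commit gets a new parent and therefore a new id.
        copy = Commit(cid=commit.cid + "'", parent=base, changes=dict(commit.changes))
        rewritten.append(copy)
        base = copy.cid
    return list(upstream) + rewritten


upstream = [Commit("u1", "a", {"README": "v2"})]
local = [Commit("l1", "a", {"main.py": "x = 1"}), Commit("l2", "l1", {"main.py": "x = 2"})]
history = rebase(local, upstream, fork_point="a")
print([(c.cid, c.parent) for c in history])
```

The printed chain shows each commit's parent being the previous commit, i.e. the parallel work now looks sequential.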

To briefly compare mercurial and git: I think git's approach is a more radical break from subversion etc. and therefore more consistent. However, it's also harder to wrap your head around, which is why we chose mercurial in the end.

For reference, here's my .hgrc.

Wednesday, September 16, 2009

Test (also: we built a funicular)

This is mainly a test to see if the image will show up in FriendFeed (through Feed-buster). But the image is interesting as well: my son and I built a funicular with Duplo bricks, modeled after the one in Dresden. :-)

Thursday, September 3, 2009

Learning ggplot2: 2D plot with histograms for each dimension

Update (April 2013): The code below doesn't work anymore with new ggplot2 versions; here is an updated version.

I have two 2D distributions and want to show on a 2D plot how they are related, but I also want to show the histograms (actually, density plots in this case) for each dimension. Thanks to ggplot2 and a Learning R post, I have more or less managed to get what I wanted:

There are still two problems: the overlapping labels on the density axis of the bottom-right plot, and a slight misalignment between the left edges of the two plots on the left. I think that the decimal point in the density labels pushes the top-left plot a tiny bit to the right compared with the 2D plot. Any ideas?

Here's the code (strongly based on the afore-linked post on Learning R):

library(ggplot2)  # written for ggplot2 0.8.x (2009); opts() and theme_blank() were later removed
library(grid)

p <- qplot(data = mtcars, mpg, hp, geom = "point", colour = cyl)
p1 <- p + opts(legend.position = "none")

p2 <- ggplot(mtcars, aes(x = mpg, group = cyl, colour = cyl))
p2 <- p2 + stat_density(fill = NA, position = "dodge")
p2 <- p2 + opts(legend.position = "none", axis.title.x = theme_blank(),
      axis.text.x = theme_blank())

p3 <- ggplot(mtcars, aes(x = hp, group = cyl, colour = cyl))
p3 <- p3 + stat_density(fill = NA, position = "dodge") + coord_flip()
p3 <- p3 + opts(legend.position = "none", axis.title.y = theme_blank(),
      axis.text.y = theme_blank())

# Keep only the legend of the scatter plot
legend <- p + opts(keep = "legend_box")

## Plot Layout Setup
Layout <- grid.layout(nrow = 2, ncol = 2,
   widths = unit(c(2, 1), c("null", "null")),
   heights = unit(c(1, 2), c("null", "null")))
vplayout <- function(...) {
  grid.newpage()
  pushViewport(viewport(layout = Layout))
}
subplot <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)

# Plotting
vplayout()
print(p1, vp = subplot(2, 1))
print(p2, vp = subplot(1, 1))
print(p3, vp = subplot(2, 2))
print(legend, vp = subplot(1, 2))

Monday, July 27, 2009

One step towards writing papers in Google Wave

Google Wave's underlying technology will not only enable collaboration with other people, it also makes it possible for bots to interact with what you've written. I think this is going to change the way we work. E.g., all applications which require a significant amount of typing will benefit from the statistical auto-correction provided by the Wave app Spelly. In effect, Spelly goes over the text as you're typing it and corrects the obvious mistakes, just as you would do a bit later.

In a similar vein, the proof-of-concept bot Igor watches out for inserted references and automagically converts them to a citation and a reference list. When writing papers, I usually insert reminders: "REF Imming review", "REF PMID 16007907". If I adjust this convention a bit and provide a bit more detail, Igor can figure out by itself which paper is meant and fetch the citation. Google Wave and Igor save me the tiresome going back and forth between a reference manager and the editor to insert all the citations, and they remove distractions from the process of writing and editing the paper.
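The numbering part of such a bot is simple enough to sketch in Python. This is a hypothetical illustration, not Igor's code: it only handles the "REF PMID <id>" convention, numbers citations in order of first appearance, and lists bare PMIDs where a real bot would fetch author, title and journal. Free-text reminders like "REF Imming review" would need a literature search and are out of scope here.

```python
import re

# Hypothetical marker convention taken from the reminders above.
REF_PATTERN = re.compile(r"REF PMID (\d+)")


def cite(text):
    """Replace 'REF PMID <id>' markers with [n] and append a reference list."""
    numbers = {}  # PMID -> citation number, in order of first appearance

    def number_for(match):
        pmid = match.group(1)
        if pmid not in numbers:
            numbers[pmid] = len(numbers) + 1
        return f"[{numbers[pmid]}]"

    body = REF_PATTERN.sub(number_for, text)
    # A real bot would fetch the citation details here; this sketch lists PMIDs only.
    refs = [f"[{n}] PMID {pmid}" for pmid, n in numbers.items()]
    return body + "\n\nReferences\n" + "\n".join(refs)


draft = "Kinases matter (REF PMID 16007907). Again (REF PMID 16007907), and (REF PMID 123)."
print(cite(draft))
```

A repeated PMID reuses its number, which is the main bookkeeping a reference manager normally does for you.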

Of course, this is a proof of concept, so the style can't yet be customized. I further think it would be helpful to quickly look "what's inside" a particular citation. I don't know if Google Wave supports this, but it would be nice to click on a citation ("[23]") and be presented with a pop-up window showing not only information about the article, but also links to PubMed / a DOI resolver.

Thursday, May 21, 2009

How good is Wikipedia's coverage of chemical compounds?

Wikipedia has excellent coverage of chemical compounds, featuring more than 20,000 articles whose titles match the names of PubChem compounds. After finding a few important chemicals not featured in Wikipedia, I wanted to quantify Wikipedia's coverage and point out the gaps that should be filled.

I wondered how much coverage Wikipedia actually has for "important" chemicals. Here, I define importance as "number of hits in PubMed", since that is easy to measure (and, in fact, I had already determined it as part of working on STITCH and Reflect).

Missing chemicals
The chemicals are ranked by the number of PubMed hits for all their synonyms and grouped into bins of 100; for each bin, the number of PubMed hits is plotted against the fraction of chemicals that have a Wikipedia article under any of their synonyms. (I exclude three-letter names as they are often ambiguous.) So, for compounds that occur more than 1000 times in PubMed, Wikipedia's coverage is above 80%. Here is the list of articles that should be added.
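The binning described above can be sketched as follows. This is an illustration under assumptions, not the actual analysis code: each compound is a dictionary of synonym to PubMed hit count, Wikipedia titles are given as a lowercased set, three-letter names are dropped from both the hit count and the title matching, and each bin is summarized by its lowest hit count (my choice; the plot may use a different summary).

```python
from typing import Dict, List, Set, Tuple


def compound_stats(synonyms: Dict[str, int], wiki_titles: Set[str]) -> Tuple[int, bool]:
    """Sum PubMed hits over a compound's synonyms and check for a Wikipedia article.

    Three-letter names are skipped because they are often ambiguous.
    wiki_titles is assumed to contain lowercased article titles.
    """
    usable = {name: hits for name, hits in synonyms.items() if len(name) != 3}
    total_hits = sum(usable.values())
    has_article = any(name.lower() in wiki_titles for name in usable)
    return total_hits, has_article


def coverage_by_bin(compounds: List[Tuple[int, bool]], bin_size: int = 100) -> List[Tuple[int, float]]:
    """Rank compounds by hits, cut them into bins of bin_size, and return
    (lowest hit count in bin, fraction with an article) for each bin."""
    ranked = sorted(compounds, key=lambda c: c[0], reverse=True)
    result = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        covered = sum(1 for _, has_article in chunk if has_article)
        result.append((chunk[-1][0], covered / len(chunk)))
    return result


example = [(5000, True), (1200, True), (300, False), (50, False), (10, True)]
print(coverage_by_bin(example, bin_size=2))
```

With the real data, bin_size stays at 100 and the returned pairs are the points of the coverage plot.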

So, if you know something about one of the missing compounds, go right ahead and create an article! :)

Missing synonyms
The second question is whether Wikipedia is missing important redirects, i.e. whether there are widely used names for chemicals that don't occur in Wikipedia even though an article exists for the chemical itself (just under another name). For very common names, the coverage is slightly lower. However, the abstracts in PubMed often contain chemical notation that people probably won't use when searching Wikipedia, e.g. "Ca(2+)" is the top hit on the list of redirects that could be added.
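Finding candidate redirects is a small variation on the coverage check. In this hypothetical sketch, compounds maps an identifier to its synonyms and PubMed hits, wiki_names holds lowercased article titles and existing redirects, and min_hits is an arbitrary cutoff I chose for illustration. Only compounds that already have an article somewhere are considered; the rest belong on the missing-chemicals list.

```python
from typing import Dict, List, Set, Tuple


def missing_redirects(compounds: Dict[str, Dict[str, int]], wiki_names: Set[str],
                      min_hits: int = 1000) -> List[Tuple[int, str, str]]:
    """Return (hits, synonym, compound_id) for frequently used names that lack a
    Wikipedia title or redirect although the compound has an article, most hits first."""
    candidates = []
    for cid, synonyms in compounds.items():
        if not any(name.lower() in wiki_names for name in synonyms):
            continue  # no article at all: that is a missing chemical, not a missing redirect
        for name, hits in synonyms.items():
            if name.lower() not in wiki_names and hits >= min_hits:
                candidates.append((hits, name, cid))
    return sorted(candidates, reverse=True)


# Hypothetical identifiers and counts, for illustration only.
compounds = {"CID_A": {"calcium": 50000, "Ca(2+)": 80000}, "CID_B": {"foo": 5000}}
print(missing_redirects(compounds, {"calcium"}))
```

This also shows why notation like "Ca(2+)" floats to the top: it is frequent in abstracts but rarely used as a Wikipedia title.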

Wednesday, April 22, 2009

Announcing SIDER: a database of side effects

After using side effects to predict drug targets, we have now created a public database of side effects with a total of 62269 side effects for 888 drugs. The database was created by text-mining drug labels from various public sources such as the FDA. Furthermore, I developed rules to extract frequency information from the labels; this worked for about one third of the drug–side effect pairs.
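To give an idea of what rule-based frequency extraction can look like, here is a minimal Python sketch. The rules are hypothetical and much cruder than anything usable on real labels: a percentage right after the side-effect term wins, otherwise a frequency word anywhere in the sentence is reported. The fixed 15-character window is an arbitrary assumption and would misattribute percentages in tightly packed lists.

```python
import re
from typing import Optional

# Hypothetical rules for illustration; the actual SIDER rules are not described here.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")
# Longer phrases come first so that "very common" is not reported as "common".
FREQUENCY_WORDS = ("very common", "uncommon", "common", "very rare", "rare",
                   "infrequent", "frequent")


def extract_frequency(sentence: str, side_effect: str) -> Optional[str]:
    """Return a frequency annotation for side_effect in one label sentence, or None."""
    text = sentence.lower()
    term = side_effect.lower()
    pos = text.find(term)
    if pos < 0:
        return None
    # Rule 1: a percentage shortly after the term, as in "headache (3.5%)".
    window = text[pos + len(term):pos + len(term) + 15]
    match = PERCENT.search(window)
    if match:
        return match.group(1) + "%"
    # Rule 2: a frequency word anywhere in the sentence (whole words only).
    for word in FREQUENCY_WORDS:
        if re.search(r"\b" + re.escape(word) + r"\b", text):
            return word
    return None


print(extract_frequency("Nausea (12%) and headache (3.5%) were reported.", "headache"))
```

Sentences where neither rule fires return None, which mirrors the fact that only part of the drug–side effect pairs get a frequency.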

We think that this database will make quite a bit of interesting research possible.

Wednesday, March 25, 2009

Negative controls and the police

Oh wow. For the last few years, the German police have been hunting a woman they knew only from her DNA, which had been found at more than 40 crime scenes all over Germany and Austria (including the murder of a police officer in Heilbronn). Now they are slowly coming to the conclusion that a batch of DNA sampling equipment had been contaminated by a female employee... Running negative controls would have been really useful, no? (German source:, via:)

Monday, March 16, 2009

A little hack: "mark as read" for FriendFeed

FriendFeed is missing one important thing: the ability to mark posts that haven't changed since you last saw them as "read." Ideally, new posts or posts with new comments should be visually different from those that are unchanged.

I created a bookmarklet that lets you put a visual indicator of your last visit to FriendFeed into the stream of posts. (You should click the bookmarklet when you begin reading, not when you finish: posts that arrive while you are reading won't be shown to you, so they need to end up above the marker. New and updated posts will percolate to the top and appear above the marker.) It works by using the FriendFeed API to create a post in a special room. As this is a hack, you'll have to adapt the bookmarklet.

  1. create a (private) room, e.g. USER-read
  2. drag this link to your button bar: FFread (RSS readers might mangle this)
  3. adapt the bookmarklet, replacing USER with your username (2x) and KEY with your remote key (1x)
  4. go to FriendFeed and test it
  5. optional: see how often you are visiting FriendFeed and calculate how much time you wasted :-)
Update: This worked in Firefox 3.0. I updated to the Firefox 3.1 / 3.5 beta, and it doesn't work anymore. :-/
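The effect of the marker post can be modeled in a few lines of Python. This is a sketch of the idea, not the bookmarklet (whose JavaScript isn't shown here): the stream is ordered by latest activity, and the marker sits where the time of the last click falls, so everything above it is new or has new comments. The function name and the (title, last_updated) representation are hypothetical.

```python
from datetime import datetime


def stream_with_marker(posts, last_visit, marker="--- read up to here ---"):
    """posts: list of (title, last_updated) pairs. Returns titles ordered by
    latest activity, with the marker inserted at the time of the last visit."""
    ordered = sorted(posts, key=lambda p: p[1], reverse=True)
    lines = []
    marker_placed = False
    for title, updated in ordered:
        # The first post not updated since the last visit goes below the marker.
        if not marker_placed and updated <= last_visit:
            lines.append(marker)
            marker_placed = True
        lines.append(title)
    if not marker_placed:
        lines.append(marker)  # everything is new
    return lines


posts = [("old post", datetime(2009, 3, 16, 8)),
         ("new comment", datetime(2009, 3, 16, 12)),
         ("fresh", datetime(2009, 3, 16, 11))]
print(stream_with_marker(posts, last_visit=datetime(2009, 3, 16, 10)))
```

Clicking at the start of a reading session corresponds to choosing last_visit as the moment you started, which is why posts arriving during reading still land above the next marker.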

Sunday, March 15, 2009

Keine Experimente! (No experiments!)

I carried this idea around with me for a long time; now, thanks to the Deutsches Historisches Museum (German Historical Museum) and Vector Magic, I've finally put it into practice. I actually find the vectorized look quite fitting.

In case the joke doesn't land, here's the explanation. Otherwise, here's the whole thing as a PDF, for printing or editing (instead of bioinformatics, you could of course favor theoretical physics).

Friday, March 6, 2009

A seminar on Makefiles!

This blog, my tumblelog, and my Twitter stream are littered with examples of my use of Makefiles. So I was very happy to find, via FriendFeed, this blog post about a whole seminar on Makefiles. Go and check it out; I learned something new!