Friday, January 6, 2012

Embarrassingly parallel BLAST search

A quick note: To BLAST a file with many proteins against a database, you can use a recent version of GNU Parallel to fill up all CPUs (which the -num_threads option of BLAST doesn't do, as it only parallelizes some steps of the search):

cat query.fasta | parallel --block 100k --recstart '>' --pipe \
    blastp -evalue 0.01 -outfmt 6 -db db.fa -query - > result.tsv

This will split the FASTA file into smaller chunks of about 100 kilobytes, while making sure that each chunk contains only complete records (i.e. begins at a ">" header line).
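To make the chunking idea concrete, here is a rough sketch in awk of what splitting on record boundaries means. It is only an illustration and not how GNU Parallel is implemented; the function name `split_fasta` and the `chunk_NNNN.fa` file names are made up for this example.

```shell
#!/usr/bin/env bash
# Illustration only (not GNU Parallel's implementation):
# group FASTA records into chunks of roughly max_bytes each,
# starting a new chunk only at a ">" header line.
# Usage: split_fasta query.fasta 100000 chunks/
split_fasta() {
    local fasta=$1 max_bytes=$2 out_dir=$3
    mkdir -p "$out_dir"
    awk -v max="$max_bytes" -v dir="$out_dir" '
        # At a header line, open a new chunk if this is the first record
        # or the current chunk has reached the size limit.
        /^>/ && (n == 0 || size >= max) {
            if (n > 0) close(out)
            n++
            out = sprintf("%s/chunk_%04d.fa", dir, n)
            size = 0
        }
        # Copy every line (after the first header) into the current chunk.
        n > 0 {
            print $0 > out
            size += length($0) + 1
        }
    ' "$fasta"
}
```

Because the size check only happens at header lines, a chunk can exceed the limit by up to one record, which is why the block size is only approximate.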

Thursday, August 11, 2011

ggplot2: Determining the order in which lines are drawn

In a time series, I want to plot the values of an interesting cluster versus the background. However, if I'm not careful, ggplot will draw the items in an order determined by their name, so background items will obscure the interesting cluster:

Correct: Interesting lines in front of background

Wrong: Background lines obscure interesting lines

One way to solve this is to combine the label and name columns into one column that is used to group the individual lines. In this toy example, the line belonging to group 1 should overlay the other two lines:

Friday, July 15, 2011

SSH to same directory on other server

In my work environment, home directories are shared across servers. I often realize that I need to run a script on another server, e.g. due to RAM requirements or other jobs running on the current server. I finally figured out how to log in to the other server while staying in the same directory:

ssh -t server "cd '$(pwd)' && $SHELL"

(Of course, you can define aliases for different servers.)
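One possible way to set up such aliases is sketched below for bash. The helper names `remote_cd_cmd` and `sshcd` and the host names `bigmem` and `gpu1` are placeholders of mine. Unlike the one-liner above, this version starts whatever `$SHELL` is set to on the remote side, and it relies on bash's `printf %q` to escape spaces in the path.

```shell
#!/usr/bin/env bash
# Build the command to run on the remote side: change into the given
# directory (default: the current one), then start a login shell there.
remote_cd_cmd() {
    local dir=${1:-$PWD}
    # %q (bash) escapes spaces and other special characters in the path.
    # "$SHELL" stays literal here, so it is expanded on the remote server.
    printf 'cd %q && exec "$SHELL" -l' "$dir"
}

# Hypothetical helper: "sshcd <server>" opens a shell on <server>
# in the same directory you are currently in.
sshcd() {
    ssh -t "$1" "$(remote_cd_cmd)"
}

# Per-server shortcuts; replace the placeholder host names with your own.
alias bigmem='sshcd bigmem'
alias gpu1='sshcd gpu1'
```

Put this in your ~/.bashrc; typing `bigmem` then drops you into the current directory on that server.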

Tuesday, June 21, 2011

Mekentosj Papers and Dropbox: get a bit more space

I use Dropbox to keep an automatic backup of my Papers library. However, I was scraping close to my space allowance, and discovered that Papers adds some temporary files to the "Library.papers2" folder that eat up precious cloud space. Here is how to remove them from Dropbox:
  1. Quit Papers 2.0. Make a backup of your Papers library.
  2. Go to the "Library.papers2" folder inside your Papers library.
  3. Copy the folders "Thumbnails" and "Spotlight" somewhere (e.g. your Desktop). The other folders don't need much space in my case.
  4. Click the Dropbox icon in the menu bar, navigate to Preferences, Advanced, Selective Sync and de-select the "Thumbnails" and "Spotlight" folders. (Dropbox will now delete the folders.)
  5. Move the folders you copied before back into the "Library.papers2" folder.
  6. Via Dropbox's web interface, delete the online version of the two folders.
  7. Voila: more space (200 MB, in my case).
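To check which folders are worth excluding in your own library, something like the following should work in a terminal. This is a small sketch: the function name `folder_sizes` is made up, and the path in the example call is a placeholder for wherever your library actually lives.

```shell
#!/usr/bin/env bash
# List the subfolders of a directory with their size in kilobytes,
# largest first.
folder_sizes() {
    local dir=$1
    du -sk "$dir"/*/ | sort -rn
}

# Example call (placeholder path):
# folder_sizes "$HOME/Dropbox/Papers2/Library.papers2"
```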

Thursday, March 10, 2011

Comparing two-dimensional data sets in R; take II

David commented on yesterday's post and suggested putting the continuous fitted distribution in the background and the discrete, empirical distribution in the foreground. This looks quite nice, although there's a slight optical illusion that makes the circles look as if they were filled with a gradient, even though they're uniformly colored:

Not-so-good fit

Better fit

Wednesday, March 9, 2011

Comparing two-dimensional data sets in R

I wanted to fit a continuous function to a discrete 2D distribution in R. I managed to do this by using nls, and wanted to display the data. I discovered a nice way to compare the actual data and the fit using ggplot2, where the background is the real data and the circles are the fitted data (the legend is not optimal, but for a slide/figure it's probably easier to fix it in Illustrator):

A not-so-good fit

A better fit

My data frame includes these columns: x, y, enrichment (the real data), pred (my fitted values).

Tuesday, March 1, 2011

From FriendFeed to TwitterFeed: Losing the filters

These are the things I want to see in my feed: articles and papers you found and wanted to share, your opinions, your blog posts; to a lesser extent: your bookmarks.

However, I don't care about these things in my feed: GitHub activity, BioStar comments, shared YouTube videos, songs you liked, Wikipedia activity, etc.

On FriendFeed, it was easy to hide sources I didn't want to see, as each stream of content was clearly separate. It was even possible to hide an item until someone in my network liked it or commented on it. On Twitter, this seems almost impossible. First, everyone is setting up TwitterFeed as they see fit, so even finding a way to identify streams of items I would like to hide is not easy. Second, how would I hide it? Twitter doesn't even filter blocked users properly: they still show up in search. My current client of choice, Twitterrific, doesn't support any kind of filtering.

We all know this: "It's Not Information Overload. It's Filter Failure." Yes, the members of my network can be (active) filters. But I also need passive filters to reduce the amount of information that reaches me. Otherwise, my social network becomes much less useful to me. My fear is that, if all of us pipe everything into Twitter, it becomes a useless mess.