Data management

I Still Haven't Found What I'm Googling For

Twenty-one years ago this month, in May 1987, Irish rockers U2 released their classic Joshua Tree single, I Still Haven't Found What I'm Looking For. Those twenty-one years have seen incredible technological change: the adoption of desktop computers and mobile phones, the birth of the Web, and the widespread use of search engines like Google. So, with sincere apologies to Bono, The Edge, Adam and Larry, it's time we updated the lyrics for the 21st century. I give you "I Still Haven't Found What I'm Googling For" (21st anniversary, 2008 webby edition)...


Introducing the eyeLIMS project

Scientists usually share information with collaborators from all around the world. For that purpose, eyeOS (www.eyeos.org) provides an invaluable system to access and share documents, create and save data files or store crucial personal and professional information.

To encourage scientists all around the world to use eyeOS, we initiated the eyeLIMS project! eyeLIMS is a community-driven project that aims to provide a Free, web-based, Open Source Laboratory Information Management System (LIMS) powered by eyeOS.


One Thousand Databases High (and rising)

Well, it's that time of year again. The 15th annual stamp collecting edition of the journal Nucleic Acids Research (NAR), also known as the 2008 Database issue [1], was published earlier this week. This year there are 1078 databases listed in the collection, 110 more than in the previous one (see Figure 1). As we pass the one thousand databases mark (1kDB), I wonder: what proportion of the data in these databases will never be used?


Data Integration in the Life Sciences 2008

Mon dieu! Doesn't time fly? Data Integration in the Life Sciences (DILS) is here again; see the Call For Papers. This time, DILS will be in Evry, near Paris. The conference is on June 25-27, 2008, but if you're thinking of doing a paper, you've got until February 20th, 2008 to submit your paperware.


Journal article search via RSS mashup

I've been trying to come up with a nice way to mash up and process RSS feeds, mostly so that I can track articles from journals that publish content that interests me. The best solution seems to be the workflows that can be constructed at Yahoo Pipes.


A Most Ugly Hack: translating from CHARMM to AMBER trajectories

Ever wondered how you might translate trajectories from one Molecular Dynamics package to another? It's a thorny little problem that's afflicted quite a few structural biologists. Here's one ugly solution that I am rather proud of.


Roundup: Extract a sequence from a fasta file

HMMs, SVMs, MCMC - interesting topics! But I have simple problems. Here is one: how do I extract some sequences from a FASTA file if my accession numbers are not GenBank IDs themselves but other words that still appear in the header?

Materials:
Hm. PubMed won't export so many sequences from the web interface (at least I could not find a way; the limit is 100). If I were a biologist, I would probably repeat the process manually 70 times to get 70 * 100 sequences, which might actually have saved me a lot of time. But I wanted to be clever.
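The extraction problem posed above can be sketched in a few lines of Python. This is only a minimal illustration, not the solution from the full post: the function names, the choice of separator characters, and the two example records are all made up for the purpose, and real headers may need a different tokenizing rule.

```python
import re
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_by_header_words(
    lines: Iterable[str], wanted: Set[str]
) -> Iterator[Tuple[str, str]]:
    """Keep records whose header contains any wanted word as a whole token."""
    for header, seq in read_fasta(lines):
        # Assumption: words in the header are separated by whitespace
        # or by one of the characters | , ; : =
        tokens = set(re.split(r"[\s|,;:=]+", header))
        if tokens & wanted:
            yield header, seq


# Made-up records, purely for illustration.
example = """>gi|123|ref|NM_0001| clone=ABC12 Homo sapiens
ACGT
ACGT
>gi|456|ref|NM_0002| clone=XYZ99 Mus musculus
TTTT
""".splitlines()

hits = list(extract_by_header_words(example, {"ABC12"}))
for header, seq in hits:
    print(">" + header)
    print(seq)
```

Matching on whole tokens rather than substrings avoids pulling in "ABC123" when only "ABC12" was requested.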


Publish or Perish software - now for Linux

Publish or Perish is an interesting (and free) piece of software that obtains citations using Google Scholar and then analyses them in various ways. In particular, it makes use of h-indices, which have been proposed as a "fairer" citation metric.
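For readers who have not met the h-index: an author has index h if h of their papers have at least h citations each. A minimal Python sketch of that definition follows; it makes no claim about how Publish or Perish itself computes the value, and the function name is only illustrative.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            # Counts only decrease from here, so no later rank can qualify.
            break
    return h
```

A single very highly cited paper raises the index by at most one, which is part of the argument for calling the metric "fairer" than a raw citation total.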

I've been in correspondence with the developers over the past couple of months and they kindly let me know that a native Linux version, built using GTK+ 2.x, is now available. If citation analysis is your thing, give it a try and let the authors know what you think.


A pipeline is a makefile

What is a pipeline? For me, it's a series of steps that munch DNA/protein data, combine it with other data using various small scripts, and output the results as diagrams or HTML. Do we want to code this kind of software as a script? If you think "makefile!" now, then you're much more clever than I was. Personally, until recently, I glued my scripts together using other scripts and used makefiles only for compiling my programs. That was a bad idea. (It's quite a detailed post; click on "read more" for the full article.)
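What a makefile gives a pipeline is, at its core, one rule: rerun a step only if its output is missing or older than one of its inputs. The sketch below restates that rule in Python purely to make it concrete; it is not the makefile from the full post, and the names `needs_rebuild`, `build` and the recipe signature are hypothetical.

```python
import os
from typing import Callable, Sequence


def needs_rebuild(target: str, sources: Sequence[str]) -> bool:
    """True if the target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def build(
    target: str,
    sources: Sequence[str],
    recipe: Callable[[str, Sequence[str]], None],
) -> bool:
    """Run the recipe only when the target is out of date.

    Returns True if the recipe ran, False if the target was already current.
    """
    if needs_rebuild(target, sources):
        recipe(target, sources)
        return True
    return False
```

A real makefile adds what this sketch leaves out: chaining rules so that a stale intermediate file triggers every step downstream of it.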


Genome wiki

Steven Salzberg has an opinion piece in Genome Biology that talks about Wikis for managing genome annotation and the problem of bit rot in gene annotation (would that be annotation rot?).

I wrote a quick post about it. This harks back to the other gene wiki discussions on nodalpoint, as well as some of the things Ian Holmes and his group are thinking about on the GbrowseAJAX mailing list.

