The Scriptome is a cookbook of Perl one-liners for bioinformatics data processing tasks. Designed for your non-coding biologist friends, the Scriptome is available in Unix and Windows versions. The concept is introduced in the following article: Data Munging for Non-Programming Biologists.
One of my greatest concerns in talking to people about biologists' data munging is that people don't even realize that there's a problem, or they think it's already been solved. Biologists--who happily pipette things over and over and over again--don't realize that computers could save them lots of time.
Nice work, and I hope that they are successful with the project; however, I'm doubtful. Read on for the rant...
Maybe I'm getting a bit cynical, or getting tired of biology and computers in general, but I think this kind of project is just never going to work. It is not going to achieve the result we all want, which is to eliminate the stupid questions from people who do biology but at the same time remain purposefully ignorant of computers. I say purposefully ignorant, because there really is no excuse. I know there are computer science graduates who read nodalpoint that now work in biology labs and were required to gain some biological knowledge to do their jobs. Why is it that the reverse is not true? In recent interviews I have been asked by so-called biologists to make the process of building complex data analysis pipelines more "transparent". Some kind of "visualization", they ask?
I assure you I have tried every variation of boxes and arrows, UML, etc. to communicate the mechanics of complex data integration tasks and why these things should (not must) be approached in a rational way (i.e. not as a bunch of ad-hoc scripts). It never works. Discussing parsers, data models, database interfaces, and the millions of tiny issues encountered in "practical bioinformatics" makes their eyes glaze over (a very bad idea in presentations). No, no, no, they say, we don't want you to do computer science, we want you to "just build a database" (i.e. help us because we have no idea), not do scary computer science.
As a biology honors graduate, I would have considered my advisor totally incompetent for setting me the task of doing hundreds of enzyme assays without the aid of a multichannel pipet. Oh, I wasn't aware that multichannel pipets existed? What? You mean you can't just "combine the data from different databases"? Give me a break.


Comments
Scriptome home page has changed
The Scriptome home page has a new location. Google's first link is an incorrect page due to redirecting issues.
Greg, are you able to change the link in the original article?
Thanks,
-Amir Karger
The link is now up to date,
The link is now up to date, thanks for letting us know!
anti-rant
You express understandable skepticism. Here are four reasons we thought it was worth building the Scriptome.
1. I agree that it's not fair for biologists to say they shouldn't need to learn anything. However, it's also not fair for programmers to say, "Just learn Perl (C, Python, whatever)." Why should biologists have to learn objects, bitwise operators, and the difference between ' and " just to do some simple data munging?
The first and most important goal of the Scriptome was to give biologists tools to do data munging without lots of training. Most biologists spend much more time doing lab stuff than analyzing, so it's just not worth their time to devote months to becoming a solid programmer.
2. The Scriptome helps non-Perlers to use and learn Perl for bioinformatics. It provides short, relevant, working examples of code - which is the way I usually learn new languages. We're hoping to add an "Explain this" button to each tool soon, which will provide a pretty-printed, commented version.
3. The really sneaky part. We seduce biologists by showing them that in just a couple of hours, they can be getting real work done on a computer. I like to think of it as catalysis - the overall amount of work to learn programming is the same, but we lower the initial barrier so they can get at least a little done with only a little time investment. (The same can't be said for C or Perl.) Then, when the biologist sees that a tool doesn't do quite the right thing, she can tweak the code - which is way easier than writing from scratch. (Our second user changed a "find the max" tool to "find the min".) This avoids (some) boring support tasks in the short run, and creates new programmers in the long run.
As I said in the article, many biologists don't even realize you can automate this stuff, and that it takes only a couple lines of code. I taught a 3-hour class on using the Scriptome to 5 people, and two of them said at the end, "Now that I see how powerful Perl is, I'm going to skip the Scriptome and just learn Perl instead." As George Bush would say, Mission Accomplished!
4. By chance, it turns out that these scripts can be useful even for experienced programmers, who don't want to re-write (and re-debug) a one-liner for the Nth time. A Perl Cookbook for bioinformaticists, if you will. For example, my officemate just today needed to remove duplicate FASTAs from a file where the same sequences were duplicated, but with different IDs. Rather than writing a script to read FASTA and hash on the sequences, we went to the Scriptome site, and cut and pasted:
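The Scriptome tool itself is not reproduced here. As a rough illustration only of the approach mentioned above (read the FASTA, then hash on the sequences), here is a minimal sketch in Python. It is not the Scriptome's code; the function names read_fasta and dedupe_by_sequence are hypothetical, and comparing sequences case-insensitively is an assumption.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def dedupe_by_sequence(
    records: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str]]:
    """Keep the first record for each distinct sequence, ignoring IDs."""
    seen: Set[str] = set()
    for header, seq in records:
        key = seq.upper()  # assumption: case differences are not meaningful
        if key in seen:
            continue
        seen.add(key)
        yield header, seq


if __name__ == "__main__":
    example = [">id1", "ACGT", "ACGT", ">id2", "acgtacgt", ">id3", "TTTT"]
    for header, seq in dedupe_by_sequence(read_fasta(example)):
        print(f">{header}\n{seq}")
```

In this sketch the first ID seen for each sequence wins, so id2 is dropped as a duplicate of id1 even though their IDs differ.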
In the end, I think the Scriptome can help a whole lot of people. Even if we only help the segment of biologists that aren't lazy, it can still make the world a better place - and give me more time to read slashdot at work.
-Amir Karger
perl article link
I just added the correct URL to the data munging article. Fans of Perl will know perl.com and the humour of some of its articles - this one's no exception.
my rant follows your rant
Scriptome is not a bad idea - although it's rather a grand name for what's really a crash course in Perl/Bioperl.
I share your frustration when it comes to education of the wilfully ignorant. As someone from the biology->computing direction, I see the many benefits to my work (and I'd say life in general) that improved computer literacy has brought to me. Most notable, I suppose, is time saved through more efficient ways to organise information and automation of what would previously have been 'cut and paste' type operations. A little scripting knowledge goes an awful long way. I think "if only more people could experience what I have experienced, science in my department would be vastly improved". Yet daily, I still witness events such as 200-page BLAST reports emerging from our printer. And I ask "why?"
I suppose the reasons are many and complex. In part, it's because people educated in physical/mathematical sciences are taught a fair degree of computer science and biologists are not. The former are taught that "in the beginning was the command line" and a programming language, the latter are sat down in front of a Windows PC (or maybe a Mac) and told to open Excel. Their computers are not tools to be bent to their will - rather, the human is slave to the machine. The machine is a black box - if it doesn't do what you expect, you are helpless.
In part, I think it's because biologists are taught that tedious drudgery is normal and even to be respected. We've all been to talks where the speaker tells a "funny" story about how one of their students spent 6 months on some awful mind-numbing task of manual curation or hopeless experiment. Everyone has a giggle, but they really believe that it was necessary to achieve the end result. I think this numbs the mind to the notion that many tasks can easily be automated.
However, the really irritating category of biologist that we're discussing here is the type who wants all of the benefits without any investment of effort on their part. They cry "I just want it to work" - yet only when it comes to computation. Would these same people place the reagents and apparatus necessary for a wet lab experiment on their bench, stand back, yell "I just want it to work!" and expect their PCR reaction to run itself? No they would not, yet this is how they behave in front of a computer. Any attempts at explanation, even when effort is made to avoid as much technical jargon as possible, are met with blank "I don't want to know that" faces.
How do we deal with these people? We don't. Just say "if you won't help yourself, it's not my job to be your brain". Ultimately, you can only conclude that they lack the intelligence, wit and resourcefulness to be successful scientists. Let them believe if they wish that biology is some odd branch of science where every rule has an exception, logic and reason don't apply and nothing is predictable. We are already seeing that the most successful biologists are the open-minded ones with skills in the acquisition, storage and analysis of large datasets. Those who refuse to participate will soon be falling by the wayside.
Utility
Although I broadly agree with your rants, I will point out that something like Scriptome is useful to people like me. I don't really speak fluent perl (if you can believe it) because I don't really use it. I've spent the best part of the last four years writing R, and switching to perl is frustrating because I have to think before I type. One-liners like these not only save me time, but they help me get the feel of perl in a non-"hello world" way.
So it's useful to me and the other two binfs who don't speak perl, I guess...