Wolbachia genomes sure was a conversation stopper (pauses as tumbleweed blows across the screen)...perhaps I can share a few more thoughts on getting things done. One of my favourite topics, as an ex-wet lab biochemist turned computational biologist, is the difference in working practice and mentality between these two types of scientist.
In our last discussion, we considered whether research is "non-linear" and I think that this notion is somewhat illusory. We do have goals to achieve and we take a stepwise path to those goals, but the steps may branch and are constantly being revised. In other words - we're making it up as we're going along.
I think a major difference between wet lab and computational work is the degree to which we make it up as we go along. Here are a couple of examples. In the lab, you want to purify some RNA, reverse-transcribe it, end-label it and use it as a probe. In all likelihood, this has been done many times before and standard protocols have been written down which describe this task. There may be a few protocols with minor differences and you may experiment with slight alterations which improve the procedure but essentially, you have a recipe to follow. Mastering this recipe is a lab skill and generally, if you fail, it's because you did something wrong or one of the components was sub-standard.
Now a bioinformatics task - you want to take all the ORFs from a genome, BLAST them against the subset of sequences from the PDB, extract those queries that have a certain identity to a hit, thread them against a set of templates and build a homology model of the query protein. The difference here is that whilst you still have several steps in your procedure and some idea of what "protocols" (i.e. software) to use at each step, there are multiple ways to achieve this task. Do you want to grab the query sequences once from an online database (wget/ftp/an rsync server/a Bioperl fetch tool) or regularly obtain them for storage on your hard drive (mirror)? Once obtained, would you like to index them for easy retrieval? Would you like to speed up the BLAST search using a parallel implementation? How do you decide on a cutoff above which a query/template pair is good enough to model? What tool do you use to parse the BLAST report? What choice of threading software? Are you happy with the pre-supplied template database or do you want to build custom templates? What's the output format from threading - plain text, marked-up text? If the latter, how about tools to transform it for database storage or display? How about your final model PDB files? Do you want to check their quality in some way or view them graphically, and how are you going to analyse them? And handle the output of the analysis tools? Is this a one-time project or something you'd like to automate as a pipeline and run regularly? How will you share the results with colleagues?
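To make just one of those choices concrete, here is a minimal sketch, in Python, of the "extract those queries that have a certain identity to a hit" step. It assumes the BLAST report was written in BLAST+ tabular format (-outfmt 6). The function name best_hits, the cutoffs of 30% identity and 50 aligned residues, and the example rows are all made up for illustration and are not recommendations. It is one way among many.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_identity=30.0, min_length=50):
    """Return {query id: row} holding the highest-bitscore hit per query
    among the hits that pass both cutoffs (hypothetical default values)."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        # Skip hits that are too dissimilar or too short to be useful.
        if float(row["pident"]) < min_identity or int(row["length"]) < min_length:
            continue
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Invented example report: query ids and PDB-style hit ids are placeholders.
EXAMPLE = (
    "orf1\t1abc_A\t45.2\t120\t60\t2\t1\t120\t5\t124\t1e-30\t110.0\n"
    "orf1\t2xyz_B\t62.0\t118\t40\t1\t1\t118\t3\t120\t1e-45\t150.0\n"
    "orf2\t3def_A\t22.5\t200\t150\t5\t1\t200\t1\t200\t0.001\t40.0\n"
    "orf3\t4ghi_C\t80.0\t30\t6\t0\t1\t30\t1\t30\t1e-5\t55.0\n"
)

hits = best_hits(io.StringIO(EXAMPLE))
for query, row in hits.items():
    print(query, row["sseqid"], row["pident"])
```

With the invented rows above, only orf1 survives: orf2 falls below the identity cutoff and orf3's alignment is too short. Change either cutoff and you get a different set of queries to thread, which is rather the point.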
My point is that in computational work, we are actually making up the methods as we are going along and always considering new and better ways to perform tasks and achieve goals. Perl people like to say "there's more than one way to do it" and I think in computational work, there are multiple ways to do pretty much everything. Herein perhaps lies the biggest communication problem between wet lab biologists and computational biologists - I think the former group are frequently taught that methods are no more than a means to an end - it's the result that's all important. In computational biology, we spend a lot more time thinking about methods, because choice of method has enormous bearing on the quality of the result, the speed with which we obtain it and the ways in which we can further use it.


Comments
Oh, Wolbachia was just too complex for us poor souls...
I will admit that I always experience a temporary dislocation when shifting from wet to computational work. It takes a week or so to get in the groove of either.
I'm not sure about the methods vs results thing, though. I think you're right that computational work requires method building, but so does some lab work: if you're doing something clever with existing technology, that's cool; but what often happens is that you have to invent the technology as you go along. I'd say that these days, RNA probing as you suggest is analogous to a web search for biological information. The task is of trivial difficulty because it's so common there are robust methods to deal with it (albeit with minor variations). But we've been RTing RNA for 20 years - I suggest that BLASTing whole genomes, for instance, will be as routine in less than that time from now.
I think it's a case of field maturity: there is a vast body of wet results in a number of fields, but there still remain numerous holes to be plugged. This 'grunt work' can be done with existing techniques and a minimum of imagination. The frontier work, however, still requires novel experimental design, and very often, method development. In contrast, there is still very little grunt work to do in the *omics; as that appears, I predict the emergence of the computational research assistant - a glorified script kiddie using existing methods/pipelines to mechanically generate data.
I agree as well that it probably has to do with field maturity. As more and more trained bioinformaticians enter the workforce and aid in biological research, things might become more regular. I even think that, 20 years from now, the techniques we use might be considered ancient and archaic, like using a cesium chloride gradient to purify plasmid DNA when perfectly comparable kits are made and sold relatively cheaply.
Superseded by what?
Some very thought-provoking comments here. My question is, future pundits, where do we look for the next generation of bioinformatics methods? As Neil makes quite clear, there are many ways to do data integration and build pipelines using present-day bioinformatics tools. What will we be using in 20 years' time? Will there be more general solutions to data integration and pipeline development or will it still be done on an ad-hoc basis? My current feeling is that wide adoption of semantic web technologies might clear the way ahead.
agreed
I agree that there is plenty of room for innovation in experimental work - regardless, I think, of how "frontier" you are. Perhaps most grad students are just not taught that way. Perhaps I've been at UNSW too long ;-)
I think we are already seeing computational research assistants - I've seen job postings of that nature. My point is not really that computational work is more innovative than experimental work, but that the freedom of choice in how to approach a computational problem is often greater, leading to a different mindset.
You're probably right about the freedom of choice, although some of the range may be deceptive. I suspect that, as things become more stable, methods of choice will emerge and others will fall by the wayside (by analogy with, for example, DNA sequencing techniques in the 1970s).
The main difference I think is the fact that most computational work is with *ome level data, whose multivariate nature allows more elbow room for 'doing stuff' than univariate, one gene at a time approaches. How useful is one cDNA sequence, for instance, if you don't have a bunch of others to play with?