BioGrids

From Tim Bray to Jim Gray (via Seymour Cray)

Recycle or Globus Toolkit?
Grid Computing already plays an important role in the life sciences, and will probably continue doing so for the foreseeable future. BioGrid (Japan), myGrid (UK) and CoreGrid (Europe) are just three current examples; there are many more Grid and Super Duper Computer projects in the life sciences. So, is there an accessible Hitchhiker's Guide to the Grid for newbies, especially bioinformaticians?

Unfortunately, much of the literature on Grid Computing is esoteric and inaccessible, liberally sprinkled with abstract and woolly concepts like “Virtual Organisations” with a large side-order of acronym soup. This makes it difficult or impossible for the everyday bioinformatician to understand or care about. Thankfully, Tim Bray from Sun Microsystems has written an accessible review of the area: “Grids for dummies”, if you like. It's worth a read if you're a bioinformatician who needs more heavyweight distributed computing than the web currently provides, but who finds Grid-speak to be impenetrable nonsense.

One of the things Tim discusses in his review is the work of Microsoftie Jim Gray, who is partly responsible for the 2020 computing initiative mentioned on nodalpoint earlier. Tim describes Jim's article Distributed Computing Economics, in which Jim uses a wide variety of examples to illustrate the current economics of grids, from “Megaservices” like Google, Yahoo! and Hotmail to the bioinformaticians' favourites, BLAST and FASTA. So how might Grids affect the average bioinformatician? There are many different applications of Grid computing, but two areas spring to mind:

  1. Running your in silico experiments (genome annotation, sequence analysis, protein interactions etc.) using someone else's memory, disk space and processors on the Grid. This could mean you are able to do your experiments more quickly and reliably than you can using the plain ol' Web.
  2. Executing high-throughput and long-running experiments, e.g. you've got a ton of microarray data and it takes hours, or possibly days, to analyse computationally.

So if you deal with microarray data daily, you probably know all this stuff already, but Tim's overview and Jim's commentary are both accessible pieces to pass on to your colleagues in the lab. If this kind of stuff pushes your buttons, you might also be interested in the eProtein Scientific Meeting and Workshop Proceedings.


Comments



If you consider any kind of distributed computing or clustering to be grid computing, then I guess you can indeed say that grid computing "plays an important role in the life sciences". Of course, if that's all it is, what need is there for the term "grid"? Wasn't the original idea of the grid concept to create a standard mechanism (not tied to any specific application) for consuming (or even sharing) resources such as processing power and disk space, similar to the way the power grid works?


What need is there?

The word GRID reminds me of the word GENE. It is either used in a vague and ambiguous way or it has multiple but precise definitions that are completely different. Despite this I think both words are useful and will continue to play an important role in the life sciences :)


Google @ help

Guys, I read about the GDisk thing [ http://blogs.zdnet.com/Google/?p=121 ] some time back, and I also heard about Google providing SSH access to their supercomputer [ http://www.vmunix.com/mark/blog/archives/2004/07/27/what-is-google-build... ]. It may all be rumor, but if it turns out to be true, then judging by the post above I feel this could really be a Venter-versus-NIH kind of thing!

______________________"The Answer Lies in Genome"______________________
http://fuzzylife.org/


Grids, Google etc.

It's been interesting to watch the minor 'Google backlash' of recent months. This seems to happen to all companies that start small, promote themselves as more ethical than others then achieve success. I remember a similar media reaction to the Body Shop back in the 80s. Google's foray into the Chinese market seems to have caused a lot of this reaction.

I've always had my doubts about grid computing and bioinformatics. As the original post points out, there's always been a lot of esoteric debate about "what is a grid" rather than "what can I do with one". In their cluster computing presentations, Bioteam, for whom I have a lot of time, have been quite dismissive of the grid concept in the past, simply because no one can quite define what it is.

Keeping it simple, let's say a cluster is a pile of machines in one physical location, tightly coupled through a network switch and dedicated to a few specific tasks. A grid is a bunch of machines in numerous locations (perhaps scattered worldwide) that in theory can cooperate on any task, given appropriate tools and services to couple them together. In the former case your tool is very clearly defined; in the latter it is not. This is why, for now, local HPC resources have the advantage for typical applications such as BLAST. I simply can't imagine posting off my sequences to a Google-like service and getting back results in any reasonable time with the current state of grids.