I knew that sequencers are getting cheaper all the time, but in Genome Technology this week they're talking with the inventors of the technology that 454 is licensing, discussing future pyrosequencing updates and how these should lead to very cheap machines. The prospect is that any lab could sequence its own genome within three years, and the technology seems almost ready: basically a cheaper and smaller version of 454's current machines. If you think sequence databases are exploding now, better prepare for a new wave.


Comments
Short reads
I just came out of a group meeting where 454 was discussed. Something I hadn't realised is that the read length is very short: 100 bp or so. This makes 454 great for resequencing, mapping reads onto existing assemblies and finishing, but not so great for assembling a genome from scratch. I'm excited by the idea that any lab could do a genome one day, but it won't be for a while yet.
I vaguely recall a technology where single DNA strands are fed through nanopores and changes in electrical resistance are used to read the bases. If anyone has a better memory than mine, feel free to comment. That kind of technology strikes me as "the way forward" for truly high-throughput sequencing.
Solexa
Solexa's technology produces even shorter reads, approximately 30 bp. It seems to be rapidly gaining currency and analysis effort, if the Cold Spring Harbor meeting I attended earlier this month is any indication. A number of the large sequencing centers are choosing which way to go, and, rather predictably, both Solexa and 454 are getting attention. It will be interesting to see whether one emerges as the winner or, as suggested below, whether the combination proves more powerful.
It is certainly becoming clear that these technologies are only likely to be useful in resequencing and EST-like projects where reference genome scaffolds already exist for assembly. I'm sure someone will come up with a clever work-around eventually, though.
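To make the read-length point concrete, here is a small Python sketch that counts how many read-length windows of a genome occur exactly once. The function name unique_fraction is hypothetical, made up for this illustration, and it assumes error-free reads and looks at one strand only. Under those assumptions, only reads drawn from unique windows can be placed unambiguously, so the shorter the reads are relative to the genome's repeats, the smaller that fraction gets.

```python
from collections import Counter


def unique_fraction(genome, read_length):
    """Fraction of start positions whose read_length-mer occurs exactly once.

    Hypothetical helper for illustration: assumes error-free reads and only
    considers the forward strand (no reverse complements).
    """
    n = len(genome) - read_length + 1
    if read_length <= 0 or n <= 0:
        raise ValueError("read_length must be between 1 and len(genome)")
    windows = [genome[i:i + read_length] for i in range(n)]
    counts = Counter(windows)
    # A read drawn from a repeated window could have come from any of its copies.
    unique = sum(1 for w in windows if counts[w] == 1)
    return unique / n


if __name__ == "__main__":
    toy = "GATTACA" + "CCCC" + "GATTACA"  # made-up sequence with a 7-base repeat
    for length in (4, 10):
        print(length, unique_fraction(toy, length))
```

On this toy sequence, 4-base reads give 7/15 uniquely placeable windows, while 10-base reads, being longer than the repeat, give 1.0.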
Upgrade to 250bp in the works?
As far as I know, the 454 people have promised an "upgrade" of their technology that extends the maximum read length to 250 bp. Of course, I want to see it before I believe it.
However, I should be able to confirm it in a few months, as the people responsible for the 454 instrument here plan to perform that upgrade.
Short reads
Short reads are less of a problem if one has a closely related genome at hand that has already been sequenced. The more genomes are available, the easier it is to simply assemble by alignment, which is similar to what Ensembl is doing now with their 3x genomes (see their 2007 article in NAR).
But anyway, yes, I was thinking of resequencing rather than sequencing from scratch.
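The "assemble by alignment" idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Ensembl or any real mapper works: each read is placed on a closely related reference using its first k bases as an exact-match seed, at most two substitutions are allowed, reads that map to zero or several places are discarded, and each covered position is called by majority vote. All function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Reference offsets where the read fits with <= max_mismatches substitutions.

    Only the read's first k bases are used as an exact-match seed (a simplification).
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


def consensus_from_reads(reads, reference, k):
    """Pile uniquely mapped reads onto the reference and call bases by majority vote.

    Positions that no read covers keep the reference base.
    """
    index = build_kmer_index(reference, k)
    columns = [defaultdict(int) for _ in reference]
    for read in reads:
        hits = map_read(read, reference, index, k)
        if len(hits) != 1:
            continue  # unmapped or ambiguous (e.g. repetitive) read
        for offset, base in enumerate(read):
            columns[hits[0] + offset][base] += 1
    return "".join(
        max(col, key=col.get) if col else ref_base
        for col, ref_base in zip(columns, reference)
    )


if __name__ == "__main__":
    reference = "GATTACACGTTGCATCAGGCTAAC"           # made-up reference
    sample = reference[:13] + "G" + reference[14:]  # same, with one substitution
    reads = [sample[0:12], sample[4:16], sample[8:20]]
    print(consensus_from_reads(reads, reference, k=5) == sample)
```

Because only the first k bases serve as a seed, a read with a variant inside that seed will not map at all; real aligners use several seeds or indexed approximate matching to get around this.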
We have one of those
I can say that handling the data from that beast is quite a task, according to my co-workers who rebuild and analyze the instrument's reads to get clean (and possibly gap-free) sequences.
In my opinion, pyrosequencing as done by the 454 system will not be a viable option for many institutes until this large data-handling problem is tackled. At first we had just one person working on the data, but it became evident that this wasn't enough. Now we have three people dedicated to data analysis, on different projects.
Bead Events ???
I work with this data every day, and let me say that it's a mix of pain and pleasure. Data management is something most of us don't have a problem with; the pain comes from having to continually refine our analysis to handle inconsistencies in the data.
On the whole, 454 are making a great effort to deliver clean datasets to their customers. The longer reads do help, although I'm a bit wary of homopolymer and phase errors. And then there are these "Bead Events" that I keep hearing about from various people. Personally, I don't think we should expect too much from a new technology that is trying to take massive strides too soon. I like the idea of combining data sources, e.g. Sanger + 454, Solexa + Sanger + 454, etc.
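Why homopolymers are a weak spot can be shown with a toy flowgram model in Python. In pyrosequencing one nucleotide is flowed at a time, and the light signal for a flow is roughly proportional to how many identical bases get incorporated, so a run like AAAA has to be read off from a signal of about 4. The sketch below assumes a cyclic flow order of TACG and a naive caller that simply rounds each signal; the noisy values in the example are invented, phase errors are not modelled, and the function names are hypothetical.

```python
FLOW_ORDER = "TACG"  # assumed cyclic flow order, for illustration only


def ideal_flowgram(seq, flow_order=FLOW_ORDER):
    """Noise-free signal per flow: the homopolymer length incorporated in it."""
    if any(base not in flow_order for base in seq):
        raise ValueError("sequence contains a base that is never flowed")
    signals = []
    i = 0
    flow = 0
    while i < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Every consecutive copy of the flowed base is incorporated in one flow.
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        signals.append(float(run))
        flow += 1
    return signals


def call_bases(signals, flow_order=FLOW_ORDER):
    """Naive caller: round each signal and repeat the flowed base that many times."""
    return "".join(
        flow_order[k % len(flow_order)] * max(0, round(s))
        for k, s in enumerate(signals)
    )


if __name__ == "__main__":
    read = "TAAAACG"
    clean = ideal_flowgram(read)     # [1.0, 4.0, 1.0, 1.0]
    noisy = [1.02, 3.45, 0.97, 1.1]  # invented values: the 4-mer flow reads low
    print(call_bases(clean))         # TAAAACG
    print(call_bases(noisy))         # TAAACG, one A lost from the homopolymer
```

With a clean flowgram the sequence comes back exactly, but a signal of 3.45 for the AAAA run is called as three A's, and one of 4.6 would be called as five: the insertion/deletion errors in homopolymers mentioned above.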
Mostafa Ronaghi, the inventor of the first pyrosequencing system, gave quite an informative talk in Malaysia about the technologies, the state of the art, cost issues, etc. It can be streamed online from
http://www.mgrc.com.my/eLectureRonaghi.shtml.
Back to the data issue: does anybody have more insight into the kinds of inconsistencies, or other issues, that people may need to be wary of?