Nat Torkington has a thoughtful post up on the O’Reilly Radar where he writes about ubiquitous computing (or ubicomp as he calls it).
I want to cast that problem a little differently. Let us assume we have a scalable cloud that we can tap into, perhaps with databases that easily scale to support streaming data. Let us imagine a lab, with instruments streaming out data, with devices consuming those data, and systems that can make decisions based on the data they are receiving. Such an interconnected world, a world with pervasive, ubiquitous computing, might sound like science fiction, but over the past few years we have slowly but surely built up the beginnings of an infrastructure that will make this scenario possible, perhaps a lot faster than we thought. The iPhone, the Chumby, the Bug: these are just early examples, as are streaming video services and communication platforms like Twitter and XMPP.
Of course, before we get there, we really need to develop systems that are smart about making decisions and filtering information. Otherwise, we’re just going to get buried in a deluge of data that will make our heads spin, individually and collectively.
Update: On a semi-related note, here’s an Ignite talk from Where 2.0 that is an example of where we are headed (via O’Reilly Radar)
Ignite: Health In the Real World - Steven Hammond
By opening a geospatial window on patient-entered medical information, PatientsLikeMe is changing the way patients and researchers look at diseases and treatments in long-term illnesses like ALS, MS, and HIV.
Further reading
Research Streaming
Using the massively parallel technique of Sequencing by Oligonucleotide Ligation and Detection (SOLiD) we have assessed the in vivo positions of more than 44 million putative nucleosome cores in the multicellular genetic model organism Caenorhabditis elegans. These analyses provide a global view of the chromatin architecture of a multicellular animal at extremely high density and resolution. While we observe some degree of reproducible positioning throughout the genome in our mixed stage population of animals, we note that the major chromatin feature in the worm is a diversity of allowed nucleosome positions at the vast majority of individual loci. While absolute positioning of nucleosomes can vary substantially, relative positioning of nucleosomes (in a repeated array structure likely to be maintained at least in part by steric constraints) appears to be a significant property of chromatin structure. The high density of nucleosomal reads enabled a substantial extension of previous analysis describing the usage of individual oligonucleotide sequences along the span of the nucleosome core and linker. We release this dataset, via the UCSC Genome Browser, as a resource for the high-resolution analysis of chromatin conformation and DNA accessibility at individual loci within the C. elegans genome.
The Trichoderma reesei genome paper was recently published in Nature Biotechnology by Diego Martinez at LANL with collaborators at JGI, LBNL, and elsewhere. This fungus was chosen for sequencing because it was found on canvas tents eating the cotton material, suggesting it may be a good candidate for degrading cellulosic plant material as part of cellulosic ethanol or other biofuels production. The fungus also has starring roles in industrial processes like making stonewashed jeans, thanks to its prodigious cellulase production.
The most surprising finding from the paper is that some of the enzyme families have so few members, even though this fungus is able to generate enzymes with so much cellulase activity. The authors found that there is not a significantly larger number of glycoside hydrolases, a collection of carbohydrate-degrading enzymes that are great for making simple sugars out of complex ones. In fact, the plant pathogens used for comparison (Fusarium graminearum and Magnaporthe grisea) and the sake-fermenting Aspergillus oryzae all have more members of this family than does T. reesei. T. reesei also has almost the fewest copies (36) of a cellulose binding domain (CBM) of any of the filamentous ascomycete fungi. The authors used the CAZy (carbohydrate-active enzymes) database, which has done a fantastic job of building up profiles of the different enzymes involved in carbohydrate degradation, binding, and modification.
Whether T. reesei is really the best cellulose-degrading fungus is definitely an open question. That it works well in the industrial cultures in which it has been used is important, but there may be other species of fungi with better cellulase activity that may in fact have many more copies of cellulases. So it will be good to add other fungi to the mix, with quantitative information about degradation, to try to glean which combinations of enzymes and activities are the most important.
One technical note. The comparison of copy number differences employed in the paper is a simple enough chi-squared test. Work that I've done with Matt Hahn and others includes a gene family size comparison approach that also takes into account phylogenetic distances and assumes a birth-death process of gene family size change. It would be great to evaluate the copy number differences with this approach, or with others that just evaluate gene trees for these domains, to see where the differences are significant and whether they can be polarized to a particular branch of the tree.
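For readers who want to see what the simple test looks like, here is a minimal sketch of a chi-squared comparison of one gene family's copy number between two genomes. It is not the paper's actual procedure: the 2x2 table layout (family members versus all other genes in each genome), the function name chi_squared_2x2, and the gene counts in the usage lines are all assumptions made up for illustration. It also says nothing about phylogeny, which is exactly the limitation noted above.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test on the 2x2 table [[a, b], [c, d]].

    Rows are genomes, columns are (genes in family, all other genes).
    Uses 1 degree of freedom and no continuity correction.
    Returns (chi2 statistic, p-value).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT taken from the paper:
# genome A has 36 family members among 9000 genes,
# genome B has 60 family members among 11000 genes.
chi2, p = chi_squared_2x2(36, 9000 - 36, 60, 11000 - 60)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A test like this treats the two genomes as independent samples, so it cannot say on which branch of the tree a gain or loss happened. That is the gap the birth-death and gene tree approaches are meant to fill.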
So will this genome sequence lead to cheaper, better biofuel production? Certainly it provides an important toolkit for systematically testing individual cellulase enzymes. It's hard to say how fast this will make an impact, but JBEI and a host of other research groups and biotech companies are going to be able to systematically test the utility of these individual enzymes.
There is also work by other groups on the evolution of these Hypocreales fungi, trying to better define when biotrophic and heterotrophic transitions occurred, in order to sample fungi with different lifestyles that might have cellulase enzymes not yet observed. Defining the relationships of these fungi, and when and how many times lifestyle transitions occurred, may be an important part of choosing the most diverse fungi and discovering novel enzymes.
Also see
Martinez, D., Berka, R.M., Henrissat, B., Saloheimo, M., Arvas, M., Baker, S.E., Chapman, J., Chertkov, O., Coutinho, P.M., Cullen, D., Danchin, E.G., Grigoriev, I.V., Harris, P., Jackson, M., Kubicek, C.P., Han, C.S., Ho, I., Larrondo, L.F., de Leon, A.L., Magnuson, J.K., Merino, S., Misra, M., Nelson, B., Putnam, N., Robbertse, B., Salamov, A.A., Schmoll, M., Terry, A., Thayer, N., Westerholm-Parvinen, A., Schoch, C.L., Yao, J., Barbote, R., Nelson, M.A., Detter, C., Bruce, D., Kuske, C.R., Xie, G., Richardson, P., Rokhsar, D.S., Lucas, S.M., Rubin, E.M., Dunn-Coleman, N., Ward, M., Brettin, T.S. (2008). Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nature Biotechnology DOI: 10.1038/nbt1403
© Jason Stajich for Fungal Genomes and Comparative Genomics, 2008. | Permalink | No comment
Paul Stamets thinks so, and he's done work to make this happen. The founder of FungiPerfecti and author of many books on mushroom cultivation recently gave a TED talk that is worth a look.
We also wrote about how Paul has contributed (and in some cases donated) Pleurotus spawn as part of dioxin cleanup in Ft Bragg, CA and for cleaning up the SF Bay with hair and mushrooms.
This is an unpublished post that's so old (Aug '07) that I don't know why I didn't just post the damn thing; I've forgotten what I was intending to do with it. I'm posting it now because it contains pointers to useful thinking by David Wiley and others that is germane to the ongoing discussion of data licensing (see post below). I was reminded of this old draft of mine by Deepak's comment that copyleft may be harmful in the case of scientific data, a point David also makes in respect of his particular Open area, education. Much of what David says maps readily from his field to research, so without further ado:
David Wiley of Iterating Toward Openness has been blogging up a storm about open content licensing:
That's a lot to read, but it's all good stuff. David makes one very strong argument that I want to emphasize here, because it points up the difficult distinction between data and (creative) work.
In the post introducing his draft Open Education Licence, he provides a very useful outline of the aims of open content:
I really, really like that. David's "four R's" resemble the four fundamental freedoms of the Free Software Foundation but do a better job of discriminating between Rework and Remix. The Four R's make immediate sense to me and I will certainly be Reusing and Redistributing that idea.
David goes on to quote some believable numbers and points out that:
Since half of all CC licensed materials are licensed using a copyleft clause and all GFDL licensed materials are licensed using a copyleft clause, this means that over half of the world's open content is copylefted. And while the CC and GFDL copyleft clauses guarantee that all derivative works will be "open," they also guarantee that they can never be used in remixes with the majority of other copylefted works. You can't remix a GFDL work with a By-NC-SA work when the licenses require that the child be licensed exactly as the parent. Each parent had one and only one license - which license would the derivative use? It's just not possible to legally remix these materials; copyleft prevents this remixing. [see David's earlier explanation for details of the incompatibilities among various copyleft licenses]
While promoting rework at the expense of remix - in other words, taking the copyleft approach - is fine for software, it is problematic for content and extremely problematic for education. As educators, we are always remixing materials for use in our classrooms both in the "real" world and online. Your mileage may vary, but over my last 15 years of teaching I would estimate that my remixing activities outnumber my reworking activities 10:1 or more. If other teachers are like me in this regard, then, copyleft is a huge problem for open education.
It's potentially a huge problem for scientists, too, because much of the potential of Open Science and Open Data (see here for an attempt at defining those terms) is in Remix. There are answers in existing datasets to questions their creators never thought to ask; as Alma Swan put it,
...exciting new developments in text-mining and data-mining are beginning to show what can be done to create new, meaningful scientific information from existing, dispersed information using computer technologies. Research articles and accompanying data files can be searched, indexed and mined using semantic technologies to put together pieces of hitherto unrelated information that will further science and scholarship in ways that we have yet to begin imagining.
This is why I join Peter Murray-Rust in being against copyleft for data:
I am not in favour of copyleft for data. I have no fundamental objection to creating a copyrighted work from data as long as there is significant added value. And copyleft is viral - deliberately. If any item in a system/collection/program etc. is copyleft, then the whole is (at least by the algorithm). [...]
I would argue that if I get factual information from WP [wikipedia] then it cannot carry a copyleft. I need the fundamental physical constants and get them from WP. I don't think that my data and programs are thereby copyleft. All algorithms are now slightly fuzzy.
So what do we mean by "data"? What I mean is "facts about the world of sense-perception", as distinct from the presentation and interpretation of those facts. So I might not be free to reproduce, say, a scan of a Western blot from a published paper -- but having looked at that image, I had better be completely free to do whatever I like with the information it gives me about the way the world works, or else science will grind to a halt. Similarly, if a review article (which contains no new facts, and is all reuse and remix) brings together the results of a number of studies to create new information, or a new hypothesis, about the way the world works, I am not free to copy the wording but I must be free to go into my lab and test the hypothesis.
See also (this was a note to myself in the draft, so caveat lector!):
CC-NC considered harmful (Kuroshin)
When is OA not OA? (Catriona MacCallum in PLoS Biology)
CC, OA and moral rights (Thinh Nguyen, Science Commons blog)
Open Data and Moral Rights (Peter Murray-Rust)
-----
In the interests of full disclosure, I have a personal statement for this blog which I hope places the content squarely in the public domain, and for my columns on 3QuarksDaily I use CC-BY so that, if those pieces generate any interest, 3QD might at least get some traffic out of having generously offered me a spot on their roster.
We find ourselves wondering why codon adaptation index (CAI) is used as a measure of protein expression level in this article.
One answer is that CAI does correlate well with protein expression in many proteomics studies; but surely these same studies contain raw data with protein expression level? On reflection, I bet the answer is that it’s too difficult and laborious to access this type of data. There are plenty of papers that describe large-scale analysis of protein expression using proteomics, but the data are locked up in the articles or as inappropriate supplementary files.
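To make concrete what CAI actually measures, here is a rough sketch of the calculation: each codon gets a weight equal to its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of the weights of its codons. Everything named here (relative_adaptiveness, cai, the toy sequences) is made up for illustration. The sketch also takes a shortcut that is my assumption rather than the standard treatment: codons never seen in the reference set are simply skipped.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """Weight per codon: its count divided by the count of the most used
    synonymous codon in the reference set. Unobserved codons get no weight
    (a simplification made for this sketch)."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    weights = {}
    for family in families.values():
        best = max(counts[c] for c in family)
        for codon in family:
            if counts[codon] > 0:
                weights[codon] = counts[codon] / best
    return weights


def cai(seq, weights):
    """Geometric mean of codon weights. Met (ATG) and Trp (TGG) have only one
    codon each and stop codons carry no usage signal, so all are excluded."""
    logs = [
        math.log(weights[c])
        for c in codons(seq)
        if c in weights and c not in ("ATG", "TGG") and CODON_TABLE[c] != "*"
    ]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference set and query, invented for illustration only.
toy_weights = relative_adaptiveness(["GCTGCTGCTGCC", "AAAAAAAAG"])
print(cai("GCTAAAGCCAAG", toy_weights))
```

Note that nothing in this calculation touches a measured protein level. CAI is a property of the sequence and the chosen reference set, which is why it can only ever be a proxy for expression.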
Note to self: look into open-source software and standard data formats for proteomic data.
The genome of the Podospora anserina S mat+ strain was sequenced by Genoscope and CNRS and published recently in Genome Biology. The genome sequence data have been available for several years, but it is great to see a publication describing the findings. The 10X genome assembly with ~10,000 genes provides an important dataset for comparisons among filamentous Sordariomycete fungi. The authors primarily focused on comparative genomics of Podospora and Neurospora crassa, the next closest model filamentous species. Within the Sordariomycetes there is now a very interesting collection of closely related species which can be useful for applying synteny and phylogenomics approaches.
The analyses in the manuscript focus on differences between Neurospora and Podospora, identifying some key differences in carbon utilization between the coprophilic Podospora and the plant saprophyte Neurospora. There are several observations of gene family expansions in the Podospora genome which could be interpreted as additional enzyme capacity to break down carbon sources that are present in dung.
The genome of Neurospora has been shaped by genome defense mechanisms like RIP, which has been one interpretation of its reduced number of large gene families and paucity of transposons. The authors report a surprising finding: although Podospora shares orthologs of the genes involved in several genome defense mechanisms, it has fewer repetitive sequences than Neurospora, yet it still shows no good evidence of RIP.
Overall, these data suggest that P. anserina has experienced a fairly complex history of transposition and duplications, although it has not accumulated as many repeats as N. crassa. P. anserina possesses all the orthologues of N. crassa factors necessary for gene silencing, including RIP, meiotic MSUD and also vegetative quelling, a post transcriptional gene silencing mechanism akin to RNA interference
I think these data and observations dovetail nicely with the work our group is pursuing on genome evolution in several Neurospora species that have different mating systems. The gene components that play a role in MSUD and RIP are found in Podospora, yet the weak evidence of RIP and the lack of any observed meiotic silencing there suggest some interesting events on the Neurospora branch to be explored. The potentially different degrees of RIP efficiency and types of mating systems (heterothallic and pseudohomothallic) among the Neurospora spp. may also provide a link to understanding how RIP evolved and its role in N. crassa evolution.
Senescence in Podospora
Another aspect of Podospora biology that isn't touched on is the use of the fungus as a model for senescence. The fungus exhibits maternal senescence, which involves targeted changes in the mitochondria that lead to cell death. The evolutionary and molecular basis for this process has been of interest to many research groups, and the genome sequence can provide an additional toolkit for identifying the factors involved in the apoptosis process in this filamentous fungus. Whether it will help find a real link for aging research in other eukaryotes remains to be seen, but it is a good model system for some aspects of how aging and damage to mtDNA are linked.
Espagne, E., Lespinet, O., Malagnac, F., Da Silva, C., Jaillon, O., Porcel, B.M., Couloux, A., Aury, J., et al (2008). The genome sequence of the model ascomycete fungus Podospora anserina. Genome Biology, 9(5), R77. DOI: 10.1186/gb-2008-9-5-r77