News aggregator

Ubiquitous computing

business|bytes|genes|molecules - Wed, 2008-05-14 09:10

Nat Torkington has a thoughtful post up on the O’Reilly Radar where he writes about ubiquitous computing (or ubicomp as he calls it).

I want to cast that problem a little differently. Let us assume we have a scalable cloud that we can tap into, perhaps with databases that easily scale to support streaming data. Let us imagine a lab, with instruments streaming out data, with devices consuming those data, and systems that can make decisions based on the data they are receiving. Such an interconnected world, a world with pervasive, ubiquitous computing, might sound like science fiction, but over the past few years, we have slowly but surely built up the beginnings of an infrastructure that will make this scenario possible, perhaps a lot faster than we thought. The iPhone, the Chumby, the Bug, these are just early examples, as are streaming video services and communication platforms like Twitter and XMPP.

Of course, before we get there, we really need to develop systems that are smart about making decisions and filtering information; otherwise we’re just going to get buried in a deluge of data that will make our heads spin, individually and collectively.

Update: On a semi-related note, here’s an Ignite talk from Where 2.0 that is an example of where we are headed (via O’Reilly Radar)

Ignite: Health In the Real World - Steven Hammond
By opening a geospatial window on patient-entered medical information, PatientsLikeMe is changing the way patients and researchers look at diseases and treatments in long-term illnesses like ALS, MS, and HIV.

Further reading
Research Streaming


Scoring USS-like sequences in our model (so blind!)

RRResearch - Tue, 2008-05-13 23:47
Months ago (last fall?) a post-doc and I spent what seemed like a lot of time at the whiteboard in the hall, considering different ways our planned USS model might score the sequences it was considering for their similarity to a USS motif.

We eventually settled on the crude system shown on the left (yellow table). It evaluates how well the DNA sequence in a 10-base window matches the USS core consensus. Each match to the consensus earns a point, with the total score for the sequence being the sum of the points it's earned. At the time, we realized that this way of scoring had two (or three?) big problems, but we needed something simple to get the model working so we settled for this.

The first problem is that the score is not very sensitive to how good the match is. The yellow numbers beside the table show the scores earned by specific sequences. A sequence matching at all 10 positions is only 11% better than a sequence matching at 9 positions, even though we know from real uptake experiments that some single base changes can reduce uptake by more than 95%. The second problem is that this method treats all 10 positions in the motif equally. But again our uptake experiments have shown that some positions in the motif affect uptake much more strongly than others.
The third problem is that random sequences have very high scores, and adding a single perfect-match USS to such a sequence increases this baseline score only slightly.

This morning the post-doc and I reconsidered the scoring system. We expected that finding a solution to these problems would be very difficult, but we quickly came up with a much better way, illustrated by the blue table on the right of the figure. The new method is to multiply the scores of the individual positions rather than summing them. This causes the scores of well-matched sequences to be dramatically higher than those of poorer matches. And we expect (though we haven't tested this yet) that the baseline score of a random sequence will be much smaller. For now we've given all but the consensus base scores of 1, but these could be larger or smaller; for example, some bases at some positions could be worth only 0.1 of a point.
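The difference between the two schemes can be sketched in a few lines of Python. This is only an illustration, not the model's actual code: the 10-base consensus string is a placeholder, and the weight of 10 for a consensus base is an assumed value (the post only says that non-consensus bases currently score 1).

```python
from math import prod

# Placeholder 10-base consensus; NOT the real USS core, just a stand-in.
CONSENSUS = "ACGTACGTAC"
MATCH_WEIGHT = 10.0     # assumed value for a consensus base
MISMATCH_WEIGHT = 1.0   # the post gives non-consensus bases a score of 1


def additive_score(window, consensus=CONSENSUS):
    """Old scheme: one point for every position that matches the consensus."""
    return sum(1 for base, cons in zip(window, consensus) if base == cons)


def multiplicative_score(window, consensus=CONSENSUS,
                         match=MATCH_WEIGHT, mismatch=MISMATCH_WEIGHT):
    """New scheme: multiply the per-position scores instead of summing them."""
    return prod(match if base == cons else mismatch
                for base, cons in zip(window, consensus))


perfect = CONSENSUS
one_off = "T" + CONSENSUS[1:]   # same window with a single mismatch

# Additive: 10 versus 9 points, so the perfect match is only about 11% better.
additive_ratio = additive_score(perfect) / additive_score(one_off)
# Multiplicative: the ratio equals MATCH_WEIGHT / MISMATCH_WEIGHT.
multiplicative_ratio = multiplicative_score(perfect) / multiplicative_score(one_off)
```

With these assumed weights, a single mismatch costs a factor of 10 under the multiplicative scheme. Position-specific weights (for example 0.1 for a strongly disfavoured base) would slot in by replacing the two constants with a per-position table.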

Now that the program is working, implementing a multiplicative scoring system should be simple. I'm tempted to try it right now, but I have lots of other things I should be doing, and I'd probably just get bogged down in technical problems anyway.

[METHODS AND RESOURCES] A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning

GR-in-Advance - Tue, 2008-05-13 12:00

Using the massively parallel technique of Sequencing by Oligonucleotide Ligation and Detection (SOLiD) we have assessed the in vivo positions of more than 44 million putative nucleosome cores in the multicellular genetic model organism Caenorhabditis elegans. These analyses provide a global view of the chromatin architecture of a multicellular animal at extremely high density and resolution. While we observe some degree of reproducible positioning throughout the genome in our mixed stage population of animals, we note that the major chromatin feature in the worm is a diversity of allowed nucleosome positions at the vast majority of individual loci. While absolute positioning of nucleosomes can vary substantially, relative positioning of nucleosomes (in a repeated array structure likely to be maintained at least in part by steric constraints) appears to be a significant property of chromatin structure. The high density of nucleosomal reads enabled a substantial extension of previous analysis describing the usage of individual oligonucleotide sequences along the span of the nucleosome core and linker. We release this dataset, via the UCSC Genome Browser, as a resource for the high-resolution analysis of chromatin conformation and DNA accessibility at individual loci within the C. elegans genome.

Estrogen, not intrinsic aging, is the major regulator of delayed human wound healing in the elderly

Genome Biology - Latest articles - Tue, 2008-05-13 12:00
Background: Multiple processes have been implicated in age-related delayed healing, including altered gene expression, intrinsic cellular changes, and changes in extracellular milieu (including hormones). To date, little attempt has been made to assess the relative contribution of each of these processes to a human aging phenomenon. The objective of this study is to determine the contribution of estrogen versus aging in age-associated delayed human wound healing. Results: Using an Affymetrix microarray-based approach we show that the differences in gene expression between male elderly and young human wounds are almost exclusively estrogen regulated. Expression of 78 probe sets was significantly decreased and 10 probe sets increased in wounds from elderly subjects (with a fold change greater than 7). A total of 83 percent of down-regulated probe sets and 80 percent of up-regulated probe sets were estrogen-regulated. Differentially regulated genes were validated at the level of gene and protein expression, with genes identified as estrogen-regulated in human confirmed as estrogen-dependent in young estrogen depleted mice in vivo. Moreover, direct estrogen regulation is demonstrated for three array-identified genes, Sele, Lypd3 and Arg1, in mouse cells in vitro. Conclusions: These findings have clear implications for our understanding of age-associated cellular changes in the context of wound healing, the latter acting as a paradigm for other age-related repair and maintenance processes, and suggest estrogen has a more profound influence on aging than previously thought.

CpG island density and its correlations with genomic features in mammalian genomes

Genome Biology - Latest articles - Tue, 2008-05-13 12:00
Background: CpG islands, which are clusters of CpG dinucleotides in GC-rich regions, are considered gene markers and represent an important feature of mammalian genomes. Previous studies of CpG islands have largely been on specific loci or within one genome. To date, there seems to be no comparative analysis of CpG islands and their density at the DNA sequence level among mammalian genomes and of their correlations with other genome features. Results: In this study, we performed a systematic analysis of CpG islands in ten mammalian genomes. We found that both the number of CpG islands and their density vary greatly among genomes, though many of these genomes encode similar numbers of genes. We observed significant correlations between CpG island density and genomic features such as number of chromosomes, chromosome size, and recombination rate. We also observed a trend of higher CpG island density in telomeric regions. Furthermore, we evaluated the performance of three computational algorithms for CpG island identifications. Finally, we compared our observations in mammals to other non-mammal vertebrates. Conclusions: Our study revealed that CpG islands vary greatly among mammalian genomes. Some factors such as recombination rate and chromosome size might have influenced the evolution of CpG islands in the course of mammalian evolution. Our results suggest a scenario in which an increase in chromosome number increases the rate of recombination, which in turn elevates GC content to help prevent loss of CpG islands and maintain their density. These findings should be useful for studying mammalian genomes, the role of CpG islands in gene function, and molecular evolution.

Finishing the finished human chromosome 22 sequence

Genome Biology - Latest articles - Tue, 2008-05-13 12:00
Background: Although the human genome sequence was declared complete in 2004, the sequence was interrupted by 341 gaps of which 308 lay in an estimated ~28 Mb of euchromatin. While these gaps constitute only ~1% of the sequence, knowledge of the full complement of human genes and regulatory elements is incomplete without their sequences. Results: We have used a combination of conventional chromosome walking (aided by the availability of end sequences) in fosmid and bacterial artificial chromosome libraries, whole chromosome shotgun sequencing, comparative genome analysis and long PCR to finish 8 of the 11 gaps in the initial chromosome 22 sequence. In addition we have patched four regions of the initial sequence where the original clones were found to be deleted, or contained a deletion allele of a known gene, with a further 126 kb of new sequence. Over 1.018 Mb of new sequence has been generated to extend into and close the gaps, and we have annotated 16 new or extended gene structures and 1 pseudogene. Conclusions: Thus we have made significant progress towards completing the sequence of the euchromatic regions of human chromosome 22 using a combination of detailed approaches. Our experience suggests that substantial work remains to close the outstanding gaps in the human genome sequence.

Biocomputational prediction of small non-coding RNAs in Streptomyces

BMC Genomics - Latest articles - Tue, 2008-05-13 12:00
Background: The first systematic study of small non-coding RNAs (sRNA, ncRNA) in Streptomyces is presented. Except for a few exceptions, the Streptomyces sRNAs, as well as the sRNAs in other genera of the Actinomyces group, have remained unstudied. This study was based on sequence conservation in intergenic regions of Streptomyces, localization of transcription termination factors, and genomic arrangement of genes flanking the predicted sRNAs. Results: Thirty-two potential sRNAs in Streptomyces were predicted. Of these, expression of 20 was detected by microarrays and RT-PCR. The prediction was validated by a structure based computational approach. Two predicted sRNAs were found to be terminated by transcription termination factors different from the Rho-independent terminators. One predicted sRNA was identified computationally with high probability as a Streptomyces 6S RNA. Out of the 32 predicted sRNAs, 24 were found to be structurally dissimilar from known sRNAs. Conclusions: Streptomyces is the largest genus of Actinomyces, whose sRNAs have not been studied. The Actinomyces is a group of bacterial species with unique genomes and phenotypes. Therefore, in Actinomyces, new unique bacterial sRNAs may be identified. The sequence and structural dissimilarity of the predicted Streptomyces sRNAs demonstrated by this study serve as the first evidence of the uniqueness of Actinomyces sRNAs.

Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm

BMC Bioinformatics - Latest articles - Tue, 2008-05-13 12:00
Background: Higher eukaryotic genomes are typically large, complex and filled with both genes and multiple classes of repetitive DNA. The repetitive DNAs, primarily transposable elements, are a rapidly evolving genome component that can provide the raw material for novel selected functions and also indicate the mechanisms and history of genome evolution in any ancestral lineage. Despite their abundance, universality and significance, studies of genomic repeat content have been largely limited to analyses of the repeats in fully sequenced genomes. Results: In order to facilitate a broader range of repeat analyses, the Assisted Automated Assembler of Repeat Families algorithm has been developed. This program, written in PERL and with numerous adjustable parameters, identifies sequence overlaps in small shotgun sequence datasets and walks them out to create long pseudomolecules representing the most abundant repeats in any genome. Testing of this program in maize indicated that it found and assembled all of the major repeats in one or more pseudomolecules, including coverage of the major Long Terminal Repeat retrotransposon families. Both Sanger sequence and 454 datasets were appropriate. Conclusions: These results now indicate that hundreds of higher eukaryotic genomes can be efficiently characterized for the nature, abundance and evolution of their major repetitive DNA components.

When you care enough to send the very best DNA

Omics! Omics! - Tue, 2008-05-13 07:28
Yesterday was Mother's Day, and while searching for a card I spied what looked like a double helix on the front of one card. Finding this odd, I checked the card in detail -- and indeed it was DNA!

DNA is clearly in the public consciousness -- years of Law & Order and CSI have ensured that, but I found it striking that the image of a double helix is deemed recognizable by as mainstream & middlebrow a company as Hallmark.

A nice twist is that the card actually bore a message along the lines of 'even though you didn't give me any DNA...' -- a card for mother figures, not birth mothers. So this isn't a sign of rampant DNA deterministic thinking, but rather the imprint of DNA on the public (or at least corporate) mind.

The Perl code had a bug, but I found it!

RRResearch - Tue, 2008-05-13 02:05
The bells-and-whistles version of the Perl model of USS evolution still had a bug, which became apparent once I fiddled the fragment scoring system to strongly favour good matches, and turned off mutation of the genome sequence (so only the fragments mutated). The bug manifested itself in the program cycles stopping, at fairly random points in the run (never stopping twice at the same cycle number or genome score, as far as I could tell).

After a LOT of careful detective work on my part, entirely unencumbered by knowledge of any Perl debugging tools, I found that a 'while' counter was being incremented at the wrong place (inside an 'if' instruction that was inside its 'while' loop, instead of just inside its 'while' loop). I still don't understand why this would cause the runs to stick at random points, but maybe the undergrad can explain it to me tomorrow.

(Confession added later: Solving the problem was not just the result of my careful detective work. The final discovery was helped by luck. I had added an 'else' statement to print a report that the next step had happened, but had incorrectly inserted one too many } brackets. In solving this I accidentally removed a different bracket than the one I had incorrectly inserted, which moved the while counter outside of the 'if' block and, I discovered, eliminated the stopping problem.)
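One plausible mechanism for the random stalls can be sketched as follows. The post does not show the code, so this is a guess at the pattern, written in Python rather than Perl, with hypothetical function names and a made-up test (`item >= 0`) standing in for whatever the real 'if' checked. A step cap is added only so the demonstration terminates.

```python
def buggy_scan(items, max_steps=50):
    """Counter is advanced only inside the 'if', as in the bug described."""
    i = steps = 0
    while i < len(items) and steps < max_steps:
        steps += 1
        if items[i] >= 0:   # stand-in for the real condition
            i += 1          # BUG: only reached when the test passes
    return i, steps


def fixed_scan(items, max_steps=50):
    """Counter is advanced on every pass through the loop."""
    i = steps = 0
    while i < len(items) and steps < max_steps:
        steps += 1
        if items[i] >= 0:
            pass            # the per-item work would go here
        i += 1              # always advance, whether or not the test passed
    return i, steps


data = [3, 5, -1, 4]
stuck_at, steps_used = buggy_scan(data)   # never gets past index 2
done_at, steps_needed = fixed_scan(data)  # visits all four items
```

In the buggy version the loop re-tests the same item forever as soon as one item fails the condition. If the items depend on random mutation, the first failing item turns up at a different place in every run, which would be consistent with the runs stopping at unpredictable cycle numbers.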

Trichoderma reesei genome paper published

Fungal Genomes Blog - Tue, 2008-05-13 02:00

The Trichoderma reesei genome paper was recently published in Nature Biotechnology by Diego Martinez at LANL with collaborators at JGI, LBNL, and others. This fungus was chosen for sequencing because it was found on canvas tents eating the cotton material, suggesting it may be a good candidate for degrading cellulose plant material as part of cellulosic ethanol or other biofuels production. The fungus also has starring roles in industrial processes like making stonewashed jeans due to its prodigious cellulase production.

The most surprising finding from the paper is that some of the enzyme families have so few members, even though this fungus is able to generate enzymes with so much cellulase activity. The authors found that there is not a significantly larger number of glycoside hydrolases, a collection of carbohydrate-degrading enzymes great for making simple sugars out of complex ones. In fact, several plant pathogens compared (Fusarium graminearum and Magnaporthe grisea) and the sake-fermenting Aspergillus oryzae all have more members of this family than does T. reesei. T. reesei has almost the fewest (36) copies of a cellulose binding domain (CBM) of any of the filamentous ascomycete fungi. They used the CAZyme database (carbohydrate active enzymes), which has done a fantastic job building up profiles of different enzymes involved in carbohydrate degradation, binding, and modification.

Whether T. reesei is really the best cellulose-degrading fungus is definitely an open question. That it works well in the industrial culture in which it has been used is important, but there may be other species of fungi with improved cellulase activity that may in fact have many more copies of cellulases. So it will be good to add other fungi to the mix, with quantitative information about degradation, to try to glean which combinations of enzymes and activities are the most important.

One technical note. The comparison of copy number differences employed in the paper is a simple enough chi-squared test. Work that I've done with Matt Hahn and others includes a gene family size comparison approach that also takes into account phylogenetic distances and assumes a birth-death process of gene family size change. It would be great to analyze the copy number differences with this or other approaches that just evaluate gene trees for these domains, to see where the differences are significant and if they can be polarized to a particular branch of the tree.
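For readers unfamiliar with the simple version of this comparison, a minimal sketch of a chi-squared test on a 2x2 table of gene counts follows. The exact setup used in the paper is not described here, so the table layout (family members versus all other genes, in two genomes) is an assumption, and the counts are invented purely for illustration.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the table [[a, b], [c, d]].

    No continuity correction is applied; the statistic has 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for r in range(2):
        for k in range(2):
            expected = row_totals[r] * col_totals[k] / n
            stat += (observed[r][k] - expected) ** 2 / expected
    return stat


# Invented counts: genes in the family / all other genes, for two genomes.
genome_a = (200, 8900)
genome_b = (260, 11740)
stat = chi_squared_2x2(*genome_a, *genome_b)

# 3.841 is the 5% critical value of chi-squared with 1 degree of freedom.
differs_at_5_percent = stat > 3.841
```

A test like this treats each genome as an independent sample and ignores how closely related the species are, which is the limitation the phylogeny-aware birth-death approach is meant to address.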

So will this genome sequence lead to cheaper, better biofuel production? Certainly it provides an important toolkit to start systematically testing individual cellulase enzymes. It's hard to say how fast this will make an impact, but JBEI and a host of other research groups and biotech companies will now be able to systematically test the utility of these individual enzymes.

There is also work by other groups on the evolution of these Hypocreales fungi, trying to better define when biotrophic and heterotrophic transitions occurred, in order to sample fungi with different lifestyles that might have different cellulase enzymes that have not yet been observed. Defining the relationships of these fungi, and when and how many times lifestyle transitions occurred, in order to choose the most diverse fungi may be an important part of discovering novel enzymes.

Also see

Martinez, D., Berka, R.M., Henrissat, B., Saloheimo, M., Arvas, M., Baker, S.E., Chapman, J., Chertkov, O., Coutinho, P.M., Cullen, D., Danchin, E.G., Grigoriev, I.V., Harris, P., Jackson, M., Kubicek, C.P., Han, C.S., Ho, I., Larrondo, L.F., de Leon, A.L., Magnuson, J.K., Merino, S., Misra, M., Nelson, B., Putnam, N., Robbertse, B., Salamov, A.A., Schmoll, M., Terry, A., Thayer, N., Westerholm-Parvinen, A., Schoch, C.L., Yao, J., Barbote, R., Nelson, M.A., Detter, C., Bruce, D., Kuske, C.R., Xie, G., Richardson, P., Rokhsar, D.S., Lucas, S.M., Rubin, E.M., Dunn-Coleman, N., Ward, M., Brettin, T.S. (2008). Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nature Biotechnology DOI: 10.1038/nbt1403

© Jason Stajich for Fungal Genomes and Comparative Genomics, 2008. | Permalink | No comment


Will mushrooms save the world?

Fungal Genomes Blog - Tue, 2008-05-13 00:14

Paul Stamets thinks so, and he's done work to make this happen. The founder of FungiPerfecti and author of many books on mushroom cultivation recently gave a TED talk that is worth a look.

We also wrote about how Paul has contributed (and donated in some cases) Pleurotus spawn as part of dioxin cleanup in Ft Bragg, CA and cleaning up the SF Bay with hair and mushrooms.

See Paul Stamets' TED talk.



OA and licensing: why not Public Domain?

Open Reading Frame - Mon, 2008-05-12 12:51

This is an unpublished post that's so old (Aug '07) that I don't know why I didn't just post the damn thing; I've forgotten what I was intending to do with it. I'm posting it now because it contains pointers to useful thinking by David Wiley and others that is germane to the ongoing discussion of data licensing (see post below). I was reminded of this old draft of mine by Deepak's comment that copyleft may be harmful in the case of scientific data, a point David also makes in respect of his particular Open area, education. Much of what David says maps readily from his field to research, so without further ado:

David Wiley of Iterating Toward Openness has been blogging up a storm about open content licensing:

That's a lot to read, but it's all good stuff. David makes one very strong argument that I want to emphasize here, because it points up the difficult distinction between data and (creative) work.

In the post introducing his draft Open Education Licence, he provides a very useful outline of the aims of open content:

  • Reuse - Use the work verbatim, just exactly as you found it
  • Rework - Alter or transform the work so that it better meets your needs
  • Remix - Combine the (verbatim or altered) work with other works to better meet your needs
  • Redistribute - Share the verbatim work, the reworked work, or the remixed work with others

I really, really like that. David's "four R's" resemble the four fundamental freedoms of the Free Software Foundation but do a better job of discriminating between Rework and Remix. The Four R's make immediate sense to me and I will certainly be Reusing and Redistributing that idea.

David goes on to quote some believable numbers and points out that:

Since half of all CC licensed materials are licensed using a copyleft clause and all GFDL licensed materials are licensed using a copyleft clause, this means that over half of the world's open content is copylefted. And while the CC and GFDL copyleft clauses guarantee that all derivative works will be "open," they also guarantee that they can never be used in remixes with the majority of other copylefted works. You can't remix a GFDL work with a By-NC-SA work when the licenses require that the child be licensed exactly as the parent. Each parent had one and only one license - which license would the derivative use? It's just not possible to legally remix these materials; copyleft prevents this remixing. [see David's earlier explanation for details of the incompatibilities among various copyleft licenses]

While promoting rework at the expense of remix - in other words, taking the copyleft approach - is fine for software, it is problematic for content and extremely problematic for education. As educators, we are always remixing materials for use in our classrooms both in the "real" world and online. Your mileage may vary, but over my last 15 years of teaching I would estimate that my remixing activities outnumber my reworking activities 10:1 or more. If other teachers are like me in this regard, then, copyleft is a huge problem for open education.

It's potentially a huge problem for scientists, too, because much of the potential of Open Science and Open Data (see here for an attempt at defining those terms) is in Remix. There are answers in existing datasets to questions their creators never thought to ask; as Alma Swan put it,

...exciting new developments in text-mining and data-mining are beginning to show what can be done to create new, meaningful scientific information from existing, dispersed information using computer technologies. Research articles and accompanying data files can be searched, indexed and mined using semantic technologies to put together pieces of hitherto unrelated information that will further science and scholarship in ways that we have yet to begin imagining.

This is why I join Peter Murray-Rust in being against copyleft for data:

I am not in favour of copyleft for data. I have no fundamental objection to creating a copyrighted work from data as long as there is significant added value. And copyleft is viral - deliberately. If any item in a system/collection/program etc. is copyleft, then the whole is (at least by the algorithm). [...]
I would argue that if I get factual information from WP [wikipedia] then it cannot carry a copyleft. I need the fundamental physical constants and get them from WP. I don't think that my data and programs are thereby copyleft. All algorithms are now slightly fuzzy.

So what do we mean by "data"? What I mean is "facts about the world of sense-perception", as distinct from the presentation and interpretation of those facts. So I might not be free to reproduce, say, a scan of a Western blot from a published paper -- but having looked at that image, I had better be completely free to do whatever I like with the information it gives me about the way the world works, or else science will grind to a halt. Similarly, if a review article (which contains no new facts, and is all reuse and remix) brings together the results of a number of studies to create new information, or a new hypothesis, about the way the world works, I am not free to copy the wording but I must be free to go into my lab and test the hypothesis.


See also (this was a note to myself in the draft, so caveat lector!):

CC-NC considered harmful (Kuro5hin)
When is OA not OA? (Catriona MacCallum in PLoS Biology)
CC, OA and moral rights (Thinh Nguyen, Science Commons blog)
Open Data and Moral Rights (Peter Murray-Rust)


-----
In the interests of full disclosure, I have a personal statement for this blog which I hope places the content squarely in the public domain, and for my columns on 3QuarksDaily I use CC-BY so that, if those pieces generate any interest, 3QD might at least get some traffic out of having generously offered me a spot on their roster.

Comprehensive inventory of protein complexes in the Protein Data Bank from consistent classification of interfaces

BMC Bioinformatics - Latest articles - Mon, 2008-05-12 12:00
Background: Protein-protein interactions are ubiquitous and essential for all cellular processes. High-resolution X-ray crystallographic structures of protein complexes can reveal the details of their function and provide a basis for many computational and experimental approaches. Differentiation between biological and non-biological contacts and reconstruction of the intact complex is a challenging computational problem. A successful solution can provide additional insights into the fundamental principles of biological recognition and reduce errors in many algorithms and databases utilizing interaction information extracted from the Protein Data Bank (PDB). Results: We have developed a method for identifying protein complexes in the PDB X-ray structures by a four step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is relevant based on a diverse set of properties; and (4) combining these scores for each PDB entry in order to predict the complex structure. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionary conserved protein-protein interactions. These interfaces, as well as the predicted protein complexes, are available from the Protein Interface Server (PInS) website at http://pins.ornl.gov/. Conclusions: Our method demonstrates an almost two-fold reduction of the annotation error rate as evaluated on a large benchmark set of complexes validated from the literature. We also estimate relative contributions of each interface property to the accurate discrimination of biologically relevant interfaces and discuss possible directions for further improving the prediction method.

Global analysis of aberrant pre-mRNA splicing in glioblastoma using exon expression arrays

BMC Genomics - Latest articles - Mon, 2008-05-12 12:00
Background: Tumor-predominant splice isoforms were identified during comparative in silico sequence analysis of EST clones, suggesting that global aberrant alternative pre-mRNA splicing may be an epigenetic phenomenon in cancer. We used an exon expression array to perform an objective, genome-wide survey of glioma-specific splicing in 24 GBM and 12 nontumor brain samples. Validation studies were performed using RT-PCR on glioma cell lines, patient tumor and nontumor brain samples. Results: In total, we confirmed 14 genes with glioma-specific splicing; seven were novel events identified by the exon expression array (A2BP1, BCAS1, CACNA1G, CLTA, KCNC2, SNCB, and TPD52L2). Our data indicate that large changes (> 5-fold) in alternative splicing are infrequent in gliomagenesis (< 3% of interrogated RefSeq entries). The lack of splicing changes may derive from the small number of splicing factors observed to be aberrantly expressed. Conclusions: While we observed some tumor-specific alternative splicing, the number of genes showing exclusive tumor-specific isoforms was on the order of tens, rather than the hundreds suggested previously by in silico mining. Given the important role of alternative splicing in neural differentiation, there may be selective pressure to maintain a majority of splicing events in order to retain glial-like characteristics of the tumor cells.

Complete Sequence and Analysis of the Mitochondrial Genome of Hemiselmis andersenii CCMP644 (Cryptophyceae)

BMC Genomics - Latest articles - Mon, 2008-05-12 12:00
Background: Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes--a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. Results: The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of a ~20 Kbp long intergenic region comprised of numerous tandem and dispersed repeat units of between 22-336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages. 
Conclusion: Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol.
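
As a rough illustration of what a search for the palindromic (inverted-repeat) sequences mentioned in the abstract might look like, here is a minimal Python sketch. It only finds perfect inverted repeats with a fixed stem length and does not predict thermodynamic stability; the function names, parameters, and toy sequence are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch: locate perfect inverted repeats that could pair as a stem-loop.
# Assumption: fixed stem length, perfect base pairing, no stability model.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem=6, min_loop=3, max_loop=8):
    """Return (start, loop_length) pairs for each perfect inverted repeat."""
    hits = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[start:start + stem]
        target = reverse_complement(left)  # what the right arm must read
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right = seq[right_start:right_start + stem]
            if len(right) < stem:
                break  # ran off the end of the sequence
            if right == target:
                hits.append((start, loop))
    return hits


# Hypothetical toy sequence: stem GGCACC, 4-base loop, stem GGTGCC.
toy_sequence = "TT" + "GGCACC" + "AAAA" + "GGTGCC" + "TT"
stem_loops = find_stem_loops(toy_sequence)
```

A stem longer than the chosen `stem` value would be reported as several overlapping hits, so a real analysis would need to merge them and score stability with a dedicated folding tool.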

A universal DNA mini-barcode for biodiversity analysis

BMC Genomics - Latest articles - Mon, 2008-05-12 12:00
Background: The goal of DNA barcoding is to develop a species-specific sequence library for all eukaryotes. A 650bp fragment of the cytochrome c oxidase 1 (CO1) gene has been used successfully for species-level identification in several animal groups. It may be difficult in practice, however, to retrieve a 650bp fragment from archival specimens (because of DNA degradation) or from environmental samples (where universal primers are needed). Results: We used a bioinformatics analysis using all CO1 barcode sequences from GenBank and calculated the probability of having species-specific barcodes for varied size fragments. This analysis established the potential of much smaller fragments, mini-barcodes, for identifying unknown specimens. We then developed a universal primer set for the amplification of mini-barcodes. We further successfully tested the utility of this primer set on a comprehensive set of taxa from all major eukaryotic groups as well as archival specimens. Conclusions: In this study we address the important issue of minimum amount of sequence information required for identifying species in DNA barcoding. We establish a novel approach based on a much shorter barcode sequence and demonstrate its effectiveness in archival specimens. This approach will significantly broaden the application of DNA barcoding in biodiversity studies.
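
To make the idea of "species-specific barcodes for varied size fragments" concrete, here is a minimal Python sketch. It assumes aligned sequences and treats a species as identifiable when none of its fragments exactly matches a fragment from another species; the paper's actual method may differ, and the function name and toy data are hypothetical.

```python
# Minimal sketch: what fraction of species has a fragment of a given length
# that no other species shares? Assumes aligned sequences and exact matching.
from collections import defaultdict


def species_specific_fraction(barcodes, start, length):
    """barcodes maps species name -> list of aligned sequences."""
    owners = defaultdict(set)  # fragment -> species that carry it
    for species, seqs in barcodes.items():
        for seq in seqs:
            fragment = seq[start:start + length]
            if len(fragment) == length:  # skip sequences too short for the window
                owners[fragment].add(species)
    scored = set().union(*owners.values())
    if not scored:
        return 0.0
    shared = set()
    for species_set in owners.values():
        if len(species_set) > 1:  # fragment seen in more than one species
            shared |= species_set
    return (len(scored) - len(shared)) / len(scored)


# Hypothetical toy data: three species, one 10 bp sequence each.
toy_barcodes = {
    "species_a": ["ACGTACGTAC"],
    "species_b": ["ACGTACTTAC"],
    "species_c": ["TTGTACGAAC"],
}
fractions = {n: species_specific_fraction(toy_barcodes, 0, n) for n in (4, 8, 10)}
```

In this toy set the 4 bp window cannot separate species_a from species_b, while the 8 bp window separates all three, which is the kind of length-versus-resolution trade-off the abstract describes.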

Proteomics discussion from the science streamosphere


We find ourselves wondering why the codon adaptation index (CAI) is used as a measure of protein expression level in this article.

One answer is that CAI does correlate well with protein expression in many proteomics studies; but surely these same studies contain raw data on protein expression levels? On reflection, I bet the answer is that it’s too difficult and laborious to access this type of data. There are plenty of papers that describe large-scale analysis of protein expression using proteomics, but the data are locked up in the articles or in inappropriate supplementary files.
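
For readers who have not met CAI before, here is a minimal Python sketch of how it is usually defined: each codon gets a weight equal to its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of those weights. The two-amino-acid codon table and the reference genes below are hypothetical toy data, and codons absent from the reference are simply skipped, where real implementations typically apply a small pseudocount.

```python
# Minimal sketch of the codon adaptation index (CAI) on toy data.
# Assumption: tiny, hypothetical synonymous-codon table and reference set.
import math
from collections import Counter

SYNONYMS = {
    "K": ["AAA", "AAG"],  # lysine
    "F": ["TTT", "TTC"],  # phenylalanine
}


def codons(gene):
    """Split an in-frame coding sequence into codons, ignoring a partial tail."""
    return [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]


def relative_adaptiveness(reference_genes):
    """Weight each codon by its count relative to the best synonymous codon."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(codons(gene))
    weights = {}
    for family in SYNONYMS.values():
        best = max(counts[c] for c in family)
        for c in family:
            if best > 0 and counts[c] > 0:  # unseen codons are skipped here
                weights[c] = counts[c] / best
    return weights


def cai(gene, weights):
    """Geometric mean of the weights of the gene's scorable codons."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set standing in for highly expressed genes.
reference = ["AAGAAGAAATTC", "AAGTTCTTCTTT"]
weights = relative_adaptiveness(reference)
preferred = cai("AAGTTC", weights)  # only preferred codons
rare = cai("AAATTT", weights)       # only rare codons
```

The point relevant to the discussion above is that CAI is computed from sequence alone, so it is a proxy for expression level and not a measurement of it.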

Note to self: look into open-source software and standard data formats for proteomic data.

Podospora genome published

Fungal Genomes Blog - Mon, 2008-05-12 09:25

The genome of the Podospora anserina S mat+ strain was sequenced by Genoscope and CNRS and published recently in Genome Biology. The genome sequence data have been available for several years, but it is great to see a publication describing the findings. The 10X genome assembly with ~10,000 genes provides an important dataset for comparisons among filamentous Sordariomycete fungi. The authors primarily focused on comparative genomics of Podospora and Neurospora crassa, the next closest model filamentous species. Within the Sordariomycetes there is now a very interesting collection of closely related species that can be useful for applying synteny and phylogenomics approaches.

The analyses in the manuscript focused on the differences between Neurospora and Podospora, identifying some key differences in carbon utilization between the coprophilic Podospora and the plant saprophyte Neurospora. Several gene family expansions are observed in the Podospora genome, which could be interpreted as additional enzyme capacity to break down the carbon sources present in dung.

The genome of Neurospora has been shaped by genome defense mechanisms like RIP, which has been one interpretation of its reduced number of large gene families and paucity of transposons. The authors report a surprising finding: although Podospora shares orthologs of the genes involved in several genome defense mechanisms, it has fewer repetitive sequences than Neurospora and yet shows no good evidence of RIP.

Overall, these data suggest that P. anserina has experienced a fairly complex history of transposition and duplications, although it has not accumulated as many repeats as N. crassa. P. anserina possesses all the orthologues of N. crassa factors necessary for gene silencing, including RIP, meiotic MSUD and also vegetative quelling, a post transcriptional gene silencing mechanism akin to RNA interference.

I think these data and observations dovetail nicely with the work our group is doing on genome evolution in several Neurospora species that have different mating systems. The gene components that play a role in MSUD and RIP are found in Podospora, yet the limited degree of RIP and the lack of any observed meiotic silencing suggest some interesting occurrences on the Neurospora branch to be explored. The potentially different degrees of RIP efficiency and types of mating systems (heterothallic and pseudohomothallic) among the Neurospora spp. may also provide a link to understanding how RIP evolved and its role in N. crassa evolution.

Senescence in Podospora

Another aspect of Podospora biology that isn't touched on is the use of the fungus as a model for senescence. The fungus exhibits maternal senescence, which involves targeted changes in the mitochondria that lead to cell death. The evolutionary and molecular basis for this process has been of interest to many research groups, and the genome sequence can provide an additional toolkit for identifying the factors involved in the apoptosis process in this filamentous fungus. Whether it will help find a real link for aging research in other eukaryotes remains to be seen, but it is a good model system for some aspects of how aging and damage to mtDNA are linked.

Espagne, E., Lespinet, O., Malagnac, F., Da Silva, C., Jaillon, O., Porcel, B.M., Couloux, A., Aury, J., et al. (2008). The genome sequence of the model ascomycete fungus Podospora anserina. Genome Biology, 9(5), R77. DOI: 10.1186/gb-2008-9-5-r77

© Jason Stajich for Fungal Genomes and Comparative Genomics, 2008.

Rosendal Meet

Computational Biology News - Mon, 2008-05-12 08:50


I attended the Rosendal meet (6-8 May 2008) between members of CBU (Computational Biology Unit ... and Department of Informatics, University of Bergen, Norway) and MB-NIMR (Division of Mathematical Biology at the National Institute for Medical Research, MRC, London, UK). Thanks to Xianjun for posting the pictures from the trip at http://picasaweb.google.com/sterding/Rosendal2008 . The ones from my camera are available at http://picasaweb.google.com/sharma.animesh/Rosendal_meet?authkey=wFgJVxkGwTU .