2006: Year of the OWL?

Two related articles in the December 2005 issue of PLoS Computational Biology describe the state of the art in biomedical ontologies and where they might be going in 2006. The first is a report from the Eighth Annual Bio-Ontologies Meeting at ISMB 2005, which reviews current uses of biomedical ontologies, while the second is a plea stating it's Time to Organise the Bioinformatics "Resourceome" using ontologies. In the second of these papers, the authors Nicola Cannata, Emanuela Merelli and Russ B. Altman boldly state that the three "initial steps toward a bioinformatics resourceome are clear". These three steps are:

First, an overall ontology with the high-level concepts (algorithms, databases, organisations, papers, people, etc.) must be created,

This is important, but probably non-trivial to achieve. At least some of it has already been done, e.g. the Bioinformatics Links Directory: a Compilation of Molecular Biology Web Servers and The Molecular Biology Database Collection: 2006 update have already classified the resources available. Semantic Webheads would probably argue that the world would be a better place if these classifications of databases and web servers were expressed using the Web Ontology Language (OWL) or its poor cousin RDF... Following that, the next step is:

Second, a mechanism for people to extend this ontology with subconcepts in order to describe their own resources should be designed.

Most ontology languages allow you to do this in various ways. Getting people to use existing ontologies is much more difficult though...perhaps more of a social problem than a technical one? The final step is:

Third, the formats for the ontologies and the resource descriptions should be published so enterprising software engineers can create interfaces for surfing, searching, and viewing the resources.

It sounds deceptively simple. Let's hope 2006 will see significant progress towards this goal. It looks like the (Inter?) National Centre for Biomedical Ontology will play an important role in achieving this.


Comments

Re: 2006: Year of the OWL?

I'm not sure it would still be feasible (or even desirable) to create and maintain a comprehensive and up-to-date directory that would suit all people. More interesting would be a standard for describing projects in a machine-readable manner, perhaps something like FOAF or DOAP. These files could be hosted and maintained by the projects themselves, and could be crawled and indexed by anyone.


Semantic Webheads

Semantic Webheads would probably argue that the world would be a better place if these classifications of databases and web servers were expressed using the Web Ontology Language (OWL) or its poor cousin RDF...

Semantic Webheads, the horror :) however I would include myself in that category. I have a project page on the wiki to build schemas/vocabularies/ontologies for the description of biological resources (DOBR), for example a biological database. DOBR is vapourware at the moment, awaiting more motivation on my part. A simple metadata description of a biological database would be helpful for discovering what kind of data is available. It might also be helpful in bootstrapping awareness of the semantic web in the biology community.

A use case might be that when a new database is published an RDF file describing the database (data sources, maintainers, construction method etc.) must be linked on the site. It would be simple enough to crawl these descriptions to build a comprehensive database of all available databases...
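As a rough illustration of what such a description might contain, here is a minimal sketch that writes one as Turtle using Dublin Core element names. Everything specific in it is an assumption made for illustration: the function names, the example.org URIs and the choice of Dublin Core are hypothetical, not part of DOBR or any agreed vocabulary. It also assumes literals contain no newlines.

```python
# Hypothetical sketch: the URIs, names and vocabulary choice are illustrative only.

def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_database(uri, title, description, maintainers, sources):
    """Return a Turtle document describing one database with Dublin Core elements."""
    lines = [
        "@prefix dc: <http://purl.org/dc/elements/1.1/> .",
        "",
        f"<{uri}>",
        f"    dc:title {turtle_literal(title)} ;",
        f"    dc:description {turtle_literal(description)} ;",
    ]
    for name in maintainers:
        lines.append(f"    dc:creator {turtle_literal(name)} ;")
    for source in sources:
        lines.append(f"    dc:source <{source}> ;")
    # The last statement ends the description, so swap its ' ;' for ' .'
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(describe_database(
        "http://example.org/db",          # hypothetical database URI
        "Example DB",
        "A toy database of example records",
        ["A. Curator"],
        ["http://example.org/upstream"],  # hypothetical upstream data source
    ))
```

A real vocabulary would need richer terms (construction method, update frequency, licence), but even this much would give a crawler something to index.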


Deep Web

To see how useful this would be, I direct you to an article in ComputerWorld on the "Deep Web":

ComputerWorld, Dec 19th issue

And as it is mentioned in this article:

The deep Web, also called the invisible Web, refers to the mass of information that can be accessed via the World Wide Web but can't be indexed by traditional search engines -- often because it's locked up in databases and served up as dynamic pages in response to specific queries or searches.

I think this is pretty much the case with bio-data, as I also mentioned in my Hopes & Fears for 2006: labs create data and put them in a custom MySQL database that hides behind an HTML interface...


There are two issues here:

There are two issues here: discovery and query. Not only is it necessary to open up those relational databases hiding in the 'deep web' but we also need to find them.

Providing a metadata description of your database in RDF, using a controlled vocabulary (ontology if you like), is one step towards finding all this hidden data. The use case I see here is that when a group publishes in the NAR database issue, a link to an RDF description of that database must be submitted. These descriptions could then be aggregated much like RSS or FOAF.
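To make the aggregation idea concrete, here is a small sketch that merges several descriptions into one index keyed by database URI. It assumes the descriptions have already been fetched and are in N-Triples form, and it uses a deliberately simplified parser (URIs and plain literals only, escapes left untouched); a real aggregator would use a proper RDF library. The function names and example.org URIs are hypothetical.

```python
import re
from collections import defaultdict

# Simplified N-Triples pattern: <subject> <predicate> then <uri> or "plain literal", then '.'
TRIPLE = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$'
)


def parse_ntriples(text):
    """Yield (subject, predicate, object) for each line this simple parser understands."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip typed literals, blank nodes and anything else unsupported
        subject, predicate, uri_obj, literal_obj = match.groups()
        yield subject, predicate, uri_obj if uri_obj is not None else literal_obj


def aggregate(documents):
    """Merge many descriptions into {subject: {predicate: [objects]}}, dropping duplicates."""
    index = defaultdict(lambda: defaultdict(list))
    for text in documents:
        for subject, predicate, obj in parse_ntriples(text):
            if obj not in index[subject][predicate]:
                index[subject][predicate].append(obj)
    return index


if __name__ == "__main__":
    DC = "http://purl.org/dc/elements/1.1/"
    # Hypothetical descriptions, as they might be published by two different groups
    first = f'<http://example.org/db1> <{DC}title> "DB One" .\n'
    second = f'<http://example.org/db2> <{DC}title> "DB Two" .\n'
    for subject, properties in aggregate([first, second]).items():
        print(subject, dict(properties))
```

Because each description names its own subject URI, merging is just a union of statements, which is what makes this style of aggregation as cheap as collecting RSS or FOAF files.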

While this is far from an optimal solution (see BioMoby for something more ambitious), it does serve the purpose of raising awareness of semantic web technologies and metadata.

As for querying, REST (or maybe SOAP) interfaces are likely to be the easiest way to unlock biological data stored in relational databases (short of downloading a MySQL dump). What would be better is if each database provided a SPARQL query interface. SPARQL query interfaces for biological databases will be on my wish list for 2008...
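For a sense of what a SPARQL interface would look like from the client side, here is a minimal sketch that builds the request URL for a query sent over HTTP GET with a `query` parameter, as the SPARQL protocol describes. The endpoint URL is hypothetical, and the query assumes the database exposes Dublin Core titles, which is an illustration rather than anything a real database is known to do.

```python
from urllib.parse import urlencode


def build_sparql_request(endpoint, query):
    """Return a GET URL that passes `query` to a SPARQL endpoint as the 'query' parameter."""
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode({"query": query})


# Hypothetical query: list up to ten resources together with their titles
QUERY = """\
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?db ?title
WHERE { ?db dc:title ?title }
LIMIT 10
"""

# Hypothetical endpoint; no real database is implied
url = build_sparql_request("http://example.org/sparql", QUERY)

if __name__ == "__main__":
    print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) would return the result set; the point is that the same client code would work against any database offering such an endpoint, which a collection of custom REST interfaces cannot promise.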