Bio-informatics at the BBC

Demonstrating the power of metadata

The BBC programme catalogue has recently gone online. It is an impressive demonstration of the applications that can be built using web technologies such as Ruby on Rails, RDF, FOAF, web feeds, tag clouds and sparklines.

Unfortunately the catalogue currently includes only metadata, not data: as an experimental prototype, it has no audio or video streams yet. As mentioned earlier, the catalogue is based on RDF, which will no doubt please Semantic Webhead Tim Berners-Lee and allows the database to be queried with SPARQL. One of the brains behind this is Matt Biddulph.
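To give a feel for what "queried with SPARQL" means, here is a minimal sketch in plain Python that stores a few triples and matches patterns against them, roughly in the spirit of a SPARQL basic graph pattern. It is not a real SPARQL engine. The programme URI, title and type name are taken from the catalogue entry discussed in the comments below; the use of Dublin Core for titles and the radio entry are assumptions made up for illustration, not the BBC's actual vocabulary.

```python
# Toy triple store: every fact is a (subject, predicate, object) tuple.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"  # assumed title predicate
PROGRAMME = "http://open.bbc.co.uk/catalogue/infax/programme/LSFR607L"
RADIO = "http://example.org/programme/RADIO1"  # hypothetical entry

triples = [
    (PROGRAMME, RDF_TYPE, "Image.Moving.TV"),
    (PROGRAMME, DC_TITLE, "HORIZON REVISITED, THE HUMAN GENOME PROJECT"),
    (RADIO, RDF_TYPE, "Sound.Radio"),        # hypothetical type name
    (RADIO, DC_TITLE, "AN EXAMPLE RADIO PROGRAMME"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def titles_of_type(store, type_name):
    """Join two patterns: '?x rdf:type type_name' and '?x dc:title ?title'."""
    subjects = {s for s, _, _ in match(store, p=RDF_TYPE, o=type_name)}
    return sorted(o for s, _, o in match(store, p=DC_TITLE) if s in subjects)


print(titles_of_type(triples, "Image.Moving.TV"))
```

A real deployment would hand this kind of pattern to a SPARQL endpoint rather than loop over a list, but the idea of matching triple patterns and joining on shared variables is the same.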

I wonder if a similar application could be built using the UniProt protein sequence and annotation data in RDF, or the data currently being produced by the W3C BioRDF subgroup? Compared to biological databases, the BBC catalogue is probably relatively small (there are no figures on its size), and it has been extensively hand-curated by experts over the years. The ratio of metadata to data is probably different too: a typical biological database might have lots of data (e.g. raw protein sequence data) but a small quantity of poor-quality metadata (interactions, structures, functions etc.).

However, this catalogue is an interesting prototype, which is addictively fun to play with and might spark a few imaginations in the bioinformatics community.

[update: seeAlso Alf Eaton's visual TouchGraph of BBC TV/Radio Collaborators, which allows you to browse this data more graphically. Unfortunately, this fantastic BBC Database is not always online.]


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.


Comments

Practical code

I love a good semantic web debate (from an outsider's viewpoint).

The BBC catalogue is fun as a demonstration. I'll say what I always say when this topic comes up here: people will be convinced by semantic web technology when it does something useful. Almost every demonstration that I've seen is just that - a demonstration. When a website appears that allows me to retrieve and analyse biological data, I'll be impressed.


Identity, Reference and the Web

Greg, the importance of RDF is highly debatable. Some people really don't like RDF very much, but I don't want to get into a language war here. These issues of Identity, Reference and Meaning on the web are as old as the hills, and are the subject of a tasty-looking workshop at the World Wide Web (dub dub dub) 2006 conference later this month.


Don't get me wrong, I very

Don't get me wrong, I very much want the semantic web to work. I guess I'm just a little grumpy at the moment.

As far as identity issues go, I find it a little crazy that right now there are no straightforward guidelines or solutions for the semantic web. I really hope that I'm wrong about some stuff and it all works out in the end...

Back to more productive things. What could we do with the UniProt RDF data set? It looks possible to build a 'Semantic Bio Catalogue' in the same vein as the BBC catalogue. The advantage of this would be simply *having* such a resource online and available to download and query (with SPARQL). A showcase of semantic web tech for biology... but how would it work, and could we do it as a collaborative project? Maybe this should be suggested as one of the tasks for the BioRDF group.


BioDASH

You've mentioned BioDASH before - that's the kind of thing you could use the UniProt data for (at least once BioDASH is a bit more usable).


I really don't know what to

I really don't know what to think about the semantic web anymore. The vision and the technologies underlying that vision seem to be a horrible mess. The BBC catalogue is a great example of semantic web technologies being used 'in the real world'. The question is: what advantages do they bring over standard off-the-shelf structured data, e.g. XML and XML Schemas?

If the BBC catalogue is purely an experiment in using semantic web technologies then fine, but why bother producing RDF/XML for consumers when XML will do just fine? This is part of the reason why RSS 1.0 failed: it was RDF/XML.

If you do consider the RDF part important, then how do you resolve issues like identity? For example, take this entry in the catalogue. It describes "HORIZON REVISITED, THE HUMAN GENOME PROJECT". At the bottom of the page there is a link to an RDF feed for the page. When you look at the RDF source, it is saying something like: the URI http://open.bbc.co.uk/catalogue/infax/programme/LSFR607L identifies a resource of type Image.Moving.TV. This is of course not true: that URI identifies an HTML web page that describes a resource of type Image.Moving.TV.

I don't see any other way an agent can interpret that. If it resolves the URI, it will get HTML, not a TV program.
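The mismatch can be written down as a small check. This is only a sketch of the argument: the "text/html" response is what the comment above says an agent would receive (no request is actually made), and the set of type names that would count as "a web page" is hypothetical.

```python
PAGE_URI = "http://open.bbc.co.uk/catalogue/infax/programme/LSFR607L"

# What the RDF feed asserts about the URI, as described above.
rdf_claims = {PAGE_URI: "Image.Moving.TV"}

# What an agent is assumed to receive on dereferencing the URI.
# Hard-coded from the comment above; nothing is fetched here.
dereferenced_as = {PAGE_URI: "text/html"}

HTML_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}
WEB_PAGE_TYPES = {"Document"}  # hypothetical names for "this is a web page"


def is_conflated(uri):
    """True if the RDF types the URI as something other than a web page
    while dereferencing that same URI yields an HTML page."""
    claimed = rdf_claims.get(uri)
    served = dereferenced_as.get(uri)
    return (
        claimed is not None
        and claimed not in WEB_PAGE_TYPES
        and served in HTML_MEDIA_TYPES
    )


print(is_conflated(PAGE_URI))
```

In other words, one identifier is being asked to name both the page and the programme the page describes.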

Interestingly enough this same kind of problem is now being debated on the W3C HCLS list for URIs to identify NCBI's database...


It is useful

Firstly, RSS1 didn't fail - it's in use all over the place, particularly by journal publishers, who include things like the PRISM ontology to describe journal/issue/page data.

Secondly, why is the BBC's RDF useful? Because I can understand what it means without having to read the instructions.

Thirdly, it was a contention a while ago that you shouldn't use http URIs for things that aren't documents on the web, but I think it's generally accepted now that using a URL as a proxy representation of a 'thing' is useful, particularly if you use a fragment identifier and the URL returns an RDF representation of the object.
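As a rough illustration of the fragment identifier approach, the sketch below uses Python's standard `urllib.parse.urldefrag` to show that a fragment URI and the document it lives in are two distinct identifiers: the fragment is not part of the URL that gets requested, so the page and the 'thing' no longer share one name. The `#programme` fragment is made up for this example; it is not something the BBC catalogue is known to use.

```python
from urllib.parse import urldefrag

# Hypothetical URI for the programme itself, as opposed to the page about it.
THING_URI = "http://open.bbc.co.uk/catalogue/infax/programme/LSFR607L#programme"


def split_thing_uri(uri):
    """Split a URI into the document URL to retrieve and the fragment
    naming a 'thing' described inside that document."""
    document_url, fragment = urldefrag(uri)
    return document_url, fragment


document_url, fragment = split_thing_uri(THING_URI)
print("retrieve:", document_url)
print("thing described within it:", fragment)
```

An agent can then say that the document URL identifies a web page (or an RDF representation), while the full URI with its fragment identifies the TV programme.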