Anyone using LSID?

Since the folks at IBM have recently released a new version of the LSID toolkits in both Java and Perl here, I thought I would post to see whether any nodalpoint readers are actually using it.

I personally have not used any of the LSID or BioMoby stuff yet, but I have been watching it with a keen eye over the past year or so. In the grand scheme of things, LSID and all of the other ideas coming out of the semantic web seem useful, but are people actually using them within the realm of bioinformatics yet? Yes, I am aware of a few sites that are venturing in this direction, like UniProt and a few others, but is the regular bioinformatician in a small to medium bioinformatics shop using this yet?

If you have been using it, I am interested in hearing about your experience with it and any thoughts you might have.

If LSID is just another 4 letter acronym to you there is more information available at http://www-124.ibm.com/developerworks/oss/lsid/

Josh


Comments

what is LSID

If LSID is just another 4 letter acronym to you there is more information available at http://www-124.ibm.com/developerworks/oss/lsid/

Maybe it's explained better here:
- http://lsids.sourceforge.net/
- http://www.ibm.com/developerworks/opensource/library/os-lsidbp/


LSIDs and biological taxonomy

I've been experimenting with LSIDs as part of a project on searching multiple databases for a taxonomic name (for the curious the site is here).

I've relied on IBM's Perl library to get things up and running. LSIDs do have the overhead of having to persuade your system administrator to add records to the DNS, and I've had to learn about running virtual servers in Apache and about RDF (not to mention the joys of Perl), but so far it seems worth it.

What I find most exciting is the metadata that comes with an LSID. In the case of taxonomic names, this opens up the possibility of being able to reason about names returned by a database (e.g., whether two names are synonyms, and so on).

At present, it looks like most people using LSIDs (e.g., the BioPathways Consortium) are creating LSIDs for other resources (such as NCBI), as the data providers themselves are not issuing them. The only data provider that I am aware of that issues its own is the North Temperate Lakes - LTER site. It is interesting that ecologists seem quicker to embrace the technology than those in genomics.


Contentious

The short answer is no.

My problem with the LSID proposal can best be illustrated with an example. Try clicking on the following two links:

The point I am making here is that the second link doesn't work because your browser does not understand LSIDs. If you want to resolve the LSID (i.e. take the identifier and retrieve a representation or metadata), your browser must support the LSID resolution mechanism (which is DDDS). At present very little support exists for DDDS in general and for LSIDs specifically. I'll assume that 90% of people reading this so far probably have no clue as to what DDDS is, why their browsers don't support it, and whether or not it would be a good thing if they did.
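To make the resolution step a little more concrete, here is a minimal Python sketch of what a client has to do before it can even ask for data: split the LSID into its parts and work out which DNS record to look up. It assumes the usual `urn:lsid:authority:namespace:object[:revision]` layout and, as I understand the specification, an SRV lookup under `_lsid._tcp.<authority>`. The identifier and the names `parse_lsid` and `srv_query_name` are made up for illustration; nothing here touches the network or uses IBM's toolkit.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The parts of an LSID (illustrative container, not a toolkit class)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn' and 'lsid' prefixes are compared case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


def srv_query_name(lsid: LSID) -> str:
    """DNS name a resolver would query for the authority's SRV record.

    Assumption: the '_lsid._tcp.' prefix follows the LSID resolution
    convention; check the specification before relying on it.
    """
    return f"_lsid._tcp.{lsid.authority}"


# Hypothetical identifier; example.org does not run an LSID authority.
example = parse_lsid("urn:lsid:example.org:genbank:30350027")
print(example)
print(srv_query_name(example))
```

Everything after this point (the DNS lookup itself, then talking to the service the record points at) is the infrastructure a plain HTTP client never needs.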

So let's start from the beginning and restate the problem that LSIDs are trying to solve. Firstly, we all recognize there is a huge amount of biological data being produced. In most cases this is being stored in databases, each of which assigns a unique ID to the data set. The problem is: given one of these unique IDs, how does an agent 1. retrieve a representation of whatever is being identified (i.e. the actual data) and 2. retrieve metadata about the identifier (i.e. what type of data is being identified)?

On the surface this seems simple. If you have a GenBank identifier (GI), say 30350027, you go to GenBank, search for the GI and you get your answer. However, this is not so easy for software agents. It is at this point that things start to get a little crazy, as there are many mind-bending issues regarding current solutions to this general problem (i.e. the Semantic Web).

Nonetheless, my basic problems with LSIDs are that 1. they require you to invest in infrastructure to resolve identifiers, and 2. to get metadata you need to deal with SOAP web services (this is not entirely true, HTTP mappings exist, but SOAP is in the main implementations). We already have a perfectly good (IMO) system for resolving identifiers to representations: HTTP. Having to use SOAP or REST to get basic metadata about an identifier makes metadata a second-class citizen on the web; better solutions exist. Ironically, in the online LSID demonstration, HTTP is being used to resolve the example LSIDs.

I have already had this out with some of the LSID developers (i.e. IBM) on the public-semweb-lifesci list; you can read the thread here. I would be interested to hear other people's opinions on this.


LSID support for browsers

I just wanted to add that a week or so after I posted this thread I noticed that someone had made a plugin for Firefox that lets the browser resolve LSID URNs. IBM's Launchpad application also seems to do the same, but for IE on Windows only. So it looks like some of your concerns are known and they are trying to deal with them on some level.

The plugin is available here


Firefox plugin

That "someone" was me, partly motivated by Greg's post. If you install the protocol handler, the LSIDs become clickable. Greg's example won't work as is because the LSID link needs the "lsidres" prefix (in the same way that a link to a web page needs the "http" prefix). To reuse his example, try clicking on these two links in Firefox with the LSID protocol handler installed:


Nice work

Firstly, nice work on the plugin, I'm sure this will be useful in the future. However I'm not sure if this is demonstrating the usefulness of LSIDs or how easy it is to extend Firefox.

Another argument I have put forward in favour of HTTP identifiers is that existing libraries do not need to be changed to access representations of HTTP identifiers (obviously). While it is nice that I can resolve LSIDs in my browser, the urllib module in Python is still useless to me. Hence the need to invest in infrastructure. Stronger forces have decided otherwise, so I guess we will just have to wait and see what developers do...
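For anyone who wants to see the urllib point for themselves, here is a small sketch using the Python 3 spelling of the module (`urllib.request`). The LSID is a made-up example and `try_fetch` is just a helper name I picked. No network access is needed for the LSID case, because urllib gives up before it tries to connect anywhere: it has no handler registered for the `urn` scheme.

```python
import urllib.error
import urllib.request

# Hypothetical identifier, used only to exercise the library.
EXAMPLE_LSID = "urn:lsid:example.org:genbank:30350027"


def try_fetch(identifier: str) -> str:
    """Try to retrieve an identifier with the stock library, report what happened."""
    try:
        with urllib.request.urlopen(identifier, timeout=5) as response:
            return f"fetched {len(response.read())} bytes"
    except urllib.error.URLError as exc:
        # Raised here because no handler exists for the 'urn' scheme.
        return f"failed: {exc.reason}"


print(try_fetch(EXAMPLE_LSID))
```

Passing an ordinary `http://` address to the same function goes through the normal HTTP handler with no extra code, which is really the whole argument.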

BTW, I suspect that the "lsidres" prefix (URI scheme) is redundant here. As far as I know, URN is the URI scheme for LSIDs. Please correct me if I am wrong; I haven't confirmed this against the specifications.
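A quick way to see what a generic URI parser makes of the two forms is Python's `urllib.parse.urlsplit`. This is only a sketch with a made-up identifier, and it shows how one parser splits the strings, not what the LSID or URN specifications require.

```python
from urllib.parse import urlsplit

# Hypothetical identifier, written with and without the plugin's prefix.
bare = "urn:lsid:example.org:genbank:30350027"
prefixed = "lsidres:" + bare

for text in (bare, prefixed):
    parts = urlsplit(text)
    # Everything before the first colon is reported as the scheme.
    print(f"{text} -> scheme={parts.scheme!r}, rest={parts.path!r}")
```

The bare form comes back with `urn` as its scheme, while the prefixed form is treated as an `lsidres` URI that merely carries the URN as its remainder.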

Also, we should post this as a story to the main page so it gets picked up by the aggregators.