Tagging your research

Here is a commentary that will sound very familiar to most bioinformaticians, on something a lot of us have been talking about. It argues for tagging papers prior to publication to make the data available for automatic processing. It would not be so difficult to have the text read by a tagging program and have the author go through a list of terms to quickly accept, change or reject them. The effort would be minimal for the individual author compared with the effort required to curate databases from all the available and ever-increasing published papers. Just imagine the nice tools that could be built with such information-rich databases.
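To make the accept/change/reject step concrete, here is a minimal Python sketch. It is not the method from the commentary: the tiny vocabulary, the function names and the example abstract are all made up for illustration, and a real tagging program would draw its terms from an ontology or a trained model rather than a hard-coded set.

```python
import re

# Hypothetical controlled vocabulary (an assumption for this sketch).
VOCABULARY = {"kinase", "phosphorylation", "yeast", "apoptosis"}

def suggest_tags(text, vocabulary=VOCABULARY):
    """Return the vocabulary terms that occur in the text, alphabetically."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & vocabulary)

def review_tags(suggested, decisions):
    """Apply the author's decisions to the suggested tags.

    decisions maps a suggested tag to "accept", "reject", or a replacement
    string. Tags without a decision are accepted by default.
    """
    final = []
    for tag in suggested:
        decision = decisions.get(tag, "accept")
        if decision == "reject":
            continue
        final.append(tag if decision == "accept" else decision)
    return final

# Invented example abstract and author decisions.
abstract = "We show that the kinase Cdc28 drives phosphorylation in yeast."
suggested = suggest_tags(abstract)
final_tags = review_tags(suggested, {"yeast": "Saccharomyces cerevisiae"})
print(suggested)
print(final_tags)
```

The point of the sketch is that the author only reviews a short list and never has to come up with terms from scratch.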


Comments

I may be ignorant..

Apologies if my query regarding tagging reeks of ignorance. But how is author tagging going to be any different than, say, asking the authors to provide a set of keywords to describe their findings/contents of their paper/conclusions etc?
When you get authors to tag a paper, they are more likely to bias any independent reader or computer-based tagging, because they probably believe in the findings of their research.
I don't think text mining is going to die. Hopefully, with improvements in natural language processing, things will only get better.


You write bioinformatics exte

You write bioinformatics extensions for Firefox, so by definition you can't be ignorant :)

I agree that classification or tagging bias is an issue that should be addressed in the paper. From my brief reading of the paper it seems that the author is using tagging in terms of automated extraction of key words during the writing process. Authors will then decide whether or not these are the "correct tags" that describe the article.

However, I had immediately assumed that the reference to tagging meant del.icio.us-style tagging (i.e. group-based classification). This style of tagging might get around the author bias issue.

I posted a comment on the article at BMC regarding this issue. Unfortunately, I can't point to it yet, as each comment is checked by a moderator (so you have to wait two days).


tagged by

It's odd that a paper that talks about 'tagging' would only ask two questions: "will authors do it [the tagging] for us?" or "will computers do it for us?", without mentioning the obvious "will readers do it for us?".

del.icio.us, citeulike and connotea have shown that you only need a few people to tag an article for most of the important keywords to be extracted, which seems much more likely to succeed than persuading every author to try and tag or categorise their own work.
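As a rough illustration of how a handful of readers can be enough, the sketch below keeps only the tags that at least two readers applied to the same article. The reader names, the tags and the `min_users` threshold are invented for the example; none of this reflects how del.icio.us, citeulike or connotea actually store or expose their data.

```python
from collections import Counter

def consensus_tags(user_tags, min_users=2):
    """Return tags applied by at least min_users distinct readers.

    user_tags maps a reader name to the list of tags that reader gave
    one article. Tags are compared case-insensitively, and each reader
    counts at most once per tag.
    """
    counts = Counter()
    for tags in user_tags.values():
        counts.update({tag.strip().lower() for tag in tags})
    return sorted(tag for tag, n in counts.items() if n >= min_users)

# Invented readers and tags for a single article.
readers = {
    "alice": ["yeast", "kinase", "toread"],
    "bob": ["Kinase", "cell-cycle"],
    "carol": ["yeast", "kinase"],
}
shared = consensus_tags(readers)
print(shared)
```

Personal tags such as "toread" drop out on their own, because only one reader used them.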


Yes...

And it may also solve the problem of tagging older abstracts. One potential model is for sites like hubmed or connotea to provide access to the tag database that their users create (via RDF or web services?). Periodically, pubmed could collect these tags and add them to the pubmed database. Alternatively, they could add a tagging feature to their own user interface, although that might take a while if their RSS support is anything to go by.

The user tagging model has many advantages and may make the field of "text mining" irrelevant for bioinformatics. If users were able to add metadata to publications (protein A is associated with protein B), using "text mining" to deduce that relationship from the abstract text would be redundant. One extension of tagging that may facilitate this kind of value-added metadata is tag triples. See Phil Dawes' weblog for more on tag triples.
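For readers who have not met the idea, a triple is simply a subject, a predicate and an object. The Python sketch below shows the general shape using the "protein A is associated with protein B" example. The field names, the `associated_with` predicate and the `match` helper are assumptions made for illustration, not the format described on Phil Dawes' weblog.

```python
from typing import NamedTuple

class TagTriple(NamedTuple):
    """One reader-supplied statement attached to a publication."""
    subject: str
    predicate: str
    obj: str

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that was given."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]

# Invented statements; the protein names are placeholders.
triples = [
    TagTriple("protein A", "associated_with", "protein B"),
    TagTriple("protein A", "expressed_in", "liver"),
    TagTriple("protein C", "associated_with", "protein B"),
]
partners_of_b = match(triples, predicate="associated_with", obj="protein B")
print([t.subject for t in partners_of_b])
```

With statements stored this way, a question like "which proteins are associated with protein B?" becomes a lookup instead of a text mining problem.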

Reader question: How esoteric are user tagging, tag triples, RDF, del.icio.us, connotea etc.?