The discussion on impact factors got me wondering: is there a public, free-access citation database for articles in Medline/PubMed? I know of Scopus and ISI WOS (but they're not free, and their content is proprietary) and Google Scholar (which only gives 'cited by', when I want 'this article cites x and y').
How would one build such a database if it's not accessible? I know that ISI actually scans articles (not doable by myself); I don't know how Scopus got their index, though.
Such a database would help tremendously with some bibliomics work I'm doing. Is it technically feasible to get references for all Medline articles (at least those after 1996)? Where would you get the information? Scrape/spider and index publishers' websites, if this information is even freely accessible (without a subscription), and then match against a local Medline database (which I already have)? If anyone can help, it'd be appreciated :)
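Once references have been scraped, matching them against a local Medline copy could start with a cheap exact-key lookup before anything fuzzier. A minimal sketch in Python, assuming the local database can be iterated as (PMID, title, year) tuples and that each scraped reference has already been split into a title and a year; the function names and record layout are hypothetical, not part of any existing tool.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]


def normalize_title(title: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into single spaces, trim."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def build_index(records: Iterable[Tuple[str, str, int]]) -> Dict[Key, List[str]]:
    """Map (normalized title, year) -> PMIDs.

    Assumes records are (pmid, title, year) tuples (hypothetical layout).
    """
    index: Dict[Key, List[str]] = defaultdict(list)
    for pmid, title, year in records:
        index[(normalize_title(title), year)].append(pmid)
    return dict(index)


def lookup(index: Dict[Key, List[str]], ref_title: str, ref_year: int) -> List[str]:
    """Return candidate PMIDs for one scraped reference (empty if no exact hit)."""
    return index.get((normalize_title(ref_title), ref_year), [])
```

With toy records such as `("0000001", "Example title: one", 2003)`, `lookup(index, "example TITLE one.", 2003)` returns `["0000001"]`. Exact keys will miss references with typos or abbreviated titles, which is where fuzzy string metrics come in.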


Comments
How to Link/Deduplicate Citation Records
To answer the how question, check out William Cohen et al.'s excellent 2003 overview paper, A Comparison of String Metrics for Matching Names and Records. I chose this paper from the dozens out there because it has good references and William also works in bioinformatics, so you may see him around.
There was a recent spate of papers on the topic because academic citation linkage was the focus of the 2003 KDD Cup competition, which is part of ACM's SIGKDD (the Association for Computing Machinery's special interest group on knowledge discovery and data mining).
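To give a sense of what the paper compares, here is a minimal Python sketch of Jaro-Winkler, one of the string metrics evaluated there. It uses the common textbook formulation (matching window of max(len)/2 - 1, prefix bonus capped at four characters with scaling factor 0.1); these are the usual defaults and not necessarily the exact variant Cohen et al. implemented.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    used1 = [False] * len1
    used2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both match sequences in order; mismatched pairs are half-transpositions.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

For example, `jaro_winkler("MARTHA", "MARHTA")` is about 0.961. For citation records one would normally normalize author and title strings first and pick a match threshold by checking it against hand-labelled pairs.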
Bob Carpenter
Alias-i, Inc.
Elsevier probably wouldn't be very happy...
... if you scraped references from their non-OA journals to create a competitor to SCOPUS. :)
You could work with the subset of journals in PubMed Central and BMC; that way you get a lot of data but only need to work out how to scrape (or parse the embedded RDF in) two different manuscript templates. The data might not be representative enough of the literature as a whole, though; I don't know.
Ideally there'd be reference metadata linked from the header of every paper, and you could do everything dynamically: type in a URL, retrieve the reference list from that paper, then follow the URLs of those papers, and so on.
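Instead of scraping HTML, PubMed Central articles can be obtained as XML in the NLM/JATS tagging format, where each reference-list entry sits in a `ref` element and may carry identifiers such as `<pub-id pub-id-type="pmid">`. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the article XML has already been downloaded as a string and uses no default XML namespace; this swaps in JATS parsing for the RDF route mentioned above, and the sample fragment is a toy.

```python
import xml.etree.ElementTree as ET
from typing import List


def cited_pmids(jats_xml: str) -> List[str]:
    """Collect PMIDs from the reference list of a JATS/NLM article."""
    root = ET.fromstring(jats_xml)
    pmids: List[str] = []
    for ref in root.iter("ref"):
        # A reference may hold several pub-ids (pmid, doi, ...); keep only PMIDs.
        for pub_id in ref.iter("pub-id"):
            if pub_id.get("pub-id-type") == "pmid" and pub_id.text:
                pmids.append(pub_id.text.strip())
    return pmids


# Toy fragment for illustration; real PMC files carry far more markup.
SAMPLE = """<article><back><ref-list>
  <ref id="r1"><element-citation>
    <article-title>First cited paper</article-title>
    <pub-id pub-id-type="pmid">111</pub-id>
    <pub-id pub-id-type="doi">10.1000/example</pub-id>
  </element-citation></ref>
  <ref id="r2"><mixed-citation>Unindexed conference abstract.</mixed-citation></ref>
</ref-list></back></article>"""
```

Here `cited_pmids(SAMPLE)` returns `["111"]`; the second reference has no PMID and would still have to be matched against Medline by title.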
(plug:) OTMI might eventually enable that.
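Following reference lists from paper to paper, as described above, amounts to a breadth-first traversal of the citation graph. A minimal sketch, assuming a hypothetical `fetch_references(url)` function that returns the identifiers cited by a paper; where that function gets its data (OTMI, PMC XML, scraping) is left open.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl_citations(
    seed: str,
    fetch_references: Callable[[str], Iterable[str]],  # hypothetical fetcher
    max_papers: int = 100,
) -> Dict[str, List[str]]:
    """Breadth-first crawl: map each visited paper to the papers it cites."""
    graph: Dict[str, List[str]] = {}
    queue = deque([seed])
    seen = {seed}  # avoids re-queuing papers reached by several paths or cycles
    while queue and len(graph) < max_papers:
        paper = queue.popleft()
        refs = list(fetch_references(paper))
        graph[paper] = refs
        for ref in refs:
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return graph
```

The `max_papers` cap matters because the set of reachable papers grows very quickly with each hop.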
Thanks for all the useful
Thanks for all the useful answers everyone! I really really really hope that OTMI takes off big time (like, every publisher/journal) - it would help tremendously. Right now, only Nature is behind this, right?
Too bad Scopus chose to go the proprietary route; with their reference database and their new author disambiguation, it would beat PubMed hands down. I only became aware of it a week ago; no researchers I know had even heard of it, and our university has a subscription! Go figure...
blow for freedom
Elsevier probably wouldn't be very happy...
No, but Elsevier is the antichrist. So ultimately, it's justified.
Citeseer
Hi,
Do you know of CiteSeer (http://citeseer.ist.psu.edu/)?
It's meant to be a tool for browsing scientific citations.
Its own search engine is not very functional, though: you should use the Google operator 'site:http://citeseer.ist.psu.edu/' to find an article reliably.
Here is an example: http://citeseer.ist.psu.edu/6394.html (it gives you citations, graphs, etc.)
Or maybe you could use HubMed; it has an option to search citations.
Google Scholar basically
Google Scholar basically supersedes CiteSeer, and HubMed only has citation data from PubMed Central. The OpCit project attempted to build an open citation database from OAI archives and produced CiteBase for searching citations.
citeXtract
This guy is working on a project called citeXtract, which might be pretty much what you are looking for (though I don't know what the current status of the project is).