<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - How to compile a database of citations? - Comments</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations</link>
 <description>Comments for &quot;How to compile a database of citations?&quot;</description>
 <language>en</language>
<item>
 <title>How to Link/Deduplicate Citation Records</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations#comment-3217</link>
 <description>&lt;p&gt;To answer the &lt;i&gt;how&lt;/i&gt; question, check out William Cohen et al.&#039;s excellent 2003 overview paper, &lt;a href=&quot;http://www.cs.cmu.edu/~wcohen/postscript/kdd-2003-match-ws.pdf&quot;&gt;A Comparison of String Metrics for Matching Names and Records&lt;/a&gt;.  I chose this paper from the dozens out there because it has good references and William also works in bioinformatics, so you may see him around.&lt;/p&gt;
&lt;p&gt;There was a recent spate of papers on the subject, because academic citation linkage was the topic of the &lt;a href=&quot;http://www.cs.cornell.edu/projects/kddcup/&quot;&gt;2003 KDD Cup Competition&lt;/a&gt;, which is part of ACM&#039;s SIGKDD (the Association for Computing Machinery&#039;s special interest group in knowledge discovery and data mining).&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.colloquial.com/carp&quot;&gt;Bob Carpenter&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.alias-i.com/lingpipe&quot;&gt;Alias-i, Inc.&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 13 Nov 2006 12:25:55 -0500</pubDate>
 <dc:creator>Bob Carpenter</dc:creator>
 <guid isPermaLink="false">comment 3217 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Thanks for all the useful</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations#comment-3216</link>
 <description>&lt;p&gt;Thanks for all the useful answers everyone! I really really really hope that OTMI takes off big time (like, every publisher/journal) - it would help tremendously. Right now, only Nature is behind this, right?&lt;/p&gt;
&lt;p&gt;Too bad Scopus chose to go the proprietary route - with their reference database and their new author disambiguation, it would beat PubMed hands down. I just became aware of it a week ago; no researchers that I know of had even heard of it, and our university has a subscription! Go figure...&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 10 Nov 2006 22:03:49 -0500</pubDate>
 <dc:creator>FiReaNG3L</dc:creator>
 <guid isPermaLink="false">comment 3216 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>blow for freedom</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations#comment-3215</link>
 <description>&lt;p&gt;&lt;i&gt;Elsevier probably wouldn&#039;t be very happy...&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;No, but Elsevier is &lt;a href=&quot;http://www.cscs.umich.edu/~crshalizi/weblog/442.html&quot;&gt;the antichrist&lt;/a&gt;.  So ultimately, it&#039;s justified.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 10 Nov 2006 09:21:41 -0500</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 3215 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Elsevier probably wouldn&#039;t be very happy...</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations#comment-3214</link>
 <description>&lt;p&gt;... if you scraped references from their non-OA journals to create a competitor to SCOPUS. :)&lt;/p&gt;
&lt;p&gt;You could work with the subset of journals in PubMed Central &amp;amp; BMC - that way you get a lot of data but only need to work out how to scrape (or parse the embedded RDF in) two different manuscript templates. Maybe the data wouldn&#039;t be representative enough of the literature as a whole, though; I don&#039;t know.&lt;/p&gt;
&lt;p&gt;Ideally there&#039;d be reference metadata linked to in the header of every paper and you could do everything dynamically - type in a URL and it retrieves the reference list from that paper, then does the same for the URLs of &lt;i&gt;those&lt;/i&gt; papers, etc.&lt;/p&gt;
&lt;p&gt;(plug:) &lt;a href=&#039;http://blogs.nature.com/wp/nascent/2006/06/open_text_mining_interface_ver.html&#039;&gt;OTMI&lt;/a&gt; might eventually enable that.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 10 Nov 2006 06:48:14 -0500</pubDate>
 <dc:creator>stewc</dc:creator>
 <guid isPermaLink="false">comment 3214 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Google Scholar basically</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations#comment-3213</link>
 <description>&lt;p&gt;Google Scholar basically supersedes CiteSeer, and HubMed only has citation data from PubMed Central. The &lt;a href=&quot;http://opcit.eprints.org/&quot;&gt;OpCit&lt;/a&gt; project attempted to make an open citation database from OAI archives and produced &lt;a href=&quot;http://www.citebase.org/&quot;&gt;CiteBase&lt;/a&gt;, for searching citations.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 10 Nov 2006 06:39:57 -0500</pubDate>
 <dc:creator>alf</dc:creator>
 <guid isPermaLink="false">comment 3213 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Citeseer</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations#comment-3211</link>
 <description>&lt;p&gt;Hi,&lt;br /&gt;
do you know of CiteSeer (&lt;a href=&quot;http://citeseer.ist.psu.edu/&quot;&gt;http://citeseer.ist.psu.edu/&lt;/a&gt;)?&lt;br /&gt;
It&#039;s supposed to be a tool for browsing scientific citations.&lt;br /&gt;
Actually, their search engine is not very functional: you should use the Google operator &#039;site:http://citeseer.ist.psu.edu/&#039; to find an article correctly.&lt;br /&gt;
Here is an example: &lt;a href=&quot;http://citeseer.ist.psu.edu/6394.html&quot;&gt;http://citeseer.ist.psu.edu/6394.html&lt;/a&gt; (it gives you citations, graphs, etc.)&lt;/p&gt;
&lt;p&gt;Or maybe you could use &lt;a href=&quot;http://www.hubmed.org&quot;&gt;HubMed&lt;/a&gt;; it has an option to search citations.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 10 Nov 2006 04:58:44 -0500</pubDate>
 <dc:creator>dalloliogm</dc:creator>
 <guid isPermaLink="false">comment 3211 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>citeXtract</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations#comment-3210</link>
 <description>&lt;p&gt;&lt;a href=&quot;http://www.ebi.ac.uk/Information/Staff/person_maint.php?person_id=727&quot;&gt;This guy&lt;/a&gt; is working on a project called citeXtract, which might be pretty much what you are looking for (though I don&#039;t know what the current status of the project is).&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 09 Nov 2006 19:50:12 -0500</pubDate>
 <dc:creator>ejain</dc:creator>
 <guid isPermaLink="false">comment 3210 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>How to compile a database of citations?</title>
 <link>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations</link>
 <description>&lt;p&gt;The discussion on impact factors got me wondering - is there a public, free-access citation database for articles in Medline/PubMed? I know of Scopus, ISI WOS (but they&#039;re not free, and their content is proprietary) and Google Scholar (which only gives &#039;cited by&#039;, when I want &#039;this article cites x and y&#039;).&lt;/p&gt;
&lt;p&gt;How would one build such a database, if it&#039;s not accessible? I know that ISI actually scans articles (not doable by myself) - I don&#039;t know how Scopus got their index, though.&lt;/p&gt;
&lt;p&gt;Such a database would help tremendously with some bibliomics work I&#039;m doing. Is it technically feasible to get references for all Medline articles (or at least those after 1996)? Where would you get the information - scrape/spider &amp;amp; index publishers&#039; websites, if this information is even freely accessible without a subscription, and then match against a local Medline database (which I already have)? If anyone can help, it&#039;d be appreciated :)&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <comments>http://www.nodalpoint.org/2006/11/07/how_to_compile_a_database_of_citations#comments</comments>
 <category domain="http://www.nodalpoint.org/master_list/bioinformatics">Bioinformatics</category>
 <category domain="http://www.nodalpoint.org/science/bioinformatics">Bioinformatics</category>
 <category domain="http://www.nodalpoint.org/test_master_list/information_management/literature">Literature</category>
 <category domain="http://www.nodalpoint.org/computer_science/semantic_web">Semantic web</category>
 <pubDate>Tue, 07 Nov 2006 18:28:56 -0500</pubDate>
 <dc:creator>FiReaNG3L</dc:creator>
 <guid isPermaLink="false">2107 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
