<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - NAR - Comments</title>
 <link>http://www.nodalpoint.org/nodalpoint_tags/nar</link>
 <description>Comments for &quot;NAR&quot;</description>
 <language>en</language>
<item>
 <title>Yeah, need to update my</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4335</link>
 <description>&lt;p&gt;Yeah, need to update my blog...&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 22 Jan 2008 19:47:42 -0500</pubDate>
 <dc:creator>ejain</dc:creator>
 <guid isPermaLink="false">comment 4335 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>700?</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4334</link>
 <description>&lt;p&gt;At 700 a pop for an Action Figure?&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 22 Jan 2008 13:18:11 -0500</pubDate>
 <dc:creator>nuin</dc:creator>
 <guid isPermaLink="false">comment 4334 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Dude, Where&#039;s My NAR?</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4333</link>
 <description>&lt;p&gt;Hello Maximilian, I wasn&#039;t attacking NAR, I was just wondering what proportion of the data is dead. It is an interesting technical challenge to find this out. It would also be useful to measure the cost of gathering noisy, redundant and poorly understood data in terms of wasted resources (people, time, money, computers, false positives etc). Perhaps somebody has done something like this already? Especially with all the irresponsible sequencing just for the sake of it that goes on...&lt;/p&gt;
&lt;p&gt;As for the physicists, I mentioned them for comparison. Like you, I doubt they will use 100% of their data either, but will probably use much more of it. However, they keep &lt;a href=&quot;http://www.newscientist.com/channel/fundamentals/mg19426103.300-particle-smasher-aims-for-may-2008-switchon.html&quot;&gt;delaying switching their big machine on&lt;/a&gt;, which means they are still waiting for the data. That is, unless when they finally flip the switch on the &lt;a href=&quot;http://en.wikipedia.org/wiki/Large_Hadron_Collider&quot;&gt;LHC&lt;/a&gt;, &lt;a href=&quot;http://popsci.typepad.com/popsci/2007/07/the-large-hadro.html&quot;&gt;we all disappear into a black hole&lt;/a&gt;, tombs and all :)&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 22 Jan 2008 13:09:27 -0500</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 4333 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>It doesn&#039;t matter, they got published</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4332</link>
 <description>&lt;blockquote&gt;&lt;p&gt;Does it matter that large quantities of this data will probably never be used?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Probably not, as their authors are already happy enough to have those published, and then they can &lt;strike&gt;stop maintaining&lt;/strike&gt; move on to another &lt;strike&gt;project&lt;/strike&gt; publication. It is indeed a sad state of affairs.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 22 Jan 2008 06:46:50 -0500</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">comment 4332 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Swiss Prot Databases / Action Figures</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4331</link>
 <description>&lt;p&gt;Hi Eric, yeah even more redundancy there. Talking of SWISS Prot people, I&#039;m just wondering when your &lt;a href=&quot;http://eric.jain.name/2007/12/04/amos-bairoch-action-figure/&quot;&gt;Amos Bairoch action figure&lt;/a&gt; will be available in shops?!?! What are you up to now that you&#039;ve moved on from SWISS Prot to Seattle?&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 21 Jan 2008 18:12:33 -0500</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 4331 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Why NAR</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4328</link>
 <description>&lt;p&gt;I add the usual defense of the NAR databases: At least they are indexed by PubMed, so biologists who are not used to checking Google for websites about their subject will find them quicker via NAR. The data is not dead, websites can always be exported. (In my experience, for smaller databases it&#039;s usually quicker to scrape the data from an HTML page with something like HTTP::Recorder than to write to the person responsible for the data asking for an SQL dump.) In addition, people get papers for their databases like this and other people can cite the database properly, so NAR makes the web citable and advances someone&#039;s career a little bit. It already eases the transition from a traditional paper-based science to a more web-orientated world. Databases are peer-reviewed, so they don&#039;t contain complete crap; a paper in NAR assures some minimal quality. And a write-only database is better than none at all: at least someone has collected something and you can scrape it from its tomb.&lt;/p&gt;
&lt;p&gt;I would just love to see a minimal requirement for a publication in NAR: They should all offer some simple text-based export, e.g. tab-delimited flatfiles. That could save me a lot of time...  &lt;/p&gt;
&lt;p&gt;Of course, large quantities of these data are never used and never read. But, heck, this is research, right? 90% will not be used in the end. It&#039;s not too different from those 500 alignment algorithms, 200 genomic analyses, hundreds of papers that describe &quot;new&quot; cloning strategies, yet another new gene, etc. I don&#039;t believe that the physicists use 100% of their 1.5 GB/sec either.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Sun, 20 Jan 2008 10:59:14 -0500</pubDate>
 <dc:creator>maximilianh</dc:creator>
 <guid isPermaLink="false">comment 4328 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Also, think of all the</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4325</link>
 <description>&lt;p&gt;Also, think of all the redundant effort that goes into setting up the basic technical infrastructure each database needs (data storage, user interfaces etc)...&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 18 Jan 2008 20:12:16 -0500</pubDate>
 <dc:creator>ejain</dc:creator>
 <guid isPermaLink="false">comment 4325 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Databases in Peril</title>
 <link>http://www.nodalpoint.org/2007/01/05/nar_database_issue_2007#comment-3277</link>
 <description>&lt;p&gt;Thanks for all your comments, here are some thoughts...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Quantity &lt;i&gt;is&lt;/i&gt; a significant problem&lt;/b&gt;: we&#039;re not just talking about individual databases getting bigger and bigger like GenBank, we&#039;re talking about more different types of databases. Potentially we want to allow the combination of data from &lt;i&gt;any&lt;/i&gt; of these different databases and others that will appear in the future. Obviously, any given researcher probably isn&#039;t going to want to search all 900+ databases, but it would be beneficial to the wider scientific community if all these databases can easily interoperate. The more databases there are, the more challenging easy interoperation becomes, because there is more heterogeneity, more APIs, more schemas etc.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Peer-reviewed publication can help assess quality&lt;/b&gt;: this is what peer-review is for. The editors of this issue claim to look for good quality data as well as a good quality interface. As pointed out in the comments above, &amp;#x201C;anyone with a modicum of knowledge can put a database or web app online&amp;#x201D;. By itself, this is not enough for publication. It is no good having great data with an awkward non-standard interface and vice versa. The NAR database issue may well be an &amp;#x201C;easy&amp;#x201D; publication, but that doesn&#039;t make it any less important. The &lt;a href=&quot;http://dx.doi.org/10.1038/4351010a&quot; rev=&quot;review&quot;&gt;Databases in Peril&lt;/a&gt; article wouldn&#039;t have been possible if NAR hadn&#039;t been faithfully recording all this information in the first place. I suspect publication in the NAR database issue is harder than some suggested: it&#039;s not just a case of shoving a database on the web then writing a paper about it, you have to convince the reviewers the database is worthy: novel, useful and usable.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Churn is inevitable but the overall trend is still upward&lt;/b&gt;: databases (and tools) are not immortal, some are bound to wither and die eventually. Since last year 11 databases have gone this way, and the &lt;a href=&quot;http://dx.doi.org/10.1093/nar/gkl1008&quot;&gt;article&lt;/a&gt; discusses why. The general trend is still upward and will probably keep going. In the long run, the longevity of a database can be an indicator of its quality because somebody cares and is skilled enough to maintain and fund it for a long period of time. As for the databases that are &amp;#x201C;struggling financially&amp;#x201D; (according to Nature), how is this news? Struggling could mean anything. Haven&#039;t you always had to fight for sustained funding of any scientific project?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Standards are boring (but important)&lt;/b&gt;: it can be difficult to get standards work funded, done and published, what John Quackenbush calls &lt;a href=&quot;http://dx.doi.org/10.1038/msb4100052&quot; title=&quot;Standardising the standards Molecular Systems Biology 2, 1 (2006-02-21)&quot; rev=&quot;review&quot;&gt;Blue-collar science&lt;/a&gt;. It is unglamorous but essential work, and nobody is going to win a Nobel Prize for creating a standard schema, ontology or whatever. What is the research contribution of creating a standard? Novelty? Discovery of new knowledge? This is partly why we have chaos: creating standards in itself is often not considered &amp;#x201C;science&amp;#x201D; or &amp;#x201C;research&amp;#x201D;. But without them, science is much harder.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Integrated Search is hard&lt;/b&gt;: we would all like integrated search &amp;#x201C;from one box&amp;#x201D;, but the way to do this is still very much an open research question, not just in bioinformatics, but in computer science too. What is more, this is not merely an &amp;#x201C;IT problem&amp;#x201D;, there are novel and &lt;a href=&quot;http://dx.doi.org/10.1038/nrd1608&quot; rev=&quot;review&quot; title=&quot;Nature Reviews Drug Discovery 4, 45-58 (2005)&quot;&gt;serious scientific challenges&lt;/a&gt; in achieving this. If it were easy and straightforward to provide integrated search to all these databases, don&#039;t you think somebody would have done it by now? Until that time, we have &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi&quot;&gt;Entrez Global Query&lt;/a&gt;...&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 12 Jan 2007 10:48:39 -0500</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3277 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Quantity not the problem</title>
 <link>http://www.nodalpoint.org/2007/01/05/nar_database_issue_2007#comment-3267</link>
 <description>&lt;p&gt;Not drowning.  More data are good - the more, the better.  Only a few of those databases are relevant to an individual researcher.&lt;/p&gt;
&lt;p&gt;As others have already commented, the problems are (1) the quality of the databases, (2) their diverse, &quot;higgledy-piggledy&quot; nature (no standards, APIs, integration) and (3) their longevity, or lack thereof.  Frankly, anyone with a modicum of SQL and CGI knowledge can put a database or web app online.  So they do.  You can&#039;t legislate against bad web resources.&lt;/p&gt;
&lt;p&gt;I would question whether these annual issues still serve any useful purpose, other than to make the journal appear authoritative or provide an avenue for an easy publication.  If I&#039;m looking for an online resource I start with Google, not an outdated journal article.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Sun, 07 Jan 2007 08:10:00 -0500</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 3267 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Need standards</title>
 <link>http://www.nodalpoint.org/2007/01/05/nar_database_issue_2007#comment-3265</link>
 <description>&lt;p&gt;Great article Duncan, thanks for bringing this up.&lt;br /&gt;
I have seen a lot of people accessing low quality data from many well-known databases and high quality data in not-so-well-known ones.&lt;br /&gt;
For example, GBK files mostly do not talk about quality, while their ASN.1 counterparts might offer it [ &lt;a href=&quot;http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF207953&quot; title=&quot;http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF207953&quot;&gt;http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry.pl?AF207953&lt;/a&gt; ].&lt;br /&gt;
Regarding algorithms to analyze these data, any comment would come across as trolling.&lt;br /&gt;
To minimize this, I feel something like a bioinformatics-oriented Digg would be great.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Sun, 07 Jan 2007 05:02:36 -0500</pubDate>
 <dc:creator>Animesh</dc:creator>
 <guid isPermaLink="false">comment 3265 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Vaporware</title>
 <link>http://www.nodalpoint.org/2007/01/05/nar_database_issue_2007#comment-3264</link>
 <description>&lt;p&gt;Each time a new annual issue of NAR is published I remember this paper from Nature.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.nature.com/nature/journal/v435/n7045/full/4351010a.html&quot;&gt;http://www.nature.com/nature/journal/v435/n7045/full/4351010a.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Databases in peril&lt;br /&gt;
Zeeya Merali and Jim Giles&lt;br /&gt;
Nature 435, 1010-1011 (23 June 2005)&lt;br /&gt;
doi: 10.1038/4351010a&lt;/p&gt;
&lt;p&gt;&lt;cite&gt;Nature contacted 89 databases listed in the Molecular Biology Database Collection (Nucl. Acids Res. 28, 1−7; 2000) to see how many still have funding five years on. Of these, 51 reported that they are struggling financially. Seven of these have closed; the rest are being updated sporadically in their owners&#039; spare time.&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://www.nature.com/nature/journal/v435/n7045/images/4351010a-i3.0.jpg&quot;/&gt;&lt;/p&gt;
&lt;p&gt;Pierre&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 05 Jan 2007 16:23:41 -0500</pubDate>
 <dc:creator>lindenb</dc:creator>
 <guid isPermaLink="false">comment 3264 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Drowning!!!</title>
 <link>http://www.nodalpoint.org/2007/01/05/nar_database_issue_2007#comment-3263</link>
 <description>&lt;p&gt;I think that we&#039;re going to drown at this rate.  Not because there are too many databases.  Those can, and perhaps should, be spread far and wide.  My concerns:&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Quality&lt;/b&gt;.  How do we know whether the results in our hands are any good?  Can we glean meaningful knowledge from them?&lt;br /&gt;
&lt;b&gt;Integrated search&lt;/b&gt;.  I don&#039;t want to go to every database and search there.  I want to search from one box.&lt;br /&gt;
&lt;b&gt;Standards&lt;/b&gt;.  I want the data to follow certain minimum standards.&lt;/p&gt;
&lt;p&gt;What was that about airlines :-)?&lt;/p&gt;
&lt;p&gt;My Blog: &lt;a href=&quot;http://mndoci.com&quot; title=&quot;http://mndoci.com&quot;&gt;http://mndoci.com&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 05 Jan 2007 16:12:25 -0500</pubDate>
 <dc:creator>mndoci</dc:creator>
 <guid isPermaLink="false">comment 3263 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
