<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - database - Comments</title>
 <link>http://www.nodalpoint.org/nodalpoint_tags/database</link>
 <description>Comments for &quot;database&quot;</description>
 <language>en</language>
<item>
 <title>Yeah, need to update my</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4335</link>
 <description>&lt;p&gt;Yeah, need to update my blog...&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 22 Jan 2008 19:47:42 -0500</pubDate>
 <dc:creator>ejain</dc:creator>
 <guid isPermaLink="false">comment 4335 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>700?</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4334</link>
 <description>&lt;p&gt;At 700 a pop for an Action Figure?&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 22 Jan 2008 13:18:11 -0500</pubDate>
 <dc:creator>nuin</dc:creator>
 <guid isPermaLink="false">comment 4334 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Dude, Where&#039;s My NAR?</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4333</link>
 <description>&lt;p&gt;Hello Maximillian, I wasn&#039;t attacking NAR, was just wondering about what proportion of the data is dead. It is an interesting technical challenge to find this out. It would also be useful to measure the cost of gathering noisy, redundant and poorly understood data in terms of wasted resources (people, time, money, computers, false positives etc). Perhaps somebody has done something like this already? Especially with all the irresponsible sequencing just for the sake of it that goes on...&lt;/p&gt;
&lt;p&gt;As for the physicists, I mentioned them for comparison. Like you, I doubt they will use 100% of their data either, but will probably use much more of it. However, they keep &lt;a href=&quot;http://www.newscientist.com/channel/fundamentals/mg19426103.300-particle-smasher-aims-for-may-2008-switchon.html&quot;&gt;delaying switching their big machine on&lt;/a&gt;, which means they are still waiting for the data. That is, unless when they finally flip the switch on the &lt;a href=&quot;http://en.wikipedia.org/wiki/Large_Hadron_Collider&quot;&gt;LHC&lt;/a&gt;, &lt;a href=&quot;http://popsci.typepad.com/popsci/2007/07/the-large-hadro.html&quot;&gt;we all disappear into a black hole&lt;/a&gt;, tombs and all :)&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 22 Jan 2008 13:09:27 -0500</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 4333 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>It doesn&#039;t matter, they got published</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4332</link>
 <description>&lt;blockquote&gt;&lt;p&gt;Does it matter that large quantities of this data will probably never be used?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Probably not, as their authors are already happy enough to have those published, and then they can &lt;strike&gt;stop maintaining&lt;/strike&gt; move on to another &lt;strike&gt;project&lt;/strike&gt; publication. It is indeed a sad state of affairs.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 22 Jan 2008 06:46:50 -0500</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">comment 4332 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Swiss Prot Databases / Action Figures</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4331</link>
 <description>&lt;p&gt;Hi Eric, yeah even more redundancy there. Talking of SWISS Prot people, I&#039;m just wondering when your &lt;a href=&quot;http://eric.jain.name/2007/12/04/amos-bairoch-action-figure/&quot;&gt;Amos Bairoch action figure&lt;/a&gt; will be available in shops?!?! What are you up to now that you&#039;ve moved on from SWISS Prot to Seattle?&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Mon, 21 Jan 2008 18:12:33 -0500</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 4331 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Why NAR</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4328</link>
 <description>&lt;p&gt;I&#039;ll add the usual defense of the NAR databases: at least they are indexed by PubMed, so biologists who are not used to checking Google for websites about their subject will find them more quickly via NAR. The data is not dead: websites can always be exported. (In my experience, for smaller databases it&#039;s usually quicker to scrape the data from an HTML page with something like HTTP::Recorder than to write to the person responsible for the data asking for an SQL dump.) In addition, people get papers for their databases this way and other people can cite the database properly, so NAR makes the web citable and advances someone&#039;s career a little bit. It already eases the transition from traditional paper-based science to a more web-oriented world. Databases are peer-reviewed, so they don&#039;t contain complete crap; a paper in NAR assures some minimal quality. And a write-only database is better than none at all: at least someone has collected something and you can scrape it from its tomb.&lt;/p&gt;
&lt;p&gt;I would just love to see a minimal requirement for a publication in NAR: They should all offer some simple text-based export, e.g. tab-delimited flatfiles. That could save me a lot of time...  &lt;/p&gt;
&lt;p&gt;Of course, large quantities of these data are never used and never read. But, heck, this is research, right? 90% will not be used in the end. It&#039;s not too different from those 500 alignment algorithms, 200 genomic analyses, hundreds of papers that describe &quot;new&quot; cloning strategies, yet another new gene, etc. I don&#039;t believe that the physicists use 100% of their 1.5 GB/sec either.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Sun, 20 Jan 2008 10:59:14 -0500</pubDate>
 <dc:creator>maximilianh</dc:creator>
 <guid isPermaLink="false">comment 4328 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Also, think of all the</title>
 <link>http://www.nodalpoint.org/2008/01/18/one_thousand_databases_high_and_rising#comment-4325</link>
 <description>&lt;p&gt;Also, think of all the redundant effort that goes into setting up the basic technical infrastructure each database needs (data storage, user interfaces etc)...&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Fri, 18 Jan 2008 20:12:16 -0500</pubDate>
 <dc:creator>ejain</dc:creator>
 <guid isPermaLink="false">comment 4325 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Try X-Path it is pretty</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3668</link>
 <description>&lt;p&gt;Try XPath; it is pretty straightforward. You need to use a library: Perl and Python, I believe, both have XPath libraries.&lt;/p&gt;
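&lt;p&gt;A minimal sketch of what this might look like in Python, assuming the standard-library xml.etree.ElementTree module, which supports only a limited subset of XPath. The element and attribute names are made up for illustration, and the document is built in memory rather than read from a real file:&lt;/p&gt;
&lt;pre&gt;
```python
import xml.etree.ElementTree as ET

# Build a tiny, hypothetical document in memory.
# Element names ("genes", "gene") and the "organism" attribute are invented.
root = ET.Element("genes")
for name, organism in [("TP53", "human"), ("Trp53", "mouse"), ("BRCA1", "human")]:
    gene = ET.SubElement(root, "gene", organism=organism)
    gene.text = name

# findall() accepts a limited XPath subset, including attribute predicates.
human_genes = [g.text for g in root.findall(".//gene[@organism='human']")]
print(human_genes)
```
&lt;/pre&gt;
&lt;p&gt;For full XPath 1.0 support a third-party library would be needed; this sketch sticks to the standard library.&lt;/p&gt;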
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 13 Jun 2007 14:52:31 -0400</pubDate>
 <dc:creator>paladinjack</dc:creator>
 <guid isPermaLink="false">comment 3668 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>XML is the hit!</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3667</link>
 <description>&lt;p&gt;However, when you have a huge XML file, XPath may not be practical, as it needs to parse the entire document into a tree structure for querying, which can be slow. For a reasonably medium-sized dataset, XML is a good substitute for a database.&lt;/p&gt;
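&lt;p&gt;One possible workaround for large files is incremental parsing. The sketch below assumes Python&#039;s standard-library xml.etree.ElementTree.iterparse; the record names are hypothetical and a small in-memory stream stands in for a large file:&lt;/p&gt;
&lt;pre&gt;
```python
import io
import xml.etree.ElementTree as ET

# Create a small stand-in for a large file (names are hypothetical).
root = ET.Element("records")
for i in range(5):
    ET.SubElement(root, "record", id=str(i)).text = "value %d" % i
stream = io.BytesIO(ET.tostring(root))

# iterparse yields each element once its end tag has been read, so records
# can be handled one at a time instead of querying a fully built tree.
count = 0
for event, elem in ET.iterparse(stream, events=("end",)):
    if elem.tag == "record":
        count += 1
        elem.clear()  # drop the element's text and attributes once handled
print(count)
```
&lt;/pre&gt;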
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 13 Jun 2007 14:51:20 -0400</pubDate>
 <dc:creator>paladinjack</dc:creator>
 <guid isPermaLink="false">comment 3667 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>For splicing, python dictionaries could be better than databases</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3665</link>
 <description>&lt;p&gt;Read this document:&lt;br /&gt;
- &lt;a href=&quot;http://bioinfo.mbi.ucla.edu/pygr_0_5_0/seq-align.html#SECTION000110000000000000000&quot; title=&quot;http://bioinfo.mbi.ucla.edu/pygr_0_5_0/seq-align.html#SECTION000110000000000000000&quot;&gt;http://bioinfo.mbi.ucla.edu/pygr_0_5_0/seq-align.html#SECTION00011000000...&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;They say that databases can be complicated to use if you work with splicing or alternative splicing, or at least that you should look for some libraries.&lt;/p&gt;
&lt;p&gt;Bye :)&lt;br /&gt;
--&lt;br /&gt;
&lt;a href=&quot;http://genome.imim.es/~giovanni&quot; title=&quot;http://genome.imim.es/~giovanni&quot;&gt;http://genome.imim.es/~giovanni&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 12 Jun 2007 12:00:56 -0400</pubDate>
 <dc:creator>dalloliogm</dc:creator>
 <guid isPermaLink="false">comment 3665 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Databases or XML? Invest in your future...</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3591</link>
 <description>&lt;p&gt;There is quite a steep learning curve with both relational databases and XML. However, whichever you finally choose, they are both worth learning and will pay dividends in the long run.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 10 May 2007 06:50:01 -0400</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3591 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>I also thought about XML,</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3590</link>
 <description>&lt;p&gt;I also thought about XML, but I found it quite complex, mostly because of the learning curve involved (which looked steep at the time I investigated). I tried to write something simple to parse Entrez Gene records (as a means to get a common ground for my datasets) in Python back then but I had to admit defeat due to lack of knowledge.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 09 May 2007 16:26:30 -0400</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">comment 3590 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Thanks for the suggestions.</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3589</link>
 <description>&lt;p&gt;Thanks for the suggestions. I thought about a database exactly because I had to infer relationships between my dataset and some others I obtained from the literature, and some of those relationships weren&#039;t exactly straightforward. I am into stage two at the moment, using SQLite as I don&#039;t want to pollute our MySQL database (we have an internal server with many dumps from the UCSC Genome Browser and other things). &lt;/p&gt;
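&lt;p&gt;For illustration only, a minimal sketch of this kind of cross-dataset lookup, assuming Python&#039;s standard sqlite3 module and an in-memory database. The table names, columns, and values are hypothetical, not the actual thesis data:&lt;/p&gt;
&lt;pre&gt;
```python
import sqlite3

# An in-memory SQLite database keeps the experiment away from any shared server.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: one for local results, one for values taken from the literature.
conn.execute("CREATE TABLE my_genes (symbol TEXT PRIMARY KEY, score REAL)")
conn.execute("CREATE TABLE published (symbol TEXT, source TEXT)")
conn.executemany("INSERT INTO my_genes VALUES (?, ?)",
                 [("TP53", 0.9), ("BRCA1", 0.4)])
conn.executemany("INSERT INTO published VALUES (?, ?)",
                 [("TP53", "paper A"), ("MYC", "paper B")])

# A join expresses the relationship between the two datasets declaratively.
rows = conn.execute(
    "SELECT m.symbol, m.score, p.source FROM my_genes m "
    "JOIN published p ON p.symbol = m.symbol ORDER BY m.symbol"
).fetchall()
print(rows)
conn.close()
```
&lt;/pre&gt;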
&lt;p&gt;I have had to put things on hold for now (the deadline for the summary of the thesis is next month), but I&#039;ll resume soon. I&#039;ll take a look at UML for sure and decide if it&#039;s worth implementing something more complex.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 09 May 2007 16:20:51 -0400</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">comment 3589 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Database or flat file? Could try XML</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3588</link>
 <description>&lt;p&gt;It&#039;s horses for courses, as people have said. If it&#039;s a one-off hack that nobody (including you) will ever want to re-run or re-use, flat files are the thing. However, this is rarely the case. If you don&#039;t want to go through all the hassle of learning about relational databases, you could try XML. It doesn&#039;t force you to come up with complex schemas from the beginning, but gives you powerful mechanisms for querying and manipulating your data (XPath, XQuery and XSLT). You can still have simple flat files of course, but you won&#039;t have to write your own parsers, since there are tons of XML parsers out there to choose from. Just my $0.02 ... hope it helps.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 09 May 2007 13:26:37 -0400</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3588 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>It does indeed depend</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3587</link>
 <description>&lt;p&gt;Greg has it about covered.  It&#039;s appropriate to use a database when it&#039;s inappropriate to use a flat file.  And &lt;i&gt;vice versa&lt;/i&gt; :)&lt;/p&gt;
&lt;p&gt;If you don&#039;t need (a) structure and (b) queries for the data in your current project, then stick with a flat file for the sake of speed. However, it&#039;s well worth learning databases at some stage in your career. The best way to learn new stuff is to apply it to one of your projects, rather than just saying &quot;well, I&#039;ll teach myself one day when I have a moment&quot;. You&#039;ll never have a moment, and trying to learn something in an abstract sense is much harder than in a practical sense, where you can see the relevance to your data.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 09 May 2007 06:37:49 -0400</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 3587 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
