<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - Database or flat text file? - Comments</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file</link>
 <description>Comments for &quot;Database or flat text file?&quot;</description>
 <language>en</language>
<item>
 <title>Try X-Path it is pretty</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3668</link>
 <description>&lt;p&gt;Try XPath; it is pretty straightforward. You need to use a library, but I believe Perl and Python both have XPath libraries.&lt;/p&gt;
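&lt;p&gt;Here is a minimal sketch of an XPath-style query in Python. It assumes the standard-library xml.etree.ElementTree module, which supports only a limited subset of XPath. Full XPath 1.0 needs a third-party library such as lxml. The element and attribute names (dataset, gene, chrom, ratio) are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import xml.etree.ElementTree as ET

# Build a tiny, hypothetical expression dataset in code
# (element and attribute names are made up for illustration).
root = ET.Element("dataset")
for gene_id, chrom, ratio in [("GENE1", "chr1", "2.3"),
                              ("GENE2", "chr2", "0.4"),
                              ("GENE3", "chr1", "1.1")]:
    gene = ET.SubElement(root, "gene", id=gene_id, chrom=chrom)
    ET.SubElement(gene, "ratio").text = ratio

# ElementTree understands a subset of XPath: select every gene element
# anywhere below the root whose chrom attribute equals "chr1".
chr1_ids = [g.get("id") for g in root.findall(".//gene[@chrom='chr1']")]

# Predicates can be combined with child steps to pull out a single value.
gene2_ratio = root.findtext("gene[@id='GENE2']/ratio")

print(chr1_ids, gene2_ratio)
```
&lt;/code&gt;&lt;/pre&gt;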
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 13 Jun 2007 14:52:31 -0400</pubDate>
 <dc:creator>paladinjack</dc:creator>
 <guid isPermaLink="false">comment 3668 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>XML is the hit!</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3667</link>
 <description>&lt;p&gt;However, when you have a huge XML file, XPath may not work well. It needs to parse the entire document into a tree structure before querying, and that can be slow. For a moderately sized dataset, XML is a good substitute for a database.&lt;/p&gt;
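&lt;p&gt;Here is a minimal sketch of a streaming alternative. It assumes Python&#039;s standard-library ElementTree.iterparse, which hands you elements as they are read. You can then discard each record once it is processed instead of building the whole tree first. The gene/ratio structure is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import io
import xml.etree.ElementTree as ET


def sum_ratios(source):
    """Stream an XML file element by element instead of building the full tree."""
    total = 0.0
    count = 0
    # iterparse yields (event, element) pairs; an "end" event fires once an
    # element and all of its children have been read.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "gene":
            total += float(elem.findtext("ratio"))
            count += 1
            # Drop the finished element's children and text to keep memory down.
            elem.clear()
    return count, total


# Hypothetical test data, written to an in-memory buffer.
root = ET.Element("dataset")
for gene_id, ratio in [("GENE1", "2.0"), ("GENE2", "0.5")]:
    gene = ET.SubElement(root, "gene", id=gene_id)
    ET.SubElement(gene, "ratio").text = ratio
buf = io.BytesIO()
ET.ElementTree(root).write(buf)
buf.seek(0)

result = sum_ratios(buf)
print(result)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that clear() only empties each finished element. The root still holds the empty shells, so for very large files you may also want to remove processed children from the root.&lt;/p&gt;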
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 13 Jun 2007 14:51:20 -0400</pubDate>
 <dc:creator>paladinjack</dc:creator>
 <guid isPermaLink="false">comment 3667 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>For splicing, python dictionaries could be better than databases</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3665</link>
 <description>&lt;p&gt;Read this document:&lt;br /&gt;
- &lt;a href=&quot;http://bioinfo.mbi.ucla.edu/pygr_0_5_0/seq-align.html#SECTION000110000000000000000&quot; title=&quot;http://bioinfo.mbi.ucla.edu/pygr_0_5_0/seq-align.html#SECTION000110000000000000000&quot;&gt;http://bioinfo.mbi.ucla.edu/pygr_0_5_0/seq-align.html#SECTION00011000000...&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;They say that databases can be complicated to use if you work with splicing or alternative splicing, or at least that you should look for suitable libraries.&lt;/p&gt;
&lt;p&gt;Bye :)&lt;br /&gt;
--&lt;br /&gt;
&lt;a href=&quot;http://genome.imim.es/~giovanni&quot; title=&quot;http://genome.imim.es/~giovanni&quot;&gt;http://genome.imim.es/~giovanni&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 12 Jun 2007 12:00:56 -0400</pubDate>
 <dc:creator>dalloliogm</dc:creator>
 <guid isPermaLink="false">comment 3665 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Databases or XML? Invest in your future...</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3591</link>
 <description>&lt;p&gt;There is quite a steep learning curve with both relational databases and XML. However, whichever you finally choose, they are both worth learning and will pay dividends in the long run.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 10 May 2007 06:50:01 -0400</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3591 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>I also thought about XML,</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3590</link>
 <description>&lt;p&gt;I also thought about XML, but I found it quite complex, mostly because of the learning curve involved (which looked steep when I investigated it). Back then I tried to write something simple in Python to parse Entrez Gene records (as a way to get common ground for my datasets), but I had to admit defeat due to lack of knowledge.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 09 May 2007 16:26:30 -0400</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">comment 3590 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Thanks for the suggestions.</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3589</link>
 <description>&lt;p&gt;Thanks for the suggestions. I thought about a database precisely because I had to infer relationships between my dataset and some others I obtained from the literature, and some of those relationships weren&#039;t exactly straightforward. I am at stage two at the moment, using SQLite, as I don&#039;t want to pollute our MySQL database (we have an internal server with many dumps from the UCSC Genome Browser and other things). &lt;/p&gt;
&lt;p&gt;I have had to put things on hold for now (the deadline for the thesis summary is next month), but I&#039;ll resume soon. I&#039;ll certainly take a look at UML and decide whether it&#039;s worth implementing something more complex.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 09 May 2007 16:20:51 -0400</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">comment 3589 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Database or flat file? Could try XML</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3588</link>
 <description>&lt;p&gt;It&#039;s horses for courses, as people have said. If it&#039;s a one-off hack that nobody (including you) will ever want to re-run or re-use, flat files are the thing. However, this is rarely the case. If you don&#039;t want to go through all the hassle of learning about relational databases, you could try XML. It doesn&#039;t force you to come up with complex schemas from the beginning, but it gives you powerful mechanisms for querying and manipulating your data (XPath, XQuery and XSLT). You can still keep simple flat files, of course, but you won&#039;t have to write your own parsers, since there are tons of XML parsers out there to choose from. Just my $0.02 ... hope it helps.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 09 May 2007 13:26:37 -0400</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3588 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>It does indeed depend</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3587</link>
 <description>&lt;p&gt;Greg has it about covered.  It&#039;s appropriate to use a database when it&#039;s inappropriate to use a flat file.  And &lt;i&gt;vice versa&lt;/i&gt; :)&lt;/p&gt;
&lt;p&gt;If you don&#039;t need (a) structure and (b) queries for the data in your current project, then stick with flat files for the sake of speed. However, it&#039;s well worth learning databases at some stage in your career. The best way to learn new stuff is to apply it to one of your projects, rather than just saying &quot;well, I&#039;ll teach myself one day when I have a moment&quot;. You&#039;ll never have a moment, and trying to learn something in an abstract sense is much harder than in a practical sense, where you can see the relevance to your data.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 09 May 2007 06:37:49 -0400</pubDate>
 <dc:creator>Neil</dc:creator>
 <guid isPermaLink="false">comment 3587 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>There are three stages to</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3585</link>
 <description>&lt;p&gt;There are three stages to this. First you start out with a tab-delimited text file. This is great for parsing and for writing simple queries with scripting languages. The problem is that at some point your intuition tells you that you should be doing more to structure your data, find relationships in it, etc. This then leads you to the never-ending quest for the right database and the right schema. It is easy to dismiss relational databases at this point, when schema design and database choice sidetrack you from your real goal. &lt;/p&gt;
&lt;p&gt;My suggestion is to move on to stage two: use a relational database, but only for the added power that SQL gives you when querying a single table. It doesn&#039;t matter which one you use; MySQL is quite straightforward, and you don&#039;t need a complex schema. Just dump your existing flat file into a simple table that models the existing data types. Now you can use your scripting language&#039;s MySQL integration to pull out rows and query the table using conditions, limits, etc., without having to write ad hoc query code in your analysis scripts. &lt;/p&gt;
&lt;p&gt;If you think you can avoid the trap of searching for the right schema/database, then moving on to stage three and modeling the relationships in your data might be worth it. The setup time will cost you, but if you do it right, you will have the full power of SQL for your data mining. Some kind of visual modeling language (e.g. UML) can be helpful, but relational database modeling can also become an &#039;end in itself&#039;. Avoid that at all costs, and focus on getting the best scientific value from your data.&lt;/p&gt;
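&lt;p&gt;Here is a minimal sketch of stage two. It swaps MySQL for Python&#039;s built-in sqlite3 module (SQLite) so it runs without a server. The file contents and column names are hypothetical. The tab-delimited file is loaded as-is into a single table, and the filtering is then done in SQL rather than in the analysis script.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited flat file (stage one), held in memory here.
flat_file = io.StringIO(
    "probe\tgene\tlog_ratio\n"
    "p1\tGENE1\t2.3\n"
    "p2\tGENE2\t-0.4\n"
    "p3\tGENE3\t1.1\n"
)

# Stage two: dump the rows into one table whose columns mirror the file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (probe TEXT, gene TEXT, log_ratio REAL)")
reader = csv.DictReader(flat_file, delimiter="\t")
# Named placeholders are filled from each row dict; the REAL column
# affinity converts the numeric text from the file into floats.
conn.executemany(
    "INSERT INTO expression VALUES (:probe, :gene, :log_ratio)",
    reader,
)

# Conditions, ordering and limits now come from SQL, not ad hoc script code.
rows = conn.execute(
    "SELECT gene, log_ratio FROM expression "
    "WHERE log_ratio > ? ORDER BY log_ratio DESC LIMIT 2",
    (1.0,),
).fetchall()
print(rows)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The ? placeholder lets the sqlite3 module pass the threshold as a parameter instead of formatting it into the SQL string.&lt;/p&gt;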
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 08 May 2007 14:16:45 -0400</pubDate>
 <dc:creator>Greg</dc:creator>
 <guid isPermaLink="false">comment 3585 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>database or flat file - it depends</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comment-3583</link>
 <description>&lt;p&gt;It depends on your skills with computers and databases.&lt;/p&gt;
&lt;p&gt;At the moment I&#039;m using flat files, for three reasons:&lt;br /&gt;
- I don&#039;t have too much data, and most of my files are FASTA sequences or GFFs;&lt;br /&gt;
- I don&#039;t know databases very well, and I think that sometimes it&#039;s easier to use grep/Perl scripts from the Unix command line than to query the database every time;&lt;br /&gt;
- I&#039;m too lazy to ask the system administrators to create a database for me, and I don&#039;t want to bring my laptop to work every day :( &lt;/p&gt;
&lt;p&gt;Anyway, I believe that if you know how to handle data in a database, you should do it; see for example this post:&lt;br /&gt;
- &lt;a href=&quot;http://www.bioinformaticszen.com/2007/02/bioinformatics-use-a-database-for-data/&quot; title=&quot;http://www.bioinformaticszen.com/2007/02/bioinformatics-use-a-database-for-data/&quot;&gt;http://www.bioinformaticszen.com/2007/02/bioinformatics-use-a-database-f...&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;---&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://dalloliogm.wordpress.com&quot; title=&quot;http://dalloliogm.wordpress.com&quot;&gt;http://dalloliogm.wordpress.com&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 08 May 2007 09:33:07 -0400</pubDate>
 <dc:creator>dalloliogm</dc:creator>
 <guid isPermaLink="false">comment 3583 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Database or flat text file?</title>
 <link>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file</link>
 <description>&lt;p&gt;As the deadline for my Ph.D. thesis looms over me, I&#039;ve recently been working with a fair amount of data of the same type (basically a variant of microarray gene expression data), but from different sources and/or software.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;&lt;p&gt;&lt;a href=&quot;http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file&quot;&gt;read more&lt;/a&gt;&lt;/p&gt;</description>
 <comments>http://www.nodalpoint.org/2007/05/02/database_or_flat_text_file#comments</comments>
 <category domain="http://www.nodalpoint.org/forums/discussion/bioinformatics_0">Bioinformatics</category>
 <category domain="http://www.nodalpoint.org/nodalpoint_tags/csv">csv</category>
 <category domain="http://www.nodalpoint.org/nodalpoint_tags/database">database</category>
 <category domain="http://www.nodalpoint.org/nodalpoint_tags/file">file</category>
 <category domain="http://www.nodalpoint.org/nodalpoint_tags/integration">integration</category>
 <pubDate>Wed, 02 May 2007 14:11:12 -0400</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">2238 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
