<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.nodalpoint.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>nodalpoint.org - Resources for text mining? - Comments</title>
 <link>http://www.nodalpoint.org/2006/11/16/resources_for_text_mining</link>
 <description>Comments for &quot;Resources for text mining?&quot;</description>
 <language>en</language>
<item>
 <title>Firstly thanks for the</title>
 <link>http://www.nodalpoint.org/2006/11/16/resources_for_text_mining#comment-3241</link>
 <description>&lt;p&gt;Firstly, thanks for the informative comment. I&#039;ll be saving those books to my Amazon wish list.&lt;/p&gt;
&lt;p&gt;Second, in case anyone from industry/business is reading this, Bob&#039;s comment above is an excellent way to get people in a community forum like nodalpoint to notice your products without being annoying about it.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Wed, 06 Dec 2006 02:38:15 -0500</pubDate>
 <dc:creator>Greg</dc:creator>
 <guid isPermaLink="false">comment 3241 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Text Data Mining: Books and  Open Source Software</title>
 <link>http://www.nodalpoint.org/2006/11/16/resources_for_text_mining#comment-3240</link>
 <description>&lt;p&gt;The best book to start with for the big picture is Witten and Frank&#039;s &quot;Data Mining, 2nd Edition&quot;. The best book for the gory details of the math is Hastie et al.&#039;s &quot;The Elements of Statistical Learning&quot;. Unfortunately, neither of these is specifically about text data mining. For text, the best reference is still Manning and Schuetze&#039;s &quot;Foundations of Statistical Natural Language Processing&quot;, but it doesn&#039;t give you nearly the same kind of big-picture view of data mining, nor does it cover as many classification and clustering techniques.&lt;/p&gt;
&lt;p&gt;You can also look at some of the open source software packages. For instance, we offer &lt;a href=&quot;http://www.alias-i.com/lingpipe&quot;&gt;LingPipe&lt;/a&gt;, which you can download with Java source and documentation. There are tutorials for downloading MEDLINE, parsing its XML format, extracting named entities (e.g. proteins and cell lines), and putting the results in a MySQL database. There are also tutorials on indexing MEDLINE for search, part-of-speech tagging for biology texts, sentence extraction for biology texts, etc.&lt;/p&gt;
&lt;p&gt;There&#039;s a similar package released on SourceForge called OpenNLP, but it doesn&#039;t contain biology-specific modules as far as I know. It&#039;s a little more researchy and less industrial than ours. Then there&#039;s Steve Bird et al.&#039;s NLTK package, mentioned previously, which is aimed at learning and is written in Python. There are also some more sophisticated statistical packages, such as Andrew McCallum&#039;s Mallet (UMass) and William Cohen&#039;s MinorThird (CMU), both in Java. Cohen, in particular, has done a lot with MinorThird in biomedical text data mining.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.colloquial.com/carp&quot;&gt;Bob Carpenter&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.alias-i.com/lingpipe&quot;&gt;Alias-i, Inc.&lt;/a&gt;&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Tue, 05 Dec 2006 13:00:33 -0500</pubDate>
 <dc:creator>Bob Carpenter</dc:creator>
 <guid isPermaLink="false">comment 3240 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Thanks for both comments.</title>
 <link>http://www.nodalpoint.org/2006/11/16/resources_for_text_mining#comment-3220</link>
 <description>&lt;p&gt;Thanks for both comments. The book looks interesting; I&#039;ll give it a go.&lt;br /&gt;
As for Python, it happens to be exactly the programming language I&#039;m learning, so that library will come in useful.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 16 Nov 2006 09:03:26 -0500</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">comment 3220 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Text Mining for Biology and Biomedicine</title>
 <link>http://www.nodalpoint.org/2006/11/16/resources_for_text_mining#comment-3219</link>
 <description>&lt;p&gt;You might find &lt;a href=&quot;http://www.amazon.com/exec/obidos/ASIN/158053984X&quot;&gt;Text Mining for Biology and Biomedicine&lt;/a&gt; a useful starting point if you are new to this field.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 16 Nov 2006 08:09:22 -0500</pubDate>
 <dc:creator>Duncan</dc:creator>
 <guid isPermaLink="false">comment 3219 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>It all depends on what</title>
 <link>http://www.nodalpoint.org/2006/11/16/resources_for_text_mining#comment-3218</link>
 <description>&lt;p&gt;It all depends on what language you want to use. If you&#039;re willing to give Python a try, you might want to investigate the &lt;a href=&quot;http://nltk.sourceforge.net/&quot;&gt;Natural Language Toolkit&lt;/a&gt;. It was designed as a base library for teaching NLP in undergraduate computer science classes. It is a general library covering most of the popular methods of text mining, and is not specific to biology. The documentation is also good, with a lot of background material on NLP in the &lt;a href=&quot;http://nltk.sourceforge.net/lite/doc/en/&quot;&gt;tutorial&lt;/a&gt;.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <pubDate>Thu, 16 Nov 2006 06:45:08 -0500</pubDate>
 <dc:creator>Greg</dc:creator>
 <guid isPermaLink="false">comment 3218 at http://www.nodalpoint.org</guid>
</item>
<item>
 <title>Resources for text mining?</title>
 <link>http://www.nodalpoint.org/2006/11/16/resources_for_text_mining</link>
 <description>&lt;p&gt;Hello.&lt;br /&gt;
I recently began studying the basics of text mining (specifically literature mining) for a small project in our laboratory. I would like to know if there are any good introductory resources (either online or in books) that give a good overview of the subject from a biological perspective (I&#039;m a biotechnologist and still new to the computational field). Any help would be appreciated.&lt;br /&gt;
Thanks a lot.&lt;/p&gt;
&lt;br class=&quot;clear&quot; /&gt;</description>
 <comments>http://www.nodalpoint.org/2006/11/16/resources_for_text_mining#comments</comments>
 <category domain="http://www.nodalpoint.org/forums/discussion/bioinformatics_0">Bioinformatics</category>
 <pubDate>Thu, 16 Nov 2006 04:26:16 -0500</pubDate>
 <dc:creator>lbbros</dc:creator>
 <guid isPermaLink="false">2110 at http://www.nodalpoint.org</guid>
</item>
</channel>
</rss>
