Machine Learning for better Clinical Gene Expression Signatures

Machine Learning Algorithms
for Clinical and Research Microarray Data Analysis

Mining Microarray Data to Discover:

Disease Biomarkers & Complex Genetic Relationships

Biomind LLC WHITE PAPER

January 2006

Molecular biomarkers associated with disease and disease predisposition can be used diagnostically for the early detection and characterization of various disorders. Microarray and SNP data have been used extensively because of their high resolution of gene expression and polymorphism, respectively. While diagnostic, pharmacogenomic, and research uses for such biomarkers have proliferated, methods for their identification have become standardized. Biomind has developed software that sifts through large, complex microarray datasets to accurately identify biomarkers implicit in clinical disease data. The software uses machine learning algorithms that integrate the Gene Ontology (GO) and Protein Information Resource (PIR).

Traditional methods for identifying biomarkers in microarray data rely upon differential expression analysis followed by clustering techniques. These methods give insight into the effects of individual genes, considered in isolation. However, many experimentally important genes are not significantly over- or under-expressed in the relevant samples, and are therefore ignored by differential expression analysis. Machine learning algorithms that search for nonlinear patterns and integrate relevant knowledge resources can identify a more complete set of experimentally important genes and gene features. These nonlinear patterns better identify the biomarkers that explain clinical outcomes and endpoints. The process highlights relevant genes, gene combinations, and gene interactions implicit in the microarray data.
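The point about genes that are missed by differential expression can be illustrated with a small sketch. The data below are entirely hypothetical (two made-up genes, `gene_a` and `gene_b`, with invented expression values), and the interaction rule is hand-written for illustration rather than learned by any Biomind algorithm. Neither gene differs on average between cases and controls, yet their combination separates the two groups.

```python
from statistics import mean

def make_samples():
    """Build hypothetical toy data: (gene_a, gene_b, label), label 1 = case.

    A sample is a case when the two genes deviate in opposite directions,
    i.e. an XOR-like interaction. All values are invented for illustration.
    """
    samples = []
    for scale in (0.5, 1.0, 1.5, 2.0):
        for sign_a in (-1, 1):
            for sign_b in (-1, 1):
                label = 1 if sign_a != sign_b else 0
                samples.append((sign_a * scale, sign_b * scale, label))
    return samples

def mean_difference(samples, gene_index):
    """Mean expression in cases minus mean expression in controls."""
    cases = [s[gene_index] for s in samples if s[2] == 1]
    controls = [s[gene_index] for s in samples if s[2] == 0]
    return mean(cases) - mean(controls)

def single_gene_rule(a, b):
    """Call a sample a case when gene_a alone is over-expressed."""
    return 1 if a > 0 else 0

def interaction_rule(a, b):
    """Call a sample a case when the genes move in opposite directions."""
    return 1 if a * b < 0 else 0

def accuracy(samples, rule):
    """Fraction of samples whose label the rule reproduces."""
    hits = sum(1 for a, b, label in samples if rule(a, b) == label)
    return hits / len(samples)

samples = make_samples()
diff_a = mean_difference(samples, 0)
diff_b = mean_difference(samples, 1)
single_acc = accuracy(samples, single_gene_rule)
pair_acc = accuracy(samples, interaction_rule)

print("mean difference, gene_a:", diff_a)
print("mean difference, gene_b:", diff_b)
print("accuracy, single-gene rule:", single_acc)
print("accuracy, interaction rule:", pair_acc)
```

In this toy setting a per-gene comparison finds nothing, while a rule over the gene pair recovers the labels. Real microarray data are noisier, and such interactions have to be searched for rather than written by hand.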

Biomind’s machine learning algorithms generate a set of mathematical rules or classification models which best explain how microarray data is distributed among predetermined categories (e.g. case vs. control, time series etc.). Mathematicians refer to this sort of machine learning process as supervised categorization. Decision trees, neural networks, logistic regression, support vector machines, and genetic programming are all examples of supervised categorization algorithms. Biomind utilizes support vector machines and genetic programming in its commercial analysis software, ArrayGenius™.
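As a rough illustration of supervised categorization, the sketch below fits a logistic regression, one of the algorithm families named above, to a handful of labelled samples. It is not Biomind's software, which the paper says uses support vector machines and genetic programming; the expression values, the function names, and the training settings (`lr`, `epochs`) are all assumptions made for this example.

```python
import math

def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, bias, x):
    """Linear score of one sample under the current model."""
    return bias + sum(w * v for w, v in zip(weights, x))

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - sigmoid(score(weights, bias, x))
            weights = [w + lr * err * v for w, v in zip(weights, x)]
            bias += lr * err
    return weights, bias

def predict(weights, bias, x):
    """Assign category 1 (case) when the score is positive, else 0."""
    return 1 if score(weights, bias, x) > 0 else 0

# Hypothetical expression levels for two genes; label 1 = case, 0 = control.
train_rows = [[2.1, 0.3], [1.8, 0.5], [2.4, 0.2],
              [0.4, 1.9], [0.6, 2.2], [0.3, 1.7]]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(train_rows, train_labels)
train_predictions = [predict(weights, bias, x) for x in train_rows]
new_case = predict(weights, bias, [2.0, 0.4])
new_control = predict(weights, bias, [0.5, 2.0])

print("training predictions:", train_predictions)
print("new samples:", new_case, new_control)
```

The fitted weights play the role of the "mathematical rule": they summarize how the labelled samples are split between the predetermined categories, and the same rule can then be applied to unlabelled samples.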

Biomind’s classification models combine genes, gene combinations, gene ontologies, and protein families, collectively referred to as features, in complex algebraic equations. Features are derived from the experimental microarray data and its links to the Gene Ontology and Protein Information Resource. The 100 most valuable mathematical rules found for dividing the data between categories are listed. The members of this “model ensemble


Comments

So I assume you're Douglas Bodde, VP Sales and Marketing from Biomind ?

I'm not fundamentally against biotech companies posting informative news/announcements on nodalpoint. However, cutting and pasting from press releases and white papers is the wrong way of going about it. It comes across as a little bit contemptuous of the people who run/frequent the site.

Please contact me if you have any questions regarding site policy wrt companies/advertising etc., or reply to the post. If I don't hear anything shortly I'll unpublish this post.

Ironically, this has already been indexed by Google; I came across it today doing a search for 'supervised learning and biomarkers'. I didn't see biomind.com there...


Commercial posts

Yes, I am. That policy sounds fair enough.

Doug Bodde