Computer scientists must find this kind of report hilarious: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. It turns out that Excel's automatic data type conversion changes Riken identifiers of the form nnnnnnnEnn (where n denotes a digit) into floating-point values, e.g. "2310009E13" is converted to "2.31E+13". Before everyone starts laughing at poor Excel users, note that this is not an Excel-specific problem. The real problem is that most biological file formats do not have schemas. Update: The Register has picked up the story: Excel ate my DNA.
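To see why this is not specific to Excel, here is a minimal Python sketch of the same failure. The `naive_infer` helper is hypothetical and only stands in for any tool that guesses a cell's type by trying a number first; it is not how Excel is implemented. "BRCA1" is included purely as an example of an identifier that does not look like a number.

```python
def naive_infer(cell: str):
    """Guess a cell's type the naive way: number first, text as fallback."""
    try:
        return float(cell)
    except ValueError:
        return cell


riken_id = "2310009E13"           # identifier quoted in the report
guessed = naive_infer(riken_id)   # read as scientific notation

print(type(guessed).__name__)     # the identifier is now a float
print(str(guessed) == riken_id)   # the original spelling is gone
print(naive_infer("BRCA1"))       # a non-numeric name survives as text
```

Once the value has been stored as a float, nothing in the data records that it was ever an identifier, which is why the conversion cannot be undone after the fact.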


Comments
Schemas
It's not really a schema problem at this level. I would say that schemas, like most ontologies, would wind up using serial numbers as categorical variables, which is more efficient but still leaves the number-handling problem. The essential problem is using an inappropriate tool -- and before everyone screams that not all of us have programming skills, let me offer this analogy: would you use a bucket for liquid handling because you can't be bothered to learn to use a pipette?
Absolutely
"The essential problem is using an inappropriate tool"
Also my take on this daft story. In the old days, I used to import data into Excel and I too noticed that it would automatically reformat certain data into, e.g., dates. I believe this can be stopped by setting all cell types to "text". Or better still, by not using spreadsheets when not appropriate. Just because it's in rows and columns doesn't automatically mean "spreadsheet".
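The "treat everything as text" idea carries over directly to scripts. As a rough sketch, Python's standard `csv` module hands back every field as a string, so identifiers stay untouched and only the columns you explicitly choose are converted. The file contents and the column names `gene_id` and `signal` are made up for illustration.

```python
import csv
import io

# Hypothetical tab-delimited export; column names and values are invented.
raw = "gene_id\tsignal\n2310009E13\t7.25\nSEPT2\t3.10\n"

rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))

# Identifiers are kept exactly as written; only the numeric column is parsed.
ids = [row["gene_id"] for row in rows]
signals = [float(row["signal"]) for row in rows]

print(ids)
print(signals)
```

The point of the design is that conversion is opt-in per column rather than guessed per cell.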
Suggestions?
It's probably not obvious to most people, especially the researchers mentioned in the article, that a spreadsheet isn't necessarily the best place for microarray data analysis. In all this commentary though, I haven't seen any mentions of suitable replacements. What do the experts use for this kind of task?
tools
I can understand the inclination to use a spreadsheet. What baffles me is that people never twig to other solutions. It becomes obvious very quickly that a spreadsheet is unsuitable for processing large amounts of data, but many people seem remarkably incurious about alternatives. The "Oh, it's computer stuff" mentality.
Personally, I use R through Emacs. Granted, it's an expert tool, but then again, microarray data analysis is an expert's job. The stats and concepts behind a reasonably comprehensive analysis are not trivial, and they do require some overhead. Factor in the multitude of libraries available for microarray analysis through R, and it becomes a weapon of choice. Unless, of course, you want to write everything yourself (coded up PCA recently, anyone?). And, of course, it's OSS!
Re: BASE. I like the idea. However, I feel it's only useful if the LIMS component is used. A simpler SQL schema would suffice otherwise. Besides, data storage and analysis are beasts of different stripes (in this case, anyway...).
Not sure about the experts...
Personally, I wouldn't go near microarray data analysis unless I was forced at gunpoint :-)
But to me, "rows and columns" says "database", not "spreadsheet". As Greg is always telling me, once your data is stored to your satisfaction, methods of analysis suggest themselves. You can do a lot with the basics of SQL and even more with the scripting language of your choice and its DB binding functions.
There are also heaps of free open-source microarray solutions out there, e.g. BASE and software from TIGR, The Institute for Systems Biology and the Eisen lab, to name a few.
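As a small illustration of the "basics of SQL plus a scripting language's DB bindings" approach mentioned above, here is a sketch using Python's built-in `sqlite3` module. The table name `expression`, its columns, and the signal values are all invented for the example; the only point is that a column declared as TEXT keeps identifiers as text while the numeric column remains available for aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Declaring gene_id as TEXT means identifiers are never treated as numbers.
conn.execute(
    "CREATE TABLE expression ("
    " gene_id TEXT NOT NULL,"
    " sample  TEXT NOT NULL,"
    " signal  REAL NOT NULL)"
)

# Invented measurements, for illustration only.
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [
        ("2310009E13", "control", 7.0),
        ("2310009E13", "treated", 9.0),
        ("SEPT2", "control", 3.0),
        ("SEPT2", "treated", 4.0),
    ],
)

# Basic SQL is already enough for a per-gene summary.
mean_by_gene = conn.execute(
    "SELECT gene_id, AVG(signal) FROM expression"
    " GROUP BY gene_id ORDER BY gene_id"
).fetchall()

print(mean_by_gene)
conn.close()
```

Here the column type plays the role of a very small schema: the decision that `gene_id` is text is made once, up front, rather than guessed for each value.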