This forum post is just to see how others in the field of bioinformatics think about an issue that in my opinion is rather important. By "software fit for publication" I mean software that is rushed out just for the paper and then not actively maintained, maintained poorly, or downright abandoned.
At least in the field of microarrays I've seen a couple of examples of such software. Most fall under the "poorly maintained" category. For example, CNAG (Copy Number Analyzer for GeneChip) has an interesting DNA copy number estimation algorithm, but from the software point of view it is terrible (hint: when a list of references is empty the program should output an error, not crash horribly). I won't even get started on CARAT, an algorithm that has been published without a working implementation...
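To make the hint about empty reference lists concrete, here is a minimal sketch in Python of what "output an error, not crash" could look like. It is not CNAG's code or algorithm; the function names, the `EmptyReferenceError` class and the idea of averaging per-sample intensities are all assumptions made up for illustration.

```python
class EmptyReferenceError(ValueError):
    """Raised when no reference samples are supplied (hypothetical error type)."""


def mean_reference_signal(reference_signals):
    """Return the mean signal across reference samples.

    reference_signals: list of floats, standing in for per-sample
    intensities (an assumption, not any real program's input format).
    """
    if not reference_signals:
        # Fail early with a message the user can act on, instead of
        # hitting a ZeroDivisionError somewhere further down.
        raise EmptyReferenceError(
            "No reference samples were provided; select at least one reference."
        )
    return sum(reference_signals) / len(reference_signals)


def main(reference_signals):
    """Run the calculation and turn a bad input into a readable message."""
    try:
        mean = mean_reference_signal(reference_signals)
    except EmptyReferenceError as err:
        return f"Error: {err}"
    return f"Mean reference signal: {mean:.2f}"
```

Calling `main([])` returns a message starting with "Error:" rather than a traceback, while `main([1.0, 3.0])` returns the computed mean. The point is only that the check costs a few lines and spares the user a crash.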
What experiences have you had in the field of bioinformatics? Did you find software that was just made for publication?


Software, quality and publication
Like Neil, I am a hack. I used to write code for a living, but programming was a means to an end. That said, I wish the quality of code written by scientists was better. There are several reasons for this. Maintainability, longevity, usability come to mind. In this day and age, with more "non-experts" using software, usability is a huge deal, and developing an algorithm that only 2 people can understand and use kind of defeats the purpose. Would it be too hard to start teaching bioinformaticians, computational chemists, etc, some of the principles of software design? I think it should become mandatory.
My Blog: http://mndoci.com
an academic problem
There's a common thread to these comments - that this is very much a problem of the academic research environment.
It's important to realise that until very recently, most bioinformaticians have been self-taught amateur programmers, not software engineers. I spend my entire working day writing Perl, but I still think of myself as a biologist first and a (very amateur) programmer second. This is not necessarily a bad thing. Researchers write code to get results, not for the sake of code.
You might even argue that under these conditions, the requirement to make code publicly available isn't always helpful. It's more a case of "well you can have my code if you really want it, but I warn you it may be unusable by anyone but me". I think that code should be available and I think that the open-source philosophy - i.e. by sharing, others will improve the code - is something that many researchers could learn from. Not just in the area of programming but in what we're calling open science. And of course, I think that anyone who programs should strive to improve their coding skills and that ideally, maintenance should be an important aspect of their work. But we have to live with the reality - that much code is written by amateurs for a particular project and that in academia, methods are seen only as a means to results/papers.
As to experiences - I must have downloaded, installed and tried out hundreds if not thousands of packages and visited numerous online resources in the past few years. They've ranged from rather good to unusable. You learn to live with it after a while. Perhaps there could be better standards for published work. If results are found to be wrong a paper is retracted - is it too harsh to suggest that if I download published software and find it unusable, the editors should pull the paper? At the very least you should contact the authors and explain, politely, that things are not working for you. I've done this once or twice and received grateful replies when bugs have been fixed - now that's open source warm-and-fuzziness.
If anything, I object more to the numerous publications we now see which are nothing more than a program - no results, no biological insight, often even no practical application. Seriously, how many more minor improvements to multiple alignment algorithms can we take?
...and important issues in the field of bioinformatics deserve front page attention.
This issue in various forms has come up before on nodalpoint and elsewhere.
This is research software; it will always be more important to share the idea than to make usable software. As to the problem of publishing working implementations or open sourcing implementations, a couple of related articles from 2002 on "Does Publicly Funded Research Have to Result in Open Source Code?" may add some perspective on this issue.
I agree on the "sharing the idea", but proof-of-concept code should be added, at least when talking about algorithms. Then anyone would be free to rewrite/change/improve it using his/her preferred programming language.
My laboratory and the people we work with have the good habit of publishing the source of the algorithms we developed/tested (something strongly wanted by my boss). It's a shame not everyone does that...
Unfortunate, but inevitable
I think this kind of thing is unfortunately common but inevitable when the primary output of academic scientists is papers, not programs. Code obviously needs to work at publication time, but how could you ensure code is maintained thereafter?
I guess many algorithms or programs are designed by postdocs. In this case the most important thing is obviously to publish a paper about their work in an academic journal. Once the postdoc is over, the researcher leaves the lab and the software remains there without being maintained (no time, no documentation, no local skills, no more interest, deprecated programming language...). That's why any project should be hosted on a website such as SourceForge.
Pierre
I agree with you Pierre. And one more thing - we shouldn't forget about the development of such programs and algorithms. I think good financial backing would take care of that.