Remember the mid-nineties, when they told all of us computer-enabled folks that
jobs would be raining on us as soon as we got out of university?
In some fields it is still true. Notably, there is a significant subset of
bioinformatics that ought to be (and in all fairness, often is) consulting
work and concerns itself with building the kind of operational application
every software designer in the nineties developed for a private company with
too much money on its hands: integrating the "old" database format with the
"new", converting image files to the decades-old printer's driver format,
converting between VisiCalc and MS Excel, and so on.
What gets published in bioinformatics is generally a) research work leading to
new methods (i.e. algorithms) that involve proofs and new mathematical
abstractions for common problems, or b) theoretical molecular biology advances
where bioinformatics is only the 'methods' section. Every once in a while, the
methods themselves get their own paper, which is generally abysmal. The
"cooler" folks put their scripts on their blogs or homepages.
In biology, there is a lot of information to memorize, which is why biology is
always so hard in high school. Academics are no better at rote memorization
than teenagers, and much less willing to put up with it, so a need for
organized information appeared in biology. With it came databases and Web
services playing the role of giant bio-repositories. Then researchers got
tired of looking through this information and linking it together themselves,
and tried to get computers to think in their place. So was born the field of
systems biology.
We have, on one hand, an economic need for software jobs requiring little
innovation and a lot of specifics. On the other, a social need for thinking
machines that do your job. Combine both, and a plethora of applications for
individual needs comes into existence. As with everything in academia, the rewards
are supposed to be exposure (that is, publication) rather than paychecks. How
do we reward software writers who are in it for the resume? Enter BMC
Bioinformatics.
Moutselos et al. bring us academics a brand new tool for converting XML into
XML, with a tweak. The tool is written in Java with Java's XML and DOM
The authors identified a need for a KEGG database format converter because
"existing tools" (KEGG2SBML) only run under Unix (thus we learn that Python,
Perl, Qt and Graphviz require Unix to run, which must come as a surprise to
everyone involved in their development and use).
To sum up twenty pages of absurdly long technical writing, the authors'
program merges (Unix cat) database files obtained from the Internet (Unix
wget), changes the XML format into another (XSLT), replaces (Unix sed) IDs
with equivalents that can be stored in a database (sqlite) and eliminates
duplicate entries (a bit of creativity with sed, or two lines' worth of
Python). Unsurprisingly, they even cite a bash script that does exactly that.
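For the record, the last two steps (swapping IDs and dropping duplicates)
might look something like the Python sketch below. It is an illustration under
assumptions, not the authors' code: the `entry` tag, the `name` attribute, the
`merged` wrapper element and the `id_map` dictionary are made-up stand-ins for
whatever the real KEGG files and ID table contain.

```python
import xml.etree.ElementTree as ET


def merge_and_dedupe(xml_documents, id_map, tag="entry", key="name"):
    """Merge XML documents, rename IDs, and drop duplicate entries.

    `tag`, `key` and `id_map` are hypothetical: adjust them to the
    actual file format and ID table.
    """
    merged = ET.Element("merged")
    seen = set()
    for text in xml_documents:
        root = ET.fromstring(text)
        for element in root.iter(tag):
            old_id = element.get(key)
            if old_id is None:
                continue  # nothing to identify this entry by, skip it
            # Use the mapped ID when one is known, else keep the original.
            new_id = id_map.get(old_id, old_id)
            if new_id in seen:
                continue  # duplicate entry, keep only the first one
            seen.add(new_id)
            element.set(key, new_id)
            merged.append(element)
    return merged
```

Call it with a list of XML strings and a dictionary of ID replacements, then
hand the result to `ET.tostring` to get the merged document back as text.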
The contribution from Moutselos et al.? Their program isn't a script, it's a
full-length Java program that needs a server interface to run in a browser
(unlike, say, XSLT).
Knowing Windows users, especially academics, they probably have a point:
opening a text editor is hard, as is executing a file. Ready-made scripts,
therefore, lack the user interface that the target users need.
obviously too easy to code and summarize in less than twenty pages. It doesn't
even require people to download a JRE. Java in this case is clearly
overdesign (or a result of computer science courses pushing Java so hard
that the pupils like it when it hurts).
In conclusion, this is yet another instance of inadequate software earning
people citations (and fame). As variation is an important component of Nature,
next up is an adequate contribution from biologists in completely wet
Moutselos K, Kanaris I, Chatziioannou A, Maglogiannis I, Kolisis FN.
BMC Bioinformatics, doi:10.1186/1471-2105-10-324 (2009)