Donnerstag, 25. November 2010

Towards fast scientific python

Python seems to be coming of age as a universal language for scientific computing. It already has a solid standing in the computational neuroscience community. The Neural Ensemble project gathers initiatives that use Python as the primary language for neural simulation and data analysis. Large simulator projects like NEST and NEURON adopted Python as their primary command language a few years ago. The core of these simulators is still written in C/C++, which delivers good performance but leads to interfacing issues with the command language. These issues can be addressed by clever software design, but a pure-Python implementation of a simulator is much more convenient in terms of maintainability and extensibility. The problem is that pure Python lags behind the speed of compiled languages like C/C++ by an order of magnitude.

The Brian simulator is written entirely in Python. To keep up with the speed of C/C++-based simulators, Brian can generate compiled code from the Python network model. This code can also be compiled for graphics processors (GPUs), which promise large speedups for computational problems that parallelize efficiently. The Brian developers describe how to do just that in their article on vectorized algorithms for neuronal simulations, which is one of my current favorite papers.
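To give a flavor of what vectorization means here, the following is a minimal sketch of the idea that such simulators build on: instead of looping over neurons in Python, the state of the whole population is updated with single NumPy array operations. This is not Brian's actual implementation, and all parameter values are illustrative.

```python
import numpy as np

def simulate(n_neurons=1000, n_steps=500, dt=0.1):
    """Vectorized Euler simulation of leaky integrate-and-fire neurons.

    All quantities are illustrative; the point is that each time step
    touches every neuron with a handful of array operations instead of
    a Python-level loop over neurons.
    """
    tau = 10.0        # membrane time constant (ms)
    v_rest = 0.0      # resting potential
    v_thresh = 1.0    # spike threshold
    v_reset = 0.0     # reset potential after a spike
    rng = np.random.default_rng(0)

    v = np.zeros(n_neurons)                        # membrane potentials
    spike_counts = np.zeros(n_neurons, dtype=int)

    for _ in range(n_steps):
        i_ext = rng.normal(1.1, 0.5, n_neurons)    # noisy external drive
        # Euler step for dv/dt = (v_rest - v + I) / tau, for all neurons at once
        v += dt * (v_rest - v + i_ext) / tau
        spiked = v >= v_thresh                     # boolean mask of spiking neurons
        spike_counts[spiked] += 1
        v[spiked] = v_reset                        # vectorized reset
    return spike_counts
```

The inner loop runs over time steps only; the per-neuron work is pushed down into compiled NumPy routines, which is exactly the kind of structure that lends itself to code generation for CPUs or GPUs.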

Today, and this was the initial motivation for this post, I came across the announcement of the new version of Theano, a compiler for evaluating mathematical expressions on CPUs and GPUs. I haven't tried it out yet, but it definitely looks promising. The really interesting point is that there is lively development toward making Python not only a ubiquitous language for scientific computing (a goal that has largely been achieved already), but also a performance-competitive alternative to established software packages.

Without licence fees, and fully open source.

Dienstag, 23. November 2010

PNAS Editorial: Impact Factor corrupts science

The (ab)use of the impact factor to evaluate the scientific merit of individuals corrupts the way scientists publish their findings, say Eve Marder, Helmut Kettenmann and Sten Grillner in their recent editorial in PNAS. Moreover, they state that the current practice of measuring scientific achievement shifts the choice of research topics toward potential "great discoveries" (read: discoveries that will make it into Nature), even though many of the most important findings in science were made serendipitously, so their eventual contribution could not have been estimated beforehand.

However, in my opinion, the impact factor is only the tip of the iceberg. Even worse is the implicit role of author sequence on a paper. In the life sciences, the first author is typically the one who did the work, and the last author is the supervisor or lab head. All authors in between are perceived as "minor contributors". Of course, this convention leads to all kinds of problems. Fierce battles are fought over author sequence, since for PhD students only first-author papers count, while for group leaders last-author papers are vital to demonstrate their scientific contribution.

But there can only be one first author and one last author. Of course, there are "equal contribution" asterisks all over the place, but are they actually taken into account? After all, how much sense does it make to rely on the outdated, opaque and inflexible convention of author sequence to indicate contribution? For example, it is completely unclear how to handle interdisciplinary collaborations, which typically involve at least two PhD students and two group leaders.

A completely fair and unbiased way to state individual contributions to a scientific publication would be to list the authors in alphabetical order and include an "Author contributions" section in the paper, where the individual contributions are described in detail. In fact, this is how many disciplines handle it, for example in the social sciences.