May 29, 2007

Testing for Blackbox re-Engineering or The lost source code...

Working on a system for chemistry computation, we are integrating software components written in various languages (FORTRAN, C++, Shell, Python) using Python. As the new server is 64-bit (instead of 32-bit for the existing ones), we have to recompile a lot of stuff, but unfortunately the source code of some of the components has been lost.

We are re-implementing these sourceless components in Python, and our specification is reduced to what each component is supposed to do, based on its name and the command line arguments described by --help. In addition, we can run the component on a known input (a NetCDF file) in order to compare our implementation's output to the expected output (a CSV-like file).

One of these components produces a file containing numerical data as four strings separated by spaces. Each number is written in exponent notation. Our problem was to compare the string form of numbers produced by a Fortran program (like 0.35315E+02) to the string form of numbers produced by a Python one (like 3.5315E+01). These files (2200 lines each) look like:


== tests/computed.ps.data ==
-7.5000E-01 4.8644E+01 3.5315E+01
-6.5000E-01 4.8644E+01 3.5333E+01
-5.5000E-01 4.8644E+01 3.5291E+01

== tests/expected.ps.data ==
-0.75000E+00 0.48644E+02 0.35315E+02
-0.65000E+00 0.48644E+02 0.35333E+02
-0.55000E+00 0.48644E+02 0.35291E+02


In the end, it turned out to be quite simple. Files are compared line by line, and lines are compared value by value. Even if the format of the values is not the same, once converted to float the values can be compared directly.
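As a quick illustration, here is a minimal check on one pair of values taken from the files above (the variable names are only for this example). The two strings differ, but they should parse to the same float:

```python
# One value from expected.ps.data and its counterpart from computed.ps.data.
fortran_style = "0.35315E+02"
python_style = "3.5315E+01"

# The textual representations are different...
assert fortran_style != python_style
# ...but both denote 35.315, so the parsed floats compare equal.
assert float(fortran_style) == float(python_style)
```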


    expected = open('tests/expected.ps.data')
    computed = open('tests/computed.ps.data')

    for e_line, c_line in zip(expected, computed):
        for e, c in zip(e_line.split(), c_line.split()):
            assert float(e) == float(c), \
                "expected %s differs from computed %s" % (e, c)
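One caveat with the loop above: zip stops at the shorter input, so a truncated file or a line with a missing value would go unnoticed. Here is a slightly more defensive sketch of the same idea. The function name compare_data_files and the rel_tol parameter are my own additions for illustration; the optional tolerance was not needed for the files discussed here and assumes Python 3.5 or later for math.isclose.

```python
import math
from itertools import zip_longest


def compare_data_files(expected_lines, computed_lines, rel_tol=0.0):
    """Compare two iterables of lines value by value.

    Returns a list of mismatch messages (empty if everything matches).
    With rel_tol=0.0 the floats must be exactly equal, as in the loop above.
    """
    mismatches = []
    # zip_longest pads the shorter input with None instead of stopping early.
    pairs = zip_longest(expected_lines, computed_lines)
    for lineno, (e_line, c_line) in enumerate(pairs, start=1):
        if e_line is None or c_line is None:
            mismatches.append("line %d: one file is shorter than the other" % lineno)
            break
        e_values, c_values = e_line.split(), c_line.split()
        if len(e_values) != len(c_values):
            mismatches.append("line %d: number of values differs" % lineno)
            continue
        for e, c in zip(e_values, c_values):
            if not math.isclose(float(e), float(c), rel_tol=rel_tol):
                mismatches.append(
                    "line %d: expected %s differs from computed %s" % (lineno, e, c))
    return mismatches
```

Open file objects iterate over their lines, so the two files can be passed in directly, for example inside a "with open(...) as expected, open(...) as computed:" block, and the test then reduces to asserting that the returned list is empty.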


Python, thanks for supporting my laziness...