My Struggle: July 2007

Monday, 30 July 2007

Eclipse JRE Configuration to work with Mac OS X

Just a short tip for today. When you configure Eclipse (Preferences/Java/Compiler) or a particular Java project, set the compiler compliance level to 5.0, as JDK 6.0 is not yet available on Mac OS X (it is still in beta, so this may change soon). This ensures that your final product will run on Mac OS X once you export it.

Friday, 27 July 2007

JNI on Mac OS X

JNI (Java Native Interface) is a way to use native (read "C") functions in your Java program. You can read more details about this approach on the Sun Java web site.

A particular problem I was struggling with is how libraries are loaded when running on Mac OS X. The standard naming convention for dynamic libraries on Mac OS X is to use the "lib" prefix and the ".dylib" suffix. The Java Virtual Machine, however, does not follow this convention when trying to load a library, and requires you to use the ".jnilib" suffix (with the "lib" prefix) instead.

The typical loadLibrary call:
System.loadLibrary("mine");
will search for libmine.jnilib in your java.library.path; if you have named your library libmine.dylib, the call will throw an UnsatisfiedLinkError.

Thursday, 19 July 2007

Eclipse SWT: Disabling selection event in a Table

Sometimes we do not want users to be able to select cells or rows in a table, but the standard styles for the SWT Table widget do not provide a NOSELECTION option.

This, however, can be easily achieved by masking the selection event. Just use the following piece of code when creating your tables:

final Table table = new Table(group, SWT.SINGLE | SWT.BORDER);
table.addListener(SWT.EraseItem, new Listener() {
    public void handleEvent(Event event) {
        if((event.detail & SWT.SELECTED) != 0 ){
            event.detail &= ~SWT.SELECTED;
        }
    }
});

Tuesday, 17 July 2007

Eclipse RCP: NullPointerException when creating the SaveAs dialog

When using standard editors from org.eclipse.ui.ide, the default Save As action usually fails, throwing a NullPointerException. This happens when the program tries to create a Save As Dialog. A particular exception usually looks something like:

java.lang.NullPointerException
    at org.eclipse.ui.dialogs.SaveAsDialog.createContents(SaveAsDialog.java:102)
    at org.eclipse.jface.window.Window.create(Window.java:426)
    at org.eclipse.jface.dialogs.Dialog.create(Dialog.java:1124)
    at org.eclipse.ui.texteditor.AbstractDecoratedTextEditor.performSaveAs(AbstractDecoratedTextEditor.java:1580)
    at org.eclipse.ui.editors.text.TextEditor.performSaveAs(TextEditor.java:115)
    at org.eclipse.ui.texteditor.AbstractTextEditor.doSaveAs(AbstractTextEditor.java:3538)
    at ....MultiPageEditor.doSaveAs(MultiPageEditor.java:196)
    ....

This actually happens when the dialog tries to set its banner image, but the image is not available. The default IDEWorkbenchAdvisor registers such an image for the default IDE, and so must your application. The solution is the following:
1) Make sure you add org.eclipse.ui.ide to the dependencies of your plugin
2) In your WorkbenchAdvisor class, add the following code to your implementation of the initialize() method:

Bundle ideBundle = Platform.getBundle(IDEWorkbenchPlugin.IDE_WORKBENCH);
final String banner = "/icons/full/wizban/saveas_wiz.png";
URL url = ideBundle.getEntry(banner);
ImageDescriptor desc = ImageDescriptor.createFromURL(url);
configurer.declareImage(IDEInternalWorkbenchImages.IMG_DLGBAN_SAVEAS_DLG, desc, true);

This declares the standard banner image from the org.eclipse.ui.ide plugin as the banner image for the Save As dialog, in the same way as it is done in the IDEWorkbenchAdvisor. Running your application again should demonstrate the correct behaviour. If you are still getting a NullPointerException, check whether you declared the plugin dependencies, and whether you included org.eclipse.ui.ide in your target platform.

Wednesday, 11 July 2007

A follow-up to SBML-related post

I contacted the SBML team to discuss the problems I wrote about last week, and I really appreciate their prompt response and collaboration. In particular, I want to thank Michael Hucka, who clarified a lot of things for me.

First of all, as Michael pointed out, the XML Schema standard requires the targetNamespace to be a URI, which does not mean that it has to resolve to a real web resource. So my reasoning about which schema you obtain when trying to fetch the namespace URI does not hold water.

More important is what the target namespaces for the different SBML specifications are:
Level 2 Version 1 http://www.sbml.org/sbml/level2
Level 2 Version 2 http://www.sbml.org/sbml/level2/version2
Level 2 Version 3 http://www.sbml.org/sbml/level2/version3

So libSBML version 2.3.4 indeed supports the Level 2 Version 1 specification only, and it crashes when any other version is processed. The SBML team is working on fixes for this problem, and a pre-release version of libSBML 3.0.0 works with the newer specifications perfectly fine. Your code, however, might need some changes due to the architecture redesign in libSBML.

Michael also agreed that it won't hurt to make the web server serve different versions of the SBML schema to different URLs, so the schemata are now served as I (and JAXB) expected.

Summary:

Check the schema twice, and make sure that you are really using the proper schema.
If you need to support SBML Level 2 Version 2 or 3, use libSBML 3.0.0.
Thanks to everyone from the SBML team for your help.

Saturday, 7 July 2007

SBML standard and problems with libsbml

I have spent several days fighting a problem with my Java program, which uses libsbml indirectly through the Java Native Interface. Every once in a while the whole Java virtual machine would crash while reading an SBML model. So I performed a detailed investigation of what is wrong with libsbml and with the standard itself.

Let's begin with the SBML schema. My program was using the SBML Level 2 Version 2 schema, and the most recent one is SBML Level 2 Version 3. I specifically checked whether the problem persists with the newer schema, and my comments are still valid. Now, the key point: the schema defines a specific XML namespace for SBML, and for SBML Level 2 Version 3 it is:

targetNamespace="http://www.sbml.org/sbml/level2/version3"

However, libsbml, being the de facto standard library for SBML support, crashes badly when processing a model compliant with the standard. I begin my model with the following:

<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level2/version3" level="2" version="3">

This is exactly what the schema prescribes. Next, I try to read this model with the readSBML example program supplied with libsbml. Guess what happens?

$ ./readSBML model1.xml
Segmentation fault

Too bad, as this means that the problem is not with my code, but with libsbml itself. Running readSBML with valgrind demonstrates obvious memory management errors in libsbml's XML parser. The problem persists with both expat and xerces-c implementations.

But why do the majority of software tools work fine with this implementation of the XML parser? I went on and took a look at what Copasi produces when exporting a model to SBML. Surprisingly, the model produced by Copasi does not comply with the SBML standard: the root element of the produced model uses the wrong namespace:

http://www.sbml.org/sbml/level2

I tried to feed that (non-compliant) model to readSBML, and everything went fine. I even tried to valgrind readSBML with the new model and look:

==15583== ERROR SUMMARY: 0 errors from 0 contexts
==15583== All heap blocks were freed -- no leaks are possible.

Unbelievable! But is it a problem with the support for the Version 3 (and Version 2) schema, or with the whole SBML standard? Just type the namespace http://www.sbml.org/sbml/level2 into your browser, and it will show you the proper schema. The schema, however, clearly specifies:

targetNamespace="http://www.sbml.org/sbml/level2/version3"

So, the standard itself is inconsistent, and standard support is badly broken.

What can we do at the moment? Well, the only workaround is to break the standard and continue using the wrong namespace. Does it require any attention from the SBML consortium? I think so. What can I do to write an SBML-compliant software tool? Write my own SBML support library. Will a properly SBML-compliant tool read the vast majority of published models? I doubt it.

What have I done to fix my problem? I wrote a small routine which converts proper models to the broken form and then reads them with the libsbml parser.

Friday, 6 July 2007

Valgrind it

A student was running his tasks on the same machines as mine, and overnight one of them allocated almost all the available memory. This caused an incredible amount of swapping and stopped all progress on every other task (not to mention that he started the task by logging in to the machine directly rather than going through the queue). Here is a very important piece of advice; if you follow it, you can avoid upsetting many people in a shared computing environment: always check your program for memory management errors and memory leaks. It is pretty easy to do on a modern Linux system: just use a tool called valgrind.

I recommend starting in the standard way by reading the valgrind manual page. Just type
$ man valgrind
in your terminal.

Going beyond referring users to a manual page, here is a little example of how this tool can be used. Imagine starting with the following C program:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    char* string = (char*) malloc(sizeof(char)*strlen(argv[0]));
    strcpy(string,argv[0]);
    printf("My name is %s\n",string);
    return 0;
}

Well done! You have won 5 points if you noticed all three problems. Let's imagine you haven't.
First, you compile your program with debugging options:
$ gcc -ggdb ex1.c -o ex1
No compile-time errors, no warnings. If you run the program, the result depends on how lucky you are. I am lucky, and the result is exactly what I expected:

$ ./ex1
My name is ./ex1

But something can be wrong about this program. Let's use valgrind to check. Start with the default checking, and run:
$ valgrind ./ex1
The tool detects 2 errors and a memory leak of 3 bytes. The first error is:

==10523== Invalid write of size 1
==10523==    at 0x4006A2C: strcpy (mc_replace_strmem.c:272)
==10523==    by 0x8048406: main (ex1.c:7)
==10523==  Address 0x401F02B is 0 bytes after a block of size 3 alloc'd
==10523==    at 0x4005400: malloc (vg_replace_malloc.c:149)
==10523==    by 0x80483EF: main (ex1.c:6)

So, at line 7 of my program I am writing 1 byte beyond the allocated memory block. Have you read the Kernighan and Ritchie book? Every string has to be terminated by a zero (\0) character. This character is not counted when computing the length of a string, but it is copied when copying a string.
The second error is:

==10523== Invalid read of size 1
==10523==    at 0x4006283: strlen (mc_replace_strmem.c:246)
==10523==    by 0xB2A0C1: vfprintf (in /lib/libc-2.5.so)
==10523==    by 0xB2F602: printf (in /lib/libc-2.5.so)
==10523==    by 0x8048419: main (ex1.c:8)
==10523==  Address 0x401F02B is 0 bytes after a block of size 3 alloc'd
==10523==    at 0x4005400: malloc (vg_replace_malloc.c:149)
==10523==    by 0x80483EF: main (ex1.c:6)

This is directly caused by the previous one. The 1 byte written outside of the memory block is now read when printing the string out. So, I should fix my program by extending the size of the allocated memory block. I create a new program called ex2.c:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    char* string = (char*) malloc(sizeof(char)*(strlen(argv[0])+1));
    strcpy(string,argv[0]);
    printf("My name is %s\n",string);
    return 0;
}

See? I added room for one more character, the terminating zero, to the allocated block. Compile and run the program in exactly the same way.
Running with valgrind produces the following output:

==10812== ERROR SUMMARY: 0 errors from 0 contexts

Congratulations, we fixed both of the errors. But there is a little more to this program:

==10812== LEAK SUMMARY:
==10812==    definitely lost: 6 bytes in 1 blocks.
==10812==      possibly lost: 0 bytes in 0 blocks.
==10812==    still reachable: 0 bytes in 0 blocks.
==10812==         suppressed: 0 bytes in 0 blocks.
==10812== Use --leak-check=full to see details of leaked memory.

Let's follow the advice, and run leak detection tool:
$ valgrind --leak-check=full ./ex2
It produces the following report:

==10932== 6 bytes in 1 blocks are definitely lost in loss record 1 of 1
==10932==    at 0x4005400: malloc (vg_replace_malloc.c:149)
==10932==    by 0x80483F2: main (ex2.c:6)

This means that 6 bytes allocated with our malloc in line 6 were never released. Adding
free(string);
to the end of the program (and naming it ex3.c) gives us an ideal result:

==11013== ERROR SUMMARY: 0 errors from 0 contexts
==11013== All heap blocks were freed -- no leaks are possible.

Now you can be more sure that your program is correct.

A couple of closing remarks.

Remember, valgrind checks only the part of your program that was actually executed. If you have several branches (if-then-else) or subroutines, make sure that you have a decent set of test scenarios to cover them all. You might want to search Google for test coverage methodologies (just type man gcov if you do not know how to use Google).
Leaving such dynamic memory management errors in place can cause segmentation faults. The nastiest ones appear when the problem is located inside a shared library called from a Java native method. The Java virtual machine will crash, leaving you frustrated and unable to catch an exception or otherwise recover your program on the fly.
Even a mere memory leak in a shared library used by a long-running process will eventually eat up all your virtual memory, and the system will need to be rebooted. Guess why Windows servers have to be rebooted every month or so?

Thursday, 5 July 2007

Starting the Blog

This time I am starting a real blog, devoted solely to the professional questions and tricky problems I have to deal with every day. Hopefully my enthusiasm will last a little longer this time.

Stochastic Processes and Distributed Computations

The most disturbing problem I struggle with way too often is related to the way random numbers are generated on the majority of modern computers. As many of you know, they are not really random, but are generated by some kind of numerical algorithm. Most random number generation algorithms rely on something called a "random number generator seed", a parameter that tells the algorithm which particular pseudo-random sequence to start. But beware - the majority of algorithms will spit out exactly the same random sequence when given the same seed. The traditional solution to this problem is to initialise the seed with the current time (when the program is started), something akin to:


gsl_rng_set(r,time(NULL));

if you are using C and GSL.

This, however, will not help if you are running distributed computations on a cluster and need an independent random sequence for each of the processes, because processes started within the same second receive the same seed. The solution in this case is to generate a unique random seed for each process. Adding the rank of the current process to the current time value works best for me when using MPI. The solution looks like the following:
gsl_rng_set(r,time(NULL) + MPI::COMM_WORLD.Get_rank());
for C++, and

int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
gsl_rng_set(r,time(NULL) + rank);

if you are using C.

If you are not using MPI, but rather run independent jobs through an execution queue, you cannot compute such a rank, but you can use a combination of a hostname checksum and the process id instead. It is extremely unlikely (I am speaking about most UNIX architectures) that two instances of your program will run on the same host within the same second and have exactly the same process id. If that is the case, you have to think a little, but you already have the general idea.
