Wednesday, 15 August 2007

Storing binary information in XML

Sometimes we need to store binary information in XML, which is a textual format.
When designing schemata for your data you should avoid binary representations where possible, but in some cases you absolutely need them. For example, what if you want to store a picture within your XML?

The problem is that binary data may contain byte values that are not allowed in XML at all, such as most control characters, or that have a special meaning in markup, such as '<' and '&'.
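
To see the problem concretely, here is a small sketch that feeds the standard JAXP parser an element whose text contains raw "binary" characters. The class and method names are made up for this illustration. The first call in main should succeed, while the other two should be rejected as not well-formed (the default parser may also print its error message to the console).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; the names are assumptions made for this sketch.
class RawBinaryInXml {

    // Returns true if the given text survives parsing as element content.
    static boolean parsesAsElementText(String text) {
        String xml = "<data>" + text + "</data>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document as not well-formed.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O trouble, unrelated to the content.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Plain text is fine.
        System.out.println(parsesAsElementText("hello"));
        // A control character (value 1) is not a legal XML 1.0 character.
        System.out.println(parsesAsElementText("\u0001"));
        // A markup character such as '<' breaks the document structure.
        System.out.println(parsesAsElementText("a<b"));
    }
}
```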


The solution to this problem is to recode your binary data into a textual form using, for example, the Base64 or UUEncode algorithm. The following fragment of Java code demonstrates how to convert an array of bytes (a chunk of binary data) into a properly formed String using the Base64 algorithm.

byte[] data = getMyBinaryData();
String encodedData = new sun.misc.BASE64Encoder().encode(data);

You can then use encodedData as the value of an XML element.


The reverse process of decoding a String into an array of bytes can be done using the following Java code:

String encodedData = getEncodedData();
byte[] data = new sun.misc.BASE64Decoder().decodeBuffer(encodedData);


UUEncode/UUDecode algorithms are also available in the sun.misc package.
This approach will, unfortunately, fail in an applet, because applets are not allowed to access sun.* packages. More generally, sun.misc is an internal, unsupported package, so it may be missing from other Java runtimes. In such cases you have to use a different implementation of the transcoding algorithm. One of the alternatives is: http://iharder.sourceforge.net/current/java/xmlizable/
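
On Java 8 and later, the standard java.util.Base64 class is another alternative that avoids sun.* packages entirely. The following is only a minimal sketch, and it is not the approach shown above: the class name, the helper methods and the element name "picture" are assumptions made for illustration. It encodes a byte array, places the text in an XML element, parses the document back and decodes the element text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class; names are assumptions made for this sketch.
class Base64InXml {

    // Wraps the Base64 form of the data in a (hypothetical) <picture> element.
    static String toXml(byte[] data) {
        String encoded = Base64.getEncoder().encodeToString(data);
        return "<picture>" + encoded + "</picture>";
    }

    // Parses the document and decodes the text of its root element.
    static byte[] fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            String encoded = doc.getDocumentElement().getTextContent();
            return Base64.getDecoder().decode(encoded);
        } catch (Exception e) {
            throw new IllegalStateException("Could not read binary data", e);
        }
    }

    public static void main(String[] args) {
        // Bytes that would be illegal or troublesome as raw XML text.
        byte[] data = {0, 1, 2, '<', '&', (byte) 0xFF};
        String xml = toXml(data);
        System.out.println(xml);
        // The decoded bytes should equal the original ones.
        System.out.println(Arrays.equals(data, fromXml(xml)));
    }
}
```

The basic decoder used here rejects whitespace, so if the element text may contain line breaks (for example after pretty-printing), Base64.getMimeDecoder() is probably the safer choice.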

Wednesday, 1 August 2007

Computing MD5 digest (checksum) in Java

MD5 digests are useful for tracking modifications to files (or, in fact, to any data). It is very unlikely that two different files will have the same MD5 digest by accident, and extremely unlikely that a minor modification to a file will preserve its MD5 digest.

I use MD5 digests to decide whether a precomputed result is still valid. For example, suppose you simulate a system of differential equations defined in the file "system.xml" and save the result to the file "result.xml". But what if the user modifies the original system of equations? The computed result will no longer be correct. The solution is to store the MD5 digest of the original system of equations with your result. When a result is selected for display, you check whether the digest of the current system of equations matches the one stored with the result; if they don't match, the result is no longer valid.
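
The validity check described above might look roughly like this. It is a sketch under assumptions: the class name, the method names and the sample XML strings are invented for illustration, and the data is passed as byte arrays to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class; names and sample data are assumptions for this sketch.
class ResultValidity {

    // MD5 of the data as "0x" followed by 32 uppercase hex digits.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder("0x");
            for (byte b : digest) {
                sb.append(String.format("%02X", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // A stored result is valid only if the digest saved with it matches
    // the digest of the current system definition.
    static boolean isResultValid(String storedDigest, byte[] currentSystem) {
        return storedDigest != null
                && storedDigest.equals(md5Hex(currentSystem));
    }

    public static void main(String[] args) {
        byte[] original = "<system>dx/dt = -x</system>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] modified = "<system>dx/dt = -2x</system>"
                .getBytes(StandardCharsets.UTF_8);

        // Digest that would be saved together with the result.
        String stored = md5Hex(original);

        System.out.println(isResultValid(stored, original)); // unchanged system
        System.out.println(isResultValid(stored, modified)); // edited system
    }
}
```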

There are a number of ways to compute MD5 digests, and you can find hundreds of implementations on the web. I stick to the one that comes with the standard Java Runtime Environment distribution from Sun Microsystems. Below is my function, which returns a String representation of the MD5 digest of an arbitrary file:

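import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

// Returns the MD5 digest of the file as "0x" followed by 32 uppercase
// hex digits, or null if the digest cannot be computed.
public String checksum(File file) {
    // try-with-resources closes the stream even if reading fails
    try (InputStream fin = new FileInputStream(file)) {
        MessageDigest md5er = MessageDigest.getInstance("MD5");

        // feed the file to the digest in 1 KB chunks
        byte[] buffer = new byte[1024];
        int read;
        while ((read = fin.read(buffer)) != -1) {
            md5er.update(buffer, 0, read);
        }

        byte[] digest = md5er.digest();
        StringBuilder strDigest = new StringBuilder("0x");
        for (int i = 0; i < digest.length; i++) {
            // adding 0x100 yields three hex digits; dropping the first
            // leaves exactly two, so leading zeros are kept
            strDigest.append(Integer.toString((digest[i] & 0xff) + 0x100, 16)
                    .substring(1).toUpperCase());
        }
        return strDigest.toString();
    } catch (Exception e) {
        // I/O error, or MD5 is not available
        return null;
    }
}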

And if you are using Eclipse RCP and want to compute the MD5 digest of an IFile object, here you are:

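import java.io.InputStream;
import java.security.MessageDigest;
import org.eclipse.core.resources.IFile;

// Returns the MD5 digest of the workspace file as "0x" followed by
// 32 uppercase hex digits, or null if the digest cannot be computed.
public String checksum(IFile file) {
    // try-with-resources closes the stream even if reading fails
    try (InputStream fin = file.getContents(true)) {
        MessageDigest md5er = MessageDigest.getInstance("MD5");

        // feed the file contents to the digest in 1 KB chunks
        byte[] buffer = new byte[1024];
        int read;
        while ((read = fin.read(buffer)) != -1) {
            md5er.update(buffer, 0, read);
        }

        byte[] digest = md5er.digest();
        StringBuilder strDigest = new StringBuilder("0x");
        for (int i = 0; i < digest.length; i++) {
            // adding 0x100 yields three hex digits; dropping the first
            // leaves exactly two, so leading zeros are kept
            strDigest.append(Integer.toString((digest[i] & 0xff) + 0x100, 16)
                    .substring(1).toUpperCase());
        }
        return strDigest.toString();
    } catch (Exception e) {
        // CoreException, I/O error, or MD5 is not available
        return null;
    }
}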
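
The two functions above differ only in how the input stream is obtained, so the digest loop could also live in a single helper that takes an InputStream. Here is one possible sketch; the class name Md5Checksums and the method checksum(InputStream) are hypothetical and not part of the code above. It keeps the same conventions: the result is "0x" followed by uppercase hex digits, and null signals a failure.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper class; names are assumptions made for this sketch.
class Md5Checksums {

    // Reads the stream to its end, closes it, and returns the MD5 digest
    // as "0x" plus 32 uppercase hex digits, or null on any failure.
    static String checksum(InputStream in) {
        try (InputStream fin = in) {
            MessageDigest md5er = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[1024];
            int read;
            while ((read = fin.read(buffer)) != -1) {
                md5er.update(buffer, 0, read);
            }
            StringBuilder strDigest = new StringBuilder("0x");
            for (byte b : md5er.digest()) {
                // same two-digit hex formatting as in the functions above
                strDigest.append(Integer.toString((b & 0xff) + 0x100, 16)
                        .substring(1).toUpperCase());
            }
            return strDigest.toString();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for a file here.
        InputStream in = new ByteArrayInputStream(
                "abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(checksum(in));
    }
}
```

With such a helper, the File version would pass new FileInputStream(file) and the IFile version would pass file.getContents(true), each still handling the exception its own call can throw.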