Wednesday, February 9, 2011

Posting non-ASCII characters in web forms

I just hit this issue. You write some non-ASCII text (e.g. in Cyrillic) in a form and when submitted the text appears garbled on the server side.

It turns out this is a well-known glitch in web development, especially in Java. The main reasons for the mess are:
  1. Web browsers do not specify the encoding of posted data
  2. The Java Servlet specification says that the default request encoding should be ISO-8859-1, in contrast to UTF-8, which is universally used nowadays
A good description of this issue can be found in "HTTP Form Character Sets and Related Problems".
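The mismatch between the two points above is easy to reproduce outside a servlet container. Here is a minimal sketch, assuming the browser posted UTF-8 bytes and the server decoded them as ISO-8859-1; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class FormEncodingDemo {
    /** Decodes UTF-8 bytes as ISO-8859-1, like a container that falls back to its default. */
    static String misdecode(String typed) {
        byte[] posted = typed.getBytes(StandardCharsets.UTF_8);
        return new String(posted, StandardCharsets.ISO_8859_1);
    }

    /** Undoes the mistake; this is lossless because ISO-8859-1 maps every byte to one char. */
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u0416"; // Cyrillic capital letter ZHE, two bytes in UTF-8
        String garbled = misdecode(typed);
        System.out.println(garbled.length());              // prints 2
        System.out.println(repair(garbled).equals(typed)); // prints true
    }
}
```

One Cyrillic letter turns into two unrelated Latin-1 characters, which is exactly the garbled text seen on the server side.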

Tomcat FAQ recommends creating a filter to set the request encoding.

Wicket has no such issue, because it fixes the request encoding to UTF-8, as described in "How to change the character encoding".

Unfortunately the Sling guys solved it differently (SLING-508): I have to put a hidden input named _charset_ with the value UTF-8 in all my forms. So they have adopted the ugly IE hack. :(
This is also described in the Sling documentation.
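To make the _charset_ idea concrete, here is a rough sketch of how a server could use such a field when decoding a URL-encoded form body. This is not Sling's actual code; the class name, the ISO-8859-1 fallback and the sample body are assumptions for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class CharsetFieldDemo {
    private static final String FIELD = "_charset_=";

    /** Parses an application/x-www-form-urlencoded body, honouring a _charset_ field. */
    static Map<String, String> parse(String body) {
        String[] pairs = body.split("&");
        String charset = "ISO-8859-1"; // assumed default when the field is missing
        for (String pair : pairs) {
            if (pair.startsWith(FIELD)) {
                // Charset names are plain ASCII, so any decoder can read this value
                charset = decode(pair.substring(FIELD.length()), "US-ASCII");
            }
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name, charset), decode(value, charset));
        }
        return params;
    }

    private static String decode(String text, String charset) {
        try {
            return URLDecoder.decode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }
}
```

With the body title=%D0%96&_charset_=UTF-8 the title comes out as the single Cyrillic letter that was typed; without the hidden field the same bytes are decoded as two Latin-1 characters.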

I wish Sling had a way to set this to UTF-8 in one place and get rid of it.

Sometimes web development is so frustrating.

Update: Mar 2nd
After I picked up this discussion on the Sling mailing list, the guys there decided after all to make this configurable; see SLING-1998.
Great! My forms now work without the _charset_ hack.

Monday, November 29, 2010

Sling gotcha

A content-driven application normally implements the basic CRUD operations. With Sling this should be relatively easy.
As part of the Read operation you normally have to implement some listing of available resources. If you are familiar with the JCR API, it would be natural to do it with this JavaScript code in your ESP page:

var iter = currentNode.getNodes();
while (iter.hasNext()) {
    var childNode = iter.nextNode();
    ...
}

Yes, but no! The result is a big fat exception

org.mozilla.javascript.EcmaError: TypeError: hasNext is not a function, it is org.mozilla.javascript.Undefined. (/apps/catalog/html.esp#11)


Hm! currentNode is a JCR Node so getNodes() should return a javax.jcr.NodeIterator which is a java.util.Iterator. So the code looks correct but somehow iter here is not a Java object at all.

I struggled with this for several hours. I was very puzzled to see similar code in the espblog sample working just fine. The only difference is that espblog uses QueryResult.getNodes(), which also returns a NodeIterator.

Finally I found the answer in this thread. It turns out Sling wraps Node.getNodes() to return a JavaScript object which has one property for each child node. Probably the idea was to easily iterate over that object with a for-each loop

for each (var childNode in currentNode.getNodes()) {
    ...
}

Another solution is to use the property name instead of the getter method

var iter = currentNode.nodes;
while (iter.hasNext()) {
    var childNode = iter.nextNode();
    ...
}

This seems to circumvent the wrapping done by Sling.

This example illustrates a cute feature of Rhino: JavaBeans properties of Java objects can be accessed like plain JavaScript properties, so nodes stands for the getNodes() getter.

Yes, server side JavaScript can be fun.

Monday, November 1, 2010

Dynamic Class Loading in OSGi

Sometimes you need to load an arbitrary class which you don't know in advance. In a normal Java application you would do that with Class.forName(String className). But in OSGi each bundle has its own class loader, so this works only if your bundle explicitly imports the package of the class you wish to load. OSGi services will not help either if the class is not exposed as a service, which is the more likely case.
It would be great if at run-time you could find the bundle exporting the package of the desired class and ask that bundle to load the class using its own class loader.
It turns out this is possible via the PackageAdmin service.

org.osgi.service.packageadmin.PackageAdmin packageAdmin;
...
// getExportedPackage() returns null if no bundle exports the package
Class<?> clazz = packageAdmin.getExportedPackage(packageName)
  .getExportingBundle().loadClass(className);

Here packageName is the package name and className is the fully qualified class name.
This way you can load any class from any package exported by any active bundle, and your bundle still remains independent of the bundle providing the class. This can be very useful when implementing generic functionality like object persistence.
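Since the snippet needs both names, here is a small sketch of how packageName could be derived from className. The helper and its class are hypothetical, not part of the OSGi API.

```java
final class PackageNames {
    /**
     * Returns the package part of a fully qualified class name,
     * or an empty string for a class in the default package.
     * Nested classes are expected in binary form, e.g. org.example.Outer$Inner.
     */
    static String packageOf(String className) {
        int lastDot = className.lastIndexOf('.');
        return lastDot < 0 ? "" : className.substring(0, lastDot);
    }

    public static void main(String[] args) {
        // org.example.model.Customer is a made-up class name
        System.out.println(packageOf("org.example.model.Customer")); // prints org.example.model
    }
}
```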

Wednesday, September 1, 2010

Ant & Xerces

Have you seen errors like this when parsing XML from Ant tasks?

org.xml.sax.SAXParseException: Invalid encoding name "Cp1252".
  at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
  at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
  at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:208)
  ...

The problem is the encoding specified in the XML declaration:
<?xml version="1.0" encoding="Cp1252"?>
This often appears in XML files generated by Java programs.

Bug 4665105 is filed for this issue but it is closed as "Not a defect".

It turns out that the same XML file is parsed fine by the JRE libraries, but Ant uses its own XML libraries, xml-apis.jar and xercesImpl.jar, found under ant/lib. If I delete these two files, XML parsing in Ant works fine.
Luckily this issue seems to be solved in the latest Ant 1.8.1. From the release notes:
* Ant no longer ships with Apache Xerces-J or the XML APIs but relies
on the Java runtime to provide a parser and matching API versions.

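A workaround.
When parsing the XML file, instead of
DocumentBuilder.parse(file)
use
DocumentBuilder.parse(new InputSource(new FileReader(file)))
Given a character stream, the parser reads characters directly from it and ignores the encoding named in the XML declaration. Note that FileReader decodes the file with the JVM's default charset, so the text comes out right only if the file was actually written in that charset.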
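The effect of the workaround can be tried without any file. Below is a minimal sketch using the JRE's built-in parser and a StringReader in place of the FileReader; the class name and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ReaderParseDemo {
    /** Parses XML from a character stream and returns the text of the root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The parser receives characters, not bytes, so it has nothing to decode
            // and the encoding named in the XML declaration is not looked up.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"Cp1252\"?><root>ok</root>";
        System.out.println(rootText(xml));
    }
}
```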

Friday, August 27, 2010

Backup on Linux

One of the few things for which I still have to switch to Windows is backup.
As I wrote before, for that job I have put together a small bat file that relies on robocopy.
Finally I came up with a similar shell script for Linux that does the same job using rsync.
Now the backup script on Linux reads the same list file as the backup script on Windows (my NTFS drives are mounted in Linux).
It looks like this:

#!/bin/sh

TARGET=/media/SAMSUNG/BACKUP
# $SYNC is used unquoted below on purpose, so it splits into the command and its options
SYNC='rsync -rptgovF --delete --delete-excluded'

# Backup Windows drives
LIST=/home/peter/Documents/Documents/Backup/backup.lst
# Strip CRs, turn backslashes into forward slashes, then handle one path per line
cat "$LIST" | fromdos | sed 's|\\|/|g' | while read -r line
do
    echo "BACKUP $line"
    case "$line" in
        C:*) $SYNC "/media/Windows 7${line#C:}/" "$TARGET${line#C:}";;
        D:*) $SYNC "/media/Data${line#D:}/" "$TARGET${line#D:}";;
    esac
done

# Backup Linux drive
echo BACKUP /home/peter
$SYNC /home/peter/ "$TARGET/home/peter"


The F option tells rsync to look for a file named .rsync-filter in each directory. If it is present, it specifies the files to exclude from that directory.
In typical UNIX fashion, several commands are chained via pipes.
fromdos converts the text file from Windows to UNIX format, i.e. strips the CR chars, as they would cause problems later on.
sed replaces all backslashes with forward slashes.
The ${line#C:} and ${line#D:} parameter expansions produce the path without the drive.
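For readers who are more at home in Java than in shell, here is a sketch of the same path mapping the pipeline performs for one line of the list file. The class and method names are made up, and the sample Windows path is hypothetical.

```java
final class BackupPathDemo {
    /**
     * Maps one line of backup.lst to the {source, target} pair passed to rsync,
     * or returns null for a drive the script does not handle.
     */
    static String[] map(String line, String target) {
        // fromdos strips the CR, sed turns backslashes into forward slashes
        String path = line.replace("\r", "").replace('\\', '/');
        String mount;
        if (path.startsWith("C:")) {
            mount = "/media/Windows 7";
        } else if (path.startsWith("D:")) {
            mount = "/media/Data";
        } else {
            return null;
        }
        String rest = path.substring(2); // same effect as ${line#C:} or ${line#D:}
        return new String[] { mount + rest + "/", target + rest };
    }

    public static void main(String[] args) {
        String[] pair = map("C:\\Users\\peter\r", "/media/SAMSUNG/BACKUP");
        System.out.println(pair[0]); // prints /media/Windows 7/Users/peter/
        System.out.println(pair[1]); // prints /media/SAMSUNG/BACKUP/Users/peter
    }
}
```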

UNIX shell scripting proved once again to be very powerful.

Monday, August 23, 2010

Sling

I have been checking out Sling recently. Basically it is yet another Java web framework, but this one has a rather interesting approach: it is content oriented. It is based on OSGi, JCR, REST, server side scripting, etc. The recently popular principle of convention over configuration is heavily used.
Found this brief intro to Sling.
Also this cheat sheet gives a condensed overview of the Sling way.

Although both projects Sling and Jackrabbit are hosted at Apache, they are led by a company called Day.
It is interesting that Roy T Fielding is chief scientist at Day.

Sunday, August 15, 2010

Cyrillic in Wine

As Windows is still much more popular than Linux, there are some Windows applications that still do not have a Linux version. In these cases WineHQ comes to the rescue. It allows installing and running Windows applications directly on Linux.
One such application that I use is AceMoney Lite. But I encountered one problem: the application does not properly show Cyrillic text, such as Bulgarian or Russian.
The solution that works for me is to set the LANG environment variable to Bulgarian.
After a Windows application is installed using Wine, it appears in the Applications > Wine > Programs menu, so it can be launched just like an ordinary Linux application. So I opened the properties of the AceMoney Lite launcher. The command there looked like this:
env WINEPREFIX="/home/peter/.wine" wine "C:\Program Files\AceMoney\AceMoney.exe" 
I inserted LANG=bg_BG.UTF-8 before wine and now Cyrillic works just fine in AceMoney Lite.

I found that the same trick also works for Picasa. The Linux version of Picasa actually bundles Wine and runs on it. There I prepended env LANG=bg_BG.UTF-8 to the command, so it looks something like this:
env LANG=bg_BG.UTF-8 /opt/google/picasa/3.0/bin/picasa

...

UPDATE 2 Feb 2012   
Recently this stopped working in Ubuntu 11.10.
After several attempts I found that if I use LC_ALL=bg_BG.UTF-8 instead of LANG=bg_BG.UTF-8, it works again.