June 22, 2006
When false==true : JVM bugs and (byte)code generation strategies
Update: the JRockit team gave outstanding service on this issue, and found and fixed the bug in their VM within 24 hours of it being reported. The code generated by AspectJ has also been tweaked so that it does not trigger the bug on VMs that do not have the patch.
We had a bug reported on the aspectj mailing list this week, in which a simple program returning a boolean value would return "true" when the code clearly said to return "false". Oh, and this only happens when running on the JRockit VM.
They say variety is the spice of life. This time last week I was just about to deliver a keynote at the SpringOne conference on the vision for the future directions of the Spring framework. One week on, and I've just finished an in-depth investigation into our bytecode generation strategies and their effect on the JRockit VM. A detailed write-up of the issue, my investigations into the situation, and the eventual resolution are documented in the AspectJ bug report that you can read here: 148007. The episode raises a number of interesting discussion points.
Firstly, it's interesting to see how different compilers generate code for the same or very similar Java programs. In the bug report you'll see the difference between javac and the JDT compiler when compiling a very simple method (the JDT compiler's output varies again if you don't pass the -inlineJSR flag). There are many valid strategies for compiling program instructions into bytecode.
Secondly, VMs can have bugs in them (and so can compilers!). It sounds so obvious when stated, and you should certainly always be suspicious of your own code when something is not behaving as expected, but many people don't realize just how sophisticated (and hence complex) a modern JVM is.
Often the bugs are in corner cases. For example, I found and fixed a bug in the AspectJ compiler earlier this week that needed the following setup: an abstract generic aspect with a type parameter or parameters that specify upper bounds, extended by an abstract generic sub-aspect that also specifies a type parameter with an upper bound and binds the super-aspect type parameter to it. In that situation the compiler could issue a message saying that the sub-aspect's type parameter did not satisfy the bounds of the super-aspect parameter, when in fact it did. The bug I'm discussing in this blog entry is significant because, by contrast, it's so near the mainline code path.
In essence, some perfectly legal and simple strategies for compiling a method that looks like this:
    private boolean invert() {
        try {
            return !isTrue();
        } finally {
            SomeType.doSomething();
        }
    }

    private boolean isTrue() {
        return true;
    }
cause the JRockit VM (1.4.2_08, as used in WLS 8; I haven't tried the 1.5 JRockit VMs) to return "true" from the invert method, when clearly it should return "false". This belongs to that horrible category of bugs that silently do the wrong thing (it would be much better in this case if the JVM simply crashed). Note that the set of circumstances needed to reproduce the bug is still relatively narrow: the method must return boolean rather than Boolean, the negation is needed, and the finally block (or some similar block) is also needed.
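The method shape above can be exercised with a small self-checking harness. This is only a sketch of the expected semantics: the class name InvertCheck and the doSomething() stand-in for SomeType.doSomething() are my own assumptions, and compiling this with javac will not necessarily produce the bytecode sequence that triggered the JRockit bug. It simply shows the kind of check that makes a silent wrong answer loud.

```java
// Sketch only: InvertCheck and doSomething() are hypothetical stand-ins,
// not code from the bug report.
final class InvertCheck {
    // Counts how often the finally block ran, so the harness can observe it.
    static int finallyRuns = 0;

    // Stand-in for SomeType.doSomething() from the example above.
    static void doSomething() {
        finallyRuns++;
    }

    private boolean isTrue() {
        return true;
    }

    // Same shape as the problem method: primitive boolean return,
    // a negation, and a finally block.
    boolean invert() {
        try {
            return !isTrue();
        } finally {
            doSomething();
        }
    }

    public static void main(String[] args) {
        boolean result = new InvertCheck().invert();
        if (result) {
            // Fail loudly instead of silently carrying on with a wrong value.
            throw new AssertionError("invert() returned true; expected false");
        }
        System.out.println("invert() = " + result);
    }
}
```

On a correct VM the harness prints the negated value and exits normally; on an affected VM running affected bytecode, the hope is that it would fail with the AssertionError rather than pass the wrong value along.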
The situation is not quite as simple as the code I've shown above, because the real bug report involved an aspect. So the program code was:
    // in some class
    private boolean invert() {
        return !isTrue();
    }

    private boolean isTrue() {
        return true;
    }

    // compiled with aspect
    public aspect AnAspect {
        after() : execution(* invert()) {}
    }
which the weaver translates into a program that looks like the first example I gave (note that I'm discussing implementation details here, not language semantics).
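To make that translation concrete, here is a hand-written plain Java approximation of the woven result. It is a sketch, not the weaver's actual output: the class name WovenSketch, the afterAdvice() method and the fail flag are all hypothetical. The one point it relies on is that plain after() advice (with no "returning" or "throwing" qualifier) runs whether the method returns normally or throws, which is why a try/finally shape is a natural fit.

```java
// Sketch only: every name here is hypothetical; this is not what the
// AspectJ weaver literally emits.
final class WovenSketch {
    // Counts advice executions so both exit paths can be observed.
    static int adviceRuns = 0;

    // Stand-in for the (empty) after() advice body in AnAspect.
    static void afterAdvice() {
        adviceRuns++;
    }

    // The fail flag is an assumption added to exercise the exceptional path.
    static boolean isTrue(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
        return true;
    }

    // The advised method: the original body sits in the try block and the
    // advice call sits in the finally block, so it runs on both exits.
    static boolean invert(boolean fail) {
        try {
            return !isTrue(fail);
        } finally {
            afterAdvice();
        }
    }

    public static void main(String[] args) {
        System.out.println(invert(false));          // normal return
        try {
            invert(true);                           // exceptional exit
        } catch (IllegalStateException expected) {
            System.out.println("threw");
        }
        System.out.println("advice runs: " + adviceRuns);
    }
}
```

Running main should show the advice counted once for the normal return and once for the exceptional exit.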
So to the third interesting point - and the key lesson for any of the many projects springing up that use class file transformers, asm, cglib, or any other bytecode transforming tool: you always want to generate code that looks as close as possible to the code that javac would generate in the same situation. Even if you are generating perfectly legal bytecodes, straying too far from the javac path can uncover bugs in VMs. This is what was happening in the case under discussion - AspectJ was generating a (perfectly legal) bytecode sequence that looked subtly different from what javac would produce, different enough to trigger what seems to be some kind of stack corruption bug in JRockit. (Remember that the same code works perfectly on every other VM we've tried.) If you're writing, say, an ORM tool, you need to know not just the SQL dialects and semantics of the various databases, but also all of the undocumented bugs and quirks that you need to work around. It's the same if you're doing any compilation or bytecode generation for the JVM - you need to know and work around all the little quirks and bugs in the various VMs. Coder beware!
So finally, let me finish up with a plug for AspectJ. If you need to advise types in some way, AspectJ offers a programming model with a much better level of abstraction than writing your own transformers at the bytecode level. You're not restricted to forcing users to work with AspectJ directly - we offer classloader and class file transformer integration too. You'll be more productive using the AspectJ programming model, and you'll benefit from a compiler/weaver that's widely used and that works around known VM bugs. (The bug report that prompted me to write this entry was very much an exception, and the AspectJ compiler now generates code that avoids the JRockit bug, which is good news for those using WebLogic Server.)
There's one more good reason for building on AspectJ - AspectJ has well-defined semantics for what happens when you weave types with one set of aspects, and then weave again with a second set. This mirrors, for example, the scenario where two class file transformers from different third parties are used in conjunction. If you don't build on top of something that offers this kind of semantic guarantee, what exactly is supposed to happen when your class file transformer is used in conjunction with someone else's? The semantics can only be "undefined" - which doesn't seem a good deal for users or for vendors...
Posted by adrian at 09:02 AM