Friday, January 13, 2012

Remoting Groovy with Generated Closures

When I set out to implement remote closure execution in Gmx, I envisioned simply generating a closure on the fly and passing it over the wire to a remote Gmx counterpart.
In my mind, it looked a bit like this very simple example, which would print "Hello Jupiter" to standard output on the target remote server:


This is the simplest version of the idea, and so serves as a good proof of concept.

Gmx already has a lot of the plumbing in place, namely:
  • The ReverseClassLoader service spins up an HTTP server to provide remote class loading to foreign MBeanServers and performs an acceptable job of locating and serving the bytecode (in the form of a byte[]) to requesting classloaders.
  • Gmx automatically detects when remoting is required, starts the ReverseClassLoader, and installs the remote counterpart (RemotableMBeanServer) on the foreign MBeanServer. (This means that some of the code in this example is redundant.)
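On the receiving side, the requesting classloader only has to turn a byte[] into a Class. The following is a minimal plain-Java sketch of that half, assuming the bytes have already been fetched (for example from something like the ReverseClassLoader) into an in-memory map keyed by binary class name. The class name BytesClassLoader and its addClass method are hypothetical and not part of Gmx.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not Gmx's implementation): a ClassLoader that defines
 * classes from bytecode it has been handed, keyed by binary name
 * (e.g. "ConsoleScript5$_run_closure1").
 */
final class BytesClassLoader extends ClassLoader {
    private final Map<String, byte[]> bytecode = new ConcurrentHashMap<>();

    BytesClassLoader(ClassLoader parent) {
        super(parent);
    }

    /** Registers the bytecode for a class so that findClass can define it later. */
    void addClass(String binaryName, byte[] classBytes) {
        bytecode.put(binaryName, classBytes.clone());
    }

    /** Called by loadClass only after the parent chain has failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = bytecode.get(name);
        if (bytes == null) {
            // In a remoting setup, this is where the bytes would be requested over the wire.
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }
}
```

Because loadClass delegates to the parent first, JDK and shared library classes still resolve normally; only classes the parent cannot see (such as closures generated on another JVM) fall through to findClass.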
The major barrier I ran into was discovering that it is quite difficult to acquire the bytecode (the class representation in the form of bytes) of a closure compiled on the fly. Excuse the imprecise term "on the fly"; what I mean is this: if a closure undergoes a formal compilation process, such as using groovyc or a CompilationUnit, then the class itself is accessible from the Java code source URL as a stream of bytes. However, if you define an inline closure in an inline script and execute it (as you might in GroovyConsole), the class appears to have a "fake" code source URL that cannot be read from. I am not that familiar with Groovy's internals, so this is best explained by way of an example:


The output of this script, when run in GroovyConsole, is as follows:

Class [Name:ClosureFactory$_getClosure_closure1] Bytes:1192 Interfaces: [interface org.codehaus.groovy.runtime.GeneratedClosure] 
Hello Venus 
Exception:/groovy/shell (The system cannot find the file specified) 
Class [Name:ConsoleScript5$_run_closure1] Bytes:[] Interfaces: [interface org.codehaus.groovy.runtime.GeneratedClosure] 
Hello Jupiter 


In lines 4..6, the example defines a class that creates a closure and returns it from a call to getClosure(). The class is compiled with a CompilationUnit, which is the equivalent of using groovyc. The closure's bytecode is then read using clozure.getClass().getProtectionDomain().getCodeSource().getLocation().getBytes(). In the first case, this succeeds. However, when the script defines an "on the fly" closure on line 16 (the closure itself works fine, as can be seen on line 26), the same method fails. Consistent with this, the bytecode for the "compiled" closure is written to disk: the URL returned by clozureClass.getProtectionDomain().getCodeSource().getLocation() is file:/home/nwhitehe/groovy/groovy-1.8.5/./ and the following can be seen in that directory:

-rw-r--r-- 1 nwhitehe nwhitehe 5893 2012-01-13 11:54 ClosureFactory.class 
-rw-r--r-- 1 nwhitehe nwhitehe 2454 2012-01-13 11:54 ClosureFactory$_getClosure_closure1.class 

For the "on the fly" closure, on the other hand, the script gets an exception when it attempts to read the bytes from the code source URL, which points to a file /groovy/shell that does not exist. The bytecode disappears off into the aether.
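A related way to fetch the bytes of a class that was compiled to disk or into a jar is to ask the class's own loader for the corresponding .class resource, rather than reading the code source location directly. The plain-Java sketch below (class and method names are hypothetical) does that and returns null when no .class resource backs the class, which is presumably the situation for a closure whose class was generated in memory by GroovyConsole.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads a class's bytecode from its .class resource, if one exists. */
final class ClassBytes {
    private ClassBytes() {}

    /** Maps "java.util.Map$Entry" to the absolute resource name "/java/util/Map$Entry.class". */
    static String resourceName(Class<?> clazz) {
        return "/" + clazz.getName().replace('.', '/') + ".class";
    }

    /** Returns the class file bytes, or null if no .class resource backs this class. */
    static byte[] read(Class<?> clazz) throws IOException {
        try (InputStream in = clazz.getResourceAsStream(resourceName(clazz))) {
            if (in == null) {
                return null; // e.g. a class defined from bytes that never reached disk
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Either route depends on the class file existing somewhere the loader can see, which is exactly what an on-the-fly closure lacks.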

There are a few hints on this challenge in a Jira ticket filed at the end of 2010, titled "Ability to get class bytes of closures at runtime, including nested closures (for remote control)". RemoteControl is a Groovy package for remoting Groovy closures, so it seemed like a good place to start, but its documentation [more concisely than I did above] alerts the reader to the same problem:
The remote execution mechanism works by sending the definition of the closure class to the server. It does this by finding the corresponding .class file for the closure on the class path. This means that there must be a .class file for the closure on the class path for the closure to be able to be remotely executed. Closures whose class has been generated dynamically at runtime are currently not supported.
The Jira issue also points to a script by Guillaume Laforge that outlines a method of finding nested closures, with a note that it might be useful in resolving the problem. However, the script uses a CompilationUnit to acquire the bytecode, so it was not clear to me whether that strategy would work for closures defined on the fly.

Aside from the Class/ProtectionDomain/CodeSource/Location.getBytes method of getting class bytecode, I also tried Javassist, which did not work either: the ClassPool returned a null CtClass when asked for the closure class.

The solution that ended up working was a ClassFileTransformer. This interface is defined in the java.lang.instrument package, and an instance of it can be registered with the JVM's Instrumentation instance. It has a single method:

byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException

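Accordingly, a ClassFileTransformer can provide the bytecode of a class: the classfileBuffer argument holds the class file bytes, and returning null tells the JVM to leave the class unchanged. For our purposes, it needs to:
  • Be registered before the class loads, or be invoked by a request to retransform the target class.
  • Filter for the targeted class and ignore all others.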
The following is a contrived example that retrieves the class bytecode of a flat closure (i.e. one with no nested closures). The Instrumentation instance is provided by a Gmx utility class called ByteCodeRepository (more on that later).


In short, the class file transformer captures the bytecode of the class it is configured to capture. Once the bytecode has been captured, the transformer is unregistered. To trigger a call to the transformer, the instrumentation instance requests a retransform of the closure's class. Acquiring an Instrumentation instance can be simplified, but the underlying mechanics are complicated: the JVM's Instrumentation instance is not automatically created or available. It is typically created when the JVM is started with the -javaagent startup option.
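The register, retransform, unregister cycle described above can be sketched in plain Java as follows. This is a hypothetical illustration, not the Gmx example itself: BytecodeCapture and CapturingTransformer are made-up names, and it assumes an Instrumentation instance from an agent whose manifest declares Can-Retransform-Classes: true (otherwise addTransformer(t, true) and retransformClasses throw UnsupportedOperationException).

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;

/** Hypothetical sketch: capture an already-loaded class's bytecode via a one-shot transformer. */
final class BytecodeCapture {
    private BytecodeCapture() {}

    /** Records the bytes of one target class and never modifies anything. */
    static final class CapturingTransformer implements ClassFileTransformer {
        private final String targetInternalName; // internal form, e.g. "a/b/Script1$_run_closure1"
        private volatile byte[] captured;

        CapturingTransformer(String targetInternalName) {
            this.targetInternalName = targetInternalName;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // Filter: className arrives in internal form; ignore everything but the target.
            // (A fuller version would also compare class loaders.)
            if (targetInternalName.equals(className)) {
                captured = classfileBuffer.clone();
            }
            return null; // null = leave the class unchanged
        }

        byte[] captured() {
            return captured;
        }
    }

    /** Registers a capturing transformer, forces a retransform of target, then unregisters. */
    static byte[] capture(Instrumentation inst, Class<?> target) throws UnmodifiableClassException {
        CapturingTransformer t = new CapturingTransformer(target.getName().replace('.', '/'));
        inst.addTransformer(t, true); // true: the transformer is retransformation-capable
        try {
            inst.retransformClasses(target); // makes the JVM call t.transform(...) for target
        } finally {
            inst.removeTransformer(t); // unregister as soon as the bytes are captured
        }
        return t.captured();
    }
}
```

Filtering on the class name rather than on classBeingRedefined keeps the same transformer usable for both cases in the list above: a class that has not loaded yet (where classBeingRedefined is null) and a class being retransformed.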

Alternatively, the Java Attach API can load a Java agent into a running JVM, which triggers the creation of an Instrumentation instance. This step is implemented by the static method getInstrumentation() of the Gmx utility class LocalAgentInstaller.

That's pretty much all that is required to get a reference to the Instrumentation instance, with some caveats:
  • The Attach API is only supported in Java 1.6+.
  • The Attach API is implemented inconsistently across JVMs. You may have trouble with IBM JVMs, for example.
  • The Attach API is contained in the JDK's tools.jar, so if you're using the JRE, you need to explicitly add tools.jar to the classpath.
Gmx uses a reflective wrapper around the Attach API to avoid sticky compile-time dependencies on tools.jar; the details can be seen in the eponymous classes in the org.helios.vm package.
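For illustration, here is a hypothetical plain-Java sketch of the two pieces involved: an agent class whose agentmain/premain hook stores the Instrumentation instance, and a reflective self-attach that avoids compiling against com.sun.tools.attach. It is not Gmx's LocalAgentInstaller. It assumes a separately built agent jar whose manifest names this class in Agent-Class and declares Can-Retransform-Classes: true, and that the agent class is loaded by the same loader as the caller so the static field is shared. On Java 9 and later, attaching to the current JVM additionally requires -Djdk.attach.allowAttachSelf=true.

```java
import java.lang.instrument.Instrumentation;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical sketch: agent entry points plus a reflective self-attach.
 * Assumed agent jar manifest entries:
 *   Agent-Class: SelfAttachAgent
 *   Can-Retransform-Classes: true
 */
final class SelfAttachAgent {
    private static volatile Instrumentation instrumentation;

    private SelfAttachAgent() {}

    /** Invoked by the JVM when the agent is loaded into a running VM via the Attach API. */
    public static void agentmain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    /** Invoked by the JVM when started with -javaagent. */
    public static void premain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    /** The RuntimeMXBean name is conventionally "pid@host" on HotSpot; the spec does not guarantee it. */
    static String currentPid() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        int at = name.indexOf('@');
        return at > 0 ? name.substring(0, at) : name;
    }

    /** Loads agentJarPath into this JVM via reflection, so there is no compile-time tools.jar dependency. */
    static Instrumentation attachSelf(String agentJarPath) throws Exception {
        if (instrumentation != null) {
            return instrumentation;
        }
        Class<?> vmClass = Class.forName("com.sun.tools.attach.VirtualMachine");
        Object vm = vmClass.getMethod("attach", String.class).invoke(null, currentPid());
        try {
            vmClass.getMethod("loadAgent", String.class).invoke(vm, agentJarPath);
        } finally {
            vmClass.getMethod("detach").invoke(vm);
        }
        return instrumentation; // populated by agentmain if the agent class was shared
    }
}
```

Class.forName fails with ClassNotFoundException when tools.jar (Java 6 to 8) or the jdk.attach module (Java 9+) is unavailable, which is the point at which a wrapper like Gmx's can degrade gracefully.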

The next piece of Gmx is the ByteCodeRepository class. It is a singleton and its functionality is as follows:
  • It is a ClassFileTransformer that targets classes that implement org.codehaus.groovy.runtime.GeneratedClosure. These are typically the closures we're looking for. In my parlance, GeneratedClosure means "compiled on the fly". See this gist for the transformer basics.
  • It is a caching repository and cross-indexer for closure classes, bytecode and class names (binary and resource).
  • It automatically loads an instrumentation instance, so one can ignore the LocalAgentInstaller when using the ByteCodeRepository.
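A rough plain-Java approximation of the first two points (filtering closure classes and cross-indexing them by binary and resource name) might look like the following. This is a hypothetical sketch, not the ByteCodeRepository source. Because a class has not been defined yet when the transformer first sees it, the sketch cannot ask it for its interfaces; instead it uses a crude heuristic that searches the class file bytes for the internal name of GeneratedClosure, which can produce false positives for classes that merely reference that type.

```java
import java.lang.instrument.ClassFileTransformer;
import java.nio.charset.StandardCharsets;
import java.security.ProtectionDomain;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: caches bytecode of Groovy closure classes, indexed by binary and resource name. */
final class ClosureBytecodeCache implements ClassFileTransformer {
    /** Internal name of the marker interface that generated Groovy closure classes implement. */
    private static final byte[] MARKER =
        "org/codehaus/groovy/runtime/GeneratedClosure".getBytes(StandardCharsets.UTF_8);

    private final Map<String, byte[]> bytecodeByName = new ConcurrentHashMap<>();
    private final Map<String, String> binaryNameByResource = new ConcurrentHashMap<>();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // Heuristic filter: look for the marker name in the raw class file (constant pool).
        if (className != null && contains(classfileBuffer, MARKER)) {
            String binaryName = className.replace('/', '.');
            bytecodeByName.put(binaryName, classfileBuffer.clone());
            binaryNameByResource.put(className + ".class", binaryName);
        }
        return null; // never modify the class
    }

    /** Lookup by binary name, e.g. "demo.Script1$_run_closure1". */
    byte[] getByBinaryName(String binaryName) {
        return bytecodeByName.get(binaryName);
    }

    /** Lookup by resource name, e.g. "demo/Script1$_run_closure1.class". */
    byte[] getByResourceName(String resourceName) {
        String binaryName = binaryNameByResource.get(resourceName);
        return binaryName == null ? null : bytecodeByName.get(binaryName);
    }

    /** Naive byte-array substring search. */
    private static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}
```

The resource-name index matters because a remote class loader asks for "a/b/C.class", while code that already holds a Class object naturally uses the binary name.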

Capturing a closure's bytecode is now simplified to:


There's one subtlety left [that will be discussed here] in the quest to get bytecode for remoting closures, and that's the rider on the Groovy Jira issue I mentioned, namely "including nested closures". Consider a closure like this one, which prints the declared method signatures of each class in an array:

def Class[] classes = .....;
classes.each() {
     it.getDeclaredMethods().each() {
          println it.toGenericString();
     }
}

That's a closure within a closure, and they're two different classes. The bytecode for the outer closure does not contain the bytecode for the inner one, which is why the earlier (flat) examples never exposed this problem. The class file transformer will still see the inner closure, and even though the name of the inner closure class is not as obvious (we could derive or guess it), here's why you don't need it:
  1. For the purposes of remoting, as long as the ByteCodeRepository is installed, it will have captured all closures and indexed them by class name. When the remote class loader gets a serialized closure to invoke, it will know which closure classes it needs and will request each one from the ReverseClassLoader, which in turn looks them up in the ByteCodeRepository.
  2. The following code prints (among other things that my optimizing editor has elided) the names of the loaded generated closures in this modified example. 


The output is:

Generated Closure: ClosureFactory3$_getClosure_closure1
Generated Closure: ClosureFactory3$_getClosure_closure1_closure2
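The output also hints at how the inner closure class is named: the outer closure's name with a further _closureN suffix. Assuming that convention holds (it is inferred from the output above, not from any Groovy specification), a hypothetical plain-Java helper could pick the nested closures of a given closure out of a repository's class names:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: finds closure classes nested (at any depth) inside a given closure class. */
final class NestedClosures {
    private NestedClosures() {}

    static List<String> nestedIn(String outerClosure, Collection<String> knownClasses) {
        // Assumed convention: Outer$_method_closure1 -> Outer$_method_closure1_closure2 (-> ..._closure3 ...)
        Pattern nested = Pattern.compile(Pattern.quote(outerClosure) + "(_closure\\d+)+");
        List<String> result = new ArrayList<>();
        for (String name : knownClasses) {
            if (nested.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Matching the whole name with a regular expression, rather than a plain startsWith, keeps a sibling such as ..._closure10 from being mistaken for a child of ..._closure1.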

All this nonsense was dedicated to getting all the bytecode necessary to remote closures; the remoting itself I have not addressed at all, but I will in the next post.

Cheers.


