Tuesday, January 17, 2012

GroovyMX: New Remote JVM Test Cases

I just completed a check-in to gmx with a small number of critical test cases.
I am attempting to make gmx a Groovy-oriented but Java-usable API, so for the most part, each logical test case has a Java and a Groovy version. Here's an example of a Groovy test:

    public void testRemoteClosureForMBeanCountAndDomains() throws Exception {
        def port = 18900;
        def gmx = null;
        def jvmProcess = null;
        try {
            // Launch a separate JVM with the JMX agent listening on the given port
            jvmProcess = JVMLauncher.newJVMLauncher().timeout(120000).basicPortJmx(port).start();
            gmx = Gmx.remote(jmxUrl(port));
            // Domains: remoted closure vs. plain remote JMX call
            def remoteDomains = gmx.exec({ return it.getDomains(); });
            def domains = gmx.getDomains();
            Assert.assertArrayEquals("Domain array", domains, remoteDomains);
            // MBean count: remoted closure vs. plain remote JMX call
            def remoteMBeanCount = gmx.exec({ return it.getMBeanCount(); });
            def mbeanCount = gmx.getMBeanCount();
            Assert.assertEquals("MBean Count", mbeanCount, remoteMBeanCount);
        } finally {
            // Best-effort cleanup: close the connection, then kill the launched JVM
            if (gmx != null) try { gmx.close(); } catch (Exception e) {}
            if (jvmProcess != null) try { jvmProcess.destroy(); } catch (Exception e) {}
        }
    }

In a nutshell, this test:
  1. Starts a new JVM with the JMX management agent enabled.
  2. Retrieves the MBeanServer's MBean domains using a remoted closure and then using a plain remote JMX call, and verifies they're equal.
  3. Retrieves the MBeanServer's MBean count using a remoted closure and then using a plain remote JMX call, and verifies they're equal.

The latest snapshot is available here:
Of course, these tests raised a bunch of issues, which I am behind on entering into GitHub. If you encounter any issues, please leave me some feedback.


Friday, January 13, 2012

Remoting Groovy with Generated Closures

On attempting to implement remote closure execution in Gmx, I envisioned simply generating a closure on the fly and passing it over the wire to a remote Gmx counterpart.
In my mind, it looked a bit like this very simple example that would print "Hello Jupiter" to the standard out on the target remote server:

This is the simplest version of the idea, and so serves as a good proof of concept.

Gmx already has a lot of the plumbing in place, namely:
  • The ReverseClassLoader service spins up an HTTP server to provide remote class loading to foreign MBeanServers, and does an acceptable job of locating and serving the bytecode (in the form of a byte[]) to requesting classloaders.
  • Gmx automatically detects when remoting will be required and starts the ReverseClassLoader and installs the remote counterpart (RemotableMBeanServer) on the foreign MBeanServer. (This means that some of the code in this example is redundant).
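To make the first bullet more concrete, here is a minimal Java sketch of the serving half of that idea. It assumes nothing about Gmx's actual ReverseClassLoader and is not its implementation. It uses the JDK's `com.sun.net.httpserver.HttpServer` to answer requests for resource names such as `com/example/Foo.class` with the registered byte[]. The class name `MiniClassServer` and its methods are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only: a tiny HTTP server that serves registered class bytes.
final class MiniClassServer {
    // Resource name such as "com/example/Foo.class" -> class file bytes
    private final Map<String, byte[]> classBytes = new ConcurrentHashMap<>();
    private final HttpServer server;

    MiniClassServer(int port) throws IOException {
        // Bind to loopback; port 0 lets the OS pick a free port
        server = HttpServer.create(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), port), 0);
        server.createContext("/", exchange -> {
            // "/com/example/Foo.class" -> "com/example/Foo.class"
            String resource = exchange.getRequestURI().getPath().substring(1);
            byte[] body = classBytes.get(resource);
            if (body == null) {
                exchange.sendResponseHeaders(404, -1); // -1 means no response body
            } else {
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
            exchange.close();
        });
    }

    void register(String resourceName, byte[] bytecode) { classBytes.put(resourceName, bytecode); }
    void start() { server.start(); }
    void stop() { server.stop(0); }
    int port() { return server.getAddress().getPort(); }
}
```

A remote JVM could then load classes from such a server with a plain `java.net.URLClassLoader` whose URL ends in `/` (for example `http://host:port/`). URLClassLoader treats a URL like that as a directory and requests the class's `.class` resource beneath it.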
The major barrier I ran into was that it is quite difficult to acquire the bytecode (the class representation in the form of bytes) of a closure compiled on the fly. Excuse the imprecise term "on the fly". What I mean is this: if a closure goes through a formal compilation process, such as groovyc or a CompilationUnit, then its class bytes can be read as a stream from the class's code source URL. However, if you define an inline closure in an inline script and execute it (as you might in GroovyConsole), the class appears to have a "fake" code source URL that cannot be read from. I am not that familiar with the internals of Groovy, so this is best explained with an example:

The output of this script, when run in GroovyConsole, is as follows:

    Class [Name:ClosureFactory$_getClosure_closure1] Bytes:1192 Interfaces: [interface org.codehaus.groovy.runtime.GeneratedClosure]
    Hello Venus
    Exception:/groovy/shell (The system cannot find the file specified)
    Class [Name:ConsoleScript5$_run_closure1] Bytes:[] Interfaces: [interface org.codehaus.groovy.runtime.GeneratedClosure]
    Hello Jupiter

In lines 4..6, the example defines a class that creates a closure and returns it from a call to getClosure(). The script compiles that class with a CompilationUnit, which is the equivalent of using groovyc. The closure's bytecode is then read using clozure.getClass().getProtectionDomain().getCodeSource().getLocation().getBytes(). In the first instance, this succeeds. However, when the script defines an "on the fly" closure on line 16, the same method fails, even though the closure itself works fine (as can be seen on line 26). Consistent with this, the bytecode for the "compiled" closure is written to disk. The value of the URL returned by clozureClass.getProtectionDomain().getCodeSource().getLocation() is file:/home/nwhitehe/groovy/groovy-1.8.5/./ and the following can be seen in that directory:

    -rw-r--r-- 1 nwhitehe nwhitehe 5893 2012-01-13 11:54 ClosureFactory.class
    -rw-r--r-- 1 nwhitehe nwhitehe 2454 2012-01-13 11:54 ClosureFactory$_getClosure_closure1.class

For the "on the fly" closure on the other hand, the script gets an exception when attempting to read the bytes from the code source URL pointing to a file /groovy/shell which does not exist. The bytecode disappears off into the aether.
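The same distinction is easy to reproduce in plain Java. The sketch below looks up a class's bytecode as a `.class` resource on the class path. That is a slightly different route from the code source URL used above, but it runs into the same limit: a class defined at runtime, such as the class generated for a lambda, has no class file to find, so the lookup returns null. The helper name `ClassBytes` is hypothetical, and `readAllBytes` needs Java 9+.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: read a class's bytecode from its ".class" resource, if one exists.
final class ClassBytes {
    private ClassBytes() {}

    /** Returns the class file bytes, or null when no class file exists (e.g. runtime-generated classes). */
    static byte[] fromClassPath(Class<?> clazz) throws IOException {
        // Binary name "a.b.Outer$Inner" -> absolute resource name "/a/b/Outer$Inner.class"
        String resource = "/" + clazz.getName().replace('.', '/') + ".class";
        try (InputStream in = clazz.getResourceAsStream(resource)) {
            return in == null ? null : in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        Runnable generated = () -> {};
        System.out.println("String.class bytes: " + fromClassPath(String.class).length);
        System.out.println("lambda class bytes: " + fromClassPath(generated.getClass()));
    }
}
```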

There are a few hints on this challenge in a Jira ticket filed at the end of 2010 titled "Ability to get class bytes of closures at runtime, including nested closures (for remote control)". RemoteControl is a Groovy package for closure remoting, so it seemed like a good place to start, but its documentation [more concisely than I did above] alerts the reader to the same problem:
The remote execution mechanism works by sending the definition of the closure class to the server. It does this by finding the corresponding .class file for the closure on the class path. This means that there must be a .class file for the closure on the class path for the closure to be able to be remotely executed. Closure's whose class has been generated dynamically at runtime are currently not supported.
The Jira issue also points to a script by Guillaume Laforge that outlines a method of finding nested closures with an advisory that this might be useful in resolving the problem, but the script uses a CompilationUnit to acquire the bytecode so it was not clear to me if that strategy would work.

Aside from using the Class/ProtectionDomain/CodeSource/Location.getBytes method to get class bytecode, I also tried Javassist, which did not work either: it returned a null CtClass when the class was requested from the ClassPool.

The solution that ended up working was a ClassFileTransformer. This interface is defined in the java.lang.instrument package, and an instance of it can be registered with the JVM's Instrumentation instance. It has a single method:

    byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException

Accordingly, a ClassFileTransformer can provide the bytecode of a class. For our purposes, it needs to:
  • Be registered before the class loads or be invoked by a request to retransform the target class.
  • Filter out every class except the one being targeted.
The following is a contrived example that retrieves the class bytecode of a flat (i.e. no nested closures) closure. The Instrumentation instance is provided by a Gmx utility class called ByteCodeRepository. (More on that later.)

In short, the class file transformer captures the bytecode of the class it is configured to capture. Once the bytecode has been captured, the transformer is unregistered. To trigger a call to the transformer, the Instrumentation instance is asked to retransform the closure's class. Acquiring an Instrumentation instance can be made simple, but the underlying mechanics are complicated. The JVM's Instrumentation instance is not automatically created or available; it is typically created when the JVM is started with the -javaagent startup option.
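As a rough illustration of that capture-then-unregister flow, here is a hypothetical one-shot transformer in Java. It is a sketch, not the Gmx code. It assumes an `Instrumentation` obtained from an agent whose manifest declares `Can-Retransform-Classes: true`; without that, `retransformClasses` fails. Note that `className` arrives in internal form (`a/b/C`), and returning null leaves the class unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;

// Hypothetical sketch: record the bytecode of one target class, then get out of the way.
final class BytecodeCapture implements ClassFileTransformer {
    private final String targetInternalName; // e.g. "ConsoleScript5$_run_closure1"
    private volatile byte[] captured;

    BytecodeCapture(String binaryName) {
        // Transformers see internal names with '/' instead of '.'
        this.targetInternalName = binaryName.replace('.', '/');
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // Ignore every class except the target
        if (targetInternalName.equals(className)) {
            captured = classfileBuffer.clone();
        }
        return null; // null = no transformation
    }

    byte[] captured() { return captured; }

    /** Registers a capture, forces a retransform of the target, then unregisters the capture. */
    static byte[] capture(Instrumentation inst, Class<?> target) throws UnmodifiableClassException {
        BytecodeCapture capture = new BytecodeCapture(target.getName());
        inst.addTransformer(capture, true); // true: also invoked on retransformation
        try {
            inst.retransformClasses(target);
        } finally {
            inst.removeTransformer(capture);
        }
        return capture.captured();
    }
}
```

One caveat: during retransformation, the bytes handed to the transformer form a valid class file for the class, but the JVM does not promise that they match the original .class file byte for byte.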

Alternatively, it is possible to use the Java Attach API to load a Java Agent into a running JVM which will trigger the creation of an instrumentation instance. This step is implemented by the Gmx utility class LocalAgentInstaller's static method getInstrumentation().

That's pretty much all that is required to get a reference to the instrumentation with some caveats:
  • The Attach API is only supported in Java 1.6+
  • The Attach API is variably implemented by different JVMs. You may have trouble with IBM JVMs, for example.
  • The Attach API is contained in the JDK's tools.jar so if you're using the JRE, you need to specifically add tools.jar to the classpath.
Gmx uses a reflective wrapper around the Attach API to avoid sticky compilation sores, but the details can be seen in the eponymous classes in the org.helios.vm package.
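In the spirit of that reflective wrapper, though not taken from the org.helios.vm classes, here is a Java sketch that calls `com.sun.tools.attach.VirtualMachine` only through reflection, so it compiles without tools.jar or the attach module on the compile path. The class name `ReflectiveAttach` is hypothetical, and the agent jar path passed to `loadAgent` is whatever agent you build. On Java 9+, attaching to your own process also requires `-Djdk.attach.allowAttachSelf=true`.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: use the Attach API without compiling against it.
final class ReflectiveAttach {
    private static final String VM_CLASS = "com.sun.tools.attach.VirtualMachine";
    private static final String DESCRIPTOR_CLASS = "com.sun.tools.attach.VirtualMachineDescriptor";

    private ReflectiveAttach() {}

    /** True when the Attach API classes can be loaded (a JDK, or a JRE with tools.jar added). */
    static boolean isAvailable() {
        try {
            Class.forName(VM_CLASS);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Ids (usually PIDs) of attachable local JVMs; empty if the Attach API is missing. */
    static List<String> listVmIds() throws ReflectiveOperationException {
        List<String> ids = new ArrayList<>();
        if (!isAvailable()) return ids;
        List<?> descriptors = (List<?>) Class.forName(VM_CLASS).getMethod("list").invoke(null);
        Method id = Class.forName(DESCRIPTOR_CLASS).getMethod("id");
        for (Object d : descriptors) {
            ids.add((String) id.invoke(d));
        }
        return ids;
    }

    /** Attaches to the given JVM, loads an agent jar (which creates an Instrumentation there), then detaches. */
    static void loadAgent(String vmId, String agentJarPath) throws ReflectiveOperationException {
        Class<?> vmClass = Class.forName(VM_CLASS);
        Object vm = vmClass.getMethod("attach", String.class).invoke(null, vmId);
        try {
            vmClass.getMethod("loadAgent", String.class).invoke(vm, agentJarPath);
        } finally {
            vmClass.getMethod("detach").invoke(vm);
        }
    }
}
```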

The next piece of Gmx is the ByteCodeRepository class. It is a singleton and its functionality is as follows:
  • It is a ClassFileTransformer that targets classes that implement org.codehaus.groovy.runtime.GeneratedClosure. These are typically the closures we're looking for. In my parlance, GeneratedClosure means "compiled on the fly". See this gist for the transformer basics.
  • It is a caching repository and cross-indexer for closure classes, bytecode and class names (binary and resource).
  • It automatically loads an instrumentation instance, so one can ignore the LocalAgentInstaller when using the ByteCodeRepository.
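The caching and cross-indexing bullet can be sketched in Java as follows. This is an assumption about how such an index might work, not the ByteCodeRepository source. Bytecode is stored once under the binary name and can be fetched by binary, internal or resource name. A by-name check for `org.codehaus.groovy.runtime.GeneratedClosure` avoids needing Groovy on the class path. `BytecodeIndex` is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a cross-indexed bytecode cache.
final class BytecodeIndex {
    private static final String GENERATED_CLOSURE = "org.codehaus.groovy.runtime.GeneratedClosure";
    private final Map<String, byte[]> byBinaryName = new ConcurrentHashMap<>();

    /** Normalizes "a.b.C", "a/b/C" or "a/b/C.class" to the binary name "a.b.C". */
    static String toBinaryName(String name) {
        String n = name.endsWith(".class") ? name.substring(0, name.length() - ".class".length()) : name;
        return n.replace('/', '.');
    }

    /** Any of the three forms -> resource name "a/b/C.class", as a class loader would request it. */
    static String toResourceName(String name) {
        return toBinaryName(name).replace('.', '/') + ".class";
    }

    /** Checks interfaces declared directly on the class or its superclasses, by name only. */
    static boolean isGeneratedClosure(Class<?> clazz) {
        for (Class<?> k = clazz; k != null; k = k.getSuperclass()) {
            for (Class<?> i : k.getInterfaces()) {
                if (GENERATED_CLOSURE.equals(i.getName())) return true;
            }
        }
        return false;
    }

    void put(String anyName, byte[] bytecode) { byBinaryName.put(toBinaryName(anyName), bytecode); }

    byte[] get(String anyName) { return byBinaryName.get(toBinaryName(anyName)); }
}
```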

Capturing a closure's bytecode is now simplified to:

There's one subtlety left [that will be discussed here] in the quest to get bytecode for remoting closures, and that's the rider on the Groovy Jira issue I mentioned, namely "including nested closures". Consider a closure like this, which prints the declared method signatures for each class in an array:

    Class[] classes = .....;
    classes.each() {
        it.getDeclaredMethods().each() {
            println it.toGenericString();
        }
    }

That's a closure within a closure, and they're two different classes. The bytecode for the outer closure does not contain the bytecode for the inner, so the earlier examples have hidden this problem. The class file transformer will still see the inner closure, and even though the name of the inner closure class is not as obvious (we could derive or guess it), here's why you don't need it:
  1. For the purposes of remoting, so long as the ByteCodeRepository is installed, it will have captured all closures and indexed them by class name. When the remote class loader gets a serialized closure to invoke, it will request each class it needs from the ReverseClassLoader, which in turn looks them up in the ByteCodeRepository.
  2. The following code prints (among other things that my optimizing editor has elided) the names of the loaded generated closures in this modified example. 

The output is:

    Generated Closure: ClosureFactory3$_getClosure_closure1
    Generated Closure: ClosureFactory3$_getClosure_closure1_closure2

All this nonsense was dedicated to getting all the bytecode necessary to remote closures, which I have not addressed at all, but I will in the next post.


Friday, January 06, 2012

GroovyMX: A Groovy JMX Client

I bootstrapped a new project on GitHub called GroovyMX (or just gmx). It is a monitoring-oriented API, but you may find it useful for various things. In the spirit of Groovy SQL, I am attempting to provide a JMX client API that is rich in functionality, terse in code and that extends the natural abilities of the native Java client.

This is a quick example which demonstrates how to connect to a remote MBeanServer and list the committed memory in bytes of each of the JVM's Memory Pools:

The output of this script is:

    java.lang:type=MemoryPool,name=PS Eden Space: 402653184
    java.lang:type=MemoryPool,name=PS Survivor Space: 16777216
    java.lang:type=MemoryPool,name=Code Cache: 3407872
    java.lang:type=MemoryPool,name=PS Perm Gen: 84738048
    java.lang:type=MemoryPool,name=PS Old Gen: 268435456

Briefly, what this code is doing:
  1. This is usually the only import you will need.
  2. The Gmx class represents an MBeanServer, or more specifically, an MBeanServerConnection. There are a few ways to acquire one, depending on the situation, but in this case, I am connecting to a remote MBeanServer using its JMXServiceURL.
  3. The mbeans method has several overloads. In this case, I am providing an ObjectName pattern that matches all of the JVM's MemoryPool MXBeans, and a closure which executes for each matching ObjectName.
  4. The closure is passed an instance of a MetaMBean, which is notionally a proxy that combines the ObjectName of the MBean it represents with an MBeanServerConnection to the MBeanServer where the MBean is registered (the Gmx). As far as possible, MetaMBeans act just like regular Pojos (or Pogos), so MBean attributes are accessed as simple properties and MBean operations are invoked like regular methods.
  5. The MetaMBean also has various local properties, which are accessed the same way as properties on a Pogo. An example of this is objectName in line 4.
  6. The MemoryPool MBeans publish an attribute called Usage, which is an instance of CompositeData. Fortunately, Groovy allows composite sub-values to be referenced with simple dot notation, so the expression it.Usage.committed retrieves the value keyed by committed from the Usage composite structure.
  7. Keen observers might wonder why Usage is cased that way. This is because the Groovy property specifier Usage is directly mapped to the MemoryPool MBean attribute Usage. Since it is perfectly legal for an MBean to have two separate attributes Foo and foo, the MetaMBean only honors the correctly specified case.
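For comparison, here is roughly what the same query takes with the plain Java JMX client, which Gmx builds on. It is a sketch. It runs against the local platform MBeanServer so that it is self-contained, but any `MBeanServerConnection` can be passed in, such as one from `JMXConnectorFactory.connect(url).getMBeanServerConnection()`. `MemoryPoolCommitted` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical sketch: committed bytes per memory pool using only javax.management.
final class MemoryPoolCommitted {
    private MemoryPoolCommitted() {}

    /** ObjectName -> committed bytes for every MemoryPool MXBean visible through the connection. */
    static Map<String, Long> committedByPool(MBeanServerConnection conn) throws Exception {
        Map<String, Long> result = new TreeMap<>();
        for (ObjectName name : conn.queryNames(new ObjectName("java.lang:type=MemoryPool,*"), null)) {
            // "Usage" is exposed as CompositeData; "committed" is one of its keys
            CompositeData usage = (CompositeData) conn.getAttribute(name, "Usage");
            if (usage != null) { // a pool that is no longer valid reports null usage
                result.put(name.toString(), (Long) usage.get("committed"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        committedByPool(ManagementFactory.getPlatformMBeanServer())
            .forEach((name, committed) -> System.out.println(name + ": " + committed));
    }
}
```

Even in this small case, the Groovy version saves the explicit query, the cast to CompositeData and the string-keyed lookup.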
One of the tricky parts of making MetaMBean behave like a regular Pojo is that MBean operation invocations require the exact signature of the operation to be passed as an argument, and JMX can be quite fussy in this regard. Unlike regular Java and Groovy, there is no implicit [un/]boxing or inheritance easing done on your behalf. Consider these signature pairs:

    void foo(int)                void foo(Integer)
    void log(CharSequence)       void log(String)

If these were MBean operations, each pair would represent two different signatures. So the tricky part is inspecting the MetaMBean invocation name and arguments, then executing a multidimensional pattern match against the MBeanOperationInfos provided by the MBean.
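Here is a minimal Java sketch of the first step of such a match. It assumes nothing about how Gmx ranks candidates. It keeps only the operations whose name and arity fit and whose declared parameter types accept the runtime arguments, treating a primitive type name as accepting its wrapper. With the pairs above, an Integer argument matches both `foo(int)` and `foo(java.lang.Integer)`, which is exactly why a tie-breaking step is still needed. `OperationMatcher` is hypothetical, and `Map.of` needs Java 9+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanParameterInfo;

// Hypothetical sketch: which operations could an invocation (name + runtime args) target?
final class OperationMatcher {
    private static final Map<String, Class<?>> PRIMITIVE_BOXES = Map.of(
        "boolean", Boolean.class, "byte", Byte.class, "char", Character.class, "short", Short.class,
        "int", Integer.class, "long", Long.class, "float", Float.class, "double", Double.class);

    private OperationMatcher() {}

    static List<MBeanOperationInfo> candidates(MBeanOperationInfo[] ops, String name, Object... args) {
        List<MBeanOperationInfo> matches = new ArrayList<>();
        for (MBeanOperationInfo op : ops) {
            MBeanParameterInfo[] sig = op.getSignature();
            if (!op.getName().equals(name) || sig.length != args.length) continue;
            boolean ok = true;
            for (int i = 0; i < sig.length && ok; i++) {
                ok = accepts(sig[i].getType(), args[i]);
            }
            if (ok) matches.add(op);
        }
        return matches;
    }

    /** Whether a parameter declared as typeName can receive arg (boxing and subtyping allowed). */
    private static boolean accepts(String typeName, Object arg) {
        Class<?> box = PRIMITIVE_BOXES.get(typeName);
        if (box != null) return box.isInstance(arg); // a primitive parameter cannot take null
        if (arg == null) return true;
        try {
            return Class.forName(typeName, false, OperationMatcher.class.getClassLoader()).isInstance(arg);
        } catch (ClassNotFoundException e) {
            return false; // type not visible locally, so it cannot be checked
        }
    }
}
```

Once a single operation has been chosen, the exact signature that `MBeanServerConnection.invoke` expects is the array of its parameters' `getType()` strings.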

Gmx is also integrated with the Java Attach API so another way of acquiring a Gmx instance is to specify the PID of the JVM you want to attach to. The following example illustrates a script that discovers all (Attach API compatible) JVMs on the local machine and then prints the MBeanServerID attribute for each from the MBeanServerDelegate MBean.

The output of this script is:


Not super interesting, but I think the brevity is great.
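Once a connection to each JVM exists, reading the attribute itself takes a single call in plain Java. This sketch skips the Attach API step and only shows the read against an arbitrary `MBeanServerConnection`. `MBeanServerDelegate.DELEGATE_NAME` is the standard `JMImplementation:type=MBeanServerDelegate` ObjectName, and `ServerIdReader` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerDelegate;

// Hypothetical sketch: read MBeanServerId from the delegate MBean.
final class ServerIdReader {
    private ServerIdReader() {}

    /** Reads the MBeanServerId attribute of the MBeanServerDelegate through any connection. */
    static String serverId(MBeanServerConnection conn) throws Exception {
        return (String) conn.getAttribute(MBeanServerDelegate.DELEGATE_NAME, "MBeanServerId");
    }

    public static void main(String[] args) throws Exception {
        // Local platform MBeanServer for illustration; a remote connection works the same way
        System.out.println(serverId(ManagementFactory.getPlatformMBeanServer()));
    }
}
```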
This last example demonstrates some of the powerful optimizations of the Groovy remoting implemented by Gmx. To contrive an example, consider determining the total number of thread blocks across all threads in a JVM. Rather than retrieving every ThreadInfo from the ThreadMXBean and computing the total locally, I install the Gmx remoting on the remote MBeanServer and pass a script to perform the computation on the remote JVM and then return the result.
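The computation itself is small. Here is a hypothetical Java version of what such a script would evaluate inside the target JVM: it sums `ThreadInfo.getBlockedCount()` over all live threads, so only a single long has to travel back. The same method would also work on a proxy from `ManagementFactory.newPlatformMXBeanProxy(conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class)`. However, every ThreadInfo would then cross the wire, which is the cost that remote execution avoids. `BlockCount` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch of the computation a remoted script would perform.
final class BlockCount {
    private BlockCount() {}

    /** Total number of times all live threads have blocked to enter or re-enter a monitor. */
    static long totalBlockedCount(ThreadMXBean threads) {
        long total = 0;
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // null when a thread died between the two calls
                total += info.getBlockedCount();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalBlockedCount(ManagementFactory.getThreadMXBean()));
    }
}
```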

Passing a raw script is a bit awkward, so I'm working on implementing seamless remote closure invocation (see Issue #15). Native closures are now supported. See here and here.

There are several more features which are complete, defective, in the roadmap/documentation or just in my imagination, but if you are interested, please check out the Gmx GitHub Site's Source Code, Issues and Wiki. I just started this recently, so it's not ready for a release, but you can download a snapshot from my Cloudbees snapshot repository.

The dependencies can be viewed in the Gmx Maven Pom.

There are some additional examples in the project unit tests:
If you have any feedback, please drop a comment on this blog.