Tuesday, April 05, 2016

A trial port from Netty 3 to Netty 4.1, Part 3 of ?

Fixed another stupid mistake in the pipeline factory. Noticed this while testing data submission with Bosun/scollector, which submits data to OpenTSDB as gzipped JSON HTTP POSTs. Of course, the HttpContentDecompressor must be called after the HTTP decoding codec, because the HTTP wrapper is not compressed, just the body of the request. The corrected pipeline is:
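A rough sketch of what the corrected ordering looks like (the ChannelInitializer shape and handler names here are illustrative, not the actual OpenTSDB PipelineFactory code):

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpContentDecompressor;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpServerCodec;

public class HttpPipelineInitializer extends ChannelInitializer<SocketChannel> {
  @Override
  protected void initChannel(SocketChannel ch) {
    ChannelPipeline p = ch.pipeline();
    // Decode the (uncompressed) HTTP framing first...
    p.addLast("httpCodec", new HttpServerCodec());
    // ...then gunzip the body, which is the only compressed part.
    p.addLast("decompressor", new HttpContentDecompressor());
    p.addLast("aggregator", new HttpObjectAggregator(1048576));
    // application handlers follow...
  }
}
```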

Another issue popped up related to Netty 4's HttpResponseStatus. As mentioned before, many Netty 4 classes have been de-getter-ized: immutable attributes no longer have a getX method, so what was formerly getStatus() is now status(). Fair enough. But beware.... Many of these classes do not have explicit Jackson [JSON] serializers, since Jackson has typically figured it out on the fly using..... yup. BeanInfo. And since HttpResponseStatus no longer has a bean-like signature, Jackson gripes like this:

No serializer found for class io.netty.handler.codec.http.HttpResponseStatus and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) ) (through reference chain: java.util.HashMap["httpResponse"])

Not a big deal. Just need to hunt all these cases down and define explicit serializers for each.
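A sketch of what one of those explicit serializers might look like (the emitted field names are my own choice, not anything OpenTSDB mandates):

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;
import io.netty.handler.codec.http.HttpResponseStatus;
import java.io.IOException;

public class HttpResponseStatusSerializer extends JsonSerializer<HttpResponseStatus> {
  @Override
  public void serialize(HttpResponseStatus status, JsonGenerator gen,
      SerializerProvider provider) throws IOException {
    // Emit the pieces Jackson used to discover via BeanInfo getters.
    gen.writeStartObject();
    gen.writeNumberField("code", status.code());
    gen.writeStringField("reasonPhrase", status.reasonPhrase());
    gen.writeEndObject();
  }

  // Registration on the shared ObjectMapper:
  public static void register(ObjectMapper mapper) {
    SimpleModule module = new SimpleModule();
    module.addSerializer(HttpResponseStatus.class, new HttpResponseStatusSerializer());
    mapper.registerModule(module);
  }
}
```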

Anyways, both Telnet and JSON data import is now working through the PutDataPointRpc.

Also found a number of instances where I had ByteBuf.retain()ed the incoming buffer to prevent it from being reclaimed before the pipeline was done with it. Well, in these cases, the buffer also needs to be ByteBuf.release()d. This error tipped me off to that:

2016-04-05 06:21:37,447 ERROR [EpollServerWorkerThread#1] ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
WARNING: 4 leak records were discarded because the leak record count is limited to 4. Use system property io.netty.leakDetection.maxRecords to increase the limit.

Recent access records: 5.
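The pattern that was leaking can be sketched like this (a hypothetical handler, not actual OpenTSDB code): every retain() must eventually be balanced by a release(), or the leak detector complains.

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

public class HoldingHandler extends ChannelInboundHandlerAdapter {
  private ByteBuf held;

  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    ByteBuf buf = (ByteBuf) msg;
    held = buf.retain();       // +1: keep a reference for later use
    ctx.fireChannelRead(buf);  // pass the original reference downstream
  }

  @Override
  public void channelInactive(ChannelHandlerContext ctx) {
    ReferenceCountUtil.release(held);  // -1: balance our retain()
    ctx.fireChannelInactive();
  }
}
```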

The leak detection makes it easy to find these cases and I've closed out all the ones I've come across so far. Netty 4 reports some fairly detailed stats on pooled buffers, and since I think Direct Pooled buffers will be a major improvement for OpenTSDB, I'm going to surface these through the stats collector.

Sunday, April 03, 2016

A trial port from Netty 3 to Netty 4.1, Part 2 of ?

Finally got the UI up and running. 

I was actually surprised to see it since I have not seen the new look before....

2016-04-03 15:49:50,254 INFO  [EpollServerWorkerThread#4] RpcHandler: Handling message [AggregatedFullHttpRequest]
2016-04-03 15:49:50,262 INFO  [EpollServerWorkerThread#4] HttpQuery: [id: 0x4cc79a07, L:/ - R:/] HTTP /aggregators done in 7ms
2016-04-03 15:49:50,282 INFO  [EpollServerWorkerThread#4] RpcHandler: Handling message [AggregatedFullHttpRequest]
2016-04-03 15:49:50,284 INFO  [EpollServerWorkerThread#4] HttpQuery: [id: 0x4cc79a07, L:/ - R:/] HTTP /s/gwt/opentsdb/images/corner.png done in 1ms
2016-04-03 15:49:50,285 INFO  [EpollServerWorkerThread#2] RpcHandler: Handling message [AggregatedFullHttpRequest]
2016-04-03 15:49:50,285 INFO  [EpollServerWorkerThread#3] RpcHandler: Handling message [AggregatedFullHttpRequest]
2016-04-03 15:49:50,286 INFO  [EpollServerWorkerThread#2] HttpQuery: [id: 0xabbb13cb, L:/ - R:/] HTTP /s/gwt/opentsdb/images/hborder.png done in 1ms

2016-04-03 15:49:50,287 INFO  [EpollServerWorkerThread#3] HttpQuery: [id: 0x4857bc87, L:/ - R:/] HTTP /s/gwt/opentsdb/images/vborder.png done in 2ms

I had a bit of trouble getting the static content server to work properly. Not sure what's going on, but after the first serve, all remaining requests seemed to stall and the browser reported no content delivered. It seemed to have something to do with the use of FileRegions, so I tried swapping the OpenTSDB file serving code with Netty's HttpStaticFileServerHandler, but I saw the same issues.

Eventually, I decided to bypass the problem by just loading the content up into ByteBufs and writing the bufs out. Not optimal, but I'll come back around to it soon.
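A rough sketch of that workaround (path handling and content-type logic are simplified placeholders, not the actual OpenTSDB code):

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpUtil;
import io.netty.handler.codec.http.HttpVersion;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StaticContent {
  // Read the whole file into a ByteBuf and send a normal full response,
  // instead of zero-copy FileRegion transfer.
  static void serve(ChannelHandlerContext ctx, String path) throws Exception {
    byte[] bytes = Files.readAllBytes(Paths.get(path));
    ByteBuf buf = ctx.alloc().buffer(bytes.length).writeBytes(bytes);
    FullHttpResponse response = new DefaultFullHttpResponse(
        HttpVersion.HTTP_1_1, HttpResponseStatus.OK, buf);
    HttpUtil.setContentLength(response, buf.readableBytes());
    ctx.writeAndFlush(response);
  }
}
```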

Here's some more conversion steps:

A lot of these:

  • query.method().getName()  ---> query.method().name()
  • channel.isConnected() ---> channel.isOpen()
  • HttpHeaders.Names.CONTENT_TYPE  --->  HttpHeaderNames.CONTENT_TYPE
  • A new class, HttpUtil, handles the setting of some headers, but not all. For example, this seems to be the most terse way of setting the content length in a response: HttpUtil.setContentLength(response, buf.readableBytes());
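Put together, a typical header conversion ends up looking something like this (response and buf are assumed to be a FullHttpResponse and its content ByteBuf):

```java
import io.netty.buffer.ByteBuf;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpUtil;

class ResponseHeaders {
  static void setJsonHeaders(FullHttpResponse response, ByteBuf buf) {
    // Netty 3: response.setHeader(HttpHeaders.Names.CONTENT_TYPE, ...)
    response.headers().set(HttpHeaderNames.CONTENT_TYPE, "application/json");
    // No direct setter on the response itself; HttpUtil is the terse route.
    HttpUtil.setContentLength(response, buf.readableBytes());
  }
}
```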

Most Netty 4.1 sample code I have read uses ChannelHandlerContext.write(...) and ChannelHandlerContext.writeAndFlush(...) rather than the same methods on Channel. The difference, as far as I can tell, is that Channel.write(...) starts the write at the tail of the pipeline, while ChannelHandlerContext.write(...) starts from the current handler's position, so the context version can skip handlers that don't need to see the write. I converted the OpenTSDB HttpQuery constructs to use ChannelHandlerContext.

The OpenTSDB PipelineFactory continues to need tweaking. Following up on the changes in Netty's HTTP codecs, specifically the new breakout of HttpRequest into separate parts, I am no longer sure it makes sense for OpenTSDB not to always have an HTTP object aggregator. Right now, OpenTSDB does not accept chunked requests unless the default is overridden. The problem is, Netty 4 assumes that either you or an HttpObjectAggregator handler will piece the full request back together, so if there are no ill side effects, I propose request aggregation become a permanent fixture. The HTTP switch now looks like this:
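The shape of the HTTP branch, roughly (handler names and the max request size constant are my own placeholders):

```java
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelPipeline;
import io.netty.handler.codec.http.HttpContentDecompressor;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpServerCodec;

class HttpSwitch {
  static final int MAX_REQUEST_SIZE = 1024 * 1024;  // illustrative limit

  static void switchToHttp(ChannelPipeline p, ChannelHandler rpcHandler) {
    p.addLast("httpCodec", new HttpServerCodec());
    p.addLast("decompressor", new HttpContentDecompressor());
    // Aggregation always on: downstream handlers only ever see a
    // fully reassembled FullHttpRequest, never individual chunks.
    p.addLast("aggregator", new HttpObjectAggregator(MAX_REQUEST_SIZE));
    p.addLast("handler", rpcHandler);
  }
}
```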

Next up is testing all the RPCs.

Saturday, April 02, 2016

A trial port from Netty 3 to Netty 4.1, Part 1 of ?

I did it! First boot up of OpenTSDB using Netty 4.1.

2016-04-02 13:40:14,919 INFO  [EpollServerBossThread#1] TSDTCPServer: [id: 0x4a16e893] REGISTERED
2016-04-02 13:40:14,920 INFO  [EpollServerBossThread#1] TSDTCPServer: [id: 0x4a16e893] BIND: /
2016-04-02 13:40:14,921 INFO  [main] TSDTCPServer: Started [EpollServerSocketChannel] TCP server listening on [/]

Wasn't too bad. Is it stable? No chance. Still working on it. What follows is a summary of the porting process so far.

  • Search and replace "import org.jboss.netty" to "import io.netty" on the whole code base.
That's it ....  ha ha. If only ....
  • Search and replace "ChannelBuffer" with "ByteBuf" on the whole code base.
  • Many of the HTTP-related Netty bits have had their getter methods un-gettified. As I understand it, this is to provide a purer naming convention where only attributes that support a set have a get. So, for example, getName() becomes name(). I made this change a lot: query.method().getName() to query.method().name().
  • There's no ChannelBuffers utility class in Netty 4, so wherever OpenTSDB allocated an ad-hoc buffer, I delegated the call to a simple ByteBuf factory. The buffer options in Netty 4 are much more extensive than in 3, so I am not sure what to do with the factory, but for now it exists as a functional placeholder. The idea is to take advantage of direct and pooled buffers in OpenTSDB so as to reduce heap space utilization and GC pressure.
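Such a placeholder factory might look something like this (the class name and method set are my own sketch, under the assumption that pooled allocation is the eventual goal):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.Unpooled;

// Stand-in for Netty 3's ChannelBuffers utility class. For now it just
// centralizes allocation; later it can switch between pooled, unpooled,
// heap, and direct buffers via configuration.
public final class BufferFactory {
  private static final ByteBufAllocator ALLOC = PooledByteBufAllocator.DEFAULT;

  public static ByteBuf buffer(int initialCapacity) {
    return ALLOC.buffer(initialCapacity);   // pooled, heap or direct per defaults
  }

  public static ByteBuf wrapped(byte[] bytes) {
    return Unpooled.wrappedBuffer(bytes);   // zero-copy wrap of an existing array
  }

  private BufferFactory() {}
}
```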
  • Netty 4.1 enforces the channel handler shareability now, so I needed to mark some of the modified handlers as @ChannelHandler.Sharable. Otherwise, you get this:
io.netty.channel.ChannelPipelineException: net.opentsdb.tsd.ConnectionManager is not a @Sharable handler, so can't be added or removed multiple times.

  • So far, in OpenTSDB that means ConnectionManager, RPCHandler, DetectHttpOrRpc,  and PipelineFactory. There's probably a few more I haven't hit yet. Oh, and in some cases, even when I applied the annotation, I got:
Caused by: java.lang.IllegalStateException: @Sharable annotation is not allowed

  • But that was a stupid mistake. DetectHttpOrRpc was a Netty 3 FrameDecoder. I switched it to a ByteToMessageDecoder (which does not tolerate @Sharable!), but the handler does not actually do any decoding, so the more appropriate replacement was SimpleChannelInboundHandler.
  • Pooled buffers come with some baggage. It's not Louis Vuitton, but it should fit comfortably under the seat in front of you. One bit of baggage is this:

14:24:10,729 WARN  [EpollServerWorkerThread#1] DefaultChannelPipeline: An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
  • The issue there was twofold. The Netty 3 version of DetectHttpOrRpc basically looks at the first byte of the incoming buffer to determine if the payload is HTTP or text (Telnet). (Yes, it should probably be called DetectHttpOrTelnet.) It makes no relative reads from the buffer, so no bytes are consumed, and when it's done it returns [a copy of] the full buffer, which in Netty 3 FrameDecoder parlance means the buffer was passed to the next handler in the pipeline. However, in Netty 4, once you handle a pooled ByteBuf, you're either done with it, in which case it gets reclaimed by the pool, or you must ByteBuf.retain() it, meaning that it should not be reclaimed and it will be passed as-is to the next handler in the pipeline. Phrase to live by..... "retain it or lose it".
  • The other issue was.... Netty 4 handlers don't typically return stuff. They used to in Netty 3, but the pipelines are now less hole-ey. In short, to send the (retained) ByteBuf to the next handler in this case, you call ChannelHandlerContext.fireChannelRead(byteBuf).
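Putting those two bullets together, the reworked detector might be shaped roughly like this (the class body and the HTTP-detection heuristic are illustrative placeholders, not the actual OpenTSDB code):

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandler.Sharable;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

// Stateless, so it can be @Sharable. It only peeks at the first byte
// with an absolute read (nothing is consumed), and the buffer must be
// retain()ed before being fired downstream, because
// SimpleChannelInboundHandler releases the message after channelRead0.
@Sharable
public class DetectHttpOrTelnet extends SimpleChannelInboundHandler<ByteBuf> {
  @Override
  protected void channelRead0(ChannelHandlerContext ctx, ByteBuf buf) {
    byte firstByte = buf.getByte(buf.readerIndex());  // absolute: consumes nothing
    if (looksLikeHttp(firstByte)) {
      // ... install the HTTP handlers in the pipeline ...
    } else {
      // ... install the telnet handlers ...
    }
    ctx.fireChannelRead(buf.retain());  // "retain it or lose it"
  }

  private boolean looksLikeHttp(byte b) {
    return b >= 'A' && b <= 'Z';  // placeholder heuristic
  }
}
```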
  • Right now I am trying to figure out what to do with the HTTP request handling. Netty 4 breaks up the old HttpRequest (iface) and DefaultHttpRequest (impl) into a vast hierarchy of types that signify different things depending on when you get them. I think there's an easy way to do this, and then there's the super optimized way of doing it, but it's possible that neither of those is true.
  • One last thing.... I initially removed all references to Netty 3 from my dev classpath (Eclipse) so that outdated references would be highlighted and to help with class lookups and whatnot. However, keep in mind that both ZooKeeper and AsyncHBase use (at least in OpenTSDB) Netty 3, so without a lot more porting, Netty 3 and Netty 4 need to live side by side for a while. Thanks for changing the package name, guys !!!