Java vs. C++ Performance Face-Off
As the architect/designer of OptionsCity’s algorithmic trading platform Freeway, I am often asked, “Why did you write it in Java, when you could (should) have used X?”, and by X they usually mean ‘C’ or ‘C++’. And when they ask, “Why didn’t you write it in C or C++?”, they usually mean “Isn’t your system slower than a C or C++ system?”.
I respond that our system is plenty fast, and that we chose Java for a variety of reasons: performance, easy concurrency, maintainability, and deployment options being just a few.
That being said, I am neither a Java extremist nor a C/C++ hater. We use C in cases where we need to access low-level facilities that Java does not expose. We are also open to using C/C++ libraries for performance – but the gains typically need to be massive to offset the ‘Java/C’ translation overhead and the additional maintenance headaches.
Recently I took time to evaluate the leading open-source C++ quickfast library, which has been actively developed and tuned over the past 5 years, and is used by many proprietary trading firms and exchanges to perform FIX FAST market data decoding/encoding.
The quickfast library includes ‘PerformanceTest’, which takes a FIX FAST market data capture file, loads all the packets into memory, then decodes each packet into a “message” **. The test measures decode time only, dividing by the number of packets to arrive at a ‘usec per packet’ figure. No I/O or startup time is included, and the test is single-threaded, so no contention is possible.
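The measurement methodology is simple enough to sketch in a few lines of Java. The `Decoder` interface below is a hypothetical stand-in for either library’s decode entry point, not actual code from quickfast or our library:

```java
// Sketch of the benchmark's timing loop, assuming a hypothetical Decoder
// interface standing in for either library's decode entry point.
import java.util.List;

class DecodeTimer {
    interface Decoder {
        void decode(byte[] packet);
    }

    // Average decode time in microseconds per packet: all packets are already
    // in memory, so no I/O or startup cost is measured; single-threaded.
    static double usecPerPacket(Decoder decoder, List<byte[]> packets) {
        long start = System.nanoTime();
        for (byte[] packet : packets) {
            decoder.decode(packet);
        }
        long elapsedNanos = System.nanoTime() - start;
        return (elapsedNanos / 1000.0) / packets.size();
    }
}
```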
I don’t typically place a lot of stock in “benchmarks”: too often they are “micro”, which invites easy manipulation, and/or they depend upon “super” programmers writing (and rewriting) highly optimized (and correct) code; but most importantly, they don’t normally translate to how a complete system will actually perform.
But this case is different: the tests perform “useful work”, and since we have developed a functionally identical decoding library in Java, and both the C++ and the Java implementations have been heavily tuned, we have an ideal Java vs. C++ performance comparison. We ran both test applications on the same machine/OS, with 3 different “live data” captures, since the market segment influences the complexity of the encoded data. Each capture contained over 100k packets.
So… (all times in microseconds) *
* all tests performed on Intel i7 2.93 GHz, running Linux x86_64. quickfast compiled using gcc 4.6.3. Java used was Sun 64-bit JDK 1.7.51.
** the PerformanceTest included in the quickfast distribution does not build a “message” (it uses PerformanceBuilder.cpp), it only decodes and counts the fields/groups/etc. Our Java solution always builds a “message”, which is more useful but more expensive, so we also developed a modified “PerformanceTest” for quickfast (using GenericMessageBuilder.cpp) that performs similar message building.
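To make the “counts fields” vs. “builds a message” distinction concrete, here is a hedged Java sketch of the two strategies. The `FieldSink` interface and both class names are hypothetical stand-ins for the roles quickfast’s PerformanceBuilder and GenericMessageBuilder play; neither library exposes this exact API:

```java
// Hypothetical sketch of the two builder strategies being compared.
import java.util.ArrayList;
import java.util.List;

interface FieldSink {
    void onField(String name, Object value);
}

// Analogous to quickfast's PerformanceBuilder role: counts decoded fields,
// builds nothing, so no per-field allocation occurs.
class CountingSink implements FieldSink {
    long fieldCount;
    public void onField(String name, Object value) {
        fieldCount++;
    }
}

// Analogous to the GenericMessageBuilder role: materializes a per-packet
// "message", paying an allocation for every decoded field.
class MessageSink implements FieldSink {
    final List<Object[]> fields = new ArrayList<>();
    public void onField(String name, Object value) {
        fields.add(new Object[] { name, value });
    }
}
```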
| | e-mini futures | e-mini options | “proprietary” capture |
|---|---|---|---|
| quickfast 1.5.0 (no msgs) | 6.7 | 13.0 | 14.3 |
| quickfast 1.5.0 (with msg) | 11.0 | 39.7 | 39.6 |
| java no warm-up | 3.8 | 14.0 | 14.0 |
| java with warm-up | 1.8 | 11.0 | 11.0 |
| % difference (java with warm-up vs. quickfast with msg) | 83.6% | 72.3% | 72.2% |
The Java-based solution is the clear winner – outperforming quickfast even when quickfast builds no messages!
How Can This Be?
Without doing a very time-consuming, in-depth analysis, I can only theorize about the reasons:
1. Better Algorithms. Even though quickfast is actively developed (with performance a prime motivator), the difficult nature of C++ development (even experienced developers have trouble writing bug-free code) means you’re not apt to “try new things” (read: “change a lot of code”) unless you know the changes will pay off. Java’s easier development cycle even lets us maintain different solutions for use under different conditions.
2. Memory Allocation. Java has been maligned in the past for its “garbage collection” pauses, but in a decoding workload, where lots of little objects are created, Java can allocate objects far more efficiently than C/C++ malloc. You could write/generate a ‘no garbage’ decoder – and I’m sure many companies do – but it would most likely be less flexible – especially in concurrent environments. (See ** above – even when quickfast doesn’t create the message objects, it is still slower…)
3. JIT Optimization. For programs like decoders, especially in a pre-compiled language, the developer must make many of the optimization decisions up front and code accordingly. The reality is that many of the developer’s assumptions do not hold in a live, realistic environment, but by then the code is “frozen”. A JIT, by contrast, performs its optimizations (and re-optimizations) at runtime, as it determines that its compilation assumptions no longer hold. Also, the quickfast library makes heavy use of virtual method calls – similar to Java – but Java can optimize these virtual calls away during JIT compilation. As a small aside, on a RISC architecture the JIT gains can be huge, since the large number of available registers allows heavy optimization of method calls and dynamic allocation of temporary values to registers.
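The allocation pattern described in point 2 can be illustrated with a small sketch – this is illustrative Java, not either library’s actual code. Each `DecodedField` is allocated with a cheap thread-local bump of the heap pointer (a TLAB allocation) and dies young, which is the case a generational collector handles best:

```java
// Illustrative sketch of decoding-style churn: many small, short-lived objects.
class AllocChurn {
    // A tiny value object of the kind a decoder might produce per field.
    static final class DecodedField {
        final int tag;
        final long value;
        DecodedField(int tag, long value) {
            this.tag = tag;
            this.value = value;
        }
    }

    static long churn(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            // Allocation is typically a TLAB pointer bump; the object becomes
            // garbage immediately, so a young-generation GC reclaims it cheaply.
            DecodedField f = new DecodedField(i, i * 2L);
            sum += f.value;
        }
        return sum;
    }
}
```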
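The devirtualization claim in point 3 can also be sketched (hypothetical names throughout). The call below is virtual in the source, but when the JIT observes only one receiver type at the call site – a monomorphic site – it can devirtualize and inline `decode` entirely; a C++ compiler generally cannot do this across a hot loop without profile-guided help:

```java
// Sketch of a virtual call a JIT can devirtualize at a monomorphic call site.
class Devirt {
    interface FieldDecoder {
        int decode(int raw);
    }

    static final class UIntDecoder implements FieldDecoder {
        public int decode(int raw) {
            return raw & 0x7F; // strip the FAST stop bit (high bit of the byte)
        }
    }

    static int decodeAll(FieldDecoder d, int[] raw) {
        int sum = 0;
        for (int r : raw) {
            sum += d.decode(r); // virtual in source; JIT may inline when monomorphic
        }
        return sum;
    }
}
```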
Let me be clear, these tests are still “micro” in the sense that a FIX FAST decoder does not a trading system make, but it is a critical performance component in many of them.
In closing, I’ve quite enjoyed producing my first blog post – I typically only “get out” once a year at the Algo Conference**** – but based on your feedback I might be influenced to do a few more.
**** BTW, my Algo Conference presentation this year uses 100% C code…