Wednesday 4 July 2012

SOA 11g - JVM Tuning



I installed a couple of 11g SOA Suite instances and delivered them to our development team to work on. For a week the servers seemed to be working fine, but then my nightmare started: the servers became unstable.

Following were the issues:

1. The servers became too slow, especially the EM console, making it difficult for developers to continue working.
2. The EM login page got stuck at user authentication.
3. Data sources went into a suspended state, bringing the EM application to a halt.
4. Out of Memory errors.

That is when I took a look at the JVM settings for the SOA application. Until then I had been running with the default JVM settings provided by Oracle. I generated some GC logs and used a GC analyzer to view them. They showed frequent GCs with undeniably high pause times. The JVM settings clearly needed tuning.

With the release of the Oracle Fusion Middleware 11g products such as SOA Suite, BAM, OER and OSB, a lot has changed in the way these products are built and work. Let's focus on the SOA Suite. Unlike 10g, the 11g SOA Suite runs on Oracle WebLogic Server. The SOA Suite application has also grown with the addition of applications like B2B and BAM, which were separate installations in the past releases (10g). On top of this there are two management consoles, the WebLogic Admin Console and the Enterprise Manager FMW console, which the product needs to function. So the 11g SOA Suite is not only new but also big. The application server (WebLogic) has to be tuned appropriately to ensure a healthy SOA instance.

Below is my environment info:


Application and Database Server hardware Info


The SOA Application and the database servers both were installed on separate physical boxes. The specifications of the boxes are mentioned below.


Server Hardware: SUN T 5240
Operating System: Solaris 10
Architecture: Sun Sparc 64 Bit
Number of CPU: 10
Available Memory: 13.6 GB

Application Installed and Version:

Application Server: Oracle WebLogic Server (Version 10.3.4)
FMW Product: Oracle SOA Suite (Version 11.1.1.4)
JVM Used: Sun JDK 1.6 Update 23
(The latest Sun JDK at the time; this version claims a performance boost on Solaris servers.)

Application Install Architecture: Stand Alone Install.   

Below are my JVM setting recommendations. Please note that this tuning is a good starting point; as the number of concurrent users, deployed applications and load increase, the parameters below may need to change.

SOA Suite 11g typically uses two JVMs to function.

1. Admin Server JVM: This is the WebLogic server (JVM) on which the WebLogic Admin Console and the EM Fusion Middleware Control are deployed. The WebLogic Admin Console is used to manage and control the WebLogic resources. The EM Fusion Middleware Control is mainly used to work on the SOA Suite: application deployment, application monitoring and so on.

2. SOA Managed Server: This is the WebLogic server (JVM) on which the entire SOA Suite and B2B product stack is deployed. Hence you can expect it to be a bit heavier than the Admin Server.

Below are the JVM settings I recommend:
JVM Heap Recommendations for Development Managed Servers
-server -d64 -Xss256k -Xms4g -Xmx4g -XX:NewRatio=2 -XX:+AggressiveOpts -XX:PermSize=1g -XX:MaxPermSize=1g -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 -XX:InitialSurvivorRatio=10 -XX:SurvivorRatio=10 -XX:LargePageSizeInBytes=4m -Dweblogic.management.discover=false -Dweblogic.StuckThreadMaxTime=900 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/java.hprof -verbose:gc -Xloggc:/tmp/gc.log -Xnoclassgc -XX:TargetSurvivorRatio=90 -XX:ReservedCodeCacheSize=64m -XX:CICompilerCount=8 -XX:+AlwaysPreTouch -XX:+PrintReferenceGC -XX:+ParallelRefProcEnabled -XX:-UseAdaptiveSizePolicy -XX:+PrintAdaptiveSizePolicy -XX:+DisableExplicitGC

JVM Heap Recommendations for Production Managed Servers
-server -d64 -Xss256k -Xms6g -Xmx8g -XX:NewRatio=2 -XX:+AggressiveOpts -XX:PermSize=2g -XX:MaxPermSize=2g -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=16 -XX:LargePageSizeInBytes=4m -XX:InitialSurvivorRatio=10 -XX:SurvivorRatio=10 -XX:-UseTLAB -Dweblogic.management.discover=false -Dweblogic.StuckThreadMaxTime=900 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/java.hprof -verbose:gc -Xloggc:/tmp/gc.log -Xnoclassgc -XX:TargetSurvivorRatio=90 -XX:ReservedCodeCacheSize=64m -XX:CICompilerCount=8 -XX:+AlwaysPreTouch -XX:+PrintReferenceGC -XX:+ParallelRefProcEnabled -XX:-UseAdaptiveSizePolicy -XX:+PrintAdaptiveSizePolicy -XX:+DisableExplicitGC

JVM Heap Recommendations for AdminServer

Modify the AdminServer JVM settings, since it runs both the WebLogic Administration Console and Enterprise Manager Fusion Middleware Control:

-server -Xms2g -Xmx2g -XX:NewRatio=3 -XX:+AggressiveOpts -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=16 -XX:InitialSurvivorRatio=10 -XX:SurvivorRatio=10 -Dweblogic.StuckThreadMaxTime=900 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/java.hprof -verbose:gc -Xloggc:/tmp/gc.log -Xnoclassgc -XX:TargetSurvivorRatio=90 -XX:ReservedCodeCacheSize=64m -XX:CICompilerCount=8 -XX:+AlwaysPreTouch -XX:+PrintReferenceGC -XX:+ParallelRefProcEnabled -XX:-UseAdaptiveSizePolicy -XX:+PrintAdaptiveSizePolicy -XX:+DisableExplicitGC

Tuning Explanations:

The Production recommendation differs from Development mainly in the size of the heap. Since the Production environment will host significantly more traffic, it needs the additional heap space to grow and handle those requests. Further tuning will be required, because the Production SOA applications may need more or less heap space depending on the following factors:

- Size of SOA interfaces deployed
- Frequency of SOA interface usage, or number of instances per minute
- Length of time each instance takes to execute

Tuning the environment should be measured, not speculative. Using memory and GC analysis tools like Oracle Enterprise Manager Grid Control in conjunction with performance load testing, you will be able to tune your production environment adequately to prevent load-related outages. After that, keep monitoring the JVM for garbage collection times. If the time it takes to do a partial or Full GC increases significantly, then increase the number of ParallelGCThreads. By default, ParallelGCThreads is set from what is available at the system level. For example, on a 2 x UltraSPARC T2+ [T5240] that is 128, which is too high and can cause heap fragmentation.

Since garbage collection in the Old (Tenured) space can be costly, requiring more pause time and CPU time to complete a full GC, you may need to size up the New space (also called the Nursery or young generation, which contains Eden). This is controlled with the NewRatio=n directive, which sets the young generation to 1 / (n + 1) of the max heap. For example, NewRatio=2 gives the young generation one third of the heap. If you find that the majority of objects are short lived, meaning the heap grows to a high level under heavy load but then returns to a lower level, then you may benefit from a larger Eden space. This may require a different directive than NewRatio. You may need to size your Eden space to 50 to 60% of the total heap size. Try -XX:NewSize=5g -XX:MaxNewSize=5g where -Xmx8g.
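
To make the ratio arithmetic concrete, here is a minimal Python sketch of how NewRatio and SurvivorRatio divide a fixed heap. The function name generation_sizes is made up for illustration, and the figures are approximations: the real JVM aligns each space to its own boundaries, so the sizes it reports will differ slightly.

```python
def generation_sizes(heap_mb, new_ratio, survivor_ratio):
    """Approximate split of a fixed heap into old, eden and survivor sizes (MB)."""
    # NewRatio=n means old:young = n:1, so young is 1/(n+1) of the heap.
    young = heap_mb / (new_ratio + 1)
    old = heap_mb - young
    # SurvivorRatio=s means eden:one survivor = s:1, and there are two survivors.
    survivor = young / (survivor_ratio + 2)
    eden = young - 2 * survivor
    return {"old": old, "young": young, "eden": eden, "survivor": survivor}


# Heap sizes and ratios taken from the Development and AdminServer lines above.
for label, heap_mb, ratio in [("dev 4g, NewRatio=2", 4096, 2),
                              ("admin 2g, NewRatio=3", 2048, 3)]:
    sizes = generation_sizes(heap_mb, ratio, 10)
    print(label, {name: round(mb) for name, mb in sizes.items()})
```

Running it for a few candidate ratios before a load test gives a rough idea of how much Eden each setting buys.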

The default thread stack size for the 64-bit JVM on SPARCv9 is 1024k. When you select the 64-bit model [-d64], be sure to size the thread stack down, since that default is larger than most applications need; the 32-bit model defaults to 512k on SPARCv9, and on Linux x86-64 the 32-bit Java model uses 256k. Some performance benchmarks on spec.org for WebLogic set the thread stack size to 128k on the Sun T series servers. Thread stacks are allocated outside the Java heap, so an oversized stack multiplied across hundreds of threads can waste a significant amount of process memory. Consider setting -Xss128k or -Xss256k to free that memory and reduce the overall footprint the application needs under load.
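
The saving is easy to estimate. The sketch below compares a 1024k stack with the -Xss256k and -Xss128k settings suggested above. The thread count of 1000 is an assumed figure for illustration, not a measurement from this environment, and the result is an upper bound, since the operating system only commits stack pages as threads actually touch them.

```python
def stack_reservation_mb(thread_count, stack_kb):
    """Upper bound on memory reserved for thread stacks, in MB."""
    return thread_count * stack_kb / 1024


THREADS = 1000  # assumed thread count, for illustration only

for stack_kb in (1024, 256, 128):
    reserved = stack_reservation_mb(THREADS, stack_kb)
    print(f"-Xss{stack_kb}k x {THREADS} threads = {reserved:.0f} MB reserved")
```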

Thread-local allocation buffers (TLABs) are portions of the young generation that the JVM reserves for individual threads, so each thread can allocate objects without contending with the others. They work as a per-thread cache and can offer “excellent speedups on smaller numbers of threads (100s)”. However, they can become a burden to the JVM, costing more GC time, when the number of threads is in the thousands. On the Solaris SPARC platform -XX:+UseTLAB is on by default. When testing an application under heavy load with thousands of threads and experiencing excessive GC, consider turning off TLABs: -XX:-UseTLAB

When sizing the JVM heap or its internal spaces, set the min and max to the same size. This avoids the latency incurred while the JVM resizes the heap spaces up or down. Examples: NewSize and MaxNewSize, or PermSize and MaxPermSize.
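
A quick way to check a flag line against this rule is a small script. The following Python sketch is only an illustration: the helper names to_bytes and mismatched_pairs are made up, and it assumes the flags are typed with plain ASCII hyphens and separated by spaces.

```python
import re

# Min/max flag pairs that the advice above says should match.
PAIRS = [("-Xms", "-Xmx"),
         ("-XX:NewSize=", "-XX:MaxNewSize="),
         ("-XX:PermSize=", "-XX:MaxPermSize=")]
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(text):
    """Convert a JVM size such as '512m' or '4g' to bytes."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit]


def mismatched_pairs(flag_string):
    """Return (min_flag, min_value, max_flag, max_value) for each unequal pair."""
    flags = flag_string.split()
    problems = []
    for low, high in PAIRS:
        low_val = next((f[len(low):] for f in flags if f.startswith(low)), None)
        high_val = next((f[len(high):] for f in flags if f.startswith(high)), None)
        if low_val is None or high_val is None:
            continue  # pair not fully specified, nothing to compare
        if to_bytes(low_val) != to_bytes(high_val):
            problems.append((low, low_val, high, high_val))
    return problems


print(mismatched_pairs("-Xms6g -Xmx8g -XX:PermSize=2g -XX:MaxPermSize=2g"))
```

Fed the heap and permanent generation sizes from the production line above, it reports the -Xms6g / -Xmx8g pair, which is worth aligning if heap resizing shows up in the GC logs.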

-XX:+HeapDumpOnOutOfMemoryError : This directive creates a heap dump when an out-of-memory error occurs in the JVM, which lets you diagnose the root cause of a memory leak.

-XX:HeapDumpPath=/tmp/java.hprof : This writes the heap dump to the specified location.

-verbose:gc -Xloggc:/tmp/gc.log : These options enable GC logging and specify the GC log file location.
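
If no GC analyzer is at hand, a few lines of Python can give a first impression of the pause times in such a log. This is a rough sketch, not a replacement for a proper analyzer. It assumes the plain one-line -verbose:gc layout shown in the sample, and the sample lines are invented for illustration. Logs written with additional Print flags contain extra text, so the pattern may need adjusting.

```python
import re

# Assumed layout: "10.512: [GC 1048576K->262144K(4019584K), 0.0450000 secs]"
GC_LINE = re.compile(r"\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def summarise(lines):
    """Count collections and total/max pause (seconds) per collection type."""
    stats = {}
    for line in lines:
        match = GC_LINE.search(line)
        if match is None:
            continue  # skip anything that is not a recognised GC line
        kind, pause = match.group(1), float(match.group(5))
        entry = stats.setdefault(kind, {"count": 0, "total": 0.0, "max": 0.0})
        entry["count"] += 1
        entry["total"] += pause
        entry["max"] = max(entry["max"], pause)
    return stats


# Invented sample lines; in practice read them from /tmp/gc.log.
SAMPLE = [
    "10.512: [GC 1048576K->262144K(4019584K), 0.0450000 secs]",
    "95.101: [GC 1310720K->393216K(4019584K), 0.0550000 secs]",
    "310.870: [Full GC 3145728K->1572864K(4019584K), 2.5000000 secs]",
]

for kind, entry in summarise(SAMPLE).items():
    print(kind, entry)
```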

The above tuning recommendations could be applied to other FMW products as well. Again, they should give you a decent start. As load and usage grow, you may need to revisit the tuning.
