Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

The Shibboleth IdP's periodic reloading of InCommon metadata makes effective use of heap space and garbage collection less than obvious. This page describes some of the issues and offers some suggestions.

 

Table of Contents

Java Garbage Collection

Heap memory is Java's temporary storage.  Management of the heap is the job of a Garbage Collector (GC).  Periodically the collector looks through the heap for allocations that are no longer referenced and reclaims them (a collection), making them available for another allocation.These collections are usually Stop-the-World (STW) events---the application is paused during collection.

...

Java has several collector options:

  • Serial GC. This is the basic, single threaded GC.  It is useful on single processor machines with small heaps. Collection is STW. 

  • Parallel GC. This is essentially a multi-threaded version of the Serial GC. It also tends to be STW.

  • Concurrent Mark Sweep (CMS). This multi-threaded collector is able to perform much of its work on the old generation in parallel with the application's normal processing.  This allows it to reduce the length of STW pauses and is most useful for real-time systems, e.g. web services. CMS uses the same new generation collector as the Parallel GC. This is the default GC for Java 1.8.

  • Garbage-First (G1). This multi-threaded collector attempts to meet configured maximum pause time goals.  It is a compacting collector, and can often obviate full STW old generation collections. Although G1 has a sense of new and old allocations, it does not divide the heap into new and old regions.

Regardless of which you use, they all have one architectural design in common: a separation, if only logical, of the heap into 'new' and 'old' spaces, and a sense that old allocations are semi-permanent.  The problem is that a Shibboleth IdP doesn't allocate memory that way,  Most of the 'old' space in the IdP's heap is InCommon metadata (~500MB per parse).  This is dynamic, and is completely replaced daily.

...

Some considerations for IdP GC tuning:

  1. Each InCommon metadata in-core instance consumes about 400-500 MB.

  2. InCommon metadata is replaced daily, usually mid-day in the U.S.

  3. Most IdPs are not uniformly loaded over a 24-hour period.

  4. There's little point in starting out with heap space less than the maximum.


CMS Tuning 

Choose CMS with the java arg (-XX:+UseConcMarkSweepGC).

...

The CMS GC is fairly strict with its allocation of new and old generation heap space.  For the Old Generation we need a some score of MB for non-metadata persistent allocations, and enough space for two parses of InCommon metadata—so we can refresh it without triggering a full collection.  That's about a minimum of 1.5GB for the old space.  The new generation is very dynamic, but the IdP doesn't require large blocks at a time.  We could get by with 500MB for the new.

  • 2GB total heap

    • Set the new ratio so new gets 1/4 of the heap.

    • Code Block
      Args: (-Xms2g -Xmx2g -XX:NewRatio=3)
  • 3GB total heap

    • Set the new ratio so new gets 1/2 of the heap.

    • Code Block
      Args: (-Xms3g -Xmx3g -XX:NewRatio=1)

Timing Collection of the Old Generation

...

Sample CMS 'old' heap space usage

Image Removed
Image Added
  1. InCommon metadata refresh at 11:17, 11:18:30, 11:19:30, and 11:26.

  2. Force GC at 11:27 and 11:28:30.

  3. The third refresh triggered a full GC, lasting a total of 7 sec. But recovered only .5GB.

  4. A forced GC at 11:27 recovered about .1GB.

  5. The second forced GC recovered all unused heap space.

Suggestions

  • Force a Full Collection Nightly

    • Usually 2 GC runs are required to empty the Old Generation space, 10 or 20 seconds apart.

  • Log GC Events

    • GC logging is inexpensive and is very useful.

    • Args: (-XX:+PrintGC -XX:+PrintGCDateStamps -Xloggc:<some_abs_path>)

    • See also: -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles, -XX:GCLogFileSize


G1 Tuning 

Choose G1 with the java arg (-XX:+UseG1GC).

...

G1 adjusts its collections in order to limit application pauses to a configured target (MaxGCPauseMillis).  If the target is too low each collection will not 'collect' enough free space to keep the heap sufficiently empty.  If it is too long your application can experience freezes (STW events).  Default MaxGCPauseMillis is 200ms.  I find 400ms keeps more free space in the heap.  With that setting I find there is always enough free space for the metadata rollover without needing a full collection. (In my load tests, anyway.)

  • 3GB total heap

    • Args: (-Xms3g -Xmx3g -XX:MaxGCPauseMillis=400)

Sample G1 heap space usage

Old space

New (Eden) space

Image Removed
Image Added
Image Removed
Image Added
  1. InCommon metadata refresh at 10:47, 10:53, 10:57, 11:07, 11:21, 13:15.

  2. Light load (1-2 login/sec) from start.

  3. Heavy load (8-9 login/sec) 11:10-11:23, and 13:30 to end.

  4. The heavy load at 11:10 increased Eden space to 1.5GB, reducing old space.

  5. The MD refresh at 13:15 shrank Eden space to allow old space to grow without GC.

  6. Heavy use at 13:30 grew Eden to 2GB and emptied old of all unused space.

  7. No full GC was ever necessary.

Suggestions for GC Monitoring

  • Log GC Events

    • GC logging is inexpensive and can be very useful.

    • Args: (-XX:+PrintGC -XX:+PrintGCDateStamps -Xloggc:<some_abs_path>)

    • See also: -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles, -XX:GCLogFileSize

    • The logs are text and can be easily read by eye, script or a visualizer, e.g. jvisualvm.

  • Watch your IdP live

    • For load test or just watching one of the IdP hosts I like to activate jmxremote and use a remote jconsole.

A Day in the Life of an IdP

This shows heap space allocations for one day for one of our IdP nodes at the University of Washington.

  1. Java settings: -XX:+UseG1GC -Xss24m -Xms3g -Xmx3g -XX:MaxGCPauseMillis=400

  2. Load:  logins/min range from about 60 mid day to about 5 mid night.


Old space

Eden space

Image Removed
Image Added
Image Removed
Image Added
  1. Some per-request allocations migrate to old space, but are easily discarded during routine old space GC

    1. These didn't show in load tests

    2. Due to very low number of unique ids used by the load test

  2. The old collections at 16:00, 01:00 and 09:30 were due to a threshold (50% of total).

  3. Rapid increase of old use at 10:00 was due to a reload of InCommon metadata.

  4. Anomaly between 10:30 and 12:00 due to disconnect of jconsole.