IdP Clustering Configuration
Actually, we fooled you, this page contains nothing about configuring the IdP to run in a clustered fashion. However, please read all of this page because it provides critical information for setting up the environment into which you will deploy you clustered IdPs.
Terminology
Many individuals consider high availability, load balancing, and clustering to be the same thing. They are not. Within this document the terms are used and have the following definitions:
...
Finally, be aware that fail over and load balancing are completely outside the control of the IdP. These require some mechanism (two described below) for routing traffic before it even reaches a node.
Application State and its Effect on Clustering
The IdP is a stateful application. That is it maintains operational information, between requests, that is required to answer subsequent requests. The IdP keeps this information in memory (for various reasons cookies can not cannot be used as they are in some web applications). Therefore, in order to achieve high-availability this information must be shared amongst all IdP nodes. By default the Shibboleth team recommends V2 release assumed (and documentsdocumented) the use of Terracotta as the mechanism for doing this. This is still supported, but only on Java 6 because the Terracotta solution was not ported to support Java 7.
At present, the only options for clustering in-memory state with Java 7 are third party extensions, such as the memcached plugin. An alternative that relies on client-side state is mentioned below.
Like most applications that use this approach, each IdP node keeps the state it creates in memory in a form readily usable by the node but uses a more compact form when making it available to other nodes. Therefore, any load balancing solution used should route all subsequent requests to the same node that serviced the initial request. This prevents the IdP nodes from constantly reading/writing information to/from this more compact form (an expensive process). This is generally known as session affinity load balancing.
Two Common Clustering Approaches
Most environments use one of two methods for creating a cluster of nodes that look like one single service instance to the world at large.
DNS Round Robin
This is done by registering each cluster node under the same hostname. When a DNS lookup is performed for that hostname the DNS server returns back a list of IP addresses (one for each node) and the client chooses which one to contact.
...
The Shibboleth team strongly discourages this approach.
Hardware
...
-Based Clustering
This is done by using specially dedicated hardware to intercept and route traffic the various nodes in a cluster (so the hardware basically becomes a switch in front of the nodes). This hardware is then given the host name for all the services provided by the clusters behind it.
...
Because of the guaranteed characteristics provided by this solution, the Shibboleth team recommends this approach. Caution should be taken though to ensure that the load balancing hardware does not become a single point of failure (i.e. buy and run two of them).
Configuring the IdP for Clustering
The above description is mostly focused on the actual Identity Provider servlet itself when it comes to clustering. However, since the IdP relies on other services for authentication and attribute resolution, it is important to also make sure that these services are clustered as well. Therefore, one should make sure in a production environment to:
- Have the IdP servlet running on multiple cluster nodes. How to do this is described in the howto configure the IdP for Clustering.
- Have the attribute resolver configured to use multiple LDAP/AD servers. How this can be done is described in the LDAP Resolver description.
- Also protect the authentication system to make use of clustering. Have a look at the username/password authentication description.
When it comes to using LDAP (most common case) for attribute resolution and authentication, you may also have a look at the multiple LDAP configuration hints.
Stateless Clustering
Now, go set up you cluster environment and then proceed to configure the IdP for ClusteringFinally, if you're willing to give up some features and (in many cases) do some custom development, it's possible to deploy the IdP in a configuration that does not rely on shared runtime state. See this topic for more information.
Proxy Clustering
Further finally, if you front-end your server with Apache and are willing to give up some features, it's possible to cluster an IdP using Apache rewrite and proxy directives . See this topic for more information.