...

Clustering the SP requires that you understand a number of inter-related issues:

  • Is the application itself clusterable?

  • How does the application's session management interact with the SP's session management, if at all?

  • What are the capabilities of your load balancer?

  • What are your tolerances for single points of failure?

The key issue is session management. The SP does maintain other information in memory, but generally this only includes the replay cache, and the protection you get from replay checks probably isn't enough to justify worrying about clustering it.

...

The SP has a feature to store a limited amount of attribute and session data in an encrypted cookie in the client, and a shared key across a set of servers can be used to recover the session on each node as the client moves between them. This is transparent, but because of the overhead it isn't intended for clients that move between servers on every request. The feature is discussed in the SessionCache page.
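
As a rough sketch of what this involves in shibboleth2.xml (the key file path and attribute list here are hypothetical, and the exact attribute names vary by SP version; the SessionCache and DataSealer topics are the authoritative reference):

    <!-- Key material for encrypting the cookie; deploy the same key file to every node. -->
    <DataSealer type="Versioned" path="/etc/shibboleth/sealer.keys"/>

    <!-- persistedAttributes lists the (small) attributes to carry in the cookie
         so the session can be rebuilt on whichever node receives the request. -->
    <SessionCache type="StorageService" persistedAttributes="eppn affiliation"/>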

This is generally the recommended approach now.

Shared Process

The "simple", but not advisable, solution is to take advantage of the fact that the SP is divided into two pieces, and all of the session state is maintained in the shibd process rather than the web server. While the SP installation requires that you install both halves on each machine, you don't actually technically have to use both halves on each server. If you have a fast enough, and secure enough, network, you can could utilize a TCP connection to connect a number of web servers running the SP to a single shibd "listener" process. This process can run on any of the cluster nodes, or on a separate box devoted to it. To set this up, just follow the documentation for using the TCP Listener plugin.

On Windows, this plugin is already the default, so it's just a matter of configuring it. On Unix, you'll usually have to switch to it from the "Unix" variant. Normally, the listener component binds itself to the local loopback address and blocks traffic from any other source. On the machine running shibd, set the listener's address to an actual network address and set the ACL to the list of addresses of your web servers. Each web server in turn has the SP installed with the same configuration, which allows it to connect to the shared process.
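
As an illustration, the TCPListener element in shibboleth2.xml might look like this on every node (the addresses are placeholders for your own private subnet):

    <!-- Bind shibd to a real interface instead of loopback, and restrict
         connections to the web servers listed in the ACL. -->
    <TCPListener address="192.168.10.5" port="1600"
                 acl="192.168.10.11 192.168.10.12"/>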

The overhead of this approach has not been studied extensively. While the protocol was made less chatty over the years, it remains very non-performant this way because every single request has to traverse this hop. For a typical application not under extreme load, this is likely to be viable for the most part. However, you have to understand a few things:

  • Support for this is more or less unofficial and generally limited to consortium members (meaning nobody’s likely to jump in and help with this if you have problems).

  • Performance becomes pretty horrible very quickly as load increases. This is a bad solution for any but a lightly used site.

  • The protocol between the servers is NOT secure. It's a simple XML protocol running over TCP. You MUST rely on a secure network between the servers, ideally a private subnet.

  • The server running shibd is a single point of failure. If the process fails or the server fails, you'll lose all active sessions. However, you can restart any of the web servers without losing any state. They will reload any sessions as needed.

  • Session affinity is still important here. You'll get much better performance if you keep some locality of reference, because the sessions are cached in the web servers as well.

Shared Database

If you really want to cluster the SP in a persistent way, a plugin is provided for this purpose based on an ODBC-compliant database. You'll see that its documentation records a lot of caveats.
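
To sketch the moving parts (the driver name, credentials, and identifiers below are placeholders, and the exact attributes supported vary by version), the plugin is loaded into shibd and the caches are pointed at the shared store roughly like this:

    <!-- Load the ODBC storage plugin into the out-of-process (shibd) side. -->
    <OutOfProcess>
        <Extensions>
            <Library path="odbc-store.so" fatal="true"/>
        </Extensions>
    </OutOfProcess>

    <!-- A StorageService backed by the shared database. -->
    <StorageService type="ODBC" id="db" cleanupInterval="900">
        <ConnectionString>
        DRIVER=mydriver;SERVER=dbhost;UID=shibboleth;PWD=password;DATABASE=shibboleth
        </ConnectionString>
    </StorageService>

    <!-- Point the session and replay caches at the shared store. -->
    <SessionCache type="StorageService" StorageService="db"/>
    <ReplayCache StorageService="db"/>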

...