SP4Details
This early material has been supplemented by this DesignNotes page.
Background
The SP faces serious challenges as a sustainable piece of software:
It contains many thousands of lines of C++ code. The code is complex and written in an idiom that is both complex and dated. It has been essentially impossible to identify new programming resources able to assume responsibility for the code as the original developer gets closer to a possible retirement.
Its key XML and security dependencies are large, complex code bases in their own right that face even more dire maintenance challenges than the SP itself, as we essentially subsidize their continued existence out of necessity. The only "stable" project among the largest dependencies is OpenSSL, and it faced its own existential crisis only a few years ago. The possibility of a significant security issue that is difficult to fix given ordinary effort is very much on the table.
Some of the features we might like to support (e.g. ECDH encryption, possibly even OIDC support) are much more difficult to add if they have to be built from scratch or at least in C++.
On the other hand, the feedback from the membership is that many organizations see significant value in what the SP does and more importantly in how it does that job:
There is concern that other options for SAML support in a native Apache or IIS footprint also lack clear sustainability (i.e., they're in the same boat, ultimately).
Concern has also been raised that, while mod_authn_oidc may be a suitable replacement, it is itself a large code base without a bench of developers to support it.
Several members expressed the view that a lot of value and trust is attached to the "brand" we represent when it comes to supporting our code. A solution we support is seen as very desirable.
As the author, I would add that, rough edges aside, the code base we have provides a number of compelling features that are probably valuable enough in their own right to repurpose in a new form, such as the integration between Apache configuration and a portable syntax that spans both Apache and IIS (and potentially other integrations), and a cookie-based session clustering feature.
When the conversation began at ACAMP, the project's view was that we had a problem and were open to any and all solutions, including identifying alternatives to use as replacements, or investing in a replacement ourselves, provided the membership approved of that investment of their money and provided that the replacement would address the sustainability risks.
The consensus view seems to be that we should produce a plan for such a replacement.
Replacement Design Constraints
We would like to get consensus quickly around these constraints:
The amount of C/C++ code should be kept to an absolute minimum, requiring that as much as possible be offloaded to "something else". For now, let's refer to "something else" as a "processing hub".
Given current project development resources and direction, the processing hub would likely be in Java but that isn't an inherent requirement as long as appropriate resources are identified to own the work long term.
The C/C++ code needs to be as self-contained as possible, in particular not relying on any libraries not present on virtually any Linux distribution. Plausibly it may be worth abstracting more of the code to leverage native Windows APIs in some areas to further limit dependencies.
Corollary: there should be no XML or XML security dependencies or processing in C/C++ to eliminate that set of dependencies.
Corollary: logging would be limited to pre-existing options, such as Apache, syslog, and/or Windows event logging and would not use an additional library.
Ideally, packaging (other than perhaps for Windows) would be farmed out to other groups of people, with funding used to incentivize that. This is more viable if we can limit the frequency of updates, akin to the way many other Apache modules tend to be much more static.
The replacement needs to support at least Apache 2.4 and IIS 7+. Supporting older versions may be in question since it increases the amount of code needed.
Some degree of configuration compatibility would be nice. Since that requires supporting XML configuration files, that may imply "outsourcing" the processing of the configuration to the processing hub. That might not be a hugely disruptive change in certain respects and the system would be inoperable without that processing hub anyway.
Deployment of the processing hub needs to be as streamlined as possible, likely including embedding a web server to allow more of a stand-alone appliance feel. If Java, this would likely filter back to the IdP eventually, providing added benefits.
Mutual TLS with some standard trust management assumptions is sufficient to secure module/processing hub exchanges, allowing that one might architect other options if they don't violate the other requirements.
A single processing hub should be expected to service multiple, discrete deployments of the module operating with their own "local" configurations.
There would be no expected communication paths between the module(s) and any systems other than the processing hub. That is, direct communication for, e.g., the purposes of SAML artifact resolution (in either direction), SAML attribute queries, potentially future OIDC callbacks, etc. would be handled solely by the processing hub.
Conversely, the processing hub is not intended to become a gateway/proxy in its own right because we already have that now. If people want to replace the SP with other agent solutions, we already have support for that using the IdP along with all the other proxies people could choose. So, the hub is not intended to ever interact with user agents directly.
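The mutual TLS constraint above could look roughly like the following on the hub side. This is a minimal sketch in Python, chosen only for brevity (the hub itself would likely be in Java); the function name and the file parameters are hypothetical, and the trust management model is reduced to a single CA bundle as an assumption.

```python
import ssl


def make_hub_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that requires a client certificate.

    All file arguments are hypothetical. When they are omitted nothing is
    loaded, so the context is only useful for inspecting its settings.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require each connecting module to present a certificate
    # (this is the "mutual" part of mutual TLS).
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Trust anchors used to validate module certificates.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The hub's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A module-side context would mirror this, loading its own certificate and the trust anchor for the hub.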
Strawman Proposal
Design a publicly documented web service API for required operations such as:
obtaining a configuration for the module (e.g., this might be managed by the hub or obtained by uploading a local configuration to process it into a consumable form)
producing discovery and SSO requests
validating and processing SSO responses into session data
logout
possibly consuming session recovery cookies?
???
The API would be a key deliverable to enable theoretically independent implementations of both halves of the system, which was a deliberate non-goal of the current design.
The web service format will need to be such that producing and parsing it doesn't violate the design constraints. JSON is of course an option but even that may pull in more code than would be preferred, depending on what Apache's APR library supports these days.
Done correctly, it may be possible to fully abstract this API away from SAML and allow for other protocols to be supported as long as they generally fit the same message exchange pattern.
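One way to picture a protocol-neutral message format is a small JSON envelope that names the operation and the calling module and leaves everything else in an opaque body. The sketch below is an illustration only, written in Python for brevity; the operation names, field names, and helper functions are all hypothetical and are not part of any agreed API.

```python
import json
from enum import Enum


class Operation(str, Enum):
    """Hypothetical operation names; nothing here is SAML-specific."""
    GET_CONFIG = "get-config"
    START_LOGIN = "start-login"
    FINISH_LOGIN = "finish-login"
    LOGOUT = "logout"


def build_request(op, module_id, body):
    """Serialize one module-to-hub call as a flat JSON object."""
    envelope = {"op": Operation(op).value, "module": module_id, "body": body}
    return json.dumps(envelope, sort_keys=True)


def parse_request(text):
    """Parse and validate an envelope, returning (operation, module, body)."""
    msg = json.loads(text)
    missing = {"op", "module", "body"} - msg.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Operation(...) raises ValueError for an unknown operation name.
    return Operation(msg["op"]), msg["module"], msg["body"]
```

Keeping the envelope flat and the body opaque means the module only has to emit and read a handful of string fields, which matters if the C/C++ side cannot afford a full JSON library.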
Session management is an open question with this design. It's plausible to envision using the processing hub as a session store as well, which offloads much of the clustering responsibility. It's also possible that this is not the best idea because it adds a lot of load to the processing hub, potentially an impractical amount given high volume services and the difficulty of guaranteeing or synchronizing session cleanup.
The processing hub would not be tracking every individual use of a session (this doesn't scale), so inactivity policy would be very limited with this design, with perhaps some kind of exit to allow someone who cared to plug in a solution that would allow that level of tracking. (This is a significant change from the design today, which writes through on every access to a session.)
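To make the trade-off concrete, here is a minimal sketch, assuming the hub stores only an absolute expiration per session while each module enforces inactivity locally and exposes an optional callback as the "exit" for finer tracking. It is written in Python for brevity; the class names, method names, and the `on_access` hook are hypothetical.

```python
import time


class HubSessionStore:
    """Hub-side store that only knows an absolute expiration per session."""

    def __init__(self, lifetime, clock=time.time):
        self._lifetime = lifetime
        self._clock = clock
        self._sessions = {}  # session id -> (expires_at, attributes)
        self.fetch_count = 0  # counts hub round trips, for illustration

    def create(self, session_id, attributes):
        self._sessions[session_id] = (self._clock() + self._lifetime, attributes)

    def fetch(self, session_id):
        self.fetch_count += 1
        entry = self._sessions.get(session_id)
        if entry is None or entry[0] <= self._clock():
            self._sessions.pop(session_id, None)  # lazy cleanup on lookup
            return None
        return entry


class ModuleSessionCache:
    """Module-side cache that enforces an inactivity timeout locally."""

    def __init__(self, hub, idle_timeout, clock=time.time, on_access=None):
        self._hub = hub
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._on_access = on_access  # optional hook for external tracking
        self._entries = {}  # session id -> [last_used, expires_at, attributes]

    def lookup(self, session_id):
        now = self._clock()
        entry = self._entries.get(session_id)
        if entry is None:
            fetched = self._hub.fetch(session_id)  # one hub call per cache miss
            if fetched is None:
                return None
            expires_at, attributes = fetched
            entry = self._entries[session_id] = [now, expires_at, attributes]
        last_used, expires_at, attributes = entry
        if now >= expires_at or now - last_used > self._idle_timeout:
            return None  # the stale entry is kept, so it stays dead on this node
        entry[0] = now  # recorded locally only; the hub is not told
        if self._on_access is not None:
            self._on_access(session_id, now)
        return attributes
```

Note the limitation this exposes: the idle timeout is per node, so another module instance that fetches the same session from the hub has no knowledge of activity or inactivity elsewhere.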
The processing hub would presumably be built in Java using OpenSAML, other existing libraries, and possibly some portions of the IdP code (probably moved to a new shared library). It would probably be based on Spring and Spring Web Flow as the IdP is now. Much of the logic for this exists already in the IdP's proxying support.
Obviously the SAML keys used would be held at the hub, and would, out of necessity (because of bugs in ADFS), have to allow for potentially many sets of keys.
Naming is interesting. The entityID(s) should really be a problem for the hub, but there will have to be some kind of identification of the modules connecting to it to tie to the proper SSO configuration, and the ApplicationOverride concept will have to be captured in some way.
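As a rough illustration of the naming question, the hub could keep a registry keyed by a module identifier (for example, one derived from the module's TLS client certificate), with a default SSO configuration plus per-application overrides that inherit from it. This is a sketch under those assumptions, in Python for brevity; the registry layout, the `resolve` function, and the setting names other than entityID are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRegistration:
    """Hub-side record for one connecting module (hypothetical layout)."""
    default: dict  # settings for the module's default application
    overrides: dict = field(default_factory=dict)  # application id -> partial settings


def resolve(registry, module_id, application_id="default"):
    """Return the effective settings for a module and application.

    Override values replace the defaults; anything not overridden is
    inherited, loosely mirroring the ApplicationOverride concept.
    """
    registration = registry.get(module_id)
    if registration is None:
        raise KeyError(f"unknown module: {module_id}")
    settings = dict(registration.default)
    settings.update(registration.overrides.get(application_id, {}))
    return settings


# Example registry with one module and one overridden application.
registry = {
    "web1.example.org": ModuleRegistration(
        default={"entityID": "https://sp.example.org/shibboleth", "keyset": "primary"},
        overrides={"admin": {"entityID": "https://sp.example.org/admin"}},
    )
}
```

With this layout the module never needs to know an entityID; it sends only its own identity and an application identifier, and the hub maps those to the SSO configuration and key set.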