SP Service Architecture

DRAFT

Overview

Right now the design for the service is strongly modeled on the IdP, primarily because the IdP is rock-solid stable and secondarily to allow reuse of as much code as possible. The major difference is that this is not, contrary to expectations, a web application. The main reason is that introducing HTTP as a transport, while likely making future agents somewhat easier to develop, greatly complicates the redevelopment of the C++ agents we have now. HTTP is also more complicated than this scenario requires. On the downside, this means the service has to be developed as a stand-alone Spring application, which will create some additional complexity. For now, Spring Boot isn’t a candidate for this, but we could always introduce it at a later stage if it were beneficial. I could imagine us going back to HTTP as an alternative transport for other agents and just embedding the Spring Integration gateway, if that actually works.

The root Spring context, as with the IdP, contains most of the “infrastructure” beans that provide low-level support, and is not designed to be reloadable. This is primarily because much of the customization for these beans comes from properties, which are not themselves reloadable. The root context also contains the Spring Integration objects that provide networking and messaging support, currently limited to TCP. Most of the rest of the architecture will hopefully consist of a variety of reloadable Spring service contexts, much as in the IdP.

The first “new” such service is one that exposes the net.shibboleth.sp.remoting.EndpointManager interface to the Spring Integration layer to identify the possible recipients of messages. Reloading it will refresh the possible message destinations.
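As a rough illustration of what reloading this service might mean, the sketch below publishes a fresh set of destinations in one step so that in-flight lookups never see a half-built table. The Endpoint and EndpointManager types here are simplified stand-ins declared just for the example; the real net.shibboleth.sp.remoting interfaces may look quite different, and the lookup and reload method names are assumptions.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

// Simplified stand-in for net.shibboleth.sp.remoting.Endpoint (shape assumed).
interface Endpoint {
    String getName();
}

// Simplified stand-in for net.shibboleth.sp.remoting.EndpointManager (shape assumed).
interface EndpointManager {
    Optional<Endpoint> lookup(String name);
}

// Hypothetical implementation: a reload replaces the whole destination table at once.
final class ReloadableEndpointManager implements EndpointManager {
    private final AtomicReference<Map<String, Endpoint>> table =
            new AtomicReference<>(Collections.emptyMap());

    // Build the new table off to the side, then publish it in a single step.
    void reload(Collection<? extends Endpoint> endpoints) {
        Map<String, Endpoint> next = new HashMap<>();
        for (Endpoint e : endpoints) {
            next.put(e.getName(), e);
        }
        table.set(Collections.unmodifiableMap(next));
    }

    @Override
    public Optional<Endpoint> lookup(String name) {
        return Optional.ofNullable(table.get().get(name));
    }
}
```

In this sketch, a destination that is missing from the reloaded set simply stops resolving, which is one way hot-deployment and removal could behave.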

Many of the other services should be identical to those in the IdP, such as the metadata and attribute resolver/registry/filter services. Unlike the IdP, one possible initial goal is to support multiple instances of these services in order to accommodate the SP’s current behavior. While most (and preferably nearly all) SPs define default configurations for metadata and attribute handling that are reused even when application overrides are used, there are deployers that insist on isolating these configurations. Right now, we’re leaning toward not explicitly pushing that kind of isolation, and instead encouraging a single instance of each service, with behavior tailored through activation conditions that depend on which application is active. This should be simpler to configure and more efficient most of the time.
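To make the single-instance idea a bit more concrete, here is a minimal sketch of one shared service whose sources carry activation conditions evaluated against the active application. Everything in it (ConditionalSource, SharedMetadataService, and the use of a plain application name as the condition input) is invented for illustration and is not the IdP’s or the SP’s actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical: one configured source plus the condition under which it applies.
final class ConditionalSource {
    final String sourceId;
    final Predicate<String> activationCondition; // tested against the application name

    ConditionalSource(String sourceId, Predicate<String> activationCondition) {
        this.sourceId = sourceId;
        this.activationCondition = activationCondition;
    }
}

// Hypothetical single shared service; per-application behavior comes from conditions.
final class SharedMetadataService {
    private final List<ConditionalSource> sources = new ArrayList<>();

    void add(ConditionalSource source) {
        sources.add(source);
    }

    // Return the ids of the sources that are active for the given application.
    List<String> activeSourcesFor(String applicationName) {
        List<String> active = new ArrayList<>();
        for (ConditionalSource s : sources) {
            if (s.activationCondition.test(applicationName)) {
                active.add(s.sourceId);
            }
        }
        return active;
    }
}
```

A deployer wanting isolation for one application would attach a condition matching only that application, rather than standing up a second service instance.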

Applications

To accommodate this design, the proposed (now being prototyped) architecture is to layer the concept of an “Application” into the remoting components as the destination of a message. The current SP typically, though not universally, identifies the application involved with an operation by embedding an “application” member into the input data. The revised proposal is to promote that value to the top level as the name of the input object, targeting the Application object that should receive the request. The Application(s) would be auto-deployed into the EndpointManager alongside any other objects implementing the net.shibboleth.sp.remoting.Endpoint interface (e.g. status/monitoring). Applications could then be hot-deployed through reload of the EndpointManager service.
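The promotion of the “application” member can be pictured with a small sketch that converts a legacy-shaped input into a message addressed by name. The AddressedMessage and MessageAddressing types, the Map-based representation, and the fallback to a name of “default” when no member is present are all assumptions made for the example, not the prototype’s actual message format.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical message shape: the target name sits at the top level, inputs beneath it.
final class AddressedMessage {
    final String target;
    final Map<String, Object> inputs;

    AddressedMessage(String target, Map<String, Object> inputs) {
        this.target = target;
        this.inputs = Collections.unmodifiableMap(inputs);
    }
}

final class MessageAddressing {
    // Assumed fallback when a legacy input carries no "application" member.
    static final String DEFAULT_APPLICATION = "default";

    // Lift the embedded "application" member out of the inputs and use it as the target.
    static AddressedMessage promote(Map<String, Object> legacyInput) {
        Map<String, Object> inputs = new HashMap<>(legacyInput); // leave the caller's map alone
        Object app = inputs.remove("application");
        String target = (app instanceof String) ? (String) app : DEFAULT_APPLICATION;
        return new AddressedMessage(target, inputs);
    }
}
```

The resulting target is what a lookup against the EndpointManager would use to find the receiving Application.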

Internally, the Application objects would dispatch messages to another set of auto-wired components that perform work for the SP agents. Agents would map requests to Applications in a fashion similar to today, create a message targeted for that Application, and then embed the additional message inputs in a manner yet to be finalized (currently an embedded “this” member that is intended to reflect targeting a message to an object). The abstraction layer of the Application then invokes components that implement the net.shibboleth.sp.remoting.ApplicationEndpoint interface by auto-injecting the Application as the first parameter of the method call, providing access to the “current” Application involved in the request. This automates the “convention” used in the current code and applies it more consistently.
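A minimal sketch of that dispatch step follows, showing the Application passing itself as the first argument to whichever component handles the operation. The ApplicationEndpoint interface here is a simplified stand-in; its method name, the Map-based input and output, and the explicit register call (in place of Spring auto-wiring) are assumptions for the sake of a self-contained example.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for net.shibboleth.sp.remoting.ApplicationEndpoint; the
// method name and the Map-based input/output types are assumptions.
interface ApplicationEndpoint {
    Map<String, Object> invoke(Application application, Map<String, Object> input);
}

// Hypothetical Application: routes an operation to a component, passing itself first.
final class Application {
    private final String id;
    private final Map<String, ApplicationEndpoint> components = new HashMap<>();

    Application(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }

    // Stands in for auto-wiring; a real service would discover components via Spring.
    void register(String operation, ApplicationEndpoint component) {
        components.put(operation, component);
    }

    // The component never has to look up the "current" Application; it is handed in.
    Map<String, Object> dispatch(String operation, Map<String, Object> input) {
        ApplicationEndpoint component = components.get(operation);
        if (component == null) {
            throw new IllegalArgumentException("Unknown operation: " + operation);
        }
        return component.invoke(this, input);
    }
}
```

Because the same component instance can be registered with several Applications, it stays stateless with respect to which Application is active.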

To achieve the separation that the “containment” relationship provides now between Applications and components like metadata and attribute handling, the various reloadable services for these functions can be injected into the Application objects with Spring. Applications would then be free to reuse (or not) common instances/configurations of these services as required, but normally would do so.

As it stands now, it’s TBD whether a given instance of this service should host only Applications deployed together in a virtual web environment (i.e., not necessarily one physical server, but one logical server). I lean toward “no”, and toward simply viewing an Application as the top-level concept without regard for where it lives. It’s not clear yet whether some of the logical URL details that are needed for things like endpoint computation and destination enforcement will actually require that the Java service be aware of these mappings. It’s plausible that any URL details might be fed in from the agent and simply trusted by the service as it goes about its work. Of course, sharing many different Applications inside one service instance has implications for performance/scale, and for security.

Addressing Applications

Assuming we lump “many” SPs’ Applications into one service instance, identifying them starts to overlap with the problem of porting over the configuration and how entityIDs or client_ids are assigned to systems. Even if we wanted to make that assignment purely a service-side issue, it wouldn’t allow the service to cleanly identify the agents in a shared scenario, and there’s no identifier for the Applications that would be unique in the current model. An obvious thing to do would be to combine the current entityID and applicationId to build a unique value, but the SP’s multi-hosting features might make that a challenge if, for example, entityIDs are being calculated based on vhosts. It’s plausible that, at least for compatibility with the old configuration, we would just use the “default” entityID to name the agent, and come up with something more abstract (agent ID?) if a new, simpler configuration were used.
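For the “combine entityID and applicationId” option, a sketch like the following shows one way to build a value that can be split back apart unambiguously. The ApplicationAddress class, the “!” separator, and the choice to URL-encode each part are assumptions for illustration only, and the sketch does nothing to address the vhost-computed entityID problem noted above.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: composes a unique Application name from two existing identifiers.
final class ApplicationAddress {
    // URLEncoder percent-encodes '!', so the separator cannot occur inside either part.
    private static final char SEPARATOR = '!';

    static String compose(String entityId, String applicationId) {
        return encode(entityId) + SEPARATOR + encode(applicationId);
    }

    // Returns { entityId, applicationId }.
    static String[] parse(String address) {
        int idx = address.indexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("Not a composed address: " + address);
        }
        return new String[] {
            decode(address.substring(0, idx)),
            decode(address.substring(idx + 1))
        };
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```

With these assumptions, an entityID of https://sp.example.org/shibboleth and an applicationId of default compose to https%3A%2F%2Fsp.example.org%2Fshibboleth!default.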

Security

The lack of security in the current remoting layer was always a very deliberate choice, because security is hard and adds dependencies, and true networking greatly reduces performance. The redesign’s focus is on eliminating all dependencies, so formally adding security isn’t likely in the cards. However, we envision adding TLS support to the TCP gateway, and the use of stunnel is a plausible way to secure the traffic if desired, which would be critical if different agents share an instance of the service.

Additionally, it’s possible that some kind of simple key or secret authorization strategy could be employed to limit the agents that can send requests targeted at a particular Application.
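One simple form such a check could take is a per-Application shared secret compared in a way that avoids short-circuiting on the first mismatched byte. The ApplicationSecrets class and its method names are hypothetical; only MessageDigest.isEqual is a real JDK call. Note that a shared secret sent over an unprotected TCP connection offers little on its own, so this presumes the transport is secured separately.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry of the secret an agent must present for each Application.
final class ApplicationSecrets {
    private final Map<String, byte[]> secrets = new HashMap<>();

    void setSecret(String applicationName, String secret) {
        secrets.put(applicationName, secret.getBytes(StandardCharsets.UTF_8));
    }

    // True only if a secret is configured for the Application and the presented one matches.
    boolean isAuthorized(String applicationName, String presentedSecret) {
        byte[] expected = secrets.get(applicationName);
        if (expected == null || presentedSecret == null) {
            return false; // unknown Application or no credential: deny
        }
        byte[] presented = presentedSecret.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, presented);
    }
}
```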

Agent / Service Interaction

We can’t strictly require the service to remember state about the agents. That is, it’s not going to fly to have the agents connect to the service to supply information about themselves and then have the service “remember” that information so it can be applied later. That won’t work without persistence on the service that we don’t want (it starts to look like OpenID Dynamic Client Registration, no thank you), or without creating problems if the service restarts and the agent doesn’t realize it has to re-initialize things. We could build in some kind of retry model where a remote call fails and signals that initialization is required, but it would be preferable to break apart the settings the agent might need to supply so that the ones that matter for a request are just supplied with the request. If the overhead gets too severe, we can consider the retry model.
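The retry model mentioned here can be simulated in a few lines. The sketch below is written in Java purely for illustration (the real agents are C++), and every name in it (InitializationRequiredException, FakeService, RetryingAgent) is hypothetical. It shows an agent that re-initializes and retries exactly once when the service signals that it has no record of the agent, for instance after a restart.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical signal from the service that it has no record of this agent.
final class InitializationRequiredException extends RuntimeException {
    InitializationRequiredException(String agentId) {
        super("Initialization required for agent " + agentId);
    }
}

// Stand-in for the service side; restart() simulates losing in-memory state.
final class FakeService {
    private final Set<String> initialized = new HashSet<>();

    void initialize(String agentId) {
        initialized.add(agentId);
    }

    void restart() {
        initialized.clear();
    }

    String call(String agentId, String request) {
        if (!initialized.contains(agentId)) {
            throw new InitializationRequiredException(agentId);
        }
        return "ok:" + request;
    }
}

// Stand-in for agent-side logic: on the signal, re-initialize and retry exactly once.
final class RetryingAgent {
    private final String agentId;
    private final FakeService service;
    int initializations = 0; // exposed only so the example can be observed

    RetryingAgent(String agentId, FakeService service) {
        this.agentId = agentId;
        this.service = service;
    }

    String send(String request) {
        try {
            return service.call(agentId, request);
        } catch (InitializationRequiredException e) {
            service.initialize(agentId);
            initializations++;
            return service.call(agentId, request); // a second failure propagates
        }
    }
}
```

The cost of this model is an extra round trip after every service restart, which is the trade-off against sending the relevant settings with each request.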

Regardless, the agent will likely be stateful with respect to the service in some sense, in that it will have to connect to the service initially both to verify that the service is available and to ensure that it is recognized/authorized by the service (possibly minimally so, i.e., it can connect so it’s accepted). At the same time, this provides the obvious hook to feed in the legacy configuration, or a new XML format built primarily around the RequestMapper, and have the service parse it and return the processed results to the agent. This is clearly the first “not a handler” remoted operation.

Many of the other “not a handler” operations will probably be centered on session handling, such as asking to recover a session. Plausibly, a request to the agent that does not identify a session already in the agent’s local cache (whatever that entails) would lead the agent to ask the service either to return a session or, if a flag signaled that one was required, to respond with a new login request message (i.e., return the HTTP response body or redirect needed to make one happen). That implies that any configuration settings influencing the generation of that request would have to be known by the service or provided by the agent with the request. That kind of hurts, since it turns a simple “get session” call into a much larger message “just in case”, but it seems unavoidable because the SP definitely was steered toward using content settings to influence login requests. Ideally the overhead won’t be too bad, and it could, I suppose, even be optional: some kind of “I’m using advanced features like this” flag could change the agent’s behavior in some way.
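The three possible outcomes of such a call can be sketched as follows. This is only a model of the control flow described above: SessionResponse, SessionRecovery, the sessionRequired flag, and the Map of content settings are all hypothetical names, and the “login request” is represented by nothing more than the settings that would shape it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical result of a "get session" style call.
final class SessionResponse {
    enum Kind { SESSION, NO_SESSION, LOGIN_REQUEST }

    final Kind kind;
    final String sessionId;                  // set only for SESSION
    final Map<String, String> loginSettings; // non-empty only for LOGIN_REQUEST

    SessionResponse(Kind kind, String sessionId, Map<String, String> loginSettings) {
        this.kind = kind;
        this.sessionId = sessionId;
        this.loginSettings = loginSettings;
    }
}

// Hypothetical service-side operation combining session recovery and login initiation.
final class SessionRecovery {
    private final Map<String, String> sessionsByKey = new HashMap<>();

    void store(String key, String sessionId) {
        sessionsByKey.put(key, sessionId);
    }

    // contentSettings travel with every call "just in case" a login request is needed.
    SessionResponse recover(String key, boolean sessionRequired,
                            Map<String, String> contentSettings) {
        String sessionId = sessionsByKey.get(key);
        if (sessionId != null) {
            return new SessionResponse(SessionResponse.Kind.SESSION, sessionId,
                    Collections.emptyMap());
        }
        if (!sessionRequired) {
            return new SessionResponse(SessionResponse.Kind.NO_SESSION, null,
                    Collections.emptyMap());
        }
        // No session but one is required: the settings shape the login request.
        return new SessionResponse(SessionResponse.Kind.LOGIN_REQUEST, null,
                new HashMap<>(contentSettings));
    }
}
```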