Solutioning

Parking lot for various thoughts on design/implementation so I don’t lose them. “Shower thoughts” would be a good name for this I guess.

Tracking Headers to Clear

A nasty part of the original SP is the notion of having to know all the possible header variables that might be populated, independently of the specific attributes that are present in any given session, to prevent smuggling.

Currently shibd computes the set of headers from the attribute mapping layer, adds in various “special” ones, and serves up the list via a special remoting call that the modules use to get the set. Because the modules don’t really know when that set changes, and because there’s no signaling/interrupt mechanism to tell them, I eventually made the attribute mappings non-reloadable by default to prevent them from changing without a conscious deployer step, and even then you’d get out of sync if you did things improperly. I never liked any of that, and it won’t work for the new design anyway.

My thought is that this should be moved out of shibd and into the agents, divorcing the decoding shibd does to create IdPAttributes out of protocol claims and other data from the names of the headers used to decorate requests. The former is handled by shibd, but the latter should live in the agents. I envision a simple mapping file of IdPAttribute ID to local variable name, essentially “renaming” them (or not) and enumerating all of the possible “controlled” variables in the agent, so that it knows locally what to clear before the decoration step where they’re exported into the requests.

It would remain non-reloadable, since I don’t think the agent as a whole is going to support any kind of dynamic configuration anyway; Apache and IIS don’t generally do that as it is. Parsing a property file syntax isn’t fun, but it shouldn’t be that much code and it’s a simple enough format for people to understand. Most deployments of the SP are fairly static in this regard anyway.
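
As a strawman, the mapping file might look something like the following; the attribute IDs and variable names are hypothetical examples, not a proposed default set.

    # IdPAttribute ID = local variable/header name the agent controls
    eppn = REMOTE_USER
    displayName = DISPLAY_NAME
    affiliation = AFFILIATION
    # mapping a name to itself just means "no renaming"
    entitlement = entitlement

The agent would treat the union of the right-hand names as the set of “controlled” variables to clear on every request, regardless of what any particular session happens to contain.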

Agents, Entities, Applications, Oh My

The configuration model as a whole is complicated by the desire to support joint use of the shibd plugin by multiple, independent web server agents protecting unrelated systems. That is, 10 agents protecting service.example.org is simple in comparison, but combining those 10 with another 10 protecting unrelated.example.org adds an additional “dimension” to the configuration and to the security model to think about. It is a “multi-personality” / virtualized way of looking at shibd that is not part of the SP to date, nor part of the IdP, so we have no precedent for it as a feature.

We have to account for:

  • Applications and application overrides that are unrelated to each other and currently not named uniquely.

  • The fact that each application may act as a static entityID or as a dynamically computed entityID derived from the virtual host or, in the general case, from specific paths to content.

  • Settings per-profile and per-relying party (the eventual IdP in this case), including potentially the entityID itself.

  • The fact that in the future we want to support protocols like OpenID that have similar constructs to an entityID but that may not be the same value.

  • The desire to hide as much protocol machinery as possible from the agents in the first place.

Initially I considered rooting everything in an Application, but that falls apart due to naming (every SP deployment currently identifies its default Application as “default”). Then I moved toward rooting this in the entityID, with the Applications contained in that as a sort of multi-map, but that falls apart because of how the SP computes the entityID to use in a content-specific way, not to mention the long term need for other protocols.

It also doesn’t address the significant problem of limiting which objects a given set of related agents can actually make use of; that is, how does the system prevent one agent from associating itself with an entityID and Application(s) of its choosing and thereby introducing risk? An example: if the plugin supports resolving additional data about the user that is meant for use by, or to be seen by, specific agents only, other agents could impersonate them to get access to that data. There are likely more esoteric risks, but that’s an obvious one.

My mental strawman is thinking about this from the perspective of how to identify and secure the agent access, which is an unrelated but eventually necessary problem to tackle for shibd to be shareable in the way people want. Not surprisingly, this looks a lot like dealing with secret-based OpenID clients in the manner I’ve been advocating for scaling out campus deployments via the unregistered client approach; that is, the agents need to be backed by a service account process that issues/manages the account secrets needed for the agents to authenticate themselves to the hub.

Putting the issues with secret-based authentication to the side (I’ll cover that separately), if one imagines this is “enough” to verify an agent and its requests, then it follows we can associate the correct set of Applications to each agent identity. Thus we have a model (sketched in code after the list):

  • Agent Identity, some form of local agent identifier unique to a deployment (this would be shared across a cluster, but could I suppose be a set of identities):

    • Application(s) identified by an ID that is unique only within that agent identity, likely cribbed directly from existing SP configs in many cases:

      • A RelyingPartyConfigurationResolver associated with an Application to provide default and per-IdP profile settings.
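
A minimal sketch of that containment in Java, just to pin down the shape; the class and field names are illustrative, not an actual API, and the resolver type here is a stand-in for the existing IdP component.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;

    // Stand-in for the existing IdP component referenced above.
    interface RelyingPartyConfigurationResolver {}

    // Each Application carries its own resolver for default and per-IdP profile settings.
    record Application(String id, RelyingPartyConfigurationResolver rpResolver) {}

    // Application IDs only need to be unique within a single Agent.
    record Agent(String id, Map<String, Application> applications) {}

    // Hypothetical hub-side registry: every lookup is scoped by an agent identity.
    final class AgentRegistry {
        private final Map<String, Agent> agents = new HashMap<>();

        Optional<Application> lookup(final String agentId, final String applicationId) {
            final Agent agent = agents.get(agentId);
            return agent == null
                    ? Optional.empty()
                    : Optional.ofNullable(agent.applications().get(applicationId));
        }
    }

The point of the sketch is only that Application lookup is always scoped by the agent identity the hub has already authenticated, which is what closes the impersonation hole described above.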

There’s a realization underlying this: agents don’t want to know about SAML or entityIDs, and neither do their deployers. If we’re serious about offloading complexity, then we need to accept that it should be the hub’s job to figure out what to name things in specific protocols to comport with the metadata the hub operator is willing to provide to IdPs.

This gets very weird when we think about cases in which there’s one operator of both parts. The shibd plugin is literally creating messages to be processed by its own host IdP, and it’s not impossible to imagine a design in which the internals of making SAML work are even “devolved” or “degenerated” into a self-referencing configuration that replaces the metadata-always design of the IdP with an alternative approach suited to the problem of a campus deployment that is literally only talking to itself. When that happens, does the protocol even matter?

The point is that it turns out we probably need to rethink the notion of entityIDs and client_ids in this implementation as solely a hub-side consideration. The agent connects to its Application definitions and it’s the job of the configuration to allow the hub to figure out how to make SAML or OpenID work for those applications in the ways preferred by that operator. Thus, not only is there probably not a “rule” applied to decide if an agent can “be” a particular entityID, but there probably isn’t a need to even allow agents to influence the value at all.

Compatibility

Leaning right now away from trying to support the V2/3 configuration files in any direct sense. This could change, but it’s very messy to deal with when most of it will be ignored, and there’s enough of a different model here that forcing an explicit conversion of the main file is probably warranted.

The most significant piece that crosses this boundary is the RequestMap and the intent would be to change the namespace but not much else, so copying it right out should usually work ok. So far the changes there include:

  • Namespace

  • Some older settings will be removed, so they would simply be ignored if left in place

  • s/entityID/authority to allow for genericity in terms

The other portions of the configuration are largely not relevant to the new agents, except for a small subset of basic information that should be much more concisely expressed using an updated syntax, so making that simpler seems more important than direct compatibility. This mainly includes session settings, the applications, and their handler locations (at least the base URL, if not the individual paths; not sure about that yet).

I am leaning toward using properties for everything possible. Using Apache commands has appeal of course, but it’s not the simplest code to support dozens of commands, and I think it will lead to more complexity than simpler alternatives might. It depends somewhat on tolerance for log4j-style nested properties with dotted notation. I don’t mind them, but they may prove annoying for some. This would particularly apply to the application settings, which would have to be qualified by application ID.
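
For example, application-qualified settings might end up looking something like this; the property names are purely illustrative, not a proposed schema.

    # agent-wide settings
    agent.id = agent.example.org
    agent.handlerURL = /Shibboleth.sso

    # settings qualified by application ID ("default" and "admin" here)
    agent.application.default.sessionLifetime = 28800
    agent.application.admin.sessionLifetime = 3600
    agent.application.admin.cookieName = _admin_session

Flat to parse, but the dotted qualification is doing the work the XML nesting does today.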

Handlers

Handlers in the SP today essentially plug into the SP by registering against paths that live below the root handlerURL (/Shibboleth.sso). They are not unlike web flows mapped from particular paths below the SWF servlet path (/idp/profile).

The goal, I think, needs to be to centralize the mapping of handler paths to function and get that out of the agent and into the hub, where changing or adding to it will be much simpler than having to update every agent.

So instead of the current model, most of the handlers would be implemented generally such that any invocation of “<handlerURL>/some/path” would trigger a fixed remote flow invocation by the agent that tunnels up the request from the user agent and tunnels back the response out through the web server. The agent would not in general have specific understanding of what it was actually asking the hub to do. Possibly there would be signaling metadata in the response from the hub that would live alongside the HTTP response data so the agent could watch for specific indicators and know what to do, for example to store a new session after a SAML login. This is all very much TBD.

In the hub, we would then create a mapping of Path Info strings to flow IDs and implement the handler webflow as a driver that would run subflows that actually do most of the work needed in Java, such as issuing a SAML request or processing a SAML response, or any other protocols later.
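
In code terms, the hub-side mapping could be as dull as the following; the paths and flow IDs are placeholders, and the real thing would presumably be Spring-wired rather than hard-coded.

    import java.util.Map;
    import java.util.Optional;

    // Hypothetical mapping of agent handler path info to hub flow IDs.
    final class HandlerPathMap {
        private final Map<String, String> pathToFlowId = Map.of(
                "/SAML2/POST", "agent/saml2/processResponse",  // placeholder flow IDs
                "/Login", "agent/startSession",
                "/Logout", "agent/logout");

        Optional<String> flowIdForPathInfo(final String pathInfo) {
            return Optional.ofNullable(pathToFlowId.get(pathInfo));
        }
    }

The agent relays “<handlerURL>/SAML2/POST” (or whatever) verbatim; only this table gives the path meaning, so adding or changing a protocol never touches the agents.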

Sketching the Request Flow

Working on the initial security and request processing model, given these assumptions:

  1. The request body is the serialization of a DDF either in my proposed form or later on as JSON. (Point being, not form encoded.)

  2. The Authorization header would carry Basic auth for now, with the agent ID and a shared secret. Agent IDs thus can’t contain colons, so they would not be URLs, but that was the plan anyway; hostnames are more likely, and periods are fine. I did not see any length limitations inherent to Basic auth.

  3. The agent would under some conditions understand cookies and be able to store and return one.

The basis of the request webflows then (condensed in the sketch after this list):

  1. PRC gets created per usual as a pre-step by the flow, which then branches back into shared parent flow actions.

  2. Extract the Authorization header based on known schemes, possibly just Basic and perhaps “something custom” at this point. I suspect custom schemes would present a challenge for some client libraries, so it may be simpler to just leverage Basic with a dummy password if needed.

  3. Populate AgentRequestContext under PRC by resolving Agent based on the ID from the basic-auth header. For now, that’s our primary ID signal. Failure to resolve an Agent obviously fails.

  4. Check IP address of request against Agent’s allowed ranges.

  5. Check for Agent setting to determine requirement to authenticate (allowing for localhost deployments without the overhead).

    1. If needed, check for a record of previous cached authentication in the Java session. This contains the agent ID, client address, and the expiration.

      1. If the record is there and valid, the request is accepted for processing.

      2. If not, any existing record is cleared, and then the credentials from the basic-auth header have to be run through a CredentialValidator chain and the resulting Java Subject would carry the agent ID in a UsernamePrincipal, which is to be compared to the incoming value, and then the request is accepted for processing.

  6. Now we parse the body based on Content-Type to obtain the DDF, which is stored in the inbound MessageContext as the message.

  7. Most incoming requests would carry an application ID in a standard place, so this would be resolved against the Agent to obtain the Application to store in the context (or fail the request).
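
A small sketch of the credential-parsing and cached-authentication pieces of the above (steps 2 and 5.1); every name here is hypothetical rather than existing IdP API.

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    final class AgentAuthSupport {

        // Step 2: pull the agent ID and secret out of a Basic Authorization header.
        static String[] parseBasicAuth(final String header) {
            if (header == null || !header.regionMatches(true, 0, "Basic ", 0, 6)) {
                throw new IllegalArgumentException("missing or non-Basic Authorization header");
            }
            final String decoded = new String(
                    Base64.getDecoder().decode(header.substring(6).trim()), StandardCharsets.UTF_8);
            final int colon = decoded.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("malformed credentials");
            }
            // Agent IDs can't contain colons, so the first colon is the delimiter.
            return new String[] { decoded.substring(0, colon), decoded.substring(colon + 1) };
        }

        // Step 5.1: the cached authentication record held in the Java session.
        record CachedAgentAuth(String agentId, String clientAddress, long expiration) {
            boolean isValidFor(final String id, final String address, final long now) {
                return agentId.equals(id) && clientAddress.equals(address) && now < expiration;
            }
        }
    }

If the cached record fails that check, it gets cleared and the Basic credentials go through the CredentialValidator chain as described in 5.1.2.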

Some kind of MAC over parts of the request would be nice. OAuth 1.0 defines that, but it’s likely a bit of overkill; on the other hand, it would at least be somewhat “known” as an approach. It defines a way to carry all of that in the Authorization header, so that may be attractive, but I don’t know how clean that would be for agents. Notably it can’t sign non-form bodies, and I’m not sure I want to worry about a canonical form to make signing the body work.

The real value, I think, is that a MAC covering the URL path and the host, and possibly something from the TLS certificate, would achieve a weak kind of channel binding: if the server trusts that the agent won’t create and send a signed blob when it can’t connect to a trusted server, that amounts to a form of guarantee that there was no MITM. It’s not full-on channel binding because it relies on that assumption about agent behavior rather than on a shared value from the TLS session.
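
To make that concrete, what I have in mind is just an HMAC keyed with the agent secret over a few request elements; the choice and encoding of the elements here are illustrative only.

    import java.nio.charset.StandardCharsets;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    final class RequestMacSupport {

        // Binds the signed blob to the host and path the agent actually used.
        static byte[] computeMac(final byte[] agentSecret, final String method,
                final String host, final String path) throws Exception {
            final Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(agentSecret, "HmacSHA256"));
            // Newline-delimited input; a real design would need a stricter canonical form.
            final String input = method + "\n" + host + "\n" + path;
            return mac.doFinal(input.getBytes(StandardCharsets.UTF_8));
        }
    }

Verification on the hub side would recompute the value from the request it actually received and compare.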

Indirecting HttpServletRequest/Response

To reuse a lot of the current code, particularly from OpenSAML, we have to have a means of indirecting access to the servlet layer into wrapper objects that source request data from, and write response data to, the DDF objects passed in from (or sent back to) the agent. This is the “tunnelling” aspect of the SP today, in which facades for HTTP request and response logic get exposed on top of the DDF objects so that the code is relatively understandable while the data is being remoted across a socket interface. The difference here is the use of HTTP to remote the data, but in other respects it’s similar.

The challenge here is that the DDF objects backing the wrapped interfaces are part of the request context tree/state, but the getHttpServletRequest/Response methods on the various profile and messaging APIs are parameterless. That was a mistake, but it was necessary for getting singletons to be able to access servlet interfaces since there wouldn’t be any request state to pass into APIs that are that deep in the system without every method up the stack taking a servlet interface.

Without changing the APIs up at the action/handler layer, we need a way to allow a new pair of special Suppliers to be given access to the request state in order to walk down the tree to where the DDF objects will be to wrap them. I toyed with a few ways, but the simplest (at least for ProfileAction) is I think going to be to define a new interface that allows us to set the ProfileRequestContext against the Suppliers (the suppliers obviously being implemented as prototypes, or with tricks like thread-local storage). That injection can be added to the OpenSAML AbstractProfileAction base class conditionally based on the new interface, so it won’t affect existing wiring, but would provide the PRC to the new suppliers written for the SP.

I don’t think this will work for the MessageDecoder API because the MessageContext doesn’t exist yet, so is not attached to the PRC yet. For the rest of the MessageHandler/Encoder steps, I think it probably can work.
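
Roughly the shape of the interface I have in mind, with hypothetical names (this is not an existing OpenSAML interface, and the wiring is assumed rather than designed):

    import java.util.function.Supplier;
    import jakarta.servlet.http.HttpServletRequest;
    import org.opensaml.profile.context.ProfileRequestContext;

    // Hypothetical: lets a supplier be handed the PRC so it can walk down to the DDF it wraps.
    interface ProfileRequestContextAware {
        void setProfileRequestContext(ProfileRequestContext prc);
    }

    // Implemented as a prototype bean (or with thread-local storage) so each request gets
    // its own PRC reference; the returned object is the wrapper over the DDF request data.
    abstract class DDFBackedRequestSupplier
            implements Supplier<HttpServletRequest>, ProfileRequestContextAware {
    }

AbstractProfileAction would test for ProfileRequestContextAware and, if present, inject the PRC before the supplier is consulted, leaving existing wiring untouched.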

Session Thoughts

My inclination is to look at confining “formal” session management to the agents and not the hub. This makes the hub more stateless. It does not preclude using the Java StorageService as a back-end because we can remote the various CRUD operations implemented by the storage API within the agent remoting protocol (already done). Thus, where applicable an agent could implement a Java StorageService-backed session cache if it wanted and our initial set probably will. That will move us to Java for JDBC and off the clearly non-viable-for-Linux ODBC option.

In either case, the data being managed needs to be split between the data the agent has to understand and use and the data that could be left opaque. An example of the latter would be NameID matching information for SAML logout. The opaque data could be attached and then passed back to the hub during operations that might require it and then reparsed there for use.

So I envision the hub knowing about sessions as objects to parse and understand but not actually implementing the prototypical cache of them natively.
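
As a strawman, the split in the session record might look something like this; field names are hypothetical, and the opaque blob is simply whatever the hub wants handed back later (NameID matching data for SAML logout being the example above).

    import java.time.Instant;
    import java.util.Map;

    // Hypothetical split of session state: fields the agent must understand and use,
    // plus an opaque hub-owned blob passed back during operations that need it.
    record AgentSession(
            String sessionId,
            String applicationId,
            Instant expiration,
            Map<String, String> variables,   // decoded values the agent exports into requests
            byte[] opaqueHubData) {
    }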

Error Handling

Almost certainly we will drop all of the custom error handling in the SP in favor of web server native approaches. The SP would return error codes and where possible set internal variables up to carry extended data but customizing the pages would be up to Apache/IIS config, and whatever other agents do in the future. The existing machinery is overblown, barely used properly, and a source of a lot of bugs.

Identity Protocol Support and Chaining

I envision adapting some of the existing notions of “chaining” of handlers into the plugin somehow, as a means of stacking up protocols and allowing the agent to be oblivious to which protocol is used. Rather than implement this in an agent-aware way, as with the SessionInitiator chains in the SP today, there would instead be a package of options that the agent would need to be aware of, to pick up at runtime from the RequestMap or from parameters and then pass in a “new login” request to the hub. The hub’s job would be to manufacture the appropriate response to the agent’s client to get the client over to an IdP with a suitable request message.

The SessionInitiator endpoint would be a master webflow that probably relies on subflows to initiate requests, allowing pluggability of new protocols, so this needs to be dynamically extensible out of the box. The hub deployer would probably control the precedence of these subflows being tried, but perhaps the agent could have a setting allowing it to be overridden.

Since each subflow would operate against the same input message, they would be able to pick and choose which options and settings to look for in the request to alter behavior, along with the usual relying-party- or metadata-driven approaches we have now.
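
On the hub side, the selection might reduce to something as simple as the following; the flow IDs and the applicability test are placeholders for whatever the subflows actually key on.

    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.function.Predicate;

    // Hypothetical precedence-ordered selection of a protocol subflow for a "new login" request.
    final class SessionInitiatorSelector {

        // Each candidate pairs a subflow ID with a test against the incoming request options.
        record Candidate(String flowId, Predicate<Map<String, String>> applicable) {}

        private final List<Candidate> candidates;   // list order = hub-configured precedence

        SessionInitiatorSelector(final List<Candidate> candidates) {
            this.candidates = List.copyOf(candidates);
        }

        Optional<String> select(final Map<String, String> loginOptions) {
            return candidates.stream()
                    .filter(c -> c.applicable().test(loginOptions))
                    .map(Candidate::flowId)
                    .findFirst();
        }
    }

An agent-provided setting overriding the precedence would just mean reordering (or filtering) the candidate list for that request.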

A similar model could be used on the response side of this, but it may be necessary to stick to separate endpoints at the SP. This is really related to the question of how the handler model would actually work going forward, as I envision the agent being oblivious to what the handler actually is and leaving that up to the hub to recognize based on the path.