Solutioning

Parking lot for various thoughts on design/implementation so I don’t lose them. “Shower thoughts” would be a good name for this I guess.

Tracking Headers to Clear

A nasty part of the original SP is the notion of having to know all the possible header variables that might be populated, independently of the specific attributes that are present in any given session, to prevent smuggling.

Currently shibd computes the set of headers from the attribute mapping layer, adds in various “special” ones, and serves up the list in a special remoting call that the modules use to get the set. Because the modules don’t really know when that set has changed, and because there’s no signaling/interrupt mechanism to make them realize it, I eventually made the attribute mappings non-reloadable by default to prevent them from changing without a conscious deployer step, and even then you’d get out of sync if you did things improperly. I never liked any of that, and it won’t work for the new design anyway.

My thought is to move this out of shibd and into the agents, divorcing the decoding shibd does to create IdPAttributes out of protocol claims and other data from the names of the headers used to decorate requests. The former is handled by shibd, but the latter should be in the agents. I envision a simple mapping file of IdPAttribute ID to local variable names, essentially “renaming” them (or not) and enumerating all of the possible “controlled” variables in the agent, so that it knows locally what to clear before the decoration step where they’re exported into the requests.

It would remain non-reloadable, since the whole agent is, I think, not going to support any kind of dynamic configuration anyway; Apache and IIS don’t generally do that as it is. Parsing a property file syntax isn’t fun, but it shouldn’t be much code and is a simple enough format for people to understand. Most deployments of the SP are fairly static in this regard anyway.
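
As a concrete (and purely hypothetical) sketch of that idea, assuming an “IdPAttribute ID = local variable name” property syntax, the agent-side code might amount to something like the following; the file handling, syntax, and helper names are all illustrative:

    // Minimal sketch: parse a hypothetical "attribute-id = local-variable"
    // property file and derive the full set of controlled variables the
    // agent must clear before decorating any request.
    #include <fstream>
    #include <map>
    #include <set>
    #include <string>

    std::map<std::string, std::string> loadAttributeMap(const std::string& path) {
        std::map<std::string, std::string> mapping;
        std::ifstream in(path);
        std::string line;
        auto trim = [](const std::string& s) -> std::string {
            auto b = s.find_first_not_of(" \t");
            auto e = s.find_last_not_of(" \t");
            return b == std::string::npos ? std::string() : s.substr(b, e - b + 1);
        };
        while (std::getline(in, line)) {
            if (line.empty() || line[0] == '#')
                continue;                           // skip blanks and comments
            auto eq = line.find('=');
            if (eq == std::string::npos)
                continue;                           // ignore malformed lines
            // e.g. "eppn = REMOTE_USER" renames the IdPAttribute "eppn"
            // to the local variable REMOTE_USER
            mapping[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
        }
        return mapping;
    }

    // Everything on the right-hand side is a "controlled" variable the agent
    // clears before export, regardless of what any given session contains.
    std::set<std::string> controlledVariables(const std::map<std::string, std::string>& m) {
        std::set<std::string> names;
        for (const auto& entry : m)
            names.insert(entry.second);
        return names;
    }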

Agents, Entities, Applications, Oh My

The configuration model as a whole is complicated by the desire to support joint use of the shibd plugin by multiple, independent web server agents protecting unrelated systems. That is, 10 agents protecting service.example.org is simple in comparison, but combining those 10 with another 10 protecting unrelated.example.org adds an additional “dimension” to the configuration and to the security model to think about. It is a “multi-personality” / virtualized way of looking at shibd that is not part of the SP to date, nor part of the IdP, so we have no precedent for it as a feature.

We have to account for:

  • Applications and application overrides that are unrelated to each other and currently not named uniquely.

  • The fact that each application may act as a static entityID or as a dynamically computed entityID that is derived from the virtual host or in the general case based on specific paths to content.

  • Settings per-profile and per-relying party (the eventual IdP in this case), including potentially the entityID itself.

  • The fact that in the future we want to support protocols like OpenID that have similar constructs to an entityID but that may not be the same value.

  • The desire to hide as much protocol machinery as possible from the agents in the first place.

Initially I considered rooting everything in an Application, but that falls apart due to naming (every SP deployment currently identifies its default Application as “default”). Then I moved toward rooting this in the entityID, with the Applications contained in that as a sort of multi-map, but that falls apart because of how the SP computes the entityID to use in a content-specific way, not to mention the long term need for other protocols.

It also doesn’t address the significant problem of limiting which objects a given set of related agents can actually make use of; that is, how does the system prevent one agent from associating itself with an entityID and Application(s) of its choosing, introducing risks. An example: if the plugin supports resolving additional data about the user that is only meant for use or to be seen by specific agents, other agents could impersonate them to get access to that data. There are likely more esoteric risks but that’s an obvious one.

My mental strawman is thinking about this from the perspective of how to identify and secure the agent access, which is an unrelated but eventually necessary problem to tackle for shibd to be shareable in the way people want. Not surprisingly, this looks a lot like dealing with secret-based OpenID clients in the manner I’ve been advocating for scaling out campus deployments via the unregistered client approach; that is, the agents need to be backed by a service account process that issues/manages the account secrets needed for the agents to authenticate themselves to the hub.

Putting the issues with secret-based authentication to the side (will cover that separately), if one imagines this is “enough” to verify an agent and its requests, then it follows we can associate the correct set of Applications to each agent identity. Thus we have a model:

  • Agent Identity, some form of local agent identifier unique to a deployment (this would be shared across a cluster, but could I suppose be a set of identities):

    • Application(s) identified by a non-unique ID specific to each agent identifier, likely cribbed directly from existing SP configs in many cases:

      • A RelyingPartyConfigurationResolver associated with an Application to provide default and per-IdP profile settings.
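
Very roughly, and with every name below invented for illustration (the hub’s real implementation is Java), the relationships in that model amount to something like:

    // Each authenticated agent identity owns its own set of Applications,
    // keyed by IDs that only need to be unique within that agent, and each
    // Application points at the relying-party/profile configuration used
    // on its behalf.
    #include <map>
    #include <memory>
    #include <string>

    struct RelyingPartySettings;            // default and per-IdP profile settings

    struct Application {
        std::string id;                     // e.g. "default"; unique per agent only
        std::shared_ptr<RelyingPartySettings> relyingPartyConfig;
    };

    struct AgentIdentity {
        std::string agentId;                                // local identifier, shared across a cluster
        std::map<std::string, Application> applications;    // keyed by Application ID
    };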

There’s a realization underlying this: agents don’t want to know about SAML or entityIDs and neither do their deployers. If we’re serious about offloading complexity, then it should be the hub’s job to figure out what to name things in specific protocols to comport with the metadata the hub operator is willing to provide to IdPs.

This gets very weird when we think about cases in which there’s one operator of both parts. The shibd plugin is literally creating messages to be processed by its own host IdP, and it’s not impossible to imagine a design in which the internals of making SAML work are even “devolved” or “degenerated” into a self-referencing configuration that replaces the metadata-always design of the IdP with an alternative approach suited to the problem of a campus deployment that is literally only talking to itself. When that happens, does the protocol even matter?

The point is that it turns out we probably need to rethink the notion of entityIDs and client_ids in this implementation as solely a hub-side consideration. The agent connects to its Application definitions and it’s the job of the configuration to allow the hub to figure out how to make SAML or OpenID work for those applications in the ways preferred by that operator. Thus, not only is there probably not a “rule” applied to decide if an agent can “be” a particular entityID, but there probably isn’t a need to even allow agents to influence the value at all.

Compatibility

Leaning right now away from trying to support the V2/3 configuration files in any direct sense. This could change, but it’s very messy to deal with when most of it will be ignored, and there’s enough of a different model here that forcing an explicit conversion of the main file is probably warranted.

The most significant piece that crosses this boundary is the RequestMap, and the intent would be to drop the namespace but not change much else, so copying it right out should usually work OK. So far the changes there include:

  • Namespace (it likely needs to go away due to how we may end up parsing it)

  • Some older settings will be removed, so they would be ignored if left in place

  • s/entityID/authority to allow for genericity in terms

The other portions of the configuration are largely not relevant to the new agents, except for a small subset of basic information that should be much more concisely expressed using an updated syntax, so making that simpler seems more important than direct compatibility. This mainly includes session settings, the applications, and their handler locations (at least the base URL, if not the individual paths; not sure about that yet).

I am leaning toward using properties for everything possible. Using Apache commands has appeal, of course, but it’s not the simplest code to support dozens of commands, and I think it would lead to more complexity than simpler alternatives might. It depends somewhat on tolerance for log4j-style nested properties with dotted notation. I don’t mind them, but they may prove annoying for some. This would particularly apply to the application settings, which would have to be qualified by application ID.
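
For example, application-qualified settings could simply be a dotted prefix over otherwise ordinary keys, with a fallback to the unqualified default. The key names below are purely illustrative, not a proposed schema:

    // Sketch of dotted-notation property lookup with an application-ID
    // qualifier and a fallback to the unqualified key, e.g.:
    //   session.lifetime = 28800
    //   apps.admin.session.lifetime = 3600
    #include <map>
    #include <string>

    std::string getSetting(const std::map<std::string, std::string>& props,
                           const std::string& applicationId,
                           const std::string& key,
                           const std::string& defaultValue = "") {
        auto it = props.find("apps." + applicationId + "." + key);
        if (it != props.end())
            return it->second;
        it = props.find(key);
        return it != props.end() ? it->second : defaultValue;
    }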

Handlers

Handlers in the SP today essentially plug into the SP by registering against paths that live below the root handlerURL (/Shibboleth.sso). They are not dissimilar to the idea of web flows mapped from particular paths below the SWF servlet path (/idp/profile).

Originally I was steering toward moving a lot of the concepts around mapping paths to specific functions to the hub, but now that the hub has implemented the two major ones (initiators and token consumers), I think this is starting to shift back to agents mapping paths to the specific hub function to perform.

What I would like to change is the complexity associated with the handler mappings in conjunction with the original Application and ApplicationOverride model. Today, handlers technically can be configured per-Application. I would like to dump that in favor of defining paths for handlers globally to the agent, and if absolutely necessary consider ways to limit their use through handler configuration somehow, such as enabling/disabling them based on the active application ID (which is established via content setting as it is now). Most people should never have to do this or should reconsider it if they do, so the configuration should make the simple cases simpler to define and leave the complexity to the complex cases. Doing this could largely dispense with any need to replicate the current Application element hierarchy in the XML and avoid the need to replace it with anything else.
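
As a hedged sketch of where that could land (types, names, and the hub function labels are all hypothetical), a single global handler table in the agent might be as simple as a path-to-function map, with an optional allow-list of application IDs for the rare case where limiting use is genuinely needed:

    #include <map>
    #include <set>
    #include <string>

    // Hypothetical global handler table: one set of paths for the whole
    // agent rather than a per-Application handler hierarchy.
    struct HandlerEntry {
        std::string hubFunction;               // e.g. "SessionInitiator", "TokenConsumer"
        std::set<std::string> enabledForApps;  // empty means enabled for every application
    };

    using HandlerMap = std::map<std::string, HandlerEntry>;
    // e.g. handlers["/Login"] = { "SessionInitiator", {} };

    bool handlerEnabled(const HandlerEntry& h, const std::string& activeApplicationId) {
        return h.enabledForApps.empty() || h.enabledForApps.count(activeApplicationId) > 0;
    }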

Sketching the Request Flow

Working on the initial security and request processing model, given these assumptions:

  1. The request body is the serialization of a DDF either in my proposed form or later on as JSON. (Point being, not form encoded.)

  2. The Authorization header would carry Basic auth for now, with the agent ID and a shared secret. Agent IDs thus can’t have colons, so they would not be URLs, but that was the plan anyway; hostnames are more likely, and periods are OK. I did not see any length limitations inherent to basic-auth.

  3. The agent would under some conditions understand cookies and be able to store and return one.

The basis of the request webflows then:

  1. PRC gets created per usual as a pre-step by the flow, which then branches back into shared parent flow actions.

  2. Extract the Authorization header based on known schemes, possibly just Basic and perhaps “something custom” at this point. I suspect custom types would present a challenge for some client libraries, so it may be simpler to just leverage Basic with a dummy password if needed.

  3. Populate AgentRequestContext under PRC by resolving Agent based on the ID from the basic-auth header. For now, that’s our primary ID signal. Failure to resolve an Agent obviously fails.

  4. Check IP address of request against Agent’s allowed ranges.

  5. Check for Agent setting to determine requirement to authenticate (allowing for localhost deployments without the overhead).

    1. If needed, check for a record of previous cached authentication in the Java session. This contains the agent ID, client address, and the expiration.

      1. If the record is there and valid, the request is accepted for processing.

      2. If not, any existing record is cleared, and then the credentials from the basic-auth header have to be run through a CredentialValidator chain and the resulting Java Subject would carry the agent ID in a UsernamePrincipal, which is to be compared to the incoming value, and then the request is accepted for processing.

  6. Now we parse the body based on Content-Type to obtain the DDF, which is stored in the inbound MessageContext as the message.

  7. Most incoming requests would carry an application ID in a standard place, so this would be resolved against the Agent to obtain the Application to store in the context (or fail the request).
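
From the agent’s side, assumptions 1 and 2 above amount to a POST whose body is the serialized DDF and whose credentials ride in a Basic Authorization header. A minimal sketch using libcurl follows; the hub URL, the media type for the DDF serialization, and the error handling are all placeholders:

    #include <curl/curl.h>
    #include <string>

    // Hypothetical sketch: POST a serialized DDF to the hub with the agent
    // ID and shared secret as HTTP Basic credentials. Response handling is
    // omitted entirely.
    bool postToHub(const std::string& url,
                   const std::string& agentId,
                   const std::string& secret,
                   const std::string& serializedDDF) {
        CURL* curl = curl_easy_init();
        if (!curl)
            return false;

        // Agent IDs can't contain a colon, which Basic auth requires anyway.
        std::string userpwd = agentId + ":" + secret;

        struct curl_slist* headers = nullptr;
        // Placeholder media type; the real one depends on the DDF serialization.
        headers = curl_slist_append(headers, "Content-Type: application/octet-stream");

        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
        curl_easy_setopt(curl, CURLOPT_USERPWD, userpwd.c_str());
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, serializedDDF.data());
        curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, static_cast<long>(serializedDDF.size()));

        CURLcode rc = curl_easy_perform(curl);

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        return rc == CURLE_OK;
    }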

Some kind of MAC over parts of the request would be nice. OAuth 1.0 defines that, but it’s likely a bit overkill; OTOH it would at least be somewhat “known” as an approach. It defines a way to carry all of that in the Authorization header, so that may be attractive, but I don’t know how clean that would be for agents. Notably it can’t sign non-form bodies, and I’m not sure I want to worry about canonical form to make signing the body work.

The real value, I think, is that a MAC covering the URL path and the host, and possibly something from the TLS cert, would achieve a weak kind of channel binding: if the server trusts that the agent isn’t going to create and send a signed blob when it can’t connect to a trusted server, that amounts to a guarantee that there was no MITM. It’s not full-on channel binding because it relies on that assumption about agent behavior without a shared value from the TLS session.
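
To make that concrete, a rough sketch of MACing the method, host, and path with the agent secret, assuming OpenSSL and an entirely made-up canonical form:

    #include <openssl/evp.h>
    #include <openssl/hmac.h>
    #include <string>

    // Sketch of the "weak channel binding" idea: MAC a canonical string of
    // request elements with the agent's shared secret. What actually gets
    // covered, and how the result would be carried in a header, is undecided.
    std::string computeRequestMAC(const std::string& secret,
                                  const std::string& method,
                                  const std::string& host,
                                  const std::string& path) {
        // Canonical form is hypothetical; newline-delimited here for clarity.
        std::string input = method + "\n" + host + "\n" + path;

        unsigned char digest[EVP_MAX_MD_SIZE];
        unsigned int len = 0;
        HMAC(EVP_sha256(),
             secret.data(), static_cast<int>(secret.size()),
             reinterpret_cast<const unsigned char*>(input.data()), input.size(),
             digest, &len);

        // Hex-encode the digest; a real implementation might base64 it instead.
        static const char hex[] = "0123456789abcdef";
        std::string out;
        for (unsigned int i = 0; i < len; ++i) {
            out.push_back(hex[digest[i] >> 4]);
            out.push_back(hex[digest[i] & 0x0F]);
        }
        return out;
    }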

Indirecting HttpServletRequest/Response

To reuse a lot of the current code, particularly from OpenSAML, we need a means of indirecting access to the servlet layer into wrapper objects that source request data from, and write response data to, the DDF objects passed in from (or sent back to) the agent. This is the “tunnelling” aspect of the SP today, in which facades for HTTP request and response logic get exposed on top of the DDF objects so that the code is relatively understandable but the data is being remoted across a socket interface. The difference here is the use of HTTP to remote the data, but in other respects it’s similar.

The challenge here is that the DDF objects backing the wrapped interfaces are part of the request context tree/state, but the getHttpServletRequest/Response methods on the various profile and messaging APIs are parameterless. That was a mistake, but it was necessary for getting singletons to be able to access servlet interfaces since there wouldn’t be any request state to pass into APIs that are that deep in the system without every method up the stack taking a servlet interface.

Without changing the APIs up at the action/handler layer, we need a way to allow a new pair of special Suppliers to be given access to the request state in order to walk down the tree to where the DDF objects will be and wrap them. The original design I tried didn’t work, so I had to fall back to more thread-locals. A different pair of TLS-based suppliers is required, but it turns out we only need the tunnelling semantic in relatively few places because we don’t actually interact with the servlet layer all that much, at least in my work so far. I was able to get message encoding to work by embedding the setup of the right TLS objects in a try/finally block around the use of the MessageEncoder, and I think decoding should work similarly.
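
The real implementation is Java, but the shape of the pattern is just a scoped binding: install a thread-local supplier for the duration of the work and clear it afterwards. A language-neutral sketch, with all names invented:

    // A scoped guard installs a thread-local pointer to per-request state
    // for the duration of a unit of work (hypothetically, message encoding),
    // so deep code with parameterless accessors can still reach it.
    struct RequestState;                     // stand-in for the DDF-backed wrapper

    thread_local RequestState* g_currentRequest = nullptr;

    class ScopedRequestBinding {
    public:
        explicit ScopedRequestBinding(RequestState* state) { g_currentRequest = state; }
        ~ScopedRequestBinding() { g_currentRequest = nullptr; }   // the "finally"
    };

    RequestState* currentRequest() { return g_currentRequest; }   // parameterless accessor

    void encodeMessage(RequestState* state) {
        ScopedRequestBinding binding(state);
        // ... run the encoder; anything it calls can use currentRequest() ...
    }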

We’ll likely hit more complex cases but I’ll cross that bridge later.

Session Thoughts

My inclination is to look at confining “formal” session management to the agents and not the hub. This makes the hub more stateless. It does not preclude using the Java StorageService as a back-end because we can remote the various CRUD operations implemented by the storage API within the agent remoting protocol (already done). Thus, where applicable an agent could implement a Java StorageService-backed session cache if it wanted and our initial set probably will. That will move us to Java for JDBC and off the clearly non-viable-for-Linux ODBC option.

In either case, the data being managed needs to be split between the data the agent has to understand and use and the data that could be left opaque. An example of the latter would be NameID matching information for SAML logout. The opaque data could be attached and then passed back to the hub during operations that might require it and then reparsed there for use.

So I envision the hub knowing about a subset of session data as objects to parse and understand but not actually implementing the prototypical cache of them natively.

For the cache itself, I am starting to lean toward hoping all the other Apache modules out there are right and that using the filesystem is “good enough” as a cross-process solution. If things are simplified to that degree, I think it may also be relatively easy to implement cross-server session migration by adding an agent-to-agent call to fetch a session based on the cookie, then creating the copy of the file locally as needed. We could hopefully touch the files occasionally to get a coarse-grained timeout, which we already decided is the right approach vs. the “update it on every request” fully consistent timeout behavior we have now. It just doesn’t need to be that exact.
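
A sketch of the coarse timeout idea, with the file layout, intervals, and POSIX calls standing in for whatever the agent would actually use:

    #include <sys/stat.h>
    #include <utime.h>
    #include <ctime>
    #include <string>

    // The session file's mtime is the "last activity" marker, refreshed only
    // once it has aged past a touch interval rather than on every request,
    // giving a deliberately coarse-grained idle timeout.
    bool sessionAlive(const std::string& path, time_t timeoutSeconds, time_t touchInterval) {
        struct stat sb;
        if (stat(path.c_str(), &sb) != 0)
            return false;                        // no file, no session

        time_t now = std::time(nullptr);
        if (now - sb.st_mtime > timeoutSeconds)
            return false;                        // idle too long; treat as expired

        if (now - sb.st_mtime > touchInterval)
            utime(path.c_str(), nullptr);        // "touch": reset mtime to now

        return true;
    }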

Error Handling

Almost certainly we will drop all of the custom error handling in the SP in favor of web server native approaches. The SP would return error codes and where possible set internal variables up to carry extended data but customizing the pages would be up to Apache/IIS config, and whatever other agents do in the future. The existing machinery is overblown, barely used properly, and a source of a lot of bugs.

Regular Expressions

It appears C++11 has a native regular expression API, with a number of different dialects, so that’s almost certainly the API we would use for implementing the regex-based rules supported in the current SP. C++11 is likely the oldest version of the language under consideration (so any platform without C++11 support is not going to be in scope, period). We may move to something newer if that looks practical.
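
For instance, a trivial check of a RequestMap-style path rule against that API might look like the following; the pattern and the dialect choice are purely illustrative:

    #include <iostream>
    #include <regex>

    // Small sketch of the C++11 <regex> API applied to a path-matching rule
    // of the sort the current RequestMap supports.
    int main() {
        std::regex rule("^/secure(/.*)?$", std::regex::ECMAScript | std::regex::icase);

        const char* paths[] = { "/secure/admin/index.html", "/public/index.html" };
        for (const char* path : paths) {
            std::cout << path << " -> "
                      << (std::regex_search(path, rule) ? "protected" : "not matched")
                      << '\n';
        }
        return 0;
    }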

Identity Protocol Support and Chaining

I envision adapting some of the existing notions of “chaining” handlers into the plugin somehow, as a means of stacking up protocols and allowing the agent to be oblivious to which protocol is used. Rather than implement this in an agent-aware way, as with the SessionInitiator chains in the SP today, there would instead be a package of options that the agent would need to be aware of, pick up at runtime from the RequestMap or from parameters, and then pass in a “new login” request to the hub. The hub’s job would be to manufacture the appropriate response to the agent’s client to get the client over to an IdP with a suitable request message.

The SessionInitiator endpoint would be a master webflow that probably relies on subflows to initiate requests, allowing pluggability of new protocols, so this needs to be dynamically extensible out of the box. The hub deployer would probably control the precedence of these subflows being tried, but perhaps the agent could have a setting allowing it to be overridden.

Since each subflow would operate against the same input message, they would be able to pick and choose which options and settings to look for in the request to alter behavior, along with the usual relying-party- or metadata-driven approaches we have now.

A similar model could be used on the response side of this, and contrary to my original intention, I think we want to move towards having the agents manage the handlers/paths that they process and what functionality in the hub they map to. That allows the agent to construct the “right” self-referential URLs for response handling and just supply them to the hub to bake into requests (as in SAML where the ACS URL is typically in the request).

The advantage is that we would finally get to a model where a single response URL could be used for all protocols and bindings, and the hub can chain its response processor flows together so they can probe the buffered data to see if it’s applicable (e.g., HTTP method, parameters, etc.), allowing one endpoint to handle SAML and OIDC and all bindings.