The persistence layer is responsible for insulating components that need to preserve data across multiple web requests from the specifics of data storage. This includes data associated with specific clients as well as data that must be globally accessible when servicing all clients. Not all components requiring persistence will necessarily use this layer; some have more specialized requirements that are not easy to abstract behind a common interface.

Draft Proposal

The following technical requirements for the abstract API are suggested based on experience with the Service Provider's overlapping requirements:

...

At least in the SP, eventing or pub/sub has not been a requirement to date, and I'd like to avoid it if we can, since requiring it would greatly limit the possible implementations.

Use Cases

Replay Cache

Most identity protocols assume the use of nonces (usually via message IDs) to prevent replay attacks, though these checks are generally of low importance within the IdP. The more valuable capability is in detecting stale requests to prevent the browser from being trapped in a back-button / login loop. Because of the low security importance, an unreplicated in-memory storage service is usually sufficient. A passively replicated data store would also work well. Client-side storage is not an option, obviously.

Use of the storage API is straightforward: a context isolates the namespace of values being checked, the value to check serves as the key, and the stored value itself is irrelevant. The key size can potentially exceed a desirable limit, though not in general, and hashing is sufficient to address that.
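
To make that concrete, here is a minimal sketch in Java, assuming a hypothetical storage interface with context/key/value semantics and an atomic create-if-absent operation. All names here are illustrative, not the final API:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Hypothetical storage interface, names illustrative only: an atomic
    // create-if-absent with a context namespace, key, value, and expiration.
    interface StorageService {
        boolean createString(String context, String key, String value, long expiration);
    }

    final class ReplayCache {
        private final StorageService storage;

        ReplayCache(StorageService storage) {
            this.storage = storage;
        }

        // Returns true if the message ID was not seen before (recording it
        // until the expiration time), false if this is a replay.
        boolean check(String context, String messageId, long expiration) throws Exception {
            // Hash so an oversized message ID stays within the back-end's key limit.
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(messageId.getBytes(StandardCharsets.UTF_8));
            StringBuilder key = new StringBuilder();
            for (byte b : digest) {
                key.append(String.format("%02x", b));
            }
            // The stored value is irrelevant; only the key's presence matters.
            return storage.createString(context, key.toString(), "x", expiration);
        }
    }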

Artifact Store

The SAML artifact mechanism requires associating artifact message handles with assertions or messages. For SAML 1 artifacts to function, all servers responding to artifact lookup requests need access to the data store, making in-memory implementations suitable only for single-node systems; replication would need to be rapid and reliable. For SAML 2 artifacts, an artifact can be associated with a particular server URL, so with additional work to deploy dedicated TLS-protected virtual hosts with unique names, a replicated artifact store can be avoided. Normally, though, every server in a cluster is load-balanced behind one name and certificate, so this is much more complex to support, probably requiring additional addresses or ports. In either case, client-side storage is not an option.

The two-part key mechanism is irrelevant here because all artifacts are unique by themselves. The message handle is the key, and the serialized message is the value. As with the replay cache, the key size can potentially exceed a desirable limit, though not in general, and hashing is sufficient to address that. The value is a potentially non-trivial message on the order of 10k in size.
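
A sketch along the same lines, again with illustrative names, assuming the storage interface also offers read and delete operations:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Illustrative names only; assumes read and delete operations exist
    // alongside the atomic create used in the replay cache sketch.
    interface ArtifactStorageService {
        boolean createString(String context, String key, String value, long expiration);
        String readString(String context, String key);
        boolean deleteString(String context, String key);
    }

    final class ArtifactStore {
        private static final String CONTEXT = "artifact";
        private final ArtifactStorageService storage;

        ArtifactStore(ArtifactStorageService storage) {
            this.storage = storage;
        }

        // The message handle is the key; the serialized message (~10k) is the value.
        boolean put(String handle, String serializedMessage, long expiration) throws Exception {
            return storage.createString(CONTEXT, hash(handle), serializedMessage, expiration);
        }

        // Artifacts are single-use, so a successful lookup also deletes the entry.
        String resolve(String handle) throws Exception {
            String key = hash(handle);
            String message = storage.readString(CONTEXT, key);
            if (message != null) {
                storage.deleteString(CONTEXT, key);
            }
            return message;
        }

        // Hash handles that might exceed the back-end's key size limit.
        private static String hash(String input) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }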

Terms of Use

No experience with this use case, but I would speculate that it involves associating some kind of local user identity with an identifier representing a particular ToU. I would imagine a ToU could contain parameterized sections or require user input that would need to be preserved, which would be a simple matter of storing a more complex object produced by a particular ToU module. I could imagine needing a TTL for this data for ToU that have to be renewed periodically, but permanence might also be needed.

Server-side storage here seems awkward without replication, since a user wouldn't understand why he/she was being prompted again. Client-side storage is possible but also quite awkward due to multiple devices. It also seems like a bad idea to eat into our exceedingly limited cookie space. This could be a use case for Web Storage.

Consent

Need to investigate existing uApprove code to see what's being stored.

Technology considerations seem similar to the Terms of Use case, only more so. There's no way anything more than a global yes/no fits into a cookie, but Web Storage is a possibility if the extra prompting from multiple devices isn't a concern.

Session Store

We need some form of persistence for user sessions to support SSO, and features like logout depend on what we store and how we store it. This is a primary use case for client-side storage, but also a difficult one because of size limitations, particularly if logout is involved. This is a likely candidate for storing some kind of structured data as a blob, but unlike the SP, sessions shouldn't need to be arbitrarily extensible.
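
As a rough illustration of the structured-blob idea, a session record with a fixed, known set of fields (rather than an extensible map) might look something like this; the fields are speculative:

    import java.io.Serializable;
    import java.time.Instant;
    import java.util.List;

    // Speculative shape of a session record: a fixed set of fields rather
    // than an arbitrarily extensible structure, so it can serialize to a
    // compact blob suitable for client-side storage.
    final class SessionRecord implements Serializable {
        String sessionId;
        String principalName;
        Instant creation;
        Instant lastActivity;
        // Per-service entries are what logout support would require, and are
        // the main threat to any client-side size budget.
        List<String> serviceIds;
    }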

...

Another problem here is that the current server-side design allows us to make data about the user or the client available to the resolver extensibly, via Java subject/principal objects. Moving that to the client creates problems with attribute queries, and it's been bad in the past to support functionality that only works with push; it breaks the symmetry and consistency of the resolver's behavior in different flows. This may be another opportunity to push advanced needs to server-side storage.

Possible Implementations

In-Memory

Not much to say; this is obviously straightforward.

Memcache

There's an existing implementation of the V2 session cache, and a version of the SP interface, which leads me to assume this should be possible. What isn't clear to me is the point of it. I know memcache's value as a cache, but this is a storage layer, not a cache. Unless the service were deployed separately from any IdP node, there would be no simple way to take down the server running the memcached daemon without losing its data. With a single point of failure like that, a database seems like a much better choice. Probably this is another case where non-persistent state and true persistence lead to different back-ends.

JDBC

Clearly possible, and the SP has an ODBC implementation. JDBC should be a straightforward port even without optimizing it.
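
As a sketch of what a port might look like, one plausible shape is a single generic table keyed by context and key with an expiration column; the schema and names here are guesses, not the SP's actual ODBC schema:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLIntegrityConstraintViolationException;
    import java.sql.Timestamp;

    // Guessed schema, not the SP's actual one:
    //   CREATE TABLE storage_records (
    //     context VARCHAR(255) NOT NULL,
    //     id      VARCHAR(255) NOT NULL,
    //     expires TIMESTAMP    NOT NULL,
    //     value   TEXT         NOT NULL,
    //     PRIMARY KEY (context, id)
    //   );
    final class JdbcStorage {
        private final Connection conn;

        JdbcStorage(Connection conn) {
            this.conn = conn;
        }

        // Create-if-absent: the primary key enforces atomicity for us.
        boolean create(String context, String id, String value, long expiration) throws Exception {
            String sql = "INSERT INTO storage_records (context, id, expires, value) VALUES (?, ?, ?, ?)";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setString(1, context);
                stmt.setString(2, id);
                stmt.setTimestamp(3, new Timestamp(expiration));
                stmt.setString(4, value);
                return stmt.executeUpdate() == 1;
            } catch (SQLIntegrityConstraintViolationException duplicate) {
                return false;  // key already present
            }
        }
    }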

Cookies

Supporting cookies is principally a size problem. Full portability means limiting total cookie usage to 4k for the whole IdP, and we probably lose 25% of that to securing the data (encrypted content has to be Base64-encoded, which expands it by a third). Chunking is probably a waste of time unless we want to target browsers without the tiny domain-wide limit. Opera is probably practical to treat exceptionally, but I don't think Safari is.
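
A back-of-the-envelope check of that budget, assuming AES-GCM plus Base64 as the sealing scheme (the actual mechanism is undecided): roughly 3k of plaintext is the most that survives.

    import java.security.SecureRandom;
    import java.util.Base64;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;

    // Rough check of how much plaintext survives sealing into a 4k cookie
    // budget, assuming AES-GCM + Base64 (the actual scheme is undecided).
    public final class CookieBudget {
        public static void main(String[] args) throws Exception {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));

            byte[] plaintext = new byte[3000];           // candidate payload
            byte[] sealed = cipher.doFinal(plaintext);   // adds a 16-byte GCM tag
            // The IV must travel with the ciphertext, then everything is Base64'd.
            int cookieSize = Base64.getEncoder()
                    .encodeToString(concat(iv, sealed)).length();
            System.out.println(cookieSize + " bytes of 4096 budget");  // ~4040
        }

        private static byte[] concat(byte[] a, byte[] b) {
            byte[] out = new byte[a.length + b.length];
            System.arraycopy(a, 0, out, 0, a.length);
            System.arraycopy(b, 0, out, a.length, b.length);
            return out;
        }
    }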

...

Versioning wouldn't be easy here, since different nodes could both update and write back the same cookie, but I suppose one could have some kind of server-side synchronization of updates such that the information is in a consistent state before a cookie gets written back. That seems like a lot of work and hard to manage, and I would guess that the use cases for cookie storage could live without versioning.

Web Storage

Web Storage has much better capacity than cookies, but when you dig into it, it is a very poor solution for storing data generated and manipulated by the server. It's entirely targeted at client-side application logic. The only way the data gets to the server is via a JavaScript-triggered POST to the server, which is an awkward thing to do. The overall implementation looks clumsy because even writing data back involves being able to inject JavaScript into a page at an appropriate time.

...