The persistence layer is responsible for insulating components that have to preserve data across multiple web requests from the specifics of data storage. This includes data associated with specific clients as well as data that has to be globally accessible when servicing all clients. Components with more specialized requirements that are not easy to abstract behind a common interface will not necessarily use this layer.

Draft Proposal

The following technical requirements for the abstract API are suggested, based on experience with the Service Provider's overlapping needs:

  • String-Based API

    • Handle storing string and text data (blobs can be encoded as text), keeping serialization of objects separate.

    • One of the consequences of this is that aliasing has to be implemented by hand by managing alternate indexes to information. For example, a secondary key B to an object keyed by A would be stored as a mapping of the strings (B, A) so that B can be used to find A. If the mapping of B to A is not unique, then the value becomes a list requiring upkeep, and this can cause performance problems if the set of A values is large or unbounded. If this is a common case, building in explicit (and thus more efficient) secondary indexing may be worth considering. (A sketch of this indexing pattern appears below, after the eventing note.)

  • Two-Part Keys

    • Supporting "partitions" or "contexts" makes it practical to share one instance of a storage back-end across different client components. Not such a big deal with database tables or in-memory storage, but very useful for options like memcache. Ultimately many back-ends will have to combine the keys, but that can be left to implementations to deal with.

  • Exposing Capabilities

    • Exposing back-end implementation capabilities such as maximum key size enables clients to query for them intelligently and adapt their behavior. For example, some components might be able to truncate or hash keys while others might not; this might be worth enhancing with pluggable strategy objects for shortening keys. Another aspect of variable behavior might be support for versioning, which a client-side storage option wouldn't handle (you can't conditionally set a cookie).

  • Internal Synchronization

    • All operations should be atomic to simplify callers.

  • Versioning

    • Attaching a simple incrementing version to records makes detecting collisions and resolving contention relatively simple without necessarily losing data. Callers can determine whether to ignore or reconcile conflicts. As noted, this may need to be an optionally supported feature.

  • TTLs

    • All records normally would get a TTL value to support cleanup. This wouldn't work for some use cases, so we probably need a permanent option (which again, might be negotiable).
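
To make the requirements above concrete, here is a minimal sketch of what the abstract API might look like, written in Java purely for illustration. Every type and method name here (StorageService, StorageRecord, StorageCapabilities, VersionMismatchException) and every signature is an assumption for discussion, not a settled design.

    import java.io.IOException;

    /*
     * Illustrative sketch only: all names below are hypothetical stand-ins
     * for the abstract API described above. (In real Java each public type
     * would live in its own file.)
     */
    public interface StorageService {

        /* Expose back-end limits so callers can query them and adapt. */
        StorageCapabilities getCapabilities();

        /*
         * Atomically create a record under the two-part (context, key).
         * Returns false if the record already exists. An expiration of 0
         * might mean "permanent", if the capabilities allow it.
         */
        boolean create(String context, String key, String value, long expiration)
            throws IOException;

        /* Read a record, or null if absent or expired. */
        StorageRecord read(String context, String key) throws IOException;

        /*
         * Optimistic update: succeeds only if the stored version still
         * matches the caller's copy. Returns the new version, or null if
         * the record no longer exists.
         */
        Long updateWithVersion(long version, String context, String key,
                               String value, long expiration)
            throws IOException, VersionMismatchException;

        boolean delete(String context, String key) throws IOException;
    }

    /* A stored value plus the simple incrementing version attached to it. */
    interface StorageRecord {
        String getValue();
        long getVersion();
        long getExpiration();
    }

    /* Limits and optional features a client can check before relying on them. */
    interface StorageCapabilities {
        int getContextSize();   // maximum context length
        int getKeySize();       // maximum key length
        long getValueSize();    // maximum value length
        boolean isVersioned();  // false for, e.g., cookie-based storage
    }

    class VersionMismatchException extends Exception {}

The two-part (context, key) signature reflects the partitioning point above; back-ends that only support flat keys can combine the parts internally.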

At least in the SP, eventing or pub/sub has not been a requirement to date, and I'd like to avoid it if we can, since requiring it would greatly limit the possible implementations.
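
As an illustration of the manual secondary-indexing pattern flagged under String-Based API, here is a rough sketch of how a caller might maintain a (B, A) mapping on top of the hypothetical API above, including the list upkeep that makes non-unique mappings expensive. The context name and the comma-separated encoding are arbitrary choices for the example.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    /* Hypothetical helper that maintains a secondary index B -> A by hand. */
    public class SecondaryIndex {
        private static final String CTX = "index";  // arbitrary context name
        private final StorageService storage;

        public SecondaryIndex(StorageService storage) {
            this.storage = storage;
        }

        /* Record that secondary key B maps to primary key A. */
        public void add(String b, String a, long expiration) throws IOException {
            while (true) {
                StorageRecord rec = storage.read(CTX, b);
                if (rec == null) {
                    // First mapping for B: a plain atomic create suffices.
                    if (storage.create(CTX, b, a, expiration)) {
                        return;
                    }
                    continue;  // lost a race with another writer; retry
                }
                // Non-unique mapping: the value degrades into a managed list,
                // which is where the performance concern above comes from.
                List<String> keys = new ArrayList<>(Arrays.asList(rec.getValue().split(",")));
                if (!keys.contains(a)) {
                    keys.add(a);
                }
                try {
                    Long v = storage.updateWithVersion(rec.getVersion(), CTX, b,
                            String.join(",", keys), expiration);
                    if (v != null) {
                        return;
                    }
                    // Record vanished between read and update; retry.
                } catch (VersionMismatchException e) {
                    // Concurrent update to the list; re-read and retry.
                }
            }
        }

        /* Resolve B to the primary keys A it maps to. */
        public List<String> lookup(String b) throws IOException {
            StorageRecord rec = storage.read(CTX, b);
            return rec == null ? List.of() : Arrays.asList(rec.getValue().split(","));
        }
    }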

...

As a first cut, the data involved is:

  • a unique ID, highly random (16 bytes)

  • representation of the user (ideally a canonical name) (256 bytes)

    • currently this is defined per service and allows us to attach things like the client address so that the resolver can use it

  • expiration based on time of last use (8 bytes)

  • n-ary authentication state (time, duration, method) (8 + 8 + 2 bytes)

  • n-ary service login records (entityID, method, NameID, SessionIndex) (256 + 2 + ? + 32 bytes)

    • method mainly serves here to drive attribute filters based on authentication method; can we toss this?

    • do we need time of login to a service?

Lookup of sessions is primarily by the unique ID, except when logout is involved. Then we need lookup by (entityID, NameID, SessionIndex?).
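
To tie the field list above to storage, here is a hypothetical sketch of the session record and of a composite key for the logout lookup. The field names, types, and nested records are illustrative, and serialization to a string value is deliberately left open.

    import java.time.Instant;
    import java.util.List;

    /* Hypothetical shape of a session record, tracking the field list above. */
    public final class Session {
        byte[] id;          // unique, highly random ID (16 bytes)
        String user;        // canonical name, defined per service (<= 256 bytes)
        Instant lastUsed;   // expiration is based on time of last use

        /* One entry per authentication event: time, duration, method. */
        record AuthnState(Instant time, long duration, short method) {}
        List<AuthnState> authnStates;

        /* One entry per service login: entityID, method, NameID, SessionIndex. */
        record ServiceLogin(String entityID, short method, String nameID,
                            String sessionIndex) {}
        List<ServiceLogin> logins;

        /*
         * Primary lookup is by id. For logout, a secondary index keyed on
         * (entityID, NameID, SessionIndex?) could map back to id using the
         * manual indexing pattern sketched earlier. The "!" separator is an
         * arbitrary choice for the example.
         */
        static String logoutKey(String entityID, String nameID, String sessionIndex) {
            return entityID + "!" + nameID + "!" + (sessionIndex == null ? "" : sessionIndex);
        }
    }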

...