Overview

SAML 2.0 (but not 1.x) defines a kind of NameID called a "persistent" identifier, with a Format of urn:oasis:names:tc:SAML:2.0:nameid-format:persistent.

The term "persistent" refers to the fact that it's not a per-session identifier, but a stable value; in addition, SAML persistent identifiers have very particular properties, notably they're intended to be opaque and "pairwise". The latter means that for the most part every SP receives a different value for the same user. The term "pairwise" is the more typical way of describing this whole concept today, so the Format constant is unfortunate and confusing.

You should consider carefully whether this kind of identifier meets your needs. Such identifiers can be difficult to deal with, work very poorly with a wide variety of applications, and are often more trouble than they're worth. Be cautious, and don't commit to supporting something you don't want to have to support.

There were some internal code changes in V4 but the configuration is designed to be backward-compatible. The most significant change is that a public PairwiseIdStore interface now exists to support third party extension of the way these values are produced and managed if it becomes necessary.

The IdP has a couple of built-in implementations of this interface (called "strategies" in the configuration) to produce this kind of identifier. The strategy used is controlled with the idp.persistentId.generator property in saml-nameid.properties.

It's come to light that at least some (perhaps many, or even most) applications do not support case-sensitive handling of identifiers. This SAML format is explicitly defined to be case-sensitive, but it is much, much wiser not to rely on that. Older versions of the software generate identifiers using Base64 encoding, which produces mixed-case values and is therefore much less safe, so if you're not already supporting identifiers produced by them, you would be wise to generate the values using a Base32 encoding, which is designed to support case-insensitive applications. New installs include a property explicitly set to produce Base32 values, but upgrades of older configurations will continue to use Base64 for compatibility reasons.

Enabling the Generator

To enable either approach, you will need to uncomment the generator bean in saml-nameid.xml for SAML 2 once you set the appropriate properties highlighted below.

SAML 2.0 Persistent NameID Generator

<util:list id="shibboleth.SAML2NameIDGenerators">
    <ref bean="shibboleth.SAML2TransientGenerator" />
 
    <ref bean="shibboleth.SAML2PersistentGenerator" />
</util:list>

There is no equivalent SAML 1 bean, as this is a SAML 2-only feature.

Supported Strategies

The default PairwiseIdStore implementation is a hash/digest-based approach called "Computed" that avoids the need for a database to store the IDs, but is incapable of reverse-mapping a given identifier (e.g., as part of a SAML attribute query), or revoking or changing the identifier associated with a subject. Tracking back to a subject for debugging purposes generally involves the use of audit logs rather than direct access to a mapping of users. It's not the best approach in the abstract, but it is much simpler to deploy.

To enable the Computed strategy, you must set additional properties:

idp.persistentId.sourceAttribute
- A list of attributes from which to derive a "source" key for the subject. The key is used as the hash input, and should be a very stable value for each subject and must never be reassigned later to a different subject. This should be a permanent serial number associated by an IDMS to each account, and not a name-based identifier like a login ID or email address. It should also be technology-neutral; using a GUID generated by an Active Directory is a very bad choice that will lead to problems if you ever change directories.
idp.persistentId.salt
- A secret string used as a default salt when hashing the subject key derived from the property above. This is required to prevent trivial attacks to determine the identifier for a given subject, and must be kept secret. Note that leading or trailing whitespace is not trimmed from the property, though using whitespace in the salt is not advisable.
idp.persistentId.encodedSalt
- If your salt value contains special characters that Spring won't accept safely, you can work around this by base64-encoding the salt you want to use, and specifying the encoded version in this property instead of the previous property. Do not set both.
idp.persistentId.encoding
- Controls the encoding of the generated hash value. Defaults to BASE64 if not set, but new installations set this property to BASE32 to produce values without mixed case.

The attribute used as the source key need not be released (in the sense of an attribute filter policy) to the SP.
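
As a minimal sketch, the computed strategy's settings in saml-nameid.properties might look like the following. The attribute ID (uniqueAccountId) and the salt value are placeholders chosen for illustration, not defaults; substitute a stable attribute from your own deployment and a long random secret.

    # Strategy bean to use; this is the default, shown only for clarity
    idp.persistentId.generator = shibboleth.ComputedPersistentIdGenerator
    # Hypothetical attribute ID; use a permanent, never-reassigned account identifier
    idp.persistentId.sourceAttribute = uniqueAccountId
    # Placeholder secret; set either this or idp.persistentId.encodedSalt, not both
    idp.persistentId.salt = replace-with-a-long-random-secret
    # Produce values without mixed case
    idp.persistentId.encoding = BASE32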

Dynamic Salt ^4.3

V4.3 introduces the option to configure a dynamic salt source using a BiFunction (a Java function object that takes two inputs). If a bean named shibboleth.ComputedIdSaltLookupStrategy is defined, the function will override the value of the salt to use for a request, or if it returns null will suppress generation of a value entirely. This works in combination with the Sparse Override map described in the next section (i.e., the map overrides the BiFunction, which overrides the default). It is also now legal to only define the BiFunction and omit the global salt if desired.

The bean is of type BiFunction<ProfileRequestContext,PairwiseId>

The intent is that the second argument will supply details about the request such as the subject and relying party, while the first provides full access to request state.

A trivial example that is equivalent to just using a global salt would be:

    <bean id="shibboleth.ComputedIdSaltLookupStrategy" parent="shibboleth.BiFunctions.Constant">
        <constructor-arg>
            <value>thisisasaltsalthisis</value>
        </constructor-arg>
    </bean>

Sparse Overrides

One of the disadvantages of strictly computing IDs is a loss of manageability of the values, particularly the ability to change a value should it become compromised. The IdP includes a feature allowing fine-grained override of the salt value used to generate IDs for specific users and/or relying parties, by means of a Java Map bean, which can be declared in saml-nameid.xml and by default is named shibboleth.ComputedIdExceptionMap.

The Java type of the object is a mouthful: Map<String,Map<String,String>> (i.e., it's a string-keyed map whose values are themselves maps). It's easier to grasp this in practice in the example below.

The primary keys are the names of subjects/users, or an asterisk (*) to signify a wildcard rule.

The values are maps of Relying Party names to salt values. These keys are the names of relying parties or an asterisk (*) as a wildcard, and each value is either a substitute salt string to use or null to block the generation of an ID altogether.

One use for this feature is to maintain an old salt value for a legacy service while relying on a new value for everybody else:

Overriding salt for a single SP

<util:map id="shibboleth.ComputedIdExceptionMap">
	<entry key="*"> <!-- all users -->
		<map>
			<entry key="https://legacysp.example.org/sp" value="legacysalt" />
			<entry key="https://invalid.example.org/sp">
				<null/>	<!-- blocks generation of a value for this SP -->
			</entry>
		</map>
	</entry>
</util:map>

The alternative PairwiseIdStore generates random identifiers on first use and stores them in a database for future use. This has some benefits and addresses some of the limitations of the computed approach, but requires a highly available database accessible to every IdP node and is very difficult (bordering on impossible) to make reliable. Note that it is not possible to implement such a database using asynchronous/unreliable replication. This will lead to conflicts and race conditions, and eventually a risk of errors and duplicate entries. This is the main reason it isn't easy to get working, as most applications simply can't tolerate these kinds of conflicts easily.

The "vanilla" DDL needed for this approach is:

Stored ID Table Definition

CREATE TABLE shibpid (
	localEntity VARCHAR(255) NOT NULL,
	peerEntity VARCHAR(255) NOT NULL,
	persistentId VARCHAR(50) NOT NULL,
	principalName VARCHAR(50) NOT NULL,
	localId VARCHAR(50) NOT NULL,
	peerProvidedId VARCHAR(50) NULL,
	creationDate TIMESTAMP NOT NULL,
	deactivationDate TIMESTAMP NULL,
	PRIMARY KEY (localEntity, peerEntity, persistentId)
);

You will need to define the table above in your database, and you must define a primary key as shown above or the implementation will not function as intended. The absence of this constraint will normally be detected at startup time and prevent use of the mechanism.

Also ensure that the collation associated with the "localId" column is appropriate for use with the source attribute you specify. An inappropriate collation can render the attribute non-unique. In particular, it has been observed that a case-sensitive collation is needed if using the Active Directory objectSid as the source attribute, to ensure that persistent IDs are uniquely identified. "utf8_bin" has been found to work in this circumstance.

Using this strategy requires setting the properties described earlier, as well as some additional changes:

The idp.persistentId.generator property needs to be set to "shibboleth.StoredPersistentIdGenerator".
The idp.persistentId.dataSource property must be set to the name of a DataSource bean you must define. You can place it in saml-nameid.xml if you like (anywhere at the "top" level of the file).
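
A minimal sketch of the corresponding saml-nameid.properties entries follows. MyDataSource is only an illustrative bean ID (the same one used in the example beans on this page); use whatever ID you give your own DataSource bean, and keep the source attribute and salt settings described earlier.

    # Switch from the default computed strategy to the stored strategy
    idp.persistentId.generator = shibboleth.StoredPersistentIdGenerator
    # Illustrative bean ID; must match a DataSource bean you define yourself
    idp.persistentId.dataSource = MyDataSource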

A default feature of the stored strategy is that it uses the computed strategy to produce the initial identifier for each subject, to help with migration. If you don't need that to happen, you can set the idp.persistentId.computed property to an empty value and ignore that feature entirely, but it isn't a terrible idea to leverage this because it hedges your bets. If you find that the stored model is unworkable in practice, you may be able to easily convert back to the computed approach if all your values are compatible with it.
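
In saml-nameid.properties, turning that feature off would look roughly like this (a sketch; the empty value is the whole point, and the rest of the stored configuration is unchanged):

    # Empty value: do not seed new stored identifiers from the computed strategy
    idp.persistentId.computed =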

It's not a good idea to define a single shared DataSource bean between this feature and, for example, the JPA StorageService feature, even if you happen to use one database for both. The reason is that you don't want "non-essential" features like consent potentially interfering with the more essential use here. Separate DataSource beans will keep the pools of connections separate and prevent problems in one component from breaking the other.

Examples of each type of bean using an unspecified database and the DBCP2 pooling library (included with the IdP) follow. You will need to determine what driver class to plug into the bean definition for your database and the proper URL to use. Always use current drivers when possible; bug fixes for obscure problems tend to be frequent. When in doubt, grab a newer one.

Example persistent ID store beans in saml-nameid.xml

<!-- A DataSource bean suitable for use in the idp.persistentId.dataSource property. -->
<bean id="MyDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
	p:driverClassName="com.example.database.Driver"
	p:url="jdbc:example://localhost/database"
	p:username="shibboleth"
	p:password="foo"
	p:maxIdle="5"
	p:maxWaitMillis="15000"
	p:testOnBorrow="true"
	p:validationQuery="select 1"
	p:validationQueryTimeout="5" />

<!-- A replacement bean suitable for use in the idp.persistentId.generator property. -->
<bean id="MyPersistentIdStore" parent="shibboleth.StoredPersistentIdGenerator"
	p:dataSource-ref="MyDataSource"
	p:queryTimeout="PT2S"
	p:retryableErrors="#{{'23000'}}" />

Advanced Customization

There are a few cases where more advanced customization of the stored approach may be required, and this is accommodated by defining your own custom bean that inherits from "shibboleth.StoredPersistentIdGenerator" and defines any additional bean properties required (see the JDBCPairwiseIdStore javadoc).

The option to define and reference your own bean rather than just supplying a plain DataSource is present to allow you to override the default table and column names used in the data store, the SQL queries used, the timeout, etc., but as of V4.1 most of these settings are accessible via simple Java properties and do not require a bean definition.
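
For instance, a deployment on V4.1 or later could adjust some of these settings directly in saml-nameid.properties, using the property names listed in the Reference section. This is a sketch only: the table and column names shown are hypothetical, and the timeout and retry values are arbitrary examples rather than recommendations.

    # Hypothetical table name replacing the default "shibpid"
    idp.persistentId.tableName = pairwise_ids
    # Hypothetical column name replacing the default "localId"
    idp.persistentId.sourceIdColumn = sourceId
    # Example values only; the defaults are PT5S and 3
    idp.persistentId.queryTimeout = PT2S
    idp.persistentId.transactionRetries = 5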

Reference

Properties defined in saml-nameid.properties to customize various aspects of persistent NameID generation behavior follow:

Property / Type	Default	Function
idp.persistentId.generator Bean ID of a PairwiseIdStore	shibboleth.ComputedPersistentIdGenerator	Identifies the strategy plugin for sourcing persistent IDs
idp.persistentId.dataSource Bean ID of a JDBC DataSource		Identifies a data source for storage-based management of persistent IDs
idp.persistentId.computed Bean ID of a PairwiseIdStore	shibboleth.ComputedPersistentIdGenerator	Identifies a strategy plugin used to generate the first persistent identifier for each subject, which supports migration from the computed to the stored strategy; may be null
idp.persistentId.sourceAttribute Comma-delimited List		List of attributes to search for a value that uniquely identifies the subject of a persistent identifier; the value MUST be stable, long-lived, and non-reassignable
idp.persistentId.useUnfilteredAttributes Boolean	true	Whether or not the previous property has access to unreleased attributes
idp.persistentId.salt String		A secret salt for the hash when using computed persistent IDs
idp.persistentId.encodedSalt Base64-encoded String		An encoded form of the previous property
idp.persistentId.algorithm String	SHA	The hash algorithm used when using computed persistent IDs
idp.persistentId.encoding "BASE64" or "BASE32"	BASE64	The final encoding applied to the hash generated when using computed persistent IDs (BASE32 is strongly recommended for new installs)
idp.persistentId.exceptionMap Bean ID	shibboleth.ComputedIdExceptionMap	Advanced feature allowing revocation or regeneration of computed persistent IDs for specific subjects or services
idp.persistentId.queryTimeout^4.1 Duration	PT5S	Query timeout for database access
idp.persistentId.transactionRetries^4.1 Integer	3	Number of retries in the event database locking bugs cause retryable failures
idp.persistentId.retryableErrors^4.1 Comma-delimited list	23000,23505	List of error strings to identify as retryable failures
idp.persistentId.verifyDatabase^4.1 Boolean	true	When true, the connection and layout of the database is verified at bean initialization time and any failures are fatal.
idp.persistentId.tableName^4.1 String	"shibpid"	Overrides the name of the table in the database
idp.persistentId.localEntityColumn^4.1 String	"localEntity"	Overrides the database column name
idp.persistentId.peerEntityColumn^4.1 String	"peerEntity"	Overrides the database column name
idp.persistentId.principalNameColumn^4.1 String	"principalName"	Overrides the database column name
idp.persistentId.sourceIdColumn^4.1 String	"localId"	Overrides the database column name
idp.persistentId.persistentIdColumn^4.1 String	"persistentId"	Overrides the database column name
idp.persistentId.peerProvidedIdColumn^4.1 String	"peerProvidedId"	Overrides the database column name
idp.persistentId.createTimeColumn^4.1 String	"creationDate"	Overrides the database column name
idp.persistentId.deactivationTimeColumn^4.1 String	"deactivationDate"	Overrides the database column name

Beans defined in saml-nameid.xml and related system configuration are as follows:

Bean ID	Type	Function
shibboleth.SAML2PersistentGenerator	SAML2NameIDGenerator	Plugin for generating persistent identifiers using pluggable strategy
shibboleth.ComputedPersistentIdGenerator	ComputedPairwiseIdStore	Strategy plugin that generates persistent identifiers with a salted hash of an input value
shibboleth.ComputedIdSaltLookupStrategy ^4.3	BiFunction< ProfileRequestContext, PairwiseId >	Strategy function for obtaining salt dynamically
shibboleth.StoredPersistentIdGenerator	JDBCPairwiseIdStore	Strategy plugin that generates persistent identifiers and stores them in a database identified by a DataSource
shibboleth.JDBCPersistentIdStore	JDBCPairwiseIdStore	Legacy parent bean for defining a JDBC store for persistent identifiers with additional customization not supported by existing properties; this is largely for compatibility, and shibboleth.StoredPersistentIdGenerator should usually be used as a parent bean now