SAML 2.0 (but not SAML 1.x) defines a kind of NameID called a "persistent" identifier, with the Format urn:oasis:names:tc:SAML:2.0:nameid-format:persistent.

The term "persistent" means that the identifier is a stable value rather than a per-session one. SAML persistent identifiers also have some very particular properties: notably, they are intended to be opaque and "pairwise". Pairwise means that, for the most part, every SP receives a different value for the same user.

You should consider carefully whether this kind of identifier meets your needs. They can be difficult to deal with, they work very poorly with a wide variety of applications, and they are often more trouble than they're worth. Be cautious, and don't commit to supporting something you don't want to have to support.

The IdP has two built-in methods (called "strategies" in the configuration) for producing this kind of identifier, Computed and Stored, both described below. The strategy in use is controlled by the idp.persistentId.generator property in saml-nameid.properties.

It has come to light that at least some (perhaps many, or even most) applications do not handle identifiers case-sensitively. Although this SAML format is explicitly defined to be case-sensitive, it is much wiser not to rely on that. Older versions of the software generate identifiers using Base64 encoding, which produces mixed-case values and is therefore much less safe. Unless you already support identifiers produced by those versions, you would be wise to generate values using Base32 encoding, which is designed to work with case-insensitive applications. New installs include a property explicitly set to produce Base32 values, but upgrades continue to use Base64 for compatibility reasons.

Enabling the Generator

To enable either strategy, set the appropriate properties described below, and then uncomment the reference to the persistent generator bean in the SAML 2 generator list in saml-nameid.xml.

SAML 2.0 Persistent NameID Generator

<util:list id="shibboleth.SAML2NameIDGenerators">
    <ref bean="shibboleth.SAML2TransientGenerator" />
 
    <ref bean="shibboleth.SAML2PersistentGenerator" />
</util:list>

There is no equivalent SAML 1 bean, as this is a SAML 2-only feature.

Computed IDs

The default strategy is a hash-based approach called "Computed". It avoids the need for a database to store the IDs, but it cannot reverse-map a given identifier to a subject (e.g., as part of a SAML attribute query), nor can it revoke or change the identifier associated with a subject. Tracing an identifier back to a subject for debugging purposes generally relies on audit logs rather than direct access to a mapping of users. It's not the best approach in the abstract, but it is much simpler to deploy.

To enable the Computed strategy, you must set additional properties:

idp.persistentId.sourceAttribute
- A list of attributes from which to derive a "source" key for the subject. The key is used as the hash input; it should be a very stable value for each subject and must never later be reassigned to a different subject. Ideally it is a permanent serial number assigned by an IDMS to each account, not a name-based identifier like a login ID or email address. It should also be technology-neutral: using a GUID generated by Active Directory is a very bad choice that will lead to problems if you ever change directories.
idp.persistentId.salt
- A secret string used as a salt when hashing the subject key derived from the property above. It is required to prevent trivial attacks that determine the identifier for a given subject, and it must be kept secret. Leading and trailing whitespace is not trimmed from the property, though using whitespace in the salt is not advisable.
idp.persistentId.encodedSalt (V3.3 and later)
- If your salt contains special characters that Spring won't accept safely, you can work around this by base64-encoding the salt and specifying the encoded value in this property instead of idp.persistentId.salt. Do not set both.
idp.persistentId.encoding (V3.3.2 and later)
- Controls the encoding of the generated hash value. Defaults to BASE64 if not set, but new installations will set this property to BASE32 to produce values without mixed case.

As of V3.2.0, the attribute used as the source key need not be released (in the sense of an attribute filter policy) to the SP. In older versions, you can work around this limitation without disclosing more information by releasing an attribute that has no attribute encoders attached.

Sparse Overrides (V3.4 and later)

One disadvantage of strictly computing IDs is a loss of manageability, particularly the ability to change a value should it become compromised. V3.4 adds a feature allowing fine-grained override of the salt used to generate IDs for specific users and/or relying parties, by means of a Java Map bean that can be declared in saml-nameid.xml and is named shibboleth.ComputedIdExceptionMap by default.

The type of the object is Map<String,Map<String,String>> (i.e., it's a string-keyed map whose values are themselves maps).

The primary keys are the names of subjects/users, or an asterisk (*) to signify a wildcard rule.

The values are maps of relying party names to salt values. The keys of these inner maps are relying party names, or an asterisk (*) as a wildcard; each value is either a substitute salt string to use, or null to block generation of an ID altogether.

One use for this feature is to maintain an old salt value for a legacy service while relying on a new value for everybody else:

Overriding salt for a single SP

<util:map id="shibboleth.ComputedIdExceptionMap">
	<entry key="*"> <!-- all users -->
		<map>
			<entry key="https://legacysp.example.org/sp" value="legacysalt" />
		</map>
	</entry>
</util:map>
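The same map can also carry per-user rules and null values. The sketch below is an extended version of that map, assuming a hypothetical user "jdoe" whose identifiers must change after a compromise, and a hypothetical SP, https://blocked.example.org/sp, that should receive no persistent ID at all; "replacementsalt" is a placeholder. It relies on the util and beans namespaces already declared in saml-nameid.xml. Note that jdoe accessing the blocked SP matches both rules, and this sketch makes no assumption about which one the IdP applies in that case.

```xml
<util:map id="shibboleth.ComputedIdExceptionMap">
    <!-- Hypothetical user whose identifiers must change everywhere. -->
    <entry key="jdoe">
        <map>
            <!-- Wildcard relying party: use a replacement salt for every SP. -->
            <entry key="*" value="replacementsalt" />
        </map>
    </entry>
    <!-- Rules applying to all users. -->
    <entry key="*">
        <map>
            <!-- Keep the old salt for the legacy SP, as in the previous example. -->
            <entry key="https://legacysp.example.org/sp" value="legacysalt" />
            <!-- A null value blocks generation of an ID for this (hypothetical) SP. -->
            <entry key="https://blocked.example.org/sp"><null /></entry>
        </map>
    </entry>
</util:map>
```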

Stored IDs

Using V3.2.0 and Later

The alternative strategy is to generate random identifiers on first use and store them in a database for future use. This has some benefits and addresses some of the limitations of the computed strategy, but it requires a highly available database accessible to every IdP node and is very difficult (bordering on impossible) to make reliable. Such a database cannot be implemented using asynchronous or otherwise unreliable replication; doing so leads to conflicts and race conditions, and eventually to errors and duplicate entries. This is the main reason the strategy is hard to get working: most applications simply tolerate these kinds of conflicts, but this one cannot.

The "vanilla" DDL needed for this approach is:

Stored ID Table Definition

CREATE TABLE shibpid (
	localEntity VARCHAR(255) NOT NULL,
	peerEntity VARCHAR(255) NOT NULL,
	persistentId VARCHAR(50) NOT NULL,
	principalName VARCHAR(50) NOT NULL,
	localId VARCHAR(50) NOT NULL,
	peerProvidedId VARCHAR(50) NULL,
	creationDate TIMESTAMP NOT NULL,
	deactivationDate TIMESTAMP NULL,
	PRIMARY KEY (localEntity, peerEntity, persistentId)
);

You will need to define the table above in your database, and you must define a primary key as shown above or the implementation will not function as intended. The absence of this constraint will normally be detected at startup time and prevent use of the mechanism.

Also ensure that the collation associated with the "localId" column is appropriate for the source attribute you specified above. An inappropriate collation can make distinct source values compare as equal, so that they are no longer unique. In particular, a case-sensitive collation has been found to be necessary when using the Active Directory objectSid as the source attribute, to ensure that persistent IDs are uniquely identified; the MySQL collation "utf8_bin" has been found to work in this circumstance.

Using this strategy requires setting the properties above as well as some additional changes:

The idp.persistentId.generator property needs to be set to "shibboleth.StoredPersistentIdGenerator".
Either the idp.persistentId.dataSource (V3.2 and later) or the idp.persistentId.store property must be set to the name of a bean you define. You can place the bean in saml-nameid.xml if you like (anywhere at the "top" level of the file). The dataSource property specifies a JDBC DataSource object to use for storage, with all other settings left at their defaults. To override any of those settings, use the store property instead, pointing it at a bean that inherits from the parent bean "shibboleth.JDBCPersistentIdStore", as shown below.

By default, the stored strategy uses the computed strategy to produce the initial identifier for each subject, to help with migration. If you don't need that, you can set the idp.persistentId.computed property to an empty value and ignore the feature entirely. This is recommended for anybody not already supporting identifiers produced by the computed strategy.

Avoid sharing a single DataSource bean between this feature and, for example, the JPA StorageService feature, even if you happen to use one database for both. You don't want "non-essential" features like consent potentially interfering with the more essential use here. Separate DataSource beans keep the connection pools separate, so problems in one component can't break the other.
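A minimal sketch of that separation, assuming one database shared by both features, might look like this. The bean IDs, the pool sizes, and the use of the second bean by a JPA StorageService are illustrative assumptions, and the driver class and URL are placeholders. Because each BasicDataSource bean maintains its own connection pool, exhausting the storage pool leaves the persistent ID store's connections untouched.

```xml
<!-- Pool used only by the persistent ID store (referenced by idp.persistentId.dataSource). -->
<bean id="PersistentIdDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
    p:driverClassName="com.example.database.Driver"
    p:url="jdbc:example://localhost/database"
    p:username="shibboleth"
    p:password="foo"
    p:maxTotal="20" />

<!-- A second, independent pool against the same database, intended for a
     (hypothetical) JPA StorageService; define it wherever that service is configured. -->
<bean id="StorageDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
    p:driverClassName="com.example.database.Driver"
    p:url="jdbc:example://localhost/database"
    p:username="shibboleth"
    p:password="foo"
    p:maxTotal="10" />
```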

Examples of each type of bean, using an unspecified database and the DBCP2 pooling library (not provided with the IdP), follow. You will need to determine the driver class to plug into the bean definition for your database and the proper URL to use. Always use current drivers when possible; bug fixes for obscure problems tend to be frequent. When in doubt, grab a newer one.

Example persistent ID store beans in saml-nameid.xml

<!-- A DataSource bean suitable for use in the idp.persistentId.dataSource property. -->
<bean id="MyDataSource" class="org.apache.commons.dbcp2.BasicDataSource"
	p:driverClassName="com.example.database.Driver"
	p:url="jdbc:example://localhost/database"
	p:username="shibboleth"
	p:password="foo"
	p:maxIdle="5"
	p:maxWaitMillis="15000"
	p:testOnBorrow="true"
	p:validationQuery="select 1"
	p:validationQueryTimeout="5" />

<!-- A "store" bean suitable for use in the idp.persistentId.store property. -->
<bean id="MyPersistentIdStore" parent="shibboleth.JDBCPersistentIdStore"
	p:dataSource-ref="MyDataSource"
	p:queryTimeout="PT2S"
	p:retryableErrors="#{{'23000'}}" />

Defining and referencing a "store" bean rather than a plain DataSource lets you override the default table and column names used in the data store, the SQL queries, the query timeout, and so on.

The most commonly overridden settings are queryTimeout and retryableErrors, a list of the error codes reported when a database improperly fails to prevent a duplicate insert. If this happens, the log will warn you and report the error code, which you can then add to the configuration to prevent the problem in the future. (It also means your database fails to implement locking properly, and the retry mechanism is a workaround for that bug.)
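For instance, if a later warning reported a second code, the store bean from the earlier example could list both using the same SpEL inline-list syntax. The second value here, 23505 (the SQLState PostgreSQL reports for a unique-constraint violation), is only an illustrative assumption; use whatever code your own log reports.

```xml
<!-- Same store bean as in the earlier example, now retrying on two reported error codes. -->
<bean id="MyPersistentIdStore" parent="shibboleth.JDBCPersistentIdStore"
    p:dataSource-ref="MyDataSource"
    p:queryTimeout="PT2S"
    p:retryableErrors="#{{'23000', '23505'}}" />
```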

Migrating from Older Versions

The documentation above reflects improvements and fixes associated with issue IDP-829. Your older IdP configuration remains compatible and will automatically switch to the fixed version of the strategy plugin, but your database table is probably not fully compatible: it likely lacks the primary key constraint that prevents the duplicate-record bug described in that issue.

You can first attempt to create the primary key directly against the old table on the three columns indicated above (localEntity, peerEntity, persistentId). If that doesn't work, then you have a couple of options:

Export the data from the existing table (reordering the columns in a SELECT statement to match the new definition), recreate the table as described above, and load the data back in. Some dump utilities can't do this, so you may need more generic tools, or a command-line SQL client and some scripting.
Try to create a uniqueness constraint (rather than a primary key) on the (localEntity, peerEntity, persistentId) columns, if your database allows that.

To avoid breaking upgrades, an older configuration will not fail outright if the constraint is missing, but you should find warnings about it in the log. If you update your Spring bean configuration to match the approaches outlined in the previous section, the mechanism will by default fail to run unless the constraint is present on the table. The intent is to fail early on new installs while only warning on upgrades.

Using Older Versions

If you're running a version prior to V3.2.0, it's strongly advisable to upgrade if you want to use the Stored ID strategy. If that's not possible, be aware that you will be subject to the bug in IDP-829. If you are not already using this mechanism, be sure to define your table as described above; that definition should be compatible with the older, buggy code and will get you by until you upgrade.

The configuration procedure is essentially the same, except that you have to set the idp.persistentId.store property (the newer dataSource property is not supported) and it has to be set to a bean of the now-deprecated JDBCPersistentIdStore class:

Pre-3.2 Example ID Store in saml-nameid.xml

<bean id="MyPersistentIdStore" class="net.shibboleth.idp.saml.nameid.impl.JDBCPersistentIdStore">
    <property name="dataSource">
        <bean class="org.apache.commons.dbcp2.BasicDataSource"
            p:driverClassName="com.example.database.Driver"
            p:url="jdbc:example://localhost/database"
            p:username="shibboleth"
            p:password="foo"
            p:maxIdle="5"
            p:maxWaitMillis="15000"
            p:testOnBorrow="true"
            p:validationQuery="select 1"
            p:validationQueryTimeout="5" />
    </property>
</bean>

After you upgrade, it's strongly advisable to adjust your configuration to match the newer documentation.