This article describes a semi-automatic process for managing untrusted SAML metadata using a Shibboleth LocalDynamicMetadataProvider and a complementary set of command-line tools.

First, configure a Shibboleth LocalDynamicMetadataProvider. In particular, configure its sourceDirectory, a local directory that serves as the metadata repository. That directory is referred to as $sourceDirectory in the code fragments below.
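
For example, a LocalDynamicMetadataProvider might be declared in metadata-providers.xml along these lines (a minimal sketch; the id and the directory path are placeholders, not values this article depends on):

```xml
<MetadataProvider id="LocalDynamic" xsi:type="LocalDynamicMetadataProvider"
    sourceDirectory="%{idp.home}/metadata/localDynamic"/>
```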

Install the SAML Library of command-line tools. Note that BIN_DIR and LIB_DIR are environment variables created during the installation process. These environment variables are used repeatedly in the code fragments below.

Identify a metadata source location to be managed. Perform the following sequence of steps for each metadata source location:

  1. Prime the cache with a copy of the metadata
  2. Filter the metadata into the source directory of the LocalDynamicMetadataProvider
  3. Check the metadata on the server
  4. If the metadata on the server is different from the metadata in the cache, investigate the differences
  5. If the differences are acceptable, update the cache with fresh metadata
  6. Filter the metadata into the source directory of the LocalDynamicMetadataProvider
  7. Go to step 3

The following examples illustrate the basic process.

Example 1: IRBManager

We start with a relatively simple example of remote metadata:

https://shibboleth.irbmanager.com/metadata.xml

If you trust the SP owner to do the Right Thing, and reliance on commercial TLS is not a concern, configure a Shibboleth FileBackedHTTPMetadataProvider to refresh the metadata at least daily:

<MetadataProvider id="IRBManager" xsi:type="FileBackedHTTPMetadataProvider" 
    metadataURL="https://shibboleth.irbmanager.com/metadata.xml" 
    backingFile="%{idp.home}/metadata/IRBManager.xml" maxRefreshDelay="P1D">

    <!-- filter all but the listed entity -->
    <MetadataFilter xsi:type="Predicate" direction="include">
        <Entity>https://shibboleth.irbmanager.com/</Entity>
    </MetadataFilter>

</MetadataProvider>

If, on the other hand, security or interoperability is a concern, manage the metadata as illustrated below.

Given the HTTP location of the metadata to be managed, and the source directory of a Shibboleth LocalDynamicMetadataProvider, initialize both the cache and the source directory as follows:

# Steps 1 and 2
$ md_location=https://shibboleth.irbmanager.com/metadata.xml
$ $BIN_DIR/md_refresh.bash $md_location \
    | $BIN_DIR/md_tee.bash $sourceDirectory \
    > /dev/null
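
The md_tee.bash tool writes each entity descriptor into $sourceDirectory under the filename the LocalDynamicMetadataProvider expects. With the provider's default naming strategy, that filename is the lowercase hex SHA-1 digest of the entityID followed by ".xml". The following sketch computes the filename by hand (it assumes md_tee.bash uses the same default convention):

```shell
# Compute the source-directory filename for an entityID, assuming the
# default naming strategy: lowercase hex SHA-1 of the entityID + ".xml"
entityID='https://shibboleth.irbmanager.com/'
digest=$(printf '%s' "$entityID" | sha1sum | cut -d ' ' -f1)
echo "$sourceDirectory/$digest.xml"
```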

The following command is executed some time later, presumably after the metadata resource has been modified on the server:

# Step 3
$ $BIN_DIR/http_cache_check.bash $md_location && echo "cache is up-to-date" || echo "cache is dirty"
cache is dirty

If the cache is dirty, manually inspect the differences between the metadata on the server and the metadata in the cache:

# Step 4
$ $BIN_DIR/http_cache_diff.bash $md_location

If the differences are acceptable, update both the cache and the source directory with the new metadata:

# Steps 5 and 6
# force a metadata refresh
$ $BIN_DIR/md_refresh.bash -F $md_location \
    | $BIN_DIR/md_tee.bash $sourceDirectory \
    > /dev/null

To semi-automate the above process, implement a cron job that executes the command in step 3:

#!/bin/bash

# environment variables
# (also export TMPDIR if it isn't already set)
export BIN_DIR=/tmp/bin
export LIB_DIR=/tmp/lib
export CACHE_DIR=/tmp/http_cache
export LOG_FILE=/tmp/bash_log.txt

# the name of this script
script_name=${0##*/}

# specify the HTTP resource
location=https://shibboleth.irbmanager.com/metadata.xml

# check the cache against the server
$BIN_DIR/http_cache_check.bash $location >&2
status_code=$?
if [ $status_code -eq 1 ]; then
	echo "WARN: $script_name: cache is NOT up-to-date for resource: $location" >&2
elif [ $status_code -gt 1 ]; then
	echo "ERROR: $script_name: http_cache_check.bash failed ($status_code) on location: $location" >&2
fi

exit $status_code
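
Assuming the script above is saved at a path such as /opt/saml-tools/check_irbmanager_metadata.bash (a hypothetical location), a crontab entry along these lines runs the check every six hours and appends the output to the log:

```
# m h dom mon dow  command
0 */6 * * * /opt/saml-tools/check_irbmanager_metadata.bash >> /tmp/bash_log.txt 2>&1
```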

Example 2: Amazon Web Services

The AWS documentation entitled How to Use Shibboleth for Single Sign-On to the AWS Management Console shows how to use a FileBackedHTTPMetadataProvider to consume AWS metadata. What the documentation doesn't say, however, is that the AWS server does not support HTTP conditional requests, so every time the metadata provider runs, it loads fresh metadata even if the metadata has not changed on the server.

Moreover, the NameIDFormat elements in AWS metadata are bogus; they must be removed from the metadata for the integration to succeed. Since AWS metadata includes a @validUntil attribute, however, downloading a one-time static copy of the metadata is not advisable. The metadata is located at:

https://signin.aws.amazon.com/static/saml-metadata.xml

As in the previous example, initialize both the cache and the source directory, but this time filter the NameIDFormat elements from the metadata before copying to the source directory:

# Steps 1 and 2
$ md_location=https://signin.aws.amazon.com/static/saml-metadata.xml
# log a warning if the metadata will expire within 5 days
$ $BIN_DIR/md_refresh.bash $md_location \
   | $BIN_DIR/md_require_valid_metadata.bash -E P5D \
   | /usr/bin/xsltproc $LIB_DIR/remove_NameIDFormat.xsl - \
   | $BIN_DIR/md_tee.bash $sourceDirectory \
   > /dev/null
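
The filtering step relies on the remove_NameIDFormat.xsl stylesheet in LIB_DIR. Conceptually, such a stylesheet is an identity transform that drops every NameIDFormat element; a minimal version might look like the following (a sketch, not necessarily the stylesheet shipped with the library):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata">

    <!-- identity transform: copy all nodes and attributes by default -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- drop all NameIDFormat elements -->
    <xsl:template match="md:NameIDFormat"/>

</xsl:stylesheet>
```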

Since the server does not support HTTP conditional requests, the tool used in the previous example (http_cache_check.bash) will not work. Here we use a diff-like tool that compares the file on the server to the cached file byte by byte:

# Step 3
$ $BIN_DIR/http_cache_diff.bash -Q $md_location && echo "cache is up-to-date" || echo "cache is dirty"
cache is dirty

Manually inspect the differences between the metadata on the server and the metadata in the cache:

# Step 4
$ $BIN_DIR/http_cache_diff.bash $md_location

If the new metadata is acceptable, update both the cache and the source directory with the new metadata:

# Steps 5 and 6
# force a metadata refresh
$ $BIN_DIR/md_refresh.bash -F $md_location \
   | $BIN_DIR/md_require_valid_metadata.bash -E P5D \
   | /usr/bin/xsltproc $LIB_DIR/remove_NameIDFormat.xsl - \
   | $BIN_DIR/md_tee.bash $sourceDirectory \
   > /dev/null
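
The -E P5D option instructs md_require_valid_metadata.bash to log a warning when the metadata's @validUntil attribute is less than five days (the ISO 8601 duration P5D) in the future. Conceptually, the check amounts to the following (a self-contained sketch with a hypothetical timestamp; uses GNU date):

```shell
# Warn if a validUntil timestamp falls within the next 5 days (P5D).
# The timestamp below is hypothetical.
validUntil='2031-01-01T00:00:00Z'
now=$(date -u +%s)
expiry=$(date -u -d "$validUntil" +%s)   # GNU date syntax
window=$(( 5 * 24 * 60 * 60 ))           # 5 days in seconds
if [ $(( expiry - now )) -lt "$window" ]; then
    echo "WARN: metadata expires within 5 days: $validUntil"
fi
```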

To semi-automate the above process, implement a cron job that executes the command in step 3:

#!/bin/bash

# environment variables
# (also export TMPDIR if it isn't already set)
export BIN_DIR=/tmp/bin
export LIB_DIR=/tmp/lib
export CACHE_DIR=/tmp/http_cache
export LOG_FILE=/tmp/bash_log.txt

# the name of this script
script_name=${0##*/}

# specify the HTTP resource
location=https://signin.aws.amazon.com/static/saml-metadata.xml

# quietly diff the cached file against the file on the server
$BIN_DIR/http_cache_diff.bash -Q $location >&2
status_code=$?
if [ $status_code -eq 1 ]; then
	echo "WARN: $script_name: cache is NOT up-to-date for resource: $location" >&2
elif [ $status_code -gt 1 ]; then
	echo "ERROR: $script_name: http_cache_diff.bash failed ($status_code) on location: $location" >&2
fi

exit $status_code

Implement a separate cron job that periodically checks the source directory for expired or soon-to-be-expired metadata:

#!/bin/bash

# environment variables
# (also export TMPDIR if it isn't already set)
export BIN_DIR=/tmp/bin
export LIB_DIR=/tmp/lib
export CACHE_DIR=/tmp/http_cache
export LOG_FILE=/tmp/bash_log.txt

# the name of this script
script_name=${0##*/}

# specify the source directory
sourceDirectory=/path/to/source/dir

# remove expired metadata from the source directory
# log a warning if a document will expire within two weeks
$BIN_DIR/md_sweep.bash -E P2W $sourceDirectory >&2
status_code=$?
if [ $status_code -ne 0 ]; then
	echo "ERROR: $script_name: md_sweep.bash failed ($status_code) on source directory: $sourceDirectory" >&2
fi

exit $status_code

Note that the above script removes all expired metadata from the source directory, not just AWS metadata.