Introduction to the Metadata Aggregator

This section provides a general overview of the Metadata Aggregator product. It describes the use cases that initially prompted the aggregator's development and gives a high-level overview of the product's concepts and architecture. This section does not contain any configuration information but should be read before proceeding to the installation and configuration documentation.

Initial Aggregator Use Cases

The following use cases are what prompted the development of the tool. (Note that while the use cases given below are SAML-focused, the aggregator itself is not a SAML-specific product.)

The aggregator itself is a fairly general tool, so it is quite likely that as adoption grows people will find other uses for it as well. It is certainly the goal of the developers to make it relatively easy to meet any use case of the general form "read in a bunch of data, transform it in various ways, and write it out or search it".

Metadata Aggregator Architecture

The following sections give a high-level overview of the architecture of the metadata aggregator and introduce terms that will be used throughout the rest of the document.

Core Concept: Items

Within the aggregator, each individual unit of data is known as an item. As the name suggests, this is a fairly generic construct. An item might represent anything: a person, a group, a SAML entity. The data itself may also be encoded in any form: XML, JSON, ASN.1.

In addition to wrapping a unit of data, an item also carries a set of metadata about itself. This metadata is attached to the item as it is processed by the aggregator and may be any computed information that applies to the item. As will be seen later on, this includes things like identifiers by which the item may be looked up, error and warning messages produced if the item fails various checks, and provenance information showing which parts of the aggregator worked on the item.
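
To make this concrete, here is a minimal sketch, in Java, of what an item wrapping a unit of data together with its metadata might look like. The class and method names are illustrative assumptions, not the aggregator's actual API.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical item: wraps a unit of data of some type T and carries
    // arbitrary metadata (identifiers, errors/warnings, provenance) about it.
    public class Item<T> {
        private final T data;
        private final Map<String, List<Object>> metadata = new HashMap<>();

        public Item(final T data) {
            this.data = data;
        }

        public T getData() {
            return data;
        }

        // Attach a piece of metadata under a given key, for example
        // "itemId", "errorMessage", or "provenance".
        public void addMetadata(final String key, final Object value) {
            metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        public List<Object> getMetadata(final String key) {
            return metadata.getOrDefault(key, List.of());
        }
    }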

Core Concept: Pipeline

At the center of the metadata aggregator is the concept of a processing pipeline. A pipeline is a component that passes a collection of items through a number of stages, each of which may transform, remove, add, or otherwise modify items in the collection.
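
As a rough sketch of how these pieces relate, a stage can be thought of as an operation over a collection of items, and a pipeline as an ordered list of such stages. The interfaces below reuse the hypothetical Item class sketched earlier and are, again, illustrative assumptions rather than the product's real interfaces.

    import java.util.List;

    // Hypothetical stage: operates on the current item collection, and may
    // transform items, remove them, or add new ones.
    interface Stage<T> {
        void execute(List<Item<T>> items) throws Exception;
    }

    // Hypothetical pipeline: runs its stages, in order, over one collection.
    class Pipeline<T> {
        private final List<Stage<T>> stages;

        Pipeline(final List<Stage<T>> stages) {
            this.stages = stages;
        }

        void execute(final List<Item<T>> items) throws Exception {
            for (final Stage<T> stage : stages) {
                stage.execute(items);
            }
        }
    }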

In most cases, the first stage in a pipeline will be the source stage, or just source for short. Source stages read in data from somewhere (e.g., files on the filesystem, a file accessed over HTTP, configuration files), construct items, and populate the collection with them. While source stages normally occur at the beginning of a pipeline, they are in fact just ordinary stages and so may occur anywhere in the pipeline flow.

As an example, imagine a pipeline whose source stage pulls in an XML document over HTTP. The document is then passed through a set of stages that schema-validate it, check its digital signature, and apply an XSLT transformation. In such a case the resulting collection would have a single entry. If a stage in the pipeline had broken the XML document up into multiple documents, the resulting collection could have had more than one entry.
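
Using the hypothetical Item, Stage, and Pipeline sketches from above, that example pipeline might be wired together along these lines. The URL and the placeholder stage bodies are invented for illustration; the real product supplies concrete stage implementations.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;

    public class ExamplePipeline {

        // Fetches a document over HTTP; used by the source stage below.
        static String fetch(final String url) throws Exception {
            final HttpClient client = HttpClient.newHttpClient();
            final HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        }

        public static void main(final String[] args) throws Exception {
            // The source stage populates the collection; the placeholder
            // stages after it stand in for schema validation, signature
            // checking, and the XSLT transformation.
            final List<Stage<String>> stages = List.of(
                    items -> items.add(new Item<>(fetch("https://example.org/metadata.xml"))),
                    items -> { /* schema-validate each item's document */ },
                    items -> { /* verify the digital signature */ },
                    items -> { /* apply the XSLT transformation */ });

            final List<Item<String>> items = new ArrayList<>();
            new Pipeline<>(stages).execute(items);

            // One source document and no splitting stage: exactly one item.
            System.out.println(items.size() + " item(s) in the collection");
        }
    }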

The configuration documentation discusses all the available stages and gives concrete examples (with configuration snippets).

Command Line

One method of using the metadata aggregator is as a command line tool. In this mode, a primary pipeline is run and the result is (usually) written to a file. This is useful for integrating with a larger data processing environment, doing one-off processing, and testing the pipelines used in the web service.
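
By way of illustration only, an invocation might look something like the line below, naming a pipeline configuration file and the pipeline to run. The script name, argument order, and pipeline name here are assumptions; consult the installation documentation for the actual invocation.

    mda.sh pipeline.xml main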

Web Service

The web service component is not currently included in the Project Roadmap.

The web service interface provides a simple HTTP (REST, if you want to play buzzword bingo) interface that allows a consumer to retrieve one or more items from a collection based on an identifier or tag.
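
Purely as an illustration of the idea (the actual URL layout would be defined by the web service configuration), a retrieval request might take a shape like:

    GET /items/someItemIdentifier HTTP/1.1
    Host: aggregator.example.org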

The web service is built of: