encoding problem in IdP backup metadata file
Activity
Ian Young December 12, 2013 at 12:26 PM
I think this was a problem in the IdP (as Rod says, strictly in java-opensaml) but that it has long been fixed, albeit unintentionally.
The system on which this is reported to fail was IdP 2.1.5. Tracing the dependencies, this uses shibboleth-common 1.1.4 and opensaml 2.3.1 (rev 1428).
The next version of the IdP (2.2.0) used shibboleth-common and opensaml 2.4.0 (rev 1484).
Between rev 1428 and rev 1484, FileBackedHTTPMetadataProvider was essentially rewritten. In particular, the older version wrote the backup file by serialising the loaded XMLObject into a default FileWriter. This would have used the platform's default character encoding, which on a Linux system would almost certainly be a single-byte character encoding such as US-ASCII or ISO-8859-1. However, that single-byte text was accompanied by the following XML declaration (from the reporter's sample):
<?xml version="1.0" encoding="UTF-8"?>
A parser that trusts that declaration then fails on any octet that is not valid UTF-8, which is the failure we're seeing.
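To make the mechanism concrete, here is a minimal sketch of it, not the opensaml code itself. It assumes an explicit ISO-8859-1 OutputStreamWriter as a stand-in for a FileWriter picking up a single-byte platform default, and it uses the JDK's built-in DOM parser rather than the parser pool seen in the reporter's log. The class name, the method names and the tiny sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;

class MislabelledBackupSketch {
    // The declaration claims UTF-8 regardless of how the characters are later encoded.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<d>the State\u00B9s public health</d>";

    // Serialise through a Writer using the named charset; this stands in for
    // a FileWriter that silently uses the platform default encoding.
    static byte[] write(String charsetName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, Charset.forName(charsetName))) {
            w.write(XML);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // True if the JDK's DOM parser accepts the bytes as a well-formed document.
    static boolean parses(byte[] bytes) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            // SAXException or IOException, depending on the parser in use.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("written as ISO-8859-1, parses: " + parses(write("ISO-8859-1")));
        System.out.println("written as UTF-8, parses:      " + parses(write("UTF-8")));
    }
}
```

Under these assumptions the ISO-8859-1 variant should be rejected and the UTF-8 variant accepted, even though both carry the same declaration.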
The later version of FileBackedHTTPMetadataProvider instead operates on the basis of an array of bytes being read from or written to a Stream connected to the backup file. So, by definition, what goes out comes back in.
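The byte-oriented approach can be sketched roughly as follows. This is an illustration only: it uses java.nio.file.Files and a temporary file for brevity, which is an assumption and not how the rewritten provider is actually implemented, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ByteBackupSketch {
    // Write the fetched octets to a backup file and read them back.
    // No Reader, Writer or charset is involved at any point.
    static boolean roundTrips(byte[] fetched) {
        try {
            Path backup = Files.createTempFile("metadata-backup", ".xml");
            try {
                Files.write(backup, fetched);
                return Arrays.equals(fetched, Files.readAllBytes(backup));
            } finally {
                Files.deleteIfExists(backup);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "e", U+00B9 encoded as UTF-8 (c2 b9), "s"
        byte[] sample = { 0x65, (byte) 0xC2, (byte) 0xB9, 0x73 };
        System.out.println("round trip preserved bytes: " + roundTrips(sample));
    }
}
```

Because the file is treated as opaque octets, the platform default encoding never gets a chance to alter the content.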
I'd therefore expect that this cannot be reproduced on any version of the IdP from 2.2.0 onwards, so I'm going to mark this RESOLVED FIXED in that release. Reopen this issue if it can be reproduced on a non-EOL version of the IdP.
Ian Young December 12, 2013 at 11:45 AM
possibly similar issue on SP side, long fixed
Ian Young December 12, 2013 at 11:44 AM
Additional information from Mathew Ian Eis, with permission:
The issue presented itself most recently during an internet outage on our campus, in the form of exceptions in our idp-process.log (truncated for brevity):
---------
04:34:54.727 - WARN [org.opensaml.saml2.metadata.provider.FileBackedHTTPMetadataProvider:101] - Unable to read metadata from http://wayf.incommonfederation.org/InCommon/InCommon-metadata.xml attempting to read it from local backup
java.net.UnknownHostException: wayf.incommonfederation.org
04:34:54.773 - ERROR [org.opensaml.saml2.metadata.provider.HTTPMetadataProvider:253] - Unable to unmarshall metadata
org.opensaml.xml.io.UnmarshallingException: org.opensaml.xml.parse.XMLParserException: Invalid XML
at org.opensaml.saml2.metadata.provider.AbstractMetadataProvider.unmarshallMetadata(AbstractMetadataProvider.java:190) [opensaml-2.3.1.jar:na]
...
Caused by: org.opensaml.xml.parse.XMLParserException: Invalid XML
at org.opensaml.xml.parse.BasicParserPool.parse(BasicParserPool.java:219) [xmltooling-1.2.1.jar:na]
...
Caused by: org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) [na:na]
…
---------
The file was not modified, other than whatever might have been done by the IDP itself during normal processing. I did not include the full output of xmllint in the original report; it does include the “no DTD” error, but that is of course unrelated to the UTF-8 error. Here is the full output, in addition to the output of --version. (I have since made a copy of the apparently corrupted metadata file, in case the issue is resolved by a fix upstream.)
---------
# xmllint --version
xmllint: using libxml version 20626
compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug
# xmllint --valid --noout InCommon-metadata.xml.2013-12-11_11-26-MST
InCommon-metadata.xml.2013-12-11_11-26-MST:1: validity error : Validation failed: no DTD found !
eth-metadata-1.0.xsd http://www.w3.org/2000/09/xmldsig# xmldsig-core-schema.xsd"
^
InCommon-metadata.xml.2013-12-11_11-26-MST:15577: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xB9 0x73 0x20 0x70
amespace" xml:lang="en">The University of Maryland, Baltimore (UMB) is the State
---------
Our production environment is using Shibboleth IDP 2.1.5, and JAVA 1.6.0_24.
Curiously, I just noticed that the issue does not seem to present itself in our test environment, which is using a newer Shibboleth IDP 2.4.0 and JAVA 1.6.0_45. This would seem to imply that something in the older IDP/JAVA is corrupting the Metadata as it is written down to disk.
I would also add that your theory about the 0xC2 being dropped is correct:
Hex of a file retrieved with wget:
4d 42 29 20 69 73 20 74 68 65 20 53 74 61 74 65 c2 b9 73 20 70 75 62 6c 69 63 20 68 65 61 6c 74 |MB) is the State¹s public healt|
Hex of the file as stored by the IDP 2.1.5 / JAVA 1.6.0_24:
4d 42 29 20 69 73 20 74 68 65 20 53 74 61 74 65 b9 73 20 70 75 62 6c 69 63 20 68 65 61 6c 74 |MB) is the State?s public healt|
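The difference between the two dumps can be reproduced with a short sketch that encodes the same characters in different charsets. The class and method names are hypothetical, and the three-character sample string stands in for the text around the apostrophe-like U+00B9.

```java
import java.nio.charset.StandardCharsets;

class SuperscriptOneBytes {
    // Render bytes as space-separated lowercase hex, like the dumps above.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "e\u00B9s"; // "e", SUPERSCRIPT ONE, "s"
        System.out.println("UTF-8:      " + hex(s.getBytes(StandardCharsets.UTF_8)));
        System.out.println("ISO-8859-1: " + hex(s.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("US-ASCII:   " + hex(s.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

UTF-8 yields c2 b9 for the character and ISO-8859-1 yields the lone b9 seen in the stored file. US-ASCII cannot represent U+00B9 at all, so String.getBytes substitutes 3f ('?'); the b9 in the stored file therefore looks more like an ISO-8859-1 style encoding than strict US-ASCII.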
Rod Widdowson December 12, 2013 at 10:50 AM
I'll note that there is a history of "the IdP" not doing the right thing with the backup file (strictly speaking this is probably somewhere in the FileBackedHTTPMetadataProvider code).
Some spelunking through the SVN history might be educational...
Ian Young December 11, 2013 at 9:21 PM
The working theory is that the backup file was being written out in a single-byte encoding such as ISO-8859-1 but mislabelled as UTF-8. This would cause a failure when reading the single-byte encoding of U+00B9 (which is just the single octet 0xB9) back in as UTF-8, because 0xB9 is a continuation octet (bit pattern 10xxxxxx) and so cannot occur as the first octet of a UTF-8 sequence.
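The theory is easy to check in isolation with a strict UTF-8 decoder. The following is a small sketch using the JDK's CharsetDecoder; the class and method names are hypothetical, and it says nothing about how the IdP itself reads the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Check {
    // Decode strictly: malformed input is reported as an exception
    // instead of being replaced with U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] stored = { 0x65, (byte) 0xB9, 0x73 };               // lone continuation octet
        byte[] fetched = { 0x65, (byte) 0xC2, (byte) 0xB9, 0x73 }; // proper two-octet sequence
        System.out.println("stored bytes valid UTF-8:  " + isValidUtf8(stored));
        System.out.println("fetched bytes valid UTF-8: " + isValidUtf8(fetched));
    }
}
```

A check along these lines could also be run over a whole backup file to confirm whether it is well-formed UTF-8 before handing it to an XML parser.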
Scott confirms that a current IdP (2.4.0) wrote valid UTF-8 out into the backing file for the InCommon metadata in question.
If a SAML metadata file has the following declaration:
<?xml version="1.0" encoding="UTF-8"?>
and the document contains an octet sequence that is not valid UTF-8, xmlsectool reports that the document is schema-valid.
I can provide a signed, 6MB SAML metadata file that exhibits the issue.