POST form preservation causes XMLTooling.ParserPool ERROR when last part of POST content is not 7-bit ASCII
Description
Environment
Activity
Scott Cantor April 4, 2018 at 8:20 PM
The bug is that the HTML response data passed back blindly to the Apache side doesn't get marked as unsafe in the RemotedResponse wrapper class, so it was not programmed to handle non-UTF8 data.
The POST data itself is tracked safely and maintained, but once it's merged into the postTemplate HTML, that whole blob is then unsafe but was being remoted without that protection.
Scott Cantor April 4, 2018 at 5:53 PM
I got this reproduced with a trailing 0xBB character unencoded in an ISO-8859-1 encoded form page, though I got a slightly different log message.
I did a code review and refreshed my memory on how this all worked, and in a nutshell, any data presumed to be other than UTF-8 is meant to be URL-encoded and then serialized and deserialized as ASCII so it can be wrapped in XML safely as UTF-8. That's what the UNSAFE_STRING designation inside the remoting code is doing.
I checked the URL encoder and verified that it encodes anything above 0x7F. So I'm not sure at this stage why it's not happening, or if it is, where the exposure shows up. But I can trace the code now. If that isn't happening, this is guaranteed to break because 8-bit data above 0x7F is not UTF-8 compatible: that range is where UTF-8's multi-byte lead and continuation bytes live.
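The scheme described above (URL-encode anything not known to be UTF-8, carry it as ASCII, wrap it in XML) can be illustrated with a small Python sketch. This is not the SP's remoting code; the `unsafe` element name and both helper functions are hypothetical and exist only to show why the round trip is safe.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes
import xml.etree.ElementTree as ET


def wrap_unsafe(data: bytes) -> bytes:
    """URL-encode arbitrary bytes to pure ASCII, then wrap them in XML."""
    # safe="" percent-encodes everything except unreserved ASCII characters,
    # so every byte above 0x7F becomes a %XX escape.
    encoded = quote_from_bytes(data, safe="")
    root = ET.Element("unsafe")  # hypothetical element name
    root.text = encoded
    return ET.tostring(root, encoding="utf-8")


def unwrap_unsafe(xml_bytes: bytes) -> bytes:
    """Parse the XML wrapper and recover the original bytes."""
    root = ET.fromstring(xml_bytes)
    return unquote_to_bytes(root.text or "")


# ISO-8859-1 form data ending in 0xBB, as in this report.
raw = "Retrieve Channel List \u00bb".encode("iso-8859-1")
wrapped = wrap_unsafe(raw)
restored = unwrap_unsafe(wrapped)
```

Because the wrapped document contains only ASCII, it is valid UTF-8 regardless of the original encoding, and the receiver gets the original bytes back unchanged.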
Rod Widdowson April 3, 2018 at 4:22 PM
I'm continuing to look at this as a learning process, basically trying to chase the post data from the IIS HTTPRequest through the CGI parsing, into a DDF and thus the storage service memory, then out of storage service memory into a DDF, and from the DDF into the template. (At that stage no encoding occurs, even from a DDF_STRING_UNSAFE, but I don't know if that matters, or even if it would make any difference to pass it through DDF, since that doesn't look for bytes above 127.)
I have the two ends, I just need to find the middle.
Any time Scott gets bored with this he should pick this case up and fix it. I'll read his fix and be enlightened.
Data point. Last night Scott said:
"Any time things get 'prettied' that tends to hide the problems that would lead to a bug like this. I know that the actual data in the response body in the DDF XML has to be encoded differently than that dump."
And he was right.
Hex dumps are the only meaningful way to debug. Hustvedt claimed that you didn't even need a disassembler.
The offending buffer, which is about to be sent down to Xerces:
6c 75 65 3d 26 71 75 6f 74 3b 52 65 74 72 69 65 76 65 20 43 68 61 6e 6e  lue=&quot;Retrieve Chann
65 6c 20 4c 69 73 74 20 bb 26 71 75 6f 74 3b 2f 26 67 74 3b 0a 20 20 20  el List »&quot;/&gt;.   
20 20 20 20 20 0a 20 20 20 20 20 20 20 20 26 6c 74 3b 69 6e 70 75 74 20       .        &lt;input 
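For anyone repeating this kind of inspection, here is a rough Python sketch that dumps a buffer in the same 24-bytes-per-row layout and reports the first byte that breaks UTF-8 decoding. Both helper names are made up for illustration; the sample bytes are copied from the second row of the dump above.

```python
def first_invalid_utf8(buf: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8, or None."""
    try:
        buf.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, buf[exc.start]
    return None


def hexdump(buf: bytes, width: int = 24) -> str:
    """Render offset, hex bytes and printable ASCII (others shown as '.')."""
    lines = []
    for off in range(0, len(buf), width):
        chunk = buf[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{off:04x}  {hexpart:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


# Second row of the dump above: the raw 0xBB sits at offset 8.
sample = bytes.fromhex(
    "65 6c 20 4c 69 73 74 20 bb 26 71 75 6f 74 3b 2f 26 67 74 3b 0a 20 20 20"
)
bad = first_invalid_utf8(sample)
dump = hexdump(sample)
```

Note that the markup around the value is already XML-escaped in the buffer, while the 0xBB byte is passed through raw.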
Scott Cantor April 2, 2018 at 5:25 PM
Any time things get "prettied" that tends to hide the problems that would lead to a bug like this. I know that the actual data in the response body in the DDF XML has to be encoded differently than that dump.
Anyway, if you want to hand me the bug, just having a reproducible form example to use is enough to get me farther into this. Somewhere there's an assumption that's wrong, data being treated as UTF-8 that isn't, or treated as ASCII when it's UTF-8.
I can definitely believe that the browser encoding mess overall is the problem. Anything but true UTF-8 end to end won't work if any non-ASCII data gets into the mix.
Rod Widdowson April 2, 2018 at 4:50 PM
So my brain is melting (again) with the encode/decode thing - but I haven't yet seen any attempt to XMLEncode the post data.
OTOH the post data comes back from the storage service in DDF format, so maybe it should be in DDF format then.
I think that I'll do some more XML encoding in the test form
<input type="hidden" name="currentOnly" value="showonlycurrentlyacquired"/>
If the SP is configured with postData and postTemplate on the <Sessions> element and a POST is sent when there is no session with the SP and the POST has content that ends with
+%BB
then later after the SAML2 SSO flow completes and the daemon is sending back the POST content to the module the module will log in native.log
2017-03-18 14:14:19 ERROR XMLTooling.ParserPool [11477] shib_handler: fatal error on line 1, column 1, message: invalid byte '<BB>' at position 1 of a 1-byte sequence
and this causes the module to display an error to the user.
Removing +%BB from the end of the POST content is a workaround and does not trigger the error.
Here is the full POST content (length 158) that causes the error:
act=baseChan&baseSelector=true&ifo=any&subsys=any&fsCmp=%3E%3D&fs=any&chnamefilt=&currentOnly=show+only+currently+acquired&submitAct=Retrieve+Channel+List+%BB
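As a quick check of why the trailing %BB matters, the following Python sketch decodes the POST body above and shows that the final byte is a valid ISO-8859-1 character but is not valid UTF-8, so an XML parser expecting UTF-8 rejects it when it is embedded raw. It uses Python's standard library (expat) rather than Xerces, so the error text differs from the log line; the `parses_as_xml` helper and the `v` element are hypothetical.

```python
from urllib.parse import parse_qsl, unquote_to_bytes
import xml.etree.ElementTree as ET

body = (
    "act=baseChan&baseSelector=true&ifo=any&subsys=any&fsCmp=%3E%3D&fs=any"
    "&chnamefilt=&currentOnly=show+only+currently+acquired"
    "&submitAct=Retrieve+Channel+List+%BB"
)

# Decoded as ISO-8859-1 (assumed to be the form page's encoding), %BB is a
# right-pointing double angle quotation mark.
fields = dict(parse_qsl(body, keep_blank_values=True, encoding="iso-8859-1"))

# The raw bytes of the last value, before any character decoding.
raw = unquote_to_bytes(body.rsplit("=", 1)[1].replace("+", " "))


def parses_as_xml(payload: bytes) -> bool:
    """True if the payload can be embedded raw in an XML element and parsed."""
    try:
        ET.fromstring(b"<v>" + payload + b"</v>")  # hypothetical wrapper element
    except ET.ParseError:
        return False
    return True


ok_raw = parses_as_xml(raw)                      # raw 0xBB, not valid UTF-8
ok_ascii = parses_as_xml(raw[:-1])               # workaround: drop the 0xBB
ok_utf8 = parses_as_xml("\u00bb".encode("utf-8"))  # same character as UTF-8
```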