IdP stopping metadata retrieval
Description
Environment
CentOS 5, Tomcat 6 prepped by Apache2.2
Attachments
Activity

Brent PutmanFebruary 27, 2015 at 1:09 AM
We think we've solved this for now. Going to close, but can re-open (again) if our HC timeout and Timer fixes didn't do it.

Brent PutmanFebruary 25, 2015 at 6:28 AM
Looks like I was incorrect about the provider destroy(). Spring doesn't call it, but it's explicitly invoked via the SAMLMDRelyingPartyConfigurationManager service. So it's sort of asymmetrical - Spring does the init, but service code does the destroy.
So not a concern AFAIK at this point.
Scott CantorFebruary 25, 2015 at 1:37 AM
There have always been problems with reloading that file, it's generally not recommended (one of the very major improvements I'm happy about in V3). I think a lot of the problems with Tomcat redeploys traces to these leaks.
I guess the question is whether it's worth adding the timers given that the main problem is with deadlocking the single thread, and the timeouts should mostly fix that.

Brent PutmanFebruary 25, 2015 at 12:45 AM
And now for the bad news: Confirmed with some debug logging and test tweaking that, like the BaseReloadableService
, we never actually destroy()
any of the metadata providers. The AbstractMetadataProviderBeanDefinitionParser
only registers the init method, not the destroy.
Not sure in the big picture how bad this is yet, but I'm pretty sure "royally sucks" is apropos...
I don't understand all the reloading architecture yet, but: if someone has relying-party.xml set to reload, I think that means you're going to get leaking and leaking all over the place: memory and background TimerTasks ... and now Timer instances/threads.

Brent PutmanFebruary 25, 2015 at 12:36 AM
What I have done at this point:
Added 90s connection and socket timeouts to HttpResource.
Added internal 90s socket timeout to HttpClientBuilder.
Changed reloading metadata provider so that a null Timer is allowed, which results in an internal Timer instance.
Changed the reloading metadata provider parser so that we no longer default 'shibboleth.TaskTimer'. This was easy. Turns out we got rid of the schema defaults long ago (2008). And we didn't even add the new background reloading stuff, including the timer attribute, until 2010. So the default was purely in the parser. I vaguely remember some of that. And Spring at least in this version uses indexed ctor args exclusively, AFAICT, so doing the null Timer was as simple as adding the null arg value. (Note value, not reference. Passing a 'null' to
builder.addConstructorArgReference
results in an IllegalArgumentException, as the bean name passed to the constructedRuntimeBeanReference
may not be null.)
I'm feeling like that's all we should do for this release. The only possible addition would be to publicly expose the HttpClientBuilder socketTimeout, and then do the same on the schema and parsers. But I don't really want to rev everything.
IdPs were otherwise healthy and working for months, but stopped to retrieve signed metadata for 3 different plain http URLs at the same time.
The 2.4.0 IdP on one site is part of a Terracotta cluster of two, here only one instance was affected.
The 2.3.6 IdP on an different and unrelated site stopped retrieval some days later. Here, the time stamps of the metadata backup files was when one metadata source was unavailable temporarily. After that time, there were no occurences of "metadata" in the process log (INFO) except SP lookup failures, until restart of the IdP.