All work

Select view

Select search mode

 

Invalid backingFile causes MetadataResolver initialization to fail

Fixed

Description

When there is drive full condition and the FileBackedHTTPMetadataProvider attempts to download and save the backingFile it will leave a 0 byte (empty file) on the filesystem.  The next time the IDP tries to start it will fail to initialize because the IDP is reading the backingFile expecting that it is XML and the XML parser throws an exception thats not being caught. 

Arguably there are probably two bugs here.

1) The IDP should probably validate the backingFile is valid rather than leaving a 0 byte xml on the filesystem. There's a number of ways this could be handled, but the easiest would probably just be to download the backingFile to the same location with a .staging extension, validate it then replace the backingFile.  Another approach would be to rename the backingFile, then download a new one, validate it, and if validation fails put the old file back so you don't leave the IDP in a bad state.  The problem with the second approach is that if the time is just right and the filesystem happened to fill up after you renamed it then you might not be able to put it back, but the same is probably true for the first if timing is just right.  Honestly I don't even understand how a 0 byte file is getting created to begin with because if the drive is full I would argue you shouldn't even be able to create a file, unless the code is creating a 0 byte file on purpose and adding the XML to it as a second step.   

2) The code that is loading the backingFile during the IDP initialization should capture the exception being thrown when the backingFile is not valid XML and either log a helpful message stating that the backingFile is corrupt, or better yet it should proactively try to rectify the situation by attempting to redownload the file and if it can't because the drive is full or permissions issues then log why it can't be fixed.  

I don't think you need to create a drive full situation to reproduce the problem.  You can probably just follow the instructions here https://www.fortypoundhead.com/showcontent.asp?artid=23907 to create a zero byte file, then replace the backingFile with it and restart the IDP.  

Environment

Windows 2008 or 2019, JDK (any version), Tomcat 9, and the latest 3.x IDP release. 

Details

Assignee

Reporter

Components

Fix versions

Created October 20, 2020 at 5:21 PM
Updated August 6, 2021 at 10:31 PM
Resolved March 3, 2021 at 2:09 AM

Activity

Brent PutmanMarch 3, 2021 at 2:09 AM

We haven't seen any test or other issues with this fix, so will close this out.

Brent PutmanOctober 23, 2020 at 7:46 PM

Ok, I think this should be fine then. The target file is not open at all, at least by any code we control. The source was just written out prior to the rename in a try-with-resources, so should be closed as long as that behaves as it's supposed to.

Rod WiddowsonOctober 23, 2020 at 10:27 AM

The issue with Windows and rename is that the source file should (often) not be open and the target should never be open.  So if you have an object kicking around somewhere that hasn't closed the file the rename will fail (until the object is finalized).  I recently got bitten by this in the installer where a ClasspathLoader was holding a jar open.

Recent Java's on recent Windows will exploit the Posix compatible renames (and deletes)  - unlink rather than delete.

But the good news here is that this is not an occasional failure its all or nothing and it is often fixable.  Given that the multis worked last night I'd say that we are golden.

Brent PutmanOctober 23, 2020 at 1:02 AM

Provisional change to use a staging file checked in as dc3a40825a23e45442394d078836825265127f80.

Based on the Jenkins multi jobs we'll see if Files.move(....) has an issue on Windows.

Scott CantorOctober 22, 2020 at 7:57 PM

I'd agree, that was why I at least wanted to move the bug in and see if there's anything worth improving, but the two options are still the primary resiliency tool.

I just added failFastInitialization="false" to the shipped example, it should have been there.

Flag notifications