Chapter 2. Design

Table of Contents

General design principles and goals
LifecycleManager design (Axiom 1.3)
Issues with the LifecycleManager API in Axiom 1.2.x
Cleanup strategy for temporary files

General design principles and goals

Consistent serialization.  Axiom supports multiple methods and APIs to serialize an object model to XML or to transform it to another (non-Axiom) representation. This includes serialization to byte or character streams, transformation to StAX in push mode (i.e. writing to an XMLStreamWriter) or pull mode (i.e. reading from an XMLStreamReader), as well as transformation to SAX. The representations produced by these different methods should be consistent with each other: if a given use case can be implemented using more than one of these methods, then the end result should be the same, whichever method is chosen.

AXIOM-430 provides an example where this principle was not respected.

This principle can only be respected within the limits imposed by a given API. For example, if a given API has limited support for DTDs, then a DOCTYPE declaration may be skipped when that API is used.

LifecycleManager design (Axiom 1.3)

The LifecycleManager API is used by the MIME handling code in Axiom to manage the temporary files that are used to buffer the content of attachment parts. The LifecycleManager implementation is responsible for tracking the temporary files that have been created and for ensuring that they are deleted when they are no longer used. In Axiom 1.2.x, this API has multiple issues and a redesign is required for Axiom 1.3.

Issues with the LifecycleManager API in Axiom 1.2.x

  1. Temporary files that are not cleaned up explicitly by application code will only be removed when the JVM stops (LifecycleManagerImpl registers a shutdown hook and maintains a list of files that need to be deleted when the JVM exits). This means that temporary files may pile up, causing the file system to fill.

  2. LifecycleManager also has a method deleteOnTimeInterval that deletes a file after a specified time interval. However, the implementation creates a new thread for each invocation of that method, which is generally not acceptable in high-performance use cases.

  3. One of the stated design goals (see AXIOM-192) of the LifecycleManager API was to wrap the files in FileAccessor objects to keep track of activity that occurs on the files. However, as pointed out in AXIOM-185, since FileAccessor has a method that returns the corresponding File object, this goal has not been reached.

  4. As noted in AXIOM-382, the fact that LifecycleManagerImpl registers a shutdown hook which is never unregistered causes a class loader leak in J2EE environments.

  5. In an attempt to work around the issues related to LifecycleManager (in particular the first item above), AXIOM-185 introduced another class called AttachmentCacheMonitor that implements a timer based mechanism to clean up temporary files. However, this change causes other issues:

    • The existence of this API has a negative impact on Axiom's architectural integrity because its functionality overlaps with that of LifecycleManager. This means that we now have two completely separate APIs that are expected to serve the same purpose, but neither of them addresses the problem properly.

    • AttachmentCacheMonitor automatically creates a timer, but there is no way to stop that timer. This means that this API can only be used if Axiom is integrated into the container, but not when it is deployed with an application.

    Fortunately, that change was only meant as a workaround to solve a particular issue in WebSphere (see APAR PK91497), and once the LifecycleManager API is redesigned to solve that issue, AttachmentCacheMonitor no longer has a reason to exist.

  6. LifecycleManager is an abstract API (interface), but refers to FileAccessor which is placed in an impl package.

  7. FileAccessor uses the MessagingException class from JavaMail, although Axiom no longer relies on this API to parse or create MIME messages.

Cleanup strategy for temporary files

As pointed out in the previous section, one of the primary problems with the LifecycleManager API in Axiom 1.2.x is that temporary files that are not cleaned up explicitly by application code (e.g. using the purgeDataSource method defined by DataHandlerExt) are only removed when the JVM exits. A timer-based strategy that deletes a temporary file after a given time interval (as proposed by AttachmentCacheMonitor) is not reliable because in some use cases, application code may keep a reference to the attachment part for a long time before accessing it again.

The only reliable strategy is to take advantage of finalization, i.e. to rely on the garbage collector to trigger the deletion of temporary files that are no longer used. For this to work, the design of the API (and its default implementation) must satisfy the following two conditions:

  1. All access to the underlying file must be strictly encapsulated, so that the file is only accessible as long as there is a strong reference to the object that encapsulates the file access. This is necessary to ensure that the file can be safely deleted once there is no longer a strong reference and the object is garbage collected.

  2. Java guarantees that the finalizer is invoked before the instance is garbage collected. However, instances are not necessarily garbage collected before the JVM exits, and in that case the finalizer is never invoked. Therefore, the implementation must delete all existing temporary files when the JVM exits. The API design should also take into account that some implementations of the LifecycleManager API may want to trigger this cleanup before the JVM exits, e.g. when the J2EE application in which Axiom is deployed is stopped.

The first condition can be satisfied by redesigning the FileAccessor such that it never leaks the name of the file it represents (neither as a String nor as a File object). This in turn means that the CachedFileDataSource class must be removed from the Axiom API. In addition, the getInputStream method defined by FileAccessor must no longer return a plain FileInputStream instance, but must use a wrapper that keeps a strong reference to the FileAccessor, so that the FileAccessor can't be garbage collected while the input stream is still in use.
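The encapsulation described above can be illustrated with a minimal sketch. The class names EncapsulatedFile and AccessorInputStream are hypothetical and are not part of the Axiom API; the sketch only shows the two properties discussed here: the File is never exposed, and every stream holds a strong reference to its accessor. The garbage collector driven deletion itself is left out and replaced by an explicit delete method.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

/** Hypothetical stand-in for a redesigned FileAccessor (assumption, not Axiom code). */
final class EncapsulatedFile {
    // The File is private and never returned, neither as a File nor as a String.
    private final File file;

    EncapsulatedFile(byte[] content) throws IOException {
        file = File.createTempFile("attachment", ".tmp");
        Files.write(file.toPath(), content);
    }

    /** Returns a stream that keeps this accessor strongly referenced while it is in use. */
    InputStream getInputStream() throws IOException {
        return new AccessorInputStream(this, new FileInputStream(file));
    }

    /** Explicit cleanup; a real implementation would also trigger this from the garbage collector. */
    boolean delete() {
        return file.delete();
    }

    /** Wrapper whose only purpose is to hold a reference to the owning accessor. */
    private static final class AccessorInputStream extends FilterInputStream {
        @SuppressWarnings("unused")
        private final EncapsulatedFile owner;

        AccessorInputStream(EncapsulatedFile owner, InputStream in) {
            super(in);
            this.owner = owner;
        }
    }
}
```

Because application code can only reach the file content through the accessor or one of its streams, the file can be deleted safely once neither is referenced any longer.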

To satisfy the second condition, one may want to use File#deleteOnExit. However, this method causes a native memory leak, especially when used with temporary files, which are expected to have unique names (see bug 4513817). Therefore this can only be implemented using a shutdown hook. However, a shutdown hook will cause a class loader leak if it is used improperly, e.g. if it is registered by an application deployed into a J2EE container and not unregistered when that application is stopped. For this particular case, it is possible to create a special LifecycleManager implementation, but for this to work, the lifecycle of this type of LifecycleManager must be bound to the lifecycle of the application, e.g. using a ServletContextListener. This is not always possible and this approach is therefore not suitable for the default LifecycleManager implementation.

To avoid the class loader leak, the default LifecycleManager implementation should register the shutdown hook when the first temporary file is registered and automatically unregister the shutdown hook again when there are no more temporary files. This implies that the shutdown hook is repeatedly registered and unregistered. However, since these are relatively cheap operations[2], this should not be a concern.
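The on-demand registration described above might look like the following sketch. TempFileRegistry and its method names are hypothetical and do not correspond to an actual Axiom class; only Runtime#addShutdownHook and Runtime#removeShutdownHook are real JDK calls. The sketch ignores the IllegalStateException that both calls throw when the JVM is already shutting down.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical registry that holds a shutdown hook only while files exist (assumption). */
final class TempFileRegistry {
    private final Set<File> files = new HashSet<>();
    private Thread hook; // non-null only while at least one file is registered

    synchronized void register(File file) {
        if (files.isEmpty()) {
            // First temporary file: install the hook.
            hook = new Thread(this::deleteAll, "temp-file-cleanup");
            Runtime.getRuntime().addShutdownHook(hook);
        }
        files.add(file);
    }

    synchronized void unregister(File file) {
        if (files.remove(file) && files.isEmpty()) {
            // Last temporary file is gone: remove the hook so that it no longer
            // pins this object (and its class loader) in memory.
            Runtime.getRuntime().removeShutdownHook(hook);
            hook = null;
        }
    }

    synchronized boolean isHookRegistered() {
        return hook != null;
    }

    /** Runs in the shutdown hook thread and deletes whatever is still registered. */
    private synchronized void deleteAll() {
        for (File f : files) {
            f.delete();
        }
        files.clear();
    }
}
```

With this arrangement, an idle application (one with no outstanding temporary files) holds no shutdown hook at all, so stopping it in a container leaves nothing behind that references its class loader.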

An additional complication is that when the shutdown hook is executed, the temporary files may still be in use. This contrasts with the finalizer case where encapsulation guarantees that the file is no longer in use. This situation doesn't cause an issue on Unix platforms (where it is possible to delete a file while it is still open), but needs to be handled properly on Windows. This can only be achieved if the FileAccessor keeps track of created streams, so that it can forcibly close the underlying FileInputStream objects.
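One way to track the created streams is sketched below. TrackedFile, forceDelete and openStreamCount are hypothetical names chosen for illustration, not Axiom API. Unlike the redesigned FileAccessor, the sketch accepts a File in its constructor to stay short, and it hands out FileInputStream subclasses directly rather than the wrapper discussed earlier.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical accessor that remembers every stream it hands out (assumption). */
final class TrackedFile {
    private final File file;
    private final List<FileInputStream> openStreams = new ArrayList<>();

    TrackedFile(File file) {
        this.file = file;
    }

    synchronized InputStream getInputStream() throws IOException {
        FileInputStream in = new FileInputStream(file) {
            @Override
            public void close() throws IOException {
                super.close();
                forget(this); // streams closed by the application are no longer tracked
            }
        };
        openStreams.add(in);
        return in;
    }

    private synchronized void forget(FileInputStream in) {
        openStreams.remove(in);
    }

    synchronized int openStreamCount() {
        return openStreams.size();
    }

    /** Intended for the shutdown hook: closes all open streams, then deletes the file. */
    synchronized boolean forceDelete() {
        // Iterate over a copy because close() removes the stream from the list.
        for (FileInputStream in : new ArrayList<>(openStreams)) {
            try {
                in.close();
            } catch (IOException e) {
                // Best effort: still try to delete the file.
            }
        }
        openStreams.clear();
        return file.delete();
    }
}
```

After forceDelete has run, any further read on a previously returned stream fails with an IOException, which is the price of being able to delete the file on Windows.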



[2] Since the JRE typically uses an IdentityHashMap to store shutdown hooks, the only overhead is caused by Java 2 security checks and synchronization.