Consistent serialization.
Axiom supports multiple methods and APIs to serialize an object model to XML or to transform it
to another (non Axiom) representation. This includes serialization to byte or character streams, transformation
to StAX in push mode (i.e. writing to an XMLStreamWriter) or pull mode
(i.e. reading from an XMLStreamReader), as well as transformation to SAX.
The representations produced by these different methods should be consistent with each other.
If a given use case can be implemented using more than one of these methods, then the end result
should be the same, whichever method is chosen.
AXIOM-430 provides an example where this principle was not respected.
This principle can only be respected within the limits imposed
by a given API. For example, if an API has limited support for DTDs, then a DOCTYPE
declaration may be skipped when that API is used.
The LifecycleManager API is used by the MIME handling code in Axiom
to manage the temporary files that are used to buffer the content of attachment parts.
The LifecycleManager implementation is responsible for tracking the temporary
files that have been created and for ensuring that they are deleted when they are no longer used.
In Axiom 1.2.x, this API has multiple issues and a redesign is required for Axiom 1.3.
Temporary files that are not cleaned up explicitly by application code will only be removed
when the JVM stops (LifecycleManagerImpl registers a shutdown hook
and maintains a list of files that need to be deleted when the JVM exits). This means that
temporary files may pile up, causing the file system to fill.
LifecycleManager also has a method deleteOnTimeInterval
that deletes a file after some specified time interval. However, the implementation creates a new
thread for each invocation of that method, which is generally not acceptable in high performance
use cases.
One of the stated design goals (see AXIOM-192) of the LifecycleManager
API was to wrap the files in FileAccessor objects to
“keep track of activity that occurs on the files”. However, as pointed out in
AXIOM-185, since FileAccessor has a method that returns the corresponding File
object, this goal has not been reached.
As noted in AXIOM-382, the fact that LifecycleManagerImpl
registers a shutdown hook which is never unregistered
causes a class loader leak in J2EE environments.
In an attempt to work around the issues related to LifecycleManager (in particular
the first item above), AXIOM-185 introduced another class called
AttachmentCacheMonitor that implements a timer
based mechanism to clean up temporary files. However, this change causes other issues:
The existence of this API has a negative impact on Axiom's architectural integrity because it
has functionality that overlaps with LifecycleManager. This means that
we now have two completely separate APIs that are expected to serve the same purpose, but
neither of them addresses the problem properly.
AttachmentCacheMonitor automatically creates a timer, but there is no
way to stop that timer. This means that this API can only be used if Axiom is integrated
into the container, but not when it is deployed with an application.
Fortunately, that change was only meant as a workaround to solve a particular issue in WebSphere
(see APAR PK91497), and once the LifecycleManager
API is redesigned to solve that issue, AttachmentCacheMonitor
no longer has a reason to exist.
LifecycleManager is an abstract API (interface), but refers to
FileAccessor which is placed in an impl package.
FileAccessor uses the MessagingException class
from JavaMail, although Axiom no longer relies on this API to parse or create MIME messages.
As pointed out in the previous section, one of the primary problems with the
LifecycleManager API in Axiom 1.2.x is that temporary files that are
not cleaned up explicitly by application code (e.g. using the purgeDataSource method
defined by DataHandlerExt) are only removed when the JVM exits.
A timer based strategy that deletes temporary files after a given time interval (as proposed
by AttachmentCacheMonitor) is not reliable
because in some use cases, application code may keep a reference to the attachment part for
a long time before accessing it again.
The only reliable strategy is to take advantage of finalization, i.e. to rely on the garbage collector to trigger the deletion of temporary files that are no longer used. For this to work the design of the API (and its default implementation) must satisfy the following two conditions:
All access to the underlying file must be strictly encapsulated, so that the file is only accessible as long as there is a strong reference to the object that encapsulates the file access. This is necessary to ensure that the file can be safely deleted once there is no longer a strong reference and the object is garbage collected.
Java guarantees that the finalizer is invoked before the instance is garbage
collected. However, instances are not necessarily garbage collected before the
JVM exits, and in that case the finalizer is never invoked. Therefore, the
implementation must delete all existing temporary files when the JVM exits.
The API design should also take into account that some implementations of
the LifecycleManager API may want to trigger this
cleanup before the JVM exits, e.g. when the J2EE application in which
Axiom is deployed is stopped.
The first condition can be satisfied by redesigning the FileAccessor
such that it never leaks the name of the file it represents (neither as a String
nor a File object). This in turn means that the
CachedFileDataSource class must be removed from the Axiom API.
In addition, the getInputStream method defined by
FileAccessor must no longer return a simple FileInputStream
instance, but must use a wrapper that keeps a strong reference to the FileAccessor,
so that the FileAccessor can't be garbage collected while the
input stream is still in use.
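The following is a minimal sketch of what such an encapsulated accessor could look like. It is not the Axiom API: the class names EncapsulatedFileAccessor and AccessorInputStream, the constructor taking a byte array, and the getSize and delete methods are all hypothetical and exist only to illustrate the two points above (no File or file name is exposed, and the returned stream holds a strong reference to its accessor).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

/** Hypothetical stand-in for a redesigned FileAccessor; not the Axiom API. */
final class EncapsulatedFileAccessor {
    // The file is private and is never exposed as a File or as a String.
    private final File file;

    EncapsulatedFileAccessor(byte[] content) throws IOException {
        file = File.createTempFile("axiom-sketch", ".att");
        Files.write(file.toPath(), content);
    }

    /** Returns a stream that keeps this accessor strongly reachable while in use. */
    InputStream getInputStream() throws IOException {
        return new AccessorInputStream(this, new FileInputStream(file));
    }

    /** Metadata can be exposed without revealing where the file lives. */
    long getSize() {
        return file.length();
    }

    /** Explicit cleanup by application code. */
    boolean delete() {
        return file.delete();
    }

    private static final class AccessorInputStream extends FilterInputStream {
        // Strong reference: while the stream is reachable, so is the accessor.
        @SuppressWarnings("unused")
        private final EncapsulatedFileAccessor owner;

        AccessorInputStream(EncapsulatedFileAccessor owner, InputStream in) {
            super(in);
            this.owner = owner;
        }
    }
}
```

Because callers only ever see an InputStream, nothing outside the accessor can reopen the file once the accessor and all of its streams have become unreachable.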
To satisfy the second condition, one may want to use File#deleteOnExit.
However, this method causes a native memory leak, especially when used with temporary files,
which are expected to have unique names (see bug 4513817).
Therefore this can only be implemented using a shutdown hook. However, a shutdown hook will
cause a class loader leak if it is used improperly, e.g. if it is registered by an application deployed
into a J2EE container and not unregistered when that application is stopped. For this
particular case, it is possible to create a special LifecycleManager
implementation, but for this to work, the lifecycle of this type of LifecycleManager
must be bound to the lifecycle of the application, e.g. using a
ServletContextListener. This is not always possible and this approach
is therefore not suitable for the default LifecycleManager implementation.
To avoid the class loader leak, the default LifecycleManager implementation
should register the shutdown hook when the first temporary file is registered and
automatically unregister the shutdown hook again when there are no more temporary files.
This implies that the shutdown hook is repeatedly registered and unregistered. However, since
these are relatively cheap operations[2], this should not be a concern.
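The register-on-first, unregister-on-last behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the Axiom implementation: TempFileRegistry and its methods are hypothetical names, and only Runtime.addShutdownHook and Runtime.removeShutdownHook are real JDK calls.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical registry; the names are illustrative and not part of Axiom. */
final class TempFileRegistry {
    private final Set<File> files = new HashSet<>();
    // Non-null only while at least one temporary file is registered.
    private Thread hook;

    synchronized void register(File file) {
        if (files.add(file) && hook == null) {
            // First temporary file: install the shutdown hook.
            hook = new Thread(this::deleteAll, "temp-file-cleanup");
            Runtime.getRuntime().addShutdownHook(hook);
        }
    }

    synchronized void unregister(File file) {
        if (files.remove(file) && files.isEmpty() && hook != null) {
            try {
                // Last temporary file is gone: remove the hook again.
                Runtime.getRuntime().removeShutdownHook(hook);
            } catch (IllegalStateException e) {
                // The JVM is already shutting down; nothing left to unregister.
            }
            hook = null;
        }
    }

    synchronized boolean isHookRegistered() {
        return hook != null;
    }

    // Runs in the shutdown hook thread.
    private synchronized void deleteAll() {
        for (File f : files) {
            f.delete();
        }
        files.clear();
    }
}
```

With this arrangement, an application that has released all of its temporary files holds no shutdown hook, so the hook does not pin the application's class loader.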
An additional complication is that when the shutdown hook is executed, the temporary files
may still be in use. This contrasts with the finalizer case where encapsulation guarantees
that the file is no longer in use. This situation doesn't cause an issue on Unix platforms (where it is possible
to delete a file while it is still open), but needs to be handled properly on Windows.
This can only be achieved if the FileAccessor keeps track of
created streams, so that it can forcibly close the underlying FileInputStream objects.
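One possible way to track created streams is sketched below. The class TrackingFileAccessor and its methods forceDelete and openStreamCount are hypothetical and only illustrate the idea; they do not reflect the actual Axiom code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical accessor that remembers the streams it hands out. */
final class TrackingFileAccessor {
    private final File file;
    private final Set<InputStream> openStreams = new HashSet<InputStream>();

    TrackingFileAccessor(File file) {
        this.file = file;
    }

    synchronized InputStream getInputStream() throws IOException {
        InputStream tracked = new FilterInputStream(new FileInputStream(file)) {
            @Override
            public void close() throws IOException {
                // Forget the stream once it is closed, then close the file handle.
                synchronized (TrackingFileAccessor.this) {
                    openStreams.remove(this);
                }
                super.close();
            }
        };
        openStreams.add(tracked);
        return tracked;
    }

    /** Intended for the shutdown hook: close whatever is still open, then delete. */
    synchronized boolean forceDelete() {
        // Iterate over a copy because close() removes entries from openStreams.
        for (InputStream in : new HashSet<InputStream>(openStreams)) {
            try {
                in.close();
            } catch (IOException ignored) {
                // Best effort during shutdown.
            }
        }
        return file.delete();
    }

    synchronized int openStreamCount() {
        return openStreams.size();
    }
}
```

A consumer that still holds one of these streams after forceDelete will get an IOException on its next read, which is the intended trade-off when the JVM is shutting down.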
[2] Since the JRE typically uses an IdentityHashMap to store shutdown hooks,
the only overhead is caused by Java 2 security checks and synchronization.