Chapter 5. Common mistakes, problems and anti-patterns

Table of Contents

Violating the javax.activation.DataSource contract
Issues that magically disappear
The OM-inside-OMDataSource anti-pattern
Weak version
Strong version

This chapter presents some of the common mistakes and problems people face when writing code using Axiom, as well as anti-patterns that should be avoided.

Violating the javax.activation.DataSource contract

When working with binary (base64) content, it is sometimes necessary to write a custom DataSource implementation to wrap binary data that is available in a different form (and for which Axiom or the Java Activation Framework has no out-of-the-box data source implementation). Data sources are also sometimes (but less frequently) used in conjunction with OMSourcedElement and OMDataSource.

The documentation of the DataSource interface is very clear on the expected behavior of the getInputStream method:

/**
 * This method returns an InputStream representing
 * the data and throws the appropriate exception if it can
 * not do so. Note that a new InputStream object must be
 * returned each time this method is called, and the stream must be
 * positioned at the beginning of the data.
 *
 * @return an InputStream
 */
public InputStream getInputStream() throws IOException;

A common mistake is to implement the data source in a way that makes getInputStream destructive. Consider the implementation shown in Example 5.1, “DataSource implementation that violates the interface contract”[2]. This data source can only be read once: every subsequent call to getInputStream returns the same stream instance, which by then has already been consumed or closed.

Example 5.1. DataSource implementation that violates the interface contract

public class InputStreamDataSource implements DataSource {
    private final InputStream is;

    public InputStreamDataSource(InputStream is) {
        this.is = is;
    }

    public String getContentType() {
        return "application/octet-stream";
    }

    public InputStream getInputStream() throws IOException {
        return is;
    }

    public String getName() {
        return null;
    }

    public OutputStream getOutputStream() throws IOException {
        throw new UnsupportedOperationException();
    }
}

What makes this mistake so insidious is that it will very likely not cause problems immediately. The reason is that Axiom is optimized to read the data only when necessary, which in most cases means only once! However, in some cases it is unavoidable to read the data several times. When that happens, the broken DataSource implementation will cause problems that may be extremely hard to debug.

Imagine for example[3] that the implementation shown above is used to produce an MTOM message. At first this will work without any problems because the data source is read only once when serializing the message. If later on the MTOM threshold feature is enabled, the broken implementation will (in the worst case) cause the corresponding MIME parts to be empty or (in the best case) trigger an I/O error because Axiom attempts to read from an already closed stream. The reason for this is that when an MTOM threshold is set, Axiom reads the data source twice: once to determine if its size exceeds the threshold[4] and once during serialization of the message.
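
For comparison, a data source that honors the contract can be sketched as follows. This is only a minimal sketch under stated assumptions: the class name RepeatableDataSource is hypothetical, the content is assumed to be small enough to be buffered in memory, and the "implements javax.activation.DataSource" clause is left out so that the sketch compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class; real code would declare "implements javax.activation.DataSource".
class RepeatableDataSource {
    private final byte[] content;

    // Drains the stream once, up front, so that the data can be replayed later.
    RepeatableDataSource(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        this.content = buffer.toByteArray();
    }

    public String getContentType() {
        return "application/octet-stream";
    }

    // Returns a new stream, positioned at the beginning of the data, on every call.
    public InputStream getInputStream() throws IOException {
        return new ByteArrayInputStream(content);
    }

    public String getName() {
        return null;
    }

    public OutputStream getOutputStream() throws IOException {
        throw new UnsupportedOperationException();
    }
}
```

Buffering trades memory for repeatability. For large payloads, a data source backed by a file (or by any other resource that can be reopened) would satisfy the same contract without holding the content in memory.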

Issues that magically disappear

Quite frequently users post messages on the Axiom-related mailing lists about issues that seem to disappear by magic when they try to debug them. The reason why this can happen is simple. As explained earlier, Axiom uses deferred building, but at the same time does its best to hide that from users, so that they don't need to worry about whether the object model has already been built or not. On the other hand, when serializing the object model to XML or when requesting a pull parser (XMLStreamReader) from a node, the code paths taken may be radically different depending on whether or not the corresponding part of the tree has already been built. This is especially true when caching is disabled.

While the end result should be the same in all cases, it is also clear that in some circumstances an issue that occurs with an incompletely built tree may disappear if there is something that causes Axiom to build the rest of the object model. What is important to understand is that this “something” may be as trivial as a call to the toString method of an OMNode! Since adding System.out.println statements or logging instructions is a common debugging technique, this explains why issues sometimes seem to magically disappear during debugging.

Finally, it should be noted that inspecting an OMNode in a debugger also causes a call to the toString method on that object. This means that by just clicking on something in the Variables window of your debugger, you may completely change the state of the process that is being debugged!
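
The effect can be reproduced with a toy model. The sketch below does not use any Axiom class: LazyNode is a hypothetical stand-in for a node that is built on demand, and its toString method completes the build as a side effect, which is the behavior described above for OMNode.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a node with deferred building; this is NOT an Axiom class.
class LazyNode {
    private final Iterator<String> source;               // unparsed input, consumed on demand
    private final List<String> built = new ArrayList<>(); // the part of the node built so far

    LazyNode(List<String> events) {
        this.source = events.iterator();
    }

    // True once every event has been pulled from the underlying source.
    boolean isComplete() {
        return !source.hasNext();
    }

    // Builds whatever has not been built yet.
    void build() {
        while (source.hasNext()) {
            built.add(source.next());
        }
    }

    // Looks like a harmless accessor, but forces the node to be built completely.
    @Override
    public String toString() {
        build();
        return String.join("", built);
    }
}
```

With this model, a node created from the events "<a>", "text" and "</a>" reports isComplete() as false until something evaluates an expression such as "node = " + node, for example in a log statement. From then on isComplete() returns true, and any code path that depends on the node being incomplete is no longer exercised.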

The OM-inside-OMDataSource anti-pattern

Weak version

OMDataSource objects are used in conjunction with OMSourcedElement to build Axiom object model instances that contain information items that are represented using a framework or API other than Axiom. Wrapping this foreign data in an OMDataSource and adding it to the Axiom object model using an OMSourcedElement in most cases avoids the conversion of the data to the native Axiom object model[5]. The OMDataSource contract requires the implementation to support two different ways of providing the data, both relying on StAX:

  • The implementation must be able to provide a pull parser (XMLStreamReader) from which the infoset can be read.

  • The data source must be able to serialize the infoset to an XMLStreamWriter (push).

For the consumer of an event-based representation of an XML infoset, it is in general easier to work in pull mode. That is the reason why StAX has gained popularity over push-based approaches such as SAX. On the other hand, for a producer such as an OMDataSource implementation, it is exactly the other way round: it is far easier to serialize an infoset to an XMLStreamWriter (push) than to build an XMLStreamReader from which a consumer can read (pull) events.
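
The asymmetry between the two modes can be illustrated with plain StAX, independently of Axiom. The following is a minimal sketch: the class PushPullDemo, its methods and the person/name vocabulary are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, used only to contrast the two StAX modes.
class PushPullDemo {
    // Producer side (push): the producer keeps control and emits events in document order.
    static void writePerson(String name, XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartElement("person");
        writer.writeStartElement("name");
        writer.writeCharacters(name);
        writer.writeEndElement();
        writer.writeEndElement();
    }

    // Consumer side (pull): the consumer keeps control and asks for the next event.
    static String readName(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "name".equals(reader.getLocalName())) {
                return reader.getElementText();
            }
        }
        return null;
    }

    // Round trip: serialize with the push API, then parse the result with the pull API.
    static String roundTrip(String name) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writePerson(name, writer);
            writer.flush();
            writer.close();
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(out.toString()));
            return readName(reader);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both methods are short because each side works in the mode that suits it. Writing a producer that exposes the same data as an XMLStreamReader, without going through a serialized form as roundTrip does, would require an explicit state machine that remembers which event to report next. That is the difficulty discussed in the remainder of this section.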

Experience indeed shows that the most challenging part in creating an OMDataSource implementation is to write the getReader method. In the past, to avoid that difficulty some implementations simply built an Axiom tree and returned the XMLStreamReader provided by OMElement#getXMLStreamReader(). For example, older versions of ADB (Axis2 Data Binding) used the following code[6]:

Example 5.2. OMDataSource#getReader() implementation used in older ADB versions

public XMLStreamReader getReader() throws XMLStreamException {
    MTOMAwareOMBuilder mtomAwareOMBuilder = new MTOMAwareOMBuilder();
    serialize(mtomAwareOMBuilder);
    return mtomAwareOMBuilder.getOMElement().getXMLStreamReader();
}

The MTOMAwareOMBuilder class referenced by this code was a special implementation of XMLStreamWriter building an Axiom tree from the sequence of events sent to it. The code then used this Axiom tree to get the XMLStreamReader implementation. While this was a functionally correct implementation of the getReader method, it is not a good solution from a performance perspective and also contradicts some of the ideas on which Axiom is based, namely that the object model should only be built when necessary.

Starting with Axiom 1.2.14, there is a solution to avoid this anti-pattern. OMDataSource implementations that cannot provide a meaningful XMLStreamReader instance should extend org.apache.axiom.om.ds.AbstractPushOMDataSource and only implement the serialize method. OMSourcedElement will handle OMDataSource implementations extending this class differently when it comes to expansion: instead of using OMDataSource#getReader() to expand the element, it will use OMDataSource#serialize(XMLStreamWriter) (with a special XMLStreamWriter that builds the descendants of the OMSourcedElement). Note that this means that such an OMSourcedElement will be expanded instantly, and that deferred building of the descendants is not applicable. Nevertheless, this approach is significantly more efficient than using the OM-inside-OMDataSource anti-pattern.

Strong version

There is also a stronger version of the anti-pattern which consists in implementing the serialize method by building an Axiom tree and then serializing the tree to the XMLStreamWriter. Except for very special cases, there is no valid reason whatsoever to do this! To see why this is so, consider the two possible cases:

  1. The OMDataSource already implements the getReader method in a proper way, i.e. without building an intermediary Axiom tree. To properly implement serialize, it is then sufficient to pull the events from the reader returned by a call to getReader and copy them to the XMLStreamWriter. The easiest and most efficient way to do this is to extend org.apache.axiom.om.ds.AbstractPullOMDataSource (available in Axiom 1.2.14), which implements the serialize method in exactly that way. There is thus no need to build an intermediary object model in this case.

  2. The getReader method also uses an intermediary Axiom tree[7]. In that case it doesn't make sense to use an OMSourcedElement in the first place! At least it doesn't make sense if one assumes that in general the OMSourcedElement will either be serialized or its content accessed after being added to the tree. Indeed, in this case the Axiom tree will be built at least once (if not multiple times), so that the code might as well use a normal OMElement.

    This only leaves the very special case where the OMSourcedElement is in general neither accessed nor serialized, either because it will usually be somehow discarded or because the code uses OMDataSourceExt#getObject() to retrieve the raw data. Even in that case one can argue that in general it should not be too hard to implement at least the serialize method properly by transforming the raw or foreign data directly to StAX events written to the XMLStreamWriter.

QED
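
The event copying mentioned in the first case can be sketched with plain StAX. This is a simplified illustration and not the code used by AbstractPullOMDataSource: the class EventCopier is hypothetical, and the sketch assumes a document without namespaces, comments, processing instructions or CDATA sections.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper; assumes namespace-free input (see the lead-in).
class EventCopier {
    // Pulls events from the reader and pushes the equivalent calls to the writer.
    static void copy(XMLStreamReader reader, XMLStreamWriter writer) throws XMLStreamException {
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    writer.writeStartElement(reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        writer.writeAttribute(reader.getAttributeLocalName(i),
                                reader.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    writer.writeCharacters(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    writer.writeEndElement();
                    break;
                default:
                    // Other event types are ignored in this sketch.
                    break;
            }
        }
    }

    // Convenience wrapper that parses a string and returns the copied serialization.
    static String copyToString(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            copy(reader, writer);
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The loop holds at most one event at a time, so no intermediary object model is ever created. That is the property which makes a tree-building serialize method unnecessary once getReader is implemented properly.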



[2] The example shown is actually a simplified version of code that is part of Axis2 1.5.

[4] To do this, Axiom doesn't read the entire data source, but only reads up to the threshold.

[5] An exception is when code tries to access the children of the OMSourcedElement. In this case, the OMSourcedElement will be expanded, i.e. the data will be converted to the native Axiom object model.