Interface StAXDialect
-
public interface StAXDialect
Encapsulates the specific characteristics of a particular StAX implementation. In particular, an implementation of this interface is able to wrap (if necessary) the readers and writers produced by the StAX implementation to make them conform to the StAX specifications. This is called normalization.

In addition to bugs in particular StAX implementations and clear violations of the StAX specifications, the following ambiguities and gray areas in the specifications are also addressed by the dialect implementations:
- The specifications don't say whether a null value is allowed for the charset encoding parameter in the following methods:
  - XMLOutputFactory.createXMLEventWriter(java.io.OutputStream, String)
  - XMLOutputFactory.createXMLStreamWriter(java.io.OutputStream, String)
  - XMLStreamWriter.writeStartDocument(String, String)
  Some implementations accept null values, while others throw an exception. To make sure that code written to run with a normalized XMLOutputFactory remains portable, the dialect implementation normalizes the behavior of these methods so that they consistently throw an exception when called with a null encoding. Note that the type of exception to be thrown remains unspecified.
- The StAX specifications require that XMLStreamReader.getEncoding() returns the "input encoding if known or null if unknown". This requirement is not precise enough to guarantee consistent behavior across different implementations. In order to provide the consumer of the stream reader with complete and unambiguous information about the encoding of the underlying stream, the dialect implementations normalize the behavior of the XMLStreamReader.getEncoding() method such that it returns a non-null value if and only if the reader was created from a byte stream, in which case the return value is the effective charset encoding used by the parser to decode the byte stream. According to the XML specifications, this value is determined by one of the following means:
  - The encoding was provided when the stream reader was created, i.e. as a parameter to the XMLInputFactory.createXMLStreamReader(java.io.InputStream, String) method. This is referred to as "external encoding information" by the XML specifications.
  - The encoding was specified by the XML encoding declaration.
  - The encoding was detected using the first four bytes of the stream, as described in an appendix of the XML specifications.
- According to the table shown in the documentation of the XMLStreamReader class, calls to XMLStreamReader.getEncoding(), XMLStreamReader.getVersion(), XMLStreamReader.isStandalone(), XMLStreamReader.standaloneSet() and XMLStreamReader.getCharacterEncodingScheme() are only allowed in the XMLStreamConstants.START_DOCUMENT state. On the other hand, this requirement is not mentioned in the documentation of the individual methods and the majority of StAX implementations support calls to these methods in any state. However, to improve portability, the dialect implementations normalize these methods to throw an IllegalStateException if they are called in a state other than XMLStreamConstants.START_DOCUMENT.
- The documentation of XMLStreamReader.isCharacters() specifies that this method "returns true if the cursor points to a character data event". On the other hand, the documentation of XMLStreamReader states that "parsing events are defined as the XML Declaration, a DTD, start tag, character data, white space, end tag, comment, or processing instruction" and thus makes a clear distinction between character data events and white space events. This means that XMLStreamReader.isCharacters() should return true if and only if the current event is XMLStreamConstants.CHARACTERS. This is the case for most parsers, but some return true for XMLStreamConstants.SPACE events as well. Where necessary, the dialect implementations correct this behavior.
- It is not clear which methods other than XMLStreamWriter.setPrefix(String, String) and XMLStreamWriter.setDefaultNamespace(String) should update the namespace context maintained by the XMLStreamWriter when namespace repairing is disabled. In Woodstox and IBM's XL XP-J, only XMLStreamWriter.writeNamespace(String, String) and XMLStreamWriter.writeDefaultNamespace(String) do this. On the other hand, in BEA's reference implementation and in SJSXP, XMLStreamWriter.writeStartElement(String, String, String) also updates the namespace context (unless the given prefix is already bound to the namespace URI). The dialect implementations normalize the behavior such that only XMLStreamWriter.writeNamespace(String, String) and XMLStreamWriter.writeDefaultNamespace(String) update the namespace context.

  Note that the statement about Woodstox doesn't apply to very old versions. Originally, Woodstox' XMLStreamWriter.writeNamespace(String, String) and XMLStreamWriter.writeDefaultNamespace(String) implementations didn't update the namespace context (as mentioned in a post from 2006). This behavior was changed in 2007. Woodstox versions older than that are not supported.

  Also note that as a corollary, if namespace repairing is disabled, it is mandatory to make the necessary calls to XMLStreamWriter.writeNamespace(String, String) and XMLStreamWriter.writeDefaultNamespace(String) in order to produce XML that is well formed with respect to namespaces, and it should therefore not be necessary to call XMLStreamWriter.setPrefix(String, String) or XMLStreamWriter.setDefaultNamespace(String) explicitly.
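
The reader-side normalizations described above (the START_DOCUMENT state check and the stricter isCharacters() behavior) can be illustrated with a small wrapper built on the JDK's StreamReaderDelegate. This is only a sketch: the class name StrictStreamReader and the helper versionQueryFailsOnElement are hypothetical and are not the classes used by any actual dialect implementation, and the getEncoding() normalization for byte streams is not covered.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Hypothetical wrapper illustrating two of the normalizations listed above.
final class StrictStreamReader extends StreamReaderDelegate {
    StrictStreamReader(XMLStreamReader parent) {
        super(parent);
    }

    // Document-level queries are rejected outside the START_DOCUMENT state.
    private void checkStartDocument() {
        if (getEventType() != XMLStreamConstants.START_DOCUMENT) {
            throw new IllegalStateException("Only allowed in the START_DOCUMENT state");
        }
    }

    @Override
    public String getEncoding() {
        checkStartDocument();
        return super.getEncoding();
    }

    @Override
    public String getVersion() {
        checkStartDocument();
        return super.getVersion();
    }

    @Override
    public boolean isStandalone() {
        checkStartDocument();
        return super.isStandalone();
    }

    @Override
    public boolean standaloneSet() {
        checkStartDocument();
        return super.standaloneSet();
    }

    @Override
    public String getCharacterEncodingScheme() {
        checkStartDocument();
        return super.getCharacterEncodingScheme();
    }

    // True only for CHARACTERS events, never for SPACE events.
    @Override
    public boolean isCharacters() {
        return getEventType() == XMLStreamConstants.CHARACTERS;
    }

    // Demo helper: returns true if getVersion() is rejected once the cursor
    // has moved from START_DOCUMENT to the root element.
    static boolean versionQueryFailsOnElement(String xml) {
        try {
            XMLStreamReader reader = new StrictStreamReader(
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)));
            reader.getVersion(); // allowed: a new reader starts in START_DOCUMENT
            reader.next();       // advance to the root START_ELEMENT
            try {
                reader.getVersion();
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Code that only ever queries the encoding, version and standalone information before the first call to next() behaves the same with or without such a wrapper, which is the portable way to use these methods.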
Note that there are several ambiguities in the StAX specification which are not addressed by the different dialect implementations:
- It is not clear whether XMLStreamReader.getAttributePrefix(int) should return null or an empty string if the attribute doesn't have a prefix. Consistency with XMLStreamReader.getPrefix() would imply that it should return null, but some implementations return an empty string.
- There is a contradiction in the documentation of XMLStreamReader.next() about the exception that is thrown when this method is called after XMLStreamReader.hasNext() returns false. It can either be IllegalStateException or NoSuchElementException.

  Note that some implementations (including the reference implementation) throw an XMLStreamException in this case. This is considered a violation of the specifications because this exception should only be used "if there is an error processing the underlying XML source", which is not the case.
- An XML document may contain a namespace declaration such as xmlns="". In this case, it is not clear if XMLStreamReader.getNamespaceURI(int) should return null or an empty string.
- The documentation of XMLStreamWriter.setPrefix(String, String) and XMLStreamWriter.setDefaultNamespace(String) requires that the namespace "is bound in the scope of the current START_ELEMENT / END_ELEMENT pair". The meaning of this requirement is clear in the context of an element written using the writeStartElement and writeEndElement methods. On the other hand, the requirement is ambiguous in the context of an element written using writeEmptyElement and there are two competing interpretations:
  - Since the element is empty, it doesn't define a nested scope and the namespace should be bound in the scope of the enclosing element.
  - An invocation of one of the writeEmptyElement methods actually doesn't write a complete element because it can be followed by invocations of writeAttribute, writeNamespace or writeDefaultNamespace. The element is only completed by a call to a write method other than the aforementioned methods. An element written using writeEmptyElement therefore also defines a scope and the namespace should be bound in that scope.
  Consider, for example, the following sequence of calls: writeEmptyElement, writeAttribute, setPrefix, writeCharacters. In this case, it is not clear if the scope of the empty element should end at the call to writeAttribute or writeCharacters.

  Because of these ambiguities, the dialect implementations don't attempt to normalize the behavior of XMLStreamWriter.setPrefix(String, String) and XMLStreamWriter.setDefaultNamespace(String) in this particular context, and their usage in conjunction with writeEmptyElement should be avoided.
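
As an illustration of the recommendation above, the following sketch writes a prefixed empty element with namespace repairing disabled and declares the namespace with writeNamespace instead of calling setPrefix. It uses only the standard javax.xml.stream API; the class name EmptyElementExample, the element names and the urn:example namespace are made up for the example, and the exact serialization may differ slightly between StAX implementations.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class; names and namespace URI are illustrative only.
final class EmptyElementExample {
    static String write() {
        StringWriter out = new StringWriter();
        try {
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            // Namespace repairing is disabled, so declarations must be written explicitly.
            factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.FALSE);
            XMLStreamWriter writer = factory.createXMLStreamWriter(out);
            writer.writeStartElement("root");
            writer.writeEmptyElement("p", "item", "urn:example");
            // Declare the prefix on the empty element itself; no setPrefix call is made.
            writer.writeNamespace("p", "urn:example");
            writer.writeAttribute("id", "1");
            writer.writeEndElement(); // closes <root>
            writer.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return out.toString();
    }
}
```

Because the namespace declaration is written explicitly, the result is well formed with respect to namespaces regardless of how a given implementation scopes setPrefix for empty elements.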
-
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| XMLInputFactory | disallowDoctypeDecl(XMLInputFactory factory) | Configure the given factory to disallow DOCTYPE declarations. |
| XMLInputFactory | enableCDataReporting(XMLInputFactory factory) | Configure the given factory to enable reporting of CDATA sections by stream readers created from it. |
| String | getName() | Get the name of this dialect. |
| XMLInputFactory | makeThreadSafe(XMLInputFactory factory) | Make an XMLInputFactory object thread safe. |
| XMLOutputFactory | makeThreadSafe(XMLOutputFactory factory) | Deprecated. |
| XMLInputFactory | normalize(XMLInputFactory factory) | Normalize an XMLInputFactory. |
| XMLOutputFactory | normalize(XMLOutputFactory factory) | Deprecated. |
-
-
-
Method Detail
-
getName
String getName()
Get the name of this dialect.
- Returns: the name of the dialect
-
enableCDataReporting
XMLInputFactory enableCDataReporting(XMLInputFactory factory)
Configure the given factory to enable reporting of CDATA sections by stream readers created from it. The example in the documentation of the XMLStreamReader.next() method suggests that even if the parser is non-coalescing, CDATA sections should be reported as CHARACTERS events. Some implementations strictly follow the example, while for others it is sufficient to make the parser non-coalescing.
- Parameters: factory - the factory to configure; this may be an already normalized factory or a "raw" factory object
- Returns: the factory with CDATA reporting enabled; this may be the original factory instance or a wrapper
- Throws: UnsupportedOperationException - if reporting of CDATA sections is not supported
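
Since implementations differ in how they report CDATA sections, a consumer that uses a raw factory may want to accept both event types. The sketch below makes the parser non-coalescing through the standard IS_COALESCING property and then collects text from CHARACTERS, CDATA and SPACE events alike. The class name TextCollector is hypothetical, and whether a CDATA section actually arrives as a CDATA event depends on the StAX implementation in use.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper that does not depend on how CDATA sections are reported.
final class TextCollector {
    static String collectText(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Sufficient for CDATA reporting on some implementations only.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                    case XMLStreamConstants.SPACE:
                        text.append(reader.getText());
                        break;
                    default:
                        break;
                }
            }
            reader.close();
            return text.toString();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Code that needs to distinguish CDATA sections from ordinary character data cannot rely on this approach and should go through enableCDataReporting instead.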
-
disallowDoctypeDecl
XMLInputFactory disallowDoctypeDecl(XMLInputFactory factory)
Configure the given factory to disallow DOCTYPE declarations. The effect of this is similar to the http://apache.org/xml/features/disallow-doctype-decl feature in Xerces. The factory instance returned by this method MUST satisfy the following requirements:
- The factory or the reader implementation MUST throw an exception when requested to parse a document containing a DOCTYPE declaration. If the exception is not thrown by the factory, it MUST be thrown by the reader before the first XMLStreamConstants.START_ELEMENT event.
- The parser MUST NOT attempt to load the external DTD subset or any other external entity.
- The parser MUST protect itself against denial of service attacks based on deeply nested entity definitions present in the internal DTD subset. Ideally, the parser SHOULD NOT process the internal subset at all and throw an exception immediately when encountering the DOCTYPE declaration.

- Parameters: factory - the factory to configure; this may be an already normalized factory or a "raw" factory object
- Returns: the factory that disallows DOCTYPE declarations; this may be the original factory instance or a wrapper
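
The first requirement can be approximated with a reader wrapper that fails as soon as a DTD event is reported, which always happens before the first START_ELEMENT event. The following is a minimal sketch using the JDK's StreamReaderDelegate; the class name DoctypeRejectingReader and the helper rejects are hypothetical. On its own it does not satisfy the third requirement, because the underlying parser may already have processed the internal subset by the time the DTD event is reported.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Hypothetical wrapper; a sketch of the first requirement only.
final class DoctypeRejectingReader extends StreamReaderDelegate {
    DoctypeRejectingReader(XMLStreamReader parent) {
        super(parent);
    }

    @Override
    public int next() throws XMLStreamException {
        int event = super.next();
        if (event == XMLStreamConstants.DTD) {
            throw new XMLStreamException("DOCTYPE is not allowed");
        }
        return event;
    }

    // Demo helper: returns true if reading the document fails. Note that this
    // is also true for documents that are malformed for other reasons.
    static boolean rejects(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX property; keeps the parser from resolving external entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        try {
            XMLStreamReader reader = new DoctypeRejectingReader(
                    factory.createXMLStreamReader(new StringReader(xml)));
            while (reader.hasNext()) {
                reader.next();
            }
            return false;
        } catch (XMLStreamException e) {
            return true;
        }
    }
}
```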
-
makeThreadSafe
XMLInputFactory makeThreadSafe(XMLInputFactory factory)
Make an XMLInputFactory object thread safe. The implementation may do this either by configuring the factory or by creating a thread safe wrapper. The returned factory must be thread safe for all method calls that don't change the (visible) state of the factory. This means that thread safety is not required for XMLInputFactory.setEventAllocator(javax.xml.stream.util.XMLEventAllocator), XMLInputFactory.setProperty(String, Object), XMLInputFactory.setXMLReporter(javax.xml.stream.XMLReporter) and XMLInputFactory.setXMLResolver(javax.xml.stream.XMLResolver).
- Parameters: factory - the factory to make thread safe
- Returns: the thread safe factory
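
The wrapper approach mentioned above can be sketched as a class that serializes reader creation on a lock. This is an assumption about what such a wrapper might look like, not the implementation used by any dialect: the class SynchronizedReaderFactory is hypothetical and covers a single creation method, whereas a complete wrapper would extend XMLInputFactory and guard every method that doesn't change the state of the factory.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical wrapper that serializes access to a shared factory.
final class SynchronizedReaderFactory {
    private final XMLInputFactory factory;

    SynchronizedReaderFactory(XMLInputFactory factory) {
        this.factory = factory;
    }

    // Only one thread at a time may ask the underlying factory for a reader.
    // The returned reader itself is still meant to be used by a single thread.
    synchronized XMLStreamReader createXMLStreamReader(Reader source) throws XMLStreamException {
        return factory.createXMLStreamReader(source);
    }

    // Demo helper: returns the local name of the root element.
    static String rootName(String xml) {
        SynchronizedReaderFactory shared =
                new SynchronizedReaderFactory(XMLInputFactory.newInstance());
        try {
            XMLStreamReader reader = shared.createXMLStreamReader(new StringReader(xml));
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // skip events that precede the root element
            }
            return reader.getLocalName();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }
}
```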
-
makeThreadSafe
XMLOutputFactory makeThreadSafe(XMLOutputFactory factory)
Deprecated.
Make an XMLOutputFactory object thread safe. The implementation may do this either by configuring the factory or by creating a thread safe wrapper. The returned factory must be thread safe for all method calls that don't change the (visible) state, i.e. the properties, of the factory.
- Parameters: factory - the factory to make thread safe
- Returns: the thread safe factory
-
normalize
XMLInputFactory normalize(XMLInputFactory factory)
Normalize an XMLInputFactory. This will make sure that the readers created from the factory conform to the StAX specifications.
- Parameters: factory - the factory to normalize
- Returns: the normalized factory
-
normalize
XMLOutputFactory normalize(XMLOutputFactory factory)
Deprecated.
Normalize an XMLOutputFactory. This will make sure that the writers created from the factory conform to the StAX specifications.
- Parameters: factory - the factory to normalize
- Returns: the normalized factory
-
-