This section contains some thoughts on the handling of namespaces in the template language.
It is easy to produce a document containing the correct XML namespace declarations and usages, but it is harder to make it look the way the user would like it to look, even more so when you want streaming and fast output.
The XML should use sensible namespace prefixes, the same as those used in the template itself.
Some namespaces which are declared in the template may not be used in the target document, and hence we don't like to see their declarations in the target document either.
Namespace declarations shouldn't be repeated all over the place (in the output).
When the template language supports subroutines or variables containing XML, XML content can be inserted in the result document at a different location than where it appeared in the source document. It then ends up within a different context of declared namespaces, which requires emitting the necessary prefix mappings and possibly changing or undeclaring the default namespace. An additional difficulty is that we often won't know beforehand which namespace prefixes the subroutine/variable will actually use, so we'll have no choice but to emit prefix-mapping events for all namespaces that were active at the location where the subroutine/variable was declared.
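As a rough illustration of what "emitting the necessary prefix mappings" could mean, here is a minimal Python sketch that compares the namespace context at the declaration site with the one at the insertion site. The function name and the dictionary representation (prefix to URI, with "" as the default prefix and an empty URI meaning "undeclared") are assumptions made for this example, not part of any existing implementation.

```python
def mappings_to_emit(declared_ctx, insertion_ctx):
    """Return the prefix -> URI mappings to emit around an inserted fragment.

    declared_ctx:  mappings in scope where the variable/subroutine was declared.
    insertion_ctx: mappings in scope where its content is inserted.
    The default namespace uses the prefix ""; a URI of "" undeclares it.
    """
    needed = {}
    for prefix, uri in declared_ctx.items():
        # A prefix must be (re)declared if it is missing or bound differently.
        if prefix != "" and insertion_ctx.get(prefix) != uri:
            needed[prefix] = uri
    # The default namespace may have to be changed or undeclared (xmlns="").
    declared_default = declared_ctx.get("", "")
    if insertion_ctx.get("", "") != declared_default:
        needed[""] = declared_default
    return needed
```

Because we don't know which prefixes the fragment really uses, the sketch considers every mapping of the declaration context, exactly as described above.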
Namespace prefixes are sometimes used not only in element names and attribute names, but also in their content (for example an attribute value such as type="xs:string"). This is something generic XML software can't know about (except when using schemas, but even then you're limited to what's expressible in the schema). For this case it is important to keep the prefixes as-is (which we want anyway, but a user could really make it difficult by using the same prefix with different namespaces in the same document), and also not to remove prefix mappings, even if the prefixes are not used in element or attribute names.
Besides removing unneeded prefix mappings, it can sometimes be desirable to completely erase some namespaces, changing the elements and attributes in that namespace to non-namespaced ones. This is a specific case of the more general case of translating namespaces, but that's not a common need. Use case for erasing namespaces: you write the template with all elements in the XHTML namespace, but you want to send plain, non-namespaced HTML to the browser.
In the series other-cool-ideas-but-not-urgent:
the ability to say that one namespace is an alias for another one, as with xsl:namespace-alias. Useful to generate templates using the template language (the aliasing is only true for the generated output).
the ability to translate prefixes? E.g. in the template one uses 'xhtml:' as namespace prefix, but in the output you want it to be the default namespace. (One way to implement it might be to have a template instruction to generate a prefix-mapping, and use exclude-result-prefixes to drop the original mapping).
there are different variants of SAX, depending on the options enabled on the parser: namespace-aware or not, reporting xmlns attributes or not, ... When the SAX events don't come from an XML parser, sloppy SAX producers might report xmlns attributes which conflict with the start/endPrefixMapping events or with the 'qNames' of elements or attributes.
when parsing an XML document, start/endPrefixMapping events usually come right before the start element and right after the end element, but for the implementation of template languages, or SAX filters in general, it is not straightforward to maintain this rule. Think of the simple case of a filter which drops one particular element (but leaves its children).
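The dropped-element case can be sketched with Python's standard xml.sax module (used here only for illustration, not as the template engine's API). DropElementFilter, EventLog, collect_events and the urn:example:x namespace are hypothetical names made up for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import XMLFilterBase


class DropElementFilter(XMLFilterBase):
    """Drops one element, given as (namespace URI, local name), but keeps its children."""

    def __init__(self, parent, drop_name):
        super().__init__(parent)
        self._drop_name = drop_name

    def startElementNS(self, name, qname, attrs):
        if name != self._drop_name:
            super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        if name != self._drop_name:
            super().endElementNS(name, qname)

    # start/endPrefixMapping are not overridden, so they pass through untouched.


class EventLog(ContentHandler):
    """Records the namespace-related events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("startPrefixMapping", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("endPrefixMapping", prefix))

    def startElementNS(self, name, qname, attrs):
        self.events.append(("startElement", name[1]))

    def endElementNS(self, name, qname):
        self.events.append(("endElement", name[1]))


def collect_events(xml_text):
    """Parse xml_text through the filter and return the recorded events."""
    reader = DropElementFilter(xml.sax.make_parser(), ("urn:example:x", "wrapper"))
    reader.setFeature(feature_namespaces, True)
    log = EventLog()
    reader.setContentHandler(log)
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return log.events
```

For an input such as `<root><x:wrapper xmlns:x="urn:example:x"><a/><b/></x:wrapper></root>`, the mapping for x that belonged to the dropped wrapper element now starts before a and only ends after b, so it is no longer adjacent to the start and end of a single element.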
Often, most namespaces will be declared on the root element and the same namespace prefixes will be used throughout the document.
Users can be expected to help a bit, e.g. by supplying hints as to what namespaces can be removed.
What to do with the namespace declaration of the namespace of the template language itself? This depends on what we do with non-recognized elements and attributes. If we leave them in the output, then the namespace should also stay. Other approaches would be silently removing them or throwing a hard error on them, but we would go for the leave-them-in approach for now. So we can try to suppress this prefix mapping, but must re-introduce it whenever it turns out to be needed.
The compiled template might need to store the active namespace contexts, if it has a need for resolving namespaces (e.g. for XPath expressions?)
In some cases, repeating namespace declarations will be unavoidable. Suppose our template language has an instruction to include a part of the content of an external XML file, e.g. everything below some element path. A common use case is to include everything below html/body of an external HTML file. If the elements in this file are in the XHTML namespace, but that namespace is not yet declared, there is little else we can do than repeat the XHTML namespace declaration on every child element of html/body:
<p xmlns="...">foo bar</p>
<p xmlns="...">foo bar</p>
<p xmlns="...">foo bar</p>
Users can of course work around this: if they know they will be including such files, they can declare the namespace beforehand on a higher-up element.
Given that a parser might or might not report xmlns attributes, let's assume they won't be there, and to avoid any conflicts, actively remove them if they are present.
start/endPrefixMappings are probably easiest to handle if they are stored in the template as steps to execute, rather than e.g. associating them with start/end elements. This is the way CTemplate does it.
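To make "stored in the template as steps to execute" a bit more concrete, here is a minimal Python sketch in which a compiled template is just a list of step objects replayed against a SAX ContentHandler. The step classes and the execute function are hypothetical; they are not taken from CTemplate.

```python
from dataclasses import dataclass
from xml.sax.xmlreader import AttributesNSImpl


@dataclass(frozen=True)
class StartPrefixMapping:
    prefix: str
    uri: str


@dataclass(frozen=True)
class EndPrefixMapping:
    prefix: str


@dataclass(frozen=True)
class StartElement:
    uri: str
    local_name: str
    qname: str


@dataclass(frozen=True)
class EndElement:
    uri: str
    local_name: str
    qname: str


def execute(steps, handler):
    """Replay compiled steps as SAX events; prefix mappings are ordinary steps."""
    for step in steps:
        if isinstance(step, StartPrefixMapping):
            handler.startPrefixMapping(step.prefix, step.uri)
        elif isinstance(step, EndPrefixMapping):
            handler.endPrefixMapping(step.prefix)
        elif isinstance(step, StartElement):
            # Attributes are left out of this sketch for brevity.
            handler.startElementNS((step.uri, step.local_name), step.qname,
                                   AttributesNSImpl({}, {}))
        elif isinstance(step, EndElement):
            handler.endElementNS((step.uri, step.local_name), step.qname)
```

Since a mapping is not tied to a particular element step, an instruction that skips or repeats elements does not have to decide what happens to "their" mappings.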
we'll assume that all prefixes defined in the template should be present in the output, unless specifically suppressed.
For the exclude-result-prefixes feature (cf. the same concept in XSLT):
Do removal during template compilation or template execution? Doing it during template compilation saves some execution time, but doing it during execution means it also works on externally included content, and also allows specifying it at locations other than the root element.
Use #default as 'prefix' to list the default namespace
Use #all for all prefixes in scope at the place where exclude-result-prefixes is defined
If prefixes are removed but the corresponding namespace is still used, then the prefixes will be re-emitted on an as-needed basis.
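The points above could be sketched as follows in Python, doing the removal at execution time. resolve_excluded and PrefixSuppressor are hypothetical names, and the default namespace is represented by the empty prefix "", which is an assumption of this example.

```python
def resolve_excluded(tokens, in_scope):
    """Expand an exclude-result-prefixes value into a set of prefixes.

    tokens:   whitespace-separated list, may contain #default and #all.
    in_scope: prefix -> URI mappings in scope where the attribute appears.
    """
    excluded = set()
    for token in tokens.split():
        if token == "#all":
            excluded.update(in_scope)
        elif token == "#default":
            excluded.add("")
        else:
            excluded.add(token)
    return excluded


class PrefixSuppressor:
    """Holds back excluded prefix mappings and hands them out again on demand."""

    def __init__(self, excluded):
        self._excluded = set(excluded)
        self._suppressed = {}  # prefix -> URI of mappings held back so far

    def start_prefix_mapping(self, prefix, uri):
        """Return True if the mapping should be emitted right away."""
        if prefix in self._excluded:
            self._suppressed[prefix] = uri
            return False
        return True

    def mappings_needed_for(self, prefixes_used):
        """Return held-back mappings that an element or its attributes need after all."""
        return {p: self._suppressed[p] for p in prefixes_used if p in self._suppressed}
```

This sketch re-emits a suppressed mapping for every element that uses it and ignores the end of a mapping's scope; avoiding the resulting duplicate declarations is left to whatever component writes the xmlns attributes.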
For the exclude-namespaces feature (no corresponding XSLT concept):
completely removes namespaces: both prefix mappings and their usage in element names and attribute names
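A minimal sketch of such namespace erasure as a SAX filter, again using Python's standard xml.sax module for illustration only. EraseNamespaceFilter is a hypothetical name, and the filter assumes namespace-aware events (startElementNS/endElementNS).

```python
from xml.sax.saxutils import XMLFilterBase
from xml.sax.xmlreader import AttributesNSImpl


class EraseNamespaceFilter(XMLFilterBase):
    """Moves elements and attributes of the given namespaces to no namespace."""

    def __init__(self, parent, erased_uris):
        super().__init__(parent)
        self._erased = set(erased_uris)
        self._dropped = {}  # prefix -> stack of flags: was this mapping dropped?

    def _strip(self, name):
        uri, local = name
        return (None, local) if uri in self._erased else name

    def startPrefixMapping(self, prefix, uri):
        dropped = uri in self._erased
        self._dropped.setdefault(prefix, []).append(dropped)
        if not dropped:
            super().startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        # Mappings of one prefix nest strictly, so a per-prefix stack suffices.
        if not self._dropped[prefix].pop():
            super().endPrefixMapping(prefix)

    def startElementNS(self, name, qname, attrs):
        new_attrs, qnames = {}, {}
        for attr_name, value in attrs.items():
            stripped = self._strip(attr_name)
            # Caveat: an erased attribute can collide with an existing
            # non-namespaced attribute of the same local name; last one wins.
            new_attrs[stripped] = value
            qnames[stripped] = (attrs.getQNameByName(attr_name)
                                if stripped == attr_name else stripped[1])
        new_name = self._strip(name)
        new_qname = qname if new_name == name else new_name[1]
        super().startElementNS(new_name, new_qname, AttributesNSImpl(new_attrs, qnames))

    def endElementNS(self, name, qname):
        new_name = self._strip(name)
        super().endElementNS(new_name, qname if new_name == name else new_name[1])
```

For the XHTML use case one would construct it with the XHTML namespace URI, e.g. `EraseNamespaceFilter(parser, ["http://www.w3.org/1999/xhtml"])`. Prefixes used inside attribute values or text are not touched by this sketch.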
Adding of xmlns attributes: should be a job of the serializer:
assumes that declarations should be made for all prefixes reported via start/endPrefixMapping
needs to keep track of declared prefixes and not redeclare them if they are already active in scope with the same namespace URI
handle (= remove or at least check) xmlns attributes present in incoming SAX events.
handle the case of the same prefix with two different URIs on the same element: report an error or auto-adjust
check that the 'qNames' of elements/attributes use a declared prefix, and that the prefix corresponds to the namespace URI of the element/attribute.
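The serializer-side bookkeeping listed above might look roughly like the following Python sketch. NamespaceScope and its methods are hypothetical; the sketch reports an error for conflicting mappings on one element (rather than auto-adjusting), removes incoming xmlns attributes, and only checks element qNames.

```python
class NamespaceScope:
    """Tracks in-scope prefix mappings and decides which xmlns attributes to write."""

    def __init__(self):
        self._stack = [{}]   # in-scope prefix -> URI, one entry per open element
        self._pending = {}   # mappings reported since the last start element

    def _lookup(self, prefix):
        # An undeclared default namespace is equivalent to xmlns="".
        return self._stack[-1].get(prefix, "" if prefix == "" else None)

    def start_prefix_mapping(self, prefix, uri):
        if self._pending.get(prefix, uri) != uri:
            raise ValueError(f"prefix {prefix!r} bound to two URIs on one element")
        self._pending[prefix] = uri

    def start_element(self, attrs):
        """Return (declarations to write, attributes with xmlns attributes removed).

        attrs maps attribute qNames to values.
        """
        # Skip mappings that are already in scope with the same URI.
        declare = {p: u for p, u in self._pending.items() if self._lookup(p) != u}
        self._stack.append({**self._stack[-1], **self._pending})
        self._pending = {}
        kept = {q: v for q, v in attrs.items()
                if q != "xmlns" and not q.startswith("xmlns:")}
        return declare, kept

    def end_element(self):
        self._stack.pop()

    def element_qname_ok(self, qname, uri):
        """Check an element qName against its namespace URI (call after start_element)."""
        prefix = qname.split(":", 1)[0] if ":" in qname else ""
        return self._lookup(prefix) == (uri or "")
```

With this bookkeeping, a mapping that is reported again on a nested element with the same URI produces no second xmlns attribute.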
Namespace and prefix removal might be desirable at the level of the serializer too. Can we avoid duplicate implementation and execution work? In pipeline situations, pipeline components after the template engine will assume the namespace removal has already happened (prefix removal is less important). Another idea is to have attributes in a serializer-specific namespace to give this sort of hint to the serializer.