Expectations and Features
Runtime Expectations
- Standalone runtime system!
Developer Expectations
- Have access to a flexible XML pipelining architecture that converts XML
dynamically in streaming mode.
- This should allow hooking up reusable XML filter components
- as well as classic XSLT processing, or pure streaming variants (like
stx.sf.net)
- Be able to combine such transformer sequences into groups that are
themselves reusable converters
- For efficiency, this pipelining architecture should be optimized to avoid
unnecessary XML -> byte serialization/deserialization
- This should integrate nicely with the templating system:
- i.e. template should be able to pull in results from pipelines
- template outcome must be able to be handled in pipelines
- pipelines must be able to pull in info from templates
- Unsure: there must be some pipelining transformer component that can
interpret/perform the templating syntax/constructs
- A ready set of useful configurable transforming components...
- Declarative syntax for pipeline description/configuration
- Profiling and tracing of intermediate passed formats
- Generic context interface with hooks to embedding environment for
- side-lane object passing
- uri resolving
- generic service lookup?
- Distinct but related converter system
- fo to pdf, svg to png, gnumeric-xls, ...
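The expectation of hooking up reusable XML filter components without intermediate byte serialization can be illustrated with a minimal sketch based on the standard SAX `XMLFilterImpl` class. The `RenameElementFilter` and `FilterChainSketch` names are hypothetical and only serve as an illustration; this is not a proposal for the actual component API. Events flow from the parser through both filters as SAX calls and are serialized only once, at the end.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical reusable filter: renames one element, passes all other events through. */
class RenameElementFilter extends XMLFilterImpl {
    private final String from;
    private final String to;

    RenameElementFilter(String from, String to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String name = qName.equals(from) ? to : qName;
        super.startElement(uri, name, name, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String name = qName.equals(from) ? to : qName;
        super.endElement(uri, name, name);
    }
}

class FilterChainSketch {
    /** Parses the input, streams it through two filters, and serializes once at the end. */
    static String run(String xml) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();

        // Chain: parser -> (a becomes b) -> (b becomes c) -> serializer.
        RenameElementFilter first = new RenameElementFilter("a", "b");
        first.setParent(parser);
        RenameElementFilter second = new RenameElementFilter("b", "c");
        second.setParent(first);

        // An identity TransformerHandler acts as the final SAX-to-text serializer.
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        second.setContentHandler(serializer);
        second.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```

Calling `FilterChainSketch.run("<a>hi</a>")` renames the element twice while the content stays in event form between the two filters. A group of such filters could itself be wrapped as one filter, which is the kind of reusable grouping the list above asks for.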
(Re)Search and Inspiration out there
- Obviously the archetypical XML Pipelining System Apache Cocoon.
- Although we might find a way to only cater for transformers.
- Serializers might be relegated to a separate, more generic converting system
triggered by content negotiation.
- Generators might be handled by just having 'source' (i.e. URI) handlers that,
also based on content negotiation, know how to deliver the content not as bytes
but as some expected XML API model
Note: the Jon Postel axiom would suggest that any transformer component should
be able to accept input from byte streams anyway (and perform the parsing
itself).
- Consider side-effects and tee-like processing constructs for multi-output
generation.
- Consider, however, a more flexible architectural foundation for the
pipeline: i.e. support not only SAX, but also StAX, DOM, XStream?
- In the same section xslt 2.0 (via saxon) support should be considered.
- Java - xml ser/deser http://xstream.codehaus.org/
- This should also cater for sidewise xslt processing resources pulled in via
the uriresolver
- Don't overdo integration: having this kind of system available from the CLI,
separate from the request-response cycle, seems more than just a bit useful (one
big complaint about Cocoon is that the integration is too tight and that it
cannot be used outside that model)
- SmallX project (https://smallx.dev.java.net/)
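The tee-like processing construct mentioned above could, for a SAX-based pipeline, look roughly like the following sketch. `TeeFilter`, `ElementNameCollector` and `TeeSketch` are hypothetical names introduced only for illustration. For brevity the side branch receives only document, element and character events; prefix mappings, processing instructions and ignorable whitespace are not forwarded to it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical tee: copies the main SAX event stream to a side branch. */
class TeeFilter extends XMLFilterImpl {
    private final ContentHandler branch;

    TeeFilter(ContentHandler branch) {
        this.branch = branch;
    }

    @Override
    public void startDocument() throws SAXException {
        branch.startDocument();
        super.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        branch.endDocument();
        super.endDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        branch.startElement(uri, localName, qName, atts);
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        branch.endElement(uri, localName, qName);
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        branch.characters(ch, start, length);
        super.characters(ch, start, length);
    }
}

/** Stand-in for a real output: records the element names it sees. */
class ElementNameCollector extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        names.add(qName);
    }
}

class TeeSketch {
    /** Returns the element names seen by the main output and by the side branch. */
    static List<List<String>> run(String xml) throws Exception {
        ElementNameCollector mainOutput = new ElementNameCollector();
        ElementNameCollector sideOutput = new ElementNameCollector();
        TeeFilter tee = new TeeFilter(sideOutput);
        tee.setParent(SAXParserFactory.newInstance().newSAXParser().getXMLReader());
        tee.setContentHandler(mainOutput);
        tee.parse(new InputSource(new StringReader(xml)));
        return Arrays.asList(mainOutput.names, sideOutput.names);
    }
}
```

Both outputs see the same events in the same order from a single parse. How a tee should behave when one branch fails, or when the branches finish at different times, is left open here.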
Required Artefacts and implementations
PreBuild Trajectory
- Study and selection of modern XML APIs that should be supported in these
pipelines.
- Flexible modelling and structuring of pipeline-components. (Spring bean
based? intermixed with the URI mapping system like cocoon's sitemap?)
Guidelines
- Introduction to the advantages of the Two Step View
(http://martinfowler.com/eaaCatalog/twoStepView.html)
- Guidance on pipeline composition and structuring
Running code, Jars, APIs and docs for
- XML Pipelining system with
- flexible support for various foundational XML APIs
- available optimal conversion between them
- pluggable uri-resolving features and
- pluggable uri-retrieval component (both to allow integration with more than
just the rest of kauri)
- Various ready transformers
- Integration code with the overall system
- Documented extension points for custom xml processing components
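One possible shape for the pluggable uri-resolving feature is a chain of standard `javax.xml.transform.URIResolver` implementations, where the first resolver that recognizes a URI wins. The sketch below assumes this design; `ChainedUriResolver`, `InMemoryResolver` and the `mem:` scheme are hypothetical and exist only for the example.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical composite: asks each plugged-in resolver in turn. */
class ChainedUriResolver implements URIResolver {
    private final List<URIResolver> delegates;

    ChainedUriResolver(URIResolver... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        for (URIResolver delegate : delegates) {
            Source source = delegate.resolve(href, base);
            if (source != null) {
                return source;
            }
        }
        // Per the URIResolver contract, null lets the processor resolve the URI itself.
        return null;
    }
}

/** Hypothetical resolver serving documents from a map, e.g. for tests or an embedding host. */
class InMemoryResolver implements URIResolver {
    private final Map<String, String> documents;

    InMemoryResolver(Map<String, String> documents) {
        this.documents = documents;
    }

    @Override
    public Source resolve(String href, String base) {
        String xml = documents.get(href);
        return xml == null ? null : new StreamSource(new StringReader(xml), href);
    }
}
```

An embedding environment could plug its own resolver into the chain, and the result could be handed to `TransformerFactory.setURIResolver(...)` so that `xsl:include`, `xsl:import` and `document()` go through it. Retrieval (actually fetching bytes) is kept out of this sketch.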
Reading up
Google Results ("xml pipelines java")
- General Wikipedia entries
- http://en.wikipedia.org/wiki/XML_pipeline
- http://en.wikipedia.org/wiki/XProc
- http://en.wikipedia.org/wiki/XML_transformation_language
- W3C
- "Note" XML Pipeline definition language (http://www.w3.org/TR/xml-pipeline/)
- "Member submission (orbeon)" XML Pipeline Language
(http://www.w3.org/Submission/xpl/)
- "Working Draft" XML Pipeline Language (http://www.w3.org/TR/xproc/)
- Number of blogposts by Norman Walsh on it:
http://www.google.com/search?as_q=xproc&hl=en&ie=UTF-8&btnG=Google%2BSearch&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=norman.walsh.name
- his list of implementations: http://del.icio.us/ndw/xprocimpl
- Norman Walsh's sxpipe: http://norman.walsh.name/2004/06/20/sxpipe
- yax, an xproc implementation (http://yax.sourceforge.net/)
- infoset and xproc processing through smallx (https://smallx.dev.java.net/)
- Stylus Component / Graphical Editor for their own 'xml pipelines'
(http://www.stylusstudio.com/videos/pipeline2/pipeline2.html)
- Oracle XML Developer Kit
- http://www.oracle.com/technology/tech/xml/xdkhome.html
- http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14252/adx_j_pipeline.htm
- Guide to XML pipelines:
http://www.1060research-server-1.co.uk/docs/3.2.0/book/tutorial/doc_guide_xml_pipelines_main.html
- Managing Complex Document Generation through Pipelining - Jeni Tennison -
http://idealliance.org/proceedings/xtech05/papers/04-03-01/
- Serving XML - http://servingxml.sourceforge.net/
- Yahoo Pipes - http://pipes.yahoo.com/pipes/
- STnG -
http://www.idealliance.org/papers/extreme/proceedings//html/2003/Krupnikov01/EML2003Krupnikov01.html
Other
- Other technologies to possibly interact with:
- sax, dom
- xslt 2.0 (via saxon) support should be considered.
- Java - xml ser/deser http://xstream.codehaus.org/
- stax
- trax
- stx
- ...
Reviews
W3C Note: Pipeline Definition Language (2002)
- Old (2002), evolved into XProc?
- Some basic ideas, and suggestions
- Based on the XML Infoset, with support for XML Base and XML namespaces
- Flexible ordering based on dependencies...
- Context defining input(s), output(s), URI resolver, variables
- Allow XML syntax for pipeline declaration with embedded XML source (input)
to the process.
- Checking whether the 'result' is still up to date before processing (based on
timestamps of the inputs and of the pipe itself)
- Error-handling
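The up-to-date check from the list above can be sketched as a plain timestamp comparison. This assumes that the result, the pipeline definition and all inputs are files on disk; the `UpToDateCheck` class and its parameter names are hypothetical, and the 2002 Note itself does not prescribe this code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class UpToDateCheck {
    /**
     * Returns true when the result exists and is at least as new as the
     * pipeline definition and every input, so processing can be skipped.
     */
    static boolean isUpToDate(Path result, Path pipelineDefinition, List<Path> inputs)
            throws IOException {
        if (!Files.exists(result)) {
            return false;
        }
        long resultTime = Files.getLastModifiedTime(result).toMillis();
        if (Files.getLastModifiedTime(pipelineDefinition).toMillis() > resultTime) {
            return false;
        }
        for (Path input : inputs) {
            if (Files.getLastModifiedTime(input).toMillis() > resultTime) {
                return false;
            }
        }
        return true;
    }
}
```

Inputs that are not files (for example resources pulled in through a URI resolver) would need another source of modification times, which this sketch does not cover.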
W3C Member Submission: XML Pipeline Language XPL (2005)
- Defines XPL, a language for pipelined processing (XML syntax)
- Seems separate, and without much follow-up
- Submission from orbeon.com (who have an xforms implementation - LGPL
licensed)
- Some extra ideas
- again declaration of input and output
- quite some support for non-linear processing (decision making while in the
pipe)
W3C XProc: An XML Pipeline Language (2007)
- Active work (recent draft from November 2007). Looks like (and probably is
progress on) the 2002 Note.
- Quite a few projects/implementations refer to it
- Extra ideas:
- non-linear processing
- introduces 'step' either being compound or atomic
- has xpath support ?
- introduces and describes 'environment'
- mainly ports for input/output
- distinct to 'context' which is for variables, function library, baseuri,
contextnode for xpath ...
- split between processor and step context
- there are other sorts of 'context' in execution instances: language,
processor info (like xpath version)
- unspecified xpath 1.0/2.0
Picked up ideas/questions
- does xproc have a way to refer/include/embed externally defined (virtual)
sub-pipes?
- should be a standalone system; kauri modules could offer URLs that
reference xproc files and produce active transforming objects to be used in
filters.
Own API thoughts
- Have an enumeration of pipe-natures (dom, sax, trax, ....) for the ports
(input/output)
- And a system that can find listed matching pipe-fitters for those.
- By nature work in pull mode and fake a push by 'pulling' the start trigger
through the pipe onto the source.
- Use an operation name that is neutral between push and pull, like startFlow()
- A layer of reusable factories ('TemplatePipe') and a layer of stateful
instance objects ('ActivePipe')
- Note: the latter, a low-level runtime and stateful object, might well be
kept hidden/internal to the (Template)Pipe
- (in which case we could more clearly call those just Pipe)
- More ideas
- Pipe(Template)s(Registry)
- HashMap of (Template)Pipe objects
- void register("template-key", templatePipe)
- ActivePipe newActivePipe("template-key", environment) where environment
defines ports
- (Template)Pipe (takes composite pattern to allow for compound 'steps')
- getInFit("port-name") , getOutFit("name") both with a no-argument version
returning the default
- newActivePipe(environment) or if ActivePipe is kept hidden:
startFlow(environment)
- Fit (~port) has a 'nature' (sax, dom, ...)
- InFit.connect(OutFit of)
- OutFit.connect(InFit if)
- FitMatcher can work out mismatches
- ActivePipe
- has input-source & output-sink set through environment at creation time
- startFlow() produces the works from source > sink (how to know when
finished in case of tee?)
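To make the ideas above a little more concrete, here is one hedged way the names from these notes could map onto Java types. Only the names `TemplatePipe`, `ActivePipe`, `Fit`, `FitMatcher`, `startFlow`, `register` and `newActivePipe` come from the notes; `Nature`, `PipeRegistry`, `registerFitter`, `canConnect`, the single in/out fit per pipe, and the environment being a plain `Map` are assumptions made for this sketch.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Enumeration of pipe natures for the ports. */
enum Nature { SAX, DOM, STAX, TRAX }

/** A fit (~port): a named connection point that speaks one XML API nature. */
final class Fit {
    final String name;
    final Nature nature;

    Fit(String name, Nature nature) {
        this.name = name;
        this.nature = nature;
    }
}

/** Works out mismatches: knows which nature conversions (pipe-fitters) are available. */
final class FitMatcher {
    private final Map<Nature, Set<Nature>> fitters = new EnumMap<>(Nature.class);

    void registerFitter(Nature from, Nature to) {
        fitters.computeIfAbsent(from, k -> EnumSet.noneOf(Nature.class)).add(to);
    }

    /** True when the natures are equal or a registered pipe-fitter bridges them. */
    boolean canConnect(Fit out, Fit in) {
        return out.nature == in.nature
                || fitters.getOrDefault(out.nature, Collections.emptySet()).contains(in.nature);
    }
}

/** Stateful, single-use runtime object. */
interface ActivePipe {
    void startFlow();
}

/** Reusable factory; the environment (assumed to be a simple map here) defines the ports. */
interface TemplatePipe {
    Fit getInFit();
    Fit getOutFit();
    ActivePipe newActivePipe(Map<String, Object> environment);
}

/** Registry of template pipes by key. */
final class PipeRegistry {
    private final Map<String, TemplatePipe> templates = new HashMap<>();

    void register(String templateKey, TemplatePipe templatePipe) {
        templates.put(templateKey, templatePipe);
    }

    ActivePipe newActivePipe(String templateKey, Map<String, Object> environment) {
        TemplatePipe template = templates.get(templateKey);
        if (template == null) {
            throw new IllegalArgumentException("No pipe registered under '" + templateKey + "'");
        }
        return template.newActivePipe(environment);
    }
}
```

The sketch leaves out the composite aspect (compound steps), named fits beyond the default pair, and the open question of how startFlow() knows it has finished when a tee is involved.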
Possible implementations
(to build upon, integrate or join forces with)
- Apache Cocoon / Corona subproject
- http://svn.apache.org/viewvc/cocoon/whiteboard/corona/trunk/
- http://www.mail-archive.com/dev@cocoon.apache.org/msg56552.html
- Glassfish: XML Pipeline processor
- https://xproc.dev.java.net/
- yax
- http://yax.sourceforge.net/
- http://sourceforge.net/projects/yax/
- sxpipe
- https://sxpipe.dev.java.net/
- Norman Walsh's xproc implementation (currently unclear whether this is
distinct from any of the others mentioned here)
- http://norman.walsh.name/2007/projects/xproc