Expectations and Features

Runtime Expectations

  • Standalone runtime system!

Developer Expectations

  • Have access to a flexible xml-pipelining architecture that converts xml dynamically in streaming mode.
    • This should allow hooking up reusable XML filter components
    • as well as classic xslt processing, or pure streaming variants (like stx.sf.net)
  • Be able to combine such transformer sequences into groups that are themselves reusable convertors
  • For efficiency, this pipelining architecture should avoid unnecessary xml -> byte serialization/deserialization between steps
  • This should integrate nicely with the templating system:
    • i.e. templates should be able to pull in results from pipelines
    • template output must be processable by pipelines
    • pipelines must be able to pull in info from templates
    • Unsure: there must be some pipelining transformer component that can interpret/perform the templating syntax/constructs
  • A ready set of useful configurable transforming components...
  • Declarative syntax for pipeline description/configuration
  • Profiling and tracing of intermediate passed formats
  • Generic context interface with hooks to embedding environment for
    • side-lane object passing
    • uri resolving
    • generic service lookup?
  • Distinct but related convertor system
    • fo to pdf, svg to png, gnumeric-xls, ...
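
As an illustration of the first expectations above (reusable filter components combined with classic xslt, without byte serialization between steps), here is a minimal sketch built only on the standard JAXP/SAX APIs. The class names RenameItemFilter and SaxPipelineSketch and the item/entry element names are made up for this example. Note that a classic xslt step typically buffers its whole input before producing output, so only the filter and serializer steps are truly streaming here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical reusable filter: renames every "item" element to "entry". */
final class RenameItemFilter extends XMLFilterImpl {
    private static String rename(String name) {
        return "item".equals(name) ? "entry" : name;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, rename(local), rename(qName), atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, rename(local), rename(qName));
    }
}

/** Hypothetical three-step pipeline: parser -> filter -> xslt -> serializer. */
final class SaxPipelineSketch {
    static String run(String xml, String xslt) throws Exception {
        // Assumes the configured TransformerFactory supports SAX (the JDK default does).
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();

        // Last step: an identity handler that serializes SAX events to text.
        StringWriter out = new StringWriter();
        TransformerHandler serializer = stf.newTransformerHandler();
        serializer.setResult(new StreamResult(out));

        // Middle step: classic xslt, fed by SAX events and emitting SAX events.
        TransformerHandler xsltStep =
                stf.newTransformerHandler(new StreamSource(new StringReader(xslt)));
        xsltStep.setResult(new SAXResult(serializer));

        // First step: the parser feeds the filter, which feeds the xslt step.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        RenameItemFilter filter = new RenameItemFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());
        filter.setContentHandler(xsltStep);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```

Calling SaxPipelineSketch.run with a small list document and a stylesheet that matches "entry" shows that the stylesheet sees the renamed elements, i.e. the filter ran before the xslt step and the steps exchanged SAX events only.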

(Re)Search and Inspiration out there

  • Obviously the archetypical XML Pipelining System Apache Cocoon.
    • Although we might find a way to only cater for transformers.
    • Serializers might be relegated to a separate, more generic converting system triggered by content negotiation.
    • Generators might be handled by just having 'source' (i.e. URI) handlers that, also based on content negotiation, know how to deliver the content not as bytes but as some expected XML-API-Model
      Note: the Jon Postel axiom (be liberal in what you accept) would suggest that any transformer component should be able to accept input from byte-streams anyway (and perform the parsing itself).
    • Consider side-effects and tee-like processing constructs for multi-output generation.
  • However, consider a more flexible architectural foundation for the pipelines: i.e. support not only sax, but also stax, dom, xstream?
  • In the same vein, xslt 2.0 (via saxon) support should be considered.
  • Java - xml ser/deser http://xstream.codehaus.org/
  • This should also cater for sidewise xslt processing resources pulled in via the uriresolver
  • Don't overdo integration: having this kind of system available from the CLI, separate from the request-response cycle, seems more than just a bit useful (one big complaint about cocoon is that its integration is too tight, so it cannot be used outside that model)
  • SmallX project (https://smallx.dev.java.net/)
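
The tee-like construct for multi-output generation mentioned in the list above could look roughly like this on top of plain SAX. This is a sketch only: TeeHandler and TeeDemo are made-up names, and the handler forwards just the core content events (no locator, processing instructions or ignorable whitespace).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical tee: every SAX event it receives is passed on to two branches. */
final class TeeHandler extends DefaultHandler {
    private final ContentHandler first;
    private final ContentHandler second;

    TeeHandler(ContentHandler first, ContentHandler second) {
        this.first = first;
        this.second = second;
    }

    @Override public void startDocument() throws SAXException {
        first.startDocument(); second.startDocument();
    }
    @Override public void endDocument() throws SAXException {
        first.endDocument(); second.endDocument();
    }
    @Override public void startPrefixMapping(String prefix, String uri) throws SAXException {
        first.startPrefixMapping(prefix, uri); second.startPrefixMapping(prefix, uri);
    }
    @Override public void endPrefixMapping(String prefix) throws SAXException {
        first.endPrefixMapping(prefix); second.endPrefixMapping(prefix);
    }
    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        first.startElement(uri, local, qName, atts);
        second.startElement(uri, local, qName, atts);
    }
    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        first.endElement(uri, local, qName); second.endElement(uri, local, qName);
    }
    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        first.characters(ch, start, length); second.characters(ch, start, length);
    }
}

final class TeeDemo {
    /** Parses the input once and serializes it into two independent outputs. */
    static String[] run(String xml) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        StringWriter a = new StringWriter();
        StringWriter b = new StringWriter();
        TransformerHandler branchA = stf.newTransformerHandler();
        branchA.setResult(new StreamResult(a));
        TransformerHandler branchB = stf.newTransformerHandler();
        branchB.setResult(new StreamResult(b));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(new TeeHandler(branchA, branchB));
        reader.parse(new InputSource(new StringReader(xml)));
        return new String[] { a.toString(), b.toString() };
    }
}
```

In a real pipeline each branch would be a different downstream step (for example one serializer and one side-effecting logger) rather than two identical serializers.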

Required Artefacts and implementations

PreBuild Trajectory

  • Study and selection of modern XML APIs that should be supported in these pipelines.
  • Flexible modelling and structuring of pipeline-components. (Spring bean based? intermixed with the URI mapping system like cocoon's sitemap?)

Guidelines

  • Introduction into the advantages of the Two Step View (http://martinfowler.com/eaaCatalog/twoStepView.html)
  • Guidance on pipeline composition and structuring

Running code, Jars, APIs and docs for

  • XML Pipelining system with
    • flexible support for various foundational XML APIs
    • available optimal conversion between them
    • pluggable uri-resolving features and
    • pluggable uri-retrieval component (both pluggable so that integration is possible with more than just the rest of kauri)
  • Various ready transformers
  • Integration code with the overall system
  • Documented extension points for custom xml processing components
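
One way the pluggable uri-resolving requirement above might be approached is through the standard javax.xml.transform.URIResolver interface, dispatching on the URI scheme. The SchemeResolver class and the idea of registering one handler per scheme are assumptions for this sketch, not an existing kauri API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;

/** Hypothetical resolver that delegates to one handler per URI scheme. */
final class SchemeResolver implements URIResolver {
    private final Map<String, Function<String, Source>> handlers = new HashMap<>();

    /** Registers a handler that receives the part of the href after "scheme:". */
    void register(String scheme, Function<String, Source> handler) {
        handlers.put(scheme, handler);
    }

    @Override
    public Source resolve(String href, String base) {
        int colon = href.indexOf(':');
        if (colon <= 0) {
            return null; // relative reference: let the processor resolve it itself
        }
        Function<String, Source> handler = handlers.get(href.substring(0, colon));
        // Per the URIResolver contract, null means "not handled here".
        return handler == null ? null : handler.apply(href.substring(colon + 1));
    }
}
```

Such a resolver can be installed with TransformerFactory.setURIResolver (used for xsl:include and xsl:import) or Transformer.setURIResolver (used for document() calls), so the embedding environment decides how each scheme is retrieved.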

Reading up

Google Results ("xml pipelines java")

  • General Wikipedia entries 
    • http://en.wikipedia.org/wiki/XML_pipeline
    • http://en.wikipedia.org/wiki/XProc
    • http://en.wikipedia.org/wiki/XML_transformation_language
  • W3C
    • "Note" XML Pipeline definition language (http://www.w3.org/TR/xml-pipeline/)
    • "Member submission (orbeon)" XML Pipeline Language (http://www.w3.org/Submission/xpl/)
    • "Working Draft" XML Pipeline Language (http://www.w3.org/TR/xproc/)
      • Number of blogposts by Norman Walsh on it: http://www.google.com/search?as_q=xproc&hl=en&ie=UTF-8&btnG=Google%2BSearch&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=norman.walsh.name
      • his list of implementations: http://del.icio.us/ndw/xprocimpl
  • Norman Walsh's sxpipe: http://norman.walsh.name/2004/06/20/sxpipe
  • yax, an xproc implementation (http://yax.sourceforge.net/)
  • infoset and xproc processing through smallx (https://smallx.dev.java.net/)
  • Stylus Component / Graphical Editor for their own 'xml pipelines' (http://www.stylusstudio.com/videos/pipeline2/pipeline2.html)
  • Oracle XML Developer Kit
    • http://www.oracle.com/technology/tech/xml/xdkhome.html
    • http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14252/adx_j_pipeline.htm
  • Guide to XML pipelines: http://www.1060research-server-1.co.uk/docs/3.2.0/book/tutorial/doc_guide_xml_pipelines_main.html
  • Managing Complex Document Generation through Pipelining - Jeni Tennison - http://idealliance.org/proceedings/xtech05/papers/04-03-01/
  • Serving XML - http://servingxml.sourceforge.net/
  • Yahoo Pipes - http://pipes.yahoo.com/pipes/
  • STnG - http://www.idealliance.org/papers/extreme/proceedings//html/2003/Krupnikov01/EML2003Krupnikov01.html

Other

  • Other technologies to possibly interact with:
    • sax, dom
    • xslt 2.0 (via saxon) support should be considered.
    • Java - xml ser/deser http://xstream.codehaus.org/
    • stax
    • trax
    • stx
    • ...

Reviews

W3C Note: Pipeline Definition Language (2002)

  • Old (2002), evolved into XProc?
  • Some basic ideas, and suggestions
    • Based on xmlinfoset, support for xml base and xml namespaces
    • Flexible ordering based on dependencies...
    • Context defining input(s), output(s), URI resolver, variables
    • Allow XML syntax for pipeline declaration with embedded XML source (input) to the process.
    • Checking whether the 'result' is still up to date before processing (based on the timestamps of the inputs and of the pipe itself)
    • Error-handling
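
The up-to-date check from the list above boils down to a timestamp comparison. A minimal sketch, assuming all timestamps are epoch milliseconds and that a result exactly as old as an input still counts as fresh (Freshness and isUpToDate are made-up names):

```java
/** Hypothetical helper for skipping a pipeline run when its result is still fresh. */
final class Freshness {
    /**
     * Returns true when the result is at least as recent as the pipeline
     * definition and every input, so processing can be skipped.
     */
    static boolean isUpToDate(long resultMillis, long pipelineMillis, long... inputMillis) {
        if (resultMillis < pipelineMillis) {
            return false; // the pipe itself changed after the result was produced
        }
        for (long input : inputMillis) {
            if (resultMillis < input) {
                return false; // at least one input is newer than the result
            }
        }
        return true;
    }
}
```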

W3C Member Submission: XML Pipeline Language XPL (2005)

  • Defines XPL, a language for pipelined processing (XML syntax)
  • Seems separate, and without much follow-up
  • Submission from orbeon.com (who have an xforms implementation - LGPL licensed)
  • Some extra ideas
    • again declaration of input and output
    • quite some support for non-linear processing (decision making while in the pipe)

W3C XProc: An XML Pipeline Language (2007)

  • Active work (recent draft from November 2007). Looks like (and probably is progress on) the 2002 Note.
  • Quite a few projects/implementations referring to it
  • Extra ideas:
    • non-linear processing
    • introduces 'step' either being compound or atomic
    • has xpath support ?
    • introduces and describes 'environment' 
      • mainly ports for input/output
      • distinct from 'context', which is for variables, function library, baseuri, contextnode for xpath ...
        • split between processor and step context
      • there are other sorts of 'context' in execution instances: language, processor info (like xpath version)
    • xpath version (1.0 or 2.0) left unspecified

Picked up ideas/questions

  • does xproc have a way to refer to/include/embed externally defined (virtual) sub-pipes?
  • should be a standalone system; kauri modules could offer URLs that reference xproc files and produce active transforming objects to be used in filters.

Own API thoughts

  • Have an enumeration of pipe-natures (dom, sax, trax, ....) for the ports (input/output)
  • And a system that can find listed matching pipe-fitters for those.
  • By nature work in pull mode and fake a push by 'pulling' the start-trigger through the pipe onto the source.
  • Pick an operation name that is neutral with respect to push vs. pull, like startFlow()
  • Layer of reusable factories 'TemplatePipe' and layer of stateful instance objects 'ActivePipe'
    • Note: the latter, low-level runtime & stateful object might well be kept hidden/internal to the (Template)Pipe
    • (in which case we could more clearly call those just Pipe)
  • More ideas
    • Pipe(Template)s(Registry)
      • HashMap of (Template)Pipe objects
      • void register("template-key", templatePipe)
      • ActivePipe newActivePipe("template-key", environment) where environment defines ports 
    • (Template)Pipe (takes composite pattern to allow for compound 'steps')
      • getInFit("port-name") , getOutFit("name") both with a no-argument version returning the default
      • newActivePipe(environment) or if ActivePipe is kept hidden: startFlow(environment)
    • Fit (~port) has a 'nature' (sax, dom, ...)
      • InFit.connect(OutFit of)
      • OutFit.connect(InFit if)
    • FitMatcher can work out mismatches
    • ActivePipe 
      • has input-source & output-sink set through environment at creation time
      • startFlow() runs the whole flow from source to sink (how to know when finished in case of tee?)
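
To make the ideas above a bit more concrete, here is a rough Java sketch of the registry / TemplatePipe / ActivePipe split. Everything in it is hypothetical: the names follow the bullets above, plain strings stand in for real XML ports, and there is no FitMatcher or compound pipe yet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** The 'nature' of a port; the list of values is an assumption. */
enum Nature { SAX, DOM, TRAX, STAX }

/** A Fit (~port) carries only a name and a nature in this sketch. */
final class Fit {
    final String name;
    final Nature nature;

    Fit(String name, Nature nature) {
        this.name = name;
        this.nature = nature;
    }
}

/** Binds the input source and output sink; strings stand in for XML here. */
final class Environment {
    final Supplier<String> source;
    final Consumer<String> sink;

    Environment(Supplier<String> source, Consumer<String> sink) {
        this.source = source;
        this.sink = sink;
    }
}

/** Stateful, single-use runtime object. */
interface ActivePipe {
    void startFlow();
}

/** Reusable, stateless factory for ActivePipe instances. */
interface TemplatePipe {
    Fit getInFit();
    Fit getOutFit();
    ActivePipe newActivePipe(Environment environment);
}

/** Simplest possible atomic pipe: one function applied between source and sink. */
final class SimplePipe implements TemplatePipe {
    private final Nature nature;
    private final UnaryOperator<String> step;

    SimplePipe(Nature nature, UnaryOperator<String> step) {
        this.nature = nature;
        this.step = step;
    }

    @Override public Fit getInFit() { return new Fit("in", nature); }
    @Override public Fit getOutFit() { return new Fit("out", nature); }

    @Override
    public ActivePipe newActivePipe(Environment env) {
        // Pull mode: startFlow() pulls from the source, then hands the result to the sink.
        return () -> env.sink.accept(step.apply(env.source.get()));
    }
}

/** Registry of template pipes, keyed by name. */
final class PipeRegistry {
    private final Map<String, TemplatePipe> templates = new HashMap<>();

    void register(String key, TemplatePipe template) {
        templates.put(key, template);
    }

    ActivePipe newActivePipe(String key, Environment environment) {
        TemplatePipe template = templates.get(key);
        if (template == null) {
            throw new IllegalArgumentException("No pipe registered for key: " + key);
        }
        return template.newActivePipe(environment);
    }
}
```

Usage would then be along the lines of registering a SimplePipe under a key, asking the registry for an ActivePipe bound to a source and sink, and calling startFlow() on it. Whether ActivePipe stays public or is hidden behind a startFlow(environment) on the template is left open, as in the notes above.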

Possible implementations

(to build upon, integrate or join forces with)

  • Apache Cocoon / Corona subproject
    • http://svn.apache.org/viewvc/cocoon/whiteboard/corona/trunk/
    • http://www.mail-archive.com/dev@cocoon.apache.org/msg56552.html
  • Glassfish: XML Pipeline processor
    • https://xproc.dev.java.net/
  • yax
    • http://yax.sourceforge.net/
    • http://sourceforge.net/projects/yax/
  • sxpipe
    • https://sxpipe.dev.java.net/
  • Norman Walsh's xproc implementation (currently unclear if this is distinct from any of the others mentioned here)
    • http://norman.walsh.name/2007/projects/xproc