Crane's UBL 2.0 to UN Layout Key PDF and HTML stylesheets

G. Ken Holman

Crane Softwrights Ltd.

$Date: 2011/02/03 16:00:12 $(UTC)


Table of Contents

1. Introduction
1.1. Package updates
1.2. Multiple-language support
2. Supported documents
2.1. UBL Invoice to UN Layout Key form 380
3. Installation
4. Invocation
4.1. Stylesheet association in web browsers
4.2. Crane's supplied command-line invocation
4.3. Invocation parameters
4.4. Customized invocation for PDF
5. Accessing a UBL instance embedded in the PDF XMP metadata
6. Troubleshooting problems
7. Behind the scenes
7.1. Acknowledgement
7.2. Release notes
Bibliography

1. Introduction

The Universal Business Language 2.0 [UBL 2.0] from the Organization for the Advancement of Structured Information Standards [OASIS] describes the document models of a number of different business documents.

The United Nations Layout Key for Trade Documents [UNLK] describes the paper representation of a number of different business documents.

Crane Softwrights Ltd.'s UBL 2.0 to UN Layout Key PDF and HTML stylesheets [Crane Resources] environment is used to create a PDF file or an HTML page from a UBL instance.

When completely installed, this environment works under Windows, Mac and Linux by using the command prompt or terminal window. For browser-based HTML presentation the stylesheets are complete. For command line-based PDF and HTML transformation the only requirement is that a version of Java is available on your machine. You can create your own command line-invocation environment provided you are able to recreate the Java invocation with the libraries and the arguments given. The only command line-invocation files supplied are for Windows, and invocation arguments are described so that these can be easily adapted to a shell script for Mac and Linux.

1.1. Package updates

Please check Crane's download directory [Crane Resources] for any updates to this library and Crane's directory at Visual Programming [Ibex] for updates to the rendering engine. Please report any errors or feedback to info@CraneSoftwrights.com as this would be very much appreciated. Based on feedback and further developments, new packages will be posted periodically.

1.2. Multiple-language support

The UN Layout Key presentation supports English and at most one other language in the boxes drawn on the page.

If you need to support a language not already supported, you are invited to contribute a translation of the strings with explicit permission for Crane to submit the translation to the OASIS Human Interface Subcommittee (HISC) for publication in UBL documents. We would then be pleased to include the support for this language for anyone to freely download.

The strings are found in two files for each document type, and you only need to submit an edited version of one of these files, not both. For example, for the Invoice document type then:

Note that any strings in those files labeled with a "_2" suffix are labels used on the continuation page, rather than on the first page. Typically these should be identical to the strings used for the labels identified without the suffix, and the translated strings themselves would not include the suffix as part of the translation.

Please forward the translated file and your permission for us to submit it to OASIS to info@CraneSoftwrights.com and we will acknowledge its receipt as soon as possible.

2. Supported documents

At this time the only supported UBL document type is Invoice for use with the UNLK 380 form.

If you are interested in helping the cause of the UBL Human Interface Subcommittee [HISC] in developing more output specifications (from which these stylesheets are derived), please contact us at info@CraneSoftwrights.com and we can tell you what information the subcommittee needs to progress the work.

As new documents are supported, new subsections will be added here.

2.1. UBL Invoice to UN Layout Key form 380

Stylesheet names to invoke, listed with the supplemental stylesheets that need to be available in the same directory as the invoked stylesheet:

  • XML to PDF

    • pro forma invocation name: xslt/CraneUBL2UN380Invoice-{language{-us}}.xsl

      • e.g. Portuguese to A4 page size: xslt/CraneUBL2UN380Invoice-PT.xsl

      • e.g. French to US-letter page size: xslt/CraneUBL2UN380Invoice-FR-us.xsl

    • supplemental Ibex files when using the Ibex Signature edition:

      • pro forma stylesheet manifest file: xslt/support/CraneUBL2UN380Invoice-{language{-us}}.xml

      • Ibex configuration file: xslt/support/ibexconfig.xml

        • this is delivered configured for a Windows environment and would need to be changed for font information in another environment

    • supplemental stylesheet fragments automatically incorporated:

      • xslt/CraneUBL2UN380Invoice-EN.xsl

      • xslt/support/CraneUBL2UN380Invoice-common.xsl

      • xslt/support/CraneUN380Invoice-{language}.xsl

      • xslt/support/CraneUN380Invoice-EN.xsl

      • xslt/support/CraneUBL2UNLK-print.xsl

      • xslt/support/CraneUBL2UNLK-common.xsl

  • XML to HTML

    • pro forma invocation name: xslt/CraneUBL2UN380Invoice-html-{language}.xsl

      • e.g. Icelandic: xslt/CraneUBL2UN380Invoice-html-IS.xsl

      • e.g. Norwegian: xslt/CraneUBL2UN380Invoice-html-NO.xsl

    • supplemental stylesheet fragments automatically incorporated:

      • xslt/CraneUBL2UN380Invoice-html-EN.xsl

      • xslt/support/CraneUBL2UN380Invoice-common.xsl

      • xslt/support/CraneUN380Invoice-{language}.xsl

      • xslt/support/CraneUN380Invoice-html-EN.xsl

      • xslt/support/CraneUBL2UNLK-common.xsl
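
The naming patterns listed above can be sketched in a few lines of Python. This is only an illustration of the file lists in this section, not part of the package; the function name invoice_stylesheets and its arguments are hypothetical.

```python
from typing import List


def invoice_stylesheets(language: str, output: str = "pdf",
                        us_letter: bool = False) -> List[str]:
    """Return the invoked stylesheet first, then its supplemental fragments.

    Hypothetical helper: the file names follow the patterns listed in this
    section for the UBL Invoice and UN Layout Key form 380.
    """
    if output == "pdf":
        suffix = "-us" if us_letter else ""
        files = [
            f"xslt/CraneUBL2UN380Invoice-{language}{suffix}.xsl",
            "xslt/CraneUBL2UN380Invoice-EN.xsl",
            "xslt/support/CraneUBL2UN380Invoice-common.xsl",
            f"xslt/support/CraneUN380Invoice-{language}.xsl",
            "xslt/support/CraneUN380Invoice-EN.xsl",
            "xslt/support/CraneUBL2UNLK-print.xsl",
            "xslt/support/CraneUBL2UNLK-common.xsl",
        ]
    elif output == "html":
        if us_letter:
            raise ValueError("the -us suffix applies only to PDF output")
        files = [
            f"xslt/CraneUBL2UN380Invoice-html-{language}.xsl",
            "xslt/CraneUBL2UN380Invoice-html-EN.xsl",
            "xslt/support/CraneUBL2UN380Invoice-common.xsl",
            f"xslt/support/CraneUN380Invoice-{language}.xsl",
            "xslt/support/CraneUN380Invoice-html-EN.xsl",
            "xslt/support/CraneUBL2UNLK-common.xsl",
        ]
    else:
        raise ValueError("output must be 'pdf' or 'html'")
    # For English the language-specific and EN entries coincide, so drop
    # duplicates while keeping the original order.
    return list(dict.fromkeys(files))
```

For example, invoice_stylesheets("FR", "pdf", us_letter=True) begins with xslt/CraneUBL2UN380Invoice-FR-us.xsl, matching the French US-letter example above.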

Example command-line invocations (see Section 3, “Installation” and Section 4, “Invocation” for details):

  • XML to PDF

    • e.g. Spanish to A4 page size: xml2pdf invoice.xml invoice-a4.pdf 380 Invoice ES {parameters}

    • e.g. English to US-letter: xml2pdf invoice.xml invoice-us.pdf 380 Invoice EN-us {parameters}

  • XML to HTML

    • e.g. Italian: xml2html invoice.xml invoice.html 380 Invoice IT {parameters}

At this time the languages supported for the UBL invoice are (in alphabetical order of language code) as follows; the language code in the first column is important for invocation:

Table 1. Languages, sample 2-page renderings, HTML rendering, translation adaptation stylesheet

Code  Language    A4 Invoice     US-letter Invoice     HTML Invoice  Box label translations
BiH   Bosnian     A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-BiH.xsl
CZ    Czech       A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-CZ.xsl
DA    Danish      A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-DA.xsl
DE    German      A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-DE.xsl
EN    English     A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-EN.xsl
ES    Spanish     A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-ES.xsl
FI    Finnish     A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-FI.xsl
FR    French      A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-FR.xsl
IS    Icelandic   A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-IS.xsl
IT    Italian     A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-IT.xsl
NL    Dutch       A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-NL.xsl
NO    Norwegian   A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-NO.xsl
PT    Portuguese  A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-PT.xsl
SK    Slovak      A4 sample PDF  US-letter sample PDF  Sample HTML   CraneUN380Invoice-SK.xsl

For demonstration purposes, the UBL-Invoice-2.0-Example-mod.xml instance modified from the UBL 2.0 sample directory is rendered in the above links in all of the available languages. The UBL-Invoice-2.0-Example-mod2.xml instance is identical except for the addition of a stylesheet association processing instruction that automatically invokes the English HTML stylesheet.

3. Installation

For automatic browser-based presentation in HTML of an XML document, it is necessary to use an XSLT-aware web browser that supports stylesheet association [Assoc] (see Section 4.1, “Stylesheet association in web browsers”).

For command-line invocation to transform a UBL instance to HTML or print it to PDF, one or two pieces of software owned by other companies and made freely available on their own websites must be downloaded:

  • needed for both HTML and PDF results:

    • saxon9he.jar from http://saxon.sf.net/ [Saxon] (Saxon-HE - home edition)

      • as of this writing, the library has been tested with version 9.3.0.4

      • even though the XSLT files are labeled internally for XSLT 2.0, they have been designed to run with conforming XSLT 1.0 processors if that is what you have

  • also needed for PDF results:

    • ibex-crane-ss-A.B.C.D.jar [Ibex] from http://www.xmlpdf.com/ibex-downloads-signed.html

      • this file as downloaded needs to be renamed to ibex-crane-ss.jar to remove the version number

      • as of this writing, the library has been tested with version 4.7.2.7 (there may be problems with earlier versions of this JAR file)

      • please be sure to download Crane's version and not that of any other supplier that may be listed; any subsequent version made available for Crane should be acceptable to use, so check periodically for updates

The sample invocations assume these two referenced jar files are placed in the base installation directory and renamed as directed.
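
Before running the sample invocations it can be handy to confirm that the renamed JAR files are where they are expected. The following is a minimal sketch of such a check; the function name missing_jars is hypothetical, and the only assumption taken from this section is that both files sit in the base installation directory under the names given above.

```python
from pathlib import Path
from typing import List


def missing_jars(base_dir: str, need_pdf: bool = True) -> List[str]:
    """List the required JAR files not found in the base installation directory."""
    required = ["saxon9he.jar"]              # needed for both HTML and PDF
    if need_pdf:
        required.append("ibex-crane-ss.jar")  # renamed from ibex-crane-ss-A.B.C.D.jar
    base = Path(base_dir)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    for name in missing_jars("."):
        print(f"missing: {name}")
```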

Only those stylesheets referenced by a given language need be installed in the xslt/ and xslt/support/ directories, as it is not necessary to have the entire set of distribution files installed. There is no execution penalty for having the entire directories installed.

4. Invocation

For command-line invocation, both the HTML and PDF stylesheets can be invoked by any conformant processor that supports XSLT 1.0[XSLT 1.0] or XSLT 2.0[XSLT 2.0].

For command-line invocation, the XSL-FO intermediate print output from XSLT can be processed into PDF by any conformant processor that supports XSL-FO 1.0[XSL 1.0] or XSL-FO 1.1[XSL 1.1].

4.1. Stylesheet association in web browsers

For browser-based presentation in HTML of an XML document, modify your XML document to include a stylesheet association directive[Assoc] used to invoke the XSLT 1.0[XSLT 1.0] or XSLT 2.0[XSLT 2.0] transformation process automatically. The example from UBL-Invoice-2.0-Example-mod2.xml is seen as the second line beginning with "xml-stylesheet" in this fragment:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="CraneUBL2UN380Invoice-html-PT.xsl"?>
<Invoice xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
 xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
 xmlns:stat="urn:oasis:names:specification:ubl:schema:xsd:DocumentStatusCode-1.0"
 xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
 xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
 xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2" 
 xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2">
...
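
If many instances need the association directive, the edit can be scripted. The sketch below is one possible way to do it in Python and is not part of the package: the function name add_stylesheet_pi is hypothetical, and it assumes the XML declaration, when present, is the very first thing in the file.

```python
import re
from xml.sax.saxutils import quoteattr


def add_stylesheet_pi(xml_text: str, href: str) -> str:
    """Insert an xml-stylesheet processing instruction ahead of the document element."""
    pi = f'<?xml-stylesheet type="text/xsl" href={quoteattr(href)}?>'
    # The XML declaration, if any, must stay first; "\s" after "xml" keeps
    # this from matching an existing <?xml-stylesheet ...?> instruction.
    declaration = re.match(r"<\?xml\s.*?\?>", xml_text, re.DOTALL)
    if declaration:
        end = declaration.end()
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text
```

The href value is resolved by the browser relative to the XML document, so the named stylesheet and its supplemental fragments need to be reachable from there.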

4.2. Crane's supplied command-line invocation

For illustration, the directory contains the batch file xml2pdf.bat that takes five pieces of information to produce output, as in the following pro-forma example to produce an English invoice in US-letter paper size PDF:

  xml2pdf input-file.xml  output-file.pdf  380  Invoice  EN-us  {parameters}

For illustration, the directory also contains the batch file xml2html.bat that takes five pieces of information to produce output, as in the following pro-forma example to produce a Danish invoice in HTML:

  xml2html input-file.xml  output-file.html  380  Invoice  DA  {parameters}

The arguments, in order, are as follows:

  • name of the input UBL file in XML

  • name of the output PDF or HTML file

  • number of the UN Layout Key form

  • name of the UBL document type

  • indication of language and format

    • the language prefix is the language code supported for the document type

    • the format suffix is omitted for HTML and A4 paper size, or is the suffix "-us" for US-letter paper size
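
The fifth argument can be taken apart as described above. The following Python sketch shows the idea; the function name split_language_format is hypothetical, and the set of language codes is copied from Table 1.

```python
from typing import Tuple

# Language codes from Table 1 (case matters, e.g. "BiH").
SUPPORTED_LANGUAGES = {
    "BiH", "CZ", "DA", "DE", "EN", "ES", "FI",
    "FR", "IS", "IT", "NL", "NO", "PT", "SK",
}


def split_language_format(argument: str) -> Tuple[str, bool]:
    """Split e.g. "ES-us" into ("ES", True); the flag means US-letter paper."""
    us_letter = argument.endswith("-us")
    language = argument[:-3] if us_letter else argument
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language}")
    return language, us_letter
```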

During execution of the Ibex XSL-FO engine, various "info:" and "name from attribute" messages report the progress of the PDF production. There are problems when messages with "severe" or "error" are seen. A "warning" is typically related to font issues, but may be triggered by other non-fatal issues.
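
When the output of a run is captured to a file, the messages that matter can be picked out mechanically. This is a minimal sketch only: the function name problem_lines is hypothetical, and because the exact layout of Ibex messages is not described here, it simply assumes that the words "severe" or "error" appear somewhere in the line, in any letter case.

```python
from typing import Iterable, List


def problem_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the lines that mention "severe" or "error" (case-insensitive)."""
    markers = ("severe", "error")
    return [
        line for line in log_lines
        if any(marker in line.lower() for marker in markers)
    ]
```

Lines mentioning only "warning" are deliberately not returned, since those are typically font-related and non-fatal.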

The following invocation will create the file example.html in HTML from the invoice sample:

  xml2html.bat UBL-Invoice-2.0-Example-mod.xml example.html 380 Invoice DE {parameters}

The following invocation will create the file example.pdf in A4 paper size from the invoice sample:

  xml2pdf.bat UBL-Invoice-2.0-Example-mod.xml example.pdf 380 Invoice IT {parameters}

The following invocation will create the file example-us.pdf in US-letter paper size in Spanish from the invoice sample:

  xml2pdf.bat UBL-Invoice-2.0-Example-mod.xml example-us.pdf 380 Invoice ES-us {parameters}

For regression testing, invoking in Windows the batch file runall.bat without arguments will create all of the PDF and HTML outputs possible from one of the samples included.

4.3. Invocation parameters

At the time that the XSLT stylesheet is invoked, various parameters can be passed to the stylesheet to influence the content of the resulting document.

4.3.1. Invocation parameters for the HTML stylesheets

The XSLT stylesheets for HTML generation support the following invocation parameters:

  • "alternate-currency=any value"

    • format currency values as "1 234,56" rather than "1,234.56"

  • "no-currency-formatting=any value"

    • present currency values as "1234.56" rather than "1,234.56"
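
The three presentations of a currency value can be illustrated outside of XSLT. The Python sketch below only mimics the formats quoted above and is not how the stylesheets implement them; the function name format_currency is hypothetical, and giving no-currency-formatting precedence when both options are set is an assumption made for this example.

```python
def format_currency(amount: float, alternate_currency: bool = False,
                    no_currency_formatting: bool = False) -> str:
    """Mimic the default, alternate and unformatted currency presentations."""
    if no_currency_formatting:
        # Assumed precedence: plain digits win over the alternate format.
        return f"{amount:.2f}"
    default = f"{amount:,.2f}"  # e.g. "1,234.56"
    if alternate_currency:
        # Swap separators: space for thousands, comma for decimals.
        return default.replace(",", " ").replace(".", ",")
    return default
```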

4.3.2. Invocation parameters for the PDF stylesheets

The XSLT stylesheets for PDF generation support the following invocation parameters, though not all XSL-FO engines support the setting of all of these parameters:

  • "alternate-currency=any value"

    • format currency values as "1 234,56" rather than "1,234.56"

  • "no-currency-formatting=any value"

    • present currency values as "1234.56" rather than "1,234.56"

  • "title=title text"

    • embed this metadata into the document properties

  • "author=author text"

    • embed this metadata into the document properties

  • "subject=subject text"

    • embed this metadata into the document properties

  • "keywords=keywords text"

    • embed this metadata into the document properties

  • "owner-password=password text"

    • prevent modification of the PDF file unless the given password is entered

  • "user-password=password text"

    • prevent opening of the PDF file unless the given password is entered

  • "omit-xmp=any value"

    • do not attempt to embed the input UBL instance into the XMP metadata

    • it may save time to use this option when you know your XSL-FO engine does not support embedding custom XMP metadata

Note that Ibex 4.7.2.7 [Ibex] and later versions of this XSL-FO engine support all of the above parameters. There may be problems using an earlier version of this engine.

4.4. Customized invocation for PDF

Should you wish to invoke Crane's PDF translation process yourself, you will need the following:

  • two Java JAR files on the class path:

    • saxon9he.jar for XSLT

    • ibex-crane-ss.jar for XSL-FO renamed from ibex-crane-ss-A.B.C.D.jar

  • either an argument naming the configuration file, "-Dibexconfig=ibexconfig.xml", or the directory of the configuration file added to the class path so that Ibex can find it automatically

  • a pointer to the appropriate XSLT transformation class:

    • -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl

  • the name of the Ibex starting class: ibex.Signed

  • the manifest of the stylesheet being used: -manifest xslt/support/CraneUBL2UNXXXYYYYY-ZZ.xml

  • the name of the input XML file: -xml input-file

  • the name of the input XML file a second time: -document-base-uri-xml input-file

  • the name of the stylesheet being used: -xsl xslt/CraneUBL2UNXXXYYYYY-ZZ.xsl

  • the name of the output PDF file: -pdf output-file

Where:

  • A.B.C.D is the version of the software downloaded per the instructions in Section 3, “Installation”;

  • XXX is the document number (e.g. 380 for the invoice);

  • YYYYY is the document name (e.g. Invoice);

  • ZZ is the language abbreviation (e.g. ES for Spanish); and

  • both the manifest and the stylesheet language abbreviations are suffixed with "-us" to indicate the use of the US-letter page geometry instead of the A4 page geometry
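
The pieces listed above can be assembled into a single Java command line. The Python sketch below shows one plausible assembly and is not a copy of the supplied batch file: the function name pdf_command is hypothetical, the order of the Ibex arguments is an assumption, and the configuration file location defaults to the xslt/support/ibexconfig.xml path given in Section 2.1.

```python
import os
from typing import List


def pdf_command(input_xml: str, output_pdf: str, form: str = "380",
                doc_type: str = "Invoice", language_format: str = "EN",
                config: str = "xslt/support/ibexconfig.xml") -> List[str]:
    """Build the Java argument list for a customized PDF invocation."""
    base = f"CraneUBL2UN{form}{doc_type}-{language_format}"
    return [
        "java",
        # Both JAR files on the class path (":" or ";" depending on platform).
        "-cp", os.pathsep.join(["saxon9he.jar", "ibex-crane-ss.jar"]),
        f"-Dibexconfig={config}",
        "-Djavax.xml.transform.TransformerFactory="
        "net.sf.saxon.TransformerFactoryImpl",
        "ibex.Signed",                          # the Ibex starting class
        "-manifest", f"xslt/support/{base}.xml",
        "-xml", input_xml,
        "-document-base-uri-xml", input_xml,    # the input file a second time
        "-xsl", f"xslt/{base}.xsl",
        "-pdf", output_pdf,
    ]
```

The resulting list can be handed to subprocess.run(...) from the base installation directory, or joined with spaces as the body of a shell script for Mac or Linux.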

5. Accessing a UBL instance embedded in the PDF XMP metadata

When using an XSL-FO engine that supports embedding information in the Adobe XMP metadata of a PDF file (this library is currently configured for this function as found in the Ibex tool), one can use tools to export or extract the XMP metadata into a standalone XML file for subsequent processing.

Using Adobe Acrobat 8.2.5 this is accomplished as follows:

  • open the PDF document using Adobe Acrobat

  • press File/Properties to open the properties dialogue box

  • under the Description tab press the Additional Metadata button

  • in the left pane press Advanced to bring up the summary of XMP content

    • note the UBL instance is available when there is a section of XMP metadata labeled with the http://www.CraneSoftwrights.com/ns/XMP/ namespace

  • press Save... to write out the XMP content to any file

Alternatively, using Crane's Python application for XMP extraction this is accomplished as follows:

python Crane-PDF2XMP.py <ubl.pdf >ubl.xmp

Once the XMP file has been extracted from the PDF file, it is then processed through the included Crane-XMPExtract.xsl stylesheet as follows using saxon9he.jar or any other conforming XSLT 2.0 or XSLT 1.0 processor:

java -jar saxon9he.jar -o:ubl.xml -s:ubl.xmp -xsl:Crane-XMPExtract.xsl
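
The two steps above can be chained from a small Python script. This is a sketch under the assumption that python, java, Crane-PDF2XMP.py, saxon9he.jar and Crane-XMPExtract.xsl are all reachable from the current directory exactly as in the command lines shown; the function names are hypothetical.

```python
import subprocess
from typing import List


def saxon_extract_command(xmp_file: str, xml_file: str) -> List[str]:
    """Build the Saxon command that pulls the UBL instance out of the XMP file."""
    return [
        "java", "-jar", "saxon9he.jar",
        f"-o:{xml_file}", f"-s:{xmp_file}", "-xsl:Crane-XMPExtract.xsl",
    ]


def extract_ubl(pdf_file: str, xmp_file: str, xml_file: str) -> None:
    """Run both steps: PDF to XMP, then XMP to the embedded UBL instance."""
    # Step 1: the extraction script reads the PDF on standard input and
    # writes the XMP metadata to standard output.
    with open(pdf_file, "rb") as pdf_in, open(xmp_file, "wb") as xmp_out:
        subprocess.run(["python", "Crane-PDF2XMP.py"],
                       stdin=pdf_in, stdout=xmp_out, check=True)
    # Step 2: transform the XMP file with the extraction stylesheet.
    subprocess.run(saxon_extract_command(xmp_file, xml_file), check=True)
```

With check=True, a failure in either step raises an exception rather than silently leaving an empty output file.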

Note that the included copy of Crane-XMPExtract.xsl is from a separately available package [Crane Resources] dedicated to the module. Please see the dedicated package for any updates should you experience problems with the stylesheet. The documentation is copied in xmp/readme-Crane-XMPExtract.html.

See Section 4.3.2, “Invocation parameters for the PDF stylesheets” for the command-line parameter to omit the generation of XMP metadata.

6. Troubleshooting problems

Should you encounter any problems not documented below, please contact us at info@CraneSoftwrights.com to allow us to add information to this section.

7. Behind the scenes

What you do not see happening is the use of the two XML stylesheet technologies: Extensible Stylesheet Language [XSL 1.0] and XSL Transformations [XSLT 1.0].

XSLT is used to read the XML instance of UBL and create a representation of the formatted result. The processor interpreting the XSLT is Saxon. This formatted representation is in XSL (also called XSL-FO for XSL Formatting Objects) and it is interpreted in order to produce a PDF result. The processor interpreting the XSL-FO is Ibex.

In this particular demonstration environment, however, the stylesheets are digitally signed and the XSL-FO processor will not run as delivered if the stylesheets are altered in any way. If you need to run altered stylesheets then you will need to obtain your own copy of a conforming XSL-FO processor.

If this demonstration environment is sufficient for your production use, you are welcome to use the unaltered stylesheets in a production environment.

7.1. Acknowledgement

Crane Softwrights Ltd. very much appreciates the efforts of Visual Programming Ltd. http://www.xmlpdf.com of New Zealand for their support in helping to create this ability to convert UBL documents to PDF, allowing anyone to use the Crane configuration of their conforming XSL-FO processor.

7.2. Release notes

The 2011-02-03 16:00z release supports the latest Ibex engine with a new invocation parameter (see Section 4.4, “Customized invocation for PDF”).

The 2011-01-29 18:40z release only repackages the previous release with a copy of the new XMP extraction package.

The 2011-01-28 20:00z release adds support for embedding the input UBL into the PDF XMP metadata, plus various bug report fixes and minor features.

The 2008-12-19 11:50z release addresses shortcomings in the use of the HTML vocabulary.

The 2008-07-23 20:50z release includes a new Dutch translation of Invoice box labels, and a repair in the common code for the punctuation in meta data.

Bibliography

[Saxon] Michael Kay, Saxon.

[UBL 2.0] Jon Bosak, Tim McGrath, G. Ken Holman, Universal Business Language (UBL) Version 2.0, OASIS UBL Technical Committee, 2006.

[UNLK] United Nations Economic Commission for Europe, United Nations Layout Key for Trade Documents - Recommendation No. 1.

[XSL 1.0] Sharon Adler, et al., Extensible Stylesheet Language Version 1.0, 2001-10-15.

[XSL 1.1] Anders Berglund, Extensible Stylesheet Language Version 1.1, 2006-12-05.

[XSLT 1.0] James Clark, XSL Transformations (XSLT) Version 1.0, 1999-11-16.

[XSLT 2.0] Michael Kay, XSL Transformations (XSLT) Version 2.0, 2007-01-23.