Simplified UBL schema customization

"Portions originally copyright (C) 2006 National IT and Telecom Agency, Denmark" (as licensed under a Creative Commons Attribution 2.5 License. See http://creativecommons.org/licenses/by/2.5/ for details.)

$Date: 2006/12/21 21:00:14 $(UTC)


Table of Contents

1. Introduction
2. Customization subset specification
3. Process flow
4. Defining extensions to the customization
5. Installing the software for the environment
5.1. UBL 2.0
5.2. The Open Office suite
5.3. Apache Ant
5.4. Java tools.jar
5.5. Saxon 8 for XSLT 2.0
6. Demonstration
6.1. utility/ directory
6.2. scenario/ directory
Bibliography

1. Introduction

The OASIS Universal Business Language [UBL 2.0] defines a family of powerful, flexible and large document types to provide for many possible uses of information in business documents.

Not all communities of users need the flexibility found in the published document models. These users can live within a smaller set of information items in each document type based on the scenario in which they are using business documents. This smaller set may vary for a given document type based on the scenario where the business document is used.

Unlike other approaches that synthesize all of the schema declarations from the spreadsheet definitions of the business objects, this "Simplified UBL schema customization" environment defines a pruning process. This pruning process produces customization schema expressions directly from the original UBL schema expressions published by the UBL Technical Committee by commenting out constructs that are not needed in a customization. The process is driven by a spreadsheet specification of which UBL constructs are in, and which are not in, the customization.
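As a rough illustration of pruning by commenting out, the following Python sketch wraps unwanted element references in XML comments. It is not the implementation in this package, which uses Ant and XSLT 2.0. The function name, the sample fragment and the regular-expression approach are assumptions made only for illustration.

```python
import re

def comment_out_refs(xsd_text, excluded_refs):
    """Wrap each self-closing <xsd:element ref="..."/> whose ref is excluded in an XML comment."""
    pattern = re.compile(r'<xsd:element\s+ref="([^"]+)"[^>]*/>')

    def replace(match):
        # Keep the original declaration text so the pruning stays visible and reversible.
        if match.group(1) in excluded_refs:
            return "<!-- " + match.group(0) + " -->"
        return match.group(0)

    return pattern.sub(replace, xsd_text)

# Hypothetical fragment; the element names are examples only.
SAMPLE = (
    '<xsd:sequence>\n'
    '  <xsd:element ref="cbc:ID" minOccurs="1" maxOccurs="1"/>\n'
    '  <xsd:element ref="cbc:Note" minOccurs="0" maxOccurs="unbounded"/>\n'
    '</xsd:sequence>'
)

pruned = comment_out_refs(SAMPLE, {"cbc:Note"})
print(pruned)
```

Because the unwanted declaration is commented out rather than deleted, the pruned fragment can still be compared line by line with the published fragment.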

Note

This environment does not try to address the information management processes by which a community of users decides what belongs in and not in a customization. There is a lot of intricate detail in managing a suite of document types used in a set of profiles. When all of the business decisions have been made about what is and is not included from the original UBL business objects, then customization spreadsheets can be populated accordingly to indicate those decisions. This environment acts on those spreadsheets.

A demonstration of this subset environment is included in the ZIP package in which this documentation is found. Two illustrative customizations, complete with a functional (but fictional) non-UBL extension, are defined. By mimicking the customization spreadsheet and the extension schema fragments, one can define and use one's own set of OASIS UBL customizations.

Note

This environment does not detect or accommodate differing definitions of aggregate business objects in different contexts. As with UBL 2.0, all aggregate business objects are assumed to have the identical definition in all contexts of use.

2. Customization subset specification

A set of office document spreadsheets [CustSpread] in the UBL Technical Committee repository can be used to specify the standardized UBL constructs that are in and not in a set of UBL customizations one might maintain for different profiles. The documentation accompanying the spreadsheets has the detailed instructions.

Two kinds of subsets can be specified using the spreadsheets: a "strict" subset defining those information items that a community defines as the core of a customization, and a "permitted" subset defining items in addition to core items that are allowed to be present if users wish, but are not expressly part of the customization definition. Neither subset includes those standardized information items that are expressly excluded from the customization.

The spreadsheets have a customization "Usage" column in which an information item's applicability is indicated using a string value. The string value "USED" indicates the item is part of the strict subset of the customization. The string value "EXCLUDED" indicates the item is not a part of any subset of the customization. The empty string (an empty cell) indicates the item is part of the permitted subset of the customization.
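The three "Usage" values can be summarized in a small Python sketch. The function name and the decision to strip surrounding whitespace from a cell are assumptions for illustration; the environment itself reads the spreadsheets through XSLT.

```python
def classify_usage(cell):
    """Map a "Usage" cell value to the subset it denotes."""
    value = cell.strip()  # assumption: stray whitespace in a cell is not significant
    if value == "USED":
        return "strict"
    if value == "EXCLUDED":
        return "excluded"
    if value == "":
        return "permitted"
    raise ValueError("unrecognized Usage value: %r" % cell)
```

For example, `classify_usage("")` yields "permitted", reflecting that an empty cell places the item in the permitted subset.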

Note

Marking something as part of the strict or permitted customization does not make it mandatory. Such marking indicates that the information item (be it mandatory or optional) is included in the customization. The string "USED" does not imply the item is required.

The spreadsheets have a customization "Cardinality" column in which an information item's cardinality is indicated using the same notation as the original cardinality column. Mandatory and non-repeating objects have a cardinality of "1". Other typical cardinality indications are "0..1", "0..n" and "1..n". Any numeric values can be used for the minimum and maximum, provided the minimum is less than or equal to the maximum and both lie within the range of the original cardinality, inclusive.
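The cardinality rule can be checked mechanically. The following Python sketch parses the notation described above and tests a customized cardinality against the original one. The function names are hypothetical, and treating "n" as unbounded is an interpretation of the notation rather than something this package documents.

```python
def parse_cardinality(text):
    """Return (minimum, maximum); maximum is None for the unbounded "n"."""
    text = text.strip()
    if ".." in text:
        low, high = text.split("..", 1)
    else:
        low = high = text  # a single value such as "1" means exactly that many
    minimum = int(low)
    maximum = None if high == "n" else int(high)
    return minimum, maximum

def is_valid_customization(original, customized):
    """Check that min <= max and that both bounds fall within the original bounds."""
    orig_min, orig_max = parse_cardinality(original)
    cust_min, cust_max = parse_cardinality(customized)
    unbounded = float("inf")
    orig_top = unbounded if orig_max is None else orig_max
    cust_top = unbounded if cust_max is None else cust_max
    return cust_min <= cust_top and orig_min <= cust_min and cust_top <= orig_top
```

Under this sketch, narrowing "0..n" to "1..3" is accepted, while widening "0..1" to "0..n" or relaxing a mandatory "1" to "0..1" is rejected.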

The spreadsheet customization "Usage" column is initialized with all mandatory constructs having "USED" and all optional constructs having "EXCLUDED". The spreadsheet customization "Cardinality" column is initialized with the original cardinality. The specification of a subset is accomplished by modifying the cardinalities of all information items as desired, and indicating those optional constructs that are part of the strict or permitted customizations.
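The initial state of the "Usage" column follows directly from the original cardinality. A minimal Python sketch, assuming that "mandatory" means an original minimum cardinality of at least one (the function name is hypothetical):

```python
def initial_usage(original_cardinality):
    """Return the starting "Usage" value for an item with the given original cardinality."""
    # The minimum is the part before "..", or the whole value when there is no range.
    minimum = int(original_cardinality.strip().split("..", 1)[0])
    return "USED" if minimum >= 1 else "EXCLUDED"
```

Specifying a subset then amounts to changing selected "EXCLUDED" cells to "USED" or to empty, and tightening cardinalities where desired.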

Note

This simplified environment assumes all mandatory constructs are obliged to be used. There is no review of the paths of information items to check their obligatory nature in the context of an absolute path from the document element. This will leave unused element declarations in the result, but they do not affect document validation.

Any given community of UBL users may have multiple customizations of the same document model. When a community has a number of profiles of business exchanges, a given document model (say Invoice) might have more available constructs to be used in a complex procurement scenario than those used in a simple procurement scenario. Multiple customization subsets can be specified in a single spreadsheet, distinguished by unique prefixes attached to each of the "Usage" and "Cardinality" columns specifying the customization. Given that the spreadsheets are locked when downloaded from the OASIS web site, it is necessary to temporarily unlock the spreadsheet, duplicate the columns as many times as required, then lock the spreadsheet again to prevent damage to other rows. Note that in some spreadsheets one must delete the apparently (but not actually) empty columns appearing after the UBL columns to make room for the new columns to be inserted.
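Selecting the column pair for one customization can be sketched as follows in Python. This assumes the prefix is joined directly to the column name (for example a prefix "Demo1-" giving "Demo1-Usage"); the function name and the sample header row are hypothetical.

```python
def customization_columns(header_row, prefix):
    """Locate the Usage and Cardinality column indexes for one customization prefix."""
    return {
        name: header_row.index(prefix + name)
        for name in ("Usage", "Cardinality")
    }

# Hypothetical header row; the non-customization column names are placeholders.
HEADERS = ["Name", "Cardinality", "Demo1-Usage", "Demo1-Cardinality",
           "Demo2-Usage", "Demo2-Cardinality"]

print(customization_columns(HEADERS, "Demo2-"))
```

A missing column raises `ValueError` from `list.index`, which is a convenient signal that a prefix was mistyped or that the columns were not duplicated.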

The demonstration spreadsheets illustrate two customizations of the Invoice document model and common library. There are two pairs of Usage/Cardinality columns, the first prefixed with "Demo1-" and the second prefixed with "Demo2-". The demonstration process produces two sets of schema documents, one for each customization, and differing based on the

3. Process flow

The processes involved in creating a new suite of schema fragments from the published suite of schema fragments are outlined in Figure 1, “Process Overview (no extensions)”. The box labeled "Pruning Process" is the Ant [Ant] script "filterXSDDK.xml", named to indicate the Danish provenance of some of the program fragments that were used. Though a Linux version is not included with this package, the compact batch files invoking this script can be easily converted into shell scripts. The invoked flow managed using Ant is implemented in Java and is portable across all Java environments.

Figure 1. Process Overview (no extensions)

Note

The pruned fragments must have the identical filename as the corresponding fragment being pruned in order that existing import directives that incorporate the original fragment properly access the declarations in the pruned fragment.

Shown in the diagram are the customization specification spreadsheets, the UBL XSD schema fragments, and the customization XSD schema fragments. The XSD fragments in the customization that differ from the UBL fragments are indicated as "XSD'".

4. Defining extensions to the customization

Extensions are defined for a customization when a required business object (be it optional or mandatory) cannot be found in the UBL set of predefined business objects. All UBL document types have an extension point under which extensions are declared when necessary. Not all customizations have extensions, and in fact, very few should as the UBL specification recommends that extensions be used only when standardized constructs cannot be used.

The steps described in this section are in addition to the steps described earlier for pruning the standardized set of constructs into a customized subset of constructs.

Figure 2, “Process Supplement (with extensions)” shows how the UBL-ExtensionContentDatatype-2.0.xsd schema fragment is replaced with a schema fragment incorporating the extension constructs into the document model. As delivered by the UBL TC, this original fragment defines a wildcard that accepts any extension content from any namespace vocabulary. The replacement fragment accepts the choice between a recognized extension element and an element from any other namespace. This engages the remaining schema fragments that define the constraints on the use of the extension vocabulary.
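The effect of the replacement fragment can be modeled in a few lines of Python. This is only a model of the prose above, not of the schema itself and not of XSD wildcard semantics in full detail. The function name, the namespace URIs and the element names are hypothetical.

```python
def accepts_extension_child(child_ns, child_name, extension_ns, recognized_names):
    """Model a choice between a recognized extension element and any element from another namespace."""
    if child_ns == extension_ns:
        # Elements in the extension vocabulary must be recognized ones, which is
        # what lets the extension schema fragments constrain their content.
        return child_name in recognized_names
    # Content from any other namespace is still accepted, as with the original wildcard.
    return True

# Hypothetical extension vocabulary used only for this illustration.
EXT_NS = "urn:example:demo-extension"
RECOGNIZED = {"DemoExtension"}

print(accepts_extension_child(EXT_NS, "DemoExtension", EXT_NS, RECOGNIZED))
print(accepts_extension_child(EXT_NS, "Misspelled", EXT_NS, RECOGNIZED))
print(accepts_extension_child("urn:example:other", "Anything", EXT_NS, RECOGNIZED))
```

The point of the choice is the middle case: with the original wildcard, an unrecognized element in the extension vocabulary would pass unchecked, whereas the replacement fragment rejects it.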

Figure 2. Process Supplement (with extensions)

Note

The replacement fragment must have the identical filename as the fragment being replaced in order that existing import directives that incorporate the replaced fragment properly access the declarations in the replacement fragment.

5. Installing the software for the environment

This environment takes advantage of a number of publicly-available resources, some of which are quite substantial to obtain and install.

It does not matter where these components are installed on your system. The instructions for running the demonstration to test the environment refer to the environment variables that must be set to locate the various components.

5.1. UBL 2.0

The Universal Business Language (UBL) 2.0 package [UBL 2.0] must be installed because this filter environment creates new schema files by reading the published schema files and removing unneeded constructs.

5.2. The Open Office suite

If your source spreadsheets are Microsoft Excel XLS and not Open Document ODS, it is necessary to install and then configure Open Office with a custom macro as part of the installation.

OpenOffice.org [OpenOffice] is a multiplatform and multilingual office suite and an open-source project. This filter environment uses the "Calc" program in an unattended fashion to convert a Microsoft Excel spreadsheet to an OpenOffice.org spreadsheet.

As part of this filter environment, the utility/ directory has an Open Office file named CraneXLS2ODSMacro.odt that contains a copy of the macro to embed into your installation. Follow the illustrated instructions in that file to install the macro into your standalone environment.

It is acceptable to answer "Disable Macros" when opening the .odt file.

5.3. Apache Ant

Apache Ant [Ant] is a Java-based build tool used for the choreography and orchestration of system tasks and programs. This filter environment uses Ant to invoke all of the necessary processes in the necessary order because Ant is supported across both Windows and Linux in a platform independent fashion.

5.4. Java tools.jar

This filter environment's use of Apache Ant triggers the need in Ant for a Java JAR file named "tools.jar" from the JDK (Java Development Kit) that is not in the JRE (Java Runtime Environment). If you have installed only the Java runtime, then this important file will be missing. It must be obtained from the development kit and is most conveniently placed into the Java runtime's lib/ subdirectory.
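Before running the demonstration, one can check whether the file is in place. A minimal Python sketch, assuming the file is expected in the lib/ subdirectory of the Java installation directory passed in (the function name is hypothetical):

```python
from pathlib import Path

def missing_tools_jar(java_home):
    """Return the expected lib/tools.jar path if it is absent, else None."""
    candidate = Path(java_home) / "lib" / "tools.jar"
    return None if candidate.is_file() else candidate
```

A non-None result names the location where a copy of tools.jar from the development kit would need to be placed.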

5.5. Saxon 8 for XSLT 2.0

The Saxon 8 [Saxon] Java JAR file supporting XSLT 2.0 is required because the XSLT stylesheet processor built into Apache Ant supports only XSLT 1.0 and this filter environment takes advantage of new language features available in XSLT 2.0.

6. Demonstration

There is a demonstration environment included in the ZIP package in which this documentation is found. The following files specify the two demonstration customizations of UBL 2.0. Note that the scenario/makexsd.bat and utility/filterXSDDK.bat invocation files must be modified to reflect the locations of your software installations.

6.1. utility/ directory

The invocation files for this environment that produce the end result pruned schemas are:

  • filterXSDDK.bat

    • invoke the Ant filterXSDDK.xml process to prune a schema fragment

    • this batch file must be modified to find files in your environment by changing the following environment variables such that they point to your installation of Saxon [Saxon], to your installation of Ant [Ant] (noting the inclusion of the trailing backslash), and to the invocation string you use to run Open Office [OpenOffice]:

      • set Saxon8Dir=p:\xml\xslt\saxon8\
        set AntDir=p:\apache\ant\
        set OpenOfficeRun=p:\OpenOffice.org 2.0\program\soffice.exe
        
      • note that the OpenOfficeRun value is the name of the Open Office executable program file suitable for the attribute named executable= for the Ant task named <exec>; this may be an explicit path and file name as indicated above, or an abbreviated use of only the file name if it can be found on the command path

  • CraneXLS2ODSMacro-20061221-1600z.odt

    • a macro and macro installation instructions for Open Office needed by filterXPathDK.xml

  • filterXPathDK.xml
    filterXSDDK.xsl
    odsUBL2xml.xsl
    xmlUBL2nested.xsl

    • various support Ant scripts and XSLT 2.0 stylesheets
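The settings made in filterXSDDK.bat can be sanity-checked before a run. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes that both directory values need the trailing backslash, as both example values above have one.

```python
def check_settings(settings):
    """Return a list of problems found in the batch-file variable values."""
    problems = []
    for name in ("Saxon8Dir", "AntDir"):
        value = settings.get(name, "")
        if not value.endswith("\\"):
            problems.append(name + " should end with a trailing backslash")
    if not settings.get("OpenOfficeRun"):
        problems.append("OpenOfficeRun is not set")
    return problems

EXAMPLE = {
    "Saxon8Dir": "p:\\xml\\xslt\\saxon8\\",
    "AntDir": "p:\\apache\\ant",  # deliberately missing the trailing backslash
    "OpenOfficeRun": "p:\\OpenOffice.org 2.0\\program\\soffice.exe",
}

print(check_settings(EXAMPLE))
```

An empty list means the three values at least have the expected shape; it does not confirm that the paths exist on the machine.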

6.2. scenario/ directory

The demonstration files are as follows:

  • demo1.xml

    • an instance conforming to the Demo1 customization

  • demo2.xml

    • an instance conforming to the Demo2 customization

  • Demo-Invoice.xls

    • the specification of the use of the library constructs for the document element of each customization

  • Demo-CommonLibrary.xls

    • the specification of the definition of the library constructs of each customization

  • Demo-ExtensionContentDatatype.xsd

    • the replacement of the definition of the extension point

  • Demo-ExtensionDefinition.xsd

    • the definition of extension constructs beyond the restricted subset of the UBL constructs

  • DemoCopyright.txt

    • the text to inject into the modified schema fragments

The invocation files for this environment that produce the end result pruned schemas are:

  • makexsd.bat

    • invoke the necessary copying and transforming processes to produce customization schema fragments

    • invoke the batch file to prune a schema fragment using consistent arguments

    • this batch file must be modified to find files in your environment by changing the following environment variable such that it points to your installation of the UBL 2.0 package of schemas:

      • set UBL2xsd=u:\cd\artefacts\os-UBL-2.0\xsd\
        
      • you can choose to pass either the UBL 2.0 "xsd\" or "xsdrt\" directory into this process in order to produce a customized set of schemas based on either the full or runtime versions of the schema fragments

  • makesetxsd.bat

    • invoke the batch file to create a set of arguments for a particular profile

  • makeonexsd.bat

    • invoke the batch file to prune a schema fragment using consistent arguments

After running "makexsd.bat", the resulting set of schema fragments in the two subdirectories is as follows (note how the filenames below the subdirectory are unchanged from the UBL distribution and, thus, can be plugged into any existing UBL deployment):

  • Demo1Invoice\common\CCTS_CCT_SchemaModule-2.0.xsd
    Demo1Invoice\common\CodeList_CurrencyCode_ISO_7_04.xsd
    Demo1Invoice\common\CodeList_LanguageCode_ISO_7_04.xsd
    Demo1Invoice\common\CodeList_MIMEMediaTypeCode_IANA_7_04.xsd
    Demo1Invoice\common\CodeList_UnitCode_UNECE_7_04.xsd
    Demo1Invoice\common\Demo-ExtensionDefinition.xsd
    Demo1Invoice\common\UBL-CommonAggregateComponents-2.0.xsd
    Demo1Invoice\common\UBL-CommonBasicComponents-2.0.xsd
    Demo1Invoice\common\UBL-CommonExtensionComponents-2.0.xsd
    Demo1Invoice\common\UBL-CoreComponentParameters-2.0.xsd
    Demo1Invoice\common\UBL-ExtensionContentDatatype-2.0.xsd
    Demo1Invoice\common\UBL-QualifiedDatatypes-2.0.xsd
    Demo1Invoice\common\UnqualifiedDataTypeSchemaModule-2.0.xsd
    Demo1Invoice\maindoc\UBL-Invoice-2.0.xsd
    Demo2Invoice\common\CCTS_CCT_SchemaModule-2.0.xsd
    Demo2Invoice\common\CodeList_CurrencyCode_ISO_7_04.xsd
    Demo2Invoice\common\CodeList_LanguageCode_ISO_7_04.xsd
    Demo2Invoice\common\CodeList_MIMEMediaTypeCode_IANA_7_04.xsd
    Demo2Invoice\common\CodeList_UnitCode_UNECE_7_04.xsd
    Demo2Invoice\common\Demo-ExtensionDefinition.xsd
    Demo2Invoice\common\UBL-CommonAggregateComponents-2.0.xsd
    Demo2Invoice\common\UBL-CommonBasicComponents-2.0.xsd
    Demo2Invoice\common\UBL-CommonExtensionComponents-2.0.xsd
    Demo2Invoice\common\UBL-CoreComponentParameters-2.0.xsd
    Demo2Invoice\common\UBL-ExtensionContentDatatype-2.0.xsd
    Demo2Invoice\common\UBL-QualifiedDatatypes-2.0.xsd
    Demo2Invoice\common\UnqualifiedDataTypeSchemaModule-2.0.xsd
    Demo2Invoice\maindoc\UBL-Invoice-2.0.xsd
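The observation that the filenames below each subdirectory are unchanged can be confirmed mechanically. A Python sketch using a few entries from the listing above (the function name is hypothetical):

```python
from pathlib import PureWindowsPath

def relative_names_by_customization(paths):
    """Group paths by top-level directory, keeping the path below that directory."""
    groups = {}
    for text in paths:
        parts = PureWindowsPath(text).parts
        groups.setdefault(parts[0], set()).add("\\".join(parts[1:]))
    return groups

# A few entries taken from the listing above.
LISTING = [
    "Demo1Invoice\\common\\UBL-CommonAggregateComponents-2.0.xsd",
    "Demo1Invoice\\maindoc\\UBL-Invoice-2.0.xsd",
    "Demo2Invoice\\common\\UBL-CommonAggregateComponents-2.0.xsd",
    "Demo2Invoice\\maindoc\\UBL-Invoice-2.0.xsd",
]

groups = relative_names_by_customization(LISTING)
same_layout = groups["Demo1Invoice"] == groups["Demo2Invoice"]
print(same_layout)
```

Identical relative names are what allow either customization directory to stand in for the corresponding directory of the published UBL schemas.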
    

The resulting set of schema fragments can be tested by validating the supplied demo1.xml and demo2.xml test instances.

Bibliography

[Ant] Apache Software Foundation. The Apache Ant Project.

[OpenOffice] OpenOffice.org. The Open Office Project.

[UBL 2.0] Jon Bosak, Tim McGrath, G. Ken Holman. Universal Business Language Version 2.0 (documentation) (ZIP). 2006-12-12.

[Saxon] Michael Kay. Saxon 8 for XSLT 2.0.