Copyright © 2013 Stephen Roger Rowlison and Crane Softwrights Ltd.
$Date: 2013/10/13 00:20:18 $(UTC)
Table of Contents
This document describes a strategy for storing a set of OASIS UBL [UBL 2.0] and [UBL 2.1] documents in an SQL database. This database is configured with tables for all business information entities (BIE) as modelled by the UN/CEFACT Core Component Technical Specification [CCTS]. The CCTS approach to business document modeling is a level more abstract from generic XML as defined by XSD. This provides the opportunity to model the SQL database around CCTS concepts and not XSD concepts.
Thus, this strategy could more aptly be described as one mapping CCTS-modeled documents into SQL databases, of which UBL is but one example. Note that all UBL 2.0 documents are valid UBL 2.1 documents, so building the SQL tables out of UBL 2.1 definitions will support storing instances of UBL 2.0.
This approach also provides for pre-building all tables in advance of receiving any document instances, and then just populating the tables from the instance content. Moreover, while UBL documents can have recursive uses of the same constructs, this proposal utilizes the fixed number of pre-built tables regardless of the depth of an instance (i.e. the infinite recursion of the abstract model is not implemented as an infinite number of tables).
All information from the input UBL document is stored in the database, but not the syntax used to represent that information. When reconstituting a UBL document from the database as an XML instance, it will parse with the identical information but not the identical syntax. Per XML rules, such properties as namespace prefixes, white space in tags, order of attributes and others are issues of syntax and not information. As such these properties are not relevant to accurately representing the information in an XML document. Only some of the information, of the items listed the namespace prefixes, might be storable but without the benefit outweighing the overhead.
At this writing, accommodating extension elements is not considered. Standardized extensions can be modeled in the future (see Section 5.3, “Document extensions”).
In the CCTS design, a business information entity (BIE) describes in a business context some abstract concept. The aggregate BIE (ABIE) is an abstract structure that contains two kinds of children: a basic BIE (BBIE) is an expressed value and an association BIE (ASBIE) is a concrete instance of an ABIE found in a CCTS document.
In an aggregate every child BIE must have a cardinality of one of the four possibilities: mandatory, optional, mandatory and repeatable, or optional and repeatable.
The data type of an ASBIE is an ABIE, and not necessarily an ABIE
of the same name as the ASBIE. For example, the ASBIE
JurisdictionRegionalAddress
is an instance of the
Address
ABIE.
While a BBIE appears to be simply an expressed value, it in fact is
a composite value. The basic value is expressed as an instance of a
core component data type. For example, the BBIE
InstructionID
is an instance of the
Identifier
core component data type. There are
supplementary components for every core component that may be, and in
some cases must be, expressed in addition to the basic value. Consider
the "Amount" core component data type described at http://docs.oasis-open.org/ubl/cs1-UBL-2.1/mod/summary/reports/UBL-AllDocuments-2.1.html#UDT
(where all unqualified data types are listed and where they are used is
indicated by the hyperlinked left angle brackets). There is one
mandatory supplementary component named currencyID
and one optional supplementary component named
currencyCodeListVersionID
.
One can see at that link that only date, time and boolean values do not
have supplementary components, while all others do.
Note there are two kinds of ABIE: the document ABIE (which stands alone and is not currently ever referenced as an ASBIE, but may be in the future) and the library ABIE (which never stands alone and is always referenced by an ASBIE). To accommodate possible future references to a document ABIE, both the document ABIE and the library ABIE are modeled the same in this proposal.
The proposed essence of accommodating both cardinality and complex data types is to map every XML element in the UBL document into a set of pointers into rows. An ABIE row has pointers for every BBIE and ASBIE in that ABIE. Every abstract data type (both BBIE and ABIE) has an SQL table. Every element in the UBL document represents a row in a data type's SQL table. The ABIE is therefore only a set of reference values and the data type tables dereference that value into the set of objects for the parent.
Consider this example shown in Figure 1, “Table references”, though only slightly modified from reality to illustrate the concepts. Each table has reference ("Ref") and sequence ("Seq") columns. The reference values are unique identifiers of some kind, quickly looked up in a column, and not unique. The sequence column has numbers and are not unique. The combination of reference and sequence columns is unique across all rows in a table. The combination need not be unique across the entire database but may be so if the implementation is faster because of that choice.
An invoice instance begins by populating a new row in the Invoice
ABIE table. This row describes all children of the
<in:Invoice>
element. Being a document ABIE,
the reference and sequence columns are null.
The invoice child <cbc:ID>
element creates
the reference A to the Identifier Data Type table. A row in that table
is created with the reference A and the sequence number 1. The
element's value is put in the value column ("Core") and any attributes
that may have been found in the element populate the corresponding
columns in that table.
Consider for illustration only that the cardinality of
<cbc:IssueDate>
is "1..n
"
and not "1
" and the input instance has two such date
elements. The one Issue Date column is assigned the reference B and
two rows are created in the corresponding Date Data Type table,
distinguished in their order by the sequence values
1
and 2
. Note there are no
supplementary components on date values and so no attributes are
expected to be found.
Another child of the invoice is <cbc:DueDate>
and it creates a Due Date column value of reference C, and as there is
only one there is only one row created in the table with that reference
and the sequence number 1
.
The invoice child <cac:PrepaidPayment>
creates the reference D and a row in the Payment ABIE table (note the
ABIE name is different than the ASBIE name), and as there is only one
there is only one row created in the table with that reference and the
sequence number 1
. The payment columns are shown
here: http://docs.oasis-open.org/ubl/cs1-UBL-2.1/mod/summary/reports/UBL-AllDocuments-2.1.html#Table_Payment.Details
.
Note the diagram shows ASBIE columns for illustration of other library
ABIE constructs as the payment ABIE doesn't actually happen to have any
ASBIE.
The prepaid payment child <cbc:PaidDate>
element creates the reference E to the Date Data Type table. A row in
that table is created with the reference E and the sequence number 1.
The element's value is put in the value column ("Core").
The prepaid payment child <cbc:InstructionID>
element creates the reference F to the Identifier Data Type table. A
row in that table is created with the reference F and the sequence
number 1. The element's value is put in the value column ("Core") and
any attributes that may have been found in the element populate the
corresponding columns in that table.
In UBL CS1 there are 66 document ABIE constructs, 221 library ABIE constructs and 14 unqualified data types, thus requiring only 294 tables to be defined to store all possible UBL instances.
The style of references ("Ref" in the figure) needs to be chosen based on quick lookup in the tables.
In a valid UBL document an ABIE can never be empty and a BBIE can
never be empty. See http://docs.oasis-open.org/ubl/cs1-UBL-2.1/UBL-2.1.html#EMPTY-ELEM
for details.
Many UBL elements are optional. When an ABIE row is being populated, all columns of absent child elements are set to null.
Many UBL elements are repeatable. When an ABIE row is being populated, only a single column for the repeated child is populated, that being with the reference to the ASBIE or data type of that child. As many rows in the applicable ABIE or data type table are populated for each of the repeated elements. Each row is assigned the ordinal sequence number.
An ABIE table has the reference and sequence columns, then as many BBIE and ASBIE columns as required for one of all possible children for an instance of the ABIE.
A data type table has the reference, sequence and core value columns, then as many supplementary value columns as supplementary components are defined for the core component type. The BBIE element value goes in the core value column and the BBIE attribute values go into their corresponding supplementary component column. Absent attributes are represented with null values in the corresponding column.
The files generated in this package represent SQL table definitions for
UBL 2.1 CS1 documents available
at
http://docs.oasis-open.org/ubl/cs1-UBL-2.1/
.
Version with comments: ubl-20131013-0020z-commented.sql
Version without comments: ubl-20131013-0020z.sql
Asking for an UBL document to be unmarshalled into the database is to populate the database columns with the information from the XML instance.
Every one of the 66 document ABIE elements is in their own unique namespaces. Users may use any prefix in the XML instance.
Every one of the 221 library ABIE elements is in the aggregate namespace
"urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2
",
referred to in UBL documentation with the "cac:
"
prefix but users may use any prefix in the XML instance.
Every BBIE element is in the basic namespace
"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2
",
referred to in UBL documentation with the "cbc:
"
prefix but users may use any prefix in the XML instance.
The unmarshalling code is written (or synthesized) once a priori to know which ABIE table each document ABIE and each ASBIE element will be stored in, and which data type table each BBIE element will be stored in. This is unambiguous by the combination of namespace and name. Note it is ambiguous by only the name, thus the namespace cannot be ignored.
Starting with the document ABIE on detection, a row is made in the corresponding ABIE table. No reference or sequence number is used because the row is not being referenced. This distinguishes those document ABIE rows that begin documents from those in future UBL documents that are embedded, and will thus have reference and sequence numbers.
The code will encounter all BBIE elements of the ABIE first and all ASBIE elements after all BBIE elements. BBIE columns are populated with references to rows in a data type table. ASBIE columns are populated with references to rows in an ABIE table. Null values are put into columns of child elements absent from the ABIE.
Asking for a particular document instance to be marshalled into XML
syntax is to start at that document's row in that document's ABIE table.
Typically (but not required) the default namespace is used for the
emitted document element, in which case no prefix is used. The document
namespace is of the form
"urn:oasis:names:specification:ubl:schema:xsd:
".
Typically (but not required) the document element also includes the
declaration of the aggregate namespace
"{document-element-name}
-2urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2
"
(typically with the "cac:
" prefix) and the basic
namespace
"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2
(typically with the "cbc:
" prefix).
The UBL document is accurately reproduced by following all column references and constituting all elements from referenced rows in the tables in sequence order.
A BBIE element is emitted in the basic namespace. An ASBIE element is emitted in the aggregate namespace.
Note that in XML element order is important and attribute order is irrelevant. Namespace URI strings are important and namespace prefixes are irrelevant (but following conventions is polite).
In this proposal, all BBIE field and values are stored in the corresponding unqualified data type table. Thus, constraints imposed on the unqualified data type table are imposed on all instances of that type. In XML there is no maximum length or value of any BBIE. If SQL mandates a maximum length of a field, that maximum length is for all values of the data type for the table.
This can be finessed later on by creating qualified data type
tables and cherry-picking which BBIE elements are associated with which
qualified data type. While the schemas do not constrain the values of a
BBIE, the second pass validation step in UBL validation can be modified
to check such constraints. See http://docs.oasis-open.org/ubl/cs1-UBL-2.1/UBL-2.1.html#CODELISTS
.
Note that the binary object unqualified data type could be very large. Though it is made up of only text characters (in base64 encoding), it is opaque and could be treated as a blob.
Assuming the integrity of party identifiers (at a business level), would it be possible to identify that after loading the database with the information from an instance it is a copy of an existing item? Doing so, would there then be only one instance of the party expressed in the database?
In fact this might not be desirable because if one changes the party information for the generated invoice for a corresponding order, the party information in the corresponding order would change and not reflect the true data of the order.
The one (so far) standardized extension for signatures can be modeled the same as CCTS elements, except for an opaque signature descendent element that can be stored as a blob or text sequence. Signature generation in the database would not be possible and has to be done after the syntax of the document is marshalled out of the database.
Non-standardized extensions can be modeled as a blob or text sequence.
The extension point scaffolding elements are well-defined and can be modeled the same as the CCTS elements.
Non-standardized extensions that follow the CCTS principles as illustrated in most of the signature extension could be modeled in the database in a like fashion to the rest of UBL.
We welcome feedback from all readers regarding the utility of these
SQL definitions and the viability of the strategy we propose for storing
UBL documents in a database. Please be sure to copy Stephen Rowlison
srowlison@puremethods.com
and
G. Ken Holman gkholman@CraneSoftwrights.com
when contacting us with your comments and questions.
[CCTS] Core Components Technical Specification Version 2.01 - 15 November 2003
[UBL 2.0] Jon Bosak, Tim McGrath, G. Ken Holman Universal Business Language (UBL) Version 2.0, UBL 2.0 Update (Errata), OASIS UBL Technical Committee 2006
[UBL 2.1] Jon Bosak, Tim McGrath, G. Ken Holman Universal Business Language (UBL) Version 2.1 Committee Specification OASIS UBL Technical Committee 2013