[Accessibility conventions are described at the bottom of the page]
8. Controlled vocabulary overview
[> 9.][< 7.1.11][^^^]
8.0 Controlled vocabularies in business documents
[> 8.1][> 9.][< 8.][^^][^^^]
Business documents have many information items valued using controlled vocabularies
[[1] - an abstract and compact value expressed to represent an agreed-upon semantic
[1] - often mnemonic in a particular language
[[2] - e.g. "USD" for the US dollar currency code
[2] - e.g. "ES" for the Spain country code
][1] - sometimes non-mnemonic to be language independent
[[2] - e.g. "42" for "Payment to bank account" payment means code
]]
Controlled vocabularies include codes and identifiers
[[1] - codes represent abstract concepts
[1] - identifiers distinguish concrete instantiations of concepts
[[2] - e.g. account codes specific to a trading partner
]]
Registration authorities are responsible for publicly-available value lists
[[1] - e.g. International Organization for Standardization (ISO)
[1] - e.g. United Nations Economic Commission for Europe (UN/ECE)
]
Trading partner agreements need a rigorous expression of constraints
[[1] - there is an opportunity for misunderstanding if the parties cannot agree a priori on the coded values acceptable to their
document exchanges
[1] - value constraints are layered on top of structural and lexical constraints for a business document vocabulary
[[2] - so as not to disturb the structural and lexical constraints for the documents
]]
Traditional use of W3C XML Schema (XSD) enumerations to specify value lists is too restrictive
[[1] - ties the value validation to the structural and lexical validation in a single expression of the document constraints
[[2] - communities of users work with standardized expressions of document constraints
[2] - when business requirements need to be tailored, the structural expressions are tampered with
[2] - interoperability is promoted when the document constraint expressions are read-only and unchanged from the published standards
][1] - globally-declared information items have document-wide value constraints
[[2] - business rules for trading partners may require an information item to have different value constraints in different document contexts
]]
Emerging standards for the outboard expression of controlled vocabularies
[[1] - OASIS code list representation technical committee
[[2] - [http://www.oasis-open.org/committees/codelist]
][1] - OASIS genericode 1.0
[[2] - [http://docs.oasis-open.org/codelist/genericode]
[2] - an XML vocabulary for the expression of a list of values
][1] - OASIS context/value association using genericode (draft)
[[2] - [http://www.oasis-open.org/committees/document.php?document_id=29990]
[2] - an XML vocabulary for the expression of the association of XML document contexts with lists of values
[2] - useful for validation or user interface implementation or any other purpose
[2] - independent of the XML vocabulary of the documents being validated
[[3] - works in step with any structural validation technology (e.g. XSD, RELAX-NG, DTD)
]]]
Crane's Schematron-based validation using CVA using genericode: CVA2sch
[[1] - one way to use CVA using genericode files for validation
[[2] - there are many possible uses of genericode files without obligation to use this approach
][1] - other OASIS committees and companies using XSD are considering adopting this methodology
[1] - migrating to become part of an Apache project for Schematron
]
Crane's "Practical Code List Implementation" book details the methodology
[[1] - see "Book excerpts" at [http://www.CraneSoftwrights.com/links/trn-20090212.htm]
[1] - the methodology applies to any XML vocabulary, not just UBL
]
The UBL package includes a representative default set of controlled vocabularies
[[1] - the genericode XML vocabulary is used to create an instance of an enumeration of values
[[2] - meta data identifies the set of values
][1] - a snapshot of controlled vocabularies is included as genericode files
[[2] - [http://docs.oasis-open.org/ubl/os-UBL-2.0/cl/]
[2] - includes meta data for all UBL lists
[2] - the UBL 2.0 package uses genericode 0.4
[2] - the UBL 2.0 update package uses genericode 1.0
][1] - trading partners can agree on their own lists of values to use
[[2] - would include meta data to identify the custom lists
]]
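As a rough illustration of how a genericode file carries both list-level meta data and the enumeration of values, the following sketch reads a small hand-written genericode 1.0 document with the Python standard library. The list content, names and URIs are invented for illustration and are not taken from the UBL package; only the genericode element names and namespace are assumed to follow the OASIS genericode 1.0 vocabulary.

```python
import xml.etree.ElementTree as ET

# Namespace of the genericode 1.0 document element (child elements are unqualified).
GC_NS = "http://docs.oasis-open.org/codelist/ns/genericode/1.0/"

# Assumption: an invented, minimal list; not one of the UBL-supplied files.
SAMPLE_GC = """\
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <Identification>
    <ShortName>ExampleCurrency</ShortName>
    <LongName>Example currency list</LongName>
    <Version>0.1</Version>
    <CanonicalUri>urn:example:currency</CanonicalUri>
    <CanonicalVersionUri>urn:example:currency-0.1</CanonicalVersionUri>
  </Identification>
  <ColumnSet>
    <Column Id="code" Use="required">
      <ShortName>Code</ShortName>
      <Data Type="normalizedString"/>
    </Column>
    <Key Id="codeKey">
      <ShortName>CodeKey</ShortName>
      <ColumnRef Ref="code"/>
    </Key>
  </ColumnSet>
  <SimpleCodeList>
    <Row><Value ColumnRef="code"><SimpleValue>USD</SimpleValue></Value></Row>
    <Row><Value ColumnRef="code"><SimpleValue>GBP</SimpleValue></Value></Row>
  </SimpleCodeList>
</gc:CodeList>
"""


def read_code_list(xml_text, key_column="code"):
    """Return (list-level meta data, list of values) from a genericode document."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{GC_NS}}}CodeList":
        raise ValueError("not a genericode 1.0 CodeList document")
    identification = root.find("Identification")
    # Only simple text children are captured; nested items such as <Agency> are ignored here.
    metadata = {child.tag: (child.text or "").strip() for child in identification}
    values = [
        value.findtext("SimpleValue")
        for value in root.iter("Value")
        if value.get("ColumnRef") == key_column
    ]
    return metadata, values


metadata, values = read_code_list(SAMPLE_GC)
print(metadata["Version"], values)
```

Trading partners agreeing on a custom list would supply their own identification values in the same positions.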
The defaultCodeList.xsl stylesheet is an informative (non-normative) implementation of the default set of codes
[[1] - [http://docs.oasis-open.org/ubl/os-UBL-2.0/val/]
[1] - recall the validation scenarios [UBL document validation - Section 5.2.8 UBL document validation] and [Figure 5.2]
[1] - created using an early version of CVA2sch
]
Trading partners can exchange context/value association files and genericode files
[[1] - can choose to use the default set of controlled vocabularies "out of the box"
[1] - can choose to select a different set of vocabularies
[[2] - represents an agreement to conform to code lists separate from the agreement to conform to UBL structures
][1] - the files are tailored to the particular business process agreed upon between trading partners
[1] - the files form part of the formal trading partner agreement
[1] - each party can have an independent implementation of the validation that uses these declarative files
[[2] - implementation choices are particular to a trading partner environment
][1] - each party continues to use published, standardized and unmodified structural and lexical expressions
[1] - partners can also agree on various business rules constraining the values of data
[[2] - expressed as assertions that need to be true or false regarding content found in the UBL instances
]]
8.1 Controlled vocabulary overview
[> 9.][< 8.0][^^][^^^]
8.1.1 UBL use of controlled vocabularies
[> 8.1.2][> 9.][< 8.0][^^][^^^]
Some elements and attributes in UBL are governed by controlled vocabularies
[[1] - the committee provides sample validation for a subset of items
[1] - communities can prescribe their own collections of lists
[1] - trading partners can prescribe their own collections of lists
]
[Example 8-1: Sample instance highlighting some coded values
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:...:xsd:Invoice-2"
         xmlns:cac="urn:oasis:...:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:...:xsd:CommonBasicComponents-2">
  <cbc:ID>A00095678</cbc:ID>
  <cbc:CopyIndicator>false</cbc:CopyIndicator>
  <cbc:UUID>849FBBCE-E081-40B4-906C-94C5FF9D1AC3</cbc:UUID>
  <cbc:IssueDate>2005-06-21</cbc:IssueDate>
  <cbc:InvoiceTypeCode>SalesInvoice</cbc:InvoiceTypeCode>
  <cbc:Note>sample</cbc:Note>
  <cbc:TaxPointDate>2005-06-21</cbc:TaxPointDate>
  <cbc:DocumentCurrencyCode>GBP</cbc:DocumentCurrencyCode>
  <cac:OrderReference>
    <cbc:ID>AEG012345</cbc:ID>
    <cbc:SalesOrderID>CON0095678</cbc:SalesOrderID>
    <cbc:UUID>6E09886B-DC6E-439F-82D1-7CCAC7F4E3B1</cbc:UUID>
    <cbc:IssueDate>2005-06-20</cbc:IssueDate>
  </cac:OrderReference>
  ...
  <cac:AllowanceCharge>
    <cbc:ChargeIndicator>false</cbc:ChargeIndicator>
    <cbc:AllowanceChargeReasonCode>17</cbc:AllowanceChargeReasonCode>
    <cbc:MultiplierFactorNumeric>0.10</cbc:MultiplierFactorNumeric>
    <cbc:Amount currencyID="GBP">10.00</cbc:Amount>
  </cac:AllowanceCharge>
  ...
]
Example coded items in the above sample:
[[1] - <cbc:InvoiceTypeCode>
[[2] - sample values not provided by the UBL committee
][1] - <cbc:DocumentCurrencyCode> and currencyID=
[[2] - sample values provided by the UBL committee constrained by UN/CEFACT schema limitations
]]
Specifying a coded value in a UBL document can include instance-level meta data
[[1] - distinguishes a value as being from that list with matching list-level meta data
]
The instance-level meta data varies slightly for UN/CEFACT-defined unqualified data types
[[1] - inherited by those UBL information items derived from CCTS unqualified types
[1] - the core component value is the value of the element
[1] - the supplementary component value is an attribute of the element
[1] - the meta data values are in other attributes of the same element
[1] - for currencyID= of amounts
[[2] - currencyCodeListVersionID=
[[3] - mapped in CVA2sch to genericode <Version>
]][1] - for unitCode= of the <cbc:MeasureType> element
[[2] - unitCodeListVersionID=
[[3] - mapped in CVA2sch to genericode <Version>
]][1] - for unitCode= of the <cbc:QuantityType> element
[[2] - unitCodeListID=
[[3] - mapped in CVA2sch to genericode <Version>
][2] - unitCodeListAgencyID=
[[3] - mapped in CVA2sch to genericode <Agency><Identifier>
][2] - unitCodeListAgencyName=
[[3] - mapped in CVA2sch to genericode <Agency><LongName>
]]]
Consider how currency values are entered in a UBL instance
[[1] - e.g. <cbc:Amount currencyID="RON">10.00</cbc:Amount>
[[2] - specifies an amount of 10 Romanian new leu
[2] - no instance level meta data is included in the element specification
[2] - the recipient must make an assumption about which code list the value comes from
][1] - e.g. <cbc:Amount currencyID="RON" currencyCodeListVersionID="1951">10.00</cbc:Amount>
[[2] - first Romanian leu in 1867
[2] - Romanian new leu "RON" in 1947
[2] - in 1952 "RON" was replaced with "ROL" for Romanian leu
[2] - in 2005 "ROL" was replaced with "RON" for Romanian new leu
[2] - 1 RON(2005) is worth over 200,000 RON(1951)
]]
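The difference between the two amounts above can be seen by reading the instance-level meta data programmatically. This is a minimal sketch using the Python standard library; the namespace URI is a hypothetical placeholder because the sample elides the real UBL namespace, and the function name is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: the real UBL namespace URI is elided in the sample.
CBC = "urn:example:cbc"


def describe_amount(xml_text):
    """Extract the value, the currency code and any list version meta data."""
    elem = ET.fromstring(xml_text)
    return {
        "amount": elem.text,
        "currency": elem.get("currencyID"),
        # None means the sender supplied no list version, so the recipient
        # has to assume which version of the currency list was intended.
        "list_version": elem.get("currencyCodeListVersionID"),
    }


bare = f'<cbc:Amount xmlns:cbc="{CBC}" currencyID="RON">10.00</cbc:Amount>'
qualified = (
    f'<cbc:Amount xmlns:cbc="{CBC}" currencyID="RON" '
    'currencyCodeListVersionID="1951">10.00</cbc:Amount>'
)

print(describe_amount(bare))
print(describe_amount(qualified))
```

Both elements carry the same code and the same number; only the second tells the recipient which edition of the list gives "RON" its meaning.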
UBL-defined code list element meta data attributes for elements <cbc:????Code>:
[[1] - listID=
[[2] - mapped in CVA2sch to genericode <LongName @Identifier='listID'> or just the first <LongName>
][1] - listAgencyID=
[[2] - mapped in CVA2sch to genericode <Agency><Identifier>
][1] - listAgencyName=
[[2] - mapped in CVA2sch to genericode <Agency><LongName>
][1] - listName=
[[2] - mapped in CVA2sch to genericode first <LongName>
][1] - listVersionID=
[[2] - mapped in CVA2sch to genericode <Version>
][1] - listURI=
[[2] - mapped in CVA2sch to genericode <LocationUri>
][1] - listSchemeURI=
[[2] - mapped in CVA2sch to genericode <CanonicalVersionUri>
]]
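The attribute mappings listed above can be pictured as a lookup table from instance-level attributes to locations in the genericode <Identification> element. The sketch below is not CVA2sch itself: it assumes that any attribute supplied in the instance must equal the corresponding list-level value and that omitted attributes are not checked. The listID= mapping is left out for brevity, and the identification values are invented.

```python
import xml.etree.ElementTree as ET

# Instance attribute -> path inside genericode <Identification>, per the list above.
# "LongName" resolves to the first <LongName> because findtext() returns the first match.
ATTRIBUTE_TO_GENERICODE = {
    "listAgencyID": "Agency/Identifier",
    "listAgencyName": "Agency/LongName",
    "listName": "LongName",
    "listVersionID": "Version",
    "listURI": "LocationUri",
    "listSchemeURI": "CanonicalVersionUri",
}

# Assumption: invented list-level meta data for illustration only.
IDENTIFICATION = ET.fromstring("""\
<Identification>
  <ShortName>ExampleCode</ShortName>
  <LongName>Example code list</LongName>
  <Version>0.3</Version>
  <CanonicalUri>urn:example:code</CanonicalUri>
  <CanonicalVersionUri>urn:example:code-0.3</CanonicalVersionUri>
  <Agency>
    <LongName>Example Agency</LongName>
    <Identifier>EX</Identifier>
  </Agency>
</Identification>
""")


def metadata_mismatches(instance_attrs, identification):
    """Return the names of supplied attributes that disagree with the list."""
    mismatches = []
    for attribute, path in ATTRIBUTE_TO_GENERICODE.items():
        supplied = instance_attrs.get(attribute)
        if supplied is None:
            continue  # nothing claimed in the instance, nothing to compare
        declared = identification.findtext(path)
        if declared is None or declared.strip() != supplied:
            mismatches.append(attribute)
    return mismatches


print(metadata_mismatches({"listVersionID": "0.4"}, IDENTIFICATION))
```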
UBL-defined identifier element meta data attributes for elements <cbc:????ID>:
[[1] - schemeAgencyID=
[[2] - mapped in CVA2sch to genericode <Agency><Identifier>
][1] - schemeAgencyName=
[[2] - mapped in CVA2sch to genericode <Agency><LongName>
][1] - schemeName=
[[2] - mapped in CVA2sch to genericode first <LongName>
][1] - schemeVersionID=
[[2] - mapped in CVA2sch to genericode <Version>
][1] - schemeDataURI=
[[2] - mapped in CVA2sch to genericode <LocationUri>
][1] - schemeURI=
[[2] - mapped in CVA2sch to genericode <CanonicalVersionUri>
]]
Each code list has its own definition of list-level meta data values
[[1] - different versioning schemes
[1] - different wording of list titles and names
[1] - CVA2sch assumes the list-level meta data structures are genericode
]
8.1.2 UBL validation of controlled vocabularies
[> 8.1.3][> 9.][< 8.1.1][^][^^][^^^]
The supplied defaultCodeList.xsl provides code list conformance validation supporting the following:
[[1] - with the corresponding genericode file name in the UBL 2.0 delivery
[1] - UN/ECE Recommendation 19 Transport Mode Code
[[2] - cl/gc/default/TransportModeCode-2.0.gc
[2] - incorrectly labeled inside as "Recommendation 16"
][1] - UN/ECE Recommendation 20 Unit of Measure Codes
[[2] - cl/gc/cefact/UnitOfMeasureCode-2.0.gc
][1] - UN/ECE Recommendation 21 Packaging Type Code
[[2] - cl/gc/default/PackagingTypeCode-2.0.gc
][1] - UN/ECE Recommendation 24 Transportation Status Codes
[[2] - cl/gc/default/TransportationStatusCode-2.0.gc
][1] - UN/ECE 3155 Communication Address Code Qualifier
[[2] - cl/gc/default/ChannelCode-2.0.gc
][1] - UN/ECE 4461 Payment Means
[[2] - cl/gc/default/PaymentMeansCode-2.0.gc
][1] - UN/ECE 4465 Adjustment Reason Description
[[2] - cl/gc/default/AllowanceChargeReasonCode-2.0.gc
][1] - UN/ECE 8053 Equipment Type Code Qualifier
[[2] - cl/gc/default/TransportEquipmentTypeCode-2.0.gc
][1] - IANA_7_04 Binary Object MIME Code
[[2] - cl/gc/cefact/BinaryObjectMimeCode-2.0.gc
][1] - ISO 3166-1 Country Codes
[[2] - cl/gc/default/CountryIdentificationCode-2.0.gc
][1] - ISO 4217 Alpha Currency Codes
[[2] - cl/gc/cefact/CurrencyCode-2.0.gc
][1] - UBL chip codes
[[2] - cl/gc/default/ChipCode-2.0.gc
][1] - UBL document status codes
[[2] - cl/gc/default/DocumentStatusCode-2.0.gc
][1] - UBL latitude direction codes
[[2] - cl/gc/default/LatitudeDirectionCode-2.0.gc
][1] - UBL line status codes
[[2] - cl/gc/default/LineStatusCode-2.0.gc
][1] - UBL longitude direction codes
[[2] - cl/gc/default/LongitudeDirectionCode-2.0.gc
][1] - UBL operator codes
[[2] - cl/gc/default/OperatorCode-2.0.gc
][1] - UBL substitution codes
[[2] - cl/gc/default/SubstitutionStatusCode-2.0.gc
]]
8.1.3 Specifying code list conformance
[> 8.1.4][> 9.][< 8.1.2][^][^^][^^^]
Specifying, extending and restricting controlled vocabularies in XML documents
[[1] - context/value association declaratively ties a document context to sets of values
[[2] - i.e. "those enumerated values over there are to be used to validate the specified values for this particular document context"
][1] - an XML vocabulary is used to create an instance of a context/value association file
[[2] - W3C XPath is used to specify document context
[2] - a URI is used to point to an external expression of enumerated values
[2] - an association ties a document context to the sets of values for that context
]]
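The idea of tying a document context to sets of values can be sketched in a few lines of Python. This is an illustration of the concept only, not the CVA file format: the associations are held in a Python list rather than an XML file, the value lists are in-memory sets standing in for genericode files located by URI, the namespace is a hypothetical placeholder, and each context is split into an element path plus an optional attribute name because the standard library supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

NS = {"cbc": "urn:example:cbc"}  # hypothetical placeholder namespace

# Stand-ins for external lists of enumerated values (invented content).
VALUE_LISTS = {
    "currency": {"GBP", "USD"},
}

# Each association: (element path, attribute name or None, names of value lists).
ASSOCIATIONS = [
    (".//cbc:DocumentCurrencyCode", None, ["currency"]),
    (".//cbc:Amount", "currencyID", ["currency"]),
]


def find_violations(root):
    """Check every associated context against the union of its value lists."""
    violations = []
    for path, attribute, list_names in ASSOCIATIONS:
        allowed = set().union(*(VALUE_LISTS[name] for name in list_names))
        for elem in root.findall(path, NS):
            value = elem.get(attribute) if attribute else (elem.text or "").strip()
            if value is not None and value not in allowed:
                violations.append((path, attribute, value))
    return violations


SAMPLE = ET.fromstring("""\
<Invoice xmlns:cbc="urn:example:cbc">
  <cbc:DocumentCurrencyCode>GBP</cbc:DocumentCurrencyCode>
  <cbc:Amount currencyID="EUR">10.00</cbc:Amount>
</Invoice>
""")

print(find_violations(SAMPLE))
```

Changing which values are acceptable in a context means editing only the association or the lists; the structural description of the document is not touched.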
[Figure 8.1:
]
Validating the use of a controlled vocabulary
[[1] - recall the validation scenarios [UBL document validation - Section 5.2.8 UBL document validation] and [Figure 5.2]
[[2] - first-pass performs structural and lexical validation on the input instance
[2] - second-pass value validation implementation:
[2] - only when the first pass is successful does it make sense to do a second pass to perform value validation on the input instance
[[3] - structural validation ensures the information items are correctly found
[3] - lexical validation ensures the information items are correctly formed
]][1] - the UBL methodology prepares the second-pass validation artefact based on ISO/IEC 19757-3 Schematron
[[2] - this diagram shows the use of XSLT for the implementation of Schematron
[2] - other implementations of Schematron are available (e.g. Python)
]]
[Figure 8.2:
The diagram shows triangles and boxes in three different areas.
The area labeled "Definition" shows the "XML" labeled triangle titled "Code List Context Associations" and identified with
a circled "3", a set of "GC" labeled triangles titled "External Code List Expressions" and identified with a circled "4",
and a set of "SCH" labeled triangles titled "Business Rules" and identified with a circled "6". Each of these has an arrow
directed to a box labeled "UBL Methodology for Code List and Value Validation" in the area labeled "Preparation".
One arrow leaves this box to the "XSLT" labeled triangle titled "Assertion Validation Stylesheet" and identified with a circled
"2".
One arrow leaves this box to the "XSLT Process" labeled box in the area labeled "Processing". The other input to this box
is a set of "XML" labeled triangles titled "Document Instances Being Validated". The one output from this box is a set of
"Report" labeled parallelograms titled "Validation Reports".
]
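The ordering of the two validation passes described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: the first pass here is only a stand-in (a well-formedness check plus a test for required elements) for real structural and lexical validation such as XSD, the second pass uses plain Python sets rather than a Schematron-derived stylesheet, and the element names are used without namespaces to keep the example short.

```python
import xml.etree.ElementTree as ET


def first_pass(xml_text, required_paths):
    """Stand-in for structural and lexical validation of the instance."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return None, [f"not well-formed: {err}"]
    errors = [f"missing {path}" for path in required_paths if root.find(path) is None]
    return (None if errors else root), errors


def second_pass(root, allowed_by_path):
    """Value validation: each listed item must hold one of its allowed values."""
    errors = []
    for path, allowed in allowed_by_path.items():
        for elem in root.findall(path):
            value = (elem.text or "").strip()
            if value not in allowed:
                errors.append(f"{path}: {value!r} not allowed")
    return errors


def validate(xml_text, required_paths, allowed_by_path):
    """Run the second pass only when the first pass succeeds."""
    root, errors = first_pass(xml_text, required_paths)
    if errors:
        return "structure", errors
    errors = second_pass(root, allowed_by_path)
    return ("values", errors) if errors else ("ok", [])


doc = "<Invoice><DocumentCurrencyCode>EUR</DocumentCurrencyCode></Invoice>"
print(validate(doc, ["DocumentCurrencyCode"], {"DocumentCurrencyCode": {"GBP"}}))
```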
8.1.4 Context/value validation implementation
[> 9.][< 8.1.3][^][^^][^^^]
Two separate implementations of running code are used to effect the result:
[[1] - the "Methodology Stylesheets" box represents the CVA XSLT stylesheets supplied with this methodology to transform context/value
association files into a Schematron pattern
[1] - the "Schematron" box represents the particular implementation of Schematron being deployed by a user of this methodology
[[2] - the methodology package supplies an XSLT implementation of Schematron that exits with a non-zero return code when it reports
violations to the standard error port in a simple text format
[2] - alternative implementations of Schematron are publicly-available from [http://www.Schematron.com] including versions that report violations in a rich XML vocabulary for subsequent downstream processing
]]
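The reporting convention described above (simple text on the standard error port and a non-zero return code when violations are found) can be imitated in any environment. The following is a hypothetical Python sketch of that convention only; it is not the supplied XSLT implementation, and the function name and message layout are invented.

```python
import sys


def report_violations(violations, stream=None):
    """Write one text line per violation and return a process exit status."""
    stream = sys.stderr if stream is None else stream
    for location, message in violations:
        print(f"{location}: {message}", file=stream)
    # Zero signals a clean run; non-zero lets a calling script detect failure.
    return 1 if violations else 0


status = report_violations([])
print("exit status for a clean run:", status)
```

A command-line wrapper would typically finish with `sys.exit(report_violations(found))` so that batch scripts can branch on the result.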
[Figure 8.3: CVA implementation in Schematron
Three sets of triangles feed a box labeled "XSLT Process" at the left of a large diagram. The single "XML" labeled triangle
titled "Code List Context Associations" and identified with a circled "3" is one of the three sets. A set of "GC" labeled triangles
titled "External Code List Expressions" and identified with a circled "4" is another. A set of "XSLT" labeled triangles titled
"Association Stylesheet and Imported Fragments" is the third. This third one lists the following fragments, with the first
fragment in boldface font:
]
Recall [Figure 8.2]
Recall [Figure 5.1]
Context/value associations and external code list expressions are converted to Schematron
[[1] - Crane-UBL-genericode2Schematron.xsl (UBL elements and meta data)
[1] - Crane-NM-genericode2Schematron.xsl (no meta data)
]
Schematron patterns assembled into a complete Schematron schema
[[1] - schematron-ISO-assembly.xsl
]
Schematron schema translated into an XSLT stylesheet
[[1] - Message-Schematron-terminator.xsl
[[2] - a wrapper of the iso_schematron_skeleton.xsl implementation
]]
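To give a feel for the first conversion step, the sketch below turns one context and one list of values into a Schematron pattern. It is an assumption-laden illustration and does not reproduce the output of the Crane stylesheets: the pattern shape, the function name and the message text are invented, values are assumed to contain no apostrophes, and a complete schema would also need an `ns` declaration for any prefix used in the context. Only the ISO Schematron namespace URI and the `pattern`, `rule` and `assert` element names come from the standard.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace


def code_list_pattern(pattern_id, context, values):
    """Build a pattern asserting that the context item holds a listed value."""
    ET.register_namespace("sch", SCH)
    pattern = ET.Element(f"{{{SCH}}}pattern", {"id": pattern_id})
    rule = ET.SubElement(pattern, f"{{{SCH}}}rule", {"context": context})
    # One XPath comparison per value, joined with "or".
    test = " or ".join(f". = '{value}'" for value in values)
    assertion = ET.SubElement(rule, f"{{{SCH}}}assert", {"test": test})
    assertion.text = "Value must be one of: " + ", ".join(values)
    return ET.tostring(pattern, encoding="unicode")


print(code_list_pattern("currency", "cbc:DocumentCurrencyCode", ["GBP", "USD"]))
```

Patterns of this kind are what the assembly step would combine with separately authored business rules into a complete schema.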
[Figure 8.4: CVA implementation in Schematron and XSLT
Three sets of triangles feed a box labeled "XSLT Process" at the left of a large diagram. The single "XML" labeled triangle
titled "Code List Context Associations" and identified with a circled "3" is one of the three sets. A set of "GC" labeled triangles
titled "External Code List Expressions" and identified with a circled "4" is another. A set of "XSLT" labeled triangles titled
"Association Stylesheet and Imported Fragments" is the third. This third one lists the following fragments, with the first
fragment in boldface font:
[[1] - UBL-genericode2Schematron.xsl
[1] - {documentModel}-{externalFormat}2Schematron.xsl
[1] - {documentModel}-Metadata.xsl
[1] - {externalFormat}-CodeList.xsl
[1] - Constraints2Schematron.xsl
]
The output from the XSLT process is an "SCH" labeled triangle titled "Code List Pattern". This is input to a set of "SCH"
labeled triangles titled "Business Rules" and identified with a circled "5". These are input to another box labeled "XSLT
Process".
The other input to this box is an "XSLT" labeled triangle titled "Schematron-ISO-assembly.xsl", and the output is an "SCH" labeled triangle titled "Validation Rules". This is input to yet another box labeled "XSLT Process".
The other input to this box is an "XSLT" labeled triangle titled "Schematron-ISO-incomplete-text.xsl".
All of the shapes to this point are above a bracket titled "Preparation". The remaining shapes are all above a bracket titled
"Processing".
The output from the last process is an "XSLT" labeled triangle titled "Assertion Validation Stylesheet" and identified with
a circled "2". This is input to still yet another box labeled "XSLT Process".
The remaining input to this last XSLT process is a set of "XML" labeled triangles titled "Document Instances Being Validated".
The output from this last XSLT process is a set of "Report" labeled parallelograms titled "Validation Reports".
]
Recall [Figure 8.2]
Recall [Figure 5.1]
This is an accessible version of Crane's commercial training material.
The content has been specifically designed to assist screen reader software
in viewing the entire textual content. Figures are replaced with text
narratives.
Navigation hints are in square brackets:
[Tx.x] and [Fx.x] are textual representations of the applicability icons;
[digit] indicates list depth for nested lists;
[link [URL]] indicates the URL of a hyperlink if different than link;
[EXAMPLE] indicates an example listing of code;
[FIGURE] indicates the presence of a figure replaced by its description;
[>] jumps forward;
[<] jumps backward;
[^] jumps to start of the section;
[^^] jumps to the start of the chapter;
[^^^] jumps to the table of contents.
Suggestions for improvement are welcome:
[info@CraneSoftwrights.com]
Book sales: [http://www.CraneSoftwrights.com/links/trn-acc.htm]
Information: [http://www.CraneSoftwrights.com/links/info-acc.htm]
This content is protected by copyright and, as there are no means to protect
this accessible version from plagiarism, please do not make any
commercial edition available to others.
+//ISBN 1-894049::CSL::Presentation::UBL//DOCUMENT Practical Universal Business Language Deployment 2009-02-12 13:50UTC//EN
Practical Universal Business Language Deployment
Third Edition - 2009-02-12
ISBN 978-1-894049-23-8
Copyright © Crane Softwrights Ltd.