Filename: Crane-csv2gc.xsl
$Id: Crane-csv2gc.xsl,v 1.6 2013/01/18 23:00:01 admin Exp $
This stylesheet converts a CSV file whose first few records follow the conventions described below into a genericode file conforming to http://docs.oasis-open.org/codelist/cs-genericode-1.0/.
Using a package such as SaxonHE (http://saxon.sf.net), one can invoke this stylesheet along the lines of:
java -Xss2m -Xms128m -Xmx512m -jar saxon9he.jar -it:start Crane-csv2gc.xsl "csvFile=test.csv" >test.gc
It is convenient to invoke a schema validation of the result to ensure critical information is not missed in the file.
csvFile as="xsd:string" required="yes" (xsl:param)
The filename of the input CSV information.
The first few records of this CSV file are critically important and are not found in typical CSV files exported from spreadsheet or database software. However, only six additional records need to be placed before the start of the exported CSV file.
It is assumed that all language identifiers are "en" for English.
- Record 1 - list-level metadata identification information, in order (items whose names end in "?" can be left empty to omit that metadata)
  - short name - e.g. PackagingTypeCode
  - long name - e.g. Packaging Type
  - list identifier - e.g. UN/ECE rec 20
  - version - e.g. 9e
  - canonical URI - e.g. urn:oasis:names:specification:ubl:codelist:gc:PackagingTypeCode
  - canonical version URI - e.g. urn:oasis:names:specification:ubl:codelist:gc:PackagingTypeCode-2.0-update
  - location URI? - e.g. http://docs.oasis-open.org/ubl/os-UBL-2.0-update/cl/gc/default/PackagingTypeCode-2.0.gc
  - alternate format location URI - e.g. http://www.unece.org/fileadmin/DAM/cefact/recommendations/rec21/rec21_Rev9e_2012.xls
  - alternate format location MIME type - e.g. application/vnd.ms-excel
  - agency name? - e.g. United Nations Economic Commission for Europe
  - agency identifier? - e.g. 6
- Record 2 - column identifiers ordered corresponding to row columns, adding an asterisk "*" at the end of the identifier that is the key, plus a single key identifier at the end; leave an identifier empty in order to ignore the column in the rows
  - e.g. status,code*,name,description,numeric,codeKey
- Record 3 - column short names ordered corresponding to row columns, plus a single key short name at the end
  - e.g. Status,Code,Name,Description,NumericCode,CodeKey
- Record 4 - column cardinality as "optional" or "required"
  - e.g. optional,required,required,optional,optional
- Record 5 - column data types ordered corresponding to row columns
  - e.g. normalizedString,normalizedString,string,string,string
- Record 6 - column long names ordered corresponding to row columns; these are typically the column headers of the table that were exported to make the CSV file; this line can be left entirely blank to skip column long names
  - e.g. Code,Name,Description Field
- Records 7 and on - rows of coded values and associated value-level metadata (empty input values do not create output values); these are typically the rows of the table that were exported to make the CSV file
  - e.g. ,1A,"Drum, steel",,34
  - e.g. ,1B,"Drum, aluminium",,34
  - e.g. ,1D,"Drum, plywood",,34
  - e.g. ,1F,"Container, flexible",A packaging container of flexible construction.,93
  - e.g. X,SX,Set,,11 to 45,? To be removed from list?
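The record layout above can be prepared mechanically. The following is a minimal Python sketch, not part of the stylesheet, that prepends the six header records to rows exported from a spreadsheet. The names `HEADER_RECORDS`, `prepend_header_records` and `key_column_index` are hypothetical; the values come from the examples above, with the optional location URI left empty and Record 6 left entirely blank.

```python
import csv
import io

# Six header records; values are the examples from this documentation.
HEADER_RECORDS = [
    # Record 1: list-level metadata, in order
    ["PackagingTypeCode", "Packaging Type", "UN/ECE rec 20", "9e",
     "urn:oasis:names:specification:ubl:codelist:gc:PackagingTypeCode",
     "urn:oasis:names:specification:ubl:codelist:gc:PackagingTypeCode-2.0-update",
     "",  # location URI? (left empty to omit the metadata)
     "http://www.unece.org/fileadmin/DAM/cefact/recommendations/rec21/rec21_Rev9e_2012.xls",
     "application/vnd.ms-excel",
     "United Nations Economic Commission for Europe", "6"],
    # Record 2: column identifiers, "*" marks the key, key identifier last
    ["status", "code*", "name", "description", "numeric", "codeKey"],
    # Record 3: column short names, key short name last
    ["Status", "Code", "Name", "Description", "NumericCode", "CodeKey"],
    # Record 4: column cardinality
    ["optional", "required", "required", "optional", "optional"],
    # Record 5: column data types
    ["normalizedString", "normalizedString", "string", "string", "string"],
    # Record 6: left entirely blank to skip column long names
    [],
]


def prepend_header_records(exported_rows, header_records=HEADER_RECORDS):
    """Return CSV text: the header records followed by the exported rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(header_records)
    writer.writerows(exported_rows)
    return out.getvalue()


def key_column_index(column_ids):
    """Return the position of the identifier flagged with a trailing '*'."""
    return next(i for i, name in enumerate(column_ids) if name.endswith("*"))


rows = [
    ["", "1A", "Drum, steel", "", "34"],
    ["", "1F", "Container, flexible",
     "A packaging container of flexible construction.", "93"],
]
csv_text = prepend_header_records(rows)
```

The resulting `csv_text` could then be saved and passed to the stylesheet as `csvFile`.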
A genericode file is most easily reviewed when indented.
start match="/" (xsl:template)
All work is done in one pull-style template, walking over the CSV file.
c:csv2gcContent (xsl:template)
The content template is separate in order to be exploited by an importing stylesheet.
Parameter
csvFile as="xsd:string"
The name of the file to be parsed.
Parameter
encoding as="xsd:string"
The encoding of the file to be parsed.
Path: Crane-ParseCSV.xsl
Filename: parseCSV.xsl
$Id: Crane-ParseCSV.xsl,v 1.9 2013/01/18 22:30:24 admin Exp $
This stylesheet defines a named template that can be invoked to translate CSV content into a set of <record> elements.
Walk through the given CSV data or file, creating a record for each line and a field for each comma-separated field:
<record>
  <field>...</field>
  <field>...</field>
  <field>...</field>
</record>
<record>
  <field>...</field>
  <field>...</field>
  <field>...</field>
</record>
<record>
  <field>...</field>
  <field>...</field>
  <field>...</field>
</record>
This assumes any field with a comma, quote or end-of-line sequence is quoted and an embedded quote is escaped using two quotes.
Ref: http://en.wikipedia.org/wiki/Comma-separated_values
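To illustrate the shape of the result, here is a minimal Python sketch that builds the same record/field structure. It relies on Python's standard `csv` module rather than the stylesheet's own parsing logic, so it shows only the expected output, not how the template works internally. The function name `csv_to_records` is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_records(csv_text):
    """Return a list of <record> elements, one per CSV record."""
    records = []
    # newline="" leaves quoted end-of-line sequences for the csv module to handle
    for row in csv.reader(io.StringIO(csv_text, newline="")):
        record = ET.Element("record")
        for value in row:
            ET.SubElement(record, "field").text = value
        records.append(record)
    return records


sample = 'code,name\n1A,"Drum, steel"\nX,"say ""hi"""\n'
for rec in csv_to_records(sample):
    print(ET.tostring(rec, encoding="unicode"))
```

The quoted comma stays inside a single field, and the doubled quotes collapse to one embedded quote, matching the quoting assumptions stated above.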
Parameter
csv as="xsd:string?"
The comma-separated values content to be parsed into records.
Parameter
filename as="xsd:string?"
The name of a CSV file to open when CSV content is not supplied. When it is a relative URI it is resolved relative to the URI of this stylesheet. This argument is ignored if CSV content is explicitly supplied.
Parameter
encoding as="xsd:string"
The encoding of the input file, defaulting to UTF-8.
Parameter
xlate-eol as="xsd:boolean"
An indication that all quoted end-of-line sequences should be translated to a single space. The default is to preserve end-of-line sequences as a single newline character.
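The precedence between the `csv` and `filename` parameters can be sketched as follows in Python. This is only an illustration under stated assumptions: the function `load_csv_text` and its `base_dir` argument are hypothetical, `base_dir` stands in for the location of the stylesheet against which relative names are resolved, and raising an error when neither input is given is an assumption, not documented behaviour.

```python
from pathlib import Path


def load_csv_text(csv=None, filename=None, encoding="utf-8", base_dir="."):
    """Return CSV text, preferring explicit content over a file name."""
    if csv is not None:
        # Explicit content wins; the file name is ignored
        return csv
    if filename is None:
        # Assumption: the documentation does not say what happens here
        raise ValueError("either csv content or a filename is required")
    path = Path(filename)
    if not path.is_absolute():
        # Relative names are resolved against the assumed base location
        path = Path(base_dir) / path
    return path.read_text(encoding=encoding)
```

For example, `load_csv_text(csv="a,b", filename="ignored.csv")` returns the supplied content without touching the file system.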
Parse a record full of fields.
Parameter
record as="xsd:string?"
The text of a record of fields.
Parameter
xlate-eol as="xsd:boolean"
An indication of whether any end-of-line sequences should be translated to a single space.
Parameter
debug-field-number as="xsd:integer" tunnel="yes"
When non-zero, this will expose the running record number and the given field number for each record.
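One way to picture the record-splitting step is the Python sketch below. It is an assumption about the general technique (splitting at commas that fall outside quotes), not a transcription of the template; the name `split_record` is hypothetical. Fields are returned raw, with their quotes still in place, so that a separate field-parsing step can handle them.

```python
def split_record(record):
    """Split one record's text into raw fields at commas outside quotes."""
    fields, current, in_quotes = [], [], False
    for ch in record or "":
        if ch == '"':
            # An escaped quote ("") toggles the state twice, so it nets out
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_record(',1A,"Drum, steel",,34'))
```

Empty fields are kept as empty strings so that column positions stay aligned with the column identifiers.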
Parse a field value. If the first or last character of the field is a quote, then it must have been a quoted field.
Parameter
field as="xsd:string?"
The text of a single field.
Parameter
xlate-eol as="xsd:boolean"
An indication of whether any end-of-line sequences should be translated to a single space.
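The field-parsing rule described above can be sketched in Python as follows. This is a minimal illustration under the stated quoting assumptions, not the template itself; the name `parse_field` and the `xlate_eol` keyword (mirroring the `xlate-eol` parameter) are hypothetical.

```python
import re


def parse_field(field, xlate_eol=False):
    """Unquote a raw field and normalise its end-of-line sequences."""
    text = field or ""
    if text.startswith('"') or text.endswith('"'):
        # A leading or trailing quote means the field was quoted:
        # drop one quote at each end, then unescape doubled quotes
        if text.startswith('"'):
            text = text[1:]
        if text.endswith('"'):
            text = text[:-1]
        text = text.replace('""', '"')
    # Each end-of-line sequence becomes one space or one newline
    return re.sub(r"\r\n|\r|\n", " " if xlate_eol else "\n", text)


print(parse_field('"Drum, steel"'))
```

With `xlate_eol=True`, a quoted value spanning two lines comes back as a single line joined by a space.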