Crane-csv2gc - convert a CSV file into a simple genericode file


Table of Contents

1. Crane-csv2gc - convert a CSV file into a simple genericode file - Crane-csv2gc.xsl
1.1. Invocation
2. Read a text file into records - parseCSV.xsl
3. Index

1. Crane-csv2gc - convert a CSV file into a simple genericode file - Crane-csv2gc.xsl

Filename: Crane-csv2gc.xsl

$Id: Crane-csv2gc.xsl,v 1.6 2013/01/18 23:00:01 admin Exp $

This stylesheet converts a CSV file, whose first few rows follow the conventions described below, into a genericode file conforming to http://docs.oasis-open.org/codelist/cs-genericode-1.0/.

1.1. Invocation

Using a package such as SaxonHE http://saxon.sf.net, one can invoke this stylesheet along the lines of:

java -Xss2m -Xms128m -Xmx512m -jar saxon9he.jar -it:start Crane-csv2gc.xsl "csvFile=test.csv" >test.gc

It is advisable to validate the result against the genericode schema to ensure that no critical information is missing from the file.
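When many CSV files are converted, the invocation above can be scripted. The following is a minimal sketch only: the function names saxon_command and convert are hypothetical, and the jar and stylesheet file names are assumed to be in the current directory exactly as in the command line shown above. A schema validation step with a validator of your choice would follow the call to convert.

```python
import subprocess


def saxon_command(csv_file, jar="saxon9he.jar", stylesheet="Crane-csv2gc.xsl"):
    """Build the argument list for the documented Saxon invocation.

    The options mirror the command line in this section; the jar and
    stylesheet locations are assumptions.
    """
    return [
        "java", "-Xss2m", "-Xms128m", "-Xmx512m",
        "-jar", jar,
        "-it:start", stylesheet,
        "csvFile=" + csv_file,
    ]


def convert(csv_file, gc_file):
    """Run the transformation and write its standard output to gc_file."""
    with open(gc_file, "wb") as out:
        # check=True raises CalledProcessError if Saxon reports a failure
        subprocess.run(saxon_command(csv_file), stdout=out, check=True)
```

For example, convert("test.csv", "test.gc") corresponds to the command line shown above.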

csvFile as="xsd:string" required="yes" (xsl:param)

The filename of the input CSV information.

The first few records of this CSV file are critically important and are not found in typical CSV files exported from spreadsheet or database software. Only six additional records need to be placed before the start of the exported CSV content.

It is assumed that all language identifiers are "en" for English.

  • Record 1 - list-level metadata identification information, in the order below (items whose names end in "?" can be left empty to omit the metadata)
    • short name - e.g. PackagingTypeCode
    • long name - e.g. Packaging Type
    • list identifier - e.g. UN/ECE rec 20
    • version - e.g. 9e
    • canonical URI - e.g. urn:oasis:names:specification:ubl:codelist:gc:PackagingTypeCode
    • canonical version URI - e.g. urn:oasis:names:specification:ubl:codelist:gc:PackagingTypeCode-2.0-update
    • location URI? - e.g. http://docs.oasis-open.org/ubl/os-UBL-2.0-update/cl/gc/default/PackagingTypeCode-2.0.gc
    • alternate format location URI - e.g. http://www.unece.org/fileadmin/DAM/cefact/recommendations/rec21/rec21_Rev9e_2012.xls
    • alternate format location MIME type - e.g. application/vnd.ms-excel
    • agency name? - e.g. United Nations Economic Commission for Europe
    • agency identifier? - e.g. 6
  • Record 2 - column identifiers in the same order as the columns of the rows; append an asterisk "*" to the identifier of the key column, and add a single key identifier at the end; leave an identifier empty to ignore that column in the rows
    • e.g. status,code*,name,description,numeric,codeKey
  • Record 3 - column short names ordered corresponding to row columns, plus a single key short name at the end
    • e.g. Status,Code,Name,Description,NumericCode,CodeKey
  • Record 4 - column cardinality as "optional" or "required"
    • e.g. optional,required,required,optional,optional
  • Record 5 - column data types ordered corresponding to row columns
    • e.g. normalizedString,normalizedString,string,string,string
  • Record 6 - column long names ordered corresponding to row columns; these are typically the column headers of the table that were exported to make the CSV file; this line can be left entirely blank to skip column long names
    • e.g. Code,Name,Description Field
  • Records 7 and on - rows of coded values and associated value-level metadata (empty input values do not create output values); these are typically the rows of the table that were exported to make the CSV file
    • e.g. ,1A,"Drum, steel",,34
    • e.g. ,1B,"Drum, aluminium",,34
    • e.g. ,1D,"Drum, plywood",,34
    • e.g. ,1F,"Container, flexible",A packaging container of flexible construction.,93
    • e.g. X,SX,Set,,11 to 45,? To be removed from list?
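The six leading records can be placed in front of exported rows with any CSV-aware tool. The following is a minimal sketch using Python's standard csv module; the name build_input is hypothetical, the header values are the examples from the list above, and record 6 is left entirely blank as the description of that record permits.

```python
import csv
import io

# The six leading records, using the example values listed above.
HEADER_RECORDS = [
    # Record 1: list-level metadata, in the documented order
    ["PackagingTypeCode", "Packaging Type", "UN/ECE rec 20", "9e",
     "urn:oasis:names:specification:ubl:codelist:gc:PackagingTypeCode",
     "urn:oasis:names:specification:ubl:codelist:gc:PackagingTypeCode-2.0-update",
     "http://docs.oasis-open.org/ubl/os-UBL-2.0-update/cl/gc/default/PackagingTypeCode-2.0.gc",
     "http://www.unece.org/fileadmin/DAM/cefact/recommendations/rec21/rec21_Rev9e_2012.xls",
     "application/vnd.ms-excel",
     "United Nations Economic Commission for Europe", "6"],
    # Record 2: column identifiers, "*" marks the key column, key identifier last
    ["status", "code*", "name", "description", "numeric", "codeKey"],
    # Record 3: column short names, key short name last
    ["Status", "Code", "Name", "Description", "NumericCode", "CodeKey"],
    # Record 4: column cardinality
    ["optional", "required", "required", "optional", "optional"],
    # Record 5: column data types
    ["normalizedString", "normalizedString", "string", "string", "string"],
    # Record 6: column long names, left entirely blank here
    [],
]


def build_input(rows):
    """Return CSV text: the six header records followed by the data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # quotes only where needed
    writer.writerows(HEADER_RECORDS)
    writer.writerows(rows)
    return buf.getvalue()
```

The writer quotes a value such as "Drum, steel" because it contains a comma, which matches the quoting seen in the example rows.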

indent="yes" (xsl:output)

A genericode file is most easily reviewed when indented.

start match="/" (xsl:template)

All work is done in one pull-style template, walking over the CSV file.
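As a rough illustration of how records 2 through 5 described above can be read together, the sketch below pairs each column identifier with its short name, cardinality and data type. It is not the stylesheet's own logic: the function name column_definitions and the dictionary keys are hypothetical, and the handling of the "*" marker and of empty identifiers follows only the wording of the record descriptions.

```python
def column_definitions(ids, short_names, uses, types):
    """Combine records 2 to 5 into per-column definitions plus key information.

    ids and short_names each carry one extra trailing entry describing the key.
    Columns with an empty identifier are skipped.
    """
    key = {"id": ids[-1], "short_name": short_names[-1], "column": None}
    columns = []
    for i in range(len(ids) - 1):
        ident = ids[i]
        if ident == "":
            continue  # an empty identifier means the column is ignored
        if ident.endswith("*"):
            ident = ident[:-1]      # drop the key marker
            key["column"] = ident   # remember which column is the key
        columns.append({
            "id": ident,
            "short_name": short_names[i],
            "use": uses[i],
            "type": types[i],
        })
    return columns, key
```

With the example records above, the key refers to the column "code", and its identifier and short name are "codeKey" and "CodeKey".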

c:csv2gcContent (xsl:template)

The content template is separate in order to be exploited by an importing stylesheet.

Parameter csvFile as="xsd:string"

The name of the file to be parsed.

Parameter encoding as="xsd:string"

The encoding of the file to be parsed.

2. Read a text file into records - parseCSV.xsl

Path: Crane-ParseCSV.xsl

Filename: parseCSV.xsl

$Id: Crane-ParseCSV.xsl,v 1.9 2013/01/18 22:30:24 admin Exp $

This stylesheet defines a named template that can be invoked to translate CSV content into a set of <record> elements.

c:parseCSV (xsl:template)

Walk through the given CSV data or file, creating a record for each line and a field for each comma-separated field:

  <record>
    <field>...</field>
    <field>...</field>
    <field>...</field>
  </record>
  <record>
    <field>...</field>
    <field>...</field>
    <field>...</field>
  </record>
  <record>
    <field>...</field>
    <field>...</field>
    <field>...</field>
  </record>

This assumes any field with a comma, quote or end-of-line sequence is quoted and an embedded quote is escaped using two quotes.

Ref: http://en.wikipedia.org/wiki/Comma-separated_values
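The quoting convention assumed here is the same one handled by most CSV libraries. The sketch below uses Python's standard csv module to show the record and field structure that results; it illustrates the convention only and is not a port of this stylesheet. The function name parse_csv is hypothetical. Each inner list corresponds to one record and each string to one field.

```python
import csv
import io


def parse_csv(text):
    """Split CSV text into records, each a list of field strings."""
    # newline="" leaves end-of-line sequences inside quoted fields untouched
    return [row for row in csv.reader(io.StringIO(text, newline=""))]


# A quoted comma stays inside its field
print(parse_csv(',1A,"Drum, steel",,34\n'))
# An embedded quote is written as two quotes
print(parse_csv('a,"say ""hi"""\n'))
# A quoted end-of-line sequence does not end the record
print(parse_csv('a,"two\nlines"\nb,c\n'))
```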

Parameter csv as="xsd:string?"

The comma-separated values content to be parsed into records.

Parameter filename as="xsd:string?"

The name of a CSV file to open when CSV content is not supplied. When it is a relative URI it is resolved relative to the URI of this stylesheet. This argument is ignored if CSV content is explicitly supplied.

Parameter encoding as="xsd:string"

The encoding of the input file, defaulting to UTF-8.

Parameter xlate-eol as="xsd:boolean"

An indication that all quoted end-of-line sequences should be translated to a single space. The default is to preserve each end-of-line sequence as a single newline character.
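The effect of xlate-eol on a field value can be sketched as follows. This is an illustration under stated assumptions: the function name translate_eol is hypothetical, and an end-of-line sequence is assumed to be CR LF, a lone CR, or a lone LF, which the source does not spell out.

```python
import re

# Assumed end-of-line sequences: CR LF, lone CR, lone LF (CR LF tried first)
_EOL = re.compile(r"\r\n|\r|\n")


def translate_eol(value, xlate_eol=False):
    """Normalize end-of-line sequences found inside a quoted field value.

    With xlate_eol true each sequence becomes a single space; otherwise
    each sequence is preserved as a single newline character.
    """
    return _EOL.sub(" " if xlate_eol else "\n", value)
```

For instance, the value "Drum,\r\nsteel" becomes "Drum, steel" when xlate_eol is true and "Drum,\nsteel" otherwise.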

c:parseRecord (xsl:template)

Parse a record full of fields.

Parameter record as="xsd:string?"

The text of a record of fields.

Parameter xlate-eol as="xsd:boolean"

An indication of whether end-of-line sequences should be translated to a single space.

Parameter debug-field-number as="xsd:integer" tunnel="yes"

When non-zero, this will expose the running record number and the given field number for each record.

c:parseField (xsl:template)

Parse a field value. If the first or last character of the field is a quote, then it must have been a quoted field.

Parameter field as="xsd:string?"

The text of a single field.

Parameter xlate-eol as="xsd:boolean"

An indication of whether end-of-line sequences should be translated to a single space.
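The field rule described for c:parseField (a leading or trailing quote marks a quoted field, and two quotes stand for one embedded quote) can be sketched as below. The function name parse_field is hypothetical and the code is an approximation of the described behaviour, not a translation of the template.

```python
def parse_field(raw):
    """Return the value of one raw CSV field.

    A field whose first or last character is a quote is treated as quoted:
    the enclosing quotes are removed and doubled quotes are collapsed.
    """
    if raw[:1] == '"' or raw[-1:] == '"':
        inner = raw
        if inner.startswith('"'):
            inner = inner[1:]       # drop the opening quote
        if inner.endswith('"'):
            inner = inner[:-1]      # drop the closing quote
        return inner.replace('""', '"')
    return raw                      # unquoted fields are used as they are
```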

3. Index
