n2x: An environment for translating SGML to XML

James Clark's SX tool, part of the SP package at http://www.jclark.com/sp, converts SGML files to XML files, but it does not accommodate SDATA entities, NDATA entities, or an output document type declaration.

The ISO entity sets for SGML found in http://xml.coverpages.org/ISOEnts.zip include SDATA entity definitions for all of the public text entities commonly used for SGML publishing at the ISO Central Secretariat.

This conversion tool from Crane Softwrights Ltd., found in the http://www.CraneSoftwrights.com/resources/n2x directory, is a Python application that accommodates SDATA entity references, NDATA entity references, notation declarations, and a document type declaration.

The current download file is n2x-20030301-0240.zip and is about 41 KB in size.

This Python program works with the output from James Clark's NSGMLS tool (found in the same SP package referenced above) to create XML syntax from the parsed content of an SGML document.
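
As a rough illustration of what creating XML syntax from parsed SGML content involves, the sketch below turns a few lines of nsgmls output into XML text. It is not the n2x implementation. It assumes the usual nsgmls line prefixes (A for an attribute, ( for a start tag, ) for an end tag, - for data), decodes only the \n and \\ escapes, and ignores SDATA, NDATA, notations and processing instructions, which are exactly the cases n2x exists to handle. The function name esis_to_xml is hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Only two of the nsgmls data escapes are decoded in this sketch.
_ESCAPES = {"\\n": "\n", "\\\\": "\\"}


def esis_to_xml(lines):
    """Convert a small subset of nsgmls output lines to an XML string."""
    out = []
    pending_attrs = []  # attribute lines precede the start tag they belong to
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cmd, rest = line[0], line[1:]
        if cmd == "A":
            name, attr_type, *value = rest.split(" ", 2)
            if attr_type != "IMPLIED":  # implied attributes carry no value
                pending_attrs.append((name, value[0] if value else ""))
        elif cmd == "(":
            attrs = "".join(f" {n}={quoteattr(v)}" for n, v in pending_attrs)
            out.append(f"<{rest}{attrs}>")
            pending_attrs = []
        elif cmd == ")":
            out.append(f"</{rest}>")
        elif cmd == "-":
            text = re.sub(r"\\[n\\]", lambda m: _ESCAPES[m.group(0)], rest)
            out.append(escape(text))
        # every other line type is silently skipped in this sketch
    return "".join(out)
```

Names arrive in upper case here because nsgmls normalizes them according to the SGML declaration.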

T:\sgml>python n2x.py -?

Arguments: {options} {filename}

Options:   -  = use stdin (also true if the filename is absent)
           -? = print help
           -noi or -nointernal = suppress the internal declaration subset
           -non or -nonotation = suppress notation declarations
           -noe or -noerrors = suppress errors for NDATA entity references
           -nos or -noSDATA = suppress sdata entity replacement
           -nop or -noPI = suppress processing instruction preservation
           -l or -lower = ignore SGML declaration and use lower case for all
                          element type and attribute names and non-CDATA
                          attribute values
           -p:public-id or -public:public-id or "-p:public-id with spaces"
           -s:system-id or -system:system-id or "-s:system-id with spaces"

Assumes:   - spaces are not significant in SDATA entity references

Input:     - the output from the nsgmls tool found at http://www.jclark.com/sp

IMPORTANT: - the "-bUTF-8" runtime option must be specified for nsgmls

Example:   nsgmls -bUTF-8 <test.sgm >test.nsgmls
           python n2x.py <test.nsgmls >test.xml

Note:      - the options "-noi -nos -noe" reproduce the output from SX
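
The two-step example in the help text can also be run as a single pipeline without an intermediate file. The following is a minimal sketch, assuming that nsgmls and a python interpreter able to run n2x.py are both on the PATH and that n2x.py is in the current directory. The function names build_commands and sgml_to_xml are hypothetical and not part of the package.

```python
import subprocess


def build_commands(n2x_options=()):
    """Return the argument lists for the nsgmls and n2x stages."""
    nsgmls_cmd = ["nsgmls", "-bUTF-8"]  # -bUTF-8 is required by n2x
    n2x_cmd = ["python", "n2x.py", *n2x_options]  # no filename: n2x reads stdin
    return nsgmls_cmd, n2x_cmd


def sgml_to_xml(sgml_path, xml_path, n2x_options=()):
    """Pipe sgml_path through nsgmls and n2x, writing XML to xml_path."""
    nsgmls_cmd, n2x_cmd = build_commands(n2x_options)
    with open(sgml_path, "rb") as src, open(xml_path, "wb") as dst:
        nsgmls = subprocess.Popen(nsgmls_cmd, stdin=src, stdout=subprocess.PIPE)
        n2x = subprocess.Popen(n2x_cmd, stdin=nsgmls.stdout, stdout=dst)
        nsgmls.stdout.close()  # so nsgmls is not left blocked if n2x exits early
        n2x_status = n2x.wait()
        nsgmls_status = nsgmls.wait()
    # Both exit codes are returned; the caller decides how to treat them.
    return nsgmls_status, n2x_status
```

For example, sgml_to_xml("test.sgm", "test.xml", ("-noi", "-nos", "-noe")) would request the SX-compatible output described in the note above.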


SDATA entity definitions

An SDATA mapping is supplied in the package in sdata.py and nothing need be done for n2x to work as delivered.

The Unicode entity resource from John Cowan found in the ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT file is a mapping of the ISO entity names to the Unicode character equivalents in XML.

The MathML entity resources mmlalias.ent and mmlextra.ent from David Carlisle are mappings of MathML names to Unicode character equivalents in XML.

The supplied sdata.py file was created using the enclosed XML entity definition files derived from the enclosed euro.txt file for the "[euro]" SDATA entity and the files above.

This program also works with arbitrary SDATA definitions. Create your own mapping file from bracketed SDATA names to hexadecimal Unicode code points, then run the supplied makesdata.py program in the sdata directory to generate the sdata.py source code imported by n2x.py. Note that, following ISO SDATA entity value conventions, the name is surrounded by square brackets.

The following is the format of an entity definition XML file used to create the importable Python source (note that the set= attribute is only documentary regarding the name of an entity set from which the entity is obtained):

The Euro symbol is not provided for by any entity set.
<!-- entity definition for SDATA "[euro]"-->
<sdata ref="[euro]" set="none" code="20AC">EURO SIGN</sdata>

Note that multiple code points in the replacement code= attribute are separated by semicolons.
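
To make the format concrete, the sketch below reads entity definitions like the one above into a Python dictionary that maps each bracketed SDATA name to its replacement string, splitting the code= attribute on semicolons. It is an illustration only and not the supplied makesdata.py. The wrapper element name entities, the "[example]" entry and the function name load_sdata_map are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def load_sdata_map(xml_text):
    """Map each sdata ref (e.g. "[euro]") to its Unicode replacement string."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for elem in root.iter("sdata"):
        # code= holds one or more hexadecimal code points separated by ";"
        codes = [c.strip() for c in elem.get("code", "").split(";") if c.strip()]
        mapping[elem.get("ref")] = "".join(chr(int(c, 16)) for c in codes)
    return mapping


# The <entities> wrapper and the second entry are hypothetical.
SAMPLE = """<entities>
<!-- entity definition for SDATA "[euro]"-->
<sdata ref="[euro]" set="none" code="20AC">EURO SIGN</sdata>
<sdata ref="[example]" set="none" code="0041;0301">HYPOTHETICAL ENTRY</sdata>
</entities>"""

sdata = load_sdata_map(SAMPLE)
```

The set= attribute and the element text are ignored here, in keeping with their documentary role.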


Please note that this program was written to meet specific requirements. We are anxious to receive details of any problems you find, and to hear from you if the program is not general enough to meet other requirements.


+1 (613) 489-0999 (Voice)



Last changed: $Date: 2006/12/27 19:57:02 $(UTC)