Crane Softwrights Ltd.

An environment for translating SGML to XML

The SX tool from James Clark, found in the http://www.jclark.com/sp package, converts SGML files to XML files but does not accommodate SDATA entities, NDATA entities, or an output document type declaration. The ISO entity sets for SGML found in http://xml.coverpages.org/ISOEnts.zip include SDATA entity definitions for all of the public text entities commonly used for SGML publishing at the ISO Central Secretariat. This conversion tool from Crane Softwrights Ltd., found in the http://www.CraneSoftwrights.com/resources/n2x directory, is a Python application that accommodates SDATA entity references, NDATA entity references, notation declarations, and a document type declaration. The current download file is n2x-20030301-0240.zip and is about 41 KB in size.
This Python program works with the output from James Clark's NSGMLS tool (found in the same SP package referenced above) to create XML syntax from the parsed content of an SGML document.
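The idea of building XML from parsed NSGMLS output can be illustrated with a minimal sketch. It is not the n2x.py source. It assumes the commonly documented nsgmls record types ("(" start tag, ")" end tag, "A" attribute, "-" data) and data escapes (\n, \\, \nnn octal, and \| around SDATA text). The names SDATA, decode and esis_to_xml are hypothetical, and the one-entry SDATA dictionary stands in for a real mapping.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical stand-in for a real SDATA mapping (bracketed name -> text).
SDATA = {"[euro]": "\u20ac"}


def decode(text):
    """Decode nsgmls data escapes and replace SDATA spans using SDATA."""
    out, name, i = [], None, 0   # name collects text between two \| markers
    while i < len(text):
        nxt = text[i + 1:i + 2]
        if text[i] != "\\":
            piece, i = text[i], i + 1
        elif nxt == "|":                     # start or end of an SDATA span
            if name is None:
                name = []
            else:
                # Assumption from the tool's help text: spaces are not
                # significant in SDATA entity references.
                key = "".join(name).replace(" ", "")
                out.append(SDATA.get(key, key))  # unknown names pass through
                name = None
            i += 2
            continue
        elif nxt == "n":                     # record end
            piece, i = "\n", i + 2
        elif nxt == "\\":                    # literal backslash
            piece, i = "\\", i + 2
        else:                                # \nnn octal character code
            piece, i = chr(int(text[i + 1:i + 4], 8)), i + 4
        (out if name is None else name).append(piece)
    return "".join(out)


def esis_to_xml(lines):
    """Turn a sequence of nsgmls output records into an XML string."""
    out, attrs = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        kind, rest = line[0], line[1:]
        if kind == "A":                      # attribute of the NEXT start tag
            name, _, tail = rest.partition(" ")
            atype, _, value = tail.partition(" ")
            if atype != "IMPLIED":
                attrs.append((name, decode(value)))
        elif kind == "(":
            pairs = "".join(" %s=%s" % (n, quoteattr(v)) for n, v in attrs)
            out.append("<%s%s>" % (rest, pairs))
            attrs = []
        elif kind == ")":
            out.append("</%s>" % rest)
        elif kind == "-":
            out.append(escape(decode(rest)))
        # Other record types (processing instructions, entities, notations,
        # the final "C" record) are ignored in this sketch.
    return "".join(out)


sample = ["ATYPE CDATA note", "(P", "-Cost: 5 \\|[euro  ]\\| & up", ")P", "C"]
xml = esis_to_xml(sample)
# xml == '<P TYPE="note">Cost: 5 \u20ac &amp; up</P>'
```

The sketch leaves out the document type declaration, notation declarations, NDATA entities and processing instructions, which are the features this tool is described as handling.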
T:\sgml>python n2x.py -?
Arguments: {options} {filename}
Options:
  - = use stdin (also true if the filename is absent)
  -? = print help
  -noi or -nointernal = suppress the internal declaration subset
  -non or -nonotation = suppress notation declarations
  -noe or -noerrors = suppress errors for NDATA entity references
  -nos or -noSDATA = suppress sdata entity replacement
  -nop or -noPI = suppress processing instruction preservation
  -l or -lower = ignore SGML declaration and use lower case for all
                 element type and attribute names and non-CDATA
                 attribute values
  -p:public-id or -public:public-id or "-p:public-id with spaces"
  -s:system-id or -system:system-id or "-s:system-id with spaces"
Assumes:
  - spaces are not significant in SDATA entity references
Input:
  - the output from the nsgmls tool found at http://www.jclark.com/sp
IMPORTANT:
  - the "-bUTF-8" runtime option must be specified for nsgmls
Example:
  nsgmls -bUTF-8 <test.sgm >test.nsgmls
  python n2x.py <test.nsgmls >test.xml
Note:
  - the options "-noi -nos -noe" reproduces the output from SX
T:\sgml>
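The two-step example in the help text can also be driven from a script. The following is a sketch only. It assumes that nsgmls is on the PATH, that n2x.py is in the current directory, and that "python" names an interpreter able to run n2x.py. The function names n2x_commands and convert are hypothetical, and only options listed in the help text are passed.

```python
import subprocess


def n2x_commands(options=()):
    """Return the argument lists for the nsgmls step and the n2x.py step."""
    # -bUTF-8 is required for nsgmls according to the n2x.py help text.
    return ["nsgmls", "-bUTF-8"], ["python", "n2x.py", *options]


def convert(sgml_path, xml_path, options=()):
    """Run nsgmls on sgml_path and feed its output to n2x.py via stdin."""
    parse_cmd, n2x_cmd = n2x_commands(options)
    with open(sgml_path, "rb") as src, open(xml_path, "wb") as dst:
        # check=True raises CalledProcessError on a non-zero exit status.
        parsed = subprocess.run(parse_cmd, stdin=src,
                                stdout=subprocess.PIPE, check=True)
        subprocess.run(n2x_cmd, input=parsed.stdout, stdout=dst, check=True)
```

For instance, convert("test.sgm", "test.xml", ["-noi", "-nos", "-noe"]) would request the SX-like output described in the note of the help text, provided both tools are installed as assumed.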
An SDATA mapping is supplied in the package in sdata.py and nothing need be done for n2x to work as delivered.
The Unicode entity resource from John Cowan found in the ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT file is a mapping of the ISO entity names to the Unicode character equivalents in XML.
The MathML entity resources mmlalias.ent and mmlextra.ent from David Carlisle found in the http://www.mathweb.org/cvsweb/cvsweb.cgi//omdoc/dtd/mathml/ directory are mappings of MathML names to Unicode character equivalents in XML.
The supplied sdata.py file was created from the enclosed XML entity definition files, which were derived from the files above and from the enclosed euro.txt file for the "[euro]" SDATA entity.
This program will also work with arbitrary SDATA definitions: create your own mapping file from bracketed SDATA names to hexadecimal Unicode code points, then use the supplied makesdata.py program in the sdata directory to create the required sdata.py source code imported by n2x.py. Note that, according to ISO SDATA entity value conventions, the name is surrounded by square brackets.
The following is the format of an entity definition XML file used to create the importable Python source (note that the set= attribute is only documentary regarding the name of an entity set from which the entity is obtained):
<!-- The Euro symbol is not provided for by any entity set. -->
<sdatas>
  <!-- entity definition for SDATA "[euro]"-->
  <sdata ref="[euro]" set="none" code="20AC">EURO SIGN</sdata>
</sdatas>
Note that multiple code points in the replacement code= attribute are separated by semicolons.
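To make the file format concrete, here is a minimal sketch of reading such an entity definition file into a Python dictionary. It is not the makesdata.py implementation, and the layout of the generated sdata.py is not shown here. The function name load_sdata and the "[example]" entry with two code points are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<!-- The Euro symbol is not provided for by any entity set. -->
<sdatas>
  <sdata ref="[euro]" set="none" code="20AC">EURO SIGN</sdata>
  <!-- hypothetical entry showing two code points separated by a semicolon -->
  <sdata ref="[example]" set="none" code="0041;0301">A WITH ACUTE</sdata>
</sdatas>"""


def load_sdata(xml_text):
    """Map each bracketed SDATA name to its Unicode replacement string."""
    mapping = {}
    for item in ET.fromstring(xml_text).iter("sdata"):
        # code= holds hexadecimal code points, separated by semicolons.
        codes = [c for c in item.get("code").split(";") if c.strip()]
        mapping[item.get("ref")] = "".join(chr(int(c, 16)) for c in codes)
    return mapping


table = load_sdata(SAMPLE)
# table == {"[euro]": "\u20ac", "[example]": "A\u0301"}
```

The set= attribute is ignored in this sketch because it is described as documentary only.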
Please note that this program was written to meet specific requirements. We are eager to hear about any problems you find, and about cases where the program is not general enough to meet other requirements.
If anyone has comments on this document or the tool itself, they are welcome to send them to resources@CraneSoftwrights.com.