Quick start to Patlac::Xml2cpp


1   Introduction

While developing bioinformatics programs in C++, I had to code classes that contain and manipulate protein data. It quickly appeared to me that for each data member I had to write many trivial lines of code: for accessing the member, setting it, initializing, copying, destructing, writing, reading, debugging. All these lines were not only error prone but also boring, so I concluded that good code should have all its data classes generated automatically.

This is not an original idea, but it doesn't seem to be widely accepted, because none of the software I know has such classes generated automatically. Even programs that manipulate data described by a schema have manually written classes that cover only the part of the schema they effectively use.

Surfing the web (not enough maybe) I found some software doing code generation for data:

There is also software doing code generation for gui:

They offer lots of options to customize classes, but it is still impossible to obtain generated classes exactly the same as the ones I would have written myself. Worst of all, the code generators I tried didn't succeed with the schemas I'm interested in.

2   Rationale and limitations

Diagram of a good usage of class generation.

2.1   Reduction of the schema

Before generating the code, Patlac::Xml2cpp performs a reduction of the schema. This is done because xsd files just offer too many possibilities. Some of the useful concepts of xsd schemas are still on my todo list. But hard work has been done to ensure that the reduced schema is able to read, manipulate and write the data without loss.

One major exception is the <xs:sequence maxOccurs="unbounded"> of <xs:choice>, which is simplified to an xs:sequence where all elements have maxOccurs="unbounded". In that case the data are preserved but they might appear in a different order. For example, the <xs:annotation> elements in XMLSchema.xsd are all moved to the front of the file, losing their meaning. Also, <!-- comments --> are not preserved by the current generated classes. This means that useful comments should be placed in an element of the schema instead of between <!-- --> .

In the reduced schema, every xs:element (both xs:topLevelElement and xs:localElement) has a type attribute which refers to an xs:simpleType or an xs:topLevelComplexType. There is no local definition of xs:complexType.
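The loss of ordering can be pictured with a small sketch. It is not code from Patlac::Xml2cpp; the Child type and the regroup_by_name function are hypothetical names used only to show what happens when interleaved children are stored in one list per element name and written back in schema order.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One child as read from the document: (element name, text content).
// Hypothetical type, for illustration only.
typedef std::pair<std::string, std::string> Child;

// Store the children in one unbounded list per element name, then write
// them back list after list, in the order the names appear in the schema.
// Every value survives, but the interleaving between names is lost.
std::vector<Child> regroup_by_name(const std::vector<Child>& children,
                                   const std::vector<std::string>& schema_order) {
    std::map<std::string, std::vector<std::string>> lists;
    for (const Child& child : children)
        lists[child.first].push_back(child.second);

    std::vector<Child> rewritten;
    for (const std::string& name : schema_order)
        for (const std::string& value : lists[name])
            rewritten.emplace_back(name, value);
    return rewritten;
}
```

With the input annotation, element, annotation, the output is annotation, annotation, element: nothing is dropped, but an annotation that used to follow an element now precedes it.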

2.2   Naming classes and members

One particularly delicate issue for automatic code generation is avoiding name collisions. One way to limit possible collisions is to use a case convention: all classes have names beginning with a capital letter, and accessors have names beginning with a lowercase letter. Of course, customization is possible.

Patlac::Xml2cpp offers a very flexible way to handle name collisions. The xml options file allows the declaration of reserved names, which are replaced using a regular expression pattern. For instance, if the name of an xs:element is a reserved C/C++ word, its const accessor would have an illegal name. In this case, a replacement occurs which doubles a letter of the xs:element name.
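As a rough illustration of the idea, here is a minimal sketch using std::regex. The function name safe_name, the short list of reserved words and the choice of doubling the last letter are assumptions made for this example; the real rules live in the xml options file.

```cpp
#include <regex>
#include <set>
#include <string>

// Returns a name usable as an accessor. Reserved words get their last
// letter doubled (an assumed convention for this sketch); other names
// are returned unchanged.
std::string safe_name(const std::string& name) {
    // Deliberately short sample of reserved C++ words.
    static const std::set<std::string> reserved = {
        "class", "default", "namespace", "union"};
    if (reserved.count(name) == 0)
        return name;
    static const std::regex last_letter("(.)$");
    return std::regex_replace(name, last_letter, "$1$1");
}
```

An xs:element named class would then get an accessor named classs(), while an element named group keeps the accessor group().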

3   Examples and usages

3.1   XMLSchema.xsd

As a first proof of usability, Patlac::Xml2cpp is compiled using classes generated by itself. So the first test you could perform is to rebuild it using classes that you have just generated. To do so, first untar the source files into a working directory, say DIRBASE (I use DIRBASE=~/cpp):

[a@a ~]$ cd DIRBASE
[a@a cpp]$ tar xzvf patlac--xml2cpp-1.0.tar.gz

Before you compile, make sure that you have boost-devel, expat-devel, zlib-devel and google-ctemplate-devel installed. Prepare to be patient, this may take many minutes (Todo: give the build times on my home computer and on the complan).

[a@a cpp]$ cd patlac--xml2cpp-1.0
[a@a patlac--xml2cpp-1.0]$ ./configure
[a@a patlac--xml2cpp-1.0]$ make
[a@a patlac--xml2cpp-1.0]$ sudo make install

Note that you need to install the software before using it, since it will look for data files in the installation directory. If you are not a sudoer, you'll have to specify where you want things to be installed, and maybe add this location to your PATH and LD_LIBRARY_PATH environment variables. In this example, you may replace INSTALLDIR by ~/local:

[a@a patlac--xml2cpp-1.0]$ ./configure --prefix=INSTALLDIR --exec-prefix=INSTALLDIR
[a@a patlac--xml2cpp-1.0]$ make
[a@a patlac--xml2cpp-1.0]$ make install
[a@a patlac--xml2cpp-1.0]$ echo 'export PATH=$PATH:INSTALLDIR/bin' >> ~/.bashrc; source ~/.bashrc

As a first use, we'll generate the classes that Patlac::Xml2cpp uses to handle xsd schemas.

[a@a patlac--xml2cpp-1.0]$ xml2cpp INSTALLDIR/share/patlac/xml2cpp/xsd/XMLSchema.xsd ~/cpp/data_classes/xs/src --project cpp_classes 2>xml2cpp.log && tail -n1 xml2cpp.log

The first argument, INSTALLDIR/share/patlac/xml2cpp/xsd/XMLSchema.xsd, is the schema used for the classes. The second argument, ~/cpp/data_classes/xs/src, is the directory where the project will be generated. The third argument, --project followed by a project name such as cpp_classes_plus, specifies the kind of project to produce. This name must be the name of a Patlac:project entry in the options.xml file.

cpp_classes_plus implies that a .hpp and a .cpp file will be generated for each xs:complexType. These files declare and define the classes with their data members and member functions. An additional file is created for each class to define a function writing xml to a stream given as a template argument. A single additional file is created to read an xml file.

The new file xml2cpp.log now contains some notes about types (Todo: describe warn.log). Most of these notes aren't harmful, but it's always a good idea to read them. The last line should say Project successfully generated.

There are lots of generated files. You can compare these new files with those included in this package.

[a@a patlac--xml2cpp-1.0]$ ls ~/cpp/data_classes/xs/src/*.[ch]pp | while read file; do diff $file src/data_classes/xs/; echo $file; done > diff.txt

If everything goes well, the only differences should concern the date.

3.2   uniprot and uniref

The uniprot databank contains annotated information about proteins. It is available as one huge xml file. To manipulate this file with a DOM parser, one must first split it into many small ones. Using classes generated by Patlac::Xml2cpp and the provided SaxMagiqueIterator library, you simply iterate through the nodes of the file. As a bonus, you benefit from an API simpler than DOM.

In this example, we make a program that creates a fasta file from a uniprot one, making special entries for sequence variants. First generate the classes from uniprot.xsd [3] in a new project directory.

[a@a patlac--xml2cpp-1.0]$ xml2cpp INSTALLDIR/share/patlac/xml2cpp/xsd/uniprot.xsd ~/cpp/data_classes/uniprot --project cpp_libtool 2>xml2cpp.log && tail -n1 xml2cpp.log

Here the kind of project is cpp_libtool. In this kind of project, all the files generated by a project of kind cpp_classes are generated into the subdirectory src of a complete gnu/gettext/libtool project. Everything is done; we already have a packageable project. Now suppose you have an up-to-date version of the autotools (aclocal, automake, autoconf, autoreconf [5], libtool):

[a@a patlac--xml2cpp-1.0]$ cd ~/cpp/data_classes/uniprot
[a@a uniprot]$ autoreconf && ./configure && make

If all this works, you can test the xml parser and writer on the example files provided (do not try with the complete uniprot databank, because this test program loads the entire file):

[a@a uniprot]$ cd test
[a@a test]$ ./test_uniprot INSTALLDIR/share/patlac/xml2cpp/test_files/P02754.xml

You should now have a copy of the file P02754.xml, differing slightly in indentation, in the current directory. When you are ready to continue, install the uniprot classes (refer to the XMLSchema.xsd section of this documentation if you don't have admin access):

[a@a uniprot]$ sudo make install

uniref <http://beta.uniprot.org/docs/uniref.xsd>

[a@a patlac--xml2cpp-1.0]$ xml2cpp INSTALLDIR/share/patlac/xml2cpp/xsd/uniref.xsd ~/cpp/data_classes/uniref --project cpp_libtool 2>xml2cpp.log && tail -n1 xml2cpp.log

We want to create an application that uses our new library to iterate through a uniprot file and generate a fasta file. Iterating means that we don't want to load the entire file into memory, but only one entry at a time. Let Patlac::Xml2cpp generate the base of the project. In fact, it will copy a kind of generic gnu project into the output directory and change some lines to link with the newly created library. For this purpose, we add a project node to our xml options file. This node provides nominal information about the project. This project will be named uniprot2fasta:

[a@a uniprot]$ xml2cpp INSTALLDIR/share/patlac/xml2cpp/xsd/uniprot.xsd ~/cpp/uniprot2fasta --project uniprot2fasta 2>xml2cpp.log && tail -n1 xml2cpp.log
[a@a uniprot]$ cd ~/cpp/uniprot2fasta
[a@a uniprot2fasta]$ autoreconf && ./configure && make
[a@a uniprot2fasta]$ cd src
[a@a src]$ ./uniprot2fasta INSTALLDIR/share/patlac/xml2cpp/test_files/P02754.xml

This new application is the test application generated with the uniprot library. It loads the input file into memory, then rewrites it in the current directory. After you have tried it (or not), just copy the supplied files into the src directory:

[a@a src]$ cp INSTALLDIR/share/patlac/xml2cpp/project/uniprot2fasta/*.cpp .
[a@a src]$ make
[a@a src]$ ./uniprot2fasta INSTALLDIR/share/patlac/xml2cpp/test_files/P02754.xml > P02754.fasta

3.3   mzXML and mzdata

Mass spectrometers are used in biology to identify molecules such as proteins. Typically, proteins are fragmented into small pieces, called peptides, which are separated by a chromatograph before going into the mass spectrometer. The mass spectrometer isolates the peptides and fragments them again. It then measures the mass of the peptide fragments and outputs its results into a huge data file. There are as many file formats as there are mass spectrometers, so bioinformaticians quickly felt the need for a unique format. In fact, many people feeling this need have created many unique formats, which are therefore no longer unique. One of these is called mzXML, another is called mzdata. Let's try Patlac::Xml2cpp on these schemas:

[a@a patlac--xml2cpp-1.0]$ xml2cpp INSTALLDIR/share/patlac/xml2cpp/xsd/mzXML_3.0.xsd ~/cpp/data_classes/mzXML --project cpp_libtool
[a@a patlac--xml2cpp-1.0]$ xml2cpp INSTALLDIR/share/patlac/xml2cpp/xsd/mzdata.xsd ~/cpp/data_classes/mzdata --project cpp_libtool

These schemas have a special feature: spectral data are base64 encoded. In both cases, the decoding process cannot be guessed by Patlac::Xml2cpp; it depends on other data written in the xml files, in a way not specified by the w3c. There are many ways to solve this issue. The simplest is to put the base64 encoded data into a string and supply a manually written non-member function get_spectrum in a separate file. A call to get_spectrum decodes the base64 and returns the m/z and intensity values in two separate vectors.

Question: Why should the manually written function get_spectrum be a non-member? Answer: The golden rule with auto-generated stuff is to never modify it manually. Very far away, in another country, some guys are working on a new draft of your preferred schema (see mzData). When it is ready, you'll have to regenerate your data classes, parser and writer. The last thing you want is to have to remember what you changed manually and do it again. You might also regenerate your classes because of a new version of your class template files (.tpl).

mzXML also has a very special feature, a sha1:
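A possible shape for such a helper is sketched below. It is not the function shipped with the package. It assumes the payload is a series of (m/z, intensity) pairs stored as 32-bit IEEE 754 floats in big-endian byte order; real files announce their precision and layout in other attributes, which this sketch ignores. The helper names decode_base64 and read_be_float are made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Minimal base64 decoder; padding, whitespace and any other character
// outside the alphabet are skipped.
std::vector<unsigned char> decode_base64(const std::string& text) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<unsigned char> bytes;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (char c : text) {
        const std::string::size_type value = alphabet.find(c);
        if (value == std::string::npos)
            continue;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            bytes.push_back(static_cast<unsigned char>((buffer >> bits) & 0xFF));
        }
    }
    return bytes;
}

// Reads four big-endian bytes as a 32-bit float (assumes float is IEEE 754).
float read_be_float(const unsigned char* p) {
    const std::uint32_t raw = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                              (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
    float value;
    std::memcpy(&value, &raw, sizeof value);
    return value;
}

// Non-member helper: appends the decoded pairs to two separate vectors.
void get_spectrum(const std::string& encoded,
                  std::vector<float>& mz, std::vector<float>& intensity) {
    const std::vector<unsigned char> bytes = decode_base64(encoded);
    for (std::size_t i = 0; i + 8 <= bytes.size(); i += 8) {
        mz.push_back(read_be_float(&bytes[i]));
        intensity.push_back(read_be_float(&bytes[i + 4]));
    }
}
```

The helper takes only a string, so it never needs to touch the generated classes and survives any regeneration.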

The sha1-sum is calculated from the beginning of the document to the end of the opening tag of
the sha1 element (i.e. <sha1>).

The sha1 has to be calculated after the rest of the document has been written. Of course this caprice is not part of the Patlac::Xml2cpp core; it has to be handled by the xml options file and the tpl template files.

For this, we add some special sections to the write_xml.tpl file. These special sections should be as generic as possible. To allow these sections to be shown specifically for msRun's write_xml function, we need to use a new element of the xml options file, a complexType_dictionary. A complexType_dictionary is like a simpleType_dictionary, but its elements are applied to the COMPLEXTYPE_SECTION of the complexType whose name equals the xsd_type of the complexType_dictionary.

3.4   pepXML and protXML

With pepXML and protXML a new feature appears that is not easy to handle: xs:any. The pepx:analysis_summary node can own children of any type. The C++ class that corresponds to this node should be able to own instances of any class. While this is neither easy nor natural, it is still possible and will be available in the next release of the software.

For files not using pepx:analysis_summary, like those created by trans_proteomic_pipeline when converting from other proteomics result files, the generated files work well. In fact, autogenerated pepXML classes would be a cleaner way to do these conversions.

3.5   Bioml and Gaml for X!Tandem

X!Tandem is open source software intended to identify peptides from mass spectra. The software generates bioml files, for which no xsd schema is available on the web. In fact, there is a dtd, but it does not validate the files. In addition, bioml files incorporate elements from the gaml schema. Here again, the schema available on the Internet does not validate the output files of the software.

In this case, the best approach is to use an xml2xsd program (several exist, but none is perfect) which creates the most restrictive schema validating some example files. A manual inspection is then advisable to clean up the resulting schema. This has been done; I called the schemas bioml_for_tandem.xsd and gaml_for_tandem.xsd. These schemas bring two interesting new features.

First, a content node from gaml_for_tandem.xsd is a list of integers instead of a raw string. In this release of Patlac::Xml2cpp, an xs:simpleType defined by an xs:list is not handled by a particular method. We have to add a simpleType_dictionary to the xml options file. Let's create a new project named gaml_intergerList and give it the simpleType_dictionary. Another project named gaml will have gaml_intergerList as a subproject.

Serialization functions for integerList are supplied in a separate file, gaml_for_tandem_utils.hpp. There is no ethical problem in adding a file to your generated project: if you regenerate the project, additional files will not be removed. However, you must not manually edit your classes to include this header; that is why the simpleType_dictionary for integerList adds a section dictionary to activate an INCLUDE_SECTION.
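For readers who want to picture what such serialization functions look like, here is a minimal sketch. It is not the content of gaml_for_tandem_utils.hpp; the names IntegerList, read_integer_list and write_integer_list are assumptions. It relies only on the fact that an xs:list is a whitespace-separated sequence of items.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<int> IntegerList;  // hypothetical C++ type for the xs:list

// Parses a whitespace-separated list of integers; parsing stops at the
// first token that is not an integer.
IntegerList read_integer_list(const std::string& text) {
    IntegerList values;
    std::istringstream in(text);
    int value;
    while (in >> value)
        values.push_back(value);
    return values;
}

// Writes the integers back, separated by single spaces.
std::string write_integer_list(const IntegerList& values) {
    std::ostringstream out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i != 0)
            out << ' ';
        out << values[i];
    }
    return out.str();
}
```

Reading and then writing normalizes the whitespace (newlines and repeated spaces become single spaces) but keeps every value in order.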

The second interesting new feature is that one schema ( bioml_for_tandem.xsd ) uses elements from another schema ( gaml_for_tandem.xsd ).

In this special case, it would have been easier to give up the split and to merge the gaml schema directly into the bioml one. For the benefit of illustration, we will keep these schemas distinct and try to generate their classes. To make things more fun, suppose we're not sudoers and that we install stuff into ~/local instead of /usr/local. We must begin with gaml:

[a@a patlac--xml2cpp-1.0]$ xml2cpp ~/local/share/patlac/xml2cpp/xsd/gaml_for_tandem.xsd ~/cpp/data_classes/gaml_for_tandem --project
[a@a patlac--xml2cpp-1.0]$ cd ~/cpp/data_classes/gaml_for_tandem
[a@a gaml_for_tandem]$ autoreconf && ./configure --prefix=~/local --exec-prefix=~/local

If we run make right now, it will complain about the lack of gaml_for_tandem_utils.hpp, so we copy it first:

[a@a gaml_for_tandem]$ cp ~/local/share/patlac/xml2cpp/project/gaml/gaml_for_tandem_utils.hpp ~/cpp/data_classes/gaml_for_tandem/src
[a@a gaml_for_tandem]$ make && make install

After the installation of the gaml classes, we can create and compile the bioml classes.

[a@a patlac--xml2cpp-1.0]$ xml2cpp ~/local/share/patlac/xml2cpp/xsd/bioml_for_tandem.xsd ~/cpp/data_classes/bioml_for_tandem --project
[a@a patlac--xml2cpp-1.0]$ cd ~/cpp/data_classes/bioml_for_tandem
[a@a bioml_for_tandem]$ autoreconf && ./configure --prefix=~/local --exec-prefix=~/local

Here is an illustration of a minimal usage of the newly generated classes. Suppose we want to know the ids of all bioml::group with type="model". These ids are integers that will be put into a vector. This will be done using libxml++ first and Patlac::Xml2cpp autogenerated classes next.

Using libxml++

1 xmlpp::DomParser parser;
2 xmlpp::Document* docBioml;
3 parser.parse_file( strBioml.data() ); docBioml = parser.get_document();
4 std::vector< int > vIds;
5 xmlpp::NodeSet mModels = docBioml->get_root_node()->find( "//group[@type=\"model\"]" );
6 for( itModel = mModels.begin(); itModel != mModels.end(); ++itModel )
7   vIds.push_back( atoi( dynamic_cast<xmlpp::Element*>( *itModel )->get_attribute( "id" )->get_value().data() ) );

Using Patlac::Xml2cpp autogenerated classes

1 BiomlSaxParseur parseur;
2 bioml::Bioml mBioml;
3 parseur( mBioml, strBioml );
4 std::vector< int > vIds;
5 bioml::Group::ListPtr mModels = mBioml.groups()( bind( &bioml::Group::model, _1 ) == "model" );
6 for( itModel = mModels.begin(); itModel != mModels.end(); ++itModel )
7  vIds.push_back( ( *itModel )->id() );

This minimal example illustrates similarities and differences between the two paradigms. Lines 1, 2 and 3 are similar: instantiation of a parser and a data receptacle, followed by the actual parsing. Line 5 shows two different ways to get the set of groups whose attribute type="model". Both methods have syntactic particularities. libxml++ uses an xpath expression with C/C++ escaping. The Patlac::Xml2cpp classes need a predicate. It might be an instance of a class that defines a Boolean operator()( const bioml::Group& ), or a pointer to a function. We chose here to use a boost::lambda expression. Both methods (libxml++ and Patlac::Xml2cpp) return a container filled with pointers to the requested groups. The seventh line shows a simple attribute access. It is excessively complicated with libxml++ and would be much easier with perl libXML.
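The predicate mechanism can be sketched independently of the generated code. In the sketch below, Group, GroupList and IsModel are simplified stand-ins written for this example, not the real bioml classes; only the idea of a template operator() that returns pointers to the matching elements is taken from the text above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a generated class.
struct Group {
    int id;
    std::string type;
};

// Simplified stand-in for a generated list type.
class GroupList {
public:
    typedef std::vector<const Group*> ListPtr;

    void push_back(const Group& group) { items_.push_back(group); }

    // Returns pointers to the elements for which the predicate is true.
    // The pointers stay valid as long as the list is not modified.
    template <class Predicate>
    ListPtr operator()(Predicate keep) const {
        ListPtr selected;
        for (std::size_t i = 0; i < items_.size(); ++i)
            if (keep(items_[i]))
                selected.push_back(&items_[i]);
        return selected;
    }

private:
    std::vector<Group> items_;
};

// Predicate written as a class with a Boolean operator().
struct IsModel {
    bool operator()(const Group& group) const { return group.type == "model"; }
};
```

A call such as groups( IsModel() ) then plays the role of line 5 in the second listing, with a function object in place of the boost::lambda expression.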

4   Customization

4.1   File description by template and xml options file

To allow maximum flexibility, Patlac::Xml2cpp is based on template files. The choice of the google::ctemplate library to handle them is mainly based on a desire to separate code from templates. Refer to google-ctemplate for a tutorial about their syntax.

  • classe_hpp.tpl describes the declaration of the class and classe_cpp.tpl the definition of the non-inlined functions.
  • write_xml_hpp.tpl describes the function which exports the class to an xml file.
  • sax_parser.cpp.tpl and sax_parser.hpp.tpl describe the saxparser. The generated saxparsers use the SaxMagique library, which is the simplest possible way to instantiate a saxparser. On the other hand, it is really hard to compile and creates huge executables. It should be rewritten to generate a saxparser that does not use the SaxMagique library (Todo).

A fundamental advantage of Patlac::Xml2cpp over other code generators is that it knows nothing about the particular files it will generate. Every .tpl file is treated the same way. Values are set and sections are turned on depending only on the specifications of the schema and the user options. Furthermore, it is possible to use Patlac::Xml2cpp to generate classes for another language, without any modification to the source code of this program.

To edit or create a .tpl file, one only needs to know about xsd schemas and their reduced form inferred by Patlac::Xml2cpp, the meaning of the tags filled by Patlac::Xml2cpp, and the xml options file.

4.2   Tags and sections set by Patlac::Xml2cpp

There are two kinds of tags and sections set by Patlac::Xml2cpp. The first group depends only on the schema and is hard-coded. The second group is set by the user in the xml options file. All these tags should have self-explanatory names; the principal ones are mentioned here.

4.2.1   Global tags and sections

Patlac::Xml2cpp sets some global tags that appear in each generated file.

  • XML2CPP_VERSION is set to the current version of Patlac::Xml2cpp.
  • SCHEMA is set to the name of the file containing the xsd schema.
  • DATE is set to the date of execution of Patlac::Xml2cpp.
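The effect of these value tags can be imitated with a few lines of plain C++. This sketch deliberately does not use the google::ctemplate API; expand_tags is a made-up function that only mimics how a {{TAG}} marker is replaced by its value.

```cpp
#include <map>
#include <string>

// Replaces every {{TAG}} marker found in `text` by the value registered
// for TAG. Markers without a registered value are left untouched.
std::string expand_tags(std::string text,
                        const std::map<std::string, std::string>& values) {
    for (const auto& entry : values) {
        const std::string marker = "{{" + entry.first + "}}";
        std::string::size_type pos = 0;
        while ((pos = text.find(marker, pos)) != std::string::npos) {
            text.replace(pos, marker.size(), entry.second);
            pos += entry.second.size();  // continue after the inserted value
        }
    }
    return text;
}
```

A header line of a template such as "// Generated by xml2cpp {{XML2CPP_VERSION}} from {{SCHEMA}} on {{DATE}}" would come out with the three markers replaced by the version, the schema file name and the date of the run.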

4.2.2   ComplexType sections

In the reduced form of the schema, any xs:element has a corresponding xs:topLevelComplexType. In the xml options file, some template_file entries have domain='global'. For each of these global template_file entries one file will be generated, and in those files one COMPLEXTYPE_SECTION will be unrolled for each xs:topLevelComplexType. Each section is then filled like a global template_file, with additional content that depends on the particular xs:topLevelComplexType.

In the xml options file, other template_file entries have domain='complexType'. For each of these complexType template_file entries, one file will be generated for every xs:topLevelComplexType. Those files will then be filled exactly like the COMPLEXTYPE_SECTION of a global template_file, except that COMPLEXTYPE_SECTION itself will not appear.

In those COMPLEXTYPE_SECTION and complexType template_file:

  • COMPLEXTYPE_ANNOTATION_SECTION section is activated if the xs:complexType has an xs:annotation. In a COMPLEXTYPE_ANNOTATION_SECTION:
    • DOCUMENTATION_SECTION is unrolled for each xs:documentation node. In a DOCUMENTATION_SECTION:
      • DOCUMENTATION takes the value of the content of the xs:documentation node.
      • LANG takes the value of the xml:lang attribute.
  • HAS_CHILDS section is activated if the complexType has an xs:element or an xs:any.
  • NB_CHILDS is set to the number of xs:element.
  • HAS_OPTIONAL_CHILDS section is activated if the complexType has optional children.
  • NB_OPTIONAL_CHILDS is set to the number of optional children.
  • HAS_REQUIRED_CHILDS section is activated if the complexType has xs:element with minOccurs="1" and maxOccurs="1".
  • NB_REQUIRED_CHILDS is set to the number of xs:element with minOccurs="1" and maxOccurs="1".
  • HAS_ATTRIBUTES section is activated if the complexType has xs:attribute.
  • NB_ATTRIBUTES is set to the number of xs:attribute.
  • HAS_OPTIONAL_ATTRIBUTES section is activated if the complexType has xs:attribute with use not set to required.
  • NB_OPTIONAL_ATTRIBUTES is set to the number of xs:attribute with use not set to required.
  • HAS_REQUIRED_ATTRIBUTES section is activated if the complexType has xs:attribute with use="required".
  • NB_REQUIRED_ATTRIBUTES is set to the number of xs:attribute with use="required".
  • EXTENSION_SECTION section is activated if the xs:topLevelComplexType has an xs:extension. In an EXTENSION_SECTION:
    • XSD_TYPE is the value of the base attribute of the xs:extension.
    • TYPE is the type of the extension in the generated code.
    • TYPE_AS_NAME_PART is a variant of EXTENSION_TYPE that can be aggregated to form a variable or function name.
    • IS_EMPTY section is activated if the base is a node with neither content nor children (only attributes).
    • IS_NOT_EMPTY section is activated if the EXTENSION_IS_EMPTY is not activated.
    • IS_A_COMPLEXTYPE section is activated if the base refers to an xs:complexType.
    • IS_A_SIMPLETYPE section is activated if the base refers to an xs:simpleType.
    • In addition, if the base of the xs:extension refers to an xs:simpleType, then the xml configuration file must have a simpleType_dictionary with xsd_type equal to this type. This simpleType_dictionary will be used to fill the EXTENSION_SECTION.
  • CONTENT_NODE_SECTION section is activated if the complexType has content that is not inherited from its extension. When CONTENT_NODE_SECTION is activated, the DOES_RECUPERATE_CONTENT section is also activated. If the complexType is an extension of a complexType that has content, the base type will have a CONTENT_NODE_SECTION section, but the derived type will not; it will only activate the DOES_RECUPERATE_CONTENT section. In a CONTENT_NODE_SECTION:
    • TYPE is the type used to store the content. This type may be stored as a member or as a base class, depending on tags defined in the simpleType_dictionary of the options.xml file.
    • ?? In addition, if the base of the xs:extension refers to an xs:simpleType, then the xml configuration file must have a simpleType_dictionary with xsd_type equal to this type. ?? This simpleType_dictionary will be used to fill the CONTENT_NODE_SECTION.
  • NOT_DIRECTLY_A_CONTENT_NODE section is activated if CONTENT_NODE_SECTION isn't.
  • DOESNT_RECUPERATE_CONTENT section is activated if DOES_RECUPERATE_CONTENT isn't.
  • CHILD_SECTION section is unrolled once for each xs:element of the xs:complexType. In a CHILD_SECTION:
    • UNIQUE section is activated if maxOccurs="1".
    • OPTIONAL section is activated if minOccurs="0" and maxOccurs="1", so OPTIONAL implies UNIQUE.
    • ELEMENT_MARKER is the value of the name attribute of the xs:element.
    • XSD_TYPE is the value of the type attribute of the xs:element, without the namespace prefix.
    • XSD_PREFIX is the namespace prefix of the xs:element.
    • TYPE is the type that will represent this child as a member of the new class.
    • TYPE_AS_NAME_PART is a variant of CHILD_TYPE that can be used by aggregation to form a variable or function name.
    • DOESNT_HAS_A_SCHIZOPHRENE_MARKER section is activated if the xml marker of this element is only used for one single type into this schema.
    • FIRST_CHILD_FOR_THIS_MARKER section is activated for one of the xs:element sharing a same marker.
    • NOT_CHILD_OF_ITSELF section is activated when the type of the xs:element is not the type of its parent xs:complexType.
    • CHILD_IS_LESS_THAN is a little intricate. Suppose a complexType A is an extension of complexType B, which may contain elements of type A. A.hpp must include B.hpp, and B.hpp cannot include A.hpp, so Patlac::Xml2cpp considers that B is less than A. Some cases of inclusion are more complicated, and an arbitrary decision must be taken to decide which .hpp is allowed to include which one. XMLSchema.xsd is a good example of such incest. If an xs:element of an xs:complexType is considered less than its parent, the CHILD_IS_LESS_THAN section is activated. If not, the CHILD_IS_NOT_LESS_THAN section is activated. Of course, xs:simpleType are always less than xs:complexType, but Patlac::Xml2cpp also creates a partial ordering (in the mathematical sense) of the collection of xs:complexType such that:
      • If an xs:complexType A is an xs:extension from base="B", then B is less than A.
      • If an xs:complexType A contains an xs:element of type="B", it does not imply that B is less than A.
      • If an xs:complexType A contains an xs:element of type="B" which is not from the same schema, then B is less than A.
      • If it is theoretically possible for all children to be less than their respective parents, then they will be.
      • If it is not theoretically possible for all children to be less than their respective parents, it is not certain whether the best choice will be made. Some more work could be done (Todo). Users can help by ordering the xsd schema.
      • If B is less than A, then A.hpp can include B.hpp, and no cyclic inclusion will occur.
    • In addition to these tags and sections, the CHILD_SECTION will also be filled like an ELEMENT_SECTION.
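The first rule of this ordering (a base is less than the types that extend it) can be sketched as follows. This is not the algorithm used by Patlac::Xml2cpp, which has to handle the other rules and the arbitrary cases too; ExtensionMap and is_less_than are hypothetical names for this example.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Maps each derived complexType to the base named by its xs:extension.
typedef std::map<std::string, std::string> ExtensionMap;

// Returns true if `candidate` is a direct or indirect base of `type`,
// that is, if candidate.hpp may safely be included by type.hpp.
bool is_less_than(const std::string& candidate, const std::string& type,
                  const ExtensionMap& base_of) {
    std::string current = type;
    // The chain cannot be longer than the map; the bound also protects
    // against a malformed, cyclic input.
    for (std::size_t steps = 0; steps < base_of.size(); ++steps) {
        const ExtensionMap::const_iterator it = base_of.find(current);
        if (it == base_of.end())
            return false;
        if (it->second == candidate)
            return true;
        current = it->second;
    }
    return false;
}
```

With A extending B and B extending C, both B and C are less than A, while A is not less than B, so A.hpp may include B.hpp but not the reverse.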

4.2.3   Element sections

  • ELEMENT_SECTION is unrolled once for each xs:topLevelElement. It shares with CHILD_SECTION the tags and sections described in section 4.2.4.

4.2.4   Tags and sections shared by xs:localElement and xs:topLevelElement

  • ROOT_ELEMENT section is activated for the root node of the document.
  • ELEMENT_ANNOTATION_SECTION section is activated if the xs:element has an xs:annotation. ELEMENT_ANNOTATION_SECTION is filled like COMPLEXTYPE_ANNOTATION_SECTION.
    • In addition, if the type of the xs:element refers to an xs:simpleType, then the xml configuration file must have a simpleType_dictionary with xsd_type equal to this type. This simpleType_dictionary will be used to fill the CHILD_SECTION.
  • ATTRIBUTE_SECTION is unrolled once for each xs:attribute of the xs:complexType. In an ATTRIBUTE_SECTION:
    • MARKER is the value of the name attribute of the xs:attribute.
    • REQUIRED section is activated if the use attribute of the xs:attribute is set to required.
    • OPTIONAL section is activated if REQUIRED section isn't.
    • ATTRIBUTE_ANNOTATION_SECTION section is activated if the xs:attribute has an xs:annotation. ATTRIBUTE_ANNOTATION_SECTION is filled like COMPLEXTYPE_ANNOTATION_SECTION.
    • In addition, the xml configuration file must have a simpleType_dictionary with xsd_type equal to the type of the xs:attribute. The simpleType_dictionary will be used to fill the ATTRIBUTE_SECTION.

Section shared by types.

  • XSD_TYPE is the name of the type attribute in the schema.
  • TYPE is the type used to represent XSD_TYPE in the generated code. For xs:complexType, TYPE = XSD_TYPE, while for xs:simpleType it is determined by the simpleType_dictionary associated with this XSD_TYPE in the xml options file.

4.2.5   include_dictionary

The xml options file has many ways to extend what users can do. First of all, include_dictionary can be used to avoid large repetitions or to clarify the text. For each include_dictionary in the xml options file there is one .tpl file and one or many {{>TAG}} references in other .tpl files. Each time {{>TAG}} is encountered in a .tpl file, it is replaced by the content of the corresponding .tpl file. This is done before anything else, in particular before substitutions. The default files released with this package contain:

  • {{>HEADER}} is replaced by the content of header.tpl at the beginning of each generated file.
  • {{>CONSTRUCTORS}} is replaced by the content of constructors.cpp.tpl in class.cpp.tpl.
  • {{>OPERATORS}} is replaced by the content of operators.cpp.tpl in class.cpp.tpl.
  • {{>ACCESSORS}} is replaced by the content of accessors.hpp.tpl if it is encountered in class.hpp.tpl and by accessors.cpp.tpl if it is encountered in class.cpp.tpl.
  • {{>SUPPORTS}} is replaced by the content of supports.cpp.tpl in class.cpp.tpl. It is used to describe utility functions. These might be absent if users comment out their corresponding show_section.
  • {{>CONTAINER}} is replaced by the content of container.hpp.tpl if it is encountered in class.hpp.tpl and by container.cpp.tpl if it is encountered in class.cpp.tpl. It is used to describe the children container. If the corresponding show_section entries are uncommented, these containers have member functions to access grandchildren.

4.2.6   substitution

Like include_dictionary, substitutions are literally replaced in the text by their content, but the content of a substitution is written directly in the xml options file instead of in a separate .tpl file. Also, the tag of a substitution has the syntax of a value, {{TAG}}, while those of include_dictionary have the syntax of an IncludeSectionDictionary, {{>TAG}}. Because the substitutions are applied one by one, you must order them carefully. Substitutions are useful to have a single central place to change constructed names, like the names of accessors. In a future release of this software, a friendly user interface will help configure them.
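Why the order matters can be seen with a small sketch. The tag names CONST_ACCESSOR and MEMBER_NAME and the function apply_substitutions are invented for this example; the only point taken from the text is that substitutions are applied one after the other, each over the result of the previous one.

```cpp
#include <string>
#include <utility>
#include <vector>

// Ordered list of (tag, replacement text) pairs.
typedef std::vector<std::pair<std::string, std::string>> Substitutions;

// Applies the substitutions one by one, in the given order. A tag that
// only appears after a later substitution has run is never expanded.
std::string apply_substitutions(std::string text, const Substitutions& substitutions) {
    for (const auto& substitution : substitutions) {
        const std::string marker = "{{" + substitution.first + "}}";
        std::string::size_type pos = 0;
        while ((pos = text.find(marker, pos)) != std::string::npos) {
            text.replace(pos, marker.size(), substitution.second);
            pos += substitution.second.size();
        }
    }
    return text;
}
```

If CONST_ACCESSOR expands to "{{MEMBER_NAME}}() const" and MEMBER_NAME expands to "id", then applying CONST_ACCESSOR first turns "int {{CONST_ACCESSOR}};" into "int id() const;". In the opposite order, MEMBER_NAME finds nothing to replace yet and the result keeps an unexpanded {{MEMBER_NAME}}.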

4.2.7   Support section

In each .tpl file there are some sections that might be turned on or off. By convention, sections intended to be optional are suffixed by _SUPPORT. Of course, any tpl writer can add such sections. All these sections are turned off by default; they can be turned on by adding a show_section node to the xml file.

  • DEBUG_XML_SUPPORT creates classes with a highly verbose DO_DEBUG mode. Each assignment of an object causes the object before and after the assignment to be printed to standard error. Useful to test a newly modified operator=.
  • GRANDCHILDS_GETTER_SUPPORT gives member functions to Myclasse::List and co. to retrieve their children. So, my_list_of_schemas.complexTypes().complexContents().extensions() gives you an Extension::ListPtr containing all xs:extension of all xs:complexContent of all xs:complexType in my_list_of_schemas. It avoids mechanically writing a lot of nested for( ... ), reducing potential typos, but it creates huge classes.
  • SUBLIST_SUPPORT gives the type Myclasse::List a template member operator() which takes a predicate as input and gives back a Myclasse::PtrList containing a pointer to each element satisfying the predicate.
  • XS_ANY_SUPPORT is quite experimental. It allows classes containing an xs:any child to contain instances of any type. It uses Patlac::conteneur_pour_any_xml, inspired by boost::any. Tested for pepXML.
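The nested loops that GRANDCHILDS_GETTER_SUPPORT saves you from writing look roughly like the sketch below. The three structs are bare-bones stand-ins for the generated XMLSchema classes, and all_extensions is a made-up free function; the generated code exposes the same result through chained member functions instead.

```cpp
#include <string>
#include <vector>

// Bare-bones stand-ins for generated classes.
struct Extension { std::string base; };
struct ComplexContent { std::vector<Extension> extensions; };
struct ComplexType { std::vector<ComplexContent> complexContents; };

// Collects pointers to every extension of every complexContent of every
// complexType. The pointers stay valid while `types` is not modified.
std::vector<const Extension*> all_extensions(const std::vector<ComplexType>& types) {
    std::vector<const Extension*> found;
    for (const ComplexType& type : types)
        for (const ComplexContent& content : type.complexContents)
            for (const Extension& extension : content.extensions)
                found.push_back(&extension);
    return found;
}
```

Each extra level of nesting in the schema adds one more loop of this kind, which is the mechanical code the chained getters replace.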

4.2.8   Setting values

There are also tags for which a value must be set. This can be done by adding a set_value node to the xml file.

4.3   Xml2cpp_options : describing projects in the xml options file

Templates (.tpl) describe the generated code at file scope; options.xml describes the project scope. Typically, for a complete project, a base directory is copied to the result directory. This base directory is chosen in the options.xml file. Then the .tpl files are used to create files corresponding to the given schema. The location of the template files and the relative location of the result files are also chosen in the options.xml file. Finally, particularities of the different xs:simpleType are also described in the options.xml file by way of dictionaries that mimic the functionality of google::Templates. More details can be found in the options.xml documentation. This documentation is automatically generated by doxygen from the classes automatically generated by Patlac::Xml2cpp from the xsd schema file 'xml2cpp_options.xsd'.

5   Create rpms

To create an rpm from a cpp_libtool project, one just has to type make rpm from the root directory of the library. (Todo: redo this section.)

5.1   Notes

[1]xml.xsd is slightly different from the original found at http://www.w3.org/2001/xml.xsd . The reason is that nothing in the original file indicates that the namespace prefix corresponding to targetNamespace="http://www.w3.org/XML/1998/namespace" should be xml:. The attribute xmlns:xml is missing (why?) and Patlac::Xml2cpp cannot guess it. It could maybe try guessing that the file name is also the namespace prefix, but this is not true in general.
[2]Note that offset elements documented for mzXML 2.1 are not part of mzXML 3.0.
[3]The schema uniprot.xsd diverges from uniprot.dtd, the last one is no longer supported by its owner.
[4]Notice that the first example at http://www.w3schools.com/schema/schema_complex_empty.asp will not be interpreted by Patlac::Xml2cpp as a node with a positiveInteger attribute, but it will complain that you try to have a complexType restriction of a non-complexType.
[5]If you don't have autoreconf, try : aclocal && automake && autoconf

6   Patlac::Xml2cpp license

Copyright (c) 2006-2007 by Patrick Lacasse.

Permission to use, copy, modify, and distribute this software and its documentation under the terms of the GNU General Public License is hereby granted. No representations are made about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty. See the GNU General Public License for more details.

Classes produced by Patlac::Xml2cpp are derivative works derived from the input used in their production; they are not affected by this license.

7   Contact

patrick.lacasse@genome.ulaval.ca

placasse@mat.ulaval.ca