Guide

The basic building blocks used in the declxml library are processor objects. Processors are used to define the structure of an XML document. There are two types of processors:

  • Primitive processors - Used for processing simple, primitive values like
    booleans, integers, floats, and strings.
  • Aggregate processors - Used for processing aggregate values such as
    dictionaries, arrays, and objects. Aggregate processors are themselves composed of other processors.

Primitive processors are created using the processor factory function that corresponds to the type of value to process. The factory functions offer several options for configuring a processor. The following creates a basic processor for integer values contained within an “id” element:

>>> import declxml as xml

>>> xml.integer('id')
<declxml._PrimitiveValue object at ...>

Aggregate processors are created by specifying a list of child processors that compose the aggregate. The following creates a processor for dictionary values contained within a “user” element that itself contains a “user-name” and an “id” sub-element:

>>> import declxml as xml

>>> xml.dictionary('user', [
...     xml.integer('id'),
...     xml.string('user-name')
... ])
<declxml._Dictionary object at ...>

Parsing and Serialization

Processors define the structure of an XML document and are used to both parse and serialize data to and from XML.

>>> import declxml as xml
>>> from pprint import pprint

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <birth-year>1907</birth-year>
... </author>
... """

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.integer('birth-year')
... ])

>>> pprint(xml.parse_from_string(author_processor, author_xml))
{'birth-year': 1907, 'name': 'Robert A. Heinlein'}


>>> author = {'name': 'Isaac Asimov', 'birth-year': 1920}
>>> print(xml.serialize_to_string(author_processor, author, indent='  '))
<?xml version="1.0" encoding="utf-8"?>
<author>
  <name>Isaac Asimov</name>
  <birth-year>1920</birth-year>
</author>

Attributes

Processors may be configured to read and write values from attributes.

>>> import declxml as xml
>>> from pprint import pprint

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.integer('birth-year'),
...     xml.string('birth-year', attribute='birth-month')
... ])

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <birth-year birth-month="July">1907</birth-year>
... </author>
... """

>>> pprint(xml.parse_from_string(author_processor, author_xml))
{'birth-month': 'July', 'birth-year': 1907, 'name': 'Robert A. Heinlein'}


>>> author = {
...     'name': 'Isaac Asimov',
...     'birth-year': 1920,
...     'birth-month': 'January'
... }
>>> print(xml.serialize_to_string(author_processor, author, indent='    '))
<?xml version="1.0" encoding="utf-8"?>
<author>
    <name>Isaac Asimov</name>
    <birth-year birth-month="January">1920</birth-year>
</author>

Validation

Processors can perform basic validation such as ensuring required elements are present.

>>> import declxml as xml

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.integer('birth-year')
... ])

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
... </author>
... """

>>> xml.parse_from_string(author_processor, author_xml)
Traceback (most recent call last):
...
MissingValue: Missing required element "birth-year" at author/birth-year

Processors also ensure values are of the correct type.

>>> import declxml as xml

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.integer('birth-year')
... ])

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <birth-year>Hello</birth-year>
... </author>
... """

>>> xml.parse_from_string(author_processor, author_xml)
Traceback (most recent call last):
...
InvalidPrimitiveValue: Invalid numeric value "Hello" at author/birth-year

Optional and Default Values

Processors may specify optional and default values.

>>> import declxml as xml
>>> from pprint import pprint

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.integer('birth-year'),
...     xml.string('genre', required=False, default='Sci-Fi')
... ])

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <birth-year>1907</birth-year>
... </author>
... """

>>> pprint(xml.parse_from_string(author_processor, author_xml))
{'birth-year': 1907, 'genre': 'Sci-Fi', 'name': 'Robert A. Heinlein'}


>>> author_xml = """
... <author>
...     <name>J. K. Rowling</name>
...     <birth-year>1965</birth-year>
...     <genre>Fantasy</genre>
... </author>
... """

>>> pprint(xml.parse_from_string(author_processor, author_xml))
{'birth-year': 1965, 'genre': 'Fantasy', 'name': 'J. K. Rowling'}

Aliases

By default, processors use the element name as the name of the value in Python. An alias can be provided to use a different name for the value in Python.

>>> import declxml as xml
>>> from pprint import pprint

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <birth-year>1907</birth-year>
... </author>
... """

>>> author_processor = xml.dictionary('author', [
...     xml.string('name', alias='author_name'),
...     xml.integer('birth-year', alias='year_born')
... ])

>>> pprint(xml.parse_from_string(author_processor, author_xml))
{'author_name': 'Robert A. Heinlein', 'year_born': 1907}


>>> author = {'author_name': 'Isaac Asimov', 'year_born': 1920}
>>> print(xml.serialize_to_string(author_processor, author, indent='   '))
<?xml version="1.0" encoding="utf-8"?>
<author>
   <name>Isaac Asimov</name>
   <birth-year>1920</birth-year>
</author>

Omitting Empty Values

Processors can be configured to omit missing or falsey values when serializing. Only optional values may be omitted.

>>> import declxml as xml

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.integer('birth-year'),
...     xml.string('nationality', required=False, omit_empty=True)
... ])

>>> author = {'name': 'Isaac Asimov', 'birth-year': 1920, 'nationality': ''}
>>> print(xml.serialize_to_string(author_processor, author, indent='    '))
<?xml version="1.0" encoding="utf-8"?>
<author>
    <name>Isaac Asimov</name>
    <birth-year>1920</birth-year>
</author>

>>> author = {'name': 'Robert A. Heinlein', 'birth-year': 1907, 'nationality': 'American'}
>>> print(xml.serialize_to_string(author_processor, author, indent='    '))
<?xml version="1.0" encoding="utf-8"?>
<author>
    <name>Robert A. Heinlein</name>
    <birth-year>1907</birth-year>
    <nationality>American</nationality>
</author>

Arrays

Processors can be defined for array values. When creating an array processor, a processor must be specified for processing the array’s items. An array is treated as optional if its item processor is configured as optional.

An array can be either embedded or nested. An embedded array is embedded directly within its parent as in the following:

>>> import declxml as xml
>>> from pprint import pprint

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.array(xml.string('book'), alias='books')
... ])

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <book>Starship Troopers</book>
...     <book>Stranger in a Strange Land</book>
... </author>
... """

>>> pprint(xml.parse_from_string(author_processor, author_xml))
{'books': ['Starship Troopers', 'Stranger in a Strange Land'],
 'name': 'Robert A. Heinlein'}

A nested array is nested within a separate array element

>>> import declxml as xml
>>> from pprint import pprint

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.array(xml.string('book'), nested='books')
... ])

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <books>
...         <book>Starship Troopers</book>
...         <book>Stranger in a Strange Land</book>
...     </books>
... </author>
... """

>>> pprint(xml.parse_from_string(author_processor, author_xml))
{'books': ['Starship Troopers', 'Stranger in a Strange Land'],
 'name': 'Robert A. Heinlein'}

Composing Processors

Processors can be composed to define more complex document structures.

>>> import declxml as xml
>>> from pprint import pprint

>>> genre_xml = """
... <genre-authors>
...     <genre>Science Fiction</genre>
...     <author>
...         <name>Robert A. Heinlein</name>
...         <birth-year>1907</birth-year>
...         <book>
...             <title>Starship Troopers</title>
...             <year-published>1959</year-published>
...         </book>
...         <book>
...             <title>Stranger in a Strange Land</title>
...             <year-published>1961</year-published>
...         </book>
...     </author>
...     <author>
...         <name>Isaac Asimov</name>
...         <birth-year>1920</birth-year>
...         <book>
...             <title>I, Robot</title>
...             <year-published>1950</year-published>
...         </book>
...         <book>
...             <title>Foundation</title>
...             <year-published>1951</year-published>
...         </book>
...     </author>
... </genre-authors>
... """

>>> book_processor = xml.dictionary('book', [
...     xml.string('title'),
...     xml.integer('year-published')
... ])

>>> author_processor = xml.dictionary('author', [
...     xml.string('name'),
...     xml.integer('birth-year'),
...     xml.array(book_processor, alias='books')
... ])

>>> genre_processor = xml.dictionary('genre-authors', [
...     xml.string('genre'),
...     xml.array(author_processor, alias='authors')
... ])


>>> pprint(xml.parse_from_string(genre_processor, genre_xml))
{'authors': [{'birth-year': 1907,
              'books': [{'title': 'Starship Troopers',
                         'year-published': 1959},
                        {'title': 'Stranger in a Strange Land',
                         'year-published': 1961}],
              'name': 'Robert A. Heinlein'},
             {'birth-year': 1920,
              'books': [{'title': 'I, Robot', 'year-published': 1950},
                        {'title': 'Foundation', 'year-published': 1951}],
              'name': 'Isaac Asimov'}],
 'genre': 'Science Fiction'}

User-Defined Classes

Processors can also be created for parsing and serializing XML data to and from user-defined classes. Simply provide the class to the processor factory function.

>>> import declxml as xml

>>> class Author:
...
...    def __init__(self):
...        self.name = None
...        self.birth_year = None
...
...    def __repr__(self):
...        return 'Author(name=\'{}\', birth_year={})'.format(
...            self.name, self.birth_year)


>>> author_processor = xml.user_object('author', Author, [
...     xml.string('name'),
...     xml.integer('birth-year', alias='birth_year')
... ])

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <birth-year>1907</birth-year>
... </author>
... """

>>> xml.parse_from_string(author_processor, author_xml)
Author(name='Robert A. Heinlein', birth_year=1907)

>>> author = Author()
>>> author.name = 'Isaac Asimov'
>>> author.birth_year = 1920

>>> print(xml.serialize_to_string(author_processor, author, indent='    '))
<?xml version="1.0" encoding="utf-8"?>
<author>
  <name>Isaac Asimov</name>
  <birth-year>1920</birth-year>
</author>

Note that the class provided to the user_object factory function must have a zero-argument constructor. It is also possible to pass any other callable object that takes zero parameters and returns an object instance to which parsed values will be read into.

Named Tuples

Processors may also be created for named tuple values.

>>> from collections import namedtuple
>>> import declxml as xml


>>> Author = namedtuple('Author', ['name', 'birth_year'])


>>> author_processor = xml.named_tuple('author', Author, [
...     xml.string('name'),
...     xml.integer('birth-year', alias='birth_year')
... ])

>>> author_xml = """
... <author>
...     <name>Robert A. Heinlein</name>
...     <birth-year>1907</birth-year>
... </author>
... """

>>> xml.parse_from_string(author_processor, author_xml)
Author(name='Robert A. Heinlein', birth_year=1907)

>>> author = Author(name='Isaac Asimov', birth_year=1920)
>>> print(xml.serialize_to_string(author_processor, author, indent='    '))
<?xml version="1.0" encoding="utf-8"?>
<author>
  <name>Isaac Asimov</name>
  <birth-year>1920</birth-year>
</author>