Python Xpath Example


xml.etree.ElementTree — The ElementTree XML API ...

Source code: Lib/xml/etree/
The xml.etree.ElementTree module implements a simple and efficient API
for parsing and creating XML data.
Changed in version 3.3: This module will use a fast implementation whenever available.
Deprecated since version 3.3: The xml.etree.cElementTree module is deprecated.
The xml.etree.ElementTree module is not secure against
maliciously constructed data. If you need to parse untrusted or
unauthenticated data, see XML vulnerabilities.
This is a short tutorial for using xml.etree.ElementTree (ET in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.
XML tree and elements
XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ET has two classes for this purpose –
ElementTree represents the whole XML document as a tree, and
Element represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the ElementTree level. Interactions with a single XML element
and its sub-elements are done on the Element level.
Parsing XML
We’ll be using the following XML document as the sample data for this section:




We can import this data by reading from a file:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
Or directly from a string:
root = ET.fromstring(country_data_as_string)
fromstring() parses XML from a string directly into an Element,
which is the root element of the parsed tree. Other parsing functions may
create an ElementTree. Check the documentation to be sure.
As an Element, root has a tag and a dictionary of attributes:
>>> root.tag
'data'
>>> root.attrib
{}
It also has children nodes over which we can iterate:
>>> for child in root:
...     print(child.tag, child.attrib)
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
Children are nested, and we can access specific child nodes by index:
>>> root[0][1]
Not all elements of the XML input will end up as elements of the
parsed tree. Currently, this module skips over any XML comments,
processing instructions, and document type declarations in the
input. Nevertheless, trees built using this module’s API rather
than parsing from XML text can have comments and processing
instructions in them; they will be included when generating XML
output. A document type declaration may be accessed by passing a
custom TreeBuilder instance to the XMLParser constructor.
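The parsing steps above can be tried end to end with an in-memory document. The following is a minimal sketch: the XML string is an assumed stand-in for the country sample (the country names, ranks and neighbors match the outputs shown in this section; the year values and exact layout are illustrative).

```python
import xml.etree.ElementTree as ET

# Assumed sample data: names, ranks and neighbors follow the outputs in this
# section; the <year> values are illustrative.
country_data_as_string = """\
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
"""

# fromstring() returns the root Element directly (no ElementTree wrapper).
root = ET.fromstring(country_data_as_string)

# Iterating over an Element yields its direct children.
names = [child.attrib["name"] for child in root]

# Children can be indexed: root[0] is the first country, root[0][1] its <year>.
first_year = root[0][1].text

print(root.tag, names, first_year)
```

Parsing from a string keeps the example self-contained; with a file on disk, ET.parse() followed by getroot() yields the same root element.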
Pull API for non-blocking parsing
Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
XMLParser and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed Element objects.
The most powerful tool for doing this is XMLPullParser. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with XMLPullParser.feed() calls. To get the parsed XML
elements, call XMLPullParser.read_events(). Here is an example:
>>> parser = ET.XMLPullParser(['start', 'end'])
>>> parser.feed('<mytag>sometext')
>>> list(parser.read_events())
[('start', <Element 'mytag' at 0x7fa66db2be58>)]
>>> parser.feed(' more text</mytag>')
>>> for event, elem in parser.read_events():
...     print(event)
...     print(elem.tag, 'text=', elem.text)
...
end
mytag text= sometext more text
The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.
Because it’s so flexible, XMLPullParser can be inconvenient to use for
simpler use-cases. If you don’t mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at iterparse(). It can be useful when you’re reading a large XML document
and don’t want to hold it wholly in memory.
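As a rough illustration of the iterparse() approach mentioned above, the sketch below processes elements as soon as they are complete and then clears them. The in-memory buffer and the tag names are made up for the example; a real program would pass a file name or file object.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative stand-in for a large document on disk.
source = io.BytesIO(
    b"<log><entry id='1'>a</entry><entry id='2'>b</entry></log>"
)

seen = []
# With events=("end",) each element is delivered once it is fully parsed.
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.tag == "entry":
        seen.append((elem.get("id"), elem.text))
        # Drop the element's text, attributes and children once handled,
        # so processed entries do not pile up in memory.
        elem.clear()

print(seen)
```

Clearing each element after use is what keeps memory bounded; without it, iterparse() still builds the complete tree.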
Finding interesting elements
Element has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
Element.iter():
>>> for neighbor in root.iter('neighbor'):
...     print(neighbor.attrib)
...
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}
Element.findall() finds only elements with a tag which are direct
children of the current element. Element.find() finds the first child
with a particular tag, and Element.text accesses the element’s text
content. Element.get() accesses the element’s attributes:
>>> for country in root.findall('country'):
...     rank = country.find('rank').text
...     name = country.get('name')
...     print(name, rank)
...
Liechtenstein 1
Singapore 4
Panama 68
More sophisticated specification of which elements to look for is possible by
using XPath.
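To make the XPath remark concrete, here is a small sketch of path expressions that findall() and find() accept. The document is a trimmed, assumed version of the country sample; only the limited XPath subset of the standard library is used.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative version of the country sample.
root = ET.fromstring("""
<data>
    <country name="Liechtenstein"><rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore"><rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
""")

# All <neighbor> elements anywhere below the root with direction="E".
east = [n.get("name") for n in root.findall(".//neighbor[@direction='E']")]

# The <rank> child of the country whose name attribute is "Singapore".
singapore_rank = root.find("./country[@name='Singapore']/rank").text

# ".." steps up to the parent: countries that have a neighbor named Austria.
parents = [c.get("name") for c in root.findall(".//neighbor[@name='Austria']/..")]

print(east, singapore_rank, parents)
```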
Modifying an XML File
ElementTree provides a simple way to build XML documents and write them to files.
The ElementTree.write() method serves this purpose.
Once created, an Element object may be manipulated by directly changing
its fields (such as Element.text), adding and modifying attributes
(Element.set() method), as well as adding new children (for example
with Element.append()).
Let’s say we want to add one to each country’s rank, and add an updated
attribute to the rank element:
>>> for rank in root.iter('rank'):
...     new_rank = int(rank.text) + 1
...     rank.text = str(new_rank)
...     rank.set('updated', 'yes')
...
>>> tree.write('output.xml')
Our XML now looks like this:
We can remove elements using Element.remove(). Let’s say we want to
remove all countries with a rank higher than 50:
>>> for country in root.findall('country'):
...     # using root.findall() to avoid removal during traversal
...     rank = int(country.find('rank').text)
...     if rank > 50:
...         root.remove(country)
...
Note that concurrent modification while iterating can lead to problems,
just like when iterating and modifying Python lists or dicts.
Therefore, the example first collects all matching elements with
root.findall(), and only then iterates over the list of matches.
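The two modifications above can be combined into one runnable sketch. The two-country document is made up for brevity, and the result is serialised to a string instead of a file so the effect is visible directly.

```python
import xml.etree.ElementTree as ET

# Illustrative two-country document.
root = ET.fromstring(
    "<data>"
    "<country name='A'><rank>1</rank></country>"
    "<country name='B'><rank>68</rank></country>"
    "</data>"
)

# Add one to every rank and mark the element as updated.
for rank in root.iter("rank"):
    rank.text = str(int(rank.text) + 1)
    rank.set("updated", "yes")

# findall() returns a list, so removing children of root while looping
# over that list does not disturb the iteration.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root]
print(ET.tostring(root, encoding="unicode"))
```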
Building XML documents
The SubElement() function also provides a convenient way to create new
sub-elements for a given element:
>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.SubElement(c, 'd')
>>> ET.dump(a)
<a><b /><c><d /></c></a>
Parsing XML with Namespaces
If the XML input has namespaces, tags and attributes
with prefixes in the form prefix:sometag get expanded to
{uri}sometag where the prefix is replaced by the full URI.
Also, if there is a default namespace,
that full URI gets prepended to all of the non-prefixed tags.
Here is an XML example that incorporates two namespaces, one with the
prefix “fictional” and the other serving as the default namespace:
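A sketch of how such a document might be searched follows. The document, its namespace URIs and its element names are invented for illustration; the point is that tags are stored in expanded {uri}name form and that a prefix mapping passed to the find methods can use any prefixes the caller likes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one prefixed namespace and one default namespace.
xml_text = """\
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>
"""
root = ET.fromstring(xml_text)

# The default namespace URI is prepended to the unprefixed root tag.
root_tag = root.tag

# The prefixes in this mapping are chosen by the caller; they need not
# match the prefixes used inside the document.
ns = {
    "real": "http://people.example.com",
    "role": "http://characters.example.com",
}
names = [a.find("real:name", ns).text for a in root.findall("real:actor", ns)]
characters = [c.text for c in root.iterfind(".//role:character", ns)]

print(root_tag, names, characters)
```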

By default, the href attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax.
To process this file, load it as usual, and pass the root element to the xml.etree.ElementInclude module:
from xml.etree import ElementTree, ElementInclude
tree = ElementTree.parse("document.xml")
root = tree.getroot()
ElementInclude.include(root)
The ElementInclude module replaces the {http://www.w3.org/2001/XInclude}include element with the root element from the included document. The result might look something like this:
This is a paragraph.
If the parse attribute is omitted, it defaults to “xml”. The href attribute is required.
To include a text document, use the {http://www.w3.org/2001/XInclude}include element, and set the parse attribute to “text”:
Copyright (c) .
The result might look something like:
Copyright (c) 2003.
xml.etree.ElementInclude.default_loader(href, parse, encoding=None)
Default loader. This default loader reads an included resource from disk. href is a URL.
parse is for parse mode either “xml” or “text”. encoding
is an optional text encoding. If not given, encoding is utf-8. Returns the
expanded resource. If the parse mode is “xml”, this is an ElementTree
instance. If the parse mode is “text”, this is a Unicode string. If the
loader fails, it can return None or raise an exception.
xml.etree.ElementInclude.include(elem, loader=None, base_url=None, max_depth=6)
This function expands XInclude directives. elem is the root element. loader is
an optional resource loader. If omitted, it defaults to default_loader().
If given, it should be a callable that implements the same interface as
default_loader(). base_url is base URL of the original file, to resolve
relative include file references. max_depth is the maximum number of recursive
inclusions. Limited to reduce the risk of malicious content explosion. Pass a
negative value to disable the limitation.
Returns the expanded resource. If the parse mode is
“xml”, this is an ElementTree instance. If the parse mode is “text”,
this is a Unicode string. If the loader fails, it can return None or
raise an exception.
New in version 3.9: The base_url and max_depth parameters.
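The loader interface described above can be exercised without touching the disk. In this sketch, memory_loader and the RESOURCES table are hypothetical names, and the href values are made up; the loader returns an Element for "xml" includes and a string for "text" includes.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files" keyed by href.
RESOURCES = {
    "source.xml": "<para>This is a paragraph.</para>",
    "year.txt": "2003",
}

def memory_loader(href, parse, encoding=None):
    """Custom loader with the same call interface as default_loader()."""
    data = RESOURCES[href]
    if parse == "xml":
        return ET.fromstring(data)  # root element of the included document
    return data                     # plain text for parse="text"

doc = ET.fromstring(
    '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="source.xml" parse="xml"/>'
    '<p>Copyright (c) <xi:include href="year.txt" parse="text"/>.</p>'
    '</document>'
)

# Expands the include elements in place.
ElementInclude.include(doc, loader=memory_loader)

print(ET.tostring(doc, encoding="unicode"))
```

A custom loader like this is also a natural place to restrict which resources an untrusted document is allowed to pull in.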
Element Objects
class xml.etree.ElementTree.Element(tag, attrib={}, **extra)
Element class. This class defines the Element interface, and provides a
reference implementation of this interface.
The element name, attribute names, and attribute values can be either
bytestrings or Unicode strings. tag is the element name. attrib is
an optional dictionary, containing element attributes. extra contains
additional attributes, given as keyword arguments.
tag
A string identifying what kind of data this element represents (the
element type, in other words).
text
tail
These attributes can be used to hold additional data associated with
the element. Their values are usually strings but may be any
application-specific object. If the element is created from
an XML file, the text attribute holds either the text between
the element’s start tag and its first child or end tag, or None, and
the tail attribute holds either the text between the element’s
end tag and the next tag, or None. For the XML data
<a><b>1<c>2<d/>3</c></b>4</a>
the a element has None for both text and tail attributes,
the b element has text “1” and tail “4”,
the c element has text “2” and tail None,
and the d element has text None and tail “3”.
To collect the inner text of an element, see itertext(), for
example "".join(element.itertext()).
Applications may store arbitrary objects in these attributes.
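The text and tail rules are easy to check interactively. This sketch parses a snippet built to match the description above and reads the attributes back.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
b = a[0]
c = b[0]
d = c[0]

# text: content between an element's start tag and its first child or end tag.
# tail: content between an element's end tag and the next tag.
pairs = {
    "a": (a.text, a.tail),
    "b": (b.text, b.tail),
    "c": (c.text, c.tail),
    "d": (d.text, d.tail),
}

# itertext() walks text and tail values in document order.
inner_text = "".join(a.itertext())

print(pairs, inner_text)
```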
attrib
A dictionary containing the element’s attributes. Note that while the
attrib value is always a real mutable Python dictionary, an ElementTree
implementation may choose to use another internal representation, and
create the dictionary only if someone asks for it. To take advantage of
such implementations, use the dictionary methods below whenever possible.
The following dictionary-like methods work on the element attributes.
clear()
Resets an element. This function removes all subelements, clears all
attributes, and sets the text and tail attributes to None.
get(key, default=None)
Gets the element attribute named key.
Returns the attribute value, or default if the attribute was not found.
items()
Returns the element attributes as a sequence of (name, value) pairs. The
attributes are returned in an arbitrary order.
keys()
Returns the element’s attribute names as a list. The names are returned
in an arbitrary order.
set(key, value)
Set the attribute key on the element to value.
The following methods work on the element’s children (subelements).
append(subelement)
Adds the element subelement to the end of this element’s internal list
of subelements. Raises TypeError if subelement is not an Element.
extend(subelements)
Appends subelements from a sequence object with zero or more elements.
Raises TypeError if a subelement is not an Element.
find(match, namespaces=None)
Finds the first subelement matching match. match may be a tag name
or a path. Returns an element instance
or None. namespaces is an optional mapping from namespace prefix
to full name. Pass '' as prefix to move all unprefixed tag names
in the expression into the given namespace.
findall(match, namespaces=None)
Finds all matching subelements, by tag name or
path. Returns a list containing all matching
elements in document order. namespaces is an optional mapping from
namespace prefix to full name. Pass '' as prefix to move all
unprefixed tag names in the expression into the given namespace.
findtext(match, default=None, namespaces=None)
Finds text for the first subelement matching match. match may be
a tag name or a path. Returns the text content
of the first matching element, or default if no element was found.
Note that if the matching element has no text content an empty string
is returned. namespaces is an optional mapping from namespace prefix
to full name. Pass '' as prefix to move all unprefixed tag names
in the expression into the given namespace.
insert(index, subelement)
Inserts subelement at the given position in this element. Raises
TypeError if subelement is not an Element.
iter(tag=None)
Creates a tree iterator with the current element as the root.
The iterator iterates over this element and all elements below it, in
document (depth first) order. If tag is not None or '*', only
elements whose tag equals tag are returned from the iterator. If the
tree structure is modified during iteration, the result is undefined.
iterfind(match, namespaces=None)
Finds all matching subelements, by tag name or
path. Returns an iterable yielding all
matching elements in document order. namespaces is an optional mapping
from namespace prefix to full name.
itertext()
Creates a text iterator. The iterator loops over this element and all
subelements, in document order, and returns all inner text.
makeelement(tag, attrib)
Creates a new element object of the same type as this element. Do not
call this method, use the SubElement() factory function instead.
remove(subelement)
Removes subelement from the element. Unlike the find* methods this
method compares elements based on the instance identity, not on tag value
or contents.
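The child-manipulation methods can be sketched as follows. The tag names are arbitrary; the interesting part is that remove() only accepts the very object that is in the tree, not a look-alike with the same tag.

```python
import xml.etree.ElementTree as ET

parent = ET.Element("list")
first = ET.SubElement(parent, "item")     # created and appended in one step
parent.append(ET.Element("item"))         # appended at the end
parent.insert(0, ET.Element("header"))    # inserted at position 0
tags = [child.tag for child in parent]

# An element with the same tag is still a different object, so remove()
# does not find it among the children.
lookalike = ET.Element("item")
try:
    parent.remove(lookalike)
    removed_lookalike = True
except ValueError:
    removed_lookalike = False

# Removing the actual child object works.
parent.remove(first)
tags_after = [child.tag for child in parent]

print(tags, removed_lookalike, tags_after)
```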
Element objects also support the following sequence type methods
for working with subelements: __delitem__(),
__getitem__(), __setitem__(), __len__().
Caution: Elements with no subelements will test as False. This behavior
will change in future versions. Use specific len(elem) or elem is
None test instead.
element = root.find('foo')
if not element:  # careful!
    print("element not found, or element has no subelements")
if element is None:
    print("element not found")
Prior to Python 3.8, the serialisation order of the XML attributes of
elements was artificially made predictable by sorting the attributes by
their name. Based on the now guaranteed ordering of dicts, this arbitrary
reordering was removed in Python 3.8 to preserve the order in which
attributes were originally parsed or created by user code.
In general, user code should try not to depend on a specific ordering of
attributes, given that the XML Information Set explicitly excludes the attribute
order from conveying information. Code should be prepared to deal with
any ordering on input. In cases where deterministic XML output is required,
e.g. for cryptographic signing or test data sets, canonical serialisation
is available with the canonicalize() function.
In cases where canonical output is not applicable but a specific attribute
order is still desirable on output, code should aim for creating the
attributes directly in the desired order, to avoid perceptual mismatches
for readers of the code. In cases where this is difficult to achieve, a
recipe like the following can be applied prior to serialisation to enforce
an order independently from the Element creation:
def reorder_attributes(root):
    for el in root.iter():
        attrib = el.attrib
        if len(attrib) > 1:
            # adjust attribute order, e.g. by sorting
            attribs = sorted(attrib.items())
            attrib.clear()
            attrib.update(attribs)
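For the deterministic-output case mentioned above, canonicalize() can be sketched like this. It assumes Python 3.8 or later; the two input strings are made up and differ only in attribute order and in how the empty element is written.

```python
import xml.etree.ElementTree as ET

# Same information, different surface form.
xml_one = '<root b="2" a="1"><child/></root>'
xml_two = '<root a="1" b="2"><child></child></root>'

# canonicalize() returns the canonical form as a string when no
# output file is given.
canon_one = ET.canonicalize(xml_one)
canon_two = ET.canonicalize(xml_two)

print(canon_one)
print(canon_one == canon_two)
```

Comparing canonical forms is a reasonable way to test XML output without depending on attribute order.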
ElementTree Objects
class xml.etree.ElementTree.ElementTree(element=None, file=None)
ElementTree wrapper class. This class represents an entire element
hierarchy, and adds some extra support for serialization to and from
standard XML.
element is the root element. The tree is initialized with the contents
of the XML file if given.
_setroot(element)
Replaces the root element for this tree. This discards the current
contents of the tree, and replaces it with the given element. Use with
care. element is an element instance.
find(match, namespaces=None)
Same as Element.find(), starting at the root of the tree.
findall(match, namespaces=None)
Same as Element.findall(), starting at the root of the tree.
findtext(match, default=None, namespaces=None)
Same as Element.findtext(), starting at the root of the tree.
getroot()
Returns the root element for this tree.
iter(tag=None)
Creates and returns a tree iterator for the root element. The iterator
loops over all elements in this tree, in section order. tag is the tag
to look for (default is to return all elements).
iterfind(match, namespaces=None)
Same as Element.iterfind(), starting at the root of the tree.
parse(source, parser=None)
Loads an external XML section into this element tree. source is a file
name or file object. parser is an optional parser instance.
If not given, the standard XMLParser parser is used. Returns the
section root element.
write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml", *, short_empty_elements=True)
Writes the element tree to a file, as XML. file is a file name, or a
file object opened for writing. encoding is the output
encoding (default is US-ASCII).
xml_declaration controls if an XML declaration should be added to the
file. Use False for never, True for always, None
for only if not US-ASCII or UTF-8 or Unicode (default is None).
default_namespace sets the default XML namespace (for “xmlns”).
method is either “xml”, “html” or “text” (default is “xml”).
The keyword-only short_empty_elements parameter controls the formatting
of elements that contain no content. If True (the default), they are
emitted as a single self-closed tag, otherwise they are emitted as a pair
of start/end tags.
The output is either a string (str) or binary (bytes).
This is controlled by the encoding argument. If encoding is
“unicode”, the output is a string; otherwise, it’s binary. Note that
this may conflict with the type of file if it’s an open
file object; make sure you do not try to write a string to a
binary stream and vice versa.
Changed in version 3.8: The write() method now preserves the attribute order specified
by the user.
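The interaction between the encoding argument and the type of the target file can be sketched with in-memory buffers. The tiny document is illustrative only.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<root><empty/></root>"))

# encoding="unicode" produces str output, so the target must be a text stream.
text_buf = io.StringIO()
tree.write(text_buf, encoding="unicode")

# Any other encoding produces bytes, so the target must be a binary stream.
# short_empty_elements=False writes <empty></empty> instead of <empty />.
bin_buf = io.BytesIO()
tree.write(bin_buf, encoding="utf-8", xml_declaration=True,
           short_empty_elements=False)

print(text_buf.getvalue())
print(bin_buf.getvalue())
```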
This is the XML file that is going to be manipulated:

Example page

Moved to .

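Example of changing the attribute “target” of every link in first paragraph:
>>> from xml.etree.ElementTree import ElementTree
>>> tree = ElementTree()
>>> tree.parse("index.xhtml")
<Element 'html' at 0xb77e6fac>
>>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
>>> p
<Element 'p' at 0xb77ec26c>
>>> links = list(p.iter("a"))   # Returns list of all links
>>> links
[<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
>>> for i in links:             # Iterates through all found links
...     i.attrib["target"] = "blank"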
QName Objects
class xml.etree.ElementTree.QName(text_or_uri, tag=None)
QName wrapper. This can be used to wrap a QName attribute value, in order
to get proper namespace handling on output. text_or_uri is a string
containing the QName value, in the form {uri}local, or, if the tag argument
is given, the URI part of a QName. If tag is given, the first argument is
interpreted as a URI, and this argument is interpreted as a local name.
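A brief sketch of both constructor forms, and of wrapping an attribute value so that it is written with a namespace prefix, follows. The namespace URI and the names are invented for the example, and the exact prefix chosen on output is left to the serializer.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # hypothetical namespace URI

# Two equivalent ways to build the same qualified name.
q_from_parts = ET.QName(NS, "item")
q_from_text = ET.QName("{http://example.com/ns}item")

elem = ET.Element(q_from_parts.text)
# Wrapping the value in QName makes the serializer emit it as prefix:local
# and declare the namespace, instead of writing the raw {uri}local string.
elem.set("type", ET.QName(NS, "kind"))

xml_out = ET.tostring(elem, encoding="unicode")
print(xml_out)
```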
How to use XPath in Python? - Stack Overflow

What are the libraries that support XPath? Is there a full implementation? How is the library used? Where is its website?
asked Aug 12 ’08 at 11:28
libxml2 has a number of advantages:
Compliance to the spec
Active development and a community participation
Speed. This is really a python wrapper around a C implementation.
Ubiquity. The libxml2 library is pervasive and thus well tested.
Downsides include:
Compliance to the spec. It’s strict. Things like default namespace handling are easier in other libraries.
Use of native code. This can be a pain depending on how your application is distributed / deployed. RPMs are available that ease some of this pain.
Manual resource handling. Note in the sample below the calls to freeDoc() and xpathFreeContext(). This is not very Pythonic.
If you are doing simple path selection, stick with ElementTree (which is included in Python 2.5). If you need full spec compliance or raw speed and can cope with the distribution of native code, go with libxml2.
Sample of libxml2 XPath Use
import libxml2

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
res = ctxt.xpathEval("//*")
if len(res) != 2:
    print("xpath query: wrong node set size")
if res[0].name != "doc" or res[1].name != "foo":
    print("xpath query: wrong node set value")
# libxml2 objects are not garbage collected; free them explicitly.
doc.freeDoc()
ctxt.xpathFreeContext()
Sample of ElementTree XPath Use
import xml.etree.ElementTree as ET  # part of the standard library since Python 2.5

mydoc = ET.parse('tst.xml')
# ElementTree paths are evaluated relative to the root element
for e in mydoc.findall('./foo/bar'):
    print(e.get('title'))
answered Aug 26 ’08 at 13:06 by Ryan Cox
The lxml package supports xpath. It seems to work pretty well, although I’ve had some trouble with the self:: axis. There’s also Amara, but I haven’t used it personally.
answered Aug 12 ’08 at 11:40 by James Sulak
Sounds like an lxml advertisement in here. 😉 ElementTree is included in the std library. Under 2.6 and below its xpath is pretty weak, but in 2.7+ much improved:
import xml.etree.ElementTree as ET
root = ET.parse(filename)
result = ''
for elem in root.findall('.//child/grandchild'):
    # How to make decisions based on attributes even in 2.6:
    if elem.attrib.get('name') == 'foo':
        result = elem.text
answered Nov 22 ’12 at 1:05 by Gringo Suave
Use LXML. LXML uses the full power of libxml2 and libxslt, but wraps them in more “Pythonic” bindings than the Python bindings that are native to those libraries. As such, it gets the full XPath 1.0 implementation. Native ElementTree supports a limited subset of XPath, although it may be good enough for your needs.
answered Nov 13 ’09 at 23:11 by user210794
Another option is py-dom-xpath; it works seamlessly with minidom and is pure Python, so it works on App Engine.
import xpath
xpath.find('//item', doc)
answered Jan 23 ’10 at 9:30 by Sam
You can use:
PyXML:
from xml.dom.ext.reader import Sax2
from xml import xpath
doc = Sax2.FromXmlFile('foo.xml').documentElement
for url in xpath.Evaluate('//@Url', doc):
    print(url.value)
libxml2:
import libxml2
doc = libxml2.parseFile('foo.xml')
for url in doc.xpathEval('//@Url'):
    print(url.content)
answered Aug 23 ’10 at 13:00 by 0xAX
You can use the simple soupparser from lxml:
from lxml.html.soupparser import fromstring
tree = fromstring("<a>Find me!</a>")
print(tree.xpath("//a/text()"))
answered Nov 15 ’15 at 5:31 by Aminah Nuraini
The latest version of elementtree supports XPath pretty well. Not being an XPath expert I can’t say for sure if the implementation is full but it has satisfied most of my needs when working in Python. I’ve also used lxml and PyXML and I find etree nice because it’s a standard module.
NOTE: I’ve since found lxml and for me it’s definitely the best XML lib out there for Python. It does XPath nicely as well (though again perhaps not a full implementation).
answered Aug 14 ’08 at 9:48 by jkp
If you want to have the power of XPath combined with the ability to also use CSS at any point, you can use parsel:
>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
        <body>
            <h1>Hello, Parsel!</h1>
        </body>
        </html>""")
>>> sel.css('h1::text').extract_first()
'Hello, Parsel!'
>>> sel.xpath('//h1/text()').extract_first()
'Hello, Parsel!'
answered Dec 16 ’17 at 22:16 by eLRuLL
Another library is 4Suite:
I do not know how spec-compliant it is. But it has worked very well for my use. It looks abandoned.
answered Aug 23 ’10 at 12:57 by codeape
PyXML works well.
You didn’t say what platform you’re using, however if you’re on Ubuntu you can get it with sudo apt-get install python-xml. I’m sure other Linux distros have it as well.
If you’re on a Mac, xpath is already installed but not immediately accessible. You can set PY_USE_XMLPLUS in your environment or do it the Python way before you import xml.xpath:
if sys.platform.startswith('darwin'):
    os.environ['PY_USE_XMLPLUS'] = '1'
In the worst case you may have to build it yourself. This package is no longer maintained but still builds fine and works with modern 2.x Pythons. Basic docs are here.
answered Aug 12 ’08 at 19:34 by David Joyner
If you are going to need it for html:
import lxml.html as html
root = html.fromstring(string)
answered May 29 ’19 at 13:48 by Thomas G.
XPath and XSLT with lxml

lxml supports XPath 1.0, XSLT 1.0 and the EXSLT extensions through
libxml2 and libxslt in a standards compliant way.
lxml.etree supports the simple path syntax of the find, findall and
findtext methods on ElementTree and Element, as known from the original
ElementTree library (ElementPath). As an lxml specific extension, these
classes also provide an xpath() method that supports expressions in the
complete XPath syntax, as well as custom extension functions.
There are also specialized XPath evaluator classes that are more efficient for
frequent evaluation: XPath and XPathEvaluator. See the performance
comparison to learn when to use which. Their semantics when used on
Elements and ElementTrees are the same as for the xpath() method described
here.
Note that the .find*() methods are usually faster than the full-blown XPath
support. They also support incremental tree processing through the .iterfind()
method, whereas XPath always collects all results before returning them.
The xpath() method
For ElementTree, the xpath method performs a global XPath query against the
document (if absolute) or against the root node (if relative):
>>> f = StringIO('<foo><bar></bar></foo>')
>>> tree = etree.parse(f)
>>> r = tree.xpath('/foo/bar')
>>> len(r)
1
>>> r[0].tag
'bar'
>>> r = tree.xpath('bar')
>>> r[0].tag
'bar'
When xpath() is used on an Element, the XPath expression is evaluated
against the element (if relative) or against the root tree (if absolute):
>>> root = tree.getroot()
>>> bar = root[0]
>>> tree = bar.getroottree()
The xpath() method has support for XPath variables:
>>> expr = "//*[local-name() = $name]"
>>> print(root.xpath(expr, name = "foo")[0].tag)
foo
>>> print(root.xpath(expr, name = "bar")[0].tag)
bar
>>> print(root.xpath("$text", text = "Hello World!"))
Hello World!
Namespaces and prefixes
If your XPath expression uses namespace prefixes, you must define them
in a prefix mapping. To this end, pass a dictionary to the
namespaces keyword argument that maps the namespace prefixes used
in the XPath expression to namespace URIs:
>>> f = StringIO('''… Text… ''')
>>> doc = etree.parse(f)
>>> r = doc.xpath('/x:foo/b:bar',
...               namespaces={'x': '',
...                           'b': ''})
The prefixes you choose here are not linked to the prefixes used
inside the XML document. The document may define whatever prefixes it
likes, including the empty prefix, without breaking the above code.
Note that XPath does not have a notion of a default namespace. The
empty prefix is therefore undefined for XPath and cannot be used in
namespace prefix mappings.
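A self-contained sketch of the prefix mapping follows. It assumes lxml is installed; the namespace URIs and prefixes are hypothetical. The document declares one namespace under a prefix and one as its default, while the XPath expression uses its own prefixes for both.

```python
from io import BytesIO
from lxml import etree

# Hypothetical document: "one" is bound to prefix a, "two" is the default.
doc = etree.parse(BytesIO(b"""\
<a:foo xmlns:a="http://example.com/ns/one" xmlns="http://example.com/ns/two">
  <bar>Text</bar>
</a:foo>"""))

# The prefixes x and y exist only in this mapping. Because XPath has no
# default namespace, <bar> must be addressed through a prefix as well.
ns = {"x": "http://example.com/ns/one", "y": "http://example.com/ns/two"}
r = doc.xpath("/x:foo/y:bar", namespaces=ns)

# Without a prefix, "bar" means "bar in no namespace" and matches nothing.
unprefixed = doc.xpath("/x:foo/bar", namespaces=ns)

print(r[0].tag, r[0].text, unprefixed)
```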
There is also an optional extensions argument which is used to
define custom extension functions in Python that are local to this
evaluation. The namespace prefixes that they use in the XPath
expression must also be defined in the namespace prefix mapping.
XPath return values
The return value types of XPath evaluations vary, depending on the
XPath expression used:
True or False, when the XPath expression has a boolean result
a float, when the XPath expression has a numeric result (integer or float)
a ‘smart’ string (as described below), when the XPath expression has
a string result.
a list of items, when the XPath expression has a list as result.
The items may include Elements (also comments and processing
instructions), strings and tuples. Text nodes and attributes in the
result are returned as ‘smart’ string values. Namespace
declarations are returned as tuples of strings: (prefix, URI).
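The return types listed above can be observed directly. This sketch assumes lxml is installed and uses a made-up two-element document.

```python
from lxml import etree

root = etree.XML('<root><a id="1">x</a><a id="2">y</a></root>')

is_two = root.xpath("count(//a) = 2")   # boolean result
how_many = root.xpath("count(//a)")     # numeric result, always a float
first_text = root.xpath("string(//a)")  # string result
elems = root.xpath("//a")               # list of Elements
ids = root.xpath("//a/@id")             # list of 'smart' strings

# Attribute results remember the element that carries them.
owner_tag = ids[0].getparent().tag

print(is_two, how_many, first_text, [e.tag for e in elems], ids, owner_tag)
```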
XPath string results are ‘smart’ in that they provide a
getparent() method that knows their origin:
for attribute values, getparent() returns the Element
that carries them. An example is //foo/@attribute, where the
parent would be a foo Element.
for the text() function (as in //text()), it returns the
Element that contains the text or tail that was returned.
You can distinguish between different text origins with the boolean
properties is_text, is_tail and is_attribute.
Note that getparent() may not always return an Element. For
example, the XPath functions string() and concat() will
construct strings that do not have an origin. For them,
getparent() will return None.
There are certain cases where the smart string behaviour is
undesirable. For example, it means that the tree will be kept alive
by the string, which may have a considerable memory impact in the case
that the string value is the only thing in the tree that is actually
of interest. For these cases, you can deactivate the parental
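relationship using the keyword argument smart_strings.
>>> root = etree.XML("<root><a>TEXT</a></root>")
>>> find_text = etree.XPath("//text()")
>>> text = find_text(root)[0]
>>> print(text)
TEXT
>>> print(text.getparent().text)
TEXT
>>> find_text = etree.XPath("//text()", smart_strings=False)
>>> text = find_text(root)[0]
>>> print(text)
TEXT
>>> hasattr(text, 'getparent')
False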
Generating XPath expressions
ElementTree objects have a method getpath(element), which returns a
structural, absolute XPath expression to find that element:
>>> a = etree.Element("a")
>>> b = etree.SubElement(a, "b")
>>> c = etree.SubElement(a, "c")
>>> d1 = etree.SubElement(c, "d")
>>> d2 = etree.SubElement(c, "d")
>>> tree = etree.ElementTree(c)
>>> print(tree.getpath(d2))
/c/d[2]
>>> tree.xpath(tree.getpath(d2)) == [d2]
True
The XPath class
The XPath class compiles an XPath expression into a callable function:
>>> root = etree.XML("<root><a><b/></a><b/></root>")
>>> find = etree.XPath("//b")
>>> print(find(root)[0].tag)
b
The compilation takes as much time as in the xpath() method, but it is
done only once per class instantiation. This makes it especially efficient
for repeated evaluation of the same XPath expression.
Just like the xpath() method, the XPath class supports XPath
variables:
>>> count_elements = etree.XPath("count(//*[local-name() = $name])")
>>> print(count_elements(root, name = "a"))
1.0
>>> print(count_elements(root, name = "b"))
2.0
This supports very efficient evaluation of modified versions of an XPath
expression, as compilation is still only required once.
Prefix-to-namespace mappings can be passed as second parameter:
>>> root = etree.XML("<root xmlns='NS'><a><b/></a><b/></root>")
>>> find = etree.XPath("//n:b", namespaces={'n':'NS'})
Regular expressions in XPath
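By default, XPath supports regular expressions in the EXSLT namespace:
>>> regexpNS = "http://exslt.org/regular-expressions"
>>> find = etree.XPath("//*[re:test(., '^abc$', 'i')]",
...                    namespaces={'re':regexpNS})
>>> root = etree.XML("<root><a>aB</a><b>aBc</b></root>")
>>> print(find(root)[0].text)
aBc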
You can disable this with the boolean keyword argument regexp which
defaults to True.
The XPathEvaluator classes
lxml.etree provides two other efficient XPath evaluators that work on
ElementTrees or Elements respectively: XPathDocumentEvaluator and
XPathElementEvaluator. They are automatically selected if you use the
XPathEvaluator helper for instantiation:
>>> xpatheval = etree.XPathEvaluator(root)
>>> print(isinstance(xpatheval, etree.XPathElementEvaluator))
True
>>> print(xpatheval("//b")[0].tag)
b
This class provides efficient support for evaluating different XPath
expressions on the same Element or ElementTree.
ElementTree supports a language named ElementPath in its find*() methods.
One of the main differences between XPath and ElementPath is that the XPath
language requires an indirection through prefixes for namespace support,
whereas ElementTree uses the Clark notation ({ns}name) to avoid prefixes
completely. The other major difference regards the capabilities of both path
languages. Where XPath supports various sophisticated ways of restricting the
result set through functions and boolean expressions, ElementPath only
supports pure path traversal without nesting or further conditions. So, while
the ElementPath syntax is self-contained and therefore easier to write and
handle, XPath is much more powerful and expressive.
lxml.etree bridges this gap through the class ETXPath, which accepts XPath
expressions with namespaces in Clark notation. It is identical to the
XPath class, except for the namespace notation. Normally, you would
write:
>>> root = etree.XML("<root xmlns='ns'><a><b/></a><b/></root>")
>>> find = etree.XPath("//p:b", namespaces={'p': 'ns'})
ETXPath allows you to change this to:
>>> find = etree.ETXPath("//{ns}b")
Error handling
raises exceptions when errors occur while parsing or evaluating an
XPath expression:
>>> find = (“\”)
Traceback (most recent call last):…
Invalid expression
lxml will also try to give you a hint what went wrong, so if you pass a more
complex expression, you may get a somewhat more specific error:
>>> find = (“//*[1. 1. 1]”)
Invalid predicate
During evaluation, lxml will emit an XPathEvalError on errors:
>>> find = (“//ns:a”)
>>> find(root)
Undefined namespace prefix
This works for the XPath class, however, the other evaluators (including
the xpath() method) are one-shot operations that do parsing and evaluation
in one step. They therefore raise evaluation exceptions in all cases:
>>> root = etree.Element("test")
Note that lxml versions before 1. 3 always raised an XPathSyntaxError for
all errors, including evaluation errors. The best way to support older
versions is to except on the superclass XPathError.
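One way to follow that advice is sketched below. The helper name `safe_xpath` is hypothetical and introduced only for this illustration; it catches the superclass so that syntax and evaluation errors are handled alike.

```python
from lxml import etree

root = etree.Element("test")

def safe_xpath(element, expression):
    """Hypothetical helper: return (result, name of the error class or None)."""
    try:
        return element.xpath(expression), None
    except etree.XPathError as exc:
        # XPathSyntaxError and XPathEvalError both derive from XPathError.
        return None, type(exc).__name__

ok_result, ok_error = safe_xpath(root, "count(//*)")
bad_result, bad_error = safe_xpath(root, "//*[1.1.1]")
```

Catching `XPathError` keeps the calling code independent of which concrete subclass a given lxml version raises.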
XSLT
lxml.etree introduces a new class, lxml.etree.XSLT. The class can be
given an ElementTree or Element object to construct an XSLT transformer:
>>> xslt_root = etree.XML('''…''')
>>> transform = etree.XSLT(xslt_root)
You can then run the transformation on an ElementTree document by simply
calling it, and this results in another ElementTree object:
>>> f = StringIO('<a><b>Text</b></a>')
>>> doc = etree.parse(f)
>>> result_tree = transform(doc)
By default, XSLT supports all extension functions from libxslt and
libexslt as well as Python regular expressions through the EXSLT
regexp functions. Also see the documentation on custom extension
functions, XSLT extension elements and document resolvers.
There is a separate section on controlling access to external
documents and resources.
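Since the stylesheet source did not survive in the examples above, here is a self-contained sketch of the whole round trip. The stylesheet and input document are assumptions written for this illustration; only `etree.XML`, `etree.XSLT`, `etree.parse` and the call on the transformer are lxml API.

```python
from io import StringIO
from lxml import etree

# Hypothetical stylesheet: wraps the text of /a/b in a <foo> element.
xslt_root = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <foo><xsl:value-of select="/a/b/text()" /></foo>
    </xsl:template>
</xsl:stylesheet>''')
transform = etree.XSLT(xslt_root)

# Parse the input into an ElementTree and run the transformation on it.
doc = etree.parse(StringIO('<a><b>Text</b></a>'))
result_tree = transform(doc)

root_tag = result_tree.getroot().tag
root_text = result_tree.getroot().text
```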
XSLT result objects
The result of an XSL transformation can be accessed like a normal ElementTree
document:
>>> root = etree.XML('<a><b>Text</b></a>')
>>> result = transform(root)
>>> result.getroot()
but, as opposed to normal ElementTree objects, can also be turned into an (XML
or text) string by applying the bytes() function (str() in Python 2):
>>> bytes(result)
The result is always a plain string, encoded as requested by the xsl:output
element in the stylesheet. If you want a Python Unicode/Text string instead,
you should set this encoding to UTF-8 (unless the ASCII default
is sufficient). This allows you to call the builtin str() function on
the result (unicode() in Python 2):
>>> str(result)
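The following sketch shows both conversions on one result. It assumes a made-up stylesheet that requests text output in UTF-8 through xsl:output; the stylesheet is not the one used in the original examples.

```python
from lxml import etree

# Hypothetical stylesheet that asks for UTF-8 encoded text output.
xslt_root = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8" />
    <xsl:template match="/">
        <xsl:value-of select="/a/b/text()" />
    </xsl:template>
</xsl:stylesheet>''')
transform = etree.XSLT(xslt_root)
result = transform(etree.XML('<a><b>Text</b></a>'))

as_bytes = bytes(result)  # encoded as requested by xsl:output
as_text = str(result)     # decoded into a Python text string
```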
You can use other encodings at the cost of multiple recoding. Encodings that
are not supported by Python will result in an error:
>>> xslt_tree = etree.XML('''…''')
>>> transform = etree.XSLT(xslt_tree)
>>> result = transform(doc)
>>> str(result)
Traceback (most recent call last):
  ...
LookupError: unknown encoding: UCS4
While it is possible to use the .write() method (known from ElementTree
objects) to serialise the XSLT result into a file, it is better to use the
.write_output() method. The latter knows about the <xsl:output> tag
and writes the expected data into the output file.
>>> xslt_root = etree.XML('''…''')
>>> result.write_output("", compression=9) # doctest: +SKIP
>>> from io import BytesIO
>>> out = BytesIO()
>>> result.write_output(out)
>>> data = out.getvalue()
>>> b'Text' in data
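A complete version of the in-memory case might look like the sketch below. It assumes an lxml version that provides `write_output()` and uses a hypothetical text-output stylesheet.

```python
from io import BytesIO
from lxml import etree

# Hypothetical stylesheet with an explicit xsl:output declaration.
xslt_root = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8" />
    <xsl:template match="/">
        <xsl:value-of select="/a/b/text()" />
    </xsl:template>
</xsl:stylesheet>''')
result = etree.XSLT(xslt_root)(etree.XML('<a><b>Text</b></a>'))

# write_output() serialises according to xsl:output into a file-like object.
out = BytesIO()
result.write_output(out)
data = out.getvalue()
```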
Stylesheet parameters
It is possible to pass parameters, in the form of XPath expressions, to the
XSLT template:
>>> xslt_tree = etree.XML('''…''')
>>> doc_root = etree.XML('<a><b>Text</b></a>')
The parameters are passed as keyword parameters to the transform call.
First, let’s try passing in a simple integer expression:
>>> result = transform(doc_root, a="5")
You can use any valid XPath expression as parameter value:
>>> result = transform(doc_root, a="/a/b/text()")
It’s also possible to pass an XPath object as a parameter:
>>> result = transform(doc_root, a=etree.XPath("/a/b/text()"))
Passing a string expression looks like this:
>>> result = transform(doc_root, a="'A'")
To pass a string that (potentially) contains quotes, you can use the
.strparam() class method. Note that it does not escape the
string. Instead, it returns an opaque object that keeps the string
value internally:
>>> plain_string_value = etree.XSLT.strparam(
...     """ It's "Monty Python" """)
>>> result = transform(doc_root, a=plain_string_value)
b’n It’s “Monty Python” n’
If you need to pass parameters that are not legal Python identifiers,
pass them inside of a dictionary:
>>> transform = etree.XSLT(etree.XML('''…'''))
>>> result = transform(doc_root, **{'non-python-identifier': '5'})
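The parameter forms described in this section can be tried together in one sketch. The stylesheet below is an assumption: it declares a single parameter `a` and copies its value into a `<foo>` element.

```python
from lxml import etree

# Hypothetical stylesheet with one top-level parameter named "a".
xslt_tree = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="a" />
    <xsl:template match="/">
        <foo><xsl:value-of select="$a" /></foo>
    </xsl:template>
</xsl:stylesheet>''')
transform = etree.XSLT(xslt_tree)
doc_root = etree.XML('<a><b>Text</b></a>')

# Values are XPath expressions: a number, a path, a quoted string literal.
number = transform(doc_root, a="5").getroot().text
selected = transform(doc_root, a="/a/b/text()").getroot().text
literal = transform(doc_root, a="'A'").getroot().text

# strparam() wraps an arbitrary string so that its quotes need no escaping.
wrapped = etree.XSLT.strparam(""" It's "Monty Python" """)
quoted = transform(doc_root, a=wrapped).getroot().text
```

Note the difference between `a="A"`, which would be read as a path selecting elements named A, and `a="'A'"`, which is a string literal.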
Errors and messages
Like most of the processing oriented objects in lxml.etree, XSLT
provides an error log that lists messages and error output from the
last run. See the parser documentation for a description of the
error log.
>>> xslt_root = etree.XML('''… STARTING … DONE …''')
>>> result = transform(doc_root)
>>> print(transform.error_log)
>>> for entry in transform.error_log:
...     print('message from line %s, col %s: %s' % (
...         entry.line, entry.column, entry.message))
...     print('domain: %s (%d)' % (entry.domain_name, entry.domain))
...     print('type: %s (%d)' % (entry.type_name, entry.type))
...     print('level: %s (%d)' % (entry.level_name, entry.level))
...     print('filename: %s' % entry.filename)
message from line 0, col 0: STARTING
domain: XSLT (22)
type: ERR_OK (0)
level: ERROR (2)
message from line 0, col 0: DONE
Note that there is no way in XSLT to distinguish between user
messages, warnings and error messages that occurred during the
run. libxslt simply does not provide this information. You can
partly work around this limitation by making your own messages
uniquely identifiable, e.g. with a common text prefix.
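The prefix workaround could be sketched as follows. The `MYAPP: ` prefix and the stylesheet are hypothetical, chosen only to show how user messages might be filtered out of the error log.

```python
from lxml import etree

# Hypothetical stylesheet that emits two prefixed xsl:message entries.
xslt_root = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:message>MYAPP: STARTING</xsl:message>
        <foo><xsl:value-of select="/a/b/text()" /></foo>
        <xsl:message>MYAPP: DONE</xsl:message>
    </xsl:template>
</xsl:stylesheet>''')
transform = etree.XSLT(xslt_root)
result = transform(etree.XML('<a><b>Text</b></a>'))

# Keep only the log entries that carry the application's own prefix.
user_messages = [
    entry.message.strip()
    for entry in transform.error_log
    if entry.message.startswith("MYAPP: ")
]
```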
The xslt() tree method
There’s also a convenience method on ElementTree objects for doing XSL
transformations. This is less efficient if you want to apply the same XSL
transformation to multiple documents, but is shorter to write for one-shot
operations, as you do not have to instantiate a stylesheet yourself:
>>> result = doc.xslt(xslt_tree, a="'A'")
This is a shortcut for the following code:
>>> transform = etree.XSLT(xslt_tree)
>>> result = transform(doc, a="'A'")
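To check that both spellings give the same output, a sketch along these lines may help. The stylesheet and document are assumptions made for the example.

```python
from io import StringIO
from lxml import etree

# Hypothetical stylesheet that echoes the parameter "a" inside <foo>.
xslt_tree = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="a" />
    <xsl:template match="/">
        <foo><xsl:value-of select="$a" /></foo>
    </xsl:template>
</xsl:stylesheet>''')
doc = etree.parse(StringIO('<a><b>Text</b></a>'))

# One-shot form: the stylesheet is compiled on every call.
shortcut = doc.xslt(xslt_tree, a="'A'")

# Explicit form: compile once, then reuse the transformer as often as needed.
transform = etree.XSLT(xslt_tree)
explicit = transform(doc, a="'A'")
```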
Dealing with stylesheet complexity
Some applications require a larger set of rather diverse stylesheets.
lxml.etree allows you to deal with this in a number of ways. Here are
some ideas to try.
The simplest way to reduce the diversity is by using XSLT
parameters that you pass at call time to configure the stylesheets.
The partial() function in the functools module
may come in handy here. It allows you to bind a set of keyword
arguments (i.e. stylesheet parameters) to a reference of a callable
stylesheet. The same works for instances of the XPath()
evaluator, obviously.
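A small sketch of this idea, using a hypothetical stylesheet with one parameter and a hypothetical XPath variable named `value`:

```python
from functools import partial
from lxml import etree

# Hypothetical stylesheet that echoes the parameter "a" inside <foo>.
xslt_tree = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:param name="a" />
    <xsl:template match="/">
        <foo><xsl:value-of select="$a" /></foo>
    </xsl:template>
</xsl:stylesheet>''')
transform = etree.XSLT(xslt_tree)
doc_root = etree.XML('<a><b>Text</b></a>')

# Bind the stylesheet parameter once; callers only pass the document.
transform_preset = partial(transform, a="'preset'")
preset_text = transform_preset(doc_root).getroot().text

# The same pattern for an XPath evaluator with a bound variable ($value).
find_b = partial(etree.XPath("//b[. = $value]"), value="Text")
matches = find_b(doc_root)
```

Each `partial` object behaves like a preconfigured stylesheet or query, so one compiled stylesheet can serve several configurations.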
You may also consider creating stylesheets programmatically. Just
create an XSL tree, e.g. from a parsed template, and then add or
replace parts as you see fit. Passing an XSL tree into the XSLT()
constructor multiple times will create independent stylesheets, so
later modifications of the tree will not be reflected in the already
created stylesheets. This makes stylesheet generation very straightforward.
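The independence of the created stylesheets can be seen in a short sketch. The template and the rename from `foo` to `bar` are assumptions for illustration only.

```python
from lxml import etree

# Hypothetical template stylesheet used as the starting point.
template = etree.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <foo><xsl:value-of select="/a/b/text()" /></foo>
    </xsl:template>
</xsl:stylesheet>''')
doc_root = etree.XML('<a><b>Text</b></a>')

# First stylesheet, built from the unmodified template.
first = etree.XSLT(template)

# Modify the XSL tree afterwards: rename the literal result element.
template.find(".//foo").tag = "bar"

# Second stylesheet, built from the modified tree.
second = etree.XSLT(template)

first_tag = first(doc_root).getroot().tag
second_tag = second(doc_root).getroot().tag
```

The first stylesheet keeps producing the old element name because it was built before the tree was changed.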
A third thing to remember is the support for custom extension
functions and XSLT extension elements. Some things are much
easier to express in XSLT than in Python, while for others it is the
complete opposite. Finding the right mixture of Python code and XSL
code can help a great deal in keeping applications well designed and
maintainable.
Profiling
If you want to know how your stylesheet performed, pass the profile_run
keyword to the transform:
>>> result = transform(doc, a="/a/b/text()", profile_run=True)
>>> profile = result.xslt_profile
The value of the xslt_profile property is an ElementTree with profiling
data about each template, similar to the following: