
Get Attribute Of Complex Element Using Lxml

I have a simple XML file like the one below: BMW

Solution 1:

The .find method on an lxml Element, given a plain tag name, only searches the direct children of that element. So, for example, in this XML:

<root>
    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>
    <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
    </maxspeed>
</root>

You can use the root Element's .find method to locate the brandName element or the maxspeed element, but the search will not descend into these inner elements.

So you could for example do something like this:

root.find('maxspeed').find('unit') #returns the unit Element

From this returned element you can access the attributes.
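As a minimal sketch of that last step, assuming the sample XML above is parsed from a string: the variable names are illustrative, and the fallback to the standard library's ElementTree is only an assumption so the snippet still runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the API-compatible standard library module.
    import xml.etree.ElementTree as etree

xml_text = (
    '<root>'
    '<brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>'
    '<maxspeed><value>250</value>'
    '<unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />'
    '</maxspeed>'
    '</root>'
)
root = etree.fromstring(xml_text)

# Step down one level at a time, as described above.
unit = root.find('maxspeed').find('unit')

abbrev = unit.get('abbrev')        # returns None if the attribute is absent
unit_value = unit.attrib['value']  # dict-style access; raises KeyError if absent
print(abbrev, unit_value)
```

.get() is usually the safer choice when an attribute may be missing, since it returns None instead of raising.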

If you'd like to search through all the elements within an XML doc, you can use the .iter() method. So for the previous example you could say:

for element in root.iter(tag='unit'):
    print(element)  # This prints every unit element in the document.
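To make the difference between a direct-child search and a whole-tree search concrete, here is a small sketch. The trimmed XML and variable names are made up for illustration, and the ElementTree fallback is an assumption for environments without lxml. It also shows the './/' path prefix, which lets .find and .findall look at all descendants.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the API-compatible standard library module.
    import xml.etree.ElementTree as etree

xml_text = (
    '<root>'
    '<maxspeed><value>250</value><unit abbrev="mph" /></maxspeed>'
    '<power><value>180</value><unit abbrev="ph" /></power>'
    '</root>'
)
root = etree.fromstring(xml_text)

# A plain tag name only matches direct children, so this finds nothing.
direct = root.find('unit')

# The './/' prefix searches all descendants instead.
first_unit = root.find('.//unit')
all_abbrevs = [u.get('abbrev') for u in root.findall('.//unit')]

# .iter() walks the whole tree in document order.
iter_abbrevs = [u.get('abbrev') for u in root.iter('unit')]
print(direct, all_abbrevs, iter_abbrevs)
```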

EDIT: Here is a small fully functional example using the xml you provided:

import lxml.etree
from io import BytesIO

def ns_join(element, tag, namespace=None):
    '''Joins the namespace and tag together, and
    returns the fully qualified name.
    @param element - The lxml.etree._Element you're searching
    @param tag - The tag you're joining
    @param namespace - (optional) The Namespace shortname default is None'''
    return '{%s}%s' % (element.nsmap[namespace], tag)

def parse_car(element):
    '''Parse a car element. This will return a dictionary containing
    brand_name, maxspeed_value, and maxspeed_unit'''

    maxspeed = element.find(ns_join(element, 'maxspeed'))
    return {
        'brand_name': element.findtext(ns_join(element, 'brandName')),
        'maxspeed_value': maxspeed.findtext(ns_join(maxspeed, 'value')),
        'maxspeed_unit': maxspeed.find(ns_join(maxspeed, 'unit')).attrib['abbrev']
        }

# Create the BytesIO object to feed to the parser.
XML = BytesIO(b'''
<Reports>
    <Car xmlns="http://example.com/vocab/xml/cars#">
        <dateStarted>2011-02-05</dateStarted>
        <dateSold>2011-02-13</dateSold>
        <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
        <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
        <maxspeed>
            <value>250</value>
            <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
        </maxspeed>
        <route type="http://example.com/codes/routes#" abbrev="HW" value="Highway" >Highway</route>
        <power>
            <value>180</value>
            <unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
        </power>
        <frequency type="http://example.com/codes/frequency#" value="daily" >Daily</frequency>
    </Car>
</Reports>
''')

# Get the root element object of the XML.
car_root_element = lxml.etree.parse(XML).getroot()

# For each 'Car' tag in the root element,
# we want to parse it and save the list as cars.
cars = [parse_car(element)
    for element in car_root_element.iter() if element.tag.endswith('Car')]

print(cars)

Hope it helps.

Solution 2:

import lxml.etree as ET
content='''
<Car xmlns="http://example.com/vocab/xml/cars#">
 <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
   <maxspeed>
     <value>250</value>
     <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
   </maxspeed>
 </Car>
'''

doc=ET.fromstring(content)
NS = 'http://example.com/vocab/xml/cars#'
# print(ET.tostring(doc,pretty_print=True))
for x in doc.xpath('//ns:maxspeed/ns:unit/@abbrev',namespaces={'ns': NS}):
    print(x)

yields

mph
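The same lookup can also be sketched without XPath, by passing a prefix-to-namespace mapping to .find and .findtext. This is only an illustration: the prefix 'c' and the variable names are arbitrary choices, and the ElementTree fallback is an assumption for environments without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the API-compatible standard library module.
    import xml.etree.ElementTree as etree

NS = 'http://example.com/vocab/xml/cars#'
content = '''
<Car xmlns="http://example.com/vocab/xml/cars#">
  <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>
  <maxspeed>
    <value>250</value>
    <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
  </maxspeed>
</Car>
'''
doc = etree.fromstring(content)

# Map an arbitrary prefix to the document's default namespace.
ns = {'c': NS}

brand = doc.findtext('c:brandName', namespaces=ns)
speed = doc.findtext('c:maxspeed/c:value', namespaces=ns)
unit = doc.find('c:maxspeed/c:unit', namespaces=ns)
print(brand, speed, unit.get('abbrev'))
```

The prefix used in the path only has to match a key in the mapping; it does not need to appear in the XML itself.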
