Get Attribute Of Complex Element Using Lxml
I have a simple XML file like the one below: BMW
Solution 1:
The .find() method on an lxml Element, when given a plain tag name, searches only the direct children of that element. For example, in this XML:
<root>
    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>
    <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
    </maxspeed>
</root>
You can use the root element's .find() method to locate the brandName element or the maxspeed element, but the search will not descend into these inner elements.
So you could, for example, do something like this:
root.find('maxspeed').find('unit')  # returns the unit Element
From this returned element you can access the attributes.
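As a minimal sketch of that last step, assuming the small <root> document above: attributes can be read with .get() or through the .attrib mapping. The variable names are illustrative only, and the fallback import of the standard library's xml.etree.ElementTree is there just so the snippet runs where lxml is not installed (the calls used here exist in both).

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib fallback, same API for the calls below
    import xml.etree.ElementTree as etree

xml_text = (
    '<root>'
    '<brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>'
    '<maxspeed>'
    '<value>250</value>'
    '<unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />'
    '</maxspeed>'
    '</root>'
)
root = etree.fromstring(xml_text)

# Walk down one level at a time, as described above.
unit = root.find('maxspeed').find('unit')

abbrev = unit.get('abbrev')           # returns None if the attribute is absent
value = unit.attrib['value']          # dict-style access, raises KeyError if absent
missing = unit.get('nope', 'n/a')     # .get() accepts a default value
brand_abbrev = root.find('brandName').get('abbrev')

print(abbrev, value, missing, brand_abbrev)
```

Using .get() is usually the safer choice when an attribute may be missing, since it does not raise.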
If you'd like to search through all the elements within an XML document, you can use the .iter() method. For the previous example you could write:
for element in root.iter(tag='unit'):
    print(element)  # This prints every unit element in the document.
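To make the difference between the two kinds of search concrete, here is a small sketch on a cut-down document with two unit elements. The document and variable names are made up for illustration. It also shows that a path starting with .// lets .find() look below the direct children.

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib fallback, same API for the calls below
    import xml.etree.ElementTree as etree

root = etree.fromstring(
    '<root>'
    '<maxspeed><value>250</value><unit abbrev="mph" /></maxspeed>'
    '<power><value>180</value><unit abbrev="ph" /></power>'
    '</root>'
)

# A plain tag name only matches direct children, so nothing is found here.
direct = root.find('unit')

# A './/' path searches all descendants and returns the first match.
deep = root.find('.//unit')

# .iter() walks the whole tree in document order.
abbrevs = [unit.get('abbrev') for unit in root.iter('unit')]

print(direct, deep.get('abbrev'), abbrevs)
```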
EDIT: Here is a small fully functional example using the xml you provided:
import lxml.etree
from io import StringIO


def ns_join(element, tag, namespace=None):
    '''Joins the namespace and tag together, and
    returns the fully qualified name.
    @param element - The lxml.etree._Element you're searching
    @param tag - The tag you're joining
    @param namespace - (optional) The Namespace shortname default is None'''
    return '{%s}%s' % (element.nsmap[namespace], tag)


def parse_car(element):
    '''Parse a car element. This will return a dictionary containing
    brand_name, maxspeed_value, and maxspeed_unit'''
    maxspeed = element.find(ns_join(element, 'maxspeed'))
    return {
        'brand_name': element.findtext(ns_join(element, 'brandName')),
        'maxspeed_value': maxspeed.findtext(ns_join(maxspeed, 'value')),
        'maxspeed_unit': maxspeed.find(ns_join(maxspeed, 'unit')).attrib['abbrev']
    }


# Create the StringIO object to feed to the parser.
XML = StringIO('''
<Reports>
    <Car xmlns="http://example.com/vocab/xml/cars#">
        <dateStarted>2011-02-05</dateStarted>
        <dateSold>2011-02-13</dateSold>
        <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
        <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
        <maxspeed>
            <value>250</value>
            <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
        </maxspeed>
        <route type="http://example.com/codes/routes#" abbrev="HW" value="Highway" >Highway</route>
        <power>
            <value>180</value>
            <unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
        </power>
        <frequency type="http://example.com/codes/frequency#" value="daily" >Daily</frequency>
    </Car>
</Reports>
''')

# Get the root element object of the XML.
car_root_element = lxml.etree.parse(XML).getroot()

# For each 'Car' tag in the root element,
# we want to parse it and save the list as cars.
cars = [parse_car(element)
        for element in car_root_element.iter()
        if element.tag.endswith('Car')]

print(cars)
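As an alternative to joining the namespace by hand, .find() and .findtext() also accept a namespaces mapping, so a prefix can stand in for the namespace URI inside the path. The following is only a sketch on a trimmed copy of the Car element. The prefix 'c' and the variable names are arbitrary choices, and the fallback import exists only so the snippet runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib fallback, same API for the calls below
    import xml.etree.ElementTree as etree

# 'c' is an arbitrary prefix chosen for this example; only the URI must match.
NS = {'c': 'http://example.com/vocab/xml/cars#'}

car = etree.fromstring(
    '<Car xmlns="http://example.com/vocab/xml/cars#">'
    '<brandName abbrev="BMW" value="BMW">BMW</brandName>'
    '<maxspeed>'
    '<value>250</value>'
    '<unit value="miles per hour" abbrev="mph" />'
    '</maxspeed>'
    '</Car>'
)

brand = car.findtext('c:brandName', namespaces=NS)
speed = car.findtext('c:maxspeed/c:value', namespaces=NS)
unit_abbrev = car.find('c:maxspeed/c:unit', namespaces=NS).get('abbrev')

print(brand, speed, unit_abbrev)
```

Note that every step of the path needs the prefix, because child elements inherit the default namespace declared on Car.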
Hope it helps.
Solution 2:
import lxml.etree as ET
content='''
<Car xmlns="http://example.com/vocab/xml/cars#">
<brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
<maxspeed>
<value>250</value>
<unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
</maxspeed>
</Car>
'''
doc=ET.fromstring(content)
NS = 'http://example.com/vocab/xml/cars#'
# print(ET.tostring(doc, pretty_print=True))
for x in doc.xpath('//ns:maxspeed/ns:unit/@abbrev', namespaces={'ns': NS}):
    print(x)
yields
mph