XML parsing in Python

[[Python]]
[Docs](https://docs.python.org/3/library/xml.etree.elementtree.html)

```python
import xml.etree.ElementTree as ET
```

Parsing a string into an `Element` object:

```python
# find the XMP data, stored in one big chunk at the beginning of the file
# (`file` is assumed to hold the file's contents as a str)
xmp_start = file.find('<x:xmpmeta')
xmp_end = file.find('</x:xmpmeta')
# +12 is len('</x:xmpmeta>'), so the slice includes the closing tag
xmp_str = file[xmp_start:xmp_end+12]
# pass the string to fromstring, which returns the root Element
tree = ET.fromstring(xmp_str)
```

## Finding info in the Element tree

`find` and `findall` both take two arguments:

1. a path (an XPath-like expression, which may use namespace prefixes)
2. a namespace dictionary (optional)

### Using a namespace dictionary

```python
nmspdict = {
    'x': 'adobe:ns:meta/',
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'diss': 'http://ns.pointelectronic.com/DISS/1.0/',
    'diImg': 'http://ns.pointelectronic.com/DISS/1.0/Images/',
    'diAcq': 'http://ns.pointelectronic.com/DISS/1.0/types/AcqParams#',
}
tags = tree.findall('rdf:RDF/rdf:Description/diss:Images/rdf:Seq/rdf:li/diImg:SignalName', namespaces=nmspdict)
```

Each prefix in the search path is replaced with the namespace URI stored under that key in the dictionary. A prefixed path only works when the dictionary is passed in, so

```python
tree.findall('rdf:RDF', namespaces=nmspdict)
```

is equivalent to

```python
tree.findall('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF')
```

### XPath searching

To search all subelements (every descendant of `tree`, returned as a list):

```python
tree.findall('.//*')
```

Another way to get about the same result (`iter()` returns an iterator and yields `tree` itself first, then its descendants):

```python
tree.iter()
```
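
A minimal sketch tying the notes above together, assuming a made-up, much smaller XMP-like string. The contents of `xmp_str`, the reduced `ns` dictionary, and the `dc:title` element are hypothetical and only there for illustration; they are not the real DISS metadata. It checks that a prefixed path with a namespace dictionary finds the same elements as the fully expanded `{uri}tag` form, and shows how `findall('.//*')` differs from `iter()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced XMP-like packet (illustration only)
xmp_str = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description>
      <dc:title>sample</dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

root = ET.fromstring(xmp_str)

# Reduced namespace dictionary: prefix -> namespace URI
ns = {
    'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'dc': 'http://purl.org/dc/elements/1.1/',
}

# Prefixed path plus dictionary ...
with_prefix = root.findall('rdf:RDF/rdf:Description/dc:title', namespaces=ns)
# ... versus the same path with every prefix expanded by hand
expanded = root.findall(
    '{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF/'
    '{http://www.w3.org/1999/02/22-rdf-syntax-ns#}Description/'
    '{http://purl.org/dc/elements/1.1/}title'
)
same_elements = with_prefix == expanded
title_text = with_prefix[0].text

# './/*' matches every descendant of root, but not root itself
descendants = root.findall('.//*')
# iter() yields root first, then the same descendants in document order
everything = list(root.iter())

print(same_elements, title_text)           # True sample
print(len(descendants), len(everything))   # 3 4
```

If the root element itself should be part of the results, `iter()` is the simpler choice; `findall('.//*')` is handy when only the descendants are wanted.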