
XML Processing Modules in Python
XML stands for "Extensible Markup Language". It is widely used to store and exchange structured data, for example in web services and configuration files. An XML document is made up of elements, each delimited by a start-tag and an end-tag. A tag is a markup construct that begins with < and ends with >, for example <title> and </title>. The characters between the start-tag and the end-tag are the element's content. Elements can contain other elements, which are called "child elements".
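To make these terms concrete, here is a minimal sketch that parses a small, hypothetical XML snippet (not the tutorial's file) with ET.fromstring() from the standard xml.etree.ElementTree module, and reports an element's tag, its child elements, and the content of one child.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical XML snippet: <Tutorial> is the parent element,
# <title> and <price> are its child elements.
snippet = "<Tutorial><title>Data Structures</title><price>12.03</price></Tutorial>"

element = ET.fromstring(snippet)  # parses the string and returns the root Element

# The tag is the name used in the start-tag.
print(element.tag)  # prints: Tutorial

# Iterating over an element yields its child elements in document order.
child_tags = [child.tag for child in element]
print(child_tags)  # prints: ['title', 'price']

# The text between a child's start-tag and end-tag is its content.
title_text = element[0].text
print(title_text)  # prints: Data Structures
```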
Example
Below is the sample XML file, saved as TutorialsList.xml, that we use throughout this tutorial.
<?xml version="1.0"?>
<Tutorials>
   <Tutorial id="Tu101">
      <author>Vicky, Matthew</author>
      <title>Geo-Spatial Data Analysis</title>
      <stream>Python</stream>
      <price>4.95</price>
      <publish_date>2020-07-01</publish_date>
      <description>Learn geo Spatial data Analysis using Python.</description>
   </Tutorial>
   <Tutorial id="Tu102">
      <author>Bolan, Kim</author>
      <title>Data Structures</title>
      <stream>Computer Science</stream>
      <price>12.03</price>
      <publish_date>2020-1-19</publish_date>
      <description>Learn Data structures using different programming lanuages.</description>
   </Tutorial>
   <Tutorial id="Tu103">
      <author>Sora, Everest</author>
      <title>Analytics using Tensorflow</title>
      <stream>Data Science</stream>
      <price>7.11</price>
      <publish_date>2020-1-19</publish_date>
      <description>Learn Data analytics using Tensorflow.</description>
   </Tutorial>
</Tutorials>
Reading XML Using xml.etree.ElementTree
The xml.etree.ElementTree module parses an XML file into a tree of elements. Calling getroot() on the parsed tree gives us the root element, and iterating over it gives us the inner elements. In the example below we read each inner element's text attribute, which holds the element's content.
Example
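import xml.etree.ElementTree as ET

# Parse the file; the raw string keeps the Windows backslash literal
xml_tree = ET.parse(r'E:\TutorialsList.xml')
xml_root = xml_tree.getroot()

# Header
print('Tutorial List :')
# Each child of the root is a <Tutorial>; print the text of its children
for xml_elmt in xml_root:
    for inner_elmt in xml_elmt:
        print(inner_elmt.text)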
Output
Running the above code gives us the following result −
Tutorial List :
Vicky, Matthew
Geo-Spatial Data Analysis
Python
4.95
2020-07-01
Learn geo Spatial data Analysis using Python.
Bolan, Kim
Data Structures
Computer Science
12.03
2020-1-19
Learn Data structures using different programming lanuages.
Sora, Everest
Analytics using Tensorflow
Data Science
7.11
2020-1-19
Learn Data analytics using Tensorflow.
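The loop above prints every child, whatever its name. When only some fields are needed, the findtext() method returns the text of the first child with a given tag. The sketch below assumes two entries copied from the sample file into a string, parsed with ET.fromstring() so it runs without E:\TutorialsList.xml; with the real file you would use ET.parse() and getroot() as before.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the sample file (an assumption for this sketch).
xml_data = """<Tutorials>
  <Tutorial id="Tu101"><title>Geo-Spatial Data Analysis</title><price>4.95</price></Tutorial>
  <Tutorial id="Tu102"><title>Data Structures</title><price>12.03</price></Tutorial>
</Tutorials>"""

xml_root = ET.fromstring(xml_data)

titles = []
for tutorial in xml_root:
    # findtext() returns the text of the first matching child element
    title = tutorial.findtext('title')
    price = tutorial.findtext('price')
    titles.append(title)
    print(title, '-', price)
```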
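Getting the XML attributes
Each element's attrib property is a dictionary of its attributes and their values. In the example below we use iter() to visit every Tutorial element and print its attributes. Knowing an attribute such as id makes it easier to locate specific elements in the XML tree.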
Example
import xml.etree.ElementTree as ET

xml_tree = ET.parse(r'E:\TutorialsList.xml')
xml_root = xml_tree.getroot()

# Header
print('Tutorial List :')
# iter() visits every <Tutorial> element; attrib is a dict of its attributes
for tutorial in xml_root.iter('Tutorial'):
    print(tutorial.attrib)
Output
Running the above code gives us the following result −
Tutorial List :
{'id': 'Tu101'}
{'id': 'Tu102'}
{'id': 'Tu103'}
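Instead of printing the whole attrib dictionary, you can read a single attribute with get(), or select an element by attribute value with an [@id='...'] predicate, which ElementTree's path syntax supports. The sketch assumes a trimmed copy of the sample data held in a string.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample data (an assumption for this sketch).
xml_data = """<Tutorials>
  <Tutorial id="Tu101"><title>Geo-Spatial Data Analysis</title></Tutorial>
  <Tutorial id="Tu102"><title>Data Structures</title></Tutorial>
  <Tutorial id="Tu103"><title>Analytics using Tensorflow</title></Tutorial>
</Tutorials>"""

xml_root = ET.fromstring(xml_data)

# get() reads one attribute and returns None if it is missing
ids = [tutorial.get('id') for tutorial in xml_root.iter('Tutorial')]
print(ids)  # prints: ['Tu101', 'Tu102', 'Tu103']

# An [@attr='value'] predicate selects elements by attribute value
match = xml_root.find("./Tutorial[@id='Tu103']")
print(match.findtext('title'))  # prints: Analytics using Tensorflow
```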
Filtering Results
We can also filter the XML tree with the findall() method, which accepts a limited subset of XPath expressions. In the example below we find the id of the tutorial whose price is 12.03.
Example
import xml.etree.ElementTree as ET

xml_tree = ET.parse(r'E:\TutorialsList.xml')
xml_root = xml_tree.getroot()

# Header
print('Tutorial List :')
# Select <Tutorial> elements whose <price> child has the text '12.03'
for tutorial in xml_root.findall("./Tutorial[price='12.03']"):
    print(tutorial.attrib)
Output
Running the above code gives us the following result −
Tutorial List :
{'id': 'Tu102'}
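The [price='12.03'] predicate compares text exactly; ElementTree's XPath subset has no less-than or greater-than comparisons. A common workaround, sketched below under the assumption that every price parses as a number, is to select the elements with findall() and filter them in Python after converting the text with float().

```python
import xml.etree.ElementTree as ET

# Prices copied from the sample file (an assumption for this sketch).
xml_data = """<Tutorials>
  <Tutorial id="Tu101"><price>4.95</price></Tutorial>
  <Tutorial id="Tu102"><price>12.03</price></Tutorial>
  <Tutorial id="Tu103"><price>7.11</price></Tutorial>
</Tutorials>"""

xml_root = ET.fromstring(xml_data)

# Keep the ids of tutorials priced below 10, comparing numerically
cheap = [
    tutorial.get('id')
    for tutorial in xml_root.findall('./Tutorial')
    if float(tutorial.findtext('price')) < 10
]
print(cheap)  # prints: ['Tu101', 'Tu103']
```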
Parsing XML with DOM APIs
The xml.dom.minidom module is a minimal implementation of the Document Object Model (DOM). Its parse(file[, parser]) function reads the XML file designated by file and returns a DOM tree object. From that tree's documentElement we can call getElementsByTagName() to find elements by name. An element's text is stored in a child text node, so we read it through the element's childNodes.
Example
from xml.dom.minidom import parse

# Open the XML document using the minidom parser
DOMTree = parse(r'E:\TutorialsList.xml')
collection = DOMTree.documentElement

# Get all the Tutorial elements in the collection
tut_list = collection.getElementsByTagName("Tutorial")

print("*****Tutorials*****")

# Print details of each Tutorial; the text is the element's first child node
for tut in tut_list:
    strm = tut.getElementsByTagName('stream')[0]
    print("Stream:", strm.childNodes[0].data)
    prc = tut.getElementsByTagName('price')[0]
    print("Price:", prc.childNodes[0].data)
Output
Running the above code gives us the following result −
*****Tutorials*****
Stream: Python
Price: 4.95
Stream: Computer Science
Price: 12.03
Stream: Data Science
Price: 7.11
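Reading childNodes[0].data assumes the element holds exactly one text node. The sketch below joins all text nodes instead and also reads the id attribute with getAttribute(). It uses minidom's parseString() on a trimmed copy of the sample data so it runs without the file; element_text is a hypothetical helper written for this sketch, not part of minidom.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the sample data (an assumption for this sketch).
xml_data = """<Tutorials>
  <Tutorial id="Tu101"><stream>Python</stream><price>4.95</price></Tutorial>
  <Tutorial id="Tu102"><stream>Computer Science</stream><price>12.03</price></Tutorial>
</Tutorials>"""

dom = parseString(xml_data)


def element_text(parent, tag_name):
    """Hypothetical helper: join the text nodes of the first <tag_name> under parent."""
    node = parent.getElementsByTagName(tag_name)[0]
    return ''.join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


rows = []
for tut in dom.documentElement.getElementsByTagName('Tutorial'):
    # getAttribute() returns the attribute value as a string ('' if absent)
    rows.append((tut.getAttribute('id'), element_text(tut, 'stream')))
print(rows)  # prints: [('Tu101', 'Python'), ('Tu102', 'Computer Science')]
```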