Parsing an XML File in Python
Hello all,
This is the sixth article in the series Python for Data Science. If you are new to this series, we recommend reading our previous articles:
- Python for Data Science Series - Part 1
- Python for Data Science Series - Part 2
- Using Numpy in Python
- Using Pandas in Python
- Data Visualization using Matplotlib Python
Please refer to the videos below for a detailed explanation of how to parse XML in Python.
Please refer to the following notebook to understand how to parse XML in Python.
In [4]:
import csv
import requests
import xml.etree.ElementTree as ET
import os
In [2]:
main_folder_path = r"E:\openknowledgeshare.blogspot.com\Python\Outputs"
In [5]:
url = 'http://www.hindustantimes.com/rss/topnews/rssfeed.xml'
# creating HTTP response object from given url
resp = requests.get(url)
# saving the xml file
with open(os.path.join(main_folder_path, 'topnewsfeed.xml'), 'wb') as f:
    f.write(resp.content)
In [10]:
resp
Out[10]:
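The download cell above assumes that the output folder already exists and that the request succeeded. A minimal sketch of a more defensive save step follows. The helper name `save_feed` is hypothetical and not part of the original notebook; it only wraps the same `open(..., 'wb')` call used above.

```python
import os


def save_feed(content: bytes, folder: str, filename: str) -> str:
    """Write raw response bytes to folder/filename and return the full path."""
    # Create the output folder if it is missing (no error if it already exists).
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    # 'wb' because resp.content is bytes, not str.
    with open(path, "wb") as f:
        f.write(content)
    return path


# Possible usage with the objects from the cells above:
#   resp.raise_for_status()   # raises requests.HTTPError on 4xx/5xx responses
#   save_feed(resp.content, main_folder_path, "topnewsfeed.xml")
```

Calling `resp.raise_for_status()` before saving avoids writing an error page to disk under an `.xml` name.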
Read XML
In [42]:
xml_file_path = os.path.join(main_folder_path,'topnewsfeed.xml')
# create element tree object
tree = ET.parse(xml_file_path)
In [43]:
tree
Out[43]:
In [44]:
# get root element
root = tree.getroot()
root
Out[44]:
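To experiment without a file on disk, the same root element can be built from a string. The sketch below uses a small, hypothetical sample document: only the tag names `holiday`, `date` and `name` come from the later cells of this notebook, while the root tag, the attributes and the values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the root tag, attributes and values are
# made up, only holiday/date/name mirror the tags used later on.
SAMPLE_XML = """<holidays country="IN">
    <holiday type="public">
        <date>2020-01-26</date>
        <name>Republic Day</name>
    </holiday>
    <holiday type="public">
        <date>2020-08-15</date>
        <name>Independence Day</name>
    </holiday>
</holidays>"""

# fromstring() parses a string and returns the root Element directly,
# whereas ET.parse(path).getroot() does the same starting from a file.
root = ET.fromstring(SAMPLE_XML)

print(root.tag)   # holidays
print(len(root))  # 2 (number of direct children)
```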
In [45]:
root.items()
Out[45]:
In [46]:
root.items()[0][0]
Out[46]:
In [47]:
root.items()[0][1]
Out[47]:
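`items()` returns the attributes of an element as a list of `(name, value)` tuples, which is why the two cells above index into it with `[0][0]` and `[0][1]`. A short sketch of the equivalent lookups by name, using a hypothetical element whose attribute names are assumptions:

```python
import xml.etree.ElementTree as ET

# Hypothetical element with two attributes, for illustration only.
elem = ET.fromstring('<holidays country="IN" year="2020"/>')

# items() returns a list of (name, value) tuples.
pairs = elem.items()
print(pairs)

# attrib exposes the same data as a dictionary.
print(elem.attrib["country"])

# get() returns None, or a supplied default, when the attribute is missing.
print(elem.get("region"))
print(elem.get("region", "unknown"))
```

Looking attributes up by name is usually less fragile than relying on their position in the list.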
In [48]:
list(root)  # direct children; getchildren() was removed in Python 3.9
Out[48]:
In [49]:
root[0].items()
Out[49]:
In [50]:
list(root[0])
Out[50]:
In [51]:
root[0][0].text
Out[51]:
In [52]:
root[0][1].text
Out[52]:
In [53]:
root[1][0].text
Out[53]:
In [54]:
root[1][1].text
Out[54]:
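The cells above reach each value through fixed positions. Since an element can be iterated over like a list of its children, the same walk can be written as a loop that does not depend on how many children there are. This is a sketch on a hypothetical sample document (values are assumptions):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, for illustration only.
SAMPLE_XML = """<holidays>
    <holiday><date>2020-01-26</date><name>Republic Day</name></holiday>
    <holiday><date>2020-08-15</date><name>Independence Day</name></holiday>
</holidays>"""
root = ET.fromstring(SAMPLE_XML)

# Iterating over an element yields its direct children, so two nested
# loops visit every grandchild without hard-coded indexes.
rows = []
for child in root:
    rows.append([(field.tag, field.text) for field in child])

for row in rows:
    print(row)

# Indexing still works when a single value is needed.
print(root[1][1].text)  # Independence Day
```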
In [55]:
root.findall("holiday")
Out[55]:
In [61]:
root.findall("anckfk")
Out[61]:
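When no element matches, `findall` returns an empty list while `find` returns `None`. A plain tag name also matches direct children only. The sketch below shows both behaviours on a hypothetical one-holiday document (the values are assumptions):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, for illustration only.
root = ET.fromstring(
    "<holidays>"
    "<holiday><date>2020-01-26</date><name>Republic Day</name></holiday>"
    "</holidays>"
)

# A tag that does not exist: findall gives [], find gives None.
missing_all = root.findall("anckfk")
missing_one = root.find("anckfk")
print(missing_all, missing_one)

# "name" is a grandchild of root, so a plain tag name does not match it.
direct = root.findall("name")
# The ".//" prefix searches all descendants instead of direct children.
nested = root.findall(".//name")
print(len(direct), len(nested))
```

Because `find` can return `None`, accessing `.text` on its result without a check raises `AttributeError` when the tag is absent.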
In [59]:
for each_node in root.findall('holiday'):
    date_node = each_node.find('date')
    name_node = each_node.find('name')
    print(date_node.text, name_node.text)
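The notebook imports `csv` but never uses it, so one plausible next step is to write the extracted values out as CSV. The following is a sketch under stated assumptions, not the author's method: the sample document and the helper name `holidays_to_rows` are hypothetical, and an in-memory buffer stands in for a real output file. It uses `findtext` with a default so that a holiday without a `name` does not raise an error.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the last holiday deliberately has no <name>.
SAMPLE_XML = """<holidays>
    <holiday><date>2020-01-26</date><name>Republic Day</name></holiday>
    <holiday><date>2020-08-15</date><name>Independence Day</name></holiday>
    <holiday><date>2020-10-02</date></holiday>
</holidays>"""


def holidays_to_rows(root):
    """Return one dict per <holiday> element with its date and name text."""
    rows = []
    for node in root.findall("holiday"):
        rows.append({
            # findtext returns the default when the child tag is missing.
            "date": node.findtext("date", default=""),
            "name": node.findtext("name", default=""),
        })
    return rows


rows = holidays_to_rows(ET.fromstring(SAMPLE_XML))

# Write to an in-memory buffer; a real file opened with newline="" works too.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "name"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```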