How to Create Multiple XML Files From Excel With Python

So you have an Excel file that has hundreds or maybe thousands of records, but you want to convert it to an XML file, and have each row as a separate file?

This can be achieved very easily using Python, in three easy steps.

First of all, what is XML?

XML stands for “extensible markup language”, and is used extensively with websites, Webservices and where companies want to transfer data between each other in a predefined format that follows a recognised standard.

To understand more about what is xml, follow the link. One thing to note is that it may reference the financial services industry, but XML is used widely across multiple industries too, as we are now digitising a lot of our processes, the transfer of data needs to follow a standard that everyone can work with seamlessly.

Step 1 – Let’s look at our raw data

Below is an excel file with three columns and eight rows. The objective is to create eight separate XML files for each row.

raw data to populate multiple excel files

Step 2 – Import the data and create a master XML file

Below is the logic I used in How to Create an XML File from Excel using Python, I have attached comments where necessary to explain what each line is doing.

The purpose at this point is to create one file with all the data from excel called output.xml. We will then use this to create the separate files we need and then delete it.

The next steps will take you through that.

import pandas as pd
from lxml import etree as et
import os


# STEP 1 - This step creates a master XML file

raw_data = pd.read_excel(r'\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\how_to_read_each_excel_row_into_a_separate_XML_file\excel_raw_data_city.xlsx')

root = et.Element('root')

for row in raw_data.iterrows(): #==> This is a loop that takes runs through each record and populates for each tag.
    root_tags = et.SubElement(root, 'ExportData') #=== > Root name
# These are the tag names for each row (SECTION 1)
    Column_heading_1 = et.SubElement(root_tags, 'City')
    Column_heading_2 = et.SubElement(root_tags, 'Area')
    Column_heading_3 = et.SubElement(root_tags, 'Population')


###These are the values that will be populated for each row above
# The values inside the [] are the raw file column headings.(SECTION 2)
    Column_heading_1.text = str(row[1]['City'])
    Column_heading_2.text = str(row[1]['Area'])
    Column_heading_3.text = str(row[1]['Population'])

# This Section outputs the data to an xml file
# Unless you tell it otherwise it saves it to the same folder as the script.
tree = et.ElementTree(root) #==> The variable tree is to hold all the values of "root"
et.indent(tree, space="\t", level=0) #===> This just formats in a way that the XML is readable
tree.write('output.xml', encoding="utf-8") #==> The data is saved to an XML file

Step 3 – Loop through each row and create the separate files

In the below code, it opens the output.xml file and using the loop looks for all the data between the tags “ExportData”.

Then it copies them and then creates a new file.

Once one is done, it moves to the next set of tags and repeats the process, till it reaches the end of the output.xml file.

master_file = et.iterparse('output.xml', events=('end', ))
index = 0
for i, j in master_file:
    if j.tag == 'ExportData':
        index += 1
        filename = format(str(index) + ".xml")
        with open(filename, 'wb') as f:
            f.write(et.tostring(j))

Step 4 – Delete the output.xml file – purely optional

The objective here was to create separate XML files, based on the input of the original Excel file.

We have also created an additional file called output.xml to get us to the end goal.

If we have no need for this file then it can be removed with the below code, be sure that it is at the very end of your logic.

The import statement for os.unlink is in step two.

os.unlink(r'\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\how_to_read_each_excel_row_into_a_separate_XML_file\output.xml')

Python Tutorial: How to import data from files

Estimated reading time: 1 minute

Is there a need for you to be quickly open files, and import the data into a data frame?

In this post and video on Python, we will look at several options for you to do this as well as some additional things to consider.

The import of files covered here is as follows:

  • Reading data from a CSV file.
  • Reading data from a TXT file.
  • Reading data from an XLSX file.

On importation there are many things to consider, here are a few to consider:

(A) The file format

(B) How the data looks within the file.

(C) Special requirements to get the data looking correct when loaded.

In this file importing example dealing with tab delimiters, headers and sorting are referenced.  Here are some different ways to approach it a little differently if you are looking for alternatives CSV File Reading and Writing

Thanks for watching, please follow us by clicking on the links to the right!

Need to check if a file is empty? Have a look here Python – How to check if a file is empty

Thanks!

Data Analytics Ireland