How to Create Multiple XML Files From Excel With Python

So you have an Excel file that has hundreds or maybe thousands of records, but you want to convert it to an XML file, and have each row as a separate file?

This can be achieved very easily using Python, in three easy steps.

First of all, what is XML?

XML stands for “extensible markup language”, and is used extensively with websites, Webservices and where companies want to transfer data between each other in a predefined format that follows a recognised standard.

To understand more about what is xml, follow the link. One thing to note is that it may reference the financial services industry, but XML is used widely across multiple industries too, as we are now digitising a lot of our processes, the transfer of data needs to follow a standard that everyone can work with seamlessly.

Step 1 – Let’s look at our raw data

Below is an excel file with three columns and eight rows. The objective is to create eight separate XML files for each row.

raw data to populate multiple excel files

Step 2 – Import the data and create a master XML file

Below is the logic I used in How to Create an XML File from Excel using Python, I have attached comments where necessary to explain what each line is doing.

The purpose at this point is to create one file with all the data from excel called output.xml. We will then use this to create the separate files we need and then delete it.

The next steps will take you through that.

import pandas as pd
from lxml import etree as et
import os


# STEP 1 - This step creates a master XML file

raw_data = pd.read_excel(r'\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\how_to_read_each_excel_row_into_a_separate_XML_file\excel_raw_data_city.xlsx')

root = et.Element('root')

for row in raw_data.iterrows(): #==> This is a loop that takes runs through each record and populates for each tag.
    root_tags = et.SubElement(root, 'ExportData') #=== > Root name
# These are the tag names for each row (SECTION 1)
    Column_heading_1 = et.SubElement(root_tags, 'City')
    Column_heading_2 = et.SubElement(root_tags, 'Area')
    Column_heading_3 = et.SubElement(root_tags, 'Population')


###These are the values that will be populated for each row above
# The values inside the [] are the raw file column headings.(SECTION 2)
    Column_heading_1.text = str(row[1]['City'])
    Column_heading_2.text = str(row[1]['Area'])
    Column_heading_3.text = str(row[1]['Population'])

# This Section outputs the data to an xml file
# Unless you tell it otherwise it saves it to the same folder as the script.
tree = et.ElementTree(root) #==> The variable tree is to hold all the values of "root"
et.indent(tree, space="\t", level=0) #===> This just formats in a way that the XML is readable
tree.write('output.xml', encoding="utf-8") #==> The data is saved to an XML file

Step 3 – Loop through each row and create the separate files

In the below code, it opens the output.xml file and using the loop looks for all the data between the tags “ExportData”.

Then it copies them and then creates a new file.

Once one is done, it moves to the next set of tags and repeats the process, till it reaches the end of the output.xml file.

master_file = et.iterparse('output.xml', events=('end', ))
index = 0
for i, j in master_file:
    if j.tag == 'ExportData':
        index += 1
        filename = format(str(index) + ".xml")
        with open(filename, 'wb') as f:
            f.write(et.tostring(j))

Step 4 – Delete the output.xml file – purely optional

The objective here was to create separate XML files, based on the input of the original Excel file.

We have also created an additional file called output.xml to get us to the end goal.

If we have no need for this file then it can be removed with the below code, be sure that it is at the very end of your logic.

The import statement for os.unlink is in step two.

os.unlink(r'\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\how_to_read_each_excel_row_into_a_separate_XML_file\output.xml')

How to Create an XML file from Excel using Python

Estimated reading time: 3 minutes

Are you working on a data analytics project where you need to feed your data to a location that is able to process an XML file?

The ability to get your data into a structured format like XML has many benefits:

(A) You can transfer the data to a web service for processing.

(B) Multiple different formats of your raw data can be standardised, enabling quick conversion and processing.

(C) XML files can be read by multiple different programs, as long as you deliver them in the correct format.

(D) The receiver of data can easily read the XML file and store it on their database.

The ability to use this method to read and transfer data is a very powerful way to help a data analyst process large data sets.

In fact, if you are using cloud-based applications to analyse the information you are storing, this will quickly enable you to deliver the data.

What packages do I need in Python?

The first step is to import the following:

import pandas as pd
from lxml import etree as et

Next we want to read in the source data

In this instance, we are reading an excel file

raw_data = pd.read_excel(r'Link to where your data is stored including the full file name')

Now we want to start building the XML structure

The FIRST STEP is to define the root

root = et.Element('root')

The root is the parent of all the data items (tags) contained in the XML file and is needed as part of the structure

The SECOND STEP is to define the tag names that will store each row of the source data

for row in raw_data.iterrows(): ==> This is a loop that takes runs through each record and populates for each tag.
    root_tags = et.SubElement(root, 'ExportData') #=== > Root name
# These are the tag names for each row (SECTION 1)
    Column_heading_1 = et.SubElement(root_tags, 'Name')
    Column_heading_2 = et.SubElement(root_tags, 'Area')
    Column_heading_3 = et.SubElement(root_tags, 'NoPurchases')
    Column_heading_4 = et.SubElement(root_tags, 'Active')

###These are the values that will be populated for each row above
# The values inside the [] are the raw file column headings.(SECTION 2)
    Column_heading_1.text = str(row[1]['Name'])
    Column_heading_2.text = str(row[1]['Area'])
    Column_heading_3.text = str(row[1]['No Purchases'])
    Column_heading_4.text = str(row[1]['Active'])

The raw file looks like this:

NameAreaNo PurchasesActive
JohnDublin2Y
MaryGalway3N
JoeLimerick4N
JimmyKilkenny55Y
JenniferBelfast6N
SusanWaterford3Y
JakeCork1Y
BobbyDundalk11N
SarahSligo9N
CianEnnis8Y
Raw file data that will be imported into the XML file

The THIRD STEP is to create the XML file and populate it with the data from the source file

# This Section outputs the data to an xml file
# Unless you tell it otherwise it saves it to the same folder as the script.
tree = et.ElementTree(root) ==> The variable tree is to hold all the values of "root"
et.indent(tree, space="\t", level=0) ===> This just formats in a way that the XML is readable
tree.write('output.xml', encoding="utf-8") ==> The data is saved to an XML file

The XML output should look like the below

<root>
	<ExportData>
		<Name>John</Name>
		<Area>Dublin</Area>
		<NoPurchases>2</NoPurchases>
		<Active>Y</Active>
	</ExportData>
	<ExportData>
		<Name>Mary</Name>
		<Area>Galway</Area>
		<NoPurchases>3</NoPurchases>
		<Active>N</Active>
	</ExportData>
	<ExportData>
		<Name>Joe</Name>
		<Area>Limerick</Area>
		<NoPurchases>4</NoPurchases>
		<Active>N</Active>
	</ExportData>
	<ExportData>
		<Name>Jimmy</Name>
		<Area>Kilkenny</Area>
		<NoPurchases>55</NoPurchases>
		<Active>Y</Active>
	</ExportData>
	<ExportData>
		<Name>Jennifer</Name>
		<Area>Belfast</Area>
		<NoPurchases>6</NoPurchases>
		<Active>N</Active>
	</ExportData>
	<ExportData>
		<Name>Susan</Name>
		<Area>Waterford</Area>
		<NoPurchases>3</NoPurchases>
		<Active>Y</Active>
	</ExportData>
	<ExportData>
		<Name>Jake</Name>
		<Area>Cork</Area>
		<NoPurchases>1</NoPurchases>
		<Active>Y</Active>
	</ExportData>
	<ExportData>
		<Name>Bobby</Name>
		<Area>Dundalk</Area>
		<NoPurchases>11</NoPurchases>
		<Active>N</Active>
	</ExportData>
	<ExportData>
		<Name>Sarah</Name>
		<Area>Sligo</Area>
		<NoPurchases>9</NoPurchases>
		<Active>N</Active>
	</ExportData>
	<ExportData>
		<Name>Cian</Name>
		<Area>Ennis</Area>
		<NoPurchases>8</NoPurchases>
		<Active>Y</Active>
	</ExportData>
</root>

Additional XML data can be added

  1. Add more rows – All you need to do is add onto the source file and save. When you rerun the logic it will read in the extra information.
  2. Add more columns – All you need to do is go to the second step above add in a tag name to SECTION 1. Seperately you will need to add an additional column with data to the source file, and then add that column name to SECTION 2 as well