Python Pandas - Writing XML

Just as it can parse XML files, Pandas provides an easy way to convert a DataFrame into an XML document. The DataFrame.to_xml() method renders the contents of a DataFrame as an XML document. XML (Extensible Markup Language) is a widely used data representation format because of its flexibility.

In this tutorial, we will learn about the functionality of the DataFrame.to_xml() method, its parameters, and examples that demonstrate different use cases.

The to_xml() Method

The Pandas DataFrame object provides a method called to_xml() for converting the contents of a DataFrame into an XML document. This method can write the output XML to a file or return it as a string. It also supports customization of XML structure, namespaces, attributes, formatting, and more by using its various options.

Syntax

Following is the syntax of the to_xml() method −

DataFrame.to_xml(path_or_buffer=None, *, root_name='data', row_name='row', attr_cols=None, elem_cols=None, namespaces=None, prefix=None, ...)

Where,

path_or_buffer: Specifies the XML output location. It can be a string, path object, or file-like object. If None, the XML is returned as a string instead of saving it to a file.
root_name: Specifies the name of the root element in the XML document. Default is 'data'.
row_name: Specifies the name of the row element. Default is 'row'.
attr_cols: List of columns to be written as attributes in the row elements.
elem_cols: List of columns to be written as child elements of the row element.
namespaces: Dictionary of namespaces to include in the XML.
prefix: Namespace prefix for elements and attributes.

For more details about this method, see the DataFrame.to_xml() tutorial.
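The two behaviours of path_or_buffer described above (returning a string versus writing a file) can be sketched as follows. This is a minimal illustration, not part of the original tutorial: the file name contacts.xml and the temporary directory are hypothetical, and parser="etree" is an assumption made here so that the sketch runs without the optional lxml package (the default parser is 'lxml').

```python
import tempfile
from pathlib import Path
import xml.etree.ElementTree as ET

import pandas as pd

# Sample DataFrame (same shape as the tutorial's examples)
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
})

# No path given: the XML document is returned as a string.
# parser="etree" is an assumption of this sketch (avoids needing lxml).
xml_text = df.to_xml(parser="etree")

# Path given: the XML is written to the file and None is returned.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "contacts.xml"  # hypothetical file name
    result = df.to_xml(out_path, parser="etree")

    # Read the file back to confirm its structure
    root = ET.parse(out_path).getroot()
    root_tag = root.tag
    row_count = len(root.findall("row"))

print(type(xml_text).__name__, result, root_tag, row_count)
```

Parsing the file back with the standard library is only a convenient way to check the structure; any XML reader would do.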

Example

Here is a simple example demonstrating the conversion of a Pandas DataFrame into XML format with default settings using the DataFrame.to_xml() method.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
    'phone': ['(011) 123-4567', '(011) 789-4567'],
})

# Convert to XML
print(df.to_xml())

Following is the output of the above code −

<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <name>Tanmay</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </row>
  <row>
    <index>1</index>
    <name>Manisha</name>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </row>
</data>
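The output above contains an <index> element in every row because the DataFrame index is written by default. If the index is not wanted, one option is the index parameter of to_xml(), which this tutorial does not otherwise cover. The sketch below assumes index=False and, to avoid depending on lxml, parser="etree"; it parses the result with the standard library only to list the element names.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
    'phone': ['(011) 123-4567', '(011) 789-4567'],
})

# index=False leaves the DataFrame index out of the XML.
# parser="etree" is an assumption of this sketch (avoids needing lxml).
xml_text = df.to_xml(index=False, parser="etree")

# Collect the child element names of the first row
root = ET.fromstring(xml_text.encode("utf-8"))
first_row_tags = [child.tag for child in root.find("row")]
print(first_row_tags)
```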

Customizing Root and Row Names

While converting a Pandas DataFrame into XML format, you can change the default root and row element names ('data' and 'row') so that they describe the data better. To do this, use the root_name and row_name parameters of the DataFrame.to_xml() method.

Example

The following example uses the root_name and row_name parameters for customizing the element tags of the XML data.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
    'phone': ['(011) 123-4567', '(011) 789-4567'],
})

# Convert to XML with custom root and row names
print(df.to_xml(root_name="contact-info", row_name="contact"))

Following is the output of the above code −

<?xml version='1.0' encoding='utf-8'?>
<contact-info>
  <contact>
    <index>0</index>
    <name>Tanmay</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </contact>
  <contact>
    <index>1</index>
    <name>Manisha</name>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </contact>
</contact-info>

Writing an Attribute-Centric XML

The attr_cols parameter of the DataFrame.to_xml() method writes the listed columns as attributes of each row element instead of as child elements.

Example

This example shows how to write attribute-centric XML using the Pandas to_xml() method. When you specify columns in attr_cols, their values appear as attributes of the row elements instead of child elements. Because the index is included by default, it is also written as an attribute.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
    'phone': ['(011) 123-4567', '(011) 789-4567'],
})

# Write columns as attributes
print(df.to_xml(attr_cols=df.columns.tolist()))

Following is the output of the above code −

<?xml version='1.0' encoding='utf-8'?>
<data>
  <row index="0" name="Tanmay" company="TutorialsPoint" phone="(011) 123-4567"/>
  <row index="1" name="Manisha" company="TutorialsPoint" phone="(011) 789-4567"/>
</data>

Mixing Attributes and Elements

You can also mix some columns as attributes and others as child elements using attr_cols and elem_cols parameters. These parameters allow you to control the structure of row elements, defining which columns become attributes and which remain as child elements.

Example

This example demonstrates how to convert a DataFrame to XML with a mix of attributes and elements. Here, name is written as an attribute of the <row> element, while company and phone are nested as child elements of <row>. Because the index is included by default, it appears both as an attribute and as a child element.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
    'phone': ['(011) 123-4567', '(011) 789-4567'],
})

# Mix attributes and elements
print(df.to_xml(attr_cols=['name'], elem_cols=['company', 'phone']))

Following is the output of the above code −

<?xml version='1.0' encoding='utf-8'?>
<data>
  <row index="0" name="Tanmay">
    <index>0</index>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </row>
  <row index="1" name="Manisha">
    <index>1</index>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </row>
</data>
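To see how a consumer might read such mixed output, the sketch below parses the XML back with Python's standard xml.etree.ElementTree and merges each row's attributes and child elements into one dictionary. It is an illustration under stated assumptions rather than part of the original tutorial: index=False is used so the index does not appear twice, parser="etree" avoids the lxml dependency, and the records variable is a name chosen here.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
    'phone': ['(011) 123-4567', '(011) 789-4567'],
})

# name becomes an attribute; company and phone become child elements.
# index=False and parser="etree" are assumptions of this sketch.
xml_text = df.to_xml(
    index=False,
    attr_cols=['name'],
    elem_cols=['company', 'phone'],
    parser="etree",
)

# Merge attributes and child-element text into one dict per row
root = ET.fromstring(xml_text.encode("utf-8"))
records = [
    {**row.attrib, **{child.tag: child.text for child in row}}
    for row in root.findall("row")
]
print(records)
```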

Handling Hierarchical Columns

Any hierarchical columns in a Pandas DataFrame are flattened with underscore delimiters when the DataFrame is converted to an XML document. A hierarchical row index is written as one element per index level.

Example

The following example uses a DataFrame with a hierarchical (MultiIndex) row index. Because the index levels are unnamed, they are written as the level_0 and level_1 elements in each row.

import pandas as pd

# Create a MultiIndex object
index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

# Create hierarchical DataFrame
data = [[1, 2], [3, 4], [5, 6], [7, 8]]
df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

# Display the hierarchical DataFrame
print("Hierarchical DataFrame:")
print(df)

# Convert to XML
print('Output XML:')
print(df.to_xml())

Following is the output of the above code −

Hierarchical DataFrame:
       X  Y
A one  1  2
  two  3  4
B one  5  6
  two  7  8
Output XML:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <level_0>A</level_0>
    <level_1>one</level_1>
    <X>1</X>
    <Y>2</Y>
  </row>
  <row>
    <level_0>A</level_0>
    <level_1>two</level_1>
    <X>3</X>
    <Y>4</Y>
  </row>
  <row>
    <level_0>B</level_0>
    <level_1>one</level_1>
    <X>5</X>
    <Y>6</Y>
  </row>
  <row>
    <level_0>B</level_0>
    <level_1>two</level_1>
    <X>7</X>
    <Y>8</Y>
  </row>
</data>
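The example above has a hierarchical row index. For hierarchical columns, the flattening with underscores mentioned at the start of this section can be sketched as follows. The column labels (sales, costs, q1, q2) are made up for illustration, and index=False and parser="etree" are assumptions of this sketch (the latter avoids the lxml dependency).

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical two-level column labels
columns = pd.MultiIndex.from_tuples(
    [('sales', 'q1'), ('sales', 'q2'), ('costs', 'q1')]
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# Each column tuple is joined with an underscore to form an element name.
# index=False and parser="etree" are assumptions of this sketch.
xml_text = df.to_xml(index=False, parser="etree")

# Collect the element names of the first row
root = ET.fromstring(xml_text.encode("utf-8"))
flattened_tags = [child.tag for child in root.find("row")]
print(flattened_tags)
```

Flattened names must still be valid XML element names, so labels that start with a digit or contain spaces are likely to need renaming before conversion.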

Adding Namespaces While Writing XML

Namespaces can be included for the root element and other XML nodes using the namespaces parameter.

Example

The following example demonstrates adding a default namespace to the XML document while converting a Pandas DataFrame using the to_xml() method.

import pandas as pd

# Create a MultiIndex object
index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

# Create a DataFrame
data = [[1, 2], [3, 4], [5, 6], [7, 8]]
df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

# Add default namespace
print(df.to_xml(namespaces={"": "https://example.com"}))

Following is the output of the above code −

<?xml version='1.0' encoding='utf-8'?>
<data xmlns="https://example.com">
  <row>
    <level_0>A</level_0>
    <level_1>one</level_1>
    <X>1</X>
    <Y>2</Y>
  </row>
  <row>
    <level_0>A</level_0>
    <level_1>two</level_1>
    <X>3</X>
    <Y>4</Y>
  </row>
  <row>
    <level_0>B</level_0>
    <level_1>one</level_1>
    <X>5</X>
    <Y>6</Y>
  </row>
  <row>
    <level_0>B</level_0>
    <level_1>two</level_1>
    <X>7</X>
    <Y>8</Y>
  </row>
</data>

Writing XML with Namespace Prefix

You can apply a namespace prefix to the elements and attributes of the XML document using the prefix parameter. The prefix must be one of the keys of the namespaces dictionary.

Example

This example uses the prefix parameter to specify the namespace prefix for the elements and attributes of the XML.

import pandas as pd

# Create a MultiIndex object
index = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])

# Create a DataFrame
data = [[1, 2], [3, 4], [5, 6], [7, 8]]
df = pd.DataFrame(data, index=index, columns=['X', 'Y'])

# Add namespace with prefix
print(df.to_xml(namespaces={"doc": "https://example.com"}, prefix="doc"))

Following is the output of the above code −

<?xml version='1.0' encoding='utf-8'?>
<doc:data xmlns:doc="https://example.com">
  <doc:row>
    <doc:level_0>A</doc:level_0>
    <doc:level_1>one</doc:level_1>
    <doc:X>1</doc:X>
    <doc:Y>2</doc:Y>
  </doc:row>
  <doc:row>
    <doc:level_0>A</doc:level_0>
    <doc:level_1>two</doc:level_1>
    <doc:X>3</doc:X>
    <doc:Y>4</doc:Y>
  </doc:row>
  <doc:row>
    <doc:level_0>B</doc:level_0>
    <doc:level_1>one</doc:level_1>
    <doc:X>5</doc:X>
    <doc:Y>6</doc:Y>
  </doc:row>
  <doc:row>
    <doc:level_0>B</doc:level_0>
    <doc:level_1>two</doc:level_1>
    <doc:X>7</doc:X>
    <doc:Y>8</doc:Y>
  </doc:row>
</doc:data>

Disabling XML Declaration and Pretty Print

Set the xml_declaration and pretty_print options to False to omit the XML declaration and to disable pretty-printed formatting.

Example

This example shows how to disable the XML declaration and pretty formatting using the xml_declaration and pretty_print parameters.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
    'phone': ['(011) 123-4567', '(011) 789-4567'],
})

# Disabling XML Declaration and Pretty Print
print(df.to_xml(xml_declaration=False, pretty_print=False))

Following is the output of the above code −

<data><row><index>0</index><name>Tanmay</name><company>TutorialsPoint</company><phone>(011) 123-4567</phone></row><row><index>1</index><name>Manisha</name><company>TutorialsPoint</company><phone>(011) 789-4567</phone></row></data>

Transforming XML with Stylesheet

You can transform the output XML with an XSLT stylesheet using the stylesheet parameter of the DataFrame.to_xml() method. The stylesheet is applied to the generated XML to change its structure. This option requires the lxml parser, which is the default.

Example

The following example demonstrates transforming the XML with an XSLT stylesheet. The XSLT script is defined first and then passed to to_xml() to transform the raw XML into a custom layout.

import pandas as pd

# Create an XSLT stylesheet
xsl = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
   <xsl:strip-space elements="*"/>
   <xsl:template match="/data">
     <contact>
       <xsl:apply-templates select="row"/>
     </contact>
   </xsl:template>
   <xsl:template match="row">
     <object index="{index}">
       <xsl:copy-of select="@*|node()"/>
     </object>
   </xsl:template>
</xsl:stylesheet>"""

# Sample DataFrame
df = pd.DataFrame({
    'name': ['Tanmay', 'Manisha'],
    'company': ['TutorialsPoint', 'TutorialsPoint'],
    'phone': ['(011) 123-4567', '(011) 789-4567'],
})

# Apply stylesheet
print(df.to_xml(stylesheet=xsl))

Following is the output of the above code −

<?xml version="1.0"?>
<contact>
  <object index="0">
    <index>0</index>
    <name>Tanmay</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </object>
  <object index="1">
    <index>1</index>
    <name>Manisha</name>
    <company>TutorialsPoint</company>
    <phone>(011) 789-4567</phone>
  </object>
</contact>
