In data-driven workflows, converting PDF documents that contain tables to Excel makes the data easier to access and work with. PDFs preserve document integrity, but their static nature makes data extraction difficult and often leads to error-prone manual work. With JavaScript in React, developers can automate the conversion and move structured data, such as financial reports, into Excel worksheets for analysis and collaboration. This article shows how to use Spire.PDF for JavaScript to convert PDF files to Excel workbooks in React applications.
- Steps to Convert PDF to Excel Using JavaScript
- Simple PDF to Excel Conversion in JavaScript
- Convert PDF to Excel with XlsxLineLayoutOptions
- Convert PDF to Excel with XlsxTextLayoutOptions
Install Spire.PDF for JavaScript
To get started with converting PDF to Excel with JavaScript in a React application, you can either download Spire.PDF for JavaScript from our website or install it via npm with the following command:
npm i spire.pdf
After that, copy the "Spire.Pdf.Base.js" and "Spire.Pdf.Base.wasm" files to the public folder of your project. Also place the font files used by your PDFs (for example, Calibri.ttf and Symbol.ttf in the examples below) in the public folder so that text is rendered accurately and consistently.
For more details, refer to the documentation: How to Integrate Spire.PDF for JavaScript in a React Project
Steps to Convert PDF to Excel Using JavaScript
With the Spire.PDF for JavaScript WebAssembly module, PDF documents can be loaded from the Virtual File System (VFS) using the PdfDocument.LoadFromFile() method and converted into Excel workbooks using the PdfDocument.SaveToFile() method.
In addition to direct conversion, developers can customize the process by configuring conversion options through the XlsxLineLayoutOptions and XlsxTextLayoutOptions classes, along with the PdfDocument.ConvertOptions.SetPdfToXlsxOptions() method.
The following steps demonstrate how to convert a PDF document to an Excel file using Spire.PDF for JavaScript:
- Load the Spire.Pdf.Base.js file to initialize the WebAssembly module.
- Fetch the PDF file into the Virtual File System (VFS) using the wasmModule.FetchFileToVFS() method.
- Fetch the font files used in the PDF document to the “/Library/Fonts/” folder in the VFS using the wasmModule.FetchFileToVFS() method.
- Create an instance of the PdfDocument class using the wasmModule.PdfDocument.Create() method.
- Load the PDF document from the VFS into the PdfDocument instance using the PdfDocument.LoadFromFile() method.
- (Optional) Customize the conversion options:
  - Create an instance of the XlsxLineLayoutOptions or XlsxTextLayoutOptions class and specify the desired conversion settings.
  - Apply the conversion options using the PdfDocument.ConvertOptions.SetPdfToXlsxOptions() method.
- Convert the PDF document to an Excel file using the PdfDocument.SaveToFile({ fileName: string, fileFormat: wasmModule.FileFormat.XLSX }) method.
- Retrieve the converted file from the VFS for download or further use.
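The steps above repeat in every example in this article, so they can be collected into two small helpers. This is a minimal sketch, not part of the Spire.PDF API: the names fetchInputsToVfs and convertPdfInVfsToXlsx are hypothetical, and the only library calls used are the ones that appear in this article's examples (FetchFileToVFS, PdfDocument.Create, LoadFromFile, ConvertOptions.SetPdfToXlsxOptions, SaveToFile, and the Emscripten FS.readFile). The module is passed in as a parameter so the helpers do not depend on React state.

```javascript
// Hypothetical helper: copy the input PDF to the VFS root and its fonts to /Library/Fonts/,
// mirroring the FetchFileToVFS calls in this article's examples.
async function fetchInputsToVfs(wasmModule, inputFileName, fontFiles, baseUrl) {
  await wasmModule.FetchFileToVFS(inputFileName, '', baseUrl);
  for (const fontFile of fontFiles) {
    await wasmModule.FetchFileToVFS(fontFile, '/Library/Fonts/', baseUrl);
  }
}

// Hypothetical helper: create the document, load it, optionally apply conversion options,
// save it as XLSX, and return the bytes read back from the VFS.
// createOptions is an optional callback that returns an XlsxLineLayoutOptions or
// XlsxTextLayoutOptions instance; omit it to keep the default conversion settings.
function convertPdfInVfsToXlsx(wasmModule, inputFileName, outputFileName, createOptions) {
  const pdf = wasmModule.PdfDocument.Create();
  pdf.LoadFromFile(inputFileName);
  if (createOptions) {
    pdf.ConvertOptions.SetPdfToXlsxOptions(createOptions(wasmModule));
  }
  pdf.SaveToFile({ fileName: outputFileName, fileFormat: wasmModule.FileFormat.XLSX });
  // FS.readFile is the synchronous Emscripten file-system read used in the examples
  return wasmModule.FS.readFile(outputFileName);
}
```

In a component, the conversion handler could then call `await fetchInputsToVfs(wasmModule, 'Sample.pdf', ['Calibri.ttf', 'Symbol.ttf'], `${process.env.PUBLIC_URL}/`)` followed by `convertPdfInVfsToXlsx(wasmModule, 'Sample.pdf', 'PDFToExcel.xlsx')`, passing a callback such as `(m) => m.XlsxTextLayoutOptions.Create({ ... })` only when custom settings are needed.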
Simple PDF to Excel Conversion in JavaScript
Developers can directly load a PDF document from the VFS and convert it to an Excel file using the default conversion settings. These settings map one PDF page to one Excel worksheet, preserve rotated and overlapped text, allow cell splitting, and enable text wrapping.
Below is a code example demonstrating this process:
import React, { useState, useEffect } from 'react';

function App() {
  // State to store the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;
        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during module loading
        console.error('Failed to load the WASM module:', err);
      }
    };
    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;
    // Append the script to the document body
    document.body.appendChild(script);
    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to convert PDF to Excel
  const ConvertPDFToExcel = async () => {
    if (wasmModule) {
      // Specify the input and output file names
      const inputFileName = 'Sample.pdf';
      const outputFileName = 'PDFToExcel.xlsx';
      // Fetch the input file and add it to the VFS
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
      // Fetch the font file used in the PDF to the VFS
      await wasmModule.FetchFileToVFS('Calibri.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      await wasmModule.FetchFileToVFS('Symbol.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      // Create an instance of the PdfDocument class
      const pdf = wasmModule.PdfDocument.Create();
      // Load the PDF document from the VFS
      pdf.LoadFromFile(inputFileName);
      // Convert the PDF document to an Excel file
      pdf.SaveToFile({ fileName: outputFileName, fileFormat: wasmModule.FileFormat.XLSX });
      // Read the Excel file from the VFS
      const excelArray = await wasmModule.FS.readFile(outputFileName);
      // Create a Blob object from the Excel file and trigger a download
      const blob = new Blob([excelArray], { type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Convert PDF to Excel Using JavaScript in React</h1>
      <button onClick={ConvertPDFToExcel} disabled={!wasmModule}>
        Convert and Download
      </button>
    </div>
  );
}

export default App;
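The example above loads the module by injecting a script tag and waiting for Module.onRuntimeInitialized. The same pattern can be wrapped in a Promise so the loading code can be awaited or chained. This is a sketch under the assumption, taken from the example, that Spire.Pdf.Base.js defines window.Module and window.spirepdf; the function name loadSpirePdf is hypothetical. The window and document objects are parameters only so the function can be exercised outside a browser.

```javascript
// Hypothetical loader: inject the script and resolve with window.spirepdf once the
// Emscripten runtime calls Module.onRuntimeInitialized (the hook used in the example).
function loadSpirePdf(src, win = window, doc = document) {
  return new Promise((resolve, reject) => {
    const script = doc.createElement('script');
    script.src = src;
    script.onload = () => {
      const { Module } = win;
      if (!Module) {
        reject(new Error(`Module was not defined after loading ${src}`));
        return;
      }
      Module.onRuntimeInitialized = () => resolve(win.spirepdf);
    };
    // Reject instead of waiting forever if the script cannot be fetched
    script.onerror = () => reject(new Error(`Failed to load ${src}`));
    doc.body.appendChild(script);
  });
}
```

Inside useEffect, the promise could be consumed with `loadSpirePdf(`${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`).then(setWasmModule).catch(console.error)`. Unlike the example, this variant does not remove the script tag on unmount; add that cleanup if your component needs it.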
Convert PDF to Excel with XlsxLineLayoutOptions
Spire.PDF for JavaScript provides the XlsxLineLayoutOptions class for configuring line-based conversion settings when converting PDFs to Excel. By adjusting these options, developers can achieve different conversion results, such as merging all PDF pages into a single worksheet.
The table below outlines the available parameters in XlsxLineLayoutOptions:
| Parameter (bool) | Function |
| --- | --- |
| convertToMultipleSheet | Specifies whether to convert each page into a separate worksheet. |
| rotatedText | Specifies whether to retain rotated text. |
| splitCell | Specifies whether to split cells. |
| wrapText | Specifies whether to wrap text within cells. |
| overlapText | Specifies whether to retain overlapped text. |
Pay particular attention to the splitCell parameter, because it strongly affects how tables are converted. Setting it to false preserves the table cell structure, so the output cells more closely match the original PDF. Setting it to true allows plain text to be split across cells, which can suit text-based layouts better than structured tables.
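One way to apply the splitCell guidance is to keep two named settings objects, one for table-heavy PDFs and one for text-heavy PDFs, and pass the chosen object to XlsxLineLayoutOptions.Create(). This is a sketch: the preset names and the lineLayoutSettings function are hypothetical, the "tables" values are copied from the XlsxLineLayoutOptions example in this article, and the "text" values are illustrative only. Unknown keys and non-boolean values are rejected so a typo does not silently fall back to a default.

```javascript
// Hypothetical presets built from the five XlsxLineLayoutOptions parameters in the table above.
const LINE_LAYOUT_PRESETS = Object.freeze({
  // splitCell: false keeps table cell structure close to the original PDF
  tables: { convertToMultipleSheet: true, rotatedText: false, splitCell: false, wrapText: false, overlapText: true },
  // splitCell: true lets running text be split across cells (illustrative values)
  text: { convertToMultipleSheet: true, rotatedText: true, splitCell: true, wrapText: true, overlapText: true },
});

// Return a new settings object: the chosen preset with validated overrides applied.
function lineLayoutSettings(presetName, overrides = {}) {
  if (!Object.prototype.hasOwnProperty.call(LINE_LAYOUT_PRESETS, presetName)) {
    throw new Error(`Unknown preset: ${presetName}`);
  }
  const preset = LINE_LAYOUT_PRESETS[presetName];
  for (const [key, value] of Object.entries(overrides)) {
    // Own-property check avoids accepting inherited names such as "toString"
    if (!Object.prototype.hasOwnProperty.call(preset, key)) {
      throw new Error(`Unknown XlsxLineLayoutOptions parameter: ${key}`);
    }
    if (typeof value !== 'boolean') {
      throw new TypeError(`${key} must be a boolean`);
    }
  }
  return { ...preset, ...overrides };
}
```

For example, `wasmModule.XlsxLineLayoutOptions.Create(lineLayoutSettings('tables', { convertToMultipleSheet: false }))` would keep table cells intact while placing all pages on a single worksheet.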
Below is a code example demonstrating PDF-to-Excel conversion using XlsxLineLayoutOptions:
import React, { useState, useEffect } from 'react';

function App() {
  // State to store the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;
        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during module loading
        console.error('Failed to load the WASM module:', err);
      }
    };
    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;
    // Append the script to the document body
    document.body.appendChild(script);
    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to convert PDF to Excel with XlsxLineLayoutOptions
  const ConvertPDFToExcelXlsxLineLayoutOptions = async () => {
    if (wasmModule) {
      // Specify the input and output file names
      const inputFileName = 'Sample.pdf';
      const outputFileName = 'PDFToExcelXlsxLineLayoutOptions.xlsx';
      // Fetch the input file and add it to the VFS
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
      // Fetch the font file used in the PDF to the VFS
      await wasmModule.FetchFileToVFS('Calibri.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      await wasmModule.FetchFileToVFS('Symbol.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      // Create an instance of the PdfDocument class
      const pdf = wasmModule.PdfDocument.Create();
      // Load the PDF document from the VFS
      pdf.LoadFromFile(inputFileName);
      // Create an instance of the XlsxLineLayoutOptions class and specify the conversion options
      const options = wasmModule.XlsxLineLayoutOptions.Create({
        convertToMultipleSheet: true,
        rotatedText: false,
        splitCell: false,
        wrapText: false,
        overlapText: true,
      });
      // Set the XlsxLineLayoutOptions instance as the conversion options
      pdf.ConvertOptions.SetPdfToXlsxOptions(options);
      // Convert the PDF document to an Excel file
      pdf.SaveToFile({ fileName: outputFileName, fileFormat: wasmModule.FileFormat.XLSX });
      // Read the Excel file from the VFS
      const excelArray = await wasmModule.FS.readFile(outputFileName);
      // Create a Blob object from the Excel file and trigger a download
      const blob = new Blob([excelArray], { type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Convert PDF to Excel with XlsxLineLayoutOptions Using JavaScript in React</h1>
      <button onClick={ConvertPDFToExcelXlsxLineLayoutOptions} disabled={!wasmModule}>
        Convert and Download
      </button>
    </div>
  );
}

export default App;
Convert PDF to Excel with XlsxTextLayoutOptions
Developers can also customize conversion settings using the XlsxTextLayoutOptions class, which focuses on text-based layout formatting. The table below lists its parameters:
| Parameter (bool) | Function |
| --- | --- |
| convertToMultipleSheet | Specifies whether to convert each page into a separate worksheet. |
| rotatedText | Specifies whether to retain rotated text. |
| overlapText | Specifies whether to retain overlapped text. |
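XlsxTextLayoutOptions accepts only three of the five parameters that XlsxLineLayoutOptions accepts; splitCell and wrapText are not listed for it. A small check before calling Create() can catch settings written for the wrong class. This is a sketch: the key lists are copied from the two parameter tables in this article, and the function name checkLayoutSettings is hypothetical. How the library itself treats unknown keys is not documented here, so the check simply reports them.

```javascript
// Parameter names from the XlsxLineLayoutOptions and XlsxTextLayoutOptions tables in this article.
const ALLOWED_KEYS = Object.freeze({
  XlsxLineLayoutOptions: ['convertToMultipleSheet', 'rotatedText', 'splitCell', 'wrapText', 'overlapText'],
  XlsxTextLayoutOptions: ['convertToMultipleSheet', 'rotatedText', 'overlapText'],
});

// Return a list of human-readable problems; an empty list means the settings look valid.
function checkLayoutSettings(className, settings) {
  const allowed = ALLOWED_KEYS[className];
  if (!Array.isArray(allowed)) {
    throw new Error(`Unsupported options class: ${className}`);
  }
  const problems = [];
  for (const [key, value] of Object.entries(settings)) {
    if (!allowed.includes(key)) {
      problems.push(`${key} is not a parameter of ${className}`);
    } else if (typeof value !== 'boolean') {
      problems.push(`${key} must be a boolean`);
    }
  }
  return problems;
}
```

If the returned list is empty, the same object can be passed on, for example to `wasmModule.XlsxTextLayoutOptions.Create(settings)`.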
Below is a code example demonstrating PDF-to-Excel conversion using XlsxTextLayoutOptions:
import React, { useState, useEffect } from 'react';

function App() {
  // State to store the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;
        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during module loading
        console.error('Failed to load the WASM module:', err);
      }
    };
    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;
    // Append the script to the document body
    document.body.appendChild(script);
    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to convert PDF to Excel with XlsxTextLayoutOptions
  const ConvertPDFToExcelXlsxTextLayoutOptions = async () => {
    if (wasmModule) {
      // Specify the input and output file names
      const inputFileName = 'Sample.pdf';
      const outputFileName = 'PDFToExcelXlsxTextLayoutOptions.xlsx';
      // Fetch the input file and add it to the VFS
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);
      // Fetch the font file used in the PDF to the VFS
      await wasmModule.FetchFileToVFS('Calibri.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      await wasmModule.FetchFileToVFS('Symbol.ttf', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);
      // Create an instance of the PdfDocument class
      const pdf = wasmModule.PdfDocument.Create();
      // Load the PDF document from the VFS
      pdf.LoadFromFile(inputFileName);
      // Create an instance of the XlsxTextLayoutOptions class and specify the conversion options
      const options = wasmModule.XlsxTextLayoutOptions.Create({
        convertToMultipleSheet: false,
        rotatedText: true,
        overlapText: true,
      });
      // Set the XlsxTextLayoutOptions instance as the conversion options
      pdf.ConvertOptions.SetPdfToXlsxOptions(options);
      // Convert the PDF document to an Excel file
      pdf.SaveToFile({ fileName: outputFileName, fileFormat: wasmModule.FileFormat.XLSX });
      // Read the Excel file from the VFS
      const excelArray = await wasmModule.FS.readFile(outputFileName);
      // Create a Blob object from the Excel file and trigger a download
      const blob = new Blob([excelArray], { type: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' });
      const url = URL.createObjectURL(blob);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Convert PDF to Excel with XlsxTextLayoutOptions Using JavaScript in React</h1>
      <button onClick={ConvertPDFToExcelXlsxTextLayoutOptions} disabled={!wasmModule}>
        Convert and Download
      </button>
    </div>
  );
}

export default App;
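The Blob-and-link download code at the end of each example is identical, so it can live in one function. This sketch uses only standard web APIs (Blob, URL.createObjectURL, URL.revokeObjectURL, and an anchor element with a download attribute); the name downloadXlsx is hypothetical. The document and URL objects are optional parameters so the function can be tested without a browser, and the anchor is removed and the object URL revoked even if click() throws.

```javascript
// MIME type for .xlsx workbooks, as used in the examples.
const XLSX_MIME = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';

// Hypothetical helper: wrap the bytes read from the VFS in a Blob and trigger a download.
function downloadXlsx(bytes, fileName, doc = document, urlApi = URL) {
  const blob = new Blob([bytes], { type: XLSX_MIME });
  const url = urlApi.createObjectURL(blob);
  const a = doc.createElement('a');
  a.href = url;
  a.download = fileName;
  doc.body.appendChild(a);
  try {
    a.click();
  } finally {
    // Clean up the temporary link and release the object URL, as the examples do
    doc.body.removeChild(a);
    urlApi.revokeObjectURL(url);
  }
  return blob;
}
```

With this helper, the last block of each example reduces to `downloadXlsx(wasmModule.FS.readFile(outputFileName), outputFileName)`.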
Get a Free License
To fully experience the capabilities of Spire.PDF for JavaScript without any evaluation limitations, you can request a free 30-day trial license.