Converting PDF to HTML makes document content more accessible and interactive on the web. PDFs are widely used for their reliable layout and ease of sharing, but their fixed layout can be restrictive online. HTML is more flexible and adapts to websites and mobile devices. By converting a PDF document into HTML, developers can improve search engine visibility, make content easier to edit, and create more user-friendly experiences. In this article, we demonstrate how to convert PDF to HTML in React using JavaScript and the Spire.PDF for JavaScript library.
- Convert PDF to HTML in React
- Customize PDF to HTML Conversion Settings in React
- Convert PDF to HTML Stream in React
Install Spire.PDF for JavaScript
To get started with converting PDF to HTML with JavaScript in a React application, you can either download Spire.PDF for JavaScript from our website or install it via npm with the following command:
npm i spire.pdf
After that, copy the "Spire.Pdf.Base.js" and "Spire.Pdf.Base.wasm" files to the public folder of your project. Additionally, include the required font files to ensure accurate and consistent text rendering.
For more details, refer to the documentation: How to Integrate Spire.PDF for JavaScript in a React Project
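To make the file placement described above concrete, here is a minimal sketch of how the URLs for those public-folder files might be built. The helper names `publicAssetUrl` and `requiredAssetUrls` are hypothetical and not part of Spire.PDF. The sketch assumes the files sit at the root of the public folder and that the base URL comes from `process.env.PUBLIC_URL`, as in the examples in this article.

```javascript
// Hypothetical helper (not part of Spire.PDF): join a base URL and a file
// name without producing duplicate slashes.
function publicAssetUrl(baseUrl, fileName) {
  const base = String(baseUrl || '').replace(/\/+$/, ''); // drop trailing slashes
  const name = String(fileName).replace(/^\/+/, '');      // drop leading slashes
  return `${base}/${name}`;
}

// The two library files named in this section.
const REQUIRED_ASSETS = ['Spire.Pdf.Base.js', 'Spire.Pdf.Base.wasm'];

// Hypothetical helper: list the URLs the app expects to find in the public folder.
function requiredAssetUrls(baseUrl) {
  return REQUIRED_ASSETS.map((name) => publicAssetUrl(baseUrl, name));
}

// Print the URLs for an app served from the site root (empty base URL).
console.log(requiredAssetUrls(''));
```

Opening each printed URL in the browser is a quick way to confirm the files were copied to the right place before wiring up the component.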
Convert PDF to HTML in React
The PdfDocument.SaveToFile() method offered by Spire.PDF for JavaScript allows developers to effortlessly convert a PDF file into HTML format. The detailed steps are as follows.
- Load the required font file and the input PDF file into the Virtual File System (VFS).
- Create a PdfDocument object with the wasmModule.PdfDocument.Create() method.
- Load the PDF file using the PdfDocument.LoadFromFile() method.
- Save the PDF file to HTML format using the PdfDocument.SaveToFile() method.
```javascript
import React, { useState, useEffect } from 'react';

function App() {

  // State to hold the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during loading
        console.error('Failed to load WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to convert PDF to HTML
  const ConvertPdfToHTML = async () => {
    if (wasmModule) {
      // Load the necessary font file into the virtual file system (VFS)
      await wasmModule.FetchFileToVFS('ARIAL.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);

      // Load the input PDF file into the VFS
      let inputFileName = 'Input.pdf';
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);

      // Create a new document
      const doc = wasmModule.PdfDocument.Create();

      // Load the PDF file
      doc.LoadFromFile(inputFileName);

      // Define the output file name
      const outputFileName = 'PdfToHtml.html';

      // Save the document to an HTML file
      doc.SaveToFile({fileName: outputFileName, fileFormat: wasmModule.FileFormat.HTML});

      // Clean up resources
      doc.Close();
      doc.Dispose();

      // Read the saved file and convert it to a Blob object
      const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
      const modifiedFile = new Blob([modifiedFileArray], { type: 'text/html' });

      // Create a URL for the Blob and initiate the download
      const url = URL.createObjectURL(modifiedFile);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Convert PDF to HTML in React Using JavaScript</h1>
      <button onClick={ConvertPdfToHTML} disabled={!wasmModule}>
        Convert
      </button>
    </div>
  );
}

export default App;
```
Run the code to launch the React app at localhost:3000. Once it's running, click the "Convert" button to convert the PDF file to HTML format and download the result.
[Screenshot: the input PDF file and the converted HTML file]
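The conversion steps in this section can also be isolated from the React component, which makes them easier to reuse and to exercise with a stand-in module. The following is a minimal sketch, not the library's recommended structure. The function name `savePdfAsHtml` is hypothetical, `pdfModule` stands for the object stored in `wasmModule` in the component, and the sketch assumes the font and the input PDF were already fetched into the VFS. It uses only the calls that appear in the example in this section.

```javascript
// Sketch: run the create / load / save / clean-up steps against an injected
// module object and return the bytes of the generated HTML file.
// `savePdfAsHtml` is a hypothetical name, not a Spire.PDF function.
function savePdfAsHtml(pdfModule, inputFileName, outputFileName) {
  // Create a new document
  const doc = pdfModule.PdfDocument.Create();
  try {
    // Load the PDF from the virtual file system
    doc.LoadFromFile(inputFileName);

    // Save it in HTML format
    doc.SaveToFile({
      fileName: outputFileName,
      fileFormat: pdfModule.FileFormat.HTML,
    });
  } finally {
    // Release the document even if loading or saving throws
    doc.Close();
    doc.Dispose();
  }

  // Read the generated file back from the virtual file system
  return pdfModule.FS.readFile(outputFileName);
}
```

Inside `ConvertPdfToHTML`, this could be called as `savePdfAsHtml(wasmModule, inputFileName, outputFileName)` after the two `FetchFileToVFS` calls. The `try`/`finally` wrapper is a design choice of this sketch so that the document is closed even when a step fails.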
Customize PDF to HTML Conversion Settings in React
Developers can use the PdfDocument.ConvertOptions.SetPdfToHtmlOptions() method to customize settings during the PDF to HTML conversion process. For instance, they can choose whether to embed SVG or images in the resulting HTML and set the maximum number of pages included in each HTML file. The detailed steps are as follows.
- Load the required font file and the input PDF file into the Virtual File System (VFS).
- Create a PdfDocument object with the wasmModule.PdfDocument.Create() method.
- Load the PDF file using the PdfDocument.LoadFromFile() method.
- Customize the PDF to HTML conversion settings using the PdfDocument.ConvertOptions.SetPdfToHtmlOptions() method.
- Save the PDF document to HTML format using the PdfDocument.SaveToFile() method.
```javascript
import React, { useState, useEffect } from 'react';

function App() {

  // State to hold the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during loading
        console.error('Failed to load WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to convert PDF to HTML
  const ConvertPdfToHTML = async () => {
    if (wasmModule) {
      // Load the necessary font file into the virtual file system (VFS)
      await wasmModule.FetchFileToVFS('ARIAL.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);

      // Load the input PDF file into the VFS
      let inputFileName = 'Input.pdf';
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);

      // Create a new document
      const doc = wasmModule.PdfDocument.Create();

      // Load the PDF file
      doc.LoadFromFile(inputFileName);

      // Customize the conversion settings
      // Parameters: useEmbeddedSvg: false, useEmbeddedImg: true, maxPageOneFile: 1
      doc.ConvertOptions.SetPdfToHtmlOptions(false, true, 1);

      // Define the output file name
      const outputFileName = 'CustomizePdfToHtmlConversion.html';

      // Save the document to an HTML file
      doc.SaveToFile({fileName: outputFileName, fileFormat: wasmModule.FileFormat.HTML});

      // Clean up resources
      doc.Close();
      doc.Dispose();

      // Read the saved file and convert it to a Blob object
      const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
      const modifiedFile = new Blob([modifiedFileArray], { type: 'text/html' });

      // Create a URL for the Blob and initiate the download
      const url = URL.createObjectURL(modifiedFile);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Convert PDF to HTML in React Using JavaScript</h1>
      <button onClick={ConvertPdfToHTML} disabled={!wasmModule}>
        Convert
      </button>
    </div>
  );
}

export default App;
```
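Because `SetPdfToHtmlOptions()` takes three positional arguments, it is easy to swap the two booleans by accident. One possible way to guard against that is a small wrapper that accepts named options. The sketch below is an illustration only: `applyPdfToHtmlOptions` is a hypothetical helper, its fallback values simply mirror the ones used in the example in this section (they are not claimed to be the library's defaults), and the check that `maxPageOneFile` is a positive integer is an assumption of this sketch.

```javascript
// Hypothetical wrapper (not part of Spire.PDF): named options instead of
// three positional arguments.
function applyPdfToHtmlOptions(doc, options = {}) {
  // Fallback values copied from the example in this section
  const {
    useEmbeddedSvg = false,
    useEmbeddedImg = true,
    maxPageOneFile = 1,
  } = options;

  // Assumption of this sketch: a page count below 1 is a caller mistake
  if (!Number.isInteger(maxPageOneFile) || maxPageOneFile < 1) {
    throw new RangeError('maxPageOneFile must be a positive integer');
  }

  // Same method and argument order as in the example:
  // (useEmbeddedSvg, useEmbeddedImg, maxPageOneFile)
  doc.ConvertOptions.SetPdfToHtmlOptions(
    Boolean(useEmbeddedSvg),
    Boolean(useEmbeddedImg),
    maxPageOneFile
  );
}
```

In the component, `applyPdfToHtmlOptions(doc, { useEmbeddedImg: true, maxPageOneFile: 1 })` would stand in for the direct `doc.ConvertOptions.SetPdfToHtmlOptions(false, true, 1)` call.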
Convert PDF to HTML Stream in React
Spire.PDF for JavaScript also supports converting a PDF to an HTML stream using the PdfDocument.SaveToStream() method. The detailed steps are as follows.
- Load the required font file and the input PDF file into the Virtual File System (VFS).
- Create a PdfDocument object with the wasmModule.PdfDocument.Create() method.
- Load the PDF file using the PdfDocument.LoadFromFile() method.
- Create a memory stream using the wasmModule.Stream.CreateByFile() method.
- Save the PDF document as an HTML stream using the PdfDocument.SaveToStream() method.
- Write the content of the stream to an HTML file using the wasmModule.FS.writeFile() method.
```javascript
import React, { useState, useEffect } from 'react';

function App() {

  // State to hold the loaded WASM module
  const [wasmModule, setWasmModule] = useState(null);

  // useEffect hook to load the WASM module when the component mounts
  useEffect(() => {
    const loadWasm = async () => {
      try {
        // Access the Module and spirepdf from the global window object
        const { Module, spirepdf } = window;

        // Set the wasmModule state when the runtime is initialized
        Module.onRuntimeInitialized = () => {
          setWasmModule(spirepdf);
        };
      } catch (err) {
        // Log any errors that occur during loading
        console.error('Failed to load WASM module:', err);
      }
    };

    // Create a script element to load the WASM JavaScript file
    const script = document.createElement('script');
    script.src = `${process.env.PUBLIC_URL}/Spire.Pdf.Base.js`;
    script.onload = loadWasm;

    // Append the script to the document body
    document.body.appendChild(script);

    // Cleanup function to remove the script when the component unmounts
    return () => {
      document.body.removeChild(script);
    };
  }, []);

  // Function to convert PDF to HTML
  const ConvertPdfToHTML = async () => {
    if (wasmModule) {
      // Load the necessary font file into the virtual file system (VFS)
      await wasmModule.FetchFileToVFS('ARIAL.TTF', '/Library/Fonts/', `${process.env.PUBLIC_URL}/`);

      // Load the input PDF file into the VFS
      let inputFileName = 'Input.pdf';
      await wasmModule.FetchFileToVFS(inputFileName, '', `${process.env.PUBLIC_URL}/`);

      // Create a new document
      const doc = wasmModule.PdfDocument.Create();

      // Load the PDF file
      doc.LoadFromFile(inputFileName);

      // Define the output file name
      const outputFileName = 'PdfToHtmlStream.html';

      // Create a new memory stream
      let ms = wasmModule.Stream.CreateByFile(outputFileName);

      // Save the PDF document to an HTML stream
      doc.SaveToStream({stream: ms, fileformat: wasmModule.FileFormat.HTML});

      // Write the content of the memory stream to an HTML file
      wasmModule.FS.writeFile(outputFileName, ms.ToArray());

      // Clean up resources
      doc.Close();
      doc.Dispose();

      // Read the saved file and convert it to a Blob object
      const modifiedFileArray = wasmModule.FS.readFile(outputFileName);
      const modifiedFile = new Blob([modifiedFileArray], { type: 'text/html' });

      // Create a URL for the Blob and initiate the download
      const url = URL.createObjectURL(modifiedFile);
      const a = document.createElement('a');
      a.href = url;
      a.download = outputFileName;
      document.body.appendChild(a);
      a.click();
      document.body.removeChild(a);
      URL.revokeObjectURL(url);
    }
  };

  return (
    <div style={{ textAlign: 'center', height: '300px' }}>
      <h1>Convert PDF to HTML in React Using JavaScript</h1>
      <button onClick={ConvertPdfToHTML} disabled={!wasmModule}>
        Convert
      </button>
    </div>
  );
}

export default App;
```
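A stream is mostly useful when the HTML should stay in memory, for example to preview it in the page instead of downloading it. The sketch below shows one way that might look, using only the stream calls from the example in this section. The function name `convertPdfToHtmlString` is hypothetical, `pdfModule` stands for the object stored in `wasmModule`, and the sketch assumes that `ToArray()` returns the bytes of UTF-8 encoded HTML and that the input PDF and font are already in the VFS.

```javascript
// Sketch: convert through a stream and return the HTML as a string.
// `convertPdfToHtmlString` is a hypothetical name, not a Spire.PDF function.
function convertPdfToHtmlString(pdfModule, inputFileName, streamFileName) {
  const doc = pdfModule.PdfDocument.Create();
  try {
    // Load the PDF from the virtual file system
    doc.LoadFromFile(inputFileName);

    // Create the stream and save the document into it, as in the example
    const ms = pdfModule.Stream.CreateByFile(streamFileName);
    doc.SaveToStream({ stream: ms, fileformat: pdfModule.FileFormat.HTML });

    // Assumption: ToArray() yields byte values of UTF-8 encoded HTML
    const bytes = Uint8Array.from(ms.ToArray());
    return new TextDecoder('utf-8').decode(bytes);
  } finally {
    // Release the document whether or not the conversion succeeded
    doc.Close();
    doc.Dispose();
  }
}
```

The returned string could then be stored in component state and rendered, which avoids the `FS.writeFile()` and `FS.readFile()` round trip when no file download is needed.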
Get a Free License
To fully experience the capabilities of Spire.PDF for JavaScript without any evaluation limitations, you can request a free 30-day trial license.