Convert HTML to Word and Word to HTML Using C# .NET


Microsoft Word and HTML (Hypertext Markup Language) are two of the most widely used document formats. Word is the go-to tool for rich, print-ready documents such as reports and proposals, while HTML is the language that structures content on the web. Converting between the two lets you publish Word content on the web and bring web content into an editable, printable document.

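In this article, we walk step by step through converting HTML to Word and Word to HTML in .NET using C#. It covers the following topics:

  • Why Convert Between Word and HTML?
  • .NET Word Library Installation
  • How to Convert HTML to Word Using C#
  • How to Convert Word to HTML Using C#
  • FAQs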

Why Convert Between Word and HTML?

Before diving into the technical details, let's understand why you might need to convert between Word and HTML:

  • Cross-Platform Accessibility: HTML is the backbone of web pages, while Word documents are the industry standard for creating, sharing, and editing documents. Converting between them makes the same content viewable in any browser and editable in a word processor.
  • Rich Formatting: Word documents support complex formatting and elements, so converting HTML to Word lets users keep the formatting of web content when exporting it.
  • Document Archiving and Data Exchange: Archive HTML content as Word documents, or publish Word-based reports to the web.

.NET Word Library Installation

.NET has no built-in API for reading or writing Word documents, so it cannot convert between Word and HTML on its own. To bridge this gap, Spire.Doc for .NET provides a developer-friendly API for document creation, manipulation, and conversion that does not require Microsoft Office or Interop libraries.

Install Spire.Doc for .NET

Before getting started with the conversion, you need to install Spire.Doc for .NET through one of the following methods:

Method 1: Install via NuGet

Run the following command in the NuGet Package Manager Console:

Install-Package Spire.Doc

Method 2: Manually Add the DLLs

You can also download the Spire.Doc for .NET package, extract the files, and then reference Spire.Doc.dll manually in your Visual Studio project.

How to Convert HTML to Word Using C#

Spire.Doc enables you to load HTML files or HTML strings and save them as Word documents. Let’s see how to implement these conversions.

Convert HTML String to Word

To convert an HTML string to Word format, follow these steps:

  • Create a Document Object: Instantiate a new Document object.
  • Add a Section and Paragraph: Create a section in the document and add a paragraph.
  • Append HTML String: Use the Paragraph.AppendHTML() method to include the HTML content.
  • Save the Document: Save the document using Document.SaveToFile() with the desired format (e.g., Docx).

Example Code

using Spire.Doc;
using Spire.Doc.Documents;
using System.IO;

namespace ConvertHtmlStringToWord
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document document = new Document();

            // Add a section to the document
            Section section = document.AddSection();

            // Set the page margins
            section.PageSetup.Margins.All = 2;

            // Add a paragraph to the section
            Paragraph paragraph = section.AddParagraph();

            // Read HTML string from a file
            string htmlFilePath = @"C:\Users\Administrator\Desktop\Html.html";
            string htmlString = File.ReadAllText(htmlFilePath, System.Text.Encoding.UTF8);

            // Append the HTML string to the paragraph
            paragraph.AppendHTML(htmlString);

            // Save the document to a Word file
            document.SaveToFile("AddHtmlStringToWord.docx", FileFormat.Docx);

            // Dispose resources
            document.Dispose();
        }
    }
}
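The example above reads the HTML from a file before passing it to AppendHTML(). If the HTML already exists in memory (for example, markup generated by your application), you can pass a string directly. A minimal sketch, reusing only the Spire.Doc calls shown above; the HTML content and output file name are placeholders:

```csharp
using Spire.Doc;
using Spire.Doc.Documents;

namespace ConvertInlineHtmlStringToWord
{
    class Program
    {
        static void Main(string[] args)
        {
            // Placeholder HTML built in code instead of read from disk
            string htmlString =
                "<h1>Quarterly Report</h1>" +
                "<p>This paragraph contains <b>bold</b> and <i>italic</i> text.</p>" +
                "<ul><li>First item</li><li>Second item</li></ul>";

            // Create a document with one section and one paragraph
            Document document = new Document();
            Section section = document.AddSection();
            Paragraph paragraph = section.AddParagraph();

            // Append the HTML string; Spire.Doc converts it into Word content
            paragraph.AppendHTML(htmlString);

            // Save as .docx, then release resources
            document.SaveToFile("InlineHtmlToWord.docx", FileFormat.Docx);
            document.Dispose();
        }
    }
}
```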


Convert HTML File to Word

If you have existing HTML files, converting them to Word is straightforward. Here’s how to do that:

  • Create a Document Object: Instantiate a new Document object.
  • Load the HTML File: Use Document.LoadFromFile() to load the HTML file.
  • Save as Word Format: Save the document using Document.SaveToFile() with the desired format (e.g., Docx).

Example Code

using Spire.Doc;

namespace ConvertHtmlToWord
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document document = new Document();
            // Load the HTML file
            document.LoadFromFile(@"C:\Users\Administrator\Desktop\MyHtml.html", FileFormat.Html);

            // Save the file as a Word document
            document.SaveToFile("HtmlToWord.docx", FileFormat.Docx);

            // Dispose resources
            document.Dispose();
        }
    }
}


How to Convert Word to HTML Using C#

Spire.Doc also supports exporting Word documents (such as .docx and .doc) to HTML format. You can perform basic conversion with default behavior, or customize the output using advanced settings.

Basic Word to HTML Conversion

To convert a Word document to an HTML file using default settings, follow these steps:

  • Create a Document Object: Instantiate a new Document object.
  • Load the Word Document: Use Document.LoadFromFile() to load the Word document.
  • Save as HTML File: Save the document using Document.SaveToFile() with HTML as the format.

Example Code

using Spire.Doc;

namespace BasicWordToHtmlConversion
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document document = new Document();
            // Load the Word document
            document.LoadFromFile("input.docx");

            // Save the document as an HTML file
            document.SaveToFile("BasicWordToHtmlConversion.html", FileFormat.Html);

            // Dispose resources
            document.Dispose();
        }
    }
}

Advanced Word to HTML Conversion Settings

To tailor the conversion process, use the HtmlExportOptions class, which allows you to adjust a variety of settings, including:

  • Whether to export the document's styles.
  • Whether to embed images in the converted HTML.
  • Which type of CSS style sheet to use (for example, an internal style sheet).
  • Whether to export headers and footers.
  • Whether to export text input form fields as plain text.

Follow these steps to convert a Word document to HTML with customized options:

  • Create a Document Object: Instantiate a new Document object.
  • Load the Word Document: Use Document.LoadFromFile() to load the Word document.
  • Get HtmlExportOptions: Access the HtmlExportOptions through Document.HtmlExportOptions.
  • Customize Conversion Settings: Modify the properties of HtmlExportOptions to customize the conversion.
  • Save as HTML File: Save the document using Document.SaveToFile() with HTML as the format.

Example Code

using Spire.Doc;

namespace AdvancedWordToHtmlConversion
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object
            Document doc = new Document();

            // Load a Word document
            doc.LoadFromFile("sample.docx");

            // Get the HTML export options of the document
            HtmlExportOptions htmlExportOptions = doc.HtmlExportOptions;
            // Export the document's styles
            htmlExportOptions.IsExportDocumentStyles = true;
            // Embed images in the HTML
            htmlExportOptions.ImageEmbedded = true;
            // Use an internal style sheet (CSS embedded in the HTML file)
            htmlExportOptions.CssStyleSheetType = CssStyleSheetType.Internal;
            // Export headers and footers
            htmlExportOptions.HasHeadersFooters = true;
            // Do not export text input form fields as plain text
            htmlExportOptions.IsTextInputFormFieldAsText = false;

            // Save the document as an HTML file
            doc.SaveToFile("AdvancedWordToHtmlConversion.html", FileFormat.Html);

            // Dispose resources
            doc.Dispose();
        }
    }
}

Conclusion

With Spire.Doc, converting between HTML and Word in C# takes only a few calls: load an HTML file with Document.LoadFromFile() or append an HTML string with Paragraph.AppendHTML(), then save with Document.SaveToFile() in the target format. For Word-to-HTML output, HtmlExportOptions gives you control over styles, images, CSS, headers and footers, and form fields. Following the steps in this tutorial, you can add these conversions to your own .NET applications.

FAQs

Q1: Is it possible to batch convert multiple Word files to HTML using C#?

A1: Yes, you can loop through a list of Word files and apply the conversion logic in your C# code.
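A minimal sketch of that loop, reusing the LoadFromFile() and SaveToFile() calls from the basic Word-to-HTML example. The folder names `InputDocs` and `HtmlOutput` and the helper `GetHtmlPath` are hypothetical; each .doc or .docx file is written as an .html file with the same base name.

```csharp
using System;
using System.IO;
using System.Linq;
using Spire.Doc;

namespace BatchWordToHtml
{
    class Program
    {
        // Hypothetical helper: builds "<outputDir>/<name>.html" for a Word file path
        internal static string GetHtmlPath(string wordPath, string outputDir) =>
            Path.Combine(outputDir, Path.GetFileNameWithoutExtension(wordPath) + ".html");

        static void Main(string[] args)
        {
            // Hypothetical folders; adjust to your environment
            string inputDir = "InputDocs";
            string outputDir = "HtmlOutput";
            Directory.CreateDirectory(outputDir);

            // Select .doc and .docx files by comparing extensions explicitly
            var wordFiles = Directory.EnumerateFiles(inputDir)
                .Where(f => f.EndsWith(".docx", StringComparison.OrdinalIgnoreCase)
                         || f.EndsWith(".doc", StringComparison.OrdinalIgnoreCase));

            foreach (string wordPath in wordFiles)
            {
                // A fresh Document per file keeps each conversion independent
                Document document = new Document();
                try
                {
                    document.LoadFromFile(wordPath);
                    document.SaveToFile(GetHtmlPath(wordPath, outputDir), FileFormat.Html);
                    Console.WriteLine($"Converted: {wordPath}");
                }
                catch (Exception ex)
                {
                    // Log and continue so one bad file does not stop the batch
                    Console.WriteLine($"Failed: {wordPath} ({ex.Message})");
                }
                finally
                {
                    document.Dispose();
                }
            }
        }
    }
}
```

Note that files sharing a base name (for example, `report.doc` and `report.docx`) map to the same output file, so the later one overwrites the earlier. To customize the output, set the same HtmlExportOptions properties as in the advanced example before each SaveToFile() call.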

Q2: What types of HTML elements are supported during conversion to Word?

A2: Spire.Doc supports a wide range of HTML elements, including text, tables, images, and lists. However, elements that Microsoft Word itself cannot represent may not be rendered correctly after conversion.

Q3: Can I convert formats other than HTML and Word?

A3: Yes. Spire.Doc supports many other conversions, such as Word to PDF, Markdown to Word, Word to Markdown, RTF to Word, and RTF to PDF.
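These conversions follow the same load-then-save pattern used throughout this article; only the FileFormat value changes. A minimal sketch that converts an RTF file to both Word and PDF, assuming an input file named `input.rtf` (a placeholder). Markdown conversions use a different FileFormat value; check the FileFormat enumeration in your installed Spire.Doc version.

```csharp
using Spire.Doc;

namespace ConvertRtfToWordAndPdf
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create a Document object and load an RTF file (placeholder name)
            Document document = new Document();
            document.LoadFromFile("input.rtf", FileFormat.Rtf);

            // Save the content as a Word document
            document.SaveToFile("RtfToWord.docx", FileFormat.Docx);

            // Save the same content as a PDF
            document.SaveToFile("RtfToPdf.pdf", FileFormat.PDF);

            // Dispose resources
            document.Dispose();
        }
    }
}
```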

Q4: Is Spire.Doc free to use?

A4: Spire.Doc offers a free version for lightweight use, but for extensive features and commercial use, a licensed version is recommended.

Get a Free License

To fully experience the capabilities of Spire.Doc for .NET without any evaluation limitations, you can request a free 30-day trial license.