How to open HTML in ASP.NET Core Word and Extract Image from the URL?

You can convert HTML to a Word document and vice versa using the Syncfusion® ASP.NET Core Word library (Essential® DocIO) without Microsoft Word or interop dependencies.

When you convert HTML to a Word document using the .NET Core Word library, images referenced by URL in the input HTML (<img src="https://">) are not imported into the Word document. Essential® DocIO does not support downloading images from website URLs on the ASP.NET Core, Xamarin, and Blazor platforms. You can import these images by handling the ImageNodeVisited event in DocIO.

Get the image from the URL in the input HTML:

To import the images referred to as URLs in the input HTML, we suggest you download the image using the ImageNodeVisited event in DocIO.

The following code example shows how to hook the ImageNodeVisited event while converting HTML to a Word document.

C#

//Required namespaces: System.IO, Syncfusion.DocIO, Syncfusion.DocIO.DLS
//Opens the input HTML file as a stream
FileStream docStream = new FileStream("Input.html", FileMode.Open, FileAccess.Read);
//Creates a new instance of WordDocument
WordDocument document = new WordDocument();

//Hooks the ImageNodeVisited event to download the image from a website URL
document.HTMLImportSettings.ImageNodeVisited += DownloadImage;

//Opens the input HTML document
document.Open(docStream, FormatType.Html);

//Unhooks the ImageNodeVisited event after loading the HTML
document.HTMLImportSettings.ImageNodeVisited -= DownloadImage;

//Creates the output file stream
FileStream outputStream = new FileStream("HtmlToWord.docx", FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
//Saves the Word document
document.Save(outputStream, FormatType.Docx);
//Closes the Word document
document.Close();
//Flushes and disposes the output stream
outputStream.Flush();
outputStream.Dispose();
//Disposes the input stream
docStream.Dispose();

 

The following code example shows the event handler to download the image from a website URL.

C#

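//Required namespaces: System, System.IO, System.Net, Syncfusion.DocIO.DLS
/// <summary>
/// Event handler that downloads an image referenced by a website URL.
/// </summary>
private static void DownloadImage(object sender, ImageNodeVisitedEventArgs args)
{
    //Checks whether the image src is a website URL (http or https).
    if (args.Uri.StartsWith("https://", StringComparison.OrdinalIgnoreCase)
        || args.Uri.StartsWith("http://", StringComparison.OrdinalIgnoreCase))
    {
        //WebClient is marked obsolete in .NET 6 and later; HttpClient is the recommended replacement.
        using (WebClient client = new WebClient())
        {
            //Downloads the image as a byte array.
            byte[] image = client.DownloadData(args.Uri);
            //Wraps the bytes in a stream and sets it as the image for this node.
            //Do not dispose this stream here; DocIO disposes it internally.
            args.ImageStream = new MemoryStream(image);
        }
    }
}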

 

Note:

Hook the ImageNodeVisited event before opening the input HTML document and do not dispose of the image stream in the event handler. Otherwise, the image will not be preserved. Internally, DocIO will dispose of the image stream.
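As an alternative, the download step could be moved into a small helper built on HttpClient. The following is only a minimal sketch, not part of the DocIO API: the RemoteImageLoader class and its IsWebUrl and TryDownload methods are hypothetical names introduced for illustration. The helper accepts only absolute http or https URLs and returns null when the request fails. The assumption is that leaving args.ImageStream unset simply means the image is not imported.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper (not part of DocIO) that fetches an image for the event handler.
public static class RemoteImageLoader
{
    // One shared HttpClient is reused for every image in the document.
    private static readonly HttpClient Client = new HttpClient();

    // Returns true only for absolute http:// or https:// URLs.
    public static bool IsWebUrl(string src)
    {
        return Uri.TryCreate(src, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Downloads the image and returns it as an open stream, or null on failure.
    public static Stream TryDownload(string src)
    {
        if (!IsWebUrl(src))
            return null;
        try
        {
            // The event handler is synchronous, so this blocks on the async call.
            byte[] data = Client.GetByteArrayAsync(src).GetAwaiter().GetResult();
            // Deliberately left open: per the note above, DocIO disposes the stream.
            return new MemoryStream(data);
        }
        catch (HttpRequestException)
        {
            return null; // Network error or non-success status code.
        }
        catch (TaskCanceledException)
        {
            return null; // Request timed out.
        }
    }
}
```

Inside the DownloadImage handler, the body would then reduce to calling RemoteImageLoader.TryDownload(args.Uri) and assigning the result to args.ImageStream only when it is not null.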

Take a moment to peruse the documentation, where you can find more information about HTML to Word conversion and vice versa.

Explore more about the rich set of Syncfusion® Word Framework features.


Conclusion

I hope you enjoyed learning about how to open HTML in ASP.NET Core Word and extract images from the URL.

You can refer to our ASP.NET Core Word library feature tour page to learn about its other features, and to the documentation to get started quickly. You can also explore our .NET PDF example to understand how to create and manipulate data in the .NET File Format Libraries.

Current customers can check out our Document Processing libraries from the License and Downloads page. If you are new to Syncfusion®, you can try our 30-day free trial to check out our .NET Core File Format Libraries and other .NET Core controls.

If you have any queries or require clarifications, please let us know in the comments section below. You can also contact us through our support forums, Direct-Trac, or feedback portal. We are always happy to assist you!
