When working with Word documents, it is often useful to extract hyperlinks in bulk. Manually copying URLs out of technical documents or product manuals is slow and prone to omissions and errors. This article presents an automated C# approach that parses document elements to extract each hyperlink's anchor text, URL, and screen tip. The extracted data can then feed data analysis, search engine optimization, and other tasks. The following sections show how to extract hyperlinks from a Word document in a .NET program using Spire.Doc for .NET.
Install Spire.Doc for .NET
To begin, add the DLL files included in the Spire.Doc for .NET package as references in your .NET project. The DLL files can be downloaded directly or installed via NuGet:
PM> Install-Package Spire.Doc
Extracting All Hyperlinks from a Word Document Using C#
In a Word document, hyperlinks are stored as fields. To extract them, the first step is to identify all field objects by checking whether each document object is an instance of the Field class. Then, by checking whether the field object's Type property equals FieldType.FieldHyperlink, we can extract all hyperlink fields.
Once the hyperlinks are identified, we can use the Field.FieldText property to retrieve the hyperlink anchor text and the Field.GetFieldCode() method to obtain the full field code in the following format:
| Hyperlink Type | Field Code Example |
| --- | --- |
| Standard Hyperlink | HYPERLINK "https://www.example.com/example" |
| Hyperlink with ScreenTip | HYPERLINK "https://www.example.com/example" \o "ScreenTip" |
By parsing the field code, we can extract both the hyperlink URL and the screen tip text, enabling complete retrieval of hyperlink information.
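The parsing step can be sketched on its own, without any document library. The following is a minimal sketch, assuming the field code follows the formats in the table; the class name HyperlinkFieldCode and its Parse method are hypothetical helpers introduced here for illustration and are not part of Spire.Doc. It uses regular expressions rather than fixed split positions so that any other switch appearing before \o does not shift the result.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Spire.Doc): parses a HYPERLINK field code string.
public static class HyperlinkFieldCode
{
    // Matches the first quoted argument directly after the HYPERLINK keyword.
    private static readonly Regex UrlPattern =
        new Regex(@"HYPERLINK\s+""([^""]*)""", RegexOptions.IgnoreCase);

    // Matches the quoted argument that follows the \o (screen tip) switch.
    private static readonly Regex ScreenTipPattern =
        new Regex(@"\\o\s+""([^""]*)""", RegexOptions.IgnoreCase);

    // Returns the URL and screen tip; each is an empty string when not present.
    public static (string Url, string ScreenTip) Parse(string fieldCode)
    {
        if (string.IsNullOrWhiteSpace(fieldCode))
        {
            return (string.Empty, string.Empty);
        }

        Match url = UrlPattern.Match(fieldCode);
        Match tip = ScreenTipPattern.Match(fieldCode);

        return (url.Success ? url.Groups[1].Value : string.Empty,
                tip.Success ? tip.Groups[1].Value : string.Empty);
    }
}
```

Calling HyperlinkFieldCode.Parse with the second field code from the table should yield the example URL as Url and "ScreenTip" as ScreenTip. A field code without the \o switch yields an empty ScreenTip.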
The overall extraction steps are as follows:
- Create a Document object and use the Document.LoadFromFile() method to load the target Word document.
- Iterate through all sections in the document using foreach (Section section in doc.Sections) to retrieve each section object.
- For each section, iterate through its child objects using foreach (DocumentObject secObj in section.Body.ChildObjects) to access individual elements.
- If a child object is of type Paragraph:
  - Iterate through the child objects within the paragraph using foreach (DocumentObject paraObj in paragraph.ChildObjects).
  - If a paragraph child object is of type Field and its Field.Type property value equals FieldType.FieldHyperlink, process the Field object.
- For each Field object:
  - Extract the anchor text using the Field.FieldText property.
  - Retrieve the field code string using the Field.GetFieldCode() method.
- Process the field code string:
  - Extract the URL enclosed in quotation marks after "HYPERLINK".
  - Check if the field code contains the \o parameter; if present, extract the screen tip text enclosed in double quotes.
- Store the extracted hyperlinks and write them to an output file.
```csharp
using System.Collections.Generic;
using System.IO;
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;

namespace ExtractWordHyperlink
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create an instance of Document
            Document doc = new Document();

            // Load a Word document
            doc.LoadFromFile("Sample.docx");

            // Create a string list to store the hyperlink information
            List<string> hyperlinkInfoList = new List<string>();

            // Iterate through the sections in the document
            foreach (Section section in doc.Sections)
            {
                // Iterate through the child objects in the section
                foreach (DocumentObject secObj in section.Body.ChildObjects)
                {
                    // Check if the current document object is a Paragraph instance
                    if (secObj is Paragraph paragraph)
                    {
                        // Iterate through the child objects in the paragraph
                        foreach (DocumentObject paraObj in paragraph.ChildObjects)
                        {
                            // Check if the current child object is a hyperlink field
                            if (paraObj is Field field && field.Type == FieldType.FieldHyperlink)
                            {
                                // Get the anchor text and the field code
                                string anchorText = field.FieldText;
                                string fieldCode = field.GetFieldCode();

                                // Get the URL: the first quoted value in the field code
                                string[] parts = fieldCode.Split('"');
                                string url = parts.Length > 1 ? parts[1] : "";
                                string hyperlinkInfo = $"Anchor Text: {anchorText}\nURL: {url}";

                                // Check if there is a ScreenTip: the quoted value after the \o switch
                                int tipSwitch = fieldCode.IndexOf(" \\o");
                                if (tipSwitch >= 0)
                                {
                                    string[] tipParts = fieldCode.Substring(tipSwitch).Split('"');
                                    if (tipParts.Length > 1)
                                    {
                                        hyperlinkInfo += $"\nScreenTip: {tipParts[1].Trim()}";
                                    }
                                }
                                hyperlinkInfo += "\n";

                                // Append the hyperlink information to the list
                                hyperlinkInfoList.Add(hyperlinkInfo);
                            }
                        }
                    }
                }
            }

            // Make sure the output folder exists, then write the results to a text file
            Directory.CreateDirectory("output");
            File.WriteAllLines("output/ExtractedHyperlinks.txt", hyperlinkInfoList);
            doc.Close();
        }
    }
}
```
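The example above stores each hyperlink as a block of plain text. If the results are meant for data analysis, a CSV file may be easier to load into other tools. The following is a minimal sketch of that alternative, assuming each hyperlink has already been reduced to three strings; the class HyperlinkCsv, its methods, and the column names are hypothetical and introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns extracted hyperlink data into CSV lines.
public static class HyperlinkCsv
{
    // Quotes a value for CSV: wraps it in double quotes and doubles embedded quotes.
    public static string Escape(string value)
    {
        return "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";
    }

    // Builds a header row followed by one CSV line per hyperlink.
    public static List<string> ToCsvLines(
        IEnumerable<(string AnchorText, string Url, string ScreenTip)> links)
    {
        var lines = new List<string> { "AnchorText,Url,ScreenTip" };
        lines.AddRange(links.Select(link =>
            string.Join(",", Escape(link.AnchorText), Escape(link.Url), Escape(link.ScreenTip))));
        return lines;
    }
}
```

The returned list can be passed to File.WriteAllLines in the same way as the plain-text list. Quoting every value keeps anchor text or screen tips that contain commas or quotation marks from breaking the column layout.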
Apply for a Temporary License
If you would like to remove the evaluation message from the generated documents or lift the function limitations, please request a 30-day trial license.