Java: Extract Images from PDF Documents

Extracting images from PDF documents is useful whenever visuals embedded in a PDF need to be reused: graphic designers sourcing artwork, content creators repurposing images for blogs or social media, and data analysts pulling specific graphics into reports. Retrieving the images programmatically saves the time and effort of capturing each one by hand.

In this article, you will learn how to extract images from an individual PDF page as well as from an entire PDF document, using Spire.PDF for Java.

Extract Images from a Specific PDF Page
Extract Images from an Entire PDF Document

Install Spire.PDF for Java

First, add the Spire.Pdf.jar file as a dependency in your Java program. The JAR file can be downloaded from the E-iceblue website. If you use Maven, you can instead import it by adding the following configuration to your project's pom.xml file.

XML

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>11.5.2</version>
    </dependency>
</dependencies>

Extract Images from a Specific PDF Page in Java

The PdfImageHelper class in Spire.PDF for Java is designed to facilitate image management within PDF documents. It enables users to perform several operations, such as deleting, replacing, and retrieving images.

To get information about the images on a specific PDF page, developers can use the PdfImageHelper.getImagesInfo(PdfPageBase page) method. Once they have this information, they can export the image data in widely used formats such as PNG and JPEG.
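Saving as JPEG instead of PNG needs one extra step in practice. The sketch below uses only the standard javax.imageio API and assumes the extracted image is a BufferedImage, as in the examples in this article. In recent JDKs the bundled JPEG writer does not encode images that carry an alpha channel, and ImageIO.write() signals a missing writer by returning false rather than throwing. The class name ImageExportSketch and the helpers toOpaqueRgb and save are hypothetical names introduced for illustration; they are not part of Spire.PDF.

```java
import javax.imageio.ImageIO;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

class ImageExportSketch {

    // Hypothetical helper: copies an image onto an opaque RGB canvas so it can be encoded as JPEG
    static BufferedImage toOpaqueRgb(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // Fill with white first so transparent areas do not turn black
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    // Hypothetical helper: writes the image in the given format, for example "png" or "jpg"
    static void save(BufferedImage image, String format, File target) throws IOException {
        // Create the output folder if it does not exist yet
        File parent = target.getAbsoluteFile().getParentFile();
        if (parent != null) {
            parent.mkdirs();
        }

        boolean jpeg = format.equalsIgnoreCase("jpg") || format.equalsIgnoreCase("jpeg");
        BufferedImage toWrite = jpeg ? toOpaqueRgb(image) : image;

        // ImageIO.write() returns false when no suitable writer is found
        if (!ImageIO.write(toWrite, format, target)) {
            throw new IOException("No ImageIO writer available for format: " + format);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the BufferedImage returned by PdfImageInfo.getImage()
        BufferedImage sample = new BufferedImage(32, 32, BufferedImage.TYPE_INT_ARGB);

        File dir = new File(System.getProperty("java.io.tmpdir"), "extracted-sketch");
        save(sample, "png", new File(dir, "Image-0.png"));
        save(sample, "jpg", new File(dir, "Image-0.jpg"));
    }
}
```

In the extraction loops shown in this article, a call such as save(image, "jpg", file) could stand in for the direct ImageIO.write() call when JPEG output is wanted.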

The steps to extract images from a specific PDF page using Java are as follows:

1. Create a PdfDocument object.
2. Load a PDF file using the PdfDocument.loadFromFile() method.
3. Get a specific page using the PdfDocument.getPages().get(index) method.
4. Create a PdfImageHelper object.
5. Get the image information collection from the page using the PdfImageHelper.getImagesInfo() method.
6. Iterate through the image information collection.
   - Get a specific piece of image information.
   - Get the image data from the image information using the PdfImageInfo.getImage() method.
   - Write the image data as a PNG file using the ImageIO.write() method.

The following code demonstrates how to extract images from a particular page in a PDF document and save them in a specified folder.

Java

import com.spire.pdf.*;
import com.spire.pdf.utilities.*;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class ExtractImagesFromPage {

    public static void main(String[] args) throws IOException {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load a PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

        // Get a specific page
        PdfPageBase page = doc.getPages().get(0);

        // Create a PdfImageHelper object
        PdfImageHelper imageHelper = new PdfImageHelper();

        // Get all image information from the page
        PdfImageInfo[] imageInfos = imageHelper.getImagesInfo(page);

        // Iterate through the image information
        for (int i = 0; i < imageInfos.length; i++)
        {
            // Get a specific piece of image information
            PdfImageInfo imageInfo = imageInfos[i];

            // Get the image
            BufferedImage image = imageInfo.getImage();

            // Build the output path and make sure the target folder exists
            File file = new File(String.format("C:\\Users\\Administrator\\Desktop\\Extracted\\Image-%d.png", i));
            file.getParentFile().mkdirs();

            // Save the image file in PNG format
            ImageIO.write(image, "PNG", file);
        }

        // Dispose resources
        doc.dispose();
    }
}

Figure: Images extracted from a specific page of a PDF document and saved in a folder

Extract Images from an Entire PDF Document in Java

From the example above, you learned how to extract images from a specific page. By iterating through each page in the document and performing image extraction on every one, you can easily gather all images from the entire document.

The steps to extract images from an entire PDF document using Java are as follows:

1. Create a PdfDocument object.
2. Load a PDF file using the PdfDocument.loadFromFile() method.
3. Create a PdfImageHelper object.
4. Iterate through the pages in the document.
   - Get a specific page using the PdfDocument.getPages().get(index) method.
   - Get the image information collection from the page using the PdfImageHelper.getImagesInfo() method.
   - Iterate through the image information collection and save each image as a PNG file using the ImageIO.write() method.

The following code illustrates how to extract all images from a PDF document and save them in a specified folder.

Java

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.utilities.PdfImageHelper;
import com.spire.pdf.utilities.PdfImageInfo;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class ExtractAllImages {

    public static void main(String[] args) throws IOException {

        // Create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        // Load a PDF document
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\Input.pdf");

        // Create a PdfImageHelper object
        PdfImageHelper imageHelper = new PdfImageHelper();

        // Running counter used to number the output files across all pages
        int m = 0;

        // Iterate through the pages
        for (int i = 0; i < doc.getPages().getCount(); i++) {

            // Get a specific page
            PdfPageBase page = doc.getPages().get(i);

            // Get all image information from the page
            PdfImageInfo[] imageInfos = imageHelper.getImagesInfo(page);

            // Iterate through the image information
            for (int j = 0; j < imageInfos.length; j++)
            {
                // Get a specific piece of image information
                PdfImageInfo imageInfo = imageInfos[j];

                // Get the image
                BufferedImage image = imageInfo.getImage();

                // Build the output path and make sure the target folder exists
                File file = new File(String.format("C:\\Users\\Administrator\\Desktop\\Extracted\\Image-%d.png", m));
                file.getParentFile().mkdirs();
                m++;

                // Save the image file in PNG format
                ImageIO.write(image, "PNG", file);
            }
        }

        // Dispose resources
        doc.dispose();
    }
}

Figure: All images extracted from an entire PDF document and saved in a folder
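The whole-document example numbers its output files with a single running counter, so a file name does not reveal which page an image came from. As one possible alternative, the sketch below builds names from the page index and the image index within that page. It uses only the standard library; the class OutputNameSketch, the method outputPath, and the name pattern are assumptions made for illustration, not something prescribed by Spire.PDF.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

class OutputNameSketch {

    // Hypothetical helper: builds a file name such as "Page-0003-Image-02.png".
    // Both indices are zero-based on input and shown one-based in the name.
    static Path outputPath(Path folder, int pageIndex, int imageIndex) {
        String name = String.format(Locale.ROOT,
                "Page-%04d-Image-%02d.png", pageIndex + 1, imageIndex + 1);
        return folder.resolve(name);
    }

    public static void main(String[] args) {
        Path folder = Paths.get("Extracted");

        // First image on the first page
        System.out.println(outputPath(folder, 0, 0).getFileName());  // Page-0001-Image-01.png

        // Third image on the twelfth page
        System.out.println(outputPath(folder, 11, 2).getFileName()); // Page-0012-Image-03.png
    }
}
```

Inside the nested loop, outputPath(folder, i, j).toFile() would give the File to pass to ImageIO.write(). The zero-padded numbers keep the files in page order when the folder is sorted by name.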

Apply for a Temporary License

If you'd like to remove the evaluation message from the generated documents, or to lift the functional limitations, you can request a 30-day trial license from the vendor.
