With the document format conversion feature, you can convert various types of documents to target formats and save the conversion results to a specified OSS path.
Scenarios
Online preview optimization: After uploading PDF, Word, Excel, PPT, and other documents to OSS, you can call the document conversion interface to convert documents into images for direct preview on web or mobile devices without downloading.
Cross-platform compatibility: Through document conversion services, users with different devices and operating systems can view documents seamlessly.
Supported input file types
File type | File extension |
Word | doc, docx, wps, wpss, docm, dotm, dot, dotx, html |
PPT | pptx, ppt, pot, potx, pps, ppsx, dps, dpt, pptm, potm, ppsm, dpss |
Excel | xls, xlt, et, ett, xlsx, xltx, csv, xlsb, xlsm, xltm, ets |
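As a rough illustration of how the table above might be applied before a conversion request is submitted, the following Python sketch maps an object name to its document family. The names SUPPORTED_EXTENSIONS and document_family are hypothetical helpers, not part of any OSS SDK, and the case-insensitive match is an assumption. The extension sets mirror only the rows of the table.

```python
from pathlib import PurePosixPath

# Extension sets copied from the table above (hypothetical helper data).
SUPPORTED_EXTENSIONS = {
    "Word": {"doc", "docx", "wps", "wpss", "docm", "dotm", "dot", "dotx", "html"},
    "PPT": {"pptx", "ppt", "pot", "potx", "pps", "ppsx", "dps", "dpt",
            "pptm", "potm", "ppsm", "dpss"},
    "Excel": {"xls", "xlt", "et", "ett", "xlsx", "xltx", "csv", "xlsb",
              "xlsm", "xltm", "ets"},
}


def document_family(object_key):
    """Return "Word", "PPT", or "Excel" for a listed extension, else None."""
    # Assumption: extensions are compared case-insensitively.
    extension = PurePosixPath(object_key).suffix.lstrip(".").lower()
    for family, extensions in SUPPORTED_EXTENSIONS.items():
        if extension in extensions:
            return family
    return None


if __name__ == "__main__":
    for key in ("src.docx", "reports/q1.XLSX", "photo.jpg"):
        print(key, "->", document_family(key))
```

A check like this only filters by object name. It does not inspect the file content.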
How to use
Prerequisites
Create a bucket in OSS, upload the document to be converted to the bucket, and bind an Intelligent Media Management (IMM) Project to the bucket. The IMM Project must be in the same region as the bucket.
You must have the relevant permissions required for IMM processing.
Convert document format
You can use an OSS SDK to call the document conversion operation and save the converted files to a specified bucket. Only the OSS SDKs for Java, Python, and Go support document conversion.
Java
OSS SDK for Java V3.17.4 or later is required.
import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Demo1 {
    public static void main(String[] args) throws ClientException {
        // Specify the endpoint of the region in which the bucket is located.
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou.
        String region = "cn-hangzhou";
        // Obtain a credential from the environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the name of the bucket.
        String bucketName = "examplebucket";
        // Specify the name of the output object.
        String targetKey = "dest.png";
        // Specify the name of the source document.
        String sourceKey = "src.docx";

        // Create an OSSClient instance.
        // When the OSSClient instance is no longer in use, call the shutdown method to release resources.
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Store the document conversion parameters in a style string.
            String style = "doc/convert,target_png,source_docx";
            // Build the asynchronous processing instruction. The bucket name and the output object name are URL-safe Base64-encoded without padding.
            String bucketEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(bucketName.getBytes(StandardCharsets.UTF_8));
            String targetEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(targetKey.getBytes(StandardCharsets.UTF_8));
            String process = String.format("%s|sys/saveas,b_%s,o_%s", style, bucketEncoded, targetEncoded);
            // Create an AsyncProcessObjectRequest object.
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, sourceKey, process);
            // Execute the asynchronous processing task.
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());
        } finally {
            // Shut down the OSSClient instance.
            ossClient.shutdown();
        }
    }
}
Python
OSS SDK for Python 2.18.4 or later is required.
# -*- coding: utf-8 -*-
import base64

import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Obtain access credentials from the environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    # Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    # Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou.
    region = 'cn-hangzhou'
    # Specify the name of the bucket. Example: examplebucket.
    bucket_name = 'examplebucket'
    bucket = oss2.Bucket(auth, endpoint, bucket_name, region=region)
    # Specify the name of the source document.
    source_key = 'src.docx'
    # Specify the name of the output object.
    target_key = 'dest.png'
    # Store the document conversion parameters in a style string.
    style = 'doc/convert,target_png,source_docx'
    # Build the processing instruction. The bucket name and the output object name are URL-safe Base64-encoded without padding.
    bucket_name_encoded = base64.urlsafe_b64encode(bucket_name.encode()).decode().rstrip('=')
    target_key_encoded = base64.urlsafe_b64encode(target_key.encode()).decode().rstrip('=')
    process = f"{style}|sys/saveas,b_{bucket_name_encoded},o_{target_key_encoded}"
    try:
        # Execute the asynchronous processing task.
        result = bucket.async_process_object(source_key, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
Go
OSS SDK for Go 3.0.2 or later is required.
package main

import (
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Obtain access credentials from the environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Create an OSSClient instance.
	// Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. Specify your actual endpoint.
	// Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou.
	client, err := oss.New("https://oss-cn-hangzhou.aliyuncs.com", "", "", oss.SetCredentialsProvider(&provider), oss.AuthVersion(oss.AuthV4), oss.Region("cn-hangzhou"))
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the name of the bucket. Example: examplebucket.
	bucketName := "examplebucket"
	bucket, err := client.Bucket(bucketName)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the name of the source document.
	sourceKey := "src.docx"
	// Specify the name of the output object.
	targetKey := "dest.png"
	// Store the document conversion parameters in a style string.
	style := "doc/convert,target_png,source_docx"
	// Build the processing instruction, in which the bucket name and the output object name are URL-safe Base64-encoded.
	bucketNameEncoded := base64.URLEncoding.EncodeToString([]byte(bucketName))
	targetKeyEncoded := base64.URLEncoding.EncodeToString([]byte(targetKey))
	process := fmt.Sprintf("%s|sys/saveas,b_%v,o_%v", style, bucketNameEncoded, targetKeyEncoded)
	// Execute the asynchronous processing task.
	result, err := bucket.AsyncProcessObject(sourceKey, process)
	if err != nil {
		log.Fatalf("Failed to async process object: %s", err)
	}
	fmt.Printf("EventId: %s\n", result.EventId)
	fmt.Printf("RequestId: %s\n", result.RequestId)
	fmt.Printf("TaskId: %s\n", result.TaskId)
}
Parameter description
Action: doc/convert
The following table describes the parameters for document conversion.
Parameter name | Type | Required | Description |
target | string | Yes | The format of the output object, for example png (used in the samples in this topic). |
source | string | No | The source file format, for example docx (used in the samples in this topic). By default, the extension of the object name is used. |
pages | string | No | The page numbers to convert. |
You need to use the sys/saveas parameter to save the converted document in the specified bucket. For more information, see Save As. If you need to obtain the processing result of the conversion task, you need to use the notify parameter. For more information, see Notifications.
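The way these parameters combine into one processing instruction can be sketched in Python as follows. This is a minimal sketch, not an SDK API: b64url and build_process are hypothetical helper names. Stripping the Base64 padding follows the Java and Python samples above, and the /notify,topic_ suffix follows the request example later in this topic.

```python
import base64


def b64url(value):
    """URL-safe Base64 without trailing '=' padding, as in the SDK samples."""
    encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def build_process(target, source, bucket, object_key, topic=None):
    """Assemble a doc/convert instruction followed by sys/saveas (hypothetical helper)."""
    process = f"doc/convert,target_{target}"
    if source:
        # source is optional; when omitted, the object name extension is used.
        process += f",source_{source}"
    process += f"|sys/saveas,b_{b64url(bucket)},o_{b64url(object_key)}"
    if topic:
        # Assumption: the notify action is appended after a '/' separator,
        # as in the request example in this topic.
        process += f"/notify,topic_{b64url(topic)}"
    return process


if __name__ == "__main__":
    print(build_process("png", "docx", "examplebucket", "dest.png"))
```

The returned string can be passed as the process argument of bucket.async_process_object(source_key, process), as in the Python sample above.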
More scenarios
Document format conversion is submitted as an asynchronous request, so the response to the request does not contain the conversion result (for example, whether processing succeeded or failed). To obtain the result, we recommend that you configure event notifications with Simple Message Queue (SMQ, formerly MNS). You then receive a notification as soon as processing is complete and do not need to repeatedly query the task status.
Configure event notifications
Related APIs
If your business requires a high level of customization, you can directly call RESTful APIs. To directly call a RESTful API, you must include the signature calculation in your code. For information about how to calculate the Authorization header, see Signature Version 4 (Recommended).
Convert document format
Source object
Document format: DOCX
Document name: example.docx
Destination object
Object format: PNG
Storage path: oss://test-bucket/doc_images/{index}.png
b_dGVzdC1idWNrZXQ=: After the conversion is complete, the output is saved to the bucket named test-bucket (dGVzdC1idWNrZXQ= is the Base64-encoded value of test-bucket).
o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw==: The object name uses the {index} variable, so the images are saved to the doc_images directory with the page numbers of example.docx as file names (ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw== is the Base64-encoded value of doc_images/{index}.png).
Conversion completion notification: Sent to the Simple Message Queue (SMQ, formerly MNS) topic named test-topic.
Processing example
// Convert the example.docx file to PNG format image files.
POST /example.docx?x-oss-async-process HTTP/1.1
Host: doc-demo.oss-cn-hangzhou.aliyuncs.com
Date: Fri, 28 Oct 2022 06:40:10 GMT
Authorization: SignatureValue
x-oss-async-process=doc/convert,target_png,source_docx|sys/saveas,b_dGVzdC1idWNrZXQ=,o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw==/notify,topic_dGVzdC10b3BpYw
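To check what the encoded fields in the request above refer to, the values can be decoded locally. The following Python sketch is only an illustration: decode_fields is a hypothetical helper, and the regular expression assumes that each b_, o_, and topic_ field directly follows a comma, as it does in this example. Padding is normalized first because the topic value is sent without trailing = characters.

```python
import base64
import re

# The x-oss-async-process value from the request example above.
PROCESS = (
    "doc/convert,target_png,source_docx|sys/saveas,"
    "b_dGVzdC1idWNrZXQ=,o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw==/notify,"
    "topic_dGVzdC10b3BpYw"
)

FIELD_NAMES = {"b": "bucket", "o": "object", "topic": "topic"}


def b64url_decode(value):
    """Decode URL-safe Base64 whether or not '=' padding is present."""
    stripped = value.rstrip("=")
    padded = stripped + "=" * (-len(stripped) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def decode_fields(process):
    """Return the decoded b_, o_, and topic_ values found in an instruction."""
    pattern = r"(?<=,)(b|o|topic)_([A-Za-z0-9_-]+={0,2})"
    return {
        FIELD_NAMES[prefix]: b64url_decode(value)
        for prefix, value in re.findall(pattern, process)
    }


if __name__ == "__main__":
    for name, value in decode_fields(PROCESS).items():
        print(f"{name}: {value}")
```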
Notes
Document conversion supports only asynchronous processing (x-oss-async-process).
Anonymous access is not supported.
The maximum file size supported for document format conversion is 200 MB, which cannot be adjusted.
FAQ
Does OSS document conversion support converting only a specific sheet of an Excel file?
No, it does not. OSS document conversion always converts all sheets in an Excel file. If you need to convert a specific sheet, we recommend that you call the IMM CreateOfficeConversionTask operation and set the SheetIndex parameter.
References
For more information, see Document format conversion.