Categories:

String & binary functions (Large Language Model)

AI_CLASSIFY

Note

When AI_CLASSIFY becomes Generally Available, it will replace CLASSIFY_TEXT (SNOWFLAKE.CORTEX). Use AI_CLASSIFY in preview to try out the latest functionality. Snowflake does not recommend using preview functions with production workloads.

Classifies text or images into categories that you specify.

Region availability

The following table shows the regions where you can use the AI_CLASSIFY function for both text and images:

Data type
AWS US West 2
(Oregon)
AWS US East 1
(N. Virginia)
AWS Europe Central 1
(Frankfurt)
AWS Europe West 1
(Ireland)
AWS AP Southeast 2
(Sydney)
AWS AP Northeast 1
(Tokyo)
Azure East US 2
(Virginia)
Azure West Europe
(Netherlands)
AWS
(Cross-Region)
TEXT

IMAGE

Syntax

AI_CLASSIFY( <input> , <list_of_categories> [, <config_object>] )
Copy

Arguments

Required:

input

The string, image, or prompt object that you’re classifying.

For text classification, the input string is case sensitive. Results may vary based on capitalization.

list_of_categories

An array of categories with at least one and at most 500 unique values. Categories are case sensitive.

Categories can be simple strings or SQL objects of the same type. If you’re using objects, you can provide a description for one or more categories to improve classification accuracy.

For each category, specify the following:

  • label (Required): The name of the category.

  • description (Optional): Describes the category in no more than 25 words.

Note

Descriptions count as input tokens, which increases the cost of the classification operation. For more information, see Cost considerations.

Optional:

config_object

Configuration settings specified as key/value pairs. Supported keys:

  • task_description: A explanation of the classification task that is 50 words or fewer. This can help the model understand the context of the classification task and improve accuracy.

  • output_mode: Set to 'multi' for multi-label classification. Defaults to 'single' for single-label classification.

  • examples: A list of example objects for few-shot learning. Each example must include:

    • input: Example text to classify.

    • labels: List of correct categories for the input.

    • explanation: Explanation of why the input maps to those categories.

Returns

A serialized object. The object’s labels field is an array that specifies the list of categories to which the input belongs.

For single label classification, the labels array has exactly one element. For multi-label classification, the labels field can have multiple elements.

If you specify invalid values for the arguments, the function returns an error. For a list of possible errors, see Error conditions.

Access control requirements

Users must use a role that has the SNOWFLAKE.CORTEX_USER database role. For more information about this privilege, see Required privileges.

Usage notes

For best results, follow these guidelines:

  • Use plain text in English for the input and list_of_categories.

  • Avoid including code snippets, logs, or non-English text.

  • Avoid using code or formatting that is not open source (such as proprietary languages or formats) in the text. The underlying language model is not trained on proprietary formats.

  • Don’t use abbreviations, special characters, or jargon in the category labels.

  • Use descriptive categories. Avoid using category names such as “Xa4s3” or “category 1”.

  • Use mutually exclusive categories.

  • Providing a clear task description can improve accuracy when the relationship between the input and categories is unclear or complex.

  • Adding label descriptions can improve accuracy, especially when labels are ambiguous or require specific selection criteria. Write descriptions that clearly highlight what distinguishes each label from the others.

  • Each label, description, and example increases the number of input tokens for every AI_CLASSIFY call, which affects cost.

  • Examples can help to improve accuracy.

Examples

The following examples use the AI_CLASSIFY function with only the required arguments.

AI_CLASSIFY: Text

The following example classifies the prompt into one of two categories, travel or cooking:

SELECT AI_CLASSIFY('One day I will see the world', ['travel', 'cooking']),

'{
  "labels": ["travel"]
 }';
Copy

The following example uses multi-label classification:

SELECT AI_CLASSIFY('One day I will see the world and learn to cook my favorite dishes', ['travel', 'cooking', 'reading', 'driving'], {'output_mode': 'multi'}),

'{
  "labels": ["travel", "cooking"]
 }';
Copy

The following example passes in a task description, label descriptions, and few-shot examples:

SELECT AI_CLASSIFY(
  'One day I will see the world and learn to cook my favorite dishes',
  [
    {'label': 'travel', 'description': 'content related to traveling'},
    {'label': 'cooking'},
    {'label': 'reading'},
    {'label': 'driving'}
  ],
  {
    'task_description': 'Determine topics related to the given text',
    'output_mode': 'multi',
    'examples': [
      {
        'input': 'i love traveling with a good book',
        'labels': ['travel', 'reading'],
        'explanation': 'the text mentions traveling and a good book which relates to reading'
      }
    ]
  }),
  '{
  "labels": ["travel", "cooking"]
}';
Copy
'{
  "labels": ["travel", "cooking"]
 }'

The following example creates a text_classification_table that contains a column for text and a column for possible categories for that text. The AI_CLASSIFY function is called on each row of the table to classify the string in the text column.

CREATE OR REPLACE TEMPORARY TABLE text_classification_table AS
SELECT 'France' AS input, ['North America', 'Europe', 'Asia'] AS classes
UNION ALL
SELECT 'Singapore', ['North America', 'Europe', 'Asia']
UNION ALL
SELECT 'one day I will see the world', ['travel', 'cooking', 'dancing']
UNION ALL
SELECT 'my lobster bisque is second to none', ['travel', 'cooking', 'dancing'];

SELECT input,
    classes,
    PARSE_JSON(AI_CLASSIFY(input, classes)):labels::text AS classification
FROM text_classification_table;
Copy

AI_CLASSIFY: Images

Using single file input:

WITH food_pictures AS (
  SELECT
      TO_FILE(file_url) AS img
  FROM DIRECTORY(@file_stage)
)
SELECT
*,
PARSE_JSON(AI_CLASSIFY(img, ['dessert', 'drink', 'main dish', 'side dish'])):labels::array AS classification
FROM food_pictures;
Copy

Using a prompt object constructed by PROMPT():

  WITH food_pictures AS (
  SELECT
      TO_FILE(file_url) AS img
  FROM DIRECTORY(@file_stage)
)
SELECT
*,
PARSE_JSON(AI_CLASSIFY(PROMPT('Please help me classify the food within this image {0}', img),
  ['dessert', 'drink', 'main dish', 'side dish'])):labels::array AS classification
FROM food_pictures;
Copy

Limitations

  • Snowflake AI functions don’t support dynamic table incremental refresh.

  • Snowflake AI functions don’t work on FILE objects created from files in the following kinds of stages:

    • Internal stages with encryption mode TYPE = 'SNOWFLAKE_FULL'

    • External stages with any customer-side encrypted mode:

      • TYPE = 'AWS_CSE'

      • TYPE = 'AZURE_CSE'

    • User stage

    • Table stage

    • Stage with double-quoted names