Perform chat completion inference
Added in 8.18.0
Path parameters
- inference_id (string, required): The unique identifier of the inference endpoint.
Query parameters
- timeout (string): The amount of time to wait for the inference request to complete.
Body (required)
- messages (array[object], required): A list of objects representing the conversation.
- model (string): The ID of the model to use.
- max_completion_tokens (number): The upper bound on the number of tokens that can be generated for a completion request.
- stop (array[string]): A sequence of strings that tell the model when to stop generating additional tokens.
- temperature (number): The sampling temperature to use.
- tool_choice (string | object): Controls which tool, if any, the model should call.
- tools (array[object]): A list of tools that the model can call.
- top_p (number): Nucleus sampling, an alternative to sampling with temperature.
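As a sketch of how the body parameters above fit together, the helper below assembles a request payload, including only the optional fields that are actually supplied. The function name and the example message values are illustrative, not part of the API; only "messages" is required by the endpoint.

```python
import json

def build_chat_request(messages, model=None, max_completion_tokens=None,
                       stop=None, temperature=None, tool_choice=None,
                       tools=None, top_p=None):
    """Assemble a chat completion request body from the documented
    parameters, omitting any optional field that was not supplied."""
    body = {"messages": messages}  # "messages" is the only required body field
    optional = {
        "model": model,
        "max_completion_tokens": max_completion_tokens,
        "stop": stop,
        "temperature": temperature,
        "tool_choice": tool_choice,
        "tools": tools,
        "top_p": top_p,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body

# Illustrative values; any OpenAI-style role/content pair is accepted here.
payload = build_chat_request(
    messages=[{"role": "user", "content": "What is Elasticsearch?"}],
    temperature=0.7,
    max_completion_tokens=256,
)
print(json.dumps(payload))
```

Unset parameters are left out entirely rather than sent as null, so the service falls back to its own defaults.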
POST
/_inference/chat_completion/{inference_id}/_stream
curl \
  --request POST 'http://api.example.com/_inference/chat_completion/{inference_id}/_stream' \
  --header "Authorization: $API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      {
        "content": "string",
        "role": "string",
        "tool_call_id": "string",
        "tool_calls": [
          {
            "id": "string",
            "function": {
              "arguments": "string",
              "name": "string"
            },
            "type": "string"
          }
        ]
      }
    ],
    "model": "string",
    "max_completion_tokens": 42.0,
    "stop": ["string"],
    "temperature": 42.0,
    "tool_choice": "string",
    "tools": [
      {
        "type": "string",
        "function": {
          "description": "string",
          "name": "string",
          "parameters": {},
          "strict": true
        }
      }
    ],
    "top_p": 42.0
  }'
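Because the request above targets the _stream endpoint, the response arrives incrementally rather than as a single JSON document. The sketch below parses a stream of Server-Sent Events lines, assuming OpenAI-style "data: {...}" framing with a terminating "data: [DONE]" sentinel; the exact chunk schema depends on the backing service and Elasticsearch version, so treat the field names in the sample chunks as assumptions.

```python
import json

def parse_sse_stream(lines):
    """Extract JSON payloads from Server-Sent Events lines.
    Assumes 'data: {...}' events and a final 'data: [DONE]' sentinel."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and 'event:' fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        events.append(json.loads(data))
    return events

# Illustrative chunks; the real delta schema may differ by provider.
raw = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    '',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    '',
    'data: [DONE]',
]
chunks = parse_sse_stream(raw)
text = "".join(c["choices"][0]["delta"]["content"] for c in chunks)
print(text)  # → Hello
```

In practice the lines would come from an HTTP client reading the response body incrementally (e.g. iterating over lines of a chunked response) rather than from a list.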