Explain data frame analytics config
Generally available; Added in 7.3.0
This API provides explanations for a data frame analytics config that either already exists or has not been created yet. The following explanations are provided:
- which fields are included or not in the analysis and why,
- how much memory is estimated to be required. The estimate can be used when deciding the appropriate value for the model_memory_limit setting later on. Object fields and fields that are excluded via source filtering are not included in the explanation.
Required authorization
- Cluster privileges:
monitor_ml
Path parameters
-
Identifier for the data frame analytics job. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.
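The identifier rule above can be checked on the client before a request is sent. The following Python sketch encodes only the constraints stated here (allowed characters, alphanumeric first and last characters). The function name is_valid_job_id is hypothetical, and the server may enforce further limits, such as a maximum length, that this check does not cover.

```python
import re

# Lowercase letters, digits, hyphens, and underscores; the first and last
# characters must be alphanumeric. A single alphanumeric character is allowed.
_JOB_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the character rules described above."""
    return _JOB_ID_PATTERN.fullmatch(job_id) is not None


if __name__ == "__main__":
    for candidate in ("house-prices_2024", "_leading", "trailing-", "UpperCase"):
        print(candidate, is_valid_job_id(candidate))
```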
Body
-
A description of the job.
-
The approximate maximum amount of memory resources that are permitted for analytical processing. If your elasticsearch.yml file contains an xpack.ml.max_model_memory_limit setting, an error occurs when you try to create data frame analytics jobs that have model_memory_limit values greater than that setting.
Default value is 1gb.
-
The maximum number of threads to be used by the analysis. Using more threads may decrease the time necessary to complete the analysis at the cost of using more CPU. Note that the process may use additional threads for operational functionality other than the analysis itself.
Default value is 1.
-
Specifies whether this job can start when there is insufficient machine learning node capacity for it to be immediately assigned to a node.
Default value is false.
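The relationship between model_memory_limit and xpack.ml.max_model_memory_limit described above can be illustrated with a small client-side comparison. This is a sketch only: the helper names parse_byte_size and exceeds_max_limit are hypothetical, the parser assumes whole-number values with 1024-based units (b, kb, mb, gb, tb, pb), and the actual validation is performed by the server.

```python
import re

# Binary multipliers for byte-size units (assumption: 1kb = 1024 bytes).
_UNITS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024**2,
    "gb": 1024**3,
    "tb": 1024**4,
    "pb": 1024**5,
}


def parse_byte_size(value: str) -> int:
    """Convert a string such as '1gb' or '128MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]+)\s*", value)
    if match is None or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognized byte size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def exceeds_max_limit(model_memory_limit: str, max_model_memory_limit: str) -> bool:
    """Return True if the job's limit is greater than the configured maximum."""
    return parse_byte_size(model_memory_limit) > parse_byte_size(max_model_memory_limit)


if __name__ == "__main__":
    # A 2gb job limit against a 1gb cluster maximum would be rejected.
    print(exceeds_max_limit("2gb", "1gb"))
    print(exceeds_max_limit("512mb", "1gb"))
```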
POST _ml/data_frame/analytics/_explain
{
"source": {
"index": "houses_sold_last_10_yrs"
},
"analysis": {
"regression": {
"dependent_variable": "price"
}
}
}
resp = client.ml.explain_data_frame_analytics(
source={
"index": "houses_sold_last_10_yrs"
},
analysis={
"regression": {
"dependent_variable": "price"
}
},
)
const response = await client.ml.explainDataFrameAnalytics({
source: {
index: "houses_sold_last_10_yrs",
},
analysis: {
regression: {
dependent_variable: "price",
},
},
});
response = client.ml.explain_data_frame_analytics(
body: {
"source": {
"index": "houses_sold_last_10_yrs"
},
"analysis": {
"regression": {
"dependent_variable": "price"
}
}
}
)
$resp = $client->ml()->explainDataFrameAnalytics([
"body" => [
"source" => [
"index" => "houses_sold_last_10_yrs",
],
"analysis" => [
"regression" => [
"dependent_variable" => "price",
],
],
],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"source":{"index":"houses_sold_last_10_yrs"},"analysis":{"regression":{"dependent_variable":"price"}}}' "$ELASTICSEARCH_URL/_ml/data_frame/analytics/_explain"
client.ml().explainDataFrameAnalytics(e -> e
.analysis(a -> a
.regression(r -> r
.dependentVariable("price")
)
)
.source(s -> s
.index("houses_sold_last_10_yrs")
)
);
{
"source": {
"index": "houses_sold_last_10_yrs"
},
"analysis": {
"regression": {
"dependent_variable": "price"
}
}
}
{
"field_selection": [
{
"field": "number_of_bedrooms",
"mappings_types": [
"integer"
],
"is_included": true,
"is_required": false,
"feature_type": "numerical"
},
{
"field": "postcode",
"mappings_types": [
"text"
],
"is_included": false,
"is_required": false,
"reason": "[postcode.keyword] is preferred because it is aggregatable"
},
{
"field": "postcode.keyword",
"mappings_types": [
"keyword"
],
"is_included": true,
"is_required": false,
"feature_type": "categorical"
},
{
"field": "price",
"mappings_types": [
"float"
],
"is_included": true,
"is_required": true,
"feature_type": "numerical"
}
],
"memory_estimation": {
"expected_memory_without_disk": "128MB",
"expected_memory_with_disk": "32MB"
}
}
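A response like the one above can be condensed into the two explanations this API provides: which fields are included or excluded (and why), and the memory estimate. The following Python sketch works on the parsed response as a plain dictionary. The function name summarize_explanation and the shape of its return value are hypothetical, and the sample is a trimmed copy of the response shown above.

```python
def summarize_explanation(explanation: dict) -> dict:
    """Group fields by inclusion and keep the memory estimate."""
    included = []
    excluded = {}
    required = []
    for entry in explanation.get("field_selection", []):
        if entry["is_included"]:
            included.append(entry["field"])
        else:
            # Excluded fields carry a "reason" in the response shown above.
            excluded[entry["field"]] = entry.get("reason", "no reason given")
        if entry.get("is_required"):
            required.append(entry["field"])
    return {
        "included": included,
        "excluded": excluded,
        "required": required,
        "memory_estimation": explanation.get("memory_estimation", {}),
    }


# Trimmed copy of the example response.
sample = {
    "field_selection": [
        {
            "field": "postcode",
            "mappings_types": ["text"],
            "is_included": False,
            "is_required": False,
            "reason": "[postcode.keyword] is preferred because it is aggregatable",
        },
        {
            "field": "price",
            "mappings_types": ["float"],
            "is_included": True,
            "is_required": True,
            "feature_type": "numerical",
        },
    ],
    "memory_estimation": {
        "expected_memory_without_disk": "128MB",
        "expected_memory_with_disk": "32MB",
    },
}

if __name__ == "__main__":
    print(summarize_explanation(sample))
```

The expected_memory_without_disk value is the one to compare against a candidate model_memory_limit when deciding on that setting.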