# Prometheus

Coder exposes many metrics which can be consumed by a Prometheus server, giving insight into the current state of a live Coder deployment.

If you don't have a Prometheus server installed, you can follow the Prometheus [Getting started](https://prometheus.io/docs/prometheus/latest/getting_started/) guide.
## Enable Prometheus metrics
Coder server exports metrics via an HTTP endpoint, which can be enabled using either the environment variable `CODER_PROMETHEUS_ENABLE` or the flag `--prometheus-enable`.

The Prometheus endpoint address is `http://localhost:2112/` by default. You can use either the environment variable `CODER_PROMETHEUS_ADDRESS` or the flag `--prometheus-address <network-interface>:<port>` to select a different listen address.
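For example, both of the following start the server with the metrics endpoint enabled on a custom listen address (a command sketch; it assumes `coder` is installed and on your `PATH`, and `0.0.0.0:9200` is an arbitrary example address):

```shell
# Enable via environment variables:
export CODER_PROMETHEUS_ENABLE=true
export CODER_PROMETHEUS_ADDRESS=0.0.0.0:9200
coder server

# Or equivalently via flags:
coder server --prometheus-enable --prometheus-address 0.0.0.0:9200
```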
If `coder server --prometheus-enable` is started locally, you can preview the metrics endpoint in your browser or with `curl`:

```console
$ curl http://localhost:2112/
# HELP coderd_api_active_users_duration_hour The number of users that have been active within the last hour.
# TYPE coderd_api_active_users_duration_hour gauge
coderd_api_active_users_duration_hour 0
...
```
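The response is plain Prometheus exposition text, so ordinary shell tools work on it. A small sketch that extracts `name=value` pairs from a sample scrape (the `printf` line stands in for real output like the example above):

```shell
# Skip comment lines (# HELP / # TYPE) and print name=value pairs.
printf '# TYPE coderd_api_active_users_duration_hour gauge\ncoderd_api_active_users_duration_hour 0\n' |
  awk '$1 !~ /^#/ { print $1 "=" $2 }'
```

Against a live server, replace the `printf` with `curl -s http://localhost:2112/`.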
## Kubernetes deployment
The Prometheus endpoint can be enabled in the Helm chart's `values.yml` by setting `CODER_PROMETHEUS_ENABLE=true`. Once enabled, the environment variable `CODER_PROMETHEUS_ADDRESS` is set by default to `0.0.0.0:2112`. A Service Endpoint will not be exposed; if you need to expose the Prometheus port on a Service (for example, to use a ServiceMonitor), create a separate headless service instead:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: coder-prom
  namespace: coder
spec:
  clusterIP: None
  ports:
    - name: prom-http
      port: 2112
      protocol: TCP
      targetPort: 2112
  selector:
    app.kubernetes.io/instance: coder
    app.kubernetes.io/name: coder
  type: ClusterIP
```
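For reference, enabling the endpoint from the chart side can look like the following `values.yml` fragment (a sketch; the exact key path depends on your chart version, so check the chart's documented values before copying it):

```yaml
coder:
  env:
    - name: CODER_PROMETHEUS_ENABLE
      value: "true"
```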
## Prometheus configuration
To allow Prometheus to scrape the Coder metrics, you will need to create a `scrape_config` in your `prometheus.yml` file, or in the Prometheus Helm chart values. The following is an example `scrape_config`:

```yaml
scrape_configs:
  - job_name: "coder"
    scheme: "http"
    static_configs:
      # replace with the IP address of the Coder pod or server
      - targets: ["<ip>:2112"]
        labels:
          apps: "coder"
```
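If you have the Prometheus tooling installed, you can validate the file before reloading (a sketch; assumes `promtool` is on your `PATH`, and that the running server was started with `--web.enable-lifecycle` for the reload step):

```shell
# Validate the configuration file:
promtool check config prometheus.yml

# Then ask a running Prometheus server to reload it:
curl -X POST http://localhost:9090/-/reload
```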
To use the Kubernetes Prometheus operator to scrape metrics, you will need to create a `ServiceMonitor` in your Coder deployment namespace. The following is an example `ServiceMonitor`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: coder-service-monitor
  namespace: coder
spec:
  endpoints:
    - port: prom-http
      interval: 10s
      scrapeTimeout: 10s
  namespaceSelector:
    matchNames:
      - coder
  selector:
    matchLabels:
      app.kubernetes.io/name: coder
```
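After saving the manifest (the filename below is illustrative), apply it and confirm the operator picked it up (assumes `kubectl` is configured for the target cluster):

```shell
kubectl apply -f coder-service-monitor.yaml
kubectl -n coder get servicemonitor coder-service-monitor
```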
## Available metrics
You must first enable `coderd_agentstats_*` metrics with the flag `--prometheus-collect-agent-stats`, or the environment variable `CODER_PROMETHEUS_COLLECT_AGENT_STATS`, before they can be retrieved from the deployment. They will always be available from the agent.
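For example, a server that exposes both the standard metrics and the agent stats can be started like this (a command sketch; assumes `coder` is on your `PATH`):

```shell
coder server --prometheus-enable --prometheus-collect-agent-stats
```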
| Name | Type | Description | Labels |
|---|---|---|---|
agent_scripts_executed_total | counter | Total number of scripts executed by the Coder agent. Includes cron scheduled scripts. | agent_name success template_name username workspace_name |
coder_aibridged_circuit_breaker_rejects_total | counter | Total number of requests rejected due to open circuit breaker. | endpoint model provider |
coder_aibridged_circuit_breaker_state | gauge | Current state of the circuit breaker (0=closed, 0.5=half-open, 1=open). | endpoint model provider |
coder_aibridged_circuit_breaker_trips_total | counter | Total number of times the circuit breaker transitioned to open state. | endpoint model provider |
coder_aibridged_injected_tool_invocations_total | counter | The number of times an injected MCP tool was invoked by aibridge. | model name provider server |
coder_aibridged_interceptions_duration_seconds | histogram | The total duration of intercepted requests, in seconds. The majority of this time will be the upstream processing of the request. aibridge has no control over upstream processing time, so it's just an illustrative metric. | model provider |
coder_aibridged_interceptions_inflight | gauge | The number of intercepted requests which are being processed. | model provider route |
coder_aibridged_interceptions_total | counter | The count of intercepted requests. | initiator_id method model provider route status |
coder_aibridged_non_injected_tool_selections_total | counter | The number of times an AI model selected a tool to be invoked by the client. | model name provider |
coder_aibridged_passthrough_total | counter | The count of requests which were not intercepted but passed through to the upstream. | method provider route |
coder_aibridged_prompts_total | counter | The number of prompts issued by users (initiators). | initiator_id model provider |
coder_aibridged_tokens_total | counter | The number of tokens used by intercepted requests. | initiator_id model provider type |
coder_aibridgeproxyd_connect_sessions_total | counter | Total number of CONNECT sessions established. | type |
coder_aibridgeproxyd_inflight_mitm_requests | gauge | Number of MITM requests currently being processed. | provider |
coder_aibridgeproxyd_mitm_requests_total | counter | Total number of MITM requests handled by the proxy. | provider |
coder_aibridgeproxyd_mitm_responses_total | counter | Total number of MITM responses by HTTP status code class. | code provider |
coder_pubsub_connected | gauge | Whether we are connected (1) or not connected (0) to postgres | |
coder_pubsub_current_events | gauge | The current number of pubsub event channels listened for | |
coder_pubsub_current_subscribers | gauge | The current number of active pubsub subscribers | |
coder_pubsub_disconnections_total | counter | Total number of times we disconnected unexpectedly from postgres | |
coder_pubsub_latency_measure_errs_total | counter | The number of pubsub latency measurement failures | |
coder_pubsub_latency_measures_total | counter | The number of pubsub latency measurements | |
coder_pubsub_messages_total | counter | Total number of messages received from postgres | size |
coder_pubsub_published_bytes_total | counter | Total number of bytes successfully published across all publishes | |
coder_pubsub_publishes_total | counter | Total number of calls to Publish | success |
coder_pubsub_receive_latency_seconds | gauge | The time taken to receive a message from a pubsub event channel | |
coder_pubsub_received_bytes_total | counter | Total number of bytes received across all messages | |
coder_pubsub_send_latency_seconds | gauge | The time taken to send a message into a pubsub event channel | |
coder_pubsub_subscribes_total | counter | Total number of calls to Subscribe/SubscribeWithErr | success |
coder_servertailnet_connections_total | counter | Total number of TCP connections made to workspace agents. | network |
coder_servertailnet_open_connections | gauge | Total number of TCP connections currently open to workspace agents. | network |
coderd_agentapi_metadata_batch_size | histogram | Total number of metadata entries in each batch, updated before flushes. | |
coderd_agentapi_metadata_batch_utilization | histogram | Number of metadata keys per agent in each batch, updated before flushes. | |
coderd_agentapi_metadata_batches_total | counter | Total number of metadata batches flushed. | reason |
coderd_agentapi_metadata_dropped_keys_total | counter | Total number of metadata keys dropped due to capacity limits. | |
coderd_agentapi_metadata_flush_duration_seconds | histogram | Time taken to flush metadata batch to database and pubsub. | reason |
coderd_agentapi_metadata_flushed_total | counter | Total number of unique metadatas flushed. | |
coderd_agentapi_metadata_publish_errors_total | counter | Total number of metadata batch pubsub publish calls that have resulted in an error. | |
coderd_agents_apps | gauge | Agent applications with statuses. | agent_name app_name health username workspace_name |
coderd_agents_connection_latencies_seconds | gauge | Agent connection latencies in seconds. | agent_name derp_region preferred username workspace_name |
coderd_agents_connections | gauge | Agent connections with statuses. | agent_name lifecycle_state status tailnet_node username workspace_name |
coderd_agents_up | gauge | The number of active agents per workspace. | template_name template_version username workspace_name |
coderd_agentstats_connection_count | gauge | The number of established connections by agent | agent_name username workspace_name |
coderd_agentstats_connection_median_latency_seconds | gauge | The median agent connection latency | agent_name username workspace_name |
coderd_agentstats_currently_reachable_peers | gauge | The number of peers (e.g. clients) that are currently reachable over the encrypted network. | agent_name connection_type template_name username workspace_name |
coderd_agentstats_rx_bytes | gauge | Agent Rx bytes | agent_name username workspace_name |
coderd_agentstats_session_count_jetbrains | gauge | The number of sessions established by JetBrains | agent_name username workspace_name |
coderd_agentstats_session_count_reconnecting_pty | gauge | The number of sessions established by reconnecting PTY | agent_name username workspace_name |
coderd_agentstats_session_count_ssh | gauge | The number of sessions established by SSH | agent_name username workspace_name |
coderd_agentstats_session_count_vscode | gauge | The number of sessions established by VSCode | agent_name username workspace_name |
coderd_agentstats_startup_script_seconds | gauge | The number of seconds the startup script took to execute. | agent_name success template_name username workspace_name |
coderd_agentstats_tx_bytes | gauge | Agent Tx bytes | agent_name username workspace_name |
coderd_api_active_users_duration_hour | gauge | The number of users that have been active within the last hour. | |
coderd_api_concurrent_requests | gauge | The number of concurrent API requests. | method path |
coderd_api_concurrent_websockets | gauge | The total number of concurrent API websockets. | path |
coderd_api_request_latencies_seconds | histogram | Latency distribution of requests in seconds. | method path |
coderd_api_requests_processed_total | counter | The total number of processed API requests | code method path |
coderd_api_total_user_count | gauge | The total number of registered users, partitioned by status. | status |
coderd_api_websocket_durations_seconds | histogram | Websocket duration distribution of requests in seconds. | path |
coderd_api_workspace_latest_build | gauge | The current number of workspace builds by status for all non-deleted workspaces. | status |
coderd_authz_authorize_duration_seconds | histogram | Duration of the 'Authorize' call in seconds. Only counts calls that succeed. | allowed |
coderd_authz_prepare_authorize_duration_seconds | histogram | Duration of the 'PrepareAuthorize' call in seconds. | |
coderd_db_query_counts_total | counter | Total number of queries labelled by HTTP route, method, and query name. | method query route |
coderd_db_query_latencies_seconds | histogram | Latency distribution of queries in seconds. | query |
coderd_db_tx_duration_seconds | histogram | Duration of transactions in seconds. | success tx_id |
coderd_db_tx_executions_count | counter | Total count of transactions executed. 'retries' is expected to be 0 for a successful transaction. | retries success tx_id |
coderd_dbpurge_iteration_duration_seconds | histogram | Duration of each dbpurge iteration in seconds. | success |
coderd_dbpurge_records_purged_total | counter | Total number of records purged by type. | record_type |
coderd_experiments | gauge | Indicates whether each experiment is enabled (1) or not (0) | experiment |
coderd_insights_applications_usage_seconds | gauge | The application usage per template. | application_name slug template_name |
coderd_insights_parameters | gauge | The parameter usage per template. | parameter_name parameter_type parameter_value template_name |
coderd_insights_templates_active_users | gauge | The number of active users of the template. | template_name |
coderd_license_active_users | gauge | The number of active users. | |
coderd_license_errors | gauge | The number of active license errors. | |
coderd_license_limit_users | gauge | The user seats limit based on the active Coder license. | |
coderd_license_user_limit_enabled | gauge | Returns 1 if the current license enforces the user limit. | |
coderd_license_warnings | gauge | The number of active license warnings. | |
coderd_lifecycle_autobuild_execution_duration_seconds | histogram | Duration of each autobuild execution. | |
coderd_notifications_dispatcher_send_seconds | histogram | The time taken to dispatch notifications. | method |
coderd_notifications_inflight_dispatches | gauge | The number of dispatch attempts which are currently in progress. | method notification_template_id |
coderd_notifications_pending_updates | gauge | The number of dispatch attempt results waiting to be flushed to the store. | |
coderd_notifications_queued_seconds | histogram | The time elapsed between a notification being enqueued in the store and retrieved for dispatching (measures the latency of the notifications system). This should generally be within CODER_NOTIFICATIONS_FETCH_INTERVAL seconds; higher values for a sustained period indicates delayed processing and CODER_NOTIFICATIONS_LEASE_COUNT can be increased to accommodate this. | method |
coderd_notifications_retry_count | counter | The count of notification dispatch retry attempts. | method notification_template_id |
coderd_notifications_synced_updates_total | counter | The number of dispatch attempt results flushed to the store. | |
coderd_oauth2_external_requests_rate_limit | gauge | The total number of allowed requests per interval. | name resource |
coderd_oauth2_external_requests_rate_limit_next_reset_unix | gauge | Unix timestamp for when the next interval starts | name resource |
coderd_oauth2_external_requests_rate_limit_remaining | gauge | The remaining number of allowed requests in this interval. | name resource |
coderd_oauth2_external_requests_rate_limit_reset_in_seconds | gauge | Seconds until the next interval | name resource |
coderd_oauth2_external_requests_rate_limit_used | gauge | The number of requests made in this interval. | name resource |
coderd_oauth2_external_requests_total | counter | The total number of api calls made to external oauth2 providers. 'status_code' will be 0 if the request failed with no response. | name source status_code |
coderd_open_file_refs_current | gauge | The count of file references currently open in the file cache. Multiple references can be held for the same file. | |
coderd_open_file_refs_total | counter | The total number of file references ever opened in the file cache. The 'hit' label indicates if the file was loaded from the cache. | hit |
coderd_open_files_current | gauge | The count of unique files currently open in the file cache. | |
coderd_open_files_size_bytes_current | gauge | The current amount of memory of all files currently open in the file cache. | |
coderd_open_files_size_bytes_total | counter | The total amount of memory ever opened in the file cache. This number never decrements. | |
coderd_open_files_total | counter | The total count of unique files ever opened in the file cache. | |
coderd_prebuilds_reconciliation_duration_seconds | histogram | Duration of each prebuilds reconciliation cycle. | |
coderd_prebuilt_workspace_claim_duration_seconds | histogram | Time to claim a prebuilt workspace by organization, template, and preset. | organization_name preset_name template_name |
coderd_prebuilt_workspaces_claimed_total | counter | Total number of prebuilt workspaces which were claimed by users. Claiming refers to creating a workspace with a preset selected for which eligible prebuilt workspaces are available and one is reassigned to a user. | organization_name preset_name template_name |
coderd_prebuilt_workspaces_created_total | counter | Total number of prebuilt workspaces that have been created to meet the desired instance count of each template preset. | organization_name preset_name template_name |
coderd_prebuilt_workspaces_desired | gauge | Target number of prebuilt workspaces that should be available for each template preset. | organization_name preset_name template_name |
coderd_prebuilt_workspaces_eligible | gauge | Current number of prebuilt workspaces that are eligible to be claimed by users. These are workspaces that have completed their build process with their agent reporting 'ready' status. | organization_name preset_name template_name |
coderd_prebuilt_workspaces_failed_total | counter | Total number of prebuilt workspaces that failed to build. | organization_name preset_name template_name |
coderd_prebuilt_workspaces_metrics_last_updated | gauge | The unix timestamp when the metrics related to prebuilt workspaces were last updated; these metrics are cached. | |
coderd_prebuilt_workspaces_preset_hard_limited | gauge | Indicates whether a given preset has reached the hard failure limit (1 = hard-limited). Metric is omitted otherwise. | organization_name preset_name template_name |
coderd_prebuilt_workspaces_reconciliation_paused | gauge | Indicates whether prebuilds reconciliation is currently paused (1 = paused, 0 = not paused). | |
coderd_prebuilt_workspaces_resource_replacements_total | counter | Total number of prebuilt workspaces whose resource(s) got replaced upon being claimed. In Terraform, drift on immutable attributes results in resource replacement. This represents a worst-case scenario for prebuilt workspaces because the pre-provisioned resource would have been recreated when claiming, thus obviating the point of pre-provisioning. See https://coder.com/docs/admin/templates/extending-templates/prebuilt-workspaces#preventing-resource-replacement | organization_name preset_name template_name |
coderd_prebuilt_workspaces_running | gauge | Current number of prebuilt workspaces that are in a running state. These workspaces have started successfully but may not yet be claimable by users (see coderd_prebuilt_workspaces_eligible). | organization_name preset_name template_name |
coderd_prometheusmetrics_agents_execution_seconds | histogram | Histogram for duration of agents metrics collection in seconds. | |
coderd_prometheusmetrics_agentstats_execution_seconds | histogram | Histogram for duration of agent stats metrics collection in seconds. | |
coderd_prometheusmetrics_metrics_aggregator_execution_cleanup_seconds | histogram | Histogram for duration of metrics aggregator cleanup in seconds. | |
coderd_prometheusmetrics_metrics_aggregator_execution_update_seconds | histogram | Histogram for duration of metrics aggregator update in seconds. | |
coderd_prometheusmetrics_metrics_aggregator_store_size | gauge | The number of metrics stored in the aggregator | |
coderd_provisioner_job_queue_wait_seconds | histogram | Time from job creation to acquisition by a provisioner daemon. | build_reason job_type provisioner_type transition |
coderd_provisionerd_job_timings_seconds | histogram | The provisioner job time duration in seconds. | provisioner status |
coderd_provisionerd_jobs_current | gauge | The number of currently running provisioner jobs. | provisioner |
coderd_provisionerd_num_daemons | gauge | The number of provisioner daemons. | |
coderd_provisionerd_workspace_build_timings_seconds | histogram | The time taken for a workspace to build. | status template_name template_version workspace_transition |
coderd_proxyhealth_health_check_duration_seconds | histogram | Histogram for duration of proxy health collection in seconds. | |
coderd_proxyhealth_health_check_results | gauge | This endpoint returns a number to indicate the health status. -3 (unknown), -2 (Unreachable), -1 (Unhealthy), 0 (Unregistered), 1 (Healthy) | proxy_id |
coderd_template_workspace_build_duration_seconds | histogram | Duration from workspace build creation to agent ready, by template. | is_prebuild organization_name status template_name transition |
coderd_workspace_builds_enqueued_total | counter | Total number of workspace build enqueue attempts. | build_reason provisioner_type status transition |
coderd_workspace_builds_total | counter | The number of workspaces started, updated, or deleted. | status template_name template_version workspace_name workspace_owner workspace_transition |
coderd_workspace_creation_duration_seconds | histogram | Time to create a workspace by organization, template, preset, and type (regular or prebuild). | organization_name preset_name template_name type |
coderd_workspace_creation_total | counter | Total regular (non-prebuilt) workspace creations by organization, template, and preset. | organization_name preset_name template_name |
coderd_workspace_latest_build_status | gauge | The current workspace statuses by template, transition, and owner for all non-deleted workspaces. | status template_name template_version workspace_owner workspace_transition |
go_gc_duration_seconds | summary | A summary of the pause duration of garbage collection cycles. | |
go_goroutines | gauge | Number of goroutines that currently exist. | |
go_info | gauge | Information about the Go environment. | version |
go_memstats_alloc_bytes | gauge | Number of bytes allocated and still in use. | |
go_memstats_alloc_bytes_total | counter | Total number of bytes allocated, even if freed. | |
go_memstats_buck_hash_sys_bytes | gauge | Number of bytes used by the profiling bucket hash table. | |
go_memstats_frees_total | counter | Total number of frees. | |
go_memstats_gc_sys_bytes | gauge | Number of bytes used for garbage collection system metadata. | |
go_memstats_heap_alloc_bytes | gauge | Number of heap bytes allocated and still in use. | |
go_memstats_heap_idle_bytes | gauge | Number of heap bytes waiting to be used. | |
go_memstats_heap_inuse_bytes | gauge | Number of heap bytes that are in use. | |
go_memstats_heap_objects | gauge | Number of allocated objects. | |
go_memstats_heap_released_bytes | gauge | Number of heap bytes released to OS. | |
go_memstats_heap_sys_bytes | gauge | Number of heap bytes obtained from system. | |
go_memstats_last_gc_time_seconds | gauge | Number of seconds since 1970 of last garbage collection. | |
go_memstats_lookups_total | counter | Total number of pointer lookups. | |
go_memstats_mallocs_total | counter | Total number of mallocs. | |
go_memstats_mcache_inuse_bytes | gauge | Number of bytes in use by mcache structures. | |
go_memstats_mcache_sys_bytes | gauge | Number of bytes used for mcache structures obtained from system. | |
go_memstats_mspan_inuse_bytes | gauge | Number of bytes in use by mspan structures. | |
go_memstats_mspan_sys_bytes | gauge | Number of bytes used for mspan structures obtained from system. | |
go_memstats_next_gc_bytes | gauge | Number of heap bytes when next garbage collection will take place. | |
go_memstats_other_sys_bytes | gauge | Number of bytes used for other system allocations. | |
go_memstats_stack_inuse_bytes | gauge | Number of bytes in use by the stack allocator. | |
go_memstats_stack_sys_bytes | gauge | Number of bytes obtained from system for stack allocator. | |
go_memstats_sys_bytes | gauge | Number of bytes obtained from system. | |
go_threads | gauge | Number of OS threads created. | |
process_cpu_seconds_total | counter | Total user and system CPU time spent in seconds. | |
process_max_fds | gauge | Maximum number of open file descriptors. | |
process_open_fds | gauge | Number of open file descriptors. | |
process_resident_memory_bytes | gauge | Resident memory size in bytes. | |
process_start_time_seconds | gauge | Start time of the process since unix epoch in seconds. | |
process_virtual_memory_bytes | gauge | Virtual memory size in bytes. | |
process_virtual_memory_max_bytes | gauge | Maximum amount of virtual memory available in bytes. | |
promhttp_metric_handler_requests_in_flight | gauge | Current number of scrapes being served. | |
promhttp_metric_handler_requests_total | counter | Total number of scrapes by HTTP status code. | code |
### Note on Prometheus native histogram support

The following metrics support native histograms:

- `coderd_workspace_creation_duration_seconds`
- `coderd_prebuilt_workspace_claim_duration_seconds`
- `coderd_template_workspace_build_duration_seconds`
Native histograms are an experimental Prometheus feature that removes the need to predefine bucket boundaries and allows higher-resolution buckets that adapt to deployment characteristics. Whether a metric is exposed as classic or native depends entirely on the Prometheus server configuration (see Prometheus docs for details):
- If native histograms are enabled, Prometheus ingests the high-resolution histogram.
- If not, it falls back to the predefined buckets.
⚠️ Important: classic and native histograms cannot be aggregated together. If Prometheus is switched from classic to native at a certain point in time, dashboards may need to account for that transition. For this reason, it’s recommended to follow Prometheus’ migration guidelines when moving from classic to native histograms.
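For example, a 95th-percentile query over one of these metrics differs between the two representations (a sketch: classic histograms expose explicit `_bucket` series with an `le` label, while `histogram_quantile` applies directly to a native histogram series):

```promql
# Classic histogram: aggregate the predefined buckets by `le`.
histogram_quantile(0.95, sum by (le) (rate(coderd_workspace_creation_duration_seconds_bucket[5m])))

# Native histogram: apply the function to the histogram series directly.
histogram_quantile(0.95, sum(rate(coderd_workspace_creation_duration_seconds[5m])))
```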