
Prometheus

Coder exposes many metrics that a Prometheus server can consume. These metrics give insight into the current state of a live Coder deployment.

If you don't have a Prometheus server installed, you can follow the Prometheus Getting started guide.

Enable Prometheus metrics

The Coder server exports metrics over an HTTP endpoint, which you can enable with either the environment variable CODER_PROMETHEUS_ENABLE or the flag --prometheus-enable.

The Prometheus endpoint address is http://localhost:2112/ by default. You can use either the environment variable CODER_PROMETHEUS_ADDRESS or the flag --prometheus-address <network-interface>:<port> to select a different listen address.
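
For container-based installs, the two settings above can be passed as environment variables. The following is a minimal sketch of a Docker Compose fragment. The service name, the ghcr.io/coder/coder image tag, and the port mapping are assumptions for illustration, and other required settings (such as the database connection) are omitted.

```yaml
services:
  coder:
    # Assumed image reference; pin the version you actually run
    image: ghcr.io/coder/coder:latest
    environment:
      # Turn on the Prometheus metrics endpoint
      CODER_PROMETHEUS_ENABLE: "true"
      # Listen on all interfaces inside the container so the port can be published
      CODER_PROMETHEUS_ADDRESS: "0.0.0.0:2112"
    ports:
      # Publish the metrics port on the host loopback interface only
      - "127.0.0.1:2112:2112"
```

Binding the published port to 127.0.0.1 keeps the metrics endpoint off external interfaces; adjust this if Prometheus runs on another host.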

If coder server --prometheus-enable is started locally, you can preview the metrics endpoint in your browser or with curl:

$ curl http://localhost:2112/
# HELP coderd_api_active_users_duration_hour The number of users that have been active within the last hour.
# TYPE coderd_api_active_users_duration_hour gauge
coderd_api_active_users_duration_hour 0
...

Kubernetes deployment

The Prometheus endpoint can be enabled in the Helm chart's values.yml by setting CODER_PROMETHEUS_ENABLE=true. Once enabled, the environment variable CODER_PROMETHEUS_ADDRESS is set to 0.0.0.0:2112 by default. The chart does not expose the Prometheus port on a Service. If you need to expose it on a Service (for example, to use a ServiceMonitor), create a separate headless Service instead.

apiVersion: v1
kind: Service
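metadata:
  name: coder-prom
  namespace: coder
  # A ServiceMonitor selects Services by label, so label this Service
  labels:
    app.kubernetes.io/name: coder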
spec:
  clusterIP: None
  ports:
    - name: prom-http
      port: 2112
      protocol: TCP
      targetPort: 2112
  selector:
    app.kubernetes.io/instance: coder
    app.kubernetes.io/name: coder
  type: ClusterIP
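
As a sketch of the Helm setting described above, the variable can be added to the chart's list of environment variables. The layout below assumes the coder.env list used by the Coder Helm chart; check it against the values file shipped with your chart version.

```yaml
coder:
  env:
    # Enables the Prometheus endpoint; the listen address then defaults to 0.0.0.0:2112
    - name: CODER_PROMETHEUS_ENABLE
      value: "true"
```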

Prometheus configuration

To allow Prometheus to scrape the Coder metrics, you will need to create a scrape_config in your prometheus.yml file, or in the Prometheus Helm chart values. The following is an example scrape_config.

scrape_configs:
  - job_name: "coder"
    scheme: "http"
    static_configs:
      # replace with the IP address of the Coder pod or server
      - targets: ["<ip>:2112"]
        labels:
          apps: "coder"
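
Pod IP addresses change when pods are rescheduled, so a static target can go stale on Kubernetes. The following is a hedged sketch of an alternative that uses Prometheus Kubernetes service discovery instead. It assumes Coder runs in the coder namespace, that its pods carry the label app.kubernetes.io/name=coder, and that Prometheus has RBAC permission to list pods there.

```yaml
scrape_configs:
  - job_name: "coder"
    scheme: "http"
    kubernetes_sd_configs:
      # Discover pods in the (assumed) coder namespace
      - role: pod
        namespaces:
          names:
            - coder
    relabel_configs:
      # Keep only pods labelled app.kubernetes.io/name=coder
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: coder
      # Scrape each pod on the default Coder metrics port
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: "$1:2112"
```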

To use the Kubernetes Prometheus operator to scrape metrics, you will need to create a ServiceMonitor in your Coder deployment namespace. The following is an example ServiceMonitor.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: coder-service-monitor
  namespace: coder
spec:
  endpoints:
    - port: prom-http
      interval: 10s
      scrapeTimeout: 10s
  namespaceSelector:
    matchNames:
    - coder
  selector:
    matchLabels:
      app.kubernetes.io/name: coder

Available metrics

The coderd_agentstats_* metrics must first be enabled with the flag --prometheus-collect-agent-stats or the environment variable CODER_PROMETHEUS_COLLECT_AGENT_STATS before they can be retrieved from the deployment. They are always available from the agent.

Name	Type	Description	Labels
`agent_scripts_executed_total`	counter	Total number of scripts executed by the Coder agent. Includes cron scheduled scripts.	`agent_name` `success` `template_name` `username` `workspace_name`
`coder_aibridged_circuit_breaker_rejects_total`	counter	Total number of requests rejected due to open circuit breaker.	`endpoint` `model` `provider`
`coder_aibridged_circuit_breaker_state`	gauge	Current state of the circuit breaker (0=closed, 0.5=half-open, 1=open).	`endpoint` `model` `provider`
`coder_aibridged_circuit_breaker_trips_total`	counter	Total number of times the circuit breaker transitioned to open state.	`endpoint` `model` `provider`
`coder_aibridged_injected_tool_invocations_total`	counter	The number of times an injected MCP tool was invoked by aibridge.	`model` `name` `provider` `server`
`coder_aibridged_interceptions_duration_seconds`	histogram	The total duration of intercepted requests, in seconds. The majority of this time will be the upstream processing of the request. aibridge has no control over upstream processing time, so it's just an illustrative metric.	`model` `provider`
`coder_aibridged_interceptions_inflight`	gauge	The number of intercepted requests which are being processed.	`model` `provider` `route`
`coder_aibridged_interceptions_total`	counter	The count of intercepted requests.	`initiator_id` `method` `model` `provider` `route` `status`
`coder_aibridged_non_injected_tool_selections_total`	counter	The number of times an AI model selected a tool to be invoked by the client.	`model` `name` `provider`
`coder_aibridged_passthrough_total`	counter	The count of requests which were not intercepted but passed through to the upstream.	`method` `provider` `route`
`coder_aibridged_prompts_total`	counter	The number of prompts issued by users (initiators).	`initiator_id` `model` `provider`
`coder_aibridged_tokens_total`	counter	The number of tokens used by intercepted requests.	`initiator_id` `model` `provider` `type`
`coder_aibridgeproxyd_connect_sessions_total`	counter	Total number of CONNECT sessions established.	`type`
`coder_aibridgeproxyd_inflight_mitm_requests`	gauge	Number of MITM requests currently being processed.	`provider`
`coder_aibridgeproxyd_mitm_requests_total`	counter	Total number of MITM requests handled by the proxy.	`provider`
`coder_aibridgeproxyd_mitm_responses_total`	counter	Total number of MITM responses by HTTP status code class.	`code` `provider`
`coder_pubsub_connected`	gauge	Whether we are connected (1) or not connected (0) to postgres
`coder_pubsub_current_events`	gauge	The current number of pubsub event channels listened for
`coder_pubsub_current_subscribers`	gauge	The current number of active pubsub subscribers
`coder_pubsub_disconnections_total`	counter	Total number of times we disconnected unexpectedly from postgres
`coder_pubsub_latency_measure_errs_total`	counter	The number of pubsub latency measurement failures
`coder_pubsub_latency_measures_total`	counter	The number of pubsub latency measurements
`coder_pubsub_messages_total`	counter	Total number of messages received from postgres	`size`
`coder_pubsub_published_bytes_total`	counter	Total number of bytes successfully published across all publishes
`coder_pubsub_publishes_total`	counter	Total number of calls to Publish	`success`
`coder_pubsub_receive_latency_seconds`	gauge	The time taken to receive a message from a pubsub event channel
`coder_pubsub_received_bytes_total`	counter	Total number of bytes received across all messages
`coder_pubsub_send_latency_seconds`	gauge	The time taken to send a message into a pubsub event channel
`coder_pubsub_subscribes_total`	counter	Total number of calls to Subscribe/SubscribeWithErr	`success`
`coder_servertailnet_connections_total`	counter	Total number of TCP connections made to workspace agents.	`network`
`coder_servertailnet_open_connections`	gauge	Total number of TCP connections currently open to workspace agents.	`network`
`coderd_agentapi_metadata_batch_size`	histogram	Total number of metadata entries in each batch, updated before flushes.
`coderd_agentapi_metadata_batch_utilization`	histogram	Number of metadata keys per agent in each batch, updated before flushes.
`coderd_agentapi_metadata_batches_total`	counter	Total number of metadata batches flushed.	`reason`
`coderd_agentapi_metadata_dropped_keys_total`	counter	Total number of metadata keys dropped due to capacity limits.
`coderd_agentapi_metadata_flush_duration_seconds`	histogram	Time taken to flush metadata batch to database and pubsub.	`reason`
`coderd_agentapi_metadata_flushed_total`	counter	Total number of unique metadatas flushed.
`coderd_agentapi_metadata_publish_errors_total`	counter	Total number of metadata batch pubsub publish calls that have resulted in an error.
`coderd_agents_apps`	gauge	Agent applications with statuses.	`agent_name` `app_name` `health` `username` `workspace_name`
`coderd_agents_connection_latencies_seconds`	gauge	Agent connection latencies in seconds.	`agent_name` `derp_region` `preferred` `username` `workspace_name`
`coderd_agents_connections`	gauge	Agent connections with statuses.	`agent_name` `lifecycle_state` `status` `tailnet_node` `username` `workspace_name`
`coderd_agents_up`	gauge	The number of active agents per workspace.	`template_name` `template_version` `username` `workspace_name`
`coderd_agentstats_connection_count`	gauge	The number of established connections by agent	`agent_name` `username` `workspace_name`
`coderd_agentstats_connection_median_latency_seconds`	gauge	The median agent connection latency	`agent_name` `username` `workspace_name`
`coderd_agentstats_currently_reachable_peers`	gauge	The number of peers (e.g. clients) that are currently reachable over the encrypted network.	`agent_name` `connection_type` `template_name` `username` `workspace_name`
`coderd_agentstats_rx_bytes`	gauge	Agent Rx bytes	`agent_name` `username` `workspace_name`
`coderd_agentstats_session_count_jetbrains`	gauge	The number of sessions established by JetBrains	`agent_name` `username` `workspace_name`
`coderd_agentstats_session_count_reconnecting_pty`	gauge	The number of sessions established by reconnecting PTY	`agent_name` `username` `workspace_name`
`coderd_agentstats_session_count_ssh`	gauge	The number of sessions established by SSH	`agent_name` `username` `workspace_name`
`coderd_agentstats_session_count_vscode`	gauge	The number of sessions established by VSCode	`agent_name` `username` `workspace_name`
`coderd_agentstats_startup_script_seconds`	gauge	The number of seconds the startup script took to execute.	`agent_name` `success` `template_name` `username` `workspace_name`
`coderd_agentstats_tx_bytes`	gauge	Agent Tx bytes	`agent_name` `username` `workspace_name`
`coderd_api_active_users_duration_hour`	gauge	The number of users that have been active within the last hour.
`coderd_api_concurrent_requests`	gauge	The number of concurrent API requests.	`method` `path`
`coderd_api_concurrent_websockets`	gauge	The total number of concurrent API websockets.	`path`
`coderd_api_request_latencies_seconds`	histogram	Latency distribution of requests in seconds.	`method` `path`
`coderd_api_requests_processed_total`	counter	The total number of processed API requests	`code` `method` `path`
`coderd_api_total_user_count`	gauge	The total number of registered users, partitioned by status.	`status`
`coderd_api_websocket_durations_seconds`	histogram	Websocket duration distribution of requests in seconds.	`path`
`coderd_api_workspace_latest_build`	gauge	The current number of workspace builds by status for all non-deleted workspaces.	`status`
`coderd_authz_authorize_duration_seconds`	histogram	Duration of the 'Authorize' call in seconds. Only counts calls that succeed.	`allowed`
`coderd_authz_prepare_authorize_duration_seconds`	histogram	Duration of the 'PrepareAuthorize' call in seconds.
`coderd_db_query_counts_total`	counter	Total number of queries labelled by HTTP route, method, and query name.	`method` `query` `route`
`coderd_db_query_latencies_seconds`	histogram	Latency distribution of queries in seconds.	`query`
`coderd_db_tx_duration_seconds`	histogram	Duration of transactions in seconds.	`success` `tx_id`
`coderd_db_tx_executions_count`	counter	Total count of transactions executed. 'retries' is expected to be 0 for a successful transaction.	`retries` `success` `tx_id`
`coderd_dbpurge_iteration_duration_seconds`	histogram	Duration of each dbpurge iteration in seconds.	`success`
`coderd_dbpurge_records_purged_total`	counter	Total number of records purged by type.	`record_type`
`coderd_experiments`	gauge	Indicates whether each experiment is enabled (1) or not (0)	`experiment`
`coderd_insights_applications_usage_seconds`	gauge	The application usage per template.	`application_name` `slug` `template_name`
`coderd_insights_parameters`	gauge	The parameter usage per template.	`parameter_name` `parameter_type` `parameter_value` `template_name`
`coderd_insights_templates_active_users`	gauge	The number of active users of the template.	`template_name`
`coderd_license_active_users`	gauge	The number of active users.
`coderd_license_errors`	gauge	The number of active license errors.
`coderd_license_limit_users`	gauge	The user seats limit based on the active Coder license.
`coderd_license_user_limit_enabled`	gauge	Returns 1 if the current license enforces the user limit.
`coderd_license_warnings`	gauge	The number of active license warnings.
`coderd_lifecycle_autobuild_execution_duration_seconds`	histogram	Duration of each autobuild execution.
`coderd_notifications_dispatcher_send_seconds`	histogram	The time taken to dispatch notifications.	`method`
`coderd_notifications_inflight_dispatches`	gauge	The number of dispatch attempts which are currently in progress.	`method` `notification_template_id`
`coderd_notifications_pending_updates`	gauge	The number of dispatch attempt results waiting to be flushed to the store.
`coderd_notifications_queued_seconds`	histogram	The time elapsed between a notification being enqueued in the store and retrieved for dispatching (measures the latency of the notifications system). This should generally be within CODER_NOTIFICATIONS_FETCH_INTERVAL seconds; higher values for a sustained period indicates delayed processing and CODER_NOTIFICATIONS_LEASE_COUNT can be increased to accommodate this.	`method`
`coderd_notifications_retry_count`	counter	The count of notification dispatch retry attempts.	`method` `notification_template_id`
`coderd_notifications_synced_updates_total`	counter	The number of dispatch attempt results flushed to the store.
`coderd_oauth2_external_requests_rate_limit`	gauge	The total number of allowed requests per interval.	`name` `resource`
`coderd_oauth2_external_requests_rate_limit_next_reset_unix`	gauge	Unix timestamp for when the next interval starts	`name` `resource`
`coderd_oauth2_external_requests_rate_limit_remaining`	gauge	The remaining number of allowed requests in this interval.	`name` `resource`
`coderd_oauth2_external_requests_rate_limit_reset_in_seconds`	gauge	Seconds until the next interval	`name` `resource`
`coderd_oauth2_external_requests_rate_limit_used`	gauge	The number of requests made in this interval.	`name` `resource`
`coderd_oauth2_external_requests_total`	counter	The total number of api calls made to external oauth2 providers. 'status_code' will be 0 if the request failed with no response.	`name` `source` `status_code`
`coderd_open_file_refs_current`	gauge	The count of file references currently open in the file cache. Multiple references can be held for the same file.
`coderd_open_file_refs_total`	counter	The total number of file references ever opened in the file cache. The 'hit' label indicates if the file was loaded from the cache.	`hit`
`coderd_open_files_current`	gauge	The count of unique files currently open in the file cache.
`coderd_open_files_size_bytes_current`	gauge	The current amount of memory of all files currently open in the file cache.
`coderd_open_files_size_bytes_total`	counter	The total amount of memory ever opened in the file cache. This number never decrements.
`coderd_open_files_total`	counter	The total count of unique files ever opened in the file cache.
`coderd_prebuilds_reconciliation_duration_seconds`	histogram	Duration of each prebuilds reconciliation cycle.
`coderd_prebuilt_workspace_claim_duration_seconds`	histogram	Time to claim a prebuilt workspace by organization, template, and preset.	`organization_name` `preset_name` `template_name`
`coderd_prebuilt_workspaces_claimed_total`	counter	Total number of prebuilt workspaces which were claimed by users. Claiming refers to creating a workspace with a preset selected for which eligible prebuilt workspaces are available and one is reassigned to a user.	`organization_name` `preset_name` `template_name`
`coderd_prebuilt_workspaces_created_total`	counter	Total number of prebuilt workspaces that have been created to meet the desired instance count of each template preset.	`organization_name` `preset_name` `template_name`
`coderd_prebuilt_workspaces_desired`	gauge	Target number of prebuilt workspaces that should be available for each template preset.	`organization_name` `preset_name` `template_name`
`coderd_prebuilt_workspaces_eligible`	gauge	Current number of prebuilt workspaces that are eligible to be claimed by users. These are workspaces that have completed their build process with their agent reporting 'ready' status.	`organization_name` `preset_name` `template_name`
`coderd_prebuilt_workspaces_failed_total`	counter	Total number of prebuilt workspaces that failed to build.	`organization_name` `preset_name` `template_name`
`coderd_prebuilt_workspaces_metrics_last_updated`	gauge	The unix timestamp when the metrics related to prebuilt workspaces were last updated; these metrics are cached.
`coderd_prebuilt_workspaces_preset_hard_limited`	gauge	Indicates whether a given preset has reached the hard failure limit (1 = hard-limited). Metric is omitted otherwise.	`organization_name` `preset_name` `template_name`
`coderd_prebuilt_workspaces_reconciliation_paused`	gauge	Indicates whether prebuilds reconciliation is currently paused (1 = paused, 0 = not paused).
`coderd_prebuilt_workspaces_resource_replacements_total`	counter	Total number of prebuilt workspaces whose resource(s) got replaced upon being claimed. In Terraform, drift on immutable attributes results in resource replacement. This represents a worst-case scenario for prebuilt workspaces because the pre-provisioned resource would have been recreated when claiming, thus obviating the point of pre-provisioning. See https://coder.com/docs/admin/templates/extending-templates/prebuilt-workspaces#preventing-resource-replacement	`organization_name` `preset_name` `template_name`
`coderd_prebuilt_workspaces_running`	gauge	Current number of prebuilt workspaces that are in a running state. These workspaces have started successfully but may not yet be claimable by users (see coderd_prebuilt_workspaces_eligible).	`organization_name` `preset_name` `template_name`
`coderd_prometheusmetrics_agents_execution_seconds`	histogram	Histogram for duration of agents metrics collection in seconds.
`coderd_prometheusmetrics_agentstats_execution_seconds`	histogram	Histogram for duration of agent stats metrics collection in seconds.
`coderd_prometheusmetrics_metrics_aggregator_execution_cleanup_seconds`	histogram	Histogram for duration of metrics aggregator cleanup in seconds.
`coderd_prometheusmetrics_metrics_aggregator_execution_update_seconds`	histogram	Histogram for duration of metrics aggregator update in seconds.
`coderd_prometheusmetrics_metrics_aggregator_store_size`	gauge	The number of metrics stored in the aggregator
`coderd_provisioner_job_queue_wait_seconds`	histogram	Time from job creation to acquisition by a provisioner daemon.	`build_reason` `job_type` `provisioner_type` `transition`
`coderd_provisionerd_job_timings_seconds`	histogram	The provisioner job time duration in seconds.	`provisioner` `status`
`coderd_provisionerd_jobs_current`	gauge	The number of currently running provisioner jobs.	`provisioner`
`coderd_provisionerd_num_daemons`	gauge	The number of provisioner daemons.
`coderd_provisionerd_workspace_build_timings_seconds`	histogram	The time taken for a workspace to build.	`status` `template_name` `template_version` `workspace_transition`
`coderd_proxyhealth_health_check_duration_seconds`	histogram	Histogram for duration of proxy health collection in seconds.
`coderd_proxyhealth_health_check_results`	gauge	This endpoint returns a number to indicate the health status. -3 (unknown), -2 (Unreachable), -1 (Unhealthy), 0 (Unregistered), 1 (Healthy)	`proxy_id`
`coderd_template_workspace_build_duration_seconds`	histogram	Duration from workspace build creation to agent ready, by template.	`is_prebuild` `organization_name` `status` `template_name` `transition`
`coderd_workspace_builds_enqueued_total`	counter	Total number of workspace build enqueue attempts.	`build_reason` `provisioner_type` `status` `transition`
`coderd_workspace_builds_total`	counter	The number of workspaces started, updated, or deleted.	`status` `template_name` `template_version` `workspace_name` `workspace_owner` `workspace_transition`
`coderd_workspace_creation_duration_seconds`	histogram	Time to create a workspace by organization, template, preset, and type (regular or prebuild).	`organization_name` `preset_name` `template_name` `type`
`coderd_workspace_creation_total`	counter	Total regular (non-prebuilt) workspace creations by organization, template, and preset.	`organization_name` `preset_name` `template_name`
`coderd_workspace_latest_build_status`	gauge	The current workspace statuses by template, transition, and owner for all non-deleted workspaces.	`status` `template_name` `template_version` `workspace_owner` `workspace_transition`
`go_gc_duration_seconds`	summary	A summary of the pause duration of garbage collection cycles.
`go_goroutines`	gauge	Number of goroutines that currently exist.
`go_info`	gauge	Information about the Go environment.	`version`
`go_memstats_alloc_bytes`	gauge	Number of bytes allocated and still in use.
`go_memstats_alloc_bytes_total`	counter	Total number of bytes allocated, even if freed.
`go_memstats_buck_hash_sys_bytes`	gauge	Number of bytes used by the profiling bucket hash table.
`go_memstats_frees_total`	counter	Total number of frees.
`go_memstats_gc_sys_bytes`	gauge	Number of bytes used for garbage collection system metadata.
`go_memstats_heap_alloc_bytes`	gauge	Number of heap bytes allocated and still in use.
`go_memstats_heap_idle_bytes`	gauge	Number of heap bytes waiting to be used.
`go_memstats_heap_inuse_bytes`	gauge	Number of heap bytes that are in use.
`go_memstats_heap_objects`	gauge	Number of allocated objects.
`go_memstats_heap_released_bytes`	gauge	Number of heap bytes released to OS.
`go_memstats_heap_sys_bytes`	gauge	Number of heap bytes obtained from system.
`go_memstats_last_gc_time_seconds`	gauge	Number of seconds since 1970 of last garbage collection.
`go_memstats_lookups_total`	counter	Total number of pointer lookups.
`go_memstats_mallocs_total`	counter	Total number of mallocs.
`go_memstats_mcache_inuse_bytes`	gauge	Number of bytes in use by mcache structures.
`go_memstats_mcache_sys_bytes`	gauge	Number of bytes used for mcache structures obtained from system.
`go_memstats_mspan_inuse_bytes`	gauge	Number of bytes in use by mspan structures.
`go_memstats_mspan_sys_bytes`	gauge	Number of bytes used for mspan structures obtained from system.
`go_memstats_next_gc_bytes`	gauge	Number of heap bytes when next garbage collection will take place.
`go_memstats_other_sys_bytes`	gauge	Number of bytes used for other system allocations.
`go_memstats_stack_inuse_bytes`	gauge	Number of bytes in use by the stack allocator.
`go_memstats_stack_sys_bytes`	gauge	Number of bytes obtained from system for stack allocator.
`go_memstats_sys_bytes`	gauge	Number of bytes obtained from system.
`go_threads`	gauge	Number of OS threads created.
`process_cpu_seconds_total`	counter	Total user and system CPU time spent in seconds.
`process_max_fds`	gauge	Maximum number of open file descriptors.
`process_open_fds`	gauge	Number of open file descriptors.
`process_resident_memory_bytes`	gauge	Resident memory size in bytes.
`process_start_time_seconds`	gauge	Start time of the process since unix epoch in seconds.
`process_virtual_memory_bytes`	gauge	Virtual memory size in bytes.
`process_virtual_memory_max_bytes`	gauge	Maximum amount of virtual memory available in bytes.
`promhttp_metric_handler_requests_in_flight`	gauge	Current number of scrapes being served.
`promhttp_metric_handler_requests_total`	counter	Total number of scrapes by HTTP status code.	`code`
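
To illustrate how the metrics in this table might be used, the following is a minimal sketch of a Prometheus alerting rule file. The group name, alert names, thresholds, durations, and severity labels are all assumptions chosen for illustration, not recommended values.

```yaml
groups:
  - name: coder-example-alerts
    rules:
      # Fires when Coder reports no pubsub connection to PostgreSQL
      - alert: CoderPubsubDisconnected
        expr: coder_pubsub_connected == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Coder is not connected to PostgreSQL pubsub"
      # Fires when no provisioner daemons are registered
      - alert: CoderNoProvisionerDaemons
        expr: coderd_provisionerd_num_daemons == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "No provisioner daemons are available to run workspace builds"
      # Fires when more than 90% of licensed seats are in use,
      # but only if the license enforces a user limit
      - alert: CoderLicenseSeatsNearLimit
        expr: |
          (coderd_license_active_users / coderd_license_limit_users > 0.9)
          and coderd_license_user_limit_enabled == 1
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Active users are approaching the licensed seat limit"
```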

Note on Prometheus native histogram support

The following metrics support native histograms:

coderd_workspace_creation_duration_seconds
coderd_prebuilt_workspace_claim_duration_seconds
coderd_template_workspace_build_duration_seconds

Native histograms are an experimental Prometheus feature that removes the need to predefine bucket boundaries and allows higher-resolution buckets that adapt to deployment characteristics. Whether a metric is exposed as classic or native depends entirely on the Prometheus server configuration (see Prometheus docs for details):

If native histograms are enabled, Prometheus ingests the high-resolution histogram.
If not, it falls back to the predefined buckets.

⚠️ Important: classic and native histograms cannot be aggregated together. If Prometheus is switched from classic to native at a certain point in time, dashboards may need to account for that transition. For this reason, it’s recommended to follow Prometheus’ migration guidelines when moving from classic to native histograms.
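
The difference shows up in how queries are written. The following is a sketch of two recording rules that compute the same 95th percentile, one from the classic buckets and one from the native histogram. The rule names, the 5m window, and the grouping by template_name are assumptions for illustration.

```yaml
groups:
  - name: coder-histogram-quantiles
    rules:
      # Classic histogram: query the _bucket series and keep the le label
      - record: coder:workspace_creation_duration_seconds:p95_classic
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, template_name) (
              rate(coderd_workspace_creation_duration_seconds_bucket[5m])
            )
          )
      # Native histogram: query the metric name directly; there is no le label
      - record: coder:workspace_creation_duration_seconds:p95_native
        expr: |
          histogram_quantile(
            0.95,
            sum by (template_name) (
              rate(coderd_workspace_creation_duration_seconds[5m])
            )
          )
```

Keeping the two results under separate names makes the transition point explicit in dashboards, since the underlying series cannot be aggregated together.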
