Prometheus

Coder exposes many metrics that a Prometheus server can consume and that give insight into the current state of a live Coder deployment.

If you don't have a Prometheus server installed, you can follow the Prometheus Getting started guide.

Enable Prometheus metrics

The Coder server exports metrics via an HTTP endpoint, which can be enabled using either the environment variable CODER_PROMETHEUS_ENABLE or the flag --prometheus-enable.

The Prometheus endpoint address is http://localhost:2112/ by default. You can use either the environment variable CODER_PROMETHEUS_ADDRESS or the flag --prometheus-address <network-interface>:<port> to select a different listen address.
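As a rough illustration of how the listen address maps to the URL you scrape, the sketch below builds that URL from CODER_PROMETHEUS_ADDRESS and falls back to the default address given above. The function name coder_metrics_url is hypothetical and is not part of the Coder CLI.

```shell
#!/bin/sh
# coder_metrics_url is a hypothetical helper, not part of the Coder CLI.
# It prints the metrics URL for the address in CODER_PROMETHEUS_ADDRESS,
# falling back to the documented default when the variable is unset or empty.
coder_metrics_url() {
  printf 'http://%s/\n' "${CODER_PROMETHEUS_ADDRESS:-localhost:2112}"
}
```

With the variable unset, `curl "$(coder_metrics_url)"` requests the default endpoint; after `export CODER_PROMETHEUS_ADDRESS=127.0.0.1:9090`, the same call targets the new address.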

If coder server --prometheus-enable is started locally, you can preview the metrics endpoint in your browser or with curl:

    $ curl http://localhost:2112/
    # HELP coderd_api_active_users_duration_hour The number of users that have been active within the last hour.
    # TYPE coderd_api_active_users_duration_hour gauge
    coderd_api_active_users_duration_hour 0
    ...
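The endpoint returns the Prometheus text exposition format, so single values can be pulled out with standard tools. The following is a minimal sketch, assuming an unlabelled metric; the helper name metric_value is hypothetical, and the sample text is copied from the output above rather than fetched from a live server.

```shell
#!/bin/sh
# metric_value is a hypothetical helper: it reads exposition text on stdin and
# prints the value of the unlabelled metric whose name is given as $1.
# Comment lines (# HELP, # TYPE) are skipped because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Sample taken from the curl output shown in this section.
sample='# HELP coderd_api_active_users_duration_hour The number of users that have been active within the last hour.
# TYPE coderd_api_active_users_duration_hour gauge
coderd_api_active_users_duration_hour 0'

printf '%s\n' "$sample" | metric_value coderd_api_active_users_duration_hour
```

Against a running server, the same function could be fed from `curl -s http://localhost:2112/`. Metrics that carry labels have the label set attached to the name, so they would need a looser match than this sketch uses.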

Kubernetes deployment

The Prometheus endpoint can be enabled in the Helm chart's values.yml by setting CODER_PROMETHEUS_ENABLE=true. Once enabled, the environment variable CODER_PROMETHEUS_ADDRESS will be set by default to 0.0.0.0:2112. A Service Endpoint will not be exposed; if you need to expose the Prometheus port on a Service (for example, to use a ServiceMonitor), create a separate headless service instead.

    apiVersion: v1
    kind: Service
    metadata:
      name: coder-prom
      namespace: coder
    spec:
      clusterIP: None
      ports:
        - name: prom-http
          port: 2112
          protocol: TCP
          targetPort: 2112
      selector:
        app.kubernetes.io/instance: coder
        app.kubernetes.io/name: coder
      type: ClusterIP

Prometheus configuration

To allow Prometheus to scrape the Coder metrics, you will need to create a scrape_config in your prometheus.yml file, or in the Prometheus Helm chart values. The following is an example scrape_config.

    scrape_configs:
      - job_name: "coder"
        scheme: "http"
        static_configs:
          # replace with the IP address of the Coder pod or server
          - targets: ["<ip>:2112"]
            labels:
              apps: "coder"

To use the Kubernetes Prometheus operator to scrape metrics, you will need to create a ServiceMonitor in your Coder deployment namespace. The following is an example ServiceMonitor.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: coder-service-monitor
      namespace: coder
    spec:
      endpoints:
        - port: prom-http
          interval: 10s
          scrapeTimeout: 10s
      namespaceSelector:
        matchNames:
          - coder
      selector:
        matchLabels:
          app.kubernetes.io/name: coder

Available metrics

The coderd_agentstats_* metrics must first be enabled with the flag --prometheus-collect-agent-stats or the environment variable CODER_PROMETHEUS_COLLECT_AGENT_STATS before they can be retrieved from the deployment. They are always available from the agent.
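One way to confirm that agent stats collection is active is to look for coderd_agentstats_ samples in the scraped output. The sketch below assumes the metrics are simply absent from the deployment endpoint until the flag is set; the helper name has_agentstats is hypothetical.

```shell
#!/bin/sh
# has_agentstats is a hypothetical helper: it reads exposition text on stdin
# and succeeds (exit status 0) if at least one coderd_agentstats_* sample line
# is present. "# HELP" and "# TYPE" comment lines do not match the pattern.
has_agentstats() {
  grep -q '^coderd_agentstats_'
}
```

A possible use against the default endpoint is `curl -s http://localhost:2112/ | has_agentstats && echo "agent stats are being collected"`.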

Name | Type | Description | Labels
agent_scripts_executed_total | counter | Total number of scripts executed by the Coder agent. Includes cron scheduled scripts. | agent_name success template_name username workspace_name
coder_aibridged_circuit_breaker_rejects_total | counter | Total number of requests rejected due to open circuit breaker. | endpoint model provider
coder_aibridged_circuit_breaker_state | gauge | Current state of the circuit breaker (0=closed, 0.5=half-open, 1=open). | endpoint model provider
coder_aibridged_circuit_breaker_trips_total | counter | Total number of times the circuit breaker transitioned to open state. | endpoint model provider
coder_aibridged_injected_tool_invocations_total | counter | The number of times an injected MCP tool was invoked by aibridge. | model name provider server
coder_aibridged_interceptions_duration_seconds | histogram | The total duration of intercepted requests, in seconds. The majority of this time will be the upstream processing of the request. aibridge has no control over upstream processing time, so it's just an illustrative metric. | model provider
coder_aibridged_interceptions_inflight | gauge | The number of intercepted requests which are being processed. | model provider route
coder_aibridged_interceptions_total | counter | The count of intercepted requests. | initiator_id method model provider route status
coder_aibridged_non_injected_tool_selections_total | counter | The number of times an AI model selected a tool to be invoked by the client. | model name provider
coder_aibridged_passthrough_total | counter | The count of requests which were not intercepted but passed through to the upstream. | method provider route
coder_aibridged_prompts_total | counter | The number of prompts issued by users (initiators). | initiator_id model provider
coder_aibridged_tokens_total | counter | The number of tokens used by intercepted requests. | initiator_id model provider type
coder_aibridgeproxyd_connect_sessions_total | counter | Total number of CONNECT sessions established. | type
coder_aibridgeproxyd_inflight_mitm_requests | gauge | Number of MITM requests currently being processed. | provider
coder_aibridgeproxyd_mitm_requests_total | counter | Total number of MITM requests handled by the proxy. | provider
coder_aibridgeproxyd_mitm_responses_total | counter | Total number of MITM responses by HTTP status code class. | code provider
coder_pubsub_connected | gauge | Whether we are connected (1) or not connected (0) to postgres |
coder_pubsub_current_events | gauge | The current number of pubsub event channels listened for |
coder_pubsub_current_subscribers | gauge | The current number of active pubsub subscribers |
coder_pubsub_disconnections_total | counter | Total number of times we disconnected unexpectedly from postgres |
coder_pubsub_latency_measure_errs_total | counter | The number of pubsub latency measurement failures |
coder_pubsub_latency_measures_total | counter | The number of pubsub latency measurements |
coder_pubsub_messages_total | counter | Total number of messages received from postgres | size
coder_pubsub_published_bytes_total | counter | Total number of bytes successfully published across all publishes |
coder_pubsub_publishes_total | counter | Total number of calls to Publish | success
coder_pubsub_receive_latency_seconds | gauge | The time taken to receive a message from a pubsub event channel |
coder_pubsub_received_bytes_total | counter | Total number of bytes received across all messages |
coder_pubsub_send_latency_seconds | gauge | The time taken to send a message into a pubsub event channel |
coder_pubsub_subscribes_total | counter | Total number of calls to Subscribe/SubscribeWithErr | success
coder_servertailnet_connections_total | counter | Total number of TCP connections made to workspace agents. | network
coder_servertailnet_open_connections | gauge | Total number of TCP connections currently open to workspace agents. | network
coderd_agentapi_metadata_batch_size | histogram | Total number of metadata entries in each batch, updated before flushes. |
coderd_agentapi_metadata_batch_utilization | histogram | Number of metadata keys per agent in each batch, updated before flushes. |
coderd_agentapi_metadata_batches_total | counter | Total number of metadata batches flushed. | reason
coderd_agentapi_metadata_dropped_keys_total | counter | Total number of metadata keys dropped due to capacity limits. |
coderd_agentapi_metadata_flush_duration_seconds | histogram | Time taken to flush metadata batch to database and pubsub. | reason
coderd_agentapi_metadata_flushed_total | counter | Total number of unique metadatas flushed. |
coderd_agentapi_metadata_publish_errors_total | counter | Total number of metadata batch pubsub publish calls that have resulted in an error. |
coderd_agents_apps | gauge | Agent applications with statuses. | agent_name app_name health username workspace_name
coderd_agents_connection_latencies_seconds | gauge | Agent connection latencies in seconds. | agent_name derp_region preferred username workspace_name
coderd_agents_connections | gauge | Agent connections with statuses. | agent_name lifecycle_state status tailnet_node username workspace_name
coderd_agents_up | gauge | The number of active agents per workspace. | template_name template_version username workspace_name
coderd_agentstats_connection_count | gauge | The number of established connections by agent | agent_name username workspace_name
coderd_agentstats_connection_median_latency_seconds | gauge | The median agent connection latency | agent_name username workspace_name
coderd_agentstats_currently_reachable_peers | gauge | The number of peers (e.g. clients) that are currently reachable over the encrypted network. | agent_name connection_type template_name username workspace_name
coderd_agentstats_rx_bytes | gauge | Agent Rx bytes | agent_name username workspace_name
coderd_agentstats_session_count_jetbrains | gauge | The number of session established by JetBrains | agent_name username workspace_name
coderd_agentstats_session_count_reconnecting_pty | gauge | The number of session established by reconnecting PTY | agent_name username workspace_name
coderd_agentstats_session_count_ssh | gauge | The number of session established by SSH | agent_name username workspace_name
coderd_agentstats_session_count_vscode | gauge | The number of session established by VSCode | agent_name username workspace_name
coderd_agentstats_startup_script_seconds | gauge | The number of seconds the startup script took to execute. | agent_name success template_name username workspace_name
coderd_agentstats_tx_bytes | gauge | Agent Tx bytes | agent_name username workspace_name
coderd_api_active_users_duration_hour | gauge | The number of users that have been active within the last hour. |
coderd_api_concurrent_requests | gauge | The number of concurrent API requests. | method path
coderd_api_concurrent_websockets | gauge | The total number of concurrent API websockets. | path
coderd_api_request_latencies_seconds | histogram | Latency distribution of requests in seconds. | method path
coderd_api_requests_processed_total | counter | The total number of processed API requests | code method path
coderd_api_total_user_count | gauge | The total number of registered users, partitioned by status. | status
coderd_api_websocket_durations_seconds | histogram | Websocket duration distribution of requests in seconds. | path
coderd_api_workspace_latest_build | gauge | The current number of workspace builds by status for all non-deleted workspaces. | status
coderd_authz_authorize_duration_seconds | histogram | Duration of the 'Authorize' call in seconds. Only counts calls that succeed. | allowed
coderd_authz_prepare_authorize_duration_seconds | histogram | Duration of the 'PrepareAuthorize' call in seconds. |
coderd_db_query_counts_total | counter | Total number of queries labelled by HTTP route, method, and query name. | method query route
coderd_db_query_latencies_seconds | histogram | Latency distribution of queries in seconds. | query
coderd_db_tx_duration_seconds | histogram | Duration of transactions in seconds. | success tx_id
coderd_db_tx_executions_count | counter | Total count of transactions executed. 'retries' is expected to be 0 for a successful transaction. | retries success tx_id
coderd_dbpurge_iteration_duration_seconds | histogram | Duration of each dbpurge iteration in seconds. | success
coderd_dbpurge_records_purged_total | counter | Total number of records purged by type. | record_type
coderd_experiments | gauge | Indicates whether each experiment is enabled (1) or not (0) | experiment
coderd_insights_applications_usage_seconds | gauge | The application usage per template. | application_name slug template_name
coderd_insights_parameters | gauge | The parameter usage per template. | parameter_name parameter_type parameter_value template_name
coderd_insights_templates_active_users | gauge | The number of active users of the template. | template_name
coderd_license_active_users | gauge | The number of active users. |
coderd_license_errors | gauge | The number of active license errors. |
coderd_license_limit_users | gauge | The user seats limit based on the active Coder license. |
coderd_license_user_limit_enabled | gauge | Returns 1 if the current license enforces the user limit. |
coderd_license_warnings | gauge | The number of active license warnings. |
coderd_lifecycle_autobuild_execution_duration_seconds | histogram | Duration of each autobuild execution. |
coderd_notifications_dispatcher_send_seconds | histogram | The time taken to dispatch notifications. | method
coderd_notifications_inflight_dispatches | gauge | The number of dispatch attempts which are currently in progress. | method notification_template_id
coderd_notifications_pending_updates | gauge | The number of dispatch attempt results waiting to be flushed to the store. |
coderd_notifications_queued_seconds | histogram | The time elapsed between a notification being enqueued in the store and retrieved for dispatching (measures the latency of the notifications system). This should generally be within CODER_NOTIFICATIONS_FETCH_INTERVAL seconds; higher values for a sustained period indicates delayed processing and CODER_NOTIFICATIONS_LEASE_COUNT can be increased to accommodate this. | method
coderd_notifications_retry_count | counter | The count of notification dispatch retry attempts. | method notification_template_id
coderd_notifications_synced_updates_total | counter | The number of dispatch attempt results flushed to the store. |
coderd_oauth2_external_requests_rate_limit | gauge | The total number of allowed requests per interval. | name resource
coderd_oauth2_external_requests_rate_limit_next_reset_unix | gauge | Unix timestamp for when the next interval starts | name resource
coderd_oauth2_external_requests_rate_limit_remaining | gauge | The remaining number of allowed requests in this interval. | name resource
coderd_oauth2_external_requests_rate_limit_reset_in_seconds | gauge | Seconds until the next interval | name resource
coderd_oauth2_external_requests_rate_limit_used | gauge | The number of requests made in this interval. | name resource
coderd_oauth2_external_requests_total | counter | The total number of api calls made to external oauth2 providers. 'status_code' will be 0 if the request failed with no response. | name source status_code
coderd_open_file_refs_current | gauge | The count of file references currently open in the file cache. Multiple references can be held for the same file. |
coderd_open_file_refs_total | counter | The total number of file references ever opened in the file cache. The 'hit' label indicates if the file was loaded from the cache. | hit
coderd_open_files_current | gauge | The count of unique files currently open in the file cache. |
coderd_open_files_size_bytes_current | gauge | The current amount of memory of all files currently open in the file cache. |
coderd_open_files_size_bytes_total | counter | The total amount of memory ever opened in the file cache. This number never decrements. |
coderd_open_files_total | counter | The total count of unique files ever opened in the file cache. |
coderd_prebuilds_reconciliation_duration_seconds | histogram | Duration of each prebuilds reconciliation cycle. |
coderd_prebuilt_workspace_claim_duration_seconds | histogram | Time to claim a prebuilt workspace by organization, template, and preset. | organization_name preset_name template_name
coderd_prebuilt_workspaces_claimed_total | counter | Total number of prebuilt workspaces which were claimed by users. Claiming refers to creating a workspace with a preset selected for which eligible prebuilt workspaces are available and one is reassigned to a user. | organization_name preset_name template_name
coderd_prebuilt_workspaces_created_total | counter | Total number of prebuilt workspaces that have been created to meet the desired instance count of each template preset. | organization_name preset_name template_name
coderd_prebuilt_workspaces_desired | gauge | Target number of prebuilt workspaces that should be available for each template preset. | organization_name preset_name template_name
coderd_prebuilt_workspaces_eligible | gauge | Current number of prebuilt workspaces that are eligible to be claimed by users. These are workspaces that have completed their build process with their agent reporting 'ready' status. | organization_name preset_name template_name
coderd_prebuilt_workspaces_failed_total | counter | Total number of prebuilt workspaces that failed to build. | organization_name preset_name template_name
coderd_prebuilt_workspaces_metrics_last_updated | gauge | The unix timestamp when the metrics related to prebuilt workspaces were last updated; these metrics are cached. |
coderd_prebuilt_workspaces_preset_hard_limited | gauge | Indicates whether a given preset has reached the hard failure limit (1 = hard-limited). Metric is omitted otherwise. | organization_name preset_name template_name
coderd_prebuilt_workspaces_reconciliation_paused | gauge | Indicates whether prebuilds reconciliation is currently paused (1 = paused, 0 = not paused). |
coderd_prebuilt_workspaces_resource_replacements_total | counter | Total number of prebuilt workspaces whose resource(s) got replaced upon being claimed. In Terraform, drift on immutable attributes results in resource replacement. This represents a worst-case scenario for prebuilt workspaces because the pre-provisioned resource would have been recreated when claiming, thus obviating the point of pre-provisioning. See https://coder.com/docs/admin/templates/extending-templates/prebuilt-workspaces#preventing-resource-replacement | organization_name preset_name template_name
coderd_prebuilt_workspaces_running | gauge | Current number of prebuilt workspaces that are in a running state. These workspaces have started successfully but may not yet be claimable by users (see coderd_prebuilt_workspaces_eligible). | organization_name preset_name template_name
coderd_prometheusmetrics_agents_execution_seconds | histogram | Histogram for duration of agents metrics collection in seconds. |
coderd_prometheusmetrics_agentstats_execution_seconds | histogram | Histogram for duration of agent stats metrics collection in seconds. |
coderd_prometheusmetrics_metrics_aggregator_execution_cleanup_seconds | histogram | Histogram for duration of metrics aggregator cleanup in seconds. |
coderd_prometheusmetrics_metrics_aggregator_execution_update_seconds | histogram | Histogram for duration of metrics aggregator update in seconds. |
coderd_prometheusmetrics_metrics_aggregator_store_size | gauge | The number of metrics stored in the aggregator |
coderd_provisioner_job_queue_wait_seconds | histogram | Time from job creation to acquisition by a provisioner daemon. | build_reason job_type provisioner_type transition
coderd_provisionerd_job_timings_seconds | histogram | The provisioner job time duration in seconds. | provisioner status
coderd_provisionerd_jobs_current | gauge | The number of currently running provisioner jobs. | provisioner
coderd_provisionerd_num_daemons | gauge | The number of provisioner daemons. |
coderd_provisionerd_workspace_build_timings_seconds | histogram | The time taken for a workspace to build. | status template_name template_version workspace_transition
coderd_proxyhealth_health_check_duration_seconds | histogram | Histogram for duration of proxy health collection in seconds. |
coderd_proxyhealth_health_check_results | gauge | This endpoint returns a number to indicate the health status. -3 (unknown), -2 (Unreachable), -1 (Unhealthy), 0 (Unregistered), 1 (Healthy) | proxy_id
coderd_template_workspace_build_duration_seconds | histogram | Duration from workspace build creation to agent ready, by template. | is_prebuild organization_name status template_name transition
coderd_workspace_builds_enqueued_total | counter | Total number of workspace build enqueue attempts. | build_reason provisioner_type status transition
coderd_workspace_builds_total | counter | The number of workspaces started, updated, or deleted. | status template_name template_version workspace_name workspace_owner workspace_transition
coderd_workspace_creation_duration_seconds | histogram | Time to create a workspace by organization, template, preset, and type (regular or prebuild). | organization_name preset_name template_name type
coderd_workspace_creation_total | counter | Total regular (non-prebuilt) workspace creations by organization, template, and preset. | organization_name preset_name template_name
coderd_workspace_latest_build_status | gauge | The current workspace statuses by template, transition, and owner for all non-deleted workspaces. | status template_name template_version workspace_owner workspace_transition
go_gc_duration_seconds | summary | A summary of the pause duration of garbage collection cycles. |
go_goroutines | gauge | Number of goroutines that currently exist. |
go_info | gauge | Information about the Go environment. | version
go_memstats_alloc_bytes | gauge | Number of bytes allocated and still in use. |
go_memstats_alloc_bytes_total | counter | Total number of bytes allocated, even if freed. |
go_memstats_buck_hash_sys_bytes | gauge | Number of bytes used by the profiling bucket hash table. |
go_memstats_frees_total | counter | Total number of frees. |
go_memstats_gc_sys_bytes | gauge | Number of bytes used for garbage collection system metadata. |
go_memstats_heap_alloc_bytes | gauge | Number of heap bytes allocated and still in use. |
go_memstats_heap_idle_bytes | gauge | Number of heap bytes waiting to be used. |
go_memstats_heap_inuse_bytes | gauge | Number of heap bytes that are in use. |
go_memstats_heap_objects | gauge | Number of allocated objects. |
go_memstats_heap_released_bytes | gauge | Number of heap bytes released to OS. |
go_memstats_heap_sys_bytes | gauge | Number of heap bytes obtained from system. |
go_memstats_last_gc_time_seconds | gauge | Number of seconds since 1970 of last garbage collection. |
go_memstats_lookups_total | counter | Total number of pointer lookups. |
go_memstats_mallocs_total | counter | Total number of mallocs. |
go_memstats_mcache_inuse_bytes | gauge | Number of bytes in use by mcache structures. |
go_memstats_mcache_sys_bytes | gauge | Number of bytes used for mcache structures obtained from system. |
go_memstats_mspan_inuse_bytes | gauge | Number of bytes in use by mspan structures. |
go_memstats_mspan_sys_bytes | gauge | Number of bytes used for mspan structures obtained from system. |
go_memstats_next_gc_bytes | gauge | Number of heap bytes when next garbage collection will take place. |
go_memstats_other_sys_bytes | gauge | Number of bytes used for other system allocations. |
go_memstats_stack_inuse_bytes | gauge | Number of bytes in use by the stack allocator. |
go_memstats_stack_sys_bytes | gauge | Number of bytes obtained from system for stack allocator. |
go_memstats_sys_bytes | gauge | Number of bytes obtained from system. |
go_threads | gauge | Number of OS threads created. |
process_cpu_seconds_total | counter | Total user and system CPU time spent in seconds. |
process_max_fds | gauge | Maximum number of open file descriptors. |
process_open_fds | gauge | Number of open file descriptors. |
process_resident_memory_bytes | gauge | Resident memory size in bytes. |
process_start_time_seconds | gauge | Start time of the process since unix epoch in seconds. |
process_virtual_memory_bytes | gauge | Virtual memory size in bytes. |
process_virtual_memory_max_bytes | gauge | Maximum amount of virtual memory available in bytes. |
promhttp_metric_handler_requests_in_flight | gauge | Current number of scrapes being served. |
promhttp_metric_handler_requests_total | counter | Total number of scrapes by HTTP status code. | code
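The Name and Type columns above correspond to the `# TYPE <name> <type>` comment lines in the exposition output, so a scrape can be summarized by type with standard tools. The following is a minimal sketch; the helper name count_metric_types is hypothetical, and the sample input is a hand-written excerpt containing only metric names from the table, not output captured from a real deployment.

```shell
#!/bin/sh
# count_metric_types is a hypothetical helper: it reads exposition text on
# stdin and prints one "<type> <count>" line per metric type, sorted by type.
# Only "# TYPE <name> <type>" comment lines are counted.
count_metric_types() {
  awk '$1 == "#" && $2 == "TYPE" { n[$4]++ } END { for (t in n) print t, n[t] }' | sort
}

# Illustrative excerpt (assumed input, not real scrape output).
sample='# TYPE coderd_api_active_users_duration_hour gauge
coderd_api_active_users_duration_hour 0
# TYPE coderd_api_requests_processed_total counter
# TYPE go_goroutines gauge'

printf '%s\n' "$sample" | count_metric_types
```

Piping `curl -s http://localhost:2112/` into the same function would give a quick view of how many counters, gauges, histograms, and summaries a given server currently exposes.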

Note on Prometheus native histogram support

The following metrics support native histograms:

  • coderd_workspace_creation_duration_seconds
  • coderd_prebuilt_workspace_claim_duration_seconds
  • coderd_template_workspace_build_duration_seconds

Native histograms are an experimental Prometheus feature that removes the need to predefine bucket boundaries and allows higher-resolution buckets that adapt to deployment characteristics. Whether a metric is exposed as classic or native depends entirely on the Prometheus server configuration (see Prometheus docs for details):

  • If native histograms are enabled, Prometheus ingests the high-resolution histogram.
  • If not, it falls back to the predefined buckets.

⚠️ Important: classic and native histograms cannot be aggregated together. If Prometheus is switched from classic to native at a certain point in time, dashboards may need to account for that transition. For this reason, it’s recommended to follow Prometheus’ migration guidelines when moving from classic to native histograms.