<listitem>
<para>
Enables timing of database I/O calls. This parameter is off by
- default, because it will repeatedly query the operating system for
+ default, as it will repeatedly query the operating system for
the current time, which may cause significant overhead on some
platforms. You can use the <xref linkend="pgtesttiming"/> tool to
measure the overhead of timing on your system.
</listitem>
</varlistentry>
+ <varlistentry id="guc-track-wal-io-timing" xreflabel="track_wal_io_timing">
+ <term><varname>track_wal_io_timing</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>track_wal_io_timing</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables timing of WAL I/O calls. This parameter is off by default,
+ as it will repeatedly query the operating system for the current time,
+ which may cause significant overhead on some platforms.
+ You can use the <application>pg_test_timing</application> tool to
+ measure the overhead of timing on your system.
+ I/O timing information is
+ displayed in <link linkend="monitoring-pg-stat-wal-view">
+ <structname>pg_stat_wal</structname></link>. Only superusers can
+ change this setting.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-track-functions" xreflabel="track_functions">
<term><varname>track_functions</varname> (<type>enum</type>)
<indexterm>
of block read and write times.
</para>
+ <para>
+ The parameter <xref linkend="guc-track-wal-io-timing"/> enables monitoring
+ of WAL write times.
+ </para>
+
<para>
Normally these parameters are set in <filename>postgresql.conf</filename> so
that they apply to all server processes, but it is possible to turn
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_write</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of times WAL buffers were written out to disk via
+ <function>XLogWrite</function> request.
+ See <xref linkend="wal-configuration"/> for more information about
+ the internal WAL function <function>XLogWrite</function>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_sync</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of times WAL files were synced to disk via
+ <function>issue_xlog_fsync</function> request
+ (if <xref linkend="guc-fsync"/> is <literal>on</literal> and
+ <xref linkend="guc-wal-sync-method"/> is either
+ <literal>fdatasync</literal>, <literal>fsync</literal> or
+ <literal>fsync_writethrough</literal>, otherwise zero).
+ See <xref linkend="wal-configuration"/> for more information about
+ the internal WAL function <function>issue_xlog_fsync</function>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_write_time</structfield> <type>double precision</type>
+ </para>
+ <para>
+ Total amount of time spent writing WAL buffers to disk via
+ <function>XLogWrite</function> request, in milliseconds
+ (if <xref linkend="guc-track-wal-io-timing"/> is enabled,
+ otherwise zero). This includes the sync time when
+ <varname>wal_sync_method</varname> is either
+ <literal>open_datasync</literal> or <literal>open_sync</literal>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wal_sync_time</structfield> <type>double precision</type>
+ </para>
+ <para>
+ Total amount of time spent syncing WAL files to disk via
+ <function>issue_xlog_fsync</function> request, in milliseconds
+ (if <varname>track_wal_io_timing</varname> is enabled,
+ <varname>fsync</varname> is <literal>on</literal>, and
+ <varname>wal_sync_method</varname> is either
+ <literal>fdatasync</literal>, <literal>fsync</literal> or
+ <literal>fsync_writethrough</literal>, otherwise zero).
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
<acronym>WAL</acronym> call being logged to the server log. This
option might be replaced by a more general mechanism in the future.
</para>
+
+ <para>
+ There are two internal functions to write WAL data to disk:
+ <function>XLogWrite</function> and <function>issue_xlog_fsync</function>.
+ When <xref linkend="guc-track-wal-io-timing"/> is enabled, the total
+ amounts of time <function>XLogWrite</function> writes and
+ <function>issue_xlog_fsync</function> syncs WAL data to disk are counted as
+ <literal>wal_write_time</literal> and <literal>wal_sync_time</literal> in
+ <xref linkend="pg-stat-wal-view"/>, respectively.
+ <function>XLogWrite</function> is normally called by
+ <function>XLogInsertRecord</function> (when there is no space for the new
+ record in WAL buffers), <function>XLogFlush</function> and the WAL writer,
+ to write WAL buffers to disk and call <function>issue_xlog_fsync</function>.
+ <function>issue_xlog_fsync</function> is normally called by
+ <function>XLogWrite</function> to sync WAL files to disk.
+ If <varname>wal_sync_method</varname> is either
+ <literal>open_datasync</literal> or <literal>open_sync</literal>,
+ a write operation in <function>XLogWrite</function> guarantees to sync written
+ WAL data to disk and <function>issue_xlog_fsync</function> does nothing.
+ If <varname>wal_sync_method</varname> is either <literal>fdatasync</literal>,
+ <literal>fsync</literal>, or <literal>fsync_writethrough</literal>,
+ the write operation moves WAL buffers to kernel cache and
+ <function>issue_xlog_fsync</function> syncs them to disk. Regardless
+ of the setting of <varname>track_wal_io_timing</varname>, the numbers
+ of times <function>XLogWrite</function> writes and
+ <function>issue_xlog_fsync</function> syncs WAL data to disk are also
+ counted as <literal>wal_write</literal> and <literal>wal_sync</literal>
+ in <structname>pg_stat_wal</structname>, respectively.
+ </para>
</sect1>
<sect1 id="wal-internals">
int CommitSiblings = 5; /* # concurrent xacts needed to sleep */
int wal_retrieve_retry_interval = 5000;
int max_slot_wal_keep_size_mb = -1;
+bool track_wal_io_timing = false;
#ifdef WAL_DEBUG
bool XLOG_DEBUG = false;
Size nbytes;
Size nleft;
int written;
+ instr_time start;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
do
{
errno = 0;
+
+ /* Measure I/O timing to write WAL data */
+ if (track_wal_io_timing)
+ INSTR_TIME_SET_CURRENT(start);
+
pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
written = pg_pwrite(openLogFile, from, nleft, startoffset);
pgstat_report_wait_end();
+
+ /*
+ * Increment the I/O timing and the number of times WAL data
+ * were written out to disk.
+ */
+ if (track_wal_io_timing)
+ {
+ instr_time duration;
+
+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ WalStats.m_wal_write_time += INSTR_TIME_GET_MICROSEC(duration);
+ }
+
+ WalStats.m_wal_write++;
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
issue_xlog_fsync(int fd, XLogSegNo segno)
{
char *msg = NULL;
+ instr_time start;
+
+ /*
+ * Quick exit if fsync is disabled or write() has already synced the WAL
+ * file.
+ */
+ if (!enableFsync ||
+ sync_method == SYNC_METHOD_OPEN ||
+ sync_method == SYNC_METHOD_OPEN_DSYNC)
+ return;
+
+ /* Measure I/O timing to sync the WAL file */
+ if (track_wal_io_timing)
+ INSTR_TIME_SET_CURRENT(start);
pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
switch (sync_method)
#endif
case SYNC_METHOD_OPEN:
case SYNC_METHOD_OPEN_DSYNC:
- /* write synced it already */
+ /* not reachable */
+ Assert(false);
break;
default:
elog(PANIC, "unrecognized wal_sync_method: %d", sync_method);
}
pgstat_report_wait_end();
+
+ /*
+ * Increment the I/O timing and the number of times WAL files were synced.
+ */
+ if (track_wal_io_timing)
+ {
+ instr_time duration;
+
+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ WalStats.m_wal_sync_time += INSTR_TIME_GET_MICROSEC(duration);
+ }
+
+ WalStats.m_wal_sync++;
}
/*
w.wal_fpi,
w.wal_bytes,
w.wal_buffers_full,
+ w.wal_write,
+ w.wal_sync,
+ w.wal_write_time,
+ w.wal_sync_time,
w.stats_reset
FROM pg_stat_get_wal() w;
pgstat_send_bgwriter();
/* Send WAL statistics to the stats collector. */
- pgstat_send_wal();
+ pgstat_report_wal();
/*
* If any checkpoint flags have been set, redo the loop to handle the
/*
* WAL usage counters saved from pgWALUsage at the previous call to
- * pgstat_send_wal(). This is used to calculate how much WAL usage
- * happens between pgstat_send_wal() calls, by substracting
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by substracting
* the previous counters from the current ones.
*/
static WalUsage prevWalUsage;
pgstat_send_funcstats();
/* Send WAL statistics */
- pgstat_send_wal();
+ pgstat_report_wal();
/* Finally send SLRU statistics */
pgstat_send_slru();
}
/*
- * Initialize prevWalUsage with pgWalUsage so that pgstat_send_wal() can
+ * Initialize prevWalUsage with pgWalUsage so that pgstat_report_wal() can
* calculate how much pgWalUsage counters are increased by substracting
* prevWalUsage from pgWalUsage.
*/
}
/* ----------
- * pgstat_send_wal() -
+ * pgstat_report_wal() -
+ *
+ * Calculate how much WAL usage counters are increased and send
+ * WAL statistics to the collector.
*
- * Send WAL statistics to the collector
+ * Must be called by processes that generate WAL.
* ----------
*/
void
-pgstat_send_wal(void)
+pgstat_report_wal(void)
{
- /* We assume this initializes to zeroes */
- static const PgStat_MsgWal all_zeroes;
-
WalUsage walusage;
/*
WalStats.m_wal_fpi = walusage.wal_fpi;
WalStats.m_wal_bytes = walusage.wal_bytes;
+ /*
+ * Send WAL stats message to the collector.
+ */
+ if (!pgstat_send_wal(true))
+ return;
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevWalUsage = pgWalUsage;
+}
+
+/* ----------
+ * pgstat_send_wal() -
+ *
+ * Send WAL statistics to the collector.
+ *
+ * If 'force' is not set, WAL stats message is only sent if enough time has
+ * passed since last one was sent to reach PGSTAT_STAT_INTERVAL.
+ *
+ * Return true if the message is sent, and false otherwise.
+ * ----------
+ */
+bool
+pgstat_send_wal(bool force)
+{
+ /* We assume this initializes to zeroes */
+ static const PgStat_MsgWal all_zeroes;
+ static TimestampTz sendTime = 0;
+
/*
* This function can be called even if nothing at all has happened. In
* this case, avoid sending a completely empty message to the stats
* collector.
*/
if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
- return;
+ return false;
+
+ if (!force)
+ {
+ TimestampTz now = GetCurrentTimestamp();
+
+ /*
+ * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
+ * msec since we last sent one.
+ */
+ if (!TimestampDifferenceExceeds(sendTime, now, PGSTAT_STAT_INTERVAL))
+ return false;
+ sendTime = now;
+ }
/*
* Prepare and send the message
pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
pgstat_send(&WalStats, sizeof(WalStats));
- /*
- * Save the current counters for the subsequent calculation of WAL usage.
- */
- prevWalUsage = pgWalUsage;
-
/*
* Clear out the statistics buffer, so it can be re-used.
*/
MemSet(&WalStats, 0, sizeof(WalStats));
+
+ return true;
}
/* ----------
walStats.wal_fpi += msg->m_wal_fpi;
walStats.wal_bytes += msg->m_wal_bytes;
walStats.wal_buffers_full += msg->m_wal_buffers_full;
+ walStats.wal_write += msg->m_wal_write;
+ walStats.wal_sync += msg->m_wal_sync;
+ walStats.wal_write_time += msg->m_wal_write_time;
+ walStats.wal_sync_time += msg->m_wal_sync_time;
}
/* ----------
else if (left_till_hibernate > 0)
left_till_hibernate--;
+ /* Send WAL statistics to the stats collector */
+ pgstat_send_wal(false);
+
/*
* Sleep until we are signaled or WalWriterDelay has elapsed. If we
* haven't done anything useful for quite some time, lengthen the
Datum
pg_stat_get_wal(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_WAL_COLS 5
+#define PG_STAT_GET_WAL_COLS 9
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS];
bool nulls[PG_STAT_GET_WAL_COLS];
NUMERICOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_write",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_sync",
+ INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write_time",
+ FLOAT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_sync_time",
+ FLOAT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
Int32GetDatum(-1));
values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ values[4] = Int64GetDatum(wal_stats->wal_write);
+ values[5] = Int64GetDatum(wal_stats->wal_sync);
+
+ /* Convert counters from microsec to millisec for display */
+ values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
+
+ values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
false,
NULL, NULL, NULL
},
+ {
+ {"track_wal_io_timing", PGC_SUSET, STATS_COLLECTOR,
+ gettext_noop("Collects timing statistics for WAL I/O activity."),
+ NULL
+ },
+ &track_wal_io_timing,
+ false,
+ NULL, NULL, NULL
+ },
{
{"update_process_title", PGC_SUSET, PROCESS_TITLE,
#track_activities = on
#track_counts = on
#track_io_timing = off
+#track_wal_io_timing = off
#track_functions = none # none, pl, all
#track_activity_query_size = 1024 # (change requires restart)
#stats_temp_directory = 'pg_stat_tmp'
extern char *PrimaryConnInfo;
extern char *PrimarySlotName;
extern bool wal_receiver_create_temp_slot;
+extern bool track_wal_io_timing;
/* indirectly set via GUC system */
extern TransactionId recoveryTargetXid;
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202103092
+#define CATALOG_VERSION_NO 202103093
#endif
{ oid => '1136', descr => 'statistics: information about WAL activity',
proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => '',
- proallargtypes => '{int8,int8,numeric,int8,timestamptz}',
- proargmodes => '{o,o,o,o,o}',
- proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ proallargtypes => '{int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o}',
+ proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
{ oid => '2306', descr => 'statistics: information about SLRU caches',
PgStat_Counter m_wal_fpi;
uint64 m_wal_bytes;
PgStat_Counter m_wal_buffers_full;
+ PgStat_Counter m_wal_write;
+ PgStat_Counter m_wal_sync;
+ PgStat_Counter m_wal_write_time; /* time spent writing wal records in
+ * microseconds */
+ PgStat_Counter m_wal_sync_time; /* time spent syncing wal records in
+ * microseconds */
} PgStat_MsgWal;
/* ----------
* ------------------------------------------------------------
*/
-#define PGSTAT_FILE_FORMAT_ID 0x01A5BCA0
+#define PGSTAT_FILE_FORMAT_ID 0x01A5BCA1
/* ----------
* PgStat_StatDBEntry The collector's data per database
PgStat_Counter wal_fpi;
uint64 wal_bytes;
PgStat_Counter wal_buffers_full;
+ PgStat_Counter wal_write;
+ PgStat_Counter wal_sync;
+ PgStat_Counter wal_write_time;
+ PgStat_Counter wal_sync_time;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
extern void pgstat_send_archiver(const char *xlog, bool failed);
extern void pgstat_send_bgwriter(void);
-extern void pgstat_send_wal(void);
+extern void pgstat_report_wal(void);
+extern bool pgstat_send_wal(bool force);
/* ----------
* Support functions for the SQL-callable functions to
w.wal_fpi,
w.wal_bytes,
w.wal_buffers_full,
+ w.wal_write,
+ w.wal_sync,
+ w.wal_write_time,
+ w.wal_sync_time,
w.stats_reset
- FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, stats_reset);
+ FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_sync, wal_write_time, wal_sync_time, stats_reset);
pg_stat_wal_receiver| SELECT s.pid,
s.status,
s.receive_start_lsn,