From 93ee38eade1b2b4964354b95b01b09e17d6f098d Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sat, 29 Feb 2020 12:10:17 -0800
Subject: Teach pageinspect about nbtree deduplication.
Add a new bt_metap() column to display the metapage's allequalimage
field. Also add three new columns to contrib/pageinspect's
bt_page_items() function:
* Add a boolean column ("dead") that displays the LP_DEAD bit value for
each non-pivot tuple.
* Add a TID column ("htid") that displays a single heap TID value for
each tuple. This is the TID that is returned by BTreeTupleGetHeapTID(),
so comparable values are shown for pivot tuples, plain non-pivot tuples,
and posting list tuples.
* Add a TID array column ("tids") that displays TIDs from each tuple's
posting list, if any. This works just like the "tids" column from
pageinspect's gin_leafpage_items() function.
No version bump for the pageinspect extension, since there hasn't been a
stable Postgres release since the last version bump (the last bump was
part of commit 58b4cb30).
Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzmSMmU2eNvY9+a4MNP+z02h6sa-uxZvN3un6jY02ZVBSw@mail.gmail.com
---
doc/src/sgml/pageinspect.sgml | 120 ++++++++++++++++++++++++++++++------------
1 file changed, 85 insertions(+), 35 deletions(-)
(limited to 'doc/src')
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index 7e2e1487d79..fedc94e8b22 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -300,13 +300,14 @@ test=# SELECT t_ctid, raw_flags, combined_flags
test=# SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----------+-------
magic | 340322
-version | 3
+version | 4
root | 1
level | 0
fastroot | 1
fastlevel | 0
oldest_xact | 582
last_cleanup_num_tuples | 1000
+allequalimage | f
@@ -329,11 +330,11 @@ test=# SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
-live_items | 256
+live_items | 224
dead_items | 0
-avg_item_size | 12
+avg_item_size | 16
page_size | 8192
-free_size | 4056
+free_size | 3668
btpo_prev | 0
btpo_next | 0
btpo | 0
@@ -356,33 +357,75 @@ btpo_flags | 3
bt_page_items returns detailed information about
all of the items on a B-tree index page. For example:
-test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
- itemoffset | ctid | itemlen | nulls | vars | data
-------------+---------+---------+-------+------+-------------
- 1 | (0,1) | 12 | f | f | 23 27 00 00
- 2 | (0,2) | 12 | f | f | 24 27 00 00
- 3 | (0,3) | 12 | f | f | 25 27 00 00
- 4 | (0,4) | 12 | f | f | 26 27 00 00
- 5 | (0,5) | 12 | f | f | 27 27 00 00
- 6 | (0,6) | 12 | f | f | 28 27 00 00
- 7 | (0,7) | 12 | f | f | 29 27 00 00
- 8 | (0,8) | 12 | f | f | 2a 27 00 00
+ test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids
+ FROM bt_page_items(get_raw_page('tenk2_hundred', 5));
+ itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids
+------------+-----------+---------+-------+------+-------------------------+------+--------+---------------------
+ 1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | |
+ 2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"}
+ 3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"}
+ 4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"}
+ 5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"}
+ 6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"}
+ 7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"}
+ 8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"}
+ 9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"}
+ 10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"}
+ 11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"}
+ 12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"}
+ 13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"}
+(13 rows)
- In a B-tree leaf page, ctid points to a heap tuple.
- In an internal page, the block number part of ctid
- points to another page in the index itself, while the offset part
- (the second number) is ignored and is usually 1.
+ This is a B-tree leaf page. All tuples that point to the table
+ happen to be posting list tuples (all of which store a total of
+ 100 6 byte TIDs). There is also a high key
tuple
+ at itemoffset number 1.
+ ctid is used to store encoded
+ information about each tuple in this example, though leaf page
+ tuples often store a heap TID directly in the
+ ctid field instead.
+ tids is the list of TIDs stored as a
+ posting list.
+
+
+ In an internal page (not shown), the block number part of
+ ctid is a downlink
,
+ which is a block number of another page in the index itself.
+ The offset part (the second number) of
+ ctid stores encoded information about
+ the tuple, such as the number of columns present (suffix
+ truncation may have removed unneeded suffix columns). Truncated
+ columns are treated as having the value minus
+ infinity
.
+
+
+ htid shows a heap TID for the tuple,
+ regardless of the underlying tuple representation. This value
+ may match ctid, or may be decoded
+ from the alternative representations used by posting list tuples
+ and tuples from internal pages. Tuples in internal pages
+ usually have the implementation level heap TID column truncated
+ away, which is represented as a NULL
+ htid value.
Note that the first item on any non-rightmost page (any page with
a non-zero value in the btpo_next field) is the
page's high key
, meaning its data
serves as an upper bound on all items appearing on the page, while
- its ctid field is meaningless. Also, on non-leaf
- pages, the first real data item (the first item that is not a high
- key) is a minus infinity
item, with no actual value
- in its data field. Such an item does have a valid
- downlink in its ctid field, however.
+ its ctid field does not point to
+ another block. Also, on internal pages, the first real data
+ item (the first item that is not a high key) reliably has every
+ column truncated away, leaving no actual value in its
+ data field. Such an item does have a
+ valid downlink in its ctid field,
+ however.
+
+
+ For more details about the structure of B-tree indexes, see
+ . For more details about
+ deduplication and posting lists, see .
@@ -402,17 +445,24 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
with get_raw_page should be passed as argument. So
the last example could also be rewritten like this:
-test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
- itemoffset | ctid | itemlen | nulls | vars | data
-------------+---------+---------+-------+------+-------------
- 1 | (0,1) | 12 | f | f | 23 27 00 00
- 2 | (0,2) | 12 | f | f | 24 27 00 00
- 3 | (0,3) | 12 | f | f | 25 27 00 00
- 4 | (0,4) | 12 | f | f | 26 27 00 00
- 5 | (0,5) | 12 | f | f | 27 27 00 00
- 6 | (0,6) | 12 | f | f | 28 27 00 00
- 7 | (0,7) | 12 | f | f | 29 27 00 00
- 8 | (0,8) | 12 | f | f | 2a 27 00 00
+ test=# SELECT itemoffset, ctid, itemlen, nulls, vars, data, dead, htid, tids[0:2] AS some_tids
+ FROM bt_page_items(get_raw_page('tenk2_hundred', 5));
+ itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | some_tids
+------------+-----------+---------+-------+------+-------------------------+------+--------+---------------------
+ 1 | (16,1) | 16 | f | f | 30 00 00 00 00 00 00 00 | | |
+ 2 | (16,8292) | 616 | f | f | 24 00 00 00 00 00 00 00 | f | (1,6) | {"(1,6)","(10,22)"}
+ 3 | (16,8292) | 616 | f | f | 25 00 00 00 00 00 00 00 | f | (1,18) | {"(1,18)","(4,22)"}
+ 4 | (16,8292) | 616 | f | f | 26 00 00 00 00 00 00 00 | f | (4,18) | {"(4,18)","(6,17)"}
+ 5 | (16,8292) | 616 | f | f | 27 00 00 00 00 00 00 00 | f | (1,2) | {"(1,2)","(1,19)"}
+ 6 | (16,8292) | 616 | f | f | 28 00 00 00 00 00 00 00 | f | (2,24) | {"(2,24)","(4,11)"}
+ 7 | (16,8292) | 616 | f | f | 29 00 00 00 00 00 00 00 | f | (2,17) | {"(2,17)","(11,2)"}
+ 8 | (16,8292) | 616 | f | f | 2a 00 00 00 00 00 00 00 | f | (0,25) | {"(0,25)","(3,20)"}
+ 9 | (16,8292) | 616 | f | f | 2b 00 00 00 00 00 00 00 | f | (0,10) | {"(0,10)","(0,14)"}
+ 10 | (16,8292) | 616 | f | f | 2c 00 00 00 00 00 00 00 | f | (1,3) | {"(1,3)","(3,9)"}
+ 11 | (16,8292) | 616 | f | f | 2d 00 00 00 00 00 00 00 | f | (6,28) | {"(6,28)","(11,1)"}
+ 12 | (16,8292) | 616 | f | f | 2e 00 00 00 00 00 00 00 | f | (0,27) | {"(0,27)","(1,13)"}
+ 13 | (16,8292) | 616 | f | f | 2f 00 00 00 00 00 00 00 | f | (4,17) | {"(4,17)","(4,21)"}
+(13 rows)
All the other details are the same as explained in the previous item.
--
cgit v1.2.3