| Lists: | pgsql-bugs |
|---|
| From: | skoposov(at)cmu(dot)edu |
|---|---|
| To: | pgsql-bugs(at)postgresql(dot)org |
| Subject: | BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-06-29 16:16:37 |
| Message-ID: | 20170629161637.1478.93109@wrigleys.postgresql.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 14722
Logged by: Sergey Koposov
Email address: skoposov(at)cmu(dot)edu
PostgreSQL version: 9.5.7
Operating system: Debian 7.11, x86_64
Description:
Hi,
I have a very large table (40e9 records) that I'm trying to create the index
on and I am getting a segmentation fault that could be traced as far as I
understand to a 32 bit int overflow in tuplesort_heap_siftup
Here are the commands leading to the crash:
wsdb=# set maintenance_work_mem to '70GB';
SET
wsdb=# create index on cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));
----
Importantly the table has already been sorted by q3c_ang2ipix(ra,dec) !
--
Here is the table info:
wsdb=# explain select * from cgonzal.vvv_single_ks_sorted;
QUERY PLAN
---------------------------------------------------------------------------------------
Seq Scan on vvv_single_ks_sorted (cost=0.00..968967342.13 rows=43362626913
width=72)
(1 row)
wsdb=# \d cgonzal.vvv_single_ks_sorted
Table "cgonzal.vvv_single_ks_sorted"
Column | Type | Modifiers
---------+------------------+-----------
objid | bigint |
ra | double precision |
dec | double precision |
x | real |
y | real |
chip | integer |
mag | real |
e_mag | real |
class | integer |
frameid | bigint |
zp | double precision |
obj_id | bigint |
--------
Here is the gdb full stacktrace:
(gdb) bt full
#0 0x0000000000914cf8 in tuplesort_heap_siftup (state=0x23503f8,
checkIndex=1 '\001') at tuplesort.c:3014
j = -1879048193
memtuples = 0x7fb283aa1048
tuple = 0x7fba03aa0fd0
i = 1207959551
n = 1342177275
#1 0x000000000091430a in dumptuples (state=0x23503f8, alltuples=0 '\000')
at tuplesort.c:2648
__func__ = "dumptuples"
#2 0x00000000009120a3 in puttuple_common (state=0x23503f8,
tuple=0x7ffe420fefc0) at tuplesort.c:1468
__func__ = "puttuple_common"
#3 0x0000000000911d85 in tuplesort_putindextuplevalues (state=0x23503f8,
rel=0x7fd040f3b8e0, self=0x234ba34, values=0x7ffe420ff360,
isnull=0x7ffe420ff340 "") at tuplesort.c:1321
oldcontext = 0x23340b8
stup = {tuple = 0x7fbf040f6ae8, datum1 = 4710889527840951089,
isnull1 = 0 '\000', tupindex = 0}
original = 4710889527840951089
tuple = 0x7fbf040f6ae8
#4 0x00000000004d26dd in _bt_spool (btspool=0x234cba0, self=0x234ba34,
values=0x7ffe420ff360, isnull=0x7ffe420ff340 "") at nbtsort.c:192
No locals.
#5 0x00000000004cba67 in btbuildCallback (index=0x7fd040f3b8e0,
htup=0x234ba30, values=0x7ffe420ff360, isnull=0x7ffe420ff340 "",
tupleIsAlive=1 '\001', state=0x7ffe420ff550) at nbtree.c:179
buildstate = 0x7ffe420ff550
#6 0x0000000000525d8e in IndexBuildHeapRangeScan
(heapRelation=0x7fd040f32f78, indexRelation=0x7fd040f3b8e0,
indexInfo=0x2348308,
allow_sync=1 '\001', anyvisible=0 '\000', start_blockno=0,
numblocks=4294967295, callback=0x4cba0a <btbuildCallback>,
callback_state=0x7ffe420ff550) at index.c:2591
tupleIsAlive = 1 '\001'
is_system_catalog = 0 '\000'
checking_uniqueness = 0 '\000'
scan = 0x234b9e8
heapTuple = 0x234ba30
values = {4710889527840951089, 9472000, 36863416, 1089733344,
140730006762416, 9195433, 140730006762448, 140532419658520, 140730006762528,
140532419658464, 140730006762448, 9261444, 1976, 140532419658520,
4999282, 128, 36962306, 17179869199, 140730006762544, 9473335, 37029384,
37020152, 18288211008, 9498080, 37029368, 37020152,
140730006762592, 9478487, 140730006762624, 37029384, 64, 37020152}
isnull =
"\000\314\366(at)\320\177\000\000'Z\216\000\000\000\000\000\030\003\364(at)\320\177\000\000\310A3\002\000\000\000"
reltuples = 1342177279
predicate = 0x0
slot = 0x2348e08
estate = 0x2358448
---Type <return> to continue, or q <return> to quit---
econtext = 0x2358558
snapshot = 0xd366e0
OldestXmin = 1148880660
root_blkno = 16570089
root_offsets = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 0 <repeats 210
times>}
__func__ = "IndexBuildHeapRangeScan"
#7 0x0000000000525556 in IndexBuildHeapScan (heapRelation=0x7fd040f32f78,
indexRelation=0x7fd040f3b8e0, indexInfo=0x2348308, allow_sync=1 '\001',
callback=0x4cba0a <btbuildCallback>, callback_state=0x7ffe420ff550) at
index.c:2162
No locals.
#8 0x00000000004cb979 in btbuild (fcinfo=0x7ffe420ff5d0) at nbtree.c:121
heap = 0x7fd040f32f78
index = 0x7fd040f3b8e0
indexInfo = 0x2348308
result = 0x234be28
reltuples = 6.9529861680561111e-310
buildstate = {isUnique = 0 '\000', haveDead = 0 '\000', heapRel =
0x7fd040f32f78, spool = 0x234cba0, spool2 = 0x0, indtuples = 1342177278}
__func__ = "btbuild"
#9 0x00000000008e8a13 in OidFunctionCall3Coll (functionId=338, collation=0,
arg1=140532419604344, arg2=140532419639520, arg3=36995848) at fmgr.c:1649
flinfo = {fn_addr = 0x4cb854 <btbuild>, fn_oid = 338, fn_nargs = 3,
fn_strict = 1 '\001', fn_retset = 0 '\000', fn_stats = 2 '\002',
fn_extra = 0x0, fn_mcxt = 0x23340b8, fn_expr = 0x0}
fcinfo = {flinfo = 0x7ffe420ff980, context = 0x0, resultinfo = 0x0,
fncollation = 0, isnull = 0 '\000', nargs = 3, arg = {140532419604344,
140532419639520, 36995848, 140532419656080, 68756505104, 128,
13, 17179869199, 140730006763184, 9472170, 128, 36023424, 140730006763152,
17189342519, 140532419627520, 36913336, 36023424, 37017440,
37017424, 36913336, 140730006763200, 512, 1108342496, 25769803839,
140730006763248, 9473335, 140532419653944, 36023424,
26878146304, 6912158, 140532419653928, 36023424, 140730006763296, 9478487,
140730006763296, 140532419653944, 0, 36023424, 140730006763328,
9230214, 672953898141726960, 140532419653944, 140730006763792, 9231509,
10999411261461, 140532419639520, 70458938492543, 156684292, 0,
18446744069414584320, 65536, 0, 140532419654472, 140532419654744,
672953910093598724, 16405, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
140532419656112, 140532419656056, 140532419657096, 37019880, 37027816,
37028344,
37028368, 37028392, 37028416, 37028584, 0, 0, 0, 0, 0, 0,
37028560, 0, 0, 0, 0, 36491160, 0, 0, 0, 13854912, 140532419640320,
8626848200,
36633224, 8589934592, 140730006763808, 6798261,
672953909936914436, 13854912},
argnull =
"\000\000\000\000\000\000\000\000\240h\323\000\000\000\000\000`\371\017B\376\177\000\000\032\274g\000\000\000\000\000\240h\323\000\
000\000\000\000\023\000\000\000\016\000\000\000\300\371\017B\376\177\000\000\aۍ",
'\000' <repeats 13 times>, "hD\224\000\000\000\000\000p<\224\000\000\
000\000\000\331\a\000\000\016\000\000\000\260\371\017B"}
result = 42949672962
__func__ = "OidFunctionCall3Coll"
#10 0x00000000005252a3 in index_build (heapRelation=0x7fd040f32f78,
indexRelation=0x7fd040f3b8e0, indexInfo=0x2348308, isprimary=0 '\000',
---Type <return> to continue, or q <return> to quit---
isreindex=0 '\000') at index.c:2025
procedure = 338
stats = 0x234cfec
save_userid = 10
save_sec_context = 0
save_nestlevel = 2
__func__ = "index_build"
#11 0x0000000000523f98 in index_create (heapRelation=0x7fd040f32f78,
indexRelationName=0x234b8e8 "vvv_single_ks_sorted_q3c_ang2ipix_idx",
indexRelationId=156684292, relFileNode=0, indexInfo=0x2348308,
indexColNames=0x234b638, accessMethodObjectId=403, tableSpaceId=0,
collationObjectId=0x234bdf8, classObjectId=0x234be10,
coloptions=0x234be28, reloptions=0, isprimary=0 '\000', isconstraint=0
'\000',
deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0
'\000', skip_build=0 '\000', concurrent=0 '\000', is_internal=0 '\000',
if_not_exists=0 '\000') at index.c:1100
heapRelationId = 156673270
pg_class = 0x7fd040f81208
indexRelation = 0x7fd040f3b8e0
indexTupDesc = 0x23486c8
shared_relation = 0 '\000'
mapped_relation = 0 '\000'
is_exclusion = 0 '\000'
[120/270]
namespaceId = 16842
i = 1
relpersistence = 112 'p'
__func__ = "index_create"
#12 0x00000000005e9d27 in DefineIndex (relationId=156673270, stmt=0x23485f8,
indexRelationId=0, is_alter_table=0 '\000', check_rights=1 '\001',
skip_build=0 '\000', quiet=0 '\000') at indexcmds.c:607
indexRelationName = 0x234b8e8
"vvv_single_ks_sorted_q3c_ang2ipix_idx"
accessMethodName = 0x2348930 "btree"
typeObjectId = 0x234b780
collationObjectId = 0x234bdf8
classObjectId = 0x234be10
accessMethodId = 403
namespaceId = 16842
tablespaceId = 0
indexColNames = 0x234b638
rel = 0x7fd040f32f78
indexRelation = 0x23340b8
tuple = 0x7fd040f39b30
---Type <return> to continue, or q <return> to quit---
accessMethodForm = 0x7fd040f39ba8
amcanorder = 1 '\001'
amoptions = 2785
reloptions = 0
coloptions = 0x234be28
indexInfo = 0x2348308
numberOfAttributes = 1
limitXmin = 0
old_snapshots = 0x7fd040f32f78
address = {classId = 36997560, objectId = 0, objectSubId =
36995848}
n_old_snapshots = 0
heaprelid = {relId = 1108343952, dbId = 32766}
heaplocktag = {locktag_field1 = 4657712, locktag_field2 = 0,
locktag_field3 = 1108347536, locktag_field4 = 32766, locktag_type = 0
'\000',
locktag_lockmethodid = 0 '\000'}
lockmode = 5
snapshot = 0x2348308
i = 0
__func__ = "DefineIndex"
#13 0x00000000007ab5ec in ProcessUtilitySlow (parsetree=0x230c138,
queryString=0x230b268 "create index on cgonzal.vvv_single_ks_sorted
(q3c_ang2ipix(ra,dec));", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
dest=0x230c4d8, completionTag=0x7ffe42100420 "") at utility.c:1259
stmt = 0x23485f8
relid = 156673270
lockmode = 5
save_exception_stack = 0x7ffe421002e0
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {0, 8080871256505359237, 4657712,
140730006768272, 0, 0, 8080871325866564485, -8081285932728411259},
__mask_was_saved = 0, __saved_mask = {__val = {64, 36632424,
140730006765464, 140730006765472, 13829056, 8192, 36973152, 4657712, 5,
140730006765328, 9476353, 64, 0, 36973248, 13829056,
64}}}}
isTopLevel = 1 '\001'
isCompleteQuery = 1 '\001'
needCleanup = 0 '\000'
commandCollected = 0 '\000'
address = {classId = 0, objectId = 0, objectSubId = 13829056}
secondaryObject = {classId = 0, objectId = 0, objectSubId = 0}
__func__ = "ProcessUtilitySlow"
#14 0x00000000007aaa16 in standard_ProcessUtility (parsetree=0x230c138,
---Type <return> to continue, or q <return> to quit---
queryString=0x230b268 "create index on cgonzal.vvv_single_ks_sorted
(q3c_ang2ipix(ra,dec));", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
dest=0x230c4d8, completionTag=0x7ffe42100420 "") at utility.c:892
isTopLevel = 1 '\001'
__func__ = "standard_ProcessUtility"
#15 0x00000000007a9beb in ProcessUtility (parsetree=0x230c138,
queryString=0x230b268 "create index on cgonzal.vvv_single_ks_sorted
(q3c_ang2ipix(ra,dec));", context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
dest=0x230c4d8, completionTag=0x7ffe42100420 "") at utility.c:334
No locals.
#16 0x00000000007a8e07 in PortalRunUtility (portal=0x2278798,
utilityStmt=0x230c138, isTopLevel=1 '\001', dest=0x230c4d8,
completionTag=0x7ffe42100420 "") at pquery.c:1183
active_snapshot_set = 1 '\001'
__func__ = "PortalRunUtility"
#17 0x00000000007a8fae in PortalRunMulti (portal=0x2278798, isTopLevel=1
'\001', dest=0x230c4d8, altdest=0x230c4d8,
completionTag=0x7ffe4210042[50/270]
at pquery.c:1314
stmt = 0x230c138
active_snapshot_set = 0 '\000'
stmtlist_item = 0x230c488
#18 0x00000000007a85c2 in PortalRun (portal=0x2278798,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x230c4d8,
altdest=0x230c4d8,
completionTag=0x7ffe42100420 "") at pquery.c:812
save_exception_stack = 0x7ffe42100560
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {0, 8080871256352267141, 4657712,
140730006768272, 0, 0, 8080871256442444677, -8081285932000961659},
__mask_was_saved = 0, __saved_mask = {__val = {3432, 9356099,
36745776, 13, 0, 140730006766512, 9477730, 36624768, 88, 0, 36750640, 88,
9359107, 36750552, 36750640, 0}}}}
result = 0 '\000'
nprocessed = 32766
saveTopTransactionResourceOwner = 0x22ef878
saveTopTransactionContext = 0x22ef768
saveActivePortal = 0x0
saveResourceOwner = 0x22ef878
savePortalContext = 0x0
saveMemoryContext = 0x22ef768
__func__ = "PortalRun"
#19 0x00000000007a2ac3 in exec_simple_query (query_string=0x230b268 "create
index on cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));")
[29/270]
at postgres.c:1104
parsetree = 0x230c138
portal = 0x2278798
---Type <return> to continue, or q <return> to quit---
snapshot_set = 0 '\000'
commandTag = 0xa4fc46 "CREATE INDEX"
completionTag =
"\000\004\020B\376\177\000\000\243b\217\000\000\000\000\000p\004\020B\376\177\000\000\000
\000\000D\000\000\000p\004\020B\376\1
77\000\000\252i\217\000\000\000\000\000\002\000\000\000\002\000\000\000J\000\000\000\000\000\000"
querytree_list = 0x230c458
plantree_list = 0x230c4a8
receiver = 0x230c4d8
format = 0
dest = DestRemote
oldcontext = 0x22ef768
parsetree_list = 0x230c1e8
parsetree_item = 0x230c1c8
save_log_statement_stats = 0 '\000'
was_logged = 0 '\000'
isTopLevel = 1 '\001'
msec_str =
"\260\004\020B\376\177\000\000\177:\217\000\000\000\000\000\006\000\000\000D\000\000\000h\262\060\002\000\000\000"
__func__ = "exec_simple_query"
#20 0x00000000007a69b2 in PostgresMain (argc=1, argv=0x225a220,
dbname=0x225a0d8 "wsdb", username=0x225a0b8 "postgres") at postgres.c:4051
query_string = 0x230b268 "create index on
cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));"
firstchar = 81
input_message = {data = 0x230b268 "create index on
cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));", len = 69, maxlen =
1024,
cursor = 69}
local_sigjmp_buf = {{__jmpbuf = {0, 8080871256293546885, 4657712,
140730006768272, 0, 0, 8080871256322907013, -8081285930984760443},
__mask_was_saved = 1, __saved_mask = {__val = {0, 36017624, 0,
0, 0, 0, 1024, 0, 30064771199, 140730006767152, 9473335, 36150008,
36017624, 30064771088, 36150008, 36149992}}}}
send_ready_for_query = 0 '\000'
__func__ = "PostgresMain"
#21 0x0000000000732973 in BackendRun (port=0x22a3050) at postmaster.c:4255
av = 0x225a220
maxac = 2
ac = 1
secs = 552065929
usecs = 554900
i = 1
__func__ = "BackendRun"
#22 0x0000000000732106 in BackendStartup (port=0x22a3050) at
postmaster.c:3929
bn = 0x22a3230
---Type <return> to continue, or q <return> to quit---
pid = 0
__func__ = "BackendStartup"
#23 0x000000000072ea84 in ServerLoop () at postmaster.c:1699
port = 0x22a3050
i = 4
rmask = {fds_bits = {128, 0 <repeats 15 times>}}
selres = 1
now = 1498750719
readmask = {fds_bits = {248, 0 <repeats 15 times>}}
nSockets = 8
last_lockfile_recheck_time = 1498750679
last_touch_time = 1498750679
__func__ = "ServerLoop"
#24 0x000000000072e100 in PostmasterMain (argc=3, argv=0x2259310) at
postmaster.c:1307
opt = -1
status = 0
userDoption = 0x227ad40 "/mnt/bigdata/pgdata9.5"
listen_addr_saved = 1 '\001'
i = 64
output_config_variable = 0x0
__func__ = "PostmasterMain"
#25 0x000000000068ecda in main (argc=3, argv=0x2259310) at main.c:228
do_check_root = 1 '\001'
----
From a quick look of the code it looks to me that the reason for the bug is
the 32 bit int overflow in the j=2*i+1 calculation inside the
tuplesort_heap_siftup leading to negative values of j.
Regards,
Sergey Koposov
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | skoposov(at)cmu(dot)edu |
| Cc: | PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-06-29 16:34:40 |
| Message-ID: | CAH2-Wzmdygq8tSN+qKtG+QFeRSzSLj2byT_oNqVOjzJN4+1H6Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Thu, Jun 29, 2017 at 9:16 AM, <skoposov(at)cmu(dot)edu> wrote:
> I have a very large table (40e9 records) that I'm trying to create the index
> on and I am getting a segmentation fault that could be traced as far as I
> understand to a 32 bit int overflow in tuplesort_heap_siftup
Can your print the Tuplesortstate (the variable "state") within GDB,
and post it here?
--
Peter Geoghegan
| From: | Sergey Koposov <skoposov(at)cmu(dot)edu> |
|---|---|
| To: | "pg(at)bowt(dot)ie" <pg(at)bowt(dot)ie> |
| Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-06-29 16:50:31 |
| Message-ID: | 1498755031.21587.73.camel@cmu.edu |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Thu, 2017-06-29 at 09:34 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM, <skoposov(at)cmu(dot)edu> wrote:
> > I have a very large table (40e9 records) that I'm trying to create the index
> > on and I am getting a segmentation fault that could be traced as far as I
> > understand to a 32 bit int overflow in tuplesort_heap_siftup
>
> Can your print the Tuplesortstate (the variable "state") within GDB,
> and post it here?
Here it is (it is a different run, as I closed the previous session,
but the bug is 100% reproduceable).
Program received signal SIGSEGV, Segmentation fault.
0x0000000000914cf8 in tuplesort_heap_siftup (state=0x234ffe8,
checkIndex=1 '\001') at tuplesort.c:3014
3014 HEAPCOMPARE(&memtuples[j], &memtuples[j
+ 1]) > 0)
(gdb) print (state)
$1 = (Tuplesortstate *) 0x234ffe8
(gdb) print (*state)
$2 = {status = TSS_BUILDRUNS, nKeys = 1, randomAccess = 0 '\000',
bounded = 0 '\000', boundUsed = 0 '\000', bound = 0, availMem =
-6442450776,
allowedMem = 75161927680, maxTapes = 262144, tapeRange = 262143,
sortcontext = 0x234e5c8, tapeset = 0x7fbf032a3048,
comparetup = 0x918d8a <comparetup_index_btree>, copytup = 0x919bbc
<copytup_index>, writetup = 0x91a243 <writetup_index>,
readtup = 0x91a31f <readtup_index>, memtuples = 0x7fb283aa1048,
memtupcount = 1342177275, memtupsize = 1342177279, growmemtuples = 0
'\000',
currentRun = 0, mergeactive = 0x7fc80aab9048 "", mergenext =
0x7fc80a23e048, mergelast = 0x7fc809ebc048, mergeavailslots =
0x7fc809dbb048,
mergeavailmem = 0x7fc8096ba048, mergefreelist = 0, mergefirstfree = 0,
Level = 1, destTape = 0, tp_fib = 0x7fc8095b9048, tp_runs =
0x7fc8094b8048,
tp_dummy = 0x7fc8093b7048, tp_tapenum = 0x7fc8081b7048, activeTapes =
0, result_tape = -1, current = 0, eof_reached = 0 '\000', markpos_block
= 0,
markpos_offset = 0, markpos_eof = 0 '\000', tupDesc = 0x0, sortKeys =
0x2350288, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0, estate =
0x0,
heapRel = 0x7fd040f32f78, indexRel = 0x7fd040f3b8e0, enforceUnique = 0
'\000', hash_mask = 0, datumType = 0, datumTypeLen = 0,
datumTypeByVal = 0 '\000', ru_start = {tv = {tv_sec = 0, tv_usec = 0},
ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0,
tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0,
ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock =
0,
ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0,
ru_nvcsw = 0, ru_nivcsw = 0}}}
Regards,
Sergey
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | skoposov(at)cmu(dot)edu |
| Cc: | PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-06-29 17:00:41 |
| Message-ID: | CAH2-Wz=wjX9ADNR6V6eoTqC0J0FZo+VxmU2PPryMJKjZW=8wxg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Thu, Jun 29, 2017 at 9:16 AM, <skoposov(at)cmu(dot)edu> wrote:
> From a quick look of the code it looks to me that the reason for the bug is
> the 32 bit int overflow in the j=2*i+1 calculation inside the
> tuplesort_heap_siftup leading to negative values of j.
It seems likely that the explanation is as simple as that. This
happens during run generation with replacement selection. All versions
are affected, but version 9.6+ is dramatically less likely to be
affected, because replacement selection was all but killed in Postgres
9.6.
This is an oversight in commit 263865a. The fix is to use a variable
that won't overflow in tuplesort_heap_siftup() -- this is probably a
one-liner, because when the variable overflows today, the correct
behavior would be for control to break out of the loop that declares
the overflowing variable "j", and, I don't see any similar problem in
other heap maintenance routines. It's a very isolated problem.
I could write a patch.
--
Peter Geoghegan
| From: | Sergey Koposov <skoposov(at)cmu(dot)edu> |
|---|---|
| To: | "pg(at)bowt(dot)ie" <pg(at)bowt(dot)ie> |
| Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-06-29 17:50:36 |
| Message-ID: | 1498758636.21587.82.camel@cmu.edu |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM, <skoposov(at)cmu(dot)edu> wrote:
> > From a quick look of the code it looks to me that the reason for the bug is
> > the 32 bit int overflow in the j=2*i+1 calculation inside the
> > tuplesort_heap_siftup leading to negative values of j.
>
> It seems likely that the explanation is as simple as that. This
> happens during run generation with replacement selection. All versions
> are affected, but version 9.6+ is dramatically less likely to be
> affected, because replacement selection was all but killed in Postgres
> 9.6.
>
> This is an oversight in commit 263865a. The fix is to use a variable
> that won't overflow in tuplesort_heap_siftup() -- this is probably a
> one-liner, because when the variable overflows today, the correct
> behavior would be for control to break out of the loop that declares
> the overflowing variable "j", and, I don't see any similar problem in
> other heap maintenance routines. It's a very isolated problem.
>
> I could write a patch.
For the time being I've just changed the type of i,j from int to long
(or int64) and I am running the index creation now. I let you submit a
patch -- thanks in advance.
I also noticed that the availMem variable was negative in the printout
of the TupleSortState.
availMem =-6442450776,
I don't know whether that's an issue on its own or was caused by the
(i,j) overflow. (availMem seems to be int64 variable though).
Sergey
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Sergey Koposov <skoposov(at)cmu(dot)edu> |
| Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-06-29 18:10:13 |
| Message-ID: | CAH2-WznbKHkY-sEFHjjdtLzF_tVeaXVn9nvSRKZDrJUnBLUFPg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Thu, Jun 29, 2017 at 10:50 AM, Sergey Koposov <skoposov(at)cmu(dot)edu> wrote:
> For the time being I've just changed the type of i,j from int to long
> (or int64) and I am running the index creation now. I let you submit a
> patch -- thanks in advance.
That's more or less what I had in mind.
> I also noticed that the availMem variable was negative in the printout
> of the TupleSortState.
> availMem =-6442450776,
> I don't know whether that's an issue on its own or was caused by the
> (i,j) overflow. (availMem seems to be int64 variable though).
That's definitely allowed to go negative, which is why it's int64.
That's about 6GB of memory, though, which seems unusually large.
--
Peter Geoghegan
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Sergey Koposov <skoposov(at)cmu(dot)edu> |
| Cc: | PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-06-29 18:59:12 |
| Message-ID: | CAH2-Wzneyg=+zVr9TJSg3sYm=XHu43Kn1zExA5kge57mNDk7MA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Thu, Jun 29, 2017 at 9:16 AM, <skoposov(at)cmu(dot)edu> wrote:
> Here are the commands leading to the crash:
>
> wsdb=# set maintenance_work_mem to '70GB';
>
> SET
> wsdb=# create index on cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));
>
>
> ----
>
> Importantly the table has already been sorted by q3c_ang2ipix(ra,dec) !
BTW, given the exact details of what you're doing here, using only 1MB
of maintenance_work_mem will probably make the operation complete
sooner.
--
Peter Geoghegan
| From: | Sergey Koposov <skoposov(at)cmu(dot)edu> |
|---|---|
| To: | "pg(at)bowt(dot)ie" <pg(at)bowt(dot)ie> |
| Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-06-29 19:56:02 |
| Message-ID: | 1498766162.21587.98.camel@cmu.edu |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Thu, 2017-06-29 at 11:59 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM, <skoposov(at)cmu(dot)edu> wrote:
> > Here are the commands leading to the crash:
> >
> > wsdb=# set maintenance_work_mem to '70GB';
> >
> > SET
> > wsdb=# create index on cgonzal.vvv_single_ks_sorted (q3c_ang2ipix(ra,dec));
> >
> >
> > ----
> >
> > Importantly the table has already been sorted by q3c_ang2ipix(ra,dec) !
>
> BTW, given the exact details of what you're doing here, using only 1MB
> of maintenance_work_mem will probably make the operation complete
> sooner.
>
>
Thank you. It does seem to be ~ 50% faster.
It is somewhat unfortunate that maintenance_work_mem is
counterproductive in this scenario, given that indexing an ordered table
is a natural step in clustering a table...
Sergey
| From: | Sergey Koposov <skoposov(at)cmu(dot)edu> |
|---|---|
| To: | "pg(at)bowt(dot)ie" <pg(at)bowt(dot)ie> |
| Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-05 19:06:19 |
| Message-ID: | 1499281579.8258.32.camel@cmu.edu |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
> On Thu, Jun 29, 2017 at 9:16 AM, <skoposov(at)cmu(dot)edu> wrote:
> > From a quick look of the code it looks to me that the reason for the bug is
> > the 32 bit int overflow in the j=2*i+1 calculation inside the
> > tuplesort_heap_siftup leading to negative values of j.
>
> It seems likely that the explanation is as simple as that. This
> happens during run generation with replacement selection. All versions
> are affected, but version 9.6+ is dramatically less likely to be
> affected, because replacement selection was all but killed in Postgres
> 9.6.
>
> This is an oversight in commit 263865a. The fix is to use a variable
> that won't overflow in tuplesort_heap_siftup() -- this is probably a
> one-liner, because when the variable overflows today, the correct
> behavior would be for control to break out of the loop that declares
> the overflowing variable "j", and, I don't see any similar problem in
> other heap maintenance routines. It's a very isolated problem.
>
> I could write a patch.
Just to avoid being forgotten, I attach a trivial patch against 9.5
branch as well as have created a commitfest submission
https://commitfest.postgresql.org/14/1189/
The script below allows to reproduce the bug (segfault) and test that
the patch fixes it: (>~70 GB of RAM are needed and 100+GB of disk
space)
---
create table xx3 as
select generate_series as a
from generate_series(0,(1.5*((1::bigint)<<31))::bigint);
set maintenance_work_mem to '70GB';
create index on xx3(a);
----
Hopefully somebody can take care of patching other PG branches.
Sergey
| Attachment | Content-Type | Size |
|---|---|---|
| x.patch | text/x-patch | 901 bytes |
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Sergey Koposov <skoposov(at)cmu(dot)edu> |
| Cc: | "pg(at)bowt(dot)ie" <pg(at)bowt(dot)ie>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-05 22:03:56 |
| Message-ID: | 28906.1499292236@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
Sergey Koposov <skoposov(at)cmu(dot)edu> writes:
> On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
>> This is an oversight in commit 263865a. The fix is to use a variable
>> that won't overflow in tuplesort_heap_siftup() -- this is probably a
>> one-liner, because when the variable overflows today, the correct
>> behavior would be for control to break out of the loop that declares
>> the overflowing variable "j", and, I don't see any similar problem in
>> other heap maintenance routines. It's a very isolated problem.
>>
>> I could write a patch.
> Just to avoid being forgotten, I attach a trivial patch against 9.5
> branch as well as have created a commitfest submission
> https://commitfest.postgresql.org/14/1189/
I don't like s/int/int64/g as a fix for this. That loop is probably
a hot spot, and this fix is going to be expensive on any machine where
int64 isn't the native word width. How about something like this instead:
- int j = 2 * i + 1;
+ int j;
+ if (unlikely(i > INT_MAX / 2))
+ break; /* if j would overflow, we're done */
+ j = 2 * i + 1;
if (j >= n)
break;
regards, tom lane
| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Sergey Koposov <skoposov(at)cmu(dot)edu>, "pg(at)bowt(dot)ie" <pg(at)bowt(dot)ie>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-05 22:14:14 |
| Message-ID: | 20170705221414.bgnw73z2mwkjwor7@alap3.anarazel.de |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On 2017-07-05 18:03:56 -0400, Tom Lane wrote:
> Sergey Koposov <skoposov(at)cmu(dot)edu> writes:
> > On Thu, 2017-06-29 at 10:00 -0700, Peter Geoghegan wrote:
> >> This is an oversight in commit 263865a. The fix is to use a variable
> >> that won't overflow in tuplesort_heap_siftup() -- this is probably a
> >> one-liner, because when the variable overflows today, the correct
> >> behavior would be for control to break out of the loop that declares
> >> the overflowing variable "j", and, I don't see any similar problem in
> >> other heap maintenance routines. It's a very isolated problem.
> >>
> >> I could write a patch.
>
> > Just to avoid being forgotten, I attach a trivial patch against 9.5
> > branch as well as have created a commitfest submission
> > https://commitfest.postgresql.org/14/1189/
>
> I don't like s/int/int64/g as a fix for this. That loop is probably
> a hot spot, and this fix is going to be expensive on any machine where
> int64 isn't the native word width. How about something like this instead:
>
> - int j = 2 * i + 1;
> + int j;
>
> + if (unlikely(i > INT_MAX / 2))
> + break; /* if j would overflow, we're done */
> + j = 2 * i + 1;
> if (j >= n)
> break;
Isn't an added conditional likely going to be more costly than the
s/32/64/ bit calculations on the majority of machines pg runs on? I'm
quite doubtful that it's worth catering for the few cases where that's
really slow.
- Andres
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-05 22:19:57 |
| Message-ID: | CAH2-Wzn19GKyEyNr3dXeazbBSstEEtaBO3MCQWwFqZs_bb52Zw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Wed, Jul 5, 2017 at 3:14 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Isn't an added conditional likely going to be more costly than the
> s/32/64/ bit calculations on the majority of machines pg runs on? I'm
> quite doubtful that it's worth catering for the few cases where that's
> really slow.
I doubt that myself. Especially prior to Postgres 10, where merging
will have tuplesort_heap_insert() as the bottleneck.
In Postgres 10, tuplesort external sort run merging became much faster
following commit 24598337c8d. It might be noticeable if such a machine
were using Postgres 10 already, and really had something to lose, but
that seems very unlikely.
--
Peter Geoghegan
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-05 22:35:30 |
| Message-ID: | CAH2-Wz=ad_ZLvzjJ0sXUS0cBgAe7xWsMh-CpDzL9t7gq1gQ=-A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Wed, Jul 5, 2017 at 3:19 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I doubt that myself. Especially prior to Postgres 10, where merging
> will have tuplesort_heap_insert() as the bottleneck.
BTW, I do of course understand that the reporter isn't talking about
merging, but rather is talking about run generation using replacement
selection. Replacement selection is more or less obsolete, and was
more or less removed in Postgres 9.6. I don't think that its needs
should be given much weight here; this is mostly about merging.
I wanted to make merging use its own dedicated function within the
patch that became 24598337c8d, and leave tuplesort_heap_insert() and
tuplesort_heap_siftup() as things used only by replacement selection +
Top-N heap sort, but Heikki overruled me on that. He might have been
right about that. I'm not sure, and never took the time to follow up
with it.
tuplesort_heap_replace_top(), which is where this logic lives now that
tuplesort_heap_siftup() was broken up, certainly is a red-hot code
path where individual instructions could matter. I've looked at the
disassembly of the function in the past, which is rare for me.
--
Peter Geoghegan
| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Sergey Koposov <skoposov(at)cmu(dot)edu>, "pg(at)bowt(dot)ie" <pg(at)bowt(dot)ie>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 13:15:10 |
| Message-ID: | 3de75af4-a49f-36d1-347f-f170693ce6f5@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On 07/06/2017 01:14 AM, Andres Freund wrote:
> On 2017-07-05 18:03:56 -0400, Tom Lane wrote:
>> I don't like s/int/int64/g as a fix for this. That loop is probably
>> a hot spot, and this fix is going to be expensive on any machine where
>> int64 isn't the native word width. How about something like this instead:
>>
>> - int j = 2 * i + 1;
>> + int j;
>>
>> + if (unlikely(i > INT_MAX / 2))
>> + break; /* if j would overflow, we're done */
>> + j = 2 * i + 1;
>> if (j >= n)
>> break;
>
> Isn't an added conditional likely going to be more costly than the
> s/32/64/ bit calculations on the majority of machines pg runs on? I'm
> quite doubtful that it's worth catering for the few cases where that's
> really slow.
Another option to use "unsigned int", on the assumption that UINT_MAX >=
INT_MAX * 2 + 1. And to eliminate that assumption, we can use (UINT_MAX
- 1) / 2 as the maximum size of the memtuples array, rather than INT_MAX.
- Heikki
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pg(at)bowt(dot)ie" <pg(at)bowt(dot)ie>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 15:42:04 |
| Message-ID: | 4817.1499874124@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> On 07/06/2017 01:14 AM, Andres Freund wrote:
>> On 2017-07-05 18:03:56 -0400, Tom Lane wrote:
>>> I don't like s/int/int64/g as a fix for this. That loop is probably
>>> a hot spot, and this fix is going to be expensive on any machine where
>>> int64 isn't the native word width. How about something like this instead:
> Another option to use "unsigned int", on the assumption that UINT_MAX >=
> INT_MAX * 2 + 1.
Ah, that seems like a fine idea.
> And to eliminate that assumption, we can use (UINT_MAX
> - 1) / 2 as the maximum size of the memtuples array, rather than INT_MAX.
Uh ... what assumption? That's certainly true on any twos-complement
machine. Besides, if you're worried about hypothetical portability
issues, I'm not sure it's any better to assume that (UINT_MAX - 1) / 2
fits in a signed int.
regards, tom lane
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 15:47:57 |
| Message-ID: | CAH2-WzkNs03GLMnv6pQZY8f2vKgjhsjmKByfQX6E27HCpjhKow@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Wed, Jul 12, 2017 at 8:42 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Another option to use "unsigned int", on the assumption that UINT_MAX >=
>> INT_MAX * 2 + 1.
>
> Ah, that seems like a fine idea.
Works for me.
--
Peter Geoghegan
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 16:14:54 |
| Message-ID: | 8081.1499876094@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Wed, Jul 12, 2017 at 8:42 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Another option to use "unsigned int", on the assumption that UINT_MAX >=
>>> INT_MAX * 2 + 1.
>> Ah, that seems like a fine idea.
> Works for me.
I'll go make it so, unless Heikki's already on it?
regards, tom lane
| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 16:20:53 |
| Message-ID: | 007e3020-3d27-8ac8-7269-9db7ebe9a8bf@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On 07/12/2017 07:14 PM, Tom Lane wrote:
> Peter Geoghegan <pg(at)bowt(dot)ie> writes:
>> On Wed, Jul 12, 2017 at 8:42 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Another option to use "unsigned int", on the assumption that UINT_MAX >=
>>>> INT_MAX * 2 + 1.
>
>>> Ah, that seems like a fine idea.
>
>> Works for me.
>
> I'll go make it so, unless Heikki's already on it?
I'm not. Thanks!
>> And to eliminate that assumption, we can use (UINT_MAX
>> - 1) / 2 as the maximum size of the memtuples array, rather than INT_MAX.
>
> Uh ... what assumption? That's certainly true on any twos-complement
> machine. Besides, if you're worried about hypothetical portability
> issues, ...
Right, it's a hypothetical portability issue. The assumption we're
making is that UINT_MAX >= INT_MAX * 2 + 1. I'm not aware of any system
where it's not true, but I don't know what the C standards say about that.
> ... I'm not sure it's any better to assume that (UINT_MAX - 1) / 2
> fits in a signed int.
Well, you could do Min(INT_MAX, (UINT_MAX - 1 / 2). Or just add a
StaticAssertion for it. Or just note in a comment that we're making that
assumption.
- Heikki
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 16:46:56 |
| Message-ID: | 15154.1499878016@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> On 07/12/2017 07:14 PM, Tom Lane wrote:
>> Uh ... what assumption? That's certainly true on any twos-complement
>> machine. Besides, if you're worried about hypothetical portability
>> issues, ...
> Right, it's a hypothetical portability issue. The assumption we're
> making is that UINT_MAX >= INT_MAX * 2 + 1. I'm not aware of any system
> where it's not true, but I don't know what the C standards say about that.
Actually, at the top of the loop we have i < n <= INT_MAX, so the
assumption only needs to be UINT_MAX >= INT_MAX * 2 - 1.
My copy of the C99 draft says, among other things
6.2.5 Types
[#6] For each of the signed integer types, there is a
corresponding (but different) unsigned integer type
(designated with the keyword unsigned) that uses the same
amount of storage (including sign information) and has the
same alignment requirements.
[#9] The range of nonnegative values of a signed integer
type is a subrange of the corresponding unsigned integer
type, and the representation of the same value in each type
is the same.
6.2.6.2 Integer types
[#1] For unsigned integer types other than unsigned char,
the bits of the object representation shall be divided into
two groups: value bits and padding bits (there need not be
any of the latter). If there are N value bits, each bit
shall represent a different power of 2 between 1 and 2N-1,
so that objects of that type shall be capable of
representing values from 0 to 2N-1 using a pure binary
representation; this shall be known as the value
representation. The values of any padding bits are
unspecified.37)
[#2] For signed integer types, the bits of the object
representation shall be divided into three groups: value
bits, padding bits, and the sign bit. There need not be any
padding bits; there shall be exactly one sign bit. Each bit
that is a value bit shall have the same value as the same
bit in the object representation of the corresponding
unsigned type (if there are M value bits in the signed type
and N in the unsigned type, then M<=N). If the sign bit is
zero, it shall not affect the resulting value. If the sign
bit is one, then the value shall be modified in one of the
following ways:
-- the corresponding value with sign bit 0 is negated;
-- the sign bit has the value -2N;
-- the sign bit has the value 1-2N.
[#3] The values of any padding bits are unspecified.
Both 6.2.5.9 and 6.2.6.2.2 constrain INT_MAX to be <= UINT_MAX.
In principle, you could have a conforming implementation in which
they were the same, and the sign bit of a signed integer was treated
as a useless padding bit in an unsigned integer. I can't imagine
anyone actually building it that way though; what would be the point?
But as long as there aren't wasted bits in an unsigned int, these
rules require INT_MAX to be <= UINT_MAX/2, AFAICS.
regards, tom lane
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 16:48:29 |
| Message-ID: | CAH2-WzmxAi3R+JEbdMr17h=Ecf=aVywYi138Qnym=EyG2nwOpQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Wed, Jul 12, 2017 at 9:20 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Uh ... what assumption? That's certainly true on any twos-complement
>> machine. Besides, if you're worried about hypothetical portability
>> issues, ...
>
>
> Right, it's a hypothetical portability issue. The assumption we're making is
> that UINT_MAX >= INT_MAX * 2 + 1. I'm not aware of any system where it's not
> true, but I don't know what the C standards say about that.
Intuitively, it seems very likely to be true, since two's complement
arithmetic is already assumed by Postgres, and addition and
multiplication work the same way at the instruction level for unsigned
and signed (two's complement) integers. In order for this to break,
int and unsigned int would have to have different widths.
> Well, you could do Min(INT_MAX, (UINT_MAX - 1 / 2). Or just add a
> StaticAssertion for it. Or just note in a comment that we're making that
> assumption.
I like the idea of a StaticAssertion().
--
Peter Geoghegan
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 16:53:08 |
| Message-ID: | 15372.1499878388@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Wed, Jul 12, 2017 at 9:20 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Right, it's a hypothetical portability issue. The assumption we're making is
>> that UINT_MAX >= INT_MAX * 2 + 1. I'm not aware of any system where it's not
>> true, but I don't know what the C standards say about that.
> Intuitively, it seems very likely to be true, since two's complement
> arithmetic is already assumed by Postgres,
... right. There haven't been any non-twos-complement machines in the
wild for probably 50 years, and even if there were, this would be *way*
down the list of problems you'd have to fix to get Postgres to run on
one of them. I think a comment is plenty sufficient.
regards, tom lane
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 17:16:02 |
| Message-ID: | CAH2-WzkqvpkJ6LYHaaZVUS_MwXsHBTOOEHdBE_hwQxzUqvoGbQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Wed, Jul 12, 2017 at 6:15 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Another option to use "unsigned int", on the assumption that UINT_MAX >=
> INT_MAX * 2 + 1. And to eliminate that assumption, we can use (UINT_MAX - 1)
> / 2 as the maximum size of the memtuples array, rather than INT_MAX.
FWIW, memtupcount is allowed to go negative. It just won't in this
function, per the assertion. So the fix should be specifically scoped
to only the one function that is affected. That's a good idea anyway,
of course.
--
Peter Geoghegan
| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 17:42:55 |
| Message-ID: | 3ab2dc21-8a01-aa00-8967-5cbfe92ae82c@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On 07/12/2017 08:16 PM, Peter Geoghegan wrote:
> On Wed, Jul 12, 2017 at 6:15 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Another option to use "unsigned int", on the assumption that UINT_MAX >=
>> INT_MAX * 2 + 1. And to eliminate that assumption, we can use (UINT_MAX - 1)
>> / 2 as the maximum size of the memtuples array, rather than INT_MAX.
>
> FWIW, memtupcount is allowed to go negative.
Huh, say what?
- Heikki
| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-12 17:44:33 |
| Message-ID: | CAH2-WzmhsMPfONa2fuWppsPtDuNJMDw7yO6M79RJG1bxKx6MgQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Wed, Jul 12, 2017 at 10:42 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> FWIW, memtupcount is allowed to go negative.
>
>
> Huh, say what?
Well, there are defenses against that case, both in assertions and in
code like tuplesort_heap_delete_top(). In that sense, it's allowed to
go negative.
--
Peter Geoghegan
| From: | David Gould <daveg(at)sonic(dot)net> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-14 21:17:45 |
| Message-ID: | 20170714141745.13954776@engels |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
On Wed, 12 Jul 2017 12:53:08 -0400
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> ... right. There haven't been any non-twos-complement machines in the
> wild for probably 50 years, and even if there were, this would be *way*
> down the list of problems you'd have to fix to get Postgres to run on
> one of them.
Not quite 50 years. In 1979 had the "pleasure" of working at Bechtel on a
Univac 1110. Univac 1100 seris are ones-complement (with both positive
and negative zero!) with 36 bit longs, 18 bit ints and depending on character
mode either 9 bit ASCII or 6 bit FIELDDATA chars.
Not even one year. UNISYS are still marketing this architecture as the UNISYS
ClearPath IX series, you can order one today.
Apparently it is true in computing that nothing ever dies.
These were actually fascinatingly weird machines, almost everything is
different from what we are used to:
https://en.wikipedia.org/wiki/UNIVAC_1100/2200_series
Still, I think it is safe to wait until someone actually pays for a
postgresql port before considering ones-complement issues.
-dg
--
David Gould daveg(at)sonic(dot)net
If simplicity worked, the world would be overrun with insects.
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | David Gould <daveg(at)sonic(dot)net> |
| Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Sergey Koposov <skoposov(at)cmu(dot)edu>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
| Subject: | Re: BUG #14722: Segfault in tuplesort_heap_siftup, 32 bit overflow |
| Date: | 2017-07-14 21:45:30 |
| Message-ID: | 5105.1500068730@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-bugs |
David Gould <daveg(at)sonic(dot)net> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... right. There haven't been any non-twos-complement machines in the
>> wild for probably 50 years, and even if there were, this would be *way*
>> down the list of problems you'd have to fix to get Postgres to run on
>> one of them.
> Not quite 50 years. In 1979 had the "pleasure" of working at Bechtel on a
> Univac 1110. Univac 1100 seris are ones-complement (with both positive
> and negative zero!) with 36 bit longs, 18 bit ints and depending on character
> mode either 9 bit ASCII or 6 bit FIELDDATA chars.
Fun stuff. Other than the ones-complement choice, this smells quite a bit
like the PDP-10 gear I used to use back when.
> Not even one year. UNISYS are still marketing this architecture as the UNISYS
> ClearPath IX series, you can order one today.
The recent models claim to be pure Intel though; if they're still
supporting the 1100 architecture, it must be through emulation.
> Still, I think it is safe to wait until someone actually pays for a
> postgresql port before considering ones-complement issues.
Yeah. I actually suspect that the weird word size would be a much bigger
headache for us than the ones-complement business. The other small
problem is that as far as I could find, they never went past a 24-bit
address space, which would make for at most 64MB worth of memory (with
9-bit "bytes"). In principle you could probably run modern Postgres
with so little RAM, but it wouldn't be of any real use.
regards, tom lane