Skip to content
Permalink

Comparing changes

base repository: postgresql-cfbot/postgresql
base: cf/5678~1
head repository: postgresql-cfbot/postgresql
compare: cf/5678
  • 5 commits
  • 5 files changed
  • 3 contributors

Commits on Apr 4, 2025

  1. Skip second WriteToc() call for custom-format dumps without data.

    Presently, "pg_dump --format=custom" calls WriteToc() twice.  The
    second call updates the data offset information, which allegedly
    makes parallel pg_restore significantly faster.  However, if we're
    not dumping any data, there are no data offsets to update, so we
    can skip this step.
    
    Reviewed-by: Jeff Davis <pgsql@j-davis.com>
    Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
    nathan-bossart authored and Commitfest Bot committed Apr 4, 2025
    Commit: b4fd930
  2. pg_dump: Reduce memory usage of dumps with statistics.

    Right now, pg_dump stores all generated commands for statistics in
    memory.  These commands can be quite large and therefore can
    significantly increase pg_dump's memory footprint.  To fix, wait
    until we are about to write out the commands before generating
    them, and be sure to free the commands after writing.  This is
    implemented via a new defnDumper callback that works much like the
    dataDumper one but is specifically designed for TOC entries.
    
    Custom dumps that include data might write the TOC twice (to update
    data offset information), which would ordinarily cause pg_dump to
    run the attribute statistics queries twice.  However, as a hack, we
    save the length of the written-out entry in the first pass and skip
    over it in the second.  While there is no known technical issue
    with executing the queries multiple times and rewriting the
    results, it's expensive and feels risky, so let's avoid it.
    
    As an exception, we _do_ execute the queries twice for the tar
    format.  This format does a second pass through the TOC to generate
    the restore.sql file.  pg_restore doesn't use this file, so even if
    the second round of queries returns different results than the
    first, it won't corrupt the output; the archive and restore.sql
    file will just have different content.
    
    Author: Corey Huinker <corey.huinker@gmail.com>
    Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
    Reviewed-by: Jeff Davis <pgsql@j-davis.com>
    Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
    2 people authored and Commitfest Bot committed Apr 4, 2025
    Commit: 606e543
  3. pg_dump: Retrieve attribute statistics in batches.

    Currently, pg_dump gathers attribute statistics with a query per
    relation, which can cause pg_dump to take significantly longer,
    especially when there are many tables.  This commit addresses this by
    teaching pg_dump to gather attribute statistics for 64 relations at
    a time.  Some simple tests showed this was the optimal batch size,
    but performance may vary depending on the workload.  While this
    change increases pg_dump's memory usage a bit, it isn't expected to
    be too egregious and seems well worth the trade-off.
    
    Our lookahead code determines the next batch of relations by
    searching the TOC sequentially for relevant entries.  This approach
    assumes that we will dump all such entries in TOC order, which
    unfortunately isn't true for dump formats that use
    RestoreArchive().  RestoreArchive() does multiple passes through
    the TOC and selectively dumps certain groups of entries each time.
    This is particularly problematic for index stats and a subset of
    matview stats; both are in SECTION_POST_DATA, but matview stats
    that depend on matview data are dumped in RESTORE_PASS_POST_ACL,
    while all other stats data entries are dumped in RESTORE_PASS_MAIN.
    To handle this, this commit moves all statistics data entries in
    SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which ensures that we
    always dump them in TOC order.  A convenient side effect of this
    change is that we can revert a decent chunk of commit a0a4601,
    but that is left for a follow-up commit.
    
    Author: Corey Huinker <corey.huinker@gmail.com>
    Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
    Reviewed-by: Jeff Davis <pgsql@j-davis.com>
    Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
    2 people authored and Commitfest Bot committed Apr 4, 2025
    Commit: a1bd2a8
  4. Partially revert commit a0a4601.

    Thanks to commit XXXXXXXXXX, which simplified some code in
    _tocEntryRestorePass(), we can remove the now-unused ArchiveHandle
    parameter from _tocEntryRestorePass() and move_to_ready_heap().
    
    Reviewed-by: Jeff Davis <pgsql@j-davis.com>
    Discussion: https://postgr.es/m/Z-3x2AnPCP331JA3%40nathan
    nathan-bossart authored and Commitfest Bot committed Apr 4, 2025
    Commit: 183c288
  5. [CF 5678] optimizations for dumping statistics

    This branch was automatically generated by a robot using patches from an
    email thread registered at:
    
    https://commitfest.postgresql.org/patch/5678
    
    The branch will be overwritten each time a new patch version is posted to
    the thread, and also periodically to check for bitrot caused by changes
    on the master branch.
    
    Patch(es): https://www.postgresql.org/message-id/Z-9Bx3ml2i7OfHiN@nathan
    Author(s): Nathan Bossart
    Commitfest Bot committed Apr 4, 2025
    Commit: 66904c0