Comparing changes
base repository: postgresql-cfbot/postgresql
base: cf/5678~1
head repository: postgresql-cfbot/postgresql
compare: cf/5678
- 5 commits
- 5 files changed
- 3 contributors
Commits on Apr 4, 2025
Skip second WriteToc() call for custom-format dumps without data.
Presently, "pg_dump --format=custom" calls WriteToc() twice. The second call updates the data offset information, which allegedly makes parallel pg_restore significantly faster. However, if we're not dumping any data, there are no data offsets to update, so we can skip this step. Reviewed-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
Commit: b4fd930
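For orientation, a minimal sketch of the shape of the fix. The identifiers (ctx->hasSeek, tpos, dopt->dumpData) are paraphrased from pg_dump's custom-format writer and may not match the committed code exactly:

    /*
     * Illustrative fragment, roughly the tail of the custom format's
     * _CloseArchive(): the second WriteToc() pass exists only to fill
     * in data offsets, so it can be skipped when no data was dumped.
     */
    if (ctx->hasSeek &&
        AH->public.dopt->dumpData &&         /* new check: no data, no offsets */
        fseeko(AH->FH, tpos, SEEK_SET) == 0)
        WriteToc(AH);                        /* second, offset-updating pass */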
pg_dump: Reduce memory usage of dumps with statistics.
Right now, pg_dump stores all generated commands for statistics in memory. These commands can be quite large and therefore can significantly increase pg_dump's memory footprint. To fix, wait until we are about to write out the commands before generating them, and be sure to free the commands after writing. This is implemented via a new defnDumper callback that works much like the dataDumper one but is specifically designed for TOC entries.

Custom dumps that include data might write the TOC twice (to update data offset information), which would ordinarily cause pg_dump to run the attribute statistics queries twice. However, as a hack, we save the length of the written-out entry in the first pass and skip over it in the second. While there is no known technical issue with executing the queries multiple times and rewriting the results, it's expensive and feels risky, so let's avoid it.

As an exception, we _do_ execute the queries twice for the tar format. This format does a second pass through the TOC to generate the restore.sql file. pg_restore doesn't use this file, so even if the second round of queries returns different results than the first, it won't corrupt the output; the archive and restore.sql file will just have different content.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
Commit: 606e543
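A simplified sketch of the deferred-generation idea. The callback type is modeled on the commit message, and the TocEntry fields (defnDumper, defnDumperArg, defnLen) are assumed names that may not match the committed code:

    /* Callback that builds a TOC entry's definition on demand. */
    typedef char *(*DefnDumperPtr) (Archive *AH, const void *userArg);

    static void
    WriteTocEntryDefn(ArchiveHandle *AH, TocEntry *te)
    {
        if (te->defnDumper)
        {
            /* Generate the (possibly huge) command only at write time... */
            char   *defn = te->defnDumper((Archive *) AH, te->defnDumperArg);

            /*
             * Remember the written length so a second WriteToc() pass can
             * seek past the entry instead of re-running the queries.
             */
            te->defnLen = WriteStr(AH, defn);
            free(defn);                 /* ...and release it immediately */
        }
        else
            WriteStr(AH, te->defn);     /* traditional pre-built string */
    }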
pg_dump: Retrieve attribute statistics in batches.
Currently, pg_dump gathers attribute statistics with a query per relation, which can cause pg_dump to take significantly longer, especially when there are many tables. This commit addresses this by teaching pg_dump to gather attribute statistics for 64 relations at a time. Some simple tests showed this was the optimal batch size, but performance may vary depending on the workload. While this change increases pg_dump's memory usage a bit, it isn't expected to be too egregious and seems well worth the trade-off.

Our lookahead code determines the next batch of relations by searching the TOC sequentially for relevant entries. This approach assumes that we will dump all such entries in TOC order, which unfortunately isn't true for dump formats that use RestoreArchive(). RestoreArchive() does multiple passes through the TOC and selectively dumps certain groups of entries each time. This is particularly problematic for index stats and a subset of matview stats; both are in SECTION_POST_DATA, but matview stats that depend on matview data are dumped in RESTORE_PASS_POST_ACL, while all other stats data entries are dumped in RESTORE_PASS_MAIN. To handle this, this commit moves all statistics data entries in SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which ensures that we always dump them in TOC order.

A convenient side effect of this change is that we can revert a decent chunk of commit a0a4601, but that is left for a follow-up commit.

Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
Commit: a1bd2a8
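To make the batching concrete, here is a small standalone libpq program (not pg_dump's code; the table names and batch contents are placeholders) that fetches pg_stats rows for several relations in a single round trip by joining against unnest() over two parallel arrays:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* Connection parameters are taken from the PG* environment variables. */
        PGconn     *conn = PQconnectdb("");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /*
         * Two parallel text arrays stand in for the up-to-64-relation batch
         * that pg_dump's TOC lookahead would assemble.
         */
        const char *params[2] = {"{public,public}", "{t1,t2}"};

        PGresult   *res = PQexecParams(conn,
            "SELECT s.schemaname, s.tablename, s.attname, s.inherited "
            "FROM pg_catalog.pg_stats s "
            "JOIN unnest($1::text[], $2::text[]) AS u (schemaname, tablename) "
            "ON s.schemaname = u.schemaname AND s.tablename = u.tablename",
            2, NULL, params, NULL, NULL, 0);

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            for (int i = 0; i < PQntuples(res); i++)
                printf("%s.%s attname=%s inherited=%s\n",
                       PQgetvalue(res, i, 0), PQgetvalue(res, i, 1),
                       PQgetvalue(res, i, 2), PQgetvalue(res, i, 3));
        }
        else
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }

pg_dump builds the same kind of batched query for up to 64 relations at a time, trading a somewhat larger result set held in memory for far fewer client-server round trips.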
Partially revert commit a0a4601.
Thanks to commit XXXXXXXXXX, which simplified some code in _tocEntryRestorePass(), we can remove the now-unused ArchiveHandle parameter from _tocEntryRestorePass() and move_to_ready_heap().

Reviewed-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/Z-3x2AnPCP331JA3%40nathan
Commit: 183c288
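Concretely, the cleanup amounts to dropping a parameter that no code path reads anymore; a sketch with paraphrased prototypes (argument lists here are illustrative, not copied from pg_backup_archiver.c):

    /* Before: the handle was only consulted by logic the prior commit removed. */
    static RestorePass _tocEntryRestorePass(ArchiveHandle *AH, TocEntry *te);
    static void move_to_ready_heap(ArchiveHandle *AH, binaryheap *ready_heap,
                                   TocEntry *pending_list, RestorePass pass);

    /* After: an entry's restore pass is determined from the entry alone. */
    static RestorePass _tocEntryRestorePass(TocEntry *te);
    static void move_to_ready_heap(binaryheap *ready_heap,
                                   TocEntry *pending_list, RestorePass pass);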
[CF 5678] optimizations for dumping statistics
This branch was automatically generated by a robot using patches from an email thread registered at: https://commitfest.postgresql.org/patch/5678

The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch.

Patch(es): https://www.postgresql.org/message-id/Z-9Bx3ml2i7OfHiN@nathan
Author(s): Nathan Bossart
Commitfest Bot committed Apr 4, 2025
Commit: 66904c0
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff cf/5678~1...cf/5678