Skip to content
Permalink

Comparing changes

base repository: postgresql-cfbot/postgresql
base: cf/5678~1
head repository: postgresql-cfbot/postgresql
compare: cf/5678
  • 5 commits
  • 5 files changed
  • 3 contributors

Commits on Apr 4, 2025

  1. Skip second WriteToc() call for custom-format dumps without data.

    Presently, "pg_dump --format=custom" calls WriteToc() twice.  The
    second call updates the data offset information, which allegedly
    makes parallel pg_restore significantly faster.  However, if we're
    not dumping any data, there are no data offsets to update, so we
    can skip this step.
    
    Reviewed-by: Jeff Davis <pgsql@j-davis.com>
    Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
    nathan-bossart authored and Commitfest Bot committed Apr 4, 2025
    Commit: b4fd930
  2. pg_dump: Reduce memory usage of dumps with statistics.

    Right now, pg_dump stores all generated commands for statistics in
    memory.  These commands can be quite large and therefore can
    significantly increase pg_dump's memory footprint.  To fix, wait
    until we are about to write out the commands before generating
    them, and be sure to free the commands after writing.  This is
    implemented via a new defnDumper callback that works much like the
    dataDumper one but is specifically designed for TOC entries.
    
    Custom dumps that include data might write the TOC twice (to update
    data offset information), which would ordinarily cause pg_dump to
    run the attribute statistics queries twice.  However, as a hack, we
    save the length of the written-out entry in the first pass and skip
    over it in the second.  While there is no known technical issue
    with executing the queries multiple times and rewriting the
    results, it's expensive and feels risky, so let's avoid it.
    
    As an exception, we _do_ execute the queries twice for the tar
    format.  This format does a second pass through the TOC to generate
    the restore.sql file.  pg_restore doesn't use this file, so even if
    the second round of queries returns different results than the
    first, it won't corrupt the output; the archive and restore.sql
    file will just have different content.
    
    Author: Corey Huinker <corey.huinker@gmail.com>
    Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
    Reviewed-by: Jeff Davis <pgsql@j-davis.com>
    Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
    2 people authored and Commitfest Bot committed Apr 4, 2025
    Commit: 606e543
  3. pg_dump: Retrieve attribute statistics in batches.

    Currently, pg_dump gathers attribute statistics with a query per
    relation, which can cause pg_dump to take significantly longer,
    especially when there are many tables.  This commit addresses this by
    teaching pg_dump to gather attribute statistics for 64 relations at
    a time.  Some simple tests showed this was the optimal batch size,
    but performance may vary depending on the workload.  While this
    change increases pg_dump's memory usage a bit, it isn't expected to
    be too egregious and seems well worth the trade-off.
    
    Our lookahead code determines the next batch of relations by
    searching the TOC sequentially for relevant entries.  This approach
    assumes that we will dump all such entries in TOC order, which
    unfortunately isn't true for dump formats that use
    RestoreArchive().  RestoreArchive() does multiple passes through
    the TOC and selectively dumps certain groups of entries each time.
    This is particularly problematic for index stats and a subset of
    matview stats; both are in SECTION_POST_DATA, but matview stats
    that depend on matview data are dumped in RESTORE_PASS_POST_ACL,
    while all other stats data entries are dumped in RESTORE_PASS_MAIN.
    To handle this, this commit moves all statistics data entries in
    SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which ensures that we
    always dump them in TOC order.  A convenient side effect of this
    change is that we can revert a decent chunk of commit a0a4601,
    but that is left for a follow-up commit.
    
    Author: Corey Huinker <corey.huinker@gmail.com>
    Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
    Reviewed-by: Jeff Davis <pgsql@j-davis.com>
    Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
    2 people authored and Commitfest Bot committed Apr 4, 2025
    Commit: a1bd2a8
  4. Partially revert commit a0a4601.

    Thanks to commit XXXXXXXXXX, which simplified some code in
    _tocEntryRestorePass(), we can remove the now-unused ArchiveHandle
    parameter from _tocEntryRestorePass() and move_to_ready_heap().
    
    Reviewed-by: Jeff Davis <pgsql@j-davis.com>
    Discussion: https://postgr.es/m/Z-3x2AnPCP331JA3%40nathan
    nathan-bossart authored and Commitfest Bot committed Apr 4, 2025
    Commit: 183c288
  5. [CF 5678] optimizations for dumping statistics

    This branch was automatically generated by a robot using patches from an
    email thread registered at:
    
    https://commitfest.postgresql.org/patch/5678
    
    The branch will be overwritten each time a new patch version is posted to
    the thread, and also periodically to check for bitrot caused by changes
    on the master branch.
    
    Patch(es): https://www.postgresql.org/message-id/Z-9Bx3ml2i7OfHiN@nathan
    Author(s): Nathan Bossart
    Commitfest Bot committed Apr 4, 2025
    Commit: 66904c0