
Comparing changes

base repository: postgresql-cfbot/postgresql
base: cf/5929~1
...
head repository: postgresql-cfbot/postgresql
compare: cf/5929
  • 6 commits
  • 3 files changed
  • 2 contributors

Commits on Nov 17, 2025

  1. Add some test scaffolding to join_selectivity().

    This not-meant-for-commit patch adds some instrumentation to
    plancat.c's join_selectivity() to log the result and runtime
    of a join selectivity function.  This is useful for manual
    testing of performance patches in eqjoinsel().
    
    To improve the accuracy of the runtime measurement, run the
    function 1000 times in each call.  The regression tests still
    take a reasonable amount of time with this overhead, although
    it's noticeably more than usual.
    tglsfdc authored and Commitfest Bot committed Nov 17, 2025
    767aaf8
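The following C sketch shows the measurement approach described in this commit. It calls the estimator many times between two clock readings, then divides the elapsed time by the repeat count. It is not the patch's code. `estimate_fn`, `toy_selectivity`, and `time_repeated` are hypothetical names, and the sketch assumes POSIX `clock_gettime()` with `CLOCK_MONOTONIC` is available.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for a join selectivity function. */
typedef double (*estimate_fn) (int nvalues);

/* Toy estimator used only to exercise the harness. */
static double
toy_selectivity(int nvalues)
{
	return 1.0 / nvalues;
}

/*
 * Call fn 'reps' times between two clock readings.  Returns the last
 * result and stores the mean per-call runtime in nanoseconds in *mean_ns.
 */
static double
time_repeated(estimate_fn fn, int arg, int reps, double *mean_ns)
{
	struct timespec start,
				end;
	volatile double result = 0.0;	/* volatile discourages dropping the calls */

	assert(reps > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < reps; i++)
		result = fn(arg);
	clock_gettime(CLOCK_MONOTONIC, &end);

	*mean_ns = ((double) (end.tv_sec - start.tv_sec) * 1e9 +
				(double) (end.tv_nsec - start.tv_nsec)) / reps;
	return result;
}
```

For example, `time_repeated(toy_selectivity, 4, 1000, &ns)` returns 0.25 and leaves the mean per-call time in `ns`.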
  2. Factor out duplicative code in eqjoinsel_inner/eqjoinsel_semi.

    These functions have essentially identical code for scanning the
    two MCV lists and identifying which entries have matches in the
    other list.  While it's not a huge amount of code, it's 50 or
    so lines, and will be more after an upcoming patch to use a hash
    table with many MCVs.  Let's reduce duplication by moving that
    code into a common subroutine.
    
    The one downside of doing this is that we must compute
    sum(sslot1->numbers[i] * sslot2->numbers[j]) even though
    eqjoinsel_semi won't need that.  But the cost of that appears
    negligible, so I didn't trouble to invent a way of avoiding it.
    tglsfdc authored and Commitfest Bot committed Nov 17, 2025
    e960d82
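The shared subroutine can be pictured with this simplified C sketch of the nested-loop MCV matching that both functions previously duplicated. The sketch makes these assumptions:

* MCV values are plain `int`s compared with `==`, standing in for a call to the join operator.
* `McvMatchResult` and `match_mcv_lists` are hypothetical names.

Like the refactored code described above, it always accumulates the sum(sslot1->numbers[i] * sslot2->numbers[j]) term, even though eqjoinsel_semi does not need it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of matching two MCV lists. */
typedef struct McvMatchResult
{
	double		matchprodfreq;	/* sum of numbers1[i] * numbers2[j] over matches */
	int			nmatches;		/* number of matched pairs */
} McvMatchResult;

/*
 * Mark which entries of each MCV list have a match in the other list.
 * Each LHS entry is paired with at most one not-yet-matched RHS entry,
 * so the work is O(nvalues1 * nvalues2).
 */
static McvMatchResult
match_mcv_lists(const int *values1, const double *numbers1, int nvalues1,
				const int *values2, const double *numbers2, int nvalues2,
				bool *hasmatch1, bool *hasmatch2)
{
	McvMatchResult res = {0.0, 0};

	assert(nvalues1 >= 0 && nvalues2 >= 0);
	for (int i = 0; i < nvalues1; i++)
		hasmatch1[i] = false;
	for (int j = 0; j < nvalues2; j++)
		hasmatch2[j] = false;

	for (int i = 0; i < nvalues1; i++)
	{
		for (int j = 0; j < nvalues2; j++)
		{
			if (hasmatch2[j])
				continue;
			if (values1[i] == values2[j])
			{
				hasmatch1[i] = hasmatch2[j] = true;
				res.matchprodfreq += numbers1[i] * numbers2[j];
				res.nmatches++;
				break;
			}
		}
	}
	return res;
}
```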
  3. Rethink eqjoinsel's handling of reversed joins.

    Formerly, if we needed to deal with a "reversed" join where the
    outer-side variable is on the right hand of the given operator,
    we looked up the operator's commutator and applied that, so that
    eqjoinsel_semi could always treat "sslot1" as the outer-side
    variable of the semijoin.
    
    This isn't great, because we ended up punting to a poor estimate
    if no commutator is recorded.  It also doesn't play well with
    later changes in this patch series.  Instead, let's handle the
    case by swapping the left and right input values just before
    we call the comparison operator.  While this theoretically adds
    cycles to the inner comparison loop, with the coding proposed
    here I don't see any real timing difference.  (But I only tested
    it on x86_64.)
    tglsfdc authored and Commitfest Bot committed Nov 17, 2025
    ad627b4
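A small C sketch can illustrate the argument-swapping idea. `join_op`, `cents_eq_dollars`, and `compare_outer_inner` are hypothetical names. Real PostgreSQL code passes `Datum`s through a `FunctionCallInfo`; this sketch replaces that with plain `long` arguments. The example operator is deliberately asymmetric and has no commutator, which is the case the old commutator lookup handled poorly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical join operator signature: op(lhs, rhs). */
typedef bool (*join_op) (long lhs, long rhs);

/*
 * Asymmetric example operator: "lhs (in cents) equals rhs (in whole
 * dollars)".  op(a, b) and op(b, a) generally differ, and no commutator
 * is assumed to exist.
 */
static bool
cents_eq_dollars(long cents, long dollars)
{
	return cents == dollars * 100;
}

/*
 * Compare an outer-side value with an inner-side value.  When the join
 * is reversed (the outer-side variable is the operator's right input),
 * swap the arguments here instead of looking up a commutator.
 */
static inline bool
compare_outer_inner(join_op op, long outer_val, long inner_val, bool reversed)
{
	return reversed ? op(inner_val, outer_val) : op(outer_val, inner_val);
}
```

Because `reversed` is fixed for the whole matching loop, the extra test is loop-invariant. Whether it costs anything measurable depends on the compiler and CPU, in line with the commit's caveat about testing only on x86_64.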
  4. Share more work between eqjoinsel_inner and eqjoinsel_semi.

    Originally, only one of eqjoinsel_inner and eqjoinsel_semi was
    invoked per eqjoinsel call, so the fact that they duplicated a
    good deal of work was irrelevant to performance.  But since commit
    a314c34, the semi/antijoin case calls both, and that is really
    expensive if there are a lot of MCVs to match.  Refactor so that
    we can re-use eqjoinsel_inner's matching results except in the
    (uncommon) case where eqjoinsel_semi clamps the RHS MCV list size
    because the expected number of rows to be fetched from the RHS rel
    is smaller than that list.  This doesn't seem to create any performance
    penalty for non-semijoin cases.
    
    While at it, we can avoid doing fmgr_info twice too.
    I considered also avoiding duplicate InitFunctionCallInfoData
    calls, but desisted: that wouldn't save very much, and in my
    tests it looks like there may be some performance advantage
    if fcinfo is a local variable.
    tglsfdc authored and Commitfest Bot committed Nov 17, 2025
    564e521
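The following simplified C sketch shows the reuse rule. The semijoin pass reuses the inner-join match flags unless the RHS list was clamped, and only then rematches against the shortened list. The sketch makes these assumptions:

* All names (`match_mcvs`, `semi_hasmatch1`) are hypothetical.
* `int` equality stands in for the join operator.
* `clamped_n2` plays the role of the clamped RHS MCV count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical O(n1 * n2) matcher; int equality stands in for the operator. */
static int
match_mcvs(const int *v1, int n1, const int *v2, int n2,
		   bool *hasmatch1, bool *hasmatch2)
{
	int			nmatches = 0;

	for (int i = 0; i < n1; i++)
		hasmatch1[i] = false;
	for (int j = 0; j < n2; j++)
		hasmatch2[j] = false;

	for (int i = 0; i < n1; i++)
	{
		for (int j = 0; j < n2; j++)
		{
			if (!hasmatch2[j] && v1[i] == v2[j])
			{
				hasmatch1[i] = hasmatch2[j] = true;
				nmatches++;
				break;
			}
		}
	}
	return nmatches;
}

/*
 * Return the LHS match flags for the semijoin estimate.  If clamping did
 * not shorten the RHS list, reuse the inner-join results; otherwise
 * rematch against only the first clamped_n2 RHS entries, using the
 * caller's scratch arrays work1 (size n1) and work2 (size n2).
 */
static const bool *
semi_hasmatch1(const int *v1, int n1, const int *v2, int n2, int clamped_n2,
			   const bool *inner_hasmatch1, bool *work1, bool *work2)
{
	assert(clamped_n2 >= 0 && clamped_n2 <= n2);
	if (clamped_n2 == n2)
		return inner_hasmatch1;	/* common case: no second matching pass */
	match_mcvs(v1, n1, v2, clamped_n2, work1, work2);
	return work1;
}
```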
  5. Use hashing to avoid O(N^2) matching work in eqjoinsel.

    Use a simplehash hash table if there are enough MCVs and the
    join operator has associated hash functions.  The threshold
    for switching to hash mode perhaps could use more research.
    tglsfdc authored and Commitfest Bot committed Nov 17, 2025
    b5ff78c
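The following hand-rolled C sketch shows hash-based MCV matching. It builds a table over the RHS MCVs and probes it once per LHS MCV, which gives expected O(n1 + n2) work instead of O(n1 * n2). The sketch makes these assumptions:

* Values are `int`s.
* A linear-probing table replaces PostgreSQL's simplehash.
* `HASH_THRESHOLD`, `use_hash_matching`, `hash_int`, and `match_mcvs_hashed` are hypothetical names.
* The threshold value and the switching rule are placeholders, not the patch's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cutoff; the commit says the real threshold needs more research. */
#define HASH_THRESHOLD 100

/* Hypothetical rule: hash only if both lists are long and the operator can hash. */
static bool
use_hash_matching(int n1, int n2, bool op_has_hash_funcs)
{
	int			shorter = n1 < n2 ? n1 : n2;

	return op_has_hash_funcs && shorter >= HASH_THRESHOLD;
}

/* Simple multiplicative hash; illustration only. */
static uint32_t
hash_int(int v)
{
	return (uint32_t) v * 2654435761u;
}

/*
 * Mark matching MCVs using a table built over the RHS list.  Returns the
 * number of matches, or -1 if the table cannot be allocated.
 */
static int
match_mcvs_hashed(const int *v1, int n1, const int *v2, int n2,
				  bool *hasmatch1, bool *hasmatch2)
{
	size_t		size = 1;
	int		   *slots;			/* RHS index + 1 per slot; 0 means empty */
	int			nmatches = 0;

	assert(n1 >= 0 && n2 >= 0);
	while (size < 2 * (size_t) n2)	/* power of two, at least 2 * n2 */
		size <<= 1;
	slots = calloc(size, sizeof(int));
	if (slots == NULL)
		return -1;

	/* Build phase: insert each RHS MCV with linear probing. */
	for (int j = 0; j < n2; j++)
	{
		size_t		h = hash_int(v2[j]) & (size - 1);

		while (slots[h] != 0)
			h = (h + 1) & (size - 1);
		slots[h] = j + 1;
		hasmatch2[j] = false;
	}

	/* Probe phase: look up each LHS MCV until an empty slot is reached. */
	for (int i = 0; i < n1; i++)
	{
		size_t		h = hash_int(v1[i]) & (size - 1);

		hasmatch1[i] = false;
		while (slots[h] != 0)
		{
			int			j = slots[h] - 1;

			if (v2[j] == v1[i] && !hasmatch2[j])
			{
				hasmatch1[i] = hasmatch2[j] = true;
				nmatches++;
				break;
			}
			h = (h + 1) & (size - 1);
		}
	}
	free(slots);
	return nmatches;
}
```

The table is sized to at least twice the RHS list length, so an empty slot always exists and every probe sequence terminates. Probing stops at the first unmatched equal entry or at an empty slot.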
  6. [CF 5929] v6 - Optimize join selectivity estimation for tables with large number of MCVs
    
    This branch was automatically generated by a robot using patches from an
    email thread registered at:
    
    https://commitfest.postgresql.org/patch/5929
    
    The branch will be overwritten each time a new patch version is posted to
    the thread, and also periodically to check for bitrot caused by changes
    on the master branch.
    
    Patch(es): https://www.postgresql.org/message-id/841773.1763404213@sss.pgh.pa.us
    Author(s): David Geier, Ilia Evdokimov
    Commitfest Bot committed Nov 17, 2025
    91e7f60