Add support for doing late row locking in FDWs.

author Tom Lane <tgl@sss.pgh.pa.us>

Tue, 12 May 2015 18:10:10 +0000 (14:10 -0400)

committer Tom Lane <tgl@sss.pgh.pa.us>

Tue, 12 May 2015 18:10:17 +0000 (14:10 -0400)
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 33863f04f82899bf1f306e8fc7777527f23e479c..236157743a537c9b677c80e60bd1c499e6ead337 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -665,6 +665,108 @@ IsForeignRelUpdatable (Relation rel);
  
     </sect2>
  
+   <sect2 id="fdw-callbacks-row-locking">
+    <title>FDW Routines For Row Locking</title>
+
+    <para>
+     If an FDW wishes to support <firstterm>late row locking</> (as described
+     in <xref linkend="fdw-row-locking">), it must provide the following
+     callback functions:
+    </para>
+
+    <para>
+<programlisting>
+RowMarkType
+GetForeignRowMarkType (RangeTblEntry *rte,
+                       LockClauseStrength strength);
+</programlisting>
+
+     Report which row-marking option to use for a foreign table.
+     <literal>rte</> is the <structname>RangeTblEntry</> node for the table
+     and <literal>strength</> describes the lock strength requested by the
+     relevant <literal>FOR UPDATE/SHARE</> clause, if any.  The result must be
+     a member of the <literal>RowMarkType</> enum type.
+    </para>
+
+    <para>
+     This function is called during query planning for each foreign table that
+     appears in an <command>UPDATE</>, <command>DELETE</>, or <command>SELECT
+     FOR UPDATE/SHARE</> query and is not the target of <command>UPDATE</>
+     or <command>DELETE</>.
+    </para>
+
+    <para>
+     If the <function>GetForeignRowMarkType</> pointer is set to
+     <literal>NULL</>, the <literal>ROW_MARK_COPY</> option is always used.
+     (This implies that <function>RefetchForeignRow</> will never be called,
+     so it need not be provided either.)
+    </para>
+
+    <para>
+     See <xref linkend="fdw-row-locking"> for more information.
+    </para>
+
+    <para>
+<programlisting>
+HeapTuple
+RefetchForeignRow (EState *estate,
+                   ExecRowMark *erm,
+                   Datum rowid,
+                   bool *updated);
+</programlisting>
+
+     Re-fetch one tuple from the foreign table, after locking it if required.
+     <literal>estate</> is global execution state for the query.
+     <literal>erm</> is the <structname>ExecRowMark</> struct describing
+     the target foreign table and the row lock type (if any) to acquire.
+     <literal>rowid</> identifies the tuple to be fetched.
+     <literal>updated</> is an output parameter.
+    </para>
+
+    <para>
+     This function should return a palloc'ed copy of the fetched tuple,
+     or <literal>NULL</> if the row lock couldn't be obtained.  The row lock
+     type to acquire is defined by <literal>erm-&gt;markType</>, which is the
+     value previously returned by <function>GetForeignRowMarkType</>.
+     (<literal>ROW_MARK_REFERENCE</> means to just re-fetch the tuple without
+     acquiring any lock, and <literal>ROW_MARK_COPY</> will never be seen by
+     this routine.)
+    </para>
+
+    <para>
+     In addition, <literal>*updated</> should be set to <literal>true</>
+     if what was fetched was an updated version of the tuple rather than
+     the same version previously obtained.  (If the FDW cannot be sure about
+     this, always returning <literal>true</> is recommended.)
+    </para>
+
+    <para>
+     Note that by default, failure to acquire a row lock should result in
+     raising an error; a <literal>NULL</> return is only appropriate if
+     the <literal>SKIP LOCKED</> option is specified
+     by <literal>erm-&gt;waitPolicy</>.
+    </para>
+
+    <para>
+     The <literal>rowid</> is the <structfield>ctid</> value previously read
+     for the row to be re-fetched.  Although the <literal>rowid</> value is
+     passed as a <type>Datum</>, it can currently only be a <type>tid</>.  The
+     function API is chosen in hopes that it may be possible to allow other
+     datatypes for row IDs in future.
+    </para>
+
+    <para>
+     If the <function>RefetchForeignRow</> pointer is set to
+     <literal>NULL</>, attempts to re-fetch rows will fail
+     with an error message.
+    </para>
+
+    <para>
+     See <xref linkend="fdw-row-locking"> for more information.
+    </para>
+
+   </sect2>
+
     <sect2 id="fdw-callbacks-explain">
      <title>FDW Routines for <command>EXPLAIN</></title>
  
@@ -1092,24 +1194,6 @@ GetForeignServerByName(const char *name, bool missing_ok);
       structures that <function>copyObject</> knows how to copy.
      </para>
  
-    <para>
-     For an <command>UPDATE</> or <command>DELETE</> against an external data
-     source that supports concurrent updates, it is recommended that the
-     <literal>ForeignScan</> operation lock the rows that it fetches, perhaps
-     via the equivalent of <command>SELECT FOR UPDATE</>.  The FDW may also
-     choose to lock rows at fetch time when the foreign table is referenced
-     in a <command>SELECT FOR UPDATE/SHARE</>; if it does not, the
-     <literal>FOR UPDATE</> or <literal>FOR SHARE</> option is essentially a
-     no-op so far as the foreign table is concerned.  This behavior may yield
-     semantics slightly different from operations on local tables, where row
-     locking is customarily delayed as long as possible: remote rows may get
-     locked even though they subsequently fail locally-applied restriction or
-     join conditions.  However, matching the local semantics exactly would
-     require an additional remote access for every row, and might be
-     impossible anyway depending on what locking semantics the external data
-     source provides.
-    </para>
-
      <para>
       <command>INSERT</> with an <literal>ON CONFLICT</> clause does not
       support specifying the conflict target, as remote constraints are not
@@ -1117,6 +1201,118 @@ GetForeignServerByName(const char *name, bool missing_ok);
       UPDATE</> is not supported, since the specification is mandatory there.
      </para>
  
+   </sect1>
+
+   <sect1 id="fdw-row-locking">
+    <title>Row Locking in Foreign Data Wrappers</title>
+
+    <para>
+     If an FDW's underlying storage mechanism has a concept of locking
+     individual rows to prevent concurrent updates of those rows, it is
+     usually worthwhile for the FDW to perform row-level locking with as
+     close an approximation as practical to the semantics used in
+     ordinary <productname>PostgreSQL</> tables.  There are multiple
+     considerations involved in this.
+    </para>
+
+    <para>
+     One key decision to be made is whether to perform <firstterm>early
+     locking</> or <firstterm>late locking</>.  In early locking, a row is
+     locked when it is first retrieved from the underlying store, while in
+     late locking, the row is locked only when it is known that it needs to
+     be locked.  (The difference arises because some rows may be discarded by
+     locally-checked restriction or join conditions.)  Early locking is much
+     simpler and avoids extra round trips to a remote store, but it can cause
+     locking of rows that need not have been locked, resulting in reduced
+     concurrency or even unexpected deadlocks.  Also, late locking is only
+     possible if the row to be locked can be uniquely re-identified later.
+     Preferably the row identifier should identify a specific version of the
+     row, as <productname>PostgreSQL</> TIDs do.
+    </para>
+
+    <para>
+     By default, <productname>PostgreSQL</> ignores locking considerations
+     when interfacing to FDWs, but an FDW can perform early locking without
+     any explicit support from the core code.  The API functions described
+     in <xref linkend="fdw-callbacks-row-locking">, which were added
+     in <productname>PostgreSQL</> 9.5, allow an FDW to use late locking if
+     it wishes.
+    </para>
+
+    <para>
+     An additional consideration is that in <literal>READ COMMITTED</>
+     isolation mode, <productname>PostgreSQL</> may need to re-check
+     restriction and join conditions against an updated version of some
+     target tuple.  Rechecking join conditions requires re-obtaining copies
+     of the non-target rows that were previously joined to the target tuple.
+     When working with standard <productname>PostgreSQL</> tables, this is
+     done by including the TIDs of the non-target tables in the column list
+     projected through the join, and then re-fetching non-target rows when
+     required.  This approach keeps the join data set compact, but it
+     requires inexpensive re-fetch capability, as well as a TID that can
+     uniquely identify the row version to be re-fetched.  By default,
+     therefore, the approach used with foreign tables is to include a copy of
+     the entire row fetched from a foreign table in the column list projected
+     through the join.  This puts no special demands on the FDW but can
+     result in reduced performance of merge and hash joins.  An FDW that is
+     capable of meeting the re-fetch requirements can choose to do it the
+     first way.
+    </para>
+
+    <para>
+     For an <command>UPDATE</> or <command>DELETE</> on a foreign table, it
+     is recommended that the <literal>ForeignScan</> operation on the target
+     table perform early locking on the rows that it fetches, perhaps via the
+     equivalent of <command>SELECT FOR UPDATE</>.  An FDW can detect whether
+     a table is an <command>UPDATE</>/<command>DELETE</> target at plan time
+     by comparing its relid to <literal>root-&gt;parse-&gt;resultRelation</>,
+     or at execution time by using <function>ExecRelationIsTargetRelation()</>.
+     An alternative possibility is to perform late locking within the
+     <function>ExecForeignUpdate</> or <function>ExecForeignDelete</>
+     callback, but no special support is provided for this.
+    </para>
+
+    <para>
+     For foreign tables that are specified to be locked by a <command>SELECT
+     FOR UPDATE/SHARE</> command, the <literal>ForeignScan</> operation can
+     again perform early locking by fetching tuples with the equivalent
+     of <command>SELECT FOR UPDATE/SHARE</>.  To perform late locking
+     instead, provide the callback functions defined
+     in <xref linkend="fdw-callbacks-row-locking">.
+     In <function>GetForeignRowMarkType</>, select rowmark option
+     <literal>ROW_MARK_EXCLUSIVE</>, <literal>ROW_MARK_NOKEYEXCLUSIVE</>,
+     <literal>ROW_MARK_SHARE</>, or <literal>ROW_MARK_KEYSHARE</> depending
+     on the requested lock strength.  (The core code will act the same
+     regardless of which of these four options you choose.)
+     Elsewhere, you can detect whether a foreign table was specified to be
+     locked by this type of command by using <function>get_plan_rowmark</> at
+     plan time, or <function>ExecFindRowMark</> at execution time; you must
+     check not only whether a non-null rowmark struct is returned, but that
+     its <structfield>strength</> field is not <literal>LCS_NONE</>.
+    </para>
+
+    <para>
+     Lastly, for foreign tables that are used in an <command>UPDATE</>,
+     <command>DELETE</> or <command>SELECT FOR UPDATE/SHARE</> command but
+     are not specified to be row-locked, you can override the default choice
+     to copy entire rows by having <function>GetForeignRowMarkType</> select
+     option <literal>ROW_MARK_REFERENCE</> when it sees lock strength
+     <literal>LCS_NONE</>.  This will cause <function>RefetchForeignRow</> to
+     be called with that value for <structfield>markType</>; it should then
+     re-fetch the row without acquiring any new lock.  (If you have
+     a <function>GetForeignRowMarkType</> function but don't wish to re-fetch
+     unlocked rows, select option <literal>ROW_MARK_COPY</>
+     for <literal>LCS_NONE</>.)
+    </para>
+
+    <para>
+     See <filename>src/include/nodes/lockoptions.h</>, the comments
+     for <type>RowMarkType</> and <type>PlanRowMark</>
+     in <filename>src/include/nodes/plannodes.h</>, and the comments for
+     <type>ExecRowMark</> in <filename>src/include/nodes/execnodes.h</> for
+     additional information.
+    </para>
+
    </sect1>
  
   </chapter>
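
The documentation hunk above describes how GetForeignRowMarkType should map the requested lock strength to a row-mark option. The following is a minimal, self-contained sketch of that mapping, not code from this commit. The two enums are stand-ins that mirror LockClauseStrength and RowMarkType so the example compiles without the PostgreSQL headers; demoGetForeignRowMarkType and demo_can_refetch_cheaply are hypothetical names. A real FDW would include the server headers and install its function in the FdwRoutine struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in mirroring LockClauseStrength (src/include/nodes/lockoptions.h). */
typedef enum LockClauseStrength
{
    LCS_NONE,
    LCS_FORKEYSHARE,
    LCS_FORSHARE,
    LCS_FORNOKEYUPDATE,
    LCS_FORUPDATE
} LockClauseStrength;

/* Stand-in mirroring RowMarkType (src/include/nodes/plannodes.h). */
typedef enum RowMarkType
{
    ROW_MARK_EXCLUSIVE,
    ROW_MARK_NOKEYEXCLUSIVE,
    ROW_MARK_SHARE,
    ROW_MARK_KEYSHARE,
    ROW_MARK_REFERENCE,
    ROW_MARK_COPY
} RowMarkType;

/* Opaque stand-in; this sketch never looks inside the RTE. */
typedef struct RangeTblEntry RangeTblEntry;

/* Hypothetical knob: can the remote store re-fetch a row by ID cheaply? */
static const bool demo_can_refetch_cheaply = true;

/*
 * Hypothetical callback: choose late locking for rows that are to be
 * locked, and choose between re-fetching and copying for rows that are not.
 */
static RowMarkType
demoGetForeignRowMarkType(RangeTblEntry *rte, LockClauseStrength strength)
{
    (void) rte;                 /* unused in this sketch */

    switch (strength)
    {
        case LCS_NONE:
            /* Not row-locked: re-fetch by row ID, or fall back to copying. */
            return demo_can_refetch_cheaply ? ROW_MARK_REFERENCE : ROW_MARK_COPY;
        case LCS_FORKEYSHARE:
            return ROW_MARK_KEYSHARE;
        case LCS_FORSHARE:
            return ROW_MARK_SHARE;
        case LCS_FORNOKEYUPDATE:
            return ROW_MARK_NOKEYEXCLUSIVE;
        case LCS_FORUPDATE:
            return ROW_MARK_EXCLUSIVE;
    }
    assert(!"unrecognized LockClauseStrength");
    return ROW_MARK_COPY;       /* keep compiler quiet */
}
```

Per the documentation text, the core code acts the same for any of the four locking options, so the distinction mainly matters to the FDW's own RefetchForeignRow, which reads the chosen value back from erm->markType.
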
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0dee9491788fa558dd20b0666374679f75f6a839..43d3c44c82795dd9f2fe257bf3080db415f4a71c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -898,8 +898,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
         erm->prti = rc->prti;
         erm->rowmarkId = rc->rowmarkId;
         erm->markType = rc->markType;
+       erm->strength = rc->strength;
         erm->waitPolicy = rc->waitPolicy;
+       erm->ermActive = false;
         ItemPointerSetInvalid(&(erm->curCtid));
+       erm->ermExtra = NULL;
         estate->es_rowMarks = lappend(estate->es_rowMarks, erm);
     }
  
@@ -1143,6 +1146,8 @@ CheckValidResultRel(Relation resultRel, CmdType operation)
  static void
  CheckValidRowMarkRel(Relation rel, RowMarkType markType)
  {
+   FdwRoutine *fdwroutine;
+
     switch (rel->rd_rel->relkind)
     {
         case RELKIND_RELATION:
@@ -1178,11 +1183,13 @@ CheckValidRowMarkRel(Relation rel, RowMarkType markType)
                               RelationGetRelationName(rel))));
             break;
         case RELKIND_FOREIGN_TABLE:
-           /* Should not get here; planner should have used ROW_MARK_COPY */
-           ereport(ERROR,
-                   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                    errmsg("cannot lock rows in foreign table \"%s\"",
-                           RelationGetRelationName(rel))));
+           /* Okay only if the FDW supports it */
+           fdwroutine = GetFdwRoutineForRelation(rel, false);
+           if (fdwroutine->RefetchForeignRow == NULL)
+               ereport(ERROR,
+                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                        errmsg("cannot lock rows in foreign table \"%s\"",
+                               RelationGetRelationName(rel))));
             break;
         default:
             ereport(ERROR,
@@ -2005,9 +2012,11 @@ ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo)
  
  /*
   * ExecFindRowMark -- find the ExecRowMark struct for given rangetable index
+ *
+ * If no such struct, either return NULL or throw error depending on missing_ok
   */
  ExecRowMark *
-ExecFindRowMark(EState *estate, Index rti)
+ExecFindRowMark(EState *estate, Index rti, bool missing_ok)
  {
     ListCell   *lc;
  
@@ -2018,8 +2027,9 @@ ExecFindRowMark(EState *estate, Index rti)
         if (erm->rti == rti)
             return erm;
     }
-   elog(ERROR, "failed to find ExecRowMark for rangetable index %u", rti);
-   return NULL;                /* keep compiler quiet */
+   if (!missing_ok)
+       elog(ERROR, "failed to find ExecRowMark for rangetable index %u", rti);
+   return NULL;
  }
  
  /*
@@ -2530,7 +2540,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
  
         if (erm->markType == ROW_MARK_REFERENCE)
         {
-           Buffer      buffer;
+           HeapTuple   copyTuple;
  
             Assert(erm->relation != NULL);
  
@@ -2541,17 +2551,50 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
             /* non-locked rels could be on the inside of outer joins */
             if (isNull)
                 continue;
-           tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
  
-           /* okay, fetch the tuple */
-           if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer,
-                           false, NULL))
-               elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
+           /* fetch requests on foreign tables must be passed to their FDW */
+           if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+           {
+               FdwRoutine *fdwroutine;
+               bool        updated = false;
  
-           /* successful, copy and store tuple */
-           EvalPlanQualSetTuple(epqstate, erm->rti,
-                                heap_copytuple(&tuple));
-           ReleaseBuffer(buffer);
+               fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+               /* this should have been checked already, but let's be safe */
+               if (fdwroutine->RefetchForeignRow == NULL)
+                   ereport(ERROR,
+                           (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                            errmsg("cannot lock rows in foreign table \"%s\"",
+                                   RelationGetRelationName(erm->relation))));
+               copyTuple = fdwroutine->RefetchForeignRow(epqstate->estate,
+                                                         erm,
+                                                         datum,
+                                                         &updated);
+               if (copyTuple == NULL)
+                   elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
+
+               /*
+                * Ideally we'd insist on updated == false here, but that
+                * assumes that FDWs can track that exactly, which they might
+                * not be able to.  So just ignore the flag.
+                */
+           }
+           else
+           {
+               /* ordinary table, fetch the tuple */
+               Buffer      buffer;
+
+               tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
+               if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer,
+                               false, NULL))
+                   elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
+
+               /* successful, copy tuple */
+               copyTuple = heap_copytuple(&tuple);
+               ReleaseBuffer(buffer);
+           }
+
+           /* store tuple */
+           EvalPlanQualSetTuple(epqstate, erm->rti, copyTuple);
         }
         else
         {
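
The ExecFindRowMark change in this file adds a missing_ok flag so that callers can probe for a rowmark without triggering an error. The sketch below illustrates that lookup pattern in isolation, under loud assumptions: DemoRowMark, demo_find_row_mark, and demo_relation_already_locked are hypothetical stand-ins, a plain array replaces the es_rowMarks list, and abort() stands in for elog(ERROR). It is not the executor's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef unsigned int Index;

/* Hypothetical, trimmed-down stand-in for ExecRowMark. */
typedef struct DemoRowMark
{
    Index       rti;            /* range table index */
    void       *relation;       /* NULL for virtual relations */
} DemoRowMark;

/*
 * Find the rowmark for a range table index.  If there is none, return NULL
 * when missing_ok is true; otherwise fail hard (abort() stands in for
 * elog(ERROR) here).
 */
static DemoRowMark *
demo_find_row_mark(DemoRowMark *marks, size_t nmarks, Index rti,
                   bool missing_ok)
{
    size_t      i;

    assert(marks != NULL || nmarks == 0);

    for (i = 0; i < nmarks; i++)
    {
        if (marks[i].rti == rti)
            return &marks[i];
    }
    if (!missing_ok)
    {
        fprintf(stderr,
                "failed to find rowmark for rangetable index %u\n", rti);
        abort();
    }
    return NULL;
}

/*
 * Probe in the style of the simplified check in ExecOpenScanRelation: a
 * relation counts as already locked only if it has a rowmark whose
 * relation pointer is set.
 */
static bool
demo_relation_already_locked(DemoRowMark *marks, size_t nmarks,
                             Index scanrelid)
{
    DemoRowMark *erm = demo_find_row_mark(marks, nmarks, scanrelid, true);

    return erm != NULL && erm->relation != NULL;
}
```

Callers that require the rowmark to exist, such as the ExecInitLockRows and ExecInitModifyTable call sites in this commit, pass false and keep the previous error behavior.
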
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 88ba16bc6dabbf5c2fb5877e74bf2e7b0f9c1338..0da8e53e816c68aed95e6f0ada981b6db581828e 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -805,20 +805,11 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
         lockmode = NoLock;
     else
     {
-       ListCell   *l;
+       /* Keep this check in sync with InitPlan! */
+       ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true);
  
-       foreach(l, estate->es_rowMarks)
-       {
-           ExecRowMark *erm = lfirst(l);
-
-           /* Keep this check in sync with InitPlan! */
-           if (erm->rti == scanrelid &&
-               erm->relation != NULL)
-           {
-               lockmode = NoLock;
-               break;
-           }
-       }
+       if (erm != NULL && erm->relation != NULL)
+           lockmode = NoLock;
     }
  
     /* Open the relation and acquire lock as needed */
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 5ae106c06ad0480859bf968e95925a428fbfb704..7bcf99f48890ea6507bf5d36101339dbe8d24e56 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -25,6 +25,7 @@
  #include "access/xact.h"
  #include "executor/executor.h"
  #include "executor/nodeLockRows.h"
+#include "foreign/fdwapi.h"
  #include "storage/bufmgr.h"
  #include "utils/rel.h"
  #include "utils/tqual.h"
@@ -40,7 +41,7 @@ ExecLockRows(LockRowsState *node)
     TupleTableSlot *slot;
     EState     *estate;
     PlanState  *outerPlan;
-   bool        epq_started;
+   bool        epq_needed;
     ListCell   *lc;
  
     /*
@@ -58,15 +59,18 @@ lnext:
     if (TupIsNull(slot))
         return NULL;
  
+   /* We don't need EvalPlanQual unless we get updated tuple version(s) */
+   epq_needed = false;
+
     /*
      * Attempt to lock the source tuple(s).  (Note we only have locking
      * rowmarks in lr_arowMarks.)
      */
-   epq_started = false;
     foreach(lc, node->lr_arowMarks)
     {
         ExecAuxRowMark *aerm = (ExecAuxRowMark *) lfirst(lc);
         ExecRowMark *erm = aerm->rowmark;
+       HeapTuple  *testTuple;
         Datum       datum;
         bool        isNull;
         HeapTupleData tuple;
@@ -77,8 +81,10 @@ lnext:
         HeapTuple   copyTuple;
  
         /* clear any leftover test tuple for this rel */
-       if (node->lr_epqstate.estate != NULL)
-           EvalPlanQualSetTuple(&node->lr_epqstate, erm->rti, NULL);
+       testTuple = &(node->lr_curtuples[erm->rti - 1]);
+       if (*testTuple != NULL)
+           heap_freetuple(*testTuple);
+       *testTuple = NULL;
  
         /* if child rel, must check whether it produced this row */
         if (erm->rti != erm->prti)
@@ -97,10 +103,12 @@ lnext:
             if (tableoid != erm->relid)
             {
                 /* this child is inactive right now */
+               erm->ermActive = false;
                 ItemPointerSetInvalid(&(erm->curCtid));
                 continue;
             }
         }
+       erm->ermActive = true;
  
         /* fetch the tuple's ctid */
         datum = ExecGetJunkAttribute(slot,
@@ -109,9 +117,45 @@ lnext:
         /* shouldn't ever get a null result... */
         if (isNull)
             elog(ERROR, "ctid is NULL");
-       tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
+
+       /* requests for foreign tables must be passed to their FDW */
+       if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+       {
+           FdwRoutine *fdwroutine;
+           bool        updated = false;
+
+           fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+           /* this should have been checked already, but let's be safe */
+           if (fdwroutine->RefetchForeignRow == NULL)
+               ereport(ERROR,
+                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                        errmsg("cannot lock rows in foreign table \"%s\"",
+                               RelationGetRelationName(erm->relation))));
+           copyTuple = fdwroutine->RefetchForeignRow(estate,
+                                                     erm,
+                                                     datum,
+                                                     &updated);
+           if (copyTuple == NULL)
+           {
+               /* couldn't get the lock, so skip this row */
+               goto lnext;
+           }
+
+           /* save locked tuple for possible EvalPlanQual testing below */
+           *testTuple = copyTuple;
+
+           /*
+            * if FDW says tuple was updated before getting locked, we need to
+            * perform EPQ testing to see if quals are still satisfied
+            */
+           if (updated)
+               epq_needed = true;
+
+           continue;
+       }
  
         /* okay, try to lock the tuple */
+       tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
         switch (erm->markType)
         {
             case ROW_MARK_EXCLUSIVE:
@@ -191,40 +235,11 @@ lnext:
                 /* remember the actually locked tuple's TID */
                 tuple.t_self = copyTuple->t_self;
  
-               /*
-                * Need to run a recheck subquery.  Initialize EPQ state if we
-                * didn't do so already.
-                */
-               if (!epq_started)
-               {
-                   ListCell   *lc2;
+               /* Save locked tuple for EvalPlanQual testing below */
+               *testTuple = copyTuple;
  
-                   EvalPlanQualBegin(&node->lr_epqstate, estate);
-
-                   /*
-                    * Ensure that rels with already-visited rowmarks are told
-                    * not to return tuples during the first EPQ test.  We can
-                    * exit this loop once it reaches the current rowmark;
-                    * rels appearing later in the list will be set up
-                    * correctly by the EvalPlanQualSetTuple call at the top
-                    * of the loop.
-                    */
-                   foreach(lc2, node->lr_arowMarks)
-                   {
-                       ExecAuxRowMark *aerm2 = (ExecAuxRowMark *) lfirst(lc2);
-
-                       if (lc2 == lc)
-                           break;
-                       EvalPlanQualSetTuple(&node->lr_epqstate,
-                                            aerm2->rowmark->rti,
-                                            NULL);
-                   }
-
-                   epq_started = true;
-               }
-
-               /* Store target tuple for relation's scan node */
-               EvalPlanQualSetTuple(&node->lr_epqstate, erm->rti, copyTuple);
+               /* Remember we need to do EPQ testing */
+               epq_needed = true;
  
                 /* Continue loop until we have all target tuples */
                 break;
@@ -237,17 +252,35 @@ lnext:
                      test);
         }
  
-       /* Remember locked tuple's TID for WHERE CURRENT OF */
+       /* Remember locked tuple's TID for EPQ testing and WHERE CURRENT OF */
         erm->curCtid = tuple.t_self;
     }
  
     /*
      * If we need to do EvalPlanQual testing, do so.
      */
-   if (epq_started)
+   if (epq_needed)
     {
+       int         i;
+
+       /* Initialize EPQ machinery */
+       EvalPlanQualBegin(&node->lr_epqstate, estate);
+
+       /*
+        * Transfer already-fetched tuples into the EPQ state, and make sure
+        * its test tuples for other tables are reset to NULL.
+        */
+       for (i = 0; i < node->lr_ntables; i++)
+       {
+           EvalPlanQualSetTuple(&node->lr_epqstate,
+                                i + 1,
+                                node->lr_curtuples[i]);
+           /* freeing this tuple is now the responsibility of EPQ */
+           node->lr_curtuples[i] = NULL;
+       }
+
         /*
-        * First, fetch a copy of any rows that were successfully locked
+        * Next, fetch a copy of any rows that were successfully locked
          * without any update having occurred.  (We do this in a separate pass
          * so as to avoid overhead in the common case where there are no
          * concurrent updates.)
@@ -260,7 +293,7 @@ lnext:
             Buffer      buffer;
  
             /* ignore non-active child tables */
-           if (!ItemPointerIsValid(&(erm->curCtid)))
+           if (!erm->ermActive)
             {
                 Assert(erm->rti != erm->prti);  /* check it's child table */
                 continue;
@@ -269,6 +302,10 @@ lnext:
             if (EvalPlanQualGetTuple(&node->lr_epqstate, erm->rti) != NULL)
                 continue;       /* it was updated and fetched above */
  
+           /* foreign tables should have been fetched above */
+           Assert(erm->relation->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+           Assert(ItemPointerIsValid(&(erm->curCtid)));
+
             /* okay, fetch the tuple */
             tuple.t_self = erm->curCtid;
             if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer,
@@ -351,6 +388,13 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
     ExecAssignResultTypeFromTL(&lrstate->ps);
     lrstate->ps.ps_ProjInfo = NULL;
  
+   /*
+    * Create workspace in which we can remember per-RTE locked tuples
+    */
+   lrstate->lr_ntables = list_length(estate->es_range_table);
+   lrstate->lr_curtuples = (HeapTuple *)
+       palloc0(lrstate->lr_ntables * sizeof(HeapTuple));
+
     /*
      * Locate the ExecRowMark(s) that this node is responsible for, and
      * construct ExecAuxRowMarks for them.  (InitPlan should already have
@@ -370,8 +414,11 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)
         if (rc->isParent)
             continue;
  
+       /* safety check on size of lr_curtuples array */
+       Assert(rc->rti > 0 && rc->rti <= lrstate->lr_ntables);
+
         /* find ExecRowMark and build ExecAuxRowMark */
-       erm = ExecFindRowMark(estate, rc->rti);
+       erm = ExecFindRowMark(estate, rc->rti, false);
         aerm = ExecBuildAuxRowMark(erm, outerPlan->targetlist);
  
         /*
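
ExecLockRows, as changed above, relies on the RefetchForeignRow contract from the documentation hunk: return a copy of the row, return NULL only when SKIP LOCKED applies, raise an error on any other lock failure, and report through *updated whether a newer row version was fetched. The sketch below models that contract with stand-in types so it can run without the server. DemoRow, demo_refetch_row, and the raised_error output (which replaces ereport(ERROR)) are hypothetical; the enums mirror RowMarkType and LockWaitPolicy; malloc stands in for palloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in mirroring RowMarkType (src/include/nodes/plannodes.h). */
typedef enum RowMarkType
{
    ROW_MARK_EXCLUSIVE,
    ROW_MARK_NOKEYEXCLUSIVE,
    ROW_MARK_SHARE,
    ROW_MARK_KEYSHARE,
    ROW_MARK_REFERENCE,
    ROW_MARK_COPY
} RowMarkType;

/* Stand-in mirroring LockWaitPolicy (src/include/nodes/lockoptions.h). */
typedef enum LockWaitPolicy
{
    LockWaitBlock,
    LockWaitSkip,
    LockWaitError
} LockWaitPolicy;

/* Hypothetical remote row: a version counter plus a lock-conflict flag. */
typedef struct DemoRow
{
    int         version;
    int         payload;
    bool        lock_unobtainable;  /* another session holds a conflicting lock */
} DemoRow;

/*
 * Hypothetical re-fetch routine.  Returns a malloc'ed copy of the row, or
 * NULL if the lock could not be obtained.  *raised_error is set where a
 * real FDW would raise an error instead of returning.
 */
static DemoRow *
demo_refetch_row(const DemoRow *remote, int version_seen,
                 RowMarkType markType, LockWaitPolicy waitPolicy,
                 bool *updated, bool *raised_error)
{
    DemoRow    *copy;

    assert(markType != ROW_MARK_COPY);  /* never seen by this routine */
    *updated = false;
    *raised_error = false;

    /* ROW_MARK_REFERENCE means re-fetch only; every other type locks. */
    if (markType != ROW_MARK_REFERENCE && remote->lock_unobtainable)
    {
        /* A NULL return without an error is only valid for SKIP LOCKED. */
        if (waitPolicy != LockWaitSkip)
            *raised_error = true;
        return NULL;
    }

    copy = malloc(sizeof(DemoRow));
    if (copy == NULL)
    {
        *raised_error = true;
        return NULL;
    }
    *copy = *remote;

    /* Tell the caller whether it got a newer version than it saw before. */
    *updated = (remote->version != version_seen);
    return copy;
}
```

On the caller side in the hunk above, a NULL result makes ExecLockRows skip the row via goto lnext, and a true *updated sets epq_needed so that the quals are rechecked against the newly fetched tuple.
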
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 34435c7e50abac419accbf4a9d1d8fd154b1f1c2..aec415109468d5ff6eebffcf23fcf8875f07a47d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1720,7 +1720,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
             continue;
  
         /* find ExecRowMark (same for all subplans) */
-       erm = ExecFindRowMark(estate, rc->rti);
+       erm = ExecFindRowMark(estate, rc->rti, false);
  
         /* build ExecAuxRowMark for each subplan */
         for (i = 0; i < nplans; i++)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c80d45acaa95d1f75f287c8e6b19d6a106517da3..8de57c8e6bb68db1c661a333b0739b04a83e1ca9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -20,6 +20,7 @@
  #include "access/htup_details.h"
  #include "executor/executor.h"
  #include "executor/nodeAgg.h"
+#include "foreign/fdwapi.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #ifdef OPTIMIZER_DEBUG
@@ -2324,7 +2325,12 @@ select_rowmark_type(RangeTblEntry *rte, LockClauseStrength strength)
     }
     else if (rte->relkind == RELKIND_FOREIGN_TABLE)
     {
-       /* For now, we force all foreign tables to use ROW_MARK_COPY */
+       /* Let the FDW select the rowmark type, if it wants to */
+       FdwRoutine *fdwroutine = GetFdwRoutineByRelId(rte->relid);
+
+       if (fdwroutine->GetForeignRowMarkType != NULL)
+           return fdwroutine->GetForeignRowMarkType(rte, strength);
+       /* Otherwise, use ROW_MARK_COPY by default */
         return ROW_MARK_COPY;
     }
     else
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6c646091976ce84629fd478d67162b64c3ba8c2f..e60ab9fd963ba8a090eddb8ad2dea5cf492b9fa7 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -196,7 +196,7 @@ extern void ExecConstraints(ResultRelInfo *resultRelInfo,
  extern void ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo,
                      TupleTableSlot *slot, EState *estate);
  extern LockTupleMode ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo);
-extern ExecRowMark *ExecFindRowMark(EState *estate, Index rti);
+extern ExecRowMark *ExecFindRowMark(EState *estate, Index rti, bool missing_ok);
  extern ExecAuxRowMark *ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist);
  extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate,
              Relation relation, Index rti, int lockmode,
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 511c96b093275a697c7055c53cfeae3069905c13..69b48b46778e0178b07c9c23ed7ddf1d7b24f687 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -89,6 +89,14 @@ typedef void (*EndForeignModify_function) (EState *estate,
  
  typedef int (*IsForeignRelUpdatable_function) (Relation rel);
  
+typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
+                                               LockClauseStrength strength);
+
+typedef HeapTuple (*RefetchForeignRow_function) (EState *estate,
+                                                            ExecRowMark *erm,
+                                                            Datum rowid,
+                                                            bool *updated);
+
  typedef void (*ExplainForeignScan_function) (ForeignScanState *node,
                                                     struct ExplainState *es);
  
@@ -151,6 +159,10 @@ typedef struct FdwRoutine
     EndForeignModify_function EndForeignModify;
     IsForeignRelUpdatable_function IsForeignRelUpdatable;
  
+   /* Functions for SELECT FOR UPDATE/SHARE row locking */
+   GetForeignRowMarkType_function GetForeignRowMarkType;
+   RefetchForeignRow_function RefetchForeignRow;
+
     /* Support functions for EXPLAIN */
     ExplainForeignScan_function ExplainForeignScan;
     ExplainForeignModify_function ExplainForeignModify;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h

index 9de6d1484eb7a22e9ea0a3f36738df8b31a20d7a..5ad2cc235883b973fb8b232feb96af1910842e53 100644 (file)
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -429,8 +429,11 @@ typedef struct EState
   * parent RTEs, which can be ignored at runtime).  Virtual relations such as
   * subqueries-in-FROM will have an ExecRowMark with relation == NULL.  See
   * PlanRowMark for details about most of the fields.  In addition to fields
- * directly derived from PlanRowMark, we store curCtid, which is used by the
- * WHERE CURRENT OF code.
+ * directly derived from PlanRowMark, we store an activity flag (to denote
+ * inactive children of inheritance trees), curCtid, which is used by the
+ * WHERE CURRENT OF code, and ermExtra, which is available for use by the plan
+ * node that sources the relation (e.g., for a foreign table the FDW can use
+ * ermExtra to hold information).
   *
   * EState->es_rowMarks is a list of these structs.
   */
@@ -442,8 +445,11 @@ typedef struct ExecRowMark
     Index       prti;           /* parent range table index, if child */
     Index       rowmarkId;      /* unique identifier for resjunk columns */
     RowMarkType markType;       /* see enum in nodes/plannodes.h */
+   LockClauseStrength strength;    /* LockingClause's strength, or LCS_NONE */
     LockWaitPolicy waitPolicy;  /* NOWAIT and SKIP LOCKED */
+   bool        ermActive;      /* is this mark relevant for current tuple? */
     ItemPointerData curCtid;    /* ctid of currently locked tuple, if any */
+   void       *ermExtra;       /* available for use by relation source node */
  } ExecRowMark;
  
  /*
@@ -1921,6 +1927,8 @@ typedef struct LockRowsState
     PlanState   ps;             /* its first field is NodeTag */
     List       *lr_arowMarks;   /* List of ExecAuxRowMarks */
     EPQState    lr_epqstate;    /* for evaluating EvalPlanQual rechecks */
+   HeapTuple  *lr_curtuples;   /* locked tuples (one entry per RT entry) */
+   int         lr_ntables;     /* length of lr_curtuples[] array */
  } LockRowsState;
  
  /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h

index 9313292222afb97dd5e8b49f322073c5361d8476..1494b336c22b9c1bbaeb942936ee790e40f8ef6c 100644 (file)
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -822,16 +822,16 @@ typedef struct Limit
   *
   * The first four of these values represent different lock strengths that
   * we can take on tuples according to SELECT FOR [KEY] UPDATE/SHARE requests.
- * We only support these on regular tables.  For foreign tables, any locking
- * that might be done for these requests must happen during the initial row
- * fetch; there is no mechanism for going back to lock a row later (and thus
- * no need for EvalPlanQual machinery during updates of foreign tables).
+ * We support these on regular tables, as well as on foreign tables whose FDWs
+ * report support for late locking.  For other foreign tables, any locking
+ * that might be done for such requests must happen during the initial row
+ * fetch; their FDWs provide no mechanism for going back to lock a row later.
   * This means that the semantics will be a bit different than for a local
   * table; in particular we are likely to lock more rows than would be locked
   * locally, since remote rows will be locked even if they then fail
- * locally-checked restriction or join quals.  However, the alternative of
- * doing a separate remote query to lock each selected row is extremely
- * unappealing, so let's do it like this for now.
+ * locally-checked restriction or join quals.  However, the prospect of
+ * doing a separate remote query to lock each selected row is usually pretty
+ * unappealing, so early locking remains a credible design choice for FDWs.
   *
   * When doing UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, we have to uniquely
   * identify all the source rows, not only those from the target relations, so
@@ -840,12 +840,11 @@ typedef struct Limit
   * represented by ROW_MARK_REFERENCE.  Otherwise (for example for VALUES or
   * FUNCTION scans) we have to copy the whole row value.  ROW_MARK_COPY is
   * pretty inefficient, since most of the time we'll never need the data; but
- * fortunately the case is not performance-critical in practice.  Note that
- * we use ROW_MARK_COPY for non-target foreign tables, even if the FDW has a
- * concept of rowid and so could theoretically support some form of
- * ROW_MARK_REFERENCE.  Although copying the whole row value is inefficient,
- * it's probably still faster than doing a second remote fetch, so it doesn't
- * seem worth the extra complexity to permit ROW_MARK_REFERENCE.
+ * fortunately the overhead is usually not performance-critical in practice.
+ * By default we use ROW_MARK_COPY for foreign tables, but if the FDW has
+ * a concept of rowid it can request to use ROW_MARK_REFERENCE instead.
+ * (Again, this probably doesn't make sense if a physical remote fetch is
+ * needed, but for FDWs that map to local storage it might be credible.)
   */
  typedef enum RowMarkType
  {
@@ -866,7 +865,7 @@ typedef enum RowMarkType
   * When doing UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, we create a separate
   * PlanRowMark node for each non-target relation in the query.  Relations that
   * are not specified as FOR UPDATE/SHARE are marked ROW_MARK_REFERENCE (if
- * regular tables) or ROW_MARK_COPY (if not).
+ * regular tables or supported foreign tables) or ROW_MARK_COPY (if not).
   *
   * Initially all PlanRowMarks have rti == prti and isParent == false.
   * When the planner discovers that a relation is the root of an inheritance
@@ -879,8 +878,8 @@ typedef enum RowMarkType
   * to use different markTypes).
   *
   * The planner also adds resjunk output columns to the plan that carry
- * information sufficient to identify the locked or fetched rows.  For
- * regular tables (markType != ROW_MARK_COPY), these columns are named
+ * information sufficient to identify the locked or fetched rows.  When
+ * markType != ROW_MARK_COPY, these columns are named
   *     tableoid%u          OID of table
   *     ctid%u              TID of row
   * The tableoid column is only present for an inheritance hierarchy.
doc/src/sgml/fdwhandler.sgml
src/backend/executor/execMain.c
src/backend/executor/execUtils.c
src/backend/executor/nodeLockRows.c
src/backend/executor/nodeModifyTable.c
src/backend/optimizer/plan/planner.c
src/include/executor/executor.h
src/include/foreign/fdwapi.h
src/include/nodes/execnodes.h
src/include/nodes/plannodes.h
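
The planner.c hunk in this commit lets the FDW choose the rowmark type when it supplies a GetForeignRowMarkType callback, and falls back to ROW_MARK_COPY otherwise. The following is a minimal standalone sketch of that dispatch, not code from the commit. The enum and struct definitions are simplified stand-ins for the PostgreSQL types so that the example compiles on its own, and select_foreign_rowmark, demo_get_row_mark_type, and demo_self_check are hypothetical names introduced for illustration. The strength-to-markType mapping in the demo callback is one plausible choice for an FDW that supports late locking; it is an assumption, not something the commit prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL types; NOT the real definitions. */
typedef enum LockClauseStrength
{
    LCS_NONE,
    LCS_FORKEYSHARE,
    LCS_FORSHARE,
    LCS_FORNOKEYUPDATE,
    LCS_FORUPDATE
} LockClauseStrength;

typedef enum RowMarkType
{
    ROW_MARK_EXCLUSIVE,
    ROW_MARK_NOKEYEXCLUSIVE,
    ROW_MARK_SHARE,
    ROW_MARK_KEYSHARE,
    ROW_MARK_REFERENCE,
    ROW_MARK_COPY
} RowMarkType;

typedef struct RangeTblEntry
{
    unsigned int relid;         /* stand-in for the relation OID */
} RangeTblEntry;

/* Same shape as the typedef added to fdwapi.h in the diff. */
typedef RowMarkType (*GetForeignRowMarkType_function) (RangeTblEntry *rte,
                                                LockClauseStrength strength);

/* Reduced FdwRoutine holding only the callback this sketch needs. */
typedef struct FdwRoutine
{
    GetForeignRowMarkType_function GetForeignRowMarkType;
} FdwRoutine;

/*
 * Hypothetical FDW callback: request a lock-type rowmark matching the
 * requested strength, or ROW_MARK_REFERENCE when no lock was requested.
 */
RowMarkType
demo_get_row_mark_type(RangeTblEntry *rte, LockClauseStrength strength)
{
    (void) rte;                 /* unused in this sketch */

    switch (strength)
    {
        case LCS_NONE:
            return ROW_MARK_REFERENCE;
        case LCS_FORKEYSHARE:
            return ROW_MARK_KEYSHARE;
        case LCS_FORSHARE:
            return ROW_MARK_SHARE;
        case LCS_FORNOKEYUPDATE:
            return ROW_MARK_NOKEYEXCLUSIVE;
        case LCS_FORUPDATE:
            return ROW_MARK_EXCLUSIVE;
    }
    return ROW_MARK_COPY;       /* not reached for valid strengths */
}

/*
 * Mirrors the dispatch in the planner.c hunk: ask the FDW if it provides
 * the callback, otherwise use ROW_MARK_COPY.
 */
RowMarkType
select_foreign_rowmark(const FdwRoutine *fdwroutine, RangeTblEntry *rte,
                       LockClauseStrength strength)
{
    if (fdwroutine->GetForeignRowMarkType != NULL)
        return fdwroutine->GetForeignRowMarkType(rte, strength);
    return ROW_MARK_COPY;
}

/* Small self-check exercising both branches of the dispatch. */
void
demo_self_check(void)
{
    FdwRoutine      no_callback = {NULL};
    FdwRoutine      late_locking = {demo_get_row_mark_type};
    RangeTblEntry   rte = {16384};

    assert(select_foreign_rowmark(&no_callback, &rte, LCS_FORUPDATE) == ROW_MARK_COPY);
    assert(select_foreign_rowmark(&late_locking, &rte, LCS_FORUPDATE) == ROW_MARK_EXCLUSIVE);
}
```

An FDW that leaves the callback pointer NULL keeps the earlier behavior (whole-row copy), which matches the documentation added by this commit: in that case RefetchForeignRow is never called.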