From 21734d2fb896e0ecdddd3251caa72a3576e2d415 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Sun, 10 Mar 2013 14:14:53 -0400 Subject: Support writable foreign tables. This patch adds the core-system infrastructure needed to support updates on foreign tables, and extends contrib/postgres_fdw to allow updates against remote Postgres servers. There's still a great deal of room for improvement in optimization of remote updates, but at least there's basic functionality there now. KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather heavily revised by Tom Lane. --- doc/src/sgml/ddl.sgml | 31 +- doc/src/sgml/fdwhandler.sgml | 439 ++++++++++++++++++++-- doc/src/sgml/file-fdw.sgml | 7 +- doc/src/sgml/postgres-fdw.sgml | 5 +- doc/src/sgml/ref/create_foreign_data_wrapper.sgml | 10 +- 5 files changed, 438 insertions(+), 54 deletions(-) (limited to 'doc/src') diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 207de9b1259..e9135bffaa5 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -3040,36 +3040,41 @@ ANALYZE measurement; Foreign data is accessed with help from a foreign data wrapper. A foreign data wrapper is a library that can communicate with an external data source, hiding the - details of connecting to the data source and fetching data from it. There - is a foreign data wrapper available as a contrib module, - which can read plain data files residing on the server. Other kind of - foreign data wrappers might be found as third party products. If none of - the existing foreign data wrappers suit your needs, you can write your - own; see . + details of connecting to the data source and obtaining data from it. + There are some foreign data wrappers available as contrib + modules; see . Other kinds of foreign data + wrappers might be found as third party products. If none of the existing + foreign data wrappers suit your needs, you can write your own; see . To access foreign data, you need to create a foreign server - object, which defines how to connect to a particular external data source, - according to the set of options used by a particular foreign data + object, which defines how to connect to a particular external data source + according to the set of options used by its supporting foreign data wrapper. Then you need to create one or more foreign tables, which define the structure of the remote data. A foreign table can be used in queries just like a normal table, but a foreign table has no storage in the PostgreSQL server. Whenever it is used, PostgreSQL asks the foreign data wrapper - to fetch the data from the external source. + to fetch data from the external source, or transmit data to the external + source in the case of update commands. - Accessing remote data may require authentication at the external + Accessing remote data may require authenticating to the external data source. This information can be provided by a - user mapping, which can provide additional options based + user mapping, which can provide additional data + such as user names and passwords based on the current PostgreSQL role. - Currently, foreign tables are read-only. This limitation may be fixed - in a future release. + For additional information, see + , + , + , and + . diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml index 638b6ab9ce8..e6bce195e63 100644 --- a/doc/src/sgml/fdwhandler.sgml +++ b/doc/src/sgml/fdwhandler.sgml @@ -13,14 +13,15 @@ wrapper, which consists of a set of functions that the core server calls. The foreign data wrapper is responsible for fetching data from the remote data source and returning it to the - PostgreSQL executor. This chapter outlines how - to write a new foreign data wrapper. + PostgreSQL executor. If updating foreign + tables is to be supported, the wrapper must handle that, too. + This chapter outlines how to write a new foreign data wrapper. The foreign data wrappers included in the standard distribution are good references when trying to write your own. Look into the - contrib/file_fdw subdirectory of the source tree. + contrib subdirectory of the source tree. The reference page also has some useful details. @@ -84,9 +85,19 @@ The FDW handler function returns a palloc'd FdwRoutine - struct containing pointers to the following callback functions: + struct containing pointers to the callback functions described below. + The scan-related functions are required, the rest are optional. + + The FdwRoutine struct type is declared in + src/include/foreign/fdwapi.h, which see for additional + details. + + + + FDW Routines For Scanning Foreign Tables + void @@ -96,7 +107,7 @@ GetForeignRelSize (PlannerInfo *root, Obtain relation size estimates for a foreign table. This is called - at the beginning of planning for a query involving a foreign table. + at the beginning of planning for a query that scans a foreign table. root is the planner's global information about the query; baserel is the planner's information about this table; and foreigntableid is the pg_class OID of the @@ -181,23 +192,6 @@ GetForeignPlan (PlannerInfo *root, void -ExplainForeignScan (ForeignScanState *node, - ExplainState *es); - - - Print additional EXPLAIN output for a foreign table scan. - This can just return if there is no need to print anything. - Otherwise, it should call ExplainPropertyText and - related functions to add fields to the EXPLAIN output. - The flag fields in es can be used to determine what to - print, and the state of the ForeignScanState node - can be inspected to provide run-time statistics in the EXPLAIN - ANALYZE case. - - - - -void BeginForeignScan (ForeignScanState *node, int eflags); @@ -212,6 +206,8 @@ BeginForeignScan (ForeignScanState *node, ForeignScanState node (in particular, from the underlying ForeignScan plan node, which contains any FDW-private information provided by GetForeignPlan). + eflags contains flag bits describing the executor's + operating mode for this plan node. @@ -246,9 +242,9 @@ IterateForeignScan (ForeignScanState *node); Note that PostgreSQL's executor doesn't care - whether the rows returned violate the NOT NULL - constraints which were defined on the foreign table columns - but the - planner does care, and may optimize queries incorrectly if + whether the rows returned violate any NOT NULL + constraints that were defined on the foreign table columns — but + the planner does care, and may optimize queries incorrectly if NULL values are present in a column declared not to contain them. If a NULL value is encountered when the user has declared that none should be present, it may be appropriate to raise an @@ -277,6 +273,356 @@ EndForeignScan (ForeignScanState *node); to remote servers should be cleaned up. + + + + FDW Routines For Updating Foreign Tables + + + If an FDW supports writable foreign tables, it should provide + some or all of the following callback functions depending on + the needs and capabilities of the FDW: + + + + +void +AddForeignUpdateTargets (Query *parsetree, + RangeTblEntry *target_rte, + Relation target_relation); + + + UPDATE and DELETE operations are performed + against rows previously fetched by the table-scanning functions. The + FDW may need extra information, such as a row ID or the values of + primary-key columns, to ensure that it can identify the exact row to + update or delete. To support that, this function can add extra hidden, + or junk, target columns to the list of columns that are to be + retrieved from the foreign table during an UPDATE or + DELETE. + + + + To do that, add TargetEntry items to + parsetree->targetList, containing expressions for the + extra values to be fetched. Each such entry must be marked + resjunk = true, and must have a distinct + resname that will identify it at execution time. + Avoid using names matching ctidN or + wholerowN, as the core system can + generate junk columns of these names. + + + + This function is called in the rewriter, not the planner, so the + information available is a bit different from that available to the + planning routines. + parsetree is the parse tree for the UPDATE or + DELETE command, while target_rte and + target_relation describe the target foreign table. + + + + If the AddForeignUpdateTargets pointer is set to + NULL, no extra target expressions are added. + (This will make it impossible to implement DELETE + operations, though UPDATE may still be feasible if the FDW + relies on an unchanging primary key to identify rows.) + + + + +List * +PlanForeignModify (PlannerInfo *root, + ModifyTable *plan, + Index resultRelation, + int subplan_index); + + + Perform any additional planning actions needed for an insert, update, or + delete on a foreign table. This function generates the FDW-private + information that will be attached to the ModifyTable plan + node that performs the update action. This private information must + have the form of a List, and will be delivered to + BeginForeignModify during the execution stage. + + + + root is the planner's global information about the query. + plan is the ModifyTable plan node, which is + complete except for the fdwPrivLists field. + resultRelation identifies the target foreign table by its + rangetable index. subplan_index identifies which target of + the ModifyTable plan node this is, counting from zero; + use this if you want to index into node->plans or other + substructure of the plan node. + + + + See for additional information. + + + + If the PlanForeignModify pointer is set to + NULL, no additional plan-time actions are taken, and the + fdw_private list delivered to + BeginForeignModify will be NIL. + + + + +void +BeginForeignModify (ModifyTableState *mtstate, + ResultRelInfo *rinfo, + List *fdw_private, + int subplan_index, + int eflags); + + + Begin executing a foreign table modification operation. This routine is + called during executor startup. It should perform any initialization + needed prior to the actual table modifications. Subsequently, + ExecForeignInsert, ExecForeignUpdate or + ExecForeignDelete will be called for each tuple to be + inserted, updated, or deleted. + + + + mtstate is the overall state of the + ModifyTable plan node being executed; global data about + the plan and execution state is available via this structure. + rinfo is the ResultRelInfo struct describing + the target foreign table. (The ri_FdwState field of + ResultRelInfo is available for the FDW to store any + private state it needs for this operation.) + fdw_private contains the private data generated by + PlanForeignModify, if any. + subplan_index identifies which target of + the ModifyTable plan node this is. + eflags contains flag bits describing the executor's + operating mode for this plan node. + + + + Note that when (eflags & EXEC_FLAG_EXPLAIN_ONLY) is + true, this function should not perform any externally-visible actions; + it should only do the minimum required to make the node state valid + for ExplainForeignModify and EndForeignModify. + + + + If the BeginForeignModify pointer is set to + NULL, no action is taken during executor startup. + + + + +TupleTableSlot * +ExecForeignInsert (EState *estate, + ResultRelInfo *rinfo, + TupleTableSlot *slot, + TupleTableSlot *planSlot); + + + Insert one tuple into the foreign table. + estate is global execution state for the query. + rinfo is the ResultRelInfo struct describing + the target foreign table. + slot contains the tuple to be inserted; it will match the + rowtype definition of the foreign table. + planSlot contains the tuple that was generated by the + ModifyTable plan node's subplan; it differs from + slot in possibly containing additional junk + columns. (The planSlot is typically of little interest + for INSERT cases, but is provided for completeness.) + + + + The return value is either a slot containing the data that was actually + inserted (this might differ from the data supplied, for example as a + result of trigger actions), or NULL if no row was actually inserted + (again, typically as a result of triggers). The passed-in + slot can be re-used for this purpose. + + + + The data in the returned slot is used only if the INSERT + query has a RETURNING clause. Hence, the FDW could choose + to optimize away returning some or all columns depending on the contents + of the RETURNING clause. However, some slot must be + returned to indicate success, or the query's reported rowcount will be + wrong. + + + + If the ExecForeignInsert pointer is set to + NULL, attempts to insert into the foreign table will fail + with an error message. + + + + +TupleTableSlot * +ExecForeignUpdate (EState *estate, + ResultRelInfo *rinfo, + TupleTableSlot *slot, + TupleTableSlot *planSlot); + + + Update one tuple in the foreign table. + estate is global execution state for the query. + rinfo is the ResultRelInfo struct describing + the target foreign table. + slot contains the new data for the tuple; it will match the + rowtype definition of the foreign table. + planSlot contains the tuple that was generated by the + ModifyTable plan node's subplan; it differs from + slot in possibly containing additional junk + columns. In particular, any junk columns that were requested by + AddForeignUpdateTargets will be available from this slot. + + + + The return value is either a slot containing the row as it was actually + updated (this might differ from the data supplied, for example as a + result of trigger actions), or NULL if no row was actually updated + (again, typically as a result of triggers). The passed-in + slot can be re-used for this purpose. + + + + The data in the returned slot is used only if the UPDATE + query has a RETURNING clause. Hence, the FDW could choose + to optimize away returning some or all columns depending on the contents + of the RETURNING clause. However, some slot must be + returned to indicate success, or the query's reported rowcount will be + wrong. + + + + If the ExecForeignUpdate pointer is set to + NULL, attempts to update the foreign table will fail + with an error message. + + + + +TupleTableSlot * +ExecForeignDelete (EState *estate, + ResultRelInfo *rinfo, + TupleTableSlot *slot, + TupleTableSlot *planSlot); + + + Delete one tuple from the foreign table. + estate is global execution state for the query. + rinfo is the ResultRelInfo struct describing + the target foreign table. + slot contains nothing useful upon call, but can be used to + hold the returned tuple. + planSlot contains the tuple that was generated by the + ModifyTable plan node's subplan; in particular, it will + carry any junk columns that were requested by + AddForeignUpdateTargets. The junk column(s) must be used + to identify the tuple to be deleted. + + + + The return value is either a slot containing the row that was deleted, + or NULL if no row was deleted (typically as a result of triggers). The + passed-in slot can be used to hold the tuple to be returned. + + + + The data in the returned slot is used only if the DELETE + query has a RETURNING clause. Hence, the FDW could choose + to optimize away returning some or all columns depending on the contents + of the RETURNING clause. However, some slot must be + returned to indicate success, or the query's reported rowcount will be + wrong. + + + + If the ExecForeignDelete pointer is set to + NULL, attempts to delete from the foreign table will fail + with an error message. + + + + +void +EndForeignModify (EState *estate, + ResultRelInfo *rinfo); + + + End the table update and release resources. It is normally not important + to release palloc'd memory, but for example open files and connections + to remote servers should be cleaned up. + + + + If the EndForeignModify pointer is set to + NULL, no action is taken during executor shutdown. + + + + + + FDW Routines for <command>EXPLAIN</> + + + +void +ExplainForeignScan (ForeignScanState *node, + ExplainState *es); + + + Print additional EXPLAIN output for a foreign table scan. + This function can call ExplainPropertyText and + related functions to add fields to the EXPLAIN output. + The flag fields in es can be used to determine what to + print, and the state of the ForeignScanState node + can be inspected to provide run-time statistics in the EXPLAIN + ANALYZE case. + + + + If the ExplainForeignScan pointer is set to + NULL, no additional information is printed during + EXPLAIN. + + + + +void +ExplainForeignModify (ModifyTableState *mtstate, + ResultRelInfo *rinfo, + List *fdw_private, + int subplan_index, + struct ExplainState *es); + + + Print additional EXPLAIN output for a foreign table update. + This function can call ExplainPropertyText and + related functions to add fields to the EXPLAIN output. + The flag fields in es can be used to determine what to + print, and the state of the ModifyTableState node + can be inspected to provide run-time statistics in the EXPLAIN + ANALYZE case. The first four arguments are the same as for + BeginForeignModify. + + + + If the ExplainForeignModify pointer is set to + NULL, no additional information is printed during + EXPLAIN. + + + + + + FDW Routines for <command>ANALYZE</> + bool @@ -291,6 +637,9 @@ AnalyzeForeignTable (Relation relation, to a function that will collect sample rows from the table in func, plus the estimated size of the table in pages in totalpages. Otherwise, return false. + + + If the FDW does not support collecting statistics for any tables, the AnalyzeForeignTable pointer can be set to NULL. @@ -314,11 +663,7 @@ AcquireSampleRowsFunc (Relation relation, int elevel, if the FDW does not have any concept of dead rows.) - - The FdwRoutine struct type is declared in - src/include/foreign/fdwapi.h, which see for additional - details. - + @@ -432,9 +777,10 @@ GetForeignServerByName(const char *name, bool missing_ok); The FDW callback functions GetForeignRelSize, - GetForeignPaths, and GetForeignPlan must fit - into the workings of the PostgreSQL planner. Here are - some notes about what they must do. + GetForeignPaths, GetForeignPlan, and + PlanForeignModify must fit into the workings of the + PostgreSQL planner. Here are some notes about what + they must do. @@ -546,6 +892,33 @@ GetForeignServerByName(const char *name, bool missing_ok); same as for an ordinary restriction clause. + + When planning an UPDATE or DELETE, + PlanForeignModify can look up the RelOptInfo + struct for the foreign table and make use of the + baserel->fdw_private data previously created by the + scan-planning functions. However, in INSERT the target + table is not scanned so there is no RelOptInfo for it. + + + + For an UPDATE or DELETE against an external data + source that supports concurrent updates, it is recommended that the + ForeignScan operation lock the rows that it fetches, perhaps + via the equivalent of SELECT FOR UPDATE. The FDW may also + choose to lock rows at fetch time when the foreign table is referenced + in a SELECT FOR UPDATE/SHARE; if it does not, the + FOR UPDATE or FOR SHARE option is essentially a + no-op so far as the foreign table is concerned. This behavior may yield + semantics slightly different from operations on local tables, where row + locking is customarily delayed as long as possible: remote rows may get + locked even though they subsequently fail locally-applied restriction or + join conditions. However, matching the local semantics exactly would + require an additional remote access for every row, and might be + impossible anyway depending on what locking semantics the external data + source provides. + + diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml index 4acb8264a45..8c527612aef 100644 --- a/doc/src/sgml/file-fdw.sgml +++ b/doc/src/sgml/file-fdw.sgml @@ -13,6 +13,7 @@ files in the server's file system. Data files must be in a format that can be read by COPY FROM; see for details. + Access to such data files is currently read-only. @@ -160,7 +161,7 @@ Create a Foreign Table for PostgreSQL CSV Logs - + One of the obvious uses for the file_fdw is to make the PostgreSQL activity log available as a table for querying. To @@ -217,8 +218,8 @@ OPTIONS ( filename '/home/josh/9.1/data/pg_log/pglog.csv', format 'csv' ); - That's it — now you can query your log directly. In production, of course, - you would need to define some way to adjust to log rotation. + That's it — now you can query your log directly. In production, of + course, you would need to define some way to deal with log rotation. diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml index 61b77774aee..61cc2aafc24 100644 --- a/doc/src/sgml/postgres-fdw.sgml +++ b/doc/src/sgml/postgres-fdw.sgml @@ -61,7 +61,10 @@ Now you need only SELECT from a foreign table to access - the data stored in its underlying remote table. + the data stored in its underlying remote table. You can also modify + the remote table using INSERT, UPDATE, or + DELETE. (Of course, the remote user you have specified + in your user mapping must have privileges to do these things.) diff --git a/doc/src/sgml/ref/create_foreign_data_wrapper.sgml b/doc/src/sgml/ref/create_foreign_data_wrapper.sgml index d9936e81659..e2d897fb214 100644 --- a/doc/src/sgml/ref/create_foreign_data_wrapper.sgml +++ b/doc/src/sgml/ref/create_foreign_data_wrapper.sgml @@ -117,9 +117,10 @@ CREATE FOREIGN DATA WRAPPER name Notes - At the moment, the foreign-data wrapper functionality is rudimentary. - There is no support for updating a foreign table, and optimization of - queries is primitive (and mostly left to the wrapper, too). + PostgreSQL's foreign-data functionality is still under + active development. Optimization of queries is primitive (and mostly left + to the wrapper, too). Thus, there is considerable room for future + performance improvements. @@ -158,7 +159,7 @@ CREATE FOREIGN DATA WRAPPER mywrapper 9075-9 (SQL/MED), with the exception that the HANDLER and VALIDATOR clauses are extensions and the standard clauses LIBRARY and LANGUAGE - are not implemented in PostgreSQL. + are not implemented in PostgreSQL. @@ -175,6 +176,7 @@ CREATE FOREIGN DATA WRAPPER mywrapper + -- cgit v1.2.3