path: root/src/include/miscadmin.h
author    Michael Paquier    2012-07-24 07:13:55 +0000
committer Michael Paquier    2012-07-24 07:35:37 +0000
commit    d03ea805cef9375bee9b751e65d698c07c138bf5 (patch)
tree      2e578af76c1ac515887ff0363f5224f57af64a92 /src/include/miscadmin.h
parent    baa8c4a51cdd7de321169f12ebfb47b02fed3afc (diff)
Support for online data redistribution with ALTER TABLE
Online data redistribution lets a user change the distribution strategy of a table. There are no restrictions on the possible modifications: any table type, with any possible node subset, can be completely changed in one command. The SQL command used for redistribution is an extension of ALTER TABLE with the clauses specific to XC that are already available in CREATE TABLE:

    DISTRIBUTE BY { REPLICATION | ROUND ROBIN | { [ HASH | MODULO ] ( column_name ) } }
    TO { GROUP groupname | NODE ( nodename [, ... ] ) }
    ADD NODE ( nodename [, ... ] )
    DELETE NODE ( nodename [, ... ] )

These clauses can be combined without limitation (usage examples are given at the end of this message). Several redistribution scenarios are implemented, depending on the old and new distribution types of the table:

- Default scenario:
  1) Fetch the data of the table with a COPY TO and store it inside a tuplestore
  2) Perform a TRUNCATE on the Datanodes
  3) Perform a COPY FROM with the tuples inside the tuplestore
  4) REINDEX the table if necessary
  This default scenario could also be managed by an external tool, but all the following optimizations need node-level control to perform with the highest possible efficiency. The performance of this scenario is equivalent to running a COPY TO/COPY FROM sequence on a table, so performance is bounded not by the redistribution mechanism itself but by the COPY protocol used to exchange data over the network.

- Replicated to replicated: nodes removed from the node set are simply truncated, which is quick even on large data sets. For new nodes, data is fetched on the Coordinator from one Datanode with COPY TO, stored in a tuplestore, and then COPY FROM is launched only on the new nodes.

- Replicated to distributed: if new nodes are added, a fallback to the default scenario is made. If nodes are removed, those nodes are truncated. Finally, a DELETE query removing only the tuples that no longer belong to each remaining node is launched on each of them. In this case no data is exchanged between nodes, so performance is maximized.

In order to support all those scenarios, a couple of new internal mechanisms have been added to XC: materialization of tuple slots on the Coordinator and the possibility to reuse them for redistribution purposes, externalization of a portion of the PostgreSQL COPY code used by redistribution, and reuse and extension of the Postgres-XC APIs for remote COPY management.

The in-memory cache allowed for the tuplestore used to store tuples, when necessary, is controlled with work_mem. The only caveat is that the tuplestore data needs to be stored once on the Coordinator, so some additional disk space might be necessary on this server to perform redistribution correctly.

Documentation, as well as a new set of regression tests, has been added. The regression tests check views, prepared statements, distribution types, and node subsets in a way that is completely transparent whatever the cluster configuration.
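Examples of the new clauses (table, column, group, and node names below are purely illustrative):

    -- Change a table to replication on an explicit node list
    ALTER TABLE my_table DISTRIBUTE BY REPLICATION TO NODE (dn1, dn2);
    -- Distribute by hash on column id across a node group
    ALTER TABLE my_table DISTRIBUTE BY HASH (id) TO GROUP my_group;
    -- Combine clauses: change distribution type and node set at once
    ALTER TABLE my_table DISTRIBUTE BY MODULO (id) ADD NODE (dn3) DELETE NODE (dn1);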
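Cost-wise, the default scenario behaves like the following sequence run by an external tool (a sketch only: the file-based staging below stands in for the tuplestore kept on the Coordinator, and my_table is illustrative):

    -- 1) Fetch the existing data
    COPY my_table TO '/tmp/my_table.dat';
    -- 2) Empty the table on the Datanodes
    TRUNCATE my_table;
    -- 3) Reload the data under the new distribution strategy
    COPY my_table FROM '/tmp/my_table.dat';
    -- 4) Rebuild indexes if necessary
    REINDEX TABLE my_table;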
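For the replicated-to-distributed case, the DELETE launched on each remaining node conceptually keeps only the rows that map to that node. A minimal sketch, assuming a hash distribution on an integer column id and the remaining nodes numbered 0 to N-1 (the actual hashing internals of XC may differ):

    -- On the node numbered 1 out of 3 remaining nodes, keep only the
    -- rows whose hash maps to this node. hashint4() is the stock
    -- PostgreSQL int4 hash function, used here purely for illustration.
    DELETE FROM my_table
      WHERE abs(hashint4(id)) % 3 <> 1;

Because each node prunes its own local copy, no tuple crosses the network, which is why this scenario maximizes performance.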
Diffstat (limited to 'src/include/miscadmin.h')
0 files changed, 0 insertions, 0 deletions