author      Michael Paquier    2012-07-24 07:13:55 +0000
committer   Michael Paquier    2012-07-24 07:35:37 +0000
commit      d03ea805cef9375bee9b751e65d698c07c138bf5
tree        2e578af76c1ac515887ff0363f5224f57af64a92 /src/include/miscadmin.h
parent      baa8c4a51cdd7de321169f12ebfb47b02fed3afc
Support for online data redistribution with ALTER TABLE
Online data redistribution lets a user change the distribution strategy of a table. There
are no restrictions on the modifications possible: any type of table, on any possible node
subset, can be completely changed in a single command.
The SQL command used for redistribution is an extension of ALTER TABLE using the clauses
specific to XC that are already available in CREATE TABLE:
DISTRIBUTE BY { REPLICATION | ROUND ROBIN | { [HASH | MODULO ] ( column_name ) } }
TO { GROUP groupname | NODE ( nodename [, ... ] ) }
ADD NODE ( nodename [, ... ] )
DELETE NODE ( nodename [, ... ] )
These clauses can be combined without limitation.
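For example (table, column and node names here are hypothetical), a table can be
redistributed and moved to a new node subset in a single statement:
ALTER TABLE my_table DISTRIBUTE BY HASH (id) TO NODE (dn1, dn2);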
Several redistribution scenarios are implemented depending on the old and new
distribution type of the table:
- Default scenario:
1) Fetch the data of the table with a COPY TO and store it inside a tuplestore
2) Perform a TRUNCATE on the Datanodes
3) Perform a COPY FROM with the tuples inside the tuplestore
4) REINDEX the table if necessary
This default scenario could also be managed by an external tool; however, all the
following optimizations need node-level control to run with the highest possible
efficiency. The performance of this scenario is equivalent to running a COPY TO/COPY FROM
sequence on a table, so performance here is bounded not by the redistribution mechanism
itself but by the COPY protocol used to exchange data over the network.
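As a rough sketch (table name and file path are hypothetical), an external tool would
run something equivalent to the following sequence:
COPY my_table TO '/tmp/my_table.dat';    -- 1) fetch the existing data
TRUNCATE my_table;                       -- 2) empty the table on the Datanodes
COPY my_table FROM '/tmp/my_table.dat';  -- 3) reload the data
REINDEX TABLE my_table;                  -- 4) rebuild indexes if any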
- Replicated to replicated:
When nodes are removed from the node set, those nodes are simply truncated, so this is
very fast even on large data sets.
For new nodes, data is fetched on the Coordinator from one Datanode with COPY TO, stored
in a tuplestore, and then COPY FROM is launched only on the new nodes.
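For example (table and node names are hypothetical), extending a replicated table to one
additional Datanode only copies data to that new node:
ALTER TABLE rep_table ADD NODE (dn3);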
- Replicated to distributed:
If new nodes are added, there is a fallback to the default scenario.
If nodes are removed, those nodes are truncated.
Finally, on each remaining remote node a DELETE query is launched that removes only the
tuples no longer belonging to that node under the new distribution. In this case no data
is exchanged between nodes, so performance is maximized.
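For example (table and column names are hypothetical), changing a replicated table into a
hash-distributed one on its current node set only requires such local DELETEs:
ALTER TABLE rep_table DISTRIBUTE BY HASH (id);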
In order to support all those scenarios, several new internal mechanisms have been added
to XC: materialization of tuple slots on the Coordinator and the possibility to reuse them
for redistribution purposes, externalization of a portion of the PostgreSQL COPY code used
by redistribution, and reuse and extension of the Postgres-XC APIs for remote COPY
management.
The tuplestore used to store tuples when necessary has its allowed memory controlled by
work_mem. The only point to be careful about is that the tuplestore data needs to be
stored once on the Coordinator, so some additional disk space might be necessary on this
server to perform redistribution correctly.
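For large tables, the tuplestore cache can be raised in the session before redistributing;
a minimal sketch (table and column names are hypothetical, the value only illustrative):
SET work_mem = '256MB';
ALTER TABLE my_table DISTRIBUTE BY HASH (id);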
Documentation, as well as a new set of regression tests, has been added. The regression
tests check views, prepared statements, distribution types and node subsets in a way that
is completely transparent to the cluster configuration.
Diffstat (limited to 'src/include/miscadmin.h')
0 files changed, 0 insertions, 0 deletions