
Conversation

@imranzaheer612
Contributor

@imranzaheer612 imranzaheer612 commented Sep 17, 2025

Add support for dynamically allocating new chunk groups when the configurable size limit is reached. This prevents memory allocation failures and improves scalability for large columnar data sets.

  • Add new GUC parameter columnar.chunk_group_size_limit to control chunk group size threshold
  • Add regression tests covering chunk group expansion scenarios
  • Add chunk_group_size_limit column to columnar_internal.options, updated in citus_columnar--12.2-1--13.2-1.sql

Fixes #6420, #7199

BEFORE:

postgres=# INSERT INTO test_oversized_row
SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
FROM generate_series(1, 600) AS gs;
2025-09-17 20:18:23.143 PKT [82542] ERROR:  out of memory
2025-09-17 20:18:23.143 PKT [82542] DETAIL:  Cannot enlarge string buffer containing 1071646716 bytes by 2097156 more bytes.
2025-09-17 20:18:23.143 PKT [82542] STATEMENT:  INSERT INTO test_oversized_row
	SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
	FROM generate_series(1, 600) AS gs;
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1071646716 bytes by 2097156 more bytes.

AFTER:

postgres=# CREATE TABLE test_oversized_row (id INTEGER, huge_text TEXT)
USING columnar WITH
(columnar.chunk_group_row_limit = 1000, columnar.stripe_row_limit = 5000, columnar.chunk_group_size_limit = 256);
CREATE TABLE
postgres=# INSERT INTO test_oversized_row
SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
FROM generate_series(1, 600) AS gs;
2025-09-17 17:32:03.004 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:04.822 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:06.592 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:08.419 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:10.238 PKT [34749] DEBUG:  Flushing Stripe of size 600
INSERT 0 600
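The DEBUG lines above come from a per-row size check performed before buffering. A minimal sketch of that decision, with illustrative names (not the actual Citus C symbols):

```python
CHUNK_GROUP_SIZE_LIMIT = 256 * 1024 * 1024  # columnar.chunk_group_size_limit, in bytes


def needs_new_chunk_group(current_group_size: int, row_size: int,
                          size_limit: int = CHUNK_GROUP_SIZE_LIMIT) -> bool:
    """Return True when appending row_size bytes would overflow the open
    chunk group, so the row must start a fresh chunk group instead of
    enlarging the serialization buffer past its limit."""
    return current_group_size + row_size > size_limit


# A 2 MB row fits into an empty 256 MB chunk group ...
assert not needs_new_chunk_group(0, 2 * 1024 * 1024)

# ... but once the group is nearly full, a new group is allocated
# rather than failing with "Cannot enlarge string buffer".
assert needs_new_chunk_group(CHUNK_GROUP_SIZE_LIMIT - 1024, 2 * 1024 * 1024)
```

With this check in place, oversized or accumulating rows spill into new chunk groups instead of triggering the out-of-memory error shown in the BEFORE transcript.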

@imranzaheer612
Contributor Author

@microsoft-github-policy-service agree

Add support for dynamically allocating new chunk groups when the
configurable size limit is reached. This prevents memory allocation
failures and improves scalability for large columnar data sets.

- Add new GUC parameter `columnar.chunk_group_size_limit` to control chunk group size threshold
- Add regression tests covering chunk group expansion scenarios
- Add `chunk_group_size_limit` column to `columnar_internal.options`, updated in citus_columnar--13.2-1--14.0-1.sql

Fixes citusdata#6420
- In citus_columnar--14.0-1--13.2-1, remove the new column that was introduced in 14.0-1
- When compression is enabled, a poorly compressible input can yield input_data_size < compressed_data_size. This increases the data length and again causes enlargeStringInfo() failures.
- We should also account for this before allocating/deciding on a new chunk group. GetMaxCompressedLength() helps us calculate the expected worst-case compressed size beforehand.
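The worst-case accounting described above can be sketched as follows; the bound formula mirrors LZ4-style `compressBound` arithmetic and is illustrative, not the exact `GetMaxCompressedLength()` implementation:

```python
def max_compressed_length(input_size: int) -> int:
    """Pessimistic upper bound on compressed output size, in the style of
    LZ4_compressBound: incompressible input can grow slightly rather than
    shrink."""
    return input_size + input_size // 255 + 16


def fits_after_compression(current_group_size: int, row_size: int,
                           size_limit: int) -> bool:
    """Decide chunk-group placement using the worst-case compressed size,
    so a badly compressible row cannot overflow the buffer later."""
    return current_group_size + max_compressed_length(row_size) <= size_limit


# The bound is never smaller than the input, so the placement decision is
# safe even when compression expands the data.
assert max_compressed_length(2 * 1024 * 1024) >= 2 * 1024 * 1024

# A row whose raw size equals the limit no longer "fits": its worst-case
# compressed form could exceed the limit.
assert not fits_after_compression(0, 100, size_limit=100)
```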
@imranzaheer612 imranzaheer612 marked this pull request as draft September 24, 2025 14:32
There were still some issues in adjusting the chunk index and chunk row index
after every compute. This was causing failures in some edge cases, e.g. reaching
the chunk row limit.

A better way is to keep track of these indices instead of recomputing and adjusting
them on every row iteration.
@imranzaheer612 imranzaheer612 marked this pull request as ready for review September 25, 2025 05:35
@imranzaheer612
Copy link
Contributor Author

Looks like this is another related issue: #7199


Development

Successfully merging this pull request may close these issues.

Automatically allocate a new chunk group instead of throwing error due to buffer size limits
