
Conversation

@imranzaheer612
Contributor

@imranzaheer612 imranzaheer612 commented Sep 17, 2025

Add support for dynamically allocating new chunk groups when the configurable size limit is reached. This prevents memory allocation failures and improves scalability for large columnar data sets.

  • Add new GUC parameter columnar.chunk_group_size_limit to control chunk group size threshold
  • Add regression tests covering chunk group expansion scenarios
  • Add chunk_group_size_limit column to columnar_internal.options, updated in citus_columnar--12.2-1--13.2-1.sql

Fixes #6420, #7199

BEFORE:

postgres=# INSERT INTO test_oversized_row
SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
FROM generate_series(1, 600) AS gs;
2025-09-17 20:18:23.143 PKT [82542] ERROR:  out of memory
2025-09-17 20:18:23.143 PKT [82542] DETAIL:  Cannot enlarge string buffer containing 1071646716 bytes by 2097156 more bytes.
2025-09-17 20:18:23.143 PKT [82542] STATEMENT:  INSERT INTO test_oversized_row
	SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
	FROM generate_series(1, 600) AS gs;
ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1071646716 bytes by 2097156 more bytes.

AFTER:

postgres=# CREATE TABLE test_oversized_row (id INTEGER, huge_text TEXT)
USING columnar WITH
(columnar.chunk_group_row_limit = 1000, columnar.stripe_row_limit = 5000, columnar.chunk_group_size_limit = 256);
CREATE TABLE
postgres=# INSERT INTO test_oversized_row
SELECT gs, repeat('Y', 2*1024*1024)  -- 2 MB text
FROM generate_series(1, 600) AS gs;
2025-09-17 17:32:03.004 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:04.822 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:06.592 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:08.419 PKT [34749] DEBUG:  Row size (2097160 bytes) exceeds chunk group size limit (268435456 bytes), storing in a separate chunk group
2025-09-17 17:32:10.238 PKT [34749] DEBUG:  Flushing Stripe of size 600
INSERT 0 600
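The DEBUG lines above come from a per-row size check performed before buffering. A minimal sketch of that decision, with illustrative names (not the actual Citus C symbols):

```python
CHUNK_GROUP_SIZE_LIMIT = 256 * 1024 * 1024  # columnar.chunk_group_size_limit, in bytes


def needs_new_chunk_group(current_group_size: int, row_size: int,
                          size_limit: int = CHUNK_GROUP_SIZE_LIMIT) -> bool:
    """Return True when appending row_size bytes would overflow the open
    chunk group, so the row must start a fresh chunk group instead of
    enlarging the serialization buffer past its limit."""
    return current_group_size + row_size > size_limit


# A 2 MB row fits into an empty 256 MB chunk group ...
assert not needs_new_chunk_group(0, 2 * 1024 * 1024)

# ... but once the group is nearly full, a new group is allocated
# rather than failing with "Cannot enlarge string buffer".
assert needs_new_chunk_group(CHUNK_GROUP_SIZE_LIMIT - 1024, 2 * 1024 * 1024)
```

With this check in place, oversized or accumulating rows spill into new chunk groups instead of triggering the out-of-memory error shown in the BEFORE transcript.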

@imranzaheer612
Contributor Author

@microsoft-github-policy-service agree

Add support for dynamically allocating new chunk groups when the
configurable size limit is reached. This prevents memory allocation
failures and improves scalability for large columnar data sets.

- Add new GUC parameter `columnar.chunk_group_size_limit` to control chunk group size threshold
- Add regression tests covering chunk group expansion scenarios
- Add `chunk_group_size_limit` column to `columnar_internal.options`, updated in citus_columnar--13.2-1--14.0-1.sql

Fixes citusdata#6420
- In citus_columnar--14.0-1--13.2-1, remove the new column that was introduced in 14.0-1
- When compression is enabled, a poorly compressible input can yield input_data_size < compressed_data_size. This increases the data length and again causes enlargeStringInfo() failures.
- We should also account for this before allocating/deciding on a new chunk group. GetMaxCompressedLength() helps us calculate the expected worst-case compressed size beforehand.
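The worst-case accounting described above can be sketched as follows; the bound formula mirrors LZ4-style `compressBound` arithmetic and is illustrative, not the exact `GetMaxCompressedLength()` implementation:

```python
def max_compressed_length(input_size: int) -> int:
    """Pessimistic upper bound on compressed output size, in the style of
    LZ4_compressBound: incompressible input can grow slightly rather than
    shrink."""
    return input_size + input_size // 255 + 16


def fits_after_compression(current_group_size: int, row_size: int,
                           size_limit: int) -> bool:
    """Decide chunk-group placement using the worst-case compressed size,
    so a badly compressible row cannot overflow the buffer later."""
    return current_group_size + max_compressed_length(row_size) <= size_limit


# The bound is never smaller than the input, so the placement decision is
# safe even when compression expands the data.
assert max_compressed_length(2 * 1024 * 1024) >= 2 * 1024 * 1024

# A row whose raw size equals the limit no longer "fits": its worst-case
# compressed form could exceed the limit.
assert not fits_after_compression(0, 100, size_limit=100)
```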
@imranzaheer612 imranzaheer612 marked this pull request as draft September 24, 2025 14:32
There were still some issues in adjusting the chunk index and chunk row index
after every compute. This was causing failures in some edge cases, e.g. reaching
the chunk row limit.

A better way is to keep track of these indices instead of recomputing and adjusting
them on every row iteration.
@imranzaheer612 imranzaheer612 marked this pull request as ready for review September 25, 2025 05:35
@imranzaheer612
Copy link
Contributor Author

Looks like this is another related issue: #7199


Development

Successfully merging this pull request may close these issues.

Automatically allocate a new chunk group instead of throwing error due to buffer size limits
