fix(table-editor): use CTE optimization for table-editor selection #35071


Merged · 4 commits into master from fix/table-editor-fetch-long-tables · Apr 17, 2025

Conversation

@avallete (Member) commented Apr 16, 2025

While investigating some long-running queries (60s+ timeouts) over postgres-meta, I noticed that some of them were caused by queries crafted by the table-preview-editor within Studio.

Digging further, I noticed that in some cases the runtime of our current select queries grows with the number of rows in the targeted table.

This is partially because we perform conditional transformations over columns, based on their length, so that instead of overfetching large values we only return a small preview (values over 10 KB are truncated):

SELECT col1, col2, CASE WHEN octet_length(col3::text) > 10240 THEN truncated(col3) ELSE col3 END FROM table

The query is then paginated by the preview editor's limit/offset and ordered, giving this final shape:

SELECT col1, col2, CASE WHEN octet_length(col3::text) > 10240 THEN truncated(col3) ELSE col3 END
FROM table
ORDER BY id
LIMIT 100
OFFSET 0

The issue with this approach is that Postgres performs a full table scan and applies the condition to every row. Hence, the more rows the table has that need truncation, the longer the preview query takes to execute.

Instead, we use a CTE to reduce the number of rows to work with, applying filters, limit, offset, and order by before the column selection / truncation logic. This turns the query into something like:

-- First we hint the query planner to apply row selection and reduction
WITH _base_query AS ( SELECT * FROM table ORDER BY id LIMIT 100 OFFSET 0 )
-- Only then do we apply the column selection / truncation over the reduced set of rows
SELECT col1, col2, CASE WHEN octet_length(col3::text) > 10240 THEN truncated(col3) ELSE col3 END
FROM _base_query

This drops the query time from 30s+ to under 80ms.
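
For anyone who wants to verify the difference locally, here's a minimal sketch comparing the two query shapes with EXPLAIN ANALYZE against the test table described below; left() is just a stand-in for the actual truncation helper, and 10240 bytes (10 KB) is an assumed threshold, not necessarily the exact SQL Studio generates:

-- Old shape: the CASE expression runs for every row scanned
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, CASE WHEN octet_length(problem::text) > 10240 THEN left(problem::text, 10240) ELSE problem END
FROM large_table_100k_rows_26_columns
ORDER BY id LIMIT 100 OFFSET 0;

-- CTE shape: rows are reduced first, so the CASE runs over at most 100 rows
EXPLAIN (ANALYZE, BUFFERS)
WITH _base_query AS (
    SELECT * FROM large_table_100k_rows_26_columns ORDER BY id LIMIT 100 OFFSET 0
)
SELECT id, CASE WHEN octet_length(problem::text) > 10240 THEN left(problem::text, 10240) ELSE problem END
FROM _base_query;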

To test this, I created a table in the stress-table-editor-project on staging: https://studio-staging-git-fix-table-editor-fetch-long-tables-supabase.vercel.app/dashboard/project/otexzejpktdrckprodjw/editor/127355?filter=id%3Aeq%3A54371

I then inserted 100k rows into it, across roughly twenty text fields filled with large generated strings, using this SQL script:

CREATE TABLE large_table_100k_rows_26_columns (
    id SERIAL PRIMARY KEY,
    problem TEXT,
    solution TEXT,
    problem_image TEXT,
    solution_image TEXT,
    problem_text TEXT,
    solution_text TEXT,
    problem_text_masking TEXT,
    solution_text_masking TEXT,
    embedding_problem_text TEXT,
    embedding_solution_text TEXT,
    embedding_problem_text_masking TEXT,
    embedding_solution_text_masking TEXT,
    subject_area_id INTEGER,
    difficulty INTEGER,
    metadata TEXT,
    created_at TIMESTAMP DEFAULT now(),
    updated_at TIMESTAMP DEFAULT now(),
    answer_text TEXT,
    answer_image TEXT,
    source_id INTEGER,
    metadata_questionbank TEXT,
    metadata_ocr TEXT,
    answer_type TEXT,
    embedding_problem_text_masking_1024 TEXT,
    embedding_solution_text_masking_1024 TEXT,
    answer TEXT
);
INSERT INTO large_table_100k_rows_26_columns (
    problem, solution, problem_image, solution_image,
    problem_text, solution_text, problem_text_masking, solution_text_masking,
    embedding_problem_text, embedding_solution_text,
    embedding_problem_text_masking, embedding_solution_text_masking,
    subject_area_id, difficulty, metadata,
    answer_text, answer_image, source_id, metadata_questionbank, metadata_ocr,
    answer_type, embedding_problem_text_masking_1024, embedding_solution_text_masking_1024,
    answer
)
SELECT
    repeat('Problem text example. ', 1000), -- ~20kB
    repeat('Solution text example. ', 1000),
    repeat('Image data', 1000),
    repeat('Image data', 1000),
    repeat('Long text field here. ', 1000),
    repeat('Another long text field. ', 1000),
    repeat('Masking data A. ', 1000),
    repeat('Masking data B. ', 1000),
    repeat('Embedding A. ', 1000),
    repeat('Embedding B. ', 1000),
    repeat('Mask Embedding A. ', 1000),
    repeat('Mask Embedding B. ', 1000),
    (random() * 10)::int,
    (random() * 5)::int,
    repeat('Some JSON-like metadata', 500),
    repeat('Answer text here. ', 1000),
    repeat('Image binary data', 1000),
    (random() * 1000)::int,
    repeat('Metadata QB', 1000),
    repeat('Metadata OCR', 1000),
    'text',
    repeat('Emb. Mask 1024 A', 1000),
    repeat('Emb. Mask 1024 B', 1000),
    repeat('Final answer', 1000)
FROM generate_series(1, 100000); 
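
As an optional sanity check on the generated data (not part of the PR itself; the table name comes from the script above):

SELECT count(*) AS row_count,
       pg_size_pretty(pg_total_relation_size('large_table_100k_rows_26_columns')) AS total_size
FROM large_table_100k_rows_26_columns;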

The table editor preview for this project fails to load with a timeout on supabase staging, but should work on this PR preview.

What I have tested:

  1. Paginate, change limit, change order by
  2. Add a filter over the "id" field (see the sketch after this list)
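
For illustration, this is roughly where that id filter lands in the new CTE shape; the exact SQL Studio generates may differ, and the column list / truncation CASEs are abbreviated here:

WITH _base_query AS (
    SELECT *
    FROM large_table_100k_rows_26_columns
    WHERE id = 54371  -- the filter is applied before the truncation logic
    ORDER BY id
    LIMIT 100 OFFSET 0
)
SELECT id, problem, solution  -- truncation CASEs would wrap the text columns here
FROM _base_query;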

vercel bot commented Apr 16, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name                 Status      Updated (UTC)
design-system        ✅ Ready    Apr 17, 2025 3:17am
docs                 ✅ Ready    Apr 17, 2025 3:17am
studio-self-hosted   ✅ Ready    Apr 17, 2025 3:17am
studio-staging       ✅ Ready    Apr 17, 2025 3:17am
ui-library           ✅ Ready    Apr 17, 2025 3:17am
zone-www-dot-com     ✅ Ready    Apr 17, 2025 3:17am

1 Skipped Deployment
studio               ⬜️ Ignored  Apr 17, 2025 3:17am


supabase bot commented Apr 16, 2025

This pull request has been ignored for the connected project xguihxuzqibwxjnimxev because there are no changes detected in supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.


Preview Branches by Supabase.
Learn more about Supabase Branching ↗︎.

@saltcod (Contributor) commented Apr 16, 2025

Tested on production: I was able to reproduce the timeout with just 10k rows, like so:
FROM generate_series(1, 10000);

On preview branch, I'm able to load the table successfully.

Tested:

  • sorting
  • filtering
  • inserting rows
  • changing data
  • add / remove / rename columns
  • wasn't able to export the table to CSV (separate issue)
  • pagination works

@joshenlim (Member) left a comment

Did some basic smoke testing on the preview - it all looks great! 😄
Also checked opening tables outside of the public schema (both protected schemas, e.g. auth, and custom schemas); no issues there either.

Given that unit tests + e2e tests + smoke tests are all passing, and that I've manually verified the updated SQL when retrieving the table rows,

reckon this should be good to go 🙏🙂

@joshenlim joshenlim merged commit c0bddbb into master Apr 17, 2025
18 of 20 checks passed
@joshenlim joshenlim deleted the fix/table-editor-fetch-long-tables branch April 17, 2025 04:47