Add backfill conflicts documentation #98
Document how Geneva handles conflicts during backfill operations, including safe vs conflicting operations, automatic retry behavior, and recovery steps. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
docs/geneva/jobs/conflicts.mdx
Outdated
| 3. Run compaction/optimization | ||
| ``` | ||
|
|
||
| ### Use INSERT-Only Operations During Backfill |
Suggested change: `### Use INSERT-Only Operations During Backfill` → `### Use Insert-Only Operations During Backfill`
docs/geneva/jobs/conflicts.mdx
Outdated
### Use INSERT-Only Operations During Backfill

If you need to add data while a backfill is running, use INSERT-only operations:
Suggested change: `If you need to add data while a backfill is running, use INSERT-only operations:` → `If you need to add data while a backfill is running, use insert-only operations:`
dantasse
left a comment
Looks good, thanks! Putting my AI engineer hat on, the worries that come to mind first are:
- running backfills on col1 and col2 simultaneously, we'll get the full results for both columns, right? (This doc answers this well I think)
- if another job deletes row 10, while my backfill tries to write to row 10, row 10 will still be gone, right? (I think this answers this well too)
- if a compaction or deletion occurs mid backfill, we won't write bad data, right? (e.g. writing the result for row 5 to row 4 because row 1 was deleted) - I don't think you explicitly say this. Which is probably fine because this is a pretty niche case that would be pretty weird if it did happen. But eh, brave new world of data lakes, I've seen weirder
So I think all the info you need is here; I might just add a sentence at the top emphasizing "backfills can't result in data loss, or writing incorrect data".
Yes.
If it is just a delete and not concurrent, we may calculate the value and write it, but the delete marker should override it. So a little extra compute, but logically correct.
If a compaction or delete happens mid-backfill, we won't write bad data. We'll fail the fragment, partially complete the job, and require a subsequent backfill call to recalculate the values that were null (assumed not completed, committed before the compaction). I think that's documented; will double check.
Yeah, I'll add that. Conflicts with compactions could result in partial commits that should need only a small amount of recalculation. (Note: there are further optimizations we could do, but it gets quite complicated, so we punted.)
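
To make the recovery path described above concrete, here is a toy in-memory sketch. It is not Geneva's API: `Fragment`, `backfill`, and the `conflicted` argument are hypothetical names invented for illustration, and the model assumes a conflicted fragment fails as a unit and writes nothing. It only shows the shape of the behavior: a second backfill call recomputes just the values that are still null.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Fragment:
    """Hypothetical stand-in for one table fragment (not a Geneva type)."""
    inputs: list[int]
    outputs: list[Optional[int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A fresh fragment starts with a null output for every row.
        if not self.outputs:
            self.outputs = [None] * len(self.inputs)


def backfill(fragments: list[Fragment],
             udf: Callable[[int], int],
             conflicted: frozenset[int] = frozenset()) -> list[int]:
    """Fill null outputs and return the indices of fragments that failed.

    `conflicted` simulates fragments that hit a concurrent compaction or
    delete during this run (an assumption of this toy model).
    """
    failed = []
    for idx, frag in enumerate(fragments):
        if all(value is not None for value in frag.outputs):
            continue  # already committed by an earlier run, nothing to do
        if idx in conflicted:
            failed.append(idx)  # fragment fails as a unit, nothing is written
            continue
        frag.outputs = [
            udf(x) if out is None else out
            for x, out in zip(frag.inputs, frag.outputs)
        ]
    return failed


calls: list[int] = []


def times_ten(x: int) -> int:
    calls.append(x)  # record every computation to show how much is redone
    return x * 10


frags = [Fragment([1, 2]), Fragment([3, 4]), Fragment([5])]
first_failed = backfill(frags, times_ten, conflicted=frozenset({1}))
second_failed = backfill(frags, times_ten)
```

In this model the first call leaves fragment 1 entirely null and reports it as failed; the second call computes only rows 3 and 4, so the extra work is limited to the fragment that conflicted.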

Summary
Closes GEN-290
Test plan
`npx mintlify dev` and verify new page renders correctly

🤖 Generated with Claude Code