@jmhsieh (Contributor) commented Jan 28, 2026

Summary

  • Add new documentation page explaining how Geneva handles backfill conflicts
  • Cover safe vs conflicting operations during backfill
  • Document automatic retry behavior and recovery steps
  • Add best practices for avoiding conflicts

Closes GEN-290

Test plan

  • Run npx mintlify dev and verify new page renders correctly
  • Check navigation shows "Conflicts" under Geneva > Job execution
  • Verify links to related docs work

🤖 Generated with Claude Code

Document how Geneva handles conflicts during backfill operations,
including safe vs conflicting operations, automatic retry behavior,
and recovery steps.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
3. Run compaction/optimization
```

### Use INSERT-Only Operations During Backfill

Suggested change
### Use INSERT-Only Operations During Backfill
### Use Insert-Only Operations During Backfill


### Use INSERT-Only Operations During Backfill

If you need to add data while a backfill is running, use INSERT-only operations:

Suggested change
If you need to add data while a backfill is running, use INSERT-only operations:
If you need to add data while a backfill is running, use insert-only operations:

@dantasse (Contributor) left a comment

Looks good, thanks! Putting my AI engineer hat on, the worries that come to mind first are:

  • running backfills on col1 and col2 simultaneously, we'll get the full results for both columns, right? (This doc answers this well I think)
  • if another job deletes row 10, while my backfill tries to write to row 10, row 10 will still be gone, right? (I think this answers this well too)
  • if a compaction or deletion occurs mid backfill, we won't write bad data, right? (e.g. writing the result for row 5 to row 4 because row 1 was deleted) - I don't think you explicitly say this. Which is probably fine because this is a pretty niche case that would be pretty weird if it did happen. But eh, brave new world of data lakes, I've seen weirder

So I think all the info you need is here; I might just add a sentence at the top emphasizing "backfills can't result in data loss, or writing incorrect data".

@jmhsieh (Contributor, Author) commented Jan 28, 2026

> Looks good, thanks! Putting my AI engineer hat on, the worries that come to mind first are:
>
>   • running backfills on col1 and col2 simultaneously, we'll get the full results for both columns, right? (This doc answers this well I think)

Yes.

>   • if another job deletes row 10, while my backfill tries to write to row 10, row 10 will still be gone, right? (I think this answers this well too)

If it is just a delete and not concurrent, we may calculate the value and write it, but the delete marker should override it. So a little extra compute, but logically correct.

>   • if a compaction or deletion occurs mid backfill, we won't write bad data, right? (e.g. writing the result for row 5 to row 4 because row 1 was deleted) - I don't think you explicitly say this. Which is probably fine because this is a pretty niche case that would be pretty weird if it did happen. But eh, brave new world of data lakes, I've seen weirder

If a compaction or delete happens mid-backfill, we won't write bad data. We'll fail the fragment, partially complete the job, and require a subsequent backfill call to recalculate the values that were null (assumed not completed, committed before the compaction). I think that's documented; will double-check.

> So I think all the info you need is here; I might just add a sentence at the top emphasizing "backfills can't result in data loss, or writing incorrect data".

Yeah, I'll add that. Conflicts with compactions could result in partial commits that should require only a small amount of recalculation. (Note: there are further optimizations we could do, but it gets quite complicated, so we punted.)
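
To make the behavior described in these replies concrete, here is a toy Python model. It is only a sketch under stated assumptions, not Geneva's implementation: `ToyFragment`, `toy_backfill`, and the `compacted_mid_job` flag are hypothetical names invented for illustration, and a conflicting compaction is simulated by skipping the whole fragment. It shows the two properties discussed above: a deleted row stays hidden even if a value is computed for it, and a second backfill call recalculates only the values that are still null.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyFragment:
    """Hypothetical stand-in for a table fragment (not a Geneva or Lance type)."""
    inputs: list[int]
    outputs: list[Optional[int]] = field(init=False)
    deleted: set[int] = field(default_factory=set)  # delete markers, by row offset
    compacted_mid_job: bool = False  # simulates a conflicting compaction

    def __post_init__(self) -> None:
        # A freshly added column starts out entirely null.
        self.outputs = [None] * len(self.inputs)

    def visible_rows(self) -> list[tuple[int, Optional[int]]]:
        # Delete markers hide a row even if a backfilled value was written to it.
        return [
            (x, y)
            for i, (x, y) in enumerate(zip(self.inputs, self.outputs))
            if i not in self.deleted
        ]


def toy_backfill(fragments: list[ToyFragment], udf: Callable[[int], int]) -> int:
    """Fill null outputs only; return how many values were computed."""
    computed = 0
    for frag in fragments:
        if frag.compacted_mid_job:
            # Fail this fragment instead of risking misaligned writes.
            # Its outputs stay null, so a later call picks them up.
            frag.compacted_mid_job = False
            continue
        for i, x in enumerate(frag.inputs):
            if frag.outputs[i] is None:
                frag.outputs[i] = udf(x)
                computed += 1
    return computed


frags = [ToyFragment([1, 2]), ToyFragment([3, 4], compacted_mid_job=True)]
frags[0].deleted.add(1)  # another job deleted the second row of fragment 0

first = toy_backfill(frags, lambda x: x * 10)   # fragment 1 fails; job is partial
second = toy_backfill(frags, lambda x: x * 10)  # only the null values are redone
```

In this model the first call computes a value for the deleted row (a little extra compute), but `visible_rows()` never returns it, and the second call touches only the fragment that failed.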

@jmhsieh (Contributor, Author) commented Jan 28, 2026

I think that concern is covered by this caveat; going to commit.
[image]

@jmhsieh jmhsieh merged commit 33832e8 into main Jan 28, 2026
2 checks passed
@jmhsieh jmhsieh deleted the jon/gen-290-docs-backfill-conflicts branch January 28, 2026 21:03