feat: define repositories datasource [CM-884] #3758
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
ENGINE ReplacingMergeTree
ENGINE_PARTITION_KEY toYear(createdAt)
ENGINE_SORTING_KEY url
Wrong sorting key causes incorrect deduplication behavior
High Severity
The ENGINE_SORTING_KEY is set to url instead of id, but the description states id is the primary key identifier. With ReplacingMergeTree, rows with the same sorting key are deduplicated, keeping the one with the highest updatedAt. Using url means repository renames (URL changes) will create duplicate records for the same repository that won't be deduplicated. Additionally, different repositories that somehow share a URL would be incorrectly merged. Other datasources in this codebase consistently use id as the sorting key.
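To make the deduplication concern concrete, here is a minimal Python sketch of how ReplacingMergeTree collapses rows that share a sorting key, keeping the one with the highest version. It is a simplified model, not ClickHouse itself: real deduplication happens eventually, at merge time, and only within a partition. The `replacing_merge` helper and the example rows (ids, URLs, and `updatedAt` values) are hypothetical and are not taken from this PR.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def replacing_merge(rows: List[Row], sorting_key: str, version: str = "updatedAt") -> List[Row]:
    """Simplified model: for each sorting-key value, keep the row with the highest version."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        key = row[sorting_key]
        # On a version tie the later-inserted row wins, as in ReplacingMergeTree.
        if key not in latest or row[version] >= latest[key][version]:
            latest[key] = row
    return list(latest.values())


# Hypothetical data: repository "r1" is renamed, so its url changes between versions.
rows = [
    {"id": "r1", "url": "https://example.com/org/old-name", "updatedAt": 1},
    {"id": "r1", "url": "https://example.com/org/new-name", "updatedAt": 2},
]

by_url = replacing_merge(rows, "url")  # sorting key as written in this PR
by_id = replacing_merge(rows, "id")    # sorting key suggested by the review

print(len(by_url))  # both rows survive, because the urls differ
print(len(by_id))   # only the latest row for "r1" survives
```

Under these assumptions, sorting by `url` leaves two rows for the same repository after a rename, while sorting by `id` leaves only the most recent one.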
This pull request adds a new data source definition for repositories in the `services/libs/tinybird/datasources/repositories.datasource` file. The new data source captures repository metadata and associations, supporting analytics and integration use cases.

Repository data source definition:
- Adds the `repositories` data source, including fields for repository URLs, integration references, project associations, archival and exclusion status, timestamps, and parent repository tracking.
- Uses the `ReplacingMergeTree` engine with partitioning by creation year and sorting by URL, optimizing for efficient querying and deduplication.

Note
Adds a new Tinybird `repositories` datasource for replicated repository metadata and associations.
- Fields include `url`, integration/project links, archival/exclusion flags, `forkedFrom`, and lifecycle timestamps (including `deletedAt` and `lastArchivedCheckAt`)
- Uses `ReplacingMergeTree` with partitioning by `toYear(createdAt)`, sorting by `url`, and versioning via `updatedAt`; tagged as "Repositories"

Written by Cursor Bugbot for commit e917ea9. This will update automatically on new commits.