
@mbani01 mbani01 commented Jan 16, 2026

This pull request adds a new data source definition for repositories in the services/libs/tinybird/datasources/repositories.datasource file. The new data source captures repository metadata and associations, supporting analytics and integration use cases.

Repository data source definition:

  • Added a detailed description, schema, and configuration for the repositories data source, including fields for repository URLs, integration references, project associations, archival and exclusion status, timestamps, and parent repository tracking.
  • Configured the data source to use the ReplacingMergeTree engine with partitioning by creation year and sorting by URL, optimizing for efficient querying and deduplication.

Note

Adds a new Tinybird repositories datasource for replicated repository metadata and associations.

  • Defines schema fields for IDs, url, integration/project links, archival/exclusion flags, forkedFrom, and lifecycle timestamps (including deletedAt and lastArchivedCheckAt)
  • Configures ReplacingMergeTree with partitioning by toYear(createdAt), sorting by url, and versioning via updatedAt; tagged as "Repositories"

Written by Cursor Bugbot for commit e917ea9.

@mbani01 mbani01 requested a review from epipav January 16, 2026 15:41
@mbani01 mbani01 self-assigned this Jan 16, 2026
@github-actions

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@mbani01 mbani01 changed the title from "feat: define repositories datasource" to "feat: define repositories datasource [CM-884]" Jan 16, 2026
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


ENGINE ReplacingMergeTree
ENGINE_PARTITION_KEY toYear(createdAt)
ENGINE_SORTING_KEY url

Wrong sorting key causes incorrect deduplication behavior

High Severity

The ENGINE_SORTING_KEY is set to url instead of id, but the description states id is the primary key identifier. With ReplacingMergeTree, rows with the same sorting key are deduplicated, keeping the one with the highest updatedAt. Using url means repository renames (URL changes) will create duplicate records for the same repository that won't be deduplicated. Additionally, different repositories that somehow share a URL would be incorrectly merged. Other datasources in this codebase consistently use id as the sorting key.
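
To make the concern concrete, here is a minimal Python sketch of the deduplication rule described above: per sorting-key value, keep the row with the highest updatedAt. It is only a simulation under stated assumptions, not Tinybird or ClickHouse code. The `replacing_merge` helper, the sample rows, and the example.com URLs are all hypothetical, and the sketch ignores partitions and the fact that real merges happen asynchronously in the background.

```python
from typing import Dict, List

Row = Dict[str, object]


def replacing_merge(rows: List[Row], sorting_key: str, version: str = "updatedAt") -> List[Row]:
    """Simulate the end state of a merge: one row per sorting-key value,
    keeping the row with the highest version (later rows win ties)."""
    latest: Dict[object, Row] = {}
    for row in rows:
        key = row[sorting_key]
        if key not in latest or row[version] >= latest[key][version]:
            latest[key] = row
    return list(latest.values())


# Hypothetical data: one repository (same id) that was renamed,
# so its url changed between the two replicated versions.
rows: List[Row] = [
    {"id": "repo-1", "url": "https://example.com/org/old-name", "updatedAt": 1},
    {"id": "repo-1", "url": "https://example.com/org/new-name", "updatedAt": 2},
]

by_url = replacing_merge(rows, sorting_key="url")  # both versions survive
by_id = replacing_merge(rows, sorting_key="id")    # only the latest survives

print(len(by_url), len(by_id))
```

With url as the key, the renamed repository is left with two rows; with id as the key, only the most recent version remains.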

@mbani01 mbani01 merged commit 7401cda into main Jan 16, 2026
20 checks passed
@mbani01 mbani01 deleted the feat/repositories_datasource branch January 16, 2026 15:52