vdk-trino: collect lineage for select/insert and rename table only #756

Merged
philip-alexiev merged 9 commits into main from person/palexiev/trino_lineage on Mar 15, 2022

@philip-alexiev

Why:
Lineage collection needs some improvements
to be more production ready.

What:
To reduce the load on the query engine,
query plans are calculated only for insert/select queries.
For rename table queries, the plan gives no usable information,
so the query itself is parsed and the table names are extracted.
Counting the rows in the output table before and after the query
is removed, which further reduces the load on the query engine.
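
A minimal sketch of the two decisions described above, for illustration only. The helper names `needs_query_plan` and `parse_rename_table` and the regular expression are assumptions and are not taken from the vdk-trino source. The sketch assumes a statement qualifies for plan calculation when its first keyword is SELECT or INSERT, and that rename queries use the `ALTER TABLE [IF EXISTS] name RENAME TO new_name` form.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for ALTER TABLE ... RENAME TO ...; table names are
# taken as whitespace-free tokens, so quoted names with spaces are not handled.
_RENAME_RE = re.compile(
    r"^\s*ALTER\s+TABLE\s+(?:IF\s+EXISTS\s+)?(?P<src>\S+)"
    r"\s+RENAME\s+TO\s+(?P<dst>\S+?)\s*;?\s*$",
    re.IGNORECASE,
)


def needs_query_plan(query: str) -> bool:
    """Return True only when the first keyword is SELECT or INSERT."""
    words = query.split(None, 1)
    return bool(words) and words[0].lower() in ("select", "insert")


def parse_rename_table(query: str) -> Optional[Tuple[str, str]]:
    """Return (old_name, new_name) for a rename table query, else None."""
    match = _RENAME_RE.match(query)
    if match is None:
        return None
    return match.group("src"), match.group("dst")


if __name__ == "__main__":
    for sql in (
        "SELECT * FROM src_table",
        "ALTER TABLE schema.old_name RENAME TO schema.new_name;",
        "DROP TABLE tmp",
    ):
        print(sql, needs_query_plan(sql), parse_rename_table(sql))
```

A first-keyword check keeps the filter cheap, but it would skip statements that begin with a WITH clause or a comment; whether those need a plan is left open here.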

How has this been tested:
Extended the test_vdk_trino_lineage.py test
to be more comprehensive and cover all of the scenarios above.

What type of change are you making?
Bug fix (non-breaking change which fixes an issue)
or a cosmetic change/minor improvement

Signed-off-by: Philip Alexiev (palexiev@vmware.com)

@antoniivanov
Contributor

antoniivanov commented Mar 8, 2022

I am not sure if you noticed, but the CI tests failed (ci/gitlab/gitlab.com -> Click Details): https://gitlab.com/vmware-analytics/versatile-data-kit/-/jobs/2176218096

@philip-alexiev philip-alexiev self-assigned this Mar 8, 2022

@antoniivanov antoniivanov left a comment

Looks good to me. You can add a few more tests about some corner cases.

@philip-alexiev
Author

@tozka Thank you for the review and valuable comments.

Philip Alexiev added 4 commits March 11, 2022 17:03
Philip Alexiev added 2 commits March 15, 2022 10:51
@philip-alexiev philip-alexiev merged commit 15a119c into main Mar 15, 2022
@philip-alexiev philip-alexiev deleted the person/palexiev/trino_lineage branch March 15, 2022 14:48
ivakoleva pushed a commit that referenced this pull request Mar 22, 2022