fix(spark): correct handling of ‘remap’ property#818
fix(spark): correct handling of ‘remap’ property#818andrew-coleman wants to merge 3 commits intosubstrait-io:mainfrom
Conversation
| } | ||
| if (createProject) { | ||
| Project(projectList, child) | ||
| val ps = output.map(_.toAttribute) ++ projectList |
There was a problem hiding this comment.
My imagination falls a bit short in coming up with a meaning for ps. Is there are a more expressive variable name we can use here?
| @@ -0,0 +1,123 @@ | |||
| /* | |||
| * Licensed to the Apache Software Foundation (ASF) under one or more | |||
There was a problem hiding this comment.
since this is a new file and Substrait not being part of the Apache Software Foundation do we want to include a license header like this one?
| SparkSession.builder().config("spark.master", "local").enableHiveSupport().getOrCreate() | ||
|
|
||
| test("Aggregate") { | ||
| val substraitPlanPath = Path.of("../src/test/resources/substrait_plan_with_aggregate_op.json") |
There was a problem hiding this comment.
I'm a bit undecided on the testing approach for this specific feature.
I feel like there would be a simpler testing approach where one could use SubstraitBuilder to construct Substrait plans containing remap and roundtripping them Substrait -> Spark -> Substrait without the need for going through JSON / add the test data as CSVs.
On the other hand @benbellick also added JSON plans as testcases to other modules in this repo and I have proposed adding actual execution of plans against Postgres in #700 also including test data.
Would be curious to know what others think.
When running the substrait to spark converter on plans that were generated by another system, it was found that the emit/outputMappings (remap) property was not being processed correctly. This commit fixes that. Extra tests were added, containing substrait plans in JSON proto format that were generated by a third party system. Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
Assisted by AI Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
When running the substrait to spark converter on plans that were generated by another system, it was found that the emit/outputMappings (remap) property was not being processed correctly. This commit fixes that.
Extra tests were added, containing substrait plans in JSON proto format that were generated by a third party system.