[SPARK-47319][SQL] Improve missingInput calculation
### What changes were proposed in this pull request?
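This PR speeds up the `QueryPlan.missingInput()` calculation: it returns `AttributeSet.empty` immediately when `references` is empty, and `AttributeSet.--` now skips cloning its underlying set when either operand is empty.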

### Why are the changes needed?
This seems to be the root cause of `DeduplicateRelations` slowness in some cases.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#45424 from peter-toth/fix-missinginput.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Kent Yao <yao@apache.org>
peter-toth authored and yaooqinn committed Mar 8, 2024
1 parent 54f1572 commit f659f8d
Showing 2 changed files with 20 additions and 8 deletions.
@@ -104,13 +104,19 @@ class AttributeSet private (private val baseSet: mutable.LinkedHashSet[Attribute
    * in `other`.
    */
   def --(other: Iterable[NamedExpression]): AttributeSet = {
-    other match {
-      // SPARK-32755: `--` method behave differently under scala 2.12 and 2.13,
-      // use a Scala 2.12 based code to maintains the insertion order in Scala 2.13
-      case otherSet: AttributeSet =>
-        new AttributeSet(baseSet.clone() --= otherSet.baseSet)
-      case _ =>
-        new AttributeSet(baseSet.clone() --= other.map(a => new AttributeEquals(a.toAttribute)))
+    if (isEmpty) {
+      AttributeSet.empty
+    } else if (other.isEmpty) {
+      this
+    } else {
+      other match {
+        // SPARK-32755: `--` method behave differently under scala 2.12 and 2.13,
+        // use a Scala 2.12 based code to maintains the insertion order in Scala 2.13
+        case otherSet: AttributeSet =>
+          new AttributeSet(baseSet.clone() --= otherSet.baseSet)
+        case _ =>
+          new AttributeSet(baseSet.clone() --= other.map(a => new AttributeEquals(a.toAttribute)))
+      }
     }
   }

@@ -102,7 +102,13 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]
   /**
    * Attributes that are referenced by expressions but not provided by this node's children.
    */
-  final def missingInput: AttributeSet = references -- inputSet
+  final def missingInput: AttributeSet = {
+    if (references.isEmpty) {
+      AttributeSet.empty
+    } else {
+      references -- inputSet
+    }
+  }
 
   /**
    * Runs [[transformExpressionsDown]] with `rule` on all expressions present
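
The short-circuits in the two hunks can be tried without a Spark build using a small stand-in. The sketch below is only an illustration under stated assumptions: `NameSet`, `PlanSketch`, and `cloneCount` are hypothetical names that do not exist in Spark, and plain strings take the place of `Attribute`. It mirrors the control flow of the patched `--` and `missingInput` and counts how often the underlying `LinkedHashSet` is actually cloned.

```scala
import scala.collection.mutable

// Hypothetical stand-in for AttributeSet: an insertion-ordered set of names.
final class NameSet(private val baseSet: mutable.LinkedHashSet[String]) {
  def isEmpty: Boolean = baseSet.isEmpty
  def toSeq: Seq[String] = baseSet.toSeq

  // Same branch structure as the patched AttributeSet.--
  def --(other: NameSet): NameSet = {
    if (isEmpty) {
      NameSet.empty            // nothing to remove from
    } else if (other.isEmpty) {
      this                     // nothing to remove, so no clone is needed
    } else {
      NameSet.cloneCount += 1  // instrumentation for this sketch only
      new NameSet(baseSet.clone() --= other.baseSet)
    }
  }
}

object NameSet {
  // Counts clones so the effect of the fast paths is observable.
  var cloneCount = 0
  val empty: NameSet = new NameSet(mutable.LinkedHashSet.empty[String])
  def apply(names: String*): NameSet =
    new NameSet(mutable.LinkedHashSet.empty[String] ++= names)
}

object PlanSketch {
  // Hypothetical analogue of the patched QueryPlan.missingInput.
  def missingInput(references: NameSet, inputSet: NameSet): NameSet =
    if (references.isEmpty) NameSet.empty else references -- inputSet
}
```

In this sketch, `refs -- NameSet.empty` returns `refs` itself and `PlanSketch.missingInput(NameSet.empty, refs)` returns the shared empty instance, both without cloning; only a difference between two non-empty sets pays for a clone. Returning `this` is safe here only because the stand-in, like `AttributeSet`, exposes no mutating operations.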
