[Spark] SQL Hint
SQL Hint
Reference: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html
Syntax
/*+ hint [ , ... ] */
Partitioning Hints
Partitioning hints let you suggest a partitioning strategy for Spark to follow.
COALESCE
Equivalent to the `coalesce` Dataset API. It takes a partition number as its parameter and reduces the number of partitions to that number.
SELECT /*+ COALESCE(3) */ * FROM t;
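As a rough illustration of why `COALESCE` only reduces the partition count, here is a toy pure-Python model (not Spark code). The function `coalesce` and its contiguous grouping rule are simplifying assumptions: whole input partitions are merged into fewer groups, and individual rows are never redistributed through a shuffle. Spark's actual coalescing logic is more involved (it also tries to respect data locality).

```python
from typing import List


def coalesce(partitions: List[list], n: int) -> List[list]:
    """Toy model of COALESCE: merge whole partitions into at most n groups, no shuffle."""
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(partitions):
        # Coalescing cannot increase the partition count, so this is a no-op.
        return [list(p) for p in partitions]
    merged: List[list] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Contiguous input partitions go to the same group; rows inside are not redistributed.
        merged[i * n // len(partitions)].extend(part)
    return merged


parts = [[1, 2], [3], [4, 5], [6], [7]]
print(coalesce(parts, 3))        # [[1, 2, 3], [4, 5, 6], [7]]
print(len(coalesce(parts, 10)))  # 5
```

Asking for more partitions than exist leaves the data as it is, which is the behavior that distinguishes coalescing from a real repartition.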
REPARTITION
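Equivalent to the `repartition` Dataset API. It takes a partition number, column names, or both as parameters, and repartitions the data into the specified number of partitions using the given partitioning expressions.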
SELECT /*+ REPARTITION(3) */ * FROM t;
SELECT /*+ REPARTITION(c) */ * FROM t;
SELECT /*+ REPARTITION(3, c) */ * FROM t;
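The two modes of `REPARTITION` can be sketched with a toy pure-Python model (the function `repartition` here is hypothetical, not Spark's API). With only a number, rows are spread round-robin. With a column, rows are hash-partitioned, so equal keys always land in the same partition. Both are full shuffles. Stated simplifications: `zlib.crc32` stands in for the Murmur3 hash Spark uses, and the round-robin always starts at partition 0. When only columns are given (`REPARTITION(c)`), Spark uses `spark.sql.shuffle.partitions` as the partition count, while the sketch always takes `n` explicitly.

```python
import zlib
from typing import Callable, List, Optional


def repartition(rows: list, n: int, key: Optional[Callable] = None) -> List[list]:
    """Toy model of REPARTITION: a full shuffle of rows into exactly n partitions."""
    out: List[list] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        if key is None:
            # REPARTITION(n): round-robin (Spark starts at a random partition; this starts at 0).
            target = i % n
        else:
            # REPARTITION(n, c): hash partitioning; equal keys always get the same target.
            # crc32 is a deterministic stand-in for Spark's Murmur3 hash.
            target = zlib.crc32(repr(key(row)).encode()) % n
        out[target].append(row)
    return out


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print([len(p) for p in repartition(rows, 3)])  # [2, 2, 1]
by_name = repartition(rows, 3, key=lambda r: r[0])
```

Unlike the coalesce model, every row is reassigned here, which is why `REPARTITION` can both increase and decrease the partition count.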
REPARTITION_BY_RANGE
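Equivalent to the `repartitionByRange` Dataset API. It takes column names and an optional partition number as parameters, and range-partitions the data by those columns into the specified number of partitions.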
SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t;
SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;
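A toy pure-Python model of range partitioning (the function `repartition_by_range` is hypothetical, not Spark code): the sorted key space is cut into `n` contiguous ranges, so each partition holds an ordered slice of keys. This matches the ascending order visible in the `RepartitionByExpression [c#30 ASC NULLS FIRST], 3` node of the plan below. Spark estimates the range boundaries by sampling the data; the sketch uses every key, which is exact but only sensible for small inputs.

```python
from bisect import bisect_left
from typing import Callable, List


def repartition_by_range(rows: list, n: int, key: Callable) -> List[list]:
    """Toy model of REPARTITION_BY_RANGE: split rows into n contiguous, ascending key ranges."""
    keys = sorted(key(r) for r in rows)
    # Inclusive upper bounds of the first n - 1 ranges. Spark estimates the bounds from a
    # sample; using every key is exact but only reasonable for small inputs.
    bounds = [keys[i * len(keys) // n] for i in range(1, n)] if keys else []
    out: List[list] = [[] for _ in range(n)]
    for row in rows:
        # bisect_left counts the bounds strictly below the key, i.e. the target range index.
        out[bisect_left(bounds, key(row))].append(row)
    return out


print(repartition_by_range(list(range(10, 0, -1)), 3, key=lambda x: x))
# [[4, 3, 2, 1], [7, 6, 5], [10, 9, 8]]
```

Rows keep their input order inside a partition, but the partitions themselves cover non-overlapping, increasing key ranges.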
REBALANCE
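Available only as a SQL hint. It rebalances the output partitions of the query result so that every partition has a reasonable size, neither too small nor too big. If column names are given as parameters, Spark tries its best to partition the query result by those columns. This is best-effort: if the data is skewed, Spark splits the skewed partitions so that they do not become too big. This makes REBALANCE useful, for example, when writing a query result to a table while avoiding very small or very large files. The hint is ignored if AQE (Adaptive Query Execution) is not enabled.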
SELECT /*+ REBALANCE */ * FROM t;
SELECT /*+ REBALANCE(3) */ * FROM t;
SELECT /*+ REBALANCE(c) */ * FROM t;
SELECT /*+ REBALANCE(3, c) */ * FROM t;
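The "not too small, not too big" goal can be sketched with a toy pure-Python model (the function `rebalance` is hypothetical, and it measures size in rows rather than bytes). Oversized partitions are split into chunks of at most `target` rows, and runs of small neighbouring partitions are merged. In Spark, AQE does this at runtime from shuffle statistics, with the target size roughly governed by `spark.sql.adaptive.advisoryPartitionSizeInBytes`. The sketch ignores column arguments and byte sizes entirely.

```python
from typing import List


def rebalance(partitions: List[list], target: int) -> List[list]:
    """Toy model of REBALANCE: reshape partitions toward roughly `target` rows each."""
    if target <= 0:
        raise ValueError("target must be positive")
    out: List[list] = []
    pending: list = []  # run of small partitions being merged together
    for part in partitions:
        if len(part) > target:
            # Skewed partition: flush any pending merge, then split into chunks of <= target rows.
            if pending:
                out.append(pending)
                pending = []
            out.extend(part[i:i + target] for i in range(0, len(part), target))
        else:
            # Small partition: merge with neighbours until the merged size would exceed target.
            if pending and len(pending) + len(part) > target:
                out.append(pending)
                pending = []
            pending = pending + part
    if pending:
        out.append(pending)
    return out


parts = [list(range(s)) for s in [1, 1, 1, 10, 2]]
print([len(p) for p in rebalance(parts, 4)])  # [3, 4, 4, 2, 2]
```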
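If multiple partitioning hints are specified, Spark uses the leftmost one (`REPARTITION` in the example below). All three hints appear as nested nodes in the parsed and analyzed logical plans, but the optimized logical plan keeps only `Repartition 100`.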
-- multiple partitioning hints
EXPLAIN EXTENDED SELECT /*+ REPARTITION(100), COALESCE(500), REPARTITION_BY_RANGE(3, c) */ * FROM t;
== Parsed Logical Plan ==
'UnresolvedHint REPARTITION, [100]
+- 'UnresolvedHint COALESCE, [500]
   +- 'UnresolvedHint REPARTITION_BY_RANGE, [3, 'c]
      +- 'Project [*]
         +- 'UnresolvedRelation [t]

== Analyzed Logical Plan ==
name: string, c: int
Repartition 100, true
+- Repartition 500, false
   +- RepartitionByExpression [c#30 ASC NULLS FIRST], 3
      +- Project [name#29, c#30]
         +- SubqueryAlias spark_catalog.default.t
            +- Relation[name#29,c#30] parquet

== Optimized Logical Plan ==
Repartition 100, true
+- Relation[name#29,c#30] parquet

== Physical Plan ==
Exchange RoundRobinPartitioning(100), false, [id=#121]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t[name#29,c#30] Batched: true, DataFilters: [], Format: Parquet,
      Location: CatalogFileIndex[file:/spark/spark-warehouse/t], PartitionFilters: [],
      PushedFilters: [], ReadSchema: struct<name:string>
Join Hints
Join hints let you suggest a join strategy for Spark to use.
Before Spark 3.0, only the broadcast join hint was supported; Spark 3.0 added `MERGE`, `SHUFFLE_HASH`, and `SHUFFLE_REPLICATE_NL`.
Because some join types do not support every strategy, providing a hint does not guarantee that Spark will use the suggested strategy.
BROADCAST
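Suggests that Spark use a broadcast join. The side carrying the hint is broadcast regardless of `spark.sql.autoBroadcastJoinThreshold`; if both sides carry a broadcast hint, the smaller side (based on statistics) is broadcast. `BROADCASTJOIN` and `MAPJOIN` are aliases for `BROADCAST`.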
-- Join Hints for broadcast join
SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ BROADCASTJOIN (t1) */ * FROM t1 left JOIN t2 ON t1.key = t2.key;
SELECT /*+ MAPJOIN(t2) */ * FROM t1 right JOIN t2 ON t1.key = t2.key;
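To show why broadcasting avoids shuffling the large side, here is a toy inner broadcast hash join in pure Python (the function `broadcast_hash_join` is hypothetical, and rows are assumed to be `(key, value)` tuples). The small side is turned into a hash table once and, conceptually, a copy is shipped to every partition of the large side, which then probes it locally and keeps its original partitioning.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # assumed row shape for this sketch: (join key, value)


def broadcast_hash_join(big_partitions: List[List[Row]], small: List[Row]) -> List[List[tuple]]:
    """Toy inner broadcast hash join: the small side is shared with every big-side partition."""
    # Build the hash table once from the small (broadcast) side.
    table: Dict[str, List[str]] = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    # Each big-side partition probes the table locally; the big side is never shuffled,
    # so the output keeps the big side's partitioning.
    return [
        [(key, big_value, small_value)
         for key, big_value in part
         for small_value in table.get(key, [])]
        for part in big_partitions
    ]


big = [[("a", "x1"), ("b", "x2")], [("a", "x3"), ("c", "x4")]]
small = [("a", "s1"), ("c", "s2")]
print(broadcast_hash_join(big, small))
# [[('a', 'x1', 's1')], [('a', 'x3', 's1'), ('c', 'x4', 's2')]]
```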
MERGE
Suggests that Spark use a shuffle sort merge join. `SHUFFLE_MERGE` and `MERGEJOIN` are aliases for `MERGE`.
-- Join Hints for shuffle sort merge join
SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
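In a shuffle sort merge join, both sides are first shuffled by the join key so that matching keys meet in the same partition; each partition is then sorted and merged. The toy pure-Python sketch below (the function `sort_merge_join` is hypothetical, an inner join on the first tuple element) covers only the per-partition sort-and-merge step, including runs of duplicate keys.

```python
from typing import List, Tuple


def sort_merge_join(left: List[Tuple], right: List[Tuple]) -> List[Tuple]:
    """Toy inner sort-merge join on element 0 of each tuple, for one partition of data."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out: List[Tuple] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == rkey:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append(lrow + rrow[1:])  # the key appears once in the output row
            i, j = i_end, j_end
    return out


left = [(2, "l2"), (1, "l1"), (2, "l2b"), (4, "l4")]
right = [(2, "r2"), (3, "r3"), (1, "r1")]
print(sort_merge_join(left, right))
# [(1, 'l1', 'r1'), (2, 'l2', 'r2'), (2, 'l2b', 'r2')]
```

Because both inputs are sorted, each side is scanned once after sorting, and no hash table has to fit in memory.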
SHUFFLE_HASH
Suggests that Spark use a shuffle hash join. If both sides carry the shuffle hash hint, Spark chooses the smaller side (based on statistics) as the build side.
-- Join Hints for shuffle hash join
SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
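A toy pure-Python model of a shuffle hash join (the function `shuffle_hash_join` is hypothetical, an inner join on the first tuple element): both sides are hash-partitioned by the key, then in each partition a hash table is built from the build side and probed with the other side. The sketch always builds on the smaller input, as in the case above where both sides carry the hint; row counts stand in for Spark's size statistics, and `zlib.crc32` stands in for Spark's hash function.

```python
import zlib
from collections import defaultdict
from typing import List, Tuple


def shuffle_hash_join(left: List[Tuple], right: List[Tuple], n: int) -> List[Tuple]:
    """Toy inner shuffle hash join on element 0 of each tuple, using n shuffle partitions."""

    def shuffle(rows: List[Tuple]) -> List[List[Tuple]]:
        parts: List[List[Tuple]] = [[] for _ in range(n)]
        for row in rows:
            # crc32 stands in for Spark's hash; equal keys land in the same partition on both sides.
            parts[zlib.crc32(repr(row[0]).encode()) % n].append(row)
        return parts

    # Build on the smaller side; row counts stand in for Spark's size statistics.
    build_left = len(left) <= len(right)
    out: List[Tuple] = []
    for lpart, rpart in zip(shuffle(left), shuffle(right)):
        build, probe = (lpart, rpart) if build_left else (rpart, lpart)
        table = defaultdict(list)
        for brow in build:
            table[brow[0]].append(brow)
        for prow in probe:
            for brow in table.get(prow[0], []):
                # Emit rows in (left, right) order regardless of which side was the build side.
                lrow, rrow = (brow, prow) if build_left else (prow, brow)
                out.append(lrow + rrow[1:])
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
right = [(2, "x"), (3, "y"), (4, "z")]
print(sorted(shuffle_hash_join(left, right, 4)))
# [(2, 'b', 'x'), (2, 'c', 'x'), (3, 'd', 'y')]
```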
SHUFFLE_REPLICATE_NL
Suggests that Spark use a shuffle-and-replicate nested loop join.
-- Join Hints for shuffle-and-replicate nested loop join
SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
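A shuffle-and-replicate nested loop join pairs every partition of one side with every partition of the other, so each task sees one partition pair and checks the join condition with nested loops. Because nothing is looked up by key, the condition does not have to be an equality. The toy pure-Python model below (the function `shuffle_replicate_nl_join` is hypothetical) returns one output list per partition pair.

```python
from itertools import product
from typing import Callable, List, Tuple


def shuffle_replicate_nl_join(
    left_parts: List[List[Tuple]],
    right_parts: List[List[Tuple]],
    cond: Callable[[Tuple, Tuple], bool],
) -> List[List[Tuple]]:
    """Toy model: one task per (left partition, right partition) pair, each a nested loop."""
    tasks: List[List[Tuple]] = []
    for lpart, rpart in product(left_parts, right_parts):
        # Each partition is replicated into every task that pairs it with a partition of the other side.
        tasks.append([lrow + rrow for lrow in lpart for rrow in rpart if cond(lrow, rrow)])
    return tasks


left = [[(1,), (5,)], [(10,)]]
right = [[(3,)], [(7,), (12,)]]
result = shuffle_replicate_nl_join(left, right, lambda a, b: a[0] < b[0])  # non-equi condition
print(len(result))  # 4 tasks = 2 x 2 partition pairs
```

The number of tasks grows with the product of the two partition counts, which is why this strategy is usually the last choice.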
When different join strategy hints are specified on the two sides of a join, Spark prioritizes them in the following order:
`BROADCAST` > `MERGE` > `SHUFFLE_HASH` > `SHUFFLE_REPLICATE_NL`
SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
In the example above, Spark uses a broadcast join and logs the following warning:
org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge) is overridden by another hint and will not take effect.
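The priority rule can be modeled as a small lookup (the function `pick_join_strategy` is hypothetical, not a Spark API): normalize the aliases listed above, then pick the highest-ranked strategy among the hints on the two sides. It assumes one strategy hint per side and does not model the case, mentioned above, where the chosen strategy cannot be applied to the join type.

```python
from typing import Optional, Tuple

# Priority order from the Spark SQL docs: BROADCAST > MERGE > SHUFFLE_HASH > SHUFFLE_REPLICATE_NL
PRIORITY = ("BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL")
# Aliases listed in the docs, mapped to their canonical strategy name
ALIASES = {"BROADCASTJOIN": "BROADCAST", "MAPJOIN": "BROADCAST",
           "SHUFFLE_MERGE": "MERGE", "MERGEJOIN": "MERGE"}


def normalize(hint: Optional[str]) -> Optional[str]:
    """Upper-case a hint name and resolve aliases; None means 'no hint on this side'."""
    if hint is None:
        return None
    name = hint.upper()
    return ALIASES.get(name, name)


def pick_join_strategy(left_hint: Optional[str],
                       right_hint: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Toy model: return (chosen strategy, overridden strategy) given one hint per join side."""
    hinted = {normalize(left_hint), normalize(right_hint)}
    ranked = [s for s in PRIORITY if s in hinted]
    chosen = ranked[0] if ranked else None
    overridden = ranked[1] if len(ranked) > 1 else None
    return chosen, overridden


print(pick_join_strategy("BROADCAST", "MERGE"))         # ('BROADCAST', 'MERGE')
print(pick_join_strategy("shuffle_hash", "mergejoin"))  # ('MERGE', 'SHUFFLE_HASH')
```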