[Spark] Spark Scheduling


•8• 2024. 4. 1. 15:22

Application Scheduling

For more on this topic, this Cloudera Community article is a good summary: https://community.cloudera.com/t5/Community-Articles/Dynamic-Allocation-in-Apache-Spark/ta-p/368095

SRA(Static Resource Allocation)

A Spark application reserves the resources it will use before it starts running.

The amount of resources is fixed and cannot be changed at runtime. If the application needs more resources than it was allocated, it may take longer to run or fail.

Because the resources are fixed, any allocated resources that the application does not use are wasted, which leads to inefficient resource utilization.

Likewise, because resources cannot be adjusted at runtime, an out of memory error can occur if the application needs more resources partway through.

http://www.riveriq.com/blogs/2018/08/dynamic-allocation-in-spark

 

DRA(Dynamic Resource Allocation)

When dynamic resource allocation is enabled, resources are allocated to the application on demand at runtime. This improves resource utilization and helps prevent both under-utilization and over-allocation. The unit of resource sharing in dynamic resource allocation is the executor, so every mention of "resource" below refers to executors.

http://www.riveriq.com/blogs/2018/08/dynamic-allocation-in-spark

Dynamic allocation has the following advantages.

  1. Resources are used efficiently, which improves the overall efficiency of the cluster.
  2. The allocated resources can be scaled up or down according to the workload.
  3. Because resources are allocated efficiently, the overall cost of the application can be reduced.
  4. When several applications run on one cluster, each can receive a fair share of resources.

The disadvantages are as follows.

  1. Spark has to monitor the workload continuously and adjust the resource allocation, which adds overhead and can affect application performance.
  2. Adjusting resources can introduce latency.
  3. Additional properties have to be configured, which can make managing and deploying the application more complex.
  4. If the allocated resources change frequently, the executors available to the application become unpredictable.
  5. Every resource request and release requires communication with the node manager, which can increase network cost.
  6. If the allocated resources are insufficient, the YARN shuffle service that handles the shuffle process can become overloaded, which can degrade the performance of the whole cluster. Dynamically allocated resources therefore need to be monitored carefully.

Related configuration

| Property Name | Default Value | Description |
| --- | --- | --- |
| spark.shuffle.service.enabled | false | Enables the external shuffle service. |
| spark.dynamicAllocation.enabled | false | Set this to true to enable dynamic allocation. |
| spark.dynamicAllocation.minExecutors | 0 | Minimum number of executors to allocate to the application. |
| spark.dynamicAllocation.initialExecutors | spark.dynamicAllocation.minExecutors | Initial number of executors to run when dynamic allocation is enabled. If `--num-executors` (or `spark.executor.instances`) is set and is larger than this value, that value is used instead. |
| spark.dynamicAllocation.maxExecutors | infinity | Maximum number of executors to allocate to the application. |
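As a rough sketch of how the properties above fit together, the helper below collects them into a plain dictionary. The function name `dynamic_allocation_conf` and its parameters are hypothetical and only for illustration; the property keys are the ones listed in the table.

```python
def dynamic_allocation_conf(min_executors=0, max_executors=None, initial_executors=None):
    """Build dynamic allocation settings as a dict (hypothetical helper)."""
    # initialExecutors defaults to minExecutors, as in the table above.
    if initial_executors is None:
        initial_executors = min_executors
    if initial_executors < min_executors:
        raise ValueError("initial_executors must be >= min_executors")
    if max_executors is not None and initial_executors > max_executors:
        raise ValueError("initial_executors must be <= max_executors")

    settings = {
        # The external shuffle service is enabled together with dynamic allocation.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.initialExecutors": str(initial_executors),
    }
    # Leaving maxExecutors unset keeps the default (infinity).
    if max_executors is not None:
        settings["spark.dynamicAllocation.maxExecutors"] = str(max_executors)
    return settings


settings = dynamic_allocation_conf(min_executors=2, max_executors=10)
```

The resulting pairs could then be handed to PySpark, for example with `SparkConf().setAll(list(settings.items()))`.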

* Note: YARN shuffle service

https://mallikarjuna_g.gitbooks.io/spark/content/yarn/spark-yarn-YarnShuffleService.html

This is the external shuffle service that must be enabled in order to use dynamic resource allocation. Spark executors use it to fetch shuffle files, so the shuffled files are not lost even if an executor fails.

https://medium.com/@rachit1arora/apache-spark-shuffle-service-there-are-more-than-one-options-c1a8e098230e

 

Job Scheduling

FIFO Scheduling

FIFO (first in, first out) scheduling runs jobs in submission order: the next job runs once all tasks of the current job have finished.

By default, resources are allocated in FIFO order, and the first job defined gets priority on all available resources.

If the first job does not need all of the resources, the next job can use the ones that are left over.

https://www.waitingforcode.com/apache-spark/fair-jobs-scheduling-apache-spark/read#FAIR_scheduling_Apache_Spark

The problem arises when the first job has many tasks to run: no matter how few tasks the next job has, it must wait for the first job to complete.

FAIR scheduling can be used to solve this problem.

 

FAIR Scheduling

FAIR scheduling works in a round-robin fashion.

As the image below shows, job2 does not have to wait for job1 to finish and can start as soon as possible.

It is one of the modes that can optimize the execution time of multiple jobs within a single application. Unlike FIFO, resources are shared between jobs, which avoids the situation where one long-running job locks up the resources.

 

Usage

Set `spark.scheduler.mode` to FAIR.

from pyspark.sql import SparkSession
from pyspark import SparkConf

conf = SparkConf()
conf.set("spark.scheduler.mode", "FAIR")
spark = SparkSession.builder.appName(...)\
		...
        .config(conf=conf)\
        ...

FAIR mode in action

scheduler pool

https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

Jobs can be grouped into pools, and each pool supports its own settings, such as a weight.

When the scheduling mode is set to FAIR, by default the pools share the cluster fairly with one another (each pool gets an equal share), but the jobs inside each pool run in FIFO order.

 

 

Pool properties

`schedulingMode`

The scheduling mode for the jobs queued within the pool: FIFO or FAIR.

`weight`

Controls the pool's share of the cluster relative to other pools. The default is 1, so by default every pool has a weight of 1.

For example, a pool with a weight of 2 gets twice as many resources as the other pools.

Setting a very large weight such as 1000 is also a way to implement priority between pools. (With a weight of 1000, the pool's tasks effectively always get launched first whenever it has active jobs.)

`minShare`

Sets a minimum amount of resources (CPU cores) for each pool, separately from the overall weight. The scheduler always tries to satisfy the minShare of every active pool before redistributing the remaining resources according to the weights.

minShare is therefore a way to ensure that a pool can always obtain a certain number of resources (CPU cores) quickly. The default is 0.
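To make the interaction between `weight` and `minShare` concrete, here is a simplified model of the behavior described above: slots are handed out one at a time, pools still below their minShare are served first, and the rest are divided according to weight. The `allocate` function is a hypothetical illustration that assumes every pool always has pending tasks; it is not Spark's implementation.

```python
def allocate(pools, total_slots):
    """Distribute task slots across pools (simplified model).

    pools: dict of pool name -> (weight, min_share)
    Returns a dict of pool name -> number of slots received.
    """
    running = {name: 0 for name in pools}

    def priority(name):
        weight, min_share = pools[name]
        needy = running[name] < min_share
        if needy:
            # Pools below their minShare go first, least-satisfied first.
            ratio = running[name] / max(min_share, 1)
        else:
            # Otherwise the pool with the fewest slots per unit of weight goes first.
            ratio = running[name] / weight
        # Ties are broken by pool name.
        return (0 if needy else 1, ratio, name)

    for _ in range(total_slots):
        chosen = min(pools, key=priority)
        running[chosen] += 1
    return running


# Same weights and minShares as a "production" (1, 2) and "test" (2, 3) pool.
pools = {"production": (1, 2), "test": (2, 3)}
print(allocate(pools, 5))   # only the minShares are covered
print(allocate(pools, 12))  # extra slots follow the 1:2 weights
```

With 5 slots each pool receives exactly its minShare (2 and 3); with 12 slots the split is 4 and 8, matching the 1:2 weights.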

 

Using pools

First, write an XML file that defines each pool.

<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
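If the pool definitions need to be generated rather than written by hand, a file with the same structure can be produced with Python's standard library. This is a minimal sketch; the `build_allocations` function and its input format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_allocations(pools):
    """Render pool definitions as allocation-file XML (hypothetical helper).

    pools: dict of pool name -> (scheduling_mode, weight, min_share)
    """
    root = ET.Element("allocations")
    for name, (mode, weight, min_share) in pools.items():
        if mode not in ("FIFO", "FAIR"):
            raise ValueError(f"unknown scheduling mode: {mode}")
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = mode
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    # Prepend the XML declaration used in the example file above.
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


xml_text = build_allocations({
    "production": ("FAIR", 1, 2),
    "test": ("FIFO", 2, 3),
})
print(xml_text)
```

The returned string can be written to a file and used as the allocation file.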

Then set `spark.scheduler.allocation.file` in the Spark config to the location of that file. Alternatively, put the definitions in the fairscheduler.xml file in the conf directory.

# scheduler file at local
conf.set("spark.scheduler.allocation.file", "file:///path/to/file")
# scheduler file at hdfs
conf.set("spark.scheduler.allocation.file", "hdfs:///path/to/file")

To submit jobs to a particular pool, set the pool name as shown below. With this setting, Spark looks for a pool named "test" in the file given by spark.scheduler.allocation.file, or otherwise in fairscheduler.xml. If no pool with that name is configured, the jobs run in a pool with default settings (scheduling mode FIFO, weight 1, minShare 0).

# Assuming sc is your SparkContext variable
sc.setLocalProperty("spark.scheduler.pool", "test")

To clear the pool associated with a thread, set the property to None (null) as shown below. Jobs submitted from that thread then go to the default pool.

# Assuming sc is your SparkContext variable
sc.setLocalProperty("spark.scheduler.pool", None)

 

 

References

https://youtu.be/BLT6eHcT-e8?si=dgDttOPjW7I40mH-

https://www.waitingforcode.com/apache-spark/fair-jobs-scheduling-apache-spark/read#FAIR_scheduling_Apache_Spark

https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application

 
