[Spark] How to resolve "TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again."
2023. 5. 22. 22:13
Situation
I was trying to join two dataframes when a stream of warnings like the ones below appeared; Spark then printed an error and the application terminated.
...
23/05/22 04:39:35 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:35 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:35 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:36 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:36 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:36 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:36 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:37 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:37 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:38 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:38 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:38 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:39 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:39 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:39 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:39 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:40 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:40 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:40 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:41 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:41 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:41 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
23/05/22 04:39:41 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again.
...
This message appears to be emitted when TaskMemoryManager allocates a page.
(Note: once the memory manager has granted the requested execution memory, TaskMemoryManager asks `tungstenMemoryAllocator` for the actual memory and updates `pageNumber`. If an OutOfMemoryError is thrown at that point, it logs this warning and tries the allocation again.)
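To make the retry behaviour easier to picture, here is a toy model in plain Python. It is only a sketch based on my reading of the flow described above, not Spark's actual code: the class name, the two budget fields, and the helper methods are all made up for illustration, and `MemoryError` stands in for the JVM's OutOfMemoryError.

```python
PAGE_SIZE = 2097152  # the page size that appears in the warnings above


class ToyTaskMemoryManager:
    """Toy model: bookkeeping grants memory that the allocator may not really have."""

    def __init__(self, bookkeeping_budget, really_free):
        self.bookkeeping_budget = bookkeeping_budget  # what the memory manager believes is free
        self.really_free = really_free                # what the allocator can actually hand out
        self.acquired_but_not_used = 0                # granted bytes that never became a page
        self.warnings = []

    def _acquire_execution_memory(self, size):
        # Step 1: ask the bookkeeping side for permission.
        granted = min(size, self.bookkeeping_budget)
        self.bookkeeping_budget -= granted
        return granted

    def _allocate(self, size):
        # Step 2: ask the (hypothetical) allocator for real memory.
        if size > self.really_free:
            raise MemoryError("not enough real memory")
        self.really_free -= size
        return {"size": size}

    def allocate_page(self, size):
        while True:
            acquired = self._acquire_execution_memory(size)
            if acquired <= 0:
                return None  # bookkeeping has nothing left to grant
            try:
                return self._allocate(acquired)
            except MemoryError:
                # Keep the grant on the books, log the warning, and try again.
                self.warnings.append(
                    f"Failed to allocate a page ({acquired} bytes), try again."
                )
                self.acquired_but_not_used += acquired


manager = ToyTaskMemoryManager(bookkeeping_budget=3 * PAGE_SIZE, really_free=0)
page = manager.allocate_page(PAGE_SIZE)
for line in manager.warnings:
    print(line)
```

In this model the warning repeats once per retry until the bookkeeping budget runs out, which matches the shape of the log above: many identical lines in quick succession, then a failure.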
In my case, one of the two dataframes was small (skewed data).
Looking at the DAG, the join was indeed being executed as a broadcastHashJoin.
The data looks small enough to broadcast, so I still don't know why the error occurs here.
Solution 1. Artificially inflate the smaller dataframe
small_data = small_data.union(small_data).union(small_data).union(small_data).union(small_data).union(small_data).union(small_data).union(small_data)
big_data.join(small_data.distinct(), [conditions], how='left')
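It's a hack, but inflating the small dataframe with repeated union calls and then removing the duplicates with distinct() at join time lets the job run without errors. Note that distinct() also drops any rows that were already duplicated in the original small_data.
In this case Spark did not use a broadcast hash join.
Solution 2. Increase executor memory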
spark.executor.memory
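Executor memory has to be set before the executors start, so it is usually passed when the job is submitted. The helper below is a small sketch that builds such a `spark-submit` command line; the value `8g` and the script name `join_job.py` are placeholders I made up, not values from my job.

```python
import shlex


def spark_submit_command(app_path, conf):
    """Build a spark-submit command line with one --conf flag per setting."""
    parts = ["spark-submit"]
    for key, value in conf.items():
        parts += ["--conf", f"{key}={value}"]
    parts.append(app_path)
    return shlex.join(parts)


# Hypothetical values: pick a size that fits your cluster.
print(spark_submit_command("join_job.py", {"spark.executor.memory": "8g"}))
```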
Solution 3. Disable broadcast join
Setting the config below to -1 disables automatic broadcast joins, so Spark picks a different join strategy.
spark.sql.autoBroadcastJoinThreshold
Oddly, this sometimes works and sometimes doesn't, so I still need to pin down the conditions.
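A minimal sketch of applying this setting from PySpark and checking the resulting plan. It assumes an existing `spark` session and the `big_data` / `small_data` dataframes from above; the function name and the `conditions` argument are just for illustration.

```python
BROADCAST_THRESHOLD_KEY = "spark.sql.autoBroadcastJoinThreshold"


def left_join_without_auto_broadcast(spark, big_data, small_data, conditions):
    """Disable automatic broadcast joins, then build the join and print its plan."""
    # -1 turns off size-based automatic broadcasting for this session.
    spark.conf.set(BROADCAST_THRESHOLD_KEY, "-1")
    joined = big_data.join(small_data, conditions, how="left")
    # Print the physical plan so the chosen join operator can be checked.
    joined.explain()
    return joined
```

The setting is left in place on purpose, so it is still in effect when an action on the returned dataframe actually runs. One thing worth checking when it seems not to work: the threshold only governs automatic broadcasting, so an explicit `broadcast()` hint somewhere in the query can still produce a broadcast join.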
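Appendix: Project Tungsten
As TaskMemoryManager.java shows, the word `tungsten` appears in many memory-related function and variable names, so I got curious and looked it up.
Tungsten is an execution-engine project; it first shipped in the Spark 1.x line and was extended substantially in Spark 2 with whole-stage code generation.
Unlike efforts that improve performance through disk I/O or networking, Tungsten reportedly targets Spark's CPU and memory efficiency and accounts for much of the performance gain there.
Tungsten reportedly manages memory explicitly via sun.misc.Unsafe and can use native (off-heap) memory instead of the JVM heap, which addresses problems such as JVM object and garbage-collection overhead.
It is also used for Spark SQL optimization: after the Catalyst optimizer turns the logical query plan into a physical query plan, Tungsten's codegen feature reportedly generates optimized code for it.
Tungsten's main roles are roughly as follows.
1. Off-Heap Memory Management: manages memory explicitly, outside the JVM heap
2. Cache Locality: lays out data so it is cache-friendly (for a high cache hit rate)
3. CodeGen: Whole-Stage Code Generation
A related property is `spark.sql.tungsten.enabled`, though as far as I can tell it dates from Spark 1.x and is not a documented setting in current releases.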
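On recent Spark versions, the settings I know of that touch these areas are `spark.memory.offHeap.enabled`, `spark.memory.offHeap.size`, and `spark.sql.codegen.wholeStage`. The sketch below reads them from an existing `spark` session; the function name is made up, and the fallback values are the documented defaults as I understand them, used only when a key is not set.

```python
# Fallbacks mirror the documented defaults (an assumption worth re-checking
# against the docs for your Spark version).
TUNGSTEN_RELATED_DEFAULTS = {
    "spark.memory.offHeap.enabled": "false",
    "spark.memory.offHeap.size": "0",
    "spark.sql.codegen.wholeStage": "true",
}


def tungsten_related_settings(spark):
    """Return the current value (or the fallback) for each setting above."""
    return {
        key: spark.conf.get(key, default)
        for key, default in TUNGSTEN_RELATED_DEFAULTS.items()
    }
```

Calling `tungsten_related_settings(spark)` in a notebook gives a quick view of whether off-heap memory is enabled and whether whole-stage code generation is on for the current session.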
(Reference: https://www.linkedin.com/pulse/catalyst-tungsten-apache-sparks-speeding-engine-deepak-rajak)