[Airflow] Nifi Trigger Operator ๋งŒ๋“ค๊ธฐ

[Airflow] Nifi Trigger Operator ๋งŒ๋“ค๊ธฐ

airflow์—์„œ nifi processor๋กœ trigger๋ฅผ ๋ณด๋‚ด์•ผ ํ–ˆ๋‹ค. ์‚ฌ์‹ค nifi์—์„œ restAPI๋ฅผ ์ง€์›ํ•ด์ฃผ๊ธฐ ๋•Œ๋ฌธ์— NIFI ETL ์ž์ฒด๋ฅผ ์‹œ์ž‘ํ•˜๋Š” ๊ฒƒ์€ ํฐ ๋ฌธ์ œ๊ฐ€ ์•ˆ๋์ง€๋งŒ nifi ์ž‘์—… ์ดํ›„์— airflow์—์„œ ํ›„์† task๊ฐ€ ์žˆ๋‹ค๋ฉด ETL์ด ์–ด๋А ์‹œ์ ์— ๋๋‚˜๋Š” ์ง€๋ฅผ ํ™•์ธํ•˜๋Š” ๊ฒŒ ์–ด๋ ค์› ๋‹ค. NIFI๋Š” processor ๋‹จ์œ„๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋ฉฐ, flowfile์ด processor์— ์ธํ’‹๋˜๋ฉด ๋ฐ”๋กœ ์ฒ˜๋ฆฌ ํ›„ output stream์œผ๋กœ ๋‚ด๋ณด๋‚ด๊ธฐ ๋•Œ๋ฌธ์— ์ข…๋ฃŒ ์‹œ์ ์„ ๋ช…ํ™•ํžˆ ์•Œ ์ˆ˜๊ฐ€ ์—†์—ˆ๋‹ค. ์•„๋ž˜ ๋ธ”๋กœ๊ทธ์—์„œ airflow์—์„œ nifi ๋ฅผ ํŠธ๋ฆฌ๊ฑฐํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์†Œ๊ฐœํ–ˆ์ง€๋งŒ ์‹œ์ž‘ ๋…ธ๋“œ๋ฅผ `GenerateFlowFile`๋กœ ์„ค์ •ํ•ด์•ผ ํ•˜๋Š” ๋“ฑ ์ œ์•ฝ์กฐ๊ฑด์ด ์žˆ์—ˆ๋‹ค. https://towardsdatasci..

[Spark] Spark Structured Streaming - Fault Tolerance

[Spark] Spark Structured Streaming - Fault Tolerance

Background ์‹ค์‹œ๊ฐ„ stream ์ฒ˜๋ฆฌ๋Š” ์ง€์†์ ์ธ input stream์˜ ํŠน์„ฑ ์ƒ ์ค‘๋‹จ๋˜์ง€ ์•Š๊ณ  24์‹œ๊ฐ„ ์‹คํ–‰๋˜๋ฏ€๋กœ ์—ฌ๋Ÿฌ ์˜ค๋ฅ˜ ์›์ธ์œผ๋กœ failure์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค. input stream์ด ์ž‘์„ฑ๋œ ์ฝ”๋“œ๋กœ ์ฒ˜๋ฆฌ๋  ์ˆ˜ ์—†๋Š” ๊ฒฝ์šฐ (์ž˜๋ชป๋œ ํ˜•์‹์˜ ๋ฐ์ดํ„ฐ) ์‹œ์Šคํ…œ/cluster ์˜ค๋ฅ˜์˜ ๊ฒฝ์šฐ Spark Streaming์—์„œ์˜ Fault Tolerance ์ •์˜ ์ŠคํŒŒํฌ์˜ ๋ชฉํ‘œ๋Š” end-to-end Exactly Once ๋ณด์žฅ์ด๋‹ค. (์ฐธ๊ณ ) At Most Once: ๊ฐ ๋ ˆ์ฝ”๋“œ๋Š” ํ•œ ๋ฒˆ๋งŒ ์ฒ˜๋ฆฌ๋˜๊ฑฐ๋‚˜ ์•„์˜ˆ ์ฒ˜๋ฆฌ๋˜์ง€ ์•Š๋Š”๋‹ค. At Least Once: ๊ฐ ๋ ˆ์ฝ”๋“œ๊ฐ€ ํ•œ ๋ฒˆ ์ด์ƒ ์ฒ˜๋ฆฌ๋œ๋‹ค. ๋ฐ์ดํ„ฐ๊ฐ€ ์†์‹ค๋˜์ง€ ์•Š๋„๋ก ๋ณด์žฅํ•˜๋ฏ€๋กœ At Most Once๋ณด๋‹ค ๊ฐ•๋ ฅํ•˜์ง€๋งŒ ์ค‘๋ณต์ด ์žˆ์„ ์ˆ˜ ์žˆ๋‹ค. Exactly Once: ๊ฐ ..

[Spark] Accumulator์™€ Broadcast (๊ณต์œ ๋ณ€์ˆ˜)

[Spark] Accumulator์™€ Broadcast (๊ณต์œ ๋ณ€์ˆ˜)

์ฐธ๊ณ : https://spark.apache.org/docs/latest/rdd-programming-guide.html#shared-variables ๊ณต์œ ๋ณ€์ˆ˜ ์ŠคํŒŒํฌ์—์„œ๋Š” ๋ถ„์‚ฐํ•˜์—ฌ ๋ณ‘๋ ฌ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„ํ• ํ•ด์„œ(ํŒŒํ‹ฐ์…”๋‹) ์—ฌ๋Ÿฌ ๋จธ์‹ ์—์„œ ๋™์‹œ์— ์ฒ˜๋ฆฌํ•œ๋‹ค. ์ด๋•Œ ์‚ฌ์šฉ๋œ ๋ชจ๋“  ๋ณ€์ˆ˜๋Š” ๋ฐฐํฌ๋  ๋•Œ ๋ณต์‚ฌ๋œ ๋ณต์‚ฌ๋ณธ์œผ๋กœ ์ž‘๋™ํ•˜๋Š”๋ฐ, ์ด๋Ÿฌํ•œ ๋ณ€์ˆ˜์˜ ์—…๋ฐ์ดํŠธ๋Š” driver์— ๋‹ค์‹œ ์ „๋‹ฌ๋˜์ง€ ์•Š๋Š”๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์–ด๋–ค ์กฐ๊ฑด(?) ์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด์•ผ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋ฐœ์ƒํ•˜๋Š”๋ฐ, ์ด๋Ÿด๋•Œ ์ œํ•œ๋œ ์œ ํ˜•์˜ ๊ณต์œ  ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ํ›„ ์›ํ•˜๋Š” ๊ฐ’์„ ๋ฐ›์„ ์ˆ˜ ์žˆ๋‹ค. Broadcast Variable ๋ธŒ๋กœ๋“œ์บ์ŠคํŠธ ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋ณ€์ˆ˜์˜ ๋ณต์‚ฌ๋ณธ์„ ์ „๋‹ฌํ•˜๋Š”๊ฒŒ ์•„๋‹Œ, Read Only ๋ณ€์ˆ˜๋ฅผ ๊ฐ ์›Œ์ปค ๋…ธ๋“œ์— ์บ์‹œ๋œ ์ƒํƒœ๋กœ..