๋ชฉ๋ก์ „์ฒด ๊ธ€ (59)

๐Ÿฅ

[Spark] Structured Streaming - stateful transformation๊ณผ Window operation

streaming ์ฝ”๋“œ๋ฅผ ์ž‘์„ฑํ•˜๋ฉด์„œ filter, count ๋“ฑ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ์—ฐ์‚ฐ์ž๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ์˜ ์ •๋ณด๊ฐ€ ํ•„์š”์—†๋Š” stateless ์—ฐ์‚ฐ์ž์™€, ๊ณผ๊ฑฐ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ํ•„์š”ํ•ด ๋ฐ์ดํ„ฐ ์œ ์ง€๊ฐ€ ํ•„์š”ํ•œ statefull ์—ฐ์‚ฐ์ž๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค.Stateless Trasformation๊ณผ๊ฑฐ์˜ ๋ฐ์ดํ„ฐ์™€ ์ƒ๊ด€ ์—†์ด ํ˜„์žฌ์˜ ๋ฐ์ดํ„ฐ๋งŒ ์‚ฌ์šฉํ•˜๋Š” ์—ฐ์‚ฐ์žStateful Transfmraion์ด์ „ ๋ฐฐ์น˜์˜ ๋ฐ์ดํ„ฐ์˜ ์œ ์ง€๊ฐ€ ํ•„์š”ํ•œ ์—ฐ์‚ฐ์ž์˜ˆ: aggreagion, counter, join, group, windowing ๋“ฑStateless Transformation๊ฐ ๋ฐฐ์น˜๋ฅผ ์ด์ „ ๋ฐฐ์น˜๋ฅผ ์ฐธ์กฐํ•˜์ง€ ์•Š๊ณ  ๋…๋ฆฝ์ ์œผ๋กœ ์ฒ˜๋ฆฌํ•œ๋‹ค. -> ๋ฐฐ์น˜์˜ ์ถœ๋ ฅ์ด ํ•ด๋‹น ๋ฐฐ์น˜์˜ ๋ฐ์ดํ„ฐ๋งŒ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•จ์„ ์˜๋ฏธselect, filter, map, fla..

๋ฐ์ดํ„ฐ/Spark 2024. 4. 16. 22:24
[Airflow] Nifi Trigger Operator ๋งŒ๋“ค๊ธฐ

airflow์—์„œ nifi processor๋กœ trigger๋ฅผ ๋ณด๋‚ด์•ผ ํ–ˆ๋‹ค. ์‚ฌ์‹ค nifi์—์„œ restAPI๋ฅผ ์ง€์›ํ•ด์ฃผ๊ธฐ ๋•Œ๋ฌธ์— NIFI ETL ์ž์ฒด๋ฅผ ์‹œ์ž‘ํ•˜๋Š” ๊ฒƒ์€ ํฐ ๋ฌธ์ œ๊ฐ€ ์•ˆ๋์ง€๋งŒ nifi ์ž‘์—… ์ดํ›„์— airflow์—์„œ ํ›„์† task๊ฐ€ ์žˆ๋‹ค๋ฉด ETL์ด ์–ด๋Š ์‹œ์ ์— ๋๋‚˜๋Š” ์ง€๋ฅผ ํ™•์ธํ•˜๋Š” ๊ฒŒ ์–ด๋ ค์› ๋‹ค. NIFI๋Š” processor ๋‹จ์œ„๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋ฉฐ, flowfile์ด processor์— ์ธํ’‹๋˜๋ฉด ๋ฐ”๋กœ ์ฒ˜๋ฆฌ ํ›„ output stream์œผ๋กœ ๋‚ด๋ณด๋‚ด๊ธฐ ๋•Œ๋ฌธ์— ์ข…๋ฃŒ ์‹œ์ ์„ ๋ช…ํ™•ํžˆ ์•Œ ์ˆ˜๊ฐ€ ์—†์—ˆ๋‹ค. ์•„๋ž˜ ๋ธ”๋กœ๊ทธ์—์„œ airflow์—์„œ nifi ๋ฅผ ํŠธ๋ฆฌ๊ฑฐํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์†Œ๊ฐœํ–ˆ์ง€๋งŒ ์‹œ์ž‘ ๋…ธ๋“œ๋ฅผ `GenerateFlowFile`๋กœ ์„ค์ •ํ•ด์•ผ ํ•˜๋Š” ๋“ฑ ์ œ์•ฝ์กฐ๊ฑด์ด ์žˆ์—ˆ๋‹ค. https://towardsdatasci..

๋ฐ์ดํ„ฐ 2024. 4. 4. 00:26
[Spark] Spark Structured Streaming - Fault Tolerance

Background ์‹ค์‹œ๊ฐ„ stream ์ฒ˜๋ฆฌ๋Š” ์ง€์†์ ์ธ input stream์˜ ํŠน์„ฑ ์ƒ ์ค‘๋‹จ๋˜์ง€ ์•Š๊ณ  24์‹œ๊ฐ„ ์‹คํ–‰๋˜๋ฏ€๋กœ ์—ฌ๋Ÿฌ ์˜ค๋ฅ˜ ์›์ธ์œผ๋กœ failure์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค. input stream์ด ์ž‘์„ฑ๋œ ์ฝ”๋“œ๋กœ ์ฒ˜๋ฆฌ๋  ์ˆ˜ ์—†๋Š” ๊ฒฝ์šฐ (์ž˜๋ชป๋œ ํ˜•์‹์˜ ๋ฐ์ดํ„ฐ) ์‹œ์Šคํ…œ/cluster ์˜ค๋ฅ˜์˜ ๊ฒฝ์šฐ Spark Streaming์—์„œ์˜ Fault Tolerance ์ •์˜ ์ŠคํŒŒํฌ์˜ ๋ชฉํ‘œ๋Š” end-to-end Exactly Once ๋ณด์žฅ์ด๋‹ค. (์ฐธ๊ณ ) At Most Once: ๊ฐ ๋ ˆ์ฝ”๋“œ๋Š” ํ•œ ๋ฒˆ๋งŒ ์ฒ˜๋ฆฌ๋˜๊ฑฐ๋‚˜ ์•„์˜ˆ ์ฒ˜๋ฆฌ๋˜์ง€ ์•Š๋Š”๋‹ค. At Least Once: ๊ฐ ๋ ˆ์ฝ”๋“œ๊ฐ€ ํ•œ ๋ฒˆ ์ด์ƒ ์ฒ˜๋ฆฌ๋œ๋‹ค. ๋ฐ์ดํ„ฐ๊ฐ€ ์†์‹ค๋˜์ง€ ์•Š๋„๋ก ๋ณด์žฅํ•˜๋ฏ€๋กœ At Most Once๋ณด๋‹ค ๊ฐ•๋ ฅํ•˜์ง€๋งŒ ์ค‘๋ณต์ด ์žˆ์„ ์ˆ˜ ์žˆ๋‹ค. Exactly Once: ๊ฐ ..

๋ฐ์ดํ„ฐ/Spark 2024. 4. 4. 00:25