[Spark] GraphX

•8• 2024. 3. 18. 23:37

GraphX is the Spark API for performing graph computation on Spark.

The Spark documentation describes the advantages of GraphX as follows.

Flexibility

ETL, analysis, and graph computation can be unified within a single system. Data held in RDDs/DataFrames can be converted into graphs efficiently, and custom iterative graph algorithms can be written with the Pregel API.
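
As a rough illustration of the two points above, the following sketch builds a small graph from plain RDDs and then runs single-source shortest paths with the Pregel API. The object name, the toy vertex/edge data, and the local `SparkSession` setup are assumptions made for this example; only `Graph`, `Edge`, `mapVertices`, and `pregel` come from GraphX itself.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of GraphX.
object GraphXFlexibilitySketch {

  // Builds a tiny weighted graph from RDDs and returns the shortest
  // distance from `sourceId` to every vertex, computed with Pregel.
  def shortestPaths(spark: SparkSession, sourceId: VertexId): Map[VertexId, Double] = {
    val sc = spark.sparkContext

    // Toy data (assumed): three named vertices and three weighted edges.
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges: RDD[Edge[Double]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0)))

    // RDDs -> property graph.
    val graph: Graph[String, Double] = Graph(vertices, edges)

    // Distance 0 at the source, infinity everywhere else.
    val initial = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    val sssp = initial.pregel(Double.PositiveInfinity)(
      // Vertex program: keep the smaller of the current and incoming distance.
      (_, dist, newDist) => math.min(dist, newDist),
      // Send a message only when the path through this edge is shorter.
      triplet =>
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else
          Iterator.empty,
      // Merge messages arriving at the same vertex.
      (a, b) => math.min(a, b)
    )

    sssp.vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("graphx-flexibility-sketch")
      .getOrCreate()
    try shortestPaths(spark, 1L).toSeq.sortBy(_._1).foreach(println)
    finally spark.stop()
  }
}
```

With this toy data the route 1 -> 2 -> 3 (cost 3.0) beats the direct edge 1 -> 3 (cost 5.0), so the Pregel loop needs a second round of messages before it converges.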

Speed

It delivers performance comparable to systems built specifically for graph computing (specialized graph processing systems).

Algorithm

A variety of built-in algorithms are available.

  • PageRank
  • Connected components
  • Label propagation
  • SVD++
  • Strongly connected components
  • Triangle Count
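
To give a feel for how these built-in algorithms are called, here is a minimal sketch that runs PageRank, connected components, and triangle counting on one small graph. The object name, the toy edge list, and the tolerance value are assumptions for illustration; `Graph.fromEdges`, `pageRank`, `connectedComponents`, and `triangleCount` are the GraphX calls being demonstrated.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of GraphX.
object GraphXAlgorithmsSketch {

  // Returns (component id per vertex, triangle count per vertex, PageRank per vertex).
  def run(spark: SparkSession)
      : (Map[VertexId, VertexId], Map[VertexId, Int], Map[VertexId, Double]) = {
    val sc = spark.sparkContext

    // Toy data (assumed): a triangle 1-2-3 plus a separate edge 4-5.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(1L, 3L, 1), Edge(4L, 5L, 1)))
    val graph = Graph.fromEdges(edges, 0)

    // Each vertex is labelled with the smallest vertex id in its component.
    val components = graph.connectedComponents().vertices.collect().toMap

    // Number of triangles each vertex participates in.
    val triangles = graph.triangleCount().vertices.collect().toMap

    // PageRank iterated until scores change by less than the tolerance (assumed 0.001).
    val ranks = graph.pageRank(0.001).vertices.collect().toMap

    (components, triangles, ranks)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("graphx-algorithms-sketch")
      .getOrCreate()
    try {
      val (components, triangles, ranks) = run(spark)
      println(s"components: $components")
      println(s"triangles:  $triangles")
      println(s"ranks:      $ranks")
    } finally spark.stop()
  }
}
```

Each algorithm returns a new graph whose vertex attribute holds the result, so reading `.vertices` is the usual way to get at the output.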

Unfortunately, GraphX is supported only in Scala.

If you want GraphX-style graph processing from another language, you can add the GraphFrames package and use that instead.
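
For comparison, the sketch below shows what a GraphFrames graph looks like, written in Scala to stay consistent with the rest of this post. It assumes the GraphFrames package is already on the classpath (it is a separate package, not bundled with Spark), and the object name and toy data are made up for the example. GraphFrames builds graphs from DataFrames with an `id` column for vertices and `src`/`dst` columns for edges, which is the same model its Python API uses.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// Hypothetical example object; assumes the GraphFrames package is available.
object GraphFramesSketch {

  // Builds a GraphFrame from two DataFrames and returns each vertex's in-degree.
  def inDegrees(spark: SparkSession): Map[String, Int] = {
    import spark.implicits._

    // Toy data (assumed). Vertices need an "id" column; edges need "src" and "dst".
    val vertices = Seq(("a", "Alice"), ("b", "Bob"), ("c", "Carol")).toDF("id", "name")
    val edges = Seq(("a", "b", "follows"), ("b", "c", "follows"), ("a", "c", "follows"))
      .toDF("src", "dst", "relationship")

    val g = GraphFrame(vertices, edges)

    // inDegrees is a DataFrame with columns "id" and "inDegree";
    // vertices with no incoming edges do not appear in it.
    g.inDegrees.collect()
      .map(r => r.getAs[String]("id") -> r.getAs[Number]("inDegree").intValue())
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("graphframes-sketch")
      .getOrCreate()
    try println(inDegrees(spark))
    finally spark.stop()
  }
}
```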