๐Ÿฅ

Inspecting parquet file information with parquet-cli (metadata, schema, etc.)

linux


•8• 2023. 5. 16. 20:48

parquet-tools would also work, but parquet-cli is a bit more lightweight, so I installed parquet-cli. (I only need to check the schema...)

(env) [testuser@test-server-1 ~]$ pip install parquet-cli
Collecting parquet-cli
  Using cached parquet_cli-1.3-py2.py3-none-any.whl (3.6 kB)
Collecting pyarrow>=0.9.0.post1
  Using cached pyarrow-6.0.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.6 MB)
Collecting pandas>=0.22.0
  Using cached pandas-1.1.5-cp36-cp36m-manylinux1_x86_64.whl (9.5 MB)
Collecting pytz>=2017.2
  Using cached pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting python-dateutil>=2.7.3
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting numpy>=1.15.4
  Using cached numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: six, pytz, python-dateutil, numpy, pyarrow, pandas, parquet-cli
Successfully installed numpy-1.19.5 pandas-1.1.5 parquet-cli-1.3 pyarrow-6.0.1 python-dateutil-2.8.2 pytz-2023.3 six-1.16.0

 

Commands

parq filename.parquet            # view metadata
parq filename.parquet --schema   # view the schema
parq filename.parquet --head n   # view top n rows

 

Reference: https://stackoverflow.com/a/66524989/8578220