List: Data (40)
CREATE EXTERNAL TABLE some_table ( some columns... ) STORED AS SOME_FILE_FORMAT LOCATION 's3a://s3_bucket/my_s3_path/' After creating a table that references an S3 path as above, querying it fails with the following error: Disk I/O error on "my_server_info" Failed to open HDFS file s3a:"my_s3_file" Error(255): Unknown error 255 Root cause: ConnectionPoolTimeoutException: Timeout waiting for connection from pool https://docs.cloudera.com/documentation/en..
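The following is an assumption, not the fix this post arrives at. ConnectionPoolTimeoutException generally means every connection in the S3A client's HTTP connection pool was already in use. A setting that is commonly raised in that case is the Hadoop property `fs.s3a.connection.maximum` in core-site.xml (for example, through a Cloudera Manager core-site.xml safety valve).

The minimal sketch below only renders such a `<property>` entry with the standard library. The helper name `hadoop_property` is hypothetical, and the value 100 is an arbitrary illustration rather than a recommendation.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name: str, value: str) -> str:
    """Render a single <property> entry in the core-site.xml format (hypothetical helper)."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Illustrative value only; size the pool to the number of concurrent S3 reads you expect.
snippet = hadoop_property("fs.s3a.connection.maximum", "100")
print(snippet)
```

The printed entry can be pasted inside the `<configuration>` element of core-site.xml. The affected services (for example, Impala) typically need a restart before they pick up the change.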
[parquet-tools] parquet-tools schema myfile.parquet --> prints the file's schema; parquet-tools meta myfile.parquet --> prints the file's metadata; parquet-tools cat myfile.parquet --> prints the file's contents
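`parquet-tools meta` reads the file footer. For background: a Parquet file starts and ends with the 4-byte magic `PAR1`, and the serialized FileMetaData sits just before a trailing 4-byte little-endian length field.

The stdlib-only sketch below locates that footer without decoding it. The function names are hypothetical. Decoding the Thrift-encoded metadata itself is what parquet-tools does, and in Python `pyarrow.parquet.read_metadata` / `read_schema` do the same.

```python
import os
import struct

MAGIC = b"PAR1"  # 4-byte magic at both the start and the end of a Parquet file


def footer_length_from_bytes(head: bytes, tail: bytes, file_size: int) -> int:
    """Return the footer (FileMetaData) size in bytes.

    head: first 4 bytes of the file; tail: last 8 bytes
    (4-byte little-endian footer length followed by the PAR1 magic).
    """
    if file_size < 12 or head != MAGIC or tail[4:] != MAGIC:
        raise ValueError("not a Parquet file: PAR1 magic bytes missing")
    (length,) = struct.unpack("<I", tail[:4])
    if length > file_size - 12:
        raise ValueError("footer length is larger than the file")
    return length


def footer_length(path: str) -> int:
    """Read only the header and trailer of a file and return its footer size."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(max(size - 8, 0))
        tail = f.read(8)
    return footer_length_from_bytes(head, tail, size)
```

For example, `footer_length("myfile.parquet")` returns how many bytes of metadata `parquet-tools meta` would decode.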
None of the applications submitted to YARN were running; they all stayed in a waiting state. Checking the Resource Manager role log turned up the following error message: Error trying to assign container token and NM token to an updated container CONTAINER_NAME java.lang.IllegalArgumentException: java.net.UnknownHostException: HOST_NAME at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445) at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerT..
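`java.net.UnknownHostException` means a hostname (HOST_NAME above) could not be resolved from the ResourceManager's machine. One likely first check is therefore whether every NodeManager hostname resolves there, through DNS or /etc/hosts.

The sketch below runs that check with Python's `socket.getaddrinfo`. The helper name and the example host list are hypothetical, and the script is meant to be run on the ResourceManager host itself.

```python
import socket


def unresolvable_hosts(hostnames):
    """Return the hostnames that this machine cannot resolve to an address."""
    failed = []
    for name in hostnames:
        try:
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hypothetical NodeManager hostnames; replace them with the names from your cluster.
    print(unresolvable_hosts(["nm-host-01.example.internal", "localhost"]))
```

Any name this prints is a candidate for a DNS record or an /etc/hosts entry on the ResourceManager host.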
sparkConf = SparkConf().setAppName("test") sc = SparkContext.getOrCreate(conf=sparkConf) hc = HiveContext(sc) df = hc.read.option("basePath", '/Path-to-data/').parquet('/Path-to-data/') When the directory is laid out as /Path-to-data/partition1=x/partition2=y, adding the basePath option while loading the data as above loads the partition information (partition1 and partition2 in the code above) as DataFrame columns.
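Spark's partition discovery turns `key=value` directory names below the base path into columns. The basePath option matters most when loading a subdirectory:

- Reading '/Path-to-data/partition1=x/' on its own makes that directory the root, so partition1 is not a column.
- Adding basePath='/Path-to-data/' keeps partition1 as a column.

The pure-Python function below is a simplified illustration of that rule, not Spark's implementation. The name `partition_columns` is hypothetical. Spark also infers value types, whereas this sketch keeps strings.

```python
from pathlib import PurePosixPath


def partition_columns(file_path: str, base_path: str) -> dict:
    """Roughly mimic partition discovery: key=value dirs between base_path and the file."""
    rel = PurePosixPath(file_path).relative_to(base_path)
    columns = {}
    for part in rel.parent.parts:
        key, sep, value = part.partition("=")
        if sep:  # only key=value directory names become partition columns
            columns[key] = value
    return columns


print(partition_columns("/Path-to-data/partition1=x/partition2=y/part-0.parquet", "/Path-to-data/"))
print(partition_columns("/Path-to-data/partition1=x/partition2=y/part-0.parquet", "/Path-to-data/partition1=x/"))
```

On Spark 2.0 and later, HiveContext is deprecated. The same option is available through SparkSession, for example `spark.read.option("basePath", "/Path-to-data/").parquet("/Path-to-data/partition1=x/")`.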
Installing pip: https://quackstudy.tistory.com/13?category=801005 1. Using the impyla library #pip install impyla from impala.dbapi import connect HOST = "host_ip" PORT = 21050 # default port conn = connect(host=HOST, port=PORT) cursor = conn.cursor() query = "select * from default.table1 where some condition" cursor.execute(query) rows = cursor.fetchall() conn.close() 2. Using the pyodbc library 1) Install the Cloudera ODBC Driver for Impala https://docs.info..