🐥

[Spark] Resolving "TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again."

Problem: While joining two DataFrames, the warning below was emitted over and over, after which an error was printed and the application terminated. ... 23/05/22 04:39:35 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again. 23/05/22 04:39:35 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again. 23/05/22 04:39:35 WARN TaskMemoryManager: Failed to allocate a page (2097152 bytes), try again. 23/05/22 04:39:36 WARN TaskMemoryMa..

Data/Spark
· 2023. 5. 22.

Inspecting Parquet file information with parquet-cli (metadata, schema, etc.)

parquet-tools would also work, but parquet-cli is more lightweight, so I installed parquet-cli. (All I need is to check the schema...) (env) [testuser@test-server-1 ~]$ pip install parquet-cli Collecting parquet-cli Using cached parquet_cli-1.3-py2.py3-none-any.whl (3.6 kB) Collecting pyarrow>=0.9.0.post1 Using cached pyarrow-6.0.1-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.6 MB) Collecting pandas>=0.22.0 Using cached pandas-..

linux
· 2023. 5. 16.

[Python] Creating a virtual environment (installing, creating, activating, and removing virtualenv)

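virtualenv vs venv: venv is part of the standard library and needs no installation, whereas virtualenv is a third-party package that must be installed separately. Speed: virtualenv is faster. Extensibility: virtualenv is better. virtualenv can create environments for other installed Python versions, but venv cannot. virtualenv itself can be upgraded via pip, but venv cannot because it ships with Python. → virtualenv looked like the better option, so I decided to use virtualenv. 1. Install virtualenv [testuser@test-server-1 ~]$ python3 -m pip install --user -U virtualenv Collecting virtualenv Downloading https://files.pythonhost..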

linux
· 2023. 5. 16.
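
For comparison with the virtualenv workflow in the entry above, here is a minimal sketch of the standard-library route, which needs no installation. The helper name `create_env` and the directory name `demo-env` are hypothetical. pip bootstrapping is switched off only to keep the example fast, so this is an illustration rather than a recommended setup.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment at env_dir and return its interpreter path."""
    # EnvBuilder is the stdlib API behind `python3 -m venv`.
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_dir)
    # Windows places executables in Scripts/, POSIX systems in bin/.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe_name = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe_name


if __name__ == "__main__":
    # Hypothetical location: a throwaway directory that is removed afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        print(create_env(Path(tmp) / "demo-env"))
```

Unlike virtualenv, this always uses the interpreter that runs the script, which matches the limitation noted in the comparison above.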

[Spark] Loading a CSV file

data_3 = hc.read.csv(
    '/my/path/partition={PARTITION}/*'.format(PARTITION=my_partition),
    header=False,
    schema=customSchema,
)
# If the file has a header row, set header=True; column names then come from the header, so a separate schema is not required

Data/Spark
· 2023. 4. 25.
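
The CSV load in the entry above can be wrapped in a small helper. This is a sketch under stated assumptions: `build_partition_path` and `load_partition_csv` are hypothetical names, `spark` stands for an existing session object (the post uses `hc`), and the schema is passed as a DDL string rather than the `customSchema` object from the post, whose definition is not shown.

```python
def build_partition_path(base_path: str, partition: str) -> str:
    """Return a glob that matches every file under one partition directory."""
    return "{base}/partition={partition}/*".format(
        base=base_path.rstrip("/"), partition=partition
    )


def load_partition_csv(spark, base_path: str, partition: str, schema: str):
    """Read headerless CSV files for one partition with an explicit schema.

    `spark` is assumed to be an existing SparkSession (or a context exposing
    `.read`); `schema` is a DDL string such as "id INT, name STRING".
    """
    path = build_partition_path(base_path, partition)
    # header=False: the first row is data, so column names come from `schema`.
    return spark.read.csv(path, header=False, schema=schema)
```

A call might look like `load_partition_csv(spark, "/my/path", my_partition, "id INT, name STRING")`, where the column list is purely illustrative.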

[Hive] Fix for being unable to run the set hive.msck.repair.batch.size command

set hive.msck.repair.batch.size=1; set hive.msck.path.validation=ignore; Running the commands above in Hive produced the error below. Error: Error while processing statement: Cannot modify hive.msck.path.validation at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1) After adding the setting below to hive-site, the commands worked as expected. key: hive.security.authorization.sqlstd.confwhitelist.append value: h..

Data/Hadoop
· 2023. 4. 21.

Trino -> Hive metastore: fixing the HIVE_METASTORE_ERROR error

io.trino.spi.TrinoException: testserver-1:9083: java.net.SocketTimeoutException: Read timed out at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.getPartitionNamesByFilter(ThriftHiveMetastore.java:1080) at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.getPartitionNamesByFilter(BridgingHiveMetastore.java:335) at io.trino.plugin.hive.metastore.ForwardingHiveMetastore.getPa..

Data
· 2023. 4. 21.