[Spark] Getting partition paths as columns when loading data

•8• 2022. 4. 27. 17:19

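from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

sparkConf = SparkConf().setAppName("test")
sc = SparkContext.getOrCreate(conf=sparkConf)
# HiveContext is deprecated since Spark 2.0; SparkSession is the current entry point.
hc = HiveContext(sc)

# basePath tells Spark where partition discovery should start.
df = hc.read.option("basePath", '/Path-to-data/')\
	.parquet('/Path-to-data/')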

/Path-to-data/partition1=x/partition2=y

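When the directory is laid out like this, adding the basePath option at load time, as in the code above, loads the partition information (partition1 and partition2 in the path above) as columns of the DataFrame. The option matters most when the path being read is a subdirectory such as /Path-to-data/partition1=x: without basePath, Spark starts partition discovery at the path it was given, so partition1 would not appear as a column.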
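As a rough illustration of why the base path matters, the sketch below mimics in plain Python how key=value directories under a base path turn into columns. The helper `partition_columns` and the file name `part-00000.parquet` are hypothetical, made up for this example. This is not Spark's implementation, which also infers value types and validates the layout.

```python
from pathlib import PurePosixPath


def partition_columns(file_path: str, base_path: str) -> dict:
    """Return {column: value} for each key=value directory between
    base_path and the data file.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(base_path))
    columns = {}
    # Skip the last component (the data file itself); inspect directories only.
    for part in relative.parts[:-1]:
        if "=" in part:
            key, value = part.split("=", 1)
            columns[key] = value
    return columns


# The file name below is an assumption made for the example.
data_file = "/Path-to-data/partition1=x/partition2=y/part-00000.parquet"

# Base path at the table root: both directory levels become columns.
from_root = partition_columns(data_file, "/Path-to-data/")

# Base path at a partition directory: partition1 sits above the base path,
# so only partition2 is picked up.
from_subdir = partition_columns(data_file, "/Path-to-data/partition1=x/")

print(from_root)
print(from_subdir)
```

The second call corresponds to reading /Path-to-data/partition1=x in Spark without setting basePath: the base path then defaults to the path that was passed in, so directories above it are not turned into columns.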
