Data Engineering (3)

[multiprocessing] What is Parallel Processing?
1. What is Parallel Processing
To process a large dataset when available memory is limited, the data has to be split into chunks and processed piece by piece. Parallel processing speeds up the work by processing several chunks simultaneously. A central processing unit (CPU) is the hardware that executes computer operations. Older CPUs could perform only one task in a single cor..

[Q&A] ELT vs ETL
Question: ELT vs ETL: terminology-wise, one does the load before the transformation and the other does it after. But this doesn't make much sense to me. Why is it so important?
Answer: It's important for cost and future-proofing reasons. The reason we had ETL before was that storage, particularly cloud storage, was expensive, so we had to limit the data we were writing to our warehouse to keep c..

[pandas] Processing Dataframes in Chunks
1. What Are Chunks?
Even after optimizing the data types of the data frame and selecting only the necessary columns, the dataset may still not fit in memory. In that case, it is more efficient to process the data frame in chunks than to load all of it into memory at once. Only a subset of the rows should be in memory at any given time. In other words, we need to process tasks ..
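
The chunked processing described in the pandas excerpt can be sketched roughly as follows. This is a minimal illustration, not the post's own code: the in-memory CSV, the column name `value`, and the helper `sum_in_chunks` are all hypothetical stand-ins. It relies on the `chunksize` parameter of `pd.read_csv`, which makes the call return an iterator of data frames instead of one large data frame.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file too large to load at once.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))


def sum_in_chunks(source, column, chunksize):
    """Sum one column while holding at most `chunksize` rows in memory."""
    total = 0
    n_chunks = 0
    # With chunksize set, read_csv yields DataFrames of up to `chunksize` rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += int(chunk[column].sum())
        n_chunks += 1
    return total, n_chunks


total, n_chunks = sum_in_chunks(csv_data, "value", chunksize=4)
print(total, n_chunks)
```

Each chunk is aggregated and then discarded, so peak memory depends on `chunksize` rather than on the size of the whole dataset. The same pattern works for any aggregation that can be combined across chunks, such as counts or sums.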