Reading and writing large datasets.

Having too much data is a problem every data practitioner wishes for, but it is still a problem nonetheless.

A single solution may not fit every requirement, so different techniques suit different scenarios. This notebook aims to introduce and describe some of these techniques. We will use the Riiid! Answer Correctness Prediction dataset in our experiments, since it has over 100 million rows and 10 columns and would cause an out-of-memory error with a plain pd.read_csv on most machines.
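As a first taste of these techniques, pandas itself offers chunked reading: passing `chunksize` to `pd.read_csv` returns an iterator of DataFrames, so only one chunk is in memory at a time. A minimal sketch, using a tiny in-memory CSV as a stand-in for the real train.csv (the column names mirror the Riiid dataset, but the values here are illustrative):

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for a file far too large to load at once.
# With the real Riiid data you would pass the path to train.csv instead.
csv_data = io.StringIO(
    "row_id,user_id,answered_correctly\n"
    + "\n".join(f"{i},{i % 5},{i % 2}" for i in range(10))
)

total_rows = 0
correct = 0
# chunksize makes read_csv yield DataFrames of at most 4 rows each,
# so memory usage is bounded by the chunk size, not the file size.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total_rows += len(chunk)
    correct += int(chunk["answered_correctly"].sum())

print(total_rows, correct)
```

Aggregations like the count above work chunk by chunk; operations that need the whole table at once (sorting, joins) require the heavier tools discussed later.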

--

Jaafar Benabderrazak (Human/Not A Robot)

Committed lifelong learner, passionate about machine learning and MLOps, currently working as a Machine Learning Engineer.