Introduction to Apache Spark RDDs using Python

Apache Spark is a must-have for Big Data enthusiasts. In short, Spark is a fast and powerful framework that provides an API for massive distributed processing over resilient, distributed datasets.

Prerequisites:

Before we begin, please set up Python and the Apache Spark environment on your machine. Head over to this blog to install them if you have…


Jaafar Benabderrazak (Human/Not A Robot)

Committed lifelong learner. I am passionate about machine learning and MLOps, and I currently work as a Machine Learning Engineer.