Introduction to Apache Spark RDDs using Python
6 min read · Mar 19, 2020
Apache Spark is a must for Big Data enthusiasts. In a few words, Spark is a fast and powerful framework that provides an API to perform massive distributed processing over resilient distributed datasets (RDDs).
Prerequisites:
Before we begin, please set up the Python and Apache Spark environment on your machine. Head over to this blog here for installation instructions if you have…