Recently I moved into a new apartment that had no decoration, and I didn't want to just hang random posters from Amazon; I wanted to create something myself. The only issue is, I am not the most artistic person out there.

So this got me thinking about how to use my knowledge of machine learning to decorate my apartment, which led me to Neural Style Transfer. As a data scientist and general art lover, this is the best mix of both worlds.

Principle:

Neural Style Transfer consists of capturing the content of one image and combining it with…
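Below is a minimal sketch of the idea, using the pre-trained arbitrary-image-stylization model published on TensorFlow Hub (one convenient way to experiment, not necessarily the approach used here; the image file names are placeholders):

```python
import tensorflow as tf
import tensorflow_hub as hub

# A fast, pre-trained style-transfer model published on TensorFlow Hub.
hub_model = hub.load(
    "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2")

def load_image(path):
    """Read an image and scale pixels to [0, 1], adding a batch dimension."""
    img = tf.io.decode_image(tf.io.read_file(path), channels=3, dtype=tf.float32)
    return img[tf.newaxis, ...]

content = load_image("apartment_photo.jpg")
style = tf.image.resize(load_image("favorite_painting.jpg"), (256, 256))

# The model re-renders the content image with the style image's look.
stylized = hub_model(tf.constant(content), tf.constant(style))[0]
tf.keras.utils.save_img("wall_art.png", stylized[0].numpy())
```

The output is the content photo redrawn with the colors and textures of the style image, ready to print and frame.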


Having too much data is a problem every data practitioner wishes for, but it is a problem nonetheless.

A single solution might not fit all requirements, so different techniques work in different scenarios. This notebook aims to introduce and describe some of these techniques. We are going to use the Riiid! Answer Correctness Prediction dataset in our experiments, since it has over 100 million rows and 10 columns and should trigger an out-of-memory error with a plain pd.read_csv.
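As a first taste of these techniques, here is a minimal sketch of chunked reading with pandas, which streams the file so the full 100 million rows never sit in memory at once (the column subset and dtypes shown are illustrative):

```python
import pandas as pd

# Load only the columns we need, with the smallest dtypes that fit them.
dtypes = {"user_id": "int32", "content_id": "int16", "answered_correctly": "int8"}

total_rows = 0
for chunk in pd.read_csv("train.csv", usecols=list(dtypes), dtype=dtypes,
                         chunksize=1_000_000):
    # Work on one million rows at a time, e.g. accumulate statistics.
    total_rows += len(chunk)

print(f"Processed {total_rows:,} rows without ever loading the full file")
```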

Different packages have their own way of reading data. …



Google has digitized books, and Google Earth uses NLP to identify addresses, but how does it work exactly?

Deep learning approaches like neural networks can be used to combine the tasks of localizing text (Text detection) in an image along with understanding what the text is (Text recognition).
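As a rough sketch of what detection plus recognition looks like in practice, here is one possible approach using the Tesseract engine via pytesseract, whose image_to_data call returns a bounding box (detection) and a transcription (recognition) for every word; the file names are placeholders:

```python
import cv2
import pytesseract

img = cv2.imread("scene.jpg")  # any natural-scene photo
data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)

for i, word in enumerate(data["text"]):
    # Keep only confident, non-empty detections.
    if word.strip() and float(data["conf"][i]) > 60:
        x, y, w, h = (data[k][i] for k in ("left", "top", "width", "height"))
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # detection
        print(word, (x, y, w, h))                                   # recognition

cv2.imwrite("scene_annotated.jpg", img)
```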

Unstructured Text

Text at random places in a natural scene: sparse text, no proper row structure, a complex background, arbitrary placement in the image, and no standard font.


Source: https://www.dezyre.com/article/spark-mllib-for-scalable-machine-learning-with-spark/339

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. At a high level, it provides tools such as:

  • ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
  • Featurization: feature extraction, transformation, dimensionality reduction, and selection
  • Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
  • Persistence: saving and loading algorithms, models, and Pipelines
  • Utilities: linear algebra, statistics, data handling, etc.
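To make this concrete, here is a minimal sketch that ties a few of these pieces together (featurization, a Pipeline, and persistence) on a toy dataset, adapted from the standard Spark ML text-classification example:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

# Toy training data: (id, text, label).
train = spark.createDataFrame(
    [(0, "spark is great", 1.0), (1, "hadoop is slow", 0.0)],
    ["id", "text", "label"],
)

# Chain featurization and a learning algorithm into one Pipeline.
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features"),
    LogisticRegression(maxIter=10),
])

model = pipeline.fit(train)                    # a fitted PipelineModel
model.write().overwrite().save("lr_pipeline")  # persistence: save it whole
```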

Prerequisites:

Before we begin, please set up the Python and Apache Spark environment on your machine. …


Apache Spark is a must for big data lovers. In a few words, Spark is a fast and powerful framework that provides an API for massive distributed processing over resilient sets of data.

Prerequisites:

Before we begin, please set up the Python and Apache Spark environment on your machine. Head over to this blog here to install them if you have not done so.

For our dataset, we will use the KDD Cup 1999 competition dataset, which is described in detail here. The results of this competition can be found here.

Resilient Distributed Datasets (RDD):
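Before diving into the dataset, here is a quick sketch of the RDD API itself: you create an RDD, build up lazy transformations, and trigger actual computation with an action:

```python
from pyspark import SparkContext

sc = SparkContext(appName="rdd-demo")

rdd = sc.parallelize(range(1, 6))          # distribute a local collection
squares = rdd.map(lambda x: x * x)         # transformation: lazy, nothing runs yet
print(squares.reduce(lambda a, b: a + b))  # action: 1 + 4 + 9 + 16 + 25 = 55
```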


Optical Character Recognition (OCR) is the conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a photo of a scene (billboards in a landscape photo), or from text superimposed on an image (subtitles on a television broadcast).

OCR generally consists of several sub-processes, each of which must be performed as accurately as possible.
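Skipping ahead to the end result those sub-processes feed into, here is a minimal sketch of OCR with the Tesseract engine via pytesseract (one common open-source choice; the file name is a placeholder):

```python
from PIL import Image
import pytesseract

# Convert an image of a document into machine-encoded text.
text = pytesseract.image_to_string(Image.open("scanned_page.png"))
print(text)
```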


Sentiment Analysis is a common NLP task that data scientists need to perform. This is a straightforward guide to learning the basics of NLP and creating a basic movie review classifier in Python.

Prerequisites:

Before we begin, please set up the Python environment on your machine. Head over to the official Python page here to install it if you have not done so.

You also have to install the NLTK and scikit-learn packages. You can install both of them with pip if they are not already in your Python installation.
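As a preview of where this guide ends up, here is a minimal sketch of a movie review classifier built from NLTK's movie_reviews corpus and scikit-learn (the exact features and model in the guide may differ):

```python
import nltk
from nltk.corpus import movie_reviews
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

nltk.download("movie_reviews", quiet=True)  # 2,000 labeled reviews

texts = [movie_reviews.raw(fid) for fid in movie_reviews.fileids()]
labels = [fid.split("/")[0] for fid in movie_reviews.fileids()]  # 'pos' / 'neg'

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42)

vec = TfidfVectorizer(stop_words="english")
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_train), y_train)
print("accuracy:", clf.score(vec.transform(X_test), y_test))
```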

NLP Pyramid:

To get a large overview of text classification and NLP in…


Whether you are working on data science or machine learning projects, you are probably going to need to extract data from the web in your line of work. So how do we actually pull data out of the web?

In this article, we're going to cover the basics you need to access and automatically extract data from the web using Python.

Prerequisites:

Before we begin, please set up the Python environment on your machine. Head over to their official page here to install if you have not done so.

We will also be installing Beautiful Soup.
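Once both are installed, a minimal sketch of the workflow looks like this (the URL is a placeholder):

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com")
resp.raise_for_status()  # fail loudly on HTTP errors

soup = BeautifulSoup(resp.text, "html.parser")

# Extract every link's text and target from the page.
for a in soup.find_all("a", href=True):
    print(a.get_text(strip=True), "->", a["href"])
```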

Jaafar Benabderrazak

Committed lifelong learner. I am passionate about machine learning and data engineering, and I currently work as a data scientist.
