Photo by Karolien Brughmans on Unsplash

I wrote an article a while ago about econometrics (Econometrics 101 for Data Scientists). The article resonated with readers, but it was an introductory piece for data science practitioners who might not otherwise be familiar with the domain.

Inspired by the response to that article, today I’m…

Photo by Carolina Sánchez on Unsplash

You are not alone if you have had a hard time understanding what exactly regularization is and how it works. Regularization can be a confusing term, and I'm attempting to clear up some of that confusion in this article.

In this article I’ll do three things: (a) define the problem that…
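A quick taste of the idea before diving in: here is a minimal sketch of L2 regularization (ridge regression) using scikit-learn. The synthetic data and penalty strengths are my own illustrative assumptions, not taken from the article.

```python
# A minimal sketch of L2 regularization (ridge regression) with scikit-learn.
# The dataset and alpha values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)  # only feature 0 matters

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.01, 1.0, 100.0):  # larger alpha = stronger penalty on weights
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:>6}: R^2={model.score(X_test, y_test):.3f}, "
          f"max |coef|={np.abs(model.coef_).max():.3f}")
```

A larger alpha shrinks the coefficients toward zero, trading a little in-sample fit for a simpler, more stable model.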

Photo by Emily Morter on Unsplash

As a data scientist, if you are asked to find the average income of customers, how would you do it? Having ALL customer data would of course be "good to have", but in reality it almost never exists, nor is it feasible to collect.

Instead, you get a small sample, take measurements on it and…
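To make that concrete, here is a minimal sketch of estimating a population mean from a sample, with a 95% confidence interval attached. The simulated incomes are a hypothetical stand-in for real customer data.

```python
# A minimal sketch: estimate the average income from a sample and attach
# a 95% confidence interval. The incomes are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=10.5, sigma=0.4, size=500)  # hypothetical incomes

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean
low, high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
print(f"Estimated average income: {mean:,.0f} (95% CI: {low:,.0f} to {high:,.0f})")
```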

Photo by Lex Aliviado on Unsplash

Logistic regression is among the most popular algorithms used to solve classification problems in machine learning. It uses a logistic function to model how different input variables affect the probability of a binary outcome. As described in the available literature, the technique can seem quite convoluted. …
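As a hands-on counterpoint to that convoluted literature, here is a minimal sketch of fitting a logistic regression with scikit-learn; the synthetic dataset is an assumption for illustration.

```python
# A minimal sketch of logistic regression on synthetic binary-outcome data.
# The model passes a weighted sum of the inputs through the logistic
# function, sigma(z) = 1 / (1 + e^(-z)), to produce probabilities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = LogisticRegression().fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # P(outcome = 1) for each row
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
print("First 5 predicted probabilities:", proba[:5].round(3))
```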

Photo by Bernard Hermant on Unsplash

After writing a few pieces on topics like econometrics, logistic regression and regularization, I'm back to the basics!

Many sophisticated data science algorithms are built from simple building blocks. How quickly you level up your skills largely depends on how strong your foundation is. In the next few…

Photo by Kelly Sikkema on Unsplash

Data scientists and analysts spend a significant amount of their time on data cleaning or pre-processing. People working with unstructured data know exactly what messy data looks like. This type of data has one or more of the following: missing entries, incorrect data, wrong data types, extreme values, unexpected symbols…
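To show what tackling a few of those issues can look like in pandas, here is a minimal sketch; the column names, rules, and thresholds are all hypothetical.

```python
# A minimal sketch of common clean-up steps: unexpected symbols, wrong
# types, extreme values, and missing entries. All rules are hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": ["52,000", "61,500", None, "1,200,000"],  # strings with commas
    "age": ["34", "29", "41", "999"],                    # wrong type + outlier
})

df["income"] = pd.to_numeric(df["income"].str.replace(",", ""), errors="coerce")
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df.loc[df["age"] > 120, "age"] = np.nan                    # flag impossible ages
df["income"] = df["income"].fillna(df["income"].median())  # impute missing
print(df)
```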

Photo by David Clode on Unsplash

Some people say feature selection and engineering are the most important parts of data science projects. In many cases it is not sophisticated algorithms but feature selection that makes all the difference in model performance.

Too few features can cause a model to under-fit. For example, if you want to predict house…
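One simple way to pick a subset of informative features is univariate selection. Below is a minimal sketch using scikit-learn's SelectKBest; the dataset and the choice of k are illustrative assumptions.

```python
# A minimal sketch of univariate feature selection with SelectKBest.
# The diabetes dataset and k=4 are illustrative choices.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

X, y = load_diabetes(return_X_y=True, as_frame=True)
selector = SelectKBest(score_func=f_regression, k=4).fit(X, y)
print("Selected features:", list(X.columns[selector.get_support()]))
```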

Photo by Håkon Grimstad on Unsplash

Feature engineering is the process of transforming data to extract valuable information. In fact, when data is appropriately transformed, feature engineering can play an even bigger role in model performance than hyperparameter tuning.

Despite its huge role, feature engineering is often not well understood, and sometimes misunderstood, by beginner and experienced data…
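To ground the idea, here is a minimal sketch of two everyday transforms, log-scaling a skewed variable and extracting date parts; the DataFrame and column names are hypothetical.

```python
# A minimal sketch of two common feature-engineering transforms.
# The data and column names are hypothetical.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [250_000, 510_000, 1_900_000],
    "sold_on": pd.to_datetime(["2021-01-15", "2021-06-03", "2021-11-30"]),
})

df["log_price"] = np.log1p(df["price"])      # compress a skewed distribution
df["sale_month"] = df["sold_on"].dt.month    # expose a seasonal signal
df["sale_dow"] = df["sold_on"].dt.dayofweek  # expose a day-of-week signal
print(df)
```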

Photo by Jeremy Allouche on Unsplash

If you were asked to name one machine learning algorithm that consistently gives superior performance in both regression and classification, what would it be?

XGBoost. It is arguably one of the most powerful algorithms and is increasingly being used across industries and problem domains, from customer analytics and sales prediction…
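For a feel of how little code it takes to get started, here is a minimal sketch of XGBoost on a classification task; the dataset and hyperparameters are illustrative assumptions.

```python
# A minimal sketch of XGBoost classification (requires the xgboost package).
# Dataset and hyperparameters are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```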

Photo by Haley Powers on Unsplash

A typical data science project starts with data wrangling: the process of cleaning messy data and transforming it into appropriate formats for further analysis and modeling.

The next step in the process is exploratory data analysis or EDA. This is where you spot hidden issues and anomalies in…
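A first EDA pass often boils down to a handful of pandas calls. Here is a minimal sketch; the tiny DataFrame is a hypothetical stand-in for your wrangled data.

```python
# A minimal sketch of a first EDA pass with pandas.
# The tiny dataset is a hypothetical stand-in.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "income": [52_000, None, None, 61_000],
    "segment": ["A", "B", "B", "C"],
})

print(df.shape)                    # how much data do we have?
print(df.dtypes)                   # are the types what we expect?
print(df.isna().sum())             # where are the missing values?
print(df.duplicated().sum())       # any duplicate rows?
print(df.describe(include="all"))  # summary stats surface anomalies
```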

Mahbubul Alam

Data scientist, economist. Twitter @DataEnthus / www.linkedin.com/in/mab-alam/
