Photo by Emily Morter on Unsplash

As a data scientist if you are asked to find the average income of customers, how’d you do that? Having ALL customer data is of course “good to have”, but in reality, it never exists nor feasible to collect.

Instead, you get a small sample, take measurements on it and…

Photo by Lex Aliviado on Unsplash

Logistic regression is amongst the most popular algorithms used to solve classification problems in machine learning. It uses a logistic function to model how different input variables impact the probability of binary outcomes. The technique is quite convoluted as described in the available literature. …

Photo by Bernard Hermant on Unsplash

After writing a few pieces on topics like econometrics, logistic regression and regularization — I’m back to the basics!

Many sophisticated data science algorithms are built with simple building blocks. How quickly you will level up your skills largely depends on how strong is your foundation. In the next few…

Photo by Museums Victoria on Unsplash

The first industrial revolution — powered by steam engines — led to the transition into new manufacturing processes that changed the world. The second industrial revolution saw accelerated production in iron, steel, chemicals and communication networks. The third revolution automated the manufacturing process with advanced tools and technologies.

The fourth…

Photo by Kelly Sikkema on Unsplash

Data scientists and analysts spend a significant amount of their time in data cleaning or pre-processing. People working with unstructured data know exactly what messy data looks like. This type of data has one or more of the following: missing entries, incorrect data, wrong data types, extreme values, unexpected symbols…

Photo by David Clode on Unsplash

Some people say feature selection and engineering is the most important part of data science projects. In many cases it’s not sophisticated algorithms, rather it’s feature selection that makes all the difference in model performance.

Too few features can under-fit a model. For example, if you want to predict house…

Photo by Håkon Grimstad on Unsplash

Feature engineering is the process of transforming data to extract valuable information. In fact, if appropriately transformed, feature engineering can play even a bigger role in model performance than hyperparameter tuning.

Despite its huge role, feature engineering is often not well understood, and sometimes misunderstood, by beginner and experienced data…

Photo by Jeremy Allouche on Unsplash

What is the one machine learning algorithm — if you ask — that consistently gives superior performance in regression and classification?

XGBoost it is. It is arguably the most powerful algorithm and is increasingly being used in all industries and in all problem domains —from customer analytics and sales prediction…

Mahbubul Alam

Data scientist, economist. Twitter @DataEnthus /

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store