Photo by Karolien Brughmans on Unsplash

Methods, models, tools and business solutions

I wrote an article a while ago about econometrics (Econometrics 101 for Data Scientists). The article resonated well with readers, but it was an introductory piece for data science people who might not otherwise be familiar with the domain.

What is econometrics

Econometrics is a sub-domain of economics that applies mathematical and statistical models with economic theories to understand, explain…

Photo by Carolina Sánchez on Unsplash

L1 and L2 regularization in LASSO, Ridge & ElasticNet regression

You are not alone if you had a hard time understanding what exactly Regularization is and how it works. Regularization can be a very confusing term and I’m attempting to clear up some of that in this article.

What is the problem?

Data scientists take great care during the modeling process to make sure their models work well and are neither underfit nor overfit.

Photo by Emily Morter on Unsplash

Intuition and use cases of Gaussian, Binomial and Poisson distribution

As a data scientist, if you are asked to find the average income of customers, how would you do that? Having ALL customer data is of course “good to have”, but in reality it never exists, nor is it feasible to collect.

Photo by Lex Aliviado on Unsplash

From simple intuition to complex model building process

Logistic regression is among the most popular algorithms used to solve classification problems in machine learning. It uses a logistic function to model how different input variables affect the probability of a binary outcome. As described in the available literature, the technique can seem quite convoluted. The purpose of this article is to describe the model in simple terms, focusing primarily on building intuition and avoiding complex mathematical formulation as much as possible.
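As a rough illustration of the idea, here is a minimal sketch of how a logistic function turns a weighted sum of inputs into a probability. The function names, the coefficients and the 0.5 cut-off are assumptions made up for this example; they are not taken from the article or from any fitted model.

```python
import math


def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, intercept):
    """Combine the inputs linearly, then squash the result with the logistic function."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


# Hypothetical coefficients, chosen only for illustration (not estimated from data).
weights = [0.8, -0.5]
intercept = -1.0

# Probability of the positive outcome for one observation with two input variables.
p = predict_probability([2.0, 1.0], weights, intercept)

# A common (but not mandatory) convention: classify as 1 when the probability is at least 0.5.
label = 1 if p >= 0.5 else 0
```

In practice the weights would be estimated from data rather than typed in by hand; the sketch only shows how inputs map to a probability between 0 and 1.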

Photo by Bernard Hermant on Unsplash

Methods, functions and use cases of Python lists

After writing a few pieces on topics like econometrics, logistic regression and regularization — I’m back to the basics!

Photo by Anna Kolosyuk on Unsplash

Should you bypass Matplotlib?

I had this thought a while back — is learning Matplotlib essential for beginners? Or can they get away with Seaborn? That thought came back recently while mentoring a cohort of data science students.

If you want to get into the field of data science and machine learning here’s your pathway:

Step 1

Learn Data Science 101. Focus on what business problems data science solves, what its different sub-disciplines are, etc. Follow people in data science and see what they say. Maybe listen to some data science podcasts.

Step 2

Pick a language you are comfortable with. I recommend Python, for various reasons I won’t go into today.

Step 3

Get a superficial understanding of common data science and machine learning algorithms, how they work, and what business problems they solve:

  • Descriptive statistics
  • Linear regression
  • Logistic regression
  • Nearest Neighbors
  • Decision trees
  • Clustering
  • Time…

Photo by Gustavo Torres on Unsplash

Choosing between a bad and a highly tuned model

General intuition

If we have more money in our pockets, we tend to spend more; that’s almost a fact that everyone knows. But what’s often not known is the exact relationship between income and expenditure, i.e. how much people would spend at a given income.
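One way to make that relationship concrete is to fit a straight line to income and expenditure pairs. The sketch below uses ordinary least squares on a tiny, entirely made-up data set; the numbers and the helper name `fit_line` are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly income and expenditure figures (not real data).
income = [1000, 2000, 3000, 4000]
expenditure = [900, 1700, 2500, 3300]

slope, intercept = fit_line(income, expenditure)

# Estimated spending for a person earning 5000, under the fitted line.
predicted = intercept + slope * 5000
```

With this toy data the slope says how much of each extra unit of income is spent, which is the "exact relationship" the paragraph refers to, expressed as a single number.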

Looping through lists, tuples, dictionaries and strings

Photo by Bonneval Sebastien on Unsplash

In programming, a loop is a logical structure that repeats a sequence of instructions until a certain condition is met. Looping allows the same set of tasks to be repeated on every item in an iterable object, until all items are exhausted or a stopping condition is reached.
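The short sketch below shows what this can look like in Python for the four iterable types named in the title, plus a `while` loop that stops on a condition. All variable names and values are made up for illustration.

```python
# List: repeat the same task on every item.
fruits = ["apple", "banana", "cherry"]
upper = []
for fruit in fruits:
    upper.append(fruit.upper())

# Tuple: iterated exactly like a list.
point = (3, 4)
total = 0
for coordinate in point:
    total += coordinate

# Dictionary: .items() yields (key, value) pairs.
prices = {"apple": 1.5, "banana": 0.5}
labels = []
for name, price in prices.items():
    labels.append(f"{name}: {price}")

# String: iterated one character at a time.
vowels = 0
for ch in "looping":
    if ch in "aeiou":
        vowels += 1

# while loop: repeats until its condition becomes false.
countdown = []
n = 3
while n > 0:
    countdown.append(n)
    n -= 1
```

The `for` loops end when the iterable is exhausted, while the `while` loop ends when its condition `n > 0` is no longer true, matching the two stopping cases described above.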

Mahbubul Alam

Data scientist, economist. Twitter @DataEnthus /
