Photo by Karolien Brughmans on Unsplash

Methods, models, tools and business solutions

I wrote an article a while ago about econometrics (Econometrics 101 for Data Scientists). The article resonated well with readers, but it was an introductory piece for data science people who might not otherwise be familiar with the domain.

Inspired by the response to that article, today I’m attempting to take it to the next level by making it a bit more comprehensive. I’ll mostly focus on the methods, tools, and techniques used in econometrics that data scientists will benefit from.

What is econometrics

Econometrics is a sub-domain of economics that applies mathematical and statistical models with economic theories to understand, explain…

Photo by Carolina Sánchez on Unsplash

L1 and L2 regularization in LASSO, Ridge & ElasticNet regression

You are not alone if you had a hard time understanding what exactly Regularization is and how it works. Regularization can be a very confusing term and I’m attempting to clear up some of that in this article.

In this article I’ll do three things: (a) define the problem that we want to tackle with regularization; then (b) examine how exactly regularization helps; and finally (c) explain how regularization works in action.

What is the problem?

Data scientists take great care during the modeling process to make sure their models work well and are neither under- nor overfit.

Let’s say you want to…

Photo by Emily Morter on Unsplash

Intuition and use cases of Gaussian, Binomial and Poisson distribution

As a data scientist, if you are asked to find the average income of customers, how would you do that? Having ALL customer data is of course “good to have”, but in reality, it rarely exists, nor is it feasible to collect.

Instead, you get a small sample, take measurements on it and make predictions about the whole population. But how confident are you that your sample statistics represent the population?
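The sampling idea above can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not real customer records: the population size, the mean and spread of incomes, the sample size of 200, and the use of the normal critical value 1.96 for an approximate 95% interval are all assumptions made for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 customer incomes (simulated, not real data)
population = [random.gauss(50_000, 10_000) for _ in range(10_000)]

# In practice we only get to measure a small sample
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
# Standard error of the mean: sample standard deviation / sqrt(n)
std_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval using the normal critical value 1.96
ci_low = sample_mean - 1.96 * std_error
ci_high = sample_mean + 1.96 * std_error

print(f"sample mean: {sample_mean:,.0f}")
print(f"approx. 95% CI: ({ci_low:,.0f}, {ci_high:,.0f})")
```

The width of the interval is one way to express how much the sample mean might differ from the population mean. A larger sample shrinks the standard error and therefore the interval.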

Statistical distributions play an important role in measuring such uncertainties and giving you that confidence. Simply speaking, a probability distribution is a function that describes the likelihood of a specific outcome (value) of…

Photo by Lex Aliviado on Unsplash

From simple intuition to complex model building process

Logistic regression is amongst the most popular algorithms used to solve classification problems in machine learning. It uses a logistic function to model how different input variables impact the probability of binary outcomes. The technique is quite convoluted as described in the available literature. The purpose of writing this article is to describe the model in simple terms, primarily focusing on building an intuition by avoiding complex mathematical formulation as much as possible.
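To make the role of the logistic function concrete, here is a minimal sketch in plain Python. The intercept and slope are hypothetical values picked only for illustration (they are not fitted to any data), and `predict_probability` is a helper name introduced here, not part of any library.

```python
import math

def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration
intercept = -4.0
slope = 0.1

def predict_probability(x):
    # Linear part, exactly as in linear regression
    linear_score = intercept + slope * x
    # The logistic function turns the score into a probability
    return logistic(linear_score)

for x in (20, 40, 60):
    print(x, round(predict_probability(x), 3))
```

The linear score can be any real number, while the output always stays between 0 and 1, which is what allows it to be read as the probability of a binary outcome.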

I will start with a linear regression problem — which is relatively easy to understand — and build on that to get to logistic regression. Towards the…

Photo by Bernard Hermant on Unsplash

Methods, functions and use cases of Python lists

After writing a few pieces on topics like econometrics, logistic regression and regularization — I’m back to the basics!

Many sophisticated data science algorithms are built with simple building blocks. How quickly you level up your skills largely depends on how strong your foundation is. In the next few articles, I’ll touch upon a few such foundational topics. Hopefully, learning those topics will make your journey a pleasant and fun experience.

Today’s topic is Python lists.

Most people would learn tools first and then practice them with a few examples. I take the opposite route — focus on problems…

There are 3 so-called loop control keywords: break, continue and pass.

If a break statement is present in the loop, it terminates the loop when a condition is satisfied.

string = 'hello, there'
for i in string:
    if i == ',':
        break
print(i)

In the snippet above, we ask the program to exit the loop as soon as it finds a comma in the string and then execute the next statement (which is to print i).

Instead of breaking out of the loop, the continue statement simply skips the current iteration and continues with the next.

Let’s execute the…
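As an illustrative sketch (not necessarily the example the text goes on to run), the same string can be used to show how continue and pass behave. The variable names `kept` and `count` are introduced here only for the demonstration.

```python
string = 'hello, there'

# continue: skip the comma, keep every other character
kept = []
for ch in string:
    if ch == ',':
        continue  # jump straight to the next character
    kept.append(ch)

print(''.join(kept))  # hello there

# pass: a placeholder that does nothing, so the rest of the body still runs
count = 0
for ch in string:
    if ch == ',':
        pass  # no effect; execution falls through to the next line
    count += 1

print(count)  # 12
```

With continue the comma never reaches `kept.append`, while with pass every character, including the comma, is still counted.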

Photo by Parker Gibbs on Unsplash

A short story

The story goes like this.

You are a data scientist and your wife is an entrepreneur. One beautiful morning she declared she’s going to start a new business: a used car business. You cheered her idea on and assured her that you’ll provide initial analytics support until the business grows and she’s able to hire a full-time analyst.

A few days later, she goes to an auction to buy some used cars. Back at home, you’ve set up an inventory database on your computer. You wrote a line of code to create an empty dictionary to store car information…

Now that you’ve got a Python dictionary (you created it or got it from somewhere), how do you access its contents?

Let’s say we have a dictionary of fruit prices, where:

  • keys -> fruit names
  • values -> fruit prices
fruit_prices = {"apple": 2.50, "orange": 4.99, "banana": 0.59}

From this dictionary, you can access all the keys at once:

fruit_prices.keys()
>> dict_keys(['apple', 'orange', 'banana'])

You can also cast the keys into a list:

list(fruit_prices.keys())
>> ['apple', 'orange', 'banana']

Similar to keys, you can access all the values at once:

fruit_prices.values()
>> dict_values([2.5, 4.99, 0.59])

You can also cast the values into a list…
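Continuing with the same fruit_prices dictionary, here is a minimal sketch of a few other common ways to read its contents. The "mango" key and the "not in stock" fallback are made up for the example.

```python
fruit_prices = {"apple": 2.50, "orange": 4.99, "banana": 0.59}

# Cast the values view into a plain list
prices = list(fruit_prices.values())
print(prices)  # [2.5, 4.99, 0.59]

# Loop over key-value pairs together with items()
for fruit, price in fruit_prices.items():
    print(f"{fruit}: {price:.2f}")

# Look up a single value by its key
print(fruit_prices["apple"])  # 2.5

# get() returns a fallback instead of raising KeyError for a missing key
print(fruit_prices.get("mango", "not in stock"))  # not in stock
```

Using get() is a reasonable choice when a key may be absent, since indexing with square brackets raises a KeyError in that case.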

A convenient shorthand in place of for loop

List comprehension is a convenient shorthand syntax that builds a new list from an existing list (or any other iterable). It does what a for loop does on a list, in a single expression.

Let’s use ‘for loop’…
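As a small sketch of the comparison, the snippet below builds the same list twice, first with a for loop and then with a list comprehension. The input list `numbers` and the squaring operation are arbitrary choices for illustration.

```python
numbers = [1, 2, 3, 4, 5]  # hypothetical input list

# Version 1: a regular for loop that appends to an empty list
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)

# Version 2: the equivalent list comprehension
squares_comp = [n ** 2 for n in numbers]

# A comprehension can also filter items with an if clause
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(squares_loop)  # [1, 4, 9, 16, 25]
print(squares_comp)  # [1, 4, 9, 16, 25]
print(even_squares)  # [4, 16]
```

Both versions produce the same result. The comprehension is shorter, though a plain loop may stay more readable once the logic grows beyond a line or two.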

A typical machine learning workflow can be very extensive, but before you even get close to it, a few fundamental pieces of prep work are needed:

1) Data history: The first thing to do, even before importing data into the prototyping environment, is to understand where the data comes from, whether any metadata is associated with it, and whether someone else has used it and for what purpose. This information provides useful context when you start to explore the data.

2) Domain information: You might be an expert in the domain. Or maybe you are not. In either…

Data scientist, economist. Twitter @DataEnthus
