I wrote an article a while ago about econometrics (Econometrics 101 for Data Scientists). The article resonated well with readers, but it was an introductory piece for data science people who might not otherwise be familiar with the domain.
Inspired by the response to that article, today I’m attempting to take it to the next level by making it a bit more comprehensive. I’ll mostly focus on the methods, tools, and techniques used in econometrics that data scientists will benefit from.
Econometrics is a sub-domain of economics that applies mathematical and statistical models with economic theories to understand, explain…
You are not alone if you have had a hard time understanding what exactly regularization is and how it works. Regularization can be a very confusing term, and I’m attempting to clear up some of that in this article.
In this article I’ll do three things: (a) define the problem that we want to tackle with regularization; then (b) examine how exactly regularization helps; and finally (c) explain how regularization works in action.
Data scientists take great care during the modeling process to make sure their models work well and are neither underfit nor overfit.
Let’s say you want to…
As a data scientist, if you are asked to find the average income of customers, how would you do that? Having ALL customer data is of course “good to have”, but in reality it almost never exists, nor is it feasible to collect.
Instead, you get a small sample, take measurements on it and make predictions about the whole population. But how confident are you that your sample statistics represent the population?
Statistical distributions play an important role in measuring such uncertainties and giving you that confidence. Simply speaking, a probability distribution is a function that describes the likelihood of a specific outcome (value) of…
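The sampling idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the population is simulated (the log-normal shape and its parameters are invented), the sample size of 200 is arbitrary, and the interval uses the normal approximation with the familiar 1.96 multiplier.

```python
import math
import random
import statistics

# Hypothetical population of customer incomes, simulated for illustration only.
random.seed(42)
population = [random.lognormvariate(10.5, 0.5) for _ in range(100_000)]

# Draw a small random sample instead of measuring every customer.
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
# Standard error of the mean: sample standard deviation divided by sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval (normal approximation, an assumption here).
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"sample mean: {sample_mean:,.0f}")
print(f"approx. 95% interval: {ci_low:,.0f} to {ci_high:,.0f}")
```

A narrower interval means the sample pins down the population average more tightly; a larger sample shrinks the standard error and with it the interval.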
Logistic regression is amongst the most popular algorithms used to solve classification problems in machine learning. It uses a logistic function to model how different input variables impact the probability of binary outcomes. The technique is quite convoluted as described in the available literature. The purpose of writing this article is to describe the model in simple terms, primarily focusing on building an intuition by avoiding complex mathematical formulation as much as possible.
I will start with a linear regression problem — which is relatively easy to understand — and build on that to get to logistic regression. Towards the…
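As a rough sketch of the step from linear to logistic regression described above: a linear score is computed first and then passed through the logistic function, which squeezes it into the 0 to 1 range so it can be read as a probability. The function names and the coefficient values below are assumptions made up for illustration; they are not fitted to any data.

```python
import math


def logistic(z: float) -> float:
    """Map any real number to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_probability(x: float, intercept: float, slope: float) -> float:
    """Linear score first (as in linear regression), then the logistic squash."""
    linear_score = intercept + slope * x
    return logistic(linear_score)


# Hypothetical coefficients, chosen only to show the shape of the curve.
for x in (-4, -2, 0, 2, 4):
    print(x, round(predict_probability(x, intercept=0.0, slope=1.0), 3))
```

The printed probabilities rise smoothly with `x` and cross 0.5 exactly where the linear score is zero.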
Many sophisticated data science algorithms are built with simple building blocks. How quickly you level up your skills largely depends on how strong your foundation is. In the next few articles, I’ll touch upon a few such foundational topics. Hopefully, learning those topics will make your journey a pleasant and fun experience.
Today’s topic is Python lists.
Most people would learn tools first and then practice them with a few examples. I take the opposite route — focus on problems…
If we have more money in our pockets, we tend to spend more; that’s something almost everyone knows. What’s often not known is the exact relationship between income and expenditure, i.e. how much people would spend at a given income.
An approximate solution is to build a statistical model by observing people’s income and expenditure. The more data there is, the better the model. We can then take this model and apply it to an unknown place or population with reasonable confidence.
But the model wouldn’t be able to make a 100% accurate prediction, because people’s…
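A minimal sketch of such a model, assuming a straight-line relationship fitted by ordinary least squares. The seven income and expenditure pairs are invented for illustration, and `predict_expenditure` is a hypothetical helper name.

```python
import statistics

# Hypothetical observations (monthly income, monthly expenditure), invented for illustration.
incomes = [2000, 2500, 3000, 3500, 4000, 4500, 5000]
expenditures = [1700, 2050, 2300, 2700, 2900, 3300, 3500]

mean_income = statistics.mean(incomes)
mean_expenditure = statistics.mean(expenditures)

# Ordinary least squares with one predictor:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
covariation = sum(
    (x - mean_income) * (y - mean_expenditure)
    for x, y in zip(incomes, expenditures)
)
variation = sum((x - mean_income) ** 2 for x in incomes)
slope = covariation / variation
intercept = mean_expenditure - slope * mean_income


def predict_expenditure(income: float) -> float:
    """Predicted expenditure for a given income under the fitted line."""
    return intercept + slope * income


# Residuals are the part of each observation the line does not explain.
residuals = [y - predict_expenditure(x) for x, y in zip(incomes, expenditures)]

print(f"slope: {slope:.3f}, intercept: {intercept:.1f}")
print(f"predicted spend at 3200: {predict_expenditure(3200):.0f}")
```

The non-zero residuals are the point: even on the data it was fitted to, the line does not reproduce every observation exactly.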
In programming, a loop is a logical structure that repeats a sequence of instructions until certain conditions are met. Looping allows for repeating the same set of tasks on every item in an iterable object, until all items are exhausted or a looping condition is reached.
Looping is applied to iterables: objects that store a sequence of values in specific data formats, such as dictionaries. The beauty of loops is that you write the program once and use it on as many elements as needed.
The purpose of this article is to implement some intermediate looping challenges applied to four Python…
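The two stopping rules described above (items exhausted, or a condition reached) can be sketched as follows. The `scores` dictionary and the pass mark of 70 are made-up values for illustration.

```python
# Hypothetical data: names mapped to test scores.
scores = {"Ana": 82, "Ben": 67, "Cy": 91}

# A for loop runs once per item and stops when the dictionary is exhausted.
passed = []
for name, score in scores.items():
    if score >= 70:
        passed.append(name)

# A while loop repeats until its condition stops being true.
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1

print(passed)
print(steps)
```

The loop body is written once in each case; adding more entries to `scores` needs no change to the code.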
In programming, looping means repeating the same set of computations in the same sequence a number of times.
Think about a real-life situation. You are a field biologist who’s taking measurements of trees in the forest. You pick a tree, measure its diameter and height, write them down in your notebook and make an estimate of its total volume.
Next, you pick another tree, measure its diameter and height, write them down in your notebook and make an estimate of its total volume.
Then, you pick yet another tree, measure its diameter and height, write them down in your…
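The biologist's routine above maps naturally onto a loop: the same steps, written once, applied to each tree. In this sketch the measurements are invented, and the volume estimate (a cylinder scaled by a form factor of 0.5) is an assumption chosen for illustration, not the method from the article.

```python
import math

# Hypothetical field measurements: (diameter in metres, height in metres).
trees = [(0.30, 12.0), (0.45, 18.5), (0.25, 9.0)]

# Assumed form factor: treats the stem as half the volume of a cylinder.
FORM_FACTOR = 0.5

notebook = []
for diameter, height in trees:
    # Cross-sectional area of the stem, treated as a circle.
    basal_area = math.pi * (diameter / 2) ** 2
    volume = FORM_FACTOR * basal_area * height
    # "Write it down in the notebook": one record per tree.
    notebook.append(
        {"diameter_m": diameter, "height_m": height, "volume_m3": round(volume, 3)}
    )

for record in notebook:
    print(record)
```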
Dictionaries in Python are a collection of key-value pairs — meaning every item in the dictionary has a key and an associated value.
If we want to write down prices of some items in a grocery store, normally we will note them on a piece of paper like this:
eggs - 4.99
banana - 1.49
eggplant - 2.5
bread - 3.99
In Python dictionary lingo, the name of each item is the “key” and the associated price is the “value”, and they appear in pairs. We can represent the same in a Python dictionary data structure as follows:
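A minimal sketch of that dictionary, using the four items and prices from the grocery list above. The variable name `prices` is an assumption.

```python
# Item names are the keys, prices are the values.
prices = {
    "eggs": 4.99,
    "banana": 1.49,
    "eggplant": 2.5,
    "bread": 3.99,
}

# Look up a value by its key.
print(prices["eggs"])

# Walk through every key-value pair.
for item, price in prices.items():
    print(f"{item} - {price:.2f}")
```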