The Starbucks Offer Dataset is one of the datasets that students can choose from to complete the capstone project for Udacity’s Data Science Nanodegree. It contains simulated data that mimics customer behavior after customers receive Starbucks offers. The data is collected via the Starbucks Rewards mobile app, and offers were sent out to app users once every few days.
The data consists of three JSON files.
*File descriptions provided by Udacity*
portfolio.json: offer ids and metadata about each offer (duration, type, etc.)
profile.json: demographic data for each customer
transcript.json: records…
— The Fictional Conversation Starts Here —
“Can we see things?”
“Yes, we see things with our eyes.”
“Do you think computers can see things as we do?”
“Well… with image recognition, they can recognize things, but I don’t know if you can say that they see as we humans do.”
“What makes the difference?”
“In computer vision, what computers actually see is a combination of numbers from 0 to 1, but we see the actual objects.”
“But our eyes don’t see depth; our retinas only receive a two-dimensional image. …
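The claim that computers “see” numbers from 0 to 1 can be made concrete with a tiny sketch. The 3x3 grayscale “image” below is hypothetical, and it assumes the common convention of 8-bit pixel intensities (0 = black, 255 = white) rescaled to the range [0, 1] before being fed to a model.

```python
# A hypothetical 3x3 grayscale "image": 8-bit intensities, 0 = black, 255 = white
pixels = [
    [0, 51, 255],
    [102, 153, 204],
    [255, 0, 51],
]

# Rescale every intensity to [0, 1]; this grid of floats is all the model receives
normalized = [[value / 255 for value in row] for row in pixels]

for row in normalized:
    print(row)
# First row printed: [0.0, 0.2, 1.0]
```

Nothing in the grid says “cat” or “cup”: any notion of an object has to be learned from patterns in these numbers.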
As a newbie, I always found APIs mystical. I had read multiple definitions and watched multiple YouTube videos, but the concept was still abstract.
The common explanation is always something like “a software intermediary that allows two applications to talk to each other” or “we can send a request to it to receive the information that…” and so on.
But still, is it a physical thing? A package that I can import and use? Something similar to a programming language? I know it allows applications to talk to each other, but WHAT IS THAT THING?
If you have the…
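One common kind of API, a web API, is not a physical thing at all: it is an agreed-upon set of addresses (URLs) plus a promise about what you get back when you send a request to them. The sketch below is a minimal illustration using only Python’s standard library; the `/greeting` endpoint, the `GreetingHandler` class and the `call_api` helper are hypothetical names. One program (the server) exposes the endpoint, and another piece of code (the client) talks to it purely through that contract.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class GreetingHandler(BaseHTTPRequestHandler):
    """Server side: defines what the hypothetical /greeting endpoint returns."""

    def do_GET(self):
        if self.path == "/greeting":
            body = json.dumps({"message": "hello"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)  # anything outside the contract is "not found"

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def call_api():
    """Client side: start the server locally, send one request, return the JSON reply."""
    server = HTTPServer(("127.0.0.1", 0), GreetingHandler)  # port 0 = pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/greeting") as resp:
            return json.loads(resp.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_api())  # {'message': 'hello'}
```

The client never sees how the server builds its answer; it only relies on the URL and the JSON shape. That agreed interface is the “API” part.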
Here are the commonly used queries that you will probably need every time you get a dataset, but that are nitty-gritty enough that you still have to google them to check the syntax… So I summarized them here as a cheat sheet for myself.
1. Rename columns
>> data.rename(columns={'x':'new_x'}, inplace=True)
If you only have a few columns, you can also assign a full list of new names (one per column):
>> last_rating.columns = ['column1/2', 'column2/2']
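Here is a self-contained version of both renaming options, as a minimal sketch. The toy DataFrame and its column names are hypothetical. It assigns the result of `rename` instead of using `inplace=True`; both work.

```python
import pandas as pd

# Hypothetical toy frame; "x" and "y" are placeholder column names
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Option 1: rename only some columns by mapping old name -> new name
renamed = df.rename(columns={"x": "new_x"})
print(list(renamed.columns))  # ['new_x', 'y']

# Option 2: replace every name at once; the list needs exactly one entry per column
relabeled = df.copy()
relabeled.columns = ["first", "second"]
print(list(relabeled.columns))  # ['first', 'second']
```

Option 2 raises a `ValueError` (length mismatch) if the list is shorter or longer than the number of columns, which is why it is only convenient for narrow tables.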
2. Remove “[]”, “()”, etc.
>> print(df)
   value
0    [1]
1    [2]
2    [3]
>> df['value'] = df['value'].str.strip('[]').astype(int)
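The `str.strip` trick assumes the cells are strings such as "[1]". A minimal sketch with hypothetical data covers two cases: strings wrapped in either brackets or parentheses (list every character to remove), and cells that hold real Python lists, where `.str.get(0)` pulls out the first element instead.

```python
import pandas as pd

# Case 1: values are strings such as "[1]" or "(2)"
df = pd.DataFrame({"value": ["[1]", "(2)", "[3]"]})
# strip() removes any of the listed characters from both ends of each string
df["value"] = df["value"].str.strip("[]()").astype(int)
print(df["value"].tolist())  # [1, 2, 3]

# Case 2: values are actual Python lists such as [1]
lists = pd.DataFrame({"value": [[1], [2], [3]]})
# .str.get(0) extracts the element at position 0 from each list
lists["value"] = lists["value"].str.get(0).astype(int)
print(lists["value"].tolist())  # [1, 2, 3]
```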
3. Insert a column
>> df = pd.DataFrame({ …
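For reference, here are two common ways to add a column in pandas, as a minimal sketch with a hypothetical frame; they are not necessarily the approach this cheat sheet goes on to describe. Plain assignment appends the column at the end, while `DataFrame.insert` places it at a given position.

```python
import pandas as pd

# Hypothetical toy frame
df = pd.DataFrame({"a": [1, 2, 3], "c": [7, 8, 9]})

# Append a new column at the end by plain assignment
df["d"] = df["a"] * 10

# Insert a column at a specific position: index 1 puts it between "a" and "c"
df.insert(1, "b", [4, 5, 6])

print(list(df.columns))  # ['a', 'b', 'c', 'd']
```

`insert` modifies the frame in place and returns `None`, so do not write `df = df.insert(...)`.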
I am sure some of you have seen the image below. You have probably also heard headlines like “Tech companies stopped selling facial recognition because the models are inherently biased” or “Police departments stop using predictive policing because the model is inherently biased”.
There are lots of articles out there teaching you how to write queries. Knowing how to write a single query is not really the hard part. The hard part is that there are always multiple possible approaches, and each of them requires a combination of queries. Putting questions into categories helps us identify patterns and build better instincts about which queries to use.
There are basically three types of SQL questions. Each type is very simple in its original form. However, they can be leveled up by mixing and matching…
I used an open dataset from Kaggle to explore how a churn analysis using machine learning models can not only save the company real money but also provide insights for strategic decision making. Here are the three business questions I would like to talk about:
Churn is hard to predict for two reasons. The first is that only a minority of customers churn. For example, in this dataset, only 14.5%…
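Why a small churn share makes prediction hard can be shown with a little arithmetic. The sketch below assumes, for illustration only, that the 14.5% is the share of customers who churned, and uses hypothetical labels for 1,000 customers (145 churners). A lazy “model” that predicts “no churn” for everyone looks accurate while catching zero churners.

```python
def majority_baseline(n_customers=1000, n_churners=145):
    """Score a baseline that always predicts 'no churn' (label 0)."""
    # Hypothetical labels: 1 = churned, 0 = stayed
    y_true = [1] * n_churners + [0] * (n_customers - n_churners)
    y_pred = [0] * n_customers

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_customers
    true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    recall = true_positives / n_churners  # share of churners actually caught
    return accuracy, recall


accuracy, recall = majority_baseline()
print(accuracy, recall)  # 0.855 0.0
```

An 85.5% accuracy here says nothing useful, which is why recall, precision or similar metrics matter more than accuracy for imbalanced churn data.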
If your online data science course syllabus looks like this, then you will find this blog helpful.
SQL, Python → Stats, Math → Machine Learning
My first data science course syllabus looked just like the one above. After extensive job searching and talking to many people, I realized those skills were not enough for me to actually start working as a data scientist. They lacked a connection to reality.
So I started Udacity’s Data Science Nanodegree. It is a lot more comprehensive than an online course. …
No, I am not saying that my childhood was like
a princess living in a palace.
Princes and princesses are fairy tales even for children.
I am talking about a time when I was taught,
if you were courageous, honest, and hardworking,
you could achieve anything in your life, like everybody else.
I am talking about a time when
bad behaviors always get punished
so that they can improve;
good behaviors always get rewarded
so that they can be set as examples.
I am talking about a time when
death, death was always due to natural causes or by some bad…
In my previous post (see below), I used an analogy of people voting to show the difference between a weighted Random Forest and boosting algorithms. To recap: in a weighted Random Forest, every vote for result A counts more; in boosting algorithms, whoever is more qualified gets to cast more votes. Either way, if result A wins more votes, then A it is.
This piece introduces how each boosting algorithm decides how many votes each tree gets. By understanding how the algorithms work, we will understand the difference in performance shown in…
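As one concrete instance of “more qualified voters cast more votes”, here is a minimal sketch of how discrete AdaBoost (binary labels +1/-1) weights each tree. A tree with weighted error rate ε gets a vote weight α = ½ ln((1 − ε) / ε), and the ensemble takes the sign of the α-weighted sum of predictions. Other boosting algorithms, such as gradient boosting, decide tree weights differently, so this covers AdaBoost only. The error rates below are hypothetical.

```python
import math


def adaboost_alpha(error):
    """Vote weight discrete AdaBoost gives a weak learner with weighted error 0 < error < 1."""
    return 0.5 * math.log((1 - error) / error)


def weighted_vote(predictions, alphas):
    """Combine +1/-1 predictions, each scaled by its tree's alpha (ties go to +1 here)."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1


# Hypothetical error rates for three trees
errors = [0.10, 0.30, 0.45]
alphas = [adaboost_alpha(e) for e in errors]  # roughly 1.10, 0.42, 0.10

# The accurate tree votes +1; the two weaker trees both vote -1
print(weighted_vote([1, -1, -1], alphas))  # 1
```

A tree no better than a coin flip (ε = 0.5) gets α = 0, so it has no say at all, while the single accurate tree here outvotes the other two combined.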
Share what I learned, and learn from what I shared. All about machines, humans, and the links between them. Take everything with a grain of salt.