Hands-on Tutorials

An investigation of ‘wasted offers’

Photo by Karl Fredrickson on Unsplash

The Starbucks Offer Dataset is one of the datasets that students can choose for their capstone project in Udacity’s Data Science Nanodegree. It contains simulated data that mimics customers' behavior after they receive Starbucks offers. The data is collected via the Starbucks rewards mobile app, and offers are sent out once every few days to the app's users.

The data comes as 3 different JSON files.

*File descriptions provided by Udacity*

- portfolio.json: offer ids and metadata about each offer (duration, type, etc.)
- profile.json: demographic data for each customer
- transcript.json: records…

12-am conversations

A brief discussion on subjective experiences

Photo by Adam Jaime on Unsplash

— The Fictional Conversation Starts Here —

“Can we see things?”

“Yes, we see things with our eyes.”

“Do you think computers can see things as we do?”

“Well… with image recognition, they can recognize things, but I don’t know if you can say that they see as we humans do.”

“What makes the difference?”

“In computer vision, what computers actually see is a combination of numbers from 0 to 1, but we see the actual objects.”

“But our eyes don’t see depth; our retina only captures a two-dimensional image. …

Hitchhiker’s Guide To The Digital World

Reflection on my first API project

Photo by Michael Browning on Unsplash

As a newbie, I have always found APIs mystical. I have read multiple definitions and watched multiple YouTube videos, but the concept was still abstract.

The common explanation is always something like “a software intermediary that allows two applications to talk to each other” or “we can send a request to it to receive the information that…”, and so on.

But still, is it a physical thing? A package that I can import and use? Or something similar to a programming language? I know it allows applications to talk to each other, but WHAT IS THAT THING?

If you have the…

Hard Lessons, Real Tips

One-stop cheatsheet, for myself…

Photo by Angèle Kamp on Unsplash

Here are the commonly used queries that you will probably need every time you have a dataset, but they are nitty-gritty enough that you still need to google them to make sure of the syntax. So I summarized them here as a cheat sheet for myself.

  1. Rename column/columns

>> data.rename(columns={'x':'new_x'}, inplace=True)

If you only have a few columns, you can also:

>> last_rating.columns = ['column1/2', 'column2/2']
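As a rough, runnable sketch of the two renaming approaches, assuming small made-up DataFrames (the column names and values here are invented for illustration only):

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
data = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Rename selected columns by mapping old names to new names.
data = data.rename(columns={"x": "new_x"})
print(list(data.columns))  # ['new_x', 'y']

# With only a few columns, assign the full list of names directly.
# The list length must match the number of columns.
last_rating = pd.DataFrame({"a": [5], "b": [6]})
last_rating.columns = ["column1", "column2"]
print(list(last_rating.columns))  # ['column1', 'column2']
```

Assigning the result of `rename` back to the variable has the same effect as `inplace=True` and tends to be easier to chain with other calls.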

2. Remove “[]”, “()”, etc.

>> print (df)
0 [1]
1 [2]
2 [3]
>> df['value'] = df['value'].str.strip('[]').astype(int)
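A minimal sketch of the same cleanup, assuming the column holds strings such as "[1]" rather than actual Python lists (`.str.strip` only works on string values); the sample data is made up:

```python
import pandas as pd

# Hypothetical sample: bracketed numbers stored as strings.
df = pd.DataFrame({"value": ["[1]", "[2]", "[3]"]})

# strip('[]') removes any leading or trailing '[' or ']' characters,
# then astype(int) converts the remaining digits to integers.
df["value"] = df["value"].str.strip("[]").astype(int)

print(df["value"].tolist())  # [1, 2, 3]
```

To remove parentheses as well, the characters can be added to the same argument, for example `.str.strip("[]()")`.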

3. Insert a column

>> df = pd.DataFrame({ …

12-am conversations

Beyond blaming the data…

A piece of sour news for the tech community, don’t you think? (Photo by Ralph Mayhew on Unsplash)

I am sure some of you have seen the image below. You have probably also heard headlines like “Tech companies stopped selling Facial Recognition because the models are inherently biased” or “Police offices stop using Predictive Policing because the model is inherently biased”, and so on.

Hard Lessons, Real Tips

There are only 3 types of questions. They are…

Photo by Sebastian Coman Photography on Unsplash

There are lots of articles out there teaching you how to write queries. Knowing how to write a single query is not really the hard part. The hard part is that there are always multiple possible approaches, and all of them require a combination of queries. Putting questions into categories helps us identify patterns and build better instincts about which kinds of queries we can use.

There are basically 3 types of SQL questions. The three types are very simple in their original form. However, they can be leveled up by mixing and matching…

I can not only predict who is leaving but also…

Photo by Debby Hudson on Unsplash

I used an open dataset from Kaggle to explore how doing a churn analysis using machine learning models can not only save real money for the company but can also provide insights for strategic decision making. Here are the three business questions I would like to talk about:

  1. What is the value of the model to a business?
  2. What types of customers are most likely to churn and why?
  3. What can we do about it?

Churn is hard to predict for two reasons. One is that only a minority of customers will churn. For example, in this dataset, only 14.5%…

Hard Lessons, Real Tips

If your data-science online course syllabus looks like…

Photo by Tiard Schulz on Unsplash

If your data-science online course syllabus looks like this, then you will find this blog helpful.

SQL, Python → Stats, Math → Machine Learning

My first data science course syllabus also looked just like the one above. After extensive job searches and talking to many people, I realized those skills are not enough for me to actually start working as a data scientist. They lack some connections to reality.

So I started Udacity’s Data Science Nanodegree. It is a lot more comprehensive than an online course. …

12-am conversations

Something not about data science. For George Floyd.

Photo by Jakub Sofranko on Unsplash

No, I am not saying that my childhood was like

a princess living in a palace.

Prince and princess are fairytales even for children.

I am talking about a time when I was taught,

if you were courageous, honest, and hardworking,

you could achieve anything in your life, like everybody else.

I am talking about a time when

bad behaviors always get punished

so that they can improve;

good behaviors always get rewarded

so that they can be set as examples.

I am talking about a time when

death, death was always due to natural causes or by some bad…

Machine Learning With No Jargon

Mission impossible? Mission accomplished…

Image by Wasserstrom via Google Image

In my previous post (see below), I used an analogy of people voting to show the difference between a weighted Random Forest and boosting algorithms. To recap, the rule of a weighted Random Forest is that everyone who voted for result A will be counted more. The rule of boosting algorithms is that whoever is more qualified can cast more votes. And if result A wins more votes, then A it is.
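The boosting side of the voting analogy can be sketched as a weighted majority vote. This is only an illustration, assuming each tree has already been given a number of votes: the function name `weighted_vote` and the sample numbers are hypothetical, and real boosting libraries work out these weights during training.

```python
from collections import defaultdict


def weighted_vote(predictions, votes_per_tree):
    """Return the label whose trees hold the largest total number of votes."""
    totals = defaultdict(float)
    for label, votes in zip(predictions, votes_per_tree):
        totals[label] += votes
    return max(totals, key=totals.get)


# Three hypothetical trees: one predicts "A", two predict "B".
predictions = ["A", "B", "B"]

# The "more qualified" first tree casts more votes than the other two combined.
votes_per_tree = [2.0, 0.5, 0.5]

print(weighted_vote(predictions, votes_per_tree))  # A
```

With equal votes per tree, the same function falls back to a plain majority vote and would return "B" here.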

This piece will introduce how each boosting algorithm decides the number of votes each tree has. By understanding how the algorithms work, we will understand the difference in performance shown in…

Linda Chen

Share what I learned, and learn from what I shared. All about machines, humans, and the links between them. Take everything with a grain of salt.
