Getting started with a Data Challenge: Thinking like an Analyst
Problem Scenario: Excited about a data analytics challenge, you sign up, ready to put your analytics skills to good use. But halfway through the process, you blank out. You have no idea how to proceed or how to gear your analysis towards the challenge. For most data enthusiasts, this is a major concern.
Here I outline the basic method I follow when faced with a data challenge.
NB: I will be using the Supermarket sales dataset found on Kaggle for this example.
1) Data collection: Every data analytics project begins with data, so the first step is to get it. Most datasets are shared in Excel or CSV format, but in some cases you will be asked to generate your own data, which can be done through web scraping.
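To make this step concrete, here is a minimal sketch of reading a CSV into memory using only Python's standard library. The sample rows are made up for illustration, and the column names are my assumption of what such a file could look like; check them against the file you actually download.

```python
import csv
import io

# Made-up sample standing in for a downloaded CSV file.
# The column names are assumptions, not taken from the real dataset.
SAMPLE_CSV = """Invoice ID,Branch,Gender,Product line,Total,Payment
INV-001,A,Female,Health and beauty,120.00,Ewallet
INV-002,B,Male,Electronic accessories,80.00,Cash
INV-003,C,Female,Home and lifestyle,40.00,Credit card
"""


def load_rows(source):
    """Read CSV text from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


rows = load_rows(io.StringIO(SAMPLE_CSV))

# A first look: how many rows, and which columns do we have?
print(len(rows), "rows")
print(list(rows[0].keys()))
```

With a real download you would pass an opened file instead of the in-memory sample, for example `with open(path, newline="") as f: rows = load_rows(f)`, where `path` is wherever you saved the file.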
2) Data Exploration: Exploring and understanding the dataset given to us. At this point we start asking our business questions. I believe this is where most of us find it difficult: what metrics should I report that will be relevant? There are different ways to think about this.
There isn't a laid-down procedure.
So I will use the procedure that I usually use for my own data analysis projects. In this supermarket scenario, as the data analyst, what would I find valuable in this dataset?
- The total number of transactions from January to March
- The proportion of male customers to female customers
- The different retail outlets of the supermarket (in this case, 3)
- The products sold in the shop
- The different channels of making payments
These are some of the things that are of interest to you, the data analyst.
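The questions above can be sketched as a few simple counts. This is only an illustration on made-up rows; the field names (`branch`, `gender`, `product_line`, `payment`) are hypothetical and would need to match your own dataset.

```python
from collections import Counter

# Made-up transactions for illustration; field names are assumptions.
transactions = [
    {"invoice": "INV-001", "branch": "A", "gender": "Female",
     "product_line": "Health and beauty", "payment": "Ewallet"},
    {"invoice": "INV-002", "branch": "B", "gender": "Male",
     "product_line": "Electronic accessories", "payment": "Cash"},
    {"invoice": "INV-003", "branch": "C", "gender": "Female",
     "product_line": "Home and lifestyle", "payment": "Credit card"},
    {"invoice": "INV-004", "branch": "A", "gender": "Male",
     "product_line": "Health and beauty", "payment": "Cash"},
]

# Total number of transactions in the period.
total_transactions = len(transactions)

# Share of each gender among the transactions.
gender_counts = Counter(t["gender"] for t in transactions)
gender_share = {g: n / total_transactions for g, n in gender_counts.items()}

# Distinct retail outlets and product lines.
branches = sorted({t["branch"] for t in transactions})
product_lines = sorted({t["product_line"] for t in transactions})

# How often each payment channel was used.
payment_channels = Counter(t["payment"] for t in transactions)

print("Transactions:", total_transactions)
print("Gender share:", gender_share)
print("Branches:", branches)
print("Product lines:", product_lines)
print("Payment channels:", dict(payment_channels))
```

Each line answers one question from the list, which makes it easy to see which numbers are worth carrying forward into the report.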
However, it is from these questions that you can start thinking about what the manager might be interested in.
Think of yourself as an FBI investigator.
You are an investigator actually, but this time with data😉
You are trying to understand the scenarios around a crime scene (the dataset):
- How many customers do we have?
- How many transactions were done during this time period?
- What were the different channels of making payments?
- Which products did we have in stock in the supermarket during this time period?
These questions form the basis of your analysis, and their answers are what you will present to management.
Ask the right data questions in order to extract valuable insights.
3) Data Cleaning: Checking for inconsistent or inaccurate data
- Null values
- Missing dates
- Duplicate values
- Improper capitalization
- Unnecessary columns
Remember:
Our aim here is to make the dataset relevant to answer the questions we have already asked in step 2.
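As a rough sketch of what these checks can look like in code, the example below handles null values, missing dates, duplicates, inconsistent capitalization and unnecessary columns on a handful of made-up rows. The field names and the choice to drop incomplete rows (rather than fill them in) are my assumptions; your challenge may call for a different rule.

```python
# Made-up messy rows for illustration; field names are assumptions.
raw_rows = [
    {"invoice": "INV-001", "date": "2019-01-05", "payment": "cash", "notes": "x"},
    {"invoice": "INV-001", "date": "2019-01-05", "payment": "cash", "notes": "x"},  # duplicate
    {"invoice": "INV-002", "date": "", "payment": "EWALLET", "notes": ""},          # missing date
    {"invoice": "INV-003", "date": "2019-02-11", "payment": None, "notes": ""},     # null value
    {"invoice": "INV-004", "date": "2019-03-02", "payment": " Credit Card ", "notes": ""},
]

# Columns needed to answer the questions from step 2; everything else is dropped.
KEEP = ("invoice", "date", "payment")


def clean(rows):
    """Return rows with only the needed columns, no gaps and no duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Keep only the needed columns and trim stray whitespace.
        record = {k: (row.get(k) or "").strip() for k in KEEP}
        # Skip rows with a null or empty value in any needed column.
        if not all(record.values()):
            continue
        # Normalize capitalization so "cash" and "CASH" count as one channel.
        record["payment"] = record["payment"].capitalize()
        # Skip exact duplicates of a row that was already kept.
        key = tuple(record.values())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


cleaned_rows = clean(raw_rows)
for r in cleaned_rows:
    print(r)
```

Of the five raw rows, only the ones that are complete and unique survive, and the `notes` column is gone because none of the step 2 questions need it.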
4) Visualization: These are what I will show on the dashboard, created using Tableau, Power BI or any other tool of your choice. The metrics identified in step 2 will now be transformed into charts and graphs.
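Whichever tool you pick, a chart usually starts from a small summary table. Here is a minimal sketch, on made-up numbers, of turning raw sales into totals per product line and printing them as a rough text bar chart. The product lines, amounts and bar scale are illustrative assumptions, not figures from the dataset.

```python
from collections import defaultdict

# Made-up (product line, sale amount) pairs for illustration.
sales = [
    ("Health and beauty", 120.0),
    ("Electronic accessories", 80.0),
    ("Health and beauty", 60.0),
    ("Home and lifestyle", 40.0),
]

# Sum the sale amounts for each product line.
totals = defaultdict(float)
for product_line, amount in sales:
    totals[product_line] += amount

# Sort from highest to lowest so the biggest category comes first.
chart_data = sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Rough text bar chart: one "#" per 20 units of sales (arbitrary scale).
for label, value in chart_data:
    bar = "#" * int(value // 20)
    print(f"{label:<24}{bar} {value:.2f}")
```

The `chart_data` list is the kind of summary you would hand to a bar chart in your dashboard tool.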
5) Present Findings: After going through the above steps, it's now time to present your findings to management or stakeholders. These are people who are interested in the outcome of your analysis and will perhaps be using your insights to make data-driven decisions for the supermarket outlets.
In summary, these are the basic steps to follow when performing an analysis with data.
So the next time you are faced with a challenge, follow these simple steps, and in no time you will be generating insights from data that can be used to make better decisions.
Happy Learning