Data Analytics Basics (introduction) by Tomi Mester

Source - https://data36.com/data-analytics-basics-intro/?lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3BvyB5pIvjRTannwq1l93Wew%3D%3D

You might have heard that Data Scientist was ranked the best job of 2017 in the USA (based on Glassdoor’s research). Recently, many IT professionals have started to consider moving their careers towards Data Science or Data Analytics. University students are looking for data-related internships, even if their major is unrelated. And even project and product managers want to learn the data analytics basics to make better data-informed decisions. Are you interested in learning more about the basics of data analytics too? Then this article is for you! Here I’ll summarize the most fundamental topics for first-timers.
Note: If you have questions/suggestions to expand this article, feel free to ask in the comment section below, and I’ll answer!

Why is Data Science/Analytics important?

[Image: We know nothing either. Without data at least.]
Have you ever had this experience: sitting in a meeting, arguing about an important decision, but each and every argument is based only on personal opinions and gut feeling? And if you asked “why”, the only answers you’d get were:
  • “because we have done this at my previous company”
  • “because our competitor is doing this”
  • “because this is the best practice in our industry”
You could answer:
  • “Your previous company had a different customer base and solved a different problem. Why would we use the same strategy here?”
  • “If we don’t know why we are doing things, why would we suppose that our competitor does?”
  • “Our world changes faster than ever. There are no industry standards anymore – just trends, and if you are the one who can react the best and the fastest to these trends, then – and only then – you’ll win.”
After all, the only real answer to the question “Why do you think this is the best strategy?” is: “Because this is what the data suggests.” And an important way to learn what exactly the data suggests is to do data analysis.

What is Data Science?

Data Science is the combination of these three skills:
  1. Statistics / Mathematics skills
  2. Coding skills
  3. Domain Knowledge / Business Knowledge
[Image: Data Analytics Basics: Statistics + Coding + Business Thinking]
To be a well-rounded data professional, you have to be good at all three!
I think I don’t have to explain why statistics is important. Data is about numbers, and when you are working with numbers, you have to be confident with statistical and mathematical concepts.
Coding skills are required because the data you will work with is often hard to access, broken or messy, has missing values, and so on. Fixing these things in an Excel spreadsheet is… well, not so easy. Coding gives you full flexibility, so it’s a must-have skill if you are serious about getting familiar with the data analytics basics.
Domain knowledge and business thinking are soft factors, but just as essential as statistics and coding. If you don’t have business sense, you won’t be able to evaluate whether your data project makes a difference or not!
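To make the point about messy data and missing values a bit more concrete, here is a minimal sketch in Python using only the standard library. The column names and the sample rows are invented for illustration; a real project would have its own cleaning rules.

```python
import csv
import io

# Hypothetical messy export: stray spaces, a missing value, a broken number.
raw = """user_id,country,revenue
1, US ,10.5
2,,7
3,DE,n/a
4,de,3.25
"""

clean_rows = []
skipped = 0
for row in csv.DictReader(io.StringIO(raw)):
    # Normalize the country code: trim whitespace, use upper case.
    country = row["country"].strip().upper()
    try:
        revenue = float(row["revenue"])
    except ValueError:
        # The number cannot be parsed, so leave this row out.
        skipped += 1
        continue
    clean_rows.append({
        "user_id": int(row["user_id"]),
        "country": country or "UNKNOWN",  # label missing values explicitly
        "revenue": revenue,
    })

print(clean_rows)
print("skipped rows:", skipped)
```

Whether to drop a broken row or to repair it is a judgment call that depends on the business question, which is where the third skill comes in.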

Data Coding – What languages to learn?

Note: it’s possible that as a data analyst you won’t code at all, but will use smart tools like Google Analytics, heatmapping tools, A/B testing tools, etc. instead. Still, I strongly recommend learning to code. I’ve summarized the pros and cons in this article: Data Coding vs. Smart Tools.
If you start to learn coding and the data analytics basics, I recommend beginning with any of these four languages:
  1. SQL
  2. Python
  3. Bash
  4. R
In fact, this is the exact order that I would personally advise for everyone who’s new to this field. Why? Let’s take a look at the languages one by one first:

SQL for data analytics basics

SQL is a super-simple query language: well structured and easy to interpret. I think that learning the basics of SQL for data analysis takes roughly 15-20 net hours (including a fair amount of practice). If you are interested, here’s a free 6-article tutorial series: SQL for Data Analysis ep#1.
Syntax example for SQL:
SELECT * FROM my_datatable WHERE something = 'my_value';
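
One way to try the query above without setting up a database server is Python’s built-in sqlite3 module. This is only a sketch: the `id` column and the three sample rows are made up for illustration, and only `my_datatable`, `something` and 'my_value' come from the syntax example.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_datatable (id INTEGER, something TEXT)")
conn.executemany(
    "INSERT INTO my_datatable (id, something) VALUES (?, ?)",
    [(1, "my_value"), (2, "other_value"), (3, "my_value")],
)

# The query from the syntax example, unchanged.
rows = conn.execute(
    "SELECT * FROM my_datatable WHERE something = 'my_value'"
).fetchall()
print(rows)
conn.close()
```

Only the rows with id 1 and 3 match the WHERE condition, so those are the rows the query returns.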

Python for data science

Python is easy to interpret and easy to learn as well, but much more complex than SQL. Of course, that’s not the only difference between the two languages. I won’t go into detail here, but let’s just say that Python is better for certain data tasks and SQL is better for others. Python is really good at scientific tasks like predictive analytics and machine learning. It’s no accident that it’s one of the languages most widely used by data scientists. Learning the basics of Python can take a bit more time (~100 hours to reach a solid, but not yet advanced, level). If you want to get started, here’s my Python for Data Science series: Python for Data Science ep#1.
Syntax example for Python:
new_variable = my_table[my_table.something == 'my_value']
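
The one-liner above looks like a pandas-style row filter (that is an assumption; the article does not name a library). As a rough sketch of what such a filter does behind the scenes, here is the same idea in plain Python, with a made-up `my_table` stored as a list of rows:

```python
# Hypothetical sample data: each dict is one row of my_table.
my_table = [
    {"id": 1, "something": "my_value"},
    {"id": 2, "something": "other_value"},
    {"id": 3, "something": "my_value"},
]

# Step 1: compare every row to the value, producing a True/False mask.
mask = [row["something"] == "my_value" for row in my_table]

# Step 2: keep only the rows where the mask is True.
new_variable = [row for row, keep in zip(my_table, mask) if keep]

print(mask)
print([row["id"] for row in new_variable])
```

The result is the same kind of selection as the SQL WHERE clause: a new table that contains only the matching rows.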

Bash for data server operations

To be honest, if you build up solid SQL and Python knowledge, that will already be good enough to kick off your data career. But if you are really into this, I recommend learning Bash, because that is the language you will use to move data files, set user permissions, automate scripts and do other cool things on your data server. Here’s my Data Analytics in Bash article series: Learn Bash for Data Analysis ep #1
Syntax example for Bash:
cat file.csv |grep 'my_value' > new_file.csv
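
For comparison, the Bash one-liner above can be approximated in Python roughly like this. It is a sketch under assumptions: the file contents are invented, and the files live in a temporary directory so that nothing real is overwritten.

```python
import tempfile
from pathlib import Path

# Work in a temporary directory so the sketch does not touch real files.
workdir = Path(tempfile.mkdtemp())
source = workdir / "file.csv"
target = workdir / "new_file.csv"

# Hypothetical input data.
source.write_text("id,something\n1,my_value\n2,other_value\n3,my_value\n")

# Keep every line that contains the text 'my_value' anywhere in it,
# which is what grep does with a plain pattern like this one.
with source.open() as infile, target.open("w") as outfile:
    for line in infile:
        if "my_value" in line:
            outfile.write(line)

print(target.read_text())
```

Two details are worth noticing in both versions: the header line is dropped because it does not contain 'my_value', and the match is made against the whole line, not against one specific column. The Bash version needs one line where Python needs several, which is a large part of its appeal for quick file operations.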

R for data science

R is really similar to Python, just a little bit more challenging to learn. It was originally developed by statisticians for statisticians, and as a consequence you can feel its twisted (but pragmatic) logic. R’s learning curve is steeper, but once you have learned it, you will see that it’s the most advanced language for complex statistical tasks.
Note: to be honest I barely use R, because Python serves all of my needs in my data projects.
Syntax example for R:
new_variable = subset(my_table, something == 'my_value')

Data Analytics Basics Video Course

Maybe you have more questions about the details… Like:
  • Why do we have more than one data coding language?
  • What’s the difference between R, SQL and Python?
  • What’s the difference between a Data Analyst and a Data Scientist?
  • What do they do in their day-to-day job?
  • How can you learn about Business thinking?
If you want answers to all these questions (and more), check out my short (but sweet) 1-hour O’Reilly video course: Data Science Fundamentals for Marketing and Business Professionals.
