So You Want to Data Science?
Posted on October 29, 2015 by polong
Cognitive Class is a community initiative with a goal of democratizing access to knowledge and practical skills in Data Engineering and Big Data, Data Science, Artificial Intelligence, analytics, Cloud Native, DevOps, Blockchain and more. In this article Polong Lin, data scientist at IBM, recommends a progression of Cognitive Class courses you can take to learn Data Science.
Being a data scientist is like being a kitchen chef. Your manager gives you a load of raw ingredients – the data – and somehow you’re supposed to turn that into a beautiful, scrumptious dish – a statistical model or some app. How do you turn raw, messy data into a delicious treasure trove of insight? Where do you begin?
Foundations of Data Science
First, what exactly does it mean to turn data into insight? With Introduction to Data Science, you learn what analyzing data is all about and why data science is one of the fastest-growing fields today. Companies and researchers are absolutely hungry for what data scientists can cook up.
Now, as a data scientist, before you even set foot in your kitchen, you need to make sure you are properly prepared: that you have sufficient skills and knowledge, and the proper tools and equipment.
In data science, to understand the overarching strategy that guides a data scientist from data to a statistical model, you need to learn Data Science Methodology.
What does that mean? Well, like following a recipe, a kitchen chef starts with a particular goal in mind, collects the ingredients, cleans and prepares them, cooks them, performs a taste test, and finally serves the result with a nice visual presentation, much like a data scientist would with data. Data Science Methodology is the course you take to jump into the mind of a data chef.
Now that you know how data dishes are prepared, you need to learn the basic tools of the data science kitchen. Although there are many brands of tools, we recommend either Python or R to learn data science.
Does it matter whether you choose Python or R? Not really, as long as you have a programming language under your kitchen belt, and you get really good at slicing and dicing with one of them.
Importantly, these are the tools that will help you retrieve your data, clean it, cut and slice your data frames, build your models, and test that they taste, I mean, work properly. Eventually you will be able to implement machine learning algorithms in either Python or R, so don’t fret too much about the language now.
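To make those steps a little more concrete, here is a minimal sketch in plain Python of retrieving, cleaning, slicing, modelling and testing. Everything in it is an assumption made up for illustration: the tiny baking data set, the column names, and the helper functions `retrieve`, `clean` and `fit_line`. In practice you would more likely reach for data frames (Pandas in Python, or R), but the shape of the work is the same.

```python
import csv
import io

# Hypothetical raw data: a tiny CSV with stray whitespace and one missing value.
RAW_CSV = """oven_temp,bake_minutes
180, 40
200,35
,30
220,30
240, 25
"""


def retrieve(text):
    """Retrieve: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: strip whitespace, drop rows with missing values, convert to floats."""
    cleaned = []
    for row in rows:
        values = {key: value.strip() for key, value in row.items()}
        if all(values.values()):
            cleaned.append({key: float(value) for key, value in values.items()})
    return cleaned


def fit_line(xs, ys):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


rows = clean(retrieve(RAW_CSV))

# Slice: pull out the two columns we want to relate.
temps = [row["oven_temp"] for row in rows]
minutes = [row["bake_minutes"] for row in rows]

slope, intercept = fit_line(temps, minutes)

# Test: check the model's prediction for a temperature we did not measure.
predicted = slope * 210 + intercept
print(f"{len(rows)} clean rows, predicted bake time at 210 degrees: {predicted:.1f} minutes")
```

The row with the missing temperature is dropped during cleaning, so the model is fitted on the four remaining rows.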
Now that you’ve decided on your tool of choice — your chef’s knife — you need to learn the proper techniques and the theory behind the techniques, and there are many courses here on CognitiveClass.ai to help you learn.
Data cleaning and exploration
Like raw ingredients, data is often dirty and needs to be processed. In data science, you need to first learn how to clean and explore your data:
Exploring CO2 Emissions Data using Pandas data frames in Python
Exploring and Visualizing the Number of Out-of-School Children Around the World using Python
Introduction to OpenRefine
Machine Learning and Data Analysis
Now comes the actual analysis of data. To be a world-class data chef, you don’t want to miss out on all the different statistical techniques of analyzing your data, including machine learning – one of the most important skills in data science.
Data Mining Algorithms
Text mining in action: Analyzing Twitter data for Democratic General Elections
Machine Learning – Cluster Analysis
(Machine Learning Classification, Principal Component Analysis, and Text Mining and NLP courses will be created.)
OK, you’ve established yourself as an excellent data chef. But now you have to tackle the most important data problem of the 21st century: you’ve got a billion pies to bake, and you have one oven that can only fit six pies.
How do you even…?
Time to learn a new skill — Apache Spark! Using Python, R or Scala, you can analyze big data with Spark and get your dishes out of the kitchen on time.
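The pie problem can be sketched in a few lines of plain Python. To be clear, this is not Spark and does not use Spark's API; it only illustrates the underlying idea of splitting the work into oven-sized batches and combining the partial results, which Spark does for you across many machines. The names `OVEN_CAPACITY`, `batches` and `bake` are made up for this example.

```python
from itertools import islice

OVEN_CAPACITY = 6  # the oven in the analogy fits six pies at a time


def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def bake(batch):
    """Hypothetical per-batch work: here, just add up the pie weights."""
    return sum(batch)


pie_weights = range(1, 21)  # 20 pies standing in for a billion

# Process one batch at a time, then combine the partial results.
partial_totals = [bake(batch) for batch in batches(pie_weights, OVEN_CAPACITY)]
total_weight = sum(partial_totals)

print(f"{len(partial_totals)} batches, total weight {total_weight}")
```

Because each batch is handled independently, the batches could just as well be baked in different ovens at the same time, which is the part a tool like Spark takes care of.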
Creating a Data Science App
What’s the use if you can’t show off your hard work? You might not want your customers to see the entire process behind the scenes in the kitchen, but you want to give them a clean, final product. With R Shiny, you can create interactive apps that clients can run in the browser, with R running your analyses in the background.
A ‘Creating a Data Science app using Shiny in R’ course will be created shortly.
Where to go next?
Data scientists, like chefs, don’t ever stop learning and practicing. These courses will give you an enormous head start in your data science career. So what are you doing still reading this? Get back to the data kitchen and keep learning!