Data Scientists, Stand out by Sharing Your Notebooks
Posted on December 3, 2018 by Antonio Cangiano
Data Science has been dubbed the sexiest profession of the 21st century. Demand is at an all-time high. Salaries are well into the six figures, particularly when Machine and Deep Learning are thrown into the mix.
That’s great, but there is one small problem. Competition for available positions is fierce. Especially if you are starting out and don’t have years of experience to sell your case to prospective employers. So how do you stand out and get your foot into this industry?
As someone who has recruited, interviewed, and hired countless developers and data scientists over the years, I can tell you that it doesn’t actually take that much to stand out. So in this article, I’m going to focus on a simple strategy that you can implement right away to improve your odds of getting hired and grow your presence in this industry.
DO EMPLOYERS CARE ABOUT GITHUB?
Part of the reason why I’ve been quite successful in hiring so many developers and data scientists for my team over the past decade is that I take a holistic approach. I don’t focus on GPA, a particular school, years of experience, or specific keywords on a resume. Instead, I consider the person as a whole, trying to detect and evaluate as many signals as possible: signals issued by their resume, interview, and online presence.
I’m not alone in this. Prospective employers don’t just judge junior applicants on the basis of their school credentials and interviewing skills. More often than not, they look for distinguishing elements that make a candidate stand out.
Having a blog where you share your knowledge online (even if you are just learning and not yet a working professional) will make a huge difference. I’m an advocate of this strategy and have even written a whole book about it (I’m currently writing the second edition, by the way).
But there is something even simpler that you can do, one that requires limited commitment on your part. Low-hanging fruit for your data science career, if you will. Share the data science notebooks you create online. The easiest way to do that is through GitHub, where other data scientists and prospective employers can find them.
The subject is a bit controversial, but make no mistake, employers very much look at your GitHub profile to investigate the work you have shared as well as your degree of activity on the site (the green dotted contribution wall shown in the image below). It shouldn’t be a requirement to have a GitHub account to be hired (for many reasons), but if a link is provided on your resume, it can certainly be an asset that helps you stand out.
Not to mention that I routinely get job offers in the chatbot space, specifically from people who found out about me via related content I published in one of my repositories. (Offers I decline because I work for an awesome team, but I digress.)
SHARING YOUR JUPYTER NOTEBOOKS ON GITHUB
You’re certainly welcome to have a public GitHub repository where you keep all your Jupyter notebooks. It’s a good idea, in fact. However, in this article, I’m going to show you an even easier way to share your notebooks with the world and host them on GitHub: through their Gist service.
Gist is a service where people can quickly share notes, code, and snippets. And it turns out, entire data science notebooks! Let’s see how.
Here at Cognitive Class, we believe in skill building far more than just providing education. We all have the experience of watching hours of video, retaining very little a week later. The reason for this is that we really learn by doing (we are apes with an oversized brain, after all). So you’ll find that most courses on our site require you to complete labs.
We offer various options, but our tool of choice is JupyterLab hosted in the Cloud in our Labs environment, provided for free to our learners. To see it in action, enroll in a course such as Python for Data Science. In the first module, you’ll find an exercise that invites you to start the lab environment and write your first Python code. (Alternatively, you can access our lab environment directly and create your own notebook.)
Once you have the notebook loaded, right-click on the file name and then select Share on GitHub as shown in the picture below.
If you don’t see the file in the sidebar as per the image above, make sure you click on the folder icon on the very left to reveal the notebooks within your lab environment.
You’ll be asked to authenticate into your GitHub account. Click on the Sign In button to do so. If nothing happens, chances are your browser has blocked the pop-up, in which case you should allow the pop-up from our domain in your browser (as shown in the image below for Google Chrome) and repeat the process of sharing the notebook.
Once you are successfully authenticated, you will receive a pop-up message with a link to your newly created Gist as shown below. Click on the Copy icon to place the link in your clipboard and then click the Dismiss button.
Next, you’ll want to paste that link into your browser address bar in a new tab to see the Gist that was created within your account. You’ll notice that the notebook is now visible on Gist, and can be shared with other people by simply giving them this link.
It will also be added to your collection of Gists, so you’ll be able to add https://gist.github.com/<username> to your resume and online profiles to showcase the type of data science and learning you’ve been doing recently.
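If you’d like to check which of your Gists are notebooks before you share that link, here is a minimal sketch. GitHub’s REST API lists a user’s public Gists at https://api.github.com/users/<username>/gists, and the helper below works on the decoded JSON from that endpoint. The function name notebook_gists and the sample data (including example-user and the Gist IDs) are made up for illustration. The snippet only assumes that each entry carries an html_url and a files mapping keyed by file name.

```python
def notebook_gists(gists):
    """Return (gist page URL, file name) pairs for every .ipynb file found."""
    found = []
    for gist in gists:
        # "files" maps each file name in the Gist to its details.
        for filename in gist.get("files", {}):
            if filename.lower().endswith(".ipynb"):
                found.append((gist["html_url"], filename))
    return found


# Hypothetical data shaped like the API response; not real Gists.
sample_response = [
    {
        "html_url": "https://gist.github.com/example-user/aaa111",
        "files": {"python-101.ipynb": {"filename": "python-101.ipynb"}},
    },
    {
        "html_url": "https://gist.github.com/example-user/bbb222",
        "files": {"notes.md": {"filename": "notes.md"}},
    },
]

for url, filename in notebook_gists(sample_response):
    print(f"{filename}: {url}")
```

Working on the decoded response rather than calling the API directly keeps the sketch runnable offline. Fetching the real list is left to whichever HTTP client you prefer.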
IMPORTING YOUR GITHUB GIST IN WATSON STUDIO
You can now easily share your notebooks with others. You can even leverage them for the sake of employment. That’s awesome. But what happens when a fellow data scientist, or prospective employer, wants to play with your notebooks by importing one into their own environment? That, it turns out, is quite easy as well. Let’s see how to accomplish this.
To test out importing the notebook from Gist, we are going to create a free account with Watson Studio. This will give us another advanced and powerful data science environment from which we’ll test out importing notebooks.
If you already have an account with IBM Cloud, simply visit that link and log in. This will add Watson Studio to your account. If not, provide your email to create an IBM Cloud account and activate Watson Studio.
Once you are inside Watson Studio, you’ll want to click on Create a project. You can pick any project type you like among the pre-made templates for data science, visual recognition, deep learning, etc. If you don’t know which one to pick, hover over Standard and click Create a project.
You’ll be asked to provide a name; feel free to call it Testing Notebook Imports. Next, click the Create button at the very bottom right of the page.
With the project created, you’ll want to click on the Add to project button and then select Notebook as shown in the image below.
Here you can create a blank notebook, import it from a file on your computer, or import it from a URL. Select the From URL option.
It might be tempting to just paste the Gist URL as-is, but don’t do that. We need to actually obtain the source code for the Notebook (which is really a JSON file). So, back in our Gist notebook, click on Raw as shown in the image below.
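To make the "it’s really a JSON file" point concrete, here is a small sketch that reads a notebook with nothing but Python’s standard json module. The notebook is hand-written for illustration and follows the usual version 4 notebook layout (a top-level cells list where each cell has a cell_type and a source). The summarize helper is a hypothetical name and not part of any library.

```python
import json

# A tiny hand-written notebook, serialized the way an .ipynb file is stored.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Python 101"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello, world!')"],
        },
    ],
})


def summarize(text):
    """Parse notebook JSON and return a (cell type, source text) pair per cell."""
    notebook = json.loads(text)
    # "source" is stored as a list of lines (or a single string); join handles both.
    return [(cell["cell_type"], "".join(cell["source"])) for cell in notebook["cells"]]


for kind, source in summarize(notebook_text):
    print(f"[{kind}] {source}")
```

This is why the Raw link matters: the regular Gist page is a rendered HTML view, while the raw file is the JSON document that a notebook environment can actually load.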
The raw file will open in a new tab of your browser. Copy that long URL string (it ends with .ipynb) and paste it in the Notebook URL field back inside of Watson Studio as shown in the image below.
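If you want a quick sanity check that you copied the raw link rather than the Gist page link, something like the sketch below would do. It rests on two assumptions of mine: that raw Gist files are served from the gist.githubusercontent.com host, and that their path contains a /raw/ segment. The function name and the example URLs (example-user, aaa111, 0123abcd) are hypothetical.

```python
from urllib.parse import urlparse


def looks_like_raw_notebook_url(url):
    """Heuristic check that a URL points at a raw .ipynb file on Gist."""
    parts = urlparse(url)
    return (
        parts.scheme == "https"
        # Assumption: raw Gist content lives on this host.
        and parts.netloc == "gist.githubusercontent.com"
        and "/raw/" in parts.path
        and parts.path.endswith(".ipynb")
    )


raw_url = "https://gist.githubusercontent.com/example-user/aaa111/raw/0123abcd/python-101.ipynb"
page_url = "https://gist.github.com/example-user/aaa111"

print(looks_like_raw_notebook_url(raw_url))
print(looks_like_raw_notebook_url(page_url))
```

It is only a heuristic. A URL can pass the check and still point to a file that no longer exists.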
Also provide a name for the notebook (e.g., Python 101). You’ll be offered some options for the runtime executing your code; leave it at the default. Finally, click on the Create Notebook button in the bottom right of the page to import the notebook.
You should now see your notebook running inside of Watson Studio. Congratulations! If you followed along with every step, you have exported a JupyterLab notebook to GitHub and then imported it into Watson Studio.
A prospective employer will be able to do the same with your notebooks, whether using Watson Studio or not.
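For someone who prefers a local Jupyter installation, one possible approach is to download the raw file and open it there. Below is a minimal sketch using only the standard library. The download_notebook helper is a name I made up. To keep the example runnable without a network connection, the demo serves a tiny stand-in notebook from a temporary folder through a file:// URL. In real use, you would pass the raw Gist URL instead.

```python
import json
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_notebook(raw_url, destination):
    """Save the notebook found at raw_url to destination and return the path."""
    destination = Path(destination)
    with urllib.request.urlopen(raw_url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Offline demo: a stand-in notebook takes the place of the raw Gist file.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "shared.ipynb"
    source.write_text(
        json.dumps({"nbformat": 4, "nbformat_minor": 2, "metadata": {}, "cells": []}),
        encoding="utf-8",
    )
    saved = download_notebook(source.as_uri(), Path(folder) / "python-101.ipynb")
    saved_name = saved.name
    saved_notebook = json.loads(saved.read_text(encoding="utf-8"))

print(saved_name, "is a version", saved_notebook["nbformat"], "notebook")
```

The saved .ipynb file can then be opened from any Jupyter environment like a notebook created locally.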
My parting advice for breaking into this industry is to never stop learning and sharing what you learn. You can’t achieve success if you do amazing work and nobody knows about it.
Put yourself out there, sharing with the world what you know. It’s the right thing to do, as it benefits both yourself and others. It’s uncomfortable, I know, especially if you are just starting out and don’t feel like an expert. Being uncomfortable can pay massive dividends as it’s where growth happens. I wish you great success in your data science career.
Antonio Cangiano is a Software Developer and AI Advocate at IBM and the author of Technical Blogging (2nd Edition): Amplify your influence. He is passionate about the craft of programming, AI, online marketing, and entrepreneurship. Keep in touch by reading his programming blog and by following him on Twitter.