Data science on Google Colab

Willem de Beijer and Daan Kolkman

This tutorial will take you through the steps of using Google Colab for data science. It is part of our Cloud Computing for Data Science series.

1. About Google Colab

Google Colaboratory (Colab for short) is a service that lets you run Jupyter notebooks in the cloud for free. While it is more limited than a virtual machine, it is much easier to set up and get going. Additionally, you can log in to the service with your existing Google account. A good introduction to Colab can be found at https://colab.research.google.com/notebooks/welcome.ipynb#

2. Getting started

To get started, go to “File” in the top menu and choose either “New Python 3 notebook” or “Upload notebook…” to start with one of your existing notebooks. 

Getting data into Colab can be a bit of a hassle. Colab can be connected to Google Drive, but the connection is not always seamless. The easiest way to upload a dataset is to run the following in a notebook cell:

from google.colab import files
uploaded = files.upload()

This will prompt you to select and upload a file.
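
The object returned by files.upload() is a dictionary that maps each uploaded file name to its contents as bytes. As a minimal sketch of what you might do next, assuming a UTF-8, comma-separated file, the snippet below turns one upload into a list of rows. The helper name rows_from_upload and the file name data.csv are hypothetical, and a hand-made dictionary stands in for a real upload so the code also runs outside Colab.

```python
import csv
import io


def rows_from_upload(uploaded, filename, encoding="utf-8"):
    """Parse one uploaded CSV file into a list of dicts (one per row).

    `uploaded` is the dict returned by files.upload(): file name -> bytes.
    """
    text = uploaded[filename].decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


# In Colab you would pass the real upload result instead:
#   from google.colab import files
#   uploaded = files.upload()
# The dict below is made-up example data, not a real upload.
example_upload = {"data.csv": b"name,score\nada,9\nlinus,7\n"}
rows = rows_from_upload(example_upload, "data.csv")
print(rows[0]["name"], rows[0]["score"])
```

If you prefer pandas, the same bytes can be wrapped in io.BytesIO and passed to pandas.read_csv.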

For other ways to upload data to Google Colab, I would recommend the following blog post: https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92
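
One alternative to uploading is to mount your Google Drive inside the notebook. The sketch below wraps the mount call in a hypothetical helper, mount_drive, that simply returns None when the code is not running in Colab. The mount point /content/drive is the conventional location, but treat it as an assumption and adjust it if your setup differs.

```python
import os


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; return the mount point or None."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return None
    drive.mount(mount_point)  # Colab asks you to authorize access
    return mount_point


drive_root = mount_drive()
if drive_root is None:
    print("Not running in Colab; nothing was mounted.")
else:
    # List what is visible at the mount point
    print(os.listdir(drive_root))
```

Once mounted, files on Drive can be read with ordinary file paths below the mount point.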

3. What you get

Packages

Most packages you will need for data science are pre-installed on Google Colab. This is especially true for Google-made packages such as TensorFlow. Google has also introduced Swift for TensorFlow, which allows you to use the Swift programming language with TensorFlow directly in a Colab notebook. At the time of writing the project was still in beta, but it may be worth a look if you are interested.
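
To see which packages are actually available in your runtime, you can query the installed versions. This is a small sketch using only the Python standard library; the helper name installed_versions and the list of package names are just examples.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example package names; replace them with the ones your project needs
report = installed_versions(["numpy", "pandas", "scikit-learn", "tensorflow"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Anything that turns out to be missing can be installed from a notebook cell with !pip install followed by the package name.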

Computing resources

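Just like Kaggle, Google Colab provides you with free computing resources, including GPUs. Colab also offers TPU support. A TPU (Tensor Processing Unit) is an accelerator that Google designed for deep learning, and it can be faster than a GPU for such workloads. Keep in mind, though, that TensorFlow supports TPUs directly, whereas PyTorch needs the separate PyTorch/XLA package to run on them.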
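
A quick way to check whether your notebook actually has an accelerator attached is to ask TensorFlow which GPUs it can see. The following is a minimal sketch that only covers GPUs, not TPUs; the helper name visible_gpus is hypothetical, and it returns None if TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("GPUs:", gpus)
else:
    print("No GPU visible; check the runtime type of the notebook.")
```

In Colab the accelerator is selected under "Runtime" and then "Change runtime type" in the top menu.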

4. When to use

Collaboration

Google Colab is especially useful for group projects, since Colab notebooks can easily be shared through Google Drive.

Personal

Just like Kaggle, Google Colab can also be used to extend the computing resources of your own device. Whether you use Google Colab or Kaggle ultimately comes down to personal preference. I found Colab a bit of a pain to work with and therefore prefer Kaggle for this purpose.

For a good comparison between Google Colab and Kaggle I would suggest:
https://towardsdatascience.com/kaggle-vs-colab-faceoff-which-free-gpu-provider-is-tops-d4f0cd625029
