Willem de Beijer and Daan Kolkman
What is a virtual machine (VM) and why use it?
For day-to-day data science projects, you will often need no more than your own laptop. As your career progresses, however, it is inevitable that you will find yourself in need of more processing power. You could be working with a dataset that is simply too large to fit in RAM, or find that the model you are developing takes several days to train. In these cases, you might want to work with a virtual machine that has considerably more resources and can keep running while you do other things. In this tutorial series, we will demonstrate how to launch a virtual machine, install the necessary data science libraries, and transfer data to and from the virtual machine. We offer a step-by-step guide on how to accomplish this on the major cloud platforms: Microsoft Azure, Google Cloud, and Amazon Web Services.
What service should you use?
Ultimately, the major cloud platforms offer very similar functionality, so the choice between them is fairly arbitrary. There are slight differences in pricing (at the time of writing, Google Cloud is somewhat cheaper), but much depends on your exact requirements. Moreover, there are a couple of easier-to-use alternatives to launching your own VM that might work for your use case. If ease of use is what you are looking for, Kaggle Kernel and Google Colaboratory (Colab) are good picks. For more configuration options and more flexibility in terms of data storage and security, use AWS, Google Cloud, or Azure. Select a tutorial below and spin up your first virtual machine in a matter of minutes:
Amazon Web Services
Google Colab
Google Cloud
Kaggle Kernel
Microsoft Azure