Using AWS EC2 compute instances for Deep Neural Network Machine Learning

You can also use this guide for any Ubuntu Linux machine.

Motivation

In this post I document the practices I use most often, both to remind myself and to share this knowledge with others.

This post is written as an actual (executable) Linux Bash shell script in a Jupyter Notebook, which makes executing some of the commands easier. It assumes the notebook runs on the actual AWS instance.

AWS EC2 Instance Types

t2.micro

for free

Most of the time I run a t2.micro. It is very underpowered and slow (think Raspberry Pi), but it is OK for the following tasks:

  • to run Python Jupyter Notebook and execute Python on the CPU
  • to sync with GitHub
  • to transfer large DATA samples before I change the type to a more powerful instance

When training machine learning deep neural networks, I stop the t2.micro and change its type to the following GPU instance (the stop/resize/start cycle is sketched below):

p2.xlarge

for $0.16 to $0.67/hr

This EC2 instance is equivalent to a very powerful scientific workstation.

  • 1 NVIDIA K80 GPU with 2,496 parallel CUDA cores and 12 GiB of memory
  • 4 vCPUs, High Frequency Intel Xeon E5-2686 v4 (Broadwell) processors
  • 61 GiB RAM
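
The stop/resize/start cycle can also be scripted with the AWS CLI. A minimal sketch, assuming the AWS CLI is configured on your laptop (the instance ID below is a placeholder; the same steps work from the web console):

# stop the instance first; the type cannot be changed while it is running
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
# change the instance type to the GPU instance
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type "{\"Value\": \"p2.xlarge\"}"
# start it again, now as a p2.xlarge
aws ec2 start-instances --instance-ids i-0123456789abcdef0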

If I feel frisky (or reckless with money) I choose an even more powerful instance:

g2.8xlarge

for $0.65 to $1.43/hr

It is a true monster machine.

  • 4 NVIDIA GPUs, each with 1,536 CUDA cores (6,144 total) and 4 GiB of memory (16 GiB total)
  • 32 vCPUs, Intel Xeon E5-2670 (Sandy Bridge) processors
  • 60 GiB RAM
  • 2 x 120 GB SSD storage

Verify current prices

Verify the prices for your region (I host in Oregon), as spot prices change hourly depending on demand:

https://aws.amazon.com/ec2/spot/pricing/
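
If you prefer the command line, you can also query recent spot prices with the AWS CLI; a sketch, assuming the Oregon region (us-west-2) and the p2.xlarge type discussed above:

aws ec2 describe-spot-price-history --instance-types p2.xlarge --product-descriptions "Linux/UNIX" --region us-west-2 --max-items 5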

Check your AWS EC2 BILLING page religiously

https://console.aws.amazon.com/billing/home#/

Connecting to AWS instance

I have a static IP (an AWS Elastic IP) assigned to my primary instance, so the address never changes.

I have a domain name address that forwards to that IP so I do not have to remember the numbers.

I also created a script in ~/.bash_profile on my laptop so I can connect over SSH with this simple command:

aws_connect
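
For reference, the "script" is just a one-line alias; a sketch, assuming an Ubuntu AMI and a .pem key (the key path and host name are placeholders for my real ones):

# in ~/.bash_profile on the laptop
alias aws_connect='ssh -i ~/.ssh/aws_key.pem ubuntu@myinstance.example.com'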

Storage

Originally, I created the "Amazon Elastic Block Store" volume (a.k.a. an SSD drive) to be 8 GiB, which ran out immediately.

I had to change it to 30 GiB, which should still be within the free tier. 30 gigs does not sound like a lot, but for running machine learning models it should be enough.
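
If you need to do the same, the volume can be resized in place; a sketch, assuming the AWS CLI and a standard Ubuntu AMI (the volume ID and device name are placeholders; verify yours with lsblk):

# from the laptop: grow the EBS volume to 30 GiB
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 30
# on the instance: grow the partition, then the filesystem
sudo growpart /dev/xvda 1
sudo resize2fs /dev/xvda1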

To check the disk usage I run the following; currently I am at about 50%:

In [ ]:
# disk usage in 1K blocks; use df -h for human-readable sizes
df -k

Python

For all my current work I should have Python 3.5.x (no more, no less).

In [ ]:
python --version

kill running processes

In [ ]:
ps aux | grep python

# EXAMPLE
# ubuntu    1690  0.0  0.1 378096 68036 pts/0    Sl   16:00   0:12 /home/ubuntu/anaconda3/bin/python /home/ubuntu/anaconda3/bin/jupyte
# ubuntu    2522  0.0  0.0 556184 44604 ?        Ssl  20:43   0:00 /home/ubuntu/anaconda3/bin/python -m bash_kernel -f /run/user/1000/
# ubuntu    2531  0.0  0.0  21284  4996 pts/1    Ss+  20:43   0:00 /bin/bash --rcfile /home/ubuntu/anaconda3/lib/python3.5/site-packag
# ubuntu    3123  0.0  0.0  12944  1084 pts/3    S+   22:50   0:00 grep --color=auto python

# kill 1690
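
If you want a one-liner instead of hunting for the PID, pkill matches processes by their command line; run it from a terminal, not from this notebook (it would kill the notebook server itself):

pkill -f "jupyter notebook"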

Python conda Environment

I try to install most of my Python packages with Anaconda (conda); if I cannot find one, I use pip ("pip installs packages", a recursive acronym that pulls from the Python Package Index, go figure).

Sometimes I do experiment with different environments when there is a cataclysmic change, e.g. Python 2.7 -> 3.5 -> 3.6, that breaks everything.

I make sure I am in the conda environment I want to use:

In [ ]:
conda info --envs
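
When I do need a fresh environment (e.g. for a new Python version), creating and switching to one looks like this; a sketch where the environment name "py35" is just an example:

conda create -n py35 python=3.5 -y
source activate py35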
In [ ]:
# print out installed packages and their version numbers
conda list  | grep -E "keras|theano|tensorflow"

# expected tensorflow-gpu 1.0.1 
# Example of OLD: tensorflow 0.10.0rc0 np111py35_0 (see uninstall Tensorflow below)

Jupyter Notebook

I write 95% of my code in Python, Bash, Markdown or HTML in Jupyter Notebook. For the last 15 years I have used Java, but I program Java on my local laptop in Android Studio / IntelliJ IDEA.

Starting Jupyter Notebook (disconnected from terminal)

cd dev/
nohup jupyter notebook &
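
Because of nohup, the server's console output is appended to nohup.out in the directory where it was started, so I can check the log later:

In [ ]:
# the log lands where the server was started (dev/ in my case)
tail -n 20 nohup.out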

Keras

  • keras-2.0.1
  • theano-0.8.2
In [ ]:
pip install keras
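
A quick sanity check that the install worked, and which backend Keras picked up (it prints the backend on import):

In [ ]:
python -c "import keras; print(keras.__version__)"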

TensorFlow

Install TensorFlow GPU for Python 3.5

  • My version: tensorflow_gpu-1.0.1-cp35-cp35m-manylinux1_x86_64.whl (94.8MB)

Please note that TensorFlow changes rapidly, so if you started in ancient times (say, 6 months ago), you probably have some old garbage versions installed that no longer work.

In [ ]:
pip show tensorflow                                                                                  

# EXAMPLE OUTPUT with OLD VERSION:
# Name: tensorflow                                                                                                                    
# Version: 0.10.0rc0                                                                                                                  
# Summary: TensorFlow helps the tensors flow

Uninstall TensorFlow

Execute these from an actual terminal, not from here (the notebook).

pip uninstall protobuf
pip uninstall tensorflow
conda uninstall tensorflow
In [ ]:
pip install --upgrade --ignore-installed tensorflow-gpu
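
A quick import test to confirm the new version is the one Python actually loads (on a GPU instance this also exercises the CUDA libraries):

In [ ]:
python -c "import tensorflow as tf; print(tf.__version__)"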

Do NOT use conda to install TensorFlow (the conda package is old and broken):

$ conda install tensorflow

The following NEW packages will be INSTALLED:

    tensorflow: 0.10.0rc0-np111py35_0

In [ ]:
# once again make sure you have everything you want
conda list  | grep -E "keras|theano|tensorflow"

# expected tensorflow-gpu 1.0.1 or newer

Install CUDA

Installation instructions for CUDA on AWS Ubuntu

In [ ]:
# verify 64-bit architecture and the Ubuntu release
uname -m && cat /etc/*release

NVIDIA CUDA support

In [ ]:
lspci | grep -i nvidia
# expected Tesla K80 on p2.xlarge
In [ ]:
gcc --version
# expected 5.4 on Ubuntu
In [ ]:
# Version of your kernel
# expected 4.4.x
uname -r
In [ ]:
# Install Linux headers
sudo apt-get install linux-headers-$(uname -r)

wget CUDA

visit:

https://developer.nvidia.com/cuda-downloads

Select Platform: Linux > x86_64 > Ubuntu > 16.04 > deb(local)

RIGHT-CLICK on Download (1.9 GB) and COPY the URL of the file.

Note that there is a mistake in the instructions:

  • the instructions reference: ..8.0.61-1_amd64.deb
  • but the downloaded file is named: ..8.0.61-1_amd64-deb (a dash instead of the final dot)
In [ ]:
cd ~/Downloads/
pwd
In [ ]:
# do not copy the URL below, find the newest
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
In [ ]:
# do not copy, match your download
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
In [ ]:
# don't be shy to run this every day
sudo apt-get update

Run the following in Terminal

It requires you to type "Y(es)"

In [ ]:
sudo apt-get install cuda
In [ ]:
# verify version
cat /proc/driver/nvidia/version
# expected: NVIDIA UNIX x86_64 Kernel Module  375.26 or newer
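
The nvidia-smi utility installed with the driver is another quick check; it should list the GPU and its memory (one Tesla K80 on p2.xlarge):

In [ ]:
nvidia-smi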

Verify your NVIDIA CUDA works

Create a Python script (or notebook) and run the following:

# Show available CPU and GPU(s)
from tensorflow.python.client import device_lib

def get_available_CPU_GPU():
    devices = device_lib.list_local_devices()
    # return [x.name for x in devices if x.device_type == 'CPU']
    return [x.name for x in devices]

print(get_available_CPU_GPU())
# expected ['/cpu:0', '/gpu:0']
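
To go one step further than listing devices, here is a tiny smoke test that forces a computation onto the GPU; a sketch using the TensorFlow 1.x API (log_device_placement makes TensorFlow print which device ran each operation):

import tensorflow as tf

# pin a trivial computation to the first GPU
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0], name='a')
    b = tf.constant([4.0, 5.0, 6.0], name='b')
    c = a + b

# log_device_placement=True prints the device used for each op
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))  # expected [ 5.  7.  9.]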

Install additional Python packages

You can use a YAML file to create a whole environment at once, but that works well only in a brand-new setup.
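
For the record, a minimal sketch of such a setup; the environment name and package list below are illustrative, not my actual file:

# write a minimal environment.yml and build the environment from it
cat > environment.yml <<'EOF'
name: dnn
dependencies:
  - python=3.5
  - pip
  - pip:
    - tensorflow-gpu==1.0.1
    - keras==2.0.1
EOF
conda env create -f environment.yml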

In [ ]:
conda install opencv
In [ ]:
pip install eventlet
In [ ]:
pip install python-socketio

Update all Anaconda packages

In [ ]:
conda update --all

Commit your notes

Since this page is created with a Bash Jupyter Notebook, when I change it I have to commit the changes.

I decided to add this because these "little" things are usually omitted elsewhere, but they are essential in my workflow.

Remember to CLEAR OUTPUTS before committing!

In [ ]:
cd ~/dev/UkiDLucas.github.io
In [ ]:
git status
In [ ]:
git add .
In [ ]:
git commit -m "changes to AWS post"
In [ ]:
git push

Thank you for reading; if you find this valuable, please share it

and mention me on Twitter @UkiDLucas

https://twitter.com/ukidlucas