Use PyTorch to Easily Access Your GPU
Let’s say you are lucky enough to have access to a system with an Nvidia Graphics Processing Unit (GPU). Did you know there is an absurdly easy way to tap into your GPU’s capabilities using a Python library intended for, and predominantly used in, machine learning applications?
Don’t worry if you’re not up to speed on the ins and outs of ML, since we won’t be using it in this article. Instead, I’ll show you how to use the PyTorch library to access and use the capabilities of your GPU. We’ll compare the run times of Python programs using the popular numerical library NumPy, running on the CPU, with equivalent code using PyTorch on the GPU.
Before continuing, let’s quickly recap what a GPU and PyTorch are.
What is a GPU?
A GPU is a specialised electronic chip initially designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. Its utility as a rapid image manipulation device was based on its ability to perform many calculations simultaneously, and it’s still used for that purpose.
However, GPUs have recently become invaluable in machine learning and in the training and development of large language models. Their inherent ability to perform highly parallelizable computations makes them ideal workhorses in these fields, which rely on complex mathematical models and simulations.
What is PyTorch?
PyTorch is an open-source machine learning library developed by Facebook’s AI Research Lab. It’s widely used for natural language processing and computer vision applications. Two of the main reasons PyTorch can be used for GPU operations are:
One of PyTorch’s core data structures is the Tensor. Tensors are similar to arrays and matrices in other programming languages, but are optimised for running on a GPU.
PyTorch has CUDA support. PyTorch seamlessly integrates with CUDA, a parallel computing platform and programming model developed by NVIDIA for general computing on its GPUs. This allows PyTorch to access the GPU hardware directly, accelerating numerical computations, and lets developers write software that fully utilises GPU acceleration.
In summary, PyTorch’s support for GPU operations through CUDA and its efficient tensor manipulation capabilities make it an excellent tool for developing GPU-accelerated Python functions with high computational demands.
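If you’d like a quick taste of what this looks like in practice before we get to the full examples, here is a minimal sketch (my own illustration, not code from later in the article) of creating a tensor and moving it onto the GPU. It assumes a CUDA-capable GPU is available and falls back to the CPU if not.

import torch

# use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# create a large tensor and move it to the chosen device
a = torch.arange(1_000_000, dtype=torch.float32).to(device)

# element-wise arithmetic now runs on that device
b = a * 2.0 + 1.0
print(b.device)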
As we’ll show later on, you don’t have to be developing machine learning models or training large language models to make use of PyTorch.
In the rest of this article, we’ll set up our development environment, install PyTorch and run through a few examples where we’ll compare some computationally heavy PyTorch implementations with the equivalent numpy implementation and see what, if any, performance differences we find.
Pre-requisites
An Nvidia GPU
You need an Nvidia GPU on your system. To check your GPU, issue the following command at your system prompt. I’m using the Windows Subsystem for Linux.
$ nvidia-smi
PS C:\Users\thoma> nvidia-smi
Fri Mar 22 11:41:34 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.61 Driver Version: 551.61 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 Ti WDDM | 00000000:01:00.0 On | N/A |
| 32% 24C P8 9W / 285W | 843MiB / 12282MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1268 C+G ...tility\HPSystemEventUtilityHost.exe N/A |
| 0 N/A N/A 2204 C+G ...ekyb3d8bbwe\PhoneExperienceHost.exe N/A |
| 0 N/A N/A 3904 C+G ...cal\Microsoft\OneDrive\OneDrive.exe N/A |
| 0 N/A N/A 7068 C+G ...CBS_cw5n
etc ..
If that command isn’t recognised and you’re sure you have a GPU, it probably means you’re missing an NVIDIA driver. Just follow the rest of the instructions in this article, and it should be installed as part of that process.
Nvidia GPU drivers
While PyTorch installation packages can include CUDA libraries, your system must still have the appropriate NVIDIA GPU drivers installed. These drivers are necessary for your operating system to communicate with the graphics processing unit (GPU) hardware. The CUDA toolkit includes drivers, but if you’re using PyTorch’s bundled CUDA, you only need to ensure that your GPU drivers are current.
Go to the driver download page on the NVIDIA website and install the latest drivers compatible with your system and GPU specifications.
Setting up our development environment
As a best practice, we should set up a separate development environment for each project. I use conda, but use whatever method suits you.
If you want to go down the conda route and don’t already have it, you must install Miniconda or Anaconda first.
Please note that, at the time of writing, PyTorch currently only officially supports Python versions 3.8 to 3.11.
# create our test environment
$ conda create -n pytorch_test python=3.11 -y
Now activate your new environment.

$ conda activate pytorch_test
We now need to get the appropriate conda install command for PyTorch. This will depend on your operating system, chosen programming language, preferred package manager, and CUDA version.
Luckily, PyTorch provides a useful web interface that makes this easy to set up. So, to get started, head over to the PyTorch website at pytorch.org.
Click on the Get Started link near the top of the screen. From there, scroll down a little until you see this,
Image from Pytorch website
Click on each box in the appropriate position for your system and specs. As you do, you’ll see that the command in the Run this Command output field changes dynamically. When you’re done making your choices, copy the final command text shown and type it into your command window prompt.
For me, this was:

$ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y
We’ll also install Jupyter, Pandas, and Matplotlib so that we can run our example Python code in a notebook.

$ conda install pandas matplotlib jupyter -y
Now type in jupyter notebook into your command prompt. You should see a jupyter notebook open in your browser. If that doesn’t happen automatically, you’ll likely see a screenful of information after the jupyter notebook command.
Near the bottom, there will be a URL that you should copy and paste into your browser to initiate the Jupyter Notebook.
Your URL will be different from mine, but it will typically be of the form http://localhost:8888/ followed by a token parameter.
Testing our setup
The first thing we’ll do is test our setup. Please enter the following into a Jupyter cell and run it.
import torch
x = torch.rand(5, 3)
print(x)

You should see a 5 x 3 tensor of random values printed.

Additionally, to check if your GPU driver and CUDA are enabled and accessible by PyTorch, run the following commands:
import torch
torch.cuda.is_available()

This should output True if all is OK.
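If you’d like a little more detail about the device PyTorch has found, the following optional check (a small sketch of my own, using standard torch.cuda calls) prints the number of visible CUDA devices and the name of the first one:

import torch

if torch.cuda.is_available():
    # how many CUDA-capable GPUs PyTorch can see
    print("Device count:", torch.cuda.device_count())
    # the name of device 0, which should match the card shown by nvidia-smi
    print("Device name:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device detected")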
If everything is okay, we can proceed to our examples. If not, go back and check your installation processes.
NB: In the timings below, I ran each of the NumPy and PyTorch processes several times in succession and took the best time for each. This does favour the PyTorch runs somewhat, as there is a small overhead on the very first invocation of each PyTorch run, but overall I think it’s a fairer comparison.
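If you want to reproduce that best-of-several-runs approach, here is one way you might structure it. This is a sketch of the general idea rather than the exact harness I used; the essential detail is calling torch.cuda.synchronize() before stopping the clock, because GPU kernels are launched asynchronously.

import torch
from timeit import default_timer as timer

def best_time(fn, repeats=5):
    # run fn several times and keep the fastest wall-clock time
    times = []
    for _ in range(repeats):
        start = timer()
        fn()
        if torch.cuda.is_available():
            # make sure all queued GPU work has finished before we stop timing
            torch.cuda.synchronize()
        times.append(timer() - start)
    return min(times)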
Example 1 — A simple array math operation.
In this example, we set up two large, identical one-dimensional arrays and perform a simple addition to each array element.
import numpy as np
import torch as pt
from timeit import default_timer as timer
# func1 will run on the CPU
def func1(a):
    a += 1

# func2 will run on the GPU
def func2(a):
    a += 2

if __name__ == "__main__":
    n1 = 300000000
    a1 = np.ones(n1)
    # had to make this array much smaller than
    # the others due to slow loop processing on the GPU
    n2 = 300000000
    a2 = pt.ones(n2)

    start = timer()
    func1(a1)
    print("Timing with CPU:numpy", timer() - start)

    start = timer()
    func2(a2)
    # wait for all calcs on the GPU to complete
    pt.cuda.synchronize()
    print("Timing with GPU:pytorch", timer() - start)

    print("a1 =", a1)
    print("a2 =", a2)
Timing with CPU:numpy 0.1334826999955112
Timing with GPU:pytorch 0.10177790001034737
a1 = [2. 2. 2. ... 2. 2. 2.]
a2 = tensor([3., 3., 3.,  ..., 3., 3., 3.])

We see a slight improvement when using PyTorch over NumPy, but we missed one crucial point. We haven’t actually used the GPU, because our PyTorch tensor data is still in CPU memory.
To move the data to the GPU memory, we need to add the device='cuda' directive when creating the tensor. Let’s do that and see if it makes a difference.
# Same code as above except
# to get the array data onto the GPU memory
# we changed
a2 = pt.ones(n2)
# to
a2 = pt.ones(n2, device='cuda')

After re-running with the changes we get,
Timing with CPU:numpy 0.12852740001108032
Timing with GPU:pytorch 0.011292399998637848
a1 = [2. 2. 2. ... 2. 2. 2.]
a2 = tensor([3., 3., 3.,  ..., 3., 3., 3.], device='cuda:0')

That’s more like it: a greater than 10x speed-up.
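If you’re ever unsure whether a tensor is actually sitting in GPU memory, you can ask the tensor itself. A quick illustrative check (not part of the timed code above):

import torch as pt

a2 = pt.ones(10, device='cuda')
print(a2.device)   # cuda:0
print(a2.is_cuda)  # True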
Example 2 — A slightly more complex array operation.
For this example, we’ll multiply multi-dimensional matrices using the built-in matmul operations available in the PyTorch and Numpy libraries. Each array will be 10000 x 10000 and contain random floating-point numbers between 1 and 100.
# NUMPY first
import numpy as np
from timeit import default_timer as timer
# Set the seed for reproducibility
np.random.seed(0)

# Generate two 10000x10000 arrays of random floating point numbers between 1 and 100
A = np.random.uniform(low=1.0, high=100.0, size=(10000, 10000)).astype(np.float32)
B = np.random.uniform(low=1.0, high=100.0, size=(10000, 10000)).astype(np.float32)

# Perform matrix multiplication
start = timer()
C = np.matmul(A, B)

# Due to the large size of the matrices, it's not practical to print them entirely.
# Instead, we print a small portion to verify.
print("A small portion of the result matrix:\n", C[:5, :5])
print("Without GPU:", timer() - start)

The output was:

A small portion of the result matrix:
[[ ... ]]
Without GPU: 1.4450852000009036
Now for the PyTorch version.
import torch
from timeit import default_timer as timer
# Set the seed for reproducibility
torch.manual_seed(0)

# Use the GPU
device = 'cuda'

# Generate two 10000x10000 tensors of random floating point
# numbers between 1 and 100 and move them to the GPU
A = torch.FloatTensor(10000, 10000).uniform_(1, 100).to(device)
B = torch.FloatTensor(10000, 10000).uniform_(1, 100).to(device)

# Perform matrix multiplication
start = timer()
C = torch.matmul(A, B)

# Wait for all current GPU operations to complete
torch.cuda.synchronize()

# Due to the large size of the matrices, it's not practical to print them entirely.
# Instead, we print a small portion to verify.
print("A small portion of the result matrix:\n", C[:5, :5])
print("With GPU:", timer() - start)

The output was:

A small portion of the result matrix:
tensor([[ ... ]], device='cuda:0')
With GPU: 0.07081239999388345
This time, the PyTorch run was around 20 times faster than the NumPy run. Great stuff.
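As an aside, you can also generate the random matrices directly in GPU memory rather than building them on the CPU and calling .to(device), which avoids the host-to-device copy altogether. A minimal sketch of that variation (my own, not the code timed above):

import torch

device = 'cuda'

# torch.rand produces values in [0, 1), so scale them into the 1 to 100 range
A = torch.rand(10000, 10000, device=device) * 99 + 1
B = torch.rand(10000, 10000, device=device) * 99 + 1

C = torch.matmul(A, B)
torch.cuda.synchronize()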
Example 3 — Combining CPU and GPU code.
Sometimes, not all of your processing can be done on a GPU. An everyday use case for this is graphing data. Sure, you can manipulate your data using the GPU, but often the next step is to see what your final dataset looks like using a plot.
You can’t plot data if it resides in the GPU memory, so you must move it back to CPU memory before calling your plotting functions. Is it worth the overhead of moving large chunks of data from the GPU to the CPU? Let’s find out.
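The transfer itself is a one-liner: call .cpu() on the tensor, and usually .numpy() as well, since Matplotlib works with NumPy arrays. A tiny illustration with made-up variable names:

import torch

y_gpu = torch.rand(1000, device='cuda')  # some result living in GPU memory
y_host = y_gpu.cpu().numpy()             # copy it back to CPU memory as a NumPy array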
In this example, we will solve a polar equation for values of θ between 0 and 2π, convert the results to (x, y) coordinate terms, and then plot out the resulting graph.
Don’t get too hung up on the math. It’s just an equation that, when converted to use the x, y coordinate system and solved, looks nice when plotted.
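For reference, the standard polar-to-Cartesian conversion used in the code below is:

x = r * cos(θ)
y = r * sin(θ)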
For even a few million values of x and y, NumPy can solve this in milliseconds, so to make it a bit more interesting, we’ll use 100 million (x, y) coordinates.
Here is the numpy code first.
%%time
import numpy as np
import matplotlib.pyplot as plt
from time import time as timer
start = timer()

# create an array of 100M thetas between 0 and 2pi
theta = np.linspace(0, 2 * np.pi, 100000000)

# our original polar formula
r = 1 + 3/4 * np.sin(3 * theta)

# calculate the equivalent x and y coordinates
# for each theta
x = r * np.cos(theta)
y = r * np.sin(theta)

# see how long the calc part took
print("Calculation time:", timer() - start)

# Now plot out the data
start = timer()
plt.plot(x, y)

# see how long the plotting part took
print("Plotting time:", timer() - start)
Here is the output. Would you have guessed beforehand that it would look like this? I sure wouldn’t have!
Now, let’s see what the equivalent PyTorch implementation looks like and how much of a speed-up we get.
%%time
import torch as pt
import matplotlib.pyplot as plt
from time import time as timer
# Make sure PyTorch is using the GPU
device = 'cuda'
# Start the timer
start = timer()

# Creating the theta tensor on the GPU
theta = pt.linspace(0, 2 * pt.pi, 100000000, device=device)

# Calculating r, x, and y using PyTorch operations on the GPU
r = 1 + 3/4 * pt.sin(3 * theta)
x = r * pt.cos(theta)
y = r * pt.sin(theta)

# Moving the result back to CPU for plotting
x_cpu = x.cpu().numpy()
y_cpu = y.cpu().numpy()

# wait for any outstanding GPU work before stopping the timer
pt.cuda.synchronize()
print("Calculation time:", timer() - start)

# Plotting
start = timer()
plt.plot(x_cpu, y_cpu)
plt.show()
print("Plotting time:", timer() - start)
And our output again.
The calculation part was about 10 times faster than the NumPy calculation. The data plotting took around the same time in both the PyTorch and NumPy versions, which was expected since by then the data was back in CPU memory and the GPU played no further part in the processing.
But, overall, we shaved about 40% off the total run-time, which is excellent.
Summary
This article has demonstrated how to leverage an NVIDIA GPU using PyTorch, a machine learning library typically used for AI applications, to accelerate non-ML numerical Python code. It compares standard NumPy (CPU) implementations with GPU-accelerated PyTorch equivalents to show the performance benefits of running tensor-based operations on a GPU.
You don’t need to be doing machine learning to benefit from PyTorch. If you can access an NVIDIA GPU, PyTorch provides a simple and effective way to significantly speed up computationally intensive numerical operations—even in general-purpose Python code.