# Vectorization in Python

In deep learning we often deal with large data sets, so it is important that our code runs quickly; otherwise it can take a long time to get results. That is why vectorization has become a key skill.

For example, in logistic regression we need to compute $w^{\top} x$. In a non-vectorized implementation we can use the following code:

$z = w^{\top} x + b$

Non-vectorized:

$\begin{array}{l} z = 0 \\ \text{for } i \text{ in range}(n_x)\text{:} \\ \quad z \mathrel{+}= w[i] * x[i] \\ z \mathrel{+}= b \end{array}$

In contrast, a vectorized implementation can be made using NumPy in Python. This implementation is much faster to compute.

```python
import numpy as np
import time

# Vectorized implementation
a = np.random.rand(1000000)  # million-dimensional array
b = np.random.rand(1000000)

tic = time.time()
c = np.dot(a, b)  # both a and b are arrays
toc = time.time()

print("Vectorized version:" + str(1000*(toc-tic)) + "ms")
## Vectorized version:15.591144561767578ms

# Non-vectorized implementation
c = 0
tic = time.time()
for i in range(1000000):
    c += a[i] * b[i]
toc = time.time()

print("For loop:" + str(1000*(toc-tic)) + "ms")
## For loop:422.9731559753418ms
```

As we can see from the output above, the non-vectorized version (the for loop) takes much longer (about 423 ms) than the vectorized version (about 15.6 ms). This means that when we implement a deep learning algorithm, we get results back much faster if we vectorize our code. Deep learning is also often scaled up on a GPU (Graphics Processing Unit), and both CPUs and GPUs have parallelization instructions called SIMD, which stands for Single Instruction, Multiple Data. When we use a built-in function such as np.dot(), SIMD enables NumPy to take much better advantage of parallelism and perform our computations much faster. This is true on both CPUs and GPUs.
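To tie this back to logistic regression, here is a minimal sketch of the vectorized $z = w^{\top} x + b$ computation next to its loop equivalent. The weight, feature, and bias values below are made-up numbers for illustration only:

```python
import numpy as np

# Hypothetical small example with n_x = 4 features (values are made up)
w = np.array([0.5, -0.2, 0.1, 0.3])
x = np.array([1.0, 2.0, 3.0, 4.0])
b = 0.1

# Vectorized: one call to np.dot replaces the explicit loop
z = np.dot(w, x) + b

# Non-vectorized: accumulate w[i] * x[i] one term at a time
z_loop = 0.0
for i in range(len(x)):
    z_loop += w[i] * x[i]
z_loop += b

print(z)  # same value as z_loop, up to floating-point rounding
```

Both versions compute the same number; the vectorized one simply lets NumPy do the sum of products in optimized, parallel code.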