When we implement **backpropagation**, there is a test called **Gradient Checking** that helps to make sure the implementation of backpropagation is correct.

Looking at the figure above, we get a much better estimate of the gradient if we approximate the derivative with a larger, two-sided triangle that spans from θ − ε to θ + ε instead of a one-sided one. The height of this triangle is f(θ + ε) − f(θ − ε) and its width is 2ε, so its slope approximates the derivative g(θ). For example, with f(θ) = θ³, θ = 1 and ε = 0.01:

\[
\begin{aligned}
\frac{f(\theta+\epsilon)-f(\theta-\epsilon)}{2\epsilon} &\approx g(\theta) \\
\frac{(1.01)^{3}-(0.99)^{3}}{2(0.01)} &= 3.0001 \\
g(\theta) = 3\theta^{2} &= 3
\end{aligned}
\]
From the calculation above, the two-sided estimate 3.0001 is extremely close to the true derivative 3, so the approximation error is only 0.0001.

This method is called the **Two-Sided Approximation**, because we perturb θ by ε on both sides.
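As a quick sanity check, here is a minimal Python sketch of the same calculation. The values f(θ) = θ³, θ = 1 and ε = 0.01 come from the example above; the helper name `two_sided_approx` is just for illustration.

```python
def f(theta):
    # The example function from the worked calculation above: f(theta) = theta^3
    return theta ** 3

def two_sided_approx(f, theta, epsilon=0.01):
    # Two-sided difference: (f(theta + eps) - f(theta - eps)) / (2 * eps)
    return (f(theta + epsilon) - f(theta - epsilon)) / (2 * epsilon)

theta = 1.0
approx = two_sided_approx(f, theta)      # ~3.0001
exact = 3 * theta ** 2                   # g(theta) = 3 * theta^2 = 3
print(approx, exact, abs(approx - exact))  # error ~0.0001
```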

It is important to remember that:

1 - this method is not used during the training of the neural network, but only to debug the backpropagation implementation

2 - if the cost function J includes regularization, remember to include the regularization term both when computing J(θ) and in the analytic gradients, otherwise the two estimates will be far apart

3 - it doesn't work with dropout, because with dropout the cost function J is very difficult to compute

4 - run gradient checking at random initialization (and perhaps again after some training): it is possible that the implementation is correct only while the weights are close to zero and breaks as they move away from zero.
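Putting the pieces together, here is a minimal NumPy sketch of how the full check might look. The function names, the choice ε = 1e-7, and the rough acceptance threshold are illustrative assumptions, not part of the course notes: the idea is to perturb each parameter component with the two-sided difference, build a numerical gradient vector, and compare it to the backpropagation gradient with a normalized distance.

```python
import numpy as np

def gradient_check(cost_fn, grad_fn, theta, epsilon=1e-7):
    """Compare backprop gradients against two-sided numerical gradients.

    cost_fn(theta) -> scalar cost J(theta)
    grad_fn(theta) -> gradient vector from backpropagation
    theta          -> flattened parameter vector (1-D numpy array)
    """
    grad_backprop = grad_fn(theta)
    grad_approx = np.zeros_like(theta)

    # Two-sided difference, one parameter component at a time.
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus[i] -= epsilon
        grad_approx[i] = (cost_fn(theta_plus) - cost_fn(theta_minus)) / (2 * epsilon)

    # Normalized distance between the two gradient vectors.
    numerator = np.linalg.norm(grad_approx - grad_backprop)
    denominator = np.linalg.norm(grad_approx) + np.linalg.norm(grad_backprop)
    return numerator / denominator  # roughly < 1e-7 suggests backprop is likely correct

# Toy usage: J(theta) = sum(theta^2), so the true gradient is 2 * theta.
theta = np.random.randn(5)
print(gradient_check(lambda t: np.sum(t ** 2), lambda t: 2 * t, theta))
```

In a real network the weight matrices and bias vectors would be flattened into a single vector before the check and reshaped back afterwards.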

Reference: **Coursera Deep Neural Network course**