Autograd

Automatic differentiation, also referred to as automatic gradient computation or autograd, is at the heart of PyGrad’s design. PyGrad computes gradients by building a computational graph on the fly, following a define-by-run paradigm that prioritizes ease of use.
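
In a define-by-run scheme, the graph is constructed as ordinary Python code executes, so regular control flow such as loops determines the graph’s shape. Below is a minimal sketch using the Variable class introduced in the next section; the printed gradient assumes the behavior shown later under Backpropagation.

>>> from pygrad import Variable
>>> x = Variable(2)
>>> y = x
>>> for _ in range(3):    # each iteration adds a multiplication node to the graph
...     y = y * x
...
>>> y.backward()          # y = x ** 4, so dy/dx = 4 * x ** 3 = 32 at x = 2
>>> x.grad
Variable(32)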

Variable Class

PyGrad adds a layer of abstraction on top of NumPy’s ndarray class. For the most part, a Variable behaves much like an ndarray object.

>>> from pygrad import Variable
>>> a = Variable(5)
>>> b = Variable(3)
>>> a + b
Variable(8)

PyGrad can, of course, also deal with arrays and tensors.

>>> m1 = Variable([[1, 2], [3, 4]])
>>> m2 = Variable([[1, 1], [1, 1]])
>>> m1 + m2
Variable([[2 3]
          [4 5]])

Since PyGrad uses NumPy as its backend, it supports broadcasting as well as other matrix operations including transpose, reshape, and matrix multiplication.

>>> m3 = b + m1
>>> m3
Variable([[4 5]
          [6 7]])
>>> m3.reshape(1, 4)
Variable([[4 5 6 7]])
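
Broadcasting is not limited to scalars; arrays with compatible shapes broadcast the same way they would in NumPy. A short sketch (the exact repr may differ slightly):

>>> row = Variable([10, 20])
>>> m1 + row
Variable([[11 22]
          [13 24]])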

Function Class

PyGrad constructs computation graphs by creating references between variables and functions. Specifically, each PyGrad Function object holds pointers to both its input and output Variable instances. Each Variable instance, in turn, holds a reference to its creator, which is the Function object that produced it. The example below demonstrates this relationship using the Square class.

>>> from pygrad import functions as F
>>> x = Variable(2)
>>> square = F.Square()
>>> y = square(x)

Given this setup, we can now access the function that created y, which, as expected, is a Square instance.

>>> y.creator
<pygrad.functions.Square object at 0x7fa63e090fd0>

The square function also has pointers to both its input and output Variables.

>>> square.inputs
[Variable(2)]
>>> square.outputs
[<weakref at 0x7fa63ea26650; to 'Variable' at 0x7fa63ab18350>]

In this case, the square function has only one input and one output; for non-unary functions, Function.inputs will return a list of two or more Variables.
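
For instance, a variable produced by a binary operation such as addition should have a creator that holds both operands. A minimal sketch, assuming operator overloads build Function nodes in the same way as the example above:

>>> x1 = Variable(2)
>>> x2 = Variable(3)
>>> s = x1 + x2
>>> s.creator.inputs      # the addition function references both operands
[Variable(2), Variable(3)]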

Backpropagation

Every Variable instance has data and grad attributes. data stores the value of the variable itself as an ndarray, whereas grad stores the value of its gradient. When first initialized, every Variable instance has a grad value of None.

>>> x = Variable(1)
>>> x.data
array(1)
>>> print(x.grad)
None

However, once PyGrad functions are applied to a Variable instance, it becomes part of a newly constructed computation graph. PyGrad is then ready to perform backpropagation, which gives grad an actual value. To obtain gradients, simply call the backward() method on the output variable.

>>> x = Variable(3)
>>> y = x * x
>>> y.backward()
>>> x.grad
Variable(6)

When backward() is called on a Variable instance, PyGrad traverses the computation graph to backpropagate gradients through the entire chain. In doing so, it calls the backward() method of each Function callable that participates in the graph as the creator of some Variable object.
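
As a rough illustration of this traversal, consider chaining two Square functions; calling backward() on the final output invokes the backward() of both Square instances along the chain (the printed value assumes the gradient behavior shown above).

>>> x = Variable(2)
>>> y = F.Square()(x)     # y = x ** 2
>>> z = F.Square()(y)     # z = x ** 4
>>> z.backward()          # traverses both Square nodes in reverse order
>>> x.grad
Variable(32)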

Memory Optimization

PyGrad implements a few optimizations to make computation more efficient.

Weak References

For memory efficiency and garbage collection purposes, PyGrad stores weakref objects instead of direct references, as seen in Function.outputs above. This breaks the circular references that would otherwise form between functions and variables, allowing Python to garbage collect unused parts of the graph more efficiently.
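
Because weakrefs can be dereferenced by calling them, the output Variable remains reachable as long as it is alive elsewhere. A minimal sketch, reusing the Square example from above:

>>> x = Variable(2)
>>> y = F.Square()(x)
>>> out_ref = y.creator.outputs[0]   # the weakref stored by the function
>>> out_ref() is y                   # calling the weakref returns the Variable
True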

Non-retained Gradients

By default, PyGrad erases the gradient values of intermediate Variable objects in the computation graph.

>>> x = Variable(3)
>>> y = x * x
>>> z = y + y
>>> z.backward()
>>> print(y.grad)
None

This behavior is desirable since most computations only require the gradients of a few parameters of interest, and removing the gradients of intermediate variables saves memory. However, it is possible to suppress this behavior by explicitly passing retain_grad=True to the backward() call.

>>> z.backward(retain_grad=True)
>>> z.grad
Variable(1)
>>> y.grad
Variable(2)
>>> x.grad
Variable(12)