Autograd¶
Automatic differentiation, also referred to as automatic gradient computation or autograd, is at the heart of PyGrad’s design. PyGrad computes gradient values by building a computational graph, following a define-by-run paradigm that maximizes ease of usability.
Variable Class¶
PyGrad adds a layer of abstraction on top of NumPy’s ndarray
class. For the most part, Variable
acts much like a ndarray
object.
>>> from pygrad import Variable
>>> a = Variable(5)
>>> b = Variable(3)
>>> a + b
Variable(8)
PyGrad can, of course, also deal with arrays and tensors.
>>> m1 = Variable([[1, 2], [3, 4]])
>>> m2 = Variable([[1, 1], [1, 1]])
>>> m1 + m2
Variable([[2 3]
[4 5]])
Since PyGrad uses NumPy as its backend, it supports broadcasting as well as other matrix operations including transpose, reshape, and matrix multiplication.
>>> m3 = a + m1
>>> m3
Variable([[4 5]
[6 7]])
>>> m3.reshape(1, 4)
Variable([[4 5 6 7]])
Function Class¶
PyGrad constructs computation graphs by creating a reference between variables and functions. Specifically, each PyGrad Function
object has pointers to both its input and output Variable
instances. Variable
instances, in turn, contain reference to its creator
, which is a Function
object. Below is a simple example that demonstrates this relationship.
In this example, we demonstrate this relationship via the Square
class.
>>> from pygrad import functions as F
>>> x = Variable(2)
>>> square = F.Square()
>>> y = square(x)
Given this setup, we can now access the function that created y
, which, as expected, is a Square
instance.
>>> y.creator
<pygrad.functions.Square object at 0x7fa63e090fd0>
The square
function also has pointers to both its input and output Variables
.
>>> square.inputs
[Variable(2)]
>>> square.outputs
[<weakref at 0x7fa63ea26650; to 'Variable' at 0x7fa63ab18350>]
In this case, the square
function only has one input and output; for other non-unary functions, Function.inputs
will return a list with two or more Variable
s.
Backpropagation¶
Every Variable
instance has data
and grad
attributes. data
stores the value of the variable itself as a ndarray
, whereas grad
stores the values of its gradients. When first initialized, all Variable
instances have a grad
value of None
.
>>> x = Variable(1)
>>> x.data
array(1)
>>> print(x.grad)
None
However, when PyGrad functions are applied on the Variable
instance, it now belongs to a portion of a newly constructed computation graph. This means that PyGrad is ready to perform backpropagation, thus making grad
take an actual value. To obtain gradients, simply call the backward()
method on a leaf variable.
>>> x = Variable(3)
>>> y = x * x
>>> y.backward()
>>> x.grad
Variable(6)
When a backward()
method is called on a Variable
instance, PyGrad traverses the computation graph to backpropagate throughout the entire chain. In doing so, it calls on the backward()
method of each Function
callable that are also part of the graph as creator
s of some Variable
objects.
Memory Optimization¶
PyGrad implements a few optimizations to make computation more efficient.
Weak References¶
For memory efficiency and garbage collection purposes, PyGrad stores weakref
objects instead of the reference themselves. This eliminates circular references, thus allowing Python to garbage collect more efficiently.
Non-retained Gradients¶
By default, PyGrad will erase the gradient values of any intermediate Variable
object in the middle of the computation graph.
>>> x = Variable(3)
>>> y = x * x
>>> z = y + y
>>> z.backward()
>>> print(y.grad)
None
This behavior is desirable since most computations only require the gradient of some parameter of interest. Removing the gradients of intermediary variables can save memory. However, it is possible to suppress this behavior by explicitly setting retain_grad=True
in the backward()
call.
>>> z.backward(retain_grad=True)
>>> z.grad
Variable(1)
>>> y.grad
Variable(2)
>>> x.grad
Variable(12)