# Autograd Automatic differentiation, also referred to as automatic gradient computation or autograd, is at the heart of PyGrad's design. PyGrad computes gradient values by building a computational graph, following a define-by-run paradigm that maximizes ease of usability. ## Variable Class PyGrad adds a layer of abstraction on top of NumPy's `ndarray` class. For the most part, `Variable` acts much like a `ndarray` object. ```python >>> from pygrad import Variable >>> a = Variable(5) >>> b = Variable(3) >>> a + b Variable(8) ``` PyGrad can, of course, also deal with arrays and tensors. ```python >>> m1 = Variable([[1, 2], [3, 4]]) >>> m2 = Variable([[1, 1], [1, 1]]) >>> m1 + m2 Variable([[2 3] [4 5]]) ``` Since PyGrad uses NumPy as its backend, it supports broadcasting as well as other matrix operations including transpose, reshape, and matrix multiplication. ```python >>> m3 = a + m1 >>> m3 Variable([[4 5] [6 7]]) >>> m3.reshape(1, 4) Variable([[4 5 6 7]]) ``` ## Function Class PyGrad constructs computation graphs by creating a reference between variables and functions. Specifically, each PyGrad `Function` object has pointers to both its input and output `Variable` instances. `Variable` instances, in turn, contain reference to its `creator`, which is a `Function` object. Below is a simple example that demonstrates this relationship. In this example, we demonstrate this relationship via the `Square` class. ```python >>> from pygrad import functions as F >>> x = Variable(2) >>> square = F.Square() >>> y = square(x) ``` Given this setup, we can now access the function that created `y`, which, as expected, is a `Square` instance. ```python >>> y.creator ``` The `square` function also has pointers to both its input and output `Variables`. ```python >>> square.inputs [Variable(2)] >>> square.outputs [] ``` In this case, the `square` function only has one input and output; for other non-unary functions, `Function.inputs` will return a list with two or more `Variable`s. ## Backpropagation Every `Variable` instance has `data` and `grad` attributes. `data` stores the value of the variable itself as a `ndarray`, whereas `grad` stores the values of its gradients. When first initialized, all `Variable` instances have a `grad` value of `None`. ```python >>> x = Variable(1) >>> x.data array(1) >>> print(x.grad) None ``` However, when PyGrad functions are applied on the `Variable` instance, it now belongs to a portion of a newly constructed computation graph. This means that PyGrad is ready to perform backpropagation, thus making `grad` take an actual value. To obtain gradients, simply call the `backward()` method on a leaf variable. ```python >>> x = Variable(3) >>> y = x * x >>> y.backward() >>> x.grad Variable(6) ``` When a `backward()` method is called on a `Variable` instance, PyGrad traverses the computation graph to backpropagate throughout the entire chain. In doing so, it calls on the `backward()` method of each `Function` callable that are also part of the graph as `creator`s of some `Variable` objects. ## Memory Optimization PyGrad implements a few optimizations to make computation more efficient. ### Weak References For memory efficiency and garbage collection purposes, PyGrad stores `weakref` objects instead of the reference themselves. This eliminates circular references, thus allowing Python to garbage collect more efficiently. ### Non-retained Gradients By default, PyGrad will erase the gradient values of any intermediate `Variable` object in the middle of the computation graph. ```python >>> x = Variable(3) >>> y = x * x >>> z = y + y >>> z.backward() >>> print(y.grad) None ``` This behavior is desirable since most computations only require the gradient of some parameter of interest. Removing the gradients of intermediary variables can save memory. However, it is possible to suppress this behavior by explicitly setting `retain_grad=True` in the `backward()` call. ```python >>> z.backward(retain_grad=True) >>> z.grad Variable(1) >>> y.grad Variable(2) >>> x.grad Variable(12) ```