# Math20512

## How does the chain rule work in 2 variables?

### Single variable case:

Recall from A-level that if f(x) is a function of x and x is a function of t, then the chain rule says that $\frac{\mathrm{d} f}{\mathsf{d} t} = \frac{\mathsf{d} f}{\mathsf{d} x}\;\frac{\mathsf{d}x}{\mathsf{d}t},$ as I hope you remember.

### 2 variable case:

Let f(x, y) be a function of 2 variables, and suppose each of x and y depends on time t. Then $\frac{\mathrm{d}}{\mathsf{d}t}f(x,y) = \frac{\mathrm{d}}{\mathsf{d}t}f(x(t),y(t)) = \frac{\partial f}{\partial x}\;\frac{\mathsf{d}x}{\mathsf{d}t} + \frac{\partial f}{\partial y}\;\frac{\mathsf{d}y}{\mathsf{d}t}.$

NB It's important to distinguish between partial derivatives and ordinary derivatives. A partial $$\partial$$ should only be used when you are holding some of the variables constant. You are not doing that when you ask for $$\frac{\mathrm{d}}{\mathsf{d}t}f(x,y)$$, where both x and y vary with t, but in $$\frac{\partial f}{\partial x}$$ you are holding $$y$$ constant. (If it's not clear, come and ask me about it - it's important!)
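
To see the 2-variable formula in action, here is a small worked example; the function and the path are chosen purely for illustration. Take $$f(x,y)=x^2y$$ with $$x=\cos t$$ and $$y=\sin t$$. Then $$\partial f/\partial x = 2xy$$, $$\partial f/\partial y = x^2$$, $$\dot x=-\sin t$$ and $$\dot y=\cos t$$, so $\frac{\mathsf{d}}{\mathsf{d}t}f(x,y) = 2xy\,(-\sin t) + x^2\cos t = \cos^3 t - 2\cos t\,\sin^2 t.$ As a check, substituting first gives $$f=\cos^2 t\,\sin t$$, and differentiating this directly with the product rule gives the same answer.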

### n variables:

Extending from 2 to n variables doesn't really change anything:
Let $$f(x_1,\dots,x_n)$$ be a function of n variables, and suppose each $$x_i$$ depends on time t. Then $\frac{\mathsf{d}}{\mathsf{d}t}f(x_1,\dots,x_n) = \sum_{j=1}^n \frac{\partial f}{\partial x_j}\;\frac{\mathsf{d}x_j}{\mathsf{d}t}.$
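
If you want to convince yourself numerically, the n-variable formula can be checked with a few lines of Python. This is only a sketch: the function `f`, the path `path` and the step size `h` are made up for illustration, and all derivatives are estimated by central differences rather than computed exactly.

```python
import math

def f(x):
    """Hypothetical example function: f(x1, x2, x3) = x1*x2 + sin(x3)."""
    x1, x2, x3 = x
    return x1 * x2 + math.sin(x3)

def path(t):
    """Hypothetical path: x(t) = (t, t^2, 3t)."""
    return [t, t * t, 3.0 * t]

def partial(func, x, j, h=1e-6):
    """Estimate the partial derivative of func in x_j, holding the other variables constant."""
    up = list(x)
    down = list(x)
    up[j] += h
    down[j] -= h
    return (func(up) - func(down)) / (2 * h)

def ddt(func, t, h=1e-6):
    """Estimate the ordinary derivative of a function of t alone."""
    return (func(t + h) - func(t - h)) / (2 * h)

def chain_rule_sum(t):
    """Right-hand side: sum over j of (df/dx_j) * (dx_j/dt)."""
    x = path(t)
    total = 0.0
    for j in range(len(x)):
        dxj_dt = ddt(lambda s: path(s)[j], t)
        total += partial(f, x, j) * dxj_dt
    return total

def direct_derivative(t):
    """Left-hand side: differentiate t -> f(x(t)) directly."""
    return ddt(lambda s: f(path(s)), t)

t0 = 0.7
lhs = direct_derivative(t0)
rhs = chain_rule_sum(t0)
# For this particular f and path, f(x(t)) = t^3 + sin(3t).
exact = 3 * t0**2 + 3 * math.cos(3 * t0)
print(lhs, rhs, exact)
```

The three printed numbers should agree up to the small error of the finite-difference estimates. Note that `partial` changes one variable at a time, whereas `ddt` lets everything vary with t, which is exactly the distinction between $$\partial$$ and $$\mathsf{d}$$.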

Simple!

We can recognise the sum on the right-hand side of the chain rule as the row vector $$\mathrm{d}f$$ times the column vector $$\dot{\mathbf{x}}$$, so that $\frac{\mathsf{d}}{\mathsf{d}t}f(\mathbf{x}) = \mathsf{d}\!f\, \dot{\mathbf{x}}$ where $$\mathsf{d}f = (\partial f/\partial x_1,\dots, \partial f/\partial x_n)$$ (a row vector) and $$\dot{\mathbf{x}}=(\dot{x}_1, \dot{x}_2,\dots,\dot{x}_n)^T$$ (a column vector because of the transpose).

Even simpler!

So the 2-variable case above can be written

$\frac{\mathsf{d}}{\mathsf{d}t}f(\mathbf{x}) = \mathsf{d}\!f\, \dot{\mathbf{x}} = \begin{pmatrix}\frac{\partial f}{\partial x}& \frac{\partial f}{\partial y}\end{pmatrix}\begin{pmatrix}\dot x\cr \dot y\end{pmatrix}.$
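
The row-times-column form can also be tried out in code. The sketch below assumes the illustrative function $$f(x,y)=x^2y$$ along the path $$x=\cos t$$, $$y=\sin t$$ (neither is special, they are just convenient), and the helper names `matmul`, `df` and `xdot` are invented here. Matrices are stored as lists of rows, so a row vector is a 1×2 matrix and a column vector is a 2×1 matrix.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def df(x, y):
    """Row vector (1x2) of partial derivatives of the hypothetical f(x, y) = x^2 * y."""
    return [[2 * x * y, x * x]]

def xdot(t):
    """Column vector (2x1) of velocities for the hypothetical path x = cos t, y = sin t."""
    return [[-math.sin(t)], [math.cos(t)]]

t0 = 0.4
x, y = math.cos(t0), math.sin(t0)

product = matmul(df(x, y), xdot(t0))  # (1x2) times (2x1) gives a 1x1 matrix
rate = product[0][0]                  # the single entry is d/dt f(x(t), y(t))

# Substituting first gives f = cos^2(t) sin(t), whose derivative is known in closed form.
exact = math.cos(t0)**3 - 2 * math.cos(t0) * math.sin(t0)**2
print(rate, exact)
```

The product of a row vector and a column vector is a 1×1 matrix, i.e. a single number, as it must be since $$\frac{\mathsf{d}}{\mathsf{d}t}f$$ is a scalar. Multiplying in the other order (column times row) would give a 2×2 matrix instead, which is one reason to keep track of which vector is a row and which is a column.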

Note $$\mathrm{d}f$$ is often written $$\nabla f$$ (grad $$f$$). [Strictly speaking, there is a difference though: namely, $$\mathrm{d}f$$ is a row vector, while $$\nabla f$$ is a column vector, so each is the transpose of the other.]