Math rules

This page contains mathematical rules we’ll use in this course that may be beyond what is covered in a linear algebra course.

Matrix calculus

Definition of gradient

Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$ be a $k \times 1$ vector and $f(\mathbf{x})$ be a function of $\mathbf{x}$.

Then $\nabla_{\mathbf{x}} f$, the gradient of $f$ with respect to $\mathbf{x}$, is

$$\nabla_{\mathbf{x}} f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_k} \end{bmatrix}$$
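As a quick example with $k = 2$: if $f(\mathbf{x}) = x_1^2 + 3x_1x_2$, then

$$\nabla_{\mathbf{x}} f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 3x_1 \end{bmatrix}$$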


Gradient of $\mathbf{x}^T\mathbf{z}$

Let $\mathbf{x}$ be a $k \times 1$ vector and $\mathbf{z}$ be a $k \times 1$ vector, such that $\mathbf{z}$ is not a function of $\mathbf{x}$.

The gradient of $\mathbf{x}^T\mathbf{z}$ with respect to $\mathbf{x}$ is

$$\nabla_{\mathbf{x}}\, \mathbf{x}^T\mathbf{z} = \mathbf{z}$$
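To see why, note that $\mathbf{x}^T\mathbf{z} = \sum_{i=1}^{k} x_i z_i$, so differentiating with respect to each $x_j$ gives

$$\frac{\partial}{\partial x_j}\, \mathbf{x}^T\mathbf{z} = z_j, \quad j = 1, \dots, k$$

and stacking these partial derivatives yields $\mathbf{z}$.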


Gradient of $\mathbf{x}^T\mathbf{A}\mathbf{x}$

Let $\mathbf{x}$ be a $k \times 1$ vector and $\mathbf{A}$ be a $k \times k$ matrix, such that $\mathbf{A}$ is not a function of $\mathbf{x}$.

Then the gradient of $\mathbf{x}^T\mathbf{A}\mathbf{x}$ with respect to $\mathbf{x}$ is

$$\nabla_{\mathbf{x}}\, \mathbf{x}^T\mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x} = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}$$

If $\mathbf{A}$ is symmetric, then

$$(\mathbf{A} + \mathbf{A}^T)\mathbf{x} = 2\mathbf{A}\mathbf{x}$$
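As a minimal numerical sketch of this rule (assuming NumPy is available), the analytic gradient $(\mathbf{A} + \mathbf{A}^T)\mathbf{x}$ can be compared against a finite-difference approximation; the matrix and point below are arbitrary choices for illustration.

```python
import numpy as np

# Arbitrary illustration: a non-symmetric 3x3 matrix A and a point x
rng = np.random.default_rng(seed=1)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

def f(v):
    # f(v) = v^T A v
    return v @ A @ v

# Analytic gradient from the rule above: (A + A^T) x
grad_analytic = (A + A.T) @ x

# Central finite-difference approximation of the gradient
eps = 1e-6
grad_numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(3)])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # expected: True
```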


Hessian matrix

The Hessian matrix, $\nabla^2_{\mathbf{x}} f$, is a $k \times k$ matrix of second-order partial derivatives

$$\nabla^2_{\mathbf{x}} f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_k} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_k} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_k \partial x_1} & \dfrac{\partial^2 f}{\partial x_k \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_k^2} \end{bmatrix}$$
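For example, combining this definition with the gradient rule above: for $f(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}$, the gradient is $(\mathbf{A} + \mathbf{A}^T)\mathbf{x}$, so differentiating once more gives

$$\nabla^2_{\mathbf{x}}\, \mathbf{x}^T\mathbf{A}\mathbf{x} = \mathbf{A} + \mathbf{A}^T$$

which equals $2\mathbf{A}$ when $\mathbf{A}$ is symmetric.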

Expected value

Expected value of random variable X

The expected value of a random variable $X$ is a weighted average, i.e., the mean of the possible values the random variable can take, weighted by the probability of each outcome.

Let $f_X(x)$ be the probability distribution of $X$. If $X$ is continuous, then

$$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$

If $X$ is discrete, then

$$E(X) = \sum_{x \in X} x\, f_X(x) = \sum_{x \in X} x\, P(X = x)$$
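For example, if $X$ is the outcome of a fair six-sided die, each outcome has probability $1/6$, so

$$E(X) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$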


Expected value of $aX + b$

Let $X$ be a random variable and $a$ and $b$ be constants. Then

$$E(aX + b) = E(aX) + E(b) = aE(X) + b$$
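Continuing the die example above, $E(2X + 1) = 2E(X) + 1 = 2(3.5) + 1 = 8$.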


Expected value of $ag_1(X) + bg_2(X) + c$

Let $X$ be a random variable and $a$, $b$, and $c$ be constants. Then for any functions $g_1(X)$ and $g_2(X)$,

$$E\big(a\,g_1(X) + b\,g_2(X) + c\big) = a\,E\big(g_1(X)\big) + b\,E\big(g_2(X)\big) + c$$


Expected value of vector $\mathbf{b}$

Let $\mathbf{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix}$ be a $p \times 1$ vector of random variables.

Then

$$E(\mathbf{b}) = E\begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix} = \begin{bmatrix} E(b_1) \\ \vdots \\ E(b_p) \end{bmatrix}$$


Expected value of $\mathbf{A}\mathbf{b}$

Let $\mathbf{A}$ be an $n \times p$ matrix of constants and $\mathbf{b}$ a $p \times 1$ vector of random variables. Then

$$E(\mathbf{A}\mathbf{b}) = \mathbf{A}\,E(\mathbf{b})$$
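This follows by applying the linearity of expectation entry by entry: the $i$th entry of $\mathbf{A}\mathbf{b}$ is $\sum_{j=1}^{p} a_{ij} b_j$, so

$$E\left(\sum_{j=1}^{p} a_{ij} b_j\right) = \sum_{j=1}^{p} a_{ij} E(b_j)$$

which is the $i$th entry of $\mathbf{A}\,E(\mathbf{b})$.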

Variance

Variance of random variable X

The variance of a random variable X is a measure of the spread of a distribution about its mean.

$$\text{Var}(X) = E\big[(X - E(X))^2\big] = E(X^2) - E(X)^2$$
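The second equality follows from expanding the square and using the linearity of expectation:

$$E\big[(X - E(X))^2\big] = E\big[X^2 - 2XE(X) + E(X)^2\big] = E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - E(X)^2$$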


Variance of $aX + b$

Let $X$ be a random variable and $a$ and $b$ be constants. Then

$$\text{Var}(aX + b) = a^2\,\text{Var}(X)$$
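To see why the constant $b$ drops out, note that $aX + b - E(aX + b) = a\big(X - E(X)\big)$, so

$$\text{Var}(aX + b) = E\Big[a^2\big(X - E(X)\big)^2\Big] = a^2\,\text{Var}(X)$$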


Variance of vector $\mathbf{b}$

Let $\mathbf{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix}$ be a $p \times 1$ vector of random variables.

Then

$$\text{Var}(\mathbf{b}) = E\big[(\mathbf{b} - E(\mathbf{b}))(\mathbf{b} - E(\mathbf{b}))^T\big]$$
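$\text{Var}(\mathbf{b})$ is a $p \times p$ matrix whose $(i, j)$ entry is $E\big[(b_i - E(b_i))(b_j - E(b_j))\big] = \text{Cov}(b_i, b_j)$; in particular, the diagonal entries are the variances $\text{Var}(b_1), \dots, \text{Var}(b_p)$.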

Variance of $\mathbf{A}\mathbf{b}$

Let $\mathbf{A}$ be an $n \times p$ matrix of constants and $\mathbf{b}$ a $p \times 1$ vector of random variables. Then

$$\text{Var}(\mathbf{A}\mathbf{b}) = E\big[(\mathbf{A}\mathbf{b} - E(\mathbf{A}\mathbf{b}))(\mathbf{A}\mathbf{b} - E(\mathbf{A}\mathbf{b}))^T\big] = \mathbf{A}\,\text{Var}(\mathbf{b})\,\mathbf{A}^T$$
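The last equality follows because $\mathbf{A}\mathbf{b} - E(\mathbf{A}\mathbf{b}) = \mathbf{A}\big(\mathbf{b} - E(\mathbf{b})\big)$, so

$$E\big[\mathbf{A}(\mathbf{b} - E(\mathbf{b}))(\mathbf{b} - E(\mathbf{b}))^T\mathbf{A}^T\big] = \mathbf{A}\,E\big[(\mathbf{b} - E(\mathbf{b}))(\mathbf{b} - E(\mathbf{b}))^T\big]\,\mathbf{A}^T = \mathbf{A}\,\text{Var}(\mathbf{b})\,\mathbf{A}^T$$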

Probability distributions

Normal distribution

Let $X$ be a random variable such that $X \sim N(\mu, \sigma^2)$. Then the probability density function is

$$P(X = x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}$$
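As a minimal sketch (assuming NumPy and SciPy are available), the density formula above can be checked against SciPy's built-in normal density; the values of $\mu$ and $\sigma^2$ below are arbitrary illustrations.

```python
import numpy as np
from scipy import stats

# Arbitrary illustration values for the mean and variance
mu, sigma2 = 1.5, 4.0
sigma = np.sqrt(sigma2)
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 101)

# Density evaluated directly from the formula above
density = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# SciPy parameterizes the normal by its mean (loc) and standard deviation (scale)
print(np.allclose(density, stats.norm.pdf(x, loc=mu, scale=sigma)))  # expected: True
```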