Math rules

This page contains mathematical rules we’ll use in this course that may be beyond what is covered in a linear algebra course.

Matrix calculus

Definition of gradient

Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$ be a $k \times 1$ vector and $f(\mathbf{x})$ be a function of $\mathbf{x}$.

Then $\nabla_{\mathbf{x}} f$, the gradient of $f$ with respect to $\mathbf{x}$, is

$$\nabla_{\mathbf{x}} f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \\ \vdots \\ \dfrac{\partial f}{\partial x_k} \end{bmatrix}$$
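As a quick example with $k = 2$: if $f(\mathbf{x}) = x_1^2 + 3x_1x_2$, then

$$\nabla_{\mathbf{x}} f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \dfrac{\partial f}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 3x_1 \end{bmatrix}$$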


Gradient of $\mathbf{x}^T\mathbf{z}$

Let $\mathbf{x}$ be a $k \times 1$ vector and $\mathbf{z}$ be a $k \times 1$ vector, such that $\mathbf{z}$ is not a function of $\mathbf{x}$.

The gradient of $\mathbf{x}^T\mathbf{z}$ with respect to $\mathbf{x}$ is

$$\nabla_{\mathbf{x}}\, \mathbf{x}^T\mathbf{z} = \mathbf{z}$$
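To see why, note that $\mathbf{x}^T\mathbf{z} = \sum_{i=1}^{k} x_i z_i$, so differentiating with respect to each $x_j$ gives

$$\frac{\partial}{\partial x_j}\, \mathbf{x}^T\mathbf{z} = z_j, \quad j = 1, \dots, k$$

and stacking these partial derivatives yields $\mathbf{z}$.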


Gradient of $\mathbf{x}^T\mathbf{A}\mathbf{x}$

Let $\mathbf{x}$ be a $k \times 1$ vector and $\mathbf{A}$ be a $k \times k$ matrix, such that $\mathbf{A}$ is not a function of $\mathbf{x}$.

Then the gradient of $\mathbf{x}^T\mathbf{A}\mathbf{x}$ with respect to $\mathbf{x}$ is

$$\nabla_{\mathbf{x}}\, \mathbf{x}^T\mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x} = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}$$

If $\mathbf{A}$ is symmetric, then

$$(\mathbf{A} + \mathbf{A}^T)\mathbf{x} = 2\mathbf{A}\mathbf{x}$$
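As a minimal numerical sketch of this rule (assuming NumPy is available), the analytic gradient $(\mathbf{A} + \mathbf{A}^T)\mathbf{x}$ can be compared against a finite-difference approximation; the matrix and point below are arbitrary choices for illustration.

```python
import numpy as np

# Arbitrary illustration: a non-symmetric 3x3 matrix A and a point x
rng = np.random.default_rng(seed=1)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

def f(v):
    # f(v) = v^T A v
    return v @ A @ v

# Analytic gradient from the rule above: (A + A^T) x
grad_analytic = (A + A.T) @ x

# Central finite-difference approximation of the gradient
eps = 1e-6
grad_numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(3)])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # expected: True
```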


Hessian matrix

The Hessian matrix, $\nabla^2_{\mathbf{x}} f$, is a $k \times k$ matrix of second-order partial derivatives

$$\nabla^2_{\mathbf{x}} f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_k} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_k} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_k \partial x_1} & \dfrac{\partial^2 f}{\partial x_k \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_k^2} \end{bmatrix}$$
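For example, combining this definition with the gradient rule above: for $f(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}$, the gradient is $(\mathbf{A} + \mathbf{A}^T)\mathbf{x}$, so differentiating once more gives

$$\nabla^2_{\mathbf{x}}\, \mathbf{x}^T\mathbf{A}\mathbf{x} = \mathbf{A} + \mathbf{A}^T$$

which equals $2\mathbf{A}$ when $\mathbf{A}$ is symmetric.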

Expected value

Expected value of random variable X

The expected value of a random variable $X$ is a weighted average, i.e., the mean of the possible values the random variable can take, weighted by the probability of each outcome.

Let $f_X(x)$ be the probability distribution of $X$. If $X$ is continuous, then

$$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$

If $X$ is discrete, then

$$E(X) = \sum_{x \in X} x\, f_X(x) = \sum_{x \in X} x\, P(X = x)$$
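For example, if $X$ is the outcome of a fair six-sided die, each outcome has probability $1/6$, so

$$E(X) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$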


Expected value of $aX + b$

Let $X$ be a random variable and $a$ and $b$ be constants. Then

$$E(aX + b) = E(aX) + E(b) = aE(X) + b$$
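Continuing the die example above, $E(2X + 1) = 2E(X) + 1 = 2(3.5) + 1 = 8$.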


Expected value of $ag_1(X) + bg_2(X) + c$

Let $X$ be a random variable and $a$, $b$, and $c$ be constants. Then for any functions $g_1(X)$ and $g_2(X)$,

$$E\big(a\,g_1(X) + b\,g_2(X) + c\big) = a\,E\big(g_1(X)\big) + b\,E\big(g_2(X)\big) + c$$


Expected value of vector $\mathbf{b}$

Let $\mathbf{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix}$ be a $p \times 1$ vector of random variables.

Then

$$E(\mathbf{b}) = E\begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix} = \begin{bmatrix} E(b_1) \\ \vdots \\ E(b_p) \end{bmatrix}$$


Expected value of $\mathbf{A}\mathbf{b}$

Let $\mathbf{A}$ be an $n \times p$ matrix of constants and $\mathbf{b}$ a $p \times 1$ vector of random variables. Then

$$E(\mathbf{A}\mathbf{b}) = \mathbf{A}\,E(\mathbf{b})$$
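This follows by applying the linearity of expectation entry by entry: the $i$th entry of $\mathbf{A}\mathbf{b}$ is $\sum_{j=1}^{p} a_{ij} b_j$, so

$$E\left(\sum_{j=1}^{p} a_{ij} b_j\right) = \sum_{j=1}^{p} a_{ij} E(b_j)$$

which is the $i$th entry of $\mathbf{A}\,E(\mathbf{b})$.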

Variance

Variance of random variable X

The variance of a random variable X is a measure of the spread of a distribution about its mean.

$$\text{Var}(X) = E\big[(X - E(X))^2\big] = E(X^2) - E(X)^2$$
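The second equality follows from expanding the square and using the linearity of expectation:

$$E\big[(X - E(X))^2\big] = E\big[X^2 - 2XE(X) + E(X)^2\big] = E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - E(X)^2$$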


Variance of $aX + b$

Let $X$ be a random variable and $a$ and $b$ be constants. Then

$$\text{Var}(aX + b) = a^2\,\text{Var}(X)$$
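To see why the constant $b$ drops out, note that $aX + b - E(aX + b) = a\big(X - E(X)\big)$, so

$$\text{Var}(aX + b) = E\Big[a^2\big(X - E(X)\big)^2\Big] = a^2\,\text{Var}(X)$$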


Variance of vector $\mathbf{b}$

Let $\mathbf{b} = \begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix}$ be a $p \times 1$ vector of random variables.

Then

$$\text{Var}(\mathbf{b}) = E\big[(\mathbf{b} - E(\mathbf{b}))(\mathbf{b} - E(\mathbf{b}))^T\big]$$
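$\text{Var}(\mathbf{b})$ is a $p \times p$ matrix whose $(i, j)$ entry is $E\big[(b_i - E(b_i))(b_j - E(b_j))\big] = \text{Cov}(b_i, b_j)$; in particular, the diagonal entries are the variances $\text{Var}(b_1), \dots, \text{Var}(b_p)$.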

Variance of $\mathbf{A}\mathbf{b}$

Let $\mathbf{A}$ be an $n \times p$ matrix of constants and $\mathbf{b}$ a $p \times 1$ vector of random variables. Then

$$\text{Var}(\mathbf{A}\mathbf{b}) = E\big[(\mathbf{A}\mathbf{b} - E(\mathbf{A}\mathbf{b}))(\mathbf{A}\mathbf{b} - E(\mathbf{A}\mathbf{b}))^T\big] = \mathbf{A}\,\text{Var}(\mathbf{b})\,\mathbf{A}^T$$
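The last equality follows because $\mathbf{A}\mathbf{b} - E(\mathbf{A}\mathbf{b}) = \mathbf{A}\big(\mathbf{b} - E(\mathbf{b})\big)$, so

$$E\big[\mathbf{A}(\mathbf{b} - E(\mathbf{b}))(\mathbf{b} - E(\mathbf{b}))^T\mathbf{A}^T\big] = \mathbf{A}\,E\big[(\mathbf{b} - E(\mathbf{b}))(\mathbf{b} - E(\mathbf{b}))^T\big]\,\mathbf{A}^T = \mathbf{A}\,\text{Var}(\mathbf{b})\,\mathbf{A}^T$$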

Probability distributions

Normal distribution

Let $X$ be a random variable such that $X \sim N(\mu, \sigma^2)$. Then the probability density function is

$$P(X = x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}$$
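As a minimal sketch (assuming NumPy and SciPy are available), the density formula above can be checked against SciPy's built-in normal density; the values of $\mu$ and $\sigma^2$ below are arbitrary illustrations.

```python
import numpy as np
from scipy import stats

# Arbitrary illustration values for the mean and variance
mu, sigma2 = 1.5, 4.0
sigma = np.sqrt(sigma2)
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 101)

# Density evaluated directly from the formula above
density = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# SciPy parameterizes the normal by its mean (loc) and standard deviation (scale)
print(np.allclose(density, stats.norm.pdf(x, loc=mu, scale=sigma)))  # expected: True
```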