
Estimating μ and σ²

6 May 2025


Sample mean and sample variance

Say we have random variables X1, X2, ..., Xn, where Xi represents the i-th observation in a random sample of size n drawn from a population with mean μ and variance σ². That is, for each i,

E[Xi] = μ
Var[Xi] = σ²

The following statistics, called the sample mean and the sample variance, are unbiased estimators of μ and σ²:

X̄ = (1/n) Σ Xi
S² = (1/(n-1)) Σ (Xi - X̄)²

(Summation is over all i in the sample, that is, i = 1, 2, ..., n.)

In other words,

E[X̄] = μ
E[S²] = σ²
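
Before proving these claims, here is a minimal sketch of how the two statistics might be computed in practice, assuming Python with NumPy and a small made-up sample. Note that np.var divides by n by default, so ddof=1 is needed to match the definition of S² above.

    import numpy as np

    # A made-up sample of size n = 5 (purely illustrative)
    x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
    n = len(x)

    # Sample mean: (1/n) Σ Xi
    x_bar = x.sum() / n

    # Sample variance: (1/(n-1)) Σ (Xi - X̄)²
    s2 = ((x - x_bar) ** 2).sum() / (n - 1)

    # NumPy equivalents; np.var needs ddof=1 to use the n-1 divisor
    assert np.isclose(x_bar, np.mean(x))
    assert np.isclose(s2, np.var(x, ddof=1))

    print(x_bar, s2)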

Proof. We start with the proof that X̄ is an unbiased estimator of μ.

E[X̄] = E[(1/n) Σ Xi] = (1/n) Σ E[Xi] = (1/n) Σ μ = (1/n)(nμ) = μ

Next, we prove that S² is an unbiased estimator of σ², which is slightly more involved.

E[S²] = E[(1/(n-1)) Σ (Xi - X̄)²] = (1/(n-1)) E[Σ (Xi - X̄)²] = (1/(n-1)) E[Σ (Xi² - 2XiX̄ + X̄²)]

Zooming in on the summation term:

Σ (Xi² - 2XiX̄ + X̄²) = Σ Xi² - Σ 2XiX̄ + Σ X̄² = Σ Xi² - 2X̄ Σ Xi + Σ X̄² = Σ Xi² - 2X̄(nX̄) + nX̄² = Σ Xi² - nX̄²

Applying the expectation operator to the above expression:

E[Σ Xi² - nX̄²] = E[Σ Xi²] - nE[X̄²] = Σ E[Xi²] - nE[X̄²] = Σ (Var(Xi) + E[Xi]²) - n(Var(X̄) + E[X̄]²) = Σ (σ² + μ²) - n(σ²/n + μ²) = nσ² + nμ² - σ² - nμ² = nσ² - σ² = (n-1)σ²

Zoom back out:

E[S²] = (1/(n-1)) E[Σ Xi² - nX̄²] = (1/(n-1)) ((n-1)σ²) = σ² ∎
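
To see the result numerically, here is a small simulation sketch, assuming NumPy and an arbitrarily chosen normal population with μ = 3 and σ² = 4. Averaging S² over many samples lands near σ², while the same statistic with an n divisor systematically underestimates it by a factor of (n-1)/n.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma2 = 3.0, 4.0          # made-up population parameters
    n, trials = 10, 100_000        # small samples, many repetitions

    samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
    x_bar = samples.mean(axis=1, keepdims=True)

    s2_unbiased = ((samples - x_bar) ** 2).sum(axis=1) / (n - 1)   # divide by n-1
    s2_biased = ((samples - x_bar) ** 2).sum(axis=1) / n           # divide by n

    print(samples.mean())          # ≈ 3.0, consistent with E[X̄] = μ
    print(s2_unbiased.mean())      # ≈ 4.0, consistent with E[S²] = σ²
    print(s2_biased.mean())        # ≈ 3.6, i.e. ((n-1)/n) σ²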

Point estimators

X̄ and S² are called estimators. Estimators can be characterized by several general properties, including their bias, variance, mean squared error (MSE), efficiency, consistency, and sufficiency.

Say we have an estimator T(X) of some population parameter θ. The bias and variance of T are defined as

Bias(T) = E[T(X)] - θ
Var(T) = E[(T(X) - E[T(X)])²]

We say that T is unbiased if Bias(T) = 0, that is,

E[T(X)] = θ

The bias measures the difference between the expected value of the estimator and the parameter. The variance measures the average squared difference between the values produced by the estimator and the expected value of the estimator; it says nothing about whether that expected value is close to the parameter. For instance, an estimator may produce estimates that are close to the expected value of the estimator (low variance), but the expected value of the estimator may be far from the parameter (high bias).
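
To make the distinction concrete, here is a small simulation sketch (assuming NumPy and made-up numbers): the sample mean is unbiased for μ but has some spread, while a hypothetical estimator that shrinks X̄ hard toward the constant 8 has very low variance but a large bias.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, n, trials = 10.0, 20, 50_000   # made-up parameters, σ = 3

    samples = rng.normal(mu, 3.0, size=(trials, n))
    t_unbiased = samples.mean(axis=1)        # X̄: unbiased, moderate variance
    t_shrunk = 0.1 * t_unbiased + 8.0        # hypothetical estimator: low variance, high bias

    print(t_unbiased.mean() - mu, t_unbiased.var())   # bias ≈ 0, variance ≈ 9/20 = 0.45
    print(t_shrunk.mean() - mu, t_shrunk.var())       # bias ≈ -1, variance ≈ 0.0045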

The mean squared error (MSE) of an estimator is defined as

MSE(T) = E[(T(X) - θ)²]

and it measures the average squared difference between the estimator and the parameter. In other words, MSE measures the "true" error of the estimator. In fact, we can decompose the MSE into the bias and variance of the estimator:

MSE(T) = E[(T(X) - θ)²] = E[(T(X) - E[T(X)] + E[T(X)] - θ)²]

Expanding the inner term:

(T(X) - E[T(X)] + E[T(X)] - θ)² = (T(X) - E[T(X)])² + 2(T(X) - E[T(X)])(E[T(X)] - θ) + (E[T(X)] - θ)²

Applying the expectation operator to each term:

E[(T(X) - E[T(X)])²] = Var(T)
E[2(T(X) - E[T(X)])(E[T(X)] - θ)] = 0
E[(E[T(X)] - θ)²] = Bias(T)²

The middle term evaluates to 0 because E[T(X)] - θ is a constant that can be pulled out of the expectation, and the remaining factor has expectation 0:

E[2(T(X) - E[T(X)])] = 2(E[T(X)] - E[E[T(X)]]) = 2(E[T(X)] - E[T(X)]) = 2(0) = 0

Thus,

MSE(T) = Var(T) + Bias(T)²
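
As a numerical check of the decomposition (a sketch assuming NumPy), take θ = σ² and the biased variance estimator that divides by n (NumPy's default ddof=0). Its empirical MSE matches its empirical variance plus squared bias.

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma2 = 0.0, 4.0          # made-up population, θ = σ²
    n, trials = 10, 200_000

    samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
    t = samples.var(axis=1)        # the 1/n variance estimator (ddof=0), which is biased

    mse = ((t - sigma2) ** 2).mean()
    bias = t.mean() - sigma2
    var = t.var()

    print(mse, var + bias ** 2)    # the two values agree (up to floating-point error)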

We can compare estimators by their MSEs. The smaller the MSE, the better the estimator, in that it produces estimates that are, on average, closer to the true underlying parameter. Moreover, given two unbiased estimators for the same parameter, the one with the smaller variance is said to be more efficient, in that it homes in on the parameter more tightly than the other estimator.
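
As an illustration of efficiency (again a sketch with made-up numbers), for a normal population both the sample mean and the sample median are unbiased estimators of μ, but the sample mean has the smaller variance and is therefore the more efficient of the two.

    import numpy as np

    rng = np.random.default_rng(3)
    mu, n, trials = 5.0, 25, 100_000   # made-up parameters, σ = 1

    samples = rng.normal(mu, 1.0, size=(trials, n))
    means = samples.mean(axis=1)
    medians = np.median(samples, axis=1)

    # Both are unbiased for μ under a normal population...
    print(means.mean(), medians.mean())   # both ≈ 5.0
    # ...but the sample mean has the smaller variance, so it is more efficient.
    print(means.var(), medians.var())     # ≈ 0.04 vs roughly π/(2·25) ≈ 0.063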

An estimator Tn(X1, X2, ..., Xn) is said to be consistent if

P(|Tn(X1, X2, ..., Xn) - θ| < ε) → 1

as n → ∞ for all ε > 0. In other words, as the sample size increases, the estimator produces estimates that are closer and closer to the true underlying parameter.
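
The sample mean is a consistent estimator of μ, and the defining probability can be estimated by simulation. The sketch below (assuming NumPy, ε = 0.1, and a made-up normal population) shows the fraction of samples with |X̄ - μ| < ε climbing toward 1 as n grows.

    import numpy as np

    rng = np.random.default_rng(4)
    mu, eps, trials = 2.0, 0.1, 10_000   # made-up parameters, σ = 1

    for n in (10, 100, 1000):
        x_bar = rng.normal(mu, 1.0, size=(trials, n)).mean(axis=1)
        # Empirical estimate of P(|X̄ - μ| < ε) at this sample size
        print(n, (np.abs(x_bar - mu) < eps).mean())
        # Prints roughly 0.25, 0.68, 1.0 as n increases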

An estimator Tn(X1, X2, ..., Xn) is said to be sufficient for θ if it contains all the information about the parameter θ that is contained in the sample X1, X2, ..., Xn. Formally,

fX|T(x | T(X) = t; θ) = g(x | t)    (does not depend on θ)

Intuitively, given a sufficient statistic T(X), the conditional distribution of the sample X given T(X) no longer depends on the parameter θ. In other words, all information about the parameter θ is contained in the statistic T(X). As an example, say X~N(μ,1) and Y~N(μ,1) are independent, and we want to estimate μ. The following are two possible estimators for μ:

T1 = X + Y
T2 = X - Y

By the properties of the normal distribution, X + Y and X - Y are both normally distributed. More specifically,

T1 = X + Y ~ N(2μ, 2)
T2 = X - Y ~ N(0, 2)

However, note that the distribution of T2 does not depend on μ, so T2 contains no information about μ. Thus, we say that T2 is not sufficient for μ. In particular, T2 does not produce estimates that are close to μ:

E[X - Y] = E[X] - E[Y] = μ - μ = 0

On the other hand, T1 is sufficient for μ, and we can use it to estimate μ by taking T1/2, that is, the mean of X and Y:

E[(X + Y)/2] = (1/2)E[X + Y] = (1/2)(E[X] + E[Y]) = (1/2)(μ + μ) = μ
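
A quick simulation sketch (assuming NumPy and two made-up values of μ) makes the contrast visible: the distribution of T1 shifts with μ, while the distribution of T2 looks the same no matter what μ is, which is exactly why T2 carries no information about μ.

    import numpy as np

    rng = np.random.default_rng(5)
    trials = 100_000

    for mu in (0.0, 5.0):                 # two made-up values of the parameter
        x = rng.normal(mu, 1.0, size=trials)
        y = rng.normal(mu, 1.0, size=trials)
        t1, t2 = x + y, x - y
        # T1 shifts with μ (mean ≈ 2μ); T2 stays put (mean ≈ 0, variance ≈ 2) for every μ
        print(mu, t1.mean(), t2.mean(), t2.var())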