There are two main methods of estimating the parameter $\theta$.
1. Method of Moments
It is a method of generating parametric estimators. These estimators are generally not optimal, but they are easy to compute; they are also used to generate starting values for other numerical parametric estimation methods.
Definition 1 Moments and Sample Moments:
Suppose that the parameter $\theta$ has $k$ components: $\theta = (\theta_1, \dots, \theta_k)$. For $1 \le j \le k$, define the $j$th moment as
$$\alpha_j \equiv \alpha_j(\theta) = \mathbb{E}_\theta(X^j) = \int x^j \, dF_\theta(x),$$
and define the $j$th sample moment as
$$\hat{\alpha}_j = \frac{1}{n} \sum_{i=1}^{n} X_i^j.$$
Definition 2
The method of moments estimator $\hat{\theta}_n$ is the value of $\theta$ which satisfies
$$\alpha_1(\hat{\theta}_n) = \hat{\alpha}_1, \quad \alpha_2(\hat{\theta}_n) = \hat{\alpha}_2, \quad \dots, \quad \alpha_k(\hat{\theta}_n) = \hat{\alpha}_k.$$
Why The Above Method Works: The method of moments estimator is obtained by equating the $j$th moment with the $j$th sample moment. Since there are $k$ of them, we get $k$ equations in $k$ unknowns (the unknowns are the $k$ parameters). This works because we can express the $j$th moments in terms of the unknown parameters, and we can compute the $j$th sample moments numerically, since we know the sample values.
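As a concrete illustration, here is a minimal Python sketch for the $N(\mu, \sigma^2)$ model (an example we supply; the function name is ours, not from any library). Equating the first two moments gives $\hat{\mu} = \hat{\alpha}_1$ and $\hat{\sigma}^2 = \hat{\alpha}_2 - \hat{\alpha}_1^2$:

import numpy as np

def method_of_moments_normal(x):
    # Sample moments: alpha_hat_1 = mean(X), alpha_hat_2 = mean(X^2).
    a1 = np.mean(x)
    a2 = np.mean(x ** 2)
    # Population moments: E(X) = mu, E(X^2) = mu^2 + sigma^2.
    # Equating population and sample moments and solving the two equations:
    mu_hat = a1
    sigma2_hat = a2 - a1 ** 2
    return mu_hat, sigma2_hat

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)
print(method_of_moments_normal(x))  # approximately (2.0, 9.0)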
2. Maximum Likelihood Method
It is the most common method for estimating parameters in a parametric model.
Definition 3 Likelihood Function: Let $X_1, \dots, X_n$ have a \textsc{pdf} $f(x; \theta)$. The likelihood function is defined as
$$\mathcal{L}_n(\theta) = \prod_{i=1}^{n} f(X_i; \theta).$$
The log-likelihood function is defined as $\ell_n(\theta) = \log \mathcal{L}_n(\theta)$.
The likelihood function is the joint density of the data, treated as a function of the parameter $\theta$. Thus $\mathcal{L}_n : \Theta \to [0, \infty)$.
Definition 4 Maximum Likelihood Estimator: It is the value of $\theta$, denoted $\hat{\theta}_n$, which maximizes the likelihood function $\mathcal{L}_n(\theta)$; equivalently, it maximizes the log-likelihood $\ell_n(\theta)$.
Example 1 Let $X_1, \dots, X_n$ be \textsc{iid} random variables with the $\mathrm{Uniform}(0, \theta)$ distribution, so that $f(x; \theta) = 1/\theta$ for $0 \le x \le \theta$ and $f(x; \theta) = 0$ otherwise.
If $X_{(n)} = \max_i X_i$ and $\theta < X_{(n)}$, then $\mathcal{L}_n(\theta) = 0$. Otherwise $\mathcal{L}_n(\theta) = \theta^{-n}$, which is a decreasing function of $\theta$. Hence $\hat{\theta}_n = X_{(n)}$.
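In models without such a closed-form maximizer, the \textsc{mle} is found numerically by minimizing the negative log-likelihood. Here is a minimal Python sketch for the rate $\lambda$ of an exponential model (our own example); the closed form $\hat{\lambda} = 1/\bar{X}$ serves as a check:

import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(lam, x):
    # Exponential pdf: f(x; lam) = lam * exp(-lam * x), so
    # log L(lam) = n * log(lam) - lam * sum(x).
    return -(len(x) * np.log(lam) - lam * np.sum(x))

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.5, size=5_000)  # true rate lambda = 2.5

res = minimize_scalar(neg_log_likelihood, args=(x,),
                      bounds=(1e-6, 100.0), method="bounded")
print(res.x, 1 / np.mean(x))  # numerical MLE vs. closed form, both near 2.5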
3. Properties of MLE
The \textsc{mle} has several appealing properties: it is consistent, it is equivariant, and it is asymptotically normal. We discuss each in turn.
4. Consistency of MLE
Consistency means that the \textsc{mle} converges in probability to the true parameter value: $\hat{\theta}_n \xrightarrow{P} \theta^*$.
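This can be seen empirically. Below is a minimal sketch (our own example, using the exponential model with closed-form \textsc{mle} $1/\bar{X}$) in which the estimate tightens around the true rate as $n$ grows:

import numpy as np

rng = np.random.default_rng(3)
lam = 2.0  # true rate
for n in (10, 100, 10_000, 1_000_000):
    x = rng.exponential(scale=1 / lam, size=n)
    print(n, 1 / x.mean())  # MLE approaches 2.0 as n grows

The proof of consistency relies on the following notion of distance between densities.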
Definition 5 Kullback-Leibler Distance:
If $f$ and $g$ are \textsc{pdf}s, the Kullback-Leibler distance between them is defined as
$$D(f, g) = \int f(x) \log\left(\frac{f(x)}{g(x)}\right) dx.$$
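As a quick numerical illustration (our own addition), $D(f, g)$ for two normal densities can be approximated on a grid; for Gaussians the known closed form $D = \log(\sigma_g/\sigma_f) + \big(\sigma_f^2 + (\mu_f - \mu_g)^2\big)/(2\sigma_g^2) - \tfrac{1}{2}$ provides a check:

import numpy as np
from scipy.stats import norm

mu_f, sd_f, mu_g, sd_g = 0.0, 1.0, 1.0, 2.0
x = np.linspace(-20.0, 20.0, 200_001)
f = norm.pdf(x, mu_f, sd_f)
g = norm.pdf(x, mu_g, sd_g)

# Riemann-sum approximation of the integral of f * log(f / g).
numeric = np.sum(f * np.log(f / g)) * (x[1] - x[0])

# Closed form for two Gaussians, used as a check.
closed = np.log(sd_g / sd_f) + (sd_f**2 + (mu_f - mu_g)**2) / (2 * sd_g**2) - 0.5
print(numeric, closed)  # both approximately 0.443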
5. Equivariance of MLE
Equivariance means that the \textsc{mle} transforms naturally: if $\hat{\theta}_n$ is the \textsc{mle} of $\theta$ and $\tau = g(\theta)$, then $\hat{\tau}_n = g(\hat{\theta}_n)$ is the \textsc{mle} of $\tau$.
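For instance (a routine worked case we add for concreteness): in the $\mathrm{Bernoulli}(\theta)$ model the \textsc{mle} is $\hat{\theta}_n = \bar{X}_n$, so by equivariance the \textsc{mle} of the log-odds $\psi = \log\big(\theta/(1-\theta)\big)$ is $\hat{\psi}_n = \log\big(\hat{\theta}_n/(1-\hat{\theta}_n)\big)$.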
6. Asymptotic Normality of MLE
The distribution of $\hat{\theta}_n$ is asymptotically normal. We need the following definitions to prove it.
Definition 6 Score Function: Let $X$ be a random variable with \textsc{pdf} $f(x; \theta)$. Then the score function is defined as
$$s(X; \theta) = \frac{\partial \log f(X; \theta)}{\partial \theta}.$$
Definition 7 Fisher Information: The Fisher information is defined as
$$I_n(\theta) = \mathbb{V}_\theta\!\left(\sum_{i=1}^{n} s(X_i; \theta)\right) = \sum_{i=1}^{n} \mathbb{V}_\theta\big(s(X_i; \theta)\big).$$
We write $I(\theta) = I_1(\theta)$, so that $I_n(\theta) = n I(\theta)$.
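As a standard worked example (a routine computation we add here): for $X \sim \mathrm{Bernoulli}(\theta)$ we have $f(x; \theta) = \theta^x (1-\theta)^{1-x}$, so
$$s(X; \theta) = \frac{\partial}{\partial \theta}\Big[X \log \theta + (1 - X) \log(1 - \theta)\Big] = \frac{X}{\theta} - \frac{1 - X}{1 - \theta} = \frac{X - \theta}{\theta(1 - \theta)},$$
and since $\mathbb{V}_\theta(X) = \theta(1 - \theta)$,
$$I(\theta) = \mathbb{V}_\theta\big(s(X; \theta)\big) = \frac{\mathbb{V}_\theta(X)}{\theta^2 (1 - \theta)^2} = \frac{1}{\theta(1 - \theta)}, \qquad I_n(\theta) = \frac{n}{\theta(1 - \theta)}.$$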
Theorem 8 $I(\theta) = \mathbb{E}_\theta\big(s^2(X; \theta)\big) = -\mathbb{E}_\theta\!\left(\dfrac{\partial^2 \log f(X; \theta)}{\partial \theta^2}\right)$.
Theorem 9 (Asymptotic Normality of the \textsc{mle}) Let $\mathrm{se} = \sqrt{1 / I_n(\theta)}$. Under appropriate regularity conditions,
$$\frac{\hat{\theta}_n - \theta}{\mathrm{se}} \rightsquigarrow N(0, 1).$$
Theorem 10
Theorem 11
Definition 12 Let
.
Theorem 13
Theorem 14
Theorem 15 Let $C_n = (a, b)$, where $a = a(X_1, \dots, X_n)$ and $b = b(X_1, \dots, X_n)$ are functions of the data such that
$$P_\theta(\theta \in C_n) \ge 1 - \alpha \quad \text{for all } \theta \in \Theta.$$
In words, $C_n$ traps $\theta$ with probability $1 - \alpha$. We call $1 - \alpha$ the coverage of the confidence interval. Note that $C_n$ is random and $\theta$ is fixed. Commonly, people use 95 percent confidence intervals, which corresponds to choosing $\alpha = 0.05$. If $\theta$ is a vector then we use a confidence set (such as a sphere or an ellipse) instead of an interval.
Theorem 16 (Normal-Based Confidence Intervals)
Let $\hat{\mathrm{se}} = \sqrt{1 / I_n(\hat{\theta}_n)}$. Let $\Phi$ be the \textsc{cdf} of a random variable $Z$ with standard normal distribution, let $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$ (so that $P(Z > z_{\alpha/2}) = \alpha/2$), and let
$$C_n = \big(\hat{\theta}_n - z_{\alpha/2}\, \hat{\mathrm{se}}, \;\; \hat{\theta}_n + z_{\alpha/2}\, \hat{\mathrm{se}}\big).$$
Then
$$P_\theta(\theta \in C_n) \to 1 - \alpha.$$
For 95% confidence intervals, $1 - \alpha$ is .95, $\alpha$ is .05, $z_{\alpha/2}$ is 1.96, and the interval is thus $\hat{\theta}_n \pm 1.96\, \hat{\mathrm{se}}$.
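To make this concrete, here is a minimal Python sketch (our own example) for the Bernoulli model, where $\hat{\theta}_n = \bar{X}_n$ and, using the Fisher information computed earlier, $\hat{\mathrm{se}} = \sqrt{\hat{\theta}_n (1 - \hat{\theta}_n) / n}$:

import numpy as np

rng = np.random.default_rng(2)
theta_true = 0.3
n = 1_000
x = rng.binomial(1, theta_true, size=n)

theta_hat = x.mean()                                # MLE of theta for Bernoulli
se_hat = np.sqrt(theta_hat * (1 - theta_hat) / n)   # sqrt(1 / I_n(theta_hat))
z = 1.96                                            # z_{alpha/2} for alpha = 0.05

lo, hi = theta_hat - z * se_hat, theta_hat + z * se_hat
print(f"95% CI for theta: ({lo:.3f}, {hi:.3f})")    # traps 0.3 about 95% of the time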