1. Introduction
We assume that the data we are looking at comes from a probability distribution with some unknown parameters that control the exact shape of the distribution.
Definition 1 Statistical Inference: Statistical inference is the process of using given data to infer the properties of the distribution that generated the data (for example, the values of its parameters). In computer science it is also called ‘learning’.
Definition 2 Statistical Models: A statistical model is a set of distributions.
When we find out the form of the distribution (the equations that describe it) and the values of its parameters, we gain more understanding of the source of our data.
2. Parametric Models
Definition 3 Parametric Models: A parametric model is a statistical model which is parameterized by a finite number of parameters. A general form of a parametric model is
$$\mathfrak{F} = \{ f(x; \theta) : \theta \in \Theta \}$$
where $\theta$ is an unknown parameter (or vector of parameters) that can take values in the parameter space $\Theta$.
Example 1 An example of a parametric model is the set of all Normal densities, parameterized by $\theta = (\mu, \sigma)$:
$$\mathfrak{F} = \left\{ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) : \mu \in \mathbb{R},\ \sigma > 0 \right\}$$
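As a quick illustrative sketch (the data and true parameter values here are simulated, not from the text): for the two-parameter Normal model, the maximum likelihood estimates of $\mu$ and $\sigma$ are the sample mean and the (biased, divide-by-$n$) sample standard deviation.

```python
# Fit the parametric Normal model f(x; mu, sigma) to simulated data.
# theta = (mu, sigma) is a finite parameter vector, so this is a
# parametric model. True parameters below are chosen arbitrarily.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)  # true theta = (5, 2)

mu_hat = data.mean()    # MLE of mu: the sample mean
sigma_hat = data.std()  # MLE of sigma: divides by n, not n - 1
print(mu_hat, sigma_hat)  # close to the true values (5, 2)
```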
3. Non-Parametric Models
Definition 4 Non-Parametric Models: A non-parametric model is one in which $\mathfrak{F}$ cannot be parameterized by a finite number of parameters.
3.1. Non-Parametric Estimation of Functionals
Definition 5 Sobolev Space: Usually, it is not possible to estimate the probability distribution from the data by just assuming that it exists. We need to restrict the space of possible solutions. One way is to assume that the density function is a smooth function, for example one whose second derivative is square integrable, $\int (f''(x))^2 \, dx < \infty$. The restricted space is called a Sobolev space.
Definition 6 Statistical Functional: Any function $T(F)$ of the \textsc{cdf} $F$ is called a statistical functional.
Example 2 Statistical Functionals: The mean, variance and median can be thought of as functions of $F$:
The mean $\mu$ is given as:
$$\mu = T(F) = \int x \, dF(x)$$
The variance is given as:
$$\sigma^2 = T(F) = \int (x - \mu)^2 \, dF(x)$$
The median is given as:
$$m = T(F) = F^{-1}(1/2)$$
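Replacing $F$ with the empirical \textsc{cdf} gives plug-in estimates of these functionals, which for the mean, variance and median reduce to the familiar sample quantities. A minimal sketch, with simulated Exponential(1) data chosen for illustration (true mean 1, variance 1, median $\ln 2$):

```python
# Plug-in estimates of three statistical functionals T(F), computed
# by substituting the empirical CDF for F. Data is simulated.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=50_000)  # Exp(1) sample

mean_hat = np.mean(x)                   # T(F) = integral of x dF(x)
var_hat = np.mean((x - mean_hat) ** 2)  # T(F) = integral of (x - mu)^2 dF(x)
median_hat = np.median(x)               # T(F) = F^{-1}(1/2)
print(mean_hat, var_hat, median_hat)    # near 1, 1, ln 2 = 0.693
```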
4. Regression
Definition 7 Independent and Dependent Variables: We observe pairs of data: $(X_1, Y_1), \ldots, (X_n, Y_n)$. $Y$ is assumed to depend on $X$, which is assumed to be the independent variable. The other names for these are, for:
$X$: predictor, regressor, feature or independent variable.
$Y$: response variable, outcome or dependent variable.
Definition 8 Regression Function: The regression function is
$$r(x) = \mathbb{E}(Y \mid X = x)$$
Definition 9 Parametric and Non-Parametric Regression Models: If we assume that $r \in \mathfrak{F}$ where $\mathfrak{F}$ is finite dimensional, then the model is a parametric regression model; otherwise it is a non-parametric regression model.
There can be three categories of regression, based on the purpose for which it was done:
- Prediction,
- Classification and
- Curve Estimation
Definition 10 Prediction: The goal of predicting $Y$ based on the value of $X$ is called prediction.
Definition 11 Classification: If $Y$ is discrete then prediction is instead called classification.
Definition 12 Curve Estimation: If our goal is to estimate the function $r$, then we call this regression or curve estimation.
The regression function can be algebraically manipulated to express it in the form
$$Y = r(X) + \epsilon$$
where $\mathbb{E}(\epsilon) = 0$.
If $\mathfrak{F}$ is a parametric model, then we write $P_\theta(X \in A)$ to denote the probability that $X$ belongs to $A$. It does not mean that we are averaging over $\theta$; it means that the probability is calculated assuming the parameter is $\theta$.
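A minimal sketch of parametric regression under the model $Y = r(X) + \epsilon$: here the true regression function $r(x) = 2 + 3x$ and the noise level are illustrative assumptions, and the finite-dimensional model is the family of straight lines fit by least squares.

```python
# Simulate data from Y = r(X) + eps with r(x) = 2 + 3x and
# E(eps) = 0, then estimate r within the parametric (linear) model.
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
X = rng.uniform(0, 1, size=n)
eps = rng.normal(0, 0.5, size=n)  # mean-zero noise
Y = 2 + 3 * X + eps               # Y = r(X) + eps

# Least-squares fit; np.polyfit returns (slope, intercept) for deg=1.
slope, intercept = np.polyfit(X, Y, deg=1)
print(intercept, slope)  # close to the true (2, 3)
```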
5. Fundamental Concepts in Inference
Many inferential problems can be identified as being one of three types: estimation, confidence sets, or hypothesis testing.
5.1. Point Estimates
Definition 13 Point Estimation: Point estimation refers to providing a single “best guess” of some quantity of interest. The quantity of interest could be
- a parameter $\theta$ in a parametric model,
- a \textsc{cdf} $F$,
- a probability density function $f$,
- a regression function $r$, or
- a prediction for a future value $Y$ of some random variable.
By convention, we denote a point estimate of $\theta$ by $\hat{\theta}$ or $\hat{\theta}_n$. Since $\theta$ is a fixed, unknown quantity, while the estimate $\hat{\theta}$ depends on the data, $\hat{\theta}$ is a random variable.
Definition 14 Point Estimator of $\theta$: Formally, let $X_1, \ldots, X_n$ be $n$ \textsc{iid} data points from some distribution $F$. Then, a point estimator $\hat{\theta}_n$ of $\theta$ is some function of $X_1, \ldots, X_n$:
$$\hat{\theta}_n = g(X_1, \ldots, X_n)$$
Definition 15 Bias of an Estimator: The bias of an estimator is defined as:
$$\mathsf{bias}(\hat{\theta}_n) = \mathbb{E}_\theta(\hat{\theta}_n) - \theta$$
Definition 16 Consistent Estimator: A point estimator $\hat{\theta}_n$ of $\theta$ is consistent if $\hat{\theta}_n \xrightarrow{P} \theta$.
Definition 17 Sampling Distribution: The distribution of $\hat{\theta}_n$ is called the sampling distribution.
Definition 18 Standard Error: The standard deviation of the sampling distribution is called the standard error, denoted by \textsf{se}:
$$\textsf{se} = \textsf{se}(\hat{\theta}_n) = \sqrt{\mathbb{V}(\hat{\theta}_n)}$$
In some cases, \textsf{se} depends upon the unknown distribution $F$. Its estimate is denoted by $\widehat{\textsf{se}}$.
Definition 19 Mean Squared Error: It is used to evaluate the quality of a point estimator. It is defined as
$$\textsc{mse} = \mathbb{E}_\theta(\hat{\theta}_n - \theta)^2$$
Example 3 Let $X_1, \ldots, X_n$ be \textsc{iid} random variables with Bernoulli($p$) distribution. Then $\hat{p}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then, $\mathbb{E}(\hat{p}_n) = \frac{1}{n} \sum_{i=1}^n \mathbb{E}(X_i) = p$. Hence, $\hat{p}_n$ is unbiased.
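This can be checked by simulation: drawing many independent Bernoulli datasets and averaging $\hat{p}_n$ across them approximates $\mathbb{E}(\hat{p}_n)$, while the spread of $\hat{p}_n$ across datasets approximates its standard error $\sqrt{p(1-p)/n}$. The values of $p$, $n$ and the repetition count below are arbitrary choices for illustration.

```python
# Simulation check that p_hat = (1/n) * sum(X_i) is unbiased for
# Bernoulli(p), and that its sampling std matches sqrt(p(1-p)/n).
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 0.3, 100, 20_000

samples = rng.binomial(1, p, size=(reps, n))  # reps iid datasets
p_hats = samples.mean(axis=1)                 # one p_hat per dataset

print(p_hats.mean())  # near p = 0.3 (unbiasedness)
print(p_hats.std())   # near sqrt(0.3 * 0.7 / 100), about 0.046
```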
Definition 20 Asymptotically Normal Estimator: An estimator is asymptotically normal if
$$\frac{\hat{\theta}_n - \theta}{\textsf{se}} \rightsquigarrow N(0, 1)$$
5.2. Confidence Sets
Definition 21 A $1 - \alpha$ confidence interval for a parameter $\theta$ is an interval $C_n = (a, b)$ (where $a = a(X_1, \ldots, X_n)$ and $b = b(X_1, \ldots, X_n)$ are functions of the data), such that
$$\mathbb{P}_\theta(\theta \in C_n) \geq 1 - \alpha \quad \text{for all } \theta \in \Theta$$
In words, $C_n$ traps $\theta$ with probability $1 - \alpha$. We call $1 - \alpha$ the coverage of the confidence interval. $C_n$ is random and $\theta$ is fixed. Commonly, people use 95 percent confidence intervals, which corresponds to choosing $\alpha = 0.05$. If $\theta$ is a vector then we use a confidence set (such as a sphere or an ellipse) instead of an interval.
Theorem 22 (Normal Based Confidence Intervals)
Let $\hat{\theta}_n \approx N(\theta, \widehat{\textsf{se}}^2)$. Let $\Phi$ be the \textsc{cdf} of a random variable $Z$ with standard normal distribution, let
$$z_{\alpha/2} = \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right), \quad \text{that is, } \mathbb{P}(Z > z_{\alpha/2}) = \frac{\alpha}{2},$$
and let
$$C_n = \left( \hat{\theta}_n - z_{\alpha/2}\,\widehat{\textsf{se}},\ \hat{\theta}_n + z_{\alpha/2}\,\widehat{\textsf{se}} \right)$$
Then,
$$\mathbb{P}_\theta(\theta \in C_n) \to 1 - \alpha$$
For 95% confidence intervals, $1 - \alpha$ is .95, $\alpha$ is .05, $z_{\alpha/2}$ is 1.96 and the interval is thus $C_n = (\hat{\theta}_n - 1.96\,\widehat{\textsf{se}},\ \hat{\theta}_n + 1.96\,\widehat{\textsf{se}})$.
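A minimal simulation sketch of the coverage claim, using the Bernoulli case where $\hat{p}_n$ is asymptotically normal with $\widehat{\textsf{se}} = \sqrt{\hat{p}_n(1-\hat{p}_n)/n}$; the choices of $p$, $n$ and the number of repetitions are illustrative assumptions.

```python
# Coverage check for the normal-based 95% interval
# C_n = (p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat).
# Over many simulated datasets, C_n should trap the true p
# roughly 95% of the time.
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.4, 500, 10_000

samples = rng.binomial(1, p, size=(reps, n))
p_hat = samples.mean(axis=1)
se_hat = np.sqrt(p_hat * (1 - p_hat) / n)

covered = (p_hat - 1.96 * se_hat <= p) & (p <= p_hat + 1.96 * se_hat)
print(covered.mean())  # near 0.95
```

Note that each interval either contains $p$ or it does not; the 95% refers to the long-run frequency across repeated samples, matching the statement that $C_n$ is random while $\theta$ is fixed.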
5.3. Hypothesis Testing
In hypothesis testing, we start with some default theory – called a null hypothesis – and we ask if the data provide sufficient evidence to reject the theory. If not, we retain the null hypothesis.