Introduction
In this note, we look into the problem of estimating statistics such as the mean and the quantiles of an unknown distribution from a sample drawn from it.
More concretely, consider a univariate continuous random variable \(X\) following an unknown distribution with probability density function (pdf) \(p(x)\). Given a sample \(\{x_i\}_{i=1}^n\) drawn i.i.d. from this distribution, estimate:
- The distribution mean (assumed to exist) \(\mu = \mathbb{E}(X)\)
- The distribution quantiles (e.g., the median): for \(\tau \in (0,1)\), roughly speaking, the smallest value \(q_{\tau}\) such that \(\textrm{Prob}(X \leq q_\tau) = \tau\).
Disclaimer: I've tried to keep all the math below valid yet as simple as possible. Feel free to correct me if you find anything horribly wrong.
The mean
For an arbitrary \(m \in \mathbb{R}\), denote \(\mathcal{L}(m) = \int (x-m)^2~p(x)~dx\).
Claim 1: \(m^{*} = \underset{m}{\textrm{argmin}}~\mathcal{L}(m) = \mu\).
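Proof sketch: expanding the square around \(\mu\) and using \(\int (x-\mu)~p(x)~dx = 0\),
\[
\mathcal{L}(m) = \int (x-\mu)^2~p(x)~dx + 2(\mu - m)\int (x-\mu)~p(x)~dx + (\mu - m)^2 = \int (x-\mu)^2~p(x)~dx + (\mu - m)^2,
\]
where the first term does not depend on \(m\). Hence \(\mathcal{L}(m)\) is minimized exactly when \((\mu - m)^2 = 0\), i.e., \(m^{*} = \mu\).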
Sample estimate
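Since \(p(x)\) is unknown, we instead minimize a sample approximation of \(\mathcal{L}(m)\), replacing the integral with an average over the given sample, and obtain an estimate of the mean:
\[
\hat{\mu} = \underset{m}{\textrm{argmin}}~\frac{1}{n}\sum_{i=1}^{n} (x_i - m)^2 = \frac{1}{n}\sum_{i=1}^{n} x_i,
\]
i.e., the sample mean.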
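The closed form can be checked numerically. Below is a minimal sketch in Python, assuming illustrative data drawn from an Exponential(1) distribution with the standard `random` module; the helper `squared_loss` is hypothetical and not part of the original note. It relies on the identity \(\frac{1}{n}\sum_i (x_i - \hat{\mu} - \delta)^2 = \frac{1}{n}\sum_i (x_i - \hat{\mu})^2 + \delta^2\), the sample analogue of the proof sketch above.

```python
import random
import statistics


def squared_loss(m, sample):
    """Sample approximation of L(m): the average of (x_i - m)^2."""
    return sum((x - m) ** 2 for x in sample) / len(sample)


random.seed(0)
# Illustrative data only: 1,000 i.i.d. draws from Exponential(1) (true mean 1).
sample = [random.expovariate(1.0) for _ in range(1000)]

mu_hat = statistics.fmean(sample)  # closed-form minimizer: the sample mean

# Moving m away from the sample mean in either direction increases the loss
# (by exactly delta**2, up to floating-point rounding).
for delta in (-0.1, -0.01, 0.01, 0.1):
    assert squared_loss(mu_hat + delta, sample) > squared_loss(mu_hat, sample)

print(f"sample mean = {mu_hat:.4f}")
```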
The median
Let \(F(x)\) denote the cumulative distribution, i.e., \(F(x) = \textrm{Prob}(X \leq x)\).
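A median \(q_{0.5}\) is, roughly speaking, a point that divides the distribution in half, i.e.,
\[
\int_{-\infty}^{q_{0.5}} p(x)~dx = \int_{q_{0.5}}^{\infty} p(x)~dx = \frac{1}{2}, \quad\text{i.e.,}\quad F(q_{0.5}) = \frac{1}{2}.
\]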
For an arbitrary \(m \in \mathbb{R}\), denote
\[
\mathcal{L}_{0.5}(m) = \int |x - m|~p(x)~dx.
\]
Claim 2: \(m^{*} = \underset{m}{\textrm{argmin}}~\mathcal{L}_{0.5}(m) = q_{0.5}\).
See the next section for a proof sketch of a general case of any quantile \(\tau \in (0,1)\).
Sample estimate
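Similar to the mean estimate, we can obtain an estimate of the median from the given sample,
\[
\hat{q}_{0.5} = \underset{m}{\textrm{argmin}}~\frac{1}{n}\sum_{i=1}^{n} |x_i - m|,
\]
which is a sample median.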
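A minimal Python sketch of this estimate, assuming illustrative Exponential(1) data (true median \(\ln 2 \approx 0.693\)); the helper `abs_loss` is hypothetical. Because the average absolute loss is convex and piecewise linear in \(m\) with kinks at the sample points, searching over the sample points is enough, and with an odd sample size the minimizer should coincide with the middle order statistic returned by `statistics.median`.

```python
import random
import statistics


def abs_loss(m, sample):
    """Sample approximation of L_0.5(m): the average of |x_i - m|."""
    return sum(abs(x - m) for x in sample) / len(sample)


random.seed(1)
# Illustrative data only: 1,001 draws from Exponential(1), true median about 0.693.
# An odd sample size makes the sample median unique.
sample = [random.expovariate(1.0) for _ in range(1001)]

# The loss is convex and piecewise linear with kinks at the sample points,
# so a minimizer can be found among the sample points themselves.
m_hat = min(sample, key=lambda m: abs_loss(m, sample))

print(m_hat, statistics.median(sample))  # the two values should coincide
```

The brute-force search costs \(O(n^2)\); it is meant to illustrate the argmin formulation, not to replace sorting.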
The \(\tau\)-quantile
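For an arbitrary \(m \in \mathbb{R}\), denote
\[
\mathcal{L}_{\tau}(m) = \int (x - m)(\tau - 1_{x < m})~p(x)~dx,
\]
where
\[
1_{x < m} = \begin{cases} 1 & \text{if } x < m, \\ 0 & \text{otherwise.} \end{cases}
\]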
Note that \((x-m)(\tau - 1_{x < m})\) is exactly the quantile loss (also called the pinball loss) commonly used in quantile regression.
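To make the asymmetry concrete, here is a small Python sketch of the per-observation loss; the function name `quantile_loss` is our own choice for illustration. A point above \(m\) is charged \(\tau\) per unit of distance and a point below \(m\) is charged \(1 - \tau\) per unit, and at \(\tau = 0.5\) the loss reduces to half the absolute error used for the median.

```python
def quantile_loss(x: float, m: float, tau: float) -> float:
    """Quantile loss (x - m) * (tau - 1_{x < m}) for a single observation x."""
    indicator = 1.0 if x < m else 0.0
    return (x - m) * (tau - indicator)


# Asymmetric penalty around m = 1 with tau = 0.9: a point above m costs tau per unit
# of distance, a point below m costs (1 - tau) per unit.
for x in (-1.0, 1.0, 3.0):
    print(f"x={x}: loss={quantile_loss(x, 1.0, 0.9):.2f}")

# With tau = 0.5 the loss is half the absolute error |x - m| used for the median.
for x in (-1.0, 1.0, 3.0):
    assert quantile_loss(x, 1.0, 0.5) == 0.5 * abs(x - 1.0)
```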
Claim 3: \(m^{*} = \underset{m}{\textrm{argmin}}~\mathcal{L}_\tau (m) = q_{\tau}\).
Derivation of the quantile loss
Curious readers may wonder how one arrives at the above formula for \(\mathcal{L}_{\tau}(m)\). Let's reverse-engineer the quantile loss.
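Given arbitrary \(c_1, c_2 > 0\), consider a generalized form of the median loss \(\mathcal{L}_{0.5}\),
\[
\mathcal{L}_{c_1, c_2}(m) = c_1 \int_{m}^{\infty} (x - m)~p(x)~dx + c_2 \int_{-\infty}^{m} (m - x)~p(x)~dx,
\]
which reduces to \(\mathcal{L}_{0.5}\) when \(c_1 = c_2 = 1\).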
Intuitively, we assign different costs to \(x\) depending on whether it lies above or below \(m\). The problem is: for a given \(\tau \in (0,1)\), find \(c_1, c_2\) such that
\[
q_\tau = \underset{m}{\textrm{argmin}}~\mathcal{L}_{c_1, c_2}(m).
\]
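Make the change of variables \(u = F(x)\); then \(du = dF(x) = p(x)~dx\), \(x = F^{-1}(u)\), and \(F(m) = v\) for some \(v \in [0,1]\). We slightly abuse notation here, writing \(\mathcal{L}_{c_1, c_2}(v)\) for the loss after the change of variables to avoid introducing new symbols:
\[
\mathcal{L}_{c_1, c_2}(v) = c_1 \int_{v}^{1} \left(F^{-1}(u) - F^{-1}(v)\right) du + c_2 \int_{0}^{v} \left(F^{-1}(v) - F^{-1}(u)\right) du.
\]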
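From the Leibniz integral rule and the product rule (the boundary terms \(\pm c_1 F^{-1}(v)\) and \(\pm c_2 F^{-1}(v)\) cancel),
\[
\frac{d}{dv}\mathcal{L}_{c_1, c_2}(v) = -c_1 (1 - v) \frac{d}{dv}F^{-1}(v) + c_2\, v\, \frac{d}{dv}F^{-1}(v) = \frac{d}{dv}F^{-1}(v) \left( (c_1 + c_2) v - c_1 \right). \qquad (*)
\]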
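For \(q_\tau\) to be the minimizer of \(\mathcal{L}_{c_1, c_2}(m)\), or equivalently for \(\tau\) to be the minimizer of \(\mathcal{L}_{c_1, c_2}(v)\), \(\tau\) must be a critical point:
\[
\left.\frac{d}{dv}\mathcal{L}_{c_1, c_2}(v)\right|_{v=\tau} = \left.\frac{d}{dv}F^{-1}(v)\right|_{v=\tau} \left( (c_1 + c_2)\tau - c_1 \right) = 0,
\]
and since \(\left.\frac{d}{dv}F^{-1}(v)\right|_{v=\tau} = 1/p(q_\tau) \neq 0\) whenever \(p(q_\tau) > 0\), this essentially implies
\[
(c_1 + c_2)\tau = c_1, \quad\text{i.e.,}\quad c_2 \tau = c_1 (1 - \tau).
\]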
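Note that the loss minimizer does not change if we scale \(c_1, c_2\) by the same positive factor, so without loss of generality we can set \(c_1 = 1\). Solving for \(c_2\) then gives \(c_2 = \frac{1-\tau}{\tau}\). Substituting \(c_1 = 1, c_2 = \frac{1-\tau}{\tau}\) into the original loss gives
\[
\mathcal{L}_{1, \frac{1-\tau}{\tau}}(m) = \int_{m}^{\infty} (x - m)~p(x)~dx + \frac{1-\tau}{\tau} \int_{-\infty}^{m} (m - x)~p(x)~dx.
\]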
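Since \(\tau > 0\), multiplying by \(\tau\) does not change the minimizer, so minimizing \(\mathcal{L}_{c_1, c_2}(m)\) is equivalent to minimizing
\[
\tau \int_{m}^{\infty} (x - m)~p(x)~dx + (1 - \tau) \int_{-\infty}^{m} (m - x)~p(x)~dx,
\]
which is the same as
\[
\int (x - m)(\tau - 1_{x < m})~p(x)~dx = \mathcal{L}_{\tau}(m).
\]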
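The derivative \(\frac{d}{dv}\mathcal{L}_{c_1, c_2}(v) = \frac{d}{dv}F^{-1}(v)\left((c_1 + c_2)v - c_1\right)\) can be sanity-checked numerically. The Python sketch below assumes, purely for illustration, that \(X \sim \textrm{Exponential}(1)\), so \(F^{-1}(v) = -\ln(1-v)\), \(\frac{d}{dv}F^{-1}(v) = \frac{1}{1-v}\), and \(\int F^{-1}(u)~du\) has the closed form used in the hypothetical helper `integral_quantile`. It compares central finite differences with the formula and then grid-searches for the minimizer with \(c_1 = 1, c_2 = \frac{1-\tau}{\tau}\), which should land at \(v = \tau\).

```python
import math


# Assumed example distribution: Exponential(1), with F^{-1}(v) = -ln(1 - v).
def quantile_fn(v: float) -> float:
    return -math.log(1.0 - v)


def integral_quantile(v: float) -> float:
    """Antiderivative G of F^{-1}: G(v) = (1 - v) ln(1 - v) - (1 - v), so G(0) = -1, G(1) = 0."""
    return (1.0 - v) * math.log(1.0 - v) - (1.0 - v)


def loss_v(v: float, c1: float, c2: float) -> float:
    """L_{c1,c2}(v) after the change of variables u = F(x), v = F(m)."""
    q = quantile_fn(v)
    upper = -integral_quantile(v) - (1.0 - v) * q   # int_v^1 (F^{-1}(u) - F^{-1}(v)) du
    lower = v * q - (integral_quantile(v) + 1.0)    # int_0^v (F^{-1}(v) - F^{-1}(u)) du
    return c1 * upper + c2 * lower


def derivative_formula(v: float, c1: float, c2: float) -> float:
    """(F^{-1})'(v) * ((c1 + c2) v - c1), with (F^{-1})'(v) = 1 / (1 - v) here."""
    return ((c1 + c2) * v - c1) / (1.0 - v)


tau = 0.25
c1, c2 = 1.0, (1.0 - tau) / tau  # c1 = 1, c2 = (1 - tau) / tau = 3

# Central finite differences should agree with the closed-form derivative.
h = 1e-6
for v in (0.1, 0.25, 0.5, 0.9):
    numeric = (loss_v(v + h, c1, c2) - loss_v(v - h, c1, c2)) / (2.0 * h)
    print(f"v={v}: finite difference {numeric:.6f}, formula {derivative_formula(v, c1, c2):.6f}")

# A grid search over v in (0, 1) should land at (or next to) v = tau.
grid = [i / 10_000 for i in range(1, 10_000)]
v_star = min(grid, key=lambda v: loss_v(v, c1, c2))
print(f"minimizer v* = {v_star}, tau = {tau}")
```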
Why is \(q_\tau\) the minimizer of \(\mathcal{L}_{\tau}(m)\)?
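As shown above, \(\tau\) is a critical point of \(\mathcal{L}_{\tau}(v)\),
\[
\left.\frac{d}{dv}\mathcal{L}_{\tau}(v)\right|_{v=\tau} = 0,
\]
but a critical point is not necessarily a minimizer, so we still need to check the sign of the derivative on either side of \(\tau\).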
From \((*)\) with \(c_1 = \tau, c_2 = 1 - \tau\), we have \(\frac{d}{dv}\mathcal{L}_{\tau}(v) = \frac{d}{dv}F^{-1}(v) (v - \tau)\). Since \(F^{-1}\) is non-decreasing, i.e., \(F^{-1}(v_1) \leq F^{-1}(v_2)\) for any \(0 \leq v_1 \leq v_2 \leq 1\), we have \(\frac{d}{dv}F^{-1}(v) \geq 0\). Thus \(\frac{d}{dv}\mathcal{L}_{\tau}(v)\) is either zero or has the same sign as \(v - \tau\):
- For any \(v \geq \tau\), \(\frac{d}{dv}\mathcal{L}_{\tau}(v) \geq 0\), so \(\mathcal{L}_{\tau}\) is non-decreasing on \([\tau, 1]\); hence \(\mathcal{L}_{\tau}(v) \geq \mathcal{L}_{\tau}(\tau)\).
- For any \(v \leq \tau\), \(\frac{d}{dv}\mathcal{L}_{\tau}(v) \leq 0\), so \(\mathcal{L}_{\tau}\) is non-increasing on \([0, \tau]\); hence \(\mathcal{L}_{\tau}(v) \geq \mathcal{L}_{\tau}(\tau)\).
Therefore \(\mathcal{L}_{\tau}(v) \geq \mathcal{L}_{\tau}(\tau)\) for any \(v \in [0,1]\), so \(\tau\) is a minimizer of \(\mathcal{L}_{\tau}(v)\), or equivalently \(q_\tau\) is a minimizer of \(\mathcal{L}_{\tau}(m)\).
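As a worked example of the sign argument, take the simplest case, assumed here only for illustration: \(X \sim \textrm{Uniform}(0,1)\), so \(F^{-1}(v) = v\) and \(\frac{d}{dv}F^{-1}(v) = 1\). With \(c_1 = \tau, c_2 = 1 - \tau\),
\[
\mathcal{L}_{\tau}(v) = \tau \int_{v}^{1} (u - v)~du + (1-\tau) \int_{0}^{v} (v - u)~du = \frac{\tau (1-v)^2}{2} + \frac{(1-\tau) v^2}{2},
\]
\[
\frac{d}{dv}\mathcal{L}_{\tau}(v) = -\tau (1 - v) + (1 - \tau) v = v - \tau,
\]
which is negative for \(v < \tau\), zero at \(v = \tau\), and positive for \(v > \tau\). The minimizer is therefore \(v = \tau\), i.e., \(q_\tau = F^{-1}(\tau) = \tau\), as expected for the uniform distribution.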
Sample estimate
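Finally, we can obtain an estimate of the quantile as:
\[
\hat{q}_{\tau} = \underset{m}{\textrm{argmin}}~\frac{1}{n}\sum_{i=1}^{n} (x_i - m)(\tau - 1_{x_i < m}).
\]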
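A minimal Python sketch of this estimator, assuming illustrative Exponential(1) data (true \(\tau\)-quantile \(-\ln(1-\tau)\)); the helpers `quantile_loss` and `estimate_quantile` are hypothetical. As with the median, the average loss is convex and piecewise linear with kinks at the sample points, so the search is restricted to them. Comparing left and right slopes at the order statistics shows that when \(n\tau\) is not an integer the minimizer is unique and equals the \(\lceil n\tau \rceil\)-th smallest sample point; the sketch picks \(n = 501\), \(\tau = 0.9\) so that this holds.

```python
import math
import random


def quantile_loss(x, m, tau):
    """Quantile loss (x - m) * (tau - 1_{x < m}) for a single observation."""
    return (x - m) * (tau - (1.0 if x < m else 0.0))


def estimate_quantile(sample, tau):
    """Minimize the average quantile loss over the sample points (O(n^2) brute force)."""
    n = len(sample)
    return min(sample, key=lambda m: sum(quantile_loss(x, m, tau) for x in sample) / n)


random.seed(2)
# Illustrative data only: 501 draws from Exponential(1), true tau-quantile -ln(1 - tau).
sample = [random.expovariate(1.0) for _ in range(501)]
tau = 0.9

q_hat = estimate_quantile(sample, tau)

# When n * tau is not an integer, the minimizer is the ceil(n * tau)-th smallest point.
k = math.ceil(len(sample) * tau)
print(q_hat, sorted(sample)[k - 1], -math.log(1.0 - tau))
```

When \(n\tau\) is an integer, the empirical loss is flat between two adjacent order statistics and any point in between is a minimizer, which is one reason different software defines sample quantiles slightly differently.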