Introduction

In this note, we look at the problem of estimating statistics of an unknown distribution, such as its mean and quantiles, from a sample.

More concretely, consider a univariate continuous random variable $X$ following an unknown distribution with probability density function (pdf) $p(x)$. Given a sample $\{x_i\}_{i=1}^n$ drawn i.i.d. from this distribution, we want to estimate:

The distribution mean (assuming it exists), $\mu = \mathbb{E}(X)$.
The distribution quantiles (e.g., the median), i.e., for a given $\tau \in (0,1)$, roughly speaking the smallest value $q_{\tau}$ such that $\textrm{Prob}(X \leq q_\tau) = \tau$.

Disclaimer: I've tried to keep all the math below valid yet as simple as possible. Feel free to correct me if you find anything horribly wrong.

The mean

For an arbitrary $m \in \mathbb{R}$, denote $\mathcal{L}(m) = \int (x-m)^2~p(x)~dx$.

Claim 1: $m^{*} = \underset{m}{\textrm{argmin}}~\mathcal{L}(m) = \mu$.

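Proof sketch (expanding the square, using $\int p(x)~dx = 1$, and adding or dropping terms that do not depend on $m$, since they leave the minimizer unchanged):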

$$ \begin{array}{rcl} \underset{m}{\textrm{argmin}}~\mathcal{L}(m) &=& \underset{m}{\textrm{argmin}} \left [\int m^2~p(x)~dx - \int 2mx~p(x)~dx \right ] \\ &=& \underset{m}{\textrm{argmin}} \left [m^2 - 2m\int x~p(x)~dx \right] \\ &=& \underset{m}{\textrm{argmin}}~\left (m - \int x~p(x)~dx\right )^2 = \int x~p(x)~dx = \mu \end{array} $$

Sample estimate

Since $p(x)$ is unknown, we instead minimize an approximation of $\mathcal{L}(m)$ computed from the given sample,

$$\hat{\mathcal{L}}(m) = \frac{1}{n} \sum_{i=1}^n (x_i - m)^2$$

and obtain an estimate of the mean:

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^n~x_i = \underset{m}{\textrm{argmin}}~\hat{\mathcal{L}}(m)$$
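The claim that the sample mean minimizes $\hat{\mathcal{L}}(m)$ can be checked numerically. The following is a minimal sketch using only the Python standard library; the sample values and the helper name `squared_loss` are made up for illustration.

```python
from statistics import fmean


def squared_loss(xs, m):
    """Empirical squared loss: (1/n) * sum_i (x_i - m)^2."""
    return sum((x - m) ** 2 for x in xs) / len(xs)


xs = [2.0, 3.5, 1.0, 7.5, 4.0]  # hypothetical sample
mu_hat = fmean(xs)              # sample mean

# Moving m away from the sample mean, in either direction, increases the loss.
for delta in (0.5, 0.1, 0.01):
    assert squared_loss(xs, mu_hat) < squared_loss(xs, mu_hat - delta)
    assert squared_loss(xs, mu_hat) < squared_loss(xs, mu_hat + delta)
```

This works because $\hat{\mathcal{L}}(m) = \hat{\mathcal{L}}(\hat{\mu}) + (m - \hat{\mu})^2$, so any shift away from $\hat{\mu}$ adds a strictly positive term.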

The median

Let $F(x)$ denote the cumulative distribution, i.e., $F(x) = \textrm{Prob}(X \leq x)$.

A median, $q_{0.5}$, is roughly the point that divides the distribution in half, i.e.,

$$F(q_{0.5}) = 0.5$$

For an arbitrary $m \in \mathbb{R}$, denote

$$\mathcal{L}_{0.5} (m) = \int |x-m|~p(x)~dx$$

Claim 2: $m^{*} = \underset{m}{\textrm{argmin}}~\mathcal{L}_{0.5}(m) = q_{0.5}$.

See the next section for a proof sketch of the general case of the $\tau$-quantile for any $\tau \in (0,1)$.

Sample estimate

Similar to the mean estimate, we can obtain an estimate of the median from the given sample,

$$\hat{q}_{0.5} = \underset{m}{\textrm{argmin}}~\frac{1}{n} \sum_{i=1}^n |x_i-m| = \textrm{median}(\{x_i\}_{i=1}^n)$$
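As a quick check of this identity, here is a minimal standard-library sketch. The sample values and the helper name `absolute_loss` are hypothetical. It relies on the fact that the empirical absolute loss is piecewise linear and convex in $m$ with kinks only at the sample points, so a minimizer can be found by searching over the sample itself.

```python
from statistics import median


def absolute_loss(xs, m):
    """Empirical absolute loss: (1/n) * sum_i |x_i - m|."""
    return sum(abs(x - m) for x in xs) / len(xs)


xs = [2.0, 3.5, 1.0, 7.5, 4.0]  # hypothetical sample with odd n

# Search the sample points for the one with the smallest loss.
best = min(xs, key=lambda m: absolute_loss(xs, m))

assert best == median(xs)
```

Note that for an even sample size the minimizer is not unique: every point between the two middle order statistics attains the same minimal loss, and the usual sample median (their midpoint) is one such point.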

The $\tau$-quantile

For an arbitrary $m \in \mathbb{R}$, denote

$$\mathcal{L}_{\tau}(m) = \int (x-m)(\tau - 1_{x < m})~p(x)~dx$$

where

$$ 1_{x < m} = \left\{\begin{array}{ll} 1 & \text{if}~x < m \\ 0 & \text{otherwise} \end{array}\right. $$

Note that $(x-m)(\tau - 1_{x < m})$ is nothing other than the quantile loss often used in quantile regression.

Claim 3: $m^{*} = \underset{m}{\textrm{argmin}}~\mathcal{L}_\tau (m) = q_{\tau}$
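To make the asymmetry of the integrand $(x-m)(\tau - 1_{x < m})$ concrete, here is a small sketch of the per-observation loss. The function name `quantile_loss` and the numbers are chosen for illustration only.

```python
def quantile_loss(x, m, tau):
    """Per-observation quantile loss: (x - m) * (tau - 1[x < m])."""
    indicator = 1.0 if x < m else 0.0
    return (x - m) * (tau - indicator)


tau = 0.9
under = quantile_loss(5.0, 3.0, tau)  # m is 2 below x: cost is tau * 2
over = quantile_loss(1.0, 3.0, tau)   # m is 2 above x: cost is (1 - tau) * 2

# For tau = 0.9, undershooting x is nine times as costly as overshooting it.
assert abs(under - 1.8) < 1e-12
assert abs(over - 0.2) < 1e-12

# For tau = 0.5, the loss reduces to half the absolute error.
assert abs(quantile_loss(1.0, 3.0, 0.5) - 0.5 * abs(1.0 - 3.0)) < 1e-12
```

With $\tau = 0.5$ the loss is $\frac{1}{2}|x - m|$, which has the same minimizer as the median loss $\mathcal{L}_{0.5}$ above.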

Derivation of the quantile loss

Curious readers may wonder where the above formula for $\mathcal{L}_{\tau}(m)$ comes from. Let's reverse-engineer the quantile loss.

Given arbitrary $c_1, c_2 > 0$, consider a generalized form of the median loss $\mathcal{L}_{0.5}$,

$$\mathcal{L}_{c_1, c_2}(m) = c_1 \underset{x \geq m}{\int} (x-m)~p(x)~dx + c_2 \underset{x < m}{\int} (m-x)~p(x)~dx$$

Intuitively, we assign different costs to $x$ depending on whether it lies above or below $m$. The problem is: for a given $\tau \in (0,1)$, find $c_1, c_2$ such that

$$\underset{m}{\textrm{argmin}}~\mathcal{L}_{c_1, c_2}(m) = q_\tau$$

Assume $F$ is strictly increasing, so that $F^{-1}$ exists. Apply the change of variables $u=F(x)$, so that $du = dF(x) = p(x)~dx$ and $x = F^{-1}(u)$, and let $v = F(m) \in [0,1]$, so that $m = F^{-1}(v)$. Note that we slightly abuse notation here by writing $\mathcal{L}_{c_1, c_2}(v)$ for the loss after the change of variables, to avoid introducing a new symbol.

$$ \begin{array}{lcl} \mathcal{L}_{c_1, c_2}(v) &=& c_1\int_v^1(F^{-1}(u)-F^{-1}(v))~du + c_2\int_0^v(F^{-1}(v)-F^{-1}(u))~du \\ &=& c_1\int_v^1 F^{-1}(u)~du - c_1 F^{-1}(v) (1-v) + c_2 F^{-1}(v) v - c_2\int_0^v F^{-1}(u)~du \end{array} $$

From the Leibniz integral rule and the product rule,

$$ \begin{array}{lcl} \frac{d}{dv}\mathcal{L}_{c_1, c_2}(v) & = & -c_1 F^{-1}(v) + c_1 F^{-1}(v) - c_1 (1-v)\frac{d}{dv}F^{-1}(v) \\ & & +~c_2 F^{-1}(v) + c_2 v\frac{d}{dv}F^{-1}(v) - c_2 F^{-1}(v) \\ & = & \frac{d}{dv}F^{-1}(v) (-c_1 + c_1 v + c_2 v)~~(*) \end{array} $$

For $q_\tau$ to be the minimizer of $\mathcal{L}_{c_1, c_2}(m)$, or equivalently, for $\tau$ to be the minimizer of $\mathcal{L}_{c_1, c_2}(v)$, we need

$$\frac{d}{dv}\mathcal{L}_{c_1, c_2}(v)_{\mid v=\tau} = 0,$$

and since $\frac{d}{dv}F^{-1}(v) = 1 / p(F^{-1}(v)) \neq 0$ wherever the density is positive and finite, this implies

$$c_1 \tau + c_2 \tau - c_1 = 0$$

Note that the loss minimizer doesn't change if we scale $c_1, c_2$ by the same positive factor, so without loss of generality we can set $c_1 = 1$. Solving for $c_2$, we obtain $c_2 = \frac{1-\tau}{\tau}$. Substitute $c_1 = 1, c_2 = \frac{1-\tau}{\tau}$ into the original loss:

$$\mathcal{L}_{c_1, c_2}(m) = \underset{x \geq m}{\int} (x-m)~p(x)~dx + \frac{1-\tau}{\tau}\underset{x < m}{\int} (m-x)~p(x)~dx$$

Since $\tau > 0$, minimizing $\mathcal{L}_{c_1, c_2}(m)$ is equivalent to minimizing the following:

$$\tau\underset{x \geq m}{\int} (x-m)~p(x)~dx + (1-\tau)\underset{x < m}{\int} (m-x)~p(x)~dx$$

which is the same as

$$\mathcal{L}_{\tau}(m) = \int (x-m)(\tau - 1_{x < m})~p(x)~dx$$
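As a sanity check on this formula, here is a worked case where everything is available in closed form. The choice $X \sim \textrm{Uniform}(0,1)$ is an assumption made purely for illustration; in that case $p(x) = 1$ on $[0,1]$ and $q_\tau = \tau$. For $m \in [0,1]$,

$$ \begin{array}{rcl} \mathcal{L}_{\tau}(m) &=& \tau\int_m^1 (x-m)~dx + (1-\tau)\int_0^m (m-x)~dx \\ &=& \frac{\tau}{2}(1-m)^2 + \frac{1-\tau}{2}m^2 \\ \frac{d}{dm}\mathcal{L}_{\tau}(m) &=& -\tau(1-m) + (1-\tau)m = m - \tau \end{array} $$

so the only critical point is $m = \tau = q_\tau$, and since the second derivative equals $1 > 0$, it is a minimum.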

Why is $q_\tau$ the minimizer of $\mathcal{L}_{\tau}(m)$?

As shown above, $\tau$ is a critical point of $\mathcal{L}_{\tau}(v)$,

$$\frac{d}{dv}\mathcal{L}_{\tau}(v)_{\mid v=\tau} = 0$$

From $(*)$ with $c_1 = \tau, c_2 = 1 - \tau$, we have $\frac{d}{dv}\mathcal{L}_{\tau}(v) = \frac{d}{dv}F^{-1}(v) (v - \tau)$. Since $F^{-1}$ is a non-decreasing function, i.e., $F^{-1}(v_1) \leq F^{-1}(v_2)$ for any $0 \leq v_1 \leq v_2 \leq 1$, we have $\frac{d}{dv}F^{-1}(v) \geq 0$. Thus, $\frac{d}{dv}\mathcal{L}_{\tau}(v)$ is either zero or has the same sign as $v - \tau$:

For any $v \geq \tau$, $\frac{d}{dv}\mathcal{L}_{\tau}(v) \geq 0$; hence $\mathcal{L}_{\tau}(v) \geq \mathcal{L}_{\tau}(\tau)$.
For any $v \leq \tau$, $\frac{d}{dv}\mathcal{L}_{\tau}(v) \leq 0$; hence $\mathcal{L}_{\tau}(v) \geq \mathcal{L}_{\tau}(\tau)$.

Therefore, for any $v \in [0,1]$, $\mathcal{L}_{\tau}(v) \geq \mathcal{L}_{\tau}(\tau)$. Thus $\tau$ is a minimizer of $\mathcal{L}_{\tau}(v)$, or equivalently $q_\tau$ is a minimizer of $\mathcal{L}_{\tau}(m)$.

Sample estimate

Finally, we can obtain an estimate of the quantile as:

$$\hat{q}_{\tau} = \underset{m}{\textrm{argmin}}~\frac{1}{n} \sum_{i=1}^n (x_i-m)(\tau - 1_{x_i < m})$$
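The minimization above can be sketched in a few lines of standard-library Python. This is an illustrative brute-force version, not an efficient estimator: the helper names and the sample are hypothetical. It assumes a minimizer can be found among the sample points, which holds because the empirical loss is piecewise linear and convex in $m$ with kinks only at the $x_i$.

```python
import math


def empirical_quantile_loss(xs, m, tau):
    """(1/n) * sum_i (x_i - m) * (tau - 1[x_i < m])."""
    return sum((x - m) * (tau - (1.0 if x < m else 0.0)) for x in xs) / len(xs)


def quantile_by_minimization(xs, tau):
    """Brute-force argmin over the sample points; O(n^2), for illustration."""
    return min(xs, key=lambda m: empirical_quantile_loss(xs, m, tau))


xs = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0]  # hypothetical sample, n = 7

for tau in (0.25, 0.5, 0.9):
    q_hat = quantile_by_minimization(xs, tau)
    # When n * tau is not an integer, the minimizer is the order statistic
    # of rank ceil(n * tau), i.e. the ceil(n * tau)-th smallest value.
    k = math.ceil(len(xs) * tau)
    assert q_hat == sorted(xs)[k - 1]
```

Between two consecutive order statistics with exactly $k$ sample points below $m$, the slope of the empirical loss is $(k - n\tau)/n$, which changes sign at $k = \lceil n\tau \rceil$. When $n\tau$ is an integer the slope is zero on a whole interval and the minimizer is not unique.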

Random Notes

Estimating Distribution Mean and Quantiles
