Regression analysis (distribution, confidence interval and statistical hypothesis testing)

15.3: Regression analysis (distribution, confidence interval and statistical hypothesis testing)

As mentioned in Problem 15.3 (regression analysis), consider the measurement ${\mathsf M}_{L^\infty (\Omega_{0} \times {\mathbb R}_+)}( {\mathsf O} \equiv (X(={\mathbb R}^n), {\cal F}, F) , S_{[(\beta_0,\beta_1, \sigma)]} {} )$.

For each $(\beta, \sigma) \in {\mathbb R}^2 \times {\mathbb R}_+$, define the sample probability space $(X, {\mathcal F}, P_{(\beta, \sigma)} )$, where

\begin{align} P_{(\beta, \sigma)}(\Xi ) = [F(\Xi)](\beta_0,\beta_1, \sigma) \qquad (\forall \Xi \in {\mathcal F} ) \end{align}

Define $L^2(X, P_{(\beta, \sigma)})$ (or, in short, $L^2(X)$) by

\begin{align} L^2(X)= \{\mbox{measurable function $f:X \to {\mathbb R}$} \;\;|\;\ [\int_X |f(x)|^2 P_{(\beta, \sigma)}(dx)]^{1/2 } < \infty \}. \tag{15.25} \end{align}

Furthermore, for each $f \in L^2(X)$, define the expectation $E(f)$ and the variance $V(f)$ by

\begin{align} & E(f)= \int_X f(x) P_{(\beta, \sigma)} (dx), \quad V(f)=\int_X |f(x) -E(f)|^2 P_{(\beta, \sigma)} (dx). \tag{15.26} \end{align}

Our main assertion is Problem 15.3 (i.e., regression analysis in quantum language); this section should be regarded as an easy consequence of it. For a detailed proof of Lemma 15.5, see standard textbooks on statistics.

Lemma 15.5 Consider the measurement ${\mathsf M}_{L^\infty(\Omega_{0} \times {\mathbb R}_+)}( {\mathsf O} \equiv (X, {\cal F}, F) , S_{[(\beta_0,\beta_1, \sigma)]} {} )$ in Problem 15.3 (regression analysis), and assume the above notation. Then the following hold:

$(A_1):$

$ \mbox{(1): } V(\hat{\beta}_0)= \frac{\sigma^2}{n}\Big(1+ \frac{\overline{a}^2}{s_{aa}}\Big), \qquad \mbox{(2): } V(\hat{\beta}_1)= \frac{\sigma^2}{n} \frac{1}{s_{aa}}. $

$(A_2):$

[Studentization]. Motivated by (A$_1$), we see: \begin{align} & T_{\beta_0} := \frac{\sqrt{n}(\hat{\beta}_0-{\beta}_0)} {\sqrt{ {\hat{\sigma}^2(1+ \overline{a}^2/ s_{aa})}}} \sim t_{n-2}, \qquad T_{\beta_1} := \frac{\sqrt{n}(\hat{\beta}_1-{\beta}_1)} {\sqrt{ {\hat{\sigma}^2/ s_{aa}}}} \sim t_{n-2} \tag{15.27} \end{align} where $t_{n-2}$ is Student's $t$-distribution with $n-2$ degrees of freedom.

$\square \quad$
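The variance formulas in (A$_1$) can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original argument. It assumes the usual model behind Problem 15.3, namely $x_i = \beta_0 + \beta_1 a_i + \mbox{(Gaussian noise with standard deviation } \sigma)$, and the conventions $s_{aa} = \frac{1}{n}\sum_i (a_i - \overline{a})^2$ and $s_{ax} = \frac{1}{n}\sum_i (a_i - \overline{a})(x_i - \overline{x})$. The function names and the sample design points are hypothetical.

```python
import random
import statistics


def fit_line(a, x):
    """Return the least-squares estimates (beta0_hat, beta1_hat)."""
    n = len(a)
    a_bar = sum(a) / n
    x_bar = sum(x) / n
    # Assumed convention: s_aa and s_ax include the factor 1/n.
    s_aa = sum((ai - a_bar) ** 2 for ai in a) / n
    s_ax = sum((ai - a_bar) * (xi - x_bar) for ai, xi in zip(a, x)) / n
    beta1_hat = s_ax / s_aa
    return x_bar - beta1_hat * a_bar, beta1_hat


def theoretical_variances(a, sigma):
    """Right-hand sides of (A_1): V(beta0_hat) and V(beta1_hat)."""
    n = len(a)
    a_bar = sum(a) / n
    s_aa = sum((ai - a_bar) ** 2 for ai in a) / n
    v0 = sigma ** 2 / n * (1 + a_bar ** 2 / s_aa)
    v1 = sigma ** 2 / n / s_aa
    return v0, v1


def simulate_variances(a, beta0, beta1, sigma, reps=20000, seed=0):
    """Empirical variances of the two estimators over repeated samples."""
    rng = random.Random(seed)
    b0_samples, b1_samples = [], []
    for _ in range(reps):
        # Assumed model: Gaussian noise with standard deviation sigma.
        x = [beta0 + beta1 * ai + rng.gauss(0.0, sigma) for ai in a]
        b0, b1 = fit_line(a, x)
        b0_samples.append(b0)
        b1_samples.append(b1)
    return statistics.pvariance(b0_samples), statistics.pvariance(b1_samples)


a = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical design points
theoretical = theoretical_variances(a, sigma=2.0)
empirical = simulate_variances(a, beta0=1.0, beta1=0.5, sigma=2.0)
print("theoretical:", theoretical)
print("empirical:  ", empirical)
```

For these design points, $n=5$, $\overline{a}=3$ and $s_{aa}=2$, so (A$_1$) gives $V(\hat{\beta}_0)=4.4$ and $V(\hat{\beta}_1)=0.4$ when $\sigma=2$; the simulated values should lie close to these.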

Let ${\mathsf M}_{L^\infty(\Omega_{0}(={\mathbb R}^2) \times {\mathbb R}_+)}( {\mathsf O} \equiv (X(={\mathbb R}^n), {\cal F}, F) , S_{[(\beta_0,\beta_1, \sigma)]} {} )$ be the measurement in Problem 15.3 ( regression analysis). For each $k=0,1$, define the estimator ${\widehat{E}}_k:X(={\mathbb R}^n) \to {\Theta_k}(={\mathbb R})$ and the quantity $\pi_k: \Omega(={\mathbb R}^2 \times {\mathbb R}_+) \to {\Theta_k}(={\mathbb R})$ as follows.

\begin{align} & {\widehat{E}}_0 ( x) (=\hat{\beta}_0(x)) = \overline{x}- \frac{s_{ax}}{s_{aa}} \overline{a}, \quad {\widehat{E}}_1 ( x) (=\hat{\beta}_1(x)) = \frac{s_{ax}}{s_{aa}} , \quad \pi_0 (\beta_0, \beta_1, \sigma ) = \beta_0, \quad \pi_1 (\beta_0, \beta_1, \sigma ) = \beta_1, \tag{15.28} \\ & \qquad \qquad \qquad ( \forall (\beta_0, \beta_1, \sigma ) \in {\mathbb R}^2 \times {\mathbb R}_+ ) \nonumber \end{align}

Let $\alpha$ be a real number such that $0 < \alpha \ll 1$, for example, $\alpha = 0.05$. For any state $ \omega =( \beta, \sigma ) (\in \Omega ={\mathbb R}^2 \times {\mathbb R}_+)$, define the positive number $\eta^\alpha_{\omega, k}$ $(> 0)$ by (6.9), (6.15), that is,

\begin{align} \eta^\alpha_{\omega, k} (=\delta_{\omega, k }^{1-\alpha} ) & = \inf \{ \eta > 0: [F(\{ x \in X \;:\; d^x_{\Theta_k} ( {\widehat{E}_k}(x) , \pi_k( \omega ) ) \ge \eta \} )](\omega ) \le \alpha \} \tag{15.29} \end{align}

where, for each $\theta_k^0, \theta_k^1 (\in \Theta_k )$, the semi-distance $d_{\Theta_k}^x$ in $\Theta_k$ is defined by

\begin{align} d^x_{\Theta_k}(\theta_k^0,\theta_k^1) = \left\{\begin{array}{ll} \frac{\sqrt{n}| \theta_0^0-\theta_0^1 |} {\sqrt{ {\hat{\sigma}^2(1+ \overline{a}^2/ s_{aa})}}} \quad & (\mbox{if }k=0) \\ \\ \frac{\sqrt{n} | \theta_1^0-\theta_1^1 | } {\sqrt{ {\hat{\sigma}^2/ s_{aa}}}} \quad & (\mbox{if }k=1) \end{array}\right. \tag{15.30} \end{align}

Therefore, we see, by Lemma 15.5, that

\begin{align} \eta^\alpha_{\omega, k} & = \left\{\begin{array}{ll} \inf \{ \eta > 0: [F(\{ x \in X \;:\; \frac{\sqrt{n}| \hat{\beta}_0(x) - \beta_0 |} {\sqrt{ {\hat{\sigma}^2(x)(1+ \overline{a}^2/ s_{aa})}}} \ge \eta \} )](\omega ) \le \alpha \} \quad & (\mbox{if }k=0) \\ \\ \inf \{ \eta > 0: [F(\{ x \in X \;:\; \frac{\sqrt{n}|\hat{\beta}_1(x)-{\beta}_1|} {\sqrt{ {\hat{\sigma}^2(x)/ s_{aa}}}} \ge \eta \} )](\omega ) \le \alpha \}\quad & (\mbox{if }k=1) \end{array}\right. \tag{15.31} \\ & = t_{n-2}(\alpha/2) \tag{15.32} \end{align}
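The value $t_{n-2}(\alpha/2)$ in (15.32) is the point $\eta$ at which the two-sided tail probability of Student's $t$-distribution with $n-2$ degrees of freedom equals $\alpha$. As an illustration only, the sketch below evaluates it with the Python standard library, using Simpson's rule for the density and bisection for the infimum in (15.31). The function names are hypothetical, and in practice a statistical table or library would normally be used instead.

```python
import math


def t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_central_prob(eta, nu, steps=2000):
    """P(|T| < eta), by composite Simpson's rule on [0, eta] and symmetry."""
    h = eta / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(eta, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 2.0 * total * h / 3.0


def t_upper_point(alpha, nu):
    """Smallest eta > 0 with P(|T| >= eta) <= alpha, i.e. t_nu(alpha/2)."""
    lo, hi = 0.0, 1.0
    # Enlarge the bracket until the two-sided tail at hi is at most alpha.
    while 1.0 - t_central_prob(hi, nu) > alpha:
        hi *= 2.0
    # Bisection: the two-sided tail probability is decreasing in eta.
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if 1.0 - t_central_prob(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


eta = t_upper_point(alpha=0.05, nu=8)  # n = 10, so n - 2 = 8
print(round(eta, 3))  # close to the tabulated value 2.306
```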

Summing up the above arguments, we have the following proposition:

Proposition 15.6 [confidence interval] Assume that a measured value $x \in X$ is obtained by the measurement ${\mathsf M}_{L^\infty(\Omega_{0} \times {\mathbb R}_+)}( {\mathsf O} \equiv (X, {\cal F}, F) , S_{[(\beta_0,\beta_1, \sigma)]} {} )$. Here, the state $(\beta_0,\beta_1, \sigma)$ is assumed to be unknown. Then, we have the $(1- \alpha)$-confidence interval $I_{x,k}^{1- \alpha}$ in Corollary 6.6 as follows.

\begin{align} & I_{x,k}^{1- \alpha} = \{ \pi_k(\omega) (\in \Theta_k) : d^x_{\Theta_k} ({\widehat{E}_k}(x), \pi_k(\omega ) ) < \eta^{\alpha}_{\omega, k } \} \nonumber \\ \nonumber \\ & = \left\{\begin{array}{ll} I_{x,0}^{1- \alpha} = \Big\{ \beta_0 = \pi_0(\omega) (\in {\Theta_0}) \;:\; \frac{ |\hat{\beta}_0 (x) -{\beta}_0| }{ {\sqrt{ {\frac{\hat{\sigma}^2(x)}{n}(1+ \overline{a}^2/ s_{aa})}}} } \le t_{n-2}(\alpha/2) \Big\} \quad & (\mbox{if }k=0) \\ \\ I_{x,1}^{1- \alpha} = \Big\{ \beta_1 = \pi_1(\omega) (\in {\Theta_1}) : \frac{ |\hat{\beta}_1 (x) -{\beta}_1| }{ {\sqrt{ {\frac{\hat{\sigma}^2(x)}{n}(1/ s_{aa})}}} } \le t_{n-2}(\alpha/2) \Big\} \quad & (\mbox{if }k=1) \end{array}\right. \tag{15.33} \end{align}
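Solving the inequalities in (15.33) for $\beta_k$ shows that each interval is centred at $\hat{\beta}_k(x)$, with half-width $t_{n-2}(\alpha/2)$ times the denominator in (15.33). The following sketch computes both intervals from data. Several points are assumptions rather than statements of the text: $s_{aa}$ and $s_{ax}$ are taken with the factor $1/n$, $\hat{\sigma}^2(x)$ is taken to be the residual sum of squares divided by $n-2$ (the convention under which (15.27) holds exactly), the critical value is supplied by the caller, and the data are made up for illustration.

```python
import math


def regression_intervals(a, x, t_crit):
    """Return the intervals of (15.33) for beta_0 and beta_1 as (low, high)."""
    n = len(a)
    a_bar = sum(a) / n
    x_bar = sum(x) / n
    # Assumed convention: s_aa and s_ax include the factor 1/n.
    s_aa = sum((ai - a_bar) ** 2 for ai in a) / n
    s_ax = sum((ai - a_bar) * (xi - x_bar) for ai, xi in zip(a, x)) / n
    beta1_hat = s_ax / s_aa
    beta0_hat = x_bar - beta1_hat * a_bar
    # Assumed convention: sigma_hat^2 = (residual sum of squares) / (n - 2).
    rss = sum((xi - beta0_hat - beta1_hat * ai) ** 2 for ai, xi in zip(a, x))
    sigma2_hat = rss / (n - 2)
    # Half-widths: t_crit times the denominators appearing in (15.33).
    half0 = t_crit * math.sqrt(sigma2_hat / n * (1 + a_bar ** 2 / s_aa))
    half1 = t_crit * math.sqrt(sigma2_hat / n / s_aa)
    return ((beta0_hat - half0, beta0_hat + half0),
            (beta1_hat - half1, beta1_hat + half1))


# Hypothetical data with n = 5; 3.182 is the tabulated value of t_3(0.025).
a = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.1, 3.9, 6.2, 7.8, 10.1]
ci0, ci1 = regression_intervals(a, x, t_crit=3.182)
print("beta_0 interval:", ci0)
print("beta_1 interval:", ci1)
```

For these data the estimates are $\hat{\beta}_0 = 0.05$ and $\hat{\beta}_1 = 1.99$, so the two intervals are centred at those values.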

Proposition 15.7 [Statistical hypothesis testing] Consider the measurement ${\mathsf M}_{L^\infty(\Omega_{0} \times {\mathbb R}_+)}( {\mathsf O} \equiv (X, {\cal F}, F) , S_{[(\beta_0,\beta_1, \sigma)]} {} )$. Here, the state $(\beta_0,\beta_1, \sigma)$ is assumed to be unknown. Then, according to Corollary 6.6, we can say the following:

$(B_1):$

Assume the null hypothesis $H_{N} = { \{ \beta_0 \}} (\subseteq \Theta_0={\mathbb R})$. Then, the rejection region is as follows: \begin{align} {\widehat R}_{{H_N}}^{\alpha; X} & = {\widehat{E}_0}^{-1}( {\widehat R}_{{H_N}}^{\alpha; {\Theta_0}}) = \bigcap_{\omega \in \Omega \mbox{ such that } \pi_0(\omega) \in {H_N}} \{ x (\in X) : d^x_{\Theta_0} ({\widehat{E}_0}(x), \pi_0(\omega ) ) \ge \eta^\alpha_{\omega, 0 } \} \nonumber \\ & = \Big\{ x \in X \;:\; \frac{ |\hat{\beta}_0 (x) -{\beta}_0| }{ {\sqrt{ {\frac{\hat{\sigma}^2(x)}{n}(1+ \overline{a}^2/ s_{aa})}}} } \ge t_{n-2}(\alpha/2) \Big\} \tag{15.34} \end{align}

$(B_2):$

Assume the null hypothesis $H_N = { \{ \beta_1 \}} (\subseteq \Theta_1={\mathbb R})$. Then, the rejection region is as follows:

\begin{align} {\widehat R}_{{H_N}}^{\alpha; X} & = {\widehat{E}_1}^{-1}( {\widehat R}_{{H_N}}^{\alpha; {\Theta_1}}) = \bigcap_{\omega \in \Omega \mbox{ such that } \pi_1(\omega) \in {H_N}} \{ x (\in X) : d^x_{\Theta_1} ({\widehat{E}_1}(x), \pi_1(\omega ) ) \ge \eta^\alpha_{\omega, 1 } \} \nonumber \\ & = \Big\{ x \in X \;:\; \frac{ |\hat{\beta}_1 (x) -{\beta}_1| }{ {\sqrt{ {\frac{\hat{\sigma}^2(x)}{n}(1/ s_{aa})}}} } \ge t_{n-2}(\alpha/2) \Big\} \tag{15.35} \end{align}