Comparing distributions in ℝⁿ

Consider two independent finite samples {x₁ⁱ}_{i=1}^k and {x₂ⁱ}_{i=1}^k of size k from distributions F₁ and F₂ in ℝⁿ. Their sample covariances at a point q ∈ M, which we refer to as an observation point, are

\[ \hat\Sigma_s(q) = \frac{1}{k} \sum_{i=1}^{k} (\overrightarrow{q x_s^i})(\overrightarrow{q x_s^i})', \qquad s = 1, 2. \]
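The sample covariance at an observation point can be sketched in a few lines of NumPy. This is only an illustration under our own assumptions: the function name `cov_at_point` is hypothetical, samples are stored one point per row, and the vector from q to x is taken to be x − q, as in Euclidean space.

```python
import numpy as np

def cov_at_point(x, q):
    """Sample covariance of the rows of x about the observation point q.

    x : (k, n) array, one sample point per row
    q : (n,) array, the observation point
    Returns the (n, n) matrix (1/k) * sum_i (x_i - q)(x_i - q)'.
    """
    # Assumption: the vector from q to x_i is x_i - q (Euclidean case).
    d = np.asarray(x, dtype=float) - np.asarray(q, dtype=float)
    return d.T @ d / d.shape[0]

# Tiny worked example: three points in the plane, observed from the origin.
x = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]])
sigma_hat = cov_at_point(x, np.zeros(2))
```

Note the divisor k rather than k − 1, matching the definition above. When q is the sample mean of x, this is the usual (biased) sample covariance.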

Let Σ_s(q) and Ω ∈ ℝ^{n²×n²} be the mean and covariance of Z = (→qX)(→qX)′, for X ~ F_s. Thus, we assume that the mean and covariance of the tensor-valued random variable Z exist. As an application of the Central Limit Theorem we obtain the asymptotic distribution

\[ \sqrt{k}\,\bigl(\hat\Sigma_1(q)\hat\Sigma_2^{-1}(q) - I_n\bigr) \rightsquigarrow N_{n\times n}\bigl(0,\; 2\,\Omega \otimes \Sigma^{-1}(q)\bigr), \]

provided that Σ₁(q) = Σ₂(q) = Σ(q). A further application of the delta method gives

\[ \xi_k(q;h) := \sqrt{k}\,\bigl(h(\hat\Sigma_1(q)\hat\Sigma_2^{-1}(q)) - h(I_n)\bigr) \rightsquigarrow N\bigl(0,\; 2\,[h'(I_n)]^{T}\,(\Omega \otimes \Sigma^{-1}(q))\,[h'(I_n)]\bigr), \qquad (1) \]

for a similarity-invariant h whose gradient h′(·) ∈ ℝ^{n²} is continuous and does not vanish at I_n. For example, ξ_k(q; tr) = √k (tr(Σ̂₁(q) Σ̂₂⁻¹(q)) − n) and ξ_k(q; det) = √k (det(Σ̂₁(q) Σ̂₂⁻¹(q)) − 1).
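As an illustration, the statistic ξ_k(q; h) can be computed directly from its definition in (1). The sketch below is ours, not the applet's code: the names `cov_at_point` and `xi_stat` are hypothetical, samples are (k, n) arrays with one point per row, and the vector from q to x is taken to be x − q.

```python
import numpy as np

def cov_at_point(x, q):
    # (1/k) * sum_i (x_i - q)(x_i - q)'
    d = np.asarray(x, dtype=float) - q
    return d.T @ d / d.shape[0]

def xi_stat(x1, x2, q, h):
    """sqrt(k) * (h(S1 S2^{-1}) - h(I_n)) for two (k, n) samples."""
    k, n = x1.shape
    s1, s2 = cov_at_point(x1, q), cov_at_point(x2, q)
    # R = S1 S2^{-1} solves S2' R' = S1'; this avoids forming the inverse.
    ratio = np.linalg.solve(s2.T, s1.T).T
    return np.sqrt(k) * (h(ratio) - h(np.eye(n)))

rng = np.random.default_rng(0)
x1 = rng.standard_normal((200, 3))
x2 = rng.standard_normal((200, 3))
q = np.vstack([x1, x2]).mean(axis=0)          # combined sample mean

xi_tr = xi_stat(x1, x2, q, np.trace)          # h = tr,  h(I_n) = n
xi_det = xi_stat(x1, x2, q, np.linalg.det)    # h = det, h(I_n) = 1
```

Passing h as a function keeps the same code usable for any similarity-invariant choice; when both samples coincide the matrix ratio is the identity and the statistic is zero up to rounding.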

We apply a bootstrapping method to utilize (1). For k′ < k, let y_m, m = 1, ..., M, be instances of the statistic ξ_k(q; h) based on subsamples of size k′ of the initial k-samples. The observation point q is chosen to be the sample mean of the combined x₁ and x₂ samples. Then, according to (1), ξ = ȳ∕s.e.(y) converges to N(0, 1) in distribution as k → ∞.
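The subsampling scheme can be sketched as follows. Several details are assumptions on our part, since the text does not fix them: subsamples are drawn without replacement and separately from each sample, q is computed once from the full combined sample, the numerator of ξ is the mean of the y_m, s.e.(y) is the standard error of that mean, and the p-value is the two-sided normal tail. All function names are hypothetical.

```python
import math
import numpy as np

def cov_at_point(x, q):
    d = x - q
    return d.T @ d / d.shape[0]

def xi_stat(x1, x2, q, h):
    k, n = x1.shape
    ratio = np.linalg.solve(cov_at_point(x2, q).T, cov_at_point(x1, q).T).T
    return math.sqrt(k) * (h(ratio) - h(np.eye(n)))

def subsample_xi(x1, x2, h, k_sub, M, rng):
    """Studentized mean of M subsample statistics and its normal p-value."""
    k = x1.shape[0]
    q = np.vstack([x1, x2]).mean(axis=0)      # observation point: combined mean
    y = np.empty(M)
    for m in range(M):
        # Assumption: subsamples are drawn without replacement, per sample.
        i1 = rng.choice(k, size=k_sub, replace=False)
        i2 = rng.choice(k, size=k_sub, replace=False)
        y[m] = xi_stat(x1[i1], x2[i2], q, h)
    se = y.std(ddof=1) / math.sqrt(M)         # assumption: s.e. of the mean of y
    xi = y.mean() / se
    p_value = math.erfc(abs(xi) / math.sqrt(2.0))   # two-sided N(0, 1) tail
    return xi, p_value

rng = np.random.default_rng(1)
k, n = 200, 2
x1 = rng.standard_normal((k, n))
x2 = rng.standard_normal((k, n))
xi, p = subsample_xi(x1, x2, np.trace, k_sub=k // 2, M=k // 4, rng=rng)
```

The example uses k′ = k∕2 and M = k∕4 only as convenient values; any k′ < k can be passed through `k_sub`.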

Another hypothesis of interest compares the usual covariances, defined at the mean points: H₂ : Σ₁(μ₁) = Σ₂(μ₂). The corresponding likelihood ratio statistic against the alternative H_a : Σ₁(μ₁) ≠ Σ₂(μ₂) is

\[ \lambda = \frac{|\hat\Sigma_1(\hat\mu_1)|^{k/2}\,|\hat\Sigma_2(\hat\mu_2)|^{k/2}}{|\hat\Sigma_1(\hat\mu_1) + \hat\Sigma_2(\hat\mu_2)|^{k}}. \qquad (2) \]

The exact distribution of λ is that of a product of independent Beta-distributed variables, but it can be approximated by a chi-squared distribution.
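Because the determinants in (2) are raised to powers of order k, λ itself quickly underflows in floating point, so a numerical sketch is better done on the log scale. The function name `log_lambda` is hypothetical, and we assume both samples are (k, n) arrays with covariances taken about their own sample means with divisor k.

```python
import numpy as np

def log_lambda(x1, x2):
    """log of the ratio in (2) for two (k, n) samples of equal size k."""
    k = x1.shape[0]

    def centred_cov(x):
        # Covariance about the sample's own mean, divisor k.
        d = x - x.mean(axis=0)
        return d.T @ d / d.shape[0]

    s1, s2 = centred_cov(x1), centred_cov(x2)
    # slogdet returns (sign, log|det|); only the log-determinant is needed here.
    ld1 = np.linalg.slogdet(s1)[1]
    ld2 = np.linalg.slogdet(s2)[1]
    ld12 = np.linalg.slogdet(s1 + s2)[1]
    return 0.5 * k * (ld1 + ld2) - k * ld12

rng = np.random.default_rng(2)
k, n = 100, 3
x = rng.standard_normal((k, n))
same = log_lambda(x, x)   # both sample covariances coincide
```

As a sanity check, when the two sample covariances coincide the ratio in (2) reduces to 2^(−nk), since |2A| = 2ⁿ|A|, so `same` should equal −nk·log 2.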

Comparing the performance of ξ and λ statistics

The following applet simulates samples from distributions of different families and calculates the ξ and λ statistics. The results are reported in terms of p-values, Xi-pval for ξ and L-pval for λ. The user can choose the dimension n, the sample size k, and whether the two samples are equally distributed or not. The sub-sample size is fixed to k′ = k∕2 and the number of sub-samples to M = k∕4. Possible choices for the function h are the trace, the determinant and h(A) = tr(log(A)). Shown are the two k-samples in red and blue and the observation point in green.
