Discriminability

One Sample Test

class hyppo.discrim.DiscrimOneSample(is_dist=False, remove_isolates=True)

A class that performs a one sample test of discriminability.

The discriminability index measures how well a data acquisition and preprocessing pipeline distinguishes between different subjects. The key insight is that repeated measurements of the same item should be more similar to one another than to measurements of different items. The one sample test measures whether the discriminability of a dataset differs from random chance. More details are in [1].
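The idea above can be illustrated with a minimal NumPy sketch. It is not the hyppo implementation: the function name `sample_discriminability`, the Euclidean metric, and the choice to count ties as one half are assumptions made for illustration. For each measurement, it takes every other measurement of the same item and records the fraction of different-item measurements that lie farther away, then averages those fractions.

```python
import numpy as np


def sample_discriminability(x, y):
    """Hypothetical helper: estimate discriminability from data x and item ids y.

    Assumes Euclidean distances, at least two distinct ids, and ties
    counted as one half (an assumption, not confirmed by the text).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances, shape (n, n).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = []
    for i in range(len(y)):
        same = y == y[i]
        between = dist[i, ~same]   # distances to measurements of other items
        same[i] = False            # exclude the measurement itself
        within = dist[i, same]     # distances to repeats of the same item
        for d in within:
            # Fraction of between-item distances that are NOT larger than d.
            closer = (between < d).sum() + 0.5 * (between == d).sum()
            scores.append(1.0 - closer / between.size)
    return float(np.mean(scores))


# Two well-separated items give perfect discriminability.
x = np.concatenate([np.zeros((5, 2)), np.ones((5, 2))], axis=0)
y = np.concatenate([np.zeros(5), np.ones(5)], axis=0)
print(sample_discriminability(x, y))
```

When every measurement is identical, all distances tie and this sketch returns 0.5, the value expected when the item ids carry no information.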

Parameters:

is_dist : bool, optional (default: False): Whether x is a distance matrix or not.
remove_isolates : bool, optional (default: True): Whether to remove measurements with a single instance or not.
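As a rough illustration of what removing isolates means, the sketch below drops every measurement whose id occurs only once. The helper name `drop_isolates` is hypothetical, and the sketch only covers the case where x is an (n, p) data matrix; for a distance matrix both rows and columns would need to be dropped. It does not show how hyppo performs this step internally.

```python
import numpy as np


def drop_isolates(x, y):
    """Hypothetical helper: keep only measurements whose id appears more than once."""
    x = np.asarray(x)
    y = np.asarray(y)  # assumed to be a 1-D vector of ids

    # counts[inverse] gives, for each sample, how often its id occurs.
    _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    keep = counts[inverse] > 1
    return x[keep], y[keep]


x = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 1, 2, 2])   # id 1 has a single measurement
x_kept, y_kept = drop_isolates(x, y)
print(y_kept)
```

An isolate has no repeated measurement to compare against, so it cannot contribute a within-item distance to the statistic.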

See also

DiscrimTwoSample: Two sample test for discriminability of two different measurements.

Notes

With \(D_x\) as the sample discriminability of \(x\), the one sample test evaluates the following hypotheses:

\[\begin{split}H_0: D_x &= D_0 \\ H_A: D_x &> D_0\end{split}\]

where \(D_0\) is the discriminability that would be observed by random chance.
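One way to approximate the chance level \(D_0\) is to permute the sample ids and recompute the statistic, which is what the sketch below does. It is an assumption-laden illustration rather than hyppo's code: the helper names, the Euclidean metric, the half-weight for ties, and the add-one p-value formula are all choices made here.

```python
import numpy as np


def discr_stat(dist, y):
    """Hypothetical helper: discriminability from a distance matrix and ids."""
    scores = []
    for i in range(len(y)):
        same = y == y[i]
        between = dist[i, ~same]   # distances to other items
        same[i] = False            # exclude the measurement itself
        for d in dist[i, same]:
            closer = (between < d).sum() + 0.5 * (between == d).sum()
            scores.append(1.0 - closer / between.size)
    return float(np.mean(scores))


def one_sample_permutation_test(x, y, reps=100, seed=0):
    """Sketch of a one-sided permutation test of H_0: D_x = D_0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    rng = np.random.default_rng(seed)
    stat = discr_stat(dist, y)

    # Null distribution: shuffling ids breaks the link between ids and data.
    null = np.array([discr_stat(dist, rng.permutation(y)) for _ in range(reps)])

    # One-sided p-value for H_A: D_x > D_0, with an add-one correction.
    pvalue = (1 + (null >= stat).sum()) / (1 + reps)
    return stat, float(pvalue)


x = np.concatenate([np.zeros((15, 2)), np.ones((15, 2))], axis=0)
y = np.concatenate([np.zeros(15), np.ones(15)], axis=0)
stat, pvalue = one_sample_permutation_test(x, y, reps=50)
print(stat, pvalue)
```

The distance matrix is computed once and reused across permutations, since only the ids change between replications.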

References

[1] Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).

test(self, x, y, reps=1000, workers=-1)

Calculates the test statistic and p-value for the discriminability one sample test.

Parameters:

x : ndarray: Input data matrix. x must have shape (n, p), where n is the number of samples and p is the number of dimensions. Alternatively, x can be a distance matrix of shape (n, n), in which case is_dist must be set to True.
y : ndarray: A vector containing the sample ids for our \(n\) samples.
reps : int, optional (default: 1000): The number of replications used to estimate the null distribution in the permutation test that calculates the p-value.
workers : int, optional (default: -1): The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
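For the distance-matrix form of x described above, an (n, n) matrix can be built from an (n, p) data matrix as in the following sketch. The helper name `euclidean_distance_matrix` and the Euclidean metric are assumptions; any other dissimilarity producing a symmetric (n, n) matrix would have the same shape.

```python
import numpy as np


def euclidean_distance_matrix(x):
    """Hypothetical helper: pairwise Euclidean distances, shape (n, n)."""
    x = np.asarray(x, dtype=float)
    # Broadcasting yields all pairwise differences, shape (n, n, p).
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


x = np.array([[0.0, 0.0], [3.0, 4.0]])
dist = euclidean_distance_matrix(x)
print(dist.shape)
```

A matrix built this way would then be passed as x to a test object constructed with is_dist=True.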

Returns:

stat : float: The computed discriminability statistic.
pvalue : float: The computed one sample test p-value.

Examples

>>> import numpy as np
>>> from hyppo.discrim import DiscrimOneSample
>>> x = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> '%.1f, %.2f' % DiscrimOneSample().test(x, y) # doctest: +SKIP
'1.0, 0.00'

Two Sample Test

class hyppo.discrim.DiscrimTwoSample(is_dist=False, remove_isolates=True)

A class that compares the discriminability of two datasets.

The two sample test measures whether the discriminability of one dataset differs from that of another. More details can be found in [1].

Parameters:

is_dist : bool, optional (default: False): Whether x1 and x2 are distance matrices or not.
remove_isolates : bool, optional (default: True): Whether to remove measurements with a single instance or not.

See also

DiscrimOneSample: One sample test for discriminability of a single measurement.

Notes

Let \(\hat D_{x_1}\) denote the sample discriminability of one approach, and \(\hat D_{x_2}\) denote the sample discriminability of another approach. Then,

\[\begin{split}H_0: D_{x_1} &= D_{x_2} \\ H_A: D_{x_1} &> D_{x_2}\end{split}\]

Alternatively, tests can be done for \(D_{x_1} < D_{x_2}\) and \(D_{x_1} \neq D_{x_2}\).

test(self, x1, x2, y, reps=1000, alt='neq', workers=-1)

Calculates the test statistic and p-value for a two sample test for discriminability.

Parameters:

x1, x2 : ndarray: Input data matrices. x1 and x2 must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the numbers of dimensions. Alternatively, x1 and x2 can be distance matrices, both of shape (n, n), in which case is_dist must be set to True.
y : ndarray: A vector containing the sample ids for our n samples. It should be matched to the inputs such that y[i] is the label for x1[i, :] and x2[i, :].
reps : int, optional (default: 1000): The number of replications used to estimate the null distribution in the permutation test that calculates the p-value.
alt : {"greater", "less", "neq"} (default: "neq"): The alternative hypothesis for the test. Can test that first dataset is more discriminable (alt = "greater"), less discriminable (alt = "less") or unequal discriminability (alt = "neq").
workers : int, optional (default: -1): The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the Process.
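The shape requirements listed above can be checked before calling the test. The sketch below is a hypothetical pre-flight helper, `check_two_sample_inputs`, written for illustration; it is not part of hyppo and does not reflect whatever validation the library performs itself.

```python
import numpy as np


def check_two_sample_inputs(x1, x2, y, is_dist=False):
    """Hypothetical helper: validate the shapes described for x1, x2 and y."""
    x1, x2, y = np.asarray(x1), np.asarray(x2), np.asarray(y)

    if y.ndim != 1:
        raise ValueError("y must be a vector of sample ids")
    n = y.shape[0]

    if x1.ndim != 2 or x2.ndim != 2:
        raise ValueError("x1 and x2 must be two-dimensional")
    if x1.shape[0] != n or x2.shape[0] != n:
        raise ValueError("x1, x2 and y must all describe the same n samples")
    if is_dist and (x1.shape != (n, n) or x2.shape != (n, n)):
        raise ValueError("distance matrices must both have shape (n, n)")
    return x1, x2, y


# Data matrices may have different numbers of dimensions (p=2, q=3).
x1 = np.ones((6, 2))
x2 = np.zeros((6, 3))
y = np.array([0, 0, 1, 1, 2, 2])
check_two_sample_inputs(x1, x2, y)
```

Checking row counts up front catches the common mistake of passing datasets whose rows are not aligned with the same id vector.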

Returns:

d1 : float: The computed discriminability score for x1.
d2 : float: The computed discriminability score for x2.
pvalue : float: The computed two sample test p-value.

Examples

>>> import numpy as np
>>> from hyppo.discrim import DiscrimTwoSample
>>> x1 = np.ones((100,2), dtype=float)
>>> x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> '%.1f, %.1f, %.2f' % DiscrimTwoSample().test(x1, x2, y) # doctest: +SKIP
'0.5, 1.0, 0.00'