utils

SVD(X, **kwargs)
    Robust SVD decomposition.
absmax(arr)
    Find the absolute maximum of an array.
analytic_expected_correlation(noise_level)
    Expected correlation coefficient of simulated i.i.d. normal data.
columnwise_correlation(ypred, y[, zscorea, …])
    Compute the correlation coefficients.
columnwise_rsquared(ypred, y, **kwargs)
    Compute the R2.
delay2slice(delay)
    Get a slicer for an array at a desired delay (e.g. a[delay2slice(3)]).
delay_signal(mat[, delays, fill])
    Create a temporally shifted version of the data.
determinant_normalizer(mat[, thresh])
    Compute scalar to normalize covariance matrix determinant to 1.
difference_operator(order, nobs)
    Get a finite difference operator matrix of size nobs.
explainable_variance(repeats[, ncorrection, …])
    Compute the explainable variance in the recorded signals.
fast_indexing(a, rows[, cols])
    Extract row and column entries from a 2D np.ndarray.
generate_data([n, p, v, noise, testsize, …])
    Get some B, X, Y data generated from a Gaussian (0, 1) distribution.
generate_trnval_folds(N[, sampler, nchunks, …])
    Split dataset into training and validation folds.
hrf_convolution(input_responses[, HRF, …])
    Convolve a series of impulses in a matrix with a given HRF.
hrf_default_basis([dt, duration])
    Hemodynamic response function basis set.
hyperopt_make_trial_data(tid, vals, loss)
    Generate a valid dictionary as a trial for hyperopt.Trials.
hyperopt_make_trials(values, losses[, …])
isdiag(mat)
    Determine whether matrix is diagonal.
mult_diag(d, mat[, left])
    Efficiently multiply a full matrix by a diagonal matrix.
noise_ceiling_correction(repeats, yhat[, …])
    Noise ceiling corrected correlation coefficient.
tikreg.utils.SVD(X, **kwargs)

Robust SVD decomposition.

First uses scipy.linalg.svd by default. If the SVD does not converge, a slower but more robust SVD algorithm (DGESVD) is used instead.
See scipy.linalg.svd for full documentation.
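A minimal usage sketch, assuming the return convention mirrors scipy.linalg.svd (U, s, Vh), with keyword arguments such as full_matrices passed through via **kwargs:

>>> X = np.random.randn(50, 10)
>>> U, s, Vh = SVD(X, full_matrices=False)
>>> # Reconstruct X from its factors (columns of U scaled by the singular values)
>>> assert np.allclose(X, np.dot(U * s, Vh))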
tikreg.utils.absmax(arr)

Find the absolute maximum of an array.
This is somewhat more efficient than e.g. np.nanmax(np.abs(arr))
Examples
>>> arr = np.random.randn(20,20)
>>> maxval = absmax(arr)
>>> direct_maxval = np.nanmax(np.abs(arr))
>>> assert np.allclose(maxval, direct_maxval)
tikreg.utils.analytic_expected_correlation(noise_level)

Expected correlation coefficient of simulated i.i.d. normal data.
Compute the expectation on the correlation coefficient given the amount of noise in the data. Assumes signal and noise are i.i.d. normal with a fixed noise_level.
Notes
Assumes both signal and noise are generated from a MVN distribution with zero-mean and the noise variance is determined by \(\sigma\) (noise_level).
If our model is perfect (i.e. \(\hat{y} = y\)), then the maximum correlation coefficient we can achieve is determined by \(\sigma\). Concretely:

\[\rho = \frac{1}{\sqrt{1 + \sigma^2}}\]

where \(\rho\) is the correlation coefficient.
Examples
>>> nstim, noise_level = 100000, 1.0
>>> ydata = np.random.randn(nstim)
>>> noise = np.random.randn(nstim)*noise_level
>>> ypred = zscore(ydata) + noise
>>> empirical_correlation = columnwise_correlation(ydata, ypred)
>>> print(empirical_correlation) # doctest: +SKIP
0.7073279476259667
>>> # As nstim -> inf, this converges to the analytic solution
>>> expected_correlation = analytic_expected_correlation(noise_level)
>>> print(round(expected_correlation, 6))
0.707107
>>> assert np.allclose(expected_correlation, empirical_correlation, atol=1e-02)
tikreg.utils.columnwise_correlation(ypred, y, zscorea=True, zscoreb=True, axis=0)

Compute the correlation coefficients.
Predictions and actual responses are matrices whose columns correspond to the units sampled (e.g. voxels, neurons, etc).
Notes
Recall that the correlation coefficient is defined as

\[\rho_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}\]

Since it is scale invariant, we can z-score the data first and obtain the same result as the mean product of the z-scored variables.
Examples
>>> x = np.random.randn(100,2)
>>> y = np.random.randn(100,2)
>>> cc = columnwise_correlation(x, y)
>>> cc.shape
(2,)
>>> c1 = np.corrcoef(x[:,0], y[:,0])[0,1]
>>> c2 = np.corrcoef(x[:,1], y[:,1])[0,1]
>>> assert np.allclose(cc, np.r_[c1, c2])
tikreg.utils.columnwise_rsquared(ypred, y, **kwargs)

Compute the R2.
Predictions and actual responses are matrices whose columns correspond to the units sampled (e.g. voxels, neurons, etc).
References
https://en.wikipedia.org/wiki/Coefficient_of_determination#Extensions
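A usage sketch, assuming the conventional definition \(R^2 = 1 - SS_{res}/SS_{tot}\) from the reference above, computed independently for each column:

>>> y = np.random.randn(100, 3)
>>> ypred = y + np.random.randn(100, 3)*0.5
>>> r2 = columnwise_rsquared(ypred, y)
>>> print(r2.shape)
(3,)
>>> manual_r2 = 1 - ((y - ypred)**2).sum(0)/((y - y.mean(0))**2).sum(0)
>>> assert np.allclose(r2, manual_r2)  # doctest: +SKIP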
tikreg.utils.delay2slice(delay)

Get a slicer for an array at a desired delay (e.g. a[delay2slice(3)]).
See also
delay_signal
Examples
>>> # e.g. 5 timepoints and 3 features
>>> mat = np.arange(5*3).reshape(5,3)
>>> print(mat)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]
[12 13 14]]
>>> slicer = delay2slice(2) # delay by 2 samples
>>> # selects the (n - delay) samples of the original data that the delayed signal contains from timepoint `delay` onward
>>> print(mat[slicer])
[[0 1 2]
[3 4 5]
[6 7 8]]
>>> # delay_signal instead keeps all n samples and shifts the data explicitly, padding with `fill`
>>> delayed_mat = delay_signal(mat, [2])
>>> print(delayed_mat)
[[0 0 0]
[0 0 0]
[0 1 2]
[3 4 5]
[6 7 8]]
tikreg.utils.delay_signal(mat, delays=[0, 1, 2, 3], fill=0)

Create a temporally shifted version of the data.
Notes
The data is delayed such that each group of p columns corresponds to one delay, and the order of the delays is preserved:

\[X_{\text{delayed}} = \left[ X_{\delta \left(d_1\right)}, X_{\delta \left(d_2\right)}, \dots, X_{\delta \left(d_k\right)} \right]\]

where \(X_{\delta \left(j \right)}\) corresponds to the original data delayed by j samples.
Examples
>>> mat = np.arange(5*3).reshape(5,3)
>>> print(mat)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]
[12 13 14]]
>>> delayed_mat = delay_signal(mat, [0,1,2])
>>> print(delayed_mat.shape)
(5, 9)
>>> print(delayed_mat)
[[ 0 1 2 0 0 0 0 0 0]
[ 3 4 5 0 1 2 0 0 0]
[ 6 7 8 3 4 5 0 1 2]
[ 9 10 11 6 7 8 3 4 5]
[12 13 14 9 10 11 6 7 8]]
>>> delayed_mat_neg = delay_signal(mat, [-1,0,1]) # negative delays
>>> print(delayed_mat_neg)
[[ 3 4 5 0 1 2 0 0 0]
[ 6 7 8 3 4 5 0 1 2]
[ 9 10 11 6 7 8 3 4 5]
[12 13 14 9 10 11 6 7 8]
[ 0 0 0 12 13 14 9 10 11]]
tikreg.utils.determinant_normalizer(mat, thresh=1e-08)

Compute scalar to normalize covariance matrix determinant to 1.
Uses the pseudo-determinant for numerical robustness. This is implemented by ignoring the smallest eigenvalues.
Notes
The determinant can be thought of as the variance of a covariance matrix in high-dimensions. Analogous to standardizing data to have unit variance (e.g. z-scoring), one can standardize a matrix to have a determinant of 1. Setting the determinant to 1 allows the covariance structure to vary while keeping the ‘variance’ constant.
This function is useful when sampling covariance matrices from Wishart distributions or when the generating covariance matrix function has hyper-parameters of its own.
Examples
>>> mat = np.random.randn(20,20)
>>> cov = np.dot(mat.T, mat)
>>> assert np.allclose(np.linalg.det(cov), 1.0) is False
>>> scale = determinant_normalizer(cov)
>>> assert np.allclose(np.linalg.det(cov/scale), 1.0)
tikreg.utils.difference_operator(order, nobs)

Get a finite difference operator matrix of size nobs.
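No example is given for this function; the sketch below only illustrates what a first-order finite difference operator of size nobs can look like (the sign and boundary conventions used by difference_operator may differ):

>>> nobs = 5
>>> D = np.eye(nobs) - np.eye(nobs, k=-1)  # hypothetical first-order operator
>>> x = np.arange(nobs, dtype=float)**2
>>> print(np.dot(D, x)[1:])  # interior rows approximate the discrete derivative of x
[1. 3. 5. 7.]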
tikreg.utils.explainable_variance(repeats, ncorrection=True, dozscore=True)

Compute the explainable variance in the recorded signals.
Notes
Explainable variance can be interpreted as the \(R^2\) of a model that predicts each repetition with the mean across repetitions.
References
Schoppe, et al. (2016), Hsu, et al. (2004), David, et al. (2005).
Examples
First, simulate some repeated data for 50 units (e.g. voxels, neurons).
>>> nreps, ntpts, nunits, noise = 10, 100, 50, 2.0
>>> _, _, Y = generate_data(n=ntpts, testsize=0, v=nunits, noise=0.0)
>>> repeats = np.asarray([Y for i in range(nreps)])
>>> # Add i.i.d. gaussian noise to each copy of the data
>>> repeats += np.random.randn(nreps, ntpts, nunits)*noise
>>> print(repeats.shape)
(10, 100, 50)
The repeats can be used to compute the explainable variance for each simulated unit.
>>> EV = explainable_variance(repeats)
>>> EV.shape
(50,)
>>> print(EV.mean()) # doctest: +SKIP
0.20099454817453574
>>> analytic_R2 = analytic_expected_correlation(2.0)**2
>>> print(round(analytic_R2, 6))
0.2
tikreg.utils.fast_indexing(a, rows, cols=None)

Extract row and column entries from a 2D np.ndarray.

Much faster and more memory efficient than fancy indexing for selecting rows and columns from a matrix. Slightly faster for row indexing.
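A usage sketch, under the assumption that fast_indexing(a, rows, cols) returns the same submatrix as the chained fancy indexing a[rows][:, cols]:

>>> a = np.random.randn(1000, 800)
>>> rows = np.random.randint(0, 1000, size=50)
>>> cols = np.random.randint(0, 800, size=20)
>>> sub = fast_indexing(a, rows, cols)
>>> assert np.allclose(sub, a[rows][:, cols])  # doctest: +SKIP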
tikreg.utils.generate_data(n=100, p=10, v=2, noise=1.0, testsize=None, dozscore=False, feature_sparsity=0.0)

Get some B, X, Y data generated from a Gaussian (0, 1) distribution.
Examples
>>> B, (Xtrain, Xtest), (Ytrain, Ytest) = generate_data(n=100, p=10, v=2, testsize=20)
>>> B.shape, (Xtrain.shape, Xtest.shape), (Ytrain.shape, Ytest.shape)
((10, 2), ((100, 10), (20, 10)), ((100, 2), (20, 2)))
>>> B, X, Y = generate_data(n=100, p=10, v=2, noise=1.0, testsize=0) # No test data
>>> B.shape, X.shape, Y.shape
((10, 2), (100, 10), (100, 2))
tikreg.utils.generate_trnval_folds(N, sampler='cv', nchunks=5, nfolds=5, testpct=0.2)

Split dataset into training and validation folds.
Notes
By default, this function is optimized for autocorrelated signals. If the sampled signals are not autocorrelated, set nchunks=1.
Examples
>>> folds = generate_trnval_folds(100, sampler='cv')
>>> fold_sizes = [(len(trnidx),len(validx)) for trnidx, validx in folds]
>>> print(fold_sizes)
[(80, 20), (80, 20), (80, 20), (80, 20), (80, 20)]
>>> folds = generate_trnval_folds(100, sampler='bcv', nfolds=(2,5))
>>> print(len(list(folds)))
10
>>> folds = generate_trnval_folds(127, sampler='cv', nchunks=10, nfolds=5)
>>> fold_sizes = [(len(trnidx),len(validx)) for trnidx, validx in folds]
>>> print(fold_sizes) # doctest: +SKIP
[(97, 30), (97, 30), (107, 20), (107, 20), (107, 20)]
>>> folds = generate_trnval_folds(100, sampler='nbb')
>>> fold_sizes = [(len(np.unique(trnidx)),len(validx)) for trnidx, validx in folds]
>>> print(fold_sizes) # doctest: +SKIP
[(50, 50), (60, 40), (55, 45), (60, 40), (55, 45)]
>>> folds = generate_trnval_folds(100, sampler='mbb')
>>> fold_sizes = [(len(np.unique(trnidx)),len(validx)) for trnidx, validx in folds]
>>> print(fold_sizes) # doctest: +SKIP
[(57, 43), (60, 40), (60, 40), (65, 35), (51, 49)]
tikreg.utils.hrf_convolution(input_responses, HRF=None, do_convolution=True, dt=None)

Convolve a series of impulses in a matrix with a given HRF.
Examples
>>> impulses = np.zeros((100, 2))
>>> impulses[5, 0] = 1
>>> impulses[25, 1] = 1
>>> bold = hrf_convolution(impulses, dt=1.0) # Default peak at 6s
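A custom response can presumably also be supplied through the HRF argument, e.g. one column of hrf_default_basis; the expected orientation of HRF is assumed here:

>>> basis = hrf_default_basis(dt=1.0, duration=32)               # doctest: +SKIP
>>> bold_canonical = hrf_convolution(impulses, HRF=basis[:, 0])  # doctest: +SKIP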
tikreg.utils.hrf_default_basis(dt=2.0, duration=32)

Hemodynamic response function basis set.

Wrapper around the hrf_estimation package.
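A usage sketch; the basis is assumed to contain a few basis functions (e.g. a canonical HRF and its derivatives) sampled every dt seconds over the requested duration, so the exact shape below is illustrative:

>>> basis = hrf_default_basis(dt=2.0, duration=32)
>>> print(basis.shape)  # doctest: +SKIP
(16, 3)
>>> # Each column is one basis function spanning 32 seconds sampled at dt=2.0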
tikreg.utils.hyperopt_make_trial_data(tid, vals, loss)

Generate a valid dictionary as a trial for hyperopt.Trials.
tikreg.utils.hyperopt_make_trials(values, losses, parameter_names=None)
tikreg.utils.isdiag(mat)

Determine whether matrix is diagonal.
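A minimal usage sketch (assuming a boolean return value):

>>> assert isdiag(np.diag(np.arange(1, 4)))
>>> assert not isdiag(np.ones((3, 3)))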
tikreg.utils.mult_diag(d, mat, left=True)

Efficiently multiply a full matrix by a diagonal matrix.

This function should always be faster than np.dot.
Notes
This code is by Pietro Berkes (berkes@gatsby.ucl.ac…), Mon Mar 26 11:55:47 CDT 2007: http://mail.scipy.org/pipermail/numpy-discussion/2007-March/026807.html
Examples
>>> mat = np.random.randn(20,20)
>>> d = np.random.randn(20)
>>> assert np.allclose(mult_diag(d, mat, left=True), np.dot(np.diag(d), mat))
>>> assert np.allclose(mult_diag(d, mat, left=False), np.dot(mat, np.diag(d)))
tikreg.utils.noise_ceiling_correction(repeats, yhat, dozscore=True)

Noise ceiling corrected correlation coefficient.
Correlation coefficient estimate that better reflects the error in the predictions that is due to the inaccuracy of the model, rather than the noise intrinsic in the responses. This is achieved by removing the non-stationary part of the noise in the measured signals across multiple repetitions of the same stimulus/task/conditions.
Notes
Repeats are used to compute the amount of explainable variance in each one of the units (e.g. voxels, neurons, etc.). This is equivalent to estimating the adjusted \(R^2\) of an OLS model that predicts each individual repeat timecourse with the mean timecourse computed across repetitions. This process is performed individually for each unit.
The mean timecourse of each unit is computed. The product between the predicted responses and this mean timecourse is then computed, and the result is divided by the amount of explainable variance.
\(r_{ncc}\) is misbehaved if \(R^2\) is very low.
References
Schoppe, et al. (2016), Hsu, et al. (2004), David, et al. (2005).
Examples
First, simulate some repeated data for 50 units (e.g. voxels, neurons).
>>> nreps, ntpts, nunits, noise = 10, 100, 50, 2.0
>>> _, _, Y = generate_data(n=ntpts, testsize=0, v=nunits, noise=0.0, dozscore=True)
>>> repeats = np.asarray([Y for i in range(nreps)])
>>> # Add i.i.d. gaussian noise to each copy of the data
>>> repeats += np.random.randn(nreps, ntpts, nunits)*noise
>>> print(repeats.shape)
(10, 100, 50)
Compute the noise ceiling corrected correlation coefficient using random predictions. Because the predictions are unrelated to the data, we expect a value of 0.
>>> mean_nccr = noise_ceiling_correction(repeats, np.random.randn(ntpts, nunits)).mean()
>>> mean_nccR2 = (mean_nccr**2)*np.sign(mean_nccr)
>>> # As ntpts -> inf, this converges to 0 because predictions are random
>>> assert np.allclose(round(mean_nccR2, 2), 0)
Next, we produce more accurate predictions.
>>> Yhat = Y + np.random.randn(ntpts, nunits)*0.5 # little noise
>>> # raw correlation reflects both noise in signals (2.0) and predictions (0.5)
>>> raw_perf = columnwise_correlation(repeats.mean(0), Yhat).mean()
>>> print(raw_perf) # doctest: +SKIP
0.7534268012539665
The raw correlation coefficient computed reflects the amount of noise in the signals (2.0) and the amount of noise in our model (0.5). The noise ceiling corrected correlation coefficient can be used to obtain an estimate that more closely reflects the error due to predictions alone.
>>> accurate_mean_nccr = noise_ceiling_correction(repeats, Yhat).mean()
>>> print(accurate_mean_nccr) # doctest: +SKIP
0.8954483753448146
>>> # As nreps, ntpts -> inf, the nccr is determined by the prediction error alone (0.5)
>>> # Analytically, the error in the predictions is 0.5 so the expected correlation is ~0.89
>>> print(round(analytic_expected_correlation(0.5), 6))
0.894427
>>> assert np.allclose(accurate_mean_nccr, analytic_expected_correlation(0.5), rtol=1e-01) # Small simulation
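The computation described in the Notes can also be sketched directly from the simulated repeats and Yhat above, following the normalized correlation coefficient of Schoppe et al. (2016); this is an illustrative reimplementation under those assumptions, not tikreg's exact code:

>>> nreps, ntpts = repeats.shape[0], repeats.shape[1]
>>> ymean = repeats.mean(0)
>>> # Signal power per unit: variance attributable to the repeatable signal
>>> SP = (repeats.sum(0).var(0, ddof=1) - repeats.var(1, ddof=1).sum(0)) / (nreps*(nreps - 1))
>>> cov = ((ymean - ymean.mean(0)) * (Yhat - Yhat.mean(0))).sum(0) / (ntpts - 1)
>>> ncc_sketch = cov / np.sqrt(SP * Yhat.var(0, ddof=1))
>>> # Should approximately match the corrected estimate computed above
>>> assert np.allclose(ncc_sketch.mean(), accurate_mean_nccr, rtol=1e-01)  # doctest: +SKIP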