Description of GLMdenoisedata

 function [results,denoiseddata] = GLMdenoisedata(design,data,stimdur,tr,hrfmodel,hrfknobs,opt,figuredir)

 <design> is the experimental design.  There are three possible cases:
   1. A where A is a matrix with dimensions time x conditions.
      Each column should be zeros except for ones indicating condition onsets.
      (Fractional values in the design matrix are also allowed.)
   2. {A1 A2 A3 ...} where each of the A's are like the previous case.
      The different A's correspond to different runs, and different runs
      can have different numbers of time points.
   3. {{C1_1 C2_1 C3_1 ...} {C1_2 C2_2 C3_2 ...} ...} where Ca_b is a vector of 
      onset times (in seconds) for condition a in run b.  Time starts at 0 
      and is coincident with the acquisition of the first volume.  This case 
      is compatible only with <hrfmodel> set to 'assume'.
   Because this function involves cross-validation across runs, there must 
   be at least two runs in <design>.  
 <data> is the time-series data with dimensions X x Y x Z x time or a cell vector of 
   elements that are each X x Y x Z x time.  XYZ can be collapsed such that the data 
   are given as a 2D matrix (XYZ x time); however, if you do this, then several of the 
   figures that are written out by this function will not be useful to look at.  The 
   dimensions of <data> should mirror that of <design>.  (For example, <design> and 
   <data> should have the same number of runs, the same number of time points, etc.  
   Thus, <data> should have at least two runs since <design> must have at least two 
   runs.)  <data> should not contain any NaNs. We automatically convert <data> to 
   single format (if necessary).
 <stimdur> is the duration of a trial in seconds
 <tr> is the sampling rate in seconds
 <hrfmodel> (optional) indicates the type of model to use for the HRF:
   'fir' indicates a finite impulse response model (a separate timecourse
     is estimated for every voxel and every condition)
   'assume' indicates that the HRF is provided (see <hrfknobs>)
   'optimize' indicates that we should estimate a global HRF from the data
   Default: 'optimize'.
 <hrfknobs> (optional) is as follows:
   if <hrfmodel> is 'fir', then <hrfknobs> should be the number of 
     time points in the future to model (N >= 0).  For example, if N is 10, 
     then timecourses will consist of 11 points, with the first point 
     coinciding with condition onset.
   if <hrfmodel> is 'assume', then <hrfknobs> should be time x 1 with
     the HRF to assume.
   if <hrfmodel> is 'optimize', then <hrfknobs> should be time x 1 with the 
     initial seed for the HRF.  The length of this vector indicates the
     number of time points that we will attempt to estimate in the HRF.
   Note on normalization:  In the case that <hrfmodel> is 'assume' or
   'optimize', we automatically divide <hrfknobs> by the maximum value
   so that the peak is equal to 1.  And if <hrfmodel> is 'optimize',
   then after fitting the HRF, we again normalize the HRF to peak at 1
   (and adjust amplitudes accordingly).  Default in the case of 'fir' is
   20.  Default in the case of 'assume' and 'optimize' is to use a 
   canonical HRF that is calculated based on <stimdur> and <tr>.
 <opt> (optional) is a struct with the following fields:
   <extraregressors> (optional) is time x regressors or a cell vector
     of elements that are each time x regressors.  The dimensions of 
     <extraregressors> should mirror that of <design> (i.e. same number of 
     runs, same number of time points).  The number of extra regressors 
     does not have to be the same across runs, and each run can have zero 
     or more extra regressors.  If [] or not supplied, we do 
     not use extra regressors in the model.
   <maxpolydeg> (optional) is a non-negative integer with the maximum 
     polynomial degree to use for polynomial nuisance functions, which
     are used to capture low-frequency noise fluctuations in each run.
     Can be a vector with length equal to the number of runs (this
     allows you to specify different degrees for different runs).  
     Default is to use round(L/2) for each run where L is the 
     duration in minutes of a given run.
   <seed> (optional) is the random number seed to use (this affects
     the selection of bootstrap samples). Default: sum(100*clock).
   <bootgroups> (optional) is a vector of positive integers indicating
     the grouping of runs to use when bootstrapping.  For example, 
     a grouping of [1 1 1 2 2 2] means that of the six samples that are
     drawn, three samples will be drawn (with replacement) from the first
     three runs and three samples will be drawn (with replacement) from
     the second three runs.  This functionality is useful in situations
     where different runs involve different conditions.  Default: ones(1,D) 
     where D is the number of runs.
   <numforhrf> (optional) is a positive integer indicating the number 
     of voxels (with the best R^2 values) to consider in fitting the 
     global HRF.  This input matters only when <hrfmodel> is 'optimize'.
     Default: 50.  (If there are fewer than that number of voxels
     available, we just use the voxels that are available.)
   <hrffitmask> (optional) is X x Y x Z with 1s indicating all possible
     voxels to consider for fitting the global HRF.  This input matters
     only when <hrfmodel> is 'optimize'.  Special case is 1 which means
     all voxels can be potentially chosen.  Default: 1.
   <brainthresh> (optional) [A B] where A is a percentile for voxel intensity 
     values and B is a fraction to apply to the percentile.  These parameters
     are used in the selection of the noise pool.  Default: [99 0.5].
   <brainR2> (optional) is an R^2 value (percentage).  Voxels whose 
     cross-validation accuracy is below this value are allowed to enter 
     the noise pool.  Default: 0.
   <brainexclude> (optional) is X x Y x Z with 1s indicating voxels to 
     exclude when selecting the noise pool.  Special case is 0 which means 
     all voxels can be potentially chosen.  Default: 0.
   <numpcstotry> (optional) is a non-negative integer indicating the maximum
     number of PCs to enter into the model.  Default: 20.
   <pcR2cutoff> (optional) is an R^2 value (percentage).  To decide the number
     of PCs to include, we examine a subset of the available voxels.
     Specifically, we examine voxels whose cross-validation accuracy is above 
     <pcR2cutoff> for any of the numbers of PCs.  Default: 0.
   <pcR2cutoffmask> (optional) is X x Y x Z with 1s indicating all possible
     voxels to consider when selecting the subset of voxels.  Special case is
     1 which means all voxels can be potentially selected.  Default: 1.
   <pcstop> (optional) is
     A: a number greater than or equal to 1 indicating when to stop adding PCs 
        into the model.  For example, 1.05 means that if the cross-validation 
        performance with the current number of PCs is within 5% of the maximum 
        observed, then use that number of PCs.  (Performance is measured 
        relative to the case of 0 PCs.)  When <pcstop> is 1, the selection 
        strategy reduces to simply choosing the PC number that achieves
        the maximum.  The advantage of stopping early is to achieve a selection
        strategy that is robust to noise and shallow performance curves and 
        that avoids overfitting.
    -B: where B is the number of PCs to use for the final model (thus, the user
        chooses).  B can be any integer between 0 and opt.numpcstotry.
     Default: 1.05.
   <pccontrolmode> (optional) is for testing purposes.  Default is 0 which means
     to do nothing special.  If 1, then after we are done constructing the global
     noise regressors, we scramble the phase spectra of these regressors (prior
     to entering them into the GLM).  If 2, then after we are done constructing the
     noise regressors, we shuffle the assignment of noise regressors
     to the runs, ensuring that each run is assigned a new set of regressors.  Note that
     in this shuffling, the grouping of regressors (at the run level) is maintained.
     The shuffling is performed prior to entering noise regressors into the GLM.
   <numboots> (optional) is a positive integer indicating the number of 
     bootstraps to perform for the final model.  Special case is 0 which
     indicates that the final model should just be fit to the complete
     set of data (and not bootstrapped). Default: 100.
   <denoisespec> (optional) is a binary string or cell vector of binary strings
     indicating the components of the data to return in <denoiseddata>.  The 
     format of each string should be 'ABCDE' where A indicates whether to include 
     the signal (estimated hemodynamic responses evoked by the experiment), B 
     indicates whether to include the polynomial drift, C indicates whether
     to include any extra regressors provided by the user, D indicates 
     whether to include the noise regressors, and E indicates whether
     to include the residuals of the model.  If multiple strings are provided,
     then separate copies of the data will be returned in the rows of 
     <denoiseddata>.  Default: '11101' which indicates that all components of 
     the data will be returned except for the component corresponding to the 
     estimate of the contribution of the noise regressors.
   <wantpercentbold> (optional) is whether to convert the amplitude estimates
     in 'models', 'modelmd', and 'modelse' to percent BOLD change.  This is
     done as the very last step, and is accomplished by dividing by the 
     absolute value of 'meanvol' and multiplying by 100.  (The absolute 
     value prevents negative values in 'meanvol' from flipping the sign.)
     Default: 1.
   <hrfthresh> (optional) is an R^2 threshold.  If the R^2 between the estimated
     HRF and the initial HRF is less than <hrfthresh>, we decide to just use
     the initial HRF.  Set <hrfthresh> to -Inf if you never want to reject
     the estimated HRF.  Default: 50.
   <noisepooldirect> (optional) is {A B} where A is X x Y x Z with 1s indicating the 
     voxels to use as the noise pool and B is a non-negative integer indicating the 
     number of PCs to use.  (B must be less than or equal to <numpcstotry>.)
     If <noisepooldirect> is supplied, this causes much of the standard GLMdenoise 
     procedure to be bypassed.  For example, cross-validation is no longer necessary 
     and therefore no longer performed.  Thus, one benefit of using <noisepooldirect>
     is that you can apply GLMdenoisedata.m to a single run of data.  Note that if 
     <noisepooldirect> is supplied, various inputs no longer have any effect 
     (e.g., <brainthresh>, <brainR2>, <brainexclude>, <pcR2cutoff>, <pcR2cutoffmask>, 
     and <pcstop>) and various outputs and figures are omittted.  Default is [] which 
     means to perform the usual GLMdenoise procedure.
   <wantparametric> (optional) is whether to compute parametric GLM fits and associated
     error estimates.  These are added as additional outputs in the <results> struct.
     Default: 0.
   <wantsanityfigures> (optional) is whether to write out figures that allow you
     to sanity-check the data.  You may want to set this to 0 to save computational
     time.  Default: 1.
 <figuredir> (optional) is a directory to which to write figures.  (If the
   directory does not exist, we create it; if the directory already exists,
   we delete its contents so we can start afresh.)  If [], no figures are
   written.  If not supplied, default to 'GLMdenoisefigures' (in the current 
   directory).

 Based on the experimental design (<design>, <stimdur>, <tr>) and the 
 model specification (<hrfmodel>, <hrfknobs>), fit a GLM model to the 
 data (<data>, <xyzsize>) using a denoising strategy.  Figures 
 illustrating the results are written to <figuredir>.

 Return <results> as a struct containing the following fields:
 <models>, <modelmd>, <modelse>, <R2>, <R2run>, <signal>, <noise>, 
   <SNR>, and <hrffitvoxels> are all like the output of GLMestimatemodel.m 
   (please see that function for details).
 <hrffitvoxels> is X x Y x Z with logical values indicating the voxels that
   were used to fit the global HRF.  (If <hrfmodel> is not 'optimize',
   this is returned as [].)
 <meanvol> is X x Y x Z with the mean of all volumes
 <noisepool> is X x Y x Z with logical values indicating the voxels that
   were selected for the noise pool.
 <pcregressors> indicates the noise regressors that were used
   to denoise the data.  The format is a cell vector of elements that 
   are each time x regressors.  The number of regressors will be equal 
   to opt.numpcstotry and they are in order (the first PC is the first
   regressor, the second PC is the second regressor, etc.).  Note that
   only the first <pcnum> PCs are actually used for the final model.
 <pcR2> is X x Y x Z x (1+opt.numpcstotry) with cross-validated R^2 values for
   different numbers of PCs.  The first slice corresponds to 0 PCs, the
   second slice corresponds to 1 PC, the third slice corresponds to
   2 PCs, etc.
 <pcR2final> is X x Y x Z with cross-validated R^2 values for the final
   model that is chosen.  Note that this is simply pcR2(:,:,:,1+pcnum).
 <pcvoxels> is X x Y x Z with logical values indicating the voxels that
   were used to select the number of PCs.
 <pcnum> is the number of PCs that were selected for the final model.
 <pcweights> is X x Y x Z x <pcnum> x R with the estimated weights on the 
   PCs for each voxel and run.
 <SNRbefore> is X x Y x Z with signal-to-noise ratios before denoising
   (using no noise regressors).
 <SNRafter> is X x Y x Z with signal-to-noise ratios after denoising
   (using the final number of noise regressors).
 <inputs> is a struct containing all inputs used in the call to this
   function, excluding <data>.  We additionally include a field called 
   'datasize' which contains the size of each element of <data>.
 <parametric> is a struct that is included if opt.wantparametric.  The fields are:
   <designmatrix> is time x regressors with the full design matrix of the GLM
   <parameters> is X x Y x Z x regressors with the beta weight estimate for
     each regressor
   <parametersse> is X x Y x Z x regressors with the standard error on the beta weights
   <noisevar> is X x Y x Z with the estimate of the noise variance (sigma squared)
   Note that these are in raw units and are not converted into % BOLD change.
 
 Also return <denoiseddata>, which is just like <data> except that the 
 component of the data that is estimated to be due to the noise regressors
 is subtracted off.  This may be useful in situations where one wishes to
 treat the denoising as a pre-processing step prior to other analyses 
 of the time-series data.  Further customization of the contents of
 <denoiseddata> is controlled by opt.denoisespec.  If you do not need
 <denoiseddata>, do not assign the <denoiseddata> output when you call 
 GLMdenoisedata.m (this allows us to save on memory usage).

 Description of the denoising procedure:
 1. Determine HRF.  If <hrfmodel> is 'assume', we just use the HRF
    specified by the user.  If <hrfmodel> is 'optimize', we perform
    a full fit of the GLM model to the data, optimizing the shape of
    the HRF.  If <hrfmodel> is 'fir', we do nothing (since full 
    flexibility in the HRF is allowed for each voxel and each condition).
 2. Determine cross-validated R^2 values.  We fix the HRF to what is
    obtained in step 1 and estimate the rest of the GLM model.  Leave-one-
    run-out cross-validation is performed, and we obtain an estimate of the
    amount of variance (R^2) that can be predicted by the deterministic 
    portion of the GLM model (the HRF and the amplitudes).
 3. Determine noise pool.  This is done by calculating a mean volume (the 
    mean across all volumes) and then determining the voxels that
    satisfy the following criteria:
    (1) The voxels must have sufficient MR signal, that is, the signal
        intensity in the mean volume must be above a certain threshold
        (see opt.brainthresh).
    (2) The voxels must have cross-validated R^2 values that are 
        below a certain threshold (see opt.brainR2).
    (3) The voxels must not be listed in opt.brainexclude (which is an
        optional input that can be specified by the user).
 4. Determine noise regressors.  This is done by extracting the 
    time-series data for the voxels in the noise pool, projecting out the
    polynomial nuisance functions from each time-series, normalizing each
    time-series to be unit length, and then performing PCA.  The top N
    PCs from each run (where N is equal to opt.numpcstotry) are selected
    as noise regressors.  Each regressor is scaled to have a standard
    deviation of 1; this makes it easier to interpret the weights estimated
    for the regressors.
 5. Evaluate different numbers of PCs using cross-validation.  We refit
    the GLM model to the data (keeping the HRF fixed), systematically varying 
    the number of PCs from 1 to N.  For each number of PCs, leave-one-run-out 
    cross-validation is performed.  (Recall that only the deterministic
    portion of the model is cross-validated; thus, any changes in R^2
    directly reflect changes in the quality of the amplitude estimates.)
 6. Choose optimal number of PCs.  To choose the optimal number of PCs,
    we select a subset of voxels (namely, any voxel that has a cross-validated
    R^2 value greater than opt.pcR2cutoff (default: 0%) in any of the cases
    being considered) and then compute the median cross-validated R^2 for these 
    voxels for different numbers of PCs.  Starting from 0 PCs, we select the 
    number of PCs that achieves a cross-validation accuracy within opt.pcstop of 
    the maximum.  (The default for opt.pcstop is 1.05, which means that the
    chosen number of PCs will be within 5% of the maximum.)
 7. Fit the final model.  We fit the final GLM model (with the HRF fixed to 
    that obtained in step 1 and with the number of PCs selected in step 6) 
    to the data.  Bootstrapping is used to estimate error bars on 
    amplitude estimates.  (We also fit the final model with no PCs so that
    we can compare the SNR before and after denoising.)
 8. Return the de-noised data.  We calculate the component of the data that 
    is due to the noise regressors and return the original time-series 
    data with this component subtracted off.  Note that the other components of
    the model (the hemodynamic responses evoked by the experiment, 
    the polynomial drift, any extra regressors provided by the user, 
    model residuals) remain in the de-noised data.  To change this behavior, 
    please see the input opt.denoisespec.

 Figures:
 - "CheckData/CheckMeanStd_runN.png" shows the mean and standard deviation across voxels
   of each volume in run N.  This allows one to quickly check the sanity of the data, 
   e.g., with regards to whether weird transient artifacts exist, whether initial 
   magnetization effects are present in the data (the first few volumes should have
   been dropped to avoid these effects), whether there are non-finite values in the 
   data (there should not be any), and with regards to the units of the data 
   (the data should consist primarily of positive values and in particular, should
   not be mean-subtracted).
 - "CheckData/CheckDVARS_runN.png" shows the DVARS metric (Power 2012).  Specifically,
   this is a time-series of the root-mean-square of the difference between pairs of
   successive volumes.
 - "HRF.png" shows the initial assumed HRF (provided by <hrfknobs>) and the
   final estimated HRF (as calculated in step 1).  If <hrfmodel> is 'assume',
   the two plots will be identical.  If <hrfmodel> is 'fir', this figure
   is not written.
 - "HRFfitmask.png" shows (in white) the mask restricting the voxels that
   can be chosen for fitting the global HRF.  This figure is written only
   if <hrfmodel> is 'optimize' and is not written if opt.hrffitmask is 1.
 - "HRFfitvoxels.png" shows (in white) the voxels used to fit the global HRF.
   This figure is written only if <hrfmodel> is 'optimize' and only if
   the fitted global HRF is actually chosen for use (in some cases, the
   initial HRF estimate is chosen; see GLMestimatemodel.m).
 - "PCselection.png" shows for different numbers of PCs, the median 
   cross-validated R^2 across a subset of voxels (namely, those voxels that 
   have greater than opt.pcR2cutoff (default: 0%) R^2 for at least one of 
   the models considered).  The selected number of PCs is circled and 
   indicated in the title of the figure.
 - "PCscatterN.png" shows a scatter plot of cross-validated R^2 values obtained
   with no PCs against values obtained with N PCs.  The range of the plot
   is set to the full range of all R^2 values (across all numbers of PCs).
   Two different sets of points are plotted.  The first set is shown in green,
   and this is a set of up to 20,000 voxels randomly selected from the
   entire pool of voxels.  The second set is shown in red, and this is a set of
   up to 20,000 voxels randomly selected from the set of voxels that
   were used to select the number of PC regressors.
 - "MeanVolume.png" shows the mean volume (mean of all volumes).
 - "NoisePool.png" shows (in white) the voxels selected for the noise pool.
 - "NoiseExclude.png" shows (in white) voxels that were excluded by the user
   from entering the noise pool.  This figure is not written if opt.brainexclude is 0.
 - "PCcrossvalidationN.png" shows cross-validated R^2 values obtained with N PCs.
   The range is 0% to 100%, and the colormap is nonlinearly scaled to enhance
   visibility.
 - "PCcrossvalidationscaledN.png" shows cross-validated R^2 values obtained with N PCs.
   This is just like the previous figures except that the minimum and maximum of the
   color range is set to the 1st and 99th percentile of R^2 values observed across all 
   PCs.  The point of this is to make it easier to visualize datasets where R^2 values
   are low.  The disadvantage is that these figures cannot be directly compared across 
   different datasets (since the color range may differ).
 - "PCmask.png" shows (in white) the mask restricting the voxels that
   can be selected for determining the optimal number of PCs.  This figure is 
   not written if opt.pcR2cutoffmask is 1.
 - "PCvoxels.png" shows (in white) the voxels used to determine the optimal
   number of PCs.
 - "FinalModel.png" shows R^2 values for the final model (as estimated in
   step 7).  Note that these R^2 values are not cross-validated.
 - "FinalModel_runN.png" shows R^2 values for the final model separated by 
   runs.  For example, FinalModel_run01.png indicates R^2 values calculated
   over the data in run 1.  This might be useful for deciding post-hoc to
   exclude certain runs from the analysis.
 - "SNRsignal.png" shows the maximum absolute amplitude obtained (the signal).
   The range is 0 to the 99th percentile of the values.
 - "SNRnoise.png" shows the average amplitude error (the noise).
   The range is 0 to the 99th percentile of the values.
 - "SNR.png" shows the signal-to-noise ratio.  The range is 0 to 10.
 - "SNRcomparebeforeandafter.png" shows a scatter plot of SNR values obtained
   with no denoising (0 PCs) against SNR values obtained with denoising (with
   the selected number of PCs).  The range of the plot is set to the full range
   of SNR values.  The green and red coloring of the dots is the same as in the
   "PCscatterN.png" figures.
 - "SNRamountofdatagained.png" is exactly the same as the previous figure except
   that the y-axis is now converted into the equivalent amount of data gained.  
   For example, if SNR is boosted from 4 to 8, this is equivalent to having 
   obtained 300% more data than was actually obtained.
 - "PCmap/PCmap_runN_numO.png" shows the estimated weights for the Oth PC
   for run N.  The range is -A to A where A is the 99th percentile of the
   absolute value of all weights across all runs.  The colormap proceeds
   from blue (negative) to black (0) to red (positive).

 Additional information:
 - For additional details on model estimation and quantification of model
 accuracy (R^2), please see the documentation provided in GLMestimatemodel.m.
 - To ensure that <SNRbefore> and <SNRafter> are directly comparable, we first
 average the estimated signal across the no-denoise and denoise cases
 and then divide by the estimated noise from each case.  This has the 
 consequence that <SNRafter> will not be exactly identical to <SNR>.  
 - With regards to the onset-time specification of the experimental design,
 this is implemented by laying down the HRF at the appropriate onset times
 and then using cubic interpolation to obtain the predicted HRF values at the
 times at which data are actually sampled.

 History:
 - 2014/08/01: add opt.wantparametric input (which enables parametric GLM fits).
               add opt.wantsanityfigures input (which allows user to turn off
               the sanity-check figures).  add new DVARS sanity-check figures.
               change the form of the polynomials used in the GLM: the
               polynomials are now constructed to be orthogonal and unit-length.
               note that this changes previous behavior and due to numerical
               issues, the outputs given by this function may be slightly
               different compared to previous results.
 - 2014/07/14: add <noisepooldirect> input; fix some file- and directory-related
               issues, making things more platform independent
 - 2013/12/11: rename "global noise regressors" to "noise regressors" in the
               documentation; perform a quick error-check for non-finite values
               in the first run of data; omit saving of some of the figures if 
               opt.numboots is 0; add some new sanity-check figures; fix minor bug
 - 2013/11/18: update documentation; add opt.pccontrolmode; add opt.hrfthresh;
               use fullfile for compatibility with different platforms;
               if pcvoxels is empty, we now fallback to using the top 100 voxels;
               change the colormap scaling for PCcrossvalidationscaled*.png;
               other various small tweaks;
 - 2013/05/12: allow <design> to specify onset times
 - 2013/05/12: add opt.brainexclude and associated figure
 - 2013/05/12: add SNRbefore and SNRafter fields and associated figures
 - 2013/05/12: add PCcrossvalidationscaledN.png figures
 - 2013/05/12: update to indicate fractional values in design matrix are allowed.
 - 2013/05/12: regressors that are all zero now receive a 0 weight (instead of crashing)
 - 2013/05/12: add pcR2final as an output
 - 2012/12/06: automatically convert data to single format
 - 2012/12/05: fix minor bug (bad HRF estimate caused figure crash)
 - 2012/12/03: *** Tag: Version 1.02 ***. Use faster OLS computation (less
   error-checking; program execution will halt if design matrix is singular);
   documentation tweak; minor bug fix.
 - 2012/11/24:
   - INPUTS: add opt.hrffitmask; opt.pcR2cutoff; opt.pcR2cutoffmask; opt.pcstop; opt.denoisespec; opt.wantpercentbold;
   - OUTPUTS: add hrffitvoxels, pcvoxels, pcweights, inputs
   - FIGURES: HRFfitmask.png, HRFfitvoxels.png, PCmask.png, PCvoxels.png, Signal.png, Noise.png, SNR.png, PCmap*.png
   - hrfmodel can now be 'fir'!
   - change default of opt.numpcstotry to 20
   - PC regressors are now scaled to have standard deviation of 1
   - xval scatter plots now are divided into red and green dots (and up to 20,000 each)
   - pcselection figure uses a line now (not bar) and a selection circle is drawn
   - no more skipping of the denoiseddata computation
 - 2012/11/03 - add a speed-up
 - 2012/11/02 - Initial version.
 - 2012/10/31 - add meanvol and change that it is the mean of all
 - 2012/10/30 - Automatic division of HRF!

0001 function [results,denoiseddata] = GLMdenoisedata(design,data,stimdur,tr,hrfmodel,hrfknobs,opt,figuredir)
0002 
0003 % function [results,denoiseddata] = GLMdenoisedata(design,data,stimdur,tr,hrfmodel,hrfknobs,opt,figuredir)
0004 %
0005 % <design> is the experimental design.  There are three possible cases:
0006 %   1. A where A is a matrix with dimensions time x conditions.
0007 %      Each column should be zeros except for ones indicating condition onsets.
0008 %      (Fractional values in the design matrix are also allowed.)
0009 %   2. {A1 A2 A3 ...} where each of the A's are like the previous case.
0010 %      The different A's correspond to different runs, and different runs
0011 %      can have different numbers of time points.
0012 %   3. {{C1_1 C2_1 C3_1 ...} {C1_2 C2_2 C3_2 ...} ...} where Ca_b is a vector of
0013 %      onset times (in seconds) for condition a in run b.  Time starts at 0
0014 %      and is coincident with the acquisition of the first volume.  This case
0015 %      is compatible only with <hrfmodel> set to 'assume'.
0016 %   Because this function involves cross-validation across runs, there must
0017 %   be at least two runs in <design>.
0018 % <data> is the time-series data with dimensions X x Y x Z x time or a cell vector of
0019 %   elements that are each X x Y x Z x time.  XYZ can be collapsed such that the data
0020 %   are given as a 2D matrix (XYZ x time); however, if you do this, then several of the
0021 %   figures that are written out by this function will not be useful to look at.  The
0022 %   dimensions of <data> should mirror that of <design>.  (For example, <design> and
0023 %   <data> should have the same number of runs, the same number of time points, etc.
0024 %   Thus, <data> should have at least two runs since <design> must have at least two
0025 %   runs.)  <data> should not contain any NaNs. We automatically convert <data> to
0026 %   single format (if necessary).
0027 % <stimdur> is the duration of a trial in seconds
0028 % <tr> is the sampling rate in seconds
0029 % <hrfmodel> (optional) indicates the type of model to use for the HRF:
0030 %   'fir' indicates a finite impulse response model (a separate timecourse
0031 %     is estimated for every voxel and every condition)
0032 %   'assume' indicates that the HRF is provided (see <hrfknobs>)
0033 %   'optimize' indicates that we should estimate a global HRF from the data
0034 %   Default: 'optimize'.
0035 % <hrfknobs> (optional) is as follows:
0036 %   if <hrfmodel> is 'fir', then <hrfknobs> should be the number of
0037 %     time points in the future to model (N >= 0).  For example, if N is 10,
0038 %     then timecourses will consist of 11 points, with the first point
0039 %     coinciding with condition onset.
0040 %   if <hrfmodel> is 'assume', then <hrfknobs> should be time x 1 with
0041 %     the HRF to assume.
0042 %   if <hrfmodel> is 'optimize', then <hrfknobs> should be time x 1 with the
0043 %     initial seed for the HRF.  The length of this vector indicates the
0044 %     number of time points that we will attempt to estimate in the HRF.
0045 %   Note on normalization:  In the case that <hrfmodel> is 'assume' or
0046 %   'optimize', we automatically divide <hrfknobs> by the maximum value
0047 %   so that the peak is equal to 1.  And if <hrfmodel> is 'optimize',
0048 %   then after fitting the HRF, we again normalize the HRF to peak at 1
0049 %   (and adjust amplitudes accordingly).  Default in the case of 'fir' is
0050 %   20.  Default in the case of 'assume' and 'optimize' is to use a
0051 %   canonical HRF that is calculated based on <stimdur> and <tr>.
0052 % <opt> (optional) is a struct with the following fields:
0053 %   <extraregressors> (optional) is time x regressors or a cell vector
0054 %     of elements that are each time x regressors.  The dimensions of
0055 %     <extraregressors> should mirror that of <design> (i.e. same number of
0056 %     runs, same number of time points).  The number of extra regressors
0057 %     does not have to be the same across runs, and each run can have zero
0058 %     or more extra regressors.  If [] or not supplied, we do
0059 %     not use extra regressors in the model.
0060 %   <maxpolydeg> (optional) is a non-negative integer with the maximum
0061 %     polynomial degree to use for polynomial nuisance functions, which
0062 %     are used to capture low-frequency noise fluctuations in each run.
0063 %     Can be a vector with length equal to the number of runs (this
0064 %     allows you to specify different degrees for different runs).
0065 %     Default is to use round(L/2) for each run where L is the
0066 %     duration in minutes of a given run.
0067 %   <seed> (optional) is the random number seed to use (this affects
0068 %     the selection of bootstrap samples). Default: sum(100*clock).
0069 %   <bootgroups> (optional) is a vector of positive integers indicating
0070 %     the grouping of runs to use when bootstrapping.  For example,
0071 %     a grouping of [1 1 1 2 2 2] means that of the six samples that are
0072 %     drawn, three samples will be drawn (with replacement) from the first
0073 %     three runs and three samples will be drawn (with replacement) from
0074 %     the second three runs.  This functionality is useful in situations
0075 %     where different runs involve different conditions.  Default: ones(1,D)
0076 %     where D is the number of runs.
0077 %   <numforhrf> (optional) is a positive integer indicating the number
0078 %     of voxels (with the best R^2 values) to consider in fitting the
0079 %     global HRF.  This input matters only when <hrfmodel> is 'optimize'.
0080 %     Default: 50.  (If there are fewer than that number of voxels
0081 %     available, we just use the voxels that are available.)
0082 %   <hrffitmask> (optional) is X x Y x Z with 1s indicating all possible
0083 %     voxels to consider for fitting the global HRF.  This input matters
0084 %     only when <hrfmodel> is 'optimize'.  Special case is 1 which means
0085 %     all voxels can be potentially chosen.  Default: 1.
0086 %   <brainthresh> (optional) [A B] where A is a percentile for voxel intensity
0087 %     values and B is a fraction to apply to the percentile.  These parameters
0088 %     are used in the selection of the noise pool.  Default: [99 0.5].
0089 %   <brainR2> (optional) is an R^2 value (percentage).  Voxels whose
0090 %     cross-validation accuracy is below this value are allowed to enter
0091 %     the noise pool.  Default: 0.
0092 %   <brainexclude> (optional) is X x Y x Z with 1s indicating voxels to
0093 %     exclude when selecting the noise pool.  Special case is 0 which means
0094 %     all voxels can be potentially chosen.  Default: 0.
0095 %   <numpcstotry> (optional) is a non-negative integer indicating the maximum
0096 %     number of PCs to enter into the model.  Default: 20.
0097 %   <pcR2cutoff> (optional) is an R^2 value (percentage).  To decide the number
0098 %     of PCs to include, we examine a subset of the available voxels.
0099 %     Specifically, we examine voxels whose cross-validation accuracy is above
0100 %     <pcR2cutoff> for any of the numbers of PCs.  Default: 0.
0101 %   <pcR2cutoffmask> (optional) is X x Y x Z with 1s indicating all possible
0102 %     voxels to consider when selecting the subset of voxels.  Special case is
0103 %     1 which means all voxels can be potentially selected.  Default: 1.
0104 %   <pcstop> (optional) is
0105 %     A: a number greater than or equal to 1 indicating when to stop adding PCs
0106 %        into the model.  For example, 1.05 means that if the cross-validation
0107 %        performance with the current number of PCs is within 5% of the maximum
0108 %        observed, then use that number of PCs.  (Performance is measured
0109 %        relative to the case of 0 PCs.)  When <pcstop> is 1, the selection
0110 %        strategy reduces to simply choosing the PC number that achieves
0111 %        the maximum.  The advantage of stopping early is to achieve a selection
0112 %        strategy that is robust to noise and shallow performance curves and
0113 %        that avoids overfitting.
0114 %    -B: where B is the number of PCs to use for the final model (thus, the user
0115 %        chooses).  B can be any integer between 0 and opt.numpcstotry.
0116 %     Default: 1.05.
0117 %   <pccontrolmode> (optional) is for testing purposes.  Default is 0 which means
0118 %     to do nothing special.  If 1, then after we are done constructing the global
0119 %     noise regressors, we scramble the phase spectra of these regressors (prior
0120 %     to entering them into the GLM).  If 2, then after we are done constructing the
0121 %     noise regressors, we shuffle the assignment of noise regressors
0122 %     to the runs, ensuring that each run is assigned a new set of regressors.  Note that
0123 %     in this shuffling, the grouping of regressors (at the run level) is maintained.
0124 %     The shuffling is performed prior to entering noise regressors into the GLM.
0125 %   <numboots> (optional) is a positive integer indicating the number of
0126 %     bootstraps to perform for the final model.  Special case is 0 which
0127 %     indicates that the final model should just be fit to the complete
0128 %     set of data (and not bootstrapped). Default: 100.
0129 %   <denoisespec> (optional) is a binary string or cell vector of binary strings
0130 %     indicating the components of the data to return in <denoiseddata>.  The
0131 %     format of each string should be 'ABCDE' where A indicates whether to include
0132 %     the signal (estimated hemodynamic responses evoked by the experiment), B
0133 %     indicates whether to include the polynomial drift, C indicates whether
0134 %     to include any extra regressors provided by the user, D indicates
0135 %     whether to include the noise regressors, and E indicates whether
0136 %     to include the residuals of the model.  If multiple strings are provided,
0137 %     then separate copies of the data will be returned in the rows of
0138 %     <denoiseddata>.  Default: '11101' which indicates that all components of
0139 %     the data will be returned except for the component corresponding to the
0140 %     estimate of the contribution of the noise regressors.
0141 %   <wantpercentbold> (optional) is whether to convert the amplitude estimates
0142 %     in 'models', 'modelmd', and 'modelse' to percent BOLD change.  This is
0143 %     done as the very last step, and is accomplished by dividing by the
0144 %     absolute value of 'meanvol' and multiplying by 100.  (The absolute
0145 %     value prevents negative values in 'meanvol' from flipping the sign.)
0146 %     Default: 1.
0147 %   <hrfthresh> (optional) is an R^2 threshold.  If the R^2 between the estimated
0148 %     HRF and the initial HRF is less than <hrfthresh>, we decide to just use
0149 %     the initial HRF.  Set <hrfthresh> to -Inf if you never want to reject
0150 %     the estimated HRF.  Default: 50.
0151 %   <noisepooldirect> (optional) is {A B} where A is X x Y x Z with 1s indicating the
0152 %     voxels to use as the noise pool and B is a non-negative integer indicating the
0153 %     number of PCs to use.  (B must be less than or equal to <numpcstotry>.)
0154 %     If <noisepooldirect> is supplied, this causes much of the standard GLMdenoise
0155 %     procedure to be bypassed.  For example, cross-validation is no longer necessary
0156 %     and therefore no longer performed.  Thus, one benefit of using <noisepooldirect>
0157 %     is that you can apply GLMdenoisedata.m to a single run of data.  Note that if
0158 %     <noisepooldirect> is supplied, various inputs no longer have any effect
0159 %     (e.g., <brainthresh>, <brainR2>, <brainexclude>, <pcR2cutoff>, <pcR2cutoffmask>,
0160 %     and <pcstop>) and various outputs and figures are omittted.  Default is [] which
0161 %     means to perform the usual GLMdenoise procedure.
0162 %   <wantparametric> (optional) is whether to compute parametric GLM fits and associated
0163 %     error estimates.  These are added as additional outputs in the <results> struct.
0164 %     Default: 0.
0165 %   <wantsanityfigures> (optional) is whether to write out figures that allow you
0166 %     to sanity-check the data.  You may want to set this to 0 to save computational
0167 %     time.  Default: 1.
0168 % <figuredir> (optional) is a directory to which to write figures.  (If the
0169 %   directory does not exist, we create it; if the directory already exists,
0170 %   we delete its contents so we can start afresh.)  If [], no figures are
0171 %   written.  If not supplied, default to 'GLMdenoisefigures' (in the current
0172 %   directory).
0173 %
0174 % Based on the experimental design (<design>, <stimdur>, <tr>) and the
0175 % model specification (<hrfmodel>, <hrfknobs>), fit a GLM model to the
0176 % data (<data>, <xyzsize>) using a denoising strategy.  Figures
0177 % illustrating the results are written to <figuredir>.
0178 %
0179 % Return <results> as a struct containing the following fields:
0180 % <models>, <modelmd>, <modelse>, <R2>, <R2run>, <signal>, <noise>,
0181 %   <SNR>, and <hrffitvoxels> are all like the output of GLMestimatemodel.m
0182 %   (please see that function for details).
0183 % <hrffitvoxels> is X x Y x Z with logical values indicating the voxels that
0184 %   were used to fit the global HRF.  (If <hrfmodel> is not 'optimize',
0185 %   this is returned as [].)
0186 % <meanvol> is X x Y x Z with the mean of all volumes
0187 % <noisepool> is X x Y x Z with logical values indicating the voxels that
0188 %   were selected for the noise pool.
0189 % <pcregressors> indicates the noise regressors that were used
0190 %   to denoise the data.  The format is a cell vector of elements that
0191 %   are each time x regressors.  The number of regressors will be equal
0192 %   to opt.numpcstotry and they are in order (the first PC is the first
0193 %   regressor, the second PC is the second regressor, etc.).  Note that
0194 %   only the first <pcnum> PCs are actually used for the final model.
0195 % <pcR2> is X x Y x Z x (1+opt.numpcstotry) with cross-validated R^2 values for
0196 %   different numbers of PCs.  The first slice corresponds to 0 PCs, the
0197 %   second slice corresponds to 1 PC, the third slice corresponds to
0198 %   2 PCs, etc.
0199 % <pcR2final> is X x Y x Z with cross-validated R^2 values for the final
0200 %   model that is chosen.  Note that this is simply pcR2(:,:,:,1+pcnum).
0201 % <pcvoxels> is X x Y x Z with logical values indicating the voxels that
0202 %   were used to select the number of PCs.
0203 % <pcnum> is the number of PCs that were selected for the final model.
0204 % <pcweights> is X x Y x Z x <pcnum> x R with the estimated weights on the
0205 %   PCs for each voxel and run.
0206 % <SNRbefore> is X x Y x Z with signal-to-noise ratios before denoising
0207 %   (using no noise regressors).
0208 % <SNRafter> is X x Y x Z with signal-to-noise ratios after denoising
0209 %   (using the final number of noise regressors).
0210 % <inputs> is a struct containing all inputs used in the call to this
0211 %   function, excluding <data>.  We additionally include a field called
0212 %   'datasize' which contains the size of each element of <data>.
0213 % <parametric> is a struct that is included if opt.wantparametric.  The fields are:
0214 %   <designmatrix> is time x regressors with the full design matrix of the GLM
0215 %   <parameters> is X x Y x Z x regressors with the beta weight estimate for
0216 %     each regressor
0217 %   <parametersse> is X x Y x Z x regressors with the standard error on the beta weights
0218 %   <noisevar> is X x Y x Z with the estimate of the noise variance (sigma squared)
0219 %   Note that these are in raw units and are not converted into % BOLD change.
0220 %
0221 % Also return <denoiseddata>, which is just like <data> except that the
0222 % component of the data that is estimated to be due to the noise regressors
0223 % is subtracted off.  This may be useful in situations where one wishes to
0224 % treat the denoising as a pre-processing step prior to other analyses
0225 % of the time-series data.  Further customization of the contents of
0226 % <denoiseddata> is controlled by opt.denoisespec.  If you do not need
0227 % <denoiseddata>, do not assign the <denoiseddata> output when you call
0228 % GLMdenoisedata.m (this allows us to save on memory usage).
0229 %
0230 % Description of the denoising procedure:
0231 % 1. Determine HRF.  If <hrfmodel> is 'assume', we just use the HRF
0232 %    specified by the user.  If <hrfmodel> is 'optimize', we perform
0233 %    a full fit of the GLM model to the data, optimizing the shape of
0234 %    the HRF.  If <hrfmodel> is 'fir', we do nothing (since full
0235 %    flexibility in the HRF is allowed for each voxel and each condition).
0236 % 2. Determine cross-validated R^2 values.  We fix the HRF to what is
0237 %    obtained in step 1 and estimate the rest of the GLM model.  Leave-one-
0238 %    run-out cross-validation is performed, and we obtain an estimate of the
0239 %    amount of variance (R^2) that can be predicted by the deterministic
0240 %    portion of the GLM model (the HRF and the amplitudes).
0241 % 3. Determine noise pool.  This is done by calculating a mean volume (the
0242 %    mean across all volumes) and then determining the voxels that
0243 %    satisfy the following criteria:
0244 %    (1) The voxels must have sufficient MR signal, that is, the signal
0245 %        intensity in the mean volume must be above a certain threshold
0246 %        (see opt.brainthresh).
0247 %    (2) The voxels must have cross-validated R^2 values that are
0248 %        below a certain threshold (see opt.brainR2).
0249 %    (3) The voxels must not be listed in opt.brainexclude (which is an
0250 %        optional input that can be specified by the user).
0251 % 4. Determine noise regressors.  This is done by extracting the
0252 %    time-series data for the voxels in the noise pool, projecting out the
0253 %    polynomial nuisance functions from each time-series, normalizing each
0254 %    time-series to be unit length, and then performing PCA.  The top N
0255 %    PCs from each run (where N is equal to opt.numpcstotry) are selected
0256 %    as noise regressors.  Each regressor is scaled to have a standard
0257 %    deviation of 1; this makes it easier to interpret the weights estimated
0258 %    for the regressors.
0259 % 5. Evaluate different numbers of PCs using cross-validation.  We refit
0260 %    the GLM model to the data (keeping the HRF fixed), systematically varying
0261 %    the number of PCs from 1 to N.  For each number of PCs, leave-one-run-out
0262 %    cross-validation is performed.  (Recall that only the deterministic
0263 %    portion of the model is cross-validated; thus, any changes in R^2
0264 %    directly reflect changes in the quality of the amplitude estimates.)
0265 % 6. Choose optimal number of PCs.  To choose the optimal number of PCs,
0266 %    we select a subset of voxels (namely, any voxel that has a cross-validated
0267 %    R^2 value greater than opt.pcR2cutoff (default: 0%) in any of the cases
0268 %    being considered) and then compute the median cross-validated R^2 for these
0269 %    voxels for different numbers of PCs.  Starting from 0 PCs, we select the
0270 %    number of PCs that achieves a cross-validation accuracy within opt.pcstop of
0271 %    the maximum.  (The default for opt.pcstop is 1.05, which means that the
0272 %    chosen number of PCs will be within 5% of the maximum.)
0273 % 7. Fit the final model.  We fit the final GLM model (with the HRF fixed to
0274 %    that obtained in step 1 and with the number of PCs selected in step 6)
0275 %    to the data.  Bootstrapping is used to estimate error bars on
0276 %    amplitude estimates.  (We also fit the final model with no PCs so that
0277 %    we can compare the SNR before and after denoising.)
0278 % 8. Return the de-noised data.  We calculate the component of the data that
0279 %    is due to the noise regressors and return the original time-series
0280 %    data with this component subtracted off.  Note that the other components of
0281 %    the model (the hemodynamic responses evoked by the experiment,
0282 %    the polynomial drift, any extra regressors provided by the user,
0283 %    model residuals) remain in the de-noised data.  To change this behavior,
0284 %    please see the input opt.denoisespec.
0285 %
0286 % Figures:
0287 % - "CheckData/CheckMeanStd_runN.png" shows the mean and standard deviation across voxels
0288 %   of each volume in run N.  This allows one to quickly check the sanity of the data,
0289 %   e.g., with regards to whether weird transient artifacts exist, whether initial
0290 %   magnetization effects are present in the data (the first few volumes should have
0291 %   been dropped to avoid these effects), whether there are non-finite values in the
0292 %   data (there should not be any), and with regards to the units of the data
0293 %   (the data should consist primarily of positive values and in particular, should
0294 %   not be mean-subtracted).
0295 % - "CheckData/CheckDVARS_runN.png" shows the DVARS metric (Power 2012).  Specifically,
0296 %   this is a time-series of the root-mean-square of the difference between pairs of
0297 %   successive volumes.
0298 % - "HRF.png" shows the initial assumed HRF (provided by <hrfknobs>) and the
0299 %   final estimated HRF (as calculated in step 1).  If <hrfmodel> is 'assume',
0300 %   the two plots will be identical.  If <hrfmodel> is 'fir', this figure
0301 %   is not written.
0302 % - "HRFfitmask.png" shows (in white) the mask restricting the voxels that
0303 %   can be chosen for fitting the global HRF.  This figure is written only
0304 %   if <hrfmodel> is 'optimize' and is not written if opt.hrffitmask is 1.
0305 % - "HRFfitvoxels.png" shows (in white) the voxels used to fit the global HRF.
0306 %   This figure is written only if <hrfmodel> is 'optimize' and only if
0307 %   the fitted global HRF is actually chosen for use (in some cases, the
0308 %   initial HRF estimate is chosen; see GLMestimatemodel.m).
0309 % - "PCselection.png" shows for different numbers of PCs, the median
0310 %   cross-validated R^2 across a subset of voxels (namely, those voxels that
0311 %   have greater than opt.pcR2cutoff (default: 0%) R^2 for at least one of
0312 %   the models considered).  The selected number of PCs is circled and
0313 %   indicated in the title of the figure.
0314 % - "PCscatterN.png" shows a scatter plot of cross-validated R^2 values obtained
0315 %   with no PCs against values obtained with N PCs.  The range of the plot
0316 %   is set to the full range of all R^2 values (across all numbers of PCs).
0317 %   Two different sets of points are plotted.  The first set is shown in green,
0318 %   and this is a set of up to 20,000 voxels randomly selected from the
0319 %   entire pool of voxels.  The second set is shown in red, and this is a set of
0320 %   up to 20,000 voxels randomly selected from the set of voxels that
0321 %   were used to select the number of PC regressors.
0322 % - "MeanVolume.png" shows the mean volume (mean of all volumes).
0323 % - "NoisePool.png" shows (in white) the voxels selected for the noise pool.
0324 % - "NoiseExclude.png" shows (in white) voxels that were excluded by the user
0325 %   from entering the noise pool.  This figure is not written if opt.brainexclude is 0.
0326 % - "PCcrossvalidationN.png" shows cross-validated R^2 values obtained with N PCs.
0327 %   The range is 0% to 100%, and the colormap is nonlinearly scaled to enhance
0328 %   visibility.
0329 % - "PCcrossvalidationscaledN.png" shows cross-validated R^2 values obtained with N PCs.
0330 %   This is just like the previous figures except that the minimum and maximum of the
0331 %   color range is set to the 1st and 99th percentile of R^2 values observed across all
0332 %   PCs.  The point of this is to make it easier to visualize datasets where R^2 values
0333 %   are low.  The disadvantage is that these figures cannot be directly compared across
0334 %   different datasets (since the color range may differ).
0335 % - "PCmask.png" shows (in white) the mask restricting the voxels that
0336 %   can be selected for determining the optimal number of PCs.  This figure is
0337 %   not written if opt.pcR2cutoffmask is 1.
0338 % - "PCvoxels.png" shows (in white) the voxels used to determine the optimal
0339 %   number of PCs.
0340 % - "FinalModel.png" shows R^2 values for the final model (as estimated in
0341 %   step 7).  Note that these R^2 values are not cross-validated.
0342 % - "FinalModel_runN.png" shows R^2 values for the final model separated by
0343 %   runs.  For example, FinalModel_run01.png indicates R^2 values calculated
0344 %   over the data in run 1.  This might be useful for deciding post-hoc to
0345 %   exclude certain runs from the analysis.
0346 % - "SNRsignal.png" shows the maximum absolute amplitude obtained (the signal).
0347 %   The range is 0 to the 99th percentile of the values.
0348 % - "SNRnoise.png" shows the average amplitude error (the noise).
0349 %   The range is 0 to the 99th percentile of the values.
0350 % - "SNR.png" shows the signal-to-noise ratio.  The range is 0 to 10.
0351 % - "SNRcomparebeforeandafter.png" shows a scatter plot of SNR values obtained
0352 %   with no denoising (0 PCs) against SNR values obtained with denoising (with
0353 %   the selected number of PCs).  The range of the plot is set to the full range
0354 %   of SNR values.  The green and red coloring of the dots is the same as in the
0355 %   "PCscatterN.png" figures.
0356 % - "SNRamountofdatagained.png" is exactly the same as the previous figure except
0357 %   that the y-axis is now converted into the equivalent amount of data gained.
0358 %   For example, if SNR is boosted from 4 to 8, this is equivalent to having
0359 %   obtained 300% more data than was actually obtained.
0360 % - "PCmap/PCmap_runN_numO.png" shows the estimated weights for the Oth PC
0361 %   for run N.  The range is -A to A where A is the 99th percentile of the
0362 %   absolute value of all weights across all runs.  The colormap proceeds
0363 %   from blue (negative) to black (0) to red (positive).
0364 %
0365 % Additional information:
0366 % - For additional details on model estimation and quantification of model
0367 % accuracy (R^2), please see the documentation provided in GLMestimatemodel.m.
0368 % - To ensure that <SNRbefore> and <SNRafter> are directly comparable, we first
0369 % average the estimated signal across the no-denoise and denoise cases
0370 % and then divide by the estimated noise from each case.  This has the
0371 % consequence that <SNRafter> will not be exactly identical to <SNR>.
0372 % - With regards to the onset-time specification of the experimental design,
0373 % this is implemented by laying down the HRF at the appropriate onset times
0374 % and then using cubic interpolation to obtain the predicted HRF values at the
0375 % times at which data are actually sampled.
0376 %
0377 % History:
0378 % - 2014/08/01: add opt.wantparametric input (which enables parametric GLM fits).
0379 %               add opt.wantsanityfigures input (which allows user to turn off
0380 %               the sanity-check figures).  add new DVARS sanity-check figures.
0381 %               change the form of the polynomials used in the GLM: the
0382 %               polynomials are now constructed to be orthogonal and unit-length.
0383 %               note that this changes previous behavior and due to numerical
0384 %               issues, the outputs given by this function may be slightly
0385 %               different compared to previous results.
0386 % - 2014/07/14: add <noisepooldirect> input; fix some file- and directory-related
0387 %               issues, making things more platform independent
0388 % - 2013/12/11: rename "global noise regressors" to "noise regressors" in the
0389 %               documentation; perform a quick error-check for non-finite values
0390 %               in the first run of data; omit saving of some of the figures if
0391 %               opt.numboots is 0; add some new sanity-check figures; fix minor bug
0392 % - 2013/11/18: update documentation; add opt.pccontrolmode; add opt.hrfthresh;
0393 %               use fullfile for compatibility with different platforms;
0394 %               if pcvoxels is empty, we now fallback to using the top 100 voxels;
0395 %               change the colormap scaling for PCcrossvalidationscaled*.png;
0396 %               other various small tweaks;
0397 % - 2013/05/12: allow <design> to specify onset times
0398 % - 2013/05/12: add opt.brainexclude and associated figure
0399 % - 2013/05/12: add SNRbefore and SNRafter fields and associated figures
0400 % - 2013/05/12: add PCcrossvalidationscaledN.png figures
0401 % - 2013/05/12: update to indicate fractional values in design matrix are allowed.
0402 % - 2013/05/12: regressors that are all zero now receive a 0 weight (instead of crashing)
0403 % - 2013/05/12: add pcR2final as an output
0404 % - 2012/12/06: automatically convert data to single format
0405 % - 2012/12/05: fix minor bug (bad HRF estimate caused figure crash)
0406 % - 2012/12/03: *** Tag: Version 1.02 ***. Use faster OLS computation (less
0407 %   error-checking; program execution will halt if design matrix is singular);
0408 %   documentation tweak; minor bug fix.
0409 % - 2012/11/24:
0410 %   - INPUTS: add opt.hrffitmask; opt.pcR2cutoff; opt.pcR2cutoffmask; opt.pcstop; opt.denoisespec; opt.wantpercentbold;
0411 %   - OUTPUTS: add hrffitvoxels, pcvoxels, pcweights, inputs
0412 %   - FIGURES: HRFfitmask.png, HRFfitvoxels.png, PCmask.png, PCvoxels.png, Signal.png, Noise.png, SNR.png, PCmap*.png
0413 %   - hrfmodel can now be 'fir'!
0414 %   - change default of opt.numpcstotry to 20
0415 %   - PC regressors are now scaled to have standard deviation of 1
0416 %   - xval scatter plots now are divided into red and green dots (and up to 20,000 each)
0417 %   - pcselection figure uses a line now (not bar) and a selection circle is drawn
0418 %   - no more skipping of the denoiseddata computation
0419 % - 2012/11/03 - add a speed-up
0420 % - 2012/11/02 - Initial version.
0421 % - 2012/10/31 - add meanvol and change that it is the mean of all
0422 % - 2012/10/30 - Automatic division of HRF!
0423 
0424 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DEAL WITH INPUTS, ETC.
0425 
0426 % input
0427 if ~exist('hrfmodel','var') || isempty(hrfmodel)
0428   hrfmodel = 'optimize';
0429 end
0430 if ~exist('hrfknobs','var') || isempty(hrfknobs)
0431   if isequal(hrfmodel,'fir')
0432     hrfknobs = 20;
0433   else
0434     hrfknobs = normalizemax(getcanonicalhrf(stimdur,tr)');
0435   end
0436 end
0437 if ~exist('opt','var') || isempty(opt)
0438   opt = struct();
0439 end
0440 if ~exist('figuredir','var')
0441   figuredir = 'GLMdenoisefigures';
0442 end
0443 
0444 % massage input
0445 if ~iscell(design)
0446   design = {design};
0447 end
0448 if ~iscell(data)
0449   data = {data};
0450 end
0451 for p=1:length(data)
0452   if ~isa(data{p},'single')
0453     fprintf('*** GLMdenoisedata: converting data in run %d to single format (consider doing this before the function call to reduce memory usage). ***\n',p);
0454     data{p} = single(data{p});
0455   end
0456 end
0457 
0458 % do some error checking
0459 if any(flatten(~isfinite(data{1})))
0460   fprintf('*** GLMdenoisedata: WARNING: we checked the first run and found some non-finite values (e.g. NaN, Inf). unexpected results may occur due to non-finite values. please fix and re-run GLMdenoisedata. ***\n');
0461 end
0462 
0463 % calc
0464 numruns = length(design);
0465 dataclass = class(data{1});  % will always be 'single'
0466 is3d = size(data{1},4) > 1;
0467 if is3d
0468   dimdata = 3;
0469   dimtime = 4;
0470   xyzsize = sizefull(data{1},3);
0471 else
0472   dimdata = 1;
0473   dimtime = 2;
0474   xyzsize = size(data{1},1);
0475 end
0476 numvoxels = prod(xyzsize);
0477 
0478 % deal with defaults
0479 if ~isfield(opt,'extraregressors') || isempty(opt.extraregressors)
0480   opt.extraregressors = cell(1,numruns);
0481 end
0482 if ~isfield(opt,'maxpolydeg') || isempty(opt.maxpolydeg)
0483   opt.maxpolydeg = zeros(1,numruns);
0484   for p=1:numruns
0485     opt.maxpolydeg(p) = round(((size(data{p},dimtime)*tr)/60)/2);
0486   end
0487 end
0488 if ~isfield(opt,'seed') || isempty(opt.seed)
0489   opt.seed = sum(100*clock);
0490 end
0491 if ~isfield(opt,'bootgroups') || isempty(opt.bootgroups)
0492   opt.bootgroups = ones(1,numruns);
0493 end
0494 if ~isfield(opt,'numforhrf') || isempty(opt.numforhrf)
0495   opt.numforhrf = 50;
0496 end
0497 if ~isfield(opt,'hrffitmask') || isempty(opt.hrffitmask)
0498   opt.hrffitmask = 1;
0499 end
0500 if ~isfield(opt,'brainthresh') || isempty(opt.brainthresh)
0501   opt.brainthresh = [99 0.5];
0502 end
0503 if ~isfield(opt,'brainR2') || isempty(opt.brainR2)
0504   opt.brainR2 = 0;
0505 end
0506 if ~isfield(opt,'brainexclude') || isempty(opt.brainexclude)
0507   opt.brainexclude = 0;
0508 end
0509 if ~isfield(opt,'numpcstotry') || isempty(opt.numpcstotry)
0510   opt.numpcstotry = 20;
0511 end
0512 if ~isfield(opt,'pcR2cutoff') || isempty(opt.pcR2cutoff)
0513   opt.pcR2cutoff = 0;
0514 end
0515 if ~isfield(opt,'pcR2cutoffmask') || isempty(opt.pcR2cutoffmask)
0516   opt.pcR2cutoffmask = 1;
0517 end
0518 if ~isfield(opt,'pcstop') || isempty(opt.pcstop)
0519   opt.pcstop = 1.05;
0520 end
0521 if ~isfield(opt,'pccontrolmode') || isempty(opt.pccontrolmode)
0522   opt.pccontrolmode = 0;
0523 end
0524 if ~isfield(opt,'numboots') || isempty(opt.numboots)
0525   opt.numboots = 100;
0526 end
0527 if ~isfield(opt,'denoisespec') || (isempty(opt.denoisespec) && ~iscell(opt.denoisespec))
0528   opt.denoisespec = '11101';
0529 end
0530 if ~isfield(opt,'wantpercentbold') || isempty(opt.wantpercentbold)
0531   opt.wantpercentbold = 1;
0532 end
0533 if ~isfield(opt,'hrfthresh') || isempty(opt.hrfthresh)
0534   opt.hrfthresh = 50;
0535 end
0536 if ~isfield(opt,'noisepooldirect') || isempty(opt.noisepooldirect)
0537   opt.noisepooldirect = [];
0538 end
0539 if ~isequal(hrfmodel,'fir')
0540   hrfknobs = normalizemax(hrfknobs);
0541 end
0542 if ~isfield(opt,'wantparametric') || isempty(opt.wantparametric)
0543   opt.wantparametric = 0;
0544 end
0545 if ~isfield(opt,'wantsanityfigures') || isempty(opt.wantsanityfigures)
0546   opt.wantsanityfigures = 1;
0547 end
0548 if length(opt.maxpolydeg) == 1
0549   opt.maxpolydeg = repmat(opt.maxpolydeg,[1 numruns]);
0550 end
0551 if ~iscell(opt.extraregressors)
0552   opt.extraregressors = {opt.extraregressors};
0553 end
0554 if ~iscell(opt.denoisespec)
0555   opt.denoisespec = {opt.denoisespec};
0556 end
0557 
0558 % delete and/or make figuredir
0559 if ~isempty(figuredir)
0560   if exist(figuredir,'dir')
0561     assert(rmdir(figuredir,'s'));
0562   end
0563   assert(mkdir(figuredir));
0564   assert(mkdir(fullfile(figuredir,'PCmap')));
0565   if opt.wantsanityfigures
0566     assert(mkdir(fullfile(figuredir,'CheckData')));
0567   end
0568   figuredir = absolutepath(figuredir);
0569 end
0570 
0571 % whether to bypass a lot of the usual GLMdenoise routine
0572 % (since the noise pool and number of PCs are already supplied by the user)
0573 wantbypass = ~isempty(opt.noisepooldirect);
0574 
0575 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% PRE-FLIGHT SANITY CHECK
0576 
0577 if ~isempty(figuredir) && opt.wantsanityfigures
0578   fprintf('*** GLMdenoisedata: generating sanity-check figures. ***\n');
0579   
0580   % make figure showing mean and std dev of each volume over time
0581   for p=1:length(data)
0582   
0583     % calc
0584     meants = mean(squish(data{p},dimdata),1);
0585     stdts = std(squish(data{p},dimdata),[],1);
0586     dvarts = sqrt(mean(diff(squish(data{p},dimdata),1,2).^2,1));  % RMS of frame-to-frame change
0587 
0588     % make mean+std figure
0589     figureprep([100 100 1000 300]); hold on;
0590     errorbar2(1:length(meants),meants,stdts,'v','r-');
0591     plot(1:length(meants),meants,'r-');
0592     ax = axis; axis([1-10 length(meants)+10 ax(3:4)]);
0593     xlabel('TR');
0594     ylabel('Signal (raw units)');
0595     title(sprintf('Run %d (mean + std dev of each volume)',p));
0596     figurewrite(sprintf('CheckMeanStd_run%02d',p),[],[],fullfile(figuredir,'CheckData'));
0597 
0598     % make dvars figure
0599     figureprep([100 100 1000 300]); hold on;
0600     plot(dvarts,'r-');
0601     ax = axis; axis([1-10 length(dvarts)+10 ax(3:4)]);
0602     xlabel('Volume number');
0603     ylabel('RMS of difference image (raw units)');
0604     title(sprintf('Run %d (DVARS)',p));
0605     figurewrite(sprintf('CheckDVARS_run%02d',p),[],[],fullfile(figuredir,'CheckData'));
0606 
0607   end
0608 
0609 end
0610 
0611 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DETERMINE HRF
0612 
0613 % if 'optimize', perform full-fit to determine HRF
0614 switch hrfmodel
0615 case 'optimize'
0616   fprintf('*** GLMdenoisedata: performing full fit to estimate global HRF. ***\n');
0617   fullfit = GLMestimatemodel(design,data,stimdur,tr,hrfmodel,hrfknobs,0,opt,[],2);
0618   hrf = fullfit.modelmd{1};
0619   hrffitvoxels = fullfit.hrffitvoxels;
0620   clear fullfit;
0621 
0622 % if 'assume', the HRF is provided by the user
0623 case 'assume'
0624   hrf = hrfknobs;
0625   hrffitvoxels = [];
0626 
0627 % if 'fir', do nothing
0628 case 'fir'
0629   hrffitvoxels = [];
0630   
0631 end
0632 
0633 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% CALCULATE CROSS-VALIDATED R^2 VALUES
0634 
0635 if ~wantbypass
0636 
0637   % perform cross-validation to determine R^2 values
0638   fprintf('*** GLMdenoisedata: performing cross-validation to determine R^2 values. ***\n');
0639   switch hrfmodel
0640   case {'optimize' 'assume'}
0641     xvalfit = GLMestimatemodel(design,data,stimdur,tr,'assume',hrf,-1,opt,[],1);
0642   case 'fir'
0643     xvalfit = GLMestimatemodel(design,data,stimdur,tr,'fir',hrfknobs,-1,opt,[],1);
0644   end
0645   pcR2 = xvalfit.R2;
0646   clear xvalfit;
0647 
0648 end
0649 
0650 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% DETERMINE NOISE POOL AND CALCULATE NOISE REGRESSORS
0651 
0652 % calculate mean volume
0653 volcnt = cellfun(@(x) size(x,dimtime),data);
0654 meanvol = reshape(catcell(2,cellfun(@(x) squish(mean(x,dimtime),dimdata),data,'UniformOutput',0)) ...
0655                   * (volcnt' / sum(volcnt)),[xyzsize 1]);
0656 
0657 if ~wantbypass
0658 
0659   % determine noise pool
0660   fprintf('*** GLMdenoisedata: determining noise pool. ***\n');
0661   thresh = prctile(meanvol(:),opt.brainthresh(1))*opt.brainthresh(2);  % threshold for non-brain voxels
0662   bright = meanvol > thresh;                                           % logical indicating voxels that are bright (brain voxels)
0663   badxval = pcR2 < opt.brainR2;                                        % logical indicating voxels with poor cross-validation accuracy
0664   noisepool = bright & badxval & ~opt.brainexclude;                    % logical indicating voxels that satisfy all criteria
0665 
0666 else
0667 
0668   % the noise pool is specified by the user
0669   noisepool = opt.noisepooldirect{1};
0670 
0671 end
0672   
0673 % determine noise regressors
0674 fprintf('*** GLMdenoisedata: calculating noise regressors. ***\n');
0675 pcregressors = {};
0676 for p=1:length(data)
0677 
0678   % extract the time-series data for the noise pool
0679   temp = subscript(squish(data{p},dimdata),{noisepool ':'})';  % time x voxels
0680 
0681   % project out polynomials from the data
0682   temp = projectionmatrix(constructpolynomialmatrix(size(temp,1),0:opt.maxpolydeg(p))) * temp;
0683 
0684   % unit-length normalize each time-series (ignoring any time-series that are all 0)
0685   [temp,len] = unitlengthfast(temp,1);
0686   temp = temp(:,len~=0);
0687 
0688   % perform SVD and select the top PCs
0689   [u,s,v] = svds(double(temp*temp'),opt.numpcstotry);
0690   u = bsxfun(@rdivide,u,std(u,[],1));  % scale so that std is 1
0691   pcregressors{p} = cast(u,dataclass);
0692 
0693 end
0694 clear temp len u s v;
0695 
0696 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% PERTURB THE NOISE REGRESSORS (IF REQUESTED)
0697 
0698 switch opt.pccontrolmode
0699 
0700 % do nothing (this is the default)
0701 case 0
0702 
0703 % phase-scramble each regressor
0704 case 1
0705 
0706   % for each run
0707   for p=1:length(pcregressors)
0708   
0709     % for each regressor
0710     for q=1:size(pcregressors{p},2)
0711     
0712       % the original regressor
0713       temp = pcregressors{p}(:,q);
0714 
0715       % a sample of white noise
0716       temp2 = randn(size(temp));
0717 
0718       % new regressor has the amplitude spectrum of the original regressor,
0719       % but the phase spectrum of the white noise
0720       pcregressors{p}(:,q) = real(ifft(abs(fft(temp,[],1)) .* exp(j * angle(fft(temp2,[],1)))));
0721 
0722     end
0723 
0724   end
0725   clear temp temp2;
0726 
0727 % shuffle regressors across runs (ensuring none match up to the original assignment)
0728 case 2
0729 
0730   % repeat until we have a satisfactory assignment
0731   while 1
0732   
0733     % generate a random permutation
0734     temp = permutedim(1:length(pcregressors));
0735     
0736     % if none matched the original assignment, we are done
0737     if ~any(temp == (1:length(pcregressors)))
0738       break;
0739     end
0740 
0741   end
0742 
0743   % shuffle the regressors
0744   pcregressors = pcregressors(temp);
0745 
0746 end
0747 
0748 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ADD NOISE REGRESSORS INTO MODEL AND CHOOSE OPTIMAL NUMBER
0749 
0750 if ~wantbypass
0751 
0752   % perform cross-validation with increasing number of noise regressors
0753   for p=1:opt.numpcstotry
0754     fprintf('*** GLMdenoisedata: performing cross-validation with %d PCs. ***\n',p);
0755     opt2 = opt;
0756     for q=1:numruns
0757       opt2.extraregressors{q} = cat(2,opt2.extraregressors{q},pcregressors{q}(:,1:p));
0758     end
0759     opt2.wantpercentbold = 0;  % no need to do this, so optimize for speed
0760     switch hrfmodel
0761     case {'optimize' 'assume'}
0762       xvalfit = GLMestimatemodel(design,data,stimdur,tr,'assume',hrf,-1,opt2,[],1);
0763     case 'fir'
0764       xvalfit = GLMestimatemodel(design,data,stimdur,tr,'fir',hrfknobs,-1,opt2,[],1);
0765     end
0766     pcR2 = cat(dimdata+1,pcR2,xvalfit.R2);
0767   end
0768   clear xvalfit;
0769 
0770   % prepare to select optimal number of PCs
0771   temp = squish(pcR2,dimdata);  % voxels x 1+pcs
0772   pcvoxels = any(temp > opt.pcR2cutoff,2) & squish(opt.pcR2cutoffmask,dimdata);  % if pcR2cutoffmask is 1, this still works
0773   if ~any(pcvoxels)
0774     warning(['No voxels passed the threshold for the selection of the number of PCs. ' ...
0775              'We fallback to simply using the top 100 voxels (i.e. compute each voxel''s maximum ' ...
0776              'cross-validation accuracy under any number of PCs and then choose the top 100 voxels.']);
0777     if isequal(opt.pcR2cutoffmask,1)
0778       ix = 1:size(temp,1);
0779     else
0780       ix = find(squish(opt.pcR2cutoffmask,dimdata));
0781     end
0782     pcvoxels = logical(zeros(size(temp,1),1));
0783     temp2 = max(temp(ix,:),[],2);  % max cross-validation for each voxel (within mask)
0784     [d,ix2] = sort(temp2,'descend');
0785     pcvoxels(ix(ix2(1:min(100,length(ix2))))) = 1;
0786   end
0787   xvaltrend = median(temp(pcvoxels,:),1);
0788 
0789   % choose number of PCs
0790   chosen = 0;  % this is the fall-back
0791   if opt.pcstop <= 0
0792     chosen = -opt.pcstop;  % in this case, the user decides
0793   else
0794     curve = xvaltrend - xvaltrend(1);  % this is the performance curve that starts at 0 (corresponding to 0 PCs)
0795     mx = max(curve);                   % store the maximum of the curve
0796     best = -Inf;                       % initialize (this will hold the best performance observed thus far)
0797     for p=0:opt.numpcstotry
0798   
0799       % if better than best so far
0800       if curve(1+p) > best
0801     
0802         % record this number of PCs as the best
0803         chosen = p;
0804         best = curve(1+p);
0805       
0806         % if we are within opt.pcstop of the max, then we stop.
0807         if best*opt.pcstop >= mx
0808           break;
0809         end
0810       
0811       end
0812   
0813     end
0814   end
0815 
0816   % record the number of PCs
0817   pcnum = chosen;
0818   fprintf('*** GLMdenoisedata: selected number of PCs is %d. ***\n',pcnum);
0819 
0820 else
0821 
0822   % the number of PCs is specified by the user
0823   pcnum = opt.noisepooldirect{2};
0824   fprintf('*** GLMdenoisedata: user-specified number of PCs is %d. ***\n',pcnum);
0825   
0826 end
0827 
0828 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% FIT FINAL MODEL AND PREPARE OUTPUT
0829 
0830 % fit final model (NO DENOISING)
0831 opt2 = opt;
0832 opt2.wantpercentbold = 0;  % do not do the conversion yet.  we will do it ourselves below.
0833 fprintf('*** GLMdenoisedata: fitting final model (no denoising, for comparison purposes). ***\n');
0834 switch hrfmodel
0835 case {'optimize' 'assume'}
0836   resultsTEMP = GLMestimatemodel(design,data,stimdur,tr,'assume',hrf,opt.numboots,opt2);
0837 case 'fir'
0838   resultsTEMP = GLMestimatemodel(design,data,stimdur,tr,'fir',hrfknobs,opt.numboots,opt2);
0839 end
0840 
0841 % save some results
0842 signal_nodenoise = resultsTEMP.signal;
0843 noise_nodenoise = resultsTEMP.noise;
0844 clear resultsTEMP;
0845 
0846 % fit final model (WITH DENOISING)
0847 opt2 = opt;
0848 for q=1:numruns
0849   opt2.extraregressors{q} = cat(2,opt2.extraregressors{q},pcregressors{q}(:,1:pcnum));
0850 end
0851 opt2.wantpercentbold = 0;  % do not do the conversion yet.  we will do it ourselves below.
0852 fprintf('*** GLMdenoisedata: fitting final model (with denoising). ***\n');
0853 switch hrfmodel     % note that we will use the rawdesign field from the cache output for the parametric fits
0854 case {'optimize' 'assume'}
0855   [results,cache] = GLMestimatemodel(design,data,stimdur,tr,'assume',hrf,opt.numboots,opt2);
0856 case 'fir'
0857   [results,cache] = GLMestimatemodel(design,data,stimdur,tr,'fir',hrfknobs,opt.numboots,opt2);
0858 end
0859 
0860 % prepare additional outputs
0861 results.hrffitvoxels = hrffitvoxels;  % note that this overrides the existing entry in results
0862 results.meanvol = meanvol;
0863 results.noisepool = noisepool;
0864 results.pcregressors = pcregressors;
0865 if ~wantbypass
0866   results.pcR2 = pcR2;
0867   results.pcvoxels = reshape(pcvoxels,[xyzsize 1]);
0868 end
0869 results.pcnum = pcnum;
0870 clear meanvol noisepool pcregressors pcR2 pcnum;
0871 
0872 % prepare SNR comparison
0873 amp = (results.signal + signal_nodenoise)/2;
0874 results.SNRbefore = amp ./ noise_nodenoise;
0875 results.SNRafter = amp ./ results.noise;
0876 clear amp signal_nodenoise noise_nodenoise;
0877 
0878 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% PARAMETRIC FITS AND ERROR ESTIMATES
0879 
0880 if opt.wantparametric
0881 
0882   fprintf('*** GLMdenoisedata: calculating parametric fits and error estimates. ***\n');
0883 
0884   % construct design matrix
0885   temp = {};
0886   for p=1:numruns
0887     numtime = size(data{p},dimtime);
0888     temp{p} = cat(2,constructpolynomialmatrix(numtime,0:opt.maxpolydeg(p)), ...
0889                     opt.extraregressors{p}, ...
0890                     results.pcregressors{p}(:,1:results.pcnum));
0891   end
0892   results.parametric.designmatrix = [catcell(1,cache.rawdesign) blkdiag(temp{:})];  % time x regressors
0893 
0894   % estimate parameters using OLS
0895   results.parametric.parameters = ...
0896     mtimescell(olsmatrix2(results.parametric.designmatrix), ...
0897                cellfun(@(x) squish(x,dimdata)',data,'UniformOutput',0));  % regressors x voxels
0898 
0899   % compute sum of the squares of the residuals (e.g. sum(resid.^2))
0900   sumsq = sum((results.parametric.designmatrix * results.parametric.parameters - ...
0901                catcell(1,cellfun(@(x) squish(x,dimdata)',data,'UniformOutput',0))).^2,1);  % 1 x voxels
0902 
0903   % estimate noise variance (e.g. sum(resid.^2) / (n - rank(X)))
0904   results.parametric.noisevar = ...  % 1 x voxels
0905     sumsq ./ (size(results.parametric.designmatrix,1) - rank(results.parametric.designmatrix));
0906 
0907   % calculate standard error on parameters (e.g. sqrt(sigma^2 * inv(X'*X)))
0908   good = ~all(results.parametric.designmatrix==0,1);
0909   X = results.parametric.designmatrix(:,good);
0910   results.parametric.parametersse = ...  % regressors x voxels
0911     sqrt(bsxfun(@times,results.parametric.noisevar, ...
0912                 copymatrix(zeros(size(results.parametric.designmatrix,2),1),good,diag(inv(X'*X)))));
0913   
0914   % clean up
0915   clear temp sumsq good X;
0916   
0917   % prepare for output
0918   results.parametric.parameters =   reshape(results.parametric.parameters',  [xyzsize size(results.parametric.parameters,1)]);
0919   results.parametric.noisevar =     reshape(results.parametric.noisevar,     [xyzsize 1]);
0920   results.parametric.parametersse = reshape(results.parametric.parametersse',[xyzsize size(results.parametric.parametersse,1)]);
0921 
0922 end
0923 
0924 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% CALCULATE DENOISED DATA AND PCWEIGHTS
0925 
0926 fprintf('*** GLMdenoisedata: calculating denoised data and PC weights. ***\n');
0927 
0928 % for each run, perform regression to figure out the various contributions
0929 denoiseddata = cell(length(opt.denoisespec),numruns);
0930 for p=1:numel(denoiseddata)
0931   denoiseddata{p} = cast(denoiseddata{p},dataclass);
0932 end
0933 results.pcweights = zeros([numvoxels results.pcnum numruns],dataclass);
0934 for p=1:numruns
0935 
0936   % calc
0937   numtime = size(data{p},dimtime);
0938 
0939   % calculate signal contribution
0940   modelcomponent = GLMpredictresponses(results.modelmd,{design{p}},tr,numtime,dimdata);
0941   modelcomponent = modelcomponent{1};  % X x Y x Z x T
0942 
0943   % prepare polynomial regressors
0944   polymatrix = constructpolynomialmatrix(numtime,0:opt.maxpolydeg(p));
0945   numpoly = size(polymatrix,2);
0946 
0947   % prepare other regressors
0948   exmatrix = opt.extraregressors{p};
0949   numex = size(exmatrix,2);
0950 
0951   % prepare noise regressors
0952   pcmatrix = results.pcregressors{p}(:,1:results.pcnum);
0953   numpc = size(pcmatrix,2);
0954 
0955   % estimate weights
0956   h = olsmatrix2(cat(2,polymatrix,exmatrix,pcmatrix))*squish(data{p} - modelcomponent,dimdata)';  % parameters x voxels
0957 
0958   % record weights on noise regressors
0959   results.pcweights(:,:,p) = h(numpoly+numex+(1:numpc),:)';
0960   
0961   % construct time-series
0962   polycomponent = reshape((polymatrix*h(1:numpoly,:))',[xyzsize numtime]);
0963   if numex == 0
0964     excomponent = zeros([xyzsize numtime],dataclass);
0965   else
0966     excomponent = reshape((exmatrix*h(numpoly+(1:numex),:))',[xyzsize numtime]);
0967   end
0968   if numpc == 0
0969     pccomponent = zeros([xyzsize numtime],dataclass);
0970   else
0971     pccomponent = reshape((pcmatrix*h(numpoly+numex+(1:numpc),:))',[xyzsize numtime]);
0972   end
0973   residcomponent = data{p} - (modelcomponent + polycomponent + excomponent + pccomponent);
0974   
0975   % construct denoised data (but don't bother if the user doesn't want it)
0976   if nargout > 1
0977     for q=1:length(opt.denoisespec)
0978       denoiseddata{q,p} = bitget(bin2dec(opt.denoisespec{q}),5) * modelcomponent + ...
0979                          bitget(bin2dec(opt.denoisespec{q}),4) * polycomponent + ...
0980                          bitget(bin2dec(opt.denoisespec{q}),3) * excomponent + ...
0981                          bitget(bin2dec(opt.denoisespec{q}),2) * pccomponent + ...
0982                          bitget(bin2dec(opt.denoisespec{q}),1) * residcomponent;
0983     end
0984   end
0985 
0986 end
0987 
0988 % clean up
0989 clear modelcomponent h polycomponent excomponent pccomponent residcomponent;
0990 
0991 % prepare for output
0992 results.pcweights = reshape(results.pcweights,[xyzsize results.pcnum numruns]);
0993 
0994 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% PREPARE ADDITIONAL OUTPUTS
0995 
0996 % return all the inputs (except for the data) in the output.
0997 % also, include a new field 'datasize'.
0998 results.inputs.design = design;
0999 results.inputs.datasize = cellfun(@(x) size(x),data,'UniformOutput',0);
1000 results.inputs.stimdur = stimdur;
1001 results.inputs.tr = tr;
1002 results.inputs.hrfmodel = hrfmodel;
1003 results.inputs.hrfknobs = hrfknobs;
1004 results.inputs.opt = opt;
1005 results.inputs.figuredir = figuredir;
1006 
1007 if ~wantbypass
1008   % prepare pcR2final
1009   results.pcR2final = subscript(results.pcR2,[repmat({':'},[1 dimdata]) {1+results.pcnum}]);
1010 end
1011 
1012 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% CONVERT TO % BOLD CHANGE
1013 
1014 if opt.wantpercentbold
1015   fprintf('*** GLMdenoisedata: converting to percent BOLD change. ***\n');
1016   con = 1./abs(results.meanvol) * 100;
1017   switch hrfmodel
1018   case 'fir'
1019     for p=1:size(results.models,dimdata+3)  % ugly to save MEMORY
1020       if dimdata==3
1021         results.models(:,:,:,:,:,p) = bsxfun(@times,results.models(:,:,:,:,:,p),con);
1022       else
1023         results.models(:,:,:,p) = bsxfun(@times,results.models(:,:,:,p),con);
1024       end
1025     end
1026     results.modelmd = bsxfun(@times,results.modelmd,con);
1027     results.modelse = bsxfun(@times,results.modelse,con);
1028   case {'assume' 'optimize'}
1029     for p=1:size(results.models{2},dimdata+2)  % ugly to save MEMORY
1030       if dimdata==3
1031         results.models{2}(:,:,:,:,p) = bsxfun(@times,results.models{2}(:,:,:,:,p),con);
1032       else
1033         results.models{2}(:,:,p) = bsxfun(@times,results.models{2}(:,:,p),con);
1034       end
1035     end
1036     results.modelmd{2} = bsxfun(@times,results.modelmd{2},con);
1037     results.modelse{2} = bsxfun(@times,results.modelse{2},con);
1038   end
1039 end
1040 
1041 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% GENERATE FIGURES
1042 
1043 if ~isempty(figuredir)
1044   fprintf('*** GLMdenoisedata: generating figures. ***\n');
1045 
1046   % make figure showing HRF
1047   if ~isequal(hrfmodel,'fir') && length(hrfknobs) > 1
1048     figureprep([100 100 450 250]); hold on;
1049     numinhrf = length(hrfknobs);
1050     h1 = plot(0:tr:(numinhrf-1)*tr,hrfknobs,'ro-');
1051     h2 = plot(0:tr:(numinhrf-1)*tr,results.modelmd{1},'bo-');
1052     ax = axis; axis([0 (numinhrf-1)*tr ax(3) 1.2]);
1053     straightline(0,'h','k-');
1054     legend([h1 h2],{'Initial HRF' 'Estimated HRF'});
1055     xlabel('Time from condition onset (s)');
1056     ylabel('Response');
1057     figurewrite('HRF',[],[],figuredir);
1058   end
1059   
1060   % write out image showing HRF fit voxels
1061   if isequal(hrfmodel,'optimize') && ~isempty(results.hrffitvoxels)
1062     imwrite(uint8(255*makeimagestack(results.hrffitvoxels,[0 1])),gray(256),fullfile(figuredir,'HRFfitvoxels.png'));
1063   end
1064 
1065   if ~wantbypass
1066 
1067     % make figure illustrating selection of number of PCs
1068     figureprep([100 100 400 400]); hold on;
1069     plot(0:opt.numpcstotry,xvaltrend,'r.-');
1070     set(scatter(results.pcnum,xvaltrend(1+results.pcnum),100,'ro'),'LineWidth',2);
1071     set(gca,'XTick',0:opt.numpcstotry);
1072     xlabel('Number of PCs');
1073     ylabel('Cross-validated R^2 (median across voxels)');
1074     title(sprintf('Selected PC number = %d',results.pcnum));
1075     figurewrite('PCselection',[],[],figuredir);
1076   
1077     % make figure showing scatter plots of cross-validated R^2
1078     rng = [min(results.pcR2(:)) max(results.pcR2(:))];
1079     for p=1:opt.numpcstotry
1080       temp = squish(results.pcR2,dimdata);  % voxels x 1+pcs
1081       figureprep([100 100 500 500]); hold on;
1082       scattersparse(temp(:,1),temp(:,1+p),20000,0,36,'g','.');
1083       scattersparse(temp(pcvoxels,1),temp(pcvoxels,1+p),20000,0,36,'r','.');
1084       axis([rng rng]); axissquarify; axis([rng rng]); 
1085       straightline(0,'h','y-');
1086       straightline(0,'v','y-');
1087       xlabel('Cross-validated R^2 (0 PCs)');
1088       ylabel(sprintf('Cross-validated R^2 (%d PCs)',p));
1089       title(sprintf('Number of PCs = %d',p));
1090       figurewrite(sprintf('PCscatter%02d',p),[],[],figuredir);
1091     end
1092 
1093   end
1094 
1095   % write out image showing mean volume (of first run)
1096   imwrite(uint8(255*makeimagestack(results.meanvol,1)),gray(256),fullfile(figuredir,'MeanVolume.png'));
1097 
1098   % write out image showing noise pool
1099   imwrite(uint8(255*makeimagestack(results.noisepool,[0 1])),gray(256),fullfile(figuredir,'NoisePool.png'));
1100 
1101   % write out image showing voxels excluded from noise pool
1102   if ~isequal(opt.brainexclude,0)
1103     imwrite(uint8(255*makeimagestack(opt.brainexclude,[0 1])),gray(256),fullfile(figuredir,'NoiseExclude.png'));
1104   end
1105 
1106   % write out image showing voxel mask for HRF fitting
1107   if isequal(hrfmodel,'optimize') && ~isequal(opt.hrffitmask,1)
1108     imwrite(uint8(255*makeimagestack(opt.hrffitmask,[0 1])),gray(256),fullfile(figuredir,'HRFfitmask.png'));
1109   end
1110 
1111   % define a function that will write out R^2 values to an image file
1112   imfun = @(results,filename) ...
1113     imwrite(uint8(255*makeimagestack(signedarraypower(results/100,0.5),[0 1])),hot(256),filename);
1114 
1115   if ~wantbypass
1116 
1117     % write out image showing voxel mask for PC selection
1118     if ~isequal(opt.pcR2cutoffmask,1)
1119       imwrite(uint8(255*makeimagestack(opt.pcR2cutoffmask,[0 1])),gray(256),fullfile(figuredir,'PCmask.png'));
1120     end
1121   
1122     % write out image showing the actual voxels used for PC selection
1123     imwrite(uint8(255*makeimagestack(results.pcvoxels,[0 1])),gray(256),fullfile(figuredir,'PCvoxels.png'));
1124 
1125     % figure out bounds for the R^2 values
1126     bounds = prctile(results.pcR2(:),[1 99]);
1127     if bounds(1)==bounds(2)  % a hack to avoid errors in normalization
1128       bounds(2) = bounds(1) + 1;
1129     end
1130 
1131     % define another R^2 image-writing function
1132     imfunB = @(results,filename) ...
1133       imwrite(uint8(255*makeimagestack(signedarraypower(normalizerange(results,0,1,bounds(1),bounds(2)),0.5),[0 1])),hot(256),filename);
1134 
1135     % write out cross-validated R^2 for the various numbers of PCs
1136     for p=1:size(results.pcR2,dimdata+1)
1137       temp = subscript(results.pcR2,[repmat({':'},[1 dimdata]) {p}]);
1138       feval(imfun,temp,fullfile(figuredir,sprintf('PCcrossvalidation%02d.png',p-1)));
1139       feval(imfunB,temp,fullfile(figuredir,sprintf('PCcrossvalidationscaled%02d.png',p-1)));
1140     end
1141 
1142   end
1143 
1144   % write out overall R^2 for final model
1145   feval(imfun,results.R2,fullfile(figuredir,sprintf('FinalModel.png')));
1146 
1147   % write out R^2 separated by runs for final model
1148   for p=1:size(results.R2run,dimdata+1)
1149     temp = subscript(results.R2run,[repmat({':'},[1 dimdata]) {p}]);
1150     feval(imfun,temp,fullfile(figuredir,sprintf('FinalModel_run%02d.png',p)));
1151   end
1152   
1153   % write out signal, noise, and SNR
1154   imwrite(uint8(255*makeimagestack(results.signal,[0 prctile(results.signal(:),99)])),hot(256),fullfile(figuredir,'SNRsignal.png'));
1155   if opt.numboots ~= 0
1156     imwrite(uint8(255*makeimagestack(results.noise,[0 max(eps,prctile(results.noise(:),99))])),hot(256),fullfile(figuredir,'SNRnoise.png'));
1157     imwrite(uint8(255*makeimagestack(results.SNR,[0 10])),hot(256),fullfile(figuredir,'SNR.png'));
1158   end
1159   
1160   % write out SNR comparison figures (first figure)
1161   if opt.numboots ~= 0
1162     rng = [min([results.SNRbefore(:); results.SNRafter(:)]) max([results.SNRbefore(:); results.SNRafter(:)])];
1163     if ~all(isfinite(rng))  % hack to deal with cases of no noise estimate
1164       rng = [0 1];
1165     end
1166     figureprep([100 100 500 500]); hold on;
1167     scattersparse(results.SNRbefore(:),results.SNRafter(:),20000,0,36,'g','.');
1168     if ~wantbypass
1169       scattersparse(results.SNRbefore(pcvoxels),results.SNRafter(pcvoxels),20000,0,36,'r','.');
1170     end
1171     axis([rng rng]); axissquarify; axis([rng rng]);
1172     xlabel('SNR (before denoising)');
1173     ylabel('SNR (after denoising)');
1174     figurewrite(sprintf('SNRcomparebeforeandafter'),[],[],figuredir);
1175   end
1176   
1177   % write out SNR comparison figures (second figure)
1178   if opt.numboots ~= 0
1179     datagain = ((results.SNRafter./results.SNRbefore).^2 - 1) * 100;
1180     figureprep([100 100 500 500]); hold on;
1181     scattersparse(results.SNRbefore(:),datagain(:),20000,0,36,'g','.');
1182     if ~wantbypass
1183       scattersparse(results.SNRbefore(pcvoxels),datagain(pcvoxels),20000,0,36,'r','.');
1184     end
1185     ax = axis; axis([rng ax(3:4)]);
1186     xlabel('SNR (before denoising)');
1187     ylabel('Equivalent amount of data gained (%)');
1188     figurewrite(sprintf('SNRamountofdatagained'),[],[],figuredir);
1189   end
1190   
1191   % write out maps of pc weights
1192   thresh = prctile(abs(results.pcweights(:)),99);
1193   for p=1:size(results.pcweights,dimdata+1)
1194     for q=1:size(results.pcweights,dimdata+2)
1195       temp = subscript(results.pcweights,[repmat({':'},[1 dimdata]) {p} {q}]);
1196       imwrite(uint8(255*makeimagestack(temp,[-thresh thresh])),cmapsign(256), ...
1197               fullfile(figuredir,'PCmap',sprintf('PCmap_run%02d_num%02d.png',q,p)));
1198     end
1199   end
1200 
1201 end

GLMdenoisedata

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE