
fitnonlinearmodel

PURPOSE

function results = fitnonlinearmodel(opt,chunksize,chunknum)

SYNOPSIS

function results = fitnonlinearmodel(opt,chunksize,chunknum)

DESCRIPTION

 function results = fitnonlinearmodel(opt,chunksize,chunknum)

 <opt> is a struct with the following fields (or a .mat file with 'opt'):

   *** OUTPUT DIRECTORY ***
   <outputdir> (optional) is the directory to save results to

   *** STIMULUS ***
   <stimulus> is:
     (1) a matrix with time points x components
     (2) a cell vector of (1) indicating different runs
     (3) a function that returns (1) or (2)

   *** DATA ***
   <data> is:
     (1) a matrix with time points x voxels
     (2) a cell vector of (1) indicating different runs
     (3) a function that returns (1) or (2)
     (4) a function that accepts a vector of voxel indices and returns (1) or (2)
         corresponding to those voxels.  In this case, <vxs> must be supplied.
   <vxs> (optional) is:
     (1) a vector of all voxel indices that you wish to analyze.  (If you use
         the chunking mechanism (<chunksize>, <chunknum>), then a subset of these
         voxels are analyzed in any given function call.)  Note that we automatically
         sort the voxel indices and ensure uniqueness.
     (2) a .mat file with 'vxs' as (1)
      This input matters only if <data> is of case (4).

   *** MODEL ***
   <model> is:
     {X Y Z W} where
       X is the initial seed (1 x P).
       Y are the bounds (2 x P).  NaNs in the first row indicate parameters to fix.
       Z is a function that accepts two arguments, parameters (1 x P) and 
         stimuli (N x C), and outputs predicted responses (N x 1).
       W (optional) is a function that transforms stimuli into a new form prior 
         to model evaluation.
    OR
     {M1 M2 M3 ...} where M1 is of the form {X Y Z W} described above,
       and the remaining Mi are of the form {F G H I} where
       F is a function that takes fitted parameters (1 x P) from the previous model
         and outputs an initial seed (1 x Pnew) for the current model
       G are the bounds (2 x Pnew).  NaNs in the first row indicate parameters to fix.
       H is a function that takes fitted parameters (1 x P) from the previous model
         and outputs a function that accepts two arguments, parameters (1 x Pnew) and
         stimuli (N x C), and outputs predicted responses (N x 1).
       I (optional) is a function that takes fitted parameters (1 x P) from the 
         previous model and outputs a function that transforms stimuli into a 
         new form prior to model evaluation.
    OR
     M where M is a function that takes stimuli (N x C) and responses (N x 1) and
       outputs an estimate of the linear weights (1 x C).  For example, simple
       OLS regression is the case where M is @(X,y) (inv(X'*X)*X'*y)'.
       This case is referred to as the linear-model case.
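
   As a concrete illustration of the stepwise form, here is a sketch of a
   two-stage fit.  The model functions themselves are hypothetical (not part
   of fitnonlinearmodel); stage 2 appends an additive offset to the stage-1 model:

```matlab
% Hypothetical two-stage fit.  Stage 1 is {X Y Z} (2 free parameters);
% stage 2 is {F G H}, which seeds from the stage-1 fit and adds a third
% parameter (an additive offset).
m1 = {[1 1] [-Inf -Inf; Inf Inf] @(pp,dd) dd*pp'};
m2 = {@(ss) [ss 0] ...                        % F: stage-1 fit -> new seed
      [-Inf -Inf -Inf; Inf Inf Inf] ...       % G: bounds (2 x Pnew)
      @(ss) @(pp,dd) dd*pp(1:2)' + pp(3)};    % H: stage-1 fit -> new model
opt.model = {m1 m2};
```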

   *** SEED ***
   <seed> (optional) is:
     (1) the initial seed (1 x P)
     (2) several initial seeds to try (Q x P) in order to find the one that
         produces the least error
     (3) a function that accepts a single voxel index and returns (1) or (2).
         in this case, <vxs> must be supplied.
     If supplied, <seed> overrides the contents of X in <model>.
     In the linear-model case, <seed> is not applicable and should be [].
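
   For example, supplying several seeds (Q x P) can help guard against local
   minima; a sketch for a 2-parameter model:

```matlab
% Three candidate seeds for a 2-parameter model (Q x P = 3 x 2).
% The seed yielding the lowest residual norm is kept (see 'resnorms').
opt.seed = [ 1    1  ;
             0.1  10 ;
             10   0.1 ];
```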

   *** OPTIMIZATION OPTIONS ***
   <optimoptions> (optional) are optimization options in the form used by optimset.m.
      Can also be a cell vector with option/value pairs, in which case these are
      applied after the default optimization options.  The default options are:
       optimset('Display','iter','FunValCheck','on', ...
                'MaxFunEvals',Inf,'MaxIter',Inf, ...
                'TolFun',1e-6,'TolX',1e-6, ...
                'OutputFcn',@(a,b,c) outputfcnsanitycheck(a,b,c,1e-6,10))
      In particular, it may be useful to specify which optimization algorithm to use.
     In the linear-model case, <optimoptions> is ignored.
   <outputfcn> (optional) is a function suitable for use as an 'OutputFcn'.  If you
     supply <outputfcn>, it will take precedence over any 'OutputFcn' in <optimoptions>.
     The reason for <outputfcn> is that the data points being fitted will be passed as a
     fourth argument to <outputfcn> (if <outputfcn> accepts four arguments).  This 
     enables some useful functionality such as being able to visualize the model and
     the data during the optimization.
     In the linear-model case, <outputfcn> is ignored.

   *** RESAMPLING SCHEMES ***
   <wantresampleruns> (optional) is whether to resample at the level of runs (as opposed
     to the level of individual data points).  If only one run of data is supplied, the
     default is 0 (resample data points).  If more than one run of data is supplied, the
     default is 1 (resample runs).
   <resampling> (optional) is:
     0 means fit fully (no bootstrapping nor cross-validation)
     B or {B SEED GROUP} indicates to perform B bootstraps, using SEED as the random
       number seed, and GROUP as the grouping to use.  GROUP should be a vector of
       positive integers.  For example, [1 1 1 2 2 2] means to draw six bootstrap
       samples in total, with three bootstrap samples from the first three cases and
       three bootstrap samples from the second three cases.  If SEED is not provided,
       the default is sum(100*clock).  If GROUP is not provided, the default is ones(1,D)
       where D is the total number of runs or data points.
     V where V is a matrix of dimensions (cross-validation schemes) x (runs or data
       points).  Each row indicates a distinct cross-validation scheme, where 1 indicates
       training, -1 indicates testing, and 0 indicates to not use.  For example, 
       [1 1 -1 -1 0] specifies a scheme where the first two runs (or data points) are 
       used for training and the second two runs (or data points) are used for testing.
     Default: 0.
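
   To make the two resampling forms concrete, here are illustrative examples of
   a grouped bootstrap specification and a two-fold cross-validation matrix
   (the run counts are arbitrary):

```matlab
% 50 bootstraps with fixed random seed 0, resampling within two groups
% of three runs each:
opt.resampling = {50 0 [1 1 1 2 2 2]};

% Two-fold cross-validation over four runs (each row is one scheme;
% 1 = train, -1 = test, 0 = unused):
opt.resampling = [ 1  1 -1 -1 ;
                  -1 -1  1  1 ];
```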

   *** METRIC ***
   <metric> (optional) determines how model performance is quantified.  <metric> should
     be a function that accepts two column vectors (the first is the model; the second 
     is the data) and outputs a number.  Default: @calccorrelation.
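
   For instance, to score fits with the coefficient of determination (R^2)
   instead of correlation, a metric along these lines could be supplied:

```matlab
% R^2 between model (first input) and data (second input), both column vectors.
opt.metric = @(modelfit,data) 1 - sum((data-modelfit).^2) / ...
                                  sum((data-mean(data)).^2);
```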

   *** ADDITIONAL REGRESSORS ***
   <maxpolydeg> (optional) is a non-negative integer with the maximum polynomial degree
     to use for polynomial nuisance functions.  The polynomial nuisance functions are
     constructed on a per-run basis.  <maxpolydeg> can be a vector to indicate different
     degrees for different runs.  A special case is NaN which means to omit polynomials.
     Default: NaN.
   <wantremovepoly> (optional) is whether to project the polynomials out from both the
     model and the data before computing <metric>.  Default: 1.
   <extraregressors> (optional) is:
     (1) a matrix with time points x regressors
     (2) a cell vector of (1) indicating different runs
     (3) a function that returns (1) or (2)
     Note that a separate set of regressors must be supplied for each run.  The number
     of regressors does not have to be the same across runs.
   <wantremoveextra> (optional) is whether to project the extraregressors out from
     both the model and the data before computing <metric>.  Default: 1.
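
   A sketch of combining these options (the motion-parameter matrices here are
   hypothetical placeholders, one time points x regressors matrix per run):

```matlab
% Quadratic polynomial trends per run, plus hypothetical per-run motion
% regressors; both are projected out before <metric> is computed.
opt.maxpolydeg = 2;
opt.extraregressors = {motionrun1 motionrun2};  % one matrix per run
opt.wantremovepoly = 1;
opt.wantremoveextra = 1;
```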

   *** OUTPUT-RELATED ***
   <dontsave> (optional) is a string or a cell vector of strings indicating outputs
     to omit when returning.  For example, you may want to omit 'testdata', 'modelpred',
     'modelfit', 'numiters', and 'resnorms' since they may use a lot of memory.  
     If [] or not supplied, then we use the default of {'modelfit' 'numiters' 'resnorms'}.
     If {}, then we will return all outputs.  Note: <dontsave> can also refer to 
     auxiliary variables that are saved to the .mat files when <outputdir> is used.
   <dosave> (optional) is just like <dontsave> except that the outputs specified here
     are guaranteed to be returned.  (<dosave> takes precedence over <dontsave>.)
     Default is {}.

 <chunksize> (optional) is the number of voxels to process in a single function call.
   The default is to process all voxels.
 <chunknum> (optional) is the chunk number to process.  Default: 1.
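
 For example, a large analysis might be split into chunks (assuming
 opt.outputdir is set so that each call writes its own .mat file):

```matlab
% Analyze 50,000 voxels in five chunks of 10,000.  Each call is
% independent, so the chunks could also be farmed out to a cluster.
for p=1:5
  fitnonlinearmodel(opt,10000,p);
end
```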

 This function, fitnonlinearmodel.m, is essentially a wrapper around MATLAB's
 lsqcurvefit.m function for the purposes of fitting nonlinear (and linear) models
 to data.

 This function provides the following key benefits:
 - Deals with input and output issues (making it easy to process many individual
   voxels and evaluate different models)
 - Deals with resampling (cross-validation and bootstrapping)
 - In the case of nonlinear models, makes it easy to evaluate multiple initial 
   seeds (to avoid local minima)
 - In the case of nonlinear models, makes it easy to perform stepwise fitting of models

 Outputs:
 - 'params' is resampling cases x parameters x voxels.
     These are the estimated parameters from each resampling scheme for each voxel.
 - 'trainperformance' is resampling cases x voxels.
     This is the performance of the model on the training data under each resampling
     scheme for each voxel.
 - 'testperformance' is resampling cases x voxels.
     This is the performance of the model on the testing data under each resampling
     scheme for each voxel.
 - 'aggregatedtestperformance' is 1 x voxels.
     This is the performance of the model on the testing data, after aggregating
     the data and model predictions across the resampling schemes.
 - 'testdata' is time points x voxels.
     This is the aggregated testing data across the resampling schemes.
 - 'modelpred' is time points x voxels.
     This is the aggregated model predictions across the resampling schemes.
 - 'modelfit' is resampling cases x time points x voxels.
     This is the model fit for each resampling scheme.  Here, by "model fit"
     we mean the fit for each of the original stimuli based on the parameters
     estimated in a given resampling case; we do not mean the fit for each of the 
     stimuli involved in the fitting.  (For example, if there are 100 stimuli and 
     we are performing cross-validation, there will nevertheless be 100 time points
     in 'modelfit'.)  Also, note that 'modelfit' is the raw fit; it is not adjusted
     for <wantremovepoly> and <wantremoveextra>.
 - 'numiters' is a cell vector of dimensions 1 x voxels.  Each element is
     resampling cases x seeds x models.  These are the numbers of iterations
     used in the optimizations.  Note that 'numiters' is [] in the linear-model case.
 - 'resnorms' is a cell vector of dimensions 1 x voxels.  Each element is
     resampling cases x seeds.  These are the residual norms obtained
     in the optimizations.  This is useful for diagnosing multiple-seed issues.
     Note that 'resnorms' is [] in the linear-model case.

 Notes:
 - Since we use %06d.mat to name output files, you should use no more than 999,999 chunks.
 - <chunksize> and <chunknum> can be strings (if so, they will be passed to str2double).
 - <stimulus> can actually have multiple frames in the third dimension.  This is handled
   by making it such that the prediction for a given data point is calculated as the
   average of the predicted responses for the individual stimulus frames associated with
   that data point.
 - In the case of nonlinear models, to control the scale of the computations, in the 
   optimization call we divide the data by its standard deviation and apply the exact 
   same scaling to the model.  This has the effect of controlling the scale of the 
   residuals.  This last-minute scaling should have no effect on the final parameter estimates.

 History:
 - 2014/05/01 - change the main loop to parfor; some cosmetic tweaks;
                now, if no parameters are to be optimized, just return the initial seed
 - 2013/10/02 - implement the linear-model case
 - 2013/09/07 - fix bug (if polynomials or extra regressors were used in multiple runs,
                then they were not getting fit properly).
 - 2013/09/07 - in fitnonlinearmodel_helper.m, convert to double in the call to lsqcurvefit;
                and perform a speed-up (don't compute modelfit if unwanted)
 - 2013/09/04 - add totalnumvxs variable
 - 2013/09/03 - allow <dontsave> to refer to the auxiliary variables
 - 2013/09/02 - add 'modelfit' and adjust default for 'dontsave'; add 'dosave'
 - 2013/08/28 - new outputs 'resnorms' and 'numiters'; last-minute data scaling; 
                tweak default handling of 'dontsave'
 - 2013/08/18 - Initial version.

 Example 1:
 
 % first, a simple example
 x = randn(100,1);
 y = 2*x + 3 + randn(100,1);
 opt = struct( ...
   'stimulus',[x ones(100,1)], ...
   'data',y, ...
   'model',{{[1 1] [-Inf -Inf; Inf Inf] @(pp,dd) dd*pp'}});
 results = fitnonlinearmodel(opt);
 
 % now, try 100 bootstraps
 opt.resampling = 100;
 opt.optimoptions = {'Display' 'off'};  % turn off reporting
 results = fitnonlinearmodel(opt);
 
 % now, try leave-one-out cross-validation
 opt.resampling = -(2*(eye(100) - 0.5));
 results = fitnonlinearmodel(opt);
 
 Example 2:
 
 % try a more complicated example.  we use 'outputfcn' to 
 % visualize the data and model during the optimization.
 x = (1:.1:10)';
 y = evalgaussian1d([5 1 4 0],x);
 y = y + randn(size(y));
 opt = struct( ...
   'stimulus',x, ...
   'data',y, ...
   'model',{{[1 2 1 0] [repmat(-Inf,[1 4]); repmat(Inf,[1 4])] ...
             @(pp,dd) evalgaussian1d(pp,dd)}}, ...
   'outputfcn',@(a,b,c,d) pause2(.1) | outputfcnsanitycheck(a,b,c,1e-6,10) | outputfcnplot(a,b,c,1,d));
 results = fitnonlinearmodel(opt);

 Example 3:

 % same as the first example in Example 1, but now we use the
 % linear-model functionality
 x = randn(100,1);
 y = 2*x + 3 + randn(100,1);
 opt = struct( ...
   'stimulus',[x ones(100,1)], ...
   'data',y, ...
   'model',@(X,y) (inv(X'*X)*X'*y)');
 results = fitnonlinearmodel(opt);

SOURCE CODE

0001 function results = fitnonlinearmodel(opt,chunksize,chunknum)
0002 
0261 
0262 % internal notes:
0263 % - replaces fitprf.m, fitprfstatic.m, fitprfmulti.m, and fitprfstaticmulti.m
0264 % - some of the new features: opt struct format, fix projection matrix bug (must
0265 %   compute projection matrix based on concatenated regressors), multiple initial
0266 %   seeds are handled internally!, user must deal with model specifics like
0267 %   the HRF and positive rectification, massive clean up of the logic (e.g.
0268 %   runs and data points are treated as a single case), consolidation of
0269 %   the different functions, drop support for data trials (not worth the
0270 %   implementation cost), drop support for NaN stimulus frames, hide the
0271 %   myriad optimization options from the input level, drop run-separated metrics,
0272 %   drop the stimulus transformation speed-up (it was getting implemented in a
0273 %   non-general way)
0274 % - regularization is its own thing? own code module?
0275 
0276 %%%%%%%%%%%%%%%%%%%%%%%%%%%% REPORT
0277 
0278 fprintf('*** fitnonlinearmodel: started at %s. ***\n',datestr(now));
0279 stime = clock;  % start time
0280 
0281 %%%%%%%%%%%%%%%%%%%%%%%%%%%% SETUP
0282 
0283 % deal with opt
0284 if ischar(opt)
0285   opt = loadmulti(opt,'opt');
0286 end
0287 
0288 % is <data> of case (4)?
0289 isvxscase = isa(opt.data,'function_handle') && nargin(opt.data) > 0;
0290 
0291 % deal with outputdir
0292 if ~isfield(opt,'outputdir') || isempty(opt.outputdir)
0293   opt.outputdir = [];
0294 end
0295 wantsave = ~isempty(opt.outputdir);  % should we save results to disk?
0296 
0297 % deal with vxs
0298 if isfield(opt,'vxs')
0299   if ischar(opt.vxs)
0300     vxsfull = loadmulti(opt.vxs,'vxs');
0301   else
0302     vxsfull = opt.vxs;
0303   end
0304   vxsfull = sort(union([],flatten(vxsfull)));
0305   totalnumvxs = length(vxsfull);
0306 end
0307 
0308 % deal with chunksize and chunknum
0309 if ~exist('chunksize','var') || isempty(chunksize)
0310   chunksize = [];  % deal with this later
0311 end
0312 if ~exist('chunknum','var') || isempty(chunknum)
0313   chunknum = 1;
0314 end
0315 if ischar(chunksize)
0316   chunksize = str2double(chunksize);
0317 end
0318 if ischar(chunknum)
0319   chunknum = str2double(chunknum);
0320 end
0321 
0322 % deal with data (including load the data)
0323 if isa(opt.data,'function_handle')
0324   fprintf('*** fitnonlinearmodel: loading data. ***\n');
0325   if nargin(opt.data) == 0
0326     data = feval(opt.data);
0327     if iscell(data)
0328       totalnumvxs = size(data{1},2);
0329     else
0330       totalnumvxs = size(data,2);
0331     end
0332   else  % note that in this case, vxs should have been specified,
0333         % so totalnumvxs should have already been calculated above.
0334     if isempty(chunksize)
0335       chunksize = length(vxsfull);
0336     end
0337     vxs = chunking(vxsfull,chunksize,chunknum);
0338     data = feval(opt.data,vxs);
0339   end
0340 else
0341   data = opt.data;
0342   if iscell(data)
0343     totalnumvxs = size(data{1},2);
0344   else
0345     totalnumvxs = size(data,2);
0346   end
0347 end
0348 if ~iscell(data)
0349   data = {data};
0350 end
0351 
0352 % deal with chunksize
0353 if isempty(chunksize)
0354   chunksize = totalnumvxs;  % default is all voxels
0355 end
0356 
0357 % if not isvxscase, then we may still need to do chunking
0358 if ~isvxscase
0359   vxs = chunking(1:totalnumvxs,chunksize,chunknum);
0360   if ~isequal(vxs,1:totalnumvxs)
0361     for p=1:length(data)
0362       data{p} = data{p}(:,vxs);
0363     end
0364   end
0365 end
0366 
0367 % calculate the number of voxels to analyze in this function call
0368 vnum = length(vxs);
0369 
0370 % finally, since we have dealt with chunksize and chunknum, we can do some reporting
0371 fprintf('*** fitnonlinearmodel: outputdir = %s, chunksize = %d, chunknum = %d\n', ...
0372   opt.outputdir,chunksize,chunknum);
0373 
0374 % deal with model
0375 if ~isa(opt.model,'function_handle') && ~iscell(opt.model{1})
0376   opt.model = {opt.model};
0377 end
0378 if ~isa(opt.model,'function_handle')
0379   for p=1:length(opt.model)
0380     if length(opt.model{p}) < 4 || isempty(opt.model{p}{4})
0381       if p==1
0382         opt.model{p}{4} = @identity;
0383       else
0384         opt.model{p}{4} = @(ss) @identity;
0385       end
0386     end
0387   end
0388 end
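A Python sketch (illustrative only, hypothetical names) of why the two defaults above differ: the first model's transform is applied directly to the stimulus, while for subsequent models the fourth element is a function of the previous model's fit that returns a transform, so its default must be a function that returns the identity.

```python
def identity(x):
    return x

# default for the first model: a plain transform applied to the stimulus
default_first = identity

# default for subsequent models: given the previous fit, return a transform
def default_later(prev_params):
    return identity

print(default_first([1, 2, 3]))           # [1, 2, 3]
print(default_later((0.5, 2.0))([1, 2]))  # [1, 2]
```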
0389 
0390 % deal with seed
0391 if ~isfield(opt,'seed') || isempty(opt.seed)
0392   opt.seed = [];
0393 end
0394 
0395 % deal with optimization options
0396 if ~isfield(opt,'optimoptions') || isempty(opt.optimoptions)
0397   opt.optimoptions = {};
0398 end
0399 if iscell(opt.optimoptions)
0400   temp = optimset('Display','iter','FunValCheck','on', ...
0401                   'MaxFunEvals',Inf,'MaxIter',Inf, ...
0402                   'TolFun',1e-6,'TolX',1e-6, ...
0403                   'OutputFcn',@(a,b,c) outputfcnsanitycheck(a,b,c,1e-6,10));
0404   for p=1:length(opt.optimoptions)/2
0405     temp.(opt.optimoptions{(p-1)*2+1}) = opt.optimoptions{(p-1)*2+2};
0406   end
0407   opt.optimoptions = temp;
0408   clear temp;
0409 end
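The name/value merge above can be sketched in Python (hypothetical names; the real code mutates an `optimset` options struct, of which only a subset is shown):

```python
# defaults mirror a subset of the optimset call above
options = {'Display': 'iter', 'TolFun': 1e-6, 'TolX': 1e-6}

# user-supplied name/value pairs, as in opt.optimoptions
user = ['TolFun', 1e-4, 'MaxIter', 500]

# pair up names and values; user settings override the defaults
for name, value in zip(user[0::2], user[1::2]):
    options[name] = value

print(options['TolFun'])   # 0.0001
```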
0410 if ~isfield(opt,'outputfcn') || isempty(opt.outputfcn)
0411   opt.outputfcn = [];
0412 end
0413 
0414 % deal with resampling schemes
0415 if ~isfield(opt,'wantresampleruns') || isempty(opt.wantresampleruns)
0416   if length(data) == 1
0417     opt.wantresampleruns = 0;
0418   else
0419     opt.wantresampleruns = 1;
0420   end
0421 end
0422 if opt.wantresampleruns
0423   numdataunits = length(data);
0424 else
0425   numdataunits = sum(cellfun(@(x) size(x,1),data));
0426 end
0427 if ~isfield(opt,'resampling') || isempty(opt.resampling)
0428   opt.resampling = 0;
0429 end
0430 if isequal(opt.resampling,0)
0431   resamplingmode = 'full';
0432 elseif ~iscell(opt.resampling) && numel(opt.resampling) > 1
0433   resamplingmode = 'xval';
0434 else
0435   resamplingmode = 'boot';
0436 end
0437 if isequal(resamplingmode,'boot')
0438   if ~iscell(opt.resampling)
0439     opt.resampling = {opt.resampling};
0440   end
0441   if length(opt.resampling) < 2 || isempty(opt.resampling{2})
0442     opt.resampling{2} = sum(100*clock);
0443   end
0444   if length(opt.resampling) < 3 || isempty(opt.resampling{3})
0445     opt.resampling{3} = ones(1,numdataunits);
0446   end
0447 end
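The three-way dispatch above can be summarized as: scalar 0 means fit all the data once ('full'); a non-cell matrix with more than one element gives explicit train/test fold labels ('xval'); anything else (a scalar B, or a cell {B, seed, groups}) requests B bootstraps ('boot'). A Python sketch of the same classification, using pure-Python stand-ins for the MATLAB types:

```python
def resampling_mode(r):
    # scalar 0 -> fit the full dataset once ('full')
    if r == 0:
        return 'full'
    # a list of fold rows (stand-in for a MATLAB matrix of 1/-1 labels) -> 'xval'
    if isinstance(r, list):
        return 'xval'
    # a scalar B, or a tuple (B, seed, groups) standing in for a cell -> 'boot'
    return 'boot'

print(resampling_mode(0))             # 'full'
print(resampling_mode([[1, 1, -1]]))  # 'xval'
print(resampling_mode(100))           # 'boot'
```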
0448 
0449 % deal with metric
0450 if ~isfield(opt,'metric') || isempty(opt.metric)
0451   opt.metric = @calccorrelation;
0452 end
0453 
0454 % deal with additional regressors
0455 if ~isfield(opt,'maxpolydeg') || isempty(opt.maxpolydeg)
0456   opt.maxpolydeg = NaN;
0457 end
0458 if length(opt.maxpolydeg) == 1
0459   opt.maxpolydeg = repmat(opt.maxpolydeg,[1 length(data)]);
0460 end
0461 if ~isfield(opt,'wantremovepoly') || isempty(opt.wantremovepoly)
0462   opt.wantremovepoly = 1;
0463 end
0464 if ~isfield(opt,'extraregressors') || isempty(opt.extraregressors)
0465   opt.extraregressors = [];
0466 end
0467 if ~isfield(opt,'wantremoveextra') || isempty(opt.wantremoveextra)
0468   opt.wantremoveextra = 1;
0469 end
0470 if ~isfield(opt,'dontsave') || (isempty(opt.dontsave) && ~iscell(opt.dontsave))
0471   opt.dontsave = {'modelfit' 'numiters' 'resnorms'};
0472 end
0473 if ~iscell(opt.dontsave)
0474   opt.dontsave = {opt.dontsave};
0475 end
0476 if ~isfield(opt,'dosave') || isempty(opt.dosave)
0477   opt.dosave = {};
0478 end
0479 if ~iscell(opt.dosave)
0480   opt.dosave = {opt.dosave};
0481 end
0482 
0483 % make outputdir if necessary
0484 if wantsave
0485   mkdirquiet(opt.outputdir);
0486   opt.outputdir = subscript(matchfiles(opt.outputdir),1,1);
0487   outputfile = sprintf([opt.outputdir '/%06d.mat'],chunknum);
0488 end
0489 
0490 % set random number seed
0491 if isequal(resamplingmode,'boot')
0492   setrandstate({opt.resampling{2}});
0493 end
0494 
0495 % calc
0496 numtime = cellfun(@(x) size(x,1),data);
0497 
0498 % save initial time
0499 if wantsave
0500   saveexcept(outputfile,[{'data'} setdiff(opt.dontsave,opt.dosave)]);
0501 end
0502 
0503 %%%%%%%%%%%%%%%%%%%%%%%%%%%% LOAD SOME ITEMS
0504 
0505 % deal with stimulus
0506 if isa(opt.stimulus,'function_handle')
0507   fprintf('*** fitnonlinearmodel: loading stimulus. ***\n');
0508   stimulus = feval(opt.stimulus);
0509 else
0510   stimulus = opt.stimulus;
0511 end
0512 if ~iscell(stimulus)
0513   stimulus = {stimulus};
0514 end
0515 stimulus = cellfun(@full,stimulus,'UniformOutput',0);
0516 
0517 % deal with extraregressors
0518 if isa(opt.extraregressors,'function_handle')
0519   fprintf('*** fitnonlinearmodel: loading extra regressors. ***\n');
0520   extraregressors = feval(opt.extraregressors);
0521 else
0522   extraregressors = opt.extraregressors;
0523 end
0524 if isempty(extraregressors)
0525   extraregressors = repmat({[]},[1 length(data)]);
0526 end
0527 if ~iscell(extraregressors)
0528   extraregressors = {extraregressors};
0529 end
0530 
0531 %%%%%%%%%%%%%%%%%%%%%%%%%%%% PRECOMPUTE SOME STUFF
0532 
0533 % construct polynomial regressors matrix
0534 polyregressors = {};
0535 for p=1:length(data)
0536   if isnan(opt.maxpolydeg(p))
0537     polyregressors{p} = zeros(numtime(p),0);
0538   else
0539     polyregressors{p} = constructpolynomialmatrix(numtime(p),0:opt.maxpolydeg(p));
0540   end
0541 end
0542 
0543 % construct total regressors matrix for fitting purposes
0544 % (i.e. both polynomials and extra regressors)
0545   % first, construct the run-wise regressors
0546 tmatrix = {};
0547 for p=1:length(data)
0548   tmatrix{p} = cat(2,polyregressors{p},extraregressors{p});
0549 end
0550   % then, use blkdiag to give each run its own columns, and slice the rows back out per run
0551 temp = blkdiag(tmatrix{:});
0552 cnt = 0;
0553 for p=1:length(data)
0554   tmatrix{p} = temp(cnt+(1:size(tmatrix{p},1)),:);
0555   cnt = cnt + size(tmatrix{p},1);
0556 end
0557 clear temp;
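The blkdiag step gives each run's regressors their own columns (so nuisance terms are fit independently per run), and the loop then slices the rows back out run by run. A pure-Python sketch of the same construction (`block_diag` here is a hand-rolled stand-in for MATLAB's `blkdiag`):

```python
def block_diag(blocks):
    # blocks: list of 2-D row-lists; returns one block-diagonal 2-D row-list
    nrows = sum(len(b) for b in blocks)
    ncols = sum(len(b[0]) for b in blocks)
    out = [[0.0] * ncols for _ in range(nrows)]
    r = c = 0
    for b in blocks:
        for i, row in enumerate(b):
            out[r + i][c:c + len(row)] = row
        r += len(b)
        c += len(b[0])
    return out

run1 = [[1.0, 0.5]] * 3            # 3 time points, 2 regressors
run2 = [[1.0]] * 4                 # 4 time points, 1 regressor
big = block_diag([run1, run2])     # 7 x 3; each run is zero in the other's columns
sliced = [big[0:3], big[3:7]]      # rows sliced back out per run, as in the loop
```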
0558 
0559 % construct special regressors matrix for the purposes of the <metric>
0560   % first, construct the run-wise regressors
0561 smatrix = {};
0562 for p=1:length(data)
0563   temp = [];
0564   if opt.wantremovepoly
0565     temp = cat(2,temp,polyregressors{p});
0566   end
0567   if opt.wantremoveextra
0568     temp = cat(2,temp,extraregressors{p});
0569   end
0570   smatrix{p} = temp;
0571 end
0572   % then, use blkdiag to give each run its own columns, and slice the rows back out per run
0573 temp = blkdiag(smatrix{:});
0574 cnt = 0;
0575 for p=1:length(data)
0576   smatrix{p} = temp(cnt+(1:size(smatrix{p},1)),:);
0577   cnt = cnt + size(smatrix{p},1);
0578 end
0579 clear temp;
0580 
0581 % figure out trainfun and testfun for resampling
0582 switch resamplingmode
0583 case 'full'
0584   trainfun = {@(x) catcell(1,x)};
0585   testfun =  {@(x) []};
0586 case 'xval'
0587   trainfun = {};
0588   testfun =  {};
0589   for p=1:size(opt.resampling,1)
0590     trainix = find(opt.resampling(p,:) == 1);
0591     testix =  find(opt.resampling(p,:) == -1);
0592     if opt.wantresampleruns
0593       trainfun{p} = @(x) catcell(1,x(trainix));
0594       testfun{p} =  @(x) catcell(1,x(testix));
0595     else
0596       trainfun{p} = @(x) subscript(catcell(1,x),{trainix ':' ':' ':' ':' ':'});  % HACKY
0597       testfun{p}  = @(x) subscript(catcell(1,x),{testix ':' ':' ':' ':' ':'});
0598     end
0599   end
0600 case 'boot'
0601   trainfun = {};
0602   testfun = {};
0603   for p=1:opt.resampling{1}
0604     trainix = [];
0605     for b=1:max(opt.resampling{3})
0606       temp = opt.resampling{3}==b;
0607       trainix = [trainix subscript(find(temp),ceil(rand(1,sum(temp))*sum(temp)))];
0608     end
0609     testix = [];
0610     if opt.wantresampleruns
0611       trainfun{p} = @(x) catcell(1,x(trainix));
0612       testfun{p} =  @(x) catcell(1,x(testix));
0613     else
0614       trainfun{p} = @(x) subscript(catcell(1,x),{trainix ':' ':' ':' ':' ':'});  % HACKY
0615       testfun{p}  = @(x) subscript(catcell(1,x),{testix ':' ':' ':' ':' ':'});
0616     end
0617   end
0618 end
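In the 'boot' case above, each bootstrap draws data units with replacement separately within each group given by opt.resampling{3} (so, for example, runs from different sessions can be constrained to resample among themselves), and the test set is empty. A Python sketch of the per-group draw (hypothetical helper, not toolbox code):

```python
import random

def boot_indices(groups, rng):
    # groups[i] is the group label of data unit i; sample with replacement
    # within each group, preserving each group's size
    idx = []
    for g in sorted(set(groups)):
        members = [i for i, gg in enumerate(groups) if gg == g]
        idx += [rng.choice(members) for _ in members]
    return idx

rng = random.Random(0)
print(boot_indices([1, 1, 2, 2, 2], rng))  # 5 indices; first 2 drawn from {0,1}, rest from {2,3,4}
```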
0619 
0620 %%%%%%%%%%%%%%%%%%%%%%%%%%%% PERFORM THE FITTING
0621 
0622 % loop over voxels
0623 clear results0;
0624 parfor p=1:vnum
0625 
0626   % report
0627   fprintf('*** fitnonlinearmodel: processing voxel %d (%d of %d). ***\n',vxs(p),p,vnum);
0628   vtime = clock;  % start time for current voxel
0629 
0630   % get data and hack it in
0631   opt2 = opt;
0632   opt2.data = cellfun(@(x) x(:,p),data,'UniformOutput',0);
0633 
0634   % get seed and hack it in
0635   if ~isempty(opt2.seed)
0636     assert(~isa(opt2.model,'function_handle'));  % sanity check
0637     if isa(opt2.seed,'function_handle')
0638       seed0 = feval(opt2.seed,vxs(p));
0639     else
0640       seed0 = opt2.seed;
0641     end
0642     opt2.model{1}{1} = seed0;
0643   end
0644   
0645   % call helper function to do the actual work
0646   results0(p) = fitnonlinearmodel_helper(opt2,stimulus,tmatrix,smatrix,trainfun,testfun);
0647 
0648   % report
0649   fprintf('*** fitnonlinearmodel: voxel %d (%d of %d) took %.1f seconds. ***\n', ...
0650           vxs(p),p,vnum,etime(clock,vtime));
0651 
0652 end
0653 
0654 % consolidate results
0655 results = struct;
0656 results.params = cat(3,results0.params);
0657 results.testdata = cat(2,results0.testdata);
0658 results.modelpred = cat(2,results0.modelpred);
0659 results.modelfit = cat(3,results0.modelfit);
0660 results.trainperformance = cat(1,results0.trainperformance).';
0661 results.testperformance  = cat(1,results0.testperformance).';
0662 results.aggregatedtestperformance = cat(2,results0.aggregatedtestperformance);
0663 results.numiters = cat(2,{results0.numiters});
0664 results.resnorms = cat(2,{results0.resnorms});
0665 
0666 % kill unwanted outputs
0667 for p=1:length(opt.dontsave)
0668 
0669   % keep the output if it is explicitly listed in dosave;
0670   % otherwise, remove it (if it exists)
0671   if ~ismember(opt.dontsave{p},opt.dosave)
0672     if isfield(results,opt.dontsave{p})
0673       results = rmfield(results,opt.dontsave{p});
0674     end
0675   end
0676 
0677 end
0680 
0681 % save results
0682 if wantsave
0683   save(outputfile,'-struct','results','-append');
0684 end
0685 
0686 %%%%%%%%%%%%%%%%%%%%%%%%%%%% REPORT
0687 
0688 fprintf('*** fitnonlinearmodel: ended at %s (%.1f minutes). ***\n', ...
0689         datestr(now),etime(clock,stime)/60);

Generated on Wed 18-Jun-2014 21:47:41 by m2html © 2005