SimBiology from Scripts II: Parameter scanning

In a previous post, I (not so) briefly showed how to load the SimBiology file, select from it the model and simulation configuration settings, and evaluate the model with SimBiology’s simulation engine.

That’s great if you only needed the results for one model with one configuration. I’m guessing that won’t be the case for most use cases, though. In my own research, I am more often interested in how a system’s evolution changes across different initial configurations. For a specie or parameter of interest, I create a range of values to explore, and re-simulate my model for each parameter value in that range. Then I can compare the model outputs to analyze how the variations impacted the system.

SimBiology has a feature for performing these sort of studies, which it calls parameter scanning. However, these parameter scans run painfully slow through SimBio’s graphical interface (surprise, surprise), worsening with increases to model size or complexity. And this is just scanning over a single parameter. SimBiology often becomes unstable for scans of two or more parameters.

This frustration is almost wholly avoidable if we handle our SimBiology models programmatically instead. Today, I’ll demonstrate a simple way to replicate SimBio’s parameter scan process from a MATLAB script. Basically, we will decide on a parameter to vary, and a range of values we want to explore. We setup an array of values spanning that range, loop through that array, and run a simulation for each iteration.

The Model

Okay, let’s get started! This example assumes a very simple chemical reaction network ⇀ ⇀ C where k1 and k2 are kinetic rate constants for the chemical reactions where A becomes B, and B becomes C, respectively. Let’s say that we are interested in how k1 impacts levels of species C ultimately.. We will loop through a range of rate values, simulating for each configuration, and saving the results.

To start, let’s load our SimBiology project file and model.

% Load project into 'file' variable' from the full file path
file = sbioload('./PATH/TO/YOUR/file.sbproj');

% Pull from the project Model #1
model = file.m1;

We also need to load and setup the simulation ConfigSet. We are using ODE15s as our solver and will evaluate the model for 1000 simulated seconds.

cs = getconfigset(model, 'default');
set(cs, 'SolverType', 'ode15s');
set(cs, 'StopTime', 1000);

Next, let’s define an array of rate constant values for simbiology to loop through.

k1Values = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 1E5]

Here we are explicitly providing each values, but you can use functions like logspace() or linspace() to generate a range of values with different value spacings. As the names suggest, logspace will space the values logarithmically and linspace will space values linearly.

Before we actually proceed with looping through simulations, we should figure out how we will handle the simulation outputs. Since we’re looping through a potentially arbitrarily large set of simulations, we will need an orderly way of storing the results. I typically create two arrays or sets for storing results: (1) Raw output set that contains each simulation’s raw time and data points; (2) A summary array that documents the specific parameter value(s) applied that iteration, and the steady-state (i.e. final value for output variables) values for each simulation.

MATLAB is optimized for performing operations on matrices of known sizes, and dynamically resizing arrays or other collections, such as adding a new row every loop iteration, hampers its performance and slows down your computational studies. The best practice is to pre-allocate arrays; that is to say, define the size and shape of our results matrices, wherever possible, ahead of time. My general approach applied to our simple one-parameter variation is below. Some of the code is inelegant, I admit, but it is straightforward at least.

% The total number of simulations is simply the length of k1Values
totalSimulations = length(k1Values);
 
% This is used later in allocating our summary array
numberOfParametersVaried = 1
 
% Oftentimes, we won't record the values for all species in a complex reaction network.
% Here, let's record __A__ and __C__, ignoring the intermediary specie __B__.
numberOfRecordedOutputs = 2;
 
% Now, we preallocate an array that has three columns, and eight rows (the totalSimulations)
% One for our varied parameter, two for recording __A__ and __B__ at steady-state.
% Each row will represent a single simulation's (k1Value, Steady-state A level, Steady-state B level)
steadyStateSummaryArray = zeros(totalSimulations, (numberOfParametersVaried + numberOfRecordedOutputs)

We are pre-allocating a matrix that will hold simulation results for each parameter condition, so we first determine how many simulations will be run.

% The total number of simulations is simply the length of k1Values
totalSimulations = length(k1Values);

Note: If we vary another rate constant k2 that is varied with k1 (i.e. We vary k2’s value across each value of k1), the totalSimulations would be the product of each range’s length.

totalSimulations = length(k1Values) * length(k2Values);

We may also need to refer to the raw time vs data series. Pre-allocating an array for this is impractical because each simulation will likely have different time-step sizes, and thus a varying number of data points. Such an array would need to be three-dimensional as well (Time vs Specie vs Simulation Number.) MATLAB provides a datatype called a cell array that I find simplifies this considerably.

rawSimData = cell(1, totalSimulations)

The rawSimDate set will be initialized as an array of #totalSimulations sets ([] [] [] [] []…. []) with each set representing a given simulations time-data series.

Now we’re ready to simulate! The configuration we just did is the more intensive part of this kind of study. Running the parameter variation is pretty straightforward.

count = 1
for k1 = k1Values
        fprintf('[Status] Simulation %d/%d\t:kL1=%f\tk1=%f\n', count, totalSimulations, k1);
 
        % Reload the model object from the file at each loop.
        % This ensures that, for complex situations, we don't accidentally leave a parameter value change between loops
        m = file.m1;
 
        % Varied Parameters
        m.Parameters(1).value = k1; % Set our k1 parameter to the current loop iteration
        % NOTE: ALWAYS check the number of your parameters  
        % Simulations often fail, sometimes for certain parameter configurations, 
        % or with the wrong SolverType choice. When they fail, an exception is thrown 
        % that, unhandled, will stop the script. 

% This is a particular pain when we are running simulations
% within a loop, such as during a parameter variation study. 
         To avoid this, we enclose the command inside a try/catch block. 
         If sbiosimulate throws an exception, we catch it, 
         print the exception description, 
        %} and then move on in the script.
        try
            tempResults = sbiosimulate(m, cs, [], []);
        catch exception
            exception
            count = count+1;
            continue
        end
        data = tempResults.Data;
        time = tempResults.time;
        % Add raw time and amount data set into rawSimData celltable
        rawSimData{count} = [time, data];
        % Store steady-state values into our summary array
        steadyStateSummaryArray(count, :) = [k1 data(end, 1) data(end, 3)]; 
        % Here '1' and '3' refer to species a and b
 
        % Update current simulation count.
        count = count + 1;
    end
disp('[Status] Complete!')

That’s all there is to it! If you have any questions, or notice any errors, feel free to comment or send me an email. Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.