Run definition DSL

This page provides the full specification of the JSON-based DSL used to define runs. For a more gradual introduction to workflows, please refer to this guide.

JSON schema

A run file should contain a single JSON object with the following fields.

| Name | Type | Description |
|---|---|---|
| workflow | string | Specification of the workflow to execute. |
| name | string; optional | A human-readable name. |
| notes | string; optional | Notes describing the purpose of this experiment. |
| tags | string[]; optional | Some tags, used when searching for runs. |
| seed | long; optional | Seed to be used for deterministic reproduction. If not specified, a random one will be generated. |
| repeat | integer; optional; default: 1 | Number of times to repeat each run. |
| params | object; optional | Mapping between parameter names and their values. |
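
To make the field list concrete, here is a minimal Python sketch that parses a sample run file and checks it against the fields listed above. It is not part of Accio: the helper names `is_int` and `check_run` and all sample values (including the workflow string) are hypothetical. Reading `long` as a signed 64-bit integer and requiring `repeat` to be at least 1 are assumptions as well.

```python
import json

# Sample run file. Field names follow the list above; every value is made up.
RUN_JSON = """
{
  "workflow": "my_workflow",
  "name": "Epsilon sweep",
  "notes": "Compare several privacy levels on one dataset.",
  "tags": ["privacy", "sweep"],
  "seed": 1234567890123,
  "repeat": 3,
  "params": {"epsilon": {"values": [1, 0.1]}}
}
"""


def is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_run(run):
    """Return a list of problems found in a run definition (empty if none)."""
    if not isinstance(run, dict):
        return ["a run file must contain a single JSON object"]
    problems = []
    # workflow is the only field not marked optional.
    if not isinstance(run.get("workflow"), str):
        problems.append("workflow: required string")
    for key in ("name", "notes"):
        if key in run and not isinstance(run[key], str):
            problems.append(f"{key}: expected a string")
    tags = run.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags: expected an array of strings")
    # Assumption: "long" means a signed 64-bit integer.
    if "seed" in run:
        seed = run["seed"]
        if not (is_int(seed) and -2**63 <= seed < 2**63):
            problems.append("seed: expected a 64-bit integer")
    # Assumption: repeat must be at least 1 (its documented default).
    repeat = run.get("repeat", 1)
    if not is_int(repeat) or repeat < 1:
        problems.append("repeat: expected a positive integer")
    if not isinstance(run.get("params", {}), dict):
        problems.append("params: expected an object")
    return problems


run = json.loads(RUN_JSON)
problems = check_run(run)
print(problems or "run definition looks well-formed")
```

The check covers only the shape of the top-level object. Whether the workflow exists and whether each parameter value has the right type can only be decided against the workflow itself.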

Specifying parameters

When defining an experiment, you can specify values for parameters. Every parameter must be given a value, except those used only by optional inputs. A parameter can take either a single value or several values; giving several values triggers the execution of multiple versions of the same original workflow. Accio currently supports several ways to specify parameters, described in the next sections.

The values that a parameter takes are defined via a JSON object whose only key is values, mapped to a JSON array listing every value of the parameter. The order of the values does not matter. Values should be specified using the same format as workflow input values. For example:

{
  "params": {
    "epsilon": {
      "values": [1, 0.1, 0.001, 0.0001]
    }
  }
}

A single-element array is also a valid value for values. If at least one parameter has more than one possible value, the cross product of all values is taken to determine which runs are actually triggered. For example, the following configuration launches 8 (4 x 2 x 1) different runs:

{
  "params": {
    "epsilon": {
      "values": [1, 0.1, 0.001, 0.0001]
    },
    "uri": {
      "values": ["/path/to/geolife", "/path/to/cabspotting"]
    },
    "level": {
      "values": [12]
    }
  }
}
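
The cross product can be illustrated with a short Python sketch. It only mirrors the counting rule described above and is not Accio's implementation; the function name `expand` is hypothetical.

```python
import itertools
import json

# The same params block as in the configuration above.
definition = json.loads("""
{
  "params": {
    "epsilon": {"values": [1, 0.1, 0.001, 0.0001]},
    "uri": {"values": ["/path/to/geolife", "/path/to/cabspotting"]},
    "level": {"values": [12]}
  }
}
""")


def expand(params):
    """Return one {name: value} mapping per element of the cross product."""
    names = list(params)
    value_lists = [params[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


runs = expand(definition["params"])
print(len(runs))  # 8
print(runs[0])    # {'epsilon': 1, 'uri': '/path/to/geolife', 'level': 12}
```

Adding a value to any one parameter multiplies the number of runs by that parameter's new value count divided by its old one, so large sweeps grow quickly.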

Controlling randomness with a seed

Some workflows include operators marked as unstable, meaning that they need a source of randomness when executed. This randomness is provided through a seed. By default, a random seed is generated for each run, but you can fix it through the seed key. If a run definition expands into several runs, the seed specified in the definition is used to deterministically generate a seed for each child run. If a run definition corresponds to a single run, the specified seed is used directly.
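
The two cases can be sketched in Python as follows. This page does not specify how Accio derives child seeds, so the scheme below (drawing 64-bit values from a generator seeded with the definition's seed) is purely an assumption chosen for illustration, and `child_seeds` is a hypothetical name.

```python
import random


def child_seeds(seed, count):
    """Derive one seed per run from the seed of the run definition.

    Hypothetical scheme, not Accio's actual algorithm.
    """
    if count == 1:
        # A definition that maps to a single run uses its seed directly.
        return [seed]
    # Seeding a generator with the definition's seed makes the whole
    # sequence of child seeds reproducible.
    rng = random.Random(seed)
    # Shift the unsigned 64-bit draw into the signed 64-bit range.
    return [rng.getrandbits(64) - 2**63 for _ in range(count)]


first = child_seeds(42, 8)
second = child_seeds(42, 8)
print(first == second)     # True: same definition seed, same child seeds
print(child_seeds(42, 1))  # [42]
```

Whatever the actual derivation is, the property that matters is the one shown here: re-running the same definition with the same seed gives every child run the same seed as before.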