Skip to contents

This document is intended for intermediate R programmers who are familiar with Monte Carlo simulation and quantitative risk assessment. For an introduction to quantitative risk assessment, please refer to:

Introduction

This guide presents a flexible approach for building risk assessment models with Monte Carlo simulations in R. The framework is multivariate, allowing simultaneous analysis of multiple variates (different categories or groups), and modular, breaking complex systems into manageable components.

The mc2d R package (Pouillot and Delignette-Muller 2010) provides tools to build and analyse models involving multiple variables and uncertainties based on Monte Carlo simulations. However, these tools are limited when handling multiple variates, and complex models can be difficult to analyse and interpret. The mcmodule package builds on the R package mc2d to enhance model flexibility. This new package provides:

  1. Tools to add metadata to model elements for better traceability and debugging
  2. Tools to handle complex multivariate variables
  3. A framework to break down Monte Carlo models into smaller, reusable components (modules) that can be analysed independently before integration

Multivariate Monte-Carlo simulations

The following section is based primarily on the mc2d package vignette, adapted and expanded specifically for the context of modular, flexible Monte Carlo simulations.

Quantitative risk analysis is the numerical assessment of risk to facilitate decision making in the face of uncertainty. Monte Carlo simulation is a technique used to model and analyse uncertainty (Vose 2008). In typical two-dimensional Monte Carlo simulation, one dimension represents variability and the other represents uncertainty. However, in this framework, we take a different approach where:

  • We combine variability and uncertainty into one dimension (u)

  • We use variates (representing different groups or categories) in the other dimension (i)

This approach enables us to track multiple values across groups simultaneously using variates and to handle cases where values must be aggregated by animal movements, farms, or other grouped entities. Although it would be technically possible to separate variability and uncertainty, the exponential increase in computational and programming requirements would make it impractical in models of this size.

In this framework, an mcnode is an array of dimensions (u × 1 × i), where u is the dimension of variability and uncertainty combined, and i is the number of variates. As most of the input parameters are uncertain, we will use the term “uncertainty dimension” throughout this document to refer to the combined dimension of variability and uncertainty.

Structure of an mcnode containing u uncertainty iterations and i variates.
Structure of an mcnode containing u uncertainty iterations and i variates.

For example, if we are buying a number of cows and heifers from a specific region, a mcnode for herd prevalence would have:

  • Multiple variates (rows), each representing a specific animal category-disease combination. The variables that define and distinguish these variates are called keys.

  • Probability values generated through a PERT distribution (rpert), using minimum, mode, and maximum parameters across u iterations (columns) to model the uncertainty in these estimates.

An mcnode representing herd prevalence of three pathogens, tuberculosis (TB), infectious bovine rhinotracheitis (IBR), and bovine viral diarrhea (BVD), and two animal categories (cows and heifers). The herd prevalence is the same for both animal categories, but values differ due to the stochastic nature of random sampling from the PERT distribution.
An mcnode representing herd prevalence of three pathogens, tuberculosis (TB), infectious bovine rhinotracheitis (IBR), and bovine viral diarrhea (BVD), and two animal categories (cows and heifers). The herd prevalence is the same for both animal categories, but values differ due to the stochastic nature of random sampling from the PERT distribution.

Scenarios

This framework enables the creation and comparison of scenarios. For comparisons, you need a baseline scenario (scenario “0”) and one or more alternative scenarios. Variates within a scenario are called “groups.” While not every scenario needs to contain all baseline groups, any groups present in alternative scenarios must exist in the baseline.

The scenario id, group id, and row number are the three mcnode properties that enable row matching and row aggregation operations.

An mcnode representing the probability of a farm being infected in a baseline scenario (“0”, in blue) and in a what-if scenario where there is a control plan implemented (“a”, in green).
An mcnode representing the probability of a farm being infected in a baseline scenario (“0”, in blue) and in a what-if scenario where there is a control plan implemented (“a”, in green).

Multivariate nodes operations

Element-wise operations

Most operations in this model are element-wise, meaning each calculation occurs independently on matching elements across nodes. These operations preserve node dimensions while applying functions (such as multiplication or addition) to corresponding elements, allowing uncertainties and variates to propagate through the calculations.

Row matching

Row matching operations align nodes to ensure they have compatible dimensions and properly matched rows for element-wise calculations. This is especially important when working with nodes containing different scenarios. The matching process may require reordering, duplicating, or adding rows to either or both nodes.

Row matching between nodes is needed in the following cases and their combinations:

Group matching

Nodes with same scenarios/dimensions but different group order

  1. Check if keys that define groups are in a different order

  1. Assign common group ids, based on keys

  1. Reorder rows to align group ids. This is similar to dplyr left_join

Scenario matching

Nodes with same groups but different scenarios. The general rule for these operations is that when no scenario is specified, it is assumed that a variable takes the values from the baseline scenario.

  1. Check if scenarios are different

  1. For missing scenarios, duplicate the baseline scenario (scenario “0”)

  1. When nodes matched by scenario are used in an element-wise operation, the resulting node will contain all scenarios from both input nodes. This is conceptually similar to a dplyr full_join, except that any unmatched rows use values from the baseline scenario.

Null matching

When nodes contain different scenarios and groups, some groups may be missing from certain nodes. These missing groups (called “null” variates) are assigned a probability of 0. This operation typically complements scenario matching.

  1. Check for missing groups

  1. Add null variates with probability 0 for any missing groups (and apply scenario matching)

  1. When nodes are matched by scenario in an element-wise operation, the resulting node will contain all groups from both input nodes

Combined probabilities

A special type of element-wise operation that automatically deals with row matching. Combines probabilities of multiple events assuming independence, using the formula:

P(AB)=1(1P(A))(1P(B)) P(A \cup B) = 1 - (1-P(A))(1-P(B))

Row aggregation

Row aggregation operations combine values across multiple rows (variates) in an mcnode using specific criteria. These operations calculate overall probabilities or sum quantities across groups.

Probabilities

Some events can occur independently across multiple variates of the same node. For example, a disease might be present in animals from different regions or farms. To calculate the probability of an event occurring in at least one variate within a subset, we first group the data by aggregation keys (agg_keys). Then, for each subset, we use the standard probability formula for independent trials to find the probability of the event occurring in at least one variate.

p_agg=1pS(1p) p\_{agg}=1-\prod_{p\subseteq S}(1-p)

Where:

  • S represents the subset of variates included in the aggregation

  • p represents the probability of an event occurring

  • p_agg represents the probability of occurrence in at least one variate of S

Row aggregation by pathogen. The calculation is performed element-wise across subset variates to maintain uncertainty propagation.
Row aggregation by pathogen. The calculation is performed element-wise across subset variates to maintain uncertainty propagation.

If scenarios are being simulated, scenario id must always be included as an aggregation key, as probabilities are not calculated across different what-if scenarios.

You can also keep all variariates, with their corresponding aggregated values, to ensure dimensions compatibility in further calculations.

Default row aggregation and row aggregation keeping all variates
Default row aggregation and row aggregation keeping all variates
Quantities

When working with quantities (such as number of animals or farms), we calculate totals by summing values across variates within each subset.

n_agg=nSn n\_{agg}=\sum_{n\subseteq S} n

Where:

  • S represents the subset of variates included in the aggregation

  • n represents the quantity value

  • n_agg represents the sum of quantities across all variates in S

Trials

In this framework, nodes typically represent the probability of success for a single trial with a binary outcome (success/failure). A trial is one instance where an event might occur, for example, an animal, farm, batch, animal movement, or farm visit. Each trial has two possible outcomes, such as whether an animal is infected or not, or whether a test is positive or negative. In single-level trials, all trials are independent and have the same probability of success1.

Some probability processes follow a hierarchical structure where events occur at different levels. For example, when animals are purchased from multiple farms, we must consider both the probability of a farm being infected and the probability of an individual animal within that farm being infected. This means infection probabilities for animals from the same farm are not independent. In multilevel trials, trials are organized into subsets, with each subset having its own selection probability. A subset represents a group sharing specific characteristics, for example, animals from the same farm or with the same health status.

Most of the following probability processes and calculations are based on Chapter 5 of the Handbook on Import Risk Analysis for Animals and Animal Products Volume 2. Quantitative risk assessment (Murray 2004).

Single-level trials

In single-level trials, each trial is independent with the same probability of success (trial_ptrial\_p). For a set of trials_ntrials\_n trials, the probability of at least one success is:

set_p=1(1trial_p)trials_n set\_p= 1-(1-trial\_p)^{trials\_n}

Multilevel trials

Simple multilevel

In multilevel trials, we account for hierarchical structures where trials are organized into subsets. For example, when considering animals (trials_ntrials\_n) from different farms (subset_nsubset\_n), we must account for both the farm’s selection probability (subset_psubset\_p) and the individual animal’s probability of success (trial_ptrial\_p).

The probability of at least one success in this hierarchical structure is given by:

set_p=1(1subset_p(1(1trial_p)trial_n))subset_n set\_p= 1-(1-subset\_p \cdot (1-(1-trial\_p)^{trial\_n}))^{subset\_n}

Where:

  • trials_p represents the probability of a trial in a subset being a success

  • trials_n represents the number of trials in subset

  • subset_p represents the probability of a subset being selected

  • subset_n represents the number of subsets

  • set_p represents the probability of a at least one trial of at least one subsetbeing a success

Multiple group multilevel trials

In multiple group multilevel trials, several variates (groups) can belong to the same subset. For example, when selecting animals of different categories (cow, calf, heifer, bull) from the same farm, their infection probabilities are not independent. You should include aggregation keys for groups that are not independent to calculate their trial probabilities, for example, aggregating variates for different animal categories by farm. For this calculation, the subset_p and subset_n values must be identical across all variates being aggregated.

trial_p_agg=1trial_pS(1trial_p)trial_n trial\_p\_agg=1-\prod_{trial\_p\subseteq S}(1-trial\_p)^{trial\_n}

set_p_agg=1(1subset_ptrial_p_agg)subset_n set\_p\_agg = 1-(1-subset\_p \cdot trial\_p\_agg)^{subset\_n}

Where:

  • S represents the variates included in the aggregation

  • trials_p represents the probability of success of a trial in a subset

  • trials_n represents the number of trials in subset

  • trials_p _agg represents the probability of success of at least one variate in at least one of the aggregated variates

  • subset_p represents the probability of a subset being selected

  • subset_n represents the number of subsets

  • set_p_agg represents the probability of success of a at least one trial in at least one of the aggregated variates in at least one subset

As with row aggregation, you can keep all variates to maintain dimensions compatibility in further calculations.

Model elements

Expressions

Data

Nodes

Input nodes

Nodes metadata

Modules

Estructure

Modules metadata

Murray, Noel. 2004. Handbook on Import Risk Analysis for Animals and Animal Products Volume 2 Quantitative Risk Assessment. Vol. 2. https://rr-africa.woah.org/app/uploads/2018/03/handbook_on_import_risk_analysis_-_oie_-_vol_ii.pdf.
Pouillot, R., and M.-L. Delignette-Muller. 2010. “Evaluating Variability and Uncertainty in Microbial Quantitative Risk Assessment Using Two r Packages” 142: 330–40.
Vose, David. 2008. Risk Analysis, a Quantitative Guide. Chichester, England: John Wiley & Sons.