---------------------------------------------------------------------- This is the API documentation for the sim_tools library. ---------------------------------------------------------------------- ## Classes Main classes provided by the package Bernoulli(p: float, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Bernoulli distribution implementation. A discrete probability distribution that takes value 1 with probability p and value 0 with probability 1-p. This class conforms to the Distribution protocol and provides methods to sample from a Bernoulli distribution with a specified probability. Beta(alpha1: float, alpha2: float, lower_bound: float = 0.0, upper_bound: float = 1.0, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Beta distribution implementation. A flexible continuous probability distribution defined on the interval [0,1], which can be rescaled to any arbitrary interval [min, max]. As defined in Simulation Modeling and Analysis (Law, 2007). Common uses: ----------- 1. Useful as a rough model in the absence of data 2. Distribution of a random proportion 3. Time to complete a task CombinationDistribution(*args: sim_tools.distributions.Distribution, dists=None) Combination distribution implementation. A distribution that combines (sums) samples from multiple underlying distributions. Useful for modeling compound effects or building complex distributions from simpler ones. This class conforms to the Distribution protocol and provides methods to sample a combination of values from multiple distributions. DiscreteEmpirical(values: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], freq: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) DiscreteEmpirical distribution implementation. A probability distribution that samples values with specified frequencies. Useful for modeling categorical data or discrete outcomes with known probabilities. Example uses: ------------- 1. Routing percentages 2. Classes of entity 3. Batch sizes of arrivals DistributionRegistry() Registry for probability distribution classes with batch creation capabilities. The DistributionRegistry provides a central repository for registering distribution classes and instantiating them from configuration data. This facilitates dynamic creation of distribution objects based on runtime configuration, supporting scenarios like simulation models, statistical analysis, or any application requiring configurable random distributions. Key features: - Register distribution classes with a simple decorator - Create distribution instances from class names and parameters - Batch create multiple distributions from a dictionary or list - Automatic generation of statistically independent random seeds Examples -------- Register distribution classes: >>> @DistributionRegistry.register() ... class Exponential: ... def __init__(self, mean, random_seed=None): ... self.rng = np.random.default_rng(random_seed) ... self.mean = mean ... ... def sample(self, size=None): ... return self.rng.exponential(self.mean, size=size) ... >>> @DistributionRegistry.register("uniform_dist") # Custom name ... class Uniform: ... def __init__(self, low, high, random_seed=None): ... self.rng = np.random.default_rng(random_seed) ... self.low = low ... self.high = high ... ... def sample(self, size=None): ... return self.rng.uniform(self.low, self.high, size=size) Create a distribution with parameters: >>> exp_dist = DistributionRegistry.create("Exponential", mean=2.0) >>> exp_dist.sample(5) # Generate 5 samples Create multiple distributions from a flat configuration: >>> config = { ... "arrivals": { ... "class_name": "Exponential", ... "params": {"mean": 5.0} ... }, ... "service_times": { ... "class_name": "uniform_dist", ... "params": {"low": 1.0, "high": 3.0} ... } ... } >>> distributions = DistributionRegistry.create_batch(config, ... main_seed=12345) >>> arrivals = distributions["arrivals"] >>> service_times = distributions["service_times"] Create distributions from a nested configuration: >>> config = { ... "call": { ... "C1": {"class_name": "Exponential", "params": {"mean": 10.0}}, ... "C2": {"class_name": "Exponential", "params": {"mean": 8.0}}, ... }, ... "response_time": { ... "C1": { ... "class_name": "Lognormal", ... "params": {"mean": 30.0, "stdev": 5.0}, ... }, ... }, ... } >>> dists = DistributionRegistry.create_batch( ... config, main_seed=42, preserve_structure=True ... ) >>> dists["call"]["C1"].sample(5) >>> dists["response_time"]["C1"].sample(5) Notes ----- When creating distributions in batch with a main_seed, each distribution receives its own statistically independent seed derived from the main seed. This ensures proper statistical independence between random number streams while maintaining overall reproducibility through the main seed. When `preserve_structure=True`, the output mirrors the shape of the input configuration, so nested configs produce nested dicts of instances. When `preserve_structure=False` (default), all leaf distributions are flattened into a single dict whose keys are the path components joined by underscores (e.g. `"call_C1"`). With `sort=True` (default), keys at every nesting level are sorted alphabetically before seeds are assigned. This ensures that seed assignment is stable even if the insertion order of keys in the config dict changes between runs. Erlang(mean: float, stdev: float, location: float = 0.0, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Erlang distribution implementation. A continuous probability distribution that is a special case of the Gamma distribution where the shape parameter is an integer. This implementation allows users to specify mean and standard deviation rather than shape (k) and scale (theta) parameters. This class conforms to the Distribution protocol and provides methods to sample from an Erlang distribution with specified parameters. Notes ----- The Erlang is a special case of the gamma distribution where k is an integer. Internally this is implemented using numpy Generator's gamma method. The k parameter is calculated from the mean and standard deviation and rounded to an integer. Sources ------- Conversion between mean+stdev to k+theta: https://www.statisticshowto.com/erlang-distribution/ ErlangK(k: int, theta: float, location: float = 0.0, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Erlang distribution where k and theta are specified. The Erlang is a special case of the gamma distribution where k is a positive integer. Internally this is implemented using numpy Generator's gamma method. Optionally a user can offset the origin of the distribution using the location parameter. Exponential(mean: float, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Exponential distribution implementation. A probability distribution that models the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. This class conforms to the Distribution protocol and provides methods to sample from an exponential distribution with a specified mean. FixedDistribution(value: float) Fixed distribution implementation. A degenerate distribution that always returns the same fixed value. Useful for constants or deterministic parameters in models. This class conforms to the Distribution protocol and provides methods to sample a constant value regardless of the number of samples requested. Gamma(alpha: float, beta: float, location: float = 0.0, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Gamma distribution implementation with shape (alpha) and scale (beta) parameters. This class conforms to the Distribution protocol and provides methods to: - Calculate theoretical mean and variance - Derive parameters from specified mean/variance - Generate samples from the distribution GroupedContinuousEmpirical(lower_bounds: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], upper_bounds: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], freq: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Continuous Empirical Distribution for Grouped Data implementation. A distribution that performs linear interpolation between upper and lower bounds of a discrete distribution. Useful for modeling empirical data with a continuous approximation. This class conforms to the Distribution protocol and provides methods to sample from a continuous empirical distribution. Hyperexponential(probs: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], rates: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Hyperexponential distribution implementation. A continuous probability distribution that is a mixture (weighted sum) of exponential distributions. It has a higher coefficient of variation than a single exponential distribution, making it useful for modeling highly variable processes or heavy-tailed phenomena. The hyperexponential distribution is useful to model service processes where customers may require fundamentally different types of service with varying durations. For example, in a technical support call center, customers might either: 1. Have a simple issue (resolved quickly with rate λ₁) with probability p₁ 2. Have a complex issue (requiring longer service with rate λ₂) with probability p₂ This class conforms to the Distribution protocol and provides methods to sample from a hyperexponential distribution with specified phase probabilities and rates. Lognormal(mean: float, stdev: float, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Lognormal distribution implementation. A continuous probability distribution where the logarithm of a random variable is normally distributed. It is useful for modeling variables that are the product of many small independent factors. This class conforms to the Distribution protocol and provides methods to sample from a lognormal distribution with a specified mean and standard deviation. Normal(mean: float, sigma: float, minimum: Optional[float] = None, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Normal distribution implementation with optional truncation. A continuous probability distribution that follows the Gaussian bell curve. This implementation allows truncating the distribution at a minimum value. This class conforms to the Distribution protocol and provides methods to sample from a normal distribution with specified mean and standard deviation. PearsonV(alpha: float, beta: float, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Pearson Type V distribution implementation (inverse Gamma distribution). Where alpha = shape, and beta = scale (both > 0). Law (2007, pg 293-294) defines the distribution as PearsonV(alpha, beta) = 1/Gamma(alpha, 1/beta) and notes that the PDF is similar to that of lognormal, but has a larger spike close to 0. It can be used to model the time to complete a task. For certain values of the shape parameter the mean and variance can be directly computed: mean = beta / (alpha - 1) for alpha > 1.0 var = beta^2 / (alpha - 1)^2 × (alpha - 2) for alpha > 2.0 This class conforms to the Distribution protocol. Alternative Sources: -------------------- [1] https://riskwiki.vosesoftware.com/PearsonType5distribution.php [2] https://modelassist.epixanalytics.com/display/EA/Pearson+Type+5 Note ---- A good R package for Pearson distributions is PearsonDS https://www.rdocumentation.org/packages/PearsonDS/versions/1.3.0 PearsonVI(alpha1: float, alpha2: float, beta: float, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Pearson Type VI distribution implementation (inverted beta distribution). Where: - alpha1 = shape parameter 1 (> 0) - alpha2 = shape parameter 2 (> 0) - beta = scale (> 0) Law (2007, pg 294-295) notes that PearsonVI can be used to model the time to complete a task. For certain values of the shape parameters, the mean and variance can be directly computed. See functions mean() and var() for details. Sampling: --------- Pearson6(a1,a2,b) = b*X/(1-X), where X=Beta(a1,a2) This class conforms to the Distribution protocol. Sources: -------- [1] https://riskwiki.vosesoftware.com/PearsonType6distribution.php Note ---- A good R package for Pearson distributions is PearsonDS https://www.rdocumentation.org/packages/PearsonDS/versions/1.3.0 Poisson(rate: float, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Poisson distribution implementation. Used to simulate number of events that occur in an interval of time. E.g. number of items in a batch. This class conforms to the Distribution protocol. Sources: -------- Law (2007 pg. 308) Simulation modelling and analysis. RawContinuousEmpirical(data: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Continuous Empirical Distribution for Raw Data using Law and Kelton's method. A distribution that performs linear interpolation between data points according to the algorithm described in Law & Kelton's "Simulation Modeling and Analysis". The implementation follows a two-step approach: 1. Generate U ~ Uniform(0, 1), calculate P = (n-1)U, and I = int(P) + 1 2. Return X_I + (P-I)(X_{I+1} - X_I) This approach ensures proper weighting across intervals and is suitable for both Monte Carlo and discrete-event simulation applications. Maximum and minimum values of the distribution are defined by the data. RawDiscreteEmpirical(values: Union[numpy._typing._array_like._Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], complex, bytes, str, numpy._typing._nested_sequence._NestedSequence[complex | bytes | str]], random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Raw Empirical distribution implementation. Samples with replacement from a list of empirical values. Useful when no theoretical distribution fits the observed data well. This class conforms to the Distribution protocol. Triangular(low: float, mode: float, high: float, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Triangular distribution implementation. A continuous probability distribution with lower limit, upper limit, and mode, forming a triangular-shaped probability density function. This class conforms to the Distribution protocol and provides methods to sample from a triangular distribution with specified parameters. TruncatedDistribution(dist_to_truncate: sim_tools.distributions.Distribution, lower_bound: float) Truncated Distribution implementation. Wraps any distribution conforming to the Distribution protocol and truncates samples at a specified lower bound. No resampling is performed; the class simply ensures no values are below the lower bound. This class itself conforms to the Distribution protocol. Uniform(low: float, high: float, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Uniform distribution implementation. A continuous probability distribution where all values in a range have equal probability of being sampled. This class conforms to the Distribution protocol and provides methods to sample from a uniform distribution between specified low and high values. Weibull(alpha: float, beta: float, location: float = 0.0, random_seed: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Weibull distribution implementation. A continuous probability distribution useful for modeling time-to-failure and similar phenomena. Characterized by shape (alpha) and scale (beta) parameters. This implementation also includes a third parameter "location" (default=0) to shift the distribution if a lower bound is needed. The probability density function (PDF) is: f(x) = (α/β) * ((x-location)/β)^(α-1) * exp(-((x-location)/β)^α) for x ≥ location, where α is the shape parameter and β is the scale parameter. The samples are generated using: X = scale × (-ln(U))^(1/shape) + location where U is a uniform random number between 0 and 1. The Weibull distribution reduces to: - Exponential distribution when shape=1 - Rayleigh distribution when shape=2 - Approximately Normal distribution when shape≈3.4 OnlineStatistics(data: Optional[numpy.ndarray] = None, alpha: Optional[float] = 0.1, observer: Optional[sim_tools.output_analysis.ReplicationObserver] = None) -> None Computes running sample mean and variance using Welford's algorithm. This is a robust and numerically stable approach first described in the 1960s and popularised in Donald Knuth's *The Art of Computer Programming* (Vol. 2). The term *"online"* means each new data point is processed immediately to update statistics, without storing or reprocessing the entire dataset. This implementation additionally supports computation of: - Confidence intervals (CIs). - Percentage deviation of CI half-widths from the mean. Attributes ---------- n : int Number of data points processed so far. x_i : float Most recent data point. mean : float Current running mean. _sq : float Sum of squared differences from the current mean (used for variance). alpha : float Significance level for confidence interval calculations observer : list Registered observers notified upon updates. ReplicationTabulizer() Observer class for recording replication results from an `OnlineStatistics` instance during simulation runs or repeated experiments. Implements the observer pattern to collect statistics after each update from the observed object, enabling later tabulation and analysis. After data collection, results can be exported as a summary dataframe (equivalent Implement as the part of observer pattern. Provides a summary frame to the output of `confidence_interval_method`). Attributes ---------- stdev : list[float] Sequence of recorded standard deviations. lower : list[float] Sequence of recorded lower confidence interval bounds. upper : list[float] Sequence of recorded upper confidence interval bounds. dev : list[float] Sequence of recorded percentage deviations of CI half-width from the mean. cumulative_mean : list[float] Sequence of running mean values. x_i : list[float] Sequence of last observed raw data points. n : int Total number of updates recorded. ReplicationsAlgorithm(alpha: Optional[float] = 0.05, half_width_precision: Optional[float] = 0.1, initial_replications: Optional[int] = 3, look_ahead: Optional[int] = 5, replication_budget: Optional[float] = 1000, verbose: Optional[bool] = False, observer_factory: Optional[Callable[[], sim_tools.output_analysis.AlgorithmObserver]] = None) Automatically determine the number of simulation replications needed to achieve and maintain a target confidence interval precision. Implements the *Replications Algorithm* from Hoad, Robinson & Davies (2010), which combines: - The **confidence interval method** to assess whether the target precision has been met. - A **sequential look-ahead procedure** to verify that precision remains stable in additional replications. Attributes ---------- alpha : float Significance level for confidence interval calculations. half_width_precision : float Target CI half-width precision (i.e. percentage deviation of the confidence interval from the mean). initial_replications : int Number of replications to run before evaluating precision. look_ahead : int Number of additional replications to simulate for stability checks (adjusted proportionally when `n > 100`). replication_budget : int Maximum number of replications allowed. verbose : bool If True, prints the current replication count during execution. observer_factory : callable or None Callable returning a new observer instance for each metric. Should be a function, lambda, or class constructor taking no arguments. Returned object must follow the `AlgorithmObserver` protocol. If None, uses `ReplicationTabulizer`. n : int Current replication count (updated during execution). _n_solution : int Solution replication count once convergence is met (or replication budget if not met). stats : OnlineStatistics or None Tracks running mean, variance, and confidence interval metrics. References ---------- Hoad, K., Robinson, S., & Davies, R. (2010). Automated selection of the number of replications for a discrete-event simulation. *Journal of the Operational Research Society*, 61(11), 1632-1644. https://www.jstor.org/stable/40926090 DistributionRegistry() Registry for probability distribution classes with batch creation capabilities. The DistributionRegistry provides a central repository for registering distribution classes and instantiating them from configuration data. This facilitates dynamic creation of distribution objects based on runtime configuration, supporting scenarios like simulation models, statistical analysis, or any application requiring configurable random distributions. Key features: - Register distribution classes with a simple decorator - Create distribution instances from class names and parameters - Batch create multiple distributions from a dictionary or list - Automatic generation of statistically independent random seeds Examples -------- Register distribution classes: >>> @DistributionRegistry.register() ... class Exponential: ... def __init__(self, mean, random_seed=None): ... self.rng = np.random.default_rng(random_seed) ... self.mean = mean ... ... def sample(self, size=None): ... return self.rng.exponential(self.mean, size=size) ... >>> @DistributionRegistry.register("uniform_dist") # Custom name ... class Uniform: ... def __init__(self, low, high, random_seed=None): ... self.rng = np.random.default_rng(random_seed) ... self.low = low ... self.high = high ... ... def sample(self, size=None): ... return self.rng.uniform(self.low, self.high, size=size) Create a distribution with parameters: >>> exp_dist = DistributionRegistry.create("Exponential", mean=2.0) >>> exp_dist.sample(5) # Generate 5 samples Create multiple distributions from a flat configuration: >>> config = { ... "arrivals": { ... "class_name": "Exponential", ... "params": {"mean": 5.0} ... }, ... "service_times": { ... "class_name": "uniform_dist", ... "params": {"low": 1.0, "high": 3.0} ... } ... } >>> distributions = DistributionRegistry.create_batch(config, ... main_seed=12345) >>> arrivals = distributions["arrivals"] >>> service_times = distributions["service_times"] Create distributions from a nested configuration: >>> config = { ... "call": { ... "C1": {"class_name": "Exponential", "params": {"mean": 10.0}}, ... "C2": {"class_name": "Exponential", "params": {"mean": 8.0}}, ... }, ... "response_time": { ... "C1": { ... "class_name": "Lognormal", ... "params": {"mean": 30.0, "stdev": 5.0}, ... }, ... }, ... } >>> dists = DistributionRegistry.create_batch( ... config, main_seed=42, preserve_structure=True ... ) >>> dists["call"]["C1"].sample(5) >>> dists["response_time"]["C1"].sample(5) Notes ----- When creating distributions in batch with a main_seed, each distribution receives its own statistically independent seed derived from the main seed. This ensures proper statistical independence between random number streams while maintaining overall reproducibility through the main seed. When `preserve_structure=True`, the output mirrors the shape of the input configuration, so nested configs produce nested dicts of instances. When `preserve_structure=False` (default), all leaf distributions are flattened into a single dict whose keys are the path components joined by underscores (e.g. `"call_C1"`). With `sort=True` (default), keys at every nesting level are sorted alphabetically before seeds are assigned. This ensures that seed assignment is stable even if the insertion order of keys in the config dict changes between runs. NSPPThinning(data: pandas.DataFrame, interval_width: Optional[float] = None, random_seed1: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None, random_seed2: Union[int, numpy.random.bit_generator.SeedSequence, NoneType] = None) Non Stationary Poisson Process via Thinning. Thinning is an acceptance-rejection approach to sampling inter-arrival times (IAT) from a time-dependent distribution where each time period follows its own exponential distribution. This implementation takes mean inter-arrival times as inputs, making it consistent with NumPy's exponential distribution parameterization. ## Abstract Classes Abstract base classes ## Protocols Protocol / structural-typing interfaces Distribution(*args, **kwargs) Distribution protocol defining the interface for probability distributions. Any class implementing this protocol should provide a sampling mechanism that generates random values according to a specific probability distribution. AlgorithmObserver(*args, **kwargs) Protocol for observer classes used in ReplicationsAlgorithm. Classes implementing this protocol should provide a `dev` attribute to store observations, a method `update` to add new results, and a `summary_table` method to summarize the stored replication statistics. Attributes ---------- dev : List[Any] Collection of observed replication results. Methods ------- update(results) -> None Add an observation of a replication. summary_table() -> pd.DataFrame Create a DataFrame summarising all recorded replication statistics. ReplicationObserver(*args, **kwargs) Interface (protocol) for observers that track simulation replication results. ReplicationsAlgorithmModelAdapter(*args, **kwargs) Adapter pattern for the "Replications Algorithm". All models that use ReplicationsAlgorithm must provide a single_run(replication_number) interface. ## Functions Utility functions load_banks_et_al_nspp() -> pandas.DataFrame Load example Non-stationary poisson process data from Banks et al. The function reads in the mean inter-arrival times by interval. The arrival rate (1/mean_iat) is calculated. Returns ------- pandas.DataFrame is_integer(value: Any, name: str) -> None Validates that a value is an integer. is_non_negative(value: Any, name: str) -> None Validates that a value is greater than or equal to 0. is_numeric(value: Any, name: str) -> None Validates that a value is a number (int or float). is_ordered_pair(low: Any, high: Any, low_name: str = 'low', high_name: str = 'high') -> None Validates that two values are in ascending order (low < high). is_ordered_triplet(low: Any, middle: Any, high: Any, low_name: str = 'low', middle_name: str = 'middle', high_name: str = 'high') -> None Validates that three values are in ascending order. is_positive(value: Any, name: str) -> None Validates that a value is positive (> 0). is_positive_array(value: numpy.ndarray, name: str) -> None Validates that all elements in the array are positive. is_probability(value: Any, name: str) -> None Validates that a value is a valid probability (between 0 and 1). is_probability_vector(value: numpy.ndarray, name: str) -> None Validates that the array is a valid probability vector. spawn_seeds(n_streams: int, main_seed: Optional[int] = None) Generate multiple statistically independent random seeds. This function creates a set of SeedSequence objects that are guaranteed to produce independent streams of random numbers. This is crucial for ensuring that multiple random number generators don't produce correlated outputs, which could bias simulation results. Parameters ---------- n_streams : int The number of independent seed sequences to generate. Must be a positive integer. main_seed : Optional[int], default=None Master seed that determines all generated sequences. If None, a random entropy source is used, making results non-reproducible across runs. Providing a value enables reproducible sequences. Returns ------- List[np.random.SeedSequence] A list of n_streams SeedSequence objects that can be used to initialize random number generators with independent streams. Notes ----- This approach is preferred over manually creating seeds because it uses NumPy's entropy pool management to guarantee statistical independence between streams, avoiding subtle correlations that might occur with manually chosen seeds. Examples -------- >>> seeds = spawn_seeds(3, main_seed=12345) >>> rng1 = np.random.default_rng(seeds[0]) >>> rng2 = np.random.default_rng(seeds[1]) >>> rng3 = np.random.default_rng(seeds[2]) validate(value: Any, name: str, *validators: Callable[[Any, str], NoneType]) -> None Applies multiple validators to a value. confidence_interval_method(replications: Union[pandas.Series, pandas.DataFrame, Sequence[float], Sequence[Sequence[float]], Dict[str, Sequence[float]]], alpha: Optional[float] = 0.05, desired_precision: Optional[float] = 0.1, min_rep: Optional[int] = 5, decimal_places: Optional[int] = 2) Determine the minimum number of simulation replications required to achieve a target precision in the confidence interval of one or several performance metrics. This function applies the **confidence interval method**: it identifies the smallest replication count where the relative half-width of the confidence interval is less than the specified `desired_precision` for each metric. Parameters ---------- replications: array-like, pd.Series, pd.DataFrame, list, or dict Replication results for one or more performance metrics. Accepted formats: - `pd.Series` or 1D list/numpy array → single metric - `pd.DataFrame` → multiple metrics in columns - `dict[str, list/array/Series]` → {metric_name: replications} - list of lists / numpy arrays / Series → multiple metrics unnamed Each inner sequence/Series/numpy array must contain numeric replication results in the order they were generated. alpha: float, optional (default=0.05) Significance level for confidence interval calculations (CI level = 100 * (1 - alpha) %). desired_precision: float, optional (default=0.1) Target CI half-width precision (i.e. percentage deviation of the confidence interval from the mean). min_rep: int, optional (default=5) Minimum number of replications to consider before evaluating precision. Helps avoid unstable early results. decimal_places: int, optional (default=2) Number of decimal places to round values in the returned results table. Returns ------- - Single-metric input → tuple `(n_reps, results_df)` - Multi-metric input → dict: `{metric_name: (n_reps, results_df)}` Where: n_reps : int The smallest number of replications achieving the desired precision. Returns -1 if precision is never reached. results_df : pandas.DataFrame Summary statistics at each replication: "Mean", "Cumulative Mean", "Standard Deviation", "Lower Interval", "Upper Interval", "% deviation" Warns ----- UserWarning Issued per metric if the desired precision is never reached. plotly_confidence_interval_method(n_reps, conf_ints, metric_name, figsize=(1200, 400), shaded=True) Create an interactive Plotly visualisation of the cumulative mean and confidence intervals for each replication. This plot displays: - The running (cumulative) mean of a performance metric. - Lower and upper bounds of the confidence interval at each replication. - Annotated deviation (as % of mean) on hover. - A vertical dashed line at the minimum number of replications (`n_reps`) required to achieve the target precision. Parameters ---------- n_reps: int Minimum number of replications needed to achieve desired precision (typically the output of `confidence_interval_method`). conf_ints: pandas.DataFrame Results DataFrame from `confidence_interval_method`, containing columns: `"Cumulative Mean"`, `"Lower Interval"`, `"Upper Interval"`, etc. metric_name: str Name of the performance metric displayed in the y-axis label. figsize: tuple, optional (default=(1200,400)) Figure size in pixels: (width, height). shaded: bool, optional If True, use shaded CI region. If False, use dashed lines (legacy). Returns ------- plotly.graph_objects.Figure nspp_plot(arrival_profile: pandas.DataFrame, run_length: Optional[float] = None, n_reps: Optional[int] = 1000) -> Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes] Generate a matplotlib chart to visualise a non-stationary poisson process for the set arrival profile. This uses the sim-tools NSPPThinning class. Useful for validating the the NSPP has been set up correctly and is producing the desired profile for the simulation model. Parameters ---------- arrival_profile: pandas.DataFrame The arrival profile is a pandas data frame containing 't', 'arrival_rate' and 'mean_iat' columns. run_length: float, optional (default=None) How long should the simulation be run. If none then uses the last value in 't' + the interval (assumes equal width intervals) n_reps: int, optional (default=1000) The number of replications to run. nspp_simulation(arrival_profile: pandas.DataFrame, run_length: Optional[float] = None, n_reps: Optional[int] = 1000) -> pandas.DataFrame Generate a pandas dataframe that contains multiple replications of a non-stationary poisson process for the set arrival profile. This uses the sim-tools NSPPThinning class. Useful for validating the the NSPP has been set up correctly and is producing the desired profile for the simulation model. On each replication the function counts the number of arrivals during the intervals from the arrival profile. Returns a data frame with reps (rows) and interval arrivals (columns). Parameters ---------- arrival_profile: pandas.DataFrame The arrival profile is a pandas data frame containing 't', 'arrival_rate' and 'mean_iat' columns. run_length: float, optional (default=None) How long should the simulation be run. If none then uses the last value in 't' + the interval (assumes equal width intervals) n_reps: int, optional (default=1000) The number of replications to run. Returns ------- pd.DataFrame. ## Constants Module-level constants and data datasets.FILE_NAME_NSPP_1 str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'. datasets.PATH_NSPP_1 Path subclass for non-Windows systems. On a POSIX system, instantiating a Path should return this object. T Type variable. Usage:: T = TypeVar('T') # Can be anything A = TypeVar('A', str, bytes) # Must be str or bytes Type variables exist primarily for the benefit of static type checkers. They serve as the parameters for generic types as well as for generic function definitions. See class Generic for more information on generic types. Generic functions work as follows: def repeat(x: T, n: int) -> List[T]: '''Return a list containing n references to x.''' return [x]*n def longest(x: A, y: A) -> A: '''Return the longest of two strings.''' return x if len(x) >= len(y) else y The latter example's signature is essentially the overloading of (str, str) -> str and (bytes, bytes) -> bytes. Also note that if the arguments are instances of some subclass of str, the return type is still plain str. At runtime, isinstance(x, T) and issubclass(C, T) will raise TypeError. Type variables defined with covariant=True or contravariant=True can be used to declare covariant or contravariant generic types. See PEP 484 for more details. By default generic types are invariant in all type variables. Type variables can be introspected. e.g.: T.__name__ == 'T' T.__constraints__ == () T.__covariant__ == False T.__contravariant__ = False A.__constraints__ == (str, bytes) Note that only type variables defined in global scope can be pickled. output_analysis.ALG_INTERFACE_ERROR str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'. output_analysis.OBSERVER_INTERFACE_ERROR str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'. ## Other Additional exports sim_tools.ovs ---------------------------------------------------------------------- This is the User Guide documentation for the package. ---------------------------------------------------------------------- ### User Guide The User Guide is organised into four themes: * **Sampling:** Generate reproducible random input for your models. * **Debugging:** Lightweight tracing to help you understand and fix simulation logic. * **Output analysis:** Tools to help you plan the number of replications. * **Optimisation:** Optimisation via simulation procedures. ### Theoretical distributions This notebook provides an overview of how to use the `Distribution` classes from the `sim_tools.distributions` module. The classes are a higher level abstract of sampling from the numpy library. They encapsulate the distribution parameters, a random number generator instance, and a random seed. For example, the code for the Exponential class is: ```{.python} class Exponential: """ Exponential distribution implementation. A probability distribution that models the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. This class conforms to the Distribution protocol and provides methods to sample from an exponential distribution with a specified mean. """ def __init__( self, mean: float, random_seed: Optional[Union[int, SeedSequence]] = None, ): """ Initialize an exponential distribution. Parameters ---------- mean : float The mean of the exponential distribution. Must be positive. random_seed : Optional[Union[int, SeedSequence]], default=None A random seed or SeedSequence to reproduce samples. If None, a unique sample sequence is generated. """ self.rng = np.random.default_rng(random_seed) self.mean = mean def sample( self, size: Optional[Union[int, Tuple[int, ...]]] = None ) -> Union[float, NDArray[np.float64]]: """ Generate random samples from the exponential distribution. Parameters ---------- size : Optional[Union[int, Tuple[int, ...]]], default=None The number/shape of samples to generate: - If None: returns a single sample as a float - If int: returns a 1-D array with that many samples - If tuple of ints: returns an array with that shape Returns ------- Union[float, NDArray[np.float64]] Random samples from the exponential distribution: - A single float when size is None - A numpy array of floats with shape determined by size parameter """ return self.rng.exponential(self.mean, size=size) ``` Here we can see that the `__init__` method accepts the distribution mean plus a random seed to control sampling from a `np.random.Generator` object. The creation of the generator is done by the super class `Distribution` The `sample` method delegates work to the `np.random.Generator` object. The abstraction into a class enables the distribution to be used as a simple parameter in a model. For some classes there are no direct mappings to `numpy`. Samples from these distributions are typically derived from other distributions. For example, `PearsonV` is an inverse gamma distribution. ```{.python} class PearsonV: """ Pearson Type V distribution implementation (inverse Gamma distribution). Where alpha = shape, and beta = scale (both > 0). Law (2007, pg 293-294) defines the distribution as PearsonV(alpha, beta) = 1/Gamma(alpha, 1/beta) and notes that the PDF is similar to that of lognormal, but has a larger spike close to 0. It can be used to model the time to complete a task. For certain values of the shape parameter the mean and variance can be directly computed: mean = beta / (alpha - 1) for alpha > 1.0 var = beta^2 / (alpha - 1)^2 × (alpha - 2) for alpha > 2.0 This class conforms to the Distribution protocol. Alternative Sources: -------------------- [1] https://riskwiki.vosesoftware.com/PearsonType5distribution.php [2] https://modelassist.epixanalytics.com/display/EA/Pearson+Type+5 Notes: ------ A good R package for Pearson distributions is PearsonDS https://www.rdocumentation.org/packages/PearsonDS/versions/1.3.0 """ def __init__( self, alpha: float, beta: float, random_seed: Optional[Union[int, SeedSequence]] = None, ): """ Initialize a Pearson Type V distribution. Parameters ---------- alpha : float Shape parameter. Must be > 0. beta : float Scale parameter. Must be > 0. random_seed : Optional[Union[int, SeedSequence]], default=None A random seed or SeedSequence to reproduce samples. If None, a unique sample sequence is generated. Raises ------ ValueError If alpha or beta are not positive. """ if alpha <= 0 or beta <= 0: raise ValueError("alpha and beta must be > 0") self.rng = np.random.default_rng(random_seed) self.alpha = alpha # shape self.beta = beta # scale def mean(self) -> float: """ Calculate the mean of the Pearson Type V distribution. Returns ------- float The theoretical mean of this distribution. Raises ------ ValueError If alpha <= 1.0, as the mean is not defined in this case. """ if self.alpha > 1.0: return self.beta / (self.alpha - 1) msg = "Cannot directly compute mean when alpha <= 1.0" raise ValueError(msg) def var(self) -> float: """ Calculate the variance of the Pearson Type V distribution. Returns ------- float The theoretical variance of this distribution. Raises ------ ValueError If alpha <= 2.0, as the variance is not defined in this case. """ if self.alpha > 2.0: return (self.beta**2) / ( ((self.alpha - 1) ** 2) * (self.alpha - 2) ) msg = "Cannot directly compute var when alpha <= 2.0" raise ValueError(msg) def sample( self, size: Optional[Union[int, Tuple[int, ...]]] = None ) -> Union[float, NDArray[np.float64]]: """ Generate random samples from the Pearson Type V distribution. Parameters ---------- size : Optional[Union[int, Tuple[int, ...]]], default=None The number/shape of samples to generate: - If None: returns a single sample as a float - If int: returns a 1-D array with that many samples - If tuple of ints: returns an array with that shape Returns ------- Union[float, NDArray[np.float64]] Random samples from the Pearson Type V distribution: - A single float when size is None - A numpy array of floats with shape determined by size parameter """ return 1 / self.rng.gamma(self.alpha, 1 / self.beta, size) ``` ## Summary of implemented distributions 1. **Exponential**: This class is used for modeling the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. It takes the mean of the distribution and an optional random seed as parameters. 2. **Bernoulli**: This class models a process where an event has only two possible outcomes (success or failure). It takes the probability of success and an optional random seed as parameters. 3. **Lognormal**: This class is used when the logarithm of a variable is normally distributed. It takes the mean and standard deviation of the lognormal distribution and an optional random seed as parameters. 4. **Normal**: This class models a process where a variable is normally distributed. It takes the mean and standard deviation of the normal distribution, a boolean to allow or disallow negative samples, and an optional random seed as parameters. 5. **Uniform**: This class models a process where all outcomes are equally likely. It takes the lower and upper range of the uniform distribution and an optional random seed as parameters. 6. **Triangular**: This class models a process where a variable is most likely to be at the mode (peak) and the probabilities decrease uniformly on either side. It takes the lower limit, mode, upper limit, and an optional random seed as parameters. 7. **FixedDistribution**: This class returns a fixed value, useful when a constant value is needed in a simulation. It takes a fixed value as a parameter. 8. **CombinationDistribution**: This class is used to model a process where the outcome is the sum of outcomes from multiple distributions. It takes multiple distribution instances as parameters. 9. **GroupedContinuousEmpirical**: This class is used to model a process based on observed data, where the probability of an outcome is proportional to its frequency in the data. To produce a continuous value, linear interpolation between bounds is used. It takes lower bounds (of a discrete empirical distribution), upper bounds, observed frequencies, and an optional random seed as parameters. 10. **Erlang**: This class is used for modeling the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. The class is implemented to allow for users to input the mean, and stdev of the distribution as opposed to k and theta. This allows for compatibility of parameters with other commerical simulation packages. The mean and stdev are converted to k and theta internally. The Erlang is a special case of the gamma distribution where is an integer. Internally this is implemented using numpy Generator's gamma method. As k is calculated from the mean and stdev it is rounded to an integer value using python's built in 'round' function. Optionally a user can offet the original of the distribution using the location parameter. 11. **ErlangK**: The same as **Erlang**, but `k` and `theta` are specified directly by a user. 12. The **Weibull** distribution class is characterized by a shape parameter (alpha) and a scale parameter (beta), both of which must be greater than zero. The scale parameter influences the variance of the samples. An optional location parameter can shift the distribution to set a lower bound. This distribution is useful for modeling the life duration of materials and systems, where the failure rate is not constant over time[1]. 13. The **Gamma** distribution class is defined by an alpha (scale) and beta (shape) parameter. It includes methods to compute the mean and variance, as well as a static method to determine alpha and beta from a specified mean and variance. The Gamma distribution is often used to model waiting times for multiple events in a Poisson process. 14. The **Beta** distribution class accepts two shape parameters (alpha1 and alpha2) and can be rescaled to a specific range using lower and upper bounds. It is commonly used to model random variables that are constrained to an interval, such as proportions and percentages. 15. **DiscreteEmpirical** distribution class allows sampling from a set of values with given observed frequencies. It is particularly useful for modeling scenarios where outcomes have specific probabilities, such as routing percentages or entity classes. 16. **TruncatedDistribution** class can truncate any given distribution at a specified lower bound. It is useful when modeling variables that have a natural lower limit, ensuring that no sampled values fall below this threshold. 17. The **RawDiscreteEmpirical** distribution class enables sampling with replacement from a list of raw empirical values. This class is beneficial when no theoretical distribution fits the observed data well, and a non-parametric approach is preferred. 18. The **PearsonV** (Pearson type 5) distribution, also known as the inverse Gamma distribution, is characterized by two parameters: alpha (shape) and beta (scale), both of which must be greater than zero. It is particularly useful for modeling time delays where there is a minimum delay and the maximum delay is unbounded. The mean and variance of the Pearson Type V distribution can be directly computed for certain values of the shape parameter. 19. The **PearsonVI** (Pearson Type 6) distribution, an inverted beta distribution, is defined by three parameters: alpha1 (shape parameter 1), alpha2 (shape parameter 2), and beta (scale parameter), all of which must be greater than zero. The Pearson Type VI distribution can be used to model the time to complete tasks and is sampled using the transformation Pearson6(a1, a2, b) = b * X / (1 - X), where X is a Beta-distributed random variable with parameters alpha1 and alpha2. The mean and variance can be computed directly for certain values of the second shape parameter. 20. The **Hyperexoponential** distribution. A continuous probability distribution that is a mixture (weighted sum) of exponential distributions. It has a higher coefficient of variation than a single exponential distribution, making it useful for modeling highly variable processes or heavy-tailed phenomena. ## Example output The code below creates an instance of the distributions on offer. Here we used the `size` parameter in `sample()` to produce a large number of samples to allow for visualisation. ```{python} # pylint: disable=missing-module-docstring,invalid-name # Import package to generate plots import matplotlib.pyplot as plt # Import a selection of distributions from sim_tools.distributions import ( Exponential, Poisson, Bernoulli, Lognormal, Normal, Uniform, Triangular, FixedDistribution, CombinationDistribution, GroupedContinuousEmpirical, Erlang, ErlangK, Weibull, Gamma, Beta, DiscreteEmpirical, PearsonV, PearsonVI, Hyperexponential, ) ``` ### Create plots ```{python} # Create an instance of each distribution and generate 10,000 samples distributions = [ Exponential(mean=1.0, random_seed=42), Poisson(rate=1.0, random_seed=42), Erlang(mean=32.0, stdev=9.26, location=2.83, random_seed=42), ErlangK(k=3, theta=1.0, location=0.0, random_seed=42), Bernoulli(p=0.3, random_seed=42), Lognormal(mean=10, stdev=1, random_seed=42), Normal(mean=0, sigma=1, minimum=None, random_seed=42), Uniform(low=0, high=1, random_seed=42), Triangular(low=0, mode=0.5, high=1, random_seed=42), FixedDistribution(value=5), CombinationDistribution(Exponential(mean=1.0), Normal(mean=0, sigma=1)), Hyperexponential([0.5, 0.5], [1.0, 0.2], random_seed=42), Weibull(alpha=1.93, beta=19.15, random_seed=42), Gamma(alpha=2.84, beta=7.42, location=0.0, random_seed=42), Beta(alpha1=1.32, alpha2=2.56, lower_bound=10.0, upper_bound=15.0, random_seed=42), DiscreteEmpirical(values=[1, 2, 3], freq=[95, 3, 2], random_seed=42), PearsonV(alpha=10.0, beta=2.5, random_seed=42), PearsonVI(alpha1=10.0, alpha2=5.0, beta=0.85, random_seed=42), ] fig, axs = plt.subplots(len(distributions) // 2, 2, figsize=(15, 15)) for i, distribution in enumerate(distributions): row = i // 2 col = i % 2 samples = distribution.sample(size=100_000) axs[row, col].hist(samples, bins=30, alpha=0.7) axs[row, col].set_title(distribution.__class__.__name__) fig.tight_layout() plt.show() ``` ## Single samples ```{python} arrival_dist = Exponential(mean=30, random_seed=42) arrival_dist.sample() ``` ```{python} activity_time = GroupedContinuousEmpirical( lower_bounds=[0, 5, 10, 15, 30, 45, 60, 120, 180, 240, 480], upper_bounds=[5, 10, 15, 30, 45, 60, 120, 180, 240, 480, 2880], freq=[34, 4, 8, 13, 15, 13, 19, 13, 9, 12, 73], random_seed=42, ) activity_time.sample() ``` ## Overview This notebook demonstrates how to use the `DistributionRegistry` class for managing probability distributions in `sim-tools`. The `DistributionRegistry` aims to support simulation modellers set their models up to easily be parameterised with statistical distributions. It offers the following features: - **Configuration flexibility**: Define distributions through simple dictionaries or JSON files - **Reproducibility**: Generate statistically independent random streams with controlled seeds - **Extensibility**: Easily add new distribution types without changing existing code - **Batch creation**: Create multiple related distributions with a single command The approach is useful for building discrete event simulations, Monte Carlo analyses, statistical models, or any system that requires configurable random behavior. The registry pattern provides a clean, maintainable approach that separates configuration from implementation. In this notebook, we'll cover: 1. Importing the `DistributionRegistry`. 2. Creating distributions from configuration data. 3. Working with JSON configurations. 4. Define and add your own distribution to the registry. 5. Using the `sort` argument in `create_batch`. ## 1. Importing The `DistributionRegistry` is part of the `distributions` module. You can think of the registry as a factory that takes distribution orders and delivers ready to use distribution objects. We do not need to create an instance of the `DistributionRegistry`. Instead we will use its class methods. ```{python} # pylint: disable=missing-module-docstring,invalid-name import json import numpy as np from sim_tools.distributions import DistributionRegistry ``` ## 2. Creating batches of distributions from configuration data `DistributionRegistry` allows a user to create a batch of statistical distributions in one go using `create_batch`. A user can either pass in a `dict` or a `list`. > `DistributionRegistry` sets up non-overlapping pseudo random number streams for the distributions using the `distributions.spawn_seeds()` function. For reproducibile results a user should se the `main_seed` parameter of `create_batch`. ```{python} # Dictionary-based configuration (named distributions) config_dict = { "customer_arrivals": {"class_name": "Exponential", "params": {"mean": 4.5}}, "service_times": {"class_name": "Normal", "params": {"mean": 10.0, "sigma": 2.0}}, "satisfaction_scores": { "class_name": "Triangular", "params": {"low": 1.0, "high": 10.0, "mode": 8.0}, }, } # Create all distributions with a master seed distributions = DistributionRegistry.create_batch(config_dict, main_seed=12345) # Access distributions by name arrivals = distributions["customer_arrivals"] service = distributions["service_times"] satisfaction = distributions["satisfaction_scores"] print("Created distributions:") for name, dist in distributions.items(): print(f"- {name}: {dist}") # Generate samples from all distributions for name, dist in distributions.items(): print(f"{name} samples: {dist.sample()}") ``` ```{python} # List-based configuration (unnamed distributions) config_list = [ {"class_name": "Exponential", "params": {"mean": 2.0}}, {"class_name": "Uniform", "params": {"low": 0.0, "high": 1.0}}, ] # Create distributions from the list dist_list = DistributionRegistry.create_batch(config_list, main_seed=54321) # Access by index print("\nList-based distributions:") for i, dist in enumerate(dist_list): print(f"Distribution {i}: {dist}") print(f"Sample: {dist.sample()}") ``` ## 3. Working with JSON Configurations JSON is a natural format for storing distribution configurations for your model. First we will create an example JSON file. ```{python} # Example JSON configuration json_config = """ { "simulation_parameters": { "customer_arrivals": { "class_name": "Exponential", "params": {"mean": 5.0} }, "checkout_times": { "class_name": "Lognormal", "params": {"mean": 3.5, "stdev": 0.8} }, "browse_duration": { "class_name": "Triangular", "params": {"low": 2.0, "high": 45.0, "mode": 15.0} }, "purchase_amount": { "class_name": "Normal", "params": {"mean": 75.0, "sigma": 2.0} } } } """ # Parse the JSON config_data = json.loads(json_config) # write JSON to file with open("example_sim_config.json", "w", encoding="utf-8") as f: json.dump(json_config, f, indent=4) ``` Let's assume we want to run out model with the distributions specified in the JSON file. We need to 1. Load and parse the JSON file 2. Extract the simulation parameters 3. Batch create the distributions ```{python} # read in file with open("example_sim_config.json", "r", encoding="utf-8") as f: config = json.load(f) # Extract the distributions configuration distributions_config = config_data["simulation_parameters"] # Create all distributions with reproducible seed distributions = DistributionRegistry.create_batch(distributions_config, main_seed=42) # Now we can use these distributions in a simulation print("JSON-configured distributions:") for name, dist in distributions.items(): print(f"- {name}: {dist}") print(f" Sample: {dist.sample()}") ``` ### Using the built in JSON template `DistributionRegistry` also contains a built in template to enable quick setup of the distributions you want to use in your model. Access it via the `get_template()` function. Set `format` to `"json"`. > Note: The template contains all distributions in the registry with example parameters. An example of each type is included once. You will need to modify it before passing to `batch_create`. ```{python} # Generate a JSON template template_json = DistributionRegistry.get_template(format="json") print("\nJSON Template (first 200 characters):") print(template_json[:200] + "...") ``` ```{.python} # You can save the template to a file for reference with open("distribution_template.json", "w", encoding="utf-8") as f: f.write(template_json) ``` ## 4. Define and add your own distribution to the registry If `sim_tools` does not contain the distribution you need you can still make use of the `DistributionRegistry` by registering your own custom class. Instances of the class can then be created in the same way as standard `sim_tools` distributions. ```{python} # Define and register common probability distributions @DistributionRegistry.register() class CustomDistribution: """My own custom distribution""" def __init__(self, param1, param2, random_seed=None): """Initialise""" self.param1 = param1 self.param2 = param2 self.rng = np.random.default_rng(random_seed) def __repr__(self): """Print parameters.""" return f"CustomDistribution({self.param1}, {self.param2})" def sample(self): """Sample - replace with any sampling mechanism.""" return 1.0 ``` ```{python} # Dictionary-based configuration (named distributions) config_dict = { "customer_arrivals": {"class_name": "Exponential", "params": {"mean": 4.5}}, "service_times": { "class_name": "CustomDistribution", "params": {"param1": 10.0, "param2": 2.0}, }, "treatment_times": { "class_name": "CustomDistribution", "params": {"param1": 25.0, "param2": 3.0}, }, } # Create all distributions with a master seed distributions = DistributionRegistry.create_batch(config_dict, main_seed=12345) print("Created distributions:") for name, dist in distributions.items(): print(f"- {name}: {dist}") # Generate samples from all distributions for name, dist in distributions.items(): print(f"{name} samples: {dist.sample()}") ``` ## 5. `sort` argument in `DistributionRegistry.create_batch()` The `sort` argument controls whether named distributions (when a config dictionary is used) are ordered by key before assigning random seeds. If `sort=True` (default), distributions always get the same seeds as long as their names/definitions are unchanged, regardless of insertion order - improving reproducibility when config order is unstable (e.g., from JSON files). If `sort=False`, seeds are based directly on the config's item order—this can result in different streams if dictionary ordering changes. **Example:** ```{python} config = { "B": {"class_name": "Exponential", "params": {"mean": 2.0}}, "A": {"class_name": "Normal", "params": {"mean": 10.0, "sigma": 1.0}}, } # Example with sort=True (default) dists_sorted = DistributionRegistry.create_batch(config, main_seed=42, sort=True) print("Sort=True: ", dists_sorted) # Example with sort=False (order depends on dict) dists_unsorted = DistributionRegistry.create_batch(config, main_seed=42, sort=False) print("Sort=False: ", dists_unsorted) ``` **Why use `sort=True`?** * **Ensures reproducibility**: named distributions always get the same random seeds, regardless of how the configuration dictionary is ordered. * **Avoids subtle, hard-to-diagnose bugs**: Without sorting, changing the order of keys (e.g., by editing a JSON file) can silently change the random seeds, and thus your simulation results, unless you use `sort=True`. ### Empirial Distributions In many applications of simulation, theoretical distributons may not be a good fit to the data; for example, if the data are bimodal. A number of empirical distribution options are provided by `sim-tools` to support modelling in these situations * `RawContinuousEmpirical` * `RawDiscreteEmpirical` * `GroupedContinuousEmpirical` * `DiscreteEmpirical` ```{python} # pylint: disable=missing-module-docstring,invalid-name import math import numpy as np import pandas as pd import plotly.graph_objects as go from plotly.subplots import make_subplots from sim_tools.distributions import ( RawContinuousEmpirical, GroupedContinuousEmpirical, RawDiscreteEmpirical, DistributionRegistry, ) ``` ## 1. Sampling from the raw data with interpolation If a user has access to the raw sample data they can pass this to `RawContinuousEmpirical`. The class implements the method described in Law and Kelton's *Simulation Modeling and Analysis*. The approach creates a piecewise linear model where interpolation is used for gaps between data samples. The maximum and minimum values of the sample are the bounds of the distribution. ### 1.1 A simple example The example below illustrates the difference between the standard Empirical Cumulative Distribution Function (a stepped function) and Law and Kelton's method of linear interpolation between points that is used in the sampling. ```{python} # Generate a simple dataset with 10 values representing length of stay in days # Values chosen to be spread out to clearly show linear lines between points simple_los = np.array([1.0, 3.5, 5.0, 7.2, 10.0, 12.5, 15.0, 18.0, 22.0, 30.0]) simple_dist = RawContinuousEmpirical(simple_los) # plot the standard ECDF fig = simple_dist.plotly_ecdf_standard(showlegend=False) # Display the plot fig.show() ``` ```{python} # Create the empirical distribution simple_dist = RawContinuousEmpirical(data=simple_los, random_seed=42) # plot the linear interpolation fig = simple_dist.plotly_ecdf_linear_interpolation() # Display the plot fig.show() ``` ### 1.2 More complex data We will generate four datasets to illustrate the distributon in use. One is approx uniform , one right skewed (exponential), one bimodal (a mix of two normal distributions) and finally one with outliers. ```{python} # Create a dictionary of distribution configurations for batch creation dist_configs = { # 1. Basic uniform distribution for Hospital A "uniform_dist": {"class_name": "Uniform", "params": {"low": 0.5, "high": 45.5}}, # 2. Exponential distribution for Hospital B "exponential_dist": {"class_name": "Exponential", "params": {"mean": 5}}, # 3. Normal distributions for bimodal Hospital C "normal_short": {"class_name": "Normal", "params": {"mean": 3, "sigma": 1}}, "normal_long": {"class_name": "Normal", "params": {"mean": 20, "sigma": 6}}, # 4. Distributions for the realistic Hospital D "lognormal_dist": { "class_name": "Lognormal", "params": {"mean": 5.4, "stdev": 3.5}, }, "uniform_outliers": {"class_name": "Uniform", "params": {"low": 30, "high": 45}}, } # Create distributions using the DistributionRegistry create_batch method # Setting only the main_seed parameter - the class handles individual seeds # automatically distributions = DistributionRegistry.create_batch(dist_configs, main_seed=42) # Sample from distributions to create our hospital length of stay data # 1. Hospital A - uniform distribution los_uniform = distributions["uniform_dist"].sample(100) # 2. Hospital B - exponential distribution with offset and clipping los_skewed = distributions["exponential_dist"].sample(100) + 0.5 los_skewed = np.clip(los_skewed, 0.5, 45.5) # 3. Hospital C - bimodal distribution (combining two normal distributions) los_bimodal = np.concatenate( [ # Short stays centered around 3 days distributions["normal_short"].sample(70), # Longer stays centered around 20 days distributions["normal_long"].sample(30), ] ) los_bimodal = np.clip(los_bimodal, 0.5, 45.5) # Clip to our desired range # 4. Hospital D - realistic distribution with outliers los_realistic = np.concatenate( [ # Main distribution distributions["lognormal_dist"].sample(95), # Few outliers with very long stays distributions["uniform_outliers"].sample(5), ] ) los_realistic = np.clip(los_realistic, 0.5, 45.5) # Clip to our desired range # Create a DataFrame to organize our data hospital_data = pd.DataFrame( { "Hospital A": los_uniform, "Hospital B": los_skewed, "Hospital C": los_bimodal, "Hospital D": los_realistic, } ) # A summary of the synthetic dataset hospital_data.describe() ``` The function `create_combined_plotly_figure` is included as a helper to merge four plots of the hospital datasets together. ```{python} # pylint: disable=too-many-arguments,too-many-positional-arguments,too-many-locals def create_combined_plotly_figure( distributions_dict, plot_method_name="plotly_ecdf_linear_interpolation", cols=2, plot_args=None, # Dictionary to pass arguments to the plot method subplot_title_prefix="", main_title="Combined Distribution Plots", share_xaxes=False, share_yaxes=False, layout_options=None, # Dictionary for fig.update_layout ) -> go.Figure: """ Combines multiple Plotly figures from a dictionary of distributions into a single figure with subplots. Parameters ---------- distributions_dict : dict A dictionary where keys are names (e.g., 'Hospital A') and values are instances of RawContinuousEmpirical (or similar). plot_method_name : str, default="plot_ecdf_linear_interpolation_plotly" The name of the method on the distribution object that returns a Plotly figure (e.g., 'plotly_ecdf_standard_plot'). cols : int, default=2 Number of columns in the subplot grid. Rows are calculated automatically. plot_args : dict, optional Keyword arguments to pass to the individual plotting method specified by plot_method_name. subplot_title_prefix : str, default="" A prefix to add before the distribution name in subplot titles. main_title : str, default="Combined Distribution Plots" The main title for the combined figure. share_xaxes : bool, default=False Whether to share x-axes across subplots [2]. share_yaxes : bool, default=False Whether to share y-axes across subplots [2]. layout_options : dict, optional Additional layout options to apply to the final figure using fig.update_layout(). Returns ------- plotly.graph_objects.Figure The combined Plotly figure with subplots. """ if not distributions_dict: return go.Figure().update_layout(title_text="No distributions provided") if plot_args is None: plot_args = {} dist_names = list(distributions_dict.keys()) num_plots = len(dist_names) rows = math.ceil(num_plots / cols) subplot_titles = [f"{subplot_title_prefix}{name}" for name in dist_names] # Create the subplot figure structure [2, 3] fig = make_subplots( rows=rows, cols=cols, subplot_titles=subplot_titles, shared_xaxes=share_xaxes, shared_yaxes=share_yaxes, ) # Iterate through distributions, generate plots, and add traces [1, 4] for i, name in enumerate(dist_names): distribution = distributions_dict[name] plot_method = getattr(distribution, plot_method_name) # Generate the individual figure for this distribution # Pass any specific plot arguments individual_fig = plot_method(**plot_args) # Calculate subplot position (1-based index) current_row = i // cols + 1 current_col = i % cols + 1 # Add each trace from the individual figure to the main subplot # figure [1, 5] for trace in individual_fig.data: # Preserve legend grouping if the individual plot used it trace.legendgroup = name # Optionally hide duplicate legend entries if desired, # though Plotly often handles this with legendgroup if i > 0: # Keep legend status consistent trace.showlegend = plot_args.get("showlegend", True) fig.add_trace(trace, row=current_row, col=current_col) # Update overall layout fig.update_layout(title_text=main_title) # Hide duplicate legends if traces weren't automatically grouped well # fig.update_layout(showlegend=True) # Or False if too cluttered if layout_options: fig.update_layout(layout_options) return fig ``` ```{python} # create the datasets distributions = {} for hospital in hospital_data.columns: distributions[hospital] = RawContinuousEmpirical( data=hospital_data[hospital], random_seed=42 ) # plot the linear interpolation function plot_args = {"showlegend": False} combined_fig = create_combined_plotly_figure( distributions, plot_method_name="plotly_ecdf_linear_interpolation", plot_args=plot_args, cols=2, main_title="Linear Interpolation ECDFs", ) combined_fig.show() ``` Lastly, we can check the sampling is working as expected using the function `plotly_original_versus_sampled`. This will compare the original raw data with the samples in a histogram. Change the hospital name to see a different chart. ```{python} def plotly_original_versus_sampled(sample_dict, original_data): """ Creates a Plotly chart with a dropdown to switch between datasets in sample_dict. Parameters ---------- sample_dict : dict Dictionary where keys are dataset names and values are the data. original_data : dict Dictionary containing the original data. Same keys as sample_dict. Returns ------- plotly.graph_objects.Figure A Plotly figure with the original data and a dropdown menu. """ # Create a figure with subplots fig = make_subplots( rows=1, cols=2, subplot_titles=("Original Data", "Sampled Data") ) first_key = next(iter(sample_dict)) # Add original data histogram fig.add_trace( go.Histogram( x=original_data[first_key], nbinsx=20, name="Original", histnorm="probability density", ), row=1, col=1, ) # Add initial sampled data histogram (first key in the dictionary) fig.add_trace( go.Histogram( x=sample_dict[first_key], nbinsx=20, name="Sampled", histnorm="probability density", ), row=1, col=2, ) # Create dropdown buttons dropdown_buttons = [] for key in sample_dict: n = len(sample_dict[key]) button = { "method": "update", "label": key, "args": [ {"x": [original_data[key], sample_dict[key]]}, {"title": f"{key}: Original vs Sampled Data ({key}; n = {n})"}, ], } dropdown_buttons.append(button) # Update layout with dropdown menu fig.update_layout( updatemenus=[ { "active": 0, "buttons": dropdown_buttons, "x": 1.0, # Center horizontally "xanchor": "right", # Anchor point is the center "y": 1.1, # Position vertically (adjust as needed) "yanchor": "top", } ], # Anchor point is the top title_text=f"{first_key}: Original vs Sampled Data ({first_key})", xaxis_title_text="Length of Stay (Days)", yaxis_title_text="Relative Freq", height=500, width=900, ) return fig ``` ```{python} # Generate samples from each distribution samples = {} for hospital in hospital_data.columns: samples[hospital] = distributions[hospital].sample(size=1000) fig = plotly_original_versus_sampled(samples, hospital_data) fig.show() ``` ## 2. Sampling from the Raw data directly If the raw data is available then an alternative to Law and Kelton's interpolation method israw data directly using `RawDiscreteEmpirical`. ```{python} distributions = {} samples = {} for hospital in hospital_data.columns: distributions[hospital] = RawDiscreteEmpirical(hospital_data[hospital]) samples[hospital] = distributions[hospital].sample(size=1000) fig = plotly_original_versus_sampled(samples, hospital_data) fig.show() ``` ## 3. Sampling from grouped empirical data In some circumstances a modeller may need to sample from simpler grouped data. For example, if the raw data are not available (e.g. if taken from an academic research paper) or if the dataset is large and causes memory issues or a bottleneck in model run time. `GroupedContinuousEmpirical` implements the interpolation method described in Law and Kelton's *Simulation Modeling and Analysis*. We need to provide the data as k adjacent intervals $$[a_0, a1), [a_1, a_2), \dots, [a_{k-1}, a_k)$$ In practice we provide `GroupedContinuousEmpirical` with two arrays: `lower_bounds` and `upper_bounds`. Note that the intervals do no need to be equally spaced. A third array `freq` contains the number of observations in each interval. ```{python} grp_dist = GroupedContinuousEmpirical( lower_bounds=[0, 5, 10, 15, 30, 45, 60, 120, 180, 240, 480], upper_bounds=[5, 10, 15, 30, 45, 60, 120, 180, 240, 480, 2880], freq=[34, 4, 8, 13, 15, 13, 19, 13, 9, 12, 73], random_seed=42, ) # sample will interpolate across and within each group. grp_dist.sample() ``` ### Time dependent arrivals via thinning This notebook provides an overview of how to use the `time_dependent.NSPPThinning` class. Thinning is an acceptance-rejection approach to sampling inter-arrival times (IAT) from a time dependent distribution where each time period follows its own exponential distribution. There are two random variables employed in sampling: an exponential distribution (used to sample IAT) and a uniform distibution (used to accept/reject samples). All IATs are sampled from an Exponential distribution with the highest arrival rate (most frequent). These arrivals are then rejected (thinned) proportional to the ratio of the current arrival rate to the maximum arrival rate. The algorithm executes until a sample is accepted. The IAT returned is the sum of all the IATs that were sampled. ```{note} As of v0.9.0 you only need to provide an arrival data set containing two columns * `t`: the time period e.g. 0, 60, 120 minutes or 0, 1, 2 hours * `mean_iat`: the **mean interval arrival time (IAT)** for the time period. ``` ## The thinning algorithm A NSPP has arrival rate $\lambda(t)$ where $0 \leq t \leq T$ Here $i$ is the arrival number and $\mathcal{T_i}$ is its arrival time. 1. Let $\lambda^* = \max_{0 \leq t \leq T}\lambda(t)$ be the maximum of the arrival rate function and set $t = 0$ and $i=1$ 2. Generate $e$ from the exponential distribution with rate $\lambda^*$ and let $t = t + e$ (this is the time of the next entity will arrive) 3. Generate $u$ from the $U(0,1)$ distribution. If $u \leq \dfrac{\lambda(t)}{\lambda^*}$ then $\mathcal{T_i} =t$ and $i = i + 1$ 4. Go to Step 2. ## `sim-tools` imports ```{python} # pylint: disable=missing-module-docstring,invalid-name import numpy as np import matplotlib.pyplot as plt from sim_tools.datasets import load_banks_et_al_nspp from sim_tools.time_dependent import NSPPThinning, nspp_plot, nspp_simulation ``` ```{python} # ggplot style for plots plt.style.use("ggplot") ``` ## Example from Banks et al. We will illustrate the use of `NSPPThinning` using an example from Banks et al. The table below breaks an arrival process down into 60 minutes intervals. | t(min) | Mean time between arrivals (min) | Arrival Rate $\lambda(t)$ (arrivals/min) | |:------:|:--------------------------------:|:--------------------------------------:| | 0 | 15 | 1/15 | | 60 | 12 | 1/12 | | 120 | 7 | 1/7 | | 180 | 5 | 1/5 | | 240 | 8 | 1/8 | | 300 | 10 | 1/10 | | 360 | 15 | 1/15 | | 420 | 20 | 1/20 | | 480 | 20 | 1/20 | > **Interpretation**: In the table above the fastest arrival rate is 1/5 customers per minute or 5 minutes between customer arrivals. ```{python} data = load_banks_et_al_nspp() print(data) ``` The `NSPPThinning` class does not require the `arrival_rate` column. Internally it assume that intervals are equal and uses the 1st and 2nd rows of the `t` column to calculate width (override with the `interval_width` parameter). As the IAT is the inverse of the rate it thins the arrivals using $$ u \leq \dfrac{IAT_{min}}{IAT(t)}$$ ```{python} # create arrivals and set random number seeds seeds = np.random.SeedSequence(42).spawn(2) arrivals = NSPPThinning(data=data, random_seed1=seeds[0], random_seed2=seeds[1]) print(arrivals) ``` ```{python} # number of arrivals to simulate n_arrivals = 15 # run simulation simulation_time = 0.0 for _ in range(n_arrivals): iat = arrivals.sample(simulation_time) simulation_time += iat print(f"{simulation_time:.2f} Patient arrival (IAT={iat:.2f})") ``` Alternatively you could set the interval width manually ```{python} # create arrivals and set random number seeds seeds = np.random.SeedSequence(42).spawn(2) arrivals = NSPPThinning( data=data, interval_width=60, random_seed1=seeds[0], random_seed2=seeds[1] ) print(arrivals) ``` ## Check NSPP is working as expected As part of model testing you can check that the NSPP for arrivals has been setup correctly by using `nspp_plot`. This runs multiple replications of the NSPP and returns a plot of the number of arrivals per interval. The profile can be used as a sense check. If you prefer to have the data in tabular form you can use `nspp_simulation` that returns a `pd.DataFrame` containing multiple replications of the number of arrivals per interval. ### NSPP plot ```{python} fig, ax = nspp_plot(data, run_length=540) ``` ### Tabular NSPP data The function `nspp_simulation` returns a `pd.DataFrame`. By default 1000 replications of the NSPP are run. The run length can be specified, but if omitted it defaults to the last interval + interval width in the arrival profile. In the Banks et al. example we are given the rate per minute of arrivals. We therefore multiply this by 60 get the expected number of arrivals in an hour. We compare this to the actual simulated. ```{python} nspp_replications = nspp_simulation(data, n_reps=1000) # peak at data nspp_replications.head() ``` ```{python} # re-label columns nspp_replications.columns = data["t"].values ## banks data converted from arrivasl per min to arrivals per hour data["expected_mean_arrivals_hr"] = data["arrival_rate"] * 60.0 data["simulated_mean_arrivals_hr"] = nspp_replications.mean(axis=0).values data.round(1) ``` ### A simple trace For debugging code based models such as those using `simpy` it is sometimes useful to do informal debugging by printing events as the simulation executes. Python's built in `print` function can be used. Alternatively `sim-tools.trace` can be used. For small models the `trace` module offers the function `trace()`. It provides easy to use functionality with defaults. This notebooks illustrates simple tracing of simulation using some small models. ## 1. Imports ```{python} # pylint: disable=missing-module-docstring,invalid-name import itertools from typing import Optional import simpy from sim_tools.trace import trace from sim_tools.distributions import Exponential ``` ## 2. A trace using default colouring In this first example we simulate patient arrivals to a treatment facility. Arrivals are at random with a mean inter-arrival time of 5 minutes. Rather than using python's print command we instead make use of `trace`. We pass the following arguments: 1. `time`: the current simulation time 2. `debug`: if we toggle to `False` this hides the trace or `True` to show (default = False) 3. `msg`: the string message to display. This can include emoji 4. `process_id`: an optional string to identify the process. Ideally this should be unique to aid debugging. ```{python} def patient_generator( env: simpy.Environment, dist: Exponential, debug: Optional[bool] = False ): """Generate patient arrivals to the treatment clinic""" for patient_count in itertools.count(1): # sample inter-arrival time iat = dist.sample() yield env.timeout(iat) trace(time=env.now, debug=debug, msg="new arrival 🤒", identifier=patient_count) ``` ```{python} # script to run model DEBUG = True SEED = 42 arrival_dist = Exponential(5.0, random_seed=SEED) env = simpy.Environment() env.process(patient_generator(env, arrival_dist, DEBUG)) env.run(50.0) ``` ## 3. Configure colouring of output In `trace` the `config` parameter is a user settable dictionary object. It can be used to change the colour of text in the trace. We will first show a simple demonstration of setting options. and then use to illustrate how it is useful in practice. The default`config` is ```{.python} config = { "class":None, "class_colour":"bold blue", "time_colour":'bold blue', "time_dp":2, "message_colour":'black', "tracked":None } ``` * `class`: a string representing the class or type of trace event occuring. This could be a process type for example, "patient", "stroke patient" or "arrival" or "treatment". * `class_colour`: choose a colour to display the class name e.g. "green" or "bold green" * `time_colour`: choose a colour to display the time * `time_dp`: choose the number of decimal places for time (default=2) * `message_color`: colour of the message text * `tracked`: a list containing identifiers (e.g. `[1, 2, 25]`) that limits what is tracked. Works with `identiifier` parameters of `trace` > Note: you do not need to set all of the parameters. Just set what you need and the defaults will be used for other parameters. ```{python} def get_config(): """Returns a custom trace configuration""" config = { "class": "Arrival", "class_colour": "green", "time_colour": "bold black", "message_colour": "red", } return config # pylint: disable=function-redefined def patient_generator( env: simpy.Environment, dist: Exponential, debug: Optional[bool] = False ): """Generate patient arrivals to the treatment clinic""" for patient_count in itertools.count(1): # sample inter-arrival time iat = dist.sample() yield env.timeout(iat) trace( time=env.now, debug=debug, msg="new patient 🤒", identifier=patient_count, config=get_config(), ) ``` ```{python} # script to run model DEBUG = True SEED = 42 arrival_dist = Exponential(5.0, random_seed=SEED) env = simpy.Environment() env.process(patient_generator(env, arrival_dist, DEBUG)) env.run(50.0) ``` ## 4. Configuring multiple processes/ activities We modify the `simpy` example to include a `treatment` process. We wish for treatment events to be easily seen in the trace output. We therefore modify `get_config()` as follows: ```{python} def get_config(): """Returns a custom trace configuration for arrivals""" config = { "arrivals": { "class": "Patient", "class_colour": "green", "time_colour": "bold black", "message_colour": "green", }, "treatment": { "class": "Patient", "class_colour": "magenta", "time_colour": "bold black", "message_colour": "magenta", }, } return config ``` The function now returns a nested dictionary containing trace configurations for both arrivals and treatment processes. The modified `simpy` model is below. We have included an `Experiment` class to hold our simulation input parameters. Note this has also been used to store the debug toggle. This reduces the number of arguments each function requires. When we run the new model is it simple to distinguish between the trace for arrivals and the treatment. > Note: this could be further fine tuned by manipulating which processes have `debug=True` ```{python} # pylint: disable=too-few-public-methods class Experiment: """Parameter container""" def __init__( self, mean_iat: float, mean_service: float, debug: Optional[bool] = False ): self.arrival_dist = Exponential(mean_iat, random_seed=42) self.treat_dist = Exponential(mean_service, random_seed=101) self.debug = debug def treatment(env: simpy.Environment, exp: Experiment, patient_id: int): """Simulate a uncapacitated treatment process""" trace( time=env.now, debug=exp.debug, msg="enter treatment 👩🏻⚕️", identifier=patient_id, config=get_config()["treatment"], ) # delay service_time = exp.treat_dist.sample() yield env.timeout(service_time) trace( time=env.now, debug=exp.debug, msg="treatment complete ⛔", identifier=patient_id, config=get_config()["treatment"], ) # pylint: disable=function-redefined def patient_generator(env: simpy.Environment, exp: Experiment): """Generate patient arrivals to the treatment clinic""" for patient_count in itertools.count(1): # sample inter-arrival time iat = exp.arrival_dist.sample() yield env.timeout(iat) trace( time=env.now, debug=exp.debug, msg="new arrival 🤒", identifier=patient_count, config=get_config()["arrivals"], ) env.process(treatment(env, exp, patient_count)) ``` ```{python} # script to run model DEBUG = True experiment = Experiment(5.0, 10.0, DEBUG) env = simpy.Environment() env.process(patient_generator(env, experiment)) env.run(40.0) ``` ## Tracking Assume we wish to track patients 4 and 5. The `config['tracking']` option accepts a list of identifiers and will limit trace output. For simple models it is best to add this as a parameter in the Experiment class. The script below declares the variable `TRACKED=[4, 5]` > Note that when there are a lot of processes to track it may be become simplier to use the object orientated interface to trace functionality. ```{python} # pylint: disable=too-few-public-methods class Experiment: """Parameter container""" def __init__( self, mean_iat: float, mean_service: float, debug: Optional[bool] = False, tracked: Optional[list] = None, ): self.arrival_dist = Exponential(mean_iat, random_seed=42) self.treat_dist = Exponential(mean_service, random_seed=101) self.debug = debug self.tracked = tracked def get_config(exp: Experiment): """Returns a custom trace configuration for arrivals and treatment Modified to add tracking information to each config""" config = { "arrivals": { "class": "Patient", "class_colour": "green", "time_colour": "bold black", "message_colour": "green", "tracked": exp.tracked, }, "treatment": { "class": "Patient", "class_colour": "magenta", "time_colour": "bold black", "message_colour": "magenta", "tracked": exp.tracked, }, } return config def treatment(env: simpy.Environment, exp: Experiment, patient_id: int): """Simulate a uncapacitated treatment process""" trace( time=env.now, debug=exp.debug, msg="enter treatment 👩🏻⚕️", identifier=patient_id, config=get_config(exp)["treatment"], ) # delay service_time = exp.treat_dist.sample() yield env.timeout(service_time) trace( time=env.now, debug=exp.debug, msg="treatment complete ⛔", identifier=patient_id, config=get_config(exp)["treatment"], ) def patient_generator(env: simpy.Environment, exp: Experiment): """Generate patient arrivals to the treatment clinic""" for patient_count in itertools.count(1): # sample inter-arrival time iat = exp.arrival_dist.sample() yield env.timeout(iat) trace( time=env.now, debug=exp.debug, msg="new arrival 🤒", identifier=patient_count, config=get_config(exp)["arrivals"], ) env.process(treatment(env, exp, patient_count)) ``` ```{python} # script to run model DEBUG = True TRACKED = [4, 5] experiment = Experiment(5.0, 10.0, DEBUG, TRACKED) env = simpy.Environment() env.process(patient_generator(env, experiment)) env.run(50.0) ``` ## 1. Imports The model used to demonstrate trace functionality is implemented in `simpy` ```{python} # pylint: disable=missing-module-docstring,invalid-name,too-many-lines import itertools import numpy as np import pandas as pd import simpy import sim_tools from sim_tools.distributions import Exponential, Normal, Lognormal, Bernoulli from sim_tools.time_dependent import NSPPThinning from sim_tools.trace import Traceable ``` ```{python} print(f"simpy version: {simpy.__version__}") print(f"sim-tools version: {sim_tools.__version__}") ``` ## 2. Constants and defaults for modelling **as-is** ### 2.1 Distribution parameters ```{python} # sign-in/triage parameters DEFAULT_TRIAGE_MEAN = 3.0 # registration parameters DEFAULT_REG_MEAN = 5.0 DEFAULT_REG_VAR = 2.0 # examination parameters DEFAULT_EXAM_MEAN = 16.0 DEFAULT_EXAM_VAR = 3.0 DEFAULT_EXAM_MIN = 0.5 # trauma/stabilisation DEFAULT_TRAUMA_MEAN = 90.0 # Trauma treatment DEFAULT_TRAUMA_TREAT_MEAN = 30.0 DEFAULT_TRAUMA_TREAT_VAR = 4.0 # Non trauma treatment DEFAULT_NON_TRAUMA_TREAT_MEAN = 13.3 DEFAULT_NON_TRAUMA_TREAT_VAR = 2.0 # prob patient requires treatment given trauma DEFAULT_NON_TRAUMA_TREAT_P = 0.60 # proportion of patients triaged as trauma DEFAULT_PROB_TRAUMA = 0.12 ``` ### 2.2 Time dependent arrival rates data The data for arrival rates varies between clinic opening at 6am and closure at 12am. ```{python} # data are held in the Github repo and loaded from there. NSPP_PATH = ( "https://raw.githubusercontent.com/TomMonks/" + "open-science-for-sim/main/src/notebooks/01_foss_sim/data/ed_arrivals.csv" ) ``` ### 2.3 Resource counts > Inter count variables representing the number of resources at each activity in the processes. ```{python} DEFAULT_N_TRIAGE = 1 DEFAULT_N_REG = 1 DEFAULT_N_EXAM = 3 DEFAULT_N_TRAUMA = 2 # Non-trauma cubicles DEFAULT_N_CUBICLES_1 = 1 # trauma pathway cubicles DEFAULT_N_CUBICLES_2 = 1 ``` ### 2.4 Simulation model run settings ```{python} # default random number SET DEFAULT_RNG_SET = None N_STREAMS = 20 # default results collection period DEFAULT_RESULTS_COLLECTION_PERIOD = 60 * 19 # number of replications. DEFAULT_N_REPS = 5 # set to True to show a trace of the simulation model (best used in single run # mode) DEBUG = False # provide a list of integers that represent patient IDs. Only these patients # will be tracked TRACKED = None ``` ## 3. Model parameterisation For convienience a container class is used to hold the large number of model parameters. The `Scenario` class includes defaults these can easily be changed and at runtime to experiments with different designs. ```{python} def load_arrival_profile(profile_path): """ Load and transform an arrival profile from a CSV file. Parameters: ----------- profile_path : str Path to CSV file containing the arrival rate profile. It is expected to have a column called 'arrival_rate'. Returns: -------- pd.DataFrame A DataFrame with the following columns: - 't': Time. - 'arrival_rate': Original arrival rate from CSV. - 'mean_iat': Mean inter-arrival time. """ df = ( pd.read_csv(profile_path) .reset_index() .rename(columns={"index": "t"}) .assign(mean_iat=lambda x: 60.0 / x.arrival_rate, t=lambda x: x.t * 60.0) ) return df ``` ```{python} load_arrival_profile(NSPP_PATH) ``` ```{python} # pylint: disable=too-many-instance-attributes,too-many-arguments # pylint: disable=too-many-positional-arguments,too-many-locals class Scenario: """ Container class for scenario parameters/arguments Passed to a model and its process classes """ def __init__( self, random_number_set=DEFAULT_RNG_SET, n_triage=DEFAULT_N_TRIAGE, n_reg=DEFAULT_N_REG, n_exam=DEFAULT_N_EXAM, n_trauma=DEFAULT_N_TRAUMA, n_cubicles_1=DEFAULT_N_CUBICLES_1, n_cubicles_2=DEFAULT_N_CUBICLES_2, triage_mean=DEFAULT_TRIAGE_MEAN, reg_mean=DEFAULT_REG_MEAN, reg_var=DEFAULT_REG_VAR, exam_mean=DEFAULT_EXAM_MEAN, exam_var=DEFAULT_EXAM_VAR, exam_min=DEFAULT_EXAM_MIN, trauma_mean=DEFAULT_TRAUMA_MEAN, trauma_treat_mean=DEFAULT_TRAUMA_TREAT_MEAN, trauma_treat_var=DEFAULT_TRAUMA_TREAT_VAR, non_trauma_treat_mean=DEFAULT_NON_TRAUMA_TREAT_MEAN, non_trauma_treat_var=DEFAULT_NON_TRAUMA_TREAT_VAR, non_trauma_treat_p=DEFAULT_NON_TRAUMA_TREAT_P, prob_trauma=DEFAULT_PROB_TRAUMA, debug=DEBUG, tracked=TRACKED, ): """ Create a scenario to parameterise the simulation model Parameters: ----------- random_number_set: int, optional (default=DEFAULT_RNG_SET) Set to control the initial seeds of each stream of pseudo random numbers used in the model. n_triage: int The number of triage cubicles n_reg: int The number of registration clerks n_exam: int The number of examination rooms n_trauma: int The number of trauma bays for stabilisation n_cubicles_1: int The number of non-trauma treatment cubicles n_cubicles_2: int The number of trauma treatment cubicles triage_mean: float Mean duration of the triage distribution (Exponential) reg_mean: float Mean duration of the registration distribution (Lognormal) reg_var: float Variance of the registration distribution (Lognormal) exam_mean: float Mean of the examination distribution (Normal) exam_var: float Variance of the examination distribution (Normal) exam_min: float The minimum value that an examination can take (Truncated Normal) trauma_mean: float Mean of the trauma stabilisation distribution (Exponential) trauma_treat_mean: float Mean of the trauma cubicle treatment distribution (Lognormal) trauma_treat_var: float Variance of the trauma cubicle treatment distribution (Lognormal) non_trauma_treat_mean: float Mean of the non trauma treatment distribution non_trauma_treat_var: float Variance of the non trauma treatment distribution non_trauma_treat_p: float Probability non trauma patient requires treatment prob_trauma: float probability that a new arrival is a trauma patient. debug: int Set to true to display a trace of simulated events in the model Note this is best used in single_run mode. tracked: list List of integers that represent patient IDs. Used with debug. If debug is True then only display trace for specific patients. """ # sampling self.random_number_set = random_number_set # store parameters for sampling self.triage_mean = triage_mean self.reg_mean = reg_mean self.reg_var = reg_var self.exam_mean = exam_mean self.exam_var = exam_var self.exam_min = exam_min self.trauma_mean = trauma_mean self.trauma_treat_mean = trauma_treat_mean self.trauma_treat_var = trauma_treat_var self.non_trauma_treat_mean = non_trauma_treat_mean self.non_trauma_treat_var = non_trauma_treat_var self.non_trauma_treat_p = non_trauma_treat_p self.prob_trauma = prob_trauma self.init_sampling() # count of each type of resource self.init_resourse_counts( n_triage, n_reg, n_exam, n_trauma, n_cubicles_1, n_cubicles_2 ) # debug/trace information self.debug = debug self.tracked = tracked def set_random_no_set(self, random_number_set): """ Controls the random sampling Parameters: ---------- random_number_set: int Used to control the set of pseudo random numbers used by the distributions in the simulation. """ self.random_number_set = random_number_set self.init_sampling() # pylint: disable=too-many-arguments,too-many-positional-arguments def init_resourse_counts( self, n_triage, n_reg, n_exam, n_trauma, n_cubicles_1, n_cubicles_2 ): """ Init the counts of resources to default values... """ self.n_triage = n_triage self.n_reg = n_reg self.n_exam = n_exam self.n_trauma = n_trauma # non-trauma (1), trauma (2) treatment cubicles self.n_cubicles_1 = n_cubicles_1 self.n_cubicles_2 = n_cubicles_2 def init_sampling(self): """ Create the distributions used by the model and initialise the random seeds of each. """ # MODIFICATION. Better method for producing n non-overlapping streams seed_sequence = np.random.SeedSequence(self.random_number_set) # Generate n high quality child seeds self.seeds = seed_sequence.spawn(N_STREAMS) # create distributions # Triage duration self.triage_dist = Exponential(self.triage_mean, random_seed=self.seeds[0]) # Registration duration (non-trauma only) self.reg_dist = Lognormal( self.reg_mean, np.sqrt(self.reg_var), random_seed=self.seeds[1] ) # Evaluation (non-trauma only) self.exam_dist = Normal( self.exam_mean, np.sqrt(self.exam_var), minimum=self.exam_min, random_seed=self.seeds[2], ) # Trauma/stablisation duration (trauma only) self.trauma_dist = Exponential(self.trauma_mean, random_seed=self.seeds[3]) # Non-trauma treatment self.nt_treat_dist = Lognormal( self.non_trauma_treat_mean, np.sqrt(self.non_trauma_treat_var), random_seed=self.seeds[4], ) # treatment of trauma patients self.treat_dist = Lognormal( self.trauma_treat_mean, np.sqrt(self.trauma_treat_var), random_seed=self.seeds[5], ) # probability of non-trauma patient requiring treatment self.nt_p_treat_dist = Bernoulli( self.non_trauma_treat_p, random_seed=self.seeds[6] ) # probability of non-trauma versus trauma patient self.p_trauma_dist = Bernoulli(self.prob_trauma, random_seed=self.seeds[7]) # init sampling for non-stationary poisson process self.init_nspp() def init_nspp(self): """ Initialise the Nonhomogeneous Poisson Process (NSPP) for arrivals. """ # Load arrival profile self.arrivals = load_arrival_profile(NSPP_PATH) # Compuate maximum arrival rate (smallest time between arrivals) self.lambda_max = self.arrivals["arrival_rate"].max() # NSPP thinning distribution self.thinning_dist = NSPPThinning( self.arrivals, random_seed1=self.seeds[8], random_seed2=self.seeds[9] ) ``` ## 6. Patient Pathways Process Logic `simpy` uses a process based worldview. We can easily create whatever logic - simple or complex for the model. Here the process logic for trauma and non-trauma patients is seperated into two classes `TraumaPathway` and `NonTraumaPathway`. ```{python} class TraumaPathway(Traceable): """ Encapsulates the process a patient with severe injuries or illness. These patients are signed into the ED and triaged as having severe injuries or illness. Patients are stabilised in resus (trauma) and then sent to Treatment. Following treatment they are discharged. """ def __init__(self, identifier, env, args): """ Constructor method Params: ----- identifier: int a numeric identifier for the patient. env: simpy.Environment the simulation environment args: Scenario Container class for the simulation parameters """ # initialise Trace super().__init__(debug=args.debug) self.identifier = identifier self.env = env self.args = args # metrics self.arrival = -np.inf self.wait_triage = -np.inf self.wait_trauma = -np.inf self.wait_treat = -np.inf self.total_time = -np.inf self.triage_duration = -np.inf self.trauma_duration = -np.inf self.treat_duration = -np.inf def execute(self): """ simulates the major treatment process for a patient 1. request and wait for sign-in/triage 2. trauma 3. treatment """ # record the time of arrival and entered the triage queue self.arrival = self.env.now self.trauma_arrival() # request sign-in/triage with self.args.triage.request() as req: yield req # record the waiting time for triage self.wait_triage = self.env.now - self.arrival # triage begin event self.triage_begin() # sample triage duration. self.triage_duration = self.args.triage_dist.sample() yield self.env.timeout(self.triage_duration) # triage complete event self.triage_complete() # record the time that entered the trauma queue start_wait = self.env.now # request trauma room with self.args.trauma.request() as req: yield req # record the waiting time for trauma self.wait_trauma = self.env.now - start_wait # trauma begin event self.trauma_begin() # sample stablisation duration. self.trauma_duration = self.args.trauma_dist.sample() yield self.env.timeout(self.trauma_duration) # trauma complete event self.trauma_complete() # record the time that entered the treatment queue start_wait = self.env.now # request treatment cubicle with self.args.cubicle_2.request() as req: yield req # record the waiting time for trauma self.wait_treat = self.env.now - start_wait self.treatment_begin() # sample treatment duration. self.treat_duration = self.args.treat_dist.sample() yield self.env.timeout(self.treat_duration) self.treatment_complete() # total time in system self.total_time = self.env.now - self.arrival def trauma_arrival(self): """Trauma arrival event""" self.trace(self.env.now, "arrival at centre 🚑", self.identifier) def triage_begin(self): """Enter triage event""" self.trace( self.env.now, f"enter triage. Waiting time: {self.wait_triage:.3f}", self.identifier, ) def trauma_begin(self): """Enter trauma stabilisation""" self.trace( self.env.now, f"enter stabilisation. Waiting time: {self.wait_trauma:.3f}", self.identifier, ) def treatment_begin(self): """Enter trauma treatment post stabilisation""" self.trace( self.env.now, f"enter treatment. Waiting time: {self.wait_treat:.3f}", self.identifier, ) def triage_complete(self): """Triage complete event""" self.trace(self.env.now, "triage complete", self.identifier) def trauma_complete(self): """Patient stay in trauma is complete.""" self.trace(self.env.now, "stabilisation complete.", self.identifier) def treatment_complete(self): """Treatment complete event""" self.trace( self.env.now, f"patient {self.identifier} treatment complete; " f"waiting time was {self.wait_treat:.3f}", self.identifier, ) def _trace_config(self): """Override trace config""" config = { "name": "Trauma", "name_colour": "bold magenta", "time_colour": "bold magenta", "tracked": self.args.tracked, } return config ``` ```{python} class NonTraumaPathway(Traceable): """ Encapsulates the process a patient with minor injuries and illness. These patients are signed into the ED and triaged as having minor complaints and streamed to registration and then examination. Post examination 40% are discharged while 60% proceed to treatment. Following treatment they are discharged. """ def __init__(self, identifier, env, args): """ Constructor method Params: ----- identifier: int a numeric identifier for the patient. env: simpy.Environment the simulation environment args: Scenario Container class for the simulation parameters """ super().__init__(debug=args.debug) self.identifier = identifier self.env = env self.args = args # triage resource self.triage = args.triage # metrics self.arrival = -np.inf self.wait_triage = -np.inf self.wait_reg = -np.inf self.wait_exam = -np.inf self.wait_treat = -np.inf self.total_time = -np.inf self.triage_duration = -np.inf self.reg_duration = -np.inf self.exam_duration = -np.inf self.treat_duration = -np.inf def execute(self): """ simulates the non-trauma/minor treatment process for a patient 1. request and wait for sign-in/triage 2. patient registration 3. examination 4. split patients 4.1 % discharged 4.2 % treatment then discharge """ # record the time of arrival and entered the triage queue self.arrival = self.env.now # trace arrival self.trace(self.env.now, "arrival to centre.", self.identifier) # request sign-in/triage with self.triage.request() as req: yield req # record the waiting time for triage self.wait_triage = self.env.now - self.arrival # trace enter triage self.trace( self.env.now, f"entering triage. Waiting time: {self.wait_triage:.3f} mins", self.identifier, ) # sample triage duration. self.triage_duration = self.args.triage_dist.sample() yield self.env.timeout(self.triage_duration) # trace exit triage self.trace(self.env.now, "triage complete", self.identifier) # record the time that entered the registration queue start_wait = self.env.now # request registration clert with self.args.registration.request() as req: yield req # record the waiting time for registration self.wait_reg = self.env.now - start_wait # trace begin registration self.trace(self.env.now, "starting patient registration.", self.identifier) # sample registration duration. self.reg_duration = self.args.reg_dist.sample() yield self.env.timeout(self.reg_duration) # registration complete... self.trace( self.env.now, f"patient registered;waiting time was {self.wait_triage:.3f}", self.identifier, ) # record the time that entered the evaluation queue start_wait = self.env.now # request examination resource with self.args.exam.request() as req: yield req # record the waiting time for registration self.wait_exam = self.env.now - start_wait # trace begin examination self.trace( self.env.now, f"enter examination. Waiting time: {self.wait_exam:.3f}", self.identifier, ) # sample examination duration. self.exam_duration = self.args.exam_dist.sample() yield self.env.timeout(self.exam_duration) # trace exit examination self.trace(self.env.now, "examination complete.", self.identifier) # sample if patient requires treatment? self.require_treat = self.args.nt_p_treat_dist.sample() if self.require_treat: # record the time that entered the treatment queue start_wait = self.env.now # request treatment cubicle with self.args.cubicle_1.request() as req: yield req # record the waiting time for treatment self.wait_treat = self.env.now - start_wait # trace enter treatment self.trace( self.env.now, f"enter treatment. Waiting time:{self.wait_treat:.3f}", self.identifier, ) # sample treatment duration. self.treat_duration = self.args.nt_treat_dist.sample() yield self.env.timeout(self.treat_duration) # trace exit treatment self.trace(self.env.now, "treatment complete ⛔", self.identifier) # total time in system self.total_time = self.env.now - self.arrival def _trace_config(self): """Override trace config""" config = { "name": "Non-trauma", "name_colour": "bold green", "time_colour": "bold green", "message_colour": "black", "tracked": self.args.tracked, } return config ``` ## 7. Main model class The main class that a user interacts with to run the model is `TreatmentCentreModel`. This implements a `.run()` method, contains a simple algorithm for the non-stationary poission process for patients arrivals and inits instances of `TraumaPathway` or `NonTraumaPathway` depending on the arrival type. ```{python} class TreatmentCentreModel: """ The treatment centre model Patients arrive at random to a treatment centre, are triaged and then processed in either a trauma or non-trauma pathway. """ def __init__(self, args): self.env = simpy.Environment() self.args = args self.init_resources() self.patients = [] self.trauma_patients = [] self.non_trauma_patients = [] self.rc_period = None self.results = None def init_resources(self): """ Init the number of resources and store in the arguments container object Resource list: 1. Sign-in/triage bays 2. registration clerks 3. examination bays 4. trauma bays 5. non-trauma cubicles (1) 6. trauma cubicles (2) """ # sign/in triage self.args.triage = simpy.Resource(self.env, capacity=self.args.n_triage) # registration self.args.registration = simpy.Resource(self.env, capacity=self.args.n_reg) # examination self.args.exam = simpy.Resource(self.env, capacity=self.args.n_exam) # trauma self.args.trauma = simpy.Resource(self.env, capacity=self.args.n_trauma) # non-trauma treatment self.args.cubicle_1 = simpy.Resource(self.env, capacity=self.args.n_cubicles_1) # trauma treatment self.args.cubicle_2 = simpy.Resource(self.env, capacity=self.args.n_cubicles_2) def run(self, results_collection_period=DEFAULT_RESULTS_COLLECTION_PERIOD): """ Conduct a single run of the model in its current configuration. Parameters: ---------- results_collection_period, float, optional default = DEFAULT_RESULTS_COLLECTION_PERIOD warm_up, float, optional (default=0) length of initial transient period to truncate from results. Returns: -------- None """ # setup the arrival generator process self.env.process(self.arrivals_generator()) # store rc period self.rc_period = results_collection_period # run self.env.run(until=results_collection_period) def arrivals_generator(self): """ Simulate the arrival of patients to the model Patients either follow a TraumaPathway or NonTraumaPathway simpy process. Non stationary arrivals implemented via Thinning acceptance-rejection algorithm. """ for patient_count in itertools.count(): # sample iat via thinning interarrival_time = self.args.thinning_dist.sample(self.env.now) # delay until next arrival yield self.env.timeout(interarrival_time) # sample if the patient is trauma or non-trauma trauma = self.args.p_trauma_dist.sample() if trauma: # create and store a trauma patient to update KPIs. new_patient = TraumaPathway(patient_count, self.env, self.args) self.trauma_patients.append(new_patient) else: # create and store a non-trauma patient to update KPIs. new_patient = NonTraumaPathway(patient_count, self.env, self.args) self.non_trauma_patients.append(new_patient) # start the pathway process for the patient self.env.process(new_patient.execute()) ``` ### 8. Logic to process end of run results. the class `SimulationSummary` accepts a `TraumaCentreModel`. At the end of a run it can be used calculate mean queuing times and the percentage of the total run that a resource was in use. ```{python} class SimulationSummary: """ End of run result processing logic of the simulation model """ def __init__(self, model): """ Constructor Params: ------ model: TraumaCentreModel The model. """ self.model = model self.args = model.args self.results = None def process_run_results(self): """ Calculates statistics at end of run. """ self.results = {} # list of all patients patients = self.model.non_trauma_patients + self.model.trauma_patients # mean triage times (both types of patient) mean_triage_wait = self.get_mean_metric("wait_triage", patients) # triage utilisation (both types of patient) triage_util = self.get_resource_util( "triage_duration", self.args.n_triage, patients ) # mean waiting time for registration (non_trauma) mean_reg_wait = self.get_mean_metric("wait_reg", self.model.non_trauma_patients) # registration utilisation (trauma) reg_util = self.get_resource_util( "reg_duration", self.args.n_reg, self.model.non_trauma_patients ) # mean waiting time for examination (non_trauma) mean_wait_exam = self.get_mean_metric( "wait_exam", self.model.non_trauma_patients ) # examination utilisation (non-trauma) exam_util = self.get_resource_util( "exam_duration", self.args.n_exam, self.model.non_trauma_patients ) # mean waiting time for treatment (non-trauma) mean_treat_wait = self.get_mean_metric( "wait_treat", self.model.non_trauma_patients ) # treatment utilisation (non_trauma) treat_util1 = self.get_resource_util( "treat_duration", self.args.n_cubicles_1, self.model.non_trauma_patients ) # mean total time (non_trauma) mean_total = self.get_mean_metric("total_time", self.model.non_trauma_patients) # mean waiting time for trauma mean_trauma_wait = self.get_mean_metric( "wait_trauma", self.model.trauma_patients ) # trauma utilisation (trauma) trauma_util = self.get_resource_util( "trauma_duration", self.args.n_trauma, self.model.trauma_patients ) # mean waiting time for treatment (trauma) mean_treat_wait2 = self.get_mean_metric( "wait_treat", self.model.trauma_patients ) # treatment utilisation (trauma) treat_util2 = self.get_resource_util( "treat_duration", self.args.n_cubicles_2, self.model.trauma_patients ) # mean total time (trauma) mean_total2 = self.get_mean_metric("total_time", self.model.trauma_patients) self.results = { "00_arrivals": len(patients), "01a_triage_wait": mean_triage_wait, "01b_triage_util": triage_util, "02a_registration_wait": mean_reg_wait, "02b_registration_util": reg_util, "03a_examination_wait": mean_wait_exam, "03b_examination_util": exam_util, "04a_treatment_wait(non_trauma)": mean_treat_wait, "04b_treatment_util(non_trauma)": treat_util1, "05_total_time(non-trauma)": mean_total, "06a_trauma_wait": mean_trauma_wait, "06b_trauma_util": trauma_util, "07a_treatment_wait(trauma)": mean_treat_wait2, "07b_treatment_util(trauma)": treat_util2, "08_total_time(trauma)": mean_total2, "09_throughput": self.get_throughput(patients), } def get_mean_metric(self, metric, patients): """ Calculate mean of the performance measure for the select cohort of patients, Only calculates metrics for patients where it has been measured. Params: ------- metric: str The name of the metric e.g. 'wait_treat' patients: list A list of patients """ mean = np.array( [getattr(p, metric) for p in patients if getattr(p, metric) > -np.inf] ).mean() return mean def get_resource_util(self, metric, n_resources, patients): """ Calculate proportion of the results collection period where a resource was in use. Done by tracking the duration by patient. Only calculates metrics for patients where it has been measured. Params: ------- metric: str The name of the metric e.g. 'treatment_duration' patients: list A list of patients """ total = np.array( [getattr(p, metric) for p in patients if getattr(p, metric) > -np.inf] ).sum() return total / (self.model.rc_period * n_resources) def get_throughput(self, patients): """ Returns the total number of patients that have successfully been processed and discharged in the treatment centre (they have a total time record) Params: ------- patients: list list of all patient objects simulated. Returns: ------ float """ return len([p for p in patients if p.total_time > -np.inf]) def summary_frame(self): """ Returns run results as a pandas.DataFrame Returns: ------- pd.DataFrame """ # append to results df if self.results is None: self.process_run_results() df = pd.DataFrame({"1": self.results}) df = df.T df.index.name = "rep" return df ``` ## 9. Model execution We note that there are **many ways** to setup a `simpy` model and execute it (that is part of its fantastic flexibility). The organisation of code we show below is based on our experience of using the package in practice. The approach also allows for easy parallisation over multiple CPU cores using `joblib`. We include two functions. `single_run()` and `multiple_replications`. The latter is used to repeatedly call and process the results from `single_run`. ```{python} def single_run( scenario, rc_period=DEFAULT_RESULTS_COLLECTION_PERIOD, random_no_set=DEFAULT_RNG_SET ): """ Perform a single run of the model and return the results Parameters: ----------- scenario: Scenario object The scenario/paramaters to run rc_period: int The length of the simulation run that collects results random_no_set: int or None, optional (default=DEFAULT_RNG_SET) Controls the set of random seeds used by the stochastic parts of the model. Set to different ints to get different results. Set to None for a random set of seeds. Returns: -------- pandas.DataFrame: results from single run. """ # set random number set - this controls sampling for the run. scenario.set_random_no_set(random_no_set) # create an instance of the model model = TreatmentCentreModel(scenario) # run the model model.run(results_collection_period=rc_period) # run results summary = SimulationSummary(model) summary_df = summary.summary_frame() return summary_df ``` ```{python} def multiple_replications( scenario, rc_period=DEFAULT_RESULTS_COLLECTION_PERIOD, n_reps=5 ): """ Perform multiple replications of the model. Params: ------ scenario: Scenario Parameters/arguments to configurethe model rc_period: float, optional (default=DEFAULT_RESULTS_COLLECTION_PERIOD) results collection period. the number of minutes to run the model to collect results n_reps: int, optional (default=DEFAULT_N_REPS) Number of independent replications to run. Returns: -------- pandas.DataFrame """ results = [ single_run(scenario, rc_period, random_no_set=rep) for rep in range(n_reps) ] # format and return results in a dataframe df_results = pd.concat(results) df_results.index = np.arange(1, len(df_results) + 1) df_results.index.name = "rep" return df_results ``` ### 9.1 Single run of the model The script below performs a single replication of the simulation model. **Try:** * Changing the `random_no_set` of the `single_run` call. * Assigning the value `True` to `debug` * Tracking patient arrivals 10 and 19 through the model by setting `tracked=[10, 19]` ```{python} # create the default scenario. args = Scenario(debug=True, tracked=[10, 19]) # use the single_run() func # try changing `random_no_set` to see different run results print("Running simulation ...", end=" => ") results = single_run(args, random_no_set=42) print("simulation complete.") # show results (transpose replication for easier view) print(results.T) ``` ### 9.2 Disabling trace for multiple independent replications **Important**: set `debug=False` in `Scenario` for multiple replications. This will ensure that each replication is not traced. ```{python} args = Scenario(debug=False) # run multiple replications. results = multiple_replications(args, n_reps=50) print("All replications complete.") # summarise the results (2.dp) results.mean().round(2) ``` ### Using the example `treat-sim` model For examples in this section, we'll use the `treat-sim` simulation model. This is an example model of a terminating system based on exercise 13 from page 170 of: > Nelson. B.L. (2013). Foundations and methods of stochastic simulation. Springer. ## Installing the model The model is available on [GitHub](https://github.com/pythonhealthdatascience/stars-treat-sim), and can be installed from [PyPI](https://pypi.org/project/treat-sim/): ```bash pip install treat-sim ``` ```{python} # pylint: disable=missing-module-docstring from treat_sim.model import Scenario, single_run, multiple_replications ``` ## Running a single replication 1. **Create an instance of `Scenario`**. This contains all the parameters used in the model (e.g. resources and statistical distributions for arrivals and activities). 2. **Run the model once with `single_run()`**. This function is passed an instance of `Scenario` and an integer to set the random numbers used. This returns a `pandas.DataFrame` with all performance measures for that replication, e.g. waiting times, utilisation, throughput. ```{python} scenario = Scenario() # Run a single replication results = single_run(scenario, random_no_set=1) print(results.T) ``` You can extract a single metric like this: ```{python} print(results["01a_triage_wait"].iloc[0]) ``` ## Running multiple replications Running several replications at once is very similar to running a single replication — you just use the `multiple_replications()` function instead. You'll need to provide an instance of `Scenario`, along with the number of replications you want to run. ```{python} scenario = Scenario() # Run multiple replications results = multiple_replications(scenario, n_reps=20) results.head() ``` ### Choosing the number of replications When running a simulation, you need to decide how many replications (runs) are enough. The **confidence interval method** can be used to help guide this choice. **Note:** The examples below use the `treat-sim` model. If you haven't run it before, see [Using the example `treat-sim` model](01-treatsim.qmd) for set-up and basic usage. ## Imports ```{python} # pylint: disable=missing-module-docstring from treat_sim.model import Scenario, multiple_replications from sim_tools.output_analysis import ( confidence_interval_method, plotly_confidence_interval_method, ) ``` ## Confidence interval method In this method, you first **run the simulation** for a set number of replications. Due to stochasticity, each will produce slightly different averages for each performance metric. Once the runs are complete, you go step-by-step (for the first run, then the first two runs, the first three runs, and so on) calculating: * **Cumulative mean** * **Confidence interval** around that mean As the number of replications included increases, you'll typically see the interval narrows. The required number of replications is the point where you feel results are **stable** - i.e. that doing more replications is unlikely to change your conclusions in a meaningful way. You can decided this by setting a **desired precision**. Precision here means the **percentage deviation** of the confidence interval's half-width from the mean. For example, if the precision is set to `0.1`, it will identify the point where the half-width of the confidence interval is less than or equal to 10% of the mean. ## Example: Single performance metric The function returns a **tuple** consisting of: 1. The minimum number of replications to achieve the desired precision. 2. A detailed DataFrame of statistics for each stage. ```{python} scenario = Scenario() rep_results = multiple_replications(scenario, n_reps=150) confint_result = confidence_interval_method( replications=rep_results["01a_triage_wait"], desired_precision=0.1 ) # View results print(confint_result[0]) confint_result[1].head() ``` ## Visualise results You can plot how the confidence interval narrows as you add more runs using the `plotly_confidence_interval_method` function. ```{python} plotly_confidence_interval_method( n_reps=confint_result[0], conf_ints=confint_result[1], metric_name="01a_triage_wait" ) ``` ## Running on multiple performance metrics You can check several outcomes at once. Just pass multiple columns to `confidence_interval_method`. This will return a dictionary, with a tuple for each metric. ```{python} confint_multiple = confidence_interval_method( replications=rep_results[ ["01a_triage_wait", "01b_triage_util", "02a_registration_wait"] ], desired_precision=0.1, ) # View output dictionary keys print(confint_multiple.keys()) ``` ```{python} # View results from one of the metrics print(confint_multiple["02a_registration_wait"][0]) confint_multiple["02a_registration_wait"][1].head() ``` ### Automating the selection of the number of replications The `sim-tools` package includes an implementation of the "replications algorithm" from [Hoad, Robinson, & Davies (2010)](https://www.jstor.org/stable/40926090), used to automatically determine the number of simulation replications needed to achieve a user-specified confidence interval (CI) precision. It works by combining: * **The confidence interval method** (described on [the previous page](02-confint_method.qmd)), which estimates the required replications for a target CI half-width. * **A sequential look-ahead procedure** to check whether the desired CI precision is still met as additional replications are run. If you make use of this code in your work **please cite the `sim-tools` as well as the original authors article**: > Hoad, Robinson, & Davies (2010). Automated selection of the number of replications for a discrete-event simulation. *Journal of the Operational Research Society*. [https://www.jstor.org/stable/40926090](https://www.jstor.org/stable/40926090) **Note:** The examples below use the `treat-sim` model. If you haven't run it before, see [Using the example `treat-sim` model](01-treatsim.qmd) for set-up and basic usage. ## Imports ```{python} # pylint: disable=missing-module-docstring,invalid-name from collections.abc import Callable from typing import Optional from treat_sim.model import Scenario, single_run from sim_tools.output_analysis import ( ReplicationsAlgorithm, ReplicationsAlgorithmModelAdapter, ReplicationTabulizer, plotly_confidence_interval_method, ) ``` ## Replications Algorithm overview The look ahead component is governed by the formula below. A user specifies a look ahead period called $k_{limit}$. If $n \leq 100$ then a parameter $k_{limit}$ is used. Hoad et al.(2010) recommend a value of 5 based on extensive empirical work. If $n \geq 100$ then a fraction is returned. The notation $\lfloor$ and $\rfloor$ mean that the function must return an integer value (as it represents the number of replications to look ahead). $$ f(k_{limit}, n) = \left\lfloor \dfrac{k_{limit}}{100}\max(n, 100)\right\rfloor $$ In general $f(k_{limit}, n)$ means that as the number of replications increases so does the look ahead period. For example when you have run 50 replications a $k_{limit} = 5$ means you simulate an extra 5 replications. When you have run 800 replications, a $k_{limit} = 5$ means the number of extra replications you simulate is: $$ \left\lfloor \dfrac{5}{100} \times 800\right\rfloor = 40 $$ The paper provides a formal algorithm notation. Here we provide a pictorial representation of the general logic of the replications algorithm.  ## Observers The `ReplicationAlgorithm` requires an observer. This is an object whose job it is to watch each replication in your algorithm and record relevant statistics as the simulation runs. A suitable observer is supplied by default - `ReplicationTabulizer` - but you can also create your own, but it needs to follow this protocol: ```{.python} @runtime_checkable class AlgorithmObserver(Protocol): """ Protocol for observer classes used in ReplicationsAlgorithm. Classes implementing this protocol should provide a `dev` attribute to store observations, a method `update` to add new results, and a `summary_table` method to summarize the stored replication statistics. Attributes ---------- dev : List[Any] Collection of observed replication results. Methods ------- update(results) -> None Add an observation of a replication. summary_table() -> pd.DataFrame Create a DataFrame summarising all recorded replication statistics. """ dev: List[Any] def update(self, results) -> None: ... def summary_table(self) -> pd.DataFrame: ... ```