distributions.GroupedContinuousEmpirical
Continuous Empirical Distribution for Grouped Data implementation.
Usage
distributions.GroupedContinuousEmpirical()
A distribution that performs linear interpolation between the lower and upper bounds of a discrete distribution. Useful for modeling empirical data with a continuous approximation.
This class conforms to the Distribution protocol and provides methods to sample from a continuous empirical distribution.
Attributes
| Name | Description |
|---|---|
| mean | Calculate the theoretical mean of the distribution. |
| variance | Calculate the theoretical variance of the GroupedContinuousEmpirical distribution. |
mean
Calculate the theoretical mean of the distribution.
mean: float
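The formula behind the mean is not spelled out here. Assuming values are spread uniformly within each bin (the same linear interpolation model the variance description relies on), the mean would be the probability-weighted average of the bin midpoints. A minimal NumPy sketch under that assumption follows; the helper name `grouped_mean` is hypothetical and not part of the library.

```python
import numpy as np


def grouped_mean(lower_bounds, upper_bounds, freq):
    """Hypothetical helper: mean of a piecewise-uniform grouped distribution."""
    lower = np.asarray(lower_bounds, dtype=float)
    upper = np.asarray(upper_bounds, dtype=float)
    freq = np.asarray(freq, dtype=float)

    probs = freq / freq.sum()          # relative frequency of each bin
    midpoints = (lower + upper) / 2.0  # mean of a uniform draw within each bin
    return float(np.sum(probs * midpoints))


# Bins [0, 1], [1, 2], [2, 3] with frequencies 10, 20, 30.
example_mean = grouped_mean([0, 1, 2], [1, 2, 3], [10, 20, 30])
```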
variance
Calculate the theoretical variance of the GroupedContinuousEmpirical distribution.
variance: float
The total variance is composed of two components:
1. Between-bin variance: variance arising from the differences between bin midpoints.
2. Within-bin variance: additional variance from the linear interpolation within each bin.
For a linear interpolation model, the within-bin component follows the variance formula for a uniform distribution: (bin_width)²/12 for each bin, weighted by the bin’s probability.
Notes
This calculation provides the exact theoretical variance of a continuous distribution created through linear interpolation between grouped data points. The formula accounts for both the positioning of the groups and the additional variance introduced by the interpolation process itself.
Simple variance calculations that only consider bin midpoints will underestimate the true variance of the interpolated distribution.
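To make the decomposition concrete, the sketch below computes both components with NumPy for the bins [0, 1], [1, 2], [2, 3] with frequencies 10, 20, 30. It is an illustration of the formula described above, not the library's own code; the helper name `grouped_variance` is hypothetical.

```python
import numpy as np


def grouped_variance(lower_bounds, upper_bounds, freq):
    """Hypothetical helper: return (between_bin, within_bin) variance components."""
    lower = np.asarray(lower_bounds, dtype=float)
    upper = np.asarray(upper_bounds, dtype=float)
    freq = np.asarray(freq, dtype=float)

    probs = freq / freq.sum()
    midpoints = (lower + upper) / 2.0
    mean = np.sum(probs * midpoints)

    # Between-bin: spread of the bin midpoints around the overall mean.
    between = np.sum(probs * (midpoints - mean) ** 2)
    # Within-bin: uniform variance (width^2 / 12), weighted by bin probability.
    within = np.sum(probs * (upper - lower) ** 2 / 12.0)
    return float(between), float(within)


between, within = grouped_variance([0, 1, 2], [1, 2, 3], [10, 20, 30])
total = between + within
```

For these bins the midpoint-only (between-bin) term is about 0.556 and the within-bin term adds about 0.083, so a midpoint-only calculation would understate the total of roughly 0.639.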
Example
>>> dist = GroupedContinuousEmpirical([0, 1, 2], [1, 2, 3], [10, 20, 30])
>>> dist.variance
0.6388888888888888
Methods
| Name | Description |
|---|---|
| __init__() | Initialize a continuous empirical distribution. |
| create_cumulative_probs() | Calculate cumulative relative frequency from frequency. |
| sample() | Sample from the Continuous Empirical Distribution. |
__init__()
Initialize a continuous empirical distribution.
Usage
__init__(lower_bounds, upper_bounds, freq, random_seed=None)
Parameters
lower_bounds: ArrayLike
Lower bounds of a discrete empirical distribution.
upper_bounds: ArrayLike
Upper bounds of a discrete empirical distribution.
freq: ArrayLike
Frequency of observations between bounds.
random_seed: Optional[Union[int, SeedSequence]] = None
A random seed or SeedSequence to reproduce samples. If None, a unique sample sequence is generated.
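The accepted types for random_seed match what NumPy's `numpy.random.default_rng` takes, so one plausible reading is that the seed is handed to a NumPy generator. That is an assumption; the class internals are not shown here. The sketch below uses only NumPy to illustrate the reproducibility behaviour the parameter describes.

```python
import numpy as np
from numpy.random import SeedSequence

# Assumption: the seed is passed on to a NumPy Generator.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
# The same integer seed gives identical random streams.
same = np.array_equal(rng_a.random(5), rng_b.random(5))

# A SeedSequence can be spawned into independent child seeds,
# which is useful when several distributions each need their own stream.
seed_seq = SeedSequence(12345)
child_a, child_b = seed_seq.spawn(2)
stream_a = np.random.default_rng(child_a).random(5)
stream_b = np.random.default_rng(child_b).random(5)
```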
create_cumulative_probs()
Calculate cumulative relative frequency from frequency.
Usage
create_cumulative_probs(freq)
Parameters
freq: ArrayLike
Frequency distribution.
Returns
NDArray[np.float64]
Cumulative relative frequency.
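As a rough illustration of what "cumulative relative frequency" means, the sketch below divides the running total of the frequencies by their sum. It is a stand-in written for this page under that assumption, not the library's implementation; the function name `cumulative_relative_frequency` is hypothetical.

```python
import numpy as np


def cumulative_relative_frequency(freq):
    """Hypothetical stand-in for create_cumulative_probs(freq)."""
    freq = np.asarray(freq, dtype=np.float64)
    # Running total of frequencies, scaled so the last entry equals 1.
    return np.cumsum(freq) / freq.sum()


# Frequencies 10, 20, 30 give cumulative shares of 1/6, 1/2 and 1.
cum_probs = cumulative_relative_frequency([10, 20, 30])
```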
sample()
Sample from the Continuous Empirical Distribution.
Usage
sample(size=None)
Parameters
size: Optional[Union[int, Tuple[int, ...]]] = None
The number/shape of samples to generate:
- If None: returns a single sample as a float
- If int: returns a 1-D array with that many samples
- If tuple of ints: returns an array with that shape
Returns
Union[float, NDArray[np.float64]]
Random samples from the continuous empirical distribution:
- A single float when size is None
- A NumPy array of floats with the shape given by the size parameter otherwise
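One common way to sample from grouped data with linear interpolation is inverse-transform sampling: draw a uniform number, find the bin whose cumulative probability range contains it, then interpolate between that bin's bounds. The sketch below shows that technique under the assumption that it matches the behaviour described; the function `sample_grouped` and its `rng` argument are hypothetical and not part of the library.

```python
import numpy as np


def sample_grouped(lower_bounds, upper_bounds, freq, size=None, rng=None):
    """Hypothetical sketch: inverse-transform sampling with linear interpolation."""
    lower = np.asarray(lower_bounds, dtype=float)
    upper = np.asarray(upper_bounds, dtype=float)
    freq = np.asarray(freq, dtype=float)
    rng = np.random.default_rng() if rng is None else rng

    cum_probs = np.cumsum(freq) / freq.sum()
    # Cumulative probability at the start of each bin.
    start_probs = np.concatenate(([0.0], cum_probs[:-1]))

    u = rng.random(size)  # a float if size is None, otherwise an array
    # Index of the first bin whose cumulative probability exceeds u.
    idx = np.searchsorted(cum_probs, u, side="right")
    idx = np.minimum(idx, len(freq) - 1)  # guard against rounding at the top end
    # Position of u inside its bin, as a fraction in [0, 1).
    frac = (u - start_probs[idx]) / (cum_probs[idx] - start_probs[idx])
    return lower[idx] + frac * (upper[idx] - lower[idx])


rng = np.random.default_rng(0)
single = sample_grouped([0, 1, 2], [1, 2, 3], [10, 20, 30], rng=rng)
batch = sample_grouped([0, 1, 2], [1, 2, 3], [10, 20, 30], size=(2, 3), rng=rng)
```

With size left as None the sketch returns a single float, and with an int or tuple it returns an array of that shape, mirroring the return contract listed above.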