Skip to content

Fitting Distributions

This tutorial covers fitting the pre-built distributions in symlik to data.

Exponential Distribution

Models waiting times, lifetimes, and inter-arrival times.

Density: \(f(x; \lambda) = \lambda e^{-\lambda x}\) for \(x \geq 0\)

MLE: \(\hat\lambda = 1/\bar{x}\)

from symlik.distributions import exponential
import numpy as np

# Simulate data with true rate = 2
np.random.seed(42)
data = {'x': np.random.exponential(scale=1/2, size=100).tolist()}

# Fit
model = exponential()
mle, _ = model.mle(data=data, init={'lambda': 1.0}, bounds={'lambda': (0.01, 10)})
se = model.se(mle, data)

print(f"True rate: 2.0")
print(f"Estimated: {mle['lambda']:.3f} (SE: {se['lambda']:.3f})")

Normal Distribution

The workhorse of statistics. symlik provides two versions.

Both Parameters Unknown

from symlik.distributions import normal

data = {'x': [4.2, 5.1, 4.8, 5.3, 4.9, 5.2, 4.7]}
model = normal()
mle, _ = model.mle(
    data=data,
    init={'mu': 0, 'sigma2': 1},
    bounds={'sigma2': (0.01, 100)}  # Variance must be positive
)
se = model.se(mle, data)

print(f"Mean: {mle['mu']:.3f} (SE: {se['mu']:.3f})")
print(f"Variance: {mle['sigma2']:.3f} (SE: {se['sigma2']:.3f})")

Known Variance

When variance is known (e.g., from prior studies), use normal_mean():

from symlik.distributions import normal_mean

data = {'x': [4.2, 5.1, 4.8, 5.3, 4.9]}
model = normal_mean(known_var=1.0)  # Assume sigma^2 = 1
mle, _ = model.mle(data=data, init={'mu': 0})

print(f"Mean: {mle['mu']:.3f}")

Poisson Distribution

Models count data: number of events in a fixed time/space.

Probability: \(P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\)

MLE: \(\hat\lambda = \bar{x}\)

from symlik.distributions import poisson

# Number of customer arrivals per hour
data = {'x': [3, 5, 2, 4, 6, 3, 4, 5, 2, 4]}
model = poisson()
mle, _ = model.mle(data=data, init={'lambda': 1.0}, bounds={'lambda': (0.01, 100)})
se = model.se(mle, data)

print(f"Rate: {mle['lambda']:.3f} (SE: {se['lambda']:.3f})")
print(f"Sample mean: {np.mean(data['x']):.3f}")  # Should match

Bernoulli Distribution

Models binary outcomes: success/failure, yes/no.

MLE: \(\hat{p} = k/n\) where \(k\) is number of successes

from symlik.distributions import bernoulli

# Survey responses: 1 = yes, 0 = no
data = {'x': [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]}
model = bernoulli()
mle, _ = model.mle(data=data, init={'p': 0.5}, bounds={'p': (0.01, 0.99)})
se = model.se(mle, data)

print(f"Proportion: {mle['p']:.3f} (SE: {se['p']:.3f})")

Gamma Distribution

Generalizes exponential. Models positive continuous data with varying shapes.

from symlik.distributions import gamma

# Simulate gamma data: shape=2, rate=1
np.random.seed(42)
data = {'x': np.random.gamma(shape=2, scale=1, size=200).tolist()}

model = gamma()
mle, _ = model.mle(
    data=data,
    init={'alpha': 1.0, 'beta': 1.0},
    bounds={'alpha': (0.1, 10), 'beta': (0.1, 10)}
)

print(f"Shape (alpha): {mle['alpha']:.3f}")
print(f"Rate (beta): {mle['beta']:.3f}")

Weibull Distribution

Models time-to-failure with varying hazard rates.

from symlik.distributions import weibull

# Simulate Weibull data
np.random.seed(42)
data = {'x': np.random.weibull(a=2, size=100).tolist()}

model = weibull()
mle, _ = model.mle(
    data=data,
    init={'k': 1.0, 'lambda': 1.0},
    bounds={'k': (0.1, 10), 'lambda': (0.1, 10)}
)

print(f"Shape (k): {mle['k']:.3f}")
print(f"Scale (lambda): {mle['lambda']:.3f}")

Beta Distribution

Models proportions and probabilities (data in \((0, 1)\)).

from symlik.distributions import beta

# Proportion data (e.g., percentage of time on task)
data = {'x': [0.3, 0.5, 0.4, 0.6, 0.35, 0.55, 0.45]}

model = beta()
mle, _ = model.mle(
    data=data,
    init={'alpha': 1.0, 'beta': 1.0},
    bounds={'alpha': (0.1, 10), 'beta': (0.1, 10)}
)

print(f"Alpha: {mle['alpha']:.3f}")
print(f"Beta: {mle['beta']:.3f}")

Tips for Fitting

Choose good starting values. Bad initial guesses can cause convergence issues. Use sample statistics as guides:

  • Exponential: start with 1/mean
  • Poisson: start with mean
  • Normal: start with sample mean and variance

Use appropriate bounds. Many parameters have natural constraints:

  • Rate/scale parameters: (0.01, upper)
  • Probabilities: (0.01, 0.99)
  • Shape parameters: (0.1, upper)

Check convergence. The MLE returns iteration count. If it hits max_iter, the optimizer may not have converged.

Next: Custom Models

When pre-built distributions are not enough, build your own models.