Your First Model¶
This guide walks through fitting an exponential distribution to data. Along the way, you will see how symlik handles the statistical computations.
The Setup¶
Suppose you have measurements of how long customers wait in a queue:
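# Eight observed waiting times (the same values are reused in the fitting code below)
data = {'x': [2.1, 0.8, 1.5, 3.2, 1.1, 0.9, 2.4, 1.8]}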
The exponential distribution is a natural model for waiting times. Its probability density is:
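\[ f(x; \lambda) = \lambda e^{-\lambda x}, \qquad x \ge 0 \]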
where \(\lambda\) is the rate parameter (higher means shorter waits), so the mean waiting time is \(1/\lambda\).
Fitting the Model¶
from symlik.distributions import exponential
# Create the model
model = exponential()
# Fit to data
data = {'x': [2.1, 0.8, 1.5, 3.2, 1.1, 0.9, 2.4, 1.8]}
mle, iterations = model.mle(data=data, init={'lambda': 1.0})
print(f"Estimated rate: {mle['lambda']:.4f}")
print(f"Converged in {iterations} iterations")
The mle() method uses Newton-Raphson optimization with symbolically computed derivatives.
Understanding the Output¶
The MLE for an exponential distribution is \(\hat\lambda = 1/\bar{x}\):
import numpy as np
sample_mean = np.mean(data['x'])
print(f"Sample mean: {sample_mean:.4f}")
print(f"1/mean: {1/sample_mean:.4f}")
print(f"MLE: {mle['lambda']:.4f}")
# These should match
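The closed form drops straight out of the log-likelihood: setting the score to zero gives
\[ \ell(\lambda) = n \log \lambda - \lambda \sum_i x_i, \qquad \frac{\partial \ell}{\partial \lambda} = \frac{n}{\lambda} - \sum_i x_i = 0 \;\Rightarrow\; \hat\lambda = \frac{1}{\bar{x}} \]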
Computing Standard Errors¶
Standard errors quantify uncertainty in your estimates.
The standard error comes from the observed Fisher information:
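\[ \widehat{\mathrm{SE}}(\hat\lambda) = \sqrt{I(\hat\lambda)^{-1}}, \qquad I(\hat\lambda) = -\left.\frac{\partial^2 \ell}{\partial \lambda^2}\right|_{\lambda = \hat\lambda} = \frac{n}{\hat\lambda^2} \]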
For the exponential distribution, this works out to \(\hat\lambda / \sqrt{n}\).
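This guide doesn't show the exact symlik call for standard errors, so as a stand-in the snippet below builds the se dictionary by hand from the closed form above; it is what the confidence-interval code in the next section expects.
import numpy as np
n = len(data['x'])
# Hand-computed stand-in for a library-provided standard error:
# for the exponential model, SE(lambda-hat) = lambda-hat / sqrt(n).
se = {'lambda': mle['lambda'] / np.sqrt(n)}
print(f"Standard error: {se['lambda']:.4f}")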
Confidence Intervals¶
Build a 95% confidence interval:
ci_lower = mle['lambda'] - 1.96 * se['lambda']
ci_upper = mle['lambda'] + 1.96 * se['lambda']
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
What Happened Behind the Scenes¶
When you called model.mle(), symlik:
- Differentiated the log-likelihood symbolically to get the score function
- Differentiated again to get the Hessian matrix
- Iterated Newton-Raphson: \(\theta_{\text{new}} = \theta - H^{-1} \nabla \ell\)
- Stopped when the score was close to zero (this loop is sketched below)
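For the exponential model the score and Hessian have closed forms (\(n/\lambda - \sum_i x_i\) and \(-n/\lambda^2\)), so the loop can be written out by hand. The sketch below is a plain NumPy illustration of the algorithm, not symlik's actual implementation:
import numpy as np
x = np.array(data['x'])
n = len(x)
lam = 1.0  # same starting point as init={'lambda': 1.0}
for step in range(1, 51):
    score = n / lam - x.sum()      # first derivative of the log-likelihood
    if abs(score) < 1e-10:         # stop once the score is (numerically) zero
        break
    hessian = -n / lam ** 2        # second derivative
    lam = lam - score / hessian    # Newton-Raphson update
print(f"Hand-rolled estimate: {lam:.4f} after {step} iterations")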
You can inspect these symbolic quantities:
# The score function (should be zero at MLE)
score = model.score()
print(f"Score expression: {score}")
# Evaluate at MLE
score_at_mle = model.score_at({**data, **mle})
print(f"Score at MLE: {score_at_mle}") # Should be ~0
Adding Bounds¶
Some parameters have natural constraints. For rate parameters, we need \(\lambda > 0\):
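The snippet below shows how a bound might be passed; the bounds keyword and its (lower, upper) format are assumptions for illustration, not a confirmed part of the mle() signature, so check the API reference for the exact spelling.
# Assumed keyword: 'bounds' and its (lower, upper) format are illustrative only.
mle, iterations = model.mle(
    data=data,
    init={'lambda': 1.0},
    bounds={'lambda': (1e-8, None)},  # keep lambda strictly positive
)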
Bounds keep the optimizer in valid parameter space.
Next Steps¶
- Learn the s-expression syntax to write your own models
- See Fitting Distributions for more distribution examples
- Build Custom Models for non-standard likelihoods