Custom Models¶
This tutorial shows how to build your own likelihood models from scratch.
The LikelihoodModel Class¶
To create a custom model, you need two things:
- A log-likelihood as an s-expression
- A list of parameter names
Example: Custom Exponential¶
Let's rebuild the exponential distribution to see how it works.
The density is \(f(x \mid \lambda) = \lambda e^{-\lambda x}\), so the log-likelihood is \(\ell(\lambda) = \sum_{i=1}^{n} \left( \log\lambda - \lambda x_i \right)\).
In s-expression form:
from symlik import LikelihoodModel
log_lik = ['sum', 'i', ['len', 'x'],
           ['+', ['log', 'lambda'],
                 ['*', -1, ['*', 'lambda', ['@', 'x', 'i']]]]]
model = LikelihoodModel(log_lik, params=['lambda'])
# Test it
data = {'x': [1.0, 2.0, 3.0]}
mle, _ = model.mle(data=data, init={'lambda': 1.0}, bounds={'lambda': (0.01, 10)})
print(f"MLE: {mle['lambda']:.4f}") # Should be 1/mean = 0.5
Inspecting Symbolic Derivatives¶
The model computes derivatives automatically. You can inspect them:
# Score function (gradient)
score = model.score()
print(f"Score: {score}")
# Hessian matrix
hess = model.hessian()
print(f"Hessian: {hess}")
# Fisher information
info = model.information()
print(f"Information: {info}")
Example: Linear Regression¶
Least squares regression is maximum likelihood under normal errors.
For \(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\) with \(\epsilon_i \sim N(0, \sigma^2)\):
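With \(\sigma^2 = 1\) and additive constants dropped, the log-likelihood reduces to \(\ell(\beta_0, \beta_1) = -\tfrac{1}{2} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2\), i.e. negative one-half the residual sum of squares, so maximizing it is the same as minimizing least squares. This is the expression built below.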
from symlik import LikelihoodModel
# Assuming known sigma^2 = 1 for simplicity
residual = ['-', ['@', 'y', 'i'],
            ['+', 'beta0', ['*', 'beta1', ['@', 'x', 'i']]]]
log_lik = ['*', -0.5,
           ['sum', 'i', ['len', 'y'],
            ['^', residual, 2]]]
model = LikelihoodModel(log_lik, params=['beta0', 'beta1'])
# Generate data: y = 2 + 3x + noise
import numpy as np
np.random.seed(42)
x = np.linspace(0, 1, 20)
y = 2 + 3*x + np.random.normal(0, 0.5, 20)
data = {'x': x.tolist(), 'y': y.tolist()}
mle, _ = model.mle(data=data, init={'beta0': 0, 'beta1': 0})
print(f"Intercept: {mle['beta0']:.3f} (true: 2)")
print(f"Slope: {mle['beta1']:.3f} (true: 3)")
Example: Mixture Model (Simplified)¶
A mixture of two exponentials with known mixing proportion:
For known \(p = 0.5\), the log-likelihood of one observation is \(\log\!\left( 0.5\,\lambda_1 e^{-\lambda_1 x_i} + 0.5\,\lambda_2 e^{-\lambda_2 x_i} \right)\).
from symlik import LikelihoodModel
# Component densities: f_k(x_i) = lambda_k * exp(-lambda_k * x_i)
comp1 = ['*', 'lambda1', ['exp', ['*', -1, ['*', 'lambda1', ['@', 'x', 'i']]]]]
comp2 = ['*', 'lambda2', ['exp', ['*', -1, ['*', 'lambda2', ['@', 'x', 'i']]]]]
# Mixture (p=0.5)
mixture = ['*', 0.5, ['+', comp1, comp2]]
log_lik = ['sum', 'i', ['len', 'x'], ['log', mixture]]
model = LikelihoodModel(log_lik, params=['lambda1', 'lambda2'])
Note: Mixture models can have multiple local optima. Starting values matter.
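A common way to deal with local optima is to fit from several starting points and keep the solution with the highest log-likelihood. The sketch below reuses the mle and evaluate-style calls shown earlier; the simulated data, rates, and starting values are purely illustrative:
import numpy as np

np.random.seed(0)
# Illustrative data: 50/50 mixture of Exp(rate=1) and Exp(rate=5)
# (np.random.exponential takes the scale, i.e. 1/rate)
x = np.concatenate([np.random.exponential(1.0, 100),
                    np.random.exponential(0.2, 100)])
data = {'x': x.tolist()}

best = None
for init in [{'lambda1': 0.5, 'lambda2': 2.0},
             {'lambda1': 2.0, 'lambda2': 8.0}]:
    est, _ = model.mle(data=data, init=init,
                       bounds={'lambda1': (0.01, 20), 'lambda2': (0.01, 20)})
    ll = model.evaluate({**data, **est})
    if best is None or ll > best[0]:
        best = (ll, est)
print(best[1])  # estimates from the start with the highest log-likelihood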
Alternative Parameterizations¶
Sometimes a different parameterization improves optimization.
Log-transform for positive parameters:
Instead of estimating \(\lambda > 0\), estimate \(\theta = \log(\lambda)\):
# Use exp(theta) where you would use lambda
log_lik = ['sum', 'i', ['len', 'x'],
           ['+', 'theta',  # log(exp(theta)) = theta
            ['*', -1, ['*', ['exp', 'theta'], ['@', 'x', 'i']]]]]
model = LikelihoodModel(log_lik, params=['theta'])
# After fitting, transform back
mle, _ = model.mle(data=data, init={'theta': 0})
lambda_estimate = np.exp(mle['theta'])
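Refitting the data from the first example with this parameterization should recover the same estimate; the point of the transform is that no bounds argument is needed because \(\theta\) is unconstrained:
# Using data = {'x': [1.0, 2.0, 3.0]} from the first example, this should
# again give roughly 0.5; theta is free to range over the whole real line.
print(f"lambda estimate: {lambda_estimate:.4f}")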
Evaluating the Log-Likelihood¶
Check your log-likelihood by evaluating it:
from symlik import evaluate
# Direct evaluation
data = {'x': [1, 2, 3]}
params = {'lambda': 0.5}
ll_value = model.evaluate({**data, **params})
print(f"Log-likelihood at lambda=0.5: {ll_value:.4f}")
# Compare to manual calculation
import math
manual_ll = sum(math.log(0.5) - 0.5*xi for xi in data['x'])
print(f"Manual calculation: {manual_ll:.4f}")
Debugging Tips¶
Check dimensions. Make sure your summation bounds match your data:
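For example, the regression model above sums over ['len', 'y'] but the residual also indexes 'x', so both arrays must have the same length; a quick check on the data dictionary before fitting:
# The sum runs over ['len', 'y'], but the residual also indexes 'x'
assert len(data['x']) == len(data['y']), "x and y must have the same length"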
Test with simple data. Use small datasets where you can verify by hand.
Print intermediate values. Evaluate sub-expressions to find errors:
sub_expr = ['@', 'x', 1] # First element
print(evaluate(sub_expr, {'x': [10, 20, 30]})) # Should be 10
Check for numerical issues. Log of zero, division by zero, and overflow can cause problems. Use bounds to keep parameters in valid ranges.
Next: Symbolic Calculus¶
Learn to use symlik's calculus functions for standalone symbolic math.