API¤
Reference implementation in Python¤
To keep the reference implementation as close to the math as possible, we define some utilities with Unicode symbols.
For example, 𝚷(i for i in ℤ[1,3]) is valid Python code for \(\prod_{i=1}^{3} i\).
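A minimal sketch of how such utilities could be defined is shown below. The class name `_InclusiveIntegers` and both definitions are assumptions made for illustration; the library's own helpers may be written differently.

```python
import math


class _InclusiveIntegers:
    """Hypothetical helper: ℤ[a, b] yields the integers a, a+1, ..., b."""

    def __getitem__(self, bounds):
        low, high = bounds
        # range() excludes its upper bound, so add 1 to make it inclusive.
        return range(low, high + 1)


# Python normalizes identifiers with NFKC, so ℤ is the same name as Z
# and 𝚷 is the same name as Π.
ℤ = _InclusiveIntegers()


def 𝚷(factors):
    """Product of all items of an iterable (empty product is 1)."""
    return math.prod(factors)


result = 𝚷(i for i in ℤ[1, 3])  # 1 * 2 * 3
```

With definitions like these, sums and products in the formulas can be transcribed almost symbol for symbol.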
gsd.gsd_prob(ψ, ρ, k)
¤
Reference implementation of GSD probabilities in pure python.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ψ | float | mean | required |
ρ | float | dispersion | required |
k | int | response | required |
Returns:
Type | Description |
---|---|
float | probability of response k |
JAX functions¤
Distribution functions implemented in JAX for speed and auto differentiation.
Currently, only the GSD with a 5-point scale is supported.
gsd.log_prob(psi, rho, k)
¤
Compute log probability of the response k for given parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
psi | ArrayLike | mean | required |
rho | ArrayLike | dispersion | required |
k | ArrayLike | response | required |
Returns:
Type | Description |
---|---|
Array | log of the probability in GSD distribution |
gsd.sample(psi, rho, shape, key)
¤
Sample from GSD
Parameters:
Name | Type | Description | Default |
---|---|---|---|
psi | ArrayLike | mean | required |
rho | ArrayLike | dispersion | required |
shape | Shape | sample shape | required |
key | Array | random key | required |
Returns:
Type | Description |
---|---|
Array | Array of shape `shape` |
gsd.mean(psi, rho)
¤
Mean of GSD distribution
gsd.variance(psi, rho)
¤
Variance of GSD distribution
gsd.sufficient_statistic(data)
¤
Compute GSD sufficient statistic from samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | ArrayLike | Samples from GSD, with data[i] in [1..5] | required |
Returns:
Type | Description |
---|---|
Array | Counts of each possible value |
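The idea of reducing samples to per-response counts can be sketched in plain Python as follows. The function names `count_responses` and `log_likelihood` are hypothetical and not part of the library. The second function illustrates why the counts are sufficient: the log-likelihood of the whole sample depends on the data only through them.

```python
import math


def count_responses(samples, num_categories=5):
    """Count how many times each response 1..num_categories occurs."""
    counts = [0] * num_categories
    for s in samples:
        counts[s - 1] += 1  # responses are 1-based
    return counts


def log_likelihood(counts, log_probs):
    """Log-likelihood of a sample given per-response log-probabilities.

    Zero counts are skipped so that 0 * (-inf) does not produce NaN.
    """
    return sum(c * lp for c, lp in zip(counts, log_probs) if c > 0)


counts = count_responses([1, 1, 5, 3, 3, 3])
```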
Fit¤
We provide a few estimators. The simplest one is based on moments.
A more advanced gradient-based maximum likelihood estimator is provided in gsd.experimental. We also provide a naive grid search MLE.
Besides the high-level API, one can use optimizers from scipy or tensorflow_probability.
gsd.fit_moments(data)
¤
Fits GSD using moment estimator
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | ArrayLike | An Array of counts of each response. | required |
Returns:
Type | Description |
---|---|
GSDParams | GSD Parameters |
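A moment estimator starts from the sample mean and variance, both of which can be computed directly from the counts. The sketch below shows only that first step. The function name `moments_from_counts` is hypothetical, the variance is normalized by n (an assumption), and the GSD-specific mapping from these moments to the dispersion parameter is not reproduced here.

```python
def moments_from_counts(counts):
    """Sample mean and (biased, divide-by-n) variance from response counts.

    counts[k] is the number of times response k + 1 was observed.
    """
    n = sum(counts)
    mean = sum(c * (k + 1) for k, c in enumerate(counts)) / n
    variance = sum(c * (k + 1 - mean) ** 2 for k, c in enumerate(counts)) / n
    return mean, variance


# Two observations, one at each end of a 5-point scale.
mean, variance = moments_from_counts([1, 0, 0, 0, 1])
```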
Constrained parameter space¤
gsd.fit.log_pmax(log_probs)
¤
Calculate the maximum of the log of the sum of two probabilities, given the logarithms of the probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
log_probs | Array | logarithms of probabilities | required |
Returns:
Type | Description |
---|---|
Array | Scalar array |
gsd.fit.allowed_region(log_probs, n)
¤
Compute whether the given log_probs satisfy the condition pmax <= 1-1/n, as described in Appendix D.
This is computed in the log domain as logpmax <= log(1-1/n).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
log_probs | Array | logarithms of probabilities | required |
n | Array | Total number of observations | required |
Returns:
Type | Description |
---|---|
Array | Binary array |
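The log-domain check can be sketched in plain Python as follows. This is an illustration only: it assumes that the "sum of two probabilities" refers to neighbouring responses, which the text above does not state, and the helper names are hypothetical. It also assumes n > 1, since log(1-1/n) is undefined for n = 1.

```python
import math


def log_add(a, b):
    """Stable log(exp(a) + exp(b)), tolerating -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_pmax_adjacent(log_probs):
    """Largest log(p_i + p_{i+1}) over neighbouring pairs (assumed pairing)."""
    return max(log_add(a, b) for a, b in zip(log_probs, log_probs[1:]))


def in_allowed_region(log_probs, n):
    """Check logpmax <= log(1 - 1/n) without leaving the log domain."""
    return log_pmax_adjacent(log_probs) <= math.log1p(-1.0 / n)


uniform = [math.log(0.2)] * 5
peaked = [math.log(0.98), math.log(0.01), math.log(0.01), -math.inf, -math.inf]
```

Working with logarithms avoids underflow when individual probabilities are very small.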
Structures¤
gsd.fit.GSDParams
¤
Bases: NamedTuple
NamedTuple representing parameters for the Generalized Score Distribution (GSD).
This class is used to store the psi and rho parameters for the GSD. It provides a convenient way to group these parameters together for use in various statistical and modeling applications.
Experimental¤
gsd.experimental.fit_mle(data, max_iterations=100, log_lr_min=-15, log_lr_max=2.0, num_lr=10, constrain_by_pmax=False)
¤
Finds the maximum likelihood estimator of the GSD parameters. The algorithm used here is a simple gradient ascent. We use the concept of projected gradient to enforce constraints for parameters (psi in [1, 5], rho in [0, 1]) and exhaustive search for line search along the gradient.
Since the mass function is not smooth, a gradient-based estimator can fail.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | ArrayLike | An array of counts for each response. | required |
max_iterations | int | Maximum number of iterations. | 100 |
log_lr_min | ArrayLike | Log2 of the smallest learning rate. | -15 |
log_lr_max | ArrayLike | Log2 of the largest learning rate. | 2.0 |
num_lr | ArrayLike | Number of learning rates to check during the line search. | 10 |
Returns:
Type | Description |
---|---|
tuple[GSDParams, OptState] | An opt state whose params field contains the estimated values of the GSD parameters |
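The combination of projected gradient ascent with an exhaustive line search over a log2-spaced set of learning rates can be sketched on a toy problem. Everything below is an assumption for illustration: the function `projected_gradient_ascent`, the quadratic objective, and the stopping rule are not taken from the library, and the objective is not the GSD likelihood. Only the box constraints and the meaning of `log_lr_min`, `log_lr_max`, and `num_lr` follow the description above.

```python
def clip(x, low, high):
    """Project a scalar onto the interval [low, high]."""
    return min(max(x, low), high)


def projected_gradient_ascent(f, grad, x0, bounds, max_iterations=100,
                              log_lr_min=-15, log_lr_max=2.0, num_lr=10):
    # Candidate learning rates, evenly spaced in log2 (assumes num_lr >= 2).
    step = (log_lr_max - log_lr_min) / (num_lr - 1)
    learning_rates = [2.0 ** (log_lr_min + i * step) for i in range(num_lr)]
    x = list(x0)
    for _ in range(max_iterations):
        g = grad(x)
        # Exhaustive line search: try every rate, project back into the box.
        candidates = [
            [clip(xi + lr * gi, lo, hi) for xi, gi, (lo, hi) in zip(x, g, bounds)]
            for lr in learning_rates
        ]
        best = max(candidates, key=f)
        if f(best) <= f(x):  # no candidate improves the objective
            break
        x = best
    return x


# Toy concave objective whose unconstrained maximum (3, 2) lies outside
# the box [1, 5] x [0, 1] in the second coordinate.
def objective(x):
    return -(x[0] - 3.0) ** 2 - (x[1] - 2.0) ** 2


def objective_grad(x):
    return [-2.0 * (x[0] - 3.0), -2.0 * (x[1] - 2.0)]


solution = projected_gradient_ascent(
    objective, objective_grad, x0=[1.5, 0.5], bounds=[(1.0, 5.0), (0.0, 1.0)]
)
```

In this toy run the second coordinate ends on the boundary of the box, which is the situation the projection step is meant to handle.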
gsd.experimental.fit_mle_grid(data, num, constrain_by_pmax=False)
¤
Fit GSD using naive grid search method.
This function uses numpy and cannot be used in jit.
>>> data = jnp.asarray([20, 0, 0, 0, 0.0])
>>> theta = fit_mle_grid(data, GSDParams(32, 32), False)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | ArrayLike | An array of counts for each response. | required |
num | GSDParams | Number of grid points to search for each parameter | required |
constrain_by_pmax | | Boolean flag: whether to add the constraint described in Appendix D | False |
Returns:
Type | Description |
---|---|
GSDParams | Fitted parameters |
gsd.experimental.g_test(n, p, m, q)
¤
G-test. Bogdan: "We use a bootstrap version (instead of the asymptotic one, because n is small) of the classical test known as the G-test, i.e. the likelihood ratio test."
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | Array | Observation counts | required |
p | Array | Estimated distribution | required |
m | Array | T bootstrap samples from the distribution | required |
q | Array | T estimated distributions for bootstrapped samples, array[T,5] | required |
Returns:
Type | Description |
---|---|
Array | G-test p-value |
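A bootstrap G-test can be sketched in plain Python as follows. This is a generic illustration under stated assumptions, not the library's implementation: the function names are hypothetical, and the p-value is taken to be the fraction of bootstrap statistics that are at least as large as the observed one. The G statistic itself is the standard \(G = 2\sum_i n_i \ln\frac{n_i}{N p_i}\), with terms for empty cells omitted.

```python
import math


def g_statistic(counts, probs):
    """Likelihood-ratio (G) statistic of counts against expected probabilities.

    Assumes probs[i] > 0 wherever counts[i] > 0.
    """
    total = sum(counts)
    return 2.0 * sum(
        c * math.log(c / (total * p)) for c, p in zip(counts, probs) if c > 0
    )


def bootstrap_p_value(n, p, m, q):
    """Fraction of bootstrap G statistics >= the observed G statistic.

    n: observed counts, p: distribution fitted to n,
    m: T bootstrap count vectors, q: T distributions fitted to them.
    """
    observed = g_statistic(n, p)
    bootstrapped = [g_statistic(m_t, q_t) for m_t, q_t in zip(m, q)]
    return sum(g >= observed for g in bootstrapped) / len(bootstrapped)


uniform = [0.2] * 5
p_value = bootstrap_p_value(
    n=[50, 0, 0, 0, 0],
    p=uniform,
    m=[[10, 10, 10, 10, 10], [50, 0, 0, 0, 0]],
    q=[uniform, uniform],
)
```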
gsd.experimental.pp_plot_data(data, estimator, key, n_bootstrap_samples)
¤
gsd.experimental.BootstrapResult
¤
Bases: NamedTuple
gsd.experimental.GridEstimator
¤
Bases: NamedTuple
Stateful MLE based on grid search
Parameters:
Name | Type | Description | Default |
---|---|---|---|
psis | | Grid of psi axis | required |
rhos | | Grid of rho axis | required |
lps | | Grid of | required |
__call__(data)
¤
Fit GSD using naive grid search method.
This is a stateful version of fit_mle_grid that supports jax.vmap and jax.jit.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Array | An array of counts for each response. | required |
Returns:
Type | Description |
---|---|
GSDParams | Fitted parameters |
make(num)
staticmethod
¤
Make a grid estimator for GSD. This estimator precomputes log probabilities for each answer on a regular grid.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num | GSDParams | Number of grid points | required |
Returns:
Type | Description |
---|---|
GridEstimator | Estimator |
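The precompute-then-look-up idea can be sketched in plain Python. The sketch is only an illustration: `toy_pmf` is a made-up two-parameter stand-in and is NOT the GSD mass function, and `make_grid` and `fit_on_grid` are hypothetical names. What it shows is the structure: log probabilities are tabulated once on a regular grid, and each fit is then an argmax of the counts-weighted sum over that table.

```python
import math


def toy_pmf(psi, rho):
    """Stand-in pmf on 1..5 (not the GSD): shifted binomial mixed with uniform."""
    q = (psi - 1) / 4
    binom = [math.comb(4, k) * q**k * (1 - q) ** (4 - k) for k in range(5)]
    return [rho * b + (1 - rho) / 5 for b in binom]


def make_grid(num_psi, num_rho):
    """Precompute log probabilities of every response on a regular grid."""
    psis = [1 + 4 * i / (num_psi - 1) for i in range(num_psi)]
    rhos = [j / (num_rho - 1) for j in range(num_rho)]
    lps = [
        [[math.log(p) if p > 0 else -math.inf for p in toy_pmf(psi, rho)]
         for rho in rhos]
        for psi in psis
    ]
    return psis, rhos, lps


def fit_on_grid(counts, psis, rhos, lps):
    """Return the grid point with the highest log-likelihood for the counts."""
    def loglik(i, j):
        # Skip empty cells so that 0 * (-inf) never appears.
        return sum(c * lp for c, lp in zip(counts, lps[i][j]) if c > 0)

    cells = ((i, j) for i in range(len(psis)) for j in range(len(rhos)))
    i, j = max(cells, key=lambda ij: loglik(*ij))
    return psis[i], rhos[j]


psis, rhos, lps = make_grid(5, 5)
fitted = fit_on_grid([20, 0, 0, 0, 0], psis, rhos, lps)
```

Because the table is built once, repeated fits (for example over bootstrap samples) only pay for the weighted sums and the argmax.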
gsd.experimental.OptState
¤
Bases: NamedTuple
A class representing the state of an optimization process. This class is used to store and manage the state of an optimization algorithm, allowing you to keep track of the current parameters, previous parameters, and the step count.
Attributes:
Name | Type | Description | Default |
---|---|---|---|
params | GSDParams | The current optimization parameters. | required |
count | int | An integer count indicating the step or iteration of the optimization process. | required |
Maximum entropy¤
The GSD can be considered a whole family of distributions with the following properties:
- It is a distribution over \([1,N]\)
- The first parameter represents expectation value
- It covers all possible variances
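For a distribution on the integers 1..N with a fixed mean, "all possible variances" is a bounded interval. The smallest variance is reached by putting all mass on the two integers nearest the mean, and the largest by putting all mass on the endpoints 1 and N. A short sketch, with the hypothetical helper name `variance_bounds`:

```python
import math


def variance_bounds(mean, n_categories=5):
    """Smallest and largest variance of a distribution on 1..n_categories
    with the given mean (1 <= mean <= n_categories)."""
    # Mass only on floor(mean) and ceil(mean); zero when mean is an integer.
    lower = (math.ceil(mean) - mean) * (mean - math.floor(mean))
    # Mass only on the endpoints 1 and n_categories.
    upper = (mean - 1) * (n_categories - mean)
    return lower, upper


bounds_at_3 = variance_bounds(3.0)
bounds_at_2_5 = variance_bounds(2.5)
```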
Another distribution that has similar properties, and can be considered a member of the GSD family, is the maximum entropy distribution.
gsd.experimental.MaxEntropyGSD
¤
Bases: Module
Maximum entropy distribution supported on Z[1,N]
This distribution is defined to fulfill the following conditions on \(p_i\):
- Maximize \(H= -\sum_i p_i\log(p_i)\) subject to
- \(\sum p_i=1\)
- \(\sum i p_i= \mu\)
- \(\sum (i-\mu)^2 p_i= \sigma^2\)
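Applying Lagrange multipliers to the conditions above gives an exponential form for the solution whenever all \(p_i\) are strictly positive. The symbols \(\lambda_0\), \(\lambda_1\), \(\lambda_2\), and \(Z\) are introduced here for illustration only. Setting the derivative of the Lagrangian with respect to \(p_i\) to zero,

\[
-\log p_i - 1 + \lambda_0 + \lambda_1 i + \lambda_2 (i-\mu)^2 = 0,
\]

yields

\[
p_i = \frac{1}{Z}\exp\left(\lambda_1 i + \lambda_2 (i-\mu)^2\right),
\qquad
Z = \sum_{j=1}^{N}\exp\left(\lambda_1 j + \lambda_2 (j-\mu)^2\right),
\]

where \(\lambda_1\) and \(\lambda_2\) are chosen so that \(\sum i p_i = \mu\) and \(\sum (i-\mu)^2 p_i = \sigma^2\) hold. At the extreme variances some \(p_i\) vanish and this form applies only as a limit.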
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mean | | Expectation value of the distribution. | required |
sigma | | Standard deviation of the distribution. | required |
N | | Number of responses | required |
from_gsd(theta, N)
staticmethod
¤
Create a maximum entropy distribution from GSD parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
theta | GSDParams | Parameters of a GSD distribution. | required |
N | int | Support size | required |
Returns:
Type | Description |
---|---|
MaxEntropyGSD | A distribution object |