
API¤

Reference implementation in python¤

To keep the reference implementation as close to the math as possible, we define some utilities with Unicode symbols. For example, 𝚷(i for i in ℤ[1,3]) is valid Python code for

\(\prod_{i=1}^{3} i\)
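As an illustration of how such notation can work in plain Python, here is a minimal sketch. The helper names `_IntegerInterval`, `integers`, and `product` are hypothetical and are not taken from the library; only the 𝚷 and ℤ spellings come from the text above.

```python
import math


class _IntegerInterval:
    """Hypothetical helper: integers[a, b] yields a, a+1, ..., b (both ends included)."""

    def __getitem__(self, bounds):
        lo, hi = bounds
        return range(lo, hi + 1)


integers = _IntegerInterval()
product = math.prod  # product of all items of an iterable

# Unicode aliases. Python applies NFKC normalization to identifiers,
# so these symbols are legal names in source code.
ℤ = integers
𝚷 = product

print(𝚷(i for i in ℤ[1, 3]))  # 1 * 2 * 3 = 6
```

The indexing syntax is just `__getitem__` receiving the tuple `(1, 3)`, which is why no special parser support is needed.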

gsd.gsd_prob(ψ, ρ, k) ¤

Reference implementation of GSD probabilities in pure python.

Parameters:

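| Name | Type | Description | Default |
|------|------|-------------|---------|
| ψ | float | mean | required |
| ρ | float | dispersion | required |
| k | int | response | required |

Returns:

| Type | Description |
|------|-------------|
| float | probability of response k |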

JAX functions¤

Distribution functions implemented in JAX for speed and auto differentiation.

Currently, only the GSD on a 5-point scale is supported.

gsd.log_prob(psi, rho, k) ¤

Compute log probability of the response k for given parameters.

Parameters:

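| Name | Type | Description | Default |
|------|------|-------------|---------|
| psi | ArrayLike | mean | required |
| rho | ArrayLike | dispersion | required |
| k | ArrayLike | response | required |

Returns:

| Type | Description |
|------|-------------|
| Array | log of the probability in GSD distribution |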


gsd.sample(psi, rho, shape, key) ¤

Sample from GSD

Parameters:

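| Name | Type | Description | Default |
|------|------|-------------|---------|
| psi | ArrayLike | mean | required |
| rho | ArrayLike | dispersion | required |
| shape | Shape | sample shape | required |
| key | Array | random key | required |

Returns:

| Type | Description |
|------|-------------|
| Array | Array of samples with the shape given by the shape argument |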


gsd.mean(psi, rho) ¤

Mean of GSD distribution


gsd.variance(psi, rho) ¤

Variance of GSD distribution


gsd.sufficient_statistic(data) ¤

Compute GSD sufficient statistic from samples.

Parameters:

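| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | ArrayLike | Samples from the GSD; each data[i] is in [1..5] | required |

Returns:

| Type | Description |
|------|-------------|
| Array | Counts of each possible value |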
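To make the idea of the sufficient statistic concrete, the following is a plain-Python sketch of turning raw responses into per-category counts. The function name `count_responses` and the `n_categories` argument are illustrative assumptions; this is not the library's JAX implementation.

```python
def count_responses(samples, n_categories=5):
    """Count how often each response 1..n_categories occurs in samples."""
    counts = [0] * n_categories
    for s in samples:
        if not 1 <= s <= n_categories:
            raise ValueError(f"response {s} is outside [1..{n_categories}]")
        counts[s - 1] += 1  # response k is stored at index k - 1
    return counts


print(count_responses([1, 1, 3, 5, 5, 5]))  # [2, 0, 1, 0, 3]
```

The order of the samples is discarded, which is fine here because the likelihood of independent responses depends only on how many times each value was observed.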

Fit¤

We provide a few estimators. The simplest one is based on moments. A more advanced gradient-based maximum likelihood estimator (MLE) is provided in gsd.experimental. We also provide a naive grid search MLE. Besides the high-level API, one can use optimizers from scipy or tensorflow_probability.

gsd.fit_moments(data) ¤

Fits GSD using moment estimator

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | ArrayLike | An Array of counts of each response. | required |

Returns:

| Type | Description |
|------|-------------|
| GSDParams | GSD Parameters |
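A moment estimator starts from the sample mean and variance of the responses. The sketch below computes both directly from an array of counts; it is a plain-Python illustration under the assumption of a population (divide-by-n) variance, and it deliberately stops before the mapping from these moments to psi and rho, which is not described on this page. The name `moments_from_counts` is hypothetical.

```python
def moments_from_counts(counts):
    """Return (mean, variance) of responses 1..len(counts) given their counts."""
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one observation is required")
    mean = sum(k * c for k, c in enumerate(counts, start=1)) / n
    # Population variance (divides by n); this choice is an assumption.
    variance = sum(c * (k - mean) ** 2 for k, c in enumerate(counts, start=1)) / n
    return mean, variance


mean, variance = moments_from_counts([1, 2, 4, 2, 1])
print(mean, variance)
```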

Constrained parameter space¤

gsd.fit.log_pmax(log_probs) ¤

Calculate the maximum of the log of the sum of two probabilities, given the logarithms of the probabilities.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| log_probs | Array | logarithms of probabilities | required |

Returns:

| Type | Description |
|------|-------------|
| Array | Scalar array |
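The log of a sum of two probabilities can be computed without leaving the log domain. The sketch below shows that trick in plain Python and then takes the maximum over pairs. Which pairs the library considers is not stated on this page; using adjacent responses is an assumption made only for this illustration, and both function names are hypothetical.

```python
import math


def log_add(a, b):
    """Return log(exp(a) + exp(b)), computed stably from the logs a and b."""
    hi, lo = max(a, b), min(a, b)
    if hi == -math.inf:  # both probabilities are zero
        return -math.inf
    return hi + math.log1p(math.exp(lo - hi))


def log_max_adjacent_pair_sum(log_probs):
    """Max over adjacent pairs of log(p_i + p_{i+1}) (pairing is an assumption)."""
    return max(log_add(a, b) for a, b in zip(log_probs, log_probs[1:]))


log_probs = [math.log(p) for p in (0.1, 0.2, 0.4, 0.2, 0.1)]
print(math.exp(log_max_adjacent_pair_sum(log_probs)))
```

Factoring out the larger term keeps the exponential in `log_add` at or below 1, which avoids overflow and limits the loss of precision for very small probabilities.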

gsd.fit.allowed_region(log_probs, n) ¤

Check whether the given log_probs satisfy the condition pmax <= 1-1/n described in Appendix D. The check is computed in the log domain as logpmax <= log(1-1/n).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| log_probs | Array | logarithms of probabilities | required |
| n | Array | Total number of observations | required |

Returns:

| Type | Description |
|------|-------------|
| Array | Binary array |
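The log-domain comparison itself is short. The following sketch assumes the value logpmax has already been computed and only illustrates the inequality logpmax <= log(1-1/n); the function name `in_allowed_region` and the handling of n = 1 are assumptions, not the library's behavior.

```python
import math


def in_allowed_region(log_pmax, n):
    """Return True when log_pmax <= log(1 - 1/n)."""
    # log1p(-1/n) evaluates log(1 - 1/n) accurately for large n.
    # For n == 1 the bound is log(0); treating it as -inf is an assumption.
    bound = math.log1p(-1.0 / n) if n > 1 else -math.inf
    return log_pmax <= bound


print(in_allowed_region(math.log(0.6), 20))   # 0.6 <= 0.95
print(in_allowed_region(math.log(0.99), 20))  # 0.99 > 0.95
```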

Structures¤

gsd.fit.GSDParams ¤

Bases: NamedTuple

NamedTuple representing parameters for the Generalized Score Distribution (GSD).

This class is used to store the psi and rho parameters for the GSD. It provides a convenient way to group these parameters together for use in various statistical and modeling applications.

Experimental¤

gsd.experimental.fit_mle(data, max_iterations=100, log_lr_min=-15, log_lr_max=2.0, num_lr=10, constrain_by_pmax=False) ¤

Finds the maximum likelihood estimator of the GSD parameters. The algorithm is a simple gradient ascent. A projected gradient enforces the parameter constraints (psi in [1, 5], rho in [0, 1]), and an exhaustive search over candidate learning rates serves as the line search along the gradient.

Since the mass function is not smooth, a gradient-based estimator can fail.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | ArrayLike | An array of counts for each response. | required |
| max_iterations | int | Maximum number of iterations. | 100 |
| log_lr_min | ArrayLike | Log2 of the smallest learning rate. | -15 |
| log_lr_max | ArrayLike | Log2 of the largest learning rate. | 2.0 |
| num_lr | ArrayLike | Number of learning rates to check during the line search. | 10 |

Returns:

| Type | Description |
|------|-------------|
| tuple[GSDParams, OptState] | An opt state whose params field contains the estimated values of the GSD parameters |
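The combination of projected gradient ascent with an exhaustive line search over learning rates spaced evenly in log2 can be sketched on a toy problem. Everything below is illustrative: the objective is a simple concave function standing in for the log-likelihood, the function names are hypothetical, and the stopping rule is an assumption. Only the box constraints and the meaning of log_lr_min, log_lr_max, and num_lr follow the description above.

```python
def clip(value, lo, hi):
    """Project a scalar onto the interval [lo, hi]."""
    return min(max(value, lo), hi)


def projected_gradient_ascent(objective, gradient, start, bounds,
                              max_iterations=100, log_lr_min=-15.0,
                              log_lr_max=2.0, num_lr=10):
    # Candidate learning rates 2**e, with e evenly spaced (requires num_lr >= 2).
    spacing = (log_lr_max - log_lr_min) / (num_lr - 1)
    learning_rates = [2.0 ** (log_lr_min + i * spacing) for i in range(num_lr)]
    point = [clip(x, lo, hi) for x, (lo, hi) in zip(start, bounds)]
    for _ in range(max_iterations):
        grad = gradient(point)
        # Exhaustive line search: try every step size, projecting each result.
        candidates = [
            [clip(x + lr * g, lo, hi) for x, g, (lo, hi) in zip(point, grad, bounds)]
            for lr in learning_rates
        ]
        best = max(candidates, key=objective)
        if objective(best) <= objective(point):
            break  # no candidate improves the objective (assumed stopping rule)
        point = best
    return point


def toy_objective(x):
    # Stand-in for a log-likelihood; its unconstrained maximum is at (3, 2).
    return -((x[0] - 3.0) ** 2) - (x[1] - 2.0) ** 2


def toy_gradient(x):
    return [-2.0 * (x[0] - 3.0), -2.0 * (x[1] - 2.0)]


bounds = [(1.0, 5.0), (0.0, 1.0)]  # same box as psi in [1, 5], rho in [0, 1]
estimate = projected_gradient_ascent(toy_objective, toy_gradient, [1.0, 0.5], bounds)
print(estimate)
```

Because the unconstrained maximum lies outside the box in the second coordinate, the projection pins that coordinate to the boundary value 1 while the first coordinate converges to 3.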

gsd.experimental.fit_mle_grid(data, num, constrain_by_pmax=False) ¤

Fit GSD using a naive grid search method. This function uses numpy and cannot be used under jax.jit.

>>> data = jnp.asarray([20, 0, 0, 0, 0.0])
>>> theta = fit_mle_grid(data, GSDParams(32, 32), False)

Parameters:

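| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | ArrayLike | An array of counts for each response. | required |
| num | GSDParams | Number of grid points to search for each parameter | required |
| constrain_by_pmax | | Boolean flag indicating whether to add the constraint described in Appendix D | False |

Returns:

| Type | Description |
|------|-------------|
| GSDParams | Fitted parameters |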
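The principle of a grid search MLE is to evaluate the count-weighted log-likelihood at every grid point and keep the best one. The sketch below shows this with a one-parameter stand-in model (a shifted binomial on 1..5 with mean psi), because the GSD mass function is not reproduced on this page. The stand-in model and all function names are assumptions made for illustration; the real function searches over both psi and rho.

```python
import math


def stand_in_log_pmf(psi, k):
    """Log pmf of a shifted binomial on 1..5 with mean psi (NOT the GSD)."""
    p = (psi - 1.0) / 4.0
    prob = math.comb(4, k - 1) * p ** (k - 1) * (1.0 - p) ** (5 - k)
    return math.log(prob) if prob > 0.0 else -math.inf


def log_likelihood(psi, counts):
    """Sum of counts[k] * log P(k); zero counts are skipped to avoid 0 * -inf."""
    return sum(c * stand_in_log_pmf(psi, k)
               for k, c in enumerate(counts, start=1) if c > 0)


def grid_search_mle(counts, num):
    """Return the grid point in [1, 5] with the highest log-likelihood."""
    grid = [1.0 + 4.0 * i / (num - 1) for i in range(num)]
    return max(grid, key=lambda psi: log_likelihood(psi, counts))


print(grid_search_mle([0, 0, 20, 0, 0], 33))
```

The accuracy of the estimate is limited by the grid spacing, which is the price paid for not needing gradients of a non-smooth mass function.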

gsd.experimental.g_test(n, p, m, q) ¤

G-test. Bogdan: "We use the bootstrap version (rather than the asymptotic one, because n is small) of the classical test known as the G-test, i.e. the likelihood-ratio test."

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| n | Array | Observation counts \((n_1, n_2, n_3, n_4, n_5)\), a 1d array | required |
| p | Array | Estimated distribution \((p_1, p_2, p_3, p_4, p_5)\), a 1d array | required |
| m | Array | T bootstrap samples from distribution \(p\), an array of shape [T, 5] | required |
| q | Array | T estimated distributions for the bootstrapped samples, an array of shape [T, 5] | required |

Returns:

| Type | Description |
|------|-------------|
| Array | G-test p-value |
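A bootstrap G-test compares the observed G statistic with the G statistics of the bootstrap samples. The following plain-Python sketch uses the textbook statistic \(G = 2\sum_i n_i \ln\frac{n_i}{N p_i}\) and takes the p-value as the fraction of bootstrap statistics at least as large as the observed one. The function names, the use of >= rather than >, and the assumption that every observed category has nonzero estimated probability are illustrative choices, not confirmed details of the library.

```python
import math


def g_statistic(counts, probs):
    """G = 2 * sum(n_i * ln(n_i / (N * p_i))); empty categories contribute 0."""
    total = sum(counts)
    return 2.0 * sum(c * math.log(c / (total * p))
                     for c, p in zip(counts, probs) if c > 0)


def bootstrap_p_value(n, p, m, q):
    """Fraction of bootstrap G statistics that are >= the observed one."""
    g_observed = g_statistic(n, p)
    g_bootstrap = [g_statistic(m_t, q_t) for m_t, q_t in zip(m, q)]
    return sum(g >= g_observed for g in g_bootstrap) / len(g_bootstrap)


uniform = [0.2] * 5
m = [[4, 4, 4, 4, 4], [8, 3, 3, 3, 3]]  # T = 2 bootstrap count vectors
q = [uniform, uniform]                   # distributions fitted to each of them
print(bootstrap_p_value([8, 3, 3, 3, 3], uniform, m, q))
```

In practice T would be much larger than 2; the tiny value here only keeps the example easy to check by hand.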

gsd.experimental.pp_plot_data(data, estimator, key, n_bootstrap_samples) ¤

gsd.experimental.BootstrapResult ¤

Bases: NamedTuple

gsd.experimental.GridEstimator ¤

Bases: NamedTuple

Stateful MLE based on grid search

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| psis | | Grid of psi axis | required |
| rhos | | Grid of rho axis | required |
| lps | | Grid of log_prob for each answer and each entry in the axes. | required |

__call__(data) ¤

Fit GSD using naive grid search method. This is a stateful version of fit_mle_grid that supports jax.vmap and jax.jit.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| data | Array | An array of counts for each response. | required |

Returns:

| Type | Description |
|------|-------------|
| GSDParams | Fitted parameters |

make(num) staticmethod ¤

Make a grid estimator for GSD. This estimator precomputes log probabilities for each answer on a regular grid.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| num | GSDParams | Number of grid points | required |

Returns:

| Type | Description |
|------|-------------|
| GridEstimator | Estimator |

gsd.experimental.OptState ¤

Bases: NamedTuple

A class representing the state of an optimization process. This class is used to store and manage the state of an optimization algorithm, allowing you to keep track of the current parameters, previous parameters, and the step count.

Attributes:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| params | GSDParams | The current optimization parameters. | required |
| count | int | An integer count indicating the step or iteration of the optimization process. | required |

Maximum entropy¤

The GSD can be considered a whole family of distributions with the following properties:

  1. It is a distribution over \([1,N]\)
  2. The first parameter represents the expectation value
  3. It covers all possible variances

Another distribution that has similar properties, and can therefore be considered a member of the GSD family, is the maximum entropy distribution.

gsd.experimental.MaxEntropyGSD ¤

Bases: Module

Maximum entropy distribution supported on Z[1,N]

This distribution is defined to fulfill the following conditions on \(p_i\)

  • Maximize \(H= -\sum_i p_i\log(p_i)\) subject to
  • \(\sum p_i=1\)
  • \(\sum i p_i= \mu\)
  • \(\sum (i-\mu)^2 p_i= \sigma^2\)
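The shape of the solution follows from a standard Lagrange multiplier argument, sketched here under the assumption that all \(p_i > 0\). The multipliers \(\lambda_0, \lambda_1, \lambda_2\) and the normalizing constant \(Z\) are notation introduced for this sketch; how the library solves for them is not described on this page.

\[
\mathcal{L} = -\sum_i p_i \log p_i + \lambda_0\Big(\sum_i p_i - 1\Big) + \lambda_1\Big(\sum_i i\,p_i - \mu\Big) + \lambda_2\Big(\sum_i (i-\mu)^2 p_i - \sigma^2\Big)
\]

\[
\frac{\partial \mathcal{L}}{\partial p_i} = -\log p_i - 1 + \lambda_0 + \lambda_1 i + \lambda_2 (i-\mu)^2 = 0
\quad\Longrightarrow\quad
p_i = \frac{1}{Z}\exp\big(\lambda_1 i + \lambda_2 (i-\mu)^2\big),
\qquad Z = \sum_{j=1}^{N} \exp\big(\lambda_1 j + \lambda_2 (j-\mu)^2\big)
\]

The remaining multipliers \(\lambda_1\) and \(\lambda_2\) are then fixed by the mean and variance constraints.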

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| mean | | Expectation value of the distribution. | required |
| sigma | | Standard deviation of the distribution. | required |
| N | | Number of responses | required |

from_gsd(theta, N) staticmethod ¤

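Create a maximum entropy distribution from GSD parameters.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| theta | GSDParams | Parameters of a GSD distribution. | required |
| N | int | Support size | required |

Returns:

| Type | Description |
|------|-------------|
| MaxEntropyGSD | A distribution object |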