API¤

Reference implementation in Python¤

To keep the reference implementation as close to the math as possible, we define some utilities with Unicode symbols. For example, `𝚷(i for i in ℤ[1,3])` is valid Python code for

\(\prod_{i=1}^{3} i\)
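As an illustration of how such Unicode utilities can work, here is a minimal sketch. The class `_IntegerRange` and the bodies of `ℤ` and `𝚷` are assumptions made for this example, not the library's actual definitions; the sketch relies only on the fact that Python accepts these symbols as identifiers.

```python
import math


class _IntegerRange:
    """Hypothetical helper: ℤ[a, b] yields the integers a, a+1, ..., b (inclusive)."""

    def __getitem__(self, bounds):
        lower, upper = bounds
        return range(lower, upper + 1)


# Unicode names are legal Python identifiers.
ℤ = _IntegerRange()


def 𝚷(factors):
    """Product of an iterable, mirroring the mathematical product symbol."""
    return math.prod(factors)


result = 𝚷(i for i in ℤ[1, 3])  # 1 * 2 * 3
```

Using `__getitem__` with inclusive bounds keeps the code visually close to the interval notation used in the formula.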

`gsd.gsd_prob(ψ, ρ, k)` ¤

Reference implementation of GSD probabilities in pure Python.

Parameters:

Name	Type	Description	Default
`ψ`	`float`	mean	required
`ρ`	`float`	dispersion	required
`k`	`int`	response	required

Returns:

Type	Description
`float`	probability of response k

JAX functions¤

Distribution functions implemented in JAX for speed and automatic differentiation.

Currently, only the GSD on a 5-point scale is supported.

`gsd.log_prob(psi, rho, k)` ¤

Compute log probability of the response k for given parameters.

Parameters:

Name	Type	Description	Default
`psi`	`ArrayLike`	mean	required
`rho`	`ArrayLike`	dispersion	required
`k`	`ArrayLike`	response	required

Returns:

Type	Description
`Array`	log of the probability in GSD distribution

`gsd.sample(psi, rho, shape, key)` ¤

Sample from GSD

Parameters:

Name	Type	Description	Default
`psi`	`ArrayLike`	mean	required
`rho`	`ArrayLike`	dispersion	required
`shape`	`Shape`	sample shape	required
`key`	`Array`	random key	required

Returns:

Type	Description
`Array`	Array of shape `shape`

`gsd.mean(psi, rho)` ¤

Mean of GSD distribution

`gsd.variance(psi, rho)` ¤

Variance of GSD distribution

`gsd.sufficient_statistic(data)` ¤

Compute GSD sufficient statistic from samples.

Parameters:

Name	Type	Description	Default
`data`	`ArrayLike`	Samples from GSD, with `data[i]` in [1..5]	required

Returns:

Type	Description
`Array`	Counts of each possible value
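To make the input and output shapes concrete, the following pure-Python sketch counts how often each response occurs. The function name `sufficient_statistic_sketch` and the `num_responses` argument are hypothetical; the library version operates on JAX arrays and may differ in detail.

```python
def sufficient_statistic_sketch(data, num_responses=5):
    """Count occurrences of each response 1..num_responses in `data`."""
    counts = [0] * num_responses
    for response in data:
        if not 1 <= response <= num_responses:
            raise ValueError(f"response {response} outside [1..{num_responses}]")
        counts[response - 1] += 1  # response k is stored at index k - 1
    return counts


counts = sufficient_statistic_sketch([1, 3, 3, 5, 5, 5])
```

The resulting count vector has one entry per response, which matches the form of input that the fitting functions are documented to accept.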

Fit¤

We provide a few estimators. The simplest one is based on moments. A more advanced, gradient-based maximum likelihood estimator is provided in `gsd.experimental`. We also provide a naive grid search MLE. Besides the high-level API, one can use optimizers from scipy or tensorflow_probability.

`gsd.fit_moments(data)` ¤

Fits GSD using the moment estimator.

Parameters:

Name	Type	Description	Default
`data`	`ArrayLike`	An Array of counts of each response.	required

Returns:

Type	Description
`GSDParams`	GSD Parameters
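A moment estimator starts from the sample moments of the count vector. The sketch below computes the mean and the (population) variance from counts of responses 1..5. The helper name `moments_from_counts` is hypothetical, and how the library maps these moments to `psi` and `rho` is not shown here.

```python
def moments_from_counts(counts):
    """Return (mean, variance) of responses 1..len(counts) given per-response counts."""
    total = sum(counts)
    # counts[k - 1] is the number of times response k was observed.
    mean = sum(k * n for k, n in enumerate(counts, start=1)) / total
    # Population variance (divides by n, not n - 1); an assumption of this sketch.
    variance = sum(n * (k - mean) ** 2 for k, n in enumerate(counts, start=1)) / total
    return mean, variance


mean, variance = moments_from_counts([0, 5, 10, 5, 0])
```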

Constrained parameter space¤

`gsd.fit.log_pmax(log_probs)` ¤

Calculates the maximum of the log of the sum of two probabilities, given the logarithms of the probabilities.

Parameters:

Name	Type	Description	Default
`log_probs`	`Array`	logarithms of probabilities	required

Returns:

Type	Description
`Array`	Scalar array

`gsd.fit.allowed_region(log_probs, n)` ¤

Checks whether the given `log_probs` satisfy the condition pmax <= 1-1/n described in Appendix D. The check is computed in the log domain as logpmax <= log(1-1/n).

Parameters:

Name	Type	Description	Default
`log_probs`	`Array`	logarithms of probabilities	required
`n`	`Array`	Total number of observations	required

Returns:

Type	Description
`Array`	Binary array
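The log-domain check can be sketched in plain Python as follows. This is an illustration under a stated assumption: the pairs whose probabilities are summed are taken to be neighbouring responses, which this reference does not confirm. All function names are hypothetical, and `n` is assumed to be greater than 1.

```python
import math


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)); handles -inf inputs."""
    if a == -math.inf and b == -math.inf:
        return -math.inf
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_pmax_sketch(log_probs):
    """Max over neighbouring pairs of log(p_i + p_{i+1}) (assumed pairing)."""
    return max(logaddexp(a, b) for a, b in zip(log_probs, log_probs[1:]))


def allowed_region_sketch(log_probs, n):
    """True when logpmax <= log(1 - 1/n), evaluated in the log domain."""
    return log_pmax_sketch(log_probs) <= math.log1p(-1.0 / n)


uniform = [math.log(0.2)] * 5
two_point = [math.log(0.5), math.log(0.5), -math.inf, -math.inf, -math.inf]
uniform_allowed = allowed_region_sketch(uniform, 20)
two_point_allowed = allowed_region_sketch(two_point, 20)
```

Working with `log1p(-1/n)` avoids the loss of precision that `log(1 - 1/n)` suffers for large `n`.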

Structures¤

`gsd.fit.GSDParams` ¤

Bases: NamedTuple

NamedTuple representing parameters for the Generalized Score Distribution (GSD).

This class is used to store the psi and rho parameters for the GSD. It provides a convenient way to group these parameters together for use in various statistical and modeling applications.

Experimental¤

`gsd.experimental.fit_mle(data, max_iterations=100, log_lr_min=-15, log_lr_max=2.0, num_lr=10, constrain_by_pmax=False)` ¤

Finds the maximum likelihood estimator of the GSD parameters. The algorithm is a simple gradient ascent. Projected gradient steps enforce the constraints on the parameters (psi in [1, 5], rho in [0, 1]), and an exhaustive search over candidate learning rates serves as the line search along the gradient.

Since the mass function is not smooth, a gradient-based estimator can fail.

Parameters:

Name	Type	Description	Default
`data`	`ArrayLike`	An array of counts for each response.	required
`max_iterations`	`int`	Maximum number of iterations.	`100`
`log_lr_min`	`ArrayLike`	Log2 of the smallest learning rate.	`-15`
`log_lr_max`	`ArrayLike`	Log2 of the largest learning rate.	`2.0`
`num_lr`	`ArrayLike`	Number of learning rates to check during the line search.	`10`
`constrain_by_pmax`		Whether to add the constraint described in Appendix D.	`False`

Returns:

Type	Description
`tuple[GSDParams, OptState]`	An opt state whose `params` field contains the estimated values of the GSD parameters
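The combination of projected gradient steps and an exhaustive line search can be illustrated on a toy problem. The sketch below is not the library's implementation: it maximizes a hypothetical smooth objective (not the GSD likelihood) over the same box, psi in [1, 5] and rho in [0, 1], trying `num_lr` learning rates spaced evenly in log2 between `log_lr_min` and `log_lr_max`.

```python
def project(value, lower, upper):
    """Project a scalar onto the interval [lower, upper]."""
    return min(max(value, lower), upper)


def objective(psi, rho):
    """Toy concave objective; its unconstrained maximum (3, 2) lies outside the box."""
    return -((psi - 3.0) ** 2) - (rho - 2.0) ** 2


def gradient(psi, rho):
    return -2.0 * (psi - 3.0), -2.0 * (rho - 2.0)


def projected_gradient_ascent(psi, rho, max_iterations=100,
                              log_lr_min=-15, log_lr_max=2.0, num_lr=10):
    step = (log_lr_max - log_lr_min) / (num_lr - 1)
    rates = [2.0 ** (log_lr_min + i * step) for i in range(num_lr)]
    for _ in range(max_iterations):
        g_psi, g_rho = gradient(psi, rho)
        # Exhaustive line search: take a projected step for every learning rate.
        candidates = [
            (project(psi + lr * g_psi, 1.0, 5.0), project(rho + lr * g_rho, 0.0, 1.0))
            for lr in rates
        ]
        best = max(candidates, key=lambda c: objective(*c))
        if objective(*best) <= objective(psi, rho):
            break  # no candidate improves the objective
        psi, rho = best
    return psi, rho


psi_hat, rho_hat = projected_gradient_ascent(1.5, 0.5)
```

On this toy objective the iterates settle at the boundary point (3, 1), which shows how the projection keeps `rho` inside [0, 1] even though the gradient keeps pushing it upward.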

`gsd.experimental.fit_mle_grid(data, num, constrain_by_pmax=False)` ¤

Fits GSD using a naive grid search. This function uses NumPy and cannot be used inside `jax.jit`.

>>> data = jnp.asarray([20, 0, 0, 0, 0.0])
>>> theta = fit_mle_grid(data, GSDParams(32, 32), False)

Parameters:

Name	Type	Description	Default
`data`	`ArrayLike`	An array of counts for each response.	required
`num`	`GSDParams`	Number of grid points searched for each parameter	required
`constrain_by_pmax`	`bool`	Whether to add the constraint described in Appendix D	`False`

Returns:

Type	Description
`GSDParams`	Fitted parameters

`gsd.experimental.g_test(n, p, m, q)` ¤

G-test. Bogdan: "We use the bootstrap version (instead of the asymptotic one, because of small n) of the classical test called the G-test, i.e. the likelihood-ratio test."

Parameters:

Name	Type	Description	Default
`n`	`Array`	Observation counts \((n_1, n_2, n_3, n_4, n_5)\), a 1d array	required
`p`	`Array`	Estimated distribution \((p_1, p_2, p_3, p_4, p_5)\), a 1d array	required
`m`	`Array`	T bootstrap samples from distribution \(p\), an array of shape [T, 5]	required
`q`	`Array`	T estimated distributions for the bootstrapped samples, an array of shape [T, 5]	required

Returns:

Type	Description
`Array`	G-test p-value
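The idea of a bootstrapped G-test can be sketched in pure Python. The G statistic below is the standard likelihood-ratio form; the way the p-value is formed (the fraction of bootstrap statistics at least as large as the observed one) is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def g_statistic(counts, probs):
    """Standard G statistic: 2 * sum_i n_i * ln(n_i / (N * p_i)), skipping n_i = 0."""
    total = sum(counts)
    return 2.0 * sum(
        c * math.log(c / (total * p)) for c, p in zip(counts, probs) if c > 0
    )


def bootstrap_p_value(n, p, m, q):
    """Fraction of bootstrap G statistics that are >= the observed G statistic."""
    observed = g_statistic(n, p)
    bootstrapped = [g_statistic(m_t, q_t) for m_t, q_t in zip(m, q)]
    return sum(g >= observed for g in bootstrapped) / len(bootstrapped)


uniform = [0.2] * 5
p_value = bootstrap_p_value(
    n=[10, 0, 0, 0, 10],                       # observed counts
    p=uniform,                                 # distribution fitted to n
    m=[[4, 4, 4, 4, 4], [20, 0, 0, 0, 0]],     # T = 2 bootstrap samples
    q=[uniform, uniform],                      # stand-ins for refitted distributions
)
```

In practice each row of `q` would come from refitting the model to the matching row of `m`; the uniform rows here only keep the example short.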

`gsd.experimental.pp_plot_data(data, estimator, key, n_bootstrap_samples)` ¤

`gsd.experimental.BootstrapResult` ¤

Bases: NamedTuple

`gsd.experimental.GridEstimator` ¤

Bases: NamedTuple

Stateful MLE based on grid search

Parameters:

Name	Description	Default
`psis`	Grid of psi axis	required
`rhos`	Grid of rho axis	required
`lps`	Grid of `log_prob` for each answer and each entry in the axes.	required

`__call__(data)` ¤

Fit GSD using naive grid search method. This is a stateful version of fit_mle_grid that supports jax.vmap and jax.jit.

Parameters:

Name	Type	Description	Default
`data`	`Array`	An array of counts for each response.	required

Returns:

Type	Description
`GSDParams`	Fitted parameters

`make(num)` `staticmethod` ¤

Makes a grid estimator for GSD. This estimator precomputes log probabilities for each answer on a regular grid.

Parameters:

Name	Type	Description	Default
`num`	`GSDParams`	Number of grid points	required

Returns:

Type	Description
`GridEstimator`	Estimator
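The precompute-once, fit-many-times pattern described above can be sketched as follows. This is a pure-Python illustration, not the JAX implementation: `toy_log_probs` is a hypothetical stand-in probability mass function on 1..5 (it is not the GSD), and `GridEstimatorSketch` only mirrors the documented fields `psis`, `rhos` and `lps`.

```python
import math
from typing import NamedTuple


def toy_log_probs(psi, rho):
    """Stand-in pmf on 1..5 (NOT the GSD): bell shape centred at psi, narrower as rho grows."""
    scale = 0.3 + 2.0 * (1.0 - rho)
    weights = [math.exp(-((k - psi) ** 2) / (2.0 * scale**2)) for k in range(1, 6)]
    total = sum(weights)
    return [math.log(w / total) for w in weights]


def linspace(lower, upper, num):
    return [lower + (upper - lower) * i / (num - 1) for i in range(num)]


class GridEstimatorSketch(NamedTuple):
    psis: list  # grid along the psi axis
    rhos: list  # grid along the rho axis
    lps: list   # lps[i][j][k]: log-probability of response k + 1 at (psis[i], rhos[j])

    @staticmethod
    def make(num_psi, num_rho):
        psis = linspace(1.0, 5.0, num_psi)
        rhos = linspace(0.0, 1.0, num_rho)
        lps = [[toy_log_probs(psi, rho) for rho in rhos] for psi in psis]
        return GridEstimatorSketch(psis, rhos, lps)

    def __call__(self, counts):
        def log_likelihood(ij):
            i, j = ij
            return sum(n * lp for n, lp in zip(counts, self.lps[i][j]))

        cells = ((i, j) for i in range(len(self.psis)) for j in range(len(self.rhos)))
        i, j = max(cells, key=log_likelihood)
        return self.psis[i], self.rhos[j]


estimator = GridEstimatorSketch.make(9, 5)
psi_hat, rho_hat = estimator([0, 0, 20, 0, 0])
```

Because the table `lps` is built once in `make`, each later call only evaluates weighted sums over the stored values, which is what makes the pattern suitable for batching.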

`gsd.experimental.OptState` ¤

Bases: NamedTuple

A class representing the state of an optimization process.

Attributes:

Name	Type	Description
`params`	`GSDParams`	The current optimization parameters.
`count`	`int`	An integer count indicating the step or iteration of the optimization process.

This class is used to store and manage the state of an optimization algorithm, allowing you to keep track of the current parameters, previous parameters, and the step count.

Maximum entropy¤

The GSD can be considered a whole family of distributions with the following properties:

It is a distribution over \([1,N]\)
Its first parameter represents the expectation value
It covers all possible variances

Another distribution that has similar properties, and can be considered a member of the GSD family, is the maximum entropy distribution.

`gsd.experimental.MaxEntropyGSD` ¤

Bases: Module

Maximum entropy distribution supported on Z[1,N]

This distribution is defined to fulfill the following conditions on \(p_i\)

Maximize \(H= -\sum_i p_i\log(p_i)\) subject to
\(\sum_i p_i=1\)
\(\sum_i i p_i= \mu\)
\(\sum_i (i-\mu)^2 p_i= \sigma^2\)

Parameters:

Name	Description	Default
`mean`	Expectation value of the distribution.	required
`sigma`	Standard deviation of the distribution.	required
`N`	Number of responses	required
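The constrained maximization above has a standard closed-form shape, which follows from Lagrange multipliers. The multipliers \(\lambda_0, \lambda_1, \lambda_2\) and the normalizer \(Z\) are notation introduced here for illustration; how the library solves for them is not stated in this reference.

\(\mathcal{L} = -\sum_i p_i\log(p_i) + \lambda_0\left(\sum_i p_i - 1\right) + \lambda_1\left(\sum_i i\,p_i - \mu\right) + \lambda_2\left(\sum_i (i-\mu)^2 p_i - \sigma^2\right)\)

\(\frac{\partial\mathcal{L}}{\partial p_i} = -\log(p_i) - 1 + \lambda_0 + \lambda_1 i + \lambda_2 (i-\mu)^2 = 0\)

\(p_i = \frac{1}{Z}\exp\left(\lambda_1 i + \lambda_2 (i-\mu)^2\right), \quad Z = \sum_{j=1}^{N}\exp\left(\lambda_1 j + \lambda_2 (j-\mu)^2\right)\)

Whenever the maximizer has all \(p_i > 0\), it therefore takes this exponential form, with \(\lambda_1\) and \(\lambda_2\) chosen so that the mean and variance constraints hold.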

`from_gsd(theta, N)` `staticmethod` ¤

Creates a maximum entropy distribution from GSD parameters.

Parameters:

Name	Type	Description	Default
`theta`	`GSDParams`	Parameters of a GSD distribution.	required
`N`	`int`	Support size	required

Returns:

Type	Description
`MaxEntropyGSD`	A distribution object
