C++ API¶
The C++ backend implements the EM algorithm and all mathematical operations. It is exposed to Python via pybind11 but can also be used directly.
Core Types¶
-
struct Data¶
Input data container for latent class analysis.
Holds response matrix, covariate design matrix, and per-item category counts. Responses are 1-based with 0 indicating missing values.
-
struct Params¶
Latent class model parameters.
Encapsulates class-conditional response probabilities and covariate coefficients (beta) in flattened vector layouts suitable for Eigen.
Public Members
-
Eigen::VectorXd vecprobs¶
Flattened class-conditional response probabilities. Layout: [class 0: item 0 cat 0..K, item 1 cat 0..K, …], [class 1: …]. Total length = nclass * sum_j(num_choices[j]).
-
Eigen::VectorXd beta¶
Flattened covariate coefficients (multinomial logit parametrization). Layout: column-major S × (nclass-1) matrix stored as a single vector. The last class is the reference category (coefficients fixed at 0).
-
Eigen::VectorXd vecprobs¶
-
struct Results¶
Complete output from a latent class model fit.
Contains parameter estimates, posterior membership probabilities, convergence diagnostics, and (optionally) standard errors.
Public Members
-
Eigen::MatrixXd posterior¶
N × nclass posterior class-membership probabilities.
-
Eigen::MatrixXd prior¶
N × nclass prior class probabilities per observation.
-
double loglik = 0.0¶
Final log-likelihood.
-
int iterations = 0¶
Number of EM iterations performed.
-
bool converged = false¶
Whether the EM algorithm converged before reaching maxiter.
-
bool error = false¶
Whether a fatal error occurred during fitting.
-
Eigen::VectorXd vecprobs_se¶
Standard errors for vecprobs (same layout as Params::vecprobs).
-
Eigen::VectorXd P_se¶
Standard errors for class population shares (length nclass).
-
Eigen::VectorXd beta_se¶
Standard errors for beta coefficients (same layout as Params::beta).
-
Eigen::MatrixXd beta_V¶
Covariance matrix of beta coefficients (S(nclass-1) × S(nclass-1)).
-
Eigen::MatrixXd posterior¶
-
struct SEs¶
Container for standard error estimates.
Public Members
-
Eigen::VectorXd vecprobs_se¶
Standard errors for vecprobs (same layout as Params::vecprobs).
-
Eigen::VectorXd P_se¶
Standard errors for class population shares.
-
Eigen::VectorXd beta_se¶
Standard errors for beta coefficients.
-
Eigen::MatrixXd beta_V¶
Covariance matrix of beta coefficients.
-
Eigen::VectorXd vecprobs_se¶
EM Engine¶
-
Results pypolca::fit_em(const Data &data, int nclass, int maxiter = 1000, double tol = 1e-10, const Eigen::VectorXd &probs_start = Eigen::VectorXd(), const Eigen::VectorXd &beta_start = Eigen::VectorXd(), const unsigned int seed = 42, bool calc_se = true)¶
Fit a latent class model via EM with optional Newton-Raphson covariate updates.
The algorithm alternates between:
E-step: compute posterior membership from current parameters.
M-step: update response probabilities from weighted responses.
If covariates present: Newton-Raphson update of beta coefficients.
If nrep > 1, the best fit (highest log-likelihood) across random starts is returned.
- Parameters:
data – Input data container (responses and optional covariates).
nclass – Number of latent classes (≥ 1).
maxiter – Maximum EM iterations (default: 1000).
tol – Convergence tolerance on log-likelihood change (default: 1e-10).
probs_start – Optional starting vecprobs vector (empty = random init).
beta_start – Optional starting beta vector (empty = zeros or random).
seed – Random seed for reproducibility (default: 42).
calc_se – Whether to compute standard errors at convergence (default: true).
- Returns:
Fitted Results containing parameters, posterior, diagnostics, and SEs.
Math Operations¶
-
Eigen::MatrixXd pypolca::compute_log_ylik(const Data &data, const Params &p, int nclass)¶
Compute the log of class-conditional response likelihoods.
For each observation i and class r, computes: log P(y_i | class = r) = sum_j log(P(y_{ij} | class = r)) with proper handling of missing values (coded as 0).
- Parameters:
data – Input data container.
p – Current parameter estimates.
nclass – Number of latent classes.
- Returns:
N × nclass matrix of log-likelihood contributions.
-
std::pair<Eigen::MatrixXd, double> pypolca::e_step(const Data &data, const Params &p, const Eigen::MatrixXd &prior, int nclass)¶
E-step: compute posterior class membership probabilities.
Uses Bayes’ rule with current parameters and priors to update posterior = P(class = r | y_i) for each observation.
- Parameters:
data – Input data container.
p – Current parameter estimates.
prior – N × nclass prior probabilities (from beta or uniform).
nclass – Number of latent classes.
- Returns:
{posterior (N × nclass), total log-likelihood}.
-
Eigen::VectorXd pypolca::m_step_probs(const Data &data, const Eigen::MatrixXd &posterior, const std::vector<int> &num_choices, int nclass)¶
M-step for class-conditional response probabilities.
Updates the multinomial probabilities p_{jkr} = P(y_j = k | class = r) by weighting observed responses by posterior membership.
- Parameters:
data – Input data container.
posterior – N × nclass posterior matrix from the E-step.
num_choices – Number of categories per item.
nclass – Number of latent classes.
- Returns:
Updated flattened vecprobs vector.
-
Eigen::MatrixXd pypolca::compute_prior_from_beta(const Eigen::MatrixXd &x, const Eigen::VectorXd &beta, int nclass)¶
Build prior class-membership probabilities from covariates via multinomial logit.
For each observation i and class r (except the reference class), computes: prior[i, r] = exp(x_i * beta_r) / (1 + sum_s exp(x_i * beta_s)) The reference class (last) gets the residual probability.
- Parameters:
x – N × S covariate matrix.
beta – Flat vector of coefficients, column-major S × (nclass-1).
nclass – Number of latent classes.
- Returns:
N × nclass prior matrix (each row sums to 1).
-
std::pair<Eigen::VectorXd, Eigen::MatrixXd> pypolca::compute_beta_derivatives(const Data &data, const Eigen::MatrixXd &posterior, const Eigen::MatrixXd &prior, const Eigen::VectorXd &beta, int nclass)¶
Compute gradient and observed information for covariate coefficients (beta).
Used in the Newton-Raphson update for the multinomial logit model on the class priors.
- Parameters:
data – Input data container.
posterior – N × nclass posterior matrix.
prior – N × nclass prior matrix.
beta – Current beta vector (flat, length S*(nclass-1)).
nclass – Number of latent classes.
- Returns:
{gradient vector, negative Hessian (observed information)}.
-
std::pair<Eigen::VectorXd, Eigen::MatrixXd> pypolca::update_beta(const Data &data, const Eigen::MatrixXd &posterior, const Eigen::MatrixXd &prior, const Eigen::VectorXd &beta, int nclass)¶
One Newton-Raphson step for beta and corresponding prior update.
Solves H * delta = grad and updates beta_new = beta + delta. Recomputes prior from the new beta via multinomial logit.
- Parameters:
data – Input data container.
posterior – N × nclass posterior matrix.
prior – N × nclass prior matrix.
beta – Current beta vector.
nclass – Number of latent classes.
- Returns:
{updated beta vector, updated prior matrix}.
-
SEs pypolca::compute_standard_errors(const Data &data, const Params ¶ms, const Eigen::MatrixXd &posterior, const Eigen::MatrixXd &prior, int nclass)¶
Compute standard errors from the observed information matrix.
Inverts the observed Fisher information at the MLE to obtain asymptotic covariance matrices and standard errors.
- Parameters:
data – Input data container.
params – Parameter estimates at convergence.
posterior – N × nclass posterior matrix.
prior – N × nclass prior matrix.
nclass – Number of latent classes.
- Returns:
SEs struct with all standard error estimates.
-
double pypolca::compute_logsumexp(const Eigen::VectorXd &x)¶
Numerically stable log-sum-exp of a vector.
Computes log(sum(exp(x))) by shifting by max(x) to avoid overflow.
- Parameters:
x – Input vector.
- Returns:
log(sum(exp(x))).