C++ API

The C++ backend implements the EM algorithm and all mathematical operations. It is exposed to Python via pybind11 but can also be used directly.

Core Types

struct Data

Input data container for latent class analysis.

Holds response matrix, covariate design matrix, and per-item category counts. Responses are 1-based with 0 indicating missing values.

Public Functions

inline int n_obs() const

Number of observations (rows of y).

inline int n_items() const

Number of items (columns of y).

inline int n_covariates() const

Number of covariates (columns of x).

Public Members

Eigen::MatrixXi y

N × J response matrix (1-based integers, 0 = missing).

Eigen::MatrixXd x

N × S covariate/design matrix. Include an intercept column manually.

std::vector<int> num_choices

Number of response categories for each item (length J).

struct Params

Latent class model parameters.

Encapsulates class-conditional response probabilities and covariate coefficients (beta) in flattened vector layouts suitable for Eigen.

Public Members

Eigen::VectorXd vecprobs

Flattened class-conditional response probabilities. Layout: [class 0: item 0 cat 0..K, item 1 cat 0..K, …], [class 1: …]. Total length = nclass * sum_j(num_choices[j]).

Eigen::VectorXd beta

Flattened covariate coefficients (multinomial logit parametrization). Layout: column-major S × (nclass-1) matrix stored as a single vector. The last class is the reference category (coefficients fixed at 0).

struct Results

Complete output from a latent class model fit.

Contains parameter estimates, posterior membership probabilities, convergence diagnostics, and (optionally) standard errors.

Public Members

Params params

Estimated model parameters (vecprobs and beta).

Eigen::MatrixXd posterior

N × nclass posterior class-membership probabilities.

Eigen::MatrixXd prior

N × nclass prior class probabilities per observation.

double loglik = 0.0

Final log-likelihood.

int iterations = 0

Number of EM iterations performed.

bool converged = false

Whether the EM algorithm converged before reaching maxiter.

bool error = false

Whether a fatal error occurred during fitting.

Eigen::VectorXd vecprobs_se

Standard errors for vecprobs (same layout as Params::vecprobs).

Eigen::VectorXd P_se

Standard errors for class population shares (length nclass).

Eigen::VectorXd beta_se

Standard errors for beta coefficients (same layout as Params::beta).

Eigen::MatrixXd beta_V

Covariance matrix of beta coefficients (S(nclass-1) × S(nclass-1)).

struct SEs

Container for standard error estimates.

Public Members

Eigen::VectorXd vecprobs_se

Standard errors for vecprobs (same layout as Params::vecprobs).

Eigen::VectorXd P_se

Standard errors for class population shares.

Eigen::VectorXd beta_se

Standard errors for beta coefficients.

Eigen::MatrixXd beta_V

Covariance matrix of beta coefficients.

EM Engine

Results pypolca::fit_em(const Data &data, int nclass, int maxiter = 1000, double tol = 1e-10, const Eigen::VectorXd &probs_start = Eigen::VectorXd(), const Eigen::VectorXd &beta_start = Eigen::VectorXd(), const unsigned int seed = 42, bool calc_se = true)

Fit a latent class model via EM with optional Newton-Raphson covariate updates.

The algorithm alternates between:

  1. E-step: compute posterior membership from current parameters.

  2. M-step: update response probabilities from weighted responses.

  3. If covariates present: Newton-Raphson update of beta coefficients.

If nrep > 1, the best fit (highest log-likelihood) across random starts is returned.

Parameters:
  • data – Input data container (responses and optional covariates).

  • nclass – Number of latent classes (≥ 1).

  • maxiter – Maximum EM iterations (default: 1000).

  • tol – Convergence tolerance on log-likelihood change (default: 1e-10).

  • probs_start – Optional starting vecprobs vector (empty = random init).

  • beta_start – Optional starting beta vector (empty = zeros or random).

  • seed – Random seed for reproducibility (default: 42).

  • calc_se – Whether to compute standard errors at convergence (default: true).

Returns:

Fitted Results containing parameters, posterior, diagnostics, and SEs.

Math Operations

Eigen::MatrixXd pypolca::compute_log_ylik(const Data &data, const Params &p, int nclass)

Compute the log of class-conditional response likelihoods.

For each observation i and class r, computes: log P(y_i | class = r) = sum_j log(P(y_{ij} | class = r)) with proper handling of missing values (coded as 0).

Parameters:
  • data – Input data container.

  • p – Current parameter estimates.

  • nclass – Number of latent classes.

Returns:

N × nclass matrix of log-likelihood contributions.

std::pair<Eigen::MatrixXd, double> pypolca::e_step(const Data &data, const Params &p, const Eigen::MatrixXd &prior, int nclass)

E-step: compute posterior class membership probabilities.

Uses Bayes’ rule with current parameters and priors to update posterior = P(class = r | y_i) for each observation.

Parameters:
  • data – Input data container.

  • p – Current parameter estimates.

  • prior – N × nclass prior probabilities (from beta or uniform).

  • nclass – Number of latent classes.

Returns:

{posterior (N × nclass), total log-likelihood}.

Eigen::VectorXd pypolca::m_step_probs(const Data &data, const Eigen::MatrixXd &posterior, const std::vector<int> &num_choices, int nclass)

M-step for class-conditional response probabilities.

Updates the multinomial probabilities p_{jkr} = P(y_j = k | class = r) by weighting observed responses by posterior membership.

Parameters:
  • data – Input data container.

  • posterior – N × nclass posterior matrix from the E-step.

  • num_choices – Number of categories per item.

  • nclass – Number of latent classes.

Returns:

Updated flattened vecprobs vector.

Eigen::MatrixXd pypolca::compute_prior_from_beta(const Eigen::MatrixXd &x, const Eigen::VectorXd &beta, int nclass)

Build prior class-membership probabilities from covariates via multinomial logit.

For each observation i and class r (except the reference class), computes: prior[i, r] = exp(x_i * beta_r) / (1 + sum_s exp(x_i * beta_s)) The reference class (last) gets the residual probability.

Parameters:
  • x – N × S covariate matrix.

  • beta – Flat vector of coefficients, column-major S × (nclass-1).

  • nclass – Number of latent classes.

Returns:

N × nclass prior matrix (each row sums to 1).

std::pair<Eigen::VectorXd, Eigen::MatrixXd> pypolca::compute_beta_derivatives(const Data &data, const Eigen::MatrixXd &posterior, const Eigen::MatrixXd &prior, const Eigen::VectorXd &beta, int nclass)

Compute gradient and observed information for covariate coefficients (beta).

Used in the Newton-Raphson update for the multinomial logit model on the class priors.

Parameters:
  • data – Input data container.

  • posterior – N × nclass posterior matrix.

  • prior – N × nclass prior matrix.

  • beta – Current beta vector (flat, length S*(nclass-1)).

  • nclass – Number of latent classes.

Returns:

{gradient vector, negative Hessian (observed information)}.

std::pair<Eigen::VectorXd, Eigen::MatrixXd> pypolca::update_beta(const Data &data, const Eigen::MatrixXd &posterior, const Eigen::MatrixXd &prior, const Eigen::VectorXd &beta, int nclass)

One Newton-Raphson step for beta and corresponding prior update.

Solves H * delta = grad and updates beta_new = beta + delta. Recomputes prior from the new beta via multinomial logit.

Parameters:
  • data – Input data container.

  • posterior – N × nclass posterior matrix.

  • prior – N × nclass prior matrix.

  • beta – Current beta vector.

  • nclass – Number of latent classes.

Returns:

{updated beta vector, updated prior matrix}.

SEs pypolca::compute_standard_errors(const Data &data, const Params &params, const Eigen::MatrixXd &posterior, const Eigen::MatrixXd &prior, int nclass)

Compute standard errors from the observed information matrix.

Inverts the observed Fisher information at the MLE to obtain asymptotic covariance matrices and standard errors.

Parameters:
  • data – Input data container.

  • params – Parameter estimates at convergence.

  • posterior – N × nclass posterior matrix.

  • prior – N × nclass prior matrix.

  • nclass – Number of latent classes.

Returns:

SEs struct with all standard error estimates.

double pypolca::compute_logsumexp(const Eigen::VectorXd &x)

Numerically stable log-sum-exp of a vector.

Computes log(sum(exp(x))) by shifting by max(x) to avoid overflow.

Parameters:

x – Input vector.

Returns:

log(sum(exp(x))).