Python API
High-Level API
High-level Python API for pypoLCA.
- class pypolca.api.LCAResult(raw_results, formula=None, data=None, num_choices=None, y_mat=None)
Bases: object
Python-friendly wrapper around the C++ Results struct.
- Parameters:
- property probs: list[ndarray]
Class-conditional response probabilities.
Returns: list[J] of ndarray of shape (R, K_j).
- property coeff: ndarray
Covariate coefficients.
Returns: ndarray of shape (S, R-1). Column r-1 = coefficients for class r (r >= 2). None if no covariates.
- property coeff_se: ndarray
Standard errors of covariate coefficients.
Returns: ndarray of shape (S, R-1) matching .coeff layout. Empty array if no covariates.
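The (S, R-1) layout can be read as one column of coefficients per non-reference class. The following is a minimal sketch of how such a matrix maps one covariate row to prior class probabilities. It assumes the multinomial-logit parameterization used by R's poLCA, with class 1 as the reference class and the intercept as the first of the S rows. `class_priors` is a hypothetical helper written for illustration, not part of pypolca.

```python
import math

def class_priors(x_row, coeff):
    """Map one covariate row to prior class probabilities.

    x_row: list of S covariate values (intercept assumed first).
    coeff: S x (R-1) nested list; column r-1 holds the coefficients
           for class r (r >= 2). Class 1 is assumed to be the reference
           class, with its linear predictor fixed at 0.
    """
    n_other = len(coeff[0])
    # Linear predictor per class; the reference class contributes 0.
    eta = [0.0] + [
        sum(x_row[s] * coeff[s][c] for s in range(len(x_row)))
        for c in range(n_other)
    ]
    # Softmax, subtracting the maximum for numerical stability.
    m = max(eta)
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]

# S = 2 (intercept plus one covariate), R = 3 classes, so coeff is 2 x 2.
coeff = [[0.5, -1.0],
         [0.2, 0.3]]
priors = class_priors([1.0, 2.0], coeff)
```

With all coefficients set to zero, every class receives the same prior, which is a quick sanity check on the layout.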
- predict_posterior(newdata, newx=None)
Compute posterior class membership probabilities for new data.
- Parameters:
newdata (DataFrame)
newx (DataFrame | None)
- Return type:
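As an illustration of what a posterior class membership probability is, the sketch below applies Bayes' rule to a single response vector. It is not the library's implementation. It assumes responses are coded 1..K_j (as in the built-in datasets), that `probs` follows the list[J] of (R, K_j) layout described for `.probs`, and that class priors are already known. `posterior` is a hypothetical helper.

```python
def posterior(priors, probs, y):
    """Posterior class probabilities for one response vector.

    priors: list of R prior class probabilities.
    probs:  list of J items, each an R x K_j nested list of
            class-conditional response probabilities.
    y:      list of J responses, coded 1..K_j (assumed 1-based).
    """
    weights = []
    for r, prior in enumerate(priors):
        # Joint probability of class r and the observed responses,
        # assuming local independence of items within a class.
        joint = prior
        for j, response in enumerate(y):
            joint *= probs[j][r][response - 1]
        weights.append(joint)
    total = sum(weights)
    return [w / total for w in weights]

priors = [0.6, 0.4]
probs = [
    [[0.9, 0.1], [0.2, 0.8]],  # item 1: rows are classes, columns are responses
    [[0.7, 0.3], [0.1, 0.9]],  # item 2
]
post = posterior(priors, probs, [2, 2])
```

Here the second class has the higher posterior, because both responses are far more likely under class 2 than under class 1.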
- property Chisq: float
Pearson chi-square goodness-of-fit.
Includes the correction term N - sum(E), where the sum runs over observed patterns, to account for unobserved response patterns (O = 0 and E > 0), matching the behavior of R's poLCA.
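The correction term works because an unobserved pattern contributes (0 - E)^2 / E = E to the Pearson statistic, and the expected counts over all patterns sum to N. The sketch below checks this on a toy example, using a made-up independence model over two binary items with N = 100. `pearson_chisq` and `expected_count` are hypothetical names, not pypolca functions.

```python
from itertools import product

def pearson_chisq(observed, expected, n_obs):
    """Pearson chi-square computed from observed response patterns only.

    observed: dict mapping pattern tuple -> observed count (counts > 0 only).
    expected: function mapping a pattern tuple -> model-expected count.
    n_obs:    total number of observations N.
    """
    exp_seen = {pat: expected(pat) for pat in observed}
    stat = sum((o - exp_seen[pat]) ** 2 / exp_seen[pat]
               for pat, o in observed.items())
    # Expected mass on patterns that were never observed.
    correction = n_obs - sum(exp_seen.values())
    return stat + correction

def expected_count(pattern):
    # Toy model: each binary item is independent with probability 0.5.
    return 100 * 0.5 ** len(pattern)

# Pattern (2, 2) is never observed.
observed = {(1, 1): 40, (1, 2): 30, (2, 1): 30}
chisq = pearson_chisq(observed, expected_count, n_obs=100)

# Same statistic computed by enumerating every possible pattern.
full = sum(
    (observed.get(pat, 0) - expected_count(pat)) ** 2 / expected_count(pat)
    for pat in product([1, 2], repeat=2)
)
```

Both routes give the same value, but the corrected form avoids enumerating every possible response pattern, which grows quickly with the number of items.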
- pypolca.api.fit(formula, data, nclass=2, maxiter=1000, tol=1e-10, verbose=False, na_rm=True, probs_start=None, beta_start=None, nrep=1, seed=None, max_restarts=100, calc_se=True)
Fit a latent class model.
- Parameters:
formula (str) – Patsy-style formula, e.g. “cbind(Y1, Y2, Y3) ~ 1” or “Y1 + Y2 ~ X1 + X2”. Left-hand side gives manifest variables; right-hand side gives covariates.
data (pd.DataFrame) – Data frame containing all variables.
nclass (int) – Number of latent classes.
maxiter (int) – Maximum EM iterations.
tol (float) – Log-likelihood convergence tolerance.
verbose (bool) – Print iteration progress.
na_rm (bool) – Drop rows with any missing values.
probs_start (np.ndarray, optional) – Starting values for class-conditional response probabilities.
beta_start (np.ndarray, optional) – Starting values for covariate coefficients.
nrep (int) – Number of replications with different random starting values (like R’s nrep).
seed (int, optional) – Random seed for the first replication. If None, a random seed is drawn.
max_restarts (int) – Maximum restarts per replication when a likelihood drop occurs (R retries indefinitely; this is a safety cap).
calc_se (bool) – Whether to compute standard errors (default True).
- Returns:
Fitted model result object.
- Return type:
Utilities
Utility functions for formula parsing and data preparation.
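To make the two formula shapes accepted by `fit` concrete, here is a rough sketch of splitting a formula string into manifest variables and covariates. It is an illustration only: `split_formula` is a hypothetical function, it handles just the two forms shown in the `fit` documentation, and the package's actual parser may behave differently.

```python
def split_formula(formula):
    """Split a formula string into (manifest variables, covariates).

    Handles only 'cbind(A, B) ~ ...' and 'A + B ~ ...' forms; a
    right-hand side of '1' is treated as 'no covariates'.
    """
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    if lhs.startswith("cbind(") and lhs.endswith(")"):
        # Strip the cbind( ... ) wrapper and split on commas.
        manifest = [v.strip() for v in lhs[len("cbind("):-1].split(",")]
    else:
        manifest = [v.strip() for v in lhs.split("+")]
    covariates = [] if rhs == "1" else [v.strip() for v in rhs.split("+")]
    return manifest, covariates

no_covariates = split_formula("cbind(Y1, Y2, Y3) ~ 1")
with_covariates = split_formula("Y1 + Y2 ~ X1 + X2")
```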
Datasets
Built-in datasets from R’s poLCA package.
All datasets are re-exported from R’s poLCA (GPL-2.0-or-later, compatible with this package) as CSV files. Use load_dataset() with a Dataset enum member to load one as a Polars DataFrame.
Usage:
from pypolca.data import load_dataset, Dataset
df = load_dataset(Dataset.CARCINOMA)
# or by name:
df = load_dataset("carcinoma")
from pypolca import fit
result = fit("cbind(A,B,C,D,E,F,G) ~ 1", df, nclass=2)
- pypolca.data._dataset.load_dataset(name)
Load a built-in dataset as a Polars DataFrame.
- Parameters:
name (Dataset or str) – Dataset to load, e.g. Dataset.CARCINOMA or "carcinoma".
- Return type:
pl.DataFrame
- Raises:
ValueError – If name is not a valid dataset.
Examples
>>> from pypolca.data import load_dataset, Dataset
>>> df = load_dataset(Dataset.CARCINOMA)
>>> df.shape
(118, 7)
- pypolca.data._dataset.get_dataset_info(name)
Return metadata for a dataset (description, columns, source, example).
- class pypolca.data._dataset.Dataset(*values)
Built-in datasets available for loading.
- CARCINOMA
Dichotomous ratings by seven pathologists of 118 slides for the presence or absence of carcinoma in the uterine cervix. Columns: A–G (1=no, 2=yes). Source: Agresti (2002), Table 13.1.
- Type:
- CHEATING
319 undergraduate students surveyed on chronic cheating behavior. Columns: LIEEXAM, LIEPAPER, FRAUD, COPYEXAM (1=no, 2=yes), GPA (1–5).
- Type:
- ELECTION
2000 American National Election Study survey, 1,785 respondents. 12 trait ratings (MORALG–INTELB, 1–4) for Gore and Bush, plus VOTE3, AGE, EDUC, GENDER, PARTY covariates.
- Type:
- GSS82
1,202 white respondents to the 1982 General Social Survey. Columns: PURPOSE (1–3), ACCURACY (1–2), UNDERSTA (1–3), COOPERAT (1–3). Source: McCutcheon (1987), Table 3.1.
- Type:
- VALUES
216 respondents on four dichotomous items measuring universalistic vs. particularistic values. Columns: A–D (1=universalistic, 2=particularistic).
- Type:
- CARCINOMA = 'carcinoma'
- CHEATING = 'cheating'
- ELECTION = 'election'
- GSS82 = 'gss82'
- VALUES = 'values'