151: Swissmetro Panel Latent Class
import numpy as np
import pandas as pd
import larch as lx
from larch import P, X
print(lx.versions())
{'larch': '6.0.42', 'sharrow': '2.14.0', 'numpy': '2.2.6', 'pandas': '2.3.2', 'xarray': '2025.6.1', 'numba': '0.62.0', 'jax': '0.6.2'}
This example is a latent class mode choice model (also known as a discrete mixture model) built using the Swissmetro example dataset. This model is similar to the simple latent class model shown in Example 107. The key difference is that class membership is determined per-respondent instead of per-observation. The underlying data is from a stated preference survey, in which each respondent made repeated choices among alternatives with varying attributes.
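To make the per-respondent mixing concrete: in the panel formulation, the likelihood for respondent $n$ multiplies the class-conditional choice probabilities across all $T_n$ of that respondent's observations *before* mixing over classes with shares $\pi_c$,

$$ P_n = \sum_{c} \pi_c \prod_{t=1}^{T_n} \Pr(y_{nt} \mid c) $$

whereas the per-observation version in Example 107 mixes each choice separately, $\sum_c \pi_c \Pr(y_{nt} \mid c)$, discarding the information that a given respondent remains in one class across all of their responses.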
As before, the first step is to load the data and do some pre-processing, so that the format and scale of the data matches that from the Biogeme example.
raw = pd.read_csv(lx.example_file("swissmetro.csv.gz"))
raw["SM_COST"] = raw["SM_CO"] * (raw["GA"] == 0)
raw["TRAIN_COST"] = raw.eval("TRAIN_CO * (GA == 0)")
raw["TRAIN_COST_SCALED"] = raw["TRAIN_COST"] / 100
raw["TRAIN_TT_SCALED"] = raw["TRAIN_TT"] / 100
raw["SM_COST_SCALED"] = raw.eval("SM_COST / 100")
raw["SM_TT_SCALED"] = raw["SM_TT"] / 100
raw["CAR_CO_SCALED"] = raw["CAR_CO"] / 100
raw["CAR_TT_SCALED"] = raw["CAR_TT"] / 100
raw["CAR_AV_SP"] = raw.eval("CAR_AV * (SP!=0)")
raw["TRAIN_AV_SP"] = raw.eval("TRAIN_AV * (SP!=0)")
raw
 | GROUP | SURVEY | SP | ID | PURPOSE | FIRST | TICKET | WHO | LUGGAGE | AGE | MALE | INCOME | GA | ORIGIN | DEST | TRAIN_AV | CAR_AV | SM_AV | TRAIN_TT | TRAIN_CO | TRAIN_HE | SM_TT | SM_CO | SM_HE | SM_SEATS | CAR_TT | CAR_CO | CHOICE | SM_COST | TRAIN_COST | TRAIN_COST_SCALED | TRAIN_TT_SCALED | SM_COST_SCALED | SM_TT_SCALED | CAR_CO_SCALED | CAR_TT_SCALED | CAR_AV_SP | TRAIN_AV_SP
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 112 | 48 | 120 | 63 | 52 | 20 | 0 | 117 | 65 | 2 | 52 | 48 | 0.48 | 1.12 | 0.52 | 0.63 | 0.65 | 1.17 | 1 | 1 |
1 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 103 | 48 | 30 | 60 | 49 | 10 | 0 | 117 | 84 | 2 | 49 | 48 | 0.48 | 1.03 | 0.49 | 0.60 | 0.84 | 1.17 | 1 | 1 |
2 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 130 | 48 | 60 | 67 | 58 | 30 | 0 | 117 | 52 | 2 | 58 | 48 | 0.48 | 1.30 | 0.58 | 0.67 | 0.52 | 1.17 | 1 | 1 |
3 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 103 | 40 | 30 | 63 | 52 | 20 | 0 | 72 | 52 | 2 | 52 | 40 | 0.40 | 1.03 | 0.52 | 0.63 | 0.52 | 0.72 | 1 | 1 |
4 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 130 | 36 | 60 | 63 | 42 | 20 | 0 | 90 | 84 | 2 | 42 | 36 | 0.36 | 1.30 | 0.42 | 0.63 | 0.84 | 0.90 | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
10723 | 3 | 1 | 1 | 1192 | 4 | 1 | 7 | 1 | 0 | 5 | 1 | 3 | 0 | 2 | 20 | 1 | 1 | 1 | 148 | 13 | 30 | 93 | 17 | 30 | 0 | 156 | 56 | 2 | 17 | 13 | 0.13 | 1.48 | 0.17 | 0.93 | 0.56 | 1.56 | 1 | 1 |
10724 | 3 | 1 | 1 | 1192 | 4 | 1 | 7 | 1 | 0 | 5 | 1 | 3 | 0 | 2 | 20 | 1 | 1 | 1 | 148 | 12 | 30 | 96 | 16 | 10 | 0 | 96 | 70 | 3 | 16 | 12 | 0.12 | 1.48 | 0.16 | 0.96 | 0.70 | 0.96 | 1 | 1 |
10725 | 3 | 1 | 1 | 1192 | 4 | 1 | 7 | 1 | 0 | 5 | 1 | 3 | 0 | 2 | 20 | 1 | 1 | 1 | 148 | 16 | 60 | 93 | 16 | 20 | 0 | 96 | 56 | 3 | 16 | 16 | 0.16 | 1.48 | 0.16 | 0.93 | 0.56 | 0.96 | 1 | 1 |
10726 | 3 | 1 | 1 | 1192 | 4 | 1 | 7 | 1 | 0 | 5 | 1 | 3 | 0 | 2 | 20 | 1 | 1 | 1 | 178 | 16 | 30 | 96 | 17 | 30 | 0 | 96 | 91 | 2 | 17 | 16 | 0.16 | 1.78 | 0.17 | 0.96 | 0.91 | 0.96 | 1 | 1 |
10727 | 3 | 1 | 1 | 1192 | 4 | 1 | 7 | 1 | 0 | 5 | 1 | 3 | 0 | 2 | 20 | 1 | 1 | 1 | 148 | 13 | 60 | 96 | 21 | 30 | 0 | 120 | 70 | 3 | 21 | 13 | 0.13 | 1.48 | 0.21 | 0.96 | 0.70 | 1.20 | 1 | 1 |
10728 rows × 38 columns
We now have a pandas DataFrame in “wide” or idco format, with one row per choice observation. For this example, it is relevant to note there is more than one row per respondent, and the various respondents can be identified by the “ID” column.
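To see what this panel structure looks like, repeated IDs can be counted with an ordinary pandas groupby. This is a toy sketch with made-up values, not the actual Swissmetro data:

```python
import pandas as pd

# Toy stand-in for panel data: each respondent (ID) answers several
# stated-preference questions, so the same ID repeats across rows.
toy = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "CHOICE": [2, 2, 1, 3, 3],
})
obs_per_respondent = toy.groupby("ID").size()
print(obs_per_respondent.to_dict())  # {1: 3, 2: 2}
```

In the real data, each of the roughly 1,200 respondents answered a block of stated-preference questions, which is why the "ID" column repeats.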
We can convert this to a Larch Dataset, filtering to include only the cases of interest, following the same filter applied in the Biogeme example.
data = lx.Dataset.construct.from_idco(raw).dc.query_cases(
"PURPOSE in (1,3) and CHOICE != 0"
)
The result is a Dataset with 6,768 cases.
data
<xarray.Dataset> Size: 2MB
Dimensions:            (index: 6768)
Coordinates:
  * index              (index) int64 54kB 0 1 2 3 4 ... 8446 8447 8448 8449 8450
Data variables: (12/38)
    GROUP              (index) int64 54kB 2 2 2 2 2 2 2 2 2 ... 3 3 3 3 3 3 3 3
    SURVEY             (index) int64 54kB 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
    SP                 (index) int64 54kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    ID                 (index) int64 54kB 1 1 1 1 1 1 ... 939 939 939 939 939
    PURPOSE            (index) int64 54kB 1 1 1 1 1 1 1 1 1 ... 3 3 3 3 3 3 3 3
    FIRST              (index) int64 54kB 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
    ...                 ...
    SM_COST_SCALED     (index) float64 54kB 0.52 0.49 0.58 ... 0.16 0.17 0.21
    SM_TT_SCALED       (index) float64 54kB 0.63 0.6 0.67 0.63 ... 0.5 0.53 0.53
    CAR_CO_SCALED      (index) float64 54kB 0.65 0.84 0.52 ... 0.64 1.04 0.8
    CAR_TT_SCALED      (index) float64 54kB 1.17 1.17 1.17 0.72 ... 0.8 0.8 1.0
    CAR_AV_SP          (index) int64 54kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TRAIN_AV_SP        (index) int64 54kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
    _caseid_:  index
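The expression given to query_cases follows the same syntax as the pandas DataFrame.query method. Applying the same filter to a toy frame (illustrative values only) shows how it keeps exactly the rows with PURPOSE 1 or 3 and a nonzero CHOICE:

```python
import pandas as pd

# Toy frame containing only the two filter columns used above.
toy = pd.DataFrame({
    "PURPOSE": [1, 2, 3, 4, 1],
    "CHOICE": [1, 1, 2, 1, 0],
})
kept = toy.query("PURPOSE in (1,3) and CHOICE != 0")
print(kept.index.tolist())  # [0, 2] — only rows 0 and 2 survive the filter
```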
For convenience, we can collect the names of the variables that define alternative availability into a dictionary, keyed on the codes we will use to represent each alternative.
availability_co_vars = {
1: "TRAIN_AV_SP",
2: "SM_AV",
3: "CAR_AV_SP",
}
Then we can construct choice models for each of the classes. This is done in the usual manner for Larch choice models, assigning utility functions, alternative availability conditions, and choice markers.
In this example, Class 1 chooses based on cost, and the utility functions include a set of alternative specific constants.
m1 = lx.Model(title="Model1")
m1.availability_co_vars = availability_co_vars
m1.choice_co_code = "CHOICE"
m1.utility_co[1] = P("ASC_TRAIN") + X("TRAIN_COST_SCALED") * P("B_COST")
m1.utility_co[2] = X("SM_COST_SCALED") * P("B_COST")
m1.utility_co[3] = P("ASC_CAR") + X("CAR_CO_SCALED") * P("B_COST")
Class 2 is nearly the same model, defined completely separately but in a similar manner. The only difference is in the utility functions, which add travel time to the utility.
m2 = lx.Model(title="Model2")
m2.availability_co_vars = availability_co_vars
m2.choice_co_code = "CHOICE"
m2.utility_co[1] = (
P("ASC_TRAIN")
+ X("TRAIN_TT_SCALED") * P("B_TIME")
+ X("TRAIN_COST_SCALED") * P("B_COST")
)
m2.utility_co[2] = X("SM_TT_SCALED") * P("B_TIME") + X("SM_COST_SCALED") * P("B_COST")
m2.utility_co[3] = (
P("ASC_CAR") + X("CAR_TT_SCALED") * P("B_TIME") + X("CAR_CO_SCALED") * P("B_COST")
)
For this example, we will also add a third choice class. Class 3 represents a lexicographic or non-compensatory choice system, where the choice of mode is based exclusively on cost, without regard to any other choice attribute. When two modes have the same cost, they are considered to be equally likely. This lexicographic rule is still represented in the same mathematical manner as the other classes, which employ typical multinomial logit models, except that we will constrain the Z_COST parameter to enforce the desired choice rule for this class.
The use of this kind of choice model in a latent class structure can be challenging with cross-sectional data, as it is very easy to create over-specified models, i.e. it is hard to tell from one choice observation whether a decision maker is choosing based on a simple choice rule or considering trade-offs between various attributes. In a panel data study as shown in this example, it is much easier to mathematically disambiguate between compensatory and non-compensatory choice processes.
m3 = lx.Model(title="Model3")
m3.availability_co_vars = availability_co_vars
m3.choice_co_code = "CHOICE"
m3.utility_co[1] = X("TRAIN_COST_SCALED") * P("Z_COST")
m3.utility_co[2] = X("SM_COST_SCALED") * P("Z_COST")
m3.utility_co[3] = X("CAR_CO_SCALED") * P("Z_COST")
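To see why a single large-magnitude cost coefficient enforces a cheapest-wins rule, consider the logit probabilities as that coefficient grows. Below is a numeric sketch using numpy, with the scaled costs from the first data row shown earlier:

```python
import numpy as np

def logit_probs(utilities):
    """Numerically stable multinomial logit probabilities."""
    u = utilities - utilities.max()
    e = np.exp(u)
    return e / e.sum()

# Scaled costs for train, Swissmetro, car from row 0 of the data above.
costs = np.array([0.48, 0.52, 0.65])
probs = logit_probs(-10000.0 * costs)  # Z_COST locked at -10000
print(probs.round(3))  # essentially all probability on the cheapest mode (train)

# With tied costs, the tied alternatives split the probability evenly.
tied = logit_probs(-10000.0 * np.array([0.48, 0.48, 0.65]))
print(tied.round(3))  # ~[0.5, 0.5, 0.0]
```

Any sufficiently large negative value would do; locking the parameter rather than estimating it keeps the optimizer from wandering along this flat direction of the likelihood.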
The class membership model for this latent class model is relatively simple, just constants for each class. One class (the first one, which we label as model 101) is a reference point, and the other classes are evaluated as more or less likely than the reference, similar to a typical MNL model.
mk = lx.Model()
mk.utility_co[102] = P("W_OTHER")
mk.utility_co[103] = P("W_COST")
Finally, we assemble all the component models into one LatentClass structure. We add the groupid argument to indicate how our panel data is structured.
b = lx.LatentClass(
mk,
{101: m1, 102: m2, 103: m3},
datatree=data.dc.set_altids([1, 2, 3]),
groupid="ID",
)
After creating the LatentClass structure, we can add constraints on parameters, such as the one needed for the lexicographic choice class.
b.lock_value(Z_COST=-10000)
b.set_cap(25)
result = b.maximize_loglike(method="slsqp")
Iteration 020 [Optimization terminated successfully]
Best LL = -4474.4794921875
param_name | value | initvalue | nullvalue | minimum | maximum | holdfast | best
---|---|---|---|---|---|---|---
ASC_CAR | 0.061 | 0.0 | 0.0 | -25.0 | 25.0 | 0 | 0.061 |
ASC_TRAIN | -0.936 | 0.0 | 0.0 | -25.0 | 25.0 | 0 | -0.936 |
B_COST | -1.159 | 0.0 | 0.0 | -25.0 | 25.0 | 0 | -1.159 |
B_TIME | -3.093 | 0.0 | 0.0 | -25.0 | 25.0 | 0 | -3.093 |
W_COST | -0.773 | 0.0 | 0.0 | -25.0 | 25.0 | 0 | -0.773 |
W_OTHER | 1.156 | 0.0 | 0.0 | -25.0 | 25.0 | 0 | 1.156 |
Z_COST | -10000.000 | -10000.0 | 0.0 | -10000.0 | -10000.0 | 1 | -10000.000 |
b.calculate_parameter_covariance();
b.parameter_summary()
Parameter | Value | Std Err | t Stat | Signif | Null Value
---|---|---|---|---|---
ASC_CAR | 0.0605 | 0.0482 | 1.26 | | 0.00
ASC_TRAIN | -0.936 | 0.0698 | -13.42 | *** | 0.00 |
B_COST | -1.16 | 0.0695 | -16.67 | *** | 0.00 |
B_TIME | -3.09 | 0.106 | -29.12 | *** | 0.00 |
W_COST | -0.773 | 0.161 | -4.80 | *** | 0.00 |
W_OTHER | 1.16 | 0.119 | 9.68 | *** | 0.00 |
Z_COST | -1.00e+04 | 0.00 | NA | | 0.00
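As a back-of-the-envelope check, the class membership constants translate into average prior class shares through a simple logit over the utilities {0, W_OTHER, W_COST}, with class 101 as the reference. Computed by hand from the estimates reported above:

```python
import numpy as np

# Class membership utilities: class 101 is the reference (utility 0);
# 1.156 and -0.773 are the W_OTHER and W_COST estimates reported above.
w = np.array([0.0, 1.156, -0.773])
shares = np.exp(w) / np.exp(w).sum()
print(shares.round(3))
# roughly 22% cost-only (101), 68% time-and-cost (102), 10% lexicographic (103)
```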