102: Swissmetro Weighted MNL Mode Choice#
import pandas as pd
import larch as lx
This example is a mode choice model built using the Swissmetro example dataset. First we create the Dataset and Model objects:
raw_data = pd.read_csv(lx.example_file("swissmetro.csv.gz")).rename_axis(index="CASEID")
data = lx.Dataset.construct.from_idco(raw_data, alts={1: "Train", 2: "SM", 3: "Car"})
data
<xarray.Dataset> Size: 2MB Dimensions: (CASEID: 10728, _altid_: 3) Coordinates: * CASEID (CASEID) int64 86kB 0 1 2 3 4 5 ... 10723 10724 10725 10726 10727 * _altid_ (_altid_) int64 24B 1 2 3 alt_names (_altid_) <U5 60B 'Train' 'SM' 'Car' Data variables: (12/28) GROUP (CASEID) int64 86kB 2 2 2 2 2 2 2 2 2 2 2 ... 3 3 3 3 3 3 3 3 3 3 SURVEY (CASEID) int64 86kB 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 SP (CASEID) int64 86kB 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 ID (CASEID) int64 86kB 1 1 1 1 1 1 ... 1192 1192 1192 1192 1192 1192 PURPOSE (CASEID) int64 86kB 1 1 1 1 1 1 1 1 1 1 1 ... 4 4 4 4 4 4 4 4 4 4 FIRST (CASEID) int64 86kB 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 ... ... SM_CO (CASEID) int64 86kB 52 49 58 52 42 49 58 ... 21 21 17 16 16 17 21 SM_HE (CASEID) int64 86kB 20 10 30 20 20 10 10 ... 20 30 30 10 20 30 30 SM_SEATS (CASEID) int64 86kB 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 CAR_TT (CASEID) int64 86kB 117 117 117 72 90 90 ... 96 156 96 96 96 120 CAR_CO (CASEID) int64 86kB 65 84 52 52 84 52 65 ... 70 91 56 70 56 91 70 CHOICE (CASEID) int64 86kB 2 2 2 2 2 2 2 1 2 2 2 ... 3 3 2 2 3 2 3 3 2 3 Attributes: _caseid_: CASEID _altid_: _altid_
The swissmetro example models exclude some observations. We can use the
Dataset.query_cases
method to identify the observations we would like to keep.
m = lx.Model(data.dc.query_cases("PURPOSE in (1,3) and CHOICE != 0"))
We can attach a title to the model. The title does not affect the calculations as all; it is merely used in various output report styles.
m.title = "swissmetro example 02 (weighted logit)"
We need to identify the availability and choice variables.
m.availability_co_vars = {
1: "TRAIN_AV * (SP!=0)",
2: "SM_AV",
3: "CAR_AV * (SP!=0)",
}
m.choice_co_code = "CHOICE"
This model adds a weighting factor.
m.weight_co_var = "1.0*(GROUP==2)+1.2*(GROUP==3)"
The swissmetro dataset, as with all Biogeme data, is only in co
format.
from larch import P, X
m.utility_co[1] = P("ASC_TRAIN")
m.utility_co[2] = 0
m.utility_co[3] = P("ASC_CAR")
m.utility_co[1] += X("TRAIN_TT") * P("B_TIME")
m.utility_co[2] += X("SM_TT") * P("B_TIME")
m.utility_co[3] += X("CAR_TT") * P("B_TIME")
m.utility_co[1] += X("TRAIN_CO*(GA==0)") * P("B_COST")
m.utility_co[2] += X("SM_CO*(GA==0)") * P("B_COST")
m.utility_co[3] += X("CAR_CO") * P("B_COST")
Larch will find all the parameters in the model, but we’d like to output them in a rational order. We can use the ordering method to do this:
m.ordering = [
(
"ASCs",
"ASC.*",
),
(
"LOS",
"B_.*",
),
]
We can estimate the models and check the results match up with those given by Biogeme:
m.set_cap(15)
result = m.maximize_loglike(method="SLSQP")
result
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Larch Model Dashboard ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ optimization complete │ │ Log Likelihood Current = -5931.557617 Best = -5931.557617 │ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓ │ │ ┃ Parameter ┃ Value ┃ Gradient ┃ Best ┃ │ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩ │ │ │ ASC_CAR │ -0.11434614 │ 0.04642868 │ -0.11434615 │ │ │ │ ASC_TRAIN │ -0.75647611 │ -0.041870117 │ -0.75647611 │ │ │ │ B_COST │ -0.01119661 │ -0.58984375 │ -0.01119661 │ │ │ │ B_TIME │ -0.013215302 │ -0.74560547 │ -0.013215302 │ │ │ └────────────────────────────────┴────────────────┴────────────────┴────────────────┘ │ └───────────────────────────────────────────────────────────────────────────────────────┘
message: Optimization terminated successfully
success: True
status: 0
x: [-1.143e-01 -7.565e-01 -1.120e-02 -1.322e-02]
nit: 10
jac: [ 4.643e-02 -4.187e-02 -5.898e-01 -7.456e-01]
nfev: 36
njev: 10
n_cases: 6768
total_weight: 7612.199999999999
logloss: 0.7792172587671765
loglike: -5931.5576171875
m.calculate_parameter_covariance()
m.parameter_summary()
Value | Std Err | t Stat | Signif | Null Value | ||
---|---|---|---|---|---|---|
Category | Parameter | |||||
ASCs | ASC_CAR | -0.114 | 0.0407 | -2.81 | ** | 0.00 |
ASC_TRAIN | -0.756 | 0.0528 | -14.32 | *** | 0.00 | |
LOS | B_COST | -0.0112 | 0.000490 | -22.83 | *** | 0.00 |
B_TIME | -0.0132 | 0.000537 | -24.62 | *** | 0.00 |
Looks good!