101: Swissmetro MNL Mode Choice#
import larch as lx
This example is a mode choice model built using the Swissmetro example dataset. First we can create a Model object:
m = lx.Model()
We can attach a title to the model. The title does not affect the calculations as all; it is merely used in various output report styles.
m.title = "swissmetro example 01 (simple logit)"
We need to identify the availability and choice variables.
The Swissmetro dataset, as with all Biogeme data, is only
in co
format, so we must define alternative
availability as an expression for each alternative, using a
dictionary to map alternative codes and expressions.
m.availability_co_vars = {
1: "TRAIN_AV * (SP!=0)",
2: "SM_AV",
3: "CAR_AV * (SP!=0)",
}
In the Swissmetro example dataset, as in many discrete choice modeling applications, there is one and only one chosen alternative for each case, so the choices can be described as a single expression that evaluates to the code of the chosen alternative.
m.choice_co_code = "CHOICE"
We will also write utility functions for each alternative.
Since the data is only in co
format, we must use only the
utility_co
form for the utility functions.
from larch import P, X
m.utility_co[1] = P("ASC_TRAIN")
m.utility_co[2] = 0
m.utility_co[3] = P("ASC_CAR")
m.utility_co[1] += X("TRAIN_TT") * P("B_TIME")
m.utility_co[2] += X("SM_TT") * P("B_TIME")
m.utility_co[3] += X("CAR_TT") * P("B_TIME")
m.utility_co[1] += X("TRAIN_CO*(GA==0)") * P("B_COST")
m.utility_co[2] += X("SM_CO*(GA==0)") * P("B_COST")
m.utility_co[3] += X("CAR_CO") * P("B_COST")
Larch will find all the parameters in the model, but we’d like to output them in a rational order. We can use the ordering method to do this:
m.ordering = [
(
"ASCs",
"ASC.*",
),
(
"LOS",
"B_.*",
),
]
Now we can prepare the data, which is available in the data warehouse that comes with Larch.
import pandas as pd
raw_data = pd.read_csv(lx.example_file("swissmetro.csv.gz")).rename_axis(index="CASEID")
raw_data.head()
GROUP | SURVEY | SP | ID | PURPOSE | FIRST | TICKET | WHO | LUGGAGE | AGE | MALE | INCOME | GA | ORIGIN | DEST | TRAIN_AV | CAR_AV | SM_AV | TRAIN_TT | TRAIN_CO | TRAIN_HE | SM_TT | SM_CO | SM_HE | SM_SEATS | CAR_TT | CAR_CO | CHOICE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CASEID | ||||||||||||||||||||||||||||
0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 112 | 48 | 120 | 63 | 52 | 20 | 0 | 117 | 65 | 2 |
1 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 103 | 48 | 30 | 60 | 49 | 10 | 0 | 117 | 84 | 2 |
2 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 130 | 48 | 60 | 67 | 58 | 30 | 0 | 117 | 52 | 2 |
3 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 103 | 40 | 30 | 63 | 52 | 20 | 0 | 72 | 52 | 2 |
4 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 130 | 36 | 60 | 63 | 42 | 20 | 0 | 90 | 84 | 2 |
The swissmetro example models exclude some observations. We can use pandas to identify the observations we would like to keep.
keep = raw_data.eval("PURPOSE in (1,3) and CHOICE != 0")
selected_data = raw_data[keep]
When you’ve created the data you need, you can pass the dataframe to
the larch.DataFrames constructor. Since the swissmetro data is in
idco
format, we’ll need to explicitly identify the alternative
codes as well.
ds = lx.Dataset.construct.from_idco(selected_data, alts={1: "Train", 2: "SM", 3: "Car"})
ds
<xarray.Dataset> Size: 2MB Dimensions: (CASEID: 6768, _altid_: 3) Coordinates: * CASEID (CASEID) int64 54kB 0 1 2 3 4 5 ... 8445 8446 8447 8448 8449 8450 * _altid_ (_altid_) int64 24B 1 2 3 alt_names (_altid_) <U5 60B 'Train' 'SM' 'Car' Data variables: (12/28) GROUP (CASEID) int64 54kB 2 2 2 2 2 2 2 2 2 2 2 ... 3 3 3 3 3 3 3 3 3 3 SURVEY (CASEID) int64 54kB 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 SP (CASEID) int64 54kB 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1 ID (CASEID) int64 54kB 1 1 1 1 1 1 1 ... 939 939 939 939 939 939 939 PURPOSE (CASEID) int64 54kB 1 1 1 1 1 1 1 1 1 1 1 ... 3 3 3 3 3 3 3 3 3 3 FIRST (CASEID) int64 54kB 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1 ... ... SM_CO (CASEID) int64 54kB 52 49 58 52 42 49 58 ... 21 21 17 16 16 17 21 SM_HE (CASEID) int64 54kB 20 10 30 20 20 10 10 ... 20 30 30 10 20 30 30 SM_SEATS (CASEID) int64 54kB 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 CAR_TT (CASEID) int64 54kB 117 117 117 72 90 90 ... 80 130 80 80 80 100 CAR_CO (CASEID) int64 54kB 65 84 52 52 84 52 65 ... 104 64 80 64 104 80 CHOICE (CASEID) int64 54kB 2 2 2 2 2 2 2 1 2 2 2 ... 2 1 1 1 1 1 1 1 1 1 Attributes: _caseid_: CASEID _altid_: _altid_
You might notice we have not carefully constructed this object to
include only the relevant data or the various simple transformations
used in the utility definition above. Larch can do this itself, if
you assign this DataFrames not as the actual set of data used in model
estimation, but rather as the dataservice
that can be used as the
source to create these computational arrays.
m.datatree = ds
We can estimate the models and check the results match up with those given by Biogeme:
m.set_cap(15)
result = m.maximize_loglike(method="SLSQP")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Larch Model Dashboard ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ optimization complete │ │ Log Likelihood Current = -5331.251465 Best = -5331.251465 │ │ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓ │ │ ┃ Parameter ┃ Value ┃ Gradient ┃ Best ┃ │ │ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩ │ │ │ ASC_CAR │ -0.15477075 │ 0.050556183 │ -0.15477075 │ │ │ │ ASC_TRAIN │ -0.70136421 │ 0.061515808 │ -0.70136421 │ │ │ │ B_COST │ -0.010838824 │ 1.7817383 │ -0.010838824 │ │ │ │ B_TIME │ -0.0127777 │ 3.4399414 │ -0.0127777 │ │ │ └────────────────────────────────┴────────────────┴────────────────┴────────────────┘ │ └───────────────────────────────────────────────────────────────────────────────────────┘
m.calculate_parameter_covariance();
m.parameter_summary()
Value | Std Err | t Stat | Signif | Null Value | ||
---|---|---|---|---|---|---|
Category | Parameter | |||||
ASCs | ASC_CAR | -0.155 | 0.0432 | -3.58 | *** | 0.00 |
ASC_TRAIN | -0.701 | 0.0549 | -12.78 | *** | 0.00 | |
LOS | B_COST | -0.0108 | 0.000518 | -20.91 | *** | 0.00 |
B_TIME | -0.0128 | 0.000569 | -22.46 | *** | 0.00 |