102: Swissmetro Weighted MNL Mode Choice

import pandas as pd

import larch as lx

This example is a mode choice model built using the Swissmetro example dataset. First we create the Dataset and Model objects:

raw_data = pd.read_csv(lx.example_file("swissmetro.csv.gz")).rename_axis(index="CASEID")
data = lx.Dataset.construct.from_idco(raw_data, alts={1: "Train", 2: "SM", 3: "Car"})
data
<xarray.Dataset> Size: 2MB
Dimensions:    (CASEID: 10728, _altid_: 3)
Coordinates:
  * CASEID     (CASEID) int64 86kB 0 1 2 3 4 5 ... 10723 10724 10725 10726 10727
  * _altid_    (_altid_) int64 24B 1 2 3
    alt_names  (_altid_) <U5 60B 'Train' 'SM' 'Car'
Data variables: (12/28)
    GROUP      (CASEID) int64 86kB 2 2 2 2 2 2 2 2 2 2 2 ... 3 3 3 3 3 3 3 3 3 3
    SURVEY     (CASEID) int64 86kB 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1
    SP         (CASEID) int64 86kB 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
    ID         (CASEID) int64 86kB 1 1 1 1 1 1 ... 1192 1192 1192 1192 1192 1192
    PURPOSE    (CASEID) int64 86kB 1 1 1 1 1 1 1 1 1 1 1 ... 4 4 4 4 4 4 4 4 4 4
    FIRST      (CASEID) int64 86kB 0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1 1 1
    ...         ...
    SM_CO      (CASEID) int64 86kB 52 49 58 52 42 49 58 ... 21 21 17 16 16 17 21
    SM_HE      (CASEID) int64 86kB 20 10 30 20 20 10 10 ... 20 30 30 10 20 30 30
    SM_SEATS   (CASEID) int64 86kB 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    CAR_TT     (CASEID) int64 86kB 117 117 117 72 90 90 ... 96 156 96 96 96 120
    CAR_CO     (CASEID) int64 86kB 65 84 52 52 84 52 65 ... 70 91 56 70 56 91 70
    CHOICE     (CASEID) int64 86kB 2 2 2 2 2 2 2 1 2 2 2 ... 3 3 2 2 3 2 3 3 2 3
Attributes:
    _caseid_:  CASEID
    _altid_:   _altid_

The Swissmetro example models exclude some observations. We can use the Dataset.query_cases method (available through the dataset's dc accessor) to keep only the observations we want.

m = lx.Model(data.dc.query_cases("PURPOSE in (1,3) and CHOICE != 0"))
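
The filter expression uses pandas-style query syntax, so the same string can be tried out directly on an ordinary DataFrame. A minimal sketch with hypothetical rows (not the real Swissmetro values):

```python
import pandas as pd

# Hypothetical mini-frame standing in for the raw Swissmetro data.
df = pd.DataFrame({
    "PURPOSE": [1, 2, 3, 1],
    "CHOICE":  [2, 1, 0, 3],  # 0 means no choice was recorded
})

# Same expression string as passed to query_cases above: keep rows where
# PURPOSE is 1 or 3 and a choice was actually made.
kept = df.query("PURPOSE in (1,3) and CHOICE != 0")
print(kept.index.tolist())  # → [0, 3]
```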

We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output report styles.

m.title = "swissmetro example 02 (weighted logit)"

We need to identify the availability and choice variables.

m.availability_co_vars = {
    1: "TRAIN_AV * (SP!=0)",
    2: "SM_AV",
    3: "CAR_AV * (SP!=0)",
}
m.choice_co_code = "CHOICE"
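
Each availability expression is evaluated against the idco variables of every case. A small sketch of how the Train expression behaves, using hypothetical flag values:

```python
import pandas as pd

# Hypothetical availability flags and survey-type indicator.
df = pd.DataFrame({"TRAIN_AV": [1, 1, 0], "SP": [1, 0, 1]})

# Mirrors the Train expression above: the alternative is available only
# when the raw flag is set AND the row is a stated-preference observation.
avail_train = df["TRAIN_AV"] * (df["SP"] != 0)
print(avail_train.tolist())  # → [1, 0, 0]
```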

This model adds a weighting factor.

m.weight_co_var = "1.0*(GROUP==2)+1.2*(GROUP==3)"
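
This expression gives each GROUP==2 case a weight of 1.0, each GROUP==3 case a weight of 1.2, and zero weight to anything else. Evaluated by hand with pandas on hypothetical group codes:

```python
import pandas as pd

group = pd.Series([2, 3, 3, 2], name="GROUP")  # hypothetical group codes

# Same arithmetic as the weight expression above.
w = 1.0 * (group == 2) + 1.2 * (group == 3)
print(w.tolist())  # → [1.0, 1.2, 1.2, 1.0]
```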

The Swissmetro dataset, like data prepared for Biogeme, is in idco format only (one row per case), so every utility term is written with utility_co.

from larch import P, X

m.utility_co[1] = P("ASC_TRAIN")
m.utility_co[2] = 0
m.utility_co[3] = P("ASC_CAR")
m.utility_co[1] += X("TRAIN_TT") * P("B_TIME")
m.utility_co[2] += X("SM_TT") * P("B_TIME")
m.utility_co[3] += X("CAR_TT") * P("B_TIME")
m.utility_co[1] += X("TRAIN_CO*(GA==0)") * P("B_COST")
m.utility_co[2] += X("SM_CO*(GA==0)") * P("B_COST")
m.utility_co[3] += X("CAR_CO") * P("B_COST")

Larch will find all the parameters in the model, but we’d like to output them in a rational order. We can set the model’s ordering attribute to do this:

m.ordering = [
    (
        "ASCs",
        "ASC.*",
    ),
    (
        "LOS",
        "B_.*",
    ),
]
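
Each ordering entry is a category label followed by one or more regular expressions; parameters are grouped under the pattern they match. A rough sketch of that grouping logic (not Larch's internal implementation):

```python
import re

params = ["B_COST", "ASC_CAR", "B_TIME", "ASC_TRAIN"]
ordering = [("ASCs", "ASC.*"), ("LOS", "B_.*")]

# Collect each parameter under the category whose pattern it fully matches.
grouped = {
    label: sorted(p for p in params if re.fullmatch(pattern, p))
    for label, pattern in ordering
}
print(grouped)  # → {'ASCs': ['ASC_CAR', 'ASC_TRAIN'], 'LOS': ['B_COST', 'B_TIME']}
```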

We can estimate the model and check that the results match those given by Biogeme. The set_cap(15) call bounds each parameter to the interval [-15, 15] during estimation:

m.set_cap(15)
result = m.maximize_loglike(method="SLSQP")
result
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Larch Model Dashboard                                                                 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ optimization complete                                                                 │
│ Log Likelihood Current =      -5931.557617 Best =      -5931.557617                   │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓ │
│ ┃ Parameter                      ┃ Value          ┃ Gradient       ┃ Best           ┃ │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩ │
│ │ ASC_CAR                        │ -0.11434614    │  0.04642868    │ -0.11434615    │ │
│ │ ASC_TRAIN                      │ -0.75647611    │ -0.041870117   │ -0.75647611    │ │
│ │ B_COST                         │ -0.01119661    │ -0.58984375    │ -0.01119661    │ │
│ │ B_TIME                         │ -0.013215302   │ -0.74560547    │ -0.013215302   │ │
│ └────────────────────────────────┴────────────────┴────────────────┴────────────────┘ │
└───────────────────────────────────────────────────────────────────────────────────────┘
      message: Optimization terminated successfully
      success: True
       status: 0
            x: [-1.143e-01 -7.565e-01 -1.120e-02 -1.322e-02]
          nit: 10
          jac: [ 4.643e-02 -4.187e-02 -5.898e-01 -7.456e-01]
         nfev: 36
         njev: 10
      n_cases: 6768
 total_weight: 7612.199999999999
      logloss: 0.7792172587671765
      loglike: -5931.5576171875
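
The reported logloss is the average negative log likelihood per unit of case weight, so it can be recovered from the other fields of the result:

```python
loglike = -5931.5576171875  # from the result above
total_weight = 7612.2       # sum of the case weights, from the result above

print(-loglike / total_weight)  # ≈ 0.779217, the reported logloss
```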
m.calculate_parameter_covariance()
m.parameter_summary()
                               Value   Std Err  t Stat  Signif  Null Value
Category  Parameter
ASCs      ASC_CAR             -0.114    0.0407   -2.81      **        0.00
          ASC_TRAIN           -0.756    0.0528  -14.32     ***        0.00
LOS       B_COST             -0.0112  0.000490  -22.83     ***        0.00
          B_TIME             -0.0132  0.000537  -24.62     ***        0.00
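
The t statistic in each row is simply the estimated value divided by its standard error (tested against a null value of zero). Checking the ASC_TRAIN row by hand with the rounded figures displayed above:

```python
value, std_err = -0.756, 0.0528  # ASC_TRAIN row as displayed above

print(round(value / std_err, 2))  # → -14.32
```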

Looks good!