7: Diminishing OVTT by Distance

The primary shortcoming of the specification in Model 6 is that the estimated value of IVT is unrealistically small. At least two alternatives can be considered for getting an improved estimate of the value of out-of-vehicle time. One is to assume that travelers' sensitivity to OVT diminishes with trip distance. The idea behind this is that travelers are more willing to tolerate higher out-of-vehicle time for a long trip than for a short trip.

Model 7 ensures this result by including total travel time (the sum of in-vehicle and out-of-vehicle time) and out-of-vehicle time divided by distance, in place of separate in- and out-of-vehicle travel times. (p. 114)

import larch
larch.__version__
'6.0.37.dev32+gdab7641'

This example is a mode choice model built using the MTC example dataset. First we create the data and model objects:

d = larch.examples.MTC(format="dataset")
d
<xarray.Dataset> Size: 2MB
Dimensions:    (caseid: 5029, altid: 6)
Coordinates:
  * caseid     (caseid) int64 40kB 1 2 3 4 5 6 ... 5024 5025 5026 5027 5028 5029
  * altid      (altid) int64 48B 1 2 3 4 5 6
    alt_names  (altid) <U7 168B 'DA' 'SR2' 'SR3+' 'Transit' 'Bike' 'Walk'
Data variables: (12/38)
    chose      (caseid, altid) float32 121kB 1.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    ivtt       (caseid, altid) float64 241kB 13.38 18.38 20.38 ... 1.59 6.55 0.0
    ovtt       (caseid, altid) float64 241kB 2.0 2.0 2.0 15.2 ... 16.0 4.5 0.0
    tottime    (caseid, altid) float64 241kB 15.38 20.38 22.38 ... 11.05 19.1
    totcost    (caseid, altid) float64 241kB 70.63 35.32 20.18 ... 75.0 0.0 0.0
    hhid       (caseid) int64 40kB 2 3 5 6 8 8 ... 9429 9430 9433 9434 9436 9438
    ...         ...
    corredis   (caseid) int64 40kB 0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    vehbywrk   (caseid) float64 40kB 4.0 1.0 0.33 1.0 0.0 ... 2.0 2.0 3.0 3.0
    vocc       (caseid) int64 40kB 1 0 1 0 2 0 1 1 1 1 0 ... 1 2 1 1 0 1 2 1 1 1
    wgt        (caseid) int64 40kB 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
    _avail_    (caseid, altid) int8 30kB 1 1 1 1 1 0 1 1 1 ... 1 0 1 1 1 1 1 1 1
    avail      (caseid, altid) int8 30kB 1 1 1 1 1 0 1 1 1 ... 1 0 1 1 1 1 1 1 1
Attributes:
    _caseid_:  caseid
    _altid_:   altid
m = larch.Model(d, compute_engine="numba")

Then we can build up the utility function. We'll use some idco data first, using the Model.utility_co attribute. This attribute is a dict-like object, to which we can assign LinearFunction objects keyed by alternative code.

from larch import PX, P, X

m.utility_co[2] = P("ASC_SR2") + P("hhinc#2,3") * X("hhinc")
m.utility_co[3] = P("ASC_SR3P") + P("hhinc#2,3") * X("hhinc")
m.utility_co[4] = P("ASC_TRAN") + P("hhinc#4") * X("hhinc")
m.utility_co[5] = P("ASC_BIKE") + P("hhinc#5") * X("hhinc")
m.utility_co[6] = P("ASC_WALK") + P("hhinc#6") * X("hhinc")

Next we'll use some idca data, with the utility_ca attribute. This attribute is a single LinearFunction that is applied across all alternatives using idca data. Because the data is structured to vary across alternatives, the parameters (and thus the structure of the LinearFunction) do not need to vary across alternatives.

We diminish the effect of ovtt by dividing its parameter by distance.

m.utility_ca = (
    +PX("totcost")
    + P("motorized_time") * X("(altid <= 4) * tottime")
    + P("nonmotorized_time") * X("(altid > 4) * tottime")
    + P("motorized_ovtbydist") * X("(altid <= 4) * ovtt/dist")
)
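To see how this specification makes OVT sensitivity diminish with distance, note that for a motorized mode the marginal disutility of one minute of out-of-vehicle time is motorized_time + motorized_ovtbydist / dist. A minimal sketch in plain Python (not using larch), with coefficient values taken from the point estimates reported later in this example:

```python
# Marginal disutility of one extra minute of out-of-vehicle time for a
# motorized mode under this specification: the total-time coefficient
# plus the OVT-by-distance coefficient divided by trip distance (miles).
def ovt_disutility(dist, b_time=-0.0416, b_ovtbydist=-0.181):
    return b_time + b_ovtbydist / dist

for dist in (1, 5, 20):
    print(f"{dist:>2} miles: {ovt_disutility(dist):+.4f} per minute of OVT")
```

As distance grows, the OVT penalty shrinks toward the total-time coefficient, so travelers on long trips are modeled as far more tolerant of out-of-vehicle time than travelers on short trips.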

Lastly, we need to identify the idca-format data that gives the availability for each alternative, as well as the number of times each alternative is chosen. (In traditional discrete choice analysis this is often 0 or 1, but it need not be binary, or even integral.)

m.availability_ca_var = "avail"
m.choice_ca_var = "chose"

And let’s give our model a descriptive title.

m.title = "MTC Example 7, Diminishing OVTT by Distance"

We can view a summary of the choices and alternative availabilities to make sure the model is set up correctly.

m.choice_avail_summary()
   name                        chosen  available
1  DA                            3637       4755
2  SR2                            517       5029
3  SR3+                           161       5029
4  Transit                        498       4003
5  Bike                            50       1738
6  Walk                           166       1479
   < Total All Alternatives >    5029       <NA>
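As a quick sanity check on the summary above (plain Python, using the counts from the table): the chosen counts should sum to the number of cases, and no alternative should be chosen more often than it is available:

```python
# Chosen and available counts transcribed from the summary table above.
chosen = {"DA": 3637, "SR2": 517, "SR3+": 161,
          "Transit": 498, "Bike": 50, "Walk": 166}
available = {"DA": 4755, "SR2": 5029, "SR3+": 5029,
             "Transit": 4003, "Bike": 1738, "Walk": 1479}

assert sum(chosen.values()) == 5029  # exactly one choice per case
assert all(chosen[alt] <= available[alt] for alt in chosen)
```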

We’ll set a parameter cap (bound) at +/- 20, which helps improve the numerical stability of the optimization algorithm used in estimation.

m.set_cap(20)

Having created this model, we can then estimate it:

assert m.compute_engine == "numba"
result = m.maximize_loglike(stderr=True)
m.calculate_parameter_covariance()
m.loglike()

Iteration 051 [Optimization terminated successfully]

Best LL = -3547.3458055419787

                         value      best  initvalue  minimum  maximum  nullvalue  holdfast
param_name
ASC_BIKE             -2.675340 -2.675340        0.0    -20.0     20.0        0.0         0
ASC_SR2              -2.187171 -2.187171        0.0    -20.0     20.0        0.0         0
ASC_SR3P             -3.517048 -3.517048        0.0    -20.0     20.0        0.0         0
ASC_TRAN             -0.039612 -0.039612        0.0    -20.0     20.0        0.0         0
ASC_WALK             -1.020608 -1.020608        0.0    -20.0     20.0        0.0         0
hhinc#2,3            -0.001384 -0.001384        0.0    -20.0     20.0        0.0         0
hhinc#4              -0.007258 -0.007258        0.0    -20.0     20.0        0.0         0
hhinc#5              -0.011964 -0.011964        0.0    -20.0     20.0        0.0         0
hhinc#6              -0.008182 -0.008182        0.0    -20.0     20.0        0.0         0
motorized_ovtbydist  -0.181457 -0.181457        0.0    -20.0     20.0        0.0         0
motorized_time       -0.041565 -0.041565        0.0    -20.0     20.0        0.0         0
nonmotorized_time    -0.047557 -0.047557        0.0    -20.0     20.0        0.0         0
totcost              -0.004114 -0.004114        0.0    -20.0     20.0        0.0         0
np.float64(-3547.3458055419787)
m.parameter_summary()
Parameter               Value   Std Err    t Stat  Signif  Null Value
ASC_BIKE             -2.68      0.333       -8.03     ***        0.00
ASC_SR2              -2.19      0.0976     -22.41     ***        0.00
ASC_SR3P             -3.52      0.123      -28.61     ***        0.00
ASC_TRAN             -0.0396    0.160       -0.25                0.00
ASC_WALK             -1.02      0.292       -3.50     ***        0.00
hhinc#2,3            -0.00138   0.00138     -1.00                0.00
hhinc#4              -0.00726   0.00191     -3.79     ***        0.00
hhinc#5              -0.0120    0.00524     -2.28       *        0.00
hhinc#6              -0.00818   0.00318     -2.57       *        0.00
motorized_ovtbydist  -0.181     0.0179     -10.15     ***        0.00
motorized_time       -0.0416    0.00352    -11.82     ***        0.00
nonmotorized_time    -0.0476    0.00550     -8.65     ***        0.00
totcost              -0.00411   0.000239   -17.21     ***        0.00
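These coefficients imply a value of in-vehicle time. A back-of-the-envelope calculation in plain Python, assuming totcost is measured in cents and travel times in minutes (the units conventionally used in the MTC data); treat the exact dollar figure as illustrative:

```python
# Implied value of motorized in-vehicle time: the ratio of the time and
# cost coefficients, converted from cents per minute to dollars per hour.
b_time = -0.041565  # motorized_time estimate
b_cost = -0.004114  # totcost estimate

cents_per_minute = b_time / b_cost
dollars_per_hour = cents_per_minute * 60 / 100
print(f"{cents_per_minute:.2f} cents/min = ${dollars_per_hour:.2f}/hour")
```

Note that for out-of-vehicle time the effective coefficient also includes the motorized_ovtbydist term, so the implied value of OVT varies with trip distance.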

It is a little tough to read this report because the parameters show up in alphabetical order. We can use the ordering attribute to fix this and group them systematically:

m.ordering = (
    (
        "LOS",
        ".*cost.*",
        ".*time.*",
        ".*dist.*",
    ),
    (
        "Income",
        "hhinc.*",
    ),
    (
        "ASCs",
        "ASC.*",
    ),
)
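The entries in each ordering group are regular expressions matched against parameter names. A minimal sketch of this grouping logic in plain Python (an illustration of how the patterns classify names, not larch's actual implementation):

```python
import re

# The same categories and patterns assigned to m.ordering above.
ordering = (
    ("LOS", (".*cost.*", ".*time.*", ".*dist.*")),
    ("Income", ("hhinc.*",)),
    ("ASCs", ("ASC.*",)),
)

def category(param_name):
    """Return the first category whose patterns match the parameter name."""
    for cat, patterns in ordering:
        if any(re.fullmatch(p, param_name) for p in patterns):
            return cat
    return None

print(category("totcost"), category("hhinc#4"), category("ASC_BIKE"))
```

Because groups are checked in order, a name matching several groups lands in the first one listed.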
m.parameter_summary()
Category  Parameter               Value   Std Err    t Stat  Signif  Null Value
LOS       totcost              -0.00411   0.000239   -17.21     ***        0.00
          motorized_time       -0.0416    0.00352    -11.82     ***        0.00
          nonmotorized_time    -0.0476    0.00550     -8.65     ***        0.00
          motorized_ovtbydist  -0.181     0.0179     -10.15     ***        0.00
Income    hhinc#2,3            -0.00138   0.00138     -1.00                0.00
          hhinc#4              -0.00726   0.00191     -3.79     ***        0.00
          hhinc#5              -0.0120    0.00524     -2.28       *        0.00
          hhinc#6              -0.00818   0.00318     -2.57       *        0.00
ASCs      ASC_BIKE             -2.68      0.333       -8.03     ***        0.00
          ASC_SR2              -2.19      0.0976     -22.41     ***        0.00
          ASC_SR3P             -3.52      0.123      -28.61     ***        0.00
          ASC_TRAN             -0.0396    0.160       -0.25                0.00
          ASC_WALK             -1.02      0.292       -3.50     ***        0.00

Finally, let's print the model estimation statistics. Note that if you want the log likelihood at constants, you need to estimate a separate constants-only model.

m.estimation_statistics()
Statistic                           Aggregate  Per Case
Number of Cases                          5029
Log Likelihood at Convergence        -3547.35     -0.71
Log Likelihood at Null Parameters    -7309.60     -1.45
Rho Squared w.r.t. Null Parameters      0.515
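The fit statistics can be reproduced directly from the two log likelihoods, as a quick check (plain Python):

```python
# Rho squared w.r.t. the null (equal-probability) model is
# 1 - LL(converged) / LL(null); values taken from the statistics above.
ll_converged = -3547.3458
ll_null = -7309.60
n_cases = 5029

rho_sq = 1 - ll_converged / ll_null
print(f"rho squared = {rho_sq:.3f}")                  # matches 0.515 above
print(f"LL per case = {ll_converged / n_cases:.2f}")  # matches -0.71 above
```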