Model Analysis#

Larch includes several tools to support analysis of estimated model results. We will demonstrate some of these features here using the simple example model for MTC mode choice:

import larch as lx

m = lx.example(1)
m.estimate(quiet=True)
m.parameter_summary()
  Value Std Err t Stat Signif Null Value
Parameter          
ASC_BIKE -2.38  0.305 -7.80 *** 0.00
ASC_SR2 -2.18  0.105 -20.81 *** 0.00
ASC_SR3P -3.72  0.178 -20.96 *** 0.00
ASC_TRAN -0.671  0.133 -5.06 *** 0.00
ASC_WALK -0.207  0.194 -1.07 0.00
hhinc#2 -0.00217  0.00155 -1.40 0.00
hhinc#3  0.000352  0.00254  0.14 0.00
hhinc#4 -0.00529  0.00183 -2.89 ** 0.00
hhinc#5 -0.0128  0.00532 -2.41 * 0.00
hhinc#6 -0.00968  0.00303 -3.19 ** 0.00
totcost -0.00492  0.000239 -20.60 *** 0.00
tottime -0.0513  0.00310 -16.56 *** 0.00

Choice and Availability Summary#

A simple aggregate analysis of the data’s choice and availablity statistics is available via choice_avail_summary even without any model structure or parameters, as this summary can be constructed from the data alone.

m.choice_avail_summary()
name chosen available
1 DA 3637 4755
2 SR2 517 5029
3 SR3+ 161 5029
4 Transit 498 4003
5 Bike 50 1738
6 Walk 166 1479
< Total All Alternatives > 5029 <NA>

Analyze Predictions#

Larch includes methods to analyze model predictions across various dimensions. The analyze_predictions_co method can be used to examine how well the model predicts choices against any available (or computable) attibute of the chooser. For example, consider the basic example model for MTC mode choice. This model includes a utility function that incorporates alternative specific constants, level of service variables (time and cost), as well as alternative-specific parameters to account for income.

We may be interested in knowing how well the model predicts choices across various age levels. To see this, we can pass the “age” variable to analyze_predictions_co:

m.analyze_predictions_co("age")
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(
Model Predictions by age
    mean-predicted stdev-predicted observed signif
age alt_name        
(17.999, 29.0] DA 805.213 12.578 753.000 0.000
SR2 116.586 9.865 158.000 0.000
SR3+ 33.167 5.577 39.000 0.296
Transit 118.049 8.748 106.000 0.168
Bike 13.169 3.509 26.000 0.000
Walk 46.816 5.863 51.000 0.475
(29.0, 35.0] DA 707.973 12.035 717.000 0.453
SR2 105.193 9.398 107.000 0.848
SR3+ 34.192 5.602 32.000 0.696
Transit 106.630 8.117 104.000 0.746
Bike 11.384 3.260 11.000 0.906
Walk 28.629 4.724 23.000 0.233
(35.0, 41.0] DA 683.659 11.876 681.000 0.823
SR2 96.580 9.047 91.000 0.537
SR3+ 31.445 5.404 39.000 0.162
Transit 92.564 7.745 102.000 0.223
Bike 8.540 2.862 3.000 0.053
Walk 27.212 4.671 24.000 0.492
(41.0, 50.0] DA 827.617 12.767 853.000 0.047
SR2 112.626 9.844 88.000 0.012
SR3+ 35.374 5.726 35.000 0.948
Transit 102.940 8.147 107.000 0.618
Bike 10.472 3.161 7.000 0.272
Walk 33.972 5.211 33.000 0.852
(50.0, 83.0] DA 612.531 10.918 633.000 0.061
SR2 86.039 8.497 73.000 0.125
SR3+ 26.800 4.991 16.000 0.030
Transit 77.828 7.175 79.000 0.870
Bike 6.437 2.462 3.000 0.163
Walk 29.365 4.776 35.000 0.238

This gives us a table of mode choices, segregated into five age categories. These categories were selected by the pandas.qcut functions to roughly divide the sample of observations into quintiles. We can see the mean and standard deviation of the model predictions for each mode choice in each age group, as well as the actual observed count of choices in each group. The signif column gives the level of significance of the difference between the predicted totals and the observed totals. A small number in this column indicates that, assuming the model is correct, is would be very unlikely to actually collect the observed data. We can see the very small significance values in the lowest age group are bold and highlighted in red, as these very small numbers are suggesting there is a problem in our model.

We can also generate a figure to present this same information in a more visual representation, using analyze_predictions_co_figure.

m.analyze_predictions_co_figure("age")
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(

If we prefer to analyze age using specific categorical breakpoints, we can do so by providing the preferred explicit bin breakpoints to the analyze_predictions_co method, which are then used by pandas.cut to categorize the data.

m.analyze_predictions_co("age", bins=[0, 25, 45, 65, 99])
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(
Model Predictions by age
    mean-predicted stdev-predicted observed signif
age alt_name        
(0, 25] DA 418.199 9.070 388.000 0.001
SR2 63.365 7.247 89.000 0.000
SR3+ 16.805 4.005 26.000 0.022
Transit 68.247 6.581 56.000 0.063
Bike 8.281 2.763 12.000 0.178
Walk 28.104 4.602 32.000 0.397
(25, 45] DA 2170.558 20.973 2166.000 0.828
SR2 310.701 16.200 304.000 0.679
SR3+ 100.290 9.610 102.000 0.859
Transit 304.319 13.869 322.000 0.202
Bike 30.105 5.342 32.000 0.723
Walk 92.027 8.427 82.000 0.234
(45, 65] DA 1006.937 13.983 1046.000 0.005
SR2 137.491 10.809 118.000 0.071
SR3+ 42.324 6.286 32.000 0.101
Transit 118.311 8.950 113.000 0.553
Bike 11.205 3.263 6.000 0.111
Walk 40.732 5.715 42.000 0.824
(65, 99] DA 41.299 2.935 37.000 0.143
SR2 5.467 2.187 6.000 0.808
SR3+ 1.559 1.221 1.000 0.647
Transit 7.134 2.121 7.000 0.950
Bike 0.411 0.613 0.000 0.503
Walk 5.130 1.905 10.000 0.011

We can also apply non-uniform weights to the observations, by passing an expression to the wgt argument of the analyze_predictions_co method. For example, here we overweight persons who work in the core CBD:

m.analyze_predictions_co("age", wgt="1.5 if wkccbd else 1.0")
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(
Model Predictions by age
    mean-predicted stdev-predicted observed signif
age alt_name        
(17.999, 29.0] DA 825.919 12.946 767.000 0.000
SR2 123.978 10.166 165.000 0.000
SR3+ 36.884 5.859 41.500 0.431
Transit 144.729 9.410 141.500 0.732
Bike 14.399 3.664 27.500 0.000
Walk 51.590 6.117 55.000 0.577
(29.0, 35.0] DA 731.622 12.529 736.500 0.697
SR2 114.122 9.780 116.000 0.848
SR3+ 39.359 5.979 39.000 0.952
Transit 136.488 8.910 138.500 0.821
Bike 13.021 3.477 11.000 0.561
Walk 31.388 4.924 25.000 0.195
(35.0, 41.0] DA 707.273 12.336 697.500 0.428
SR2 104.747 9.413 99.000 0.542
SR3+ 36.251 5.776 45.000 0.130
Transit 116.849 8.433 132.500 0.063
Bike 9.025 2.940 3.000 0.040
Walk 29.354 4.820 26.500 0.554
(41.0, 50.0] DA 844.727 13.094 863.000 0.163
SR2 120.621 10.158 95.000 0.012
SR3+ 39.849 6.041 40.000 0.980
Transit 127.050 8.745 139.500 0.155
Bike 11.218 3.267 7.500 0.255
Walk 36.534 5.373 35.000 0.775
(50.0, 83.0] DA 628.842 11.270 647.000 0.107
SR2 93.285 8.835 78.000 0.084
SR3+ 30.795 5.326 19.500 0.034
Transit 97.220 7.787 104.000 0.384
Bike 7.066 2.571 3.000 0.114
Walk 31.292 4.907 37.000 0.245

Note that weights are not normalized within the analysis, so if you use something like a population expansion weight or other large value, you will see results that appear to be extraordinarily significant across the board.

m.analyze_predictions_co("age", wgt="1000")
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(
Model Predictions by age
    mean-predicted stdev-predicted observed signif
age alt_name        
(17.999, 29.0] DA 805212.851 397.754 753000.000 0.000
SR2 116585.528 311.967 158000.000 0.000
SR3+ 33167.486 176.366 39000.000 0.000
Transit 118049.461 276.647 106000.000 0.000
Bike 13168.881 110.977 26000.000 0.000
Walk 46815.793 185.418 51000.000 0.000
(29.0, 35.0] DA 707972.588 380.571 717000.000 0.000
SR2 105192.728 297.192 107000.000 0.000
SR3+ 34191.796 177.143 32000.000 0.000
Transit 106630.204 256.670 104000.000 0.000
Bike 11383.801 103.088 11000.000 0.000
Walk 28628.882 149.401 23000.000 0.000
(35.0, 41.0] DA 683659.428 375.550 681000.000 0.000
SR2 96580.363 286.099 91000.000 0.000
SR3+ 31445.145 170.893 39000.000 0.000
Transit 92563.655 244.925 102000.000 0.000
Bike 8539.901 90.514 3000.000 0.000
Walk 27211.509 147.725 24000.000 0.000
(41.0, 50.0] DA 827616.698 403.720 853000.000 0.000
SR2 112625.877 311.281 88000.000 0.000
SR3+ 35373.514 181.080 35000.000 0.039
Transit 102940.087 257.635 107000.000 0.000
Bike 10471.769 99.964 7000.000 0.000
Walk 33972.055 164.773 33000.000 0.000
(50.0, 83.0] DA 612531.263 345.269 633000.000 0.000
SR2 86039.266 268.691 73000.000 0.000
SR3+ 26799.944 157.832 16000.000 0.000
Transit 77827.628 226.888 79000.000 0.000
Bike 6436.855 77.852 3000.000 0.000
Walk 29365.045 151.038 35000.000 0.000

To counteract this effect, you can normalize weights before providing the data to Larch, or explicitly in the wgt expression.

Weights in analyze_predictions_co are computed seperately from weighting that is applied in estimation, but if weights are used in estimation you can choose to apply the same weights here by setting wgt=True.

Elasticity#

Users can also review the elasticity of demand with respect to various input variables, using the analyze_elasticity method. This method accepts a variable name and it computes the elasticity with respect to that variable. For idca format variables, you can also provide an altid (the integer code for an individual alternative), and the elasticity will be computed with respect to a change in only that alterantives values for the selected variable. For example, in the model we are reviewing here, cost is stored in a single idca format variable, but if we want to see the elasticity with respect to transit cost specifically we can do so like this:

m.analyze_elasticity("totcost", altid=4)
Model Elasticity
  totcost[Transit]
alt_name  
DA 0.032
SR2 0.077
SR3+ 0.119
Transit -0.391
Bike 0.083
Walk 0.081

For idco format variables, we can still compute elasticities, but only without the altid argument, as it does not make sense to try to have an elasticity with respect to something like “income when choosing to drive”.

m.analyze_elasticity("hhinc")
Model Elasticity
  hhinc
alt_name  
DA 0.043
SR2 -0.058
SR3+ 0.096
Transit -0.136
Bike -0.510
Walk -0.284

Elasticities can also be computed by segments of choosers, in a manner mirroring the segmentation available from the analyze_predictions_co method. By adding the q argument to break the data into quantiles, (and optionally the n to set the number of quantiles), we can see elasticity by various segments. For example, here we can see the price elasticity of demand for transit is (slightly) increasing as income increases.

m.analyze_elasticity("totcost", altid=4, q="hhinc", n=3)
/opt/hostedtoolcache/Python/3.10.19/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:3393: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  (pr_change.groupby(slicer).sum() / pr_avg.groupby(slicer).sum())
Model Elasticity by hhinc
    totcost[Transit]
hhinc alt_name  
(2.999, 42.5] DA 0.034
SR2 0.081
SR3+ 0.112
Transit -0.362
Bike 0.081
Walk 0.091
(42.5, 67.5] DA 0.032
SR2 0.075
SR3+ 0.122
Transit -0.412
Bike 0.092
Walk 0.065
(67.5, 145.0] DA 0.030
SR2 0.072
SR3+ 0.125
Transit -0.425
Bike 0.070
Walk 0.063

Full Probability Array#

In addition to the pre-packaged analysis above, Larch makes available the full probability array (among other internals), so that advanced users can slice and analyze the results in arbitrarily complex ways.

m.probability(return_format="dataframe")
altid 1 2 3 4 5 6
caseid
1 0.817453 0.077713 0.017906 0.071428 0.015500 0.000000
2 0.336948 0.074346 0.052077 0.498066 0.038563 0.000000
3 0.823137 0.077032 0.015894 0.083936 0.000000 0.000000
4 0.340374 0.073518 0.058211 0.527897 0.000000 0.000000
5 0.000000 0.448104 0.114345 0.343653 0.093897 0.000000
... ... ... ... ... ... ...
5025 0.847974 0.120204 0.031822 0.000000 0.000000 0.000000
5026 0.791097 0.071897 0.015698 0.121309 0.000000 0.000000
5027 0.885339 0.092190 0.022471 0.000000 0.000000 0.000000
5028 0.710696 0.062881 0.014049 0.157473 0.000000 0.054901
5029 0.581632 0.047285 0.010365 0.120865 0.031087 0.208766

5029 rows × 6 columns

In conjunction with manipulations of the data and model parameters, users can evaluate nearly any type of elasticity, reponse function, or summary statistic.