Model Analysis#

Larch includes several tools to support analysis of estimated model results. We will demonstrate some of these features here using the simple example model for MTC mode choice:

import larch as lx

m = lx.example(1)
m.estimate(quiet=True)
m.parameter_summary()
  Value Std Err t Stat Signif Null Value
Parameter          
ASC_BIKE -2.38  0.305 -7.80 *** 0.00
ASC_SR2 -2.18  0.105 -20.81 *** 0.00
ASC_SR3P -3.73  0.178 -20.97 *** 0.00
ASC_TRAN -0.671  0.133 -5.06 *** 0.00
ASC_WALK -0.207  0.194 -1.06 0.00
hhinc#2 -0.00217  0.00155 -1.40 0.00
hhinc#3  0.000365  0.00254  0.14 0.00
hhinc#4 -0.00528  0.00183 -2.89 ** 0.00
hhinc#5 -0.0128  0.00532 -2.41 * 0.00
hhinc#6 -0.00969  0.00303 -3.19 ** 0.00
totcost -0.00492  0.000239 -20.60 *** 0.00
tottime -0.0513  0.00310 -16.57 *** 0.00

Choice and Availability Summary#

A simple aggregate analysis of the data’s choice and availablity statistics is available via choice_avail_summary even without any model structure or parameters, as this summary can be constructed from the data alone.

m.choice_avail_summary()
name chosen available
1 DA 3637 4755
2 SR2 517 5029
3 SR3+ 161 5029
4 Transit 498 4003
5 Bike 50 1738
6 Walk 166 1479
< Total All Alternatives > 5029 <NA>

Analyze Predictions#

Larch includes methods to analyze model predictions across various dimensions. The analyze_predictions_co method can be used to examine how well the model predicts choices against any available (or computable) attibute of the chooser. For example, consider the basic example model for MTC mode choice. This model includes a utility function that incorporates alternative specific constants, level of service variables (time and cost), as well as alternative-specific parameters to account for income.

We may be interested in knowing how well the model predicts choices across various age levels. To see this, we can pass the “age” variable to analyze_predictions_co:

m.analyze_predictions_co("age")
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(
Model Predictions by age
    mean-predicted stdev-predicted observed signif
age alt_name        
(17.999, 29.0] DA 805.220 12.578 753.000 0.000
SR2 116.578 9.865 158.000 0.000
SR3+ 33.173 5.578 39.000 0.296
Transit 118.044 8.748 106.000 0.169
Bike 13.167 3.509 26.000 0.000
Walk 46.817 5.863 51.000 0.476
(29.0, 35.0] DA 707.977 12.035 717.000 0.453
SR2 105.184 9.398 107.000 0.847
SR3+ 34.200 5.602 32.000 0.694
Transit 106.627 8.116 104.000 0.746
Bike 11.382 3.260 11.000 0.907
Walk 28.629 4.724 23.000 0.233
(35.0, 41.0] DA 683.662 11.876 681.000 0.823
SR2 96.571 9.047 91.000 0.538
SR3+ 31.455 5.405 39.000 0.163
Transit 92.561 7.745 102.000 0.223
Bike 8.539 2.862 3.000 0.053
Walk 27.212 4.671 24.000 0.492
(41.0, 50.0] DA 827.620 12.767 853.000 0.047
SR2 112.614 9.843 88.000 0.012
SR3+ 35.385 5.727 35.000 0.946
Transit 102.937 8.147 107.000 0.618
Bike 10.471 3.161 7.000 0.272
Walk 33.972 5.211 33.000 0.852
(50.0, 83.0] DA 612.534 10.918 633.000 0.061
SR2 86.032 8.496 73.000 0.125
SR3+ 26.808 4.992 16.000 0.030
Transit 77.824 7.175 79.000 0.870
Bike 6.436 2.462 3.000 0.163
Walk 29.366 4.776 35.000 0.238

This gives us a table of mode choices, segregated into five age categories. These categories were selected by the pandas.qcut functions to roughly divide the sample of observations into quintiles. We can see the mean and standard deviation of the model predictions for each mode choice in each age group, as well as the actual observed count of choices in each group. The signif column gives the level of significance of the difference between the predicted totals and the observed totals. A small number in this column indicates that, assuming the model is correct, is would be very unlikely to actually collect the observed data. We can see the very small significance values in the lowest age group are bold and highlighted in red, as these very small numbers are suggesting there is a problem in our model.

We can also generate a figure to present this same information in a more visual representation, using analyze_predictions_co_figure.

m.analyze_predictions_co_figure("age")
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(

If we prefer to analyze age using specific categorical breakpoints, we can do so by providing the preferred explicit bin breakpoints to the analyze_predictions_co method, which are then used by pandas.cut to categorize the data.

m.analyze_predictions_co("age", bins=[0, 25, 45, 65, 99])
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(
Model Predictions by age
    mean-predicted stdev-predicted observed signif
age alt_name        
(0, 25] DA 418.203 9.070 388.000 0.001
SR2 63.362 7.247 89.000 0.000
SR3+ 16.807 4.005 26.000 0.022
Transit 68.245 6.581 56.000 0.063
Bike 8.279 2.763 12.000 0.178
Walk 28.104 4.602 32.000 0.397
(25, 45] DA 2170.571 20.973 2166.000 0.827
SR2 310.673 16.200 304.000 0.680
SR3+ 100.318 9.611 102.000 0.861
Transit 304.309 13.868 322.000 0.202
Bike 30.100 5.342 32.000 0.722
Walk 92.029 8.427 82.000 0.234
(45, 65] DA 1006.942 13.983 1046.000 0.005
SR2 137.478 10.809 118.000 0.072
SR3+ 42.337 6.287 32.000 0.100
Transit 118.306 8.950 113.000 0.553
Bike 11.205 3.262 6.000 0.111
Walk 40.733 5.715 42.000 0.825
(65, 99] DA 41.299 2.935 37.000 0.143
SR2 5.467 2.187 6.000 0.807
SR3+ 1.560 1.221 1.000 0.647
Transit 7.134 2.121 7.000 0.950
Bike 0.411 0.613 0.000 0.503
Walk 5.131 1.905 10.000 0.011

We can also apply non-uniform weights to the observations, by passing an expression to the wgt argument of the analyze_predictions_co method. For example, here we overweight persons who work in the core CBD:

m.analyze_predictions_co("age", wgt="1.5 if wkccbd else 1.0")
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(
Model Predictions by age
    mean-predicted stdev-predicted observed signif
age alt_name        
(17.999, 29.0] DA 825.927 12.945 767.000 0.000
SR2 123.971 10.165 165.000 0.000
SR3+ 36.891 5.860 41.500 0.432
Transit 144.723 9.410 141.500 0.732
Bike 14.397 3.663 27.500 0.000
Walk 51.591 6.117 55.000 0.577
(29.0, 35.0] DA 731.626 12.529 736.500 0.697
SR2 114.112 9.779 116.000 0.847
SR3+ 39.369 5.980 39.000 0.951
Transit 136.486 8.909 138.500 0.821
Bike 13.019 3.477 11.000 0.561
Walk 31.388 4.924 25.000 0.194
(35.0, 41.0] DA 707.275 12.336 697.500 0.428
SR2 104.737 9.413 99.000 0.542
SR3+ 36.263 5.777 45.000 0.130
Transit 116.846 8.433 132.500 0.063
Bike 9.024 2.940 3.000 0.040
Walk 29.355 4.819 26.500 0.554
(41.0, 50.0] DA 844.730 13.093 863.000 0.163
SR2 120.608 10.158 95.000 0.012
SR3+ 39.862 6.041 40.000 0.982
Transit 127.047 8.745 139.500 0.154
Bike 11.217 3.267 7.500 0.255
Walk 36.534 5.373 35.000 0.775
(50.0, 83.0] DA 628.844 11.270 647.000 0.107
SR2 93.277 8.835 78.000 0.084
SR3+ 30.804 5.327 19.500 0.034
Transit 97.216 7.787 104.000 0.384
Bike 7.065 2.571 3.000 0.114
Walk 31.292 4.907 37.000 0.245

Note that weights are not normalized within the analysis, so if you use something like a population expansion weight or other large value, you will see results that appear to be extraordinarily significant across the board.

m.analyze_predictions_co("age", wgt="1000")
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(
Model Predictions by age
    mean-predicted stdev-predicted observed signif
age alt_name        
(17.999, 29.0] DA 805220.280 397.747 753000.000 0.000
SR2 116578.340 311.956 158000.000 0.000
SR3+ 33173.244 176.380 39000.000 0.000
Transit 118044.078 276.638 106000.000 0.000
Bike 13166.754 110.969 26000.000 0.000
Walk 46817.304 185.417 51000.000 0.000
(29.0, 35.0] DA 707977.079 380.565 717000.000 0.000
SR2 105183.885 297.181 107000.000 0.000
SR3+ 34200.379 177.163 32000.000 0.000
Transit 106627.399 256.664 104000.000 0.000
Bike 11381.978 103.080 11000.000 0.000
Walk 28629.280 149.400 23000.000 0.000
(35.0, 41.0] DA 683662.462 375.546 681000.000 0.000
SR2 96571.347 286.087 91000.000 0.000
SR3+ 31454.868 170.917 39000.000 0.000
Transit 92560.522 244.920 102000.000 0.000
Bike 8539.006 90.510 3000.000 0.000
Walk 27211.795 147.724 24000.000 0.000
(41.0, 50.0] DA 827620.072 403.716 853000.000 0.000
SR2 112614.499 311.267 88000.000 0.000
SR3+ 35385.015 181.107 35000.000 0.034
Transit 102937.133 257.629 107000.000 0.000
Bike 10471.054 99.961 7000.000 0.000
Walk 33972.227 164.772 33000.000 0.000
(50.0, 83.0] DA 612533.977 345.266 633000.000 0.000
SR2 86032.037 268.680 73000.000 0.000
SR3+ 26807.890 157.853 16000.000 0.000
Transit 77824.116 226.882 79000.000 0.000
Bike 6436.116 77.849 3000.000 0.000
Walk 29365.864 151.037 35000.000 0.000

To counteract this effect, you can normalize weights before providing the data to Larch, or explicitly in the wgt expression.

Weights in analyze_predictions_co are computed seperately from weighting that is applied in estimation, but if weights are used in estimation you can choose to apply the same weights here by setting wgt=True.

Elasticity#

Users can also review the elasticity of demand with respect to various input variables, using the analyze_elasticity method. This method accepts a variable name and it computes the elasticity with respect to that variable. For idca format variables, you can also provide an altid (the integer code for an individual alternative), and the elasticity will be computed with respect to a change in only that alterantives values for the selected variable. For example, in the model we are reviewing here, cost is stored in a single idca format variable, but if we want to see the elasticity with respect to transit cost specifically we can do so like this:

m.analyze_elasticity("totcost", altid=4)
Model Elasticity
  totcost[Transit]
alt_name  
DA 0.032
SR2 0.077
SR3+ 0.119
Transit -0.391
Bike 0.083
Walk 0.081

For idco format variables, we can still compute elasticities, but only without the altid argument, as it does not make sense to try to have an elasticity with respect to something like “income when choosing to drive”.

m.analyze_elasticity("hhinc")
Model Elasticity
  hhinc
alt_name  
DA 0.042
SR2 -0.059
SR3+ 0.097
Transit -0.136
Bike -0.509
Walk -0.284

Elasticities can also be computed by segments of choosers, in a manner mirroring the segmentation available from the analyze_predictions_co method. By adding the q argument to break the data into quantiles, (and optionally the n to set the number of quantiles), we can see elasticity by various segments. For example, here we can see the price elasticity of demand for transit is (slightly) increasing as income increases.

m.analyze_elasticity("totcost", altid=4, q="hhinc", n=3)
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:3393: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  (pr_change.groupby(slicer).sum() / pr_avg.groupby(slicer).sum())
Model Elasticity by hhinc
    totcost[Transit]
hhinc alt_name  
(2.999, 42.5] DA 0.034
SR2 0.081
SR3+ 0.112
Transit -0.362
Bike 0.081
Walk 0.091
(42.5, 67.5] DA 0.032
SR2 0.075
SR3+ 0.122
Transit -0.412
Bike 0.092
Walk 0.065
(67.5, 145.0] DA 0.030
SR2 0.072
SR3+ 0.125
Transit -0.425
Bike 0.070
Walk 0.063

Full Probability Array#

In addition to the pre-packaged analysis above, Larch makes available the full probability array (among other internals), so that advanced users can slice and analyze the results in arbitrarily complex ways.

m.probability(return_format="dataframe")
altid 1 2 3 4 5 6
caseid
1 0.817471 0.077709 0.017906 0.071419 0.015495 0.000000
2 0.336948 0.074344 0.052067 0.498101 0.038539 0.000000
3 0.823159 0.077032 0.015888 0.083921 0.000000 0.000000
4 0.340332 0.073502 0.058223 0.527942 0.000000 0.000000
5 0.000000 0.448029 0.114416 0.343639 0.093916 0.000000
... ... ... ... ... ... ...
5025 0.847984 0.120198 0.031818 0.000000 0.000000 0.000000
5026 0.791110 0.071894 0.015696 0.121300 0.000000 0.000000
5027 0.885344 0.092183 0.022473 0.000000 0.000000 0.000000
5028 0.710703 0.062877 0.014049 0.157478 0.000000 0.054894
5029 0.581608 0.047278 0.010365 0.120866 0.031085 0.208797

5029 rows × 6 columns

In conjunction with manipulations of the data and model parameters, users can evaluate nearly any type of elasticity, reponse function, or summary statistic.