Model Analysis#

Larch includes several tools to support analysis of estimated model results. We will demonstrate some of these features here using the simple example model for MTC mode choice:

import larch as lx

m = lx.example(1)
m.estimate(quiet=True)
m.parameter_summary()

	Value	Std Err	t Stat	Signif	Null Value
Parameter
ASC_BIKE	-2.38	0.305	-7.80	***	0.00
ASC_SR2	-2.18	0.105	-20.81	***	0.00
ASC_SR3P	-3.73	0.178	-20.97	***	0.00
ASC_TRAN	-0.671	0.133	-5.06	***	0.00
ASC_WALK	-0.207	0.194	-1.06		0.00
hhinc#2	-0.00217	0.00155	-1.40		0.00
hhinc#3	0.000365	0.00254	0.14		0.00
hhinc#4	-0.00528	0.00183	-2.89	**	0.00
hhinc#5	-0.0128	0.00532	-2.41	*	0.00
hhinc#6	-0.00969	0.00303	-3.19	**	0.00
totcost	-0.00492	0.000239	-20.60	***	0.00
tottime	-0.0513	0.00310	-16.57	***	0.00

Choice and Availability Summary#

A simple aggregate analysis of the data’s choice and availablity statistics is available via choice_avail_summary even without any model structure or parameters, as this summary can be constructed from the data alone.

m.choice_avail_summary()

	name	chosen	available
1	DA	3637	4755
2	SR2	517	5029
3	SR3+	161	5029
4	Transit	498	4003
5	Bike	50	1738
6	Walk	166	1479
< Total All Alternatives >		5029	<NA>

Analyze Predictions#

Larch includes methods to analyze model predictions across various dimensions. The analyze_predictions_co method can be used to examine how well the model predicts choices against any available (or computable) attibute of the chooser. For example, consider the basic example model for MTC mode choice. This model includes a utility function that incorporates alternative specific constants, level of service variables (time and cost), as well as alternative-specific parameters to account for income.

We may be interested in knowing how well the model predicts choices across various age levels. To see this, we can pass the “age” variable to analyze_predictions_co:

m.analyze_predictions_co("age")

/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(

Model Predictions by age
		mean-predicted	stdev-predicted	observed	signif
age	alt_name
(17.999, 29.0]	DA	805.220	12.578	753.000	0.000
	SR2	116.578	9.865	158.000	0.000
	SR3+	33.173	5.578	39.000	0.296
	Transit	118.044	8.748	106.000	0.169
	Bike	13.167	3.509	26.000	0.000
	Walk	46.817	5.863	51.000	0.476
(29.0, 35.0]	DA	707.977	12.035	717.000	0.453
	SR2	105.184	9.398	107.000	0.847
	SR3+	34.200	5.602	32.000	0.694
	Transit	106.627	8.116	104.000	0.746
	Bike	11.382	3.260	11.000	0.907
	Walk	28.629	4.724	23.000	0.233
(35.0, 41.0]	DA	683.662	11.876	681.000	0.823
	SR2	96.571	9.047	91.000	0.538
	SR3+	31.455	5.405	39.000	0.163
	Transit	92.561	7.745	102.000	0.223
	Bike	8.539	2.862	3.000	0.053
	Walk	27.212	4.671	24.000	0.492
(41.0, 50.0]	DA	827.620	12.767	853.000	0.047
	SR2	112.614	9.843	88.000	0.012
	SR3+	35.385	5.727	35.000	0.946
	Transit	102.937	8.147	107.000	0.618
	Bike	10.471	3.161	7.000	0.272
	Walk	33.972	5.211	33.000	0.852
(50.0, 83.0]	DA	612.534	10.918	633.000	0.061
	SR2	86.032	8.496	73.000	0.125
	SR3+	26.808	4.992	16.000	0.030
	Transit	77.824	7.175	79.000	0.870
	Bike	6.436	2.462	3.000	0.163
	Walk	29.366	4.776	35.000	0.238

This gives us a table of mode choices, segregated into five age categories. These categories were selected by the pandas.qcut functions to roughly divide the sample of observations into quintiles. We can see the mean and standard deviation of the model predictions for each mode choice in each age group, as well as the actual observed count of choices in each group. The signif column gives the level of significance of the difference between the predicted totals and the observed totals. A small number in this column indicates that, assuming the model is correct, is would be very unlikely to actually collect the observed data. We can see the very small significance values in the lowest age group are bold and highlighted in red, as these very small numbers are suggesting there is a problem in our model.

We can also generate a figure to present this same information in a more visual representation, using analyze_predictions_co_figure.

m.analyze_predictions_co_figure("age")

/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(

If we prefer to analyze age using specific categorical breakpoints, we can do so by providing the preferred explicit bin breakpoints to the analyze_predictions_co method, which are then used by pandas.cut to categorize the data.

m.analyze_predictions_co("age", bins=[0, 25, 45, 65, 99])

/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(

Model Predictions by age
		mean-predicted	stdev-predicted	observed	signif
age	alt_name
(0, 25]	DA	418.203	9.070	388.000	0.001
	SR2	63.362	7.247	89.000	0.000
	SR3+	16.807	4.005	26.000	0.022
	Transit	68.245	6.581	56.000	0.063
	Bike	8.279	2.763	12.000	0.178
	Walk	28.104	4.602	32.000	0.397
(25, 45]	DA	2170.571	20.973	2166.000	0.827
	SR2	310.673	16.200	304.000	0.680
	SR3+	100.318	9.611	102.000	0.861
	Transit	304.309	13.868	322.000	0.202
	Bike	30.100	5.342	32.000	0.722
	Walk	92.029	8.427	82.000	0.234
(45, 65]	DA	1006.942	13.983	1046.000	0.005
	SR2	137.478	10.809	118.000	0.072
	SR3+	42.337	6.287	32.000	0.100
	Transit	118.306	8.950	113.000	0.553
	Bike	11.205	3.262	6.000	0.111
	Walk	40.733	5.715	42.000	0.825
(65, 99]	DA	41.299	2.935	37.000	0.143
	SR2	5.467	2.187	6.000	0.807
	SR3+	1.560	1.221	1.000	0.647
	Transit	7.134	2.121	7.000	0.950
	Bike	0.411	0.613	0.000	0.503
	Walk	5.131	1.905	10.000	0.011

We can also apply non-uniform weights to the observations, by passing an expression to the wgt argument of the analyze_predictions_co method. For example, here we overweight persons who work in the core CBD:

m.analyze_predictions_co("age", wgt="1.5 if wkccbd else 1.0")

/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(

Model Predictions by age
		mean-predicted	stdev-predicted	observed	signif
age	alt_name
(17.999, 29.0]	DA	825.927	12.945	767.000	0.000
	SR2	123.971	10.165	165.000	0.000
	SR3+	36.891	5.860	41.500	0.432
	Transit	144.723	9.410	141.500	0.732
	Bike	14.397	3.663	27.500	0.000
	Walk	51.591	6.117	55.000	0.577
(29.0, 35.0]	DA	731.626	12.529	736.500	0.697
	SR2	114.112	9.779	116.000	0.847
	SR3+	39.369	5.980	39.000	0.951
	Transit	136.486	8.909	138.500	0.821
	Bike	13.019	3.477	11.000	0.561
	Walk	31.388	4.924	25.000	0.194
(35.0, 41.0]	DA	707.275	12.336	697.500	0.428
	SR2	104.737	9.413	99.000	0.542
	SR3+	36.263	5.777	45.000	0.130
	Transit	116.846	8.433	132.500	0.063
	Bike	9.024	2.940	3.000	0.040
	Walk	29.355	4.819	26.500	0.554
(41.0, 50.0]	DA	844.730	13.093	863.000	0.163
	SR2	120.608	10.158	95.000	0.012
	SR3+	39.862	6.041	40.000	0.982
	Transit	127.047	8.745	139.500	0.154
	Bike	11.217	3.267	7.500	0.255
	Walk	36.534	5.373	35.000	0.775
(50.0, 83.0]	DA	628.844	11.270	647.000	0.107
	SR2	93.277	8.835	78.000	0.084
	SR3+	30.804	5.327	19.500	0.034
	Transit	97.216	7.787	104.000	0.384
	Bike	7.065	2.571	3.000	0.114
	Walk	31.292	4.907	37.000	0.245

Note that weights are not normalized within the analysis, so if you use something like a population expansion weight or other large value, you will see results that appear to be extraordinarily significant across the board.

m.analyze_predictions_co("age", wgt="1000")

/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2958: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  pr.groupby(slicer).sum().stack().rename("mean-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2959: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  np.sqrt(pr_v.groupby(slicer).sum()).stack().rename("stdev-predicted"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2960: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  obs.groupby(slicer).sum().stack().rename("observed"),
/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:2980: FutureWarning: Styler.applymap has been deprecated. Use Styler.map instead.
  result.style.text_gradient(

Model Predictions by age
		mean-predicted	stdev-predicted	observed	signif
age	alt_name
(17.999, 29.0]	DA	805220.280	397.747	753000.000	0.000
	SR2	116578.340	311.956	158000.000	0.000
	SR3+	33173.244	176.380	39000.000	0.000
	Transit	118044.078	276.638	106000.000	0.000
	Bike	13166.754	110.969	26000.000	0.000
	Walk	46817.304	185.417	51000.000	0.000
(29.0, 35.0]	DA	707977.079	380.565	717000.000	0.000
	SR2	105183.885	297.181	107000.000	0.000
	SR3+	34200.379	177.163	32000.000	0.000
	Transit	106627.399	256.664	104000.000	0.000
	Bike	11381.978	103.080	11000.000	0.000
	Walk	28629.280	149.400	23000.000	0.000
(35.0, 41.0]	DA	683662.462	375.546	681000.000	0.000
	SR2	96571.347	286.087	91000.000	0.000
	SR3+	31454.868	170.917	39000.000	0.000
	Transit	92560.522	244.920	102000.000	0.000
	Bike	8539.006	90.510	3000.000	0.000
	Walk	27211.795	147.724	24000.000	0.000
(41.0, 50.0]	DA	827620.072	403.716	853000.000	0.000
	SR2	112614.499	311.267	88000.000	0.000
	SR3+	35385.015	181.107	35000.000	0.034
	Transit	102937.133	257.629	107000.000	0.000
	Bike	10471.054	99.961	7000.000	0.000
	Walk	33972.227	164.772	33000.000	0.000
(50.0, 83.0]	DA	612533.977	345.266	633000.000	0.000
	SR2	86032.037	268.680	73000.000	0.000
	SR3+	26807.890	157.853	16000.000	0.000
	Transit	77824.116	226.882	79000.000	0.000
	Bike	6436.116	77.849	3000.000	0.000
	Walk	29365.864	151.037	35000.000	0.000

To counteract this effect, you can normalize weights before providing the data to Larch, or explicitly in the wgt expression.

Weights in analyze_predictions_co are computed seperately from weighting that is applied in estimation, but if weights are used in estimation you can choose to apply the same weights here by setting wgt=True.

Elasticity#

Users can also review the elasticity of demand with respect to various input variables, using the analyze_elasticity method. This method accepts a variable name and it computes the elasticity with respect to that variable. For idca format variables, you can also provide an altid (the integer code for an individual alternative), and the elasticity will be computed with respect to a change in only that alterantives values for the selected variable. For example, in the model we are reviewing here, cost is stored in a single idca format variable, but if we want to see the elasticity with respect to transit cost specifically we can do so like this:

m.analyze_elasticity("totcost", altid=4)

Model Elasticity
	totcost[Transit]
alt_name
DA	0.032
SR2	0.077
SR3+	0.119
Transit	-0.391
Bike	0.083
Walk	0.081

For idco format variables, we can still compute elasticities, but only without the altid argument, as it does not make sense to try to have an elasticity with respect to something like “income when choosing to drive”.

m.analyze_elasticity("hhinc")

Model Elasticity
	hhinc
alt_name
DA	0.042
SR2	-0.059
SR3+	0.097
Transit	-0.136
Bike	-0.509
Walk	-0.284

Elasticities can also be computed by segments of choosers, in a manner mirroring the segmentation available from the analyze_predictions_co method. By adding the q argument to break the data into quantiles, (and optionally the n to set the number of quantiles), we can see elasticity by various segments. For example, here we can see the price elasticity of demand for transit is (slightly) increasing as income increases.

m.analyze_elasticity("totcost", altid=4, q="hhinc", n=3)

/opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/larch/model/numbamodel.py:3393: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  (pr_change.groupby(slicer).sum() / pr_avg.groupby(slicer).sum())

Model Elasticity by hhinc
		totcost[Transit]
hhinc	alt_name
(2.999, 42.5]	DA	0.034
	SR2	0.081
	SR3+	0.112
	Transit	-0.362
	Bike	0.081
	Walk	0.091
(42.5, 67.5]	DA	0.032
	SR2	0.075
	SR3+	0.122
	Transit	-0.412
	Bike	0.092
	Walk	0.065
(67.5, 145.0]	DA	0.030
	SR2	0.072
	SR3+	0.125
	Transit	-0.425
	Bike	0.070
	Walk	0.063

Full Probability Array#

In addition to the pre-packaged analysis above, Larch makes available the full probability array (among other internals), so that advanced users can slice and analyze the results in arbitrarily complex ways.

m.probability(return_format="dataframe")

altid	1	2	3	4	5	6
caseid
1	0.817471	0.077709	0.017906	0.071419	0.015495	0.000000
2	0.336948	0.074344	0.052067	0.498101	0.038539	0.000000
3	0.823159	0.077032	0.015888	0.083921	0.000000	0.000000
4	0.340332	0.073502	0.058223	0.527942	0.000000	0.000000
5	0.000000	0.448029	0.114416	0.343639	0.093916	0.000000
...	...	...	...	...	...	...
5025	0.847984	0.120198	0.031818	0.000000	0.000000	0.000000
5026	0.791110	0.071894	0.015696	0.121300	0.000000	0.000000
5027	0.885344	0.092183	0.022473	0.000000	0.000000	0.000000
5028	0.710703	0.062877	0.014049	0.157478	0.000000	0.054894
5029	0.581608	0.047278	0.010365	0.120866	0.031085	0.208797

5029 rows × 6 columns

In conjunction with manipulations of the data and model parameters, users can evaluate nearly any type of elasticity, reponse function, or summary statistic.

Model Analysis#

Choice and Availability Summary#

Analyze Predictions#

Elasticity#

Full Probability Array#

This Page