larch.Model.analyze_predictions_co

Model.analyze_predictions_co(q: Any = None, n: int = 5, *, caption: str | bool = True, alt_labels: Literal['id', 'name'] = 'name', bins=None, wgt: Any = None) → pd.io.formats.style.Styler

Analyze predictions of the model based on idco attributes.

This method provides a summary of the model’s predictions, broken down into categories by some measure in the idco data. The analysis includes the mean predicted counts within each category, the standard deviation of the predicted counts, the observed values, and a two-tailed p-value for the difference between the observed and predicted values. Statistically significant differences are highlighted in the output.

Parameters:
  • q (str or array-like) – The quantiles to use for slicing the data. If given as a string, the string is evaluated against the idca portion of this model’s datatree, and the result is then categorized into n quantiles. If given as an array-like, the array is used to slice the data, as the by argument to DataFrame.groupby, against an idca formatted dataframe of probabilities.

  • n (int, default 5) – The number of quantiles to use when q is a string.

  • caption (str or bool, default True) – The caption to use for the styled DataFrame. If True, the caption will be “Model Predictions by {q}”, and if False no caption will be used.

  • alt_labels ({'name', 'id'}, default 'name') – The type of labels to use for the alternative IDs in the output.

  • bins (int, sequence of scalars, or IntervalIndex, optional) –

    If provided, this value overrides n and is provided to pandas.cut to control the binning.

    • int : Defines the number of equal-width bins in the range of q. The range of q is extended by .1% on each side to include the minimum and maximum values of q.

    • sequence of scalars : Defines the bin edges allowing for non-uniform width. No extension of the range of q is done.

    • IntervalIndex : Defines the exact bins to be used. Note that IntervalIndex for bins must be non-overlapping.

  • wgt (array-like or str or bool, optional) – If given, this value is used to weight the cases. This can be done whether the model was estimated with weights or not; the estimation weights are ignored in this analysis, unless the value of this argument is True, in which case the estimation weights are used.
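
The difference between quantile slicing (n) and the three forms of bins can be seen by applying pandas directly to a small series. This is only an illustrative sketch: the age values are made up, and the use of pandas.qcut to stand in for the n-quantile categorization is an assumption about behavior, not a statement of how larch implements it internally.

```python
import pandas as pd

# Hypothetical idco measure (for example, traveler age) for eight cases.
age = pd.Series([18, 22, 25, 31, 40, 47, 58, 70], name="age")

# n quantiles: categories hold roughly equal numbers of cases.
# (Assumption: pandas.qcut approximates the n-quantile slicing.)
by_quantile = pd.qcut(age, 4)

# bins as an int: equal-width bins over the range of the data.
by_width = pd.cut(age, bins=4)

# bins as a sequence of scalars: explicit, possibly non-uniform edges.
by_edges = pd.cut(age, bins=[0, 25, 45, 100])

# bins as an IntervalIndex: exact, non-overlapping intervals.
intervals = pd.IntervalIndex.from_breaks([0, 30, 60, 90])
by_intervals = pd.cut(age, bins=intervals)

# Number of cases per category under each scheme.
for label, cats in [
    ("quantiles", by_quantile),
    ("equal width", by_width),
    ("explicit edges", by_edges),
    ("IntervalIndex", by_intervals),
]:
    print(label, cats.value_counts().sort_index().to_dict())
```

Any of these categorical results could also be passed as an array-like q, since it is a valid by argument to DataFrame.groupby.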

Returns:

pandas.io.formats.style.Styler – A styled DataFrame containing the results of the analysis.

Notes

This method is typically used to analyze the model’s predictions against attributes in the observed data that are not used in the model itself. For example, if the model estimates the probability of choosing a particular alternative conditional on cost, time, and income, this method can be used to analyze the model’s predictions against the distribution of observed choices by age or other characteristics. Technically, nothing prevents a user from using this method to analyze the model’s predictions against the same attributes used in the model, but the results are likely to be less informative.

This method requires the scipy package to be installed, as it uses the scipy.stats.norm.sf function to calculate the p-values.

The standard deviation of the predicted counts is calculated via a normal approximation to the underlying variable-p binomial-like distribution, and may be slightly biased, especially for small sample sizes.
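
As a rough sketch of the kind of calculation described in these notes, the following computes a predicted count, its standard deviation, and a two-tailed p-value for one alternative within one category. It assumes the variance of a sum of independent Bernoulli draws with case-specific probabilities, sum of p(1 - p); whether larch computes the variance exactly this way is an assumption. The function name two_tailed_p and the probabilities are hypothetical. To stay within the standard library, the sketch uses math.erfc, which is mathematically equal to 2 * scipy.stats.norm.sf(|z|).

```python
import math


def two_tailed_p(observed, probs):
    """Compare an observed count to predicted probabilities (hypothetical helper).

    observed : number of cases in the category that chose the alternative
    probs    : predicted probability of that alternative for each case
    """
    # Predicted count is the sum of the case-level probabilities.
    predicted = sum(probs)
    # Assumed variance of a sum of independent Bernoulli(p_i) draws.
    variance = sum(p * (1.0 - p) for p in probs)
    std = math.sqrt(variance)
    # Standardized difference under the normal approximation.
    z = (observed - predicted) / std
    # Two-tailed p-value: erfc(|z| / sqrt(2)) == 2 * norm.sf(|z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return predicted, std, p_value


# Made-up predicted probabilities for five cases in one category.
probs = [0.2, 0.5, 0.3, 0.6, 0.4]
predicted, std, p_value = two_tailed_p(4, probs)
print(f"predicted={predicted:.2f} std={std:.3f} p={p_value:.4f}")
```

With only five cases the normal approximation is crude, which is the small-sample caveat noted above.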