larch.Model.analyze_predictions_co_figure
- Model.analyze_predictions_co_figure(q: Any = None, n: int = 5, *, caption: str | bool = True, signif: float = 0.05, width: int = 400, alt_labels: Literal['id', 'name'] = 'name', bins=None) → altair.vegalite.v5.api.FacetChart
Create an Altair figure of the model’s predictions based on idco attributes.
This method provides a summary of the model’s predictions, broken down into categories by some measure in the idco data. The analysis includes the mean predicted counts within each category, the standard deviation of the predicted counts, the observed values, and a two-tailed p-value for the difference between the observed and predicted values. Statistically significant differences are highlighted in the output.
- Parameters:
q (str or array-like, optional) – The quantiles to use for slicing the data. If given as a string, the string is evaluated against the idca portion of this model’s datatree, and the result is then categorized into n quantiles. If given as an array-like, the array is used to slice the data, as the by argument to DataFrame.groupby, against an idca formatted dataframe of probabilities.
n (int, default 5) – The number of quantiles to use when q is a string.
caption (str or bool, default True) – The caption to use for the figure. If True, the caption will be “Model Predictions by {q}”, and if False no caption will be used.
alt_labels ({'id', 'name'}, default 'name') – The type of labels to use for the alternative IDs in the output.
signif (float, default 0.05) – The significance level to use for highlighting statistically significant differences.
width (int, default 400) – The width of the figure in pixels.
bins (int, sequence of scalars, or IntervalIndex, optional) – If provided, this value overrides n and is provided to pandas.cut to control the binning. See pandas.cut for more information.
- Returns:
altair.vegalite.v5.api.FacetChart – A faceted Altair chart presenting the results of the analysis.
Notes
This method is typically used to analyze the model’s predictions against attributes in the observed data that are not used in the model itself. For example, if the model estimates the probability of choosing a particular alternative conditional on cost, time, and income, this method can be used to analyze the model’s predictions against the distribution of observed choices by age or other characteristics. Technically, nothing prevents a user from using this method to analyze the model’s predictions against the same attributes used in the model, but the results are likely to be less informative.
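As an illustration of the use case described above, the following sketch wraps the call in a small helper. It assumes that model is an already-estimated larch.Model and that "age" is a variable available in its data. The helper name predictions_by_attribute and the variable name are hypothetical; only keyword arguments from the signature above are used.

```python
def predictions_by_attribute(model, expr, n_groups=4, signif=0.01):
    """Return the prediction figure for `model`, grouped by quantiles of `expr`.

    Assumptions: `model` is an estimated larch.Model, and `expr` is an
    expression (for example "age", a hypothetical variable name) that the
    model's data can evaluate.
    """
    return model.analyze_predictions_co_figure(
        expr,                                  # q: expression used to slice the data
        n=n_groups,                            # number of quantile groups
        caption=f"Predictions by {expr}",      # explicit caption instead of the default
        signif=signif,                         # stricter highlighting threshold than 0.05
        width=300,                             # figure width in pixels
        alt_labels="name",                     # label alternatives by name rather than id
    )
```

For an estimated model m, predictions_by_attribute(m, "age") would request the figure with cases grouped into four quantiles of age.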
This method requires the scipy package to be installed, as it uses the scipy.stats.norm.sf function to calculate the p-values.
The standard deviation of the predicted counts is calculated via a normal approximation to the underlying variable-p binomial-like distribution, and may be slightly biased, especially for small sample sizes.
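The calculation described in these notes can be sketched with the standard library alone. This is an illustration of a normal approximation for a sum of independent Bernoulli outcomes with differing probabilities, not a copy of larch's implementation; the function name predicted_count_stats and its inputs are assumptions. The two-tailed p-value uses the identity 2 * norm.sf(|z|) = erfc(|z| / sqrt(2)), so scipy is not needed for the sketch.

```python
import math


def predicted_count_stats(probs, observed):
    """Summarize one (category, alternative) cell.

    probs    : predicted probability of the alternative for each case in the cell
    observed : number of cases in the cell that actually chose the alternative

    Returns (mean predicted count, standard deviation, two-tailed p-value).
    """
    mean = sum(probs)                          # expected count in the cell
    var = sum(p * (1.0 - p) for p in probs)    # variance of a sum of independent Bernoulli draws
    sd = math.sqrt(var)
    if sd == 0.0:
        # Degenerate cell: every probability is exactly 0 or 1.
        return mean, sd, (1.0 if observed == mean else 0.0)
    z = (observed - mean) / sd
    # Two-tailed normal p-value, equivalent to 2 * scipy.stats.norm.sf(abs(z)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return mean, sd, p_value


# Hypothetical cell with five cases, one of which chose the alternative.
mean, sd, p_value = predicted_count_stats([0.2, 0.5, 0.7, 0.4, 0.9], observed=1)
highlight = p_value < 0.05                     # compare against the signif level
```

With only a handful of cases per cell, as in this toy input, the normal approximation is rough, which is the small-sample caveat noted above.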