=====
Usage
=====

A quick-start guide to running hierarchical linear regression with the HLR package.

Fetch example data
------------------

First, fetch some example data. We'll use the ``penguins`` dataset from ``seaborn``.

.. code-block:: python

    import seaborn as sns
    import pandas as pd

    # Load the example penguins dataset
    df = sns.load_dataset('penguins')

    # Drop rows with missing values, then keep only the numeric columns used below
    df.dropna(inplace=True)
    df = df[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']]

Initialize HLR & generate summary report
----------------------------------------

.. code-block:: python

    from HLR import HierarchicalLinearRegression

    # Define the independent variables for each model level
    ivs_dict = {
        1: ['bill_length_mm'],
        2: ['bill_length_mm', 'bill_depth_mm'],
        3: ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm']
    }

    # Define the dependent variable
    dv = 'body_mass_g'

    # Initialize the HierarchicalLinearRegression class
    hlr = HierarchicalLinearRegression(df, ivs_dict, dv)

    # Generate the summary report across all model levels
    hlr.summary()

Output:

.. raw:: html
   :file: images/hlr_summary.html

Run diagnostics for testing assumptions
---------------------------------------

.. code-block:: python

    diagnostics_dict = hlr.diagnostics(verbose=True)

Output:

.. code-block:: text

    Model Level 1 Diagnostics:
      Independence of residuals (Durbin-Watson test):
        DW stat: 0.8450671190941991
        Passed: False
      Linearity (Pearson r):
        bill_length_mm: {'Pearson r': 0.5894511101769488, 'p-value': 1.5386135144860176e-32, 'Passed': True}
      Linearity (Rainbow test):
        Rainbow Stat: 0.845825915500362
        p-value: 0.8589217163587981
        Passed: True
      Homoscedasticity (Breusch-Pagan test):
        Lagrange Stat: 76.51043993569607
        p-value: 2.1905189444330245e-18
        Passed: False
      Homoscedasticity (Goldfeld-Quandt test):
        F-Stat: 3.298385120028286
        p-value: 5.1841847326260096e-14
        Passed: False
      Multicollinearity (pairwise correlations):
        Correlations: {}
        Passed: True
      Multicollinearity (Variance Inflation Factors):
        VIFs: {}
        Passed: True
      Outliers (extreme standardized residuals):
        Indices: []
        Passed: True
      Outliers (high Cooks distance):
        Indices: []
        Passed: True
      Normality (mean of residuals):
        Mean: -2.403469482162693e-13
        Passed: True
      Normality (Shapiro-Wilk test):
        SW Stat: 0.9912192354166119
        p-value: 0.04492289320888261
        Passed: False

    Model Level 2 Diagnostics:
    ...

Plotting options for all model levels
-------------------------------------

.. code-block:: python

    fig = hlr.plot_studentized_residuals_vs_fitted()

Output:

.. image:: /images/plot_studentized_residuals_vs_fitted.png
   :alt: plot_studentized_residuals_vs_fitted
   :align: center
   :width: 50%

.. code-block:: python

    fig = hlr.plot_qq_residuals()

Output:

.. image:: /images/plot_qq_residuals.png
   :alt: plot_qq_residuals
   :align: center
   :width: 50%

.. code-block:: python

    fig = hlr.plot_influence()

Output:

.. image:: /images/plot_influence.png
   :alt: plot_influence
   :align: center
   :width: 50%

.. code-block:: python

    fig = hlr.plot_std_residuals()

Output:

.. image:: /images/plot_std_residuals.png
   :alt: plot_std_residuals
   :align: center
   :width: 50%

.. code-block:: python

    fig = hlr.plot_histogram_std_residuals()

Output:

.. image:: /images/plot_histogram_std_residuals.png
   :alt: plot_histogram_std_residuals
   :align: center
   :width: 50%

.. code-block:: python

    fig_list = hlr.plot_partial_regression()

Output:

.. image:: /images/plot_partial_regression.png
   :alt: plot_partial_regression
   :align: center
   :width: 50%

``fig_list`` contains one figure per model level; only Model Level 1 (``fig_list[0]``) is shown here.
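For readers who want to see what comparing the model levels in ``ivs_dict`` amounts to, here is a minimal NumPy-only sketch of the usual nested-model comparison: fit each level by ordinary least squares, then compute the change in R-squared and the corresponding F-change statistic between consecutive levels. This is not HLR's implementation. The helper names ``r_squared`` and ``nested_r2_changes`` are hypothetical, and the data are synthetic so that the example runs without downloading anything.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an OLS fit of y on X (an intercept column is added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def nested_r2_changes(X, y, levels):
    """Return (R2, delta R2, F-change) for each level.

    `levels` is a list of column-index lists; each level is assumed to
    contain every predictor of the previous level plus at least one more.
    """
    n = len(y)
    out = []
    prev_r2, prev_k = 0.0, 0  # level 1 is compared against an intercept-only model
    for cols in levels:
        r2 = r_squared(X[:, cols], y)
        k = len(cols)
        delta = r2 - prev_r2
        # F-change: explained variance gained per added predictor,
        # relative to the unexplained variance of the larger model
        f_change = (delta / (k - prev_k)) / ((1.0 - r2) / (n - k - 1))
        out.append((r2, delta, f_change))
        prev_r2, prev_k = r2, k
    return out


# Synthetic stand-in for three predictors and one outcome (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=200)

# Same nesting pattern as ivs_dict: each level adds one predictor
levels = [[0], [0, 1], [0, 1, 2]]
results = nested_r2_changes(X, y, levels)

for level, (r2, delta, f_change) in enumerate(results, start=1):
    print(f"Level {level}: R2={r2:.3f}  delta R2={delta:.3f}  F-change={f_change:.2f}")
```

Because each level only adds predictors, R-squared cannot decrease from one level to the next; the F-change statistic indicates whether the gain is larger than the added predictors would be expected to produce by chance.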