scikit_na.altair

scikit_na.altair.plot_corr(data: DataFrame, columns: Iterable[str] | None = None, mask_diag: bool = True, annot_color: str = 'black', round_sgn: int = 2, font_size: int = 14, opacity: float = 0.5, corr_kws: dict | None = None, chart_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None, color_kws: dict | None = None, text_kws: dict | None = None) LayerChart

Correlation heatmap.

Parameters:
  • data (DataFrame) – Input data.

  • columns (Optional[Sequence[str]]) – Column names.

  • mask_diag (bool = True) – Mask diagonal on heatmap.

  • corr_kws (dict, optional) – Keyword arguments passed to pandas.DataFrame.corr() method.

  • heat_kws (dict, optional) – Keyword arguments passed to seaborn.heatmap() method.

Returns:

Altair Chart object.

Return type:

altair.LayerChart
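As a rough illustration of what such a heatmap encodes, the sketch below computes Pearson correlations between per-column NA indicators, blanks the diagonal (mimicking mask_diag=True) and rounds the results (mimicking round_sgn=2). It is plain Python rather than the library's implementation. That the correlations are taken over NA indicators, and the helper names pearson and na_corr_matrix, are assumptions made for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # undefined when a column is constant
    return cov / sqrt(var_x * var_y)

def na_corr_matrix(data, mask_diag=True, round_sgn=2):
    """Correlate NA indicators (1 = missing, 0 = present) between columns.

    Hypothetical helper: `data` maps column names to lists, None marks NA.
    """
    indicators = {
        col: [1 if v is None else 0 for v in values]
        for col, values in data.items()
    }
    matrix = {}
    for a in indicators:
        for b in indicators:
            if mask_diag and a == b:
                matrix[(a, b)] = None  # masked diagonal cell
                continue
            r = pearson(indicators[a], indicators[b])
            matrix[(a, b)] = None if r is None else round(r, round_sgn)
    return matrix

data = {
    "A": [1, None, 3, None, 5],
    "B": [1, 2, None, 4, None],
    "C": [None, None, 3, 4, 5],
}
corr = na_corr_matrix(data)
```

A negative value for a pair of columns means their missing values tend not to occur in the same rows; a positive value means they tend to be missing together.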

scikit_na.altair.plot_hist(data: DataFrame, col: str, col_na: str, na_label: str | None = None, na_replace: dict | None = None, heuristic: bool = True, thres_uniq: int = 20, step: bool = False, norm: bool = True, font_size: int = 14, xlabel: str | None = None, ylabel: str = 'Frequency', chart_kws: dict | None = None, markarea_kws: dict | None = None, markbar_kws: dict | None = None, joinagg_kws: dict | None = None, calc_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None, color_kws: dict | None = None) Chart

Histogram plot.

Plots a histogram of values in a column col grouped by NA/non-NA values in column col_na.

Parameters:
  • data (DataFrame) – Input data.

  • col (str) – Column to display distribution of values.

  • col_na (str) – Column to group values by.

  • na_label (str, optional) – Legend title.

  • na_replace (dict, optional) – Dictionary to replace values returned by pandas.Series.isna() method.

  • step (bool, optional) – Draw step plot.

  • norm (bool, optional) – Normalize values in groups.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • chart_kws (dict, optional) – Keyword arguments passed to altair.Chart().

  • markarea_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area().

  • markbar_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_bar().

  • joinagg_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_joinaggregate().

  • calc_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_calculate().

  • x_kws (dict, optional) – Keyword arguments passed to altair.X().

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y().

  • color_kws (dict, optional) – Keyword arguments passed to altair.Color().

Returns:

Altair Chart object.

Return type:

Chart
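The grouping and the effect of norm can be sketched in plain Python: values of col are split by whether col_na is missing, and with norm enabled each group's counts are divided by the group size so that groups of different sizes stay comparable. This is an illustration under assumptions, not the library's code. The helper grouped_frequencies is hypothetical, and the labels {True: 'NA', False: 'Filled'} are used here only as an example na_replace mapping.

```python
from collections import Counter

def grouped_frequencies(col, col_na, na_replace=None, norm=True):
    """Count values of `col`, split by whether `col_na` is missing (None)."""
    labels = na_replace or {True: "NA", False: "Filled"}
    counts = {}
    for value, marker in zip(col, col_na):
        if value is None:
            continue  # a missing value in `col` itself has nothing to plot
        group = labels[marker is None]
        counts.setdefault(group, Counter())[value] += 1
    if not norm:
        return {group: dict(c) for group, c in counts.items()}
    # Divide by the group size so each group's frequencies sum to 1.
    return {
        group: {value: n / sum(c.values()) for value, n in c.items()}
        for group, c in counts.items()
    }

age = [20, 20, 30, 30, 30, 40]
income = [None, 100, None, 200, 300, None]

freq = grouped_frequencies(age, income)              # normalized per group
raw = grouped_frequencies(age, income, norm=False)   # plain counts
```

Here both groups contain three rows, so the raw counts and the normalized frequencies differ only by a constant factor; with unequal group sizes, normalization keeps the smaller group from being visually dwarfed.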

scikit_na.altair.plot_kde(data: DataFrame, col: str, col_na: str, na_label: str | None = None, na_replace: dict | None = None, font_size: int = 14, xlabel: str | None = None, ylabel: str = 'Density', chart_kws: dict | None = None, markarea_kws: dict | None = None, density_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None, color_kws: dict | None = None) Chart

Density plot.

Plots distribution of values in a column col grouped by NA/non-NA values in column col_na.

Parameters:
  • data (DataFrame) – Input data.

  • col (str) – Column to display distribution of values.

  • col_na (str) – Column to group values by.

  • na_label (str, optional) – Legend title.

  • na_replace (dict, optional) – Dictionary to replace values returned by pandas.Series.isna() method.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • chart_kws (dict, optional) – Keyword arguments passed to altair.Chart().

  • markarea_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area().

  • density_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_density().

  • x_kws (dict, optional) – Keyword arguments passed to altair.X().

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y().

  • color_kws (dict, optional) – Keyword arguments passed to altair.Color().

Returns:

Altair Chart object.

Return type:

Chart
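To make the idea of a grouped density plot concrete, the sketch below estimates the density of col separately for rows where col_na is missing and rows where it is present. It is a plain-Python illustration, not the library's implementation: a Gaussian kernel with a fixed bandwidth is assumed, and the helpers gaussian_kde and density_by_na as well as the group labels are hypothetical.

```python
from math import exp, pi, sqrt

def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    scale = 1.0 / (len(samples) * bandwidth * sqrt(2 * pi))
    return [
        scale * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

def density_by_na(col, col_na, grid, bandwidth=1.0):
    """Estimate the density of `col` separately for NA / non-NA rows of `col_na`."""
    groups = {"NA": [], "Filled": []}
    for value, marker in zip(col, col_na):
        if value is not None:
            groups["NA" if marker is None else "Filled"].append(value)
    return {
        label: gaussian_kde(values, grid, bandwidth)
        for label, values in groups.items()
        if values  # skip empty groups
    }

step = 0.1
grid = [i * step for i in range(-100, 201)]  # evaluation points from -10 to 20
age = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
income = [None, None, None, 10, 20, 30]

densities = density_by_na(age, income, grid)
```

Each curve integrates to roughly 1 regardless of group size, which is why density plots are convenient for comparing NA and non-NA groups of different sizes.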

scikit_na.altair.plot_heatmap(data: DataFrame, columns: Sequence[str] | None = None, names: list | None = None, sort: bool = True, droppable: bool = True, font_size: int = 14, xlabel: str = 'Columns', ylabel: str = 'Rows', zlabel: str = 'Values', chart_kws: dict | None = None, rect_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None, color_kws: dict | None = None) Chart

Create interactive heatmap visualization of missing data patterns.

Generates a color-coded heatmap where each cell represents a data point, showing the pattern of missing values across rows and columns. This visualization is essential for understanding:

  • Overall distribution of missing values

  • Systematic patterns in data collection

  • Which rows would be affected by listwise deletion

  • Relationships between missing values in different columns

Parameters:
  • data (DataFrame) – Input pandas DataFrame to visualize missing data patterns.

  • columns (Sequence[str], optional) – Specific column names to include in the visualization. If None, includes all columns in the DataFrame.

  • names (list, optional) – Custom labels for the legend categories, provided as a list. Only the first two elements are used if droppable=False.

      - names[0]: Label for non-missing values (default: “Filled”)
      - names[1]: Label for missing values (default: “NA”)
      - names[2]: Label for droppable values (default: “Droppable”)

  • sort (bool, default True) – If True, sorts columns by number of missing values (most missing first) and rows by missing value patterns for better visual clustering.

  • droppable (bool, default True) – If True, highlights non-missing values in rows that contain at least one missing value (i.e., values that would be lost with listwise deletion). This helps understand the impact of complete case analysis.

  • font_size (int, default 14) – Font size for axis labels and legend text.

  • xlabel (str, default "Columns") – Label for the x-axis (column names).

  • ylabel (str, default "Rows") – Label for the y-axis (row indices).

  • zlabel (str, default "Values") – Title for the color legend showing value categories.

  • chart_kws (dict, optional) – Additional keyword arguments passed to altair.Chart() constructor. Common options: {'width': int, 'height': int, 'title': str}

  • rect_kws (dict, optional) – Keyword arguments for altair.Chart.mark_rect() to customize rectangles. Common options: {'stroke': str, 'strokeWidth': float}

  • x_kws (dict, optional) – Keyword arguments for altair.X() encoding of the x-axis.

  • y_kws (dict, optional) – Keyword arguments for altair.Y() encoding of the y-axis.

  • color_kws (dict, optional) – Keyword arguments for altair.Color() encoding, including custom color scales.

Returns:

Interactive Altair Chart object that can be:

  • Displayed directly in Jupyter notebooks

  • Saved to various formats (PNG, SVG, HTML, JSON)

  • Further customized with additional Altair methods

Return type:

altair.Chart

Examples

Basic missing data heatmap:

>>> import pandas as pd
>>> import scikit_na as na
>>> data = pd.DataFrame({
...     'A': [1, None, 3, None, 5],
...     'B': [1, 2, None, 4, None],
...     'C': [None, None, 3, 4, 5]
... })
>>> chart = na.altair.plot_heatmap(data)
>>> chart.show()

Focus on specific columns without sorting:

>>> chart = na.altair.plot_heatmap(data,
...                                columns=['A', 'B'],
...                                sort=False)

Simplified view without droppable values:

>>> chart = na.altair.plot_heatmap(data,
...                                droppable=False,
...                                names=['Available', 'Missing'])

Customized appearance:

>>> chart = na.altair.plot_heatmap(
...     data,
...     chart_kws={'width': 400, 'height': 300, 'title': 'Missing Data Pattern'},
...     color_kws={'scale': {'range': ['lightblue', 'red', 'orange']}},
...     font_size=12
... )

Save to file:

>>> chart = na.altair.plot_heatmap(data)
>>> chart.save('missing_data_heatmap.png')

Notes

  • Green typically represents filled/non-missing values

  • Red represents missing (NA) values

  • Orange represents “droppable” values (non-missing values in incomplete rows)

  • Sorting helps identify systematic missing data patterns

  • The droppable category shows the collateral damage from listwise deletion

  • Interactive features allow zooming and tooltips for detailed inspection

  • Large datasets may require adjusting chart dimensions via chart_kws

See also

plot_stairs

Visualize cumulative impact of missing data

plot_corr

Correlation heatmap for missing value patterns

summary

Numerical summary of missing data patterns
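The three legend categories used by plot_heatmap can be illustrated with a small plain-Python sketch that labels every cell as filled, missing, or droppable (a non-missing value in a row that has at least one missing value). This only mirrors the definitions given in the parameter descriptions; classify_cells is a hypothetical helper, not part of scikit_na, and None stands in for NA.

```python
def classify_cells(rows, droppable=True, names=("Filled", "NA", "Droppable")):
    """Label each cell of a row-major table according to its missingness."""
    labelled = []
    for row in rows:
        # A row is incomplete if it has at least one missing value.
        incomplete = any(value is None for value in row)
        labelled.append([
            names[1] if value is None
            else names[2] if droppable and incomplete
            else names[0]
            for value in row
        ])
    return labelled

rows = [
    [1, 1, 1],           # complete row: survives listwise deletion
    [None, 2, 2],        # one NA: the two present values are droppable
    [3, None, None],
]

cells = classify_cells(rows)
simple = classify_cells(rows, droppable=False, names=("Available", "Missing"))
```

With droppable=False only the first two labels are needed, which matches the note on the names parameter.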

scikit_na.altair.plot_scatter(data: DataFrame, x_col: str, y_col: str, col_na: str, na_label: str | None = None, na_replace: dict | None = None, font_size: int = 14, xlabel: str | None = None, ylabel: str | None = None, circle_kws: dict | None = None, color_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None)

Scatter plot.

Parameters:
  • data (DataFrame) – Input data.

  • x_col (str) – Column name corresponding to X axis.

  • y_col (str) – Column name corresponding to Y axis.

  • col_na (str) – Name of the column whose NA/non-NA values are used to group the points.

  • na_label (str, optional) – Label for NA values in legend.

  • na_replace (dict, optional) – NA replacement mapping, by default {True: 'NA', False: 'Filled'}.

  • font_size (int, optional) – Font size for plotting, by default 14.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • circle_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_circle().

  • color_kws (dict, optional) – Keyword arguments passed to altair.Color().

  • x_kws (dict, optional) – Keyword arguments passed to altair.X().

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y().

Returns:

Scatter plot.

Return type:

altair.Chart

scikit_na.altair.plot_stairs(data: DataFrame, columns: Sequence[str] | None = None, xlabel: str = 'Columns', ylabel: str = 'Instances', tooltip_label: str = 'Size difference', dataset_label: str = '(Whole dataset)', font_size: int = 14, area_kws: dict | None = None, chart_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None)

Stairs plot.

Plots changes in dataset size (number of rows/instances) after applying pandas.DataFrame.dropna() to each column cumulatively.

Columns are sorted by maximum influence on dataset size.

Parameters:
  • data (DataFrame) – Input data.

  • columns (Sequence[str], optional) – Columns to display on the plot.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • tooltip_label (str, optional) – Label for differences in dataset size that is displayed on a tooltip.

  • dataset_label (str, optional) – Label for the whole dataset (before dropping any NAs).

  • area_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area() method.

  • chart_kws (dict, optional) – Keyword arguments passed to altair.Chart() class.

  • x_kws (dict, optional) – Keyword arguments passed to altair.X() class.

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y() class.

Returns:

Chart object.

Return type:

altair.Chart
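The quantity behind a stairs plot can be sketched in plain Python. The version below assumes a greedy ordering: at every step it picks the column with the most missing values among the rows still left, drops those rows, and records the new dataset size. This is one reading of "sorted by maximum influence on dataset size" and may differ from the library's exact ordering; the helper stairs_steps is hypothetical, and None stands in for NA.

```python
def stairs_steps(data, dataset_label="(Whole dataset)"):
    """Return (label, rows_left, size_difference) for cumulative NA dropping.

    `data` maps column names to equal-length lists; None marks a missing value.
    """
    n_rows = len(next(iter(data.values())))
    kept = list(range(n_rows))   # indices of rows still in the dataset
    remaining = list(data)       # columns not yet processed
    steps = [(dataset_label, n_rows, 0)]
    while remaining:
        # Count NAs per remaining column, looking only at rows still kept.
        na_counts = {
            col: sum(data[col][i] is None for i in kept) for col in remaining
        }
        # Assumed ordering: the column that removes the most rows goes next.
        col = max(remaining, key=lambda c: na_counts[c])
        kept = [i for i in kept if data[col][i] is not None]
        steps.append((col, len(kept), -na_counts[col]))
        remaining.remove(col)
    return steps

data = {
    "A": [1, None, 3, 4, 5, 6],
    "B": [None, None, None, 4, 5, 6],
    "C": [1, 2, 3, None, 5, 6],
}
steps = stairs_steps(data)
```

In this example column A contributes no further loss once the rows missing in B are gone, because its only missing value sits in a row that B had already removed. The third element of each step corresponds to what the tooltip labels as the size difference.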