scikit_na.altair
- scikit_na.altair.plot_corr(data: DataFrame, columns: Iterable[str] | None = None, mask_diag: bool = True, annot_color: str = 'black', round_sgn: int = 2, font_size: int = 14, opacity: float = 0.5, corr_kws: dict | None = None, chart_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None, color_kws: dict | None = None, text_kws: dict | None = None) LayerChart
Correlation heatmap.
- Parameters:
data (DataFrame) – Input data.
columns (Optional[Sequence[str]]) – Column names.
mask_diag (bool = True) – Mask diagonal on heatmap.
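corr_kws (dict, optional) – Keyword arguments passed to pandas.DataFrame.corr() method.
chart_kws (dict, optional) – Keyword arguments passed to altair.Chart().
x_kws (dict, optional) – Keyword arguments passed to altair.X().
y_kws (dict, optional) – Keyword arguments passed to altair.Y().
color_kws (dict, optional) – Keyword arguments passed to altair.Color().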
- Returns:
Altair LayerChart object.
- Return type:
altair.LayerChart
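The following is a minimal pandas sketch of the matrix that such a heatmap displays. It assumes that plot_corr correlates the missing-value indicators of the selected columns rather than the values themselves, as the "See also" entry of plot_heatmap suggests. The helper na_corr_matrix is hypothetical and not part of scikit_na. Its mask_diag and round_sgn arguments only mirror the parameters of the same names.

```python
import numpy as np
import pandas as pd

def na_corr_matrix(data, columns=None, mask_diag=True, round_sgn=2):
    """Hypothetical helper (not scikit_na API): correlation of NA indicators."""
    cols = list(columns) if columns is not None else list(data.columns)
    # Boolean NA indicators cast to float so that .corr() treats them as numeric
    indicators = data[cols].isna().astype(float)
    corr = indicators.corr()
    if mask_diag:
        # Blank out the trivial self-correlations on the diagonal
        corr = corr.mask(np.eye(len(cols), dtype=bool))
    return corr.round(round_sgn)

data = pd.DataFrame({
    "A": [1, None, 3, None, 5],
    "B": [1, 2, None, 4, None],
    "C": [None, None, 3, 4, 5],
})
print(na_corr_matrix(data))
```

A negative value, such as the one between A and B here, means that the two columns tend to be missing in different rows.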
- scikit_na.altair.plot_hist(data: DataFrame, col: str, col_na: str, na_label: str | None = None, na_replace: dict | None = None, heuristic: bool = True, thres_uniq: int = 20, step: bool = False, norm: bool = True, font_size: int = 14, xlabel: str | None = None, ylabel: str = 'Frequency', chart_kws: dict | None = None, markarea_kws: dict | None = None, markbar_kws: dict | None = None, joinagg_kws: dict | None = None, calc_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None, color_kws: dict | None = None) Chart
Histogram plot.
Plots a histogram of values in a column col grouped by NA/non-NA values in column col_na.
- Parameters:
data (DataFrame) – Input data.
col (str) – Column whose distribution of values is plotted.
col_na (str) – Column to group values by.
na_label (str, optional) – Legend title.
na_replace (dict, optional) – Dictionary to replace values returned by pandas.Series.isna() method.
step (bool, optional) – Draw step plot.
norm (bool, optional) – Normalize values in groups.
xlabel (str, optional) – X axis label.
ylabel (str, optional) – Y axis label.
chart_kws (dict, optional) – Keyword arguments passed to altair.Chart().
markarea_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area().
markbar_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_bar().
joinagg_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_joinaggregate().
calc_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_calculate().
x_kws (dict, optional) – Keyword arguments passed to altair.X().
y_kws (dict, optional) – Keyword arguments passed to altair.Y().
color_kws (dict, optional) – Keyword arguments passed to altair.Color().
- Returns:
Altair Chart object.
- Return type:
Chart
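The following is a rough pandas sketch of the grouping that the histogram visualizes. Rows are split by whether col_na is missing, and the values of col are counted within each group. With norm=True, each group's frequencies are scaled to sum to 1. The helper na_grouped_freq and the example columns are hypothetical. This is not how plot_hist computes its bins, and its heuristic and thres_uniq options are not reproduced.

```python
import pandas as pd

def na_grouped_freq(data, col, col_na, na_replace=None, norm=True):
    """Hypothetical helper: frequencies of `col` split by NA status of `col_na`."""
    na_replace = na_replace or {True: "NA", False: "Filled"}
    # Label each row by whether `col_na` is missing in that row
    groups = data[col_na].isna().map(na_replace)
    # normalize="columns" makes each group's frequencies sum to 1,
    # so groups of different sizes can be compared directly
    return pd.crosstab(data[col], groups, normalize="columns" if norm else False)

data = pd.DataFrame({
    "age": [20, 30, 30, 40, 40, 40],
    "income": [1.0, None, 2.0, None, 3.0, 4.0],
})
print(na_grouped_freq(data, col="age", col_na="income"))
```

Without normalization, the larger "Filled" group would dominate the comparison simply because it has more rows.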
- scikit_na.altair.plot_kde(data: DataFrame, col: str, col_na: str, na_label: str | None = None, na_replace: dict | None = None, font_size: int = 14, xlabel: str | None = None, ylabel: str = 'Density', chart_kws: dict | None = None, markarea_kws: dict | None = None, density_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None, color_kws: dict | None = None) Chart
Density plot.
Plots distribution of values in a column col grouped by NA/non-NA values in column col_na.
- Parameters:
data (DataFrame) – Input data.
col (str) – Column whose distribution of values is plotted.
col_na (str) – Column to group values by.
na_label (str, optional) – Legend title.
na_replace (dict, optional) – Dictionary to replace values returned by pandas.Series.isna() method.
xlabel (str, optional) – X axis label.
ylabel (str, optional) – Y axis label.
chart_kws (dict, optional) – Keyword arguments passed to altair.Chart().
markarea_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area().
density_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_density().
x_kws (dict, optional) – Keyword arguments passed to altair.X().
y_kws (dict, optional) – Keyword arguments passed to altair.Y().
color_kws (dict, optional) – Keyword arguments passed to altair.Color().
- Returns:
Altair Chart object.
- Return type:
Chart
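To illustrate what "one density per NA group" means, here is a NumPy sketch of a Gaussian kernel density estimate. It computes one curve for the rows where col_na is filled and one for the rows where it is missing. The bandwidth is fixed by hand here. The real chart delegates the estimation to altair.Chart.transform_density() (configurable through density_kws), so its curves will generally differ. Both helpers are hypothetical.

```python
import numpy as np
import pandas as pd

def gaussian_kde(values, grid, bandwidth):
    """Average of Gaussian kernels centred on each observation, evaluated on `grid`."""
    values = np.asarray(values, dtype=float)[:, None]
    z = (grid[None, :] - values) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)

def na_grouped_density(data, col, col_na, grid, bandwidth=1.0, na_replace=None):
    """Hypothetical helper: one density curve of `col` per NA status of `col_na`."""
    na_replace = na_replace or {True: "NA", False: "Filled"}
    labels = data[col_na].isna().map(na_replace)
    return {
        label: gaussian_kde(data.loc[labels == label, col].dropna(), grid, bandwidth)
        for label in labels.unique()
    }

data = pd.DataFrame({
    "age": [20, 30, 30, 40, 40, 40],
    "income": [1.0, None, 2.0, None, 3.0, 4.0],
})
grid = np.linspace(0, 60, 601)
curves = na_grouped_density(data, "age", "income", grid, bandwidth=5.0)
```

Each curve integrates to approximately 1, so the two groups are comparable regardless of how many rows each contains.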
- scikit_na.altair.plot_heatmap(data: DataFrame, columns: Sequence[str] | None = None, names: list | None = None, sort: bool = True, droppable: bool = True, font_size: int = 14, xlabel: str = 'Columns', ylabel: str = 'Rows', zlabel: str = 'Values', chart_kws: dict | None = None, rect_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None, color_kws: dict | None = None) Chart
Create interactive heatmap visualization of missing data patterns.
Generates a color-coded heatmap where each cell represents a data point, showing the pattern of missing values across rows and columns. This visualization is essential for understanding:
- Overall distribution of missing values
- Systematic patterns in data collection
- Which rows would be affected by listwise deletion
- Relationships between missing values in different columns
- Parameters:
data (DataFrame) – Input pandas DataFrame to visualize missing data patterns.
columns (Sequence[str], optional) – Specific column names to include in the visualization. If None, includes all columns in the DataFrame.
names (list, optional) – Custom labels for the legend categories, provided as a list with:
- names[0]: Label for non-missing values (default: “Filled”)
- names[1]: Label for missing values (default: “NA”)
- names[2]: Label for droppable values (default: “Droppable”)
Only the first two elements are used if droppable=False.
sort (bool, default True) – If True, sorts columns by number of missing values (most missing first) and rows by missing value patterns for better visual clustering.
droppable (bool, default True) – If True, highlights non-missing values in rows that contain at least one missing value (i.e., values that would be lost with listwise deletion). This helps understand the impact of complete case analysis.
font_size (int, default 14) – Font size for axis labels and legend text.
xlabel (str, default "Columns") – Label for the x-axis (column names).
ylabel (str, default "Rows") – Label for the y-axis (row indices).
zlabel (str, default "Values") – Title for the color legend showing value categories.
chart_kws (dict, optional) – Additional keyword arguments passed to altair.Chart() constructor. Common options: {‘width’: int, ‘height’: int, ‘title’: str}
rect_kws (dict, optional) – Keyword arguments for altair.Chart.mark_rect() to customize rectangles. Common options: {‘stroke’: str, ‘strokeWidth’: float}
x_kws (dict, optional) – Keyword arguments for altair.X() encoding of the x-axis.
y_kws (dict, optional) – Keyword arguments for altair.Y() encoding of the y-axis.
color_kws (dict, optional) – Keyword arguments for altair.Color() encoding, including custom color scales.
- Returns:
Interactive Altair Chart object that can be:
- Displayed directly in Jupyter notebooks
- Saved to various formats (PNG, SVG, HTML, JSON)
- Further customized with additional Altair methods
- Return type:
altair.Chart
Examples
Basic missing data heatmap:
>>> import pandas as pd
>>> import scikit_na as na
>>> data = pd.DataFrame({
...     'A': [1, None, 3, None, 5],
...     'B': [1, 2, None, 4, None],
...     'C': [None, None, 3, 4, 5]
... })
>>> chart = na.altair.plot_heatmap(data)
>>> chart.show()
Focus on specific columns without sorting:
>>> chart = na.altair.plot_heatmap(data,
...                                columns=['A', 'B'],
...                                sort=False)
Simplified view without droppable values:
>>> chart = na.altair.plot_heatmap(data,
...                                droppable=False,
...                                names=['Available', 'Missing'])
Customized appearance:
>>> chart = na.altair.plot_heatmap(
...     data,
...     chart_kws={'width': 400, 'height': 300, 'title': 'Missing Data Pattern'},
...     color_kws={'scale': {'range': ['lightblue', 'red', 'orange']}},
...     font_size=12
... )
Save to file:
>>> chart = na.altair.plot_heatmap(data)
>>> chart.save('missing_data_heatmap.png')
Notes
Green typically represents filled/non-missing values
Red represents missing (NA) values
Orange represents “droppable” values (non-missing values in incomplete rows)
Sorting helps identify systematic missing data patterns
The droppable category shows the collateral damage from listwise deletion
Interactive features allow zooming and tooltips for detailed inspection
Large datasets may require adjusting chart dimensions via chart_kws
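The Filled/NA/Droppable categories can be sketched with pandas and NumPy. This sketch assumes a cell is "droppable" exactly when it is filled and its row contains at least one NA, following the description of the droppable parameter. For sort=True, it reproduces only the column ordering (most missing first). The row ordering that plot_heatmap applies is not reproduced. na_cell_categories is a hypothetical helper, not part of scikit_na.

```python
import numpy as np
import pandas as pd

def na_cell_categories(data, names=("Filled", "NA", "Droppable"), droppable=True, sort=True):
    """Hypothetical helper: label every cell as filled, missing or droppable."""
    if sort:
        # Most-missing columns first, as described for sort=True
        order = data.isna().sum().sort_values(ascending=False, kind="stable").index
        data = data[order]
    is_na = data.isna().to_numpy()
    labels = np.where(is_na, names[1], names[0])
    if droppable:
        # Listwise deletion removes every row with at least one NA,
        # so the filled cells in those rows are lost as well
        incomplete = is_na.any(axis=1)[:, None]
        labels = np.where(~is_na & incomplete, names[2], labels)
    return pd.DataFrame(labels, index=data.index, columns=data.columns)

data = pd.DataFrame({"A": [1, None, 3], "B": [4, 5, 6], "C": [None, None, 9]})
print(na_cell_categories(data))
```

Here only the last row is complete. Every filled cell in the first two rows is therefore marked "Droppable".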
See also
plot_stairs – Visualize cumulative impact of missing data
plot_corr – Correlation heatmap for missing value patterns
summary – Numerical summary of missing data patterns
- scikit_na.altair.plot_scatter(data: DataFrame, x_col: str, y_col: str, col_na: str, na_label: str | None = None, na_replace: dict | None = None, font_size: int = 14, xlabel: str | None = None, ylabel: str | None = None, circle_kws: dict | None = None, color_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None)
Scatter plot.
- Parameters:
data (DataFrame) – Input data.
x_col (str) – Column name corresponding to X axis.
y_col (str) – Column name corresponding to Y axis.
col_na (str) – Column whose NA/non-NA status is used to color the points.
na_label (str, optional) – Label for NA values in legend.
na_replace (dict, optional) – NA replacement mapping, by default {True: ‘NA’, False: ‘Filled’}.
font_size (int, optional) – Font size for plotting, by default 14.
xlabel (str, optional) – X axis label.
ylabel (str, optional) – Y axis label.
circle_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_circle().
color_kws (dict, optional) – Keyword arguments passed to altair.Color().
x_kws (dict, optional) – Keyword arguments passed to altair.X().
y_kws (dict, optional) – Keyword arguments passed to altair.Y().
- Returns:
Scatter plot.
- Return type:
altair.Chart
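The following is a small pandas sketch of the data a NA-colored scatter plot needs. It assumes that plot_scatter colors each point by whether col_na is missing in the same row, which is what the na_replace default {True: 'NA', False: 'Filled'} suggests. The helper scatter_frame and its "NA status" column name are hypothetical.

```python
import pandas as pd

def scatter_frame(data, x_col, y_col, col_na, na_label="NA status", na_replace=None):
    """Hypothetical helper: x, y and a color label derived from NAs in `col_na`."""
    na_replace = na_replace or {True: "NA", False: "Filled"}
    out = data[[x_col, y_col]].copy()
    # Each point is labeled by the NA status of `col_na` in the same row
    out[na_label] = data[col_na].isna().map(na_replace)
    return out

data = pd.DataFrame({
    "height": [160, 170, 180],
    "weight": [55, 70, 80],
    "income": [1.0, None, 3.0],
})
print(scatter_frame(data, "height", "weight", "income"))
```

Rows where x_col or y_col is itself missing have no position on the plot, so only NAs in col_na can be shown this way.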
- scikit_na.altair.plot_stairs(data: DataFrame, columns: Sequence[str] | None = None, xlabel: str = 'Columns', ylabel: str = 'Instances', tooltip_label: str = 'Size difference', dataset_label: str = '(Whole dataset)', font_size: int = 14, area_kws: dict | None = None, chart_kws: dict | None = None, x_kws: dict | None = None, y_kws: dict | None = None)
Stairs plot.
Plots how the dataset size (number of rows/instances) changes as pandas.DataFrame.dropna() is applied to each column cumulatively. Columns are sorted by their maximum influence on dataset size.
- Parameters:
data (DataFrame) – Input data.
columns (Optional[Sequence[str]], optional) – Columns that are to be displayed on a plot.
xlabel (str, optional) – X axis label.
ylabel (str, optional) – Y axis label.
tooltip_label (str, optional) – Label for differences in dataset size that is displayed on a tooltip.
dataset_label (str, optional) – Label for the whole dataset (before dropping any NAs).
area_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area() method.
chart_kws (dict, optional) – Keyword arguments passed to altair.Chart() class.
x_kws (dict, optional) – Keyword arguments passed to altair.X() class.
y_kws (dict, optional) – Keyword arguments passed to altair.Y() class.
- Returns:
Chart object.
- Return type:
altair.Chart
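The cumulative dropna procedure can be sketched in pandas as follows. At each step, the sketch picks the remaining column with the most NAs among the rows still left. Because dropna(subset=[col]) removes exactly the rows where col is NA, that column has the largest influence at that step. This greedy ordering is one reading of "sorted by maximum influence" and may not match plot_stairs in every tie. stairs_sizes is a hypothetical helper.

```python
import pandas as pd

def stairs_sizes(data, columns=None, dataset_label="(Whole dataset)"):
    """Hypothetical helper: dataset size after cumulative dropna, one step per column."""
    left = list(columns) if columns is not None else list(data.columns)
    remaining = data[left]
    steps = [(dataset_label, len(remaining))]
    while left:
        # dropna(subset=[col]) removes exactly the rows where `col` is NA,
        # so the column with the most NAs left has the largest influence
        col = remaining[left].isna().sum().idxmax()
        remaining = remaining.dropna(subset=[col])
        steps.append((col, len(remaining)))
        left.remove(col)
    return steps

data = pd.DataFrame({
    "A": [1, None, 3, None, 5],
    "B": [1, 2, None, 4, None],
    "C": [None, None, 3, 4, 5],
})
steps = stairs_sizes(data)
print(steps)  # [('(Whole dataset)', 5), ('A', 3), ('B', 1), ('C', 0)]
```

The size differences shown under tooltip_label would then be the drops between consecutive steps: 2, 2 and 1 rows in this example. Listwise deletion over all three columns leaves no rows at all.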