scikit_na.altair

scikit_na.altair.plot_corr(data: DataFrame, columns: Sequence[str] | None = None, mask_diag: bool = True, annot_color: str = 'black', round_sgn: int = 2, font_size: int = 14, opacity: float = 0.5, corr_kws: dict = None, chart_kws: dict = None, x_kws: dict = None, y_kws: dict = None, color_kws: dict = None, text_kws: dict = None) → Chart

Correlation heatmap.

Parameters:
  • data (DataFrame) – Input data.

  • columns (Optional[Sequence[str]]) – Column names.

  • mask_diag (bool, optional) – Mask the diagonal of the heatmap.

  • corr_kws (dict, optional) – Keyword arguments passed to pandas.DataFrame.corr() method.

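  • chart_kws (dict, optional) – Keyword arguments passed to altair.Chart().

  • x_kws (dict, optional) – Keyword arguments passed to altair.X().

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y().

  • color_kws (dict, optional) – Keyword arguments passed to altair.Color().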

Returns:

Altair Chart object.

Return type:

altair.Chart
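A minimal usage sketch for plot_corr, assuming scikit_na, pandas, and altair are installed. The toy records and the helper name corr_chart are illustrative and not part of the library. corr_kws is forwarded to pandas.DataFrame.corr(), which accepts method='spearman'.

```python
# Toy records with missing values (None); illustrative data only.
RECORDS = {
    "age": [23, None, 31, 45, None, 52],
    "income": [40.0, 38.5, None, 61.0, None, 75.2],
    "score": [0.7, 0.4, 0.9, None, 0.5, 0.8],
}

# Options forwarded to pandas.DataFrame.corr() through corr_kws.
CORR_KWS = {"method": "spearman"}


def corr_chart(records=RECORDS):
    """Build the correlation heatmap (hypothetical helper, not a library function)."""
    # Imported lazily so the data definitions above load without the plotting stack.
    import pandas as pd
    from scikit_na.altair import plot_corr

    data = pd.DataFrame(records)
    return plot_corr(
        data,
        columns=list(records),  # restrict the heatmap to these columns
        mask_diag=True,         # hide the diagonal cells
        corr_kws=CORR_KWS,
    )
```

In a Jupyter notebook, evaluating corr_chart() as the last expression of a cell displays the chart; corr_chart().save("corr.html") writes it to a standalone HTML file.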

scikit_na.altair.plot_hist(data: DataFrame, col: str, col_na: str, na_label: str = None, na_replace: dict = None, heuristic: bool = True, thres_uniq: int = 20, step: bool = False, norm: bool = True, font_size: int = 14, xlabel: str = None, ylabel: str = 'Frequency', chart_kws: dict = None, markarea_kws: dict = None, markbar_kws: dict = None, joinagg_kws: dict = None, calc_kws: dict = None, x_kws: dict = None, y_kws: dict = None, color_kws: dict = None) → Chart

Histogram plot.

Plots a histogram of values in a column col grouped by NA/non-NA values in column col_na.

Parameters:
  • data (DataFrame) – Input data.

  • col (str) – Column whose distribution of values is displayed.

  • col_na (str) – Column to group values by.

  • na_label (str, optional) – Legend title.

  • na_replace (dict, optional) – Dictionary to replace values returned by pandas.Series.isna() method.

  • step (bool, optional) – Draw step plot.

  • norm (bool, optional) – Normalize values in groups.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • chart_kws (dict, optional) – Keyword arguments passed to altair.Chart().

  • markarea_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area().

  • markbar_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_bar().

  • joinagg_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_joinaggregate().

  • calc_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_calculate().

  • x_kws (dict, optional) – Keyword arguments passed to altair.X().

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y().

  • color_kws (dict, optional) – Keyword arguments passed to altair.Color().

Returns:

Altair Chart object.

Return type:

altair.Chart
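A short sketch of a plot_hist call, assuming scikit_na, pandas, and altair are installed. The column names, the labels in NA_REPLACE, and the helper hist_chart are made up for illustration. The histogram shows the values of "age", split by whether "income" is missing in the same row.

```python
# Toy records; "income" contains the missing values used for grouping.
RECORDS = {
    "age": [23, 35, 31, 45, 29, 52, 41, 38],
    "income": [40.0, None, 52.3, 61.0, None, 75.2, None, 58.1],
}

# Maps the booleans returned by pandas.Series.isna() to legend entries.
NA_REPLACE = {True: "Income missing", False: "Income present"}


def hist_chart(records=RECORDS):
    """Histogram of `age` grouped by NA/non-NA `income` (hypothetical helper)."""
    # Lazy imports keep the data definitions usable without the plotting stack.
    import pandas as pd
    from scikit_na.altair import plot_hist

    data = pd.DataFrame(records)
    return plot_hist(
        data,
        col="age",            # values shown on the X axis
        col_na="income",      # NA/non-NA status defines the two groups
        na_label="Income",    # legend title
        na_replace=NA_REPLACE,
        norm=True,            # normalize values within each group
        xlabel="Age",
    )
```

With norm=True the two groups can be compared even when one of them is much smaller than the other.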

scikit_na.altair.plot_kde(data: DataFrame, col: str, col_na: str, na_label: str = None, na_replace: dict = None, font_size: int = 14, xlabel: str = None, ylabel: str = 'Density', chart_kws: dict = None, markarea_kws: dict = None, density_kws: dict = None, x_kws: dict = None, y_kws: dict = None, color_kws: dict = None) → Chart

Density plot.

Plots distribution of values in a column col grouped by NA/non-NA values in column col_na.

Parameters:
  • data (DataFrame) – Input data.

  • col (str) – Column whose distribution of values is displayed.

  • col_na (str) – Column to group values by.

  • na_label (str, optional) – Legend title.

  • na_replace (dict, optional) – Dictionary to replace values returned by pandas.Series.isna() method.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • chart_kws (dict, optional) – Keyword arguments passed to altair.Chart().

  • markarea_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area().

  • density_kws (dict, optional) – Keyword arguments passed to altair.Chart.transform_density().

  • x_kws (dict, optional) – Keyword arguments passed to altair.X().

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y().

  • color_kws (dict, optional) – Keyword arguments passed to altair.Color().

Returns:

Altair Chart object.

Return type:

altair.Chart
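A hedged sketch of plot_kde with a custom kernel bandwidth, assuming scikit_na, pandas, and altair are installed. bandwidth is a documented argument of altair.Chart.transform_density(); whether plot_kde already sets it internally is not stated in this reference, so treat that part as an assumption. The data and the helper kde_chart are illustrative.

```python
# Toy records; "bmi" carries the missing values used for grouping.
RECORDS = {
    "glucose": [85, 92, 110, 140, 99, 123, 101, 131, 88, 117],
    "bmi": [22.1, None, 27.4, None, 24.0, 30.2, None, 28.8, 21.5, None],
}

# Forwarded to altair.Chart.transform_density(); the value is an arbitrary example.
DENSITY_KWS = {"bandwidth": 5.0}


def kde_chart(records=RECORDS):
    """Density of `glucose` grouped by NA/non-NA `bmi` (hypothetical helper)."""
    # Lazy imports keep the definitions above usable without the plotting stack.
    import pandas as pd
    from scikit_na.altair import plot_kde

    data = pd.DataFrame(records)
    return plot_kde(
        data,
        col="glucose",
        col_na="bmi",
        na_label="BMI",
        xlabel="Glucose",
        density_kws=DENSITY_KWS,
    )
```

If the call raises a duplicate-keyword error, drop density_kws and rely on the defaults.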

scikit_na.altair.plot_heatmap(data: DataFrame, columns: Sequence[str] | None = None, names: list = None, sort: bool = True, droppable: bool = True, font_size: int = 14, xlabel: str = 'Columns', ylabel: str = 'Rows', zlabel: str = 'Values', chart_kws: dict = None, rect_kws: dict = None, x_kws: dict = None, y_kws: dict = None, color_kws: dict = None) → Chart

Heatmap plot for NA/non-NA values.

By default, it also indicates values that are to be dropped by pandas.DataFrame.dropna() method.

Parameters:
  • data (DataFrame) – Input data.

  • columns (Optional[Sequence[str]], optional) – Columns that are to be displayed on a plot.

  • names (list, optional) – Value labels passed as a list. The first element corresponds to non-missing values, the second one to NA values, and the last one to droppable values, i.e. values to be dropped by pandas.DataFrame.dropna().

  • sort (bool, optional) – Sort values as NA/non-NA.

  • droppable (bool, optional) – Show values to be dropped by pandas.DataFrame.dropna() method.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • zlabel (str, optional) – Groups label (shown as a legend title).

  • chart_kws (dict, optional) – Keyword arguments passed to altair.Chart() class.

  • rect_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_rect() method.

  • x_kws (dict, optional) – Keyword arguments passed to altair.X() class.

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y() class.

  • color_kws (dict, optional) – Keyword arguments passed to altair.Color() class.

Returns:

Altair Chart object.

Return type:

altair.Chart
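The three groups shown by plot_heatmap can be illustrated without any plotting. The sketch below is not the library's implementation: it assumes that a "droppable" value is a non-missing value in a row that contains at least one NA (such a row is removed by pandas.DataFrame.dropna() with default arguments). The function classify_cells and the label strings are hypothetical.

```python
def classify_cells(rows, names=("Filled", "NA", "Droppable")):
    """Label each cell as non-missing, NA, or droppable (assumed semantics)."""
    filled, na, droppable = names
    labelled = []
    for row in rows:
        # A row with any missing value would be removed by dropna().
        row_has_na = any(value is None for value in row.values())
        labelled.append({
            col: na if value is None else (droppable if row_has_na else filled)
            for col, value in row.items()
        })
    return labelled


rows = [
    {"a": 1, "b": 2.0},
    {"a": None, "b": 3.5},
    {"a": 4, "b": 1.0},
]
labels = classify_cells(rows)
print(labels[1])  # {'a': 'NA', 'b': 'Droppable'}
```

The order of names mirrors the names parameter above: non-missing first, NA second, droppable last.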

scikit_na.altair.plot_scatter(data: DataFrame, x_col: str, y_col: str, col_na: str, na_label: str = None, na_replace: dict = None, font_size: int = 14, xlabel: str = None, ylabel: str = None, circle_kws: dict = None, color_kws: dict = None, x_kws: dict = None, y_kws: dict = None)

Scatter plot.

Parameters:
  • data (DataFrame) – Input data.

  • x_col (str) – Column name corresponding to X axis.

  • y_col (str) – Column name corresponding to Y axis.

  • col_na (str) – Column to group values by (NA/non-NA).

  • na_label (str, optional) – Label for NA values in legend.

  • na_replace (dict, optional) – NA replacement mapping, by default {True: 'NA', False: 'Filled'}.

  • font_size (int, optional) – Font size for plotting, by default 14.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • circle_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_circle().

  • color_kws (dict, optional) – Keyword arguments passed to altair.Color().

  • x_kws (dict, optional) – Keyword arguments passed to altair.X().

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y().

Returns:

Altair Chart object.

Return type:

altair.Chart
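A minimal plot_scatter sketch, assuming scikit_na, pandas, and altair are installed. The column names and the helper scatter_chart are illustrative. Points are placed by "height" and "weight" and grouped by whether "age" is missing in the same row.

```python
# Toy records; "age" carries the missing values used for grouping.
RECORDS = {
    "height": [160, 172, 181, 168, 175, 190],
    "weight": [55.0, 70.5, 82.0, 63.2, 77.1, 95.4],
    "age": [30, None, 41, None, 28, 35],
}


def scatter_chart(records=RECORDS):
    """Scatter of height vs. weight grouped by NA/non-NA `age` (hypothetical helper)."""
    # Lazy imports keep the data definition usable without the plotting stack.
    import pandas as pd
    from scikit_na.altair import plot_scatter

    data = pd.DataFrame(records)
    return plot_scatter(
        data,
        x_col="height",
        y_col="weight",
        col_na="age",
        na_label="Age",
        na_replace={True: "NA", False: "Filled"},  # the documented default, spelled out
        xlabel="Height",
        ylabel="Weight",
    )
```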

scikit_na.altair.plot_stairs(data: DataFrame, columns: Sequence[str] | None = None, xlabel: str = 'Columns', ylabel: str = 'Instances', tooltip_label: str = 'Size difference', dataset_label: str = '(Whole dataset)', font_size: int = 14, area_kws: dict = None, chart_kws: dict = None, x_kws: dict = None, y_kws: dict = None)

Stairs plot.

Plots changes in dataset size (number of rows/instances) after applying pandas.DataFrame.dropna() to each column cumulatively.

Columns are sorted by maximum influence on dataset size.

Parameters:
  • data (DataFrame) – Input data.

  • columns (Optional[Sequence[str]], optional) – Columns that are to be displayed on a plot.

  • xlabel (str, optional) – X axis label.

  • ylabel (str, optional) – Y axis label.

  • tooltip_label (str, optional) – Label for differences in dataset size that is displayed on a tooltip.

  • dataset_label (str, optional) – Label for the whole dataset (before dropping any NAs).

  • area_kws (dict, optional) – Keyword arguments passed to altair.Chart.mark_area() method.

  • chart_kws (dict, optional) – Keyword arguments passed to altair.Chart() class.

  • x_kws (dict, optional) – Keyword arguments passed to altair.X() class.

  • y_kws (dict, optional) – Keyword arguments passed to altair.Y() class.

Returns:

Altair Chart object.

Return type:

altair.Chart
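The quantity behind the stairs plot can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: it assumes a greedy order in which, at every step, the column with the most NAs among the remaining rows is dropped next. The function stairs_sizes is hypothetical; the "(Whole dataset)" label is the documented default of dataset_label.

```python
def stairs_sizes(rows, columns, dataset_label="(Whole dataset)"):
    """Return [(label, remaining_rows), ...] after dropping NAs column by column."""
    remaining = list(rows)
    pending = list(columns)
    steps = [(dataset_label, len(remaining))]
    while pending:
        # Count NAs per pending column among the rows that are still left.
        na_counts = {c: sum(r[c] is None for r in remaining) for c in pending}
        worst = max(pending, key=na_counts.get)
        if na_counts[worst] == 0:
            break  # no remaining column would remove any rows
        remaining = [r for r in remaining if r[worst] is not None]
        pending.remove(worst)
        steps.append((worst, len(remaining)))
    return steps


rows = [
    {"a": 1, "b": None, "c": 1},
    {"a": None, "b": None, "c": 2},
    {"a": 3, "b": 3, "c": None},
    {"a": 4, "b": None, "c": 4},
    {"a": 5, "b": 5, "c": 5},
]
steps = stairs_sizes(rows, ["a", "b", "c"])
print(steps)  # [('(Whole dataset)', 5), ('b', 2), ('c', 1)]
```

Column "a" never appears as a step here because its only NA sits in a row that was already removed when "b" was processed.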