{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data visualization" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import scikit_na as na\n", "\n", "# Load the Titanic dataset used in all of the examples below\n", "data = pd.read_csv('../../_tests/data/titanic_dataset.csv')" ] }, { "source": [ "## Heatmap\n", "\n", "### NA values\n", "\n", "Missing data can be visualized on a heatmap to quickly grasp its patterns. We will be using the\n", "[Altair](https://altair-viz.github.io) + [Vega-Lite](https://vega.github.io/vega-lite/)\n", "backend. To plot a heatmap of NAs, simply pass your DataFrame to the `scikit_na.altair.plot_heatmap()` function.\n", "\n", "Droppables are the values that would be dropped if we simply called ``pandas.DataFrame.dropna()`` on the *whole dataset*. By default, columns are sorted by the number of NA values." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/html": "\n
\n", "text/plain": [ "alt.Chart(...)" ] }, "metadata": {}, "execution_count": null } ], "source": [ "na.altair.plot_heatmap(data)" ] }, { "source": [ "### Correlations" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Correlations can be plotted using the `scikit_na.altair.plot_corr()` function. Under the hood, it calls the `scikit_na.correlate()` function with your input DataFrame as the first argument:" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/html": "\n\n", "text/plain": [ "alt.LayerChart(...)" ] }, "metadata": {}, "execution_count": null } ], "source": [ "na.altair.plot_corr(data).properties(width=125, height=125)" ] }, { "source": [ "## Stairs plot\n", "\n", "A stairs plot visualizes how a dataset shrinks as the\n", "``pandas.Series.dropna()`` method is applied to each column sequentially (columns are sorted by the\n", "number of NA values by default):" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "na.altair.plot_stairs(data)" ] }, { "source": [ "After dropping all NAs in the *Cabin* column, we are left with 21 more NAs (in the *Age*\n", "and *Embarked* columns). This plot also shows tooltips with the exact number of NA\n", "values dropped for each column." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/html": "\n\n", "text/plain": [ "alt.Chart(...)" ] }, "metadata": {}, "execution_count": null } ], "source": [ "na.altair.plot_stairbars(data)" ] }, { "source": [ "## Histogram\n", "\n", "Plotting a nice histogram may require configuring additional parameters, such as the chart size and the axis label angle:" 
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/html": "\n\n", "text/plain": [ "alt.Chart(...)" ] }, "metadata": {}, "execution_count": null } ], "source": [ "# Distribution of Pclass, split by whether Age is missing\n", "chart = (\n", "    na.altair.plot_hist(data, col='Pclass', col_na='Age')\n", "    .properties(width=200, height=200)\n", ")\n", "chart.configure_axisX(labelAngle=0)" ] }, { "source": [ "## Density plot" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/html": "\n\n", "text/plain": [ "alt.Chart(...)" ] }, "metadata": {}, "execution_count": null } ], "source": [ "# Density of Age, split by whether Cabin is missing\n", "chart = (\n", "    na.altair.plot_kde(data, col='Age', col_na='Cabin')\n", "    .properties(width=200, height=200)\n", ")\n", "chart.configure_axisX(labelAngle=0)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.5 64-bit", "language": "python", "name": "python39564bit37ed7351e0344c1faf3897132336b259" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5-final" } }, "nbformat": 4, "nbformat_minor": 4 }