Released a DataFrame summarytool for Jupyter Notebook


About the package

This is a Python version of summarytools. It generates a standardized, comprehensive summary of a pandas DataFrame in Jupyter Notebooks.

The idea originated from the summarytools R package. Only the dfSummary function is available for now. I also added two HTML widgets (collapsible and tabbed views) to avoid displaying lengthy content.

Quick Start

default view

Out of the box, the dfSummary function generates an HTML-based data frame summary.

import pandas as pd
from summarytools import dfSummary
titanic = pd.read_csv('./data/titanic.csv')
dfSummary(titanic)

If a notebook contains many data frame summaries, the following two widgets help keep the output compact.

collapsible view

import pandas as pd
from summarytools import dfSummary
titanic = pd.read_csv('./data/titanic.csv')
dfSummary(titanic, is_collapsible=True)

tabbed view

import pandas as pd
from summarytools import dfSummary, tabset
titanic = pd.read_csv('./data/titanic.csv')
vaccine = pd.read_csv('./data/country_vaccinations.csv')
vaccine['date'] = pd.to_datetime(vaccine['date'])

tabset({
    'titanic': dfSummary(titanic).render(),
    'vaccine': dfSummary(vaccine).render()})
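
With more than a couple of data frames, writing each tab entry by hand gets repetitive. A minimal sketch of one way to build the dictionary in a loop follows. The helper name `build_tabs` and its `summarize` parameter are hypothetical and not part of summarytools; only `tabset`, `dfSummary`, and `.render()` come from the example above.

```python
def build_tabs(frames, summarize):
    """Map each tab title to the rendered summary of its data frame.

    `frames` is a dict of {tab title: DataFrame}; `summarize` is any callable
    that turns one data frame into renderable content.
    (Hypothetical helper, not part of summarytools.)
    """
    return {name: summarize(df) for name, df in frames.items()}

# Intended use in a notebook, with the objects from the example above:
# tabset(build_tabs({'titanic': titanic, 'vaccine': vaccine},
#                   lambda df: dfSummary(df).render()))
```

Passing the summarizing step in as a callable keeps the helper independent of the package, so the same loop works if the rendering call changes.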

Export as HTML

When exporting a Jupyter notebook to HTML, make sure the Export Embedded HTML extension is installed and enabled.

Use the following bash command to retain the data frame summary in the exported HTML.

jupyter nbconvert --to html_embed path/of/your/notebook.ipynb
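
To export several notebooks from a script, the same command can be assembled in Python. This is only a sketch: the function name `build_export_command` is hypothetical, and it assumes the `html_embed` exporter from the command above is already installed.

```python
import shlex
from pathlib import Path

def build_export_command(notebook_path):
    """Return the nbconvert argument list for an embedded-HTML export.

    (Hypothetical helper; it only assembles the command shown above.)
    """
    path = Path(notebook_path)
    if path.suffix != ".ipynb":
        raise ValueError(f"expected a .ipynb file, got: {path}")
    return ["jupyter", "nbconvert", "--to", "html_embed", str(path)]

if __name__ == "__main__":
    cmd = build_export_command("path/of/your/notebook.ipynb")
    # Print the command as it would be typed in a shell.
    print(shlex.join(cmd))
    # To actually run the export:
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids quoting problems when a notebook path contains spaces.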

Installation

Details are available at https://github.com/6chaoran/jupyter-summarytools
