According to Wikipedia, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. In other words, EDA is the process of understanding the underlying data, the distribution of its variables, and their correlations. This makes EDA the very first step in any data science process, before building any statistical model. If you're not aware of how EDA is performed, here are a few examples you can refer to.

But EDA is often a very time-consuming task that requires you to build multiple visuals to check distributions and interactions between variables. Functions like info() and describe() help to an extent, but you'll still have to perform a lot of manual steps even after using them.

This is where a really cool library called Pandas Profiling comes in handy. It automatically generates a detailed report describing the data in just one line of code! Here's a quick look at what the reports look like.

For each column, the following statistics, if relevant for the column type, are presented in the report:

- Type inference: detect the types of columns in a data frame.
- Essentials: type, unique values, missing values.
- Quantile statistics: minimum value, Q1, median, Q3, maximum, range, interquartile range.
- Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness.
- Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices.
- Missing values: matrix, count, heatmap and dendrogram of missing values.
- Text analysis: categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
- File and Image analysis: file sizes, creation dates and dimensions, plus a scan for truncated images or those containing EXIF information.

Apart from this, correlations and interactions between variables are also presented in the report.

Install the library:

pip install pandas-profiling

Import the library and create the report, where df is the pandas DataFrame you want to explore:

from pandas_profiling import ProfileReport
profile = ProfileReport(df, title="Pandas Profiling Report")

You can view the report directly in your Jupyter notebook, but I prefer converting it to an HTML file and viewing it in a browser. Either way, you get an interactive dashboard where you can explore everything you need. For more information and examples, you can refer to the official documentation.

Despite being a fantastic tool, it also has some disadvantages.
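To give a feel for the manual work the report saves, here is a rough sketch that computes a subset of the per-column "Essentials", quantile and descriptive statistics by hand for one numeric column. The helper name column_summary and the sample "age" column are hypothetical, and the definitions used (sample standard deviation, linear quantile interpolation, excess kurtosis) are pandas' defaults, which may differ in detail from what the report itself computes.

```python
import pandas as pd


def column_summary(s: pd.Series) -> dict:
    """Hypothetical helper: hand-computed statistics for one numeric column,
    loosely mirroring part of what the profiling report shows per column."""
    n_missing = int(s.isna().sum())  # count missing values before dropping them
    s = s.dropna()  # all statistics below ignore missing values
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])  # pandas' default linear interpolation
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1), pandas' default
    return {
        # Essentials
        "missing": n_missing,
        "unique": int(s.nunique()),
        # Quantile statistics
        "min": s.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": s.max(),
        "range": s.max() - s.min(),
        "iqr": q3 - q1,
        # Descriptive statistics
        "mean": mean,
        "std": std,
        "sum": s.sum(),
        "mad": (s - median).abs().median(),  # median absolute deviation
        "cv": std / mean if mean != 0 else float("nan"),  # coefficient of variation
        "kurtosis": s.kurt(),  # excess kurtosis, as computed by pandas
        "skewness": s.skew(),
    }


if __name__ == "__main__":
    # Hypothetical sample data with one missing value
    df = pd.DataFrame({"age": [23, 35, 31, None, 47, 52]})
    for name, value in column_summary(df["age"]).items():
        print(f"{name:>9}: {value}")
```

Repeating this for every column, then adding plots and correlation matrices, is exactly the tedious part the report automates. With the library installed, the report's HTML export is a single call, profile.to_file("report.html"). Note that newer releases of the library are published as ydata-profiling and imported with from ydata_profiling import ProfileReport.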