gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	32
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.2 KiB
Average record size in memory	38.0 B

Variable types

Categorical	2
Numeric	2

Dataset

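Description	Usage records for partner vendors under the customized welfare (맞춤형복지) program, broken down by year and by spending category. Spending categories include self-development (자기계발), health care (건강관리), and leisure (여가생활).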
Author	공무원연금공단
URL	https://www.data.go.kr/data/3062872/fileData.do

Alerts

`이용건수` is highly correlated with `판매액`	High correlation
`판매액` is highly correlated with `이용건수`	High correlation
`이용건수` has unique values	Unique
`판매액` has unique values	Unique

Reproduction

Analysis started	2022-11-19 08:49:23.259081
Analysis finished	2022-11-19 08:49:23.954976
Duration	0.7 seconds
Software version	pandas-profiling v3.2.0
Download configuration	config.json

구분
Categorical

Distinct	8
Distinct (%)	25.0%
Missing	0
Missing (%)	0.0%
Memory size	384.0 B

2021년	4
2020년	4
2019년	4
2018년	4
2017년	4
Other values (3)	12

Length

Max length	5
Median length	5
Mean length	5
Min length	5

Unique

Unique	0
Unique (%)	0.0%
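The report distinguishes Distinct (how many different values occur) from Unique (how many values occur exactly once). A minimal sketch of that distinction follows. It rebuilds the 구분 column from the Common Values table (eight years, four rows each), and the variable names are illustrative rather than taken from the profiling library.

```python
from collections import Counter

# Rebuild the 구분 column: years 2021 down to 2014, four rows per year.
years = [f"{year}년" for year in range(2021, 2013, -1) for _ in range(4)]

counts = Counter(years)

# Distinct: number of different values in the column.
distinct = len(counts)
# Unique: number of values that appear exactly once.
unique = sum(1 for c in counts.values() if c == 1)

distinct_pct = 100 * distinct / len(years)
unique_pct = 100 * unique / len(years)

print(distinct, distinct_pct, unique, unique_pct)
```

Every year appears four times, so no value is unique even though eight distinct values exist.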

Sample

1st row	2021년
2nd row	2021년
3rd row	2021년
4th row	2021년
5th row	2020년

Common Values

Value	Count	Frequency (%)
2021년	4	12.5%
2020년	4	12.5%
2019년	4	12.5%
2018년	4	12.5%
2017년	4	12.5%
2016년	4	12.5%
2015년	4	12.5%
2014년	4	12.5%

Length

Histogram of lengths of the category

Category Frequency Plot

소비항목
Categorical

Distinct	5
Distinct (%)	15.6%
Missing	0
Missing (%)	0.0%
Memory size	384.0 B

자기계발	8
건강관리	8
가정친화	8
여가활동	5
여가생활	3

Length

Max length	4
Median length	4
Mean length	4
Min length	4

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	자기계발
2nd row	건강관리
3rd row	여가생활
4th row	가정친화
5th row	자기계발

Common Values

Value	Count	Frequency (%)
자기계발	8	25.0%
건강관리	8	25.0%
가정친화	8	25.0%
여가활동	5	15.6%
여가생활	3	9.4%

Length

Histogram of lengths of the category

Category Frequency Plot

이용건수
Real number (ℝ_≥0)

HIGH CORRELATION
UNIQUE

Distinct	32
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	167452.2188

Minimum	138
Maximum	890368
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	416.0 B

Quantile statistics

Minimum	138
5-th percentile	1864.4
Q1	8919.5
median	19059
Q3	147501
95-th percentile	748002
Maximum	890368
Range	890230
Interquartile range (IQR)	138581.5

Descriptive statistics

Standard deviation	280498.5441
Coefficient of variation (CV)	1.675096014
Kurtosis	0.9016869299
Mean	167452.2188
Median Absolute Deviation (MAD)	12223
Skewness	1.530044541
Sum	5358471
Variance	7.867943324 × 10¹⁰
Monotonicity	Not monotonic
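Several of the figures above are derived from one another, which gives a quick consistency check. The sketch below recomputes them from the values printed in this report for 이용건수. The inputs are the rounded numbers shown here, so small rounding differences are expected.

```python
# Values copied from the 이용건수 statistics in this report.
n = 32
total = 5358471
std = 280498.5441
q1, q3 = 8919.5, 147501
minimum, maximum = 138, 890368

mean = total / n               # Mean = Sum / number of observations
cv = std / mean                # Coefficient of variation = std / mean
iqr = q3 - q1                  # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum
variance = std ** 2            # Variance = std squared

print(mean, cv, iqr, value_range, variance)
```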

Histogram with fixed size bins (bins=32)

Value	Count	Frequency (%)
138	1	3.1%
608949	1	3.1%
459442	1	3.1%
31045	1	3.1%
890368	1	3.1%
4870	1	3.1%
10880	1	3.1%
858244	1	3.1%
1640	1	3.1%
5872	1	3.1%
Other values (22)	22	68.8%

Minimum 10 values

Value	Count	Frequency (%)
138	1	3.1%
1640	1	3.1%
2048	1	3.1%
4278	1	3.1%
4870	1	3.1%
5872	1	3.1%
6599	1	3.1%
8552	1	3.1%
9042	1	3.1%
10880	1	3.1%

Maximum 10 values

Value	Count	Frequency (%)
890368	1	3.1%
858244	1	3.1%
657804	1	3.1%
608949	1	3.1%
567298	1	3.1%
546013	1	3.1%
459442	1	3.1%
403581	1	3.1%
62141	1	3.1%
31045	1	3.1%

판매액
Real number (ℝ_≥0)

HIGH CORRELATION
UNIQUE

Distinct	32
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	12457682.44

Minimum	39801
Maximum	66802101
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	416.0 B

Quantile statistics

Minimum	39801
5-th percentile	230173.75
Q1	2660618
median	3924835
Q3	10557159.5
95-th percentile	52096875.6
Maximum	66802101
Range	66762300
Interquartile range (IQR)	7896541.5

Descriptive statistics

Standard deviation	18331620.62
Coefficient of variation (CV)	1.471511311
Kurtosis	2.163073746
Mean	12457682.44
Median Absolute Deviation (MAD)	1666380.5
Skewness	1.766791928
Sum	398645838
Variance	3.360483145 × 10¹⁴
Monotonicity	Not monotonic

Histogram with fixed size bins (bins=32)

Value	Count	Frequency (%)
39801	1	3.1%
36870538	1	3.1%
26620218	1	3.1%
3889103	1	3.1%
66802101	1	3.1%
216982	1	3.1%
3708046	1	3.1%
58806968	1	3.1%
2137013	1	3.1%
240967	1	3.1%
Other values (22)	22	68.8%

Minimum 10 values

Value	Count	Frequency (%)
39801	1	3.1%
216982	1	3.1%
240967	1	3.1%
313895	1	3.1%
612139	1	3.1%
832183	1	3.1%
1218097	1	3.1%
2137013	1	3.1%
2835153	1	3.1%
2843290	1	3.1%

Maximum 10 values

Value	Count	Frequency (%)
66802101	1	3.1%
58806968	1	3.1%
46606800	1	3.1%
36870538	1	3.1%
35722325	1	3.1%
31695210	1	3.1%
26620218	1	3.1%
25819316	1	3.1%
5469774	1	3.1%
5341609	1	3.1%

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
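The calculation just described can be sketched in plain Python. This is an illustration rather than the profiling library's own code. The function name `pearson_r` and the variable names are assumptions, and the input is only the first ten rows shown in this report's sample table, so the result will not equal the coefficient computed over all 32 rows.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations; the 1/n factors cancel in r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# First ten rows of the sample table: 이용건수 and 판매액.
usage_count = [138, 12442, 31045, 890368, 4870, 10880, 858244, 1640, 5872, 9042]
sales_amount = [39801, 3946677, 3889103, 66802101, 216982,
                3708046, 58806968, 2137013, 240967, 2845511]

r = pearson_r(usage_count, sales_amount)
print(round(r, 4))
```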

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
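As a rough sketch of this rank-based calculation, the code below converts each variable to ranks and then applies the covariance-over-standard-deviations formula to those ranks. The helper names (`average_ranks`, `spearman_rho`) are illustrative, and giving tied values the mean of their rank positions is an assumption about tie handling, not something stated in this report.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A nonlinear but strictly increasing relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
linear_r = pearson_r(x, y)
print(rho, linear_r)
```

Because y increases whenever x does, the ranks agree exactly and ρ reaches its maximum, while Pearson's r on the raw values stays below 1.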

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
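The pair-counting definition can be sketched directly. The version below is the simple form often called tau-a, which divides by the total number of pairs and counts tied pairs as neither concordant nor discordant. Whether the report itself uses this variant or a tie-adjusted one is not stated here, so treat the choice as an assumption. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # A zero product is a tie and counts toward neither total.
    return (concordant - discordant) / len(pairs)

# Six pairs in total: five concordant, one discordant (the swapped 3 and 2).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```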

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
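A minimal sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013) is shown below. It works on a contingency table given as a list of rows and assumes every row and column has a non-zero total. The function name is illustrative, and whether the profiling library implements the correction in exactly this way is an assumption.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    r = len(table)
    k = len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected value of phi2 under independence, floored at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Corrected numbers of rows and columns.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

independent = cramers_v_corrected([[5, 5], [5, 5]])
perfect = cramers_v_corrected([[10, 0], [0, 10]])
print(independent, perfect)
```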

Count

A simple visualization of nullity by column.

Matrix

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

First rows

	구분	소비항목	이용건수	판매액
0	2021년	자기계발	138	39801
1	2021년	건강관리	12442	3946677
2	2021년	여가생활	31045	3889103
3	2021년	가정친화	890368	66802101
4	2020년	자기계발	4870	216982
5	2020년	건강관리	10880	3708046
6	2020년	여가생활	858244	58806968
7	2020년	가정친화	1640	2137013
8	2019년	자기계발	5872	240967
9	2019년	건강관리	9042	2845511

Last rows

	구분	소비항목	이용건수	판매액
22	2016년	여가활동	18809	4952600
23	2016년	가정친화	546013	31695210
24	2015년	자기계발	23545	832183
25	2015년	건강관리	19709	3902993
26	2015년	여가활동	11656	4371353
27	2015년	가정친화	403581	25819316
28	2014년	자기계발	62141	2843290
29	2014년	건강관리	19309	4403357
30	2014년	여가활동	13931	4145821
31	2014년	가정친화	459442	26620218
