Overview

Dataset statistics

Number of variables: 4
Number of observations: 32
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 KiB
Average record size in memory: 38.0 B

Variable types

Categorical: 2
Numeric: 2

Dataset

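Description: Usage records for partner vendors of the customized welfare program (맞춤형복지), broken down by year and by spending category. Spending categories include self-development, health care, and leisure.
Author: 공무원연금공단 (Government Employees Pension Service)
URL: https://www.data.go.kr/data/3062872/fileData.do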

Alerts

이용건수 is highly correlated with 판매액 (High correlation)
판매액 is highly correlated with 이용건수 (High correlation)
이용건수 has unique values (Unique)
판매액 has unique values (Unique)

Reproduction

Analysis started: 2022-11-19 08:49:23.259081
Analysis finished: 2022-11-19 08:49:23.954976
Duration: 0.7 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

구분 (category; here, the year)
Categorical

Distinct: 8
Distinct (%): 25.0%
Missing: 0
Missing (%): 0.0%
Memory size: 384.0 B

Most frequent values: 2021년, 2020년, 2019년, 2018년, 2017년 (년 means "year")
Other values (3): 12

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021년
2nd row: 2021년
3rd row: 2021년
4th row: 2021년
5th row: 2020년

Common Values

Value   Count  Frequency (%)
2021년  4      12.5%
2020년  4      12.5%
2019년  4      12.5%
2018년  4      12.5%
2017년  4      12.5%
2016년  4      12.5%
2015년  4      12.5%
2014년  4      12.5%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot of the values listed under Common Values]

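소비항목 (spending category)
Categorical

Distinct: 5
Distinct (%): 15.6%
Missing: 0
Missing (%): 0.0%
Memory size: 384.0 B

Values: 자기계발 (self-development), 건강관리 (health care), 가정친화 (family-friendly), 여가활동 (leisure activities), 여가생활 (leisure life)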

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 자기계발
2nd row: 건강관리
3rd row: 여가생활
4th row: 가정친화
5th row: 자기계발

Common Values

Value     Count  Frequency (%)
자기계발  8      25.0%
건강관리  8      25.0%
가정친화  8      25.0%
여가활동  5      15.6%
여가생활  3      9.4%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot of the values listed under Common Values]

이용건수 (number of uses)
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 32
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167452.2188
Minimum: 138
Maximum: 890368
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 416.0 B

Quantile statistics

Minimum: 138
5-th percentile: 1864.4
Q1: 8919.5
Median: 19059
Q3: 147501
95-th percentile: 748002
Maximum: 890368
Range: 890230
Interquartile range (IQR): 138581.5

Descriptive statistics

Standard deviation: 280498.5441
Coefficient of variation (CV): 1.675096014
Kurtosis: 0.9016869299
Mean: 167452.2188
Median Absolute Deviation (MAD): 12223
Skewness: 1.530044541
Sum: 5358471
Variance: 7.867943324 × 10^10
Monotonicity: Not monotonic
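
Several of the figures above are derived from others in the same listing. The following is a small consistency check, not part of the generated report: it recomputes the mean, range, IQR, coefficient of variation, and variance from the reported sum, extremes, quartiles, and standard deviation of 이용건수. The variable names are illustrative only.

```python
# Values copied from the 이용건수 statistics in this report.
n_obs = 32            # Number of observations
total = 5358471       # Sum
minimum, maximum = 138, 890368
q1, q3 = 8919.5, 147501
std = 280498.5441     # Standard deviation

mean = total / n_obs              # arithmetic mean
value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(f"mean={mean:.4f} range={value_range} iqr={iqr} cv={cv:.9f} variance={variance:.6e}")
```

A CV well above 1, together with a mean far larger than the median, is consistent with the strong right skew reported for this column.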
[Figure: Histogram with fixed size bins (bins=32)]
Common values

Value              Count  Frequency (%)
138                1      3.1%
608949             1      3.1%
459442             1      3.1%
31045              1      3.1%
890368             1      3.1%
4870               1      3.1%
10880              1      3.1%
858244             1      3.1%
1640               1      3.1%
5872               1      3.1%
Other values (22)  22     68.8%

Minimum 10 values

Value   Count  Frequency (%)
138     1      3.1%
1640    1      3.1%
2048    1      3.1%
4278    1      3.1%
4870    1      3.1%
5872    1      3.1%
6599    1      3.1%
8552    1      3.1%
9042    1      3.1%
10880   1      3.1%

Maximum 10 values

Value   Count  Frequency (%)
890368  1      3.1%
858244  1      3.1%
657804  1      3.1%
608949  1      3.1%
567298  1      3.1%
546013  1      3.1%
459442  1      3.1%
403581  1      3.1%
62141   1      3.1%
31045   1      3.1%

판매액 (sales amount)
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 32
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12457682.44
Minimum: 39801
Maximum: 66802101
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 416.0 B

Quantile statistics

Minimum: 39801
5-th percentile: 230173.75
Q1: 2660618
Median: 3924835
Q3: 10557159.5
95-th percentile: 52096875.6
Maximum: 66802101
Range: 66762300
Interquartile range (IQR): 7896541.5

Descriptive statistics

Standard deviation: 18331620.62
Coefficient of variation (CV): 1.471511311
Kurtosis: 2.163073746
Mean: 12457682.44
Median Absolute Deviation (MAD): 1666380.5
Skewness: 1.766791928
Sum: 398645838
Variance: 3.360483145 × 10^14
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=32)]
Common values

Value              Count  Frequency (%)
39801              1      3.1%
36870538           1      3.1%
26620218           1      3.1%
3889103            1      3.1%
66802101           1      3.1%
216982             1      3.1%
3708046            1      3.1%
58806968           1      3.1%
2137013            1      3.1%
240967             1      3.1%
Other values (22)  22     68.8%

Minimum 10 values

Value    Count  Frequency (%)
39801    1      3.1%
216982   1      3.1%
240967   1      3.1%
313895   1      3.1%
612139   1      3.1%
832183   1      3.1%
1218097  1      3.1%
2137013  1      3.1%
2835153  1      3.1%
2843290  1      3.1%

Maximum 10 values

Value     Count  Frequency (%)
66802101  1      3.1%
58806968  1      3.1%
46606800  1      3.1%
36870538  1      3.1%
35722325  1      3.1%
31695210  1      3.1%
26620218  1      3.1%
25819316  1      3.1%
5469774   1      3.1%
5341609   1      3.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
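
The calculation just described can be sketched in plain Python. This is a minimal illustration rather than the code the profiling tool runs; the function name `pearson_r` is hypothetical, and the inputs are only the ten 이용건수 and 판매액 pairs listed under First rows, so the result is not the coefficient computed over all 32 observations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n-1) factors of the covariance and the two standard deviations
    # cancel, so raw sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # raises ZeroDivisionError for a constant input


# First ten rows of the sample (이용건수, 판매액); not the full dataset.
uses = [138, 12442, 31045, 890368, 4870, 10880, 858244, 1640, 5872, 9042]
sales = [39801, 3946677, 3889103, 66802101, 216982, 3708046,
         58806968, 2137013, 240967, 2845511]

r = pearson_r(uses, sales)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(uses, [3 * s + 7 for s in sales])
print(r, r_rescaled)
```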

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
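
As a rough sketch of that recipe, the snippet below ranks each variable (tied values share the average of their ranks, which is an assumption about tie handling) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and the example data is a toy series chosen to show a nonlinear but monotonic relationship.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # nonlinear, but strictly increasing in x
print(spearman_rho(xs, cubes))
```

Because the cubes preserve the ordering of `xs`, the two rank vectors are identical and ρ comes out at its maximum, even though the relationship is not linear.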

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
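
The pair-counting definition can be written down directly. The sketch below implements exactly that formula (often called tau-a); the function name `kendall_tau_a` is hypothetical. Statistical libraries commonly report a tie-adjusted variant instead, which may differ when values repeat. Since this report's alerts flag both numeric columns as having unique values, ties do not arise for them.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie; it counts toward neither total
    return (concordant - discordant) / len(pairs)


# One swapped neighbour out of six pairs: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```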

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal, and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
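
For orientation, here is a minimal sketch of a bias-corrected Cramér's V, following the correction as it is usually stated for Bergsma (2013): the chi-squared statistic divided by n is reduced by (k-1)(r-1)/(n-1), and the row and column counts are shrunk before normalising. It is an assumption that this matches the report's implementation in every detail; the function name `cramers_v_corrected` and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy labels: one variable fully determines the other.
perfect = cramers_v_corrected(["a"] * 10 + ["b"] * 10, ["u"] * 10 + ["v"] * 10)
# Toy labels: every combination occurs equally often.
independent = cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```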

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

First rows

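Columns: 구분 (year), 소비항목 (spending category), 이용건수 (number of uses), 판매액 (sales amount)

    구분    소비항목  이용건수  판매액
0   2021년  자기계발  138       39801
1   2021년  건강관리  12442     3946677
2   2021년  여가생활  31045     3889103
3   2021년  가정친화  890368    66802101
4   2020년  자기계발  4870      216982
5   2020년  건강관리  10880     3708046
6   2020년  여가생활  858244    58806968
7   2020년  가정친화  1640      2137013
8   2019년  자기계발  5872      240967
9   2019년  건강관리  9042      2845511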

Last rows

    구분    소비항목  이용건수  판매액
22  2016년  여가활동  18809     4952600
23  2016년  가정친화  546013    31695210
24  2015년  자기계발  23545     832183
25  2015년  건강관리  19709     3902993
26  2015년  여가활동  11656     4371353
27  2015년  가정친화  403581    25819316
28  2014년  자기계발  62141     2843290
29  2014년  건강관리  19309     4403357
30  2014년  여가활동  13931     4145821
31  2014년  가정친화  459442    26620218