Overview

Dataset statistics

Number of variables: 4
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.5 KiB
Average record size in memory: 36.3 B
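The counts above can be recomputed for any table held in memory. This minimal sketch uses only the 20 rows reproduced in the Sample section, so it reports 20 observations rather than 100. It makes three assumptions: rows are tuples, `None` marks a missing cell, and a duplicate is any row beyond the first occurrence of an identical row. The library's own duplicate definition may differ. The helper name `dataset_statistics` is hypothetical.

```python
# Minimal sketch: recompute the "Dataset statistics" counts for rows held as tuples.
# Assumption: only the 20 rows shown in the Sample section are available here.
COLUMNS = ("base_year", "base_month", "gu_dc", "avg_sales_pc")
SAMPLE_ROWS = [
    (2014, 3, "강서구", 22000), (2018, 12, "영도구", 16000),
    (2014, 3, "기장군", 20000), (2014, 3, "남구", 22000),
    (2014, 3, "동구", 20000), (2014, 3, "동래구", 28000),
    (2014, 3, "부산진구", 22000), (2018, 12, "중구", 14000),
    (2014, 3, "사상구", 18000), (2014, 3, "사하구", 16000),
    (2015, 6, "서구", 24000), (2015, 6, "수영구", 30000),
    (2015, 6, "연제구", 24000), (2015, 6, "영도구", 16000),
    (2015, 6, "중구", 22000), (2015, 6, "해운대구", 24000),
    (2015, 9, "강서구", 24000), (2015, 9, "금정구", 24000),
    (2015, 9, "기장군", 20000), (2015, 9, "남구", 22000),
]


def dataset_statistics(rows, columns):
    """Hypothetical helper returning the report's dataset-level counts."""
    n_obs, n_vars = len(rows), len(columns)
    total_cells = n_obs * n_vars
    # A cell counts as missing when it holds None (assumed missing-value marker).
    missing = sum(value is None for row in rows for value in row)
    # Rows beyond the first occurrence of each distinct row count as duplicates.
    duplicates = n_obs - len(set(rows))
    return {
        "Number of variables": n_vars,
        "Number of observations": n_obs,
        "Missing cells": missing,
        "Missing cells (%)": 100.0 * missing / total_cells if total_cells else 0.0,
        "Duplicate rows": duplicates,
        "Duplicate rows (%)": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


stats = dataset_statistics(SAMPLE_ROWS, COLUMNS)
for name, value in stats.items():
    print(f"{name}: {value}")
```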

Variable types

Categorical: 3
Numeric: 1

Alerts

avg_sales_pc is highly correlated overall with gu_dc (High correlation)
gu_dc is highly correlated overall with avg_sales_pc (High correlation)
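Both alerts describe the same pair, listed once in each direction. This sketch shows how such an alert could be derived from the first correlation matrix under Correlations. The cut-off of 0.9 is a hypothetical value, not one read from config.json. It was chosen because it flags the 0.917 gu_dc/avg_sales_pc entry and leaves the 0.824 base_year/avg_sales_pc entry unflagged, which matches the report. The function name `high_correlation_pairs` is also hypothetical.

```python
# Hypothetical threshold: assumed, not taken from the report's configuration.
THRESHOLD = 0.9
COLUMNS = ["base_year", "base_month", "gu_dc", "avg_sales_pc"]
# First correlation matrix from the Correlations section of this report.
MATRIX = [
    [1.000, 0.365, 0.000, 0.824],
    [0.365, 1.000, 0.000, 0.000],
    [0.000, 0.000, 1.000, 0.917],
    [0.824, 0.000, 0.917, 1.000],
]


def high_correlation_pairs(columns, matrix, threshold):
    """Return each unordered column pair whose |correlation| exceeds threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):  # upper triangle only, skip diagonal
            if abs(matrix[i][j]) > threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    return pairs


print(high_correlation_pairs(COLUMNS, MATRIX, THRESHOLD))
# [('gu_dc', 'avg_sales_pc', 0.917)]
```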

Reproduction

Analysis started: 2023-12-10 09:58:55.109740
Analysis finished: 2023-12-10 09:58:56.041475
Duration: 0.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
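The reported duration is the difference between the two timestamps above. A short check with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section; both assumed to share a time zone.
started = datetime.fromisoformat("2023-12-10 09:58:55.109740")
finished = datetime.fromisoformat("2023-12-10 09:58:56.041475")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 0.93 seconds
```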

Variables

base_year
Categorical

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2014
2nd row: 2018
3rd row: 2014
4th row: 2014
5th row: 2014

Common Values

Value   Count   Frequency (%)
2014    61      61.0%
2015    36      36.0%
2018    3       3.0%
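The frequency table and the Distinct/Unique figures above can be reproduced by counting values. In this sketch the 100 base_year values are rebuilt from the counts in the table, so the original row order is not preserved. It also assumes that "Unique" means the number of values that occur exactly once, which is consistent with Unique = 0 here. The helper names are hypothetical.

```python
from collections import Counter

# Rebuilt from the Common Values counts above (row order not preserved).
base_year = ["2014"] * 61 + ["2015"] * 36 + ["2018"] * 3


def frequency_table(values):
    """(value, count, percent) tuples, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100.0 * count / n) for value, count in counts.most_common()]


def distinct_and_unique(values):
    """Distinct count and %, plus the count and % of values seen exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for count in counts.values() if count == 1)
    return distinct, 100.0 * distinct / n, unique, 100.0 * unique / n


for value, count, pct in frequency_table(base_year):
    print(f"{value}  {count}  {pct:.1f}%")
```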

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

base_month
Categorical

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 2
Median length: 1
Mean length: 1.19
Min length: 1
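Because base_month is profiled as a categorical (string) column, its "length" statistics are character counts. Rebuilding the 100 values from the Common Values counts below shows where the mean of 1.19 comes from. The 81 one-character months (3, 6, 9) and the 19 two-character "12" entries give (81 × 1 + 19 × 2) / 100 = 1.19. The assumption here is that the values are stored as plain decimal strings.

```python
import statistics

# Rebuilt from base_month's Common Values counts; values assumed stored as strings.
base_month = ["6"] * 31 + ["3"] * 30 + ["9"] * 20 + ["12"] * 19

lengths = [len(value) for value in base_month]  # characters per value
print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length:", statistics.mean(lengths))
print("Min length:", min(lengths))
```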

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 12
3rd row: 3
4th row: 3
5th row: 3

Common Values

Value   Count   Frequency (%)
6       31      31.0%
3       30      30.0%
9       20      20.0%
12      19      19.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

gu_dc
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 4
Median length: 3
Mean length: 2.81
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강서구
2nd row: 영도구
3rd row: 기장군
4th row: 남구
5th row: 동구

Common Values

Value              Count   Frequency (%)
강서구             7       7.0%
영도구             7       7.0%
기장군             7       7.0%
남구               7       7.0%
중구               7       7.0%
동구               6       6.0%
동래구             6       6.0%
부산진구           6       6.0%
사상구             6       6.0%
사하구             6       6.0%
Other values (6)   35      35.0%

Length

[Figure: Histogram of lengths of the category]

avg_sales_pc
Real number (ℝ)

HIGH CORRELATION 

Distinct: 10
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21500
Minimum: 14000
Maximum: 34000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 14000
5th percentile: 16000
Q1: 19500
Median: 22000
Q3: 24000
95th percentile: 28000
Maximum: 34000
Range: 20000
Interquartile range (IQR): 4500
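The quantiles above can be reproduced from the avg_sales_pc value counts in this report, which cover all 100 observations. This sketch assumes linear interpolation between order statistics. Python's `statistics.quantiles(..., method="inclusive")` follows that convention, and it yields exactly the values listed, e.g. Q1 = 18000 + 0.75 × (20000 − 18000) = 19500.

```python
import statistics

# avg_sales_pc rebuilt from its value counts (all 100 observations; order not needed).
VALUE_COUNTS = {14000: 1, 16000: 11, 18000: 13, 20000: 16, 22000: 31,
                24000: 16, 26000: 3, 28000: 6, 30000: 2, 34000: 1}
avg_sales_pc = [value for value, count in VALUE_COUNTS.items() for _ in range(count)]

# "inclusive" = linear interpolation at position p * (n - 1) in the sorted data.
q1, median, q3 = statistics.quantiles(avg_sales_pc, n=4, method="inclusive")
ventiles = statistics.quantiles(avg_sales_pc, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]  # 5th and 95th percentiles

quantile_statistics = {
    "Minimum": min(avg_sales_pc),
    "5th percentile": p5,
    "Q1": q1,
    "Median": median,
    "Q3": q3,
    "95th percentile": p95,
    "Maximum": max(avg_sales_pc),
    "Range": max(avg_sales_pc) - min(avg_sales_pc),
    "Interquartile range (IQR)": q3 - q1,
}
for name, value in quantile_statistics.items():
    print(f"{name}: {value:g}")
```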

Descriptive statistics

Standard deviation: 3633.4584
Coefficient of variation (CV): 0.16899807
Kurtosis: 0.76665013
Mean: 21500
Median absolute deviation (MAD): 2000
Skewness: 0.53567382
Sum: 2150000
Variance: 13202020
Monotonicity: Not monotonic
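The descriptive statistics follow from the same value counts. This sketch assumes three estimator choices: the sample variance (divisor n − 1), the adjusted Fisher-Pearson skewness G1, and the bias-adjusted excess kurtosis G2. These are the estimators behind pandas' `Series.var()`, `Series.skew()` and `Series.kurt()`, and with them the figures above are reproduced to the printed precision. Monotonicity is left out because it depends on row order, which the counts do not preserve.

```python
import math
import statistics

# avg_sales_pc rebuilt from its value counts (all 100 observations).
VALUE_COUNTS = {14000: 1, 16000: 11, 18000: 13, 20000: 16, 22000: 31,
                24000: 16, 26000: 3, 28000: 6, 30000: 2, 34000: 1}
x = [value for value, count in VALUE_COUNTS.items() for _ in range(count)]
n = len(x)

mean = statistics.fmean(x)
variance = statistics.variance(x)            # sample variance, divisor n - 1
std = math.sqrt(variance)
cv = std / mean                              # coefficient of variation
median = statistics.median(x)
mad = statistics.median([abs(v - median) for v in x])  # median absolute deviation

# Biased central moments, then the bias-adjusted G1 (skewness) and G2 (excess kurtosis).
m2 = sum((v - mean) ** 2 for v in x) / n
m3 = sum((v - mean) ** 3 for v in x) / n
m4 = sum((v - mean) ** 4 for v in x) / n
skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)

print(f"Standard deviation: {std:.4f}")
print(f"Coefficient of variation (CV): {cv:.8f}")
print(f"Kurtosis: {kurtosis:.8f}")
print(f"Mean: {mean:g}")
print(f"Median absolute deviation (MAD): {mad:g}")
print(f"Skewness: {skewness:.8f}")
print(f"Sum: {sum(x)}")
print(f"Variance: {variance:.0f}")
```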
[Figure: Histogram with fixed size bins (bins=10)]
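A fixed-size-bin histogram splits the range [minimum, maximum] into equal-width bins. For avg_sales_pc that is (34000 − 14000) / 10 = 2000 per bin. This sketch assumes the numpy.histogram convention: every bin is half-open [left, right) except the last, which also includes the maximum. It applies that convention to the value counts listed for avg_sales_pc. The helper name `fixed_width_histogram` is hypothetical.

```python
# avg_sales_pc rebuilt from its value counts (all 100 observations).
VALUE_COUNTS = {14000: 1, 16000: 11, 18000: 13, 20000: 16, 22000: 31,
                24000: 16, 26000: 3, 28000: 6, 30000: 2, 34000: 1}
values = [value for value, count in VALUE_COUNTS.items() for _ in range(count)]


def fixed_width_histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Half-open bins, except that v == hi is clamped into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = fixed_width_histogram(values, 10)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:g}, {right:g}): {count}")
```

Note that the computed bin index can land in a neighbouring bin for values that sit exactly on an edge when the width is not exactly representable. Here every edge is an exact multiple of 2000, so that does not arise.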
Common Values

Value   Count   Frequency (%)
22000   31      31.0%
20000   16      16.0%
24000   16      16.0%
18000   13      13.0%
16000   11      11.0%
28000   6       6.0%
26000   3       3.0%
30000   2       2.0%
14000   1       1.0%
34000   1       1.0%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation matrix plot]

              base_year  base_month  gu_dc  avg_sales_pc
base_year     1.000      0.365       0.000  0.824
base_month    0.365      1.000       0.000  0.000
gu_dc         0.000      0.000       1.000  0.917
avg_sales_pc  0.824      0.000       0.917  1.000
[Figure: correlation matrix plot]

            base_month  base_year  gu_dc
base_month  1.000       0.352      0.000
base_year   0.352       1.000      0.000
gu_dc       0.000       0.000      1.000
[Figure: correlation matrix plot]

              avg_sales_pc  base_year  base_month  gu_dc
avg_sales_pc  1.000         0.460      0.000       0.673
base_year     0.460         1.000      0.352       0.000
base_month    0.000         0.352      1.000       0.000
gu_dc         0.673         0.000      0.000       1.000
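The second matrix covers only the three categorical columns, so it must come from a measure defined for category pairs. ydata-profiling offers Cramér's V among its correlation options, but the matrix headings did not survive extraction, so which measure produced which matrix is an assumption. For an r × c contingency table with n observations and chi-squared statistic χ², Cramér's V is √(χ² / (n · (min(r, c) − 1))). This sketch implements that uncorrected formula, whereas the library may apply a bias correction. The example tables are hypothetical, not taken from this dataset.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for an r x c contingency table (list of lists of counts)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows/columns
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1  # min(r - 1, c - 1)
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical tables: perfect association vs. independence.
print(cramers_v([[10, 0], [0, 10]]))  # 1.0
print(cramers_v([[5, 5], [5, 5]]))    # 0.0
```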

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion by eye.]

Sample

First 10 rows

    base_year  base_month  gu_dc      avg_sales_pc
0   2014       3           강서구     22000
1   2018       12          영도구     16000
2   2014       3           기장군     20000
3   2014       3           남구       22000
4   2014       3           동구       20000
5   2014       3           동래구     28000
6   2014       3           부산진구   22000
7   2018       12          중구       14000
8   2014       3           사상구     18000
9   2014       3           사하구     16000

Last 10 rows

    base_year  base_month  gu_dc      avg_sales_pc
90  2015       6           서구       24000
91  2015       6           수영구     30000
92  2015       6           연제구     24000
93  2015       6           영도구     16000
94  2015       6           중구       22000
95  2015       6           해운대구   24000
96  2015       9           강서구     24000
97  2015       9           금정구     24000
98  2015       9           기장군     20000
99  2015       9           남구       22000