Overview

Dataset statistics

Number of variables: 3
Number of observations: 615
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 84
Duplicate rows (%): 13.7%
Total size in memory: 15.7 KiB
Average record size in memory: 26.2 B

Variable types

Categorical: 1
Text: 1
Numeric: 1

Dataset

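Description: List of data items extracted from the Gyeonggi Statistics System, Gyeonggi-do (경기도 경기통계시스템 추출 자료항목리스트)
Author: Gyeonggi-do (경기도)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=4RT7D5AVAE7EU9J6E3LG33513440&infSeq=1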

Alerts

Constant: 조직번호 has the constant value "210"
Duplicates: Dataset has 84 (13.7%) duplicate rows

Reproduction

Analysis started: 2023-12-10 21:54:09.449996
Analysis finished: 2023-12-10 21:54:09.723531
Duration: 0.27 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
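
A report of this kind could be regenerated with a few lines of Python. The sketch below is not taken from the original analysis: the function name `build_report`, the CSV path, and the report title are hypothetical, and it assumes pandas and ydata-profiling 4.x are installed. The `dataset` dictionary is assumed to be what populates the Description, Author, and URL fields shown above.

```python
def build_report(csv_path, out_path="report.html"):
    """Profile a CSV export and write an HTML report (illustrative sketch)."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Read 조직번호 as text so codes such as "210" are treated as categories,
    # not as numbers (an assumption about how the original data was loaded).
    df = pd.read_csv(csv_path, dtype={"조직번호": str})

    report = ProfileReport(
        df,
        title="Profiling report",  # hypothetical title
        dataset={
            "description": "경기도 경기통계시스템 추출 자료항목리스트",
            "author": "경기도",
            "url": "https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=4RT7D5AVAE7EU9J6E3LG33513440&infSeq=1",
        },
    )
    report.to_file(out_path)
    return out_path
```

Calling `build_report("items.csv")` with a hypothetical local export would write `report.html` next to the script.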

Variables

조직번호 (organization number)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 KiB

Value  Count
210    615

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 210
2nd row: 210
3rd row: 210
4th row: 210
5th row: 210

Common Values

Value  Count  Frequency (%)
210    615    100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

통계표ID (statistical table ID)
Text

Distinct: 137
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Memory size: 4.9 KiB

Length

Max length: 20
Median length: 19
Mean length: 13.930081
Min length: 11

Characters and Unicode

Total characters: 8567
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
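
As a small illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one identifier from this column; the helper name `category_counts` is made up for the example and is not part of the profiling tool.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts("DT_20220025")

# Two-letter category codes used by the Unicode Standard:
#   Lu = Uppercase Letter, Nd = Decimal Number, Pc = Connector Punctuation
for code, n in counts.items():
    print(code, n)
```

The three codes that appear here correspond to the three categories the report lists for this variable.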

Unique

Unique: 55
Unique (%): 8.9%
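
The report gives two different counts for this column, 137 distinct values and 55 unique values. A minimal sketch of the difference, assuming that "distinct" means the number of different values and "unique" means the number of values that occur exactly once. The function name and the four-item sample are illustrative only.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Small illustrative sample built from IDs that appear in this report.
sample = ["DT_20220025", "DT_20220025", "DT_20220014", "DT_20114_2021035_03"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)
```

Under that reading, a column can never have more unique values than distinct values, which is consistent with 55 and 137 here.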

Sample

1st row: DT_20220025
2nd row: DT_20220025
3rd row: DT_20220025
4th row: DT_20220025
5th row: DT_20220025

Value                 Count  Frequency (%)
dt_21002_j010         149    24.2%
dt_21002_l007         41     6.7%
dt_21002_m016         17     2.8%
dt_21002_m023         15     2.4%
dt_21002_n001         12     2.0%
dt_20114_2021026_04   12     2.0%
dt_20114_2021026_05   12     2.0%
dt_20114_2021026_01   12     2.0%
dt_20114_2021026_02   12     2.0%
dt_21002_k010         11     1.8%
Other values (127)    322    52.4%

Most occurring characters

Value              Count  Frequency (%)
0                  2400   28.0%
2                  1476   17.2%
_                  1214   14.2%
1                  1087   12.7%
D                  643    7.5%
T                  615    7.2%
7                  217    2.5%
4                  152    1.8%
J                  150    1.8%
5                  102    1.2%
Other values (18)  511    6.0%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number         5727   66.8%
Uppercase Letter       1626   19.0%
Connector Punctuation  1214   14.2%

Most frequent character per category

Uppercase Letter

Value             Count  Frequency (%)
D                 643    39.5%
T                 615    37.8%
J                 150    9.2%
L                 41     2.5%
M                 36     2.2%
B                 25     1.5%
I                 22     1.4%
C                 21     1.3%
E                 16     1.0%
K                 15     0.9%
Other values (7)  42     2.6%

Decimal Number

Value  Count  Frequency (%)
0      2400   41.9%
2      1476   25.8%
1      1087   19.0%
7      217    3.8%
4      152    2.7%
5      102    1.8%
6      88     1.5%
8      88     1.5%
3      81     1.4%
9      36     0.6%

Connector Punctuation

Value  Count  Frequency (%)
_      1214   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  6941   81.0%
Latin   1626   19.0%

Most frequent character per script

Latin

Value             Count  Frequency (%)
D                 643    39.5%
T                 615    37.8%
J                 150    9.2%
L                 41     2.5%
M                 36     2.2%
B                 25     1.5%
I                 22     1.4%
C                 21     1.3%
E                 16     1.0%
K                 15     0.9%
Other values (7)  42     2.6%

Common

Value  Count  Frequency (%)
0      2400   34.6%
2      1476   21.3%
_      1214   17.5%
1      1087   15.7%
7      217    3.1%
4      152    2.2%
5      102    1.5%
6      88     1.3%
8      88     1.3%
3      81     1.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  8567   100.0%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
0                  2400   28.0%
2                  1476   17.2%
_                  1214   14.2%
1                  1087   12.7%
D                  643    7.5%
T                  615    7.2%
7                  217    2.5%
4                  152    1.8%
J                  150    1.8%
5                  102    1.2%
Other values (18)  511    6.0%

최종변경일 (last modified date)
Real number (ℝ)

Distinct: 35
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20214232
Minimum: 20171117
Maximum: 20230412
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 20171117
5-th percentile: 20181224
Q1: 20211221
Median: 20220712
Q3: 20221219
95-th percentile: 20221219
Maximum: 20230412
Range: 59295
Interquartile range (IQR): 9998

Descriptive statistics

Standard deviation: 12683.958
Coefficient of variation (CV): 0.00062747665
Kurtosis: 1.4674371
Mean: 20214232
Median Absolute Deviation (MAD): 507
Skewness: -1.6218291
Sum: 1.2431753 × 10^10
Variance: 1.608828 × 10^8
Monotonicity: Not monotonic
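
The statistics above treat 최종변경일 as an ordinary number. Its values look like dates written as YYYYMMDD integers (for example 20221219), so figures such as the mean or the range of 59295 are not measured in days. A minimal sketch, assuming that encoding holds for every row, parses the minimum and maximum with the standard library and compares the numeric range with the real calendar span. The helper name `parse_yyyymmdd` is illustrative.

```python
from datetime import date, datetime


def parse_yyyymmdd(value):
    """Convert an integer such as 20221219 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


earliest = parse_yyyymmdd(20171117)  # reported minimum
latest = parse_yyyymmdd(20230412)    # reported maximum

numeric_range = 20230412 - 20171117  # what the report calls "Range"
span_days = (latest - earliest).days  # actual number of days between the dates

print(earliest, latest, numeric_range, span_days)
```

Parsing the column as dates before profiling would make the tool report date-based summaries instead of numeric ones.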
[Figure: Histogram with fixed size bins (bins=35)]

Common values

Value              Count  Frequency (%)
20221219           149    24.2%
20220407           79     12.8%
20220712           59     9.6%
20200509           48     7.8%
20200508           35     5.7%
20221012           35     5.7%
20220711           31     5.0%
20221210           29     4.7%
20181224           27     4.4%
20221206           17     2.8%
Other values (25)  106    17.2%

Minimum 10 values

Value     Count  Frequency (%)
20171117  1      0.2%
20180112  10     1.6%
20180510  2      0.3%
20180626  4      0.7%
20180814  1      0.2%
20180824  1      0.2%
20180827  5      0.8%
20181224  27     4.4%
20200421  16     2.6%
20200508  35     5.7%

Maximum 10 values

Value     Count  Frequency (%)
20230412  3      0.5%
20230329  6      1.0%
20221219  149    24.2%
20221212  6      1.0%
20221210  29     4.7%
20221209  6      1.0%
20221206  17     2.8%
20221203  1      0.2%
20221102  1      0.2%
20221025  1      0.2%

Interactions

[Figure: interactions plot]

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

First rows

     조직번호  통계표ID              최종변경일
0    210       DT_20220025          20220407
1    210       DT_20220025          20220407
2    210       DT_20220025          20220407
3    210       DT_20220025          20220712
4    210       DT_20220025          20220407
5    210       DT_20114_2021012_06  20220711
6    210       DT_20114_2021012_06  20220711
7    210       DT_20114_2021012_06  20220711
8    210       DT_20114_2021012_06  20220712
9    210       DT_20114_2021035_03  20220712

Last rows

     조직번호  통계표ID              최종변경일
605  210       DT_20114_2021026_04  20220712
606  210       DT_20114_2021026_04  20220712
607  210       DT_20114_2021026_04  20220712
608  210       DT_20114_2021026_04  20220712
609  210       DT_20114_2021026_04  20220712
610  210       DT_20114_2021035_02  20220712
611  210       DT_20220014          20220407
612  210       DT_20220014          20220407
613  210       DT_20220014          20220407
614  210       DT_20220014          20220407

Duplicate rows

Most frequently occurring

    조직번호  통계표ID              최종변경일  # duplicates
72  210       DT_21002_J010        20221219    149
75  210       DT_21002_L007        20221012    35
78  210       DT_21002_M016        20221206    16
79  210       DT_21002_M023        20221210    14
9   210       DT_20114_2021026_02  20220712    12
10  210       DT_20114_2021026_04  20220712    12
11  210       DT_20114_2021026_05  20220712    12
80  210       DT_21002_N001        20221210    12
8   210       DT_20114_2021026_01  20220711    11
73  210       DT_21002_K010        20211221    11
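
A table like the one above can be approximated by grouping identical rows and keeping the groups that occur more than once. The sketch below uses only the Python standard library and the ten first rows shown in the Sample section. It assumes that the "# duplicates" column is the size of each group of identical rows, and the function name `duplicate_groups` is made up for the example.

```python
from collections import Counter

# The ten first rows of the dataset, as (조직번호, 통계표ID, 최종변경일) tuples.
rows = [
    ("210", "DT_20220025", 20220407),
    ("210", "DT_20220025", 20220407),
    ("210", "DT_20220025", 20220407),
    ("210", "DT_20220025", 20220712),
    ("210", "DT_20220025", 20220407),
    ("210", "DT_20114_2021012_06", 20220711),
    ("210", "DT_20114_2021012_06", 20220711),
    ("210", "DT_20114_2021012_06", 20220711),
    ("210", "DT_20114_2021012_06", 20220712),
    ("210", "DT_20114_2021035_03", 20220712),
]


def duplicate_groups(records):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(records)
    return {row: n for row, n in counts.items() if n > 1}


groups = duplicate_groups(rows)

# Print the groups from most to least frequent, as in the table.
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
```

On the full 615 rows, the same grouping would be expected to yield the counts listed in the table, provided the assumption about "# duplicates" holds.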