Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 9378
Missing cells (%): 13.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
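
The two derived figures in this block follow directly from the raw counts. A minimal sketch of that arithmetic, using only the numbers printed above (the variable names are illustrative):

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 7
n_observations = 10_000
missing_cells = 9_378
total_size_kib = 654.3

# Missing cells (%) is taken over every cell in the table (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 KiB = 1024 bytes; the per-record size is the total divided by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```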

Variable types

Text: 3
Categorical: 4

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15658/S/1/datasetView.do

Alerts

작업_일자 has constant value "20111227" [Constant]
지역지구구역_구분_코드 is highly overall correlated with 지역지구구역_코드 [High correlation]
지역지구구역_코드 is highly overall correlated with 지역지구구역_구분_코드 and 1 other field [High correlation]
대표_여부 is highly overall correlated with 지역지구구역_코드 [High correlation]
지역지구구역_코드 is highly imbalanced (87.8%) [Imbalance]
대표_여부 is highly imbalanced (98.0%) [Imbalance]
기타_지역지구구역 has 9378 (93.8%) missing values [Missing]
관리_지역지구구역 has unique values [Unique]
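
The 98.0% imbalance reported for 대표_여부 can be reproduced from its value counts (9968, 31 and 1, as listed for that variable later in this report). The sketch below assumes the imbalance score is one minus the Shannon entropy normalized by the logarithm of the class count, which is how ydata-profiling is understood to define it. The function name `imbalance_score` is illustrative and is not the library's API.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform column, near 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no distribution to compare against
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(k)

# Value counts of 대표_여부 in this report: "1", "0" and "<NA>".
score = imbalance_score([9968, 31, 1])
print(f"{score:.1%}")
```

The same formula cannot be checked against the 87.8% figure for 지역지구구역_코드 here, because the report lists only its ten most common values out of 34.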

Reproduction

Analysis started: 2024-05-17 21:49:00.304391
Analysis finished: 2024-05-17 21:49:02.032788
Duration: 1.73 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
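
The duration is the difference between the two timestamps above, rounded to two decimals. A small check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2024-05-17 21:49:00.304391")
finished = datetime.fromisoformat("2024-05-17 21:49:02.032788")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```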

Variables

관리_지역지구구역
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 11
Mean length: 10.5179
Min length: 7

Characters and Unicode

Total characters: 105179
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
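
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module on the five sample values this report shows for the variable. The general categories `Nd` and `Pd` correspond to the "Decimal Number" and "Dash Punctuation" labels used in the tables. The standard library does not expose script or block properties, so those are left out; how the report itself derives them is not shown here.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable in the report.
samples = ["11290-21252", "11290-8383", "11290-31656", "11380-471", "11290-22249"]

# Count characters by Unicode general category (Nd = decimal number, Pd = dash punctuation).
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

# Collect the distinct characters that occur in the samples.
distinct_chars = {ch for value in samples for ch in value}

print(category_counts)
print(sorted(distinct_chars))
```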

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11290-21252
2nd row: 11290-8383
3rd row: 11290-31656
4th row: 11380-471
5th row: 11290-22249

Value | Count | Frequency (%)
11290-21252 | 1 | < 0.1%
11290-12520 | 1 | < 0.1%
11290-29825 | 1 | < 0.1%
11410-2545 | 1 | < 0.1%
11305-3533 | 1 | < 0.1%
11290-17940 | 1 | < 0.1%
11290-25991 | 1 | < 0.1%
11350-3762 | 1 | < 0.1%
11290-32011 | 1 | < 0.1%
11290-24158 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 27951 | 26.6%
0 | 14148 | 13.5%
2 | 13448 | 12.8%
- | 10000 | 9.5%
9 | 8773 | 8.3%
3 | 8646 | 8.2%
5 | 5570 | 5.3%
4 | 4643 | 4.4%
6 | 4584 | 4.4%
8 | 3808 | 3.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 95179 | 90.5%
Dash Punctuation | 10000 | 9.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 27951 | 29.4%
0 | 14148 | 14.9%
2 | 13448 | 14.1%
9 | 8773 | 9.2%
3 | 8646 | 9.1%
5 | 5570 | 5.9%
4 | 4643 | 4.9%
6 | 4584 | 4.8%
8 | 3808 | 4.0%
7 | 3608 | 3.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 105179 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 27951 | 26.6%
0 | 14148 | 13.5%
2 | 13448 | 12.8%
- | 10000 | 9.5%
9 | 8773 | 8.3%
3 | 8646 | 8.2%
5 | 5570 | 5.3%
4 | 4643 | 4.4%
6 | 4584 | 4.4%
8 | 3808 | 3.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 105179 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 27951 | 26.6%
0 | 14148 | 13.5%
2 | 13448 | 12.8%
- | 10000 | 9.5%
9 | 8773 | 8.3%
3 | 8646 | 8.2%
5 | 5570 | 5.3%
4 | 4643 | 4.4%
6 | 4584 | 4.4%
8 | 3808 | 3.6%

관리_폐쇄말소대장
Text

Distinct: 8514
Distinct (%): 85.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 10
Mean length: 10.2617
Min length: 7

Characters and Unicode

Total characters: 102617
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7098
Unique (%): 71.0%
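
For this variable the Distinct and Unique figures differ: Distinct counts how many different values occur, while Unique counts values that occur exactly once. A minimal sketch of the distinction on a small illustrative list (the list is made up for the example and is not the real column):

```python
from collections import Counter

# Illustrative values only; not the real column contents.
values = [
    "11290-10751", "11290-10751", "11290-10751",
    "11290-4357",
    "11380-4819", "11380-4819",
]

counts = Counter(values)
distinct = len(counts)                              # different values present
unique = sum(1 for n in counts.values() if n == 1)  # values seen exactly once

print(distinct, unique)

# The percentages in the report are these counts over the 10000 observations.
distinct_pct = 100 * 8514 / 10_000
unique_pct = 100 * 7098 / 10_000
print(f"{distinct_pct:.1f}% {unique_pct:.1f}%")
```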

Sample

1st row: 11290-10094
2nd row: 11290-4357
3rd row: 11290-14169
4th row: 11380-4819
5th row: 11290-10426

Value | Count | Frequency (%)
11290-10751 | 3 | < 0.1%
11290-14469 | 3 | < 0.1%
11290-9477 | 3 | < 0.1%
11230-2499 | 3 | < 0.1%
11290-6602 | 3 | < 0.1%
11230-3705 | 3 | < 0.1%
11290-7204 | 3 | < 0.1%
11320-1436 | 3 | < 0.1%
11290-12113 | 3 | < 0.1%
11410-957 | 3 | < 0.1%
Other values (8504) | 9970 | 99.7%

Most occurring characters

Value | Count | Frequency (%)
1 | 28429 | 27.7%
0 | 13811 | 13.5%
2 | 12067 | 11.8%
- | 10000 | 9.7%
9 | 8406 | 8.2%
3 | 7144 | 7.0%
5 | 5739 | 5.6%
4 | 4955 | 4.8%
6 | 4716 | 4.6%
8 | 3723 | 3.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 92617 | 90.3%
Dash Punctuation | 10000 | 9.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 28429 | 30.7%
0 | 13811 | 14.9%
2 | 12067 | 13.0%
9 | 8406 | 9.1%
3 | 7144 | 7.7%
5 | 5739 | 6.2%
4 | 4955 | 5.3%
6 | 4716 | 5.1%
8 | 3723 | 4.0%
7 | 3627 | 3.9%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 102617 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 28429 | 27.7%
0 | 13811 | 13.5%
2 | 12067 | 11.8%
- | 10000 | 9.7%
9 | 8406 | 8.2%
3 | 7144 | 7.0%
5 | 5739 | 5.6%
4 | 4955 | 4.8%
6 | 4716 | 4.6%
8 | 3723 | 3.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 102617 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 28429 | 27.7%
0 | 13811 | 13.5%
2 | 12067 | 11.8%
- | 10000 | 9.7%
9 | 8406 | 8.2%
3 | 7144 | 7.0%
5 | 5739 | 5.6%
4 | 4955 | 4.8%
6 | 4716 | 4.6%
8 | 3723 | 3.6%

지역지구구역_구분_코드
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
1 | 3510
2 | 3332
3 | 3158

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 3
3rd row: 2
4th row: 2
5th row: 3

Common Values

Value | Count | Frequency (%)
1 | 3510 | 35.1%
2 | 3332 | 33.3%
3 | 3158 | 31.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; its legend repeats the Common Values table]

지역지구구역_코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 34
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 9246
1020 | 298
260 | 119
070 | 84
1022 | 61
Other values (29) | 192

Length

Max length: 4
Median length: 4
Mean length: 3.9711
Min length: 2

Unique

Unique: 9
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 260
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9246 | 92.5%
1020 | 298 | 3.0%
260 | 119 | 1.2%
070 | 84 | 0.8%
1022 | 61 | 0.6%
1330 | 50 | 0.5%
103 | 22 | 0.2%
1023 | 18 | 0.2%
1030 | 14 | 0.1%
1021 | 12 | 0.1%
Other values (24) | 76 | 0.8%

Length

[Figure: histogram of lengths of the category]

대표_여부
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
1 | 9968
0 | 31
<NA> | 1

Length

Max length: 4
Median length: 1
Mean length: 1.0003
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 9968 | 99.7%
0 | 31 | 0.3%
<NA> | 1 | < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; its legend repeats the Common Values table]

기타_지역지구구역
Text

MISSING 

Distinct: 55
Distinct (%): 8.8%
Missing: 9378
Missing (%): 93.8%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 6
Mean length: 6.2299035
Min length: 2

Characters and Unicode

Total characters: 3875
Distinct characters: 63
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 28
Unique (%): 4.5%
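
The percentages for this variable only add up if they are taken over the non-missing rows rather than over all 10000 observations. The sketch below checks that reading against the figures printed above; it is an inference from the numbers, not a statement of how the report computes them.

```python
# Figures copied from this variable's statistics.
n_observations = 10_000
missing = 9_378
distinct = 55
unique = 28
total_characters = 3_875

# Rows that actually hold a value.
present = n_observations - missing

distinct_pct = 100 * distinct / present
unique_pct = 100 * unique / present
mean_length = total_characters / present

print(present)
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Unique (%): {unique_pct:.1f}%")
print(f"Mean length: {mean_length:.7f}")
```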

Sample

1st row: 주차장정비지구
2nd row: 주차장정비
3rd row: 일반주거지역
4th row: 개발제한구역
5th row: 일반주거지역

Value | Count | Frequency (%)
일반주거지역 | 258 | 41.1%
주차장정비지구 | 101 | 16.1%
자연녹지지역 | 50 | 8.0%
일반주거 | 44 | 7.0%
개발제한구역 | 32 | 5.1%
주차장정비 | 18 | 2.9%
제2종일반주거지역 | 13 | 2.1%
2종일반주거지역 | 9 | 1.4%
도시지역 | 9 | 1.4%
준주거지역 | 9 | 1.4%
Other values (46) | 85 | 13.5%

Most occurring characters

ValueCountFrequency (%)
564
14.6%
475
12.3%
414
10.7%
359
9.3%
353
9.1%
347
9.0%
171
 
4.4%
122
 
3.1%
122
 
3.1%
122
 
3.1%
Other values (53) 826
21.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3799 | 98.0%
Decimal Number | 52 | 1.3%
Other Punctuation | 16 | 0.4%
Space Separator | 6 | 0.2%
Close Punctuation | 1 | < 0.1%
Open Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
564
14.8%
475
12.5%
414
10.9%
359
9.4%
353
9.3%
347
9.1%
171
 
4.5%
122
 
3.2%
122
 
3.2%
122
 
3.2%
Other values (45) 750
19.7%

Decimal Number
Value | Count | Frequency (%)
2 | 26 | 50.0%
4 | 14 | 26.9%
3 | 6 | 11.5%
1 | 6 | 11.5%

Other Punctuation
Value | Count | Frequency (%)
, | 16 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 6 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3799 | 98.0%
Common | 76 | 2.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
564
14.8%
475
12.5%
414
10.9%
359
9.4%
353
9.3%
347
9.1%
171
 
4.5%
122
 
3.2%
122
 
3.2%
122
 
3.2%
Other values (45) 750
19.7%

Common
Value | Count | Frequency (%)
2 | 26 | 34.2%
, | 16 | 21.1%
4 | 14 | 18.4%
(space) | 6 | 7.9%
3 | 6 | 7.9%
1 | 6 | 7.9%
) | 1 | 1.3%
( | 1 | 1.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3799 | 98.0%
ASCII | 76 | 2.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
564
14.8%
475
12.5%
414
10.9%
359
9.4%
353
9.3%
347
9.1%
171
 
4.5%
122
 
3.2%
122
 
3.2%
122
 
3.2%
Other values (45) 750
19.7%

ASCII
Value | Count | Frequency (%)
2 | 26 | 34.2%
, | 16 | 21.1%
4 | 14 | 18.4%
(space) | 6 | 7.9%
3 | 6 | 7.9%
1 | 6 | 7.9%
) | 1 | 1.3%
( | 1 | 1.3%

작업_일자
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
20111227 | 10000

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20111227
2nd row: 20111227
3rd row: 20111227
4th row: 20111227
5th row: 20111227

Common Values

Value | Count | Frequency (%)
20111227 | 10000 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; its legend repeats the Common Values table]

Correlations

Variable | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역
지역지구구역_구분_코드 | 1.000 | 0.988 | 0.005 | 1.000
지역지구구역_코드 | 0.988 | 1.000 | 0.738 | 0.998
대표_여부 | 0.005 | 0.738 | 1.000 | 0.780
기타_지역지구구역 | 1.000 | 0.998 | 0.780 | 1.000

Variable | 지역지구구역_코드 | 지역지구구역_구분_코드 | 대표_여부
지역지구구역_코드 | 1.000 | 0.867 | 0.629
지역지구구역_구분_코드 | 0.867 | 1.000 | 0.009
대표_여부 | 0.629 | 0.009 | 1.000

Variable | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부
지역지구구역_구분_코드 | 1.000 | 0.867 | 0.009
지역지구구역_코드 | 0.867 | 1.000 | 0.629
대표_여부 | 0.009 | 0.629 | 1.000
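
The report does not name the association measure behind these matrices. For pairs of categorical variables a commonly used measure is Cramér's V, and ydata-profiling is understood to use it for such pairs; treat that as an assumption. The sketch below shows the plain, uncorrected form on two hypothetical contingency tables that are not taken from the dataset, so it will not reproduce the values above.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalize by sample size and the smaller table dimension minus one.
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 tables: one perfectly associated, one independent.
perfect = cramers_v([[10, 0], [0, 10]])
independent = cramers_v([[5, 5], [5, 5]])
print(perfect, independent)
```

A value near 1 means one variable almost determines the other, which matches how the 0.988 between 지역지구구역_구분_코드 and 지역지구구역_코드 is flagged in the Alerts section.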

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

(index) | 관리_지역지구구역 | 관리_폐쇄말소대장 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역 | 작업_일자
33237 | 11290-21252 | 11290-10094 | 3 | <NA> | 1 | <NA> | 20111227
21520 | 11290-8383 | 11290-4357 | 3 | <NA> | 1 | <NA> | 20111227
29404 | 11290-31656 | 11290-14169 | 2 | <NA> | 1 | <NA> | 20111227
58948 | 11380-471 | 11380-4819 | 2 | 260 | 1 | 주차장정비지구 | 20111227
23883 | 11290-22249 | 11290-10426 | 3 | <NA> | 1 | <NA> | 20111227
48496 | 11305-5656 | 11305-3170 | 3 | <NA> | 1 | <NA> | 20111227
29163 | 11290-40562 | 11290-17532 | 2 | <NA> | 1 | <NA> | 20111227
162 | 11290-26171 | 11290-12074 | 1 | <NA> | 1 | <NA> | 20111227
13352 | 11290-24160 | 11290-11095 | 3 | <NA> | 1 | <NA> | 20111227
39788 | 11320-4773 | 11320-1911 | 1 | <NA> | 1 | <NA> | 20111227

(index) | 관리_지역지구구역 | 관리_폐쇄말소대장 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역 | 작업_일자
33305 | 11170-3131 | 11170-3579 | 2 | <NA> | 1 | <NA> | 20111227
39438 | 11260-490 | 11260-496 | 2 | <NA> | 1 | <NA> | 20111227
891 | 11230-2598 | 11230-1712 | 1 | <NA> | 1 | <NA> | 20111227
61390 | 11380-1210 | 11380-6704 | 1 | 1020 | 1 | 일반주거지역 | 20111227
53257 | 11305-9078 | 11305-4691 | 2 | <NA> | 1 | <NA> | 20111227
4292 | 11110-5588 | 11110-2509 | 1 | <NA> | 1 | <NA> | 20111227
23396 | 11290-8596 | 11290-4428 | 3 | <NA> | 1 | <NA> | 20111227
4142 | 11230-11052 | 11230-4871 | 1 | <NA> | 1 | <NA> | 20111227
8500 | 11215-5254 | 11215-2355 | 1 | <NA> | 1 | <NA> | 20111227
17205 | 11290-22595 | 11290-10542 | 1 | <NA> | 1 | <NA> | 20111227