Overview

Dataset statistics

Number of variables: 3
Number of observations: 200
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 29
Duplicate rows (%): 14.5%
Total size in memory: 4.8 KiB
Average record size in memory: 24.7 B

Variable types

Categorical: 1
Text: 2

Dataset

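Description: Chemical operations data from the environmental chemistry system of Korea South-East Power (한국남동발전). It covers where each chemical is used, including detailed usage locations such as the boiler water treatment room.
URL: https://www.data.go.kr/data/15093021/fileData.do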

Alerts

Duplicates: Dataset has 29 (14.5%) duplicate rows

Reproduction

Analysis started: 2023-12-12 21:09:01.725040
Analysis finished: 2023-12-12 21:09:02.028299
Duration: 0.3 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

화공약품사용처 (chemical usage location)
Categorical

Distinct: 34
Distinct (%): 17.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value              Count
WT30                  22
BLR                   20
WT10                  12
LIME                  12
SCR                   10
Other values (29)    124

Length

Max length: 4
Median length: 4
Mean length: 3.76
Min length: 2
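
The four length figures are plain summary statistics over the character count of each value. A minimal sketch of that calculation follows; the function name `length_summary` and the four input strings are illustrative choices, not the full 200-row column, so the numbers differ from those in the report.

```python
from statistics import mean, median


def length_summary(values):
    """Return max, median, mean, and min string length for a list of values."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# Illustrative values that appear in the report (not the whole column).
summary = length_summary(["ASHP", "BLR", "WT30", "EP"])
print(summary)
```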

Unique

Unique: 6
Unique (%): 3.0%
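
"Distinct" and "Unique" measure different things: 34 different values occur in the column, but only 6 of them occur exactly once. The sketch below shows the distinction with the standard library, assuming that reading of the two terms; `distinct_and_unique` is a hypothetical helper name and the input is the five sample rows of this variable, not the full column.

```python
from collections import Counter


def distinct_and_unique(values):
    """Count different values (distinct) and values seen exactly once (unique)."""
    counts = Counter(values)
    n = len(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": n_distinct,
        "distinct_pct": 100 * n_distinct / n,
        "unique": n_unique,
        "unique_pct": 100 * n_unique / n,
    }


# The five sample rows of this variable.
stats = distinct_and_unique(["ASHP", "BLR", "BLR", "BLR", "BLR"])
print(stats)
```

Applied to the report's own figures, the percentages are 34 / 200 = 17.0% and 6 / 200 = 3.0%.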

Sample

1st row: ASHP
2nd row: BLR
3rd row: BLR
4th row: BLR
5th row: BLR

Common Values

Value              Count  Frequency (%)
WT30                  22  11.0%
BLR                   20  10.0%
WT10                  12   6.0%
LIME                  12   6.0%
SCR                   10   5.0%
DFRM                   8   4.0%
CP40                   8   4.0%
EP                     8   4.0%
CP60                   8   4.0%
CP50                   8   4.0%
Other values (24)     84  42.0%

Length

[Figure: histogram of lengths of the category]

Value              Count  Frequency (%)
wt30                  22  11.0%
blr                   20  10.0%
wt10                  12   6.0%
lime                  12   6.0%
scr                   10   5.0%
cp60                   8   4.0%
cp30                   8   4.0%
cp50                   8   4.0%
ep                     8   4.0%
cp40                   8   4.0%
Other values (24)     84  42.0%

화공약품상세사용처 (chemical detailed usage location)
Text

Distinct: 123
Distinct (%): 61.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.99
Min length: 3

Characters and Unicode

Total characters: 798
Distinct characters: 28
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 85
Unique (%): 42.5%

Sample

1st row: ASHP
2nd row: BL56
3rd row: BLR1
4th row: BLR2
5th row: BLR3

Value               Count  Frequency (%)
xetc                    6   3.0%
wt33                    4   2.0%
wt32                    4   2.0%
wt34                    4   2.0%
gwst                    4   2.0%
wt11                    4   2.0%
wt13                    4   2.0%
wt21                    4   2.0%
wt31                    4   2.0%
wt12                    4   2.0%
Other values (113)    158  79.0%

Most occurring characters

Value              Count  Frequency (%)
W                     84  10.5%
C                     70   8.8%
3                     70   8.8%
1                     69   8.6%
P                     58   7.3%
T                     52   6.5%
2                     50   6.3%
0                     39   4.9%
S                     38   4.8%
4                     30   3.8%
Other values (18)    238  29.8%

Most occurring categories

Value              Count  Frequency (%)
Uppercase Letter     478  59.9%
Decimal Number       320  40.1%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
W                     84  17.6%
C                     70  14.6%
P                     58  12.1%
T                     52  10.9%
S                     38   7.9%
L                     28   5.9%
B                     26   5.4%
R                     20   4.2%
X                     14   2.9%
E                     14   2.9%
Other values (10)     74  15.5%

Decimal Number
Value              Count  Frequency (%)
3                     70  21.9%
1                     69  21.6%
2                     50  15.6%
0                     39  12.2%
4                     30   9.4%
5                     26   8.1%
6                     25   7.8%
7                     11   3.4%

Most occurring scripts

Value              Count  Frequency (%)
Latin                478  59.9%
Common               320  40.1%

Most frequent character per script

Latin
Value              Count  Frequency (%)
W                     84  17.6%
C                     70  14.6%
P                     58  12.1%
T                     52  10.9%
S                     38   7.9%
L                     28   5.9%
B                     26   5.4%
R                     20   4.2%
X                     14   2.9%
E                     14   2.9%
Other values (10)     74  15.5%

Common
Value              Count  Frequency (%)
3                     70  21.9%
1                     69  21.6%
2                     50  15.6%
0                     39  12.2%
4                     30   9.4%
5                     26   8.1%
6                     25   7.8%
7                     11   3.4%

Most occurring blocks

Value              Count  Frequency (%)
ASCII                798  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
W                     84  10.5%
C                     70   8.8%
3                     70   8.8%
1                     69   8.6%
P                     58   7.3%
T                     52   6.5%
2                     50   6.3%
0                     39   4.9%
S                     38   4.8%
4                     30   3.8%
Other values (18)    238  29.8%

화공약품상세사용처명 (chemical detailed usage location name)
Text

Distinct: 97
Distinct (%): 48.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 15
Median length: 13
Mean length: 7.46
Min length: 2

Characters and Unicode

Total characters: 1492
Distinct characters: 93
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
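
As an illustration of how such properties can be read off a string, the sketch below tallies the Unicode general category of every character using Python's `unicodedata` module. The helper name `category_counts` is hypothetical, and the three inputs are values taken from the dataset sample rather than the whole column. The two-letter codes map to the names used in this report: `Lo` is Other Letter, `Lu` Uppercase Letter, `Nd` Decimal Number, `Zs` Space Separator, `Po` Other Punctuation, `Ps` and `Pe` Open and Close Punctuation, and `Pd` Dash Punctuation. The standard library does not expose script or block properties, so those are left out.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Tally Unicode general categories over all characters in the values."""
    counts = Counter()
    for value in values:
        # Normalise to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", value):
            counts[unicodedata.category(ch)] += 1
    return counts


# Three values from the dataset sample, chosen to cover all eight categories.
counts = category_counts(["#1호기 보일러수처리", "R-1 재생", "2B3T(A)"])
for code, n in counts.most_common():
    print(code, n)
```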

Unique

Unique: 57
Unique (%): 28.5%

Sample

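1st row: 회처리장 중화 (ash treatment site neutralization)
2nd row: #5,6호기 보일러수처리 (Units 5 and 6 boiler water treatment)
3rd row: #1호기 보일러수처리 (Unit 1 boiler water treatment)
4th row: #2호기 보일러수처리 (Unit 2 boiler water treatment)
5th row: #3호기 보일러수처리 (Unit 3 boiler water treatment)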

Value                                      Count  Frequency (%)
재생 (regeneration)                            46  14.4%
보일러수처리 (boiler water treatment)          14   4.4%
2호기 (Unit 2)                                 10   3.1%
scr                                           10   3.1%
1호기 (Unit 1)                                 10   3.1%
r-4                                           10   3.1%
fgd                                           10   3.1%
acf                                            9   2.8%
ep                                             8   2.5%
g/f                                            6   1.9%
Other values (81)                            186  58.3%
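
The entries in the word table look like lowercased, whitespace-separated tokens with surrounding punctuation removed: for example, the value "#1호기 보일러수처리" appears as "1호기" and "보일러수처리". That rule is inferred from the table rather than stated in the report. A minimal sketch under that assumption, with a hypothetical helper name `word_counts` and four values from the dataset sample as input:

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercased tokens after trimming punctuation from both ends."""
    strip_chars = string.punctuation + string.whitespace
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(strip_chars)
            if word:  # skip tokens that were punctuation only
                counts[word] += 1
    return counts


# Four values taken from the dataset sample (not the full column).
counts = word_counts(
    ["#1호기 보일러수처리", "#2호기 보일러수처리", "R-1 재생", "R-2 재생"]
)
print(counts.most_common())
```

Only leading and trailing punctuation is trimmed, which is why a token such as "r-1" keeps its hyphen.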

Most occurring characters

Value                            Count  Frequency (%)
(space)                            119   8.0%
#                                   64   4.3%
B                                   58   3.9%
(                                   54   3.6%
)                                   54   3.6%
C                                   51   3.4%
F                                   49   3.3%
P                                   49   3.3%
[Hangul character not preserved]    47   3.2%
[Hangul character not preserved]    47   3.2%
Other values (83)                  900  60.3%

Most occurring categories

Value              Count  Frequency (%)
Other Letter         568  38.1%
Uppercase Letter     410  27.5%
Decimal Number       136   9.1%
Space Separator      119   8.0%
Other Punctuation    115   7.7%
Open Punctuation      54   3.6%
Close Punctuation     54   3.6%
Dash Punctuation      36   2.4%

Most frequent character per category

Other Letter
Value                            Count  Frequency (%)
[Hangul character not preserved]    47   8.3%
[Hangul character not preserved]    47   8.3%
[Hangul character not preserved]    46   8.1%
[Hangul character not preserved]    46   8.1%
[Hangul character not preserved]    40   7.0%
[Hangul character not preserved]    26   4.6%
[Hangul character not preserved]    26   4.6%
[Hangul character not preserved]    18   3.2%
[Hangul character not preserved]    17   3.0%
[Hangul character not preserved]    16   2.8%
Other values (49)                  239  42.1%

Uppercase Letter
Value              Count  Frequency (%)
B                     58  14.1%
C                     51  12.4%
F                     49  12.0%
P                     49  12.0%
R                     39   9.5%
A                     36   8.8%
M                     20   4.9%
G                     16   3.9%
E                     15   3.7%
O                     13   3.2%
Other values (10)     64  15.6%

Decimal Number
Value              Count  Frequency (%)
2                     37  27.2%
1                     28  20.6%
3                     27  19.9%
4                     24  17.6%
5                     10   7.4%
6                     10   7.4%

Other Punctuation
Value              Count  Frequency (%)
#                     64  55.7%
/                     32  27.8%
,                     12  10.4%
.                      7   6.1%

Space Separator
Value              Count  Frequency (%)
(space)              119  100.0%

Open Punctuation
Value              Count  Frequency (%)
(                     54  100.0%

Close Punctuation
Value              Count  Frequency (%)
)                     54  100.0%

Dash Punctuation
Value              Count  Frequency (%)
-                     36  100.0%

Most occurring scripts

Value              Count  Frequency (%)
Hangul               568  38.1%
Common               514  34.5%
Latin                410  27.5%

Most frequent character per script

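Hangul
Value                            Count  Frequency (%)
[Hangul character not preserved]    47   8.3%
[Hangul character not preserved]    47   8.3%
[Hangul character not preserved]    46   8.1%
[Hangul character not preserved]    46   8.1%
[Hangul character not preserved]    40   7.0%
[Hangul character not preserved]    26   4.6%
[Hangul character not preserved]    26   4.6%
[Hangul character not preserved]    18   3.2%
[Hangul character not preserved]    17   3.0%
[Hangul character not preserved]    16   2.8%
Other values (49)                  239  42.1%

Latin
Value              Count  Frequency (%)
B                     58  14.1%
C                     51  12.4%
F                     49  12.0%
P                     49  12.0%
R                     39   9.5%
A                     36   8.8%
M                     20   4.9%
G                     16   3.9%
E                     15   3.7%
O                     13   3.2%
Other values (10)     64  15.6%

Common
Value              Count  Frequency (%)
(space)              119  23.2%
#                     64  12.5%
(                     54  10.5%
)                     54  10.5%
2                     37   7.2%
-                     36   7.0%
/                     32   6.2%
1                     28   5.4%
3                     27   5.3%
4                     24   4.7%
Other values (4)      39   7.6%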

Most occurring blocks

Value              Count  Frequency (%)
ASCII                924  61.9%
Hangul               568  38.1%

Most frequent character per block

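ASCII
Value              Count  Frequency (%)
(space)              119  12.9%
#                     64   6.9%
B                     58   6.3%
(                     54   5.8%
)                     54   5.8%
C                     51   5.5%
F                     49   5.3%
P                     49   5.3%
R                     39   4.2%
2                     37   4.0%
Other values (24)    350  37.9%

Hangul
Value                            Count  Frequency (%)
[Hangul character not preserved]    47   8.3%
[Hangul character not preserved]    47   8.3%
[Hangul character not preserved]    46   8.1%
[Hangul character not preserved]    46   8.1%
[Hangul character not preserved]    40   7.0%
[Hangul character not preserved]    26   4.6%
[Hangul character not preserved]    26   4.6%
[Hangul character not preserved]    18   3.2%
[Hangul character not preserved]    17   3.0%
[Hangul character not preserved]    16   2.8%
Other values (49)                  239  42.1%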

Correlations

[Figure: correlation heatmap]

                        화공약품사용처   화공약품상세사용처명
화공약품사용처                 1.000            0.981
화공약품상세사용처명           0.981            1.000
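
The report does not say which association measure produced the 0.981. One common choice for a pair of categorical columns is Cramér's V, which rescales the chi-squared statistic of the contingency table to the range 0 to 1. The sketch below implements the plain, uncorrected form as an illustration only: the function name `cramers_v` and the toy inputs are assumptions, and a profiler may apply a bias correction or a different measure, so it is not expected to reproduce 0.981 exactly.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared over every cell of the contingency table.
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:  # a constant column carries no association
        return 0.0
    return sqrt(chi2 / (n * k))


# Toy data: the first pair is perfectly associated, the second independent.
v_dependent = cramers_v(["a", "a", "b", "b"], ["p", "p", "q", "q"])
v_independent = cramers_v(["a", "a", "b", "b"], ["p", "q", "p", "q"])
print(v_dependent, v_independent)
```

A value near 1 would be unsurprising here, since each detailed location name belongs to essentially one usage location code.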

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

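First rows

Index  화공약품사용처  화공약품상세사용처  화공약품상세사용처명
0      ASHP           ASHP              회처리장 중화 (ash treatment site neutralization)
1      BLR            BL56              #5,6호기 보일러수처리 (Units 5 and 6 boiler water treatment)
2      BLR            BLR1              #1호기 보일러수처리 (Unit 1 boiler water treatment)
3      BLR            BLR2              #2호기 보일러수처리 (Unit 2 boiler water treatment)
4      BLR            BLR3              #3호기 보일러수처리 (Unit 3 boiler water treatment)
5      BLR            BLR4              #4호기 보일러수처리 (Unit 4 boiler water treatment)
6      BLR            BLR5              #5호기 보일러수처리 (Unit 5 boiler water treatment)
7      BLR            BLR6              #6호기 보일러수처리 (Unit 6 boiler water treatment)
8      CP10           CP11              R-1 재생 (R-1 regeneration)
9      CP10           CP12              R-2 재생 (R-2 regeneration)

Last rows

Index  화공약품사용처  화공약품상세사용처  화공약품상세사용처명
190    WT10           WT13              ACF
191    WT20           WT21              ACF
192    WT30           WT31              2B3T(A)
193    WT30           WT32              2B3T(B)
194    WT30           WT33              MBP(A)
195    WT30           WT34              MBP(B)
196    WT40           XETC              폐수중화용 (for wastewater neutralization)
197    XETC           XETC              기타 (other)
198    XWSH           XWSH              산세정 (acid cleaning)
199    ZMOV           ZMOV              타저장고 이송 (transfer to another storage facility)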

Duplicate rows

Most frequently occurring

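Index  화공약품사용처  화공약품상세사용처  화공약품상세사용처명  # duplicates
24     XETC           XETC              기타 (other)  5
22     WT10           WT13              ACF  4
28     ZMOV           ZMOV              타저장고 이송 (transfer to another storage facility)  4
1      BLR            BLR1              #1호기 보일러수처리 (Unit 1 boiler water treatment)  3
2      BLR            BLR2              #2호기 보일러수처리 (Unit 2 boiler water treatment)  3
12     LIME           LI01              #1호기 FGD (Unit 1 FGD)  3
13     LIME           LI02              #2호기 FGD (Unit 2 FGD)  3
20     WT10           WT11              응집침전조 (coagulation-sedimentation tank)  3
21     WT10           WT12              G/F  3
23     WT20           WT21              ACF  3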
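
The duplicate figures can be reproduced with a simple tally of whole rows. The sketch below assumes, without confirmation from the report, that the count of 29 refers to distinct rows occurring more than once (the table indices running up to 28 are consistent with that reading) and that 14.5% is 29 divided by the 200 observations. The helper name `duplicate_summary` and the ten-row input are illustrative, not the real dataset.

```python
from collections import Counter


def duplicate_summary(rows):
    """Summarise rows that occur more than once in a list of tuples."""
    counts = Counter(rows)
    repeated = {row: c for row, c in counts.items() if c > 1}
    return {
        "duplicate_groups": len(repeated),
        "duplicate_pct": 100 * len(repeated) / len(rows),
        # Most frequently occurring repeated rows first.
        "most_common": sorted(repeated.items(), key=lambda kv: -kv[1]),
    }


# Illustrative ten-row input built from values in the table above.
rows = (
    [("XETC", "XETC", "기타")] * 5
    + [("WT10", "WT13", "ACF")] * 4
    + [("ASHP", "ASHP", "회처리장 중화")]
)
result = duplicate_summary(rows)
print(result["duplicate_groups"], result["duplicate_pct"])
```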