gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	200
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	29
Duplicate rows (%)	14.5%
Total size in memory	4.8 KiB
Average record size in memory	24.7 B

Variable types

Categorical	1
Text	2

Dataset

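Description	Chemical operation information from the environmental chemistry system of Korea South-East Power (한국남동발전). It includes chemical usage locations and detailed usage locations such as the boiler water treatment room.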
URL	https://www.data.go.kr/data/15093021/fileData.do

Alerts

Dataset has 29 (14.5%) duplicate rows

Duplicates

Reproduction

Analysis started	2023-12-12 21:09:01.725040
Analysis finished	2023-12-12 21:09:02.028299
Duration	0.3 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

화공약품사용처 (chemical usage location)
Categorical

Distinct	34
Distinct (%)	17.0%
Missing	0
Missing (%)	0.0%
Memory size	1.7 KiB

WT30	22
BLR	20
WT10	12
LIME	12
SCR	10
Other values (29)	124

Length

Max length	4
Median length	4
Mean length	3.76
Min length	2

Unique

Unique	6
Unique (%)	3.0%
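The report lists both a Distinct count (34) and a Unique count (6) for this column. A minimal sketch of the difference, assuming Distinct means the number of different values and Unique means the number of values that occur exactly once; the helper name `distinct_and_unique` is hypothetical, and the input is only the first ten rows shown under "First rows", not the full 200-row column.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) for a sequence of values.

    distinct: how many different values appear at all.
    unique:   how many values appear exactly once (assumed meaning).
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# First ten rows of the column as listed under "First rows" in this report.
first_rows = ["ASHP", "BLR", "BLR", "BLR", "BLR", "BLR", "BLR", "BLR", "CP10", "CP10"]

distinct, unique = distinct_and_unique(first_rows)
print(f"distinct={distinct}, unique={unique}")
```

On the full column the same calculation would be expected to give the 34 and 6 reported above, provided the assumed definitions hold.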

Sample

1st row	ASHP
2nd row	BLR
3rd row	BLR
4th row	BLR
5th row	BLR

Common Values

Value	Count	Frequency (%)
WT30	22	11.0%
BLR	20	10.0%
WT10	12	6.0%
LIME	12	6.0%
SCR	10	5.0%
DFRM	8	4.0%
CP40	8	4.0%
EP	8	4.0%
CP60	8	4.0%
CP50	8	4.0%
Other values (24)	84	42.0%

Length

[Figure: Histogram of lengths of the category]

Words

Value	Count	Frequency (%)
wt30	22	11.0%
blr	20	10.0%
wt10	12	6.0%
lime	12	6.0%
scr	10	5.0%
cp60	8	4.0%
cp30	8	4.0%
cp50	8	4.0%
ep	8	4.0%
cp40	8	4.0%
Other values (24)	84	42.0%

화공약품상세사용처 (detailed chemical usage location)
Text

Distinct	123
Distinct (%)	61.5%
Missing	0
Missing (%)	0.0%
Memory size	1.7 KiB

Length

Max length	4
Median length	4
Mean length	3.99
Min length	3

Characters and Unicode

Total characters	798
Distinct characters	28
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
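As an illustration of the character properties mentioned above, the following sketch counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own code; the helper name `category_counts` is hypothetical, and the input is only the five sample rows listed for this variable.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Nd') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample rows shown for this variable in the report.
sample = ["ASHP", "BL56", "BLR1", "BLR2", "BLR3"]

counts = category_counts(sample)
# 'Lu' is Uppercase Letter, 'Nd' is Decimal Number.
print(dict(counts))
```

The two-letter codes map to the category names used in the tables of this report: `Lu` is "Uppercase Letter" and `Nd` is "Decimal Number".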

Unique

Unique	85
Unique (%)	42.5%

Sample

1st row	ASHP
2nd row	BL56
3rd row	BLR1
4th row	BLR2
5th row	BLR3

Words

Value	Count	Frequency (%)
xetc	6	3.0%
wt33	4	2.0%
wt32	4	2.0%
wt34	4	2.0%
gwst	4	2.0%
wt11	4	2.0%
wt13	4	2.0%
wt21	4	2.0%
wt31	4	2.0%
wt12	4	2.0%
Other values (113)	158	79.0%

Most occurring characters

Value	Count	Frequency (%)
W	84	10.5%
C	70	8.8%
3	70	8.8%
1	69	8.6%
P	58	7.3%
T	52	6.5%
2	50	6.3%
0	39	4.9%
S	38	4.8%
4	30	3.8%
Other values (18)	238	29.8%

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	478	59.9%
Decimal Number	320	40.1%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
W	84	17.6%
C	70	14.6%
P	58	12.1%
T	52	10.9%
S	38	7.9%
L	28	5.9%
B	26	5.4%
R	20	4.2%
X	14	2.9%
E	14	2.9%
Other values (10)	74	15.5%

Decimal Number

Value	Count	Frequency (%)
3	70	21.9%
1	69	21.6%
2	50	15.6%
0	39	12.2%
4	30	9.4%
5	26	8.1%
6	25	7.8%
7	11	3.4%

Most occurring scripts

Value	Count	Frequency (%)
Latin	478	59.9%
Common	320	40.1%

Most frequent character per script

Latin

Value	Count	Frequency (%)
W	84	17.6%
C	70	14.6%
P	58	12.1%
T	52	10.9%
S	38	7.9%
L	28	5.9%
B	26	5.4%
R	20	4.2%
X	14	2.9%
E	14	2.9%
Other values (10)	74	15.5%

Common

Value	Count	Frequency (%)
3	70	21.9%
1	69	21.6%
2	50	15.6%
0	39	12.2%
4	30	9.4%
5	26	8.1%
6	25	7.8%
7	11	3.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	798	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
W	84	10.5%
C	70	8.8%
3	70	8.8%
1	69	8.6%
P	58	7.3%
T	52	6.5%
2	50	6.3%
0	39	4.9%
S	38	4.8%
4	30	3.8%
Other values (18)	238	29.8%

화공약품상세사용처명 (detailed chemical usage location name)
Text

Distinct	97
Distinct (%)	48.5%
Missing	0
Missing (%)	0.0%
Memory size	1.7 KiB

Length

Max length	15
Median length	13
Mean length	7.46
Min length	2

Characters and Unicode

Total characters	1492
Distinct characters	93
Distinct categories	8
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
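This variable mixes ASCII and Hangul characters, which is why two blocks are reported. The sketch below shows one way to bucket characters by block, assuming hard-coded code point ranges for Basic Latin (U+0000 to U+007F, labelled "ASCII" to match the report) and Hangul Syllables (U+AC00 to U+D7AF). Python's standard library has no block lookup, so the ranges and the helper name `block_counts` are assumptions for illustration only.

```python
from collections import Counter

# Assumed block ranges (inclusive); anything else is reported as "Other".
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),   # Basic Latin
    ("Hangul", 0xAC00, 0xD7AF),  # Hangul Syllables
]


def block_of(ch):
    """Return the assumed block label for a single character."""
    code = ord(ch)
    for name, start, end in BLOCKS:
        if start <= code <= end:
            return name
    return "Other"


def block_counts(value):
    """Count characters of a string per block label."""
    return Counter(block_of(ch) for ch in value)


# Third sample row of this variable.
counts = block_counts("#1호기 보일러수처리")
print(dict(counts))
```

For this one value the ASCII bucket holds the `#`, the digit, and the space, and the remaining eight characters are Hangul syllables.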

Unique

Unique	57
Unique (%)	28.5%

Sample

1st row	회처리장 중화
2nd row	#5,6호기 보일러수처리
3rd row	#1호기 보일러수처리
4th row	#2호기 보일러수처리
5th row	#3호기 보일러수처리

Words

Value	Count	Frequency (%)
재생	46	14.4%
보일러수처리	14	4.4%
2호기	10	3.1%
scr	10	3.1%
1호기	10	3.1%
r-4	10	3.1%
fgd	10	3.1%
acf	9	2.8%
ep	8	2.5%
g/f	6	1.9%
Other values (81)	186	58.3%
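The table above appears to count lowercase, whitespace-separated tokens with surrounding punctuation removed (for example `2호기` without its leading `#`, while `g/f` and `r-4` keep their inner punctuation). A rough sketch of such a count follows; it approximates the behaviour seen in the table and is not taken from the profiler's source. The helper name `word_counts` is hypothetical, and the four input values are rows that appear elsewhere in this report.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase whitespace-separated tokens, stripping edge punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:  # skip tokens that were punctuation only
                counts[token] += 1
    return counts


# Four values taken from the row listings in this report.
sample = ["#1호기 FGD", "#2호기 FGD", "R-1 재생", "R-2 재생"]

counts = word_counts(sample)
print(counts.most_common())
```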

Most occurring characters

Value	Count	Frequency (%)
(space)	119	8.0%
#	64	4.3%
B	58	3.9%
(	54	3.6%
)	54	3.6%
C	51	3.4%
F	49	3.3%
P	49	3.3%
수	47	3.2%
기	47	3.2%
Other values (83)	900	60.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	568	38.1%
Uppercase Letter	410	27.5%
Decimal Number	136	9.1%
Space Separator	119	8.0%
Other Punctuation	115	7.7%
Open Punctuation	54	3.6%
Close Punctuation	54	3.6%
Dash Punctuation	36	2.4%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
수	47	8.3%
기	47	8.3%
재	46	8.1%
생	46	8.1%
호	40	7.0%
리	26	4.6%
처	26	4.6%
용	18	3.2%
세	17	3.0%
러	16	2.8%
Other values (49)	239	42.1%

Uppercase Letter

Value	Count	Frequency (%)
B	58	14.1%
C	51	12.4%
F	49	12.0%
P	49	12.0%
R	39	9.5%
A	36	8.8%
M	20	4.9%
G	16	3.9%
E	15	3.7%
O	13	3.2%
Other values (10)	64	15.6%

Decimal Number

Value	Count	Frequency (%)
2	37	27.2%
1	28	20.6%
3	27	19.9%
4	24	17.6%
5	10	7.4%
6	10	7.4%

Other Punctuation

Value	Count	Frequency (%)
#	64	55.7%
/	32	27.8%
,	12	10.4%
.	7	6.1%

Space Separator

Value	Count	Frequency (%)
(space)	119	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	54	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	54	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	36	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	568	38.1%
Common	514	34.5%
Latin	410	27.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
수	47	8.3%
기	47	8.3%
재	46	8.1%
생	46	8.1%
호	40	7.0%
리	26	4.6%
처	26	4.6%
용	18	3.2%
세	17	3.0%
러	16	2.8%
Other values (49)	239	42.1%

Latin

Value	Count	Frequency (%)
B	58	14.1%
C	51	12.4%
F	49	12.0%
P	49	12.0%
R	39	9.5%
A	36	8.8%
M	20	4.9%
G	16	3.9%
E	15	3.7%
O	13	3.2%
Other values (10)	64	15.6%

Common

Value	Count	Frequency (%)
(space)	119	23.2%
#	64	12.5%
(	54	10.5%
)	54	10.5%
2	37	7.2%
-	36	7.0%
/	32	6.2%
1	28	5.4%
3	27	5.3%
4	24	4.7%
Other values (4)	39	7.6%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	924	61.9%
Hangul	568	38.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	119	12.9%
#	64	6.9%
B	58	6.3%
(	54	5.8%
)	54	5.8%
C	51	5.5%
F	49	5.3%
P	49	5.3%
R	39	4.2%
2	37	4.0%
Other values (24)	350	37.9%

Hangul

Value	Count	Frequency (%)
수	47	8.3%
기	47	8.3%
재	46	8.1%
생	46	8.1%
호	40	7.0%
리	26	4.6%
처	26	4.6%
용	18	3.2%
세	17	3.0%
러	16	2.8%
Other values (49)	239	42.1%

Phik (φk)


	화공약품사용처	화공약품상세사용처명
화공약품사용처	1.000	0.981
화공약품상세사용처명	0.981	1.000

Missing values

[Figure: Count] A simple visualization of nullity by column.

[Figure: Matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	화공약품사용처	화공약품상세사용처	화공약품상세사용처명
0	ASHP	ASHP	회처리장 중화
1	BLR	BL56	#5,6호기 보일러수처리
2	BLR	BLR1	#1호기 보일러수처리
3	BLR	BLR2	#2호기 보일러수처리
4	BLR	BLR3	#3호기 보일러수처리
5	BLR	BLR4	#4호기 보일러수처리
6	BLR	BLR5	#5호기 보일러수처리
7	BLR	BLR6	#6호기 보일러수처리
8	CP10	CP11	R-1 재생
9	CP10	CP12	R-2 재생

Last rows

	화공약품사용처	화공약품상세사용처	화공약품상세사용처명
190	WT10	WT13	ACF
191	WT20	WT21	ACF
192	WT30	WT31	2B3T(A)
193	WT30	WT32	2B3T(B)
194	WT30	WT33	MBP(A)
195	WT30	WT34	MBP(B)
196	WT40	XETC	폐수중화용
197	XETC	XETC	기타
198	XWSH	XWSH	산세정
199	ZMOV	ZMOV	타저장고 이송

Most frequently occurring

	화공약품사용처	화공약품상세사용처	화공약품상세사용처명	# duplicates
24	XETC	XETC	기타	5
22	WT10	WT13	ACF	4
28	ZMOV	ZMOV	타저장고 이송	4
1	BLR	BLR1	#1호기 보일러수처리	3
2	BLR	BLR2	#2호기 보일러수처리	3
12	LIME	LI01	#1호기 FGD	3
13	LIME	LI02	#2호기 FGD	3
20	WT10	WT11	응집침전조	3
21	WT10	WT12	G/F	3
23	WT20	WT21	ACF	3
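The duplicate table lists row patterns together with how often each one repeats. A minimal sketch of that grouping follows, assuming the "Duplicate rows" figure of 29 counts distinct repeated row patterns rather than surplus rows (the highest index shown in the table is 28, which would fit). The helper name `duplicate_groups` is hypothetical, and the five input rows are a toy subset for illustration, not the real counts.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: occurrences} for every row pattern seen more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Toy subset built from rows shown in this report; counts are illustrative.
rows = [
    ("XETC", "XETC", "기타"),
    ("XETC", "XETC", "기타"),
    ("WT10", "WT13", "ACF"),
    ("WT10", "WT13", "ACF"),
    ("ASHP", "ASHP", "회처리장 중화"),
]

groups = duplicate_groups(rows)
share = len(groups) / len(rows)  # repeated patterns relative to total rows
print(len(groups), f"{share:.1%}")
```

Under the same assumption, the report's 14.5% is simply 29 repeated patterns divided by 200 observations.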
