Overview

Dataset statistics

Number of variables: 4
Number of observations: 454
Missing cells: 1
Missing cells (%): 0.1%
Duplicate rows: 1
Duplicate rows (%): 0.2%
Total size in memory: 14.8 KiB
Average record size in memory: 33.3 B
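
The percentages in this table follow from the raw counts. The sketch below redoes the arithmetic, assuming that the missing-cell percentage is taken over all cells (variables times observations) and the duplicate percentage over all rows; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 4
n_observations = 454
missing_cells = 1
duplicate_rows = 1
total_size_kib = 14.8

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
# Only approximate, because the 14.8 KiB total is itself a rounded figure.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"average record size: about {avg_record_bytes:.1f} B")
```

The recomputed record size differs slightly from the reported 33.3 B because it starts from the rounded total rather than the exact byte count.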

Variable types

Text: 3
Categorical: 1

Dataset

Description: Installation status of wastewater discharge facilities in Haman-gun. The data include each facility's business name, street address, industry type, and class.
Author: Haman-gun, Gyeongsangnam-do (경상남도 함안군)
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=3066728

Alerts

Dataset has 1 (0.2%) duplicate rows [Duplicates]
종별 is highly imbalanced (84.2%) [Imbalance]
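
The 84.2% imbalance figure for 종별 can be reproduced from the class counts reported for that variable (431, 21, 1 and 1). The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum possible value. That definition is an assumption about how the figure was computed, not something the report states, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. Assumed definition: the report does not give its formula.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        # Convention chosen for this sketch: a single class scores 0.0.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts of 종별 values 5, 4, 2 and 3 as listed under Common Values.
score = imbalance_score([431, 21, 1, 1])
print(f"imbalance: {100 * score:.1f}%")
```

Under this definition a variable with equal class counts scores 0%, so a value above 80% reflects that almost all rows fall in class 5.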

Reproduction

Analysis started: 2023-12-11 00:28:11.565745
Analysis finished: 2023-12-11 00:28:12.133068
Duration: 0.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

사업장명
Text

Distinct: 448
Distinct (%): 98.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Length

Max length: 22
Median length: 17
Mean length: 6.3436123
Min length: 2

Characters and Unicode

Total characters: 2880
Distinct characters: 304
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
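
As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to tally the general category of each character in the first sample value of this variable. The `CATEGORY_NAMES` mapping and the `category_counts` helper are hypothetical names introduced here; this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes that appear in
# this report's "Most occurring categories" tables.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "So": "Other Symbol",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count characters in `text` by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    # Fall back to the raw two-letter code for categories not listed above.
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


sample = "㈜쎄노텍 4공장"
counts = category_counts(sample)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

The enclosed symbol ㈜ is classified as Other Symbol rather than as a letter, which is why that category appears alongside Other Letter in the tables for this variable.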

Unique

Unique: 442
Unique (%): 97.4%

Sample

1st row: ㈜쎄노텍 4공장
2nd row: 팀일레븐
3rd row: 쌍둥이세차장
4th row: 강남 손 세차장
5th row: 수영 손세차장

Value | Count | Frequency (%)
주식회사 | 5 | 1.0%
함안지점 | 4 | 0.8%
㈜성일에스아이엠 | 3 | 0.6%
신성주유소 | 2 | 0.4%
㈜지티씨 | 2 | 0.4%
제3공장 | 2 | 0.4%
㈜쎄노텍 | 2 | 0.4%
함안공장 | 2 | 0.4%
㈜한국에이요 | 2 | 0.4%
세차장 | 2 | 0.4%
Other values (464) | 475 | 94.8%

Most occurring characters

ValueCountFrequency (%)
240
 
8.3%
79
 
2.7%
73
 
2.5%
62
 
2.2%
59
 
2.0%
58
 
2.0%
57
 
2.0%
55
 
1.9%
54
 
1.9%
) 53
 
1.8%
Other values (294) 2090
72.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2397 | 83.2%
Other Symbol | 240 | 8.3%
Close Punctuation | 54 | 1.9%
Open Punctuation | 54 | 1.9%
Space Separator | 50 | 1.7%
Uppercase Letter | 41 | 1.4%
Decimal Number | 33 | 1.1%
Other Punctuation | 10 | 0.3%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
79
 
3.3%
73
 
3.0%
62
 
2.6%
59
 
2.5%
58
 
2.4%
57
 
2.4%
55
 
2.3%
54
 
2.3%
51
 
2.1%
50
 
2.1%
Other values (261) 1799
75.1%
Uppercase Letter
ValueCountFrequency (%)
E 6
14.6%
C 5
12.2%
G 4
9.8%
T 4
9.8%
N 4
9.8%
H 4
9.8%
S 4
9.8%
P 2
 
4.9%
M 2
 
4.9%
B 2
 
4.9%
Other values (4) 4
9.8%
Decimal Number
ValueCountFrequency (%)
2 14
42.4%
1 10
30.3%
3 3
 
9.1%
8 2
 
6.1%
4 1
 
3.0%
5 1
 
3.0%
0 1
 
3.0%
6 1
 
3.0%
Other Punctuation
ValueCountFrequency (%)
: 4
40.0%
& 4
40.0%
. 1
 
10.0%
* 1
 
10.0%
Close Punctuation
ValueCountFrequency (%)
) 53
98.1%
] 1
 
1.9%
Open Punctuation
ValueCountFrequency (%)
( 53
98.1%
[ 1
 
1.9%
Other Symbol
ValueCountFrequency (%)
240
100.0%
Space Separator
ValueCountFrequency (%)
50
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2637 | 91.6%
Common | 202 | 7.0%
Latin | 41 | 1.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
240
 
9.1%
79
 
3.0%
73
 
2.8%
62
 
2.4%
59
 
2.2%
58
 
2.2%
57
 
2.2%
55
 
2.1%
54
 
2.0%
51
 
1.9%
Other values (262) 1849
70.1%
Common
ValueCountFrequency (%)
) 53
26.2%
( 53
26.2%
50
24.8%
2 14
 
6.9%
1 10
 
5.0%
: 4
 
2.0%
& 4
 
2.0%
3 3
 
1.5%
8 2
 
1.0%
4 1
 
0.5%
Other values (8) 8
 
4.0%
Latin
ValueCountFrequency (%)
E 6
14.6%
C 5
12.2%
G 4
9.8%
T 4
9.8%
N 4
9.8%
H 4
9.8%
S 4
9.8%
P 2
 
4.9%
M 2
 
4.9%
B 2
 
4.9%
Other values (4) 4
9.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2397 | 83.2%
ASCII | 243 | 8.4%
None | 240 | 8.3%

Most frequent character per block

None
ValueCountFrequency (%)
240
100.0%
Hangul
ValueCountFrequency (%)
79
 
3.3%
73
 
3.0%
62
 
2.6%
59
 
2.5%
58
 
2.4%
57
 
2.4%
55
 
2.3%
54
 
2.3%
51
 
2.1%
50
 
2.1%
Other values (261) 1799
75.1%
ASCII
ValueCountFrequency (%)
) 53
21.8%
( 53
21.8%
50
20.6%
2 14
 
5.8%
1 10
 
4.1%
E 6
 
2.5%
C 5
 
2.1%
G 4
 
1.6%
T 4
 
1.6%
: 4
 
1.6%
Other values (22) 40
16.5%

소재지
Text

Distinct: 440
Distinct (%): 96.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Length

Max length: 27
Median length: 25
Mean length: 21.693833
Min length: 18

Characters and Unicode

Total characters: 9849
Distinct characters: 129
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 426
Unique (%): 93.8%

Sample

1st row: 경상남도 함안군 가야읍 서봉로 356
2nd row: 경상남도 함안군 가야읍 가야11길 13
3rd row: 경상남도 함안군 가야읍 가야16길 11
4th row: 경상남도 함안군 가야읍 가야로 103-1
5th row: 경상남도 함안군 가야읍 가야로 132

Value | Count | Frequency (%)
경상남도 | 454 | 20.0%
함안군 | 454 | 20.0%
칠원읍 | 122 | 5.4%
군북면 | 90 | 4.0%
칠서면 | 82 | 3.6%
칠북면 | 37 | 1.6%
법수면 | 34 | 1.5%
산인면 | 29 | 1.3%
가야읍 | 27 | 1.2%
대산면 | 23 | 1.0%
Other values (481) | 920 | 40.5%

Most occurring characters

ValueCountFrequency (%)
2010
20.4%
544
 
5.5%
526
 
5.3%
512
 
5.2%
475
 
4.8%
465
 
4.7%
456
 
4.6%
455
 
4.6%
1 340
 
3.5%
305
 
3.1%
Other values (119) 3761
38.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 6138 | 62.3%
Space Separator | 2010 | 20.4%
Decimal Number | 1546 | 15.7%
Dash Punctuation | 152 | 1.5%
Uppercase Letter | 2 | < 0.1%
Other Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
544
 
8.9%
526
 
8.6%
512
 
8.3%
475
 
7.7%
465
 
7.6%
456
 
7.4%
455
 
7.4%
305
 
5.0%
260
 
4.2%
235
 
3.8%
Other values (104) 1905
31.0%
Decimal Number
ValueCountFrequency (%)
1 340
22.0%
2 216
14.0%
3 189
12.2%
9 131
 
8.5%
4 128
 
8.3%
5 124
 
8.0%
6 122
 
7.9%
7 119
 
7.7%
0 97
 
6.3%
8 80
 
5.2%
Uppercase Letter
ValueCountFrequency (%)
L 1
50.0%
B 1
50.0%
Space Separator
ValueCountFrequency (%)
2010
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 152
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 6138 | 62.3%
Common | 3709 | 37.7%
Latin | 2 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
544
 
8.9%
526
 
8.6%
512
 
8.3%
475
 
7.7%
465
 
7.6%
456
 
7.4%
455
 
7.4%
305
 
5.0%
260
 
4.2%
235
 
3.8%
Other values (104) 1905
31.0%
Common
ValueCountFrequency (%)
2010
54.2%
1 340
 
9.2%
2 216
 
5.8%
3 189
 
5.1%
- 152
 
4.1%
9 131
 
3.5%
4 128
 
3.5%
5 124
 
3.3%
6 122
 
3.3%
7 119
 
3.2%
Other values (3) 178
 
4.8%
Latin
ValueCountFrequency (%)
L 1
50.0%
B 1
50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 6138 | 62.3%
ASCII | 3711 | 37.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2010
54.2%
1 340
 
9.2%
2 216
 
5.8%
3 189
 
5.1%
- 152
 
4.1%
9 131
 
3.5%
4 128
 
3.4%
5 124
 
3.3%
6 122
 
3.3%
7 119
 
3.2%
Other values (5) 180
 
4.9%
Hangul
ValueCountFrequency (%)
544
 
8.9%
526
 
8.6%
512
 
8.3%
475
 
7.7%
465
 
7.6%
456
 
7.4%
455
 
7.4%
305
 
5.0%
260
 
4.2%
235
 
3.8%
Other values (104) 1905
31.0%

업종(배출시설의 분류)
Text

Distinct: 284
Distinct (%): 62.7%
Missing: 1
Missing (%): 0.2%
Memory size: 3.7 KiB

Length

Max length: 24
Median length: 20
Mean length: 10.81457
Min length: 2

Characters and Unicode

Total characters: 4899
Distinct characters: 226
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 221
Unique (%): 48.8%

Sample

1st row: 비금속광물 분쇄물 생산업
2nd row: 자동차세차업
3rd row: 자동차 세차업
4th row: 자동차세차업
5th row: 세차업
ValueCountFrequency (%)
93
 
10.0%
제조업 66
 
7.1%
30
 
3.2%
절삭가공 25
 
2.7%
유사처리업 25
 
2.7%
기타 23
 
2.5%
20
 
2.1%
금속열처리업 12
 
1.3%
세차업 11
 
1.2%
선박구성부분품제조업 10
 
1.1%
Other values (382) 618
66.2%

Most occurring characters

ValueCountFrequency (%)
481
 
9.8%
344
 
7.0%
329
 
6.7%
304
 
6.2%
219
 
4.5%
130
 
2.7%
130
 
2.7%
115
 
2.3%
107
 
2.2%
99
 
2.0%
Other values (216) 2641
53.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4360 | 89.0%
Space Separator | 481 | 9.8%
Other Punctuation | 33 | 0.7%
Decimal Number | 23 | 0.5%
Close Punctuation | 1 | < 0.1%
Open Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
344
 
7.9%
329
 
7.5%
304
 
7.0%
219
 
5.0%
130
 
3.0%
130
 
3.0%
115
 
2.6%
107
 
2.5%
99
 
2.3%
97
 
2.2%
Other values (206) 2486
57.0%
Decimal Number
ValueCountFrequency (%)
1 13
56.5%
2 5
 
21.7%
3 3
 
13.0%
9 1
 
4.3%
5 1
 
4.3%
Other Punctuation
ValueCountFrequency (%)
, 32
97.0%
. 1
 
3.0%
Space Separator
ValueCountFrequency (%)
481
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4360 | 89.0%
Common | 539 | 11.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
344
 
7.9%
329
 
7.5%
304
 
7.0%
219
 
5.0%
130
 
3.0%
130
 
3.0%
115
 
2.6%
107
 
2.5%
99
 
2.3%
97
 
2.2%
Other values (206) 2486
57.0%
Common
ValueCountFrequency (%)
481
89.2%
, 32
 
5.9%
1 13
 
2.4%
2 5
 
0.9%
3 3
 
0.6%
) 1
 
0.2%
9 1
 
0.2%
( 1
 
0.2%
. 1
 
0.2%
5 1
 
0.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4360 | 89.0%
ASCII | 539 | 11.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
481
89.2%
, 32
 
5.9%
1 13
 
2.4%
2 5
 
0.9%
3 3
 
0.6%
) 1
 
0.2%
9 1
 
0.2%
( 1
 
0.2%
. 1
 
0.2%
5 1
 
0.2%
Hangul
ValueCountFrequency (%)
344
 
7.9%
329
 
7.5%
304
 
7.0%
219
 
5.0%
130
 
3.0%
130
 
3.0%
115
 
2.6%
107
 
2.5%
99
 
2.3%
97
 
2.2%
Other values (206) 2486
57.0%

종별
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB
5: 431
4: 21
2: 1
3: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 2
Unique (%): 0.4%
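
The difference between the Distinct and Unique figures can be checked against the value counts for 종별. The sketch below assumes that "Distinct" counts how many different values occur and "Unique" counts the values that occur exactly once. That reading is an interpretation, but it is consistent with the numbers reported for this variable.

```python
from collections import Counter

# Rebuild the 종별 column from its reported value counts.
values = ["5"] * 431 + ["4"] * 21 + ["2"] + ["3"]
counts = Counter(values)

n_distinct = len(counts)                               # different values
n_unique = sum(1 for c in counts.values() if c == 1)   # values seen once

print(f"distinct: {n_distinct} ({100 * n_distinct / len(values):.1f}%)")
print(f"unique: {n_unique} ({100 * n_unique / len(values):.1f}%)")
```

Here classes 2 and 3 each occur once, so two of the four distinct values count as unique.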

Sample

1st row: 5
2nd row: 5
3rd row: 5
4th row: 5
5th row: 5

Common Values

Value | Count | Frequency (%)
5 | 431 | 94.9%
4 | 21 | 4.6%
2 | 1 | 0.2%
3 | 1 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
5 | 431 | 94.9%
4 | 21 | 4.6%
2 | 1 | 0.2%
3 | 1 | 0.2%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 사업장명 | 소재지 | 업종(배출시설의 분류) | 종별
0 | ㈜쎄노텍 4공장 | 경상남도 함안군 가야읍 서봉로 356 | 비금속광물 분쇄물 생산업 | 5
1 | 팀일레븐 | 경상남도 함안군 가야읍 가야11길 13 | 자동차세차업 | 5
2 | 쌍둥이세차장 | 경상남도 함안군 가야읍 가야16길 11 | 자동차 세차업 | 5
3 | 강남 손 세차장 | 경상남도 함안군 가야읍 가야로 103-1 | 자동차세차업 | 5
4 | 수영 손세차장 | 경상남도 함안군 가야읍 가야로 132 | 세차업 | 5
5 | 함안주유소 | 경상남도 함안군 가야읍 가야로 64 | 주유소 | 5
6 | 함안셀프세차장 | 경상남도 함안군 가야읍 검암천북길 19 | 자동차세차업 | 5
7 | ㈜동신모텍 함안지점 | 경상남도 함안군 가야읍 남문길 29-3 | 그 외 자동차용 신품 부품제조 | 5
8 | ㈜원일 | 경상남도 함안군 가야읍 남문길66 | 내연기관제조업(29111) | 5
9 | 농업회사법인㈜아라식품 | 경상남도 함안군 가야읍 도음길 335-52 | 과실, 채소가공 및 저장처리업 | 5

Index | 사업장명 | 소재지 | 업종(배출시설의 분류) | 종별
444 | ㈜케이씨피 제5공장 | 경상남도 함안군 함안면 광정로 312 | 토목공사 및 유사기계 장비 제조업 | 5
445 | 조아제약㈜ | 경상남도 함안군 함안면 광정로 318 | 의약품제조 | 4
446 | 금성열처리 | 경상남도 함안군 함안면 광정로 330-14 | 금속열처리 | 5
447 | ㈜신화모텍함안지점 | 경상남도 함안군 함안면 광정로 330-2 | 금속제품제조가공 | 5
448 | ㈜오양기업 함안공장 | 경상남도 함안군 함안면 광정로 344-17 | 도장 및 기타피막처리업 | 4
449 | ㈜이룸 | 경상남도 함안군 함안면 광정로 372 | 혼성 및 재생플라스틱 소재물질 제조업 | 5
450 | 신진물산㈜ | 경상남도 함안군 함안면 봉성1길 41 | 식료품제조 | 3
451 | 지리산농산 | 경상남도 함안군 함안면 봉수로 715 | 절임식품제조 | 5
452 | ㈜케이씨피 제3공장 | 경상남도 함안군 함안면 봉수로 721 | 토목공사및유사기계 장비제조 | 5
453 | 동원ENG | 경상남도 함안군 함안면 봉수로 733 | 금속조립구조제 제조업 | 5

Duplicate rows

Most frequently occurring

Index | 사업장명 | 소재지 | 업종(배출시설의 분류) | 종별 | # duplicates
0 | ㈜삼보산업 | 경상남도 함안군 대산면 송산로 621 | 비금속광물제품제조 | 5 | 2