gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	363
Missing cells	254
Missing cells (%)	35.0%
Duplicate rows	1
Duplicate rows (%)	0.3%
Total size in memory	5.8 KiB
Average record size in memory	16.4 B
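The overview figures come from simple counts. Missing cells are counted over all 363 × 2 = 726 cells, so 254 / 726 ≈ 35.0%. A duplicate row is one that is identical to an earlier row, so 1 / 363 ≈ 0.3%. The minimal pandas sketch below reproduces the same arithmetic on a small hypothetical frame rather than the real dataset. Its values, including the phone number, are placeholders.

```python
import pandas as pd

# Hypothetical toy data with the report's two columns; the values are placeholders.
df = pd.DataFrame(
    {
        "업체명": ["A", "B", "B", "C"],
        "연락처": [pd.NA, "031-000-0000", "031-000-0000", pd.NA],
    }
)

n_vars = df.shape[1]                            # "Number of variables"
n_obs = df.shape[0]                             # "Number of observations"
missing_cells = int(df.isna().sum().sum())      # NA cells across every column
missing_pct = 100 * missing_cells / df.size     # df.size == n_obs * n_vars
duplicate_rows = int(df.duplicated().sum())     # rows identical to an earlier row
duplicate_pct = 100 * duplicate_rows / n_obs

print(n_vars, n_obs, missing_cells, f"{missing_pct:.1f}%")   # 2 4 2 25.0%
print(duplicate_rows, f"{duplicate_pct:.1f}%")               # 1 25.0%
```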

Variable types

Text	2

Dataset

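Description	Business name and contact number (partial; mobile phone numbers excluded) of mail-order businesses registered in Osan-si, Gyeonggi-do, that registered to sell food.
Author	Osan-si, Gyeonggi-do (경기도 오산시)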
URL	https://www.data.go.kr/data/15085719/fileData.do

Alerts

Dataset has 1 (0.3%) duplicate rows	Duplicates
`연락처` has 254 (70.0%) missing values	Missing

Reproduction

Analysis started	2023-12-12 13:18:57.975103
Analysis finished	2023-12-12 13:18:58.703812
Duration	0.73 seconds
Software version	ydata-profiling v4.5.1
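The reported Duration is the difference between the two timestamps above. A quick check with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2023-12-12 13:18:57.975103")
finished = datetime.fromisoformat("2023-12-12 13:18:58.703812")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # prints "0.73 seconds"
```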

업체명 (business name)
Text

Distinct	361
Distinct (%)	99.4%
Missing	0
Missing (%)	0.0%
Memory size	3.0 KiB

Length

Max length	33
Median length	21
Mean length	6.9807163
Min length	1
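Length is the number of characters (Unicode code points) in each value. The mean agrees with the character total reported below: 6.9807163 × 363 ≈ 2,534. The sketch below computes the same kind of summary for only the five sample names shown further down, so its numbers differ from the table. Python's `len()` counts each precomposed Hangul syllable as one character.

```python
from statistics import mean, median

# The five sample names listed for this column.
names = [
    "삼우인터내셔널",
    "대원축산",
    "주식회사 에스제이우솔(SJ Woosol Co. Ltd.)",
    "갓팩토리",
    "다모어엠(Damore M)",
]
lengths = [len(n) for n in names]   # code points per value
print(lengths)                      # [7, 4, 31, 4, 14]
print(max(lengths), median(lengths), mean(lengths), min(lengths))

# Consistency check on the full column: mean length x observations = total characters.
print(round(6.9807163 * 363))       # 2534
```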

Characters and Unicode

Total characters	2534
Distinct characters	464
Distinct categories	11
Distinct scripts	4
Distinct blocks	4

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
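Python's standard `unicodedata` module exposes the same general-category property. The sketch below maps the two-letter category codes to the long names used in the tables. The mapping dictionary is written by hand and covers only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "So": "Other Symbol",
    "Pc": "Connector Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category (long name)."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


print(category_counts("다모어엠(Damore M)"))
```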

Unique

Unique	359
Unique (%)	98.9%
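Distinct counts the different values, while Unique counts the values that occur exactly once. Both percentages are taken over non-missing values, which is why 연락처 reports its 109 distinct values as 100.0% even though the dataset has 363 rows. For 업체명, there are 361 distinct but 359 unique values among 363 rows. That means exactly two names occur twice each, because 359 + 2 × 2 = 363. The stdlib sketch below uses a hypothetical column, with `None` standing in for a missing cell.

```python
from collections import Counter


def distinct_unique(values):
    """Return (n_present, distinct, unique) over the non-missing values."""
    present = [v for v in values if v is not None]      # None marks a missing cell
    counts = Counter(present)
    distinct = len(counts)                               # how many different values
    unique = sum(1 for n in counts.values() if n == 1)   # values seen exactly once
    return len(present), distinct, unique


# Hypothetical column: "B" and "C" each appear twice, so they are distinct but not unique.
n, distinct, unique = distinct_unique(["A", "B", "B", "C", "C", "D", None])
print(n, distinct, unique)  # 6 4 2
print(f"{100 * distinct / n:.1f}% distinct, {100 * unique / n:.1f}% unique")
```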

Sample

1st row	삼우인터내셔널
2nd row	대원축산
3rd row	주식회사 에스제이우솔(SJ Woosol Co. Ltd.)
4th row	갓팩토리
5th row	다모어엠(Damore M)

Most occurring words

Value	Count	Frequency (%)
주식회사	36	7.5%
오산점	4	0.8%
주	4	0.8%
농업회사법인	3	0.6%
포트오브모카	2	0.4%
food	2	0.4%
유한회사	2	0.4%
더	2	0.4%
벽돌집	2	0.4%
system	2	0.4%
Other values (423)	423	87.8%
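Word percentages are relative to all words, not to rows. The counts in the table sum to 482, and 36 / 482 ≈ 7.5%. The table is consistent with three steps: lower-casing, splitting on whitespace, and stripping punctuation from both ends of each token. Under that reading, 주 presumably comes from "(주)", and the phone numbers in the 연락처 table keep their internal hyphens. The sketch below follows that assumption. It is an approximation, not the profiler's own code.

```python
import string
from collections import Counter


def word_counts(values):
    """Lower-case, split on whitespace, strip punctuation from both ends, count."""
    words = (
        token.strip(string.punctuation)
        for value in values
        for token in value.lower().split()
    )
    return Counter(w for w in words if w)  # drop tokens that were only punctuation


counts = word_counts([
    "주식회사 텐바이오",
    "주식회사 에스제이우솔(SJ Woosol Co. Ltd.)",
    "청아람 Food System",
])
total = sum(counts.values())
for word, n in counts.most_common(3):
    print(word, n, f"{100 * n / total:.1f}%")
```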

Most occurring characters

Value	Count	Frequency (%)
(space)	120	4.7%
이	74	2.9%
주	65	2.6%
(	61	2.4%
)	61	2.4%
스	54	2.1%
사	53	2.1%
식	50	2.0%
회	47	1.9%
산	32	1.3%
Other values (454)	1917	75.7%
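Character frequencies are percentages of all characters in the column, and the figures are internally consistent. The ten listed characters account for 617 of the 2,534 characters. That leaves 2,534 − 617 = 1,917 for the 454 other distinct characters, giving 10 + 454 = 464 distinct characters in total. The sketch below builds a table of the same shape, including the "Other values" row, from three names in the sample.

```python
from collections import Counter


def char_table(values, top=10):
    """Top-`top` characters as (char, count, % of all characters), plus an 'Other values' row."""
    counts = Counter(ch for v in values for ch in v)
    total = sum(counts.values())
    ranked = counts.most_common()                 # ties keep first-seen order
    rows = [(ch, n, round(100 * n / total, 1)) for ch, n in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(n for _, n in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / total, 1)))
    return rows


rows = char_table(["대원축산", "백세식품", "세건홍삼전문점"], top=2)
for row in rows:
    print(*row)
```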

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1983	78.3%
Uppercase Letter	143	5.6%
Lowercase Letter	139	5.5%
Space Separator	120	4.7%
Open Punctuation	61	2.4%
Close Punctuation	61	2.4%
Other Punctuation	12	0.5%
Decimal Number	11	0.4%
Dash Punctuation	2	0.1%
Other Symbol	1	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
이	74	3.7%
주	65	3.3%
스	54	2.7%
사	53	2.7%
식	50	2.5%
회	47	2.4%
산	32	1.6%
오	31	1.6%
리	30	1.5%
드	26	1.3%
Other values (394)	1521	76.7%

Uppercase Letter

Value	Count	Frequency (%)
E	13	9.1%
S	11	7.7%
A	11	7.7%
N	10	7.0%
F	9	6.3%
C	9	6.3%
L	8	5.6%
O	8	5.6%
D	7	4.9%
R	6	4.2%
Other values (13)	51	35.7%

Lowercase Letter

Value	Count	Frequency (%)
o	19	13.7%
e	15	10.8%
n	12	8.6%
i	11	7.9%
a	11	7.9%
m	11	7.9%
s	9	6.5%
l	8	5.8%
t	8	5.8%
r	6	4.3%
Other values (11)	29	20.9%

Decimal Number

Value	Count	Frequency (%)
0	3	27.3%
2	3	27.3%
9	1	9.1%
4	1	9.1%
5	1	9.1%
8	1	9.1%
7	1	9.1%

Other Punctuation

Value	Count	Frequency (%)
.	9	75.0%
&	2	16.7%
'	1	8.3%

Space Separator

Value	Count	Frequency (%)
(space)	120	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	61	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	61	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	2	100.0%

Other Symbol

Value	Count	Frequency (%)
㈜	1	100.0%

Connector Punctuation

Value	Count	Frequency (%)
_	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1982	78.2%
Latin	282	11.1%
Common	268	10.6%
Han	2	0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
이	74	3.7%
주	65	3.3%
스	54	2.7%
사	53	2.7%
식	50	2.5%
회	47	2.4%
산	32	1.6%
오	31	1.6%
리	30	1.5%
드	26	1.3%
Other values (393)	1520	76.7%

Latin

Value	Count	Frequency (%)
o	19	6.7%
e	15	5.3%
E	13	4.6%
n	12	4.3%
i	11	3.9%
S	11	3.9%
A	11	3.9%
a	11	3.9%
m	11	3.9%
N	10	3.5%
Other values (34)	158	56.0%

Common

Value	Count	Frequency (%)
(space)	120	44.8%
(	61	22.8%
)	61	22.8%
.	9	3.4%
0	3	1.1%
2	3	1.1%
-	2	0.7%
&	2	0.7%
9	1	0.4%
'	1	0.4%
Other values (5)	5	1.9%

Han

Value	Count	Frequency (%)
天	1	50.0%
山	1	50.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1981	78.2%
ASCII	550	21.7%
CJK	2	0.1%
None	1	< 0.1%
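Blocks are contiguous code-point ranges defined in the Unicode Character Database (Blocks.txt). The report's labels appear to be coarse groupings of those blocks: Hangul, ASCII, and CJK. The character ㈜ (U+321C) lies in the Enclosed CJK Letters and Months block. The profiler lists it as None, presumably because its lookup does not map that range. That would also account for the Hangul script count (1,982) being one higher than the Hangul block count (1,981). The sketch below uses a hand-written table covering only the four ranges relevant here; a complete lookup would load the full Blocks.txt.

```python
# Hand-written subset of Unicode blocks (start, end, name); the full list is in Blocks.txt.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),                       # the report's "ASCII"
    (0x3200, 0x32FF, "Enclosed CJK Letters and Months"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]


def block_of(ch: str):
    """Return the block name for a single character, or None if not in this subset."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


for ch in ["이", "E", "天", "㈜"]:
    print(ch, f"U+{ord(ch):04X}", block_of(ch))
```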

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	120	21.8%
(	61	11.1%
)	61	11.1%
o	19	3.5%
e	15	2.7%
E	13	2.4%
n	12	2.2%
i	11	2.0%
S	11	2.0%
A	11	2.0%
Other values (49)	216	39.3%

Hangul

Value	Count	Frequency (%)
이	74	3.7%
주	65	3.3%
스	54	2.7%
사	53	2.7%
식	50	2.5%
회	47	2.4%
산	32	1.6%
오	31	1.6%
리	30	1.5%
드	26	1.3%
Other values (392)	1519	76.7%

CJK

Value	Count	Frequency (%)
天	1	50.0%
山	1	50.0%

None

Value	Count	Frequency (%)
㈜	1	100.0%

연락처 (contact number)
Text

MISSING

Distinct	109
Distinct (%)	100.0%
Missing	254
Missing (%)	70.0%
Memory size	3.0 KiB

Length

Max length	13
Median length	12
Mean length	12.174312
Min length	11

Characters and Unicode

Total characters	1327
Distinct characters	11
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

Unique

Unique	109
Unique (%)	100.0%

Sample

1st row	031-374-9213
2nd row	031-373-4301
3rd row	031-378-2288
4th row	031-376-4941
5th row	031-374-0124
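The 연락처 values share a hyphenated format. The character tables below count 218 hyphens across 109 values, which matches two separators per number, and lengths range from 11 to 13 characters. The sketch below checks that format and groups numbers by the prefix before the first hyphen. The regular expression is an assumption inferred from the sample values, not a validation rule taken from the dataset.

```python
import re
from collections import Counter

# Assumed format: a prefix starting with 0, then two digit groups, hyphen-separated.
PHONE = re.compile(r"0\d{1,3}-\d{3,4}-\d{4}")

# Values taken from this report.
numbers = ["031-374-9213", "070-7620-2339", "0505-314-1751", "031-377-6180"]

prefixes = Counter()
for num in numbers:
    if PHONE.fullmatch(num) is None:
        print("unexpected format:", num)
        continue
    prefixes[num.split("-", 1)[0]] += 1   # text before the first hyphen

print(prefixes)
print([len(num) for num in numbers])              # [12, 13, 13, 12]
print(sum(num.count("-") for num in numbers))     # 8
```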

Most occurring words

Value	Count	Frequency (%)
031-377-8795	1	0.9%
031-377-5866	1	0.9%
070-7620-2339	1	0.9%
0505-314-1751	1	0.9%
031-378-8745	1	0.9%
070-4412-8852	1	0.9%
031-374-5004	1	0.9%
031-375-6130	1	0.9%
031-393-0441	1	0.9%
070-7526-7796	1	0.9%
Other values (99)	99	90.8%

Most occurring characters

Value	Count	Frequency (%)
3	234	17.6%
-	218	16.4%
0	179	13.5%
1	147	11.1%
7	147	11.1%
5	77	5.8%
6	75	5.7%
2	73	5.5%
8	71	5.4%
4	59	4.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	1109	83.6%
Dash Punctuation	218	16.4%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
3	234	21.1%
0	179	16.1%
1	147	13.3%
7	147	13.3%
5	77	6.9%
6	75	6.8%
2	73	6.6%
8	71	6.4%
4	59	5.3%
9	47	4.2%

Dash Punctuation

Value	Count	Frequency (%)
-	218	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	1327	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
3	234	17.6%
-	218	16.4%
0	179	13.5%
1	147	11.1%
7	147	11.1%
5	77	5.8%
6	75	5.7%
2	73	5.5%
8	71	5.4%
4	59	4.4%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1327	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
3	234	17.6%
-	218	16.4%
0	179	13.5%
1	147	11.1%
7	147	11.1%
5	77	5.8%
6	75	5.7%
2	73	5.5%
8	71	5.4%
4	59	4.4%

Missing values

Count: a simple visualization of nullity by column (figure not reproduced).

Matrix: a nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion (figure not reproduced).
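Both views can be approximated without plotting. In the full dataset, the Count view would show 363 non-missing values for 업체명 and 363 − 254 = 109 for 연락처. The pandas sketch below uses only the first five sample rows. It renders the matrix as text, with `#` for a present value and `.` for a missing one; that character choice is arbitrary.

```python
import pandas as pd

# The first five rows of the sample below; pd.NA marks a missing 연락처.
df = pd.DataFrame(
    {
        "업체명": ["삼우인터내셔널", "대원축산", "주식회사 에스제이우솔(SJ Woosol Co. Ltd.)", "갓팩토리", "다모어엠(Damore M)"],
        "연락처": [pd.NA, "031-374-9213", pd.NA, pd.NA, pd.NA],
    }
)

present = df.notna()

# "Count" view: non-missing values per column.
print(present.sum())

# "Matrix" view: one text row per record, '#' = value present, '.' = missing.
matrix = ["".join("#" if cell else "." for cell in row) for row in present.to_numpy()]
print("\n".join(matrix))
```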

Sample

First rows

	업체명	연락처
0	삼우인터내셔널	<NA>
1	대원축산	031-374-9213
2	주식회사 에스제이우솔(SJ Woosol Co. Ltd.)	<NA>
3	갓팩토리	<NA>
4	다모어엠(Damore M)	<NA>
5	제이더블유(JW)유통	<NA>
6	성초	<NA>
7	거듭나다	<NA>
8	티에스케이인터내셔널 주식회사	<NA>
9	주식회사 텐바이오	<NA>

Last rows

	업체명	연락처
353	인더로우	<NA>
354	웰빙나라	<NA>
355	허브비밀	<NA>
356	해피월드	031-378-4371
357	그린약국	031-378-9054
358	행복을짓는남매약국	031-378-6858
359	청아람 Food System	031-302-4425
360	백세식품	031-373-9052
361	세건홍삼전문점	031-378-3435
362	헬스보충제	031-377-6180

Duplicate rows

Most frequently occurring

	업체명	연락처	# duplicates
0	벽돌집	<NA>	2
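The overview's single duplicate row corresponds to this entry. 벽돌집 appears twice with no contact number. "Duplicate rows: 1" counts only the extra copy, while "# duplicates: 2" counts both copies. The pandas sketch below builds a table of the same shape from a small frame using values from this report. Missing cells are filled with the string `<NA>` before grouping so that they form a group and print as they do in the report.

```python
import pandas as pd

# Small frame built from values in this report; the 벽돌집 row appears twice.
df = pd.DataFrame(
    {
        "업체명": ["벽돌집", "해피월드", "벽돌집"],
        "연락처": [pd.NA, "031-378-4371", pd.NA],
    }
)

print(int(df.duplicated().sum()))          # extra copies only -> 1

repeated = df[df.duplicated(keep=False)]   # every copy of a repeated row
dups = (
    repeated.fillna("<NA>")                # make missing cells groupable
    .groupby(list(df.columns))
    .size()
    .reset_index(name="# duplicates")      # all copies counted -> 2
)
print(dups)
```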
