Overview

Dataset statistics

Number of variables: 3
Number of observations: 360
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 38
Duplicate rows (%): 10.6%
Total size in memory: 8.6 KiB
Average record size in memory: 24.4 B
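The percentage figures above follow directly from the counts. A minimal sketch of that arithmetic, assuming the duplicate percentage is taken over the number of observations and the missing-cell percentage over rows times variables (the helper name `percent` is illustrative, not part of the report):

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    if total == 0:
        return 0.0
    return round(100 * part / total, 1)


n_rows, n_vars = 360, 3

# 38 duplicate rows out of 360 observations.
duplicate_pct = percent(38, n_rows)

# 0 missing cells out of 360 * 3 cells.
missing_pct = percent(0, n_rows * n_vars)

print(duplicate_pct, missing_pct)
```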

Variable types

Text: 3

Dataset

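Description: Provides data on the major artifacts excavated from sites across the country that were developed by the Korea Land and Housing Corporation and that are currently held by the Land and Housing Museum.
Author: Korea Land and Housing Corporation (한국토지주택공사)
URL: https://www.data.go.kr/data/15088290/fileData.do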

Alerts

Dataset has 38 (10.6%) duplicate rows (alert type: Duplicates)

Reproduction

Analysis started: 2023-12-12 13:17:04.016781
Analysis finished: 2023-12-12 13:17:04.440945
Duration: 0.42 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
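A report of this kind can be regenerated along the following lines. This is a sketch only: the CSV path, the output file name, the report title, and the `cp949` encoding are all assumptions (the source does not state how the file was loaded), and `build_report` is a hypothetical helper. It uses `pandas.read_csv` and `ydata_profiling.ProfileReport`, which must be installed separately.

```python
def build_report(csv_path: str,
                 out_path: str = "report.html",
                 encoding: str = "cp949") -> None:
    """Profile a CSV file and write an HTML report.

    The encoding default is a guess; files from Korean public-data
    portals are often cp949 or utf-8, so adjust as needed.
    """
    # Imported lazily so the function can be defined without the
    # third-party packages being installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="Artifact names profile")
    profile.to_file(out_path)
```

Calling `build_report("artifacts.csv")` with a locally downloaded copy of the dataset would write `report.html` to the working directory; the file name here is a placeholder.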

Variables

국문 유물명 (artifact name in Korean)

Distinct: 178
Distinct (%): 49.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Length

Max length: 13
Median length: 11
Mean length: 5.4638889
Min length: 2
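The mean length can be cross-checked against the totals reported elsewhere in this report. A small sketch, assuming the mean is total characters divided by the number of observations (1967 and 360 for this variable; `mean_length` is an illustrative name):

```python
def mean_length(total_characters: int, n_values: int) -> float:
    """Average number of characters per value."""
    return total_characters / n_values


# Total characters (1967) over observations (360) for this variable.
first_variable_mean = round(mean_length(1967, 360), 7)
print(first_variable_mean)
```

The same division reproduces the means of the other two variables from their character totals (1128 and 3968).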

Characters and Unicode

Total characters: 1967
Distinct characters: 226
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
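As an illustration of those character properties, the general category of each code point can be read with Python's standard `unicodedata` module. The sketch below counts categories over the five sample values of this variable; it is not the profiler's own implementation, and `category_counts` is a hypothetical helper. The two-letter codes map to the names used in this report: `Lo` is Other Letter, `Zs` is Space Separator, `Nd` is Decimal Number.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["여러가지 무문토기", "금동신발", "청동9층탑", "상환증서", "지가증권"]
counts = category_counts(sample)
print(dict(counts))
```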

Unique

Unique: 139
Unique (%): 38.6%
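"Distinct" and "Unique" are easy to confuse. The sketch below assumes the usual reading, where distinct is the number of different values and unique is the number of values that occur exactly once; the report itself does not spell this out, and the toy data is invented for illustration.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Toy data, not taken from the dataset.
toy = ["호적", "호적", "분재기", "소지", "소지", "저울추"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```

Under this reading unique can never exceed distinct, which is consistent with the reported 139 and 178.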

Sample

1st row: 여러가지 무문토기
2nd row: 금동신발
3rd row: 청동9층탑
4th row: 상환증서
5th row: 지가증권
Value | Count | Frequency (%)
토지매매문서 | 75 | 19.5%
호적 | 34 | 8.9%
토지매매문기 | 11 | 2.9%
분재기 | 11 | 2.9%
소송문기 | 8 | 2.1%
임명장 | 5 | 1.3%
소지 | 4 | 1.0%
저울추 | 4 | 1.0%
수키와 | 4 | 1.0%
소송문서 | 3 | 0.8%
Other values (181) | 225 | 58.6%
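The counts in this table sum to 384, more than the 360 observations, which suggests that whitespace-separated words rather than whole values are being counted. That reading is an assumption. Under it, the table could be approximated as follows (toy input, hypothetical helper name):

```python
from collections import Counter


def word_frequencies(values):
    """Count whitespace-separated tokens across all values."""
    words = Counter()
    for value in values:
        words.update(value.split())
    return words


# Toy input: one two-word value and three one-word values.
toy = ["여러가지 무문토기", "토지매매문서", "토지매매문서", "호적"]
freq = word_frequencies(toy)
total_tokens = sum(freq.values())
print(freq.most_common(1), total_tokens)
```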

Most occurring characters

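Value | Count | Frequency (%)
(space) | 363 | 18.5%
[glyph lost] | 182 | 9.3%
[glyph lost] | 127 | 6.5%
[glyph lost] | 107 | 5.4%
[glyph lost] | 105 | 5.3%
[glyph lost] | 104 | 5.3%
[glyph lost] | 67 | 3.4%
[glyph lost] | 52 | 2.6%
[glyph lost] | 40 | 2.0%
[glyph lost] | 21 | 1.1%
Other values (216) | 799 | 40.6%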

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1574 | 80.0%
Space Separator | 363 | 18.5%
Decimal Number | 18 | 0.9%
Open Punctuation | 6 | 0.3%
Close Punctuation | 6 | 0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
182
 
11.6%
127
 
8.1%
107
 
6.8%
105
 
6.7%
104
 
6.6%
67
 
4.3%
52
 
3.3%
40
 
2.5%
21
 
1.3%
18
 
1.1%
Other values (206) 751
47.7%
Decimal Number
ValueCountFrequency (%)
1 4
22.2%
5 4
22.2%
8 3
16.7%
7 3
16.7%
9 2
11.1%
3 1
 
5.6%
6 1
 
5.6%
Space Separator
ValueCountFrequency (%)
363
100.0%
Open Punctuation
ValueCountFrequency (%)
( 6
100.0%
Close Punctuation
ValueCountFrequency (%)
) 6
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1568 | 79.7%
Common | 393 | 20.0%
Han | 6 | 0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
182
 
11.6%
127
 
8.1%
107
 
6.8%
105
 
6.7%
104
 
6.6%
67
 
4.3%
52
 
3.3%
40
 
2.6%
21
 
1.3%
18
 
1.1%
Other values (202) 745
47.5%
Common
ValueCountFrequency (%)
363
92.4%
( 6
 
1.5%
) 6
 
1.5%
1 4
 
1.0%
5 4
 
1.0%
8 3
 
0.8%
7 3
 
0.8%
9 2
 
0.5%
3 1
 
0.3%
6 1
 
0.3%
Han
ValueCountFrequency (%)
2
33.3%
2
33.3%
1
16.7%
1
16.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1568 | 79.7%
ASCII | 393 | 20.0%
CJK | 6 | 0.3%
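A Unicode block is a contiguous code-point range, so block membership can be decided by range lookup. The sketch below hard-codes only the four blocks that appear in this report, using their standard ranges; the labels and the `block_of` helper are illustrative and do not reproduce the profiler's own naming.

```python
from collections import Counter

# (first code point, last code point, label) for the blocks seen here.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin (ASCII)"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
]


def block_of(ch: str) -> str:
    """Return the block label for a single character, or 'Other'."""
    cp = ord(ch)
    for first, last, label in BLOCKS:
        if first <= cp <= last:
            return label
    return "Other"


def block_counts(text: str) -> Counter:
    """Count characters per block in a string."""
    return Counter(block_of(ch) for ch in text)


result = dict(block_counts("청동9층탑"))
print(result)
```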

Most frequent character per block

ASCII
ValueCountFrequency (%)
363
92.4%
( 6
 
1.5%
) 6
 
1.5%
1 4
 
1.0%
5 4
 
1.0%
8 3
 
0.8%
7 3
 
0.8%
9 2
 
0.5%
3 1
 
0.3%
6 1
 
0.3%
Hangul
ValueCountFrequency (%)
182
 
11.6%
127
 
8.1%
107
 
6.8%
105
 
6.7%
104
 
6.6%
67
 
4.3%
52
 
3.3%
40
 
2.6%
21
 
1.3%
18
 
1.1%
Other values (202) 745
47.5%
CJK
ValueCountFrequency (%)
2
33.3%
2
33.3%
1
16.7%
1
16.7%
한문 유물명 (artifact name in Hanja)

Distinct: 150
Distinct (%): 41.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Length

Max length: 9
Median length: 8
Mean length: 3.1333333
Min length: 1

Characters and Unicode

Total characters: 1128
Distinct characters: 317
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 114
Unique (%): 31.7%

Sample

1st row: 各種無文土器
2nd row: 金銅飾履
3rd row: 靑銅九層塔
4th row: 相換證書
5th row: 地價證券
Value | Count | Frequency (%)
明文 | 85 | 23.6%
없음 | 25 | 6.9%
準戶口 | 20 | 5.6%
戶籍單子 | 14 | 3.9%
所志 | 12 | 3.3%
和會文記 | 7 | 1.9%
[value not preserved] | 6 | 1.7%
議送 | 5 | 1.4%
牌旨 | 5 | 1.4%
戶口單子 | 5 | 1.4%
Other values (140) | 176 | 48.9%

Most occurring characters

ValueCountFrequency (%)
114
 
10.1%
85
 
7.5%
43
 
3.8%
29
 
2.6%
25
 
2.2%
25
 
2.2%
22
 
2.0%
21
 
1.9%
21
 
1.9%
19
 
1.7%
Other values (307) 724
64.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1103 | 97.8%
Decimal Number | 13 | 1.2%
Open Punctuation | 5 | 0.4%
Close Punctuation | 5 | 0.4%
Other Punctuation | 1 | 0.1%
Space Separator | 1 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
114
 
10.3%
85
 
7.7%
43
 
3.9%
29
 
2.6%
25
 
2.3%
25
 
2.3%
22
 
2.0%
21
 
1.9%
21
 
1.9%
19
 
1.7%
Other values (296) 699
63.4%
Decimal Number
ValueCountFrequency (%)
1 3
23.1%
7 3
23.1%
5 2
15.4%
8 2
15.4%
6 1
 
7.7%
9 1
 
7.7%
3 1
 
7.7%
Open Punctuation
ValueCountFrequency (%)
( 5
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%
Space Separator
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 1051 | 93.2%
Hangul | 52 | 4.6%
Common | 25 | 2.2%

Most frequent character per script

Han
ValueCountFrequency (%)
114
 
10.8%
85
 
8.1%
43
 
4.1%
29
 
2.8%
22
 
2.1%
21
 
2.0%
21
 
2.0%
19
 
1.8%
19
 
1.8%
18
 
1.7%
Other values (292) 660
62.8%
Common
ValueCountFrequency (%)
( 5
20.0%
) 5
20.0%
1 3
12.0%
7 3
12.0%
5 2
 
8.0%
8 2
 
8.0%
, 1
 
4.0%
1
 
4.0%
6 1
 
4.0%
9 1
 
4.0%
Hangul
ValueCountFrequency (%)
25
48.1%
25
48.1%
1
 
1.9%
1
 
1.9%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 1018 | 90.2%
Hangul | 52 | 4.6%
CJK Compat Ideographs | 33 | 2.9%
ASCII | 25 | 2.2%
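The 33 characters in the CJK Compatibility Ideographs block look identical to ordinary ideographs but have different code points, so two visually equal names can compare as different strings. One possible cleanup, shown here as a sketch and not as something the report or the data provider applies, is Unicode normalization with the standard `unicodedata.normalize`: compatibility ideographs carry canonical mappings to their unified counterparts, so NFC replaces them. `unify_ideographs` is a hypothetical helper, and U+F900 is used only as a known example from that block.

```python
import unicodedata


def unify_ideographs(text: str) -> str:
    """Normalize to NFC, which maps CJK compatibility ideographs
    (U+F900..U+FAFF) to their unified equivalents."""
    return unicodedata.normalize("NFC", text)


compat = "\uf900"   # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8c48"  # the unified ideograph it maps to

before_equal = compat == unified
after_equal = unify_ideographs(compat) == unified
print(before_equal, after_equal)
```

Whether to normalize before counting distinct values is a judgment call; it would change the distinct and duplicate figures if the same name is stored with both kinds of code point.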

Most frequent character per block

CJK
ValueCountFrequency (%)
114
 
11.2%
85
 
8.3%
43
 
4.2%
29
 
2.8%
22
 
2.2%
21
 
2.1%
21
 
2.1%
19
 
1.9%
19
 
1.9%
18
 
1.8%
Other values (280) 627
61.6%
Hangul
ValueCountFrequency (%)
25
48.1%
25
48.1%
1
 
1.9%
1
 
1.9%
CJK Compat Ideographs
ValueCountFrequency (%)
9
27.3%
4
12.1%
3
 
9.1%
3
 
9.1%
3
 
9.1%
3
 
9.1%
2
 
6.1%
2
 
6.1%
1
 
3.0%
1
 
3.0%
Other values (2) 2
 
6.1%
ASCII
ValueCountFrequency (%)
( 5
20.0%
) 5
20.0%
1 3
12.0%
7 3
12.0%
5 2
 
8.0%
8 2
 
8.0%
, 1
 
4.0%
1
 
4.0%
6 1
 
4.0%
9 1
 
4.0%
영문 유물명 (artifact name in English)

Distinct: 122
Distinct (%): 33.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 KiB

Length

Max length: 66
Median length: 2
Mean length: 11.022222
Min length: 2

Characters and Unicode

Total characters: 3968
Distinct characters: 57
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 102
Unique (%): 28.3%

Sample

1st row: Diverse Plain Pottery Vessels
2nd row: Gilt bronze ornamental shoes
3rd row: Nine-storied bronze pagoda
4th row: Certificate of Redemption
5th row: Land Bond
Value | Count | Frequency (%)
없음 | 214 | 28.8%
of | 31 | 4.2%
tile | 22 | 3.0%
with | 16 | 2.2%
eaves | 15 | 2.0%
land | 9 | 1.2%
family | 9 | 1.2%
register | 8 | 1.1%
pattern | 8 | 1.1%
jar | 8 | 1.1%
Other values (232) | 402 | 54.2%

Most occurring characters

Value | Count | Frequency (%)
(space) | 382 | 9.6%
e | 349 | 8.8%
o | 268 | 6.8%
t | 231 | 5.8%
a | 225 | 5.7%
i | 223 | 5.6%
n | 218 | 5.5%
[Hangul glyph lost] | 214 | 5.4%
[Hangul glyph lost] | 214 | 5.4%
r | 188 | 4.7%
Other values (47) | 1456 | 36.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 2656 | 66.9%
Uppercase Letter | 462 | 11.6%
Other Letter | 428 | 10.8%
Space Separator | 382 | 9.6%
Dash Punctuation | 19 | 0.5%
Other Punctuation | 17 | 0.4%
Open Punctuation | 2 | 0.1%
Close Punctuation | 2 | 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 349
13.1%
o 268
10.1%
t 231
 
8.7%
a 225
 
8.5%
i 223
 
8.4%
n 218
 
8.2%
r 188
 
7.1%
l 123
 
4.6%
s 112
 
4.2%
d 94
 
3.5%
Other values (15) 625
23.5%
Uppercase Letter
ValueCountFrequency (%)
S 52
11.3%
B 46
 
10.0%
C 40
 
8.7%
P 35
 
7.6%
T 35
 
7.6%
L 31
 
6.7%
F 27
 
5.8%
R 26
 
5.6%
M 23
 
5.0%
E 22
 
4.8%
Other values (13) 125
27.1%
Other Punctuation
ValueCountFrequency (%)
' 15
88.2%
, 1
 
5.9%
. 1
 
5.9%
Other Letter
ValueCountFrequency (%)
214
50.0%
214
50.0%
Space Separator
ValueCountFrequency (%)
382
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 19
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3118 | 78.6%
Hangul | 428 | 10.8%
Common | 422 | 10.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 349
 
11.2%
o 268
 
8.6%
t 231
 
7.4%
a 225
 
7.2%
i 223
 
7.2%
n 218
 
7.0%
r 188
 
6.0%
l 123
 
3.9%
s 112
 
3.6%
d 94
 
3.0%
Other values (38) 1087
34.9%
Common
ValueCountFrequency (%)
382
90.5%
- 19
 
4.5%
' 15
 
3.6%
( 2
 
0.5%
) 2
 
0.5%
, 1
 
0.2%
. 1
 
0.2%
Hangul
ValueCountFrequency (%)
214
50.0%
214
50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3540 | 89.2%
Hangul | 428 | 10.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
382
 
10.8%
e 349
 
9.9%
o 268
 
7.6%
t 231
 
6.5%
a 225
 
6.4%
i 223
 
6.3%
n 218
 
6.2%
r 188
 
5.3%
l 123
 
3.5%
s 112
 
3.2%
Other values (45) 1221
34.5%
Hangul
ValueCountFrequency (%)
214
50.0%
214
50.0%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
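The report counts 0 missing cells, yet the placeholder 없음 ("none") accounts for 214 words in the English-name variable and 25 in the Hanja-name variable. The sketch below shows how per-column nullity changes if that placeholder is treated as missing. Treating 없음 as missing is an assumption made here for illustration, and `missing_per_column` is a hypothetical helper; the two rows are rows 0 and 350 of the sample tables.

```python
def missing_per_column(rows, columns, missing=(None, "")):
    """Count, per column, the cells whose value is in `missing`."""
    return {
        col: sum(1 for row in rows if row.get(col) in missing)
        for col in columns
    }


columns = ["국문 유물명", "한문 유물명", "영문 유물명"]
rows = [
    {"국문 유물명": "여러가지 무문토기",
     "한문 유물명": "各種無文土器",
     "영문 유물명": "Diverse Plain Pottery Vessels"},
    {"국문 유물명": "긁개", "한문 유물명": "없음", "영문 유물명": "없음"},
]

strict = missing_per_column(rows, columns)
with_placeholder = missing_per_column(rows, columns, missing=(None, "", "없음"))
print(strict, with_placeholder)
```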

Sample

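Index | 국문 유물명 (Korean name) | 한문 유물명 (Hanja name) | 영문 유물명 (English name)
0 | 여러가지 무문토기 | 各種無文土器 | Diverse Plain Pottery Vessels
1 | 금동신발 | 金銅飾履 | Gilt bronze ornamental shoes
2 | 청동9층탑 | 靑銅九層塔 | Nine-storied bronze pagoda
3 | 상환증서 | 相換證書 | Certificate of Redemption
4 | 지가증권 | 地價證券 | Land Bond
5 | 토지측량도 | 土地測量圖 | Land Survey Map
6 | 측량학교 졸업증서 | 測量學校卒業證書 | Certificate of the Completion of a Survey School
7 | 측량기사임명장 | 測量技士任命狀 | Appointment Letter of Surveying Engineer
8 | 목판채색지도 | 木板彩色地圖 | Block-Printed Colored Map
9 | 소송문서 | 訴訟文書 | Records of a Lawsuit

Index | 국문 유물명 (Korean name) | 한문 유물명 (Hanja name) | 영문 유물명 (English name)
350 | 긁개 | 없음 | 없음
351 | 철촉 | 없음 | 없음
352 | 긁개 | 없음 | 없음
353 | 홍날 | 없음 | 없음
354 | 긁개 | 없음 | 없음
355 | 찍개 | 없음 | 없음
356 | 마제석촉 | 없음 | 없음
357 | 찍개 | 없음 | 없음
358 | 마제석창 | 없음 | 없음
359 | 빗살무늬토기 | 없음 | 없음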

Duplicate rows

Most frequently occurring

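Index | 국문 유물명 (Korean name) | 한문 유물명 (Hanja name) | 영문 유물명 (English name) | # duplicates
31 | 토지매매문서 | 明文 | 없음 | 69
37 | 호적 | 準戶口 | 없음 | 16
36 | 호적 | 戶籍單子 | 없음 | 14
29 | 토지매매문기 | 明文 | 없음 | 9
10 | 분재기 | 和會文記 | 없음 | 5
14 | 소송문기 | 議送 | 없음 | 4
16 | 소지 | 所志 | 없음 | 4
1 | 긁개 | 없음 | 없음 | 3
13 | 소송문기 | 所志 | 없음 | 3
17 | 수키와 | [not preserved] | Convex Roof Tile | 3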
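A duplicate row here is one whose values match another row in all three columns. The index values in the table run up to 37, which is consistent with the 38 duplicate rows in the overview being a count of distinct repeated rows, though the report does not say so explicitly. A sketch of how such groups can be found with the standard library, assuming "# duplicates" is the number of occurrences of each repeated row (toy rows, hypothetical helper name):

```python
from collections import Counter


def duplicate_rows(rows):
    """Map each row that occurs more than once to its occurrence count."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Rows as (Korean name, Hanja name, English name) tuples.
rows = [("긁개", "없음", "없음")] * 3 + [("철촉", "없음", "없음")]
dups = duplicate_rows(rows)
print(dups)
```

Rows must be hashable for `Counter`, hence tuples and not lists.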