Overview

Dataset statistics

Number of variables: 4
Number of observations: 1094
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 14
Duplicate rows (%): 1.3%
Total size in memory: 34.3 KiB
Average record size in memory: 32.1 B
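The percentage and average-size figures above follow directly from the raw counts. Below is a minimal Python sketch of that arithmetic. It assumes missing cells are counted over rows × variables, and it uses the rounded 34.3 KiB figure, so the byte total is only approximate. `overview_ratios` is a hypothetical helper and is not part of ydata-profiling.

```python
def overview_ratios(n_rows: int, n_vars: int, n_missing: int,
                    n_duplicates: int, total_bytes: float) -> dict:
    """Derive the overview percentages and average record size from raw counts."""
    n_cells = n_rows * n_vars
    return {
        "missing_cells_pct": 100.0 * n_missing / n_cells,
        "duplicate_rows_pct": 100.0 * n_duplicates / n_rows,
        "avg_record_bytes": total_bytes / n_rows,
    }


# 34.3 KiB is the rounded value shown in the report, so total_bytes is approximate.
stats = overview_ratios(n_rows=1094, n_vars=4, n_missing=0,
                        n_duplicates=14, total_bytes=34.3 * 1024)
print(f"{stats['missing_cells_pct']:.1f}%")   # 0.0%
print(f"{stats['duplicate_rows_pct']:.1f}%")  # 1.3%
print(f"{stats['avg_record_bytes']:.1f} B")   # 32.1 B
```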

Variable types

Text: 4

Dataset

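Description: Data on the plant collection of the Geumgang Arboretum (금강수목원), Chungcheongnam-do Forest Resources Research Institute (충청남도산림자원연구소). It provides the family-level classification and scientific names of the plants held in the arboretum.
URL: https://www.data.go.kr/data/15015929/fileData.do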

Alerts

Duplicates: Dataset has 14 (1.3%) duplicate rows

Reproduction

Analysis started: 2023-12-12 11:24:31.197748
Analysis finished: 2023-12-12 11:24:32.205962
Duration: 1.01 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

과국명 (family name, Korean)
Text

Distinct: 160
Distinct (%): 14.6%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 9
Median length: 7
Mean length: 3.8528336
Min length: 2
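The mean length is the total number of characters divided by the number of observations: 4215 / 1094 ≈ 3.8528336. The short Python sketch below computes the same kind of summary for a list of strings, using the 과국명 values from the first ten sample rows of the dataset. The report's median length may be defined differently from `statistics.median`, so only the mean is checked against the report's totals.

```python
from statistics import median


def length_summary(values):
    """Max/median/mean/min string length plus the total character count."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


# 과국명 values from the first ten sample rows (each Hangul syllable is one code point).
sample = ["가래나무과"] * 7 + ["가지과"] * 2 + ["갈매나무과"]
print(length_summary(sample))

# The report's mean length, reproduced from its own totals.
print(f"{4215 / 1094:.7f}")  # 3.8528336
```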

Characters and Unicode

Total characters: 4215
Distinct characters: 195
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
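Python's standard `unicodedata` module exposes the same general-category property. The minimal sketch below tallies characters by category and maps the two-letter codes to the long names used in this report. The mapping table is an assumption that covers only the categories appearing here, and any other code falls back to its raw two-letter form.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pi": "Initial Punctuation",
    "Pf": "Final Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters by Unicode general category (long name where known)."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


counts = category_counts(["가래나무과", "Prunus mune", "Br. & R."])
print(counts.most_common())
```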

Unique

Unique: 63
Unique (%): 5.8%
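"Distinct" counts every different value, while "Unique" counts only the values that occur exactly once. The small Python sketch below computes both. It uses the 이름 values from the first ten sample rows, where 굴피나무 appears twice. `distinct_and_unique` is an illustrative helper.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) for a list of values."""
    counts = Counter(values)
    n = len(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, 100 * n_distinct / n, n_unique, 100 * n_unique / n


names = ["굴피나무", "굴피나무", "가래나무", "중국굴피", "피칸",
         "호두", "호두나무", "구기자", "꽈리", "갈매나무"]
print(distinct_and_unique(names))  # (9, 90.0, 8, 80.0)

# The reported percentages: 160 distinct and 63 unique out of 1094 rows.
print(f"{100 * 160 / 1094:.1f}% {100 * 63 / 1094:.1f}%")  # 14.6% 5.8%
```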

Sample

1st row: 가래나무과
2nd row: 가래나무과
3rd row: 가래나무과
4th row: 가래나무과
5th row: 가래나무과
Most frequent words
Value | Count | Frequency (%)
돌나물과 | 104 | 9.5%
장미 | 99 | 9.0%
백합과 | 55 | 5.0%
선인장과 | 50 | 4.6%
국화과 | 32 | 2.9%
목련과 | 32 | 2.9%
소나무과 | 29 | 2.6%
장미과 | 28 | 2.6%
측백나무과 | 26 | 2.4%
콩과 | 25 | 2.3%
Other values (138) | 616 | 56.2%
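The table above counts words rather than whole values: its ten rows plus 138 "other values" add up to 148 entries, not the 160 distinct values. The rough Python sketch below counts words in the same way. It assumes values are lowercased and split on runs of non-word characters, so punctuation such as `.` and `&` is dropped. ydata-profiling's own tokenizer and any stop-word handling may differ. The example values are scientific names taken from the sample rows.

```python
import re
from collections import Counter


def word_counts(values):
    """Count lowercase word tokens across all values."""
    counts = Counter()
    for value in values:
        # \w+ matches runs of letters, digits and underscores (Hangul included),
        # so punctuation such as "." and "&" never becomes a token.
        counts.update(re.findall(r"\w+", value.lower()))
    return counts


names = [
    "Platycarya strobilacea",
    "Platycarya strobilacea Siebold & Zucc",
    "Pterocarya stenoptera DC.",
    "Rhamnus davurica Pall.",
]
print(word_counts(names).most_common(2))  # [('platycarya', 2), ('strobilacea', 2)]
```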

Most occurring characters

Value | Count | Frequency (%)
[?] | 986 | 23.4%
[?] | 401 | 9.5%
[?] | 284 | 6.7%
[?] | 183 | 4.3%
[?] | 141 | 3.3%
[?] | 131 | 3.1%
[?] | 104 | 2.5%
[?] | 83 | 2.0%
[?] | 76 | 1.8%
[?] | 58 | 1.4%
Other values (185) | 1768 | 41.9%
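Each of these tables lists the ten most frequent entries and folds the rest into an "Other values (k)" row. Here k is the number of remaining distinct entries; for example, 195 distinct characters minus the 10 listed gives 185. The minimal Python sketch below implements that rollup for character counts. The helper name and the `n` parameter are illustrative.

```python
from collections import Counter


def top_n_with_other(values, n=10):
    """Top-n characters with counts and percentages, plus an 'Other values' row."""
    counts = Counter(ch for value in values for ch in value)
    total = sum(counts.values())
    ranked = counts.most_common()  # ties keep first-seen order
    rows = [(ch, c, round(100 * c / total, 1)) for ch, c in ranked[:n]]
    rest = ranked[n:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other,
                     round(100 * other / total, 1)))
    return rows


for row in top_n_with_other(["가래나무과", "가지과"], n=2):
    print(row)
```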

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4167 | 98.9%
Space Separator | 47 | 1.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 986 | 23.7%
[?] | 401 | 9.6%
[?] | 284 | 6.8%
[?] | 183 | 4.4%
[?] | 141 | 3.4%
[?] | 131 | 3.1%
[?] | 104 | 2.5%
[?] | 83 | 2.0%
[?] | 76 | 1.8%
[?] | 58 | 1.4%
Other values (183) | 1720 | 41.3%
Space Separator
Value | Count | Frequency (%)
(space) | 47 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4167 | 98.9%
Common | 48 | 1.1%

Most frequent character per script

Hangul
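Value | Count | Frequency (%)
[?] | 986 | 23.7%
[?] | 401 | 9.6%
[?] | 284 | 6.8%
[?] | 183 | 4.4%
[?] | 141 | 3.4%
[?] | 131 | 3.1%
[?] | 104 | 2.5%
[?] | 83 | 2.0%
[?] | 76 | 1.8%
[?] | 58 | 1.4%
Other values (183) | 1720 | 41.3%
Common
Value | Count | Frequency (%)
(space) | 47 | 97.9%
- | 1 | 2.1%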

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4167 | 98.9%
ASCII | 48 | 1.1%

Most frequent character per block

Hangul
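Value | Count | Frequency (%)
[?] | 986 | 23.7%
[?] | 401 | 9.6%
[?] | 284 | 6.8%
[?] | 183 | 4.4%
[?] | 141 | 3.4%
[?] | 131 | 3.1%
[?] | 104 | 2.5%
[?] | 83 | 2.0%
[?] | 76 | 1.8%
[?] | 58 | 1.4%
Other values (183) | 1720 | 41.3%
ASCII
Value | Count | Frequency (%)
(space) | 47 | 97.9%
- | 1 | 2.1%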
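A Unicode block is a fixed code-point range, so block membership can be checked directly with `ord`. The block names in this report ("ASCII", "Hangul", "Punctuation", "CJK", "None") look like coarse groupings. The sketch below maps a few official block ranges onto similar labels and falls back to "None" for everything else. It approximates ydata-profiling's lookup rather than reproducing it, and the `BLOCKS` table is an assumption.

```python
from collections import Counter

# (start, end, label) for a few official Unicode blocks; labels mimic the report.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),        # Basic Latin
    (0x1100, 0x11FF, "Hangul"),       # Hangul Jamo
    (0x2000, 0x206F, "Punctuation"),  # General Punctuation
    (0x3130, 0x318F, "Hangul"),       # Hangul Compatibility Jamo
    (0x4E00, 0x9FFF, "CJK"),          # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hangul"),       # Hangul Syllables
]


def block_of(ch):
    """Return the coarse block label for a single character."""
    cp = ord(ch)
    for start, end, label in BLOCKS:
        if start <= cp <= end:
            return label
    return "None"


def block_counts(values):
    """Count characters per coarse block label across all values."""
    return Counter(block_of(ch) for value in values for ch in value)


print(block_counts(["Agave 'leopoldii'", "가래나무과"]))
```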

과명 (family name, scientific)
Text

Distinct: 162
Distinct (%): 14.8%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 17
Median length: 15
Mean length: 10.33181
Min length: 3

Characters and Unicode

Total characters: 11303
Distinct characters: 48
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 63
Unique (%): 5.8%

Sample

1st row: Juglandaceae
2nd row: Juglandaceae
3rd row: Juglandaceae
4th row: Juglandaceae
5th row: Juglandaceae
Most frequent words (lowercased)
Value | Count | Frequency (%)
crassulaceae | 104 | 9.3%
rosaceae | 101 | 9.0%
liliaceae | 55 | 4.9%
cactaceae | 50 | 4.5%
magnoliaceae | 32 | 2.9%
pinaceae | 29 | 2.6%
compositae | 27 | 2.4%
cupressaceae | 26 | 2.3%
prunus | 26 | 2.3%
mune | 26 | 2.3%
Other values (148) | 645 | 57.5%

Most occurring characters

Value | Count | Frequency (%)
a | 2618 | 23.2%
e | 2354 | 20.8%
c | 1183 | 10.5%
i | 531 | 4.7%
r | 511 | 4.5%
s | 472 | 4.2%
l | 427 | 3.8%
o | 399 | 3.5%
u | 350 | 3.1%
n | 329 | 2.9%
Other values (38) | 2129 | 18.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 10167 | 89.9%
Uppercase Letter | 1093 | 9.7%
Space Separator | 42 | 0.4%
Other Punctuation | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 2618 | 25.7%
e | 2354 | 23.2%
c | 1183 | 11.6%
i | 531 | 5.2%
r | 511 | 5.0%
s | 472 | 4.6%
l | 427 | 4.2%
o | 399 | 3.9%
u | 350 | 3.4%
n | 329 | 3.2%
Other values (14) | 993 | 9.8%
Uppercase Letter
Value | Count | Frequency (%)
C | 281 | 25.7%
A | 137 | 12.5%
R | 128 | 11.7%
L | 107 | 9.8%
P | 86 | 7.9%
M | 71 | 6.5%
S | 50 | 4.6%
B | 46 | 4.2%
E | 36 | 3.3%
F | 31 | 2.8%
Other values (12) | 120 | 11.0%
Space Separator
Value | Count | Frequency (%)
(space) | 42 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
, | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 11260 | 99.6%
Common | 43 | 0.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 2618 | 23.3%
e | 2354 | 20.9%
c | 1183 | 10.5%
i | 531 | 4.7%
r | 511 | 4.5%
s | 472 | 4.2%
l | 427 | 3.8%
o | 399 | 3.5%
u | 350 | 3.1%
n | 329 | 2.9%
Other values (36) | 2086 | 18.5%
Common
Value | Count | Frequency (%)
(space) | 42 | 97.7%
, | 1 | 2.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11303 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 2618 | 23.2%
e | 2354 | 20.8%
c | 1183 | 10.5%
i | 531 | 4.7%
r | 511 | 4.5%
s | 472 | 4.2%
l | 427 | 3.8%
o | 399 | 3.5%
u | 350 | 3.1%
n | 329 | 2.9%
Other values (38) | 2129 | 18.8%

이름 (plant name)
Text

Distinct: 1009
Distinct (%): 92.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 17
Median length: 15
Mean length: 4.4661792
Min length: 1

Characters and Unicode

Total characters: 4886
Distinct characters: 555
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 943
Unique (%): 86.2%

Sample

1st row: 굴피나무
2nd row: 굴피나무
3rd row: 가래나무
4th row: 중국굴피
5th row: 피칸
Most frequent words
Value | Count | Frequency (%)
베고니아 | 6 | 0.5%
산딸나무 | 5 | 0.4%
산벚나무 | 5 | 0.4%
아가베 | 5 | 0.4%
알로에 | 4 | 0.3%
핀참나무 | 4 | 0.3%
산수유 | 4 | 0.3%
브리에세아 | 4 | 0.3%
떡갈나무 | 3 | 0.2%
틸란드시아 | 3 | 0.2%
Other values (1072) | 1169 | 96.5%

Most occurring characters

Value | Count | Frequency (%)
[?] | 315 | 6.4%
[?] | 303 | 6.2%
[?] | 137 | 2.8%
[?] | 125 | 2.6%
(space) | 121 | 2.5%
[?] | 77 | 1.6%
[?] | 63 | 1.3%
[?] | 59 | 1.2%
[?] | 55 | 1.1%
[?] | 55 | 1.1%
Other values (545) | 3576 | 73.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4684 | 95.9%
Space Separator | 121 | 2.5%
Open Punctuation | 26 | 0.5%
Close Punctuation | 26 | 0.5%
Lowercase Letter | 26 | 0.5%
Dash Punctuation | 2 | < 0.1%
Uppercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 315 | 6.7%
[?] | 303 | 6.5%
[?] | 137 | 2.9%
[?] | 125 | 2.7%
[?] | 77 | 1.6%
[?] | 63 | 1.3%
[?] | 59 | 1.3%
[?] | 55 | 1.2%
[?] | 55 | 1.2%
[?] | 51 | 1.1%
Other values (525) | 3444 | 73.5%
Lowercase Letter
Value | Count | Frequency (%)
s | 3 | 11.5%
c | 3 | 11.5%
a | 3 | 11.5%
t | 3 | 11.5%
p | 2 | 7.7%
i | 2 | 7.7%
e | 2 | 7.7%
r | 1 | 3.8%
l | 1 | 3.8%
m | 1 | 3.8%
Other values (5) | 5 | 19.2%
Space Separator
Value | Count | Frequency (%)
(space) | 121 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 26 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 26 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%
Uppercase Letter
Value | Count | Frequency (%)
L | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4684 | 95.9%
Common | 175 | 3.6%
Latin | 27 | 0.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 315 | 6.7%
[?] | 303 | 6.5%
[?] | 137 | 2.9%
[?] | 125 | 2.7%
[?] | 77 | 1.6%
[?] | 63 | 1.3%
[?] | 59 | 1.3%
[?] | 55 | 1.2%
[?] | 55 | 1.2%
[?] | 51 | 1.1%
Other values (525) | 3444 | 73.5%
Latin
Value | Count | Frequency (%)
s | 3 | 11.1%
c | 3 | 11.1%
a | 3 | 11.1%
t | 3 | 11.1%
p | 2 | 7.4%
i | 2 | 7.4%
e | 2 | 7.4%
r | 1 | 3.7%
l | 1 | 3.7%
m | 1 | 3.7%
Other values (6) | 6 | 22.2%
Common
Value | Count | Frequency (%)
(space) | 121 | 69.1%
( | 26 | 14.9%
) | 26 | 14.9%
- | 2 | 1.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4684 | 95.9%
ASCII | 202 | 4.1%

Most frequent character per block

Hangul
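Value | Count | Frequency (%)
[?] | 315 | 6.7%
[?] | 303 | 6.5%
[?] | 137 | 2.9%
[?] | 125 | 2.7%
[?] | 77 | 1.6%
[?] | 63 | 1.3%
[?] | 59 | 1.3%
[?] | 55 | 1.2%
[?] | 55 | 1.2%
[?] | 51 | 1.1%
Other values (525) | 3444 | 73.5%
ASCII
Value | Count | Frequency (%)
(space) | 121 | 59.9%
( | 26 | 12.9%
) | 26 | 12.9%
s | 3 | 1.5%
c | 3 | 1.5%
a | 3 | 1.5%
t | 3 | 1.5%
p | 2 | 1.0%
i | 2 | 1.0%
e | 2 | 1.0%
Other values (10) | 11 | 5.4%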

학명 (scientific name)
Text

Distinct: 1060
Distinct (%): 96.9%
Missing: 0
Missing (%): 0.0%
Memory size: 8.7 KiB

Length

Max length: 76
Median length: 50
Mean length: 25.78702
Min length: 2

Characters and Unicode

Total characters: 28211
Distinct characters: 80
Distinct categories: 11
Distinct scripts: 4
Distinct blocks: 5

Unique

Unique: 1028
Unique (%): 94.0%

Sample

1st row: Platycarya strobilacea
2nd row: Platycarya strobilacea Siebold & Zucc
3rd row: Juglans Mandshurica Maxim. Var. mandshurica for. Mandst
4th row: Pterocarya stenoptera DC.
5th row: Carya illinoensis
Most frequent words (lowercased)
Value | Count | Frequency (%)
l | 105 | 2.9%
var | 83 | 2.3%
[?] | 55 | 1.5%
thunb | 44 | 1.2%
ex | 39 | 1.1%
prunus | 38 | 1.1%
echeveria | 36 | 1.0%
nakai | 35 | 1.0%
magnolia | 29 | 0.8%
japonica | 28 | 0.8%
Other values (1751) | 3104 | 86.3%

Most occurring characters

Value | Count | Frequency (%)
a | 2876 | 10.2%
(space) | 2740 | 9.7%
i | 2134 | 7.6%
e | 1864 | 6.6%
r | 1679 | 6.0%
s | 1462 | 5.2%
n | 1438 | 5.1%
o | 1399 | 5.0%
u | 1368 | 4.8%
l | 1156 | 4.1%
Other values (70) | 10095 | 35.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 21733 | 77.0%
Space Separator | 2740 | 9.7%
Uppercase Letter | 2393 | 8.5%
Other Punctuation | 1001 | 3.5%
Close Punctuation | 142 | 0.5%
Open Punctuation | 140 | 0.5%
Other Letter | 41 | 0.1%
Dash Punctuation | 16 | 0.1%
Initial Punctuation | 2 | < 0.1%
Final Punctuation | 2 | < 0.1%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 2876 | 13.2%
i | 2134 | 9.8%
e | 1864 | 8.6%
r | 1679 | 7.7%
s | 1462 | 6.7%
n | 1438 | 6.6%
o | 1399 | 6.4%
u | 1368 | 6.3%
l | 1156 | 5.3%
c | 981 | 4.5%
Other values (16) | 5376 | 24.7%
Uppercase Letter
Value | Count | Frequency (%)
C | 208 | 8.7%
L | 203 | 8.5%
A | 202 | 8.4%
S | 189 | 7.9%
P | 182 | 7.6%
M | 174 | 7.3%
B | 137 | 5.7%
E | 123 | 5.1%
H | 123 | 5.1%
T | 113 | 4.7%
Other values (16) | 739 | 30.9%
Other Letter
Value | Count | Frequency (%)
[?] | 3 | 7.3%
[?] | 3 | 7.3%
[?] | 3 | 7.3%
[?] | 3 | 7.3%
[?] | 3 | 7.3%
[?] | 3 | 7.3%
[?] | 3 | 7.3%
[?] | 3 | 7.3%
[?] | 3 | 7.3%
[?] | 3 | 7.3%
Other values (5) | 11 | 26.8%
Other Punctuation
Value | Count | Frequency (%)
. | 737 | 73.6%
' | 202 | 20.2%
& | 52 | 5.2%
, | 5 | 0.5%
" | 4 | 0.4%
[?] | 1 | 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 2740 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 142 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 140 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 16 | 100.0%
Initial Punctuation
Value | Count | Frequency (%)
[?] | 2 | 100.0%
Final Punctuation
Value | Count | Frequency (%)
[?] | 2 | 100.0%
Math Symbol
Value | Count | Frequency (%)
× | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 24126 | 85.5%
Common | 4044 | 14.3%
Hangul | 39 | 0.1%
Han | 2 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 2876 | 11.9%
i | 2134 | 8.8%
e | 1864 | 7.7%
r | 1679 | 7.0%
s | 1462 | 6.1%
n | 1438 | 6.0%
o | 1399 | 5.8%
u | 1368 | 5.7%
l | 1156 | 4.8%
c | 981 | 4.1%
Other values (42) | 7769 | 32.2%
Common
Value | Count | Frequency (%)
(space) | 2740 | 67.8%
. | 737 | 18.2%
' | 202 | 5.0%
) | 142 | 3.5%
( | 140 | 3.5%
& | 52 | 1.3%
- | 16 | 0.4%
, | 5 | 0.1%
" | 4 | 0.1%
[?] | 2 | < 0.1%
Other values (3) | 4 | 0.1%
Hangul
Value | Count | Frequency (%)
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
Other values (3) | 9 | 23.1%
Han
Value | Count | Frequency (%)
[?] | 1 | 50.0%
[?] | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 28164 | 99.8%
Hangul | 39 | 0.1%
Punctuation | 4 | < 0.1%
None | 2 | < 0.1%
CJK | 2 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 2876 | 10.2%
(space) | 2740 | 9.7%
i | 2134 | 7.6%
e | 1864 | 6.6%
r | 1679 | 6.0%
s | 1462 | 5.2%
n | 1438 | 5.1%
o | 1399 | 5.0%
u | 1368 | 4.9%
l | 1156 | 4.1%
Other values (51) | 10048 | 35.7%
Hangul
Value | Count | Frequency (%)
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
[?] | 3 | 7.7%
Other values (3) | 9 | 23.1%
Punctuation
Value | Count | Frequency (%)
[?] | 2 | 50.0%
[?] | 2 | 50.0%
None
Value | Count | Frequency (%)
[?] | 1 | 50.0%
× | 1 | 50.0%
CJK
Value | Count | Frequency (%)
[?] | 1 | 50.0%
[?] | 1 | 50.0%

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
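Both figures summarize, column by column, which cells are empty; in this dataset every column has zero missing cells. Below is a minimal stdlib Python sketch of the per-column count and a text-mode nullity matrix. It assumes a cell is missing when it is `None` or an empty string. The second example row is hypothetical and exists only to show a gap.

```python
def missing_by_column(rows, columns):
    """Count cells per column that are None or an empty string."""
    return {col: sum(1 for row in rows if row.get(col) in (None, ""))
            for col in columns}


def nullity_matrix(rows, columns):
    """One line per row: '#' for a present cell, '.' for a missing one."""
    return ["".join("." if row.get(col) in (None, "") else "#" for col in columns)
            for row in rows]


columns = ["과국명", "과명", "이름", "학명"]
rows = [
    {"과국명": "가래나무과", "과명": "Juglandaceae", "이름": "굴피나무",
     "학명": "Platycarya strobilacea"},
    # Hypothetical row with a missing scientific name (not from the dataset).
    {"과국명": "가지과", "과명": "Solanaceae", "이름": "구기자", "학명": ""},
]
print(missing_by_column(rows, columns))  # {'과국명': 0, '과명': 0, '이름': 0, '학명': 1}
print("\n".join(nullity_matrix(rows, columns)))
```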

Sample

First rows
Index | 과국명 | 과명 | 이름 | 학명
0 | 가래나무과 | Juglandaceae | 굴피나무 | Platycarya strobilacea
1 | 가래나무과 | Juglandaceae | 굴피나무 | Platycarya strobilacea Siebold & Zucc
2 | 가래나무과 | Juglandaceae | 가래나무 | Juglans Mandshurica Maxim. Var. mandshurica for. Mandst
3 | 가래나무과 | Juglandaceae | 중국굴피 | Pterocarya stenoptera DC.
4 | 가래나무과 | Juglandaceae | 피칸 | Carya illinoensis
5 | 가래나무과 | Juglandaceae | 호두 | Ferocactus acanthodes Br. & R.
6 | 가래나무과 | Juglandaceae | 호두나무 | Juglans sinensis
7 | 가지과 | Solanaceae | 구기자 | Lycium chinense
8 | 가지과 | Solanaceae | 꽈리 | Physalis wrightii Gray
9 | 갈매나무과 | Rhamnaceae | 갈매나무 | Rhamnus davurica Pall.
Last rows
Index | 과국명 | 과명 | 이름 | 학명
1084 | 장미과 | Prunus mune | 미개홍 | Mikaikou
1085 | 장미과 | Prunus mune | 화좌논 | Hanazaronn
1086 | 장미과 | Prunus mune | 수심경 | Suishinkyou
1087 | 장미과 | Prunus mune | [?] | Okina
1088 | 장미과 | Prunus mune | 서지수(수양) | Akebono shidare
1089 | 장미과 | Prunus mune | 문조매화 | Bunchou
1090 | 장미과 | Prunus mune | 도조매화 | Miyako Dori
1091 | 장미과 | Prunus mune | 금매화 | Trollius ledebourii Rchb.
1092 | 장미과 | Prunus mune | 홍학매화 | BeniZuru
1093 | 장미과 | Prunus mune | 춘일야홍수양 | kasugano BeniShidare

Duplicate rows

Most frequently occurring

Index | 과국명 | 과명 | 이름 | 학명 | # duplicates
0 | 돌나물과 | Crassulaceae | 에케베리아 히아리나 | Echeveria hyalina Walther | 2
1 | 미나리아재비과 | Ranunculaceae | 큰꽃으아리 | Clematis patens C. Morren & Decne. | 2
2 | 보리수나무과 | Elaeagnaceae | 보리수나무 | Elaeagnus umbellata Thunb. | 2
3 | 선인장과 | Cactaceae | 무자단선 | Corynopuntia invicta (Brandegee) F.M. Knuth | 2
4 | 선인장과 | Cactaceae | 신천지 | Gymnocalycium saglione (Cels) Britton & Rose | 2
5 | 선인장과 | Cactaceae | 축옥 | Echinofossulocactus multicostatus (Hildm.) Britton & Rose | 2
6 | 용설란과 | Agavaceae | 아가베 레오폴디 | Agave 'leopoldii' | 2
7 | 용설란과 | Agavaceae | 아가베 파리 | Agave parryi Engelm. | 2
8 | 용설란과 | Agavaceae | 유카 | Yucca gloriosa L. | 2
9 | 용설란과 | Agavaceae | 희난설 | Agave parviflora Torr. | 2
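The rows above are the most frequently occurring duplicates, and each appears only twice. That means every duplicated row in this dataset occurs exactly twice. As a result, counting distinct duplicated rows and counting surplus copies give the same number here, 14, which is 14 / 1094 ≈ 1.3% of rows. The minimal stdlib Python sketch below computes both quantities. It uses two duplicated rows from the table plus one non-duplicated row from the sample. `duplicate_summary` is an illustrative helper, not ydata-profiling's implementation.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return ({row: count} for repeated rows, # distinct repeated rows, # surplus copies)."""
    counts = Counter(tuple(row) for row in rows)
    repeated = {row: c for row, c in counts.items() if c > 1}
    n_distinct_duplicated = len(repeated)
    n_surplus_copies = sum(c - 1 for c in repeated.values())
    return repeated, n_distinct_duplicated, n_surplus_copies


rows = [
    ("보리수나무과", "Elaeagnaceae", "보리수나무", "Elaeagnus umbellata Thunb."),
    ("보리수나무과", "Elaeagnaceae", "보리수나무", "Elaeagnus umbellata Thunb."),
    ("용설란과", "Agavaceae", "유카", "Yucca gloriosa L."),
    ("용설란과", "Agavaceae", "유카", "Yucca gloriosa L."),
    ("가지과", "Solanaceae", "꽈리", "Physalis wrightii Gray"),
]
repeated, n_distinct, n_surplus = duplicate_summary(rows)
print(n_distinct, n_surplus)      # 2 2
print(f"{100 * 14 / 1094:.1f}%")  # 1.3%
```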