Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
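
As a quick consistency check, the average record size should equal the total memory footprint divided by the number of observations. The sketch below only re-derives that figure from the numbers reported above; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
total_size_kib = 566.4     # "Total size in memory"
n_observations = 10_000    # "Number of observations"

# 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")  # prints "58.0 B"
```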

Variable types

Numeric: 2
Text: 3
Categorical: 1

Dataset

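Description: Dataset (CSV file) with information on 26,002 small-business establishments across 70 major commercial districts in Gyeongsangbuk-do (경상북도): store number, business name, commercial district code, commercial district name, city/county name, and address.
Author: 경상북도 (Gyeongsangbuk-do)
URL: https://www.data.go.kr/data/15096093/fileData.do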

Alerts

주요상권 코드 is highly overall correlated with 시군명 (High correlation)
시군명 is highly overall correlated with 주요상권 코드 (High correlation)
상가업소 번호 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 22:46:26.770247
Analysis finished: 2023-12-12 22:46:28.661264
Duration: 1.89 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상가업소 번호
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 102129.08
Minimum: 1
Maximum: 204345
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 10093.9
Q1: 50249.5
median: 102956.5
Q3: 153892.75
95-th percentile: 193600.1
Maximum: 204345
Range: 204344
Interquartile range (IQR): 103643.25
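
The range and IQR follow directly from the other quantile figures. The sketch below re-derives both and also shows one common way to compute a quantile, linear interpolation between order statistics. That the report used this interpolation rule is an assumption (the fractional percentiles on integer data are consistent with it), and the helper `quantile` and the toy list are illustrative only.

```python
def quantile(sorted_values, p):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = (len(sorted_values) - 1) * p
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

# Figures copied from the quantile statistics above.
minimum, maximum = 1, 204345
q1, q3 = 50249.5, 153892.75

value_range = maximum - minimum   # 204344
iqr = q3 - q1                     # 103643.25

# Toy data (the five smallest store numbers in the sample), for illustration only.
toy = [1, 8, 16, 34, 50]
toy_median = quantile(toy, 0.5)   # 16
print(value_range, iqr, toy_median)
```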

Descriptive statistics

Standard deviation: 58977.253
Coefficient of variation (CV): 0.5774776
Kurtosis: -1.2220361
Mean: 102129.08
Median Absolute Deviation (MAD): 51865
Skewness: -0.013191874
Sum: 1.0212908 × 10^9
Variance: 3.4783164 × 10^9
Monotonicity: Not monotonic
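
Several of these figures are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. A minimal sketch that checks them against the rounded values reported above (small differences are rounding error; the variable names are illustrative):

```python
# Figures copied from the report for 상가업소 번호.
mean = 102129.08
std = 58977.253
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance from the standard deviation
total = mean * n       # sum of all values

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```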
Histogram with fixed size bins (bins=50)
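
Fixed-size binning splits the value range into equal-width intervals. A minimal sketch, assuming equal-width bins over [minimum, maximum] with the maximum counted in the last bin; the helper name `fixed_width_bin` is hypothetical and not part of ydata-profiling.

```python
def fixed_width_bin(value, minimum, maximum, bins=50):
    """Return the 0-based index of the equal-width bin that contains value."""
    width = (maximum - minimum) / bins
    index = int((value - minimum) / width)
    # The maximum itself would land on index == bins; keep it in the last bin.
    return min(index, bins - 1)

# Minimum and maximum copied from the report for 상가업소 번호.
lo, hi = 1, 204345
print((hi - lo) / 50)                   # bin width
print(fixed_width_bin(142021, lo, hi))  # bin index of one sampled store number
```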
Value | Count | Frequency (%)
142021 | 1 | < 0.1%
5063 | 1 | < 0.1%
26934 | 1 | < 0.1%
21503 | 1 | < 0.1%
127501 | 1 | < 0.1%
44682 | 1 | < 0.1%
176041 | 1 | < 0.1%
58108 | 1 | < 0.1%
92898 | 1 | < 0.1%
136538 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
8 | 1 | < 0.1%
16 | 1 | < 0.1%
34 | 1 | < 0.1%
50 | 1 | < 0.1%
78 | 1 | < 0.1%
79 | 1 | < 0.1%
120 | 1 | < 0.1%
138 | 1 | < 0.1%
205 | 1 | < 0.1%

Value | Count | Frequency (%)
204345 | 1 | < 0.1%
204339 | 1 | < 0.1%
204249 | 1 | < 0.1%
204225 | 1 | < 0.1%
204197 | 1 | < 0.1%
204191 | 1 | < 0.1%
204185 | 1 | < 0.1%
204159 | 1 | < 0.1%
204130 | 1 | < 0.1%
204124 | 1 | < 0.1%

상호명
Text

Distinct: 7267
Distinct (%): 72.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 29
Median length: 27
Mean length: 6.1496
Min length: 2

Characters and Unicode

Total characters: 61496
Distinct characters: 851
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
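
The category labels used in this section correspond to Unicode general categories, for example Other Letter (Lo), Other Punctuation (Po), and Math Symbol (Sm). A minimal sketch using Python's standard `unicodedata` module to count categories over the five sample values shown in this report; this illustrates the idea and is not the profiler's own code.

```python
import unicodedata
from collections import Counter

# The five sample 상호명 values shown in this report (names are masked with '*').
samples = ["참조****", "인동*******", "설빙*****", "+구*", "골든*"]

# Count the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for category, count in category_counts.most_common():
    print(category, count)
```

Because the business names are masked, the asterisk dominates the punctuation counts, which is why Other Punctuation is the largest category for this variable.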

Unique

Unique: 5980
Unique (%): 59.8%
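
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique (as the figures here suggest, and as assumed in this sketch) counts values that occur exactly once. The helper `distinct_and_unique` and the toy list below are hypothetical; the mean length is re-derived from the reported character total.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy data for illustration only: B and C repeat, A and D occur once.
names = ["A", "B", "B", "C", "C", "C", "D"]
print(distinct_and_unique(names))  # (4, 2)

# Mean length = total characters / observations, using the reported figures.
mean_length = 61496 / 10_000
print(mean_length)
```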

Sample

1st row: 참조****
2nd row: 인동*******
3rd row: 설빙*****
4th row: +구*
5th row: 골든*
Value | Count | Frequency (%)
경북 | 89 | 0.9%
주식 | 87 | 0.9%
(value not shown) | 60 | 0.6%
제일 | 60 | 0.6%
경주 | 59 | 0.6%
중앙 | 57 | 0.6%
포항 | 53 | 0.5%
서울 | 49 | 0.5%
구미 | 49 | 0.5%
안동 | 48 | 0.5%
Other values (4616) | 9455 | 93.9%

Most occurring characters

ValueCountFrequency (%)
* 41455
67.4%
436
 
0.7%
417
 
0.7%
345
 
0.6%
331
 
0.5%
323
 
0.5%
276
 
0.4%
271
 
0.4%
255
 
0.4%
255
 
0.4%
Other values (841) 17132
27.9%

Most occurring categories

Value | Count | Frequency (%)
Other Punctuation | 41480 | 67.5%
Other Letter | 18962 | 30.8%
Uppercase Letter | 416 | 0.7%
Decimal Number | 249 | 0.4%
Open Punctuation | 131 | 0.2%
Close Punctuation | 96 | 0.2%
Lowercase Letter | 86 | 0.1%
Space Separator | 66 | 0.1%
Dash Punctuation | 8 | < 0.1%
Math Symbol | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
436
 
2.3%
417
 
2.2%
345
 
1.8%
331
 
1.7%
323
 
1.7%
276
 
1.5%
271
 
1.4%
255
 
1.3%
255
 
1.3%
250
 
1.3%
Other values (773) 15803
83.3%
Uppercase Letter
ValueCountFrequency (%)
S 50
 
12.0%
K 40
 
9.6%
B 38
 
9.1%
G 32
 
7.7%
T 25
 
6.0%
O 24
 
5.8%
M 21
 
5.0%
H 21
 
5.0%
C 20
 
4.8%
J 17
 
4.1%
Other values (16) 128
30.8%
Lowercase Letter
ValueCountFrequency (%)
e 13
15.1%
a 12
14.0%
o 12
14.0%
i 7
8.1%
n 7
8.1%
l 6
7.0%
h 6
7.0%
t 5
 
5.8%
s 5
 
5.8%
k 4
 
4.7%
Other values (7) 9
10.5%
Decimal Number
ValueCountFrequency (%)
8 63
25.3%
1 43
17.3%
5 33
13.3%
6 26
10.4%
2 21
 
8.4%
9 17
 
6.8%
0 17
 
6.8%
3 15
 
6.0%
7 7
 
2.8%
4 7
 
2.8%
Other Punctuation
ValueCountFrequency (%)
* 41455
99.9%
. 13
 
< 0.1%
& 5
 
< 0.1%
# 2
 
< 0.1%
! 2
 
< 0.1%
: 1
 
< 0.1%
· 1
 
< 0.1%
% 1
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
( 122
93.1%
9
 
6.9%
Math Symbol
ValueCountFrequency (%)
+ 1
50.0%
~ 1
50.0%
Close Punctuation
ValueCountFrequency (%)
) 96
100.0%
Space Separator
ValueCountFrequency (%)
66
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 8
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 42032 | 68.3%
Hangul | 18960 | 30.8%
Latin | 502 | 0.8%
Han | 2 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
436
 
2.3%
417
 
2.2%
345
 
1.8%
331
 
1.7%
323
 
1.7%
276
 
1.5%
271
 
1.4%
255
 
1.3%
255
 
1.3%
250
 
1.3%
Other values (771) 15801
83.3%
Latin
ValueCountFrequency (%)
S 50
 
10.0%
K 40
 
8.0%
B 38
 
7.6%
G 32
 
6.4%
T 25
 
5.0%
O 24
 
4.8%
M 21
 
4.2%
H 21
 
4.2%
C 20
 
4.0%
J 17
 
3.4%
Other values (33) 214
42.6%
Common
ValueCountFrequency (%)
* 41455
98.6%
( 122
 
0.3%
) 96
 
0.2%
66
 
0.2%
8 63
 
0.1%
1 43
 
0.1%
5 33
 
0.1%
6 26
 
0.1%
2 21
 
< 0.1%
9 17
 
< 0.1%
Other values (15) 90
 
0.2%
Han
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 42524 | 69.1%
Hangul | 18958 | 30.8%
None | 10 | < 0.1%
Compat Jamo | 2 | < 0.1%
CJK | 2 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
* 41455
97.5%
( 122
 
0.3%
) 96
 
0.2%
66
 
0.2%
8 63
 
0.1%
S 50
 
0.1%
1 43
 
0.1%
K 40
 
0.1%
B 38
 
0.1%
5 33
 
0.1%
Other values (56) 518
 
1.2%
Hangul
ValueCountFrequency (%)
436
 
2.3%
417
 
2.2%
345
 
1.8%
331
 
1.7%
323
 
1.7%
276
 
1.5%
271
 
1.4%
255
 
1.3%
255
 
1.3%
250
 
1.3%
Other values (770) 15799
83.3%
None
ValueCountFrequency (%)
9
90.0%
· 1
 
10.0%
Compat Jamo
ValueCountFrequency (%)
2
100.0%
CJK
ValueCountFrequency (%)
1
50.0%
1
50.0%

주요상권 코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 70
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9732.4462
Minimum: 9649
Maximum: 10367
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 9649
5-th percentile: 9651
Q1: 9663
median: 9683
Q3: 9701
95-th percentile: 10349
Maximum: 10367
Range: 718
Interquartile range (IQR): 38

Descriptive statistics

Standard deviation: 181.9773
Coefficient of variation (CV): 0.018698002
Kurtosis: 7.4908382
Mean: 9732.4462
Median Absolute Deviation (MAD): 20
Skewness: 3.0581103
Sum: 97324462
Variance: 33115.738
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
9691 | 648 | 6.5%
9704 | 630 | 6.3%
9662 | 422 | 4.2%
9678 | 407 | 4.1%
9651 | 406 | 4.1%
9650 | 352 | 3.5%
9692 | 338 | 3.4%
9663 | 332 | 3.3%
9709 | 308 | 3.1%
9657 | 251 | 2.5%
Other values (60) | 5906 | 59.1%

Value | Count | Frequency (%)
9649 | 99 | 1.0%
9650 | 352 | 3.5%
9651 | 406 | 4.1%
9652 | 96 | 1.0%
9653 | 78 | 0.8%
9654 | 114 | 1.1%
9655 | 43 | 0.4%
9656 | 145 | 1.5%
9657 | 251 | 2.5%
9658 | 99 | 1.0%

Value | Count | Frequency (%)
10367 | 55 | 0.5%
10361 | 118 | 1.2%
10352 | 105 | 1.1%
10351 | 100 | 1.0%
10350 | 121 | 1.2%
10349 | 99 | 1.0%
10333 | 196 | 2.0%
9711 | 39 | 0.4%
9710 | 99 | 1.0%
9709 | 308 | 3.1%

주요 상권명
Text

Distinct: 70
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 8
Mean length: 4.2138
Min length: 3

Characters and Unicode

Total characters: 42138
Distinct characters: 91
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 옥동_2
2nd row: 인동동
3rd row: 김천구미역
4th row: 선산읍
5th row: 안동시청_2
Value | Count | Frequency (%)
죽도시장 | 648 | 6.3%
경주역 | 630 | 6.1%
구미역_1 | 422 | 4.1%
동문동 | 407 | 3.9%
안동시청_2 | 406 | 3.9%
안동시청_1 | 352 | 3.4%
중앙상가 | 338 | 3.3%
구미역_2 | 332 | 3.2%
영천역 | 308 | 3.0%
하망동 | 251 | 2.4%
Other values (62) | 6247 | 60.4%

Most occurring characters

ValueCountFrequency (%)
4365
 
10.4%
_ 2529
 
6.0%
2472
 
5.9%
2196
 
5.2%
1 1654
 
3.9%
1439
 
3.4%
1319
 
3.1%
1208
 
2.9%
1171
 
2.8%
2 1158
 
2.7%
Other values (81) 22627
53.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 36368 | 86.3%
Decimal Number | 2900 | 6.9%
Connector Punctuation | 2529 | 6.0%
Space Separator | 341 | 0.8%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4365
 
12.0%
2472
 
6.8%
2196
 
6.0%
1439
 
4.0%
1319
 
3.6%
1208
 
3.3%
1171
 
3.2%
982
 
2.7%
975
 
2.7%
953
 
2.6%
Other values (76) 19288
53.0%
Decimal Number
ValueCountFrequency (%)
1 1654
57.0%
2 1158
39.9%
3 88
 
3.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2529
100.0%
Space Separator
ValueCountFrequency (%)
341
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 36368 | 86.3%
Common | 5770 | 13.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4365
 
12.0%
2472
 
6.8%
2196
 
6.0%
1439
 
4.0%
1319
 
3.6%
1208
 
3.3%
1171
 
3.2%
982
 
2.7%
975
 
2.7%
953
 
2.6%
Other values (76) 19288
53.0%
Common
ValueCountFrequency (%)
_ 2529
43.8%
1 1654
28.7%
2 1158
20.1%
341
 
5.9%
3 88
 
1.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 36368 | 86.3%
ASCII | 5770 | 13.7%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4365
 
12.0%
2472
 
6.8%
2196
 
6.0%
1439
 
4.0%
1319
 
3.6%
1208
 
3.3%
1171
 
3.2%
982
 
2.7%
975
 
2.7%
953
 
2.6%
Other values (76) 19288
53.0%
ASCII
ValueCountFrequency (%)
_ 2529
43.8%
1 1654
28.7%
2 1158
20.1%
341
 
5.9%
3 88
 
1.5%

시군명
Categorical

HIGH CORRELATION 

Distinct: 22
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
구미시: 2104
포항시 북구: 1292
경주시: 1181
안동시: 932
경산시: 700
Other values (17): 3791

Length

Max length: 6
Median length: 3
Mean length: 3.5907
Min length: 3
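
The mean length can be cross-checked against the value counts listed under Common Values. The sketch below assumes that the two 포항시 districts (six characters each, including the space) are the only values longer than three characters; that assumption is consistent with the reported minimum and maximum lengths but is not stated in the report.

```python
# Counts copied from the Common Values table for 시군명.
six_char_rows = 1292 + 677             # 포항시 북구 + 포항시 남구
n_observations = 10_000
three_char_rows = n_observations - six_char_rows   # assumed: all other names have 3 characters

assert len("포항시 북구") == 6 and len("구미시") == 3

mean_length = (6 * six_char_rows + 3 * three_char_rows) / n_observations
print(mean_length)
```

Under that assumption the result agrees with the reported mean length of 3.5907.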

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 안동시
2nd row: 구미시
3rd row: 김천시
4th row: 구미시
5th row: 안동시

Common Values

Value | Count | Frequency (%)
구미시 | 2104 | 21.0%
포항시 북구 | 1292 | 12.9%
경주시 | 1181 | 11.8%
안동시 | 932 | 9.3%
경산시 | 700 | 7.0%
포항시 남구 | 677 | 6.8%
영주시 | 439 | 4.4%
상주시 | 407 | 4.1%
김천시 | 386 | 3.9%
영천시 | 308 | 3.1%
Other values (12) | 1574 | 15.7%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
구미시 | 2104 | 17.6%
포항시 | 1969 | 16.5%
북구 | 1292 | 10.8%
경주시 | 1181 | 9.9%
안동시 | 932 | 7.8%
경산시 | 700 | 5.8%
남구 | 677 | 5.7%
영주시 | 439 | 3.7%
상주시 | 407 | 3.4%
김천시 | 386 | 3.2%
Other values (13) | 1882 | 15.7%

도로명 주소
Text

Distinct: 6583
Distinct (%): 65.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 12.8311
Min length: 9

Characters and Unicode

Total characters: 128311
Distinct characters: 178
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4851
Unique (%): 48.5%

Sample

1st row: 안동시 옥동3길 41-9
2nd row: 구미시 수출대로 502
3rd row: 김천시 혁신3로 35
4th row: 구미시 단계동길 24
5th row: 안동시 영가로 19
Value | Count | Frequency (%)
구미시 | 2104 | 6.6%
포항시 | 1969 | 6.2%
북구 | 1292 | 4.0%
경주시 | 1181 | 3.7%
안동시 | 932 | 2.9%
경산시 | 700 | 2.2%
남구 | 677 | 2.1%
중앙로 | 459 | 1.4%
영주시 | 439 | 1.4%
상주시 | 407 | 1.3%
Other values (2235) | 21809 | 68.2%

Most occurring characters

ValueCountFrequency (%)
21969
 
17.1%
9992
 
7.8%
1 7945
 
6.2%
7012
 
5.5%
5308
 
4.1%
4738
 
3.7%
2 4655
 
3.6%
3 3568
 
2.8%
- 2899
 
2.3%
4 2791
 
2.2%
Other values (168) 57434
44.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 72423 | 56.4%
Decimal Number | 31020 | 24.2%
Space Separator | 21969 | 17.1%
Dash Punctuation | 2899 | 2.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9992
 
13.8%
7012
 
9.7%
5308
 
7.3%
4738
 
6.5%
2626
 
3.6%
2349
 
3.2%
2340
 
3.2%
2101
 
2.9%
1996
 
2.8%
1972
 
2.7%
Other values (156) 31989
44.2%
Decimal Number
ValueCountFrequency (%)
1 7945
25.6%
2 4655
15.0%
3 3568
11.5%
4 2791
 
9.0%
5 2438
 
7.9%
6 2244
 
7.2%
7 2003
 
6.5%
9 1806
 
5.8%
8 1788
 
5.8%
0 1782
 
5.7%
Space Separator
ValueCountFrequency (%)
21969
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2899
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 72423 | 56.4%
Common | 55888 | 43.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9992
 
13.8%
7012
 
9.7%
5308
 
7.3%
4738
 
6.5%
2626
 
3.6%
2349
 
3.2%
2340
 
3.2%
2101
 
2.9%
1996
 
2.8%
1972
 
2.7%
Other values (156) 31989
44.2%
Common
ValueCountFrequency (%)
21969
39.3%
1 7945
 
14.2%
2 4655
 
8.3%
3 3568
 
6.4%
- 2899
 
5.2%
4 2791
 
5.0%
5 2438
 
4.4%
6 2244
 
4.0%
7 2003
 
3.6%
9 1806
 
3.2%
Other values (2) 3570
 
6.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 72423 | 56.4%
ASCII | 55888 | 43.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
21969
39.3%
1 7945
 
14.2%
2 4655
 
8.3%
3 3568
 
6.4%
- 2899
 
5.2%
4 2791
 
5.0%
5 2438
 
4.4%
6 2244
 
4.0%
7 2003
 
3.6%
9 1806
 
3.2%
Other values (2) 3570
 
6.4%
Hangul
ValueCountFrequency (%)
9992
 
13.8%
7012
 
9.7%
5308
 
7.3%
4738
 
6.5%
2626
 
3.6%
2349
 
3.2%
2340
 
3.2%
2101
 
2.9%
1996
 
2.8%
1972
 
2.7%
Other values (156) 31989
44.2%

Interactions

[Figure: pairwise interaction plots of the numeric variables (4 panels)]

Correlations

 | 상가업소 번호 | 주요상권 코드 | 주요 상권명 | 시군명
상가업소 번호 | 1.000 | 0.000 | 0.105 | 0.059
주요상권 코드 | 0.000 | 1.000 | 1.000 | 0.668
주요 상권명 | 0.105 | 1.000 | 1.000 | 1.000
시군명 | 0.059 | 0.668 | 1.000 | 1.000
 | 상가업소 번호 | 주요상권 코드 | 시군명
상가업소 번호 | 1.000 | -0.011 | 0.022
주요상권 코드 | -0.011 | 1.000 | 0.596
시군명 | 0.022 | 0.596 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.

Sample

(index) | 상가업소 번호 | 상호명 | 주요상권 코드 | 주요 상권명 | 시군명 | 도로명 주소
17893 | 142021 | 참조**** | 9653 | 옥동_2 | 안동시 | 안동시 옥동3길 41-9
21903 | 171994 | 인동******* | 9671 | 인동동 | 구미시 | 구미시 수출대로 502
16002 | 126512 | 설빙***** | 10349 | 김천구미역 | 김천시 | 김천시 혁신3로 35
4529 | 35650 | +구* | 9665 | 선산읍 | 구미시 | 구미시 단계동길 24
13199 | 104020 | 골든* | 9651 | 안동시청_2 | 안동시 | 안동시 영가로 19
20719 | 163260 | 밀라* | 9678 | 동문동 | 상주시 | 상주시 남성로 82-15
18808 | 149106 | 서울** | 9678 | 동문동 | 상주시 | 상주시 중앙시장길 15-14
14946 | 117894 | 한국********* | 9651 | 안동시청_2 | 안동시 | 안동시 태사1길 2
3713 | 29204 | 화신** | 9690 | 죽도동 | 포항시 남구 | 포항시 남구 상대로 101
5384 | 42622 | 신평**** | 9668 | 신평1동_1 | 구미시 | 구미시 신비로4길 4

(index) | 상가업소 번호 | 상호명 | 주요상권 코드 | 주요 상권명 | 시군명 | 도로명 주소
13169 | 103821 | 뉴하*********** | 9701 | 진량읍 | 경산시 | 경산시 공단1로1길 4
15080 | 119034 | 씨유****** | 9657 | 하망동 | 영주시 | 영주시 원당로 79
4603 | 36134 | (주)리더** | 9672 | 임수동 공구상가 | 구미시 | 구미시 3공단1로 302-7
24557 | 192487 | 광장******** | 9692 | 중앙상가 | 포항시 북구 | 포항시 북구 불종로 43
6073 | 47892 | 대경******** | 9701 | 진량읍 | 경산시 | 경산시 공단1로1길 47
25004 | 195718 | e모* | 9659 | 의성역 | 의성군 | 의성군 문소3길 78
1148 | 8829 | 엘지********* | 9675 | 군위군법원 | 군위군 | 군위군 중앙길 95-1
8502 | 66801 | 어탕** | 9659 | 의성역 | 의성군 | 의성군 문소3길 36-3
25891 | 203403 | 한영***** | 9660 | 고령시외버스터미널 | 고령군 | 고령군 시장1길 18
7349 | 58109 | 김지**** | 9662 | 구미역_1 | 구미시 | 구미시 구미중앙로9길 14-2