Overview

Dataset statistics

Number of variables: 6
Number of observations: 144
Missing cells: 118
Missing cells (%): 13.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 KiB
Average record size in memory: 51.9 B
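
The percentage and per-record figures follow from the raw counts. A minimal sketch, using only the numbers printed above (the variable names are illustrative, and 7.3 KiB is itself a rounded value, so this is a plausibility check rather than an exact derivation):

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 6
n_observations = 144
missing_cells = 118
total_size_kib = 7.3

total_cells = n_variables * n_observations        # 864 cells in total
missing_pct = 100 * missing_cells / total_cells   # share of empty cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report (13.7% and 51.9 B).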

Variable types

Numeric: 3
Categorical: 1
Text: 2

Dataset

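Description: Number of companies and investment funds in eco-friendly sectors such as new materials and water quality management, across 26 regions of China (region name, number of items, main item names, investment amount, etc.)
Author: Korea Environmental Industry and Technology Institute (한국환경산업기술원)
URL: https://www.data.go.kr/data/15068052/fileData.do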

Alerts

연번 (serial number) is highly overall correlated with 지역 (region): High correlation
항목 수(친환경) (number of items, eco-friendly) is highly overall correlated with 항목 수(전체) (number of items, total) and 1 other field: High correlation
항목 수(전체) is highly overall correlated with 항목 수(친환경) and 1 other field: High correlation
지역 is highly overall correlated with 연번 and 2 other fields: High correlation
비고 (remarks) has 118 (81.9%) missing values: Missing

Reproduction

Analysis started: 2023-12-12 16:02:00.493860
Analysis finished: 2023-12-12 16:02:01.949722
Duration: 1.46 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION

Distinct: 26
Distinct (%): 18.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.451389
Minimum: 1
Maximum: 26
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 7
Median: 15
Q3: 21
95-th percentile: 25
Maximum: 26
Range: 25
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 7.5704891
Coefficient of variation (CV): 0.52385893
Kurtosis: -1.3008878
Mean: 14.451389
Median Absolute Deviation (MAD): 7
Skewness: -0.11778573
Sum: 2081
Variance: 57.312306
Monotonicity: Increasing
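
Several of these figures are tied together by simple identities, which gives a quick sanity check on a profile. The sketch below re-derives them from the values reported for 연번. The row count of 144 comes from the dataset statistics, and the tolerance is an arbitrary choice that allows for rounding in the printed values.

```python
import math

# Figures copied from the report for the 연번 column.
n = 144
total, mean = 2081, 14.451389
std, variance, cv = 7.5704891, 57.312306, 0.52385893
minimum, q1, q3, maximum = 1, 7, 21, 26

checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "CV = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "range = max - min": maximum - minimum == 25,
    "IQR = Q3 - Q1": q3 - q1 == 14,
}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

The same checks can be repeated for the other numeric columns by swapping in their reported values.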
Histogram with fixed size bins (bins=26)

Common values

Value | Count | Frequency (%)
7 | 9 | 6.2%
24 | 9 | 6.2%
25 | 8 | 5.6%
20 | 7 | 4.9%
19 | 7 | 4.9%
16 | 7 | 4.9%
23 | 7 | 4.9%
17 | 6 | 4.2%
22 | 6 | 4.2%
10 | 6 | 4.2%
Other values (16) | 72 | 50.0%

Minimum 10 values

Value | Count | Frequency (%)
1 | 4 | 2.8%
2 | 3 | 2.1%
3 | 4 | 2.8%
4 | 5 | 3.5%
5 | 6 | 4.2%
6 | 6 | 4.2%
7 | 9 | 6.2%
8 | 6 | 4.2%
9 | 4 | 2.8%
10 | 6 | 4.2%

Maximum 10 values

Value | Count | Frequency (%)
26 | 5 | 3.5%
25 | 8 | 5.6%
24 | 9 | 6.2%
23 | 7 | 4.9%
22 | 6 | 4.2%
21 | 5 | 3.5%
20 | 7 | 4.9%
19 | 7 | 4.9%
18 | 4 | 2.8%
17 | 6 | 4.2%

지역 (region)
Categorical

HIGH CORRELATION

Distinct: 26
Distinct (%): 18.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Value | Count
광동성 | 9
췐저우시 | 9
간쑤성 | 8
쟝먼시 | 7
샨시성 | 7
Other values (21) | 104

Length

Max length: 5
Median length: 3
Mean length: 3.4027778
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%
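
In this kind of report, Distinct appears to count the different values present, while Unique counts the values that occur exactly once, which is why a column with 26 distinct regions can still have 0 unique ones. A minimal sketch of that reading on a toy column (the helper name and the toy data are illustrative and not taken from the profiling library):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) counts, ignoring missing values (None)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)                              # different values seen
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, unique

# Toy column, not the real data: every region repeats, so nothing is unique.
regions = ["허베이성"] * 4 + ["즈보시"] * 3 + ["난징시"] * 3
print(distinct_and_unique(regions))  # (3, 0)
```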

Sample

1st row: 허베이성
2nd row: 허베이성
3rd row: 허베이성
4th row: 허베이성
5th row: 즈보시

Common Values

Value | Count | Frequency (%)
광동성 | 9 | 6.2%
췐저우시 | 9 | 6.2%
간쑤성 | 8 | 5.6%
쟝먼시 | 7 | 4.9%
샨시성 | 7 | 4.9%
구이저우성 | 7 | 4.9%
후난성 | 7 | 4.9%
스자좡시 | 6 | 4.2%
장쑤성 | 6 | 4.2%
상하이시 | 6 | 4.2%
Other values (16) | 72 | 50.0%

Length

Histogram of lengths of the category

항목 수(친환경) (number of items, eco-friendly)
Real number (ℝ)

HIGH CORRELATION

Distinct: 20
Distinct (%): 13.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.125
Minimum: 3
Maximum: 98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 5
Q1: 13
Median: 20.5
Q3: 32
95-th percentile: 98
Maximum: 98
Range: 95
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 25.198478
Coefficient of variation (CV): 0.89594588
Kurtosis: 2.596846
Mean: 28.125
Median Absolute Deviation (MAD): 7.5
Skewness: 1.8549935
Sum: 4050
Variance: 634.96329
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)

Common values

Value | Count | Frequency (%)
13 | 17 | 11.8%
5 | 14 | 9.7%
25 | 12 | 8.3%
20 | 11 | 7.6%
27 | 9 | 6.2%
98 | 9 | 6.2%
38 | 8 | 5.6%
32 | 7 | 4.9%
88 | 7 | 4.9%
26 | 6 | 4.2%
Other values (10) | 44 | 30.6%

Minimum 10 values

Value | Count | Frequency (%)
3 | 2 | 1.4%
4 | 5 | 3.5%
5 | 14 | 9.7%
9 | 4 | 2.8%
12 | 3 | 2.1%
13 | 17 | 11.8%
17 | 4 | 2.8%
18 | 6 | 4.2%
19 | 6 | 4.2%
20 | 11 | 7.6%

Maximum 10 values

Value | Count | Frequency (%)
98 | 9 | 6.2%
88 | 7 | 4.9%
38 | 8 | 5.6%
36 | 4 | 2.8%
33 | 6 | 4.2%
32 | 7 | 4.9%
27 | 9 | 6.2%
26 | 6 | 4.2%
25 | 12 | 8.3%
21 | 4 | 2.8%

항목 수(전체) (number of items, total)
Real number (ℝ)

HIGH CORRELATION

Distinct: 25
Distinct (%): 17.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 596.05556
Minimum: 80
Maximum: 3357
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 80
5-th percentile: 89
Q1: 240
Median: 346
Q3: 656
95-th percentile: 1230
Maximum: 3357
Range: 3277
Interquartile range (IQR): 416

Descriptive statistics

Standard deviation: 705.68311
Coefficient of variation (CV): 1.1839217
Kurtosis: 9.017434
Mean: 596.05556
Median Absolute Deviation (MAD): 194
Skewness: 2.9433771
Sum: 85832
Variance: 497988.65
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=25)

Common values

Value | Count | Frequency (%)
656 | 9 | 6.2%
300 | 9 | 6.2%
1230 | 9 | 6.2%
89 | 8 | 5.6%
3357 | 7 | 4.9%
105 | 7 | 4.9%
469 | 7 | 4.9%
600 | 7 | 4.9%
310 | 6 | 4.2%
924 | 6 | 4.2%
Other values (15) | 69 | 47.9%

Minimum 10 values

Value | Count | Frequency (%)
80 | 5 | 3.5%
89 | 8 | 5.6%
105 | 7 | 4.9%
144 | 4 | 2.8%
152 | 6 | 4.2%
157 | 2 | 1.4%
240 | 6 | 4.2%
265 | 3 | 2.1%
300 | 9 | 6.2%
310 | 6 | 4.2%

Maximum 10 values

Value | Count | Frequency (%)
3357 | 7 | 4.9%
1230 | 9 | 6.2%
1132 | 4 | 2.8%
1000 | 4 | 2.8%
924 | 6 | 4.2%
656 | 9 | 6.2%
600 | 7 | 4.9%
536 | 4 | 2.8%
487 | 5 | 3.5%
469 | 7 | 4.9%

주요 항목 (main items)
Text

Distinct: 127
Distinct (%): 88.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB

Length

Max length: 25
Median length: 20
Mean length: 12.493056
Min length: 6

Characters and Unicode

Total characters: 1799
Distinct characters: 172
Distinct categories: 6
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
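
As a rough illustration of how such category counts can be obtained, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The input string is one value from the sample rows. The standard library exposes general categories only, so the script and block tables would need a different data source.

```python
import unicodedata
from collections import Counter

# One value from the report's sample rows; any string works here.
text = "LNG저장시설 건설"

# unicodedata.category returns a two-letter general category code,
# e.g. "Lo" (Other Letter), "Lu" (Uppercase Letter), "Zs" (Space Separator).
category_counts = Counter(unicodedata.category(ch) for ch in text)
print(category_counts)
```

Hangul syllables fall under "Lo", which matches the dominance of Other Letter in the category table for this column.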

Unique

Unique: 119
Unique (%): 82.6%

Sample

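1st row: 신에너지·신소재석유화학 개발 [new energy and new-material petrochemical development]
2nd row: 오수처리시설 건설 [construction of sewage treatment facilities]
3rd row: 징진지 친환경 물류단지 건설 [construction of a Jing-Jin-Ji eco-friendly logistics complex]
4th row: 스마트 친환경 제조단지 건설 등 [construction of smart eco-friendly manufacturing complexes, etc.]
5th row: 산업단지 및 경제개발구 녹화 [greening of industrial complexes and economic development zones]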

Common words

Value | Count | Frequency (%)
건설 | 79 | 17.9%
(unrecoverable) | 26 | 5.9%
(unrecoverable) | 22 | 5.0%
생활쓰레기 | 12 | 2.7%
개발 | 9 | 2.0%
정비 | 8 | 1.8%
소각발전소 | 8 | 1.8%
처리시설 | 6 | 1.4%
친환경 | 6 | 1.4%
고체폐기물 | 6 | 1.4%
Other values (182) | 259 | 58.7%

Most occurring characters

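Value | Count | Frequency (%)
(space) | 297 | 16.5%
(unrecoverable) | 131 | 7.3%
(unrecoverable) | 82 | 4.6%
(unrecoverable) | 52 | 2.9%
(unrecoverable) | 50 | 2.8%
(unrecoverable) | 47 | 2.6%
(unrecoverable) | 45 | 2.5%
(unrecoverable) | 32 | 1.8%
(unrecoverable) | 32 | 1.8%
(unrecoverable) | 31 | 1.7%
Other values (162) | 1000 | 55.6%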

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1477 | 82.1%
Space Separator | 297 | 16.5%
Other Punctuation | 12 | 0.7%
Uppercase Letter | 9 | 0.5%
Open Punctuation | 2 | 0.1%
Close Punctuation | 2 | 0.1%

Most frequent character per category

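Other Letter

Value | Count | Frequency (%)
(unrecoverable) | 131 | 8.9%
(unrecoverable) | 82 | 5.6%
(unrecoverable) | 52 | 3.5%
(unrecoverable) | 50 | 3.4%
(unrecoverable) | 47 | 3.2%
(unrecoverable) | 45 | 3.0%
(unrecoverable) | 32 | 2.2%
(unrecoverable) | 32 | 2.2%
(unrecoverable) | 31 | 2.1%
(unrecoverable) | 29 | 2.0%
Other values (151) | 946 | 64.0%

Uppercase Letter

Value | Count | Frequency (%)
G | 2 | 22.2%
N | 2 | 22.2%
L | 2 | 22.2%
P | 1 | 11.1%
M | 1 | 11.1%
A | 1 | 11.1%

Other Punctuation

Value | Count | Frequency (%)
· | 10 | 83.3%
, | 2 | 16.7%

Space Separator

Value | Count | Frequency (%)
(space) | 297 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 2 | 100.0%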

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1473 | 81.9%
Common | 313 | 17.4%
Latin | 9 | 0.5%
Han | 4 | 0.2%

Most frequent character per script

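Hangul

Value | Count | Frequency (%)
(unrecoverable) | 131 | 8.9%
(unrecoverable) | 82 | 5.6%
(unrecoverable) | 52 | 3.5%
(unrecoverable) | 50 | 3.4%
(unrecoverable) | 47 | 3.2%
(unrecoverable) | 45 | 3.1%
(unrecoverable) | 32 | 2.2%
(unrecoverable) | 32 | 2.2%
(unrecoverable) | 31 | 2.1%
(unrecoverable) | 29 | 2.0%
Other values (149) | 942 | 64.0%

Latin

Value | Count | Frequency (%)
G | 2 | 22.2%
N | 2 | 22.2%
L | 2 | 22.2%
P | 1 | 11.1%
M | 1 | 11.1%
A | 1 | 11.1%

Common

Value | Count | Frequency (%)
(space) | 297 | 94.9%
· | 10 | 3.2%
( | 2 | 0.6%
) | 2 | 0.6%
, | 2 | 0.6%

Han

Value | Count | Frequency (%)
(unrecoverable) | 2 | 50.0%
(unrecoverable) | 2 | 50.0%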

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1473 | 81.9%
ASCII | 312 | 17.3%
None | 10 | 0.6%
CJK | 4 | 0.2%

Most frequent character per block

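ASCII

Value | Count | Frequency (%)
(space) | 297 | 95.2%
( | 2 | 0.6%
) | 2 | 0.6%
G | 2 | 0.6%
, | 2 | 0.6%
N | 2 | 0.6%
L | 2 | 0.6%
P | 1 | 0.3%
M | 1 | 0.3%
A | 1 | 0.3%

Hangul

Value | Count | Frequency (%)
(unrecoverable) | 131 | 8.9%
(unrecoverable) | 82 | 5.6%
(unrecoverable) | 52 | 3.5%
(unrecoverable) | 50 | 3.4%
(unrecoverable) | 47 | 3.2%
(unrecoverable) | 45 | 3.1%
(unrecoverable) | 32 | 2.2%
(unrecoverable) | 32 | 2.2%
(unrecoverable) | 31 | 2.1%
(unrecoverable) | 29 | 2.0%
Other values (149) | 942 | 64.0%

None

Value | Count | Frequency (%)
· | 10 | 100.0%

CJK

Value | Count | Frequency (%)
(unrecoverable) | 2 | 50.0%
(unrecoverable) | 2 | 50.0%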

비고 (remarks)
Text

MISSING

Distinct: 23
Distinct (%): 88.5%
Missing: 118
Missing (%): 81.9%
Memory size: 1.3 KiB

Length

Max length: 81
Median length: 31
Mean length: 28.461538
Min length: 18

Characters and Unicode

Total characters: 740
Distinct characters: 101
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 22
Unique (%): 84.6%

Sample

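1st row: 총투자액 1조 8833.1억 위안 (한화 약 322.4조 원) [total investment of 1.88331 trillion yuan (about KRW 322.4 trillion)]
2nd row: 265개 항목 중 건설항목 245개 [245 of the 265 items are construction items]
3rd row: 총투자액 1,165.3억 위안 (한화 약 20조 원) [total investment of 116.53 billion yuan (about KRW 20 trillion)]
4th row: 지난시 신구동력전환구를 통한 빠른신인프라 건설 진행(프로젝트 승인,입지계획,친환경계획,입항,환경 영향평가 등 심사 비준을 마치고 융자채널을 넓힘) [rapid construction of new infrastructure under way through Jinan's old-to-new growth driver conversion zone (reviews and approvals covering project approval, site planning, eco-friendly planning, port entry, environmental impact assessment, etc. completed, and financing channels widened)]
5th row: 총투자액 6,416.3억 위안 (한화 약 110조 원) [total investment of 641.63 billion yuan (about KRW 110 trillion)]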
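
Because the investment totals are embedded in free text, they need parsing before any numeric analysis. The following is a minimal sketch, assuming notes follow the "총투자액 … 위안" pattern seen in the sample rows. The function name, the regular expressions, and the choice of `Decimal` are illustrative and not part of the dataset or of ydata-profiling.

```python
import re
from decimal import Decimal

# Korean number units used in the notes: 조 = 10**12, 억 = 10**8.
UNITS = {"조": Decimal(10) ** 12, "억": Decimal(10) ** 8}
AMOUNT = re.compile(r"총투자액\s+(.+?)\s*위안")          # text between label and "yuan"
PART = re.compile(r"([\d,]+(?:\.\d+)?)\s*(조|억)")      # number followed by a unit

def total_investment_yuan(note):
    """Return the yuan amount stated in a 비고 note, or None if absent."""
    match = AMOUNT.search(note)
    if match is None:
        return None
    parts = PART.findall(match.group(1))
    if not parts:
        return None
    # Strip thousands separators, scale each part by its unit, and add up.
    return sum(Decimal(num.replace(",", "")) * UNITS[unit] for num, unit in parts)

print(total_investment_yuan("총투자액 1조 8833.1억 위안 (한화 약 322.4조 원)"))
print(total_investment_yuan("265개 항목 중 건설항목 245개"))
```

Notes that do not state a total, such as the second sample row, yield `None` and can be left as missing.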

Common words

Value | Count | Frequency (%)
총투자액 | 18 | 9.9%
(unrecoverable) | 18 | 9.9%
(unrecoverable) | 18 | 9.9%
한화 | 18 | 9.9%
위안 | 18 | 9.9%
1조 | 5 | 2.8%
관련 | 4 | 2.2%
환경보호 | 4 | 2.2%
적음 | 4 | 2.2%
프로젝트가 | 4 | 2.2%
Other values (67) | 70 | 38.7%

Most occurring characters

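Value | Count | Frequency (%)
(space) | 155 | 20.9%
1 | 26 | 3.5%
(unrecoverable) | 26 | 3.5%
2 | 20 | 2.7%
(unrecoverable) | 19 | 2.6%
(unrecoverable) | 19 | 2.6%
, | 19 | 2.6%
) | 19 | 2.6%
3 | 19 | 2.6%
( | 19 | 2.6%
Other values (91) | 399 | 53.9%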

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 388 | 52.4%
Space Separator | 155 | 20.9%
Decimal Number | 131 | 17.7%
Other Punctuation | 28 | 3.8%
Close Punctuation | 19 | 2.6%
Open Punctuation | 19 | 2.6%

Most frequent character per category

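Other Letter

Value | Count | Frequency (%)
(unrecoverable) | 26 | 6.7%
(unrecoverable) | 19 | 4.9%
(unrecoverable) | 19 | 4.9%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
Other values (76) | 198 | 51.0%

Decimal Number

Value | Count | Frequency (%)
1 | 26 | 19.8%
2 | 20 | 15.3%
3 | 19 | 14.5%
5 | 14 | 10.7%
4 | 13 | 9.9%
6 | 9 | 6.9%
8 | 9 | 6.9%
9 | 8 | 6.1%
7 | 7 | 5.3%
0 | 6 | 4.6%

Other Punctuation

Value | Count | Frequency (%)
, | 19 | 67.9%
. | 9 | 32.1%

Space Separator

Value | Count | Frequency (%)
(space) | 155 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 19 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 19 | 100.0%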

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 388 | 52.4%
Common | 352 | 47.6%

Most frequent character per script

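Hangul

Value | Count | Frequency (%)
(unrecoverable) | 26 | 6.7%
(unrecoverable) | 19 | 4.9%
(unrecoverable) | 19 | 4.9%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
Other values (76) | 198 | 51.0%

Common

Value | Count | Frequency (%)
(space) | 155 | 44.0%
1 | 26 | 7.4%
2 | 20 | 5.7%
, | 19 | 5.4%
) | 19 | 5.4%
3 | 19 | 5.4%
( | 19 | 5.4%
5 | 14 | 4.0%
4 | 13 | 3.7%
6 | 9 | 2.6%
Other values (5) | 39 | 11.1%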

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 388 | 52.4%
ASCII | 352 | 47.6%

Most frequent character per block

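ASCII

Value | Count | Frequency (%)
(space) | 155 | 44.0%
1 | 26 | 7.4%
2 | 20 | 5.7%
, | 19 | 5.4%
) | 19 | 5.4%
3 | 19 | 5.4%
( | 19 | 5.4%
5 | 14 | 4.0%
4 | 13 | 3.7%
6 | 9 | 2.6%
Other values (5) | 39 | 11.1%

Hangul

Value | Count | Frequency (%)
(unrecoverable) | 26 | 6.7%
(unrecoverable) | 19 | 4.9%
(unrecoverable) | 19 | 4.9%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
(unrecoverable) | 18 | 4.6%
Other values (76) | 198 | 51.0%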

Interactions


Correlations

Variable | 연번 | 지역 | 항목 수(친환경) | 항목 수(전체) | 비고
연번 | 1.000 | 1.000 | 0.809 | 0.866 | 0.659
지역 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
항목 수(친환경) | 0.809 | 1.000 | 1.000 | 0.833 | 0.901
항목 수(전체) | 0.866 | 1.000 | 0.833 | 1.000 | 0.947
비고 | 0.659 | 1.000 | 0.901 | 0.947 | 1.000

Variable | 연번 | 항목 수(친환경) | 항목 수(전체) | 지역
연번 | 1.000 | 0.398 | 0.062 | 0.938
항목 수(친환경) | 0.398 | 1.000 | 0.544 | 0.925
항목 수(전체) | 0.062 | 0.544 | 1.000 | 0.921
지역 | 0.938 | 0.925 | 0.921 | 1.000
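
To see which pairs drive the high-correlation alerts, the upper triangle of a matrix can be scanned for large coefficients. The sketch below does this for the first (five-column) matrix. The 0.9 cut-off is a hypothetical value chosen for illustration and is not the threshold the profiling tool uses.

```python
# First correlation matrix from the report (values copied as printed).
columns = ["연번", "지역", "항목 수(친환경)", "항목 수(전체)", "비고"]
matrix = [
    [1.000, 1.000, 0.809, 0.866, 0.659],
    [1.000, 1.000, 1.000, 1.000, 1.000],
    [0.809, 1.000, 1.000, 0.833, 0.901],
    [0.866, 1.000, 0.833, 1.000, 0.947],
    [0.659, 1.000, 0.901, 0.947, 1.000],
]

THRESHOLD = 0.9  # hypothetical cut-off, for illustration only

# Visit each unordered pair once (j > i) and keep the strong ones.
strong_pairs = [
    (columns[i], columns[j], matrix[i][j])
    for i in range(len(columns))
    for j in range(i + 1, len(columns))
    if matrix[i][j] >= THRESHOLD
]
for a, b, value in strong_pairs:
    print(f"{a} ~ {b}: {value:.3f}")
```

With this cut-off, every pair involving 지역 is reported, together with the two pairs that link 비고 to the item counts.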

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
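
The nullity plots did not survive in this text version, but the per-column figures can be rebuilt from the counts in the variable sections. A minimal sketch follows. The dictionary is typed in by hand from the report, and the closing lines check the assumption that Distinct (%) for 비고 is computed over non-missing rows only.

```python
n_rows = 144  # number of observations reported in the overview

# Missing counts per column as reported in the variable sections.
missing = {
    "연번": 0,
    "지역": 0,
    "항목 수(친환경)": 0,
    "항목 수(전체)": 0,
    "주요 항목": 0,
    "비고": 118,
}

nullity = {col: round(100 * n / n_rows, 1) for col, n in missing.items()}
for col, pct in nullity.items():
    print(f"{col}: {pct}% missing")

# 비고 reports 23 distinct values; dividing by the non-missing rows
# reproduces the 88.5% shown in its section.
present = n_rows - missing["비고"]
distinct_pct = round(100 * 23 / present, 1)
print(f"비고: {present} non-missing rows, {distinct_pct}% distinct")
```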

Sample

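First rows

Index | 연번 | 지역 | 항목 수(친환경) | 항목 수(전체) | 주요 항목 | 비고
0 | 1 | 허베이성 | 5 | 536 | 신에너지·신소재석유화학 개발 | 총투자액 1조 8833.1억 위안 (한화 약 322.4조 원)
1 | 1 | 허베이성 | 5 | 536 | 오수처리시설 건설 | <NA>
2 | 1 | 허베이성 | 5 | 536 | 징진지 친환경 물류단지 건설 | <NA>
3 | 1 | 허베이성 | 5 | 536 | 스마트 친환경 제조단지 건설 등 | <NA>
4 | 2 | 즈보시 | 12 | 265 | 산업단지 및 경제개발구 녹화 | 265개 항목 중 건설항목 245개
5 | 2 | 즈보시 | 12 | 265 | 홍수 방지시설·스마트 생태단지 건설 | <NA>
6 | 2 | 즈보시 | 12 | 265 | 친환경소재·복합신소재 개발 및 스마트 제조 등 | <NA>
7 | 3 | 난징시 | 36 | 334 | 친환경 과학기술인프라 및 기초인프라 건설 | 총투자액 1,165.3억 위안 (한화 약 20조 원)
8 | 3 | 난징시 | 36 | 334 | 공공안전 및 관리 | <NA>
9 | 3 | 난징시 | 36 | 334 | 농촌 도로 업그레이드 | <NA>

Last rows

Index | 연번 | 지역 | 항목 수(친환경) | 항목 수(전체) | 주요 항목 | 비고
134 | 25 | 간쑤성 | 38 | 89 | 물환경 정비사업 | <NA>
135 | 25 | 간쑤성 | 38 | 89 | 홍수방지시설 건설 | <NA>
136 | 25 | 간쑤성 | 38 | 89 | 고체폐기물 매립장 건설 | <NA>
137 | 25 | 간쑤성 | 38 | 89 | 생활오수처리시설 건설 | <NA>
138 | 25 | 간쑤성 | 38 | 89 | 재생에너지 급열시설 건설 등 | <NA>
139 | 26 | 톈진시 | 20 | 346 | 산업단지 리튬배터리 재이용 사업 | 총투자액 1조 25억 위안 (한화 약 172조 원)
140 | 26 | 톈진시 | 20 | 346 | 신소재 및 복합소재 개발 | <NA>
141 | 26 | 톈진시 | 20 | 346 | LNG저장시설 건설 | <NA>
142 | 26 | 톈진시 | 20 | 346 | 하천·호수 물환경 종합정비 | <NA>
143 | 26 | 톈진시 | 20 | 346 | 생태복원사업 등 | <NA>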
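
In the rows shown, 연번 and both item counts repeat on every row of a given region, which would be consistent with the 1.000 coefficients against 지역 in the first correlation matrix. The sketch below checks this for a handful of rows copied from the sample tables. It says nothing about rows not shown here, and the variable names are illustrative.

```python
from collections import defaultdict

# A few rows copied from the sample tables:
# (연번, 지역, 항목 수(친환경), 항목 수(전체)).
rows = [
    (1, "허베이성", 5, 536),
    (1, "허베이성", 5, 536),
    (2, "즈보시", 12, 265),
    (2, "즈보시", 12, 265),
    (3, "난징시", 36, 334),
    (25, "간쑤성", 38, 89),
    (26, "톈진시", 20, 346),
    (26, "톈진시", 20, 346),
]

values_per_region = defaultdict(set)
for serial, region, eco_items, total_items in rows:
    values_per_region[region].add((serial, eco_items, total_items))

# A region with exactly one (serial, eco, total) combination repeats its
# numeric values on every row.
constant = {region: len(combos) == 1 for region, combos in values_per_region.items()}
print(constant)
```

If this pattern holds for the full file, the numeric columns describe regions rather than individual items, so region-level analysis should first reduce the data to one row per region.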