Overview

Dataset statistics

Number of variables: 20
Number of observations: 166
Missing cells: 458
Missing cells (%): 13.8%
Duplicate rows: 8
Duplicate rows (%): 4.8%
Total size in memory: 26.1 KiB
Average record size in memory: 160.8 B

Variable types

Text: 2
Categorical: 3
Unsupported: 15

Dataset

Description: File download
Author: 강서구 (Gangseo-gu)
URL: https://data.seoul.go.kr/dataList/OA-21837/F/1/datasetView.do

Alerts

Dataset has 8 (4.8%) duplicate rows [Duplicates]
Unnamed: 1 is highly overall correlated with Unnamed: 3 and 1 other field [High correlation]
Unnamed: 4 is highly overall correlated with Unnamed: 1 [High correlation]
Unnamed: 3 is highly overall correlated with Unnamed: 1 [High correlation]
Unnamed: 1 is highly imbalanced (84.4%) [Imbalance]
기초생활보장 수급자구분에 따른 연령별, 성별 현황 has 155 (93.4%) missing values [Missing]
Unnamed: 2 has 140 (84.3%) missing values [Missing]
Unnamed: 5 has 11 (6.6%) missing values [Missing]
Unnamed: 6 has 11 (6.6%) missing values [Missing]
Unnamed: 7 has 11 (6.6%) missing values [Missing]
Unnamed: 8 has 11 (6.6%) missing values [Missing]
Unnamed: 9 has 11 (6.6%) missing values [Missing]
Unnamed: 10 has 11 (6.6%) missing values [Missing]
Unnamed: 11 has 11 (6.6%) missing values [Missing]
Unnamed: 12 has 11 (6.6%) missing values [Missing]
Unnamed: 13 has 11 (6.6%) missing values [Missing]
Unnamed: 14 has 11 (6.6%) missing values [Missing]
Unnamed: 15 has 11 (6.6%) missing values [Missing]
Unnamed: 16 has 11 (6.6%) missing values [Missing]
Unnamed: 17 has 9 (5.4%) missing values [Missing]
Unnamed: 18 has 11 (6.6%) missing values [Missing]
Unnamed: 19 has 11 (6.6%) missing values [Missing]
Unnamed: 5 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 11 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 12 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 13 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 14 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 15 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 16 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 17 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 18 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 19 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
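
An Unsupported alert often means that a column mixes value types. In the Sample section, columns Unnamed: 5 to Unnamed: 19 hold header strings (for example 합계 and 7세미만) alongside counts, which is a plausible cause here, although the report does not state it. The following is a minimal sketch of one cleaning approach, assuming that count cells should become integers and everything else should be treated as missing. The helper name `to_count` and the sample cells are hypothetical.

```python
import math


def to_count(cell):
    """Return the cell as an int, or None when it is not a clean count."""
    if cell is None or isinstance(cell, bool):
        return None
    if isinstance(cell, int):
        return cell
    if isinstance(cell, float):
        # NaN marks a missing cell; whole floats such as 72.0 are counts.
        if math.isnan(cell) or not cell.is_integer():
            return None
        return int(cell)
    # Strings: accept plain ASCII digits, with optional thousands separators.
    text = str(cell).strip().replace(",", "")
    if text.isascii() and text.isdigit():
        return int(text)
    return None


# Hypothetical raw cells modelled on the Sample section.
raw = ["합계", 25187, float("nan"), "7세미만", "422", 72.0, None]
cleaned = [to_count(c) for c in raw]
print(cleaned)
```

Header and footer rows would still need to be dropped before profiling again; this sketch only normalises individual cells.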

Reproduction

Analysis started: 2023-12-11 05:39:27.523305
Analysis finished: 2023-12-11 05:39:29.063246
Duration: 1.54 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

기초생활보장 수급자구분에 따른 연령별, 성별 현황
Text

MISSING

Distinct: 7
Distinct (%): 63.6%
Missing: 155
Missing (%): 93.4%
Memory size: 1.4 KiB

Length

Max length: 73
Median length: 13
Mean length: 12.272727
Min length: 2

Characters and Unicode

Total characters: 135
Distinct characters: 46
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
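
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation; the function name `category_counts` and the mapping of short codes to long names are assumptions made for this example.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the short code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The first sample value of this variable.
print(category_counts(["(2020년 05월)"]))
```

Running the function over every non-missing value of a column would give totals comparable to the category table for that column.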

Unique

Unique: 6
Unique (%): 54.5%

Sample

1st row: (2020년 05월)
2nd row: 서울특별시 강서구
3rd row: 자격 : 전체(중복제외)
4th row: 시도
5th row: 합계
| Value | Count | Frequency (%) |
| 서울특별시 | 7 | 25.9% |
|  | 4 | 14.8% |
| 강서구 | 2 | 7.4% |
| 2020-6-30 | 1 | 3.7% |
| 생활복지국 | 1 | 3.7% |
| 출력부서 | 1 | 3.7% |
| 육심석 | 1 | 3.7% |
| 출력자 | 1 | 3.7% |
| 11:19:31 | 1 | 3.7% |
| 합계 | 1 | 3.7% |
| Other values (7) | 7 | 25.9% |

Most occurring characters

| Value | Count | Frequency (%) |
| (space) | 25 | 18.5% |
|  | 10 | 7.4% |
|  | 8 | 5.9% |
|  | 7 | 5.2% |
|  | 7 | 5.2% |
|  | 7 | 5.2% |
| : | 6 | 4.4% |
| 0 | 6 | 4.4% |
| 1 | 4 | 3.0% |
| 2 | 4 | 3.0% |
| Other values (36) | 51 | 37.8% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 79 | 58.5% |
| Space Separator | 25 | 18.5% |
| Decimal Number | 19 | 14.1% |
| Other Punctuation | 6 | 4.4% |
| Open Punctuation | 2 | 1.5% |
| Dash Punctuation | 2 | 1.5% |
| Close Punctuation | 2 | 1.5% |

Most frequent character per category

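Other Letter
| Value | Count | Frequency (%) |
|  | 10 | 12.7% |
|  | 8 | 10.1% |
|  | 7 | 8.9% |
|  | 7 | 8.9% |
|  | 7 | 8.9% |
|  | 3 | 3.8% |
|  | 3 | 3.8% |
|  | 3 | 3.8% |
|  | 2 | 2.5% |
|  | 2 | 2.5% |
| Other values (24) | 27 | 34.2% |

Decimal Number
| Value | Count | Frequency (%) |
| 0 | 6 | 31.6% |
| 1 | 4 | 21.1% |
| 2 | 4 | 21.1% |
| 3 | 2 | 10.5% |
| 9 | 1 | 5.3% |
| 6 | 1 | 5.3% |
| 5 | 1 | 5.3% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 25 | 100.0% |

Other Punctuation
| Value | Count | Frequency (%) |
| : | 6 | 100.0% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 2 | 100.0% |

Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 2 | 100.0% |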

Most occurring scripts

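| Value | Count | Frequency (%) |
| Hangul | 79 | 58.5% |
| Common | 56 | 41.5% |

Most frequent character per script

Hangul
| Value | Count | Frequency (%) |
|  | 10 | 12.7% |
|  | 8 | 10.1% |
|  | 7 | 8.9% |
|  | 7 | 8.9% |
|  | 7 | 8.9% |
|  | 3 | 3.8% |
|  | 3 | 3.8% |
|  | 3 | 3.8% |
|  | 2 | 2.5% |
|  | 2 | 2.5% |
| Other values (24) | 27 | 34.2% |

Common
| Value | Count | Frequency (%) |
| (space) | 25 | 44.6% |
| : | 6 | 10.7% |
| 0 | 6 | 10.7% |
| 1 | 4 | 7.1% |
| 2 | 4 | 7.1% |
| ( | 2 | 3.6% |
| - | 2 | 3.6% |
| ) | 2 | 3.6% |
| 3 | 2 | 3.6% |
| 9 | 1 | 1.8% |
| Other values (2) | 2 | 3.6% |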

Most occurring blocks

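| Value | Count | Frequency (%) |
| Hangul | 79 | 58.5% |
| ASCII | 56 | 41.5% |

Most frequent character per block

ASCII
| Value | Count | Frequency (%) |
| (space) | 25 | 44.6% |
| : | 6 | 10.7% |
| 0 | 6 | 10.7% |
| 1 | 4 | 7.1% |
| 2 | 4 | 7.1% |
| ( | 2 | 3.6% |
| - | 2 | 3.6% |
| ) | 2 | 3.6% |
| 3 | 2 | 3.6% |
| 9 | 1 | 1.8% |
| Other values (2) | 2 | 3.6% |

Hangul
| Value | Count | Frequency (%) |
|  | 10 | 12.7% |
|  | 8 | 10.1% |
|  | 7 | 8.9% |
|  | 7 | 8.9% |
|  | 7 | 8.9% |
|  | 3 | 3.8% |
|  | 3 | 3.8% |
|  | 3 | 3.8% |
|  | 2 | 2.5% |
|  | 2 | 2.5% |
| Other values (24) | 27 | 34.2% |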

Unnamed: 1
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

| Value | Count |
| <NA> | 160 |
| 강서구 | 5 |
| 시군구 | 1 |

Length

Max length: 4
Median length: 4
Mean length: 3.9638554
Min length: 3

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

| Value | Count | Frequency (%) |
| <NA> | 160 | 96.4% |
| 강서구 | 5 | 3.0% |
| 시군구 | 1 | 0.6% |

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; its legend repeats the counts in the Common Values table, with <NA> shown as na]

Unnamed: 2
Text

MISSING 

Distinct: 22
Distinct (%): 84.6%
Missing: 140
Missing (%): 84.3%
Memory size: 1.4 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.8076923
Min length: 2

Characters and Unicode

Total characters: 99
Distinct characters: 27
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): 69.2%

Sample

1st row: 읍면동
2nd row: 소계
3rd row: 염창동
4th row: 등촌1동
5th row: 등촌2동

| Value | Count | Frequency (%) |
| 방화1동 | 2 | 7.7% |
| 화곡본동 | 2 | 7.7% |
| 등촌3동 | 2 | 7.7% |
| 가양2동 | 2 | 7.7% |
| 염창동 | 1 | 3.8% |
| 등촌2동 | 1 | 3.8% |
| 등촌1동 | 1 | 3.8% |
| 방화2동 | 1 | 3.8% |
| 소계 | 1 | 3.8% |
| 공항동 | 1 | 3.8% |
| Other values (12) | 12 | 46.2% |

Most occurring characters

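| Value | Count | Frequency (%) |
|  | 25 | 25.3% |
|  | 12 | 12.1% |
|  | 8 | 8.1% |
| 1 | 6 | 6.1% |
| 3 | 5 | 5.1% |
| 2 | 5 | 5.1% |
|  | 4 | 4.0% |
|  | 4 | 4.0% |
|  | 4 | 4.0% |
|  | 4 | 4.0% |
| Other values (17) | 22 | 22.2% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 80 | 80.8% |
| Decimal Number | 19 | 19.2% |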

Most frequent character per category

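Other Letter
| Value | Count | Frequency (%) |
|  | 25 | 31.2% |
|  | 12 | 15.0% |
|  | 8 | 10.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 2 | 2.5% |
|  | 2 | 2.5% |
| Other values (11) | 11 | 13.8% |

Decimal Number
| Value | Count | Frequency (%) |
| 1 | 6 | 31.6% |
| 3 | 5 | 26.3% |
| 2 | 5 | 26.3% |
| 8 | 1 | 5.3% |
| 6 | 1 | 5.3% |
| 4 | 1 | 5.3% |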

Most occurring scripts

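| Value | Count | Frequency (%) |
| Hangul | 80 | 80.8% |
| Common | 19 | 19.2% |

Most frequent character per script

Hangul
| Value | Count | Frequency (%) |
|  | 25 | 31.2% |
|  | 12 | 15.0% |
|  | 8 | 10.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 2 | 2.5% |
|  | 2 | 2.5% |
| Other values (11) | 11 | 13.8% |

Common
| Value | Count | Frequency (%) |
| 1 | 6 | 31.6% |
| 3 | 5 | 26.3% |
| 2 | 5 | 26.3% |
| 8 | 1 | 5.3% |
| 6 | 1 | 5.3% |
| 4 | 1 | 5.3% |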

Most occurring blocks

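| Value | Count | Frequency (%) |
| Hangul | 80 | 80.8% |
| ASCII | 19 | 19.2% |

Most frequent character per block

Hangul
| Value | Count | Frequency (%) |
|  | 25 | 31.2% |
|  | 12 | 15.0% |
|  | 8 | 10.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 4 | 5.0% |
|  | 2 | 2.5% |
|  | 2 | 2.5% |
| Other values (11) | 11 | 13.8% |

ASCII
| Value | Count | Frequency (%) |
| 1 | 6 | 31.6% |
| 3 | 5 | 26.3% |
| 2 | 5 | 26.3% |
| 8 | 1 | 5.3% |
| 6 | 1 | 5.3% |
| 4 | 1 | 5.3% |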

Unnamed: 3
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

| Value | Count |
| 일반수급자가구 | 40 |
| 조건부수급자가구 | 40 |
| 특례수급자가구 | 40 |
| 시설수급자 | 30 |
| <NA> | 13 |
| Other values (2) | 3 |

Length

Max length: 8
Median length: 7
Mean length: 6.5722892
Min length: 2

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

| Value | Count | Frequency (%) |
| 일반수급자가구 | 40 | 24.1% |
| 조건부수급자가구 | 40 | 24.1% |
| 특례수급자가구 | 40 | 24.1% |
| 시설수급자 | 30 | 18.1% |
| <NA> | 13 | 7.8% |
| 기타 | 2 | 1.2% |
| 수급자구분 | 1 | 0.6% |

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; its legend repeats the counts in the Common Values table, with <NA> shown as na]

Unnamed: 4
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

| Value | Count |
| 여성 | 78 |
| 남성 | 74 |
| <NA> | 14 |

Length

Max length: 4
Median length: 2
Mean length: 2.1686747
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

| Value | Count | Frequency (%) |
| 여성 | 78 | 47.0% |
| 남성 | 74 | 44.6% |
| <NA> | 14 | 8.4% |

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; its legend repeats the counts in the Common Values table, with <NA> shown as na]

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 11
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 12
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 13
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 14
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 15
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 16
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 17
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 9
Missing (%): 5.4%
Memory size: 1.4 KiB

Unnamed: 18
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Unnamed: 19
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 11
Missing (%): 6.6%
Memory size: 1.4 KiB

Correlations

|  | 기초생활보장 수급자구분에 따른 연령별, 성별 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
| 기초생활보장 수급자구분에 따른 연령별, 성별 현황 | 1.000 | 0.000 | 1.000 | 1.000 | NaN |
| Unnamed: 1 | 0.000 | 1.000 | 1.000 | 1.000 | NaN |
| Unnamed: 2 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 |
| Unnamed: 3 | 1.000 | 1.000 | 0.000 | 1.000 | 0.000 |
| Unnamed: 4 | NaN | NaN | 0.000 | 0.000 | 1.000 |

|  | Unnamed: 1 | Unnamed: 4 | Unnamed: 3 |
| Unnamed: 1 | 1.000 | 1.000 | 1.000 |
| Unnamed: 4 | 1.000 | 1.000 | 0.000 |
| Unnamed: 3 | 1.000 | 0.000 | 1.000 |

|  | Unnamed: 1 | Unnamed: 3 | Unnamed: 4 |
| Unnamed: 1 | 1.000 | 1.000 | 1.000 |
| Unnamed: 3 | 1.000 | 1.000 | 0.000 |
| Unnamed: 4 | 1.000 | 0.000 | 1.000 |

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
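
To make the idea of nullity correlation concrete, here is a minimal sketch in plain Python. It assumes the heatmap is a Pearson correlation between present/missing indicators, which is a common way to compute it; the report does not spell out the method. The function name `nullity_correlation` and the toy columns are hypothetical.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present/missing indicators of two columns."""
    # 1.0 where a value is present, 0.0 where it is missing (None).
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present or always missing has no defined correlation.
        return math.nan
    return cov / math.sqrt(var_a * var_b)


# Toy columns; None marks a missing cell.
x = [1, None, 3, None]
y = ["a", None, "c", None]  # missing in the same rows as x
z = [None, "b", None, "d"]  # missing exactly where x is present

print(nullity_correlation(x, y))
print(nullity_correlation(x, z))
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one is present where the other is missing, and NaN appears when a column has no variation in missingness.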

Sample

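|  | 기초생활보장 수급자구분에 따른 연령별, 성별 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15 | Unnamed: 16 | Unnamed: 17 | Unnamed: 18 | Unnamed: 19 |
| 0 | (2020년 05월) | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 서울특별시 강서구 | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 페이지 : 1 / 5 | NaN | NaN |
| 3 | 자격 : 전체(중복제외) | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | (단위: 명) | NaN | NaN |
| 4 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 시도 | 시군구 | 읍면동 | 수급자구분 | <NA> | 합계 | 7세미만 | 7~12세 | 13~15세 | 16~18세 | 19~29세 | 30~39세 | 40~49세 | 50~59세 | 60~64세 | 65~69세 | 70~74세 | 75~79세 | 80~89세 | 90세이상 |
| 6 | 합계 | <NA> | <NA> | <NA> | <NA> | 25187 | 422 | 1010 | 777 | 914 | 1860 | 1115 | 2257 | 4272 | 2740 | 2428 | 2217 | 2190 | 2509 | 476 |
| 7 | 서울특별시 | 강서구 | 소계 | <NA> | <NA> | 25187 | 422 | 1010 | 777 | 914 | 1860 | 1115 | 2257 | 4272 | 2740 | 2428 | 2217 | 2190 | 2509 | 476 |
| 8 | <NA> | <NA> | 염창동 | 일반수급자가구 | 남성 | 72281379814437411
| 9 | <NA> | <NA> | <NA> | 일반수급자가구 | 여성 | 82 | 3 | 4 | 2 | 2 | 5 | 12 | 8 | 8 | 7 | 8 | 5 | 4 | 8 | 6 |

|  | 기초생활보장 수급자구분에 따른 연령별, 성별 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14 | Unnamed: 15 | Unnamed: 16 | Unnamed: 17 | Unnamed: 18 | Unnamed: 19 |
| 156 | <NA> | <NA> | <NA> | 조건부수급자가구 | 여성 | 16737212351723401625041
| 157 | <NA> | <NA> | <NA> | 시설수급자 | 여성 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 158 | <NA> | <NA> | <NA> | 특례수급자가구 | 남성 | 29 | 0 | 0 | 0 | 1 | 1 | 0 | 2 | 4 | 4 | 1 | 9 | 6 | 1 | 0 |
| 159 | <NA> | <NA> | <NA> | 특례수급자가구 | 여성 | 31 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 5 | 3 | 5 | 7 | 4 | 2 |
| 160 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 161 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 162 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 163 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 164 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 165 | 출력일자 : 2020-6-30 11:19:31 출력자 : 육심석 출력부서 : 서울특별시 강서구 생활복지국 생활보장과 | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |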

Duplicate rows

Most frequently occurring

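|  | 기초생활보장 수급자구분에 따른 연령별, 성별 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | # duplicates |
| 4 | <NA> | <NA> | <NA> | 조건부수급자가구 | 여성 | 20 |
| 6 | <NA> | <NA> | <NA> | 특례수급자가구 | 여성 | 20 |
| 2 | <NA> | <NA> | <NA> | 일반수급자가구 | 여성 | 19 |
| 3 | <NA> | <NA> | <NA> | 조건부수급자가구 | 남성 | 18 |
| 5 | <NA> | <NA> | <NA> | 특례수급자가구 | 남성 | 17 |
| 1 | <NA> | <NA> | <NA> | 시설수급자 | 여성 | 16 |
| 0 | <NA> | <NA> | <NA> | 시설수급자 | 남성 | 13 |
| 7 | <NA> | <NA> | <NA> | <NA> | <NA> | 7 |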
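
A table like this can be approximated by counting identical rows. The sketch below is a minimal illustration in plain Python, assuming rows are compared only on the five columns listed in the table header; the report does not say how the profiler selects them. The function name `duplicate_summary` and the sample rows are hypothetical.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (row, count) pairs for rows that occur more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]


# Hypothetical rows restricted to five columns; None stands for <NA>.
rows = [
    (None, None, None, "시설수급자", "남성"),
    (None, None, None, "시설수급자", "남성"),
    (None, None, "염창동", "일반수급자가구", "남성"),
    (None, None, None, None, None),
    (None, None, None, None, None),
    (None, None, None, None, None),
]

for row, n in duplicate_summary(rows):
    print(n, row)
```

Rows that occur only once are left out, so the result lists only the repeated combinations.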