Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 733
Missing cells (%): 1.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
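
The two derived figures above can be recomputed from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is total memory divided by the number of observations; the function names are illustrative and not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells among all cells, in percent."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def avg_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Average bytes per row, given the total memory footprint in KiB."""
    return total_kib * 1024 / n_obs


# Values copied from the dataset statistics above.
print(f"{missing_cell_pct(733, 5, 10000):.1f}%")       # 1.5%
print(f"{avg_record_size_bytes(488.3, 10000):.1f} B")  # 50.0 B
```

All 733 missing cells sit in a single column, which is why the same count shows up as 7.3% at the column level but only 1.5% at the dataset level.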

Variable types

DateTime: 1
Categorical: 2
Numeric: 2

Dataset

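Description: 측정일자 (measurement date), 물재생센터명칭 (water reclamation center name), 처리장구분 (treatment plant category), 1차하수처리량 (primary sewage treatment volume), 2차하수처리량 (secondary sewage treatment volume)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15561/S/1/datasetView.do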

Alerts

2차하수처리량 is highly overall correlated with 물재생센터명칭 and 1 other field (High correlation)
물재생센터명칭 is highly overall correlated with 2차하수처리량 and 1 other field (High correlation)
처리장구분 is highly overall correlated with 2차하수처리량 and 1 other field (High correlation)
1차하수처리량 has 733 (7.3%) missing values (Missing)
1차하수처리량 has 7132 (71.3%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-18 06:48:16.069656
Analysis finished: 2024-05-18 06:48:19.515168
Duration: 3.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
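
The reported duration is the difference between the two timestamps. A minimal check using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction block above.
started = datetime.strptime("2024-05-18 06:48:16.069656", FMT)
finished = datetime.strptime("2024-05-18 06:48:19.515168", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 3.45 seconds
```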

Variables

측정일자
DateTime

Distinct: 2791
Distinct (%): 27.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2016-01-01 00:00:00
Maximum: 2023-10-30 00:00:00
Histogram with fixed size bins (bins=50)

물재생센터명칭
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

중랑물재생센터: 3364
난지물재생센터: 3323
서남물재생센터: 1786
탄천물재생센터: 1527

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 난지물재생센터
2nd row: 난지물재생센터
3rd row: 난지물재생센터
4th row: 서남물재생센터
5th row: 탄천물재생센터

Common Values

Value            Count   Frequency (%)
중랑물재생센터   3364    33.6%
난지물재생센터   3323    33.2%
서남물재생센터   1786    17.9%
탄천물재생센터   1527    15.3%

Length

Histogram of lengths of the category

Common Values (Plot)


처리장구분
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

제1처리장: 3274
제2처리장: 3132
제4처리장: 858
제3처리장: 857
정화조오니처리장: 839
Other values (2): 1040

Length

Max length: 9
Median length: 5
Mean length: 5.5665
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 제2처리장
2nd row: 제1처리장
3rd row: 중계펌프장(상암)
4th row: 제2처리장
5th row: 제1처리장

Common Values

Value              Count   Frequency (%)
제1처리장          3274    32.7%
제2처리장          3132    31.3%
제4처리장          858     8.6%
제3처리장          857     8.6%
정화조오니처리장   839     8.4%
중계펌프장(상암)   787     7.9%
시설현대화         253     2.5%

Length

Histogram of lengths of the category

Common Values (Plot)


1차하수처리량
Real number (ℝ)

MISSING  ZEROS 

Distinct: 1783
Distinct (%): 19.2%
Missing: 733
Missing (%): 7.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 10419.56
Minimum: 0
Maximum: 786736
Zeros: 7132
Zeros (%): 71.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 52609.3
Maximum: 786736
Range: 786736
Interquartile range (IQR): 0
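
Every quantile up to Q3 is 0 because zeros make up more than three quarters of the non-missing values. The sketch below assumes quantiles are computed over non-missing observations only (the usual pandas behaviour); the counts are copied from the variable summary.

```python
n_obs = 10000
n_missing = 733
n_zeros = 7132

n_valid = n_obs - n_missing     # 9267 non-missing values
zero_share = n_zeros / n_valid  # fraction of valid values equal to 0

# The minimum is 0 and there are no negative values, so the sorted data
# starts with a run of zeros covering `zero_share` of the valid values.
for level in (0.05, 0.25, 0.50, 0.75, 0.95):
    verdict = "falls inside the zeros" if level < zero_share else "lies above the zeros"
    print(f"quantile {level:.2f} {verdict}")
```

With roughly 77% of valid values equal to 0, Q1, the median, and Q3 all coincide at 0, which is also why the IQR is 0.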

Descriptive statistics

Standard deviation: 48834.742
Coefficient of variation (CV): 4.6868334
Kurtosis: 70.993992
Mean: 10419.56
Median Absolute Deviation (MAD): 0
Skewness: 7.4950727
Sum: 96558064
Variance: 2.3848321 × 10^9
Monotonicity: Not monotonic
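
Several of these statistics are tied together by simple identities: CV is the standard deviation divided by the mean, variance is the squared standard deviation, and the mean is the sum divided by the number of non-missing values. A quick consistency check using the rounded figures from the report (so agreement is only approximate):

```python
import math

# Figures copied from the report for 1차하수처리량.
std = 48834.742
mean = 10419.56
total = 96558064
n_valid = 10000 - 733  # observations minus missing values

cv = std / mean
variance = std ** 2
mean_from_sum = total / n_valid

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"Mean     = {mean_from_sum:.2f}")
```

A CV well above 1 together with a skewness of about 7.5 is consistent with a column that is mostly zeros with a small number of very large values.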
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count   Frequency (%)
0                     7132    71.3%
40                    15      0.1%
60                    10      0.1%
30                    8       0.1%
200                   7       0.1%
80                    7       0.1%
320                   6       0.1%
20                    6       0.1%
500                   6       0.1%
70                    5       0.1%
Other values (1773)   2065    20.6%
(Missing)             733     7.3%

Minimum 10 values

Value   Count   Frequency (%)
0       7132    71.3%
1       1       < 0.1%
3       1       < 0.1%
8       1       < 0.1%
10      5       0.1%
11      1       < 0.1%
12      1       < 0.1%
17      1       < 0.1%
20      6       0.1%
22      1       < 0.1%

Maximum 10 values

Value    Count   Frequency (%)
786736   1       < 0.1%
783290   1       < 0.1%
768850   1       < 0.1%
677819   1       < 0.1%
649543   1       < 0.1%
649236   1       < 0.1%
618086   1       < 0.1%
609105   1       < 0.1%
598983   1       < 0.1%
577190   1       < 0.1%

2차하수처리량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9623
Distinct (%): 96.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 326089.86
Minimum: 0
Maximum: 1331833
Zeros: 26
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4224.95
Q1: 163349.25
Median: 272874.5
Q3: 437381.75
95-th percentile: 877818.1
Maximum: 1331833
Range: 1331833
Interquartile range (IQR): 274032.5
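
The range and IQR follow directly from the quantiles above. As an illustration of how the IQR is commonly used, the sketch below also computes Tukey's 1.5 × IQR fences; the fences are not part of the report and are shown only as one conventional way to flag unusually large values.

```python
# Quantiles copied from the report for 2차하수처리량.
minimum, q1, q3, maximum = 0.0, 163349.25, 437381.75, 1331833.0

value_range = maximum - minimum
iqr = q3 - q1

# Tukey fences (a convention, not something the report computes).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"Range       = {value_range}")
print(f"IQR         = {iqr}")
print(f"Upper fence = {upper_fence}")
print(f"Maximum exceeds upper fence: {maximum > upper_fence}")
```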

Descriptive statistics

Standard deviation: 254572.74
Coefficient of variation (CV): 0.78068279
Kurtosis: 0.29486581
Mean: 326089.86
Median Absolute Deviation (MAD): 130649
Skewness: 0.89652487
Sum: 3.2608986 × 10^9
Variance: 6.4807281 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                 Count   Frequency (%)
0.0                   26      0.3%
1080000.0             5       0.1%
845040.0              5       0.1%
7841.0                5       0.1%
660000.0              5       0.1%
175000.0              4       < 0.1%
591120.0              4       < 0.1%
9247.0                4       < 0.1%
8935.0                4       < 0.1%
4700.0                4       < 0.1%
Other values (9613)   9934    99.3%

Minimum 10 values

Value   Count   Frequency (%)
0.0     26      0.3%
126.0   1       < 0.1%
141.0   1       < 0.1%
148.0   2       < 0.1%
174.0   1       < 0.1%
229.0   1       < 0.1%
410.0   1       < 0.1%
413.0   1       < 0.1%
424.0   1       < 0.1%
605.0   1       < 0.1%

Maximum 10 values

Value       Count   Frequency (%)
1331833.0   1       < 0.1%
1088705.0   1       < 0.1%
1084480.0   1       < 0.1%
1082879.0   1       < 0.1%
1082471.0   1       < 0.1%
1081254.0   1       < 0.1%
1080912.0   1       < 0.1%
1080864.0   1       < 0.1%
1080840.0   2       < 0.1%
1080816.0   1       < 0.1%

Interactions


Correlations

                 물재생센터명칭   처리장구분   1차하수처리량   2차하수처리량
물재생센터명칭   1.000            0.668        0.211           0.782
처리장구분       0.668            1.000        0.109           0.767
1차하수처리량    0.211            0.109        1.000           0.518
2차하수처리량    0.782            0.767        0.518           1.000

                 처리장구분   물재생센터명칭
처리장구분       1.000        0.531
물재생센터명칭   0.531        1.000
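
For two categorical columns such as 처리장구분 and 물재생센터명칭, a common association measure is Cramér's V. Whether the matrix above uses exactly this statistic, or a bias-corrected variant, is an assumption. The sketch below shows the plain, uncorrected form on made-up contingency tables, not on counts from this dataset.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row total and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Toy tables with made-up counts.
perfect = [[50, 0], [0, 50]]          # each row maps to exactly one column
independent = [[25, 25], [25, 25]]    # rows and columns unrelated

print(cramers_v(perfect))      # 1.0
print(cramers_v(independent))  # 0.0
```

Values range from 0 (no association) to 1 (one variable fully determines the other), so a value around 0.5 indicates a moderate association between plant category and center.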

                 1차하수처리량   2차하수처리량   물재생센터명칭   처리장구분
1차하수처리량    1.000           0.070           0.127            0.055
2차하수처리량    0.070           1.000           0.600            0.529
물재생센터명칭   0.127           0.600           1.000            0.531
처리장구분       0.055           0.529           0.531            1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

        측정일자     물재생센터명칭   처리장구분         1차하수처리량   2차하수처리량
23239   2018/05/08   난지물재생센터   제2처리장          0               323396.28
5448    2022/04/06   난지물재생센터   제1처리장          0               230405.0
7422    2021/11/05   난지물재생센터   중계펌프장(상암)   <NA>            10902.0
12957   2020/09/05   서남물재생센터   제2처리장          246791          1080000.0
26627   2017/07/29   탄천물재생센터   제1처리장          0               442951.0
6559    2022/01/10   중랑물재생센터   제2처리장          0               238303.0
23161   2018/05/14   중랑물재생센터   제3처리장                       1010721431.0
30016   2016/10/20   서남물재생센터   제2처리장          0               973800.0
24143   2018/02/21   탄천물재생센터   제1처리장          0               387306.0
22261   2018/07/28   중랑물재생센터   제3처리장                       6470774386.0

        측정일자     물재생센터명칭   처리장구분   1차하수처리량   2차하수처리량
21106   2018/11/01   탄천물재생센터   제2처리장    0               320522.0
13732   2020/07/07   탄천물재생센터   제1처리장    0               412980.0
9439    2021/06/03   난지물재생센터   제1처리장                 6119326230.0
33087   2016/01/17   탄천물재생센터   제1처리장    0               368495.0
19733   2019/02/24   서남물재생센터   제1처리장    0               498165.0
11135   2021/01/23   중랑물재생센터   제2처리장    0               258441.0
11570   2020/12/21   난지물재생센터   제2처리장    0               275655.0
7324    2021/11/12   중랑물재생센터   제4처리장    0               165990.0
7709    2021/10/14   난지물재생센터   제2처리장    0               312836.0
31583   2016/06/02   중랑물재생센터   제2처리장    0               242758.0