Dataset statistics
Number of variables | 11 |
---|---|
Number of observations | 182 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 9.5 KiB |
Average record size in memory | 53.7 B |
Variable types
Numeric | 3 |
---|---|
Categorical | 8 |
Alerts
fare is highly correlated with sibsp and 1 other field | High correlation |
---|---|
survived is highly correlated with male | High correlation |
sibsp is highly correlated with fare | High correlation |
male is highly correlated with survived | High correlation |
parch is highly correlated with fare | High correlation |
df_index has unique values | Unique |
fare has 2 (1.1%) zeros | Zeros |
Reproduction
Analysis started | 2023-01-09 05:38:22.937071 |
---|---|
Analysis finished | 2023-01-09 05:38:30.570414 |
Duration | 7.63 seconds |
Software version | pandas-profiling v3.4.0 |
df_index
Real number (ℝ≥0)
Distinct | 182 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 455 |
Minimum | 1 |
---|---|
Maximum | 889 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.5 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 54.4 |
Q1 | 262.25 |
median | 458 |
Q3 | 677 |
95-th percentile | 834.4 |
Maximum | 889 |
Range | 888 |
Interquartile range (IQR) | 414.75 |
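The quantile statistics above can be reproduced in a few lines. What follows is a minimal sketch, assuming linear interpolation between order statistics (the fractional Q1 of 262.25 is consistent with that, but the report does not state the method). The `sample` values and the helper name `quantile_summary` are hypothetical and are not taken from the report.

```python
from statistics import quantiles


def quantile_summary(values):
    """Return minimum, quartiles, maximum, range and IQR for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics and
    # treats the smallest and largest observations as the 0th and 100th percentiles.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical values, not from the dataset
summary = quantile_summary(sample)
print(summary)
```

The range and the IQR are derived values: each is just a difference between two of the other rows in the table.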
Descriptive statistics
Standard deviation | 247.5847307 |
---|---|
Coefficient of variation (CV) | 0.5441422654 |
Kurtosis | -1.098496479 |
Mean | 455 |
Median Absolute Deviation (MAD) | 205 |
Skewness | -0.06092618352 |
Sum | 82810 |
Variance | 61298.1989 |
Monotonicity | Strictly increasing |
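Several of the descriptive statistics are functions of one another, which gives a quick way to sanity-check a report. The sketch below copies the reported numbers from the tables above and verifies the relationships between them; the tolerances are an assumption chosen to absorb the rounding of the displayed values.

```python
import math

# Values copied from the statistics tables above.
n = 182
total = 82810
mean = 455
std = 247.5847307
variance = 61298.1989
cv = 0.5441422654
minimum, maximum = 1, 889
q1, q3 = 262.25, 677

checks = {
    "mean = sum / n": math.isclose(total / n, mean),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "range = max - min": maximum - minimum == 888,
    "iqr = q3 - q1": math.isclose(q3 - q1, 414.75),
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```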
Common values
Value | Count | Frequency (%) |
---|---|---|
1 | 1 | 0.5% |
571 | 1 | 0.5% |
577 | 1 | 0.5% |
581 | 1 | 0.5% |
583 | 1 | 0.5% |
585 | 1 | 0.5% |
587 | 1 | 0.5% |
591 | 1 | 0.5% |
599 | 1 | 0.5% |
609 | 1 | 0.5% |
Other values (172) | 172 | 94.5% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
1 | 1 | 0.5% |
3 | 1 | 0.5% |
6 | 1 | 0.5% |
10 | 1 | 0.5% |
11 | 1 | 0.5% |
21 | 1 | 0.5% |
23 | 1 | 0.5% |
27 | 1 | 0.5% |
52 | 1 | 0.5% |
54 | 1 | 0.5% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
889 | 1 | 0.5% |
887 | 1 | 0.5% |
879 | 1 | 0.5% |
872 | 1 | 0.5% |
871 | 1 | 0.5% |
867 | 1 | 0.5% |
862 | 1 | 0.5% |
857 | 1 | 0.5% |
853 | 1 | 0.5% |
835 | 1 | 0.5% |
survived
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 1.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.4 KiB |
Common Values
Value | Count | Frequency (%) |
---|---|---|
1 | 123 | 67.6% |
0 | 59 | 32.4% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 182 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 182 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 182 | 100.0% |
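Each frequency percentage in these tables is a count divided by the 182 observations. As a small illustration, the sketch below recomputes the percentages for the counts 123 and 59 shown above; the helper name `frequency_table` is hypothetical, and rounding to one decimal place is an assumption based on how the report displays percentages.

```python
def frequency_table(counts):
    """Map each value to (count, percentage of all observations)."""
    total = sum(counts.values())
    return {
        value: (count, round(100 * count / total, 1))
        for value, count in counts.items()
    }


# Counts copied from the Common Values table above (182 observations in total).
counts = {1: 123, 0: 59}
table = frequency_table(counts)

for value, (count, pct) in table.items():
    print(f"{value} | {count} | {pct}% |")
```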
age
Real number (ℝ≥0)
Distinct | 63 |
---|---|
Distinct (%) | 34.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 35.62318681 |
Minimum | 0.92 |
---|---|
Maximum | 80 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.5 KiB |
Quantile statistics
Minimum | 0.92 |
---|---|
5-th percentile | 6.25 |
Q1 | 24 |
median | 36 |
Q3 | 47.75 |
95-th percentile | 60.95 |
Maximum | 80 |
Range | 79.08 |
Interquartile range (IQR) | 23.75 |
Descriptive statistics
Standard deviation | 15.67161536 |
---|---|
Coefficient of variation (CV) | 0.4399273832 |
Kurtosis | -0.2309427736 |
Mean | 35.62318681 |
Median Absolute Deviation (MAD) | 12 |
Skewness | 0.01841894051 |
Sum | 6483.42 |
Variance | 245.5995279 |
Monotonicity | Not monotonic |
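Two of the less familiar rows above are the median absolute deviation and the coefficient of variation. The following sketch shows one common definition of each, assuming the MAD is the unscaled median of absolute deviations from the median; the report does not spell out its exact formula. The `sample_ages` list and both function names are hypothetical.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the sample median (unscaled)."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


def coefficient_of_variation(std, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return std / mean


sample_ages = [22, 24, 35, 36, 58]  # hypothetical values, not from the dataset
mad = median_absolute_deviation(sample_ages)

# Standard deviation and mean copied from the age tables above.
cv = coefficient_of_variation(15.67161536, 35.62318681)

print(mad, cv)
```

Because the MAD is built from medians, a single extreme value changes it far less than it changes the standard deviation.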
Common values
Value | Count | Frequency (%) |
---|---|---|
36 | 11 | 6.0% |
24 | 9 | 4.9% |
19 | 6 | 3.3% |
35 | 6 | 3.3% |
31 | 5 | 2.7% |
29 | 5 | 2.7% |
49 | 5 | 2.7% |
47 | 5 | 2.7% |
27 | 5 | 2.7% |
58 | 5 | 2.7% |
Other values (53) | 120 | 65.9% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
0.92 | 1 | 0.5% |
1 | 1 | 0.5% |
2 | 3 | 1.6% |
3 | 1 | 0.5% |
4 | 3 | 1.6% |
6 | 1 | 0.5% |
11 | 1 | 0.5% |
14 | 1 | 0.5% |
15 | 1 | 0.5% |
16 | 3 | 1.6% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
80 | 1 | 0.5% |
71 | 1 | 0.5% |
70 | 1 | 0.5% |
65 | 2 | 1.1% |
64 | 1 | 0.5% |
63 | 1 | 0.5% |
62 | 1 | 0.5% |
61 | 2 | 1.1% |
60 | 2 | 1.1% |
58 | 5 | 2.7% |
sibsp
Categorical
Distinct | 4 |
---|---|
Distinct (%) | 2.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.4 KiB |
Common Values
Value | Count | Frequency (%) |
---|---|---|
0 | 109 | 59.9% |
1 | 64 | 35.2% |
2 | 6 | 3.3% |
3 | 3 | 1.6% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 182 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 182 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 182 | 100.0% |
parch
Categorical
Distinct | 4 |
---|---|
Distinct (%) | 2.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.4 KiB |
Common Values
Value | Count | Frequency (%) |
---|---|---|
0 | 121 | 66.5% |
1 | 37 | 20.3% |
2 | 23 | 12.6% |
4 | 1 | 0.5% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 182 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 182 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 182 | 100.0% |
fare
Real number (ℝ≥0)
Distinct | 93 |
---|---|
Distinct (%) | 51.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 78.91973516 |
Minimum | 0 |
---|---|
Maximum | 512.3292 |
Zeros | 2 |
Zeros (%) | 1.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.5 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 10.5 |
Q1 | 29.7 |
median | 57 |
Q3 | 90 |
95-th percentile | 246.52101 |
Maximum | 512.3292 |
Range | 512.3292 |
Interquartile range (IQR) | 60.3 |
Descriptive statistics
Standard deviation | 76.49077401 |
---|---|
Coefficient of variation (CV) | 0.9692223859 |
Kurtosis | 10.69069789 |
Mean | 78.91973516 |
Median Absolute Deviation (MAD) | 29.975 |
Skewness | 2.707368315 |
Sum | 14363.3918 |
Variance | 5850.838509 |
Monotonicity | Not monotonic |
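The skewness of about 2.7 and the kurtosis of about 10.7 reported above indicate a long right tail. The sketch below computes the plain moment-based (population) skewness and excess kurtosis to show what these numbers measure. It is an illustration under stated assumptions: the report's values most likely come from bias-corrected sample estimators, so this function would not reproduce them digit for digit. Both sample lists and the name `moment_shape` are hypothetical.

```python
def moment_shape(values):
    """Return (skewness, excess kurtosis) from uncorrected central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]          # hypothetical, no tail on either side
right_tailed = [1, 1, 2, 2, 3, 50]   # hypothetical, one large value

sym_skew, sym_kurt = moment_shape(symmetric)
tail_skew, tail_kurt = moment_shape(right_tailed)
print(sym_skew, sym_kurt)
print(tail_skew, tail_kurt)
```

A symmetric sample has zero skewness, while a few very large values push both skewness and kurtosis upward.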
Common values
Value | Count | Frequency (%) |
---|---|---|
26.55 | 7 | 3.8% |
53.1 | 5 | 2.7% |
90 | 4 | 2.2% |
263 | 4 | 2.2% |
10.5 | 4 | 2.2% |
120 | 4 | 2.2% |
30 | 4 | 2.2% |
13 | 4 | 2.2% |
153.4625 | 3 | 1.6% |
113.275 | 3 | 1.6% |
Other values (83) | 140 | 76.9% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
0 | 2 | 1.1% |
5 | 1 | 0.5% |
7.65 | 3 | 1.6% |
8.05 | 1 | 0.5% |
10.4625 | 2 | 1.1% |
10.5 | 4 | 2.2% |
12.475 | 2 | 1.1% |
12.875 | 1 | 0.5% |
13 | 4 | 2.2% |
13.7917 | 1 | 0.5% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
512.3292 | 2 | 1.1% |
263 | 4 | 2.2% |
262.375 | 2 | 1.1% |
247.5208 | 2 | 1.1% |
227.525 | 2 | 1.1% |
211.5 | 1 | 0.5% |
211.3375 | 3 | 1.6% |
164.8667 | 1 | 0.5% |
153.4625 | 3 | 1.6% |
151.55 | 3 | 1.6% |
male
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 1.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.4 KiB |
Common Values
Value | Count | Frequency (%) |
---|---|---|
1 | 94 | 51.6% |
0 | 88 | 48.4% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 182 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 182 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 182 | 100.0% |
Q
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 1.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.4 KiB |
Common Values
Value | Count | Frequency (%) |
---|---|---|
0 | 180 | 98.9% |
1 | 2 | 1.1% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 182 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 182 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 182 | 100.0% |
S
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 1.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.4 KiB |
Common Values
Value | Count | Frequency (%) |
---|---|---|
1 | 115 | 63.2% |
0 | 67 | 36.8% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 182 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 182 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 182 | 100.0% |
2
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 1.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.4 KiB |
Common Values
Value | Count | Frequency (%) |
---|---|---|
0 | 167 | 91.8% |
1 | 15 | 8.2% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 182 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 182 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 182 | 100.0% |
3
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 1.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 10.4 KiB |
Common Values
Value | Count | Frequency (%) |
---|---|---|
0 | 172 | 94.5% |
1 | 10 | 5.5% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 182 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 182 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 182 | 100.0% |
Auto
The auto setting is an easily interpretable pairwise column metric that picks a method from the types of the two columns (vartype-vartype : method):
categorical-categorical : Cramér's V
numerical-categorical : Cramér's V (using a discretized numerical column)
numerical-numerical : Spearman's ρ
This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
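The calculation just described (rank both variables, then divide the covariance of the ranks by the product of their standard deviations) can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the function names and the sample data are hypothetical, and ties are assumed to receive the average of the ranks they span.

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # the 1/n factors cancel


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]          # hypothetical data
y = [v ** 2 for v in x]      # monotonic but nonlinear in x
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

For this monotonic but nonlinear relationship ρ reaches its maximum while r stays below it, which is the behaviour the paragraph above describes.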
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
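The invariance property is easy to check numerically. In the sketch below, which uses hypothetical data and a hand-written `pearson` helper rather than any library routine, shifting and rescaling either variable by a positive factor leaves r unchanged, while a negative scale factor only flips its sign.

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # the 1/n factors cancel


x = [1.0, 2.0, 4.0, 7.0]  # hypothetical data
y = [2.0, 3.0, 3.5, 9.0]

r = pearson(x, y)
# Change location and scale of each variable separately (positive factors).
r_rescaled = pearson([3 * a + 10 for a in x], [0.5 * b - 2 for b in y])
# A negative scale factor reverses the direction of the relationship.
r_flipped = pearson([-a for a in x], y)

print(r, r_rescaled, r_flipped)
```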
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
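The pair-counting definition above translates directly into code. The sketch below implements exactly that formula, which is usually called τ-a; it is an illustration on hypothetical data, and statistical libraries often default to tie-adjusted variants, so results can differ when the data contain ties. The function name `kendall_tau_a` is an assumption of this example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(zip(x, y), 2))
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie, which counts as neither
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]  # hypothetical data
y = [1, 3, 2, 4]  # one adjacent pair swapped relative to x

tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ is (5 - 1) / 6.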
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.
First rows
| | df_index | survived | age | sibsp | parch | fare | male | Q | S | 2 | 3 |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 | 0 | 0 | 0 | 0 | 0 |
1 | 3 | 1 | 35.0 | 1 | 0 | 53.1000 | 0 | 0 | 1 | 0 | 0 |
2 | 6 | 0 | 54.0 | 0 | 0 | 51.8625 | 1 | 0 | 1 | 0 | 0 |
3 | 10 | 1 | 4.0 | 1 | 1 | 16.7000 | 0 | 0 | 1 | 0 | 1 |
4 | 11 | 1 | 58.0 | 0 | 0 | 26.5500 | 0 | 0 | 1 | 0 | 0 |
5 | 21 | 1 | 34.0 | 0 | 0 | 13.0000 | 1 | 0 | 1 | 1 | 0 |
6 | 23 | 1 | 28.0 | 0 | 0 | 35.5000 | 1 | 0 | 1 | 0 | 0 |
7 | 27 | 0 | 19.0 | 3 | 2 | 263.0000 | 1 | 0 | 1 | 0 | 0 |
8 | 52 | 1 | 49.0 | 1 | 0 | 76.7292 | 0 | 0 | 0 | 0 | 0 |
9 | 54 | 0 | 65.0 | 0 | 1 | 61.9792 | 1 | 0 | 0 | 0 | 0 |
Last rows
| | df_index | survived | age | sibsp | parch | fare | male | Q | S | 2 | 3 |
---|---|---|---|---|---|---|---|---|---|---|---|
172 | 835 | 1 | 39.0 | 1 | 1 | 83.1583 | 0 | 0 | 0 | 0 | 0 |
173 | 853 | 1 | 16.0 | 0 | 1 | 39.4000 | 0 | 0 | 1 | 0 | 0 |
174 | 857 | 1 | 51.0 | 0 | 0 | 26.5500 | 1 | 0 | 1 | 0 | 0 |
175 | 862 | 1 | 48.0 | 0 | 0 | 25.9292 | 0 | 0 | 1 | 0 | 0 |
176 | 867 | 0 | 31.0 | 0 | 0 | 50.4958 | 1 | 0 | 1 | 0 | 0 |
177 | 871 | 1 | 47.0 | 1 | 1 | 52.5542 | 0 | 0 | 1 | 0 | 0 |
178 | 872 | 0 | 33.0 | 0 | 0 | 5.0000 | 1 | 0 | 1 | 0 | 0 |
179 | 879 | 1 | 56.0 | 0 | 1 | 83.1583 | 0 | 0 | 0 | 0 | 0 |
180 | 887 | 1 | 19.0 | 0 | 0 | 30.0000 | 0 | 0 | 1 | 0 | 0 |
181 | 889 | 1 | 26.0 | 0 | 0 | 30.0000 | 1 | 0 | 0 | 0 | 0 |