Consumer Characteristics Data
The descriptive statistics for income (in thousands of dollars) and annual credit card charges (in dollars) are given below.
| Descriptive Statistics | Income ($1000s) | Annual Credit Card Charges ($) |
|---|---|---|
| Mean | 43.48 | 3963.86 |
| Standard Error | 2.06 | 132.02 |
| Median | 42 | 4090 |
| Mode | 54 | 3890 |
| Standard Deviation | 14.55 | 933.55 |
| Sample Variance | 211.72 | 871508.74 |
| Kurtosis | -1.25 | -0.74 |
| Skewness | 0.10 | -0.13 |
| Range | 46 | 3814 |
| Minimum | 21 | 1864 |
| Maximum | 67 | 5678 |
| Sum | 2174 | 198193 |
| Count | 50 | 50 |
Table: Descriptive statistics for Annual Credit Card Charges and Income
According to the data, the average customer income is 43.48 thousand dollars, with a minimum of 21 and a maximum of 67 thousand dollars. The standard deviation of 14.55 thousand dollars indicates considerable variation in income. In addition, the mean (43.48), median (42) and mode (54) of income are not equal, suggesting that income in this sample of 50 consumers is not normally distributed; the negative kurtosis (-1.25) also points to a flatter distribution than the normal.
Similarly, annual credit card charges average 3963.86 dollars per customer, with a minimum of 1864 and a maximum of 5678 dollars, a wide range that reflects differences between customers. The standard deviation of 933.55 dollars indicates substantial variation in the amounts charged. Because the mean (3963.86), median (4090) and mode (3890) are not equal, annual credit card charges also do not appear to be normally distributed.
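Several rows of the descriptive statistics table are derived from the others: the mean is Sum / Count, the standard error is the standard deviation divided by the square root of n, and the sample variance is the squared standard deviation. The short Python sketch below recomputes them from the table's own figures as a consistency check; the function name `summarize` is purely illustrative. Because it squares the rounded standard deviations, the variances it prints (211.70 and 871515.60) differ slightly from the table's 211.72 and 871508.74.

```python
import math

def summarize(total, sd, n):
    """Recompute derived rows of a descriptive statistics table."""
    mean = total / n            # Mean = Sum / Count
    se = sd / math.sqrt(n)      # Standard error of the mean = s / sqrt(n)
    variance = sd ** 2          # Sample variance = s squared (s is rounded here)
    return mean, se, variance

n = 50
for name, total, sd in [("Income ($1000s)", 2174, 14.55),
                        ("Annual charges ($)", 198193, 933.55)]:
    mean, se, variance = summarize(total, sd, n)
    print(f"{name}: mean={mean:.2f}, SE={se:.2f}, variance={variance:.2f}")
```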
The frequency distribution of household size is as follows:
| Household Size | Frequency |
|---|---|
| 1 | 5 |
| 2 | 15 |
| 3 | 8 |
| 4 | 9 |
| 5 | 5 |
| 6 | 5 |
| 7 | 3 |
Table: Frequency Distribution for Household Size
The table shows that the most common household size in the sample of 50 is 2 (15 households), while the least common is 7 (3 households), which suggests that smaller households predominate among these consumers.
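The frequency table can also be summarized numerically. The Python sketch below, using only the counts from the table, expands the frequencies into the 50 observations and computes the mean, median, mode and relative frequencies; these summary values (mean 3.42, median 3, mode 2) are calculated here and are not reported in the original tables. The helper name `frequency_summary` is illustrative.

```python
# Household size -> number of households, taken from the frequency table (n = 50)
freq = {1: 5, 2: 15, 3: 8, 4: 9, 5: 5, 6: 5, 7: 3}

def frequency_summary(freq):
    """Mean, median, mode and relative frequencies of a discrete frequency table."""
    n = sum(freq.values())
    mean = sum(size * count for size, count in freq.items()) / n
    mode = max(freq, key=freq.get)  # size with the highest count
    # Expand the table into sorted observations and take the middle value(s)
    values = sorted(size for size, count in freq.items() for _ in range(count))
    mid = n // 2
    median = values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2
    relative = {size: count / n for size, count in freq.items()}
    return n, mean, median, mode, relative

n, mean, median, mode, relative = frequency_summary(freq)
print(n, mean, median, mode, relative[2])
```

A household size of 2 accounts for 30% of the sample, while size 7 accounts for only 6%.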
Figure: Household Size Distribution
In addition, the sample of 50 suggests that consumers with higher incomes tend to have higher annual credit card charges but smaller households.
Two simple linear regression models have been estimated:
Independent Variable = Income in thousand dollars
Dependent Variable = Annual Credit Card Charges from the customers
| Regression Statistics | |
|---|---|
| Multiple R | 0.6308 |
| R Square | 0.3979 |
| Adjusted R Square | 0.3853 |
| Standard Error | 731.9025 |
| Observations | 50 |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 2204.241 | 329.134 | 6.697 | 0.000 | 1542.472 | 2866.009 |
| Income | 40.470 | 7.186 | 5.632 | 0.000 | 26.02178 | 54.91748 |
Table: Regression Model 1
From the table, the estimated regression of annual credit card charges on customer income is:
Annual Credit Card Charges = 2204.24 + 40.47*Income (thousand dollars)
The model describes a positive linear relationship between income and annual credit card charges: each additional 1,000 dollars of income is associated with an increase of about 40.47 dollars in annual credit card charges. The p-value for income (reported as 0.000, i.e. below 0.001) is far below the 0.05 significance level, so the relationship is statistically significant at the 5% level. The scatter diagram of annual credit card charges against income is given below.
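In a simple regression, the least-squares slope and intercept can be recovered from summary statistics alone: b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar, because the fitted line passes through the point of means. The sketch below applies these standard formulas to the means and standard deviations from the descriptive statistics table and to Multiple R from Model 1 (in simple regression Multiple R is the absolute correlation; the income slope is positive, so r = +0.6308). It also recomputes the t statistic as coefficient / standard error and R² as r squared. Small differences from the table come from rounding in the inputs.

```python
# Inputs copied from the descriptive statistics table and Regression Model 1
r = 0.6308                       # correlation (Multiple R; slope is positive)
s_x, s_y = 14.55, 933.55         # SDs of income ($1000s) and annual charges ($)
x_bar, y_bar = 43.48, 3963.86    # sample means

def simple_ols_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Least-squares slope and intercept from correlation, SDs and means."""
    b1 = r * s_y / s_x           # slope
    b0 = y_bar - b1 * x_bar      # line passes through (x_bar, y_bar)
    return b0, b1

b0, b1 = simple_ols_from_summary(r, s_x, s_y, x_bar, y_bar)
t_income = 40.470 / 7.186        # t Stat = coefficient / standard error
r_squared = r ** 2               # share of variance in charges explained by income
print(f"b0={b0:.2f}, b1={b1:.2f}, t={t_income:.3f}, R^2={r_squared:.4f}")
```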
Figure: Scatter Diagram of income and annual credit card charges
Independent Variable = Household Size
Dependent Variable = Annual Credit Card Charges from the customers
| Regression Statistics | |
|---|---|
| Multiple R | 0.7529 |
| R Square | 0.5668 |
| Adjusted R Square | 0.5578 |
| Standard Error | 620.8163 |
| Observations | 50 |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 2581.644 | 195.270 | 13.221 | 0.000 | 2189.028 | 2974.26 |
| Household Size | 404.157 | 51.000 | 7.925 | 0.000 | 301.6148 | 506.6986 |
Table: Regression Model 2
From the table, the estimated regression of annual credit card charges on household size is:
Annual Credit Card Charges = 2581.64 + 404.16*Household size
The model describes a positive linear relationship between household size and annual credit card charges: each additional household member is associated with an increase of about 404.16 dollars in annual credit card charges. The p-value for household size (reported as 0.000, i.e. below 0.001) is far below the 0.05 significance level, so the relationship is statistically significant at the 5% level. The scatter diagram of annual credit card charges against household size is given below.
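Because a least-squares line always passes through the point of means, the household-size slope and intercept in Regression Model 2 can be cross-checked against the frequency table and the mean annual charge. The sketch below computes the mean household size from the frequency counts (3.42, a value derived here rather than reported in the source) and the intercept implied by the reported slope of 404.157; it reproduces the table's intercept of 2581.644. The helper name `implied_intercept` is illustrative.

```python
# Household size frequencies and mean annual charges, from the tables above
freq = {1: 5, 2: 15, 3: 8, 4: 9, 5: 5, 6: 5, 7: 3}
y_bar = 3963.86
x_bar = sum(size * count for size, count in freq.items()) / sum(freq.values())

def implied_intercept(slope):
    # An OLS line passes through (x_bar, y_bar), so b0 = y_bar - b1 * x_bar
    return y_bar - slope * x_bar

b0 = implied_intercept(404.157)
print(f"mean household size={x_bar:.2f}, implied intercept={b0:.3f}")
```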
Figure: Scatter Diagram of household size and annual credit card charges
Moreover, an R square of 56.68%, compared with 39.79% for income, indicates that household size is the better single predictor of annual credit card charges.
Multiple Regression Model
Independent Variable – Income and Household Size
Dependent Variable – Annual Credit Card Charges from the customers
| Regression Statistics | |
|---|---|
| Multiple R | 0.9085 |
| R Square | 0.8254 |
| Adjusted R Square | 0.8179 |
| Standard Error | 398.3249 |
| Observations | 50 |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 1305.034 | 197.771 | 6.599 | 0.000 | 907.17 | 1702.898 |
| Income ($1000s) | 33.122 | 3.970 | 8.343 | 0.000 | 25.13487 | 41.10904 |
| Household Size | 356.340 | 33.220 | 10.727 | 0.000 | 289.5094 | 423.171 |
Table: Multiple Regression Model
From the table, the estimated regression of annual credit card charges on both income and household size is:
Annual Credit Card Charges = 1305.03 + 33.12*Income (thousand dollars) + 356.34*Household size
The model describes how annual credit card charges relate jointly to income and household size. Holding income constant, each additional household member is associated with an increase of about 356.34 dollars in annual credit card charges; holding household size constant, each additional 1,000 dollars of income is associated with an increase of about 33.12 dollars. The p-values for both independent variables (reported as 0.000) are below 0.05, so both are statistically significant at the 5% level. The R square of 82.54% (adjusted R square 81.79%) means the two variables together explain about 82.5% of the variation in annual credit card charges, which indicates a good fit.
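Adjusted R square penalizes R square for the number of predictors k: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below applies this standard formula to the R square values of all three models (n = 50) and recomputes the multiple regression t statistics as coefficient / standard error. The dictionary keys are illustrative labels; the results match the tables to within rounding of the inputs.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 50
models = {  # label: (R Square from the tables, number of predictors)
    "model1": (0.3979, 1),     # income only
    "model2": (0.5668, 1),     # household size only
    "multiple": (0.8254, 2),   # income + household size
}
adj = {name: adjusted_r2(r2, n, k) for name, (r2, k) in models.items()}

# t Stat = coefficient / standard error for the multiple regression
t_income = 33.122 / 3.970
t_household = 356.340 / 33.220
for name, value in adj.items():
    print(f"{name}: adjusted R^2 = {value:.4f}")
```

Moving from the best single predictor (adjusted R² about 0.558) to both predictors (about 0.818) is a large gain even after the penalty for the extra variable.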
Regression Analysis
For a household of size 3 with an income of $40,000 (Income = 40 in thousands of dollars), the predicted annual credit card charges are:
Annual Credit Card Charges = 1305.03 + 356.34*3 + 33.12*40 = 1305.03 + 1069.02 + 1324.80 = 3698.85 ≈ $3,699
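The prediction above can be written as a small function. This sketch uses the three-decimal coefficients from the multiple regression table, which gives 3698.93 rather than the 3698.85 obtained from the two-decimal coefficients; both round to about $3,699. Income must be entered in thousands of dollars (40, not 40000). The names `predict_charges`, `INTERCEPT`, `B_INCOME` and `B_HOUSEHOLD` are illustrative.

```python
# Coefficients from the multiple regression table
INTERCEPT = 1305.034
B_INCOME = 33.122        # dollars per $1,000 of income
B_HOUSEHOLD = 356.340    # dollars per additional household member

def predict_charges(income_thousands, household_size):
    """Point prediction of annual credit card charges ($) from the fitted model."""
    return INTERCEPT + B_INCOME * income_thousands + B_HOUSEHOLD * household_size

print(round(predict_charges(40, 3), 2))  # prints 3698.93
```

This is a point estimate only; an interval for an individual household would also need the model's standard error.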
Each independent variable helps the regression model predict changes in the dependent variable, and its relationship with the dependent variable may be positive or negative. Another independent variable that could improve the model is the number of credit card users.
Preliminary observations from the depression scores are given below.
Table: Depression Scores of individuals across different states with good health condition
The data on the good health condition show variation in depression scores across the three states. Florida's scores range from 2 to 9, which is lower than in the other two states; North Carolina's scores range from 2 to 10 and New York's from 4 to 13. The highest scores are observed in New York. Overall, individuals in good health appear to have lower depression scores in Florida than in New York.
Table: Depression Scores of individuals across different states with chronic health condition
Among individuals with a chronic health condition, depression scores are higher in New York than in North Carolina or Florida. New York's scores tend to fluctuate around the highest values, whereas North Carolina's show a decreasing trend. Florida's scores decline from the 18th observation onward with minimal fluctuation. The descriptive statistics calculated for the sample support these observations.
Table: Descriptive Statistics for Depression Scores
The descriptive statistics confirm these patterns. For the good health condition, mean depression scores rank New York > North Carolina > Florida; for the chronic health condition, they rank New York > Florida > North Carolina.
- ANOVA of depression scores by geographic location: good health condition
H0: Mean depression scores for the good health condition are equal across the three states
H1: Mean depression scores for the good health condition are not all equal across the three states
| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 61.033 | 2 | 30.517 | 5.241 | 0.008 | 3.159 |
| Within Groups | 331.900 | 57 | 5.823 | | | |
| Total | 392.933 | 59 | | | | |
The single-factor ANOVA for the good health condition gives F = 5.241 with (2, 57) degrees of freedom, which exceeds the critical value F crit = 3.159. Equivalently, the p-value of 0.008 is below the 0.05 significance level. We therefore reject the null hypothesis and accept the alternative hypothesis: mean depression scores for the good health condition are not all equal across the three states.
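The ANOVA table's F statistic is the ratio of the between-groups mean square to the within-groups mean square, each mean square being SS / df. The sketch below recomputes F and its p-value from the sums of squares in the table. Instead of a statistics library such as SciPy, it uses the closed-form tail probability of the F distribution with 2 numerator degrees of freedom, P(F > f) = (1 + 2f/v)^(-v/2); this shortcut is valid only because there are exactly three groups here, and the function refuses other cases. The function name `anova_from_ss` is illustrative.

```python
def anova_from_ss(ss_between, df_between, ss_within, df_within):
    """F statistic and p-value of a one-way ANOVA from its sums of squares.

    The closed-form p-value holds only for df_between == 2 (three groups):
    for F(2, v), P(F > f) = (1 + 2f/v) ** (-v/2).
    """
    if df_between != 2:
        raise ValueError("closed-form p-value implemented only for df_between == 2")
    ms_between = ss_between / df_between   # MS = SS / df
    ms_within = ss_within / df_within
    f = ms_between / ms_within
    p = (1 + 2 * f / df_within) ** (-df_within / 2)
    return f, p

# Good health condition: SS and df from the ANOVA table
f_good, p_good = anova_from_ss(61.033, 2, 331.900, 57)
print(f"F = {f_good:.3f}, p = {p_good:.4f}")
```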
- ANOVA of depression scores by geographic location: chronic health condition
H0: Mean depression scores for the chronic health condition are equal across the three states
H1: Mean depression scores for the chronic health condition are not all equal across the three states
| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 17.03 | 2 | 8.517 | 0.714 | 0.494 | 3.159 |
| Within Groups | 679.70 | 57 | 11.925 | | | |
| Total | 696.73 | 59 | | | | |
The single-factor ANOVA for the chronic health condition gives F = 0.714 with (2, 57) degrees of freedom, which is well below the critical value F crit = 3.159. Equivalently, the p-value of 0.494 is above the 0.05 significance level. We therefore fail to reject the null hypothesis: the data provide no significant evidence that mean depression scores for the chronic health condition differ across the three states.
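The decision rule can be checked in two equivalent ways: compare F with the critical value F crit, or compare the p-value with alpha = 0.05. The sketch below does both for the chronic health condition. As in the good health case, it relies on the closed form of the F distribution with 2 numerator degrees of freedom (valid here only because there are three states), and it inverts that form to obtain the critical value, reproducing the table's F crit of 3.159. The function names are illustrative.

```python
ALPHA = 0.05

def p_value_dfn2(f, v):
    """Upper-tail probability of F(2, v): (1 + 2f/v) ** (-v/2)."""
    return (1 + 2 * f / v) ** (-v / 2)

def f_crit_dfn2(alpha, v):
    """Critical value of F(2, v), obtained by inverting p_value_dfn2."""
    return (v / 2) * (alpha ** (-2 / v) - 1)

# Chronic health condition: SS and df from the ANOVA table
ms_between = 17.03 / 2
ms_within = 679.70 / 57
f_chronic = ms_between / ms_within
crit = f_crit_dfn2(ALPHA, 57)
p_chronic = p_value_dfn2(f_chronic, 57)
reject_h0 = f_chronic > crit  # equivalent to p_chronic < ALPHA
print(f"F = {f_chronic:.3f}, F crit = {crit:.3f}, p = {p_chronic:.3f}, reject H0: {reject_h0}")
```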
Taken together, the two ANOVAs indicate that, in the first phase of the long-term study, mean depression scores among individuals in good health differ significantly across the three geographical areas (New York, Florida and North Carolina). In contrast, in the second phase of the study, mean depression scores among individuals with a chronic health condition do not differ significantly across the same areas.