Identification of variables
The data used here are cross-sectional rather than time-series: variables such as household income, household size, car ownership, distance travelled, education and number of vehicles owned or leased were recorded for different households at a single point in time, not tracked over time.
Two qualitative variables in the data are “OWNCAR” and “EDUC”. Both are treated here as nominal: their values serve only as labels for categories, and no particular order is assumed among those labels.
Two quantitative variables in the data are “HHINCOME” and “DIS_TRAVELLED”. Both are continuous, since each can in principle take any value within its range.
The following table summarises the sample by education group. The four groups are almost equally represented: each accounts for roughly 25% of the 1,000 people in the sample (between 248 and 254 per group). The sample is therefore balanced across education groups, which is consistent with a good random sampling design.
| Education | Frequency | Percentage |
| --- | --- | --- |
| 1 | 254 | 25% |
| 2 | 249 | 25% |
| 3 | 249 | 25% |
| 4 | 248 | 25% |
| Grand Total | 1000 | 100% |
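A frequency table of this kind can be reproduced with a few lines of Python. The sketch below is illustrative only: the `educ` list is rebuilt from the counts in the table above rather than read from the original data file, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, percentage of total)} in sorted category order."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (c, 100 * c / total) for k, c in sorted(counts.items())}


# Assumed stand-in for the EDUC column, rebuilt from the tabulated counts.
educ = [1] * 254 + [2] * 249 + [3] * 249 + [4] * 248

table = frequency_table(educ)
for level, (count, pct) in table.items():
    print(f"{level}\t{count}\t{pct:.1f}%")
```

Printing one decimal place shows the small differences (25.4%, 24.9%, 24.9%, 24.8%) that rounding to whole percentages hides.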
| EDUC | Average of AGE |
| --- | --- |
| 1 | 47.08 |
| 2 | 48 |
| 3 | 47.71 |
| 4 | 48.53 |
| Grand Total | 47.83 |
Table 5.1: Percentage of Grand Total
Percentage of AGE by Age Group
| Number of Cars | 15-24 | 25-34 | 35-44 | 45-54 | 55-65 | Grand Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 9.40% | 5.60% | 5.80% | 6.60% | 27.10% | 54.50% |
| 2 | 5.70% | 4.40% | 3.70% | 6.10% | 19.00% | 38.90% |
| 3 | 0.90% | 0.50% | 0.40% | 0.60% | 3.20% | 5.60% |
| 4 | 0.00% | 0.10% | 0.20% | 0.20% | 0.50% | 1.00% |
| Grand Total | 16.00% | 10.60% | 10.10% | 13.50% | 49.80% | 100.00% |
Table 5.2: Percentage of Column Total
Percentage of AGE by Age Group
| Number of Cars | 15-24 | 25-34 | 35-44 | 45-54 | 55-65 | Grand Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 58.75% | 52.83% | 57.43% | 48.89% | 54.42% | 54.50% |
| 2 | 35.63% | 41.51% | 36.63% | 45.19% | 38.15% | 38.90% |
| 3 | 5.63% | 4.72% | 3.96% | 4.44% | 6.43% | 5.60% |
| 4 | 0.00% | 0.94% | 1.98% | 1.48% | 1.00% | 1.00% |
| Grand Total | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |
Table 5.3: Percentage of Row Total
Percentage of AGE by Age Group
| Number of Cars | 15-24 | 25-34 | 35-44 | 45-54 | 55-65 | Grand Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 17.25% | 10.28% | 10.64% | 12.11% | 49.72% | 100.00% |
| 2 | 14.65% | 11.31% | 9.51% | 15.68% | 48.84% | 100.00% |
| 3 | 16.07% | 8.93% | 7.14% | 10.71% | 57.14% | 100.00% |
| 4 | 0.00% | 10.00% | 20.00% | 20.00% | 50.00% | 100.00% |
| Grand Total | 16.00% | 10.60% | 10.10% | 13.50% | 49.80% | 100.00% |
- For each number of cars, the dominant age group is 55 to 65 years. Table 5.3 shows that, whatever the number of cars, the largest share of people (between 48.84% and 57.14%) belongs to the 55 to 65 age group.
- For each age group, the dominant category by number of cars is one car. Table 5.2 shows that in every age group the proportion of people with one car (between 48.89% and 58.75%) exceeds the proportion with two, three or four cars.
- Considering the number of cars and the age group jointly, the dominant group is people owning one car in the 55 to 65 age group. Table 5.1 shows that this group holds the largest share of the sample, at 27.10%.
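The three percentage views used above (of grand total, of column total, of row total) all come from one table of counts. The sketch below is a minimal illustration, assuming the percentages in Table 5.1 are shares of n = 1000, so that each cell count is the percentage times 10. The counts are back-calculated on that assumption and are not taken from the raw data.

```python
AGE_GROUPS = ["15-24", "25-34", "35-44", "45-54", "55-65"]

# Cell counts implied by Table 5.1 under the assumption n = 1000.
# Keys are the number of cars; list positions follow AGE_GROUPS.
counts = {
    1: [94, 56, 58, 66, 271],
    2: [57, 44, 37, 61, 190],
    3: [9, 5, 4, 6, 32],
    4: [0, 1, 2, 2, 5],
}

n_cols = len(AGE_GROUPS)
grand_total = sum(sum(row) for row in counts.values())
col_totals = [sum(row[j] for row in counts.values()) for j in range(n_cols)]

# Table 5.1: each cell as a percentage of the grand total.
pct_of_grand = {c: [100 * v / grand_total for v in row] for c, row in counts.items()}
# Table 5.2: each cell as a percentage of its column (age group) total.
pct_of_column = {
    c: [100 * v / col_totals[j] for j, v in enumerate(row)] for c, row in counts.items()
}
# Table 5.3: each cell as a percentage of its row (number of cars) total.
pct_of_row = {c: [100 * v / sum(row) for v in row] for c, row in counts.items()}

# Dominant age group for each number of cars (largest row percentage).
dominant_age = {
    c: AGE_GROUPS[max(range(n_cols), key=row.__getitem__)]
    for c, row in pct_of_row.items()
}
# Dominant number of cars for each age group (largest column percentage).
dominant_cars = {
    AGE_GROUPS[j]: max(counts, key=lambda c: pct_of_column[c][j])
    for j in range(n_cols)
}

print(dominant_age)
print(dominant_cars)
```

Row percentages answer "which age group dominates for a given number of cars", while column percentages answer "which number of cars dominates for a given age group", which is why the first two bullets read different tables.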
| Income | Own Car | Lease Car |
| --- | --- | --- |
| Mean | 53812.39 | 55335.06 |
| Median | 51351.5 | 51830.5 |
| Standard Deviation | 17962.39 | 19473.82 |
| First Quartile | 41904.5 | 43172 |
| Third Quartile | 61627 | 66938.75 |
| Interquartile Range | 19722.5 | 23766.75 |
| Range | 79452 | 79553 |
| Minimum | 20161 | 20176 |
| Maximum | 99613 | 99729 |
| Coefficient of Variation (%) | 33.38 | 35.19 |
| Income | Low Education Level | High Education Level |
| --- | --- | --- |
| Mean | 54616.06 | 54690.20 |
| Median | 52036 | 50781 |
| Standard Deviation | 18636.33 | 19018.23 |
| First Quartile | 42446.5 | 42307 |
| Third Quartile | 65563.5 | 65463 |
| Interquartile Range | 23117 | 23156 |
| Range | 79553 | 79452 |
| Minimum | 20176 | 20161 |
| Maximum | 99729 | 99613 |
| Coefficient of Variation (%) | 34.12 | 34.77 |
Figure 8.1
Figure 8.2
Table 7.2 shows that, for both the higher and the lower education group, mean income is close to median income. The standard deviation of income is also well below the mean in both groups, and the coefficient of variation is moderate at about 34% for each. This indicates that incomes in both education groups are clustered fairly closely around their respective group averages.
Further, the histograms in figures 8.1 and 8.2, which show the shape of the income distribution for each education group, indicate that household incomes are distributed approximately symmetrically in both groups.
Household wealth therefore appears to be distributed fairly evenly, as household incomes differ little between education levels.
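The summary measures discussed above can be computed with Python's standard `statistics` module. The sketch below is a minimal illustration under stated assumptions: the `incomes` list is made up, the standard deviation is taken to be the sample standard deviation, and quartiles use the "inclusive" method (which matches a common spreadsheet convention, though the method used for the tables is not stated). `describe` and `cv_percent` are hypothetical helper names.

```python
import statistics


def cv_percent(mean, sd):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return 100 * sd / mean


def describe(values):
    """Return the summary measures that appear as rows of the income tables."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "cv_percent": cv_percent(mean, sd),
    }


# Hypothetical incomes, for illustration only (not taken from the data set).
incomes = [20000, 40000, 50000, 60000, 100000]
summary = describe(incomes)
print(summary)

# Cross-check of the tabulated coefficients of variation from mean and SD.
print(round(cv_percent(53812.39, 17962.39), 2))  # Own Car column: 33.38
print(round(cv_percent(55335.06, 19473.82), 2))  # Lease Car column: 35.19
```

The last two lines recompute the coefficient of variation from the tabulated means and standard deviations, which is a quick way to confirm that the table rows are internally consistent.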
Average of DIS_TRAVELLED
| Row Labels | 1-5 | 6-11 | Grand Total |
| --- | --- | --- | --- |
| 1-2 | 15303.17 | 17295.28 | 15966.46 |
| 3-4 | 30773.07 | 30440.14 | 30620.10 |
| Grand Total | 16724.60 | 19227.23 | 17593.01 |
The table shows that the average distance travelled is higher for larger households and lower for smaller ones. By contrast, it differs little with household income. This suggests that there is no relationship between distance travelled and household income, but a clear relationship between distance travelled and household size. The same is illustrated in figure 12.1.
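A table of group averages like the one above is a two-way grouped mean. The sketch below shows the idea with the standard library only. The records are entirely hypothetical: the band labels mirror the row and column labels of the table, but the distances are made up, and the field names `row_band`, `col_band` and `dis_travelled` are placeholders rather than names from the data set.

```python
from collections import defaultdict


def pivot_mean(records, row_key, col_key, value_key):
    """Average value_key within each (row, column) combination."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        cell = (rec[row_key], rec[col_key])
        sums[cell] += rec[value_key]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical records; the distances are invented for illustration.
records = [
    {"row_band": "1-2", "col_band": "1-5", "dis_travelled": 14000},
    {"row_band": "1-2", "col_band": "1-5", "dis_travelled": 16000},
    {"row_band": "1-2", "col_band": "6-11", "dis_travelled": 17000},
    {"row_band": "3-4", "col_band": "1-5", "dis_travelled": 31000},
    {"row_band": "3-4", "col_band": "6-11", "dis_travelled": 30000},
]

means = pivot_mean(records, "row_band", "col_band", "dis_travelled")
for cell, avg in sorted(means.items()):
    print(cell, avg)
```

Comparing averages down a column isolates the effect of the row variable, and comparing across a row isolates the effect of the column variable, which is how the table is read in the paragraph above.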
- There are 173 high-income households and 131 low-income households.
- 378 households travelled more than 20,000 km, and 171 households travelled less than 10,000 km.
- 97 high-income households travelled more than 20,000 km.
- 31 low-income households travelled less than 10,000 km.
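Of the 1,000 households in the sample, 97 are both high-income and travelled more than 20,000 km, so P(high income and more than 20,000 km) = 97/1000 = 0.097, which is not zero. Therefore, the two events, high-income household and distance travelled more than 20,000 km, are not mutually exclusive.
If the two events were independent, the joint probability would equal P(high income) × P(more than 20,000 km) = 0.173 × 0.378 ≈ 0.065, which differs from the observed 0.097. Therefore, the two events, high-income household and distance travelled more than 20,000 km, are not independent.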
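The two conclusions above can be checked directly from the counts. The sketch below is illustrative and assumes a sample of 1,000 households (the grand total reported in the frequency table). It uses exact fractions so that the comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction

N = 1000      # total households in the sample (assumed from the frequency table)
n_high = 173  # high-income households
n_far = 378   # households that travelled more than 20,000 km
n_both = 97   # high-income households that travelled more than 20,000 km

p_high = Fraction(n_high, N)
p_far = Fraction(n_far, N)
p_both = Fraction(n_both, N)

# Mutually exclusive events cannot occur together: P(A and B) = 0.
mutually_exclusive = p_both == 0
# Independent events satisfy P(A and B) = P(A) * P(B).
independent = p_both == p_high * p_far
# Conditional probability of travelling far given high income.
p_far_given_high = p_both / p_high

print(mutually_exclusive, independent)
print(float(p_both), float(p_high * p_far))     # 0.097 versus 0.065394
print(float(p_far_given_high), float(p_far))    # about 0.561 versus 0.378
```

The conditional probability gives the same verdict from another angle: among high-income households about 56% travelled more than 20,000 km, compared with about 38% of all households.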
400 households are in rural areas
600 households are in suburban areas
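Probability of a household living in a rural area: $p = 400/1000 = 0.4$.
Number of samples selected (n) = 20.
Let Y be the number of households living in rural areas.
- Y follows a Binomial Distribution with parameters n = 20 and p = 0.4.
The probability distribution of Y is given by
$P(Y = y) = \binom{20}{y} (0.4)^{y} (0.6)^{20-y}, \quad y = 0, 1, \dots, 20.$
- The probability that exactly 8 households live in a rural area is given by:
$P(Y = 8) = \binom{20}{8} (0.4)^{8} (0.6)^{12} \approx 0.1797.$
- The probability that at least 13 households live in a rural area is given by:
$P(Y \ge 13) = \sum_{y=13}^{20} \binom{20}{y} (0.4)^{y} (0.6)^{20-y} \approx 0.0210.$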
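The binomial probabilities can be evaluated numerically as a check. The sketch below is a minimal illustration using only `math.comb`. It takes p = 400/1000 = 0.4 from the rural and suburban counts given above, and `binom_pmf` is a hypothetical helper name.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 20
p = 400 / 1000  # share of rural households among the 1,000 in the sample

p_exactly_8 = binom_pmf(8, n, p)
# "At least 13" is the sum of the probabilities for 13, 14, ..., 20.
p_at_least_13 = sum(binom_pmf(k, n, p) for k in range(13, n + 1))

print(f"P(Y = 8)   = {p_exactly_8:.4f}")    # 0.1797
print(f"P(Y >= 13) = {p_at_least_13:.4f}")  # 0.0210
```

Since the expected number of rural households in a sample of 20 is n × p = 8, it is unsurprising that exactly 8 is a fairly likely outcome while 13 or more is rare.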