Variables and Datasets
Statistical research is driven by datasets, which consist of variables. Variables are elements whose values can change from one observation to another. A dataset records each variable under a suitable symbol or label, together with the values it assumes. To ascertain the underlying relationships and associations, it is imperative to explore the variables provided. These relationships, along with summary measures of the variables, can also be found using computers as a suitable aid (Flick, 2015).
The following data can be used as an example to illustrate the same.
The summary statistics for the above data can be found with the aid of the Data Analysis option that Excel provides.
Also, it is possible to explore the association between the given variables through the use of scatter diagrams as indicated below.
A weak positive association appears to exist between income and deduction. This is apparent from the R² value and from the dispersion of the scatter plot points around the best-fit line. The model does not indicate a good fit (Hillier, 2016).
The use of computers is highly recommended when the amount of data to be processed is large. Various software packages are available that can summarise the data and also run various inferential techniques so as to derive meaningful conclusions about the population (Hair et al., 2015).
- The pivot table representing the relationship between age and liking for the product is highlighted below:
Number of old people in the sample who would say yes to the product = 51
Proportion of old people who would say yes to the product = 0.8361
Similarly,
Number of young people in the sample who would say yes to the product = 25
Proportion of young people who would say yes to the product = 0.6410
- It is apparent from the above results that there is a greater acceptability of the product amongst consumers who are old. The acceptability of the product amongst consumers who are young is considerably less but still substantial.
- The difference between the sample proportions for liking is calculated below:
Now,
Difference between the sample proportions = 0.8361 - 0.6410 = 0.1951
- The summary statistics for the sample is highlighted below:
For Old people
Sample size
Sample average
Sample standard deviation
For young people
Sample size
Sample average
Sample standard deviation
- The average amount that old customers would pay is higher than the corresponding amount that young customers would pay. Also, the variability (standard deviation) appears to be lower for old customers than for young customers.
- Difference between the sample means
Data sample
- Scatter Plot
- There is a strong positive association between the number of bets and profit, and it seems that a higher number of bets does tend to lead to higher profits being earned. The coefficient of determination is quite high, indicating a strong relationship between the given variables (Eriksson and Kovalainen, 2015).
- Profit of casino =?
Number of bets x = 1000
Regression equation from the scatter plot
Therefore, the profit of casino would be 925.975 units.
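The prediction step above relies on a fitted regression line. As a minimal sketch of how such a line and its coefficient of determination could be obtained, the following uses ordinary least squares on a small set of (bets, profit) pairs. The data points are hypothetical and chosen only for illustration; they are not the casino sample, so the fitted line is not the regression equation used above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 is the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


def predict(slope, intercept, x):
    """Predicted y for a given x on the fitted line."""
    return intercept + slope * x


# Hypothetical (number of bets, profit) pairs, NOT the data from the report.
bets = [200, 400, 600, 800, 1200]
profit = [190, 350, 580, 730, 1120]

slope, intercept, r_squared = fit_line(bets, profit)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r_squared:.4f}")
print(f"predicted profit at 1000 bets: {predict(slope, intercept, 1000):.3f}")
```

An R² close to 1 corresponds to points lying close to the fitted line, which is the sense in which the association is described as strong.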
- Pivot tables from section 2
- Null and alternative hypothesis
- The p value for the inputs (sample sizes and proportions) is computed and shown below:
Therefore, the p value comes out to be 0.0259.
- It is apparent from the above that the p value is lower than the level of significance; therefore, sufficient evidence is present to reject the null hypothesis and accept the alternative hypothesis (Flick, 2015).
- It can be concluded that the population proportions for old and young consumers are not equal.
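The p value above can be checked with a short sketch of a two-proportion z test. Several inputs here are assumptions rather than values stated in the text: the group sizes of 61 (old) and 39 (young) are inferred from the reported counts and proportions (51/0.8361 and 25/0.6410), the test is taken to be two-tailed, and a pooled standard error is used. Under these assumptions the result should land close to the reported 0.0259.

```python
import math


def two_proportion_z_test(yes_1, n_1, yes_2, n_2):
    """Two-tailed z test for the difference between two proportions.

    Uses the pooled proportion for the standard error (an assumption;
    the report does not state which form was used).
    Returns (z, p_value).
    """
    p_1 = yes_1 / n_1
    p_2 = yes_2 / n_2
    pooled = (yes_1 + yes_2) / (n_1 + n_2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_error
    # Two-tailed p value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts of "yes" come from the text; group sizes 61 and 39 are inferred.
z, p_value = two_proportion_z_test(51, 61, 25, 39)
print(f"z = {z:.4f}, p value = {p_value:.4f}")

alpha = 0.05  # assumed 5% level of significance
print("reject H0" if p_value < alpha else "fail to reject H0")
```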
- Pivot tables from section 3
- Null and alternative hypothesis
- The p value for the inputs (sample size and proportions) is computed and shown below:
The p value from the above comes out to be 0.0384.
- Assuming level of significance = 5%
- It can be seen that the p value is lower than the level of significance; therefore, sufficient evidence is present to reject the null hypothesis and accept the alternative hypothesis (Eriksson and Kovalainen, 2015).
- It can be concluded that the population means for old and young customers are not equal.
- The numerical summary in the form of pivot table of the sample is highlighted below:
Sample size = 214
Number of people who will support the proposed change and will say yes = 128
Requisite proportion = 128/214 = 0.598
- 90% confidence interval for proportion
Standard error = sqrt(0.598 × (1 - 0.598) / 214) = 0.0335
The z value for 90% confidence interval = 1.645
Hence, 90% confidence interval = 0.598 ± 1.645 × 0.0335
Therefore, the 90% confidence interval is [0.543, 0.653].
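The interval above can be reproduced with a minimal sketch of the normal-approximation confidence interval for a proportion. The function name is illustrative only; the inputs (128 out of 214, z = 1.645) are taken from the text.

```python
import math


def proportion_confidence_interval(successes, n, z_value):
    """Normal-approximation confidence interval for a proportion.

    Returns (lower, upper) = p_hat -/+ z_value * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_value * std_error
    return p_hat - margin, p_hat + margin


# 128 supporters out of 214 respondents, z = 1.645 for 90% confidence.
lower, upper = proportion_confidence_interval(128, 214, 1.645)
print(f"90% confidence interval: [{lower:.3f}, {upper:.3f}]")
```

The approximation is usually considered reasonable when both the number of "yes" and "no" responses are large, as is the case here (128 and 86).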
- An example of a back-to-back histogram is indicated below.
- The given histogram provides a monthly comparison of the unemployment rates that prevail in two states, namely Texas and California. The variable is quantitative in nature, since it is captured through numerical data.
- The relationship between the two variables is weak, considering the changes in unemployment witnessed in the two states. This is because unemployment may be the result of domestic factors or international factors. If the unemployment is on account of international factors, then the correlation would be higher, but this would not be the case when localised or regional factors are in play (Hillier, 2016).
- The information indicated in the histogram and the above discussion is relevant for business decision making. This is evident from the fact that there were some months in California where unemployment was in excess of 10%, which would augur well for an employer looking to set up a business, considering that the requisite skill sets are available in the labour force but there is a lack of opportunity (Hair et al., 2015).
- From the discussion in section 1, the above information can be used to draw an association between the unemployment rates of the two states and therefore make predictions about future unemployment and its implications for a business, especially if it is located in one of the states mentioned (Eriksson and Kovalainen, 2015).
Average of the estimates = 0.14
Standard deviation = 0.088
- If there are 1000 estimates ranked from lowest to highest, then:
- Requisite table
Which sample | Rank lowest to highest | Estimate X | Z score = (X - mean)/stdev
Lowest estimate | | |
Estimate from allocated sample | | |
Highest estimate | | |
- Using section 3
- The z score
Average of the estimates = 0.408
Standard deviation = 0.26
- If there are 1000 estimates ranked from lowest to highest, then:
.79
- Requisite table
Which sample | Rank lowest to highest | Estimate X | Z score = (X - mean)/stdev
Lowest estimate | | |
Estimate from allocated sample | | |
Highest estimate | | |
- Using section 4
- The z score
Average of the estimates = 0.952
Standard deviation = 0.237
Therefore,
- If there are 1000 estimates ranked from lowest to highest, then:
- Requisite table
Which sample | Rank lowest to highest | Estimate X | Z score = (X - mean)/stdev
Lowest estimate | 1 | |
Estimate from allocated sample | 488 | |
Highest estimate | 1000 | |
- The actual rank of the allocated sample is 488, and the estimated ranks in part (A) and part (B) are significantly greater than this actual rank. However, for part (C), the estimated rank of the allocated sample comes out as 486, which is quite close to the actual rank of 488.
- If the sampling distribution is the same, then comparisons can be drawn even between different datasets, since the underlying properties tend to converge. This has been exhibited here. Further, the sampling distribution plays a key role in hypothesis testing and in the determination of the resultant p value. For the given case, the p value has been determined assuming the normal distribution; if the distribution varies, then the underlying process for determining the p value changes, along with the value itself (Flick, 2015).
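The rank estimates discussed above follow from converting a z score into a position among 1000 ordered estimates. A minimal sketch is given below, assuming the estimates are approximately normally distributed so that the expected rank is about 1000 × Φ(z). The mean (0.14) and standard deviation (0.088) come from the text; the estimate of 0.1951 is an assumption, taken to be the difference in sample proportions from section 2.

```python
import math


def z_score(x, mean, stdev):
    """Standardised value: (X - mean) / stdev."""
    return (x - mean) / stdev


def estimated_rank(z, n_estimates=1000):
    """Approximate rank (lowest to highest) of an estimate with z score z,
    assuming the estimates follow a normal distribution."""
    cdf = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF, Phi(z)
    return round(cdf * n_estimates)


# Mean and standard deviation are from the text; 0.1951 is an assumed estimate.
z = z_score(0.1951, 0.14, 0.088)
print(f"z = {z:.3f}, estimated rank = {estimated_rank(z)} of 1000")
```

An estimate equal to the mean has z = 0 and an estimated rank of 500, so an actual rank of 488 corresponds to an estimate slightly below the average of the estimates.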
References
Eriksson, P. and Kovalainen, A. (2015) Quantitative methods in business research. 3rd ed. London: Sage Publications.
Flick, U. (2015) Introducing research methodology: A beginner’s guide to doing a research project. 4th ed. New York: Sage Publications.
Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P., and Page, M. J. (2015) Essentials of business research methods. 2nd ed. New York: Routledge.
Hillier, F. (2016) Introduction to Operations Research. 6th ed. New York: McGraw Hill Publications.