Question 1: Probability Distributions and Performance Evaluation
Answer:
- (a) The average time to complete the task was 4.5 weeks for Bill and 4.5 weeks for Ben.
(b) The standard deviation can be calculated as $\sigma = \sqrt{\sum (x - E(X))^2 \, P(x)}$, where $E(X)$ is the mean or average of the probability distribution.
For Bill, the standard deviation was approximately 1.025 weeks.
For Ben, the standard deviation was approximately 1.63 weeks.
Hence Bill can finish the job within 3.48 weeks to 5.53 weeks, and Ben can finish the job within 2.87 weeks to 6.13 weeks (mean ± one standard deviation in each case).
(c) Although the average completion time was the same for both, the standard deviation of the completion time was smaller for Bill than for Ben. Hence preference should be given to Bill, whose completion time is more predictable (Anderson et al., 2000).
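The comparison between Bill and Ben rests on the mean and standard deviation of a discrete probability distribution. The sketch below shows that calculation in Python. The function name `mean_and_sd` and the two distributions are hypothetical illustrations, not the assignment's data; they were chosen only so that both workers share a mean while differing in spread.

```python
from math import sqrt

def mean_and_sd(pmf):
    """Return (mean, standard deviation) of a discrete distribution.

    pmf maps each outcome x to its probability P(x); probabilities must sum to 1.
    """
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, sqrt(variance)

# Hypothetical completion times in weeks (NOT the assignment's distributions).
worker_a = {3: 0.25, 4: 0.5, 5: 0.25}
worker_b = {2: 0.25, 4: 0.5, 6: 0.25}

mean_a, sd_a = mean_and_sd(worker_a)
mean_b, sd_b = mean_and_sd(worker_b)

# Same mean, so the smaller standard deviation marks the more predictable worker.
print(f"A: mean={mean_a:.2f}, sd={sd_a:.3f}")
print(f"B: mean={mean_b:.2f}, sd={sd_b:.3f}")
```

With equal means, the worker with the smaller standard deviation is preferred, which is the same reasoning applied to Bill and Ben.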
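- (a) The mean was $\bar{x} = \frac{\sum x_i}{n} = \frac{180}{10} = 18$.
The standard deviation was calculated from the squared deviations about the mean, $\sum (x_i - \bar{x})^2 = 914$.
Hence the standard deviation was $\sqrt{914/9} \approx 10.08$ with the sample ($n-1$) divisor, or $\sqrt{914/10} \approx 9.56$ with the population ($n$) divisor.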
(b) The dataset was arranged in ascending order; its size was 10. Hence the median was the average of the 5th and 6th observations (the two middle values), that is, the average of 13 and 15, which gave a median of 14. The mean of the dataset was 18. The distribution was positively skewed, as the mean was greater than the median, and the measure of skewness was
(c) The coefficient of variation was
(d) The dataset 9, 11, 12, 13, 13, 15, 16, 21, 28, 42 was partitioned at the median into (9, 11, 12, 13, 13) | (15, 16, 21, 28, 42). The first quartile was the median of the first part and the third quartile was the median of the second part. Hence the first quartile was 12 and the third quartile was 21.
(e) The position of the 80th percentile was $i = 0.8 \times 10 = 8$. As $i$ is an integer, the 80th percentile was the average of the 8th and 9th observations, (21 + 28)/2 = 24.5.
(f) The range of a dataset is calculated as the difference between its largest and smallest values.
Therefore the range was (42 - 9) = 33.
The interquartile range was calculated as the difference between the third and first quartiles. The value of the interquartile range was $Q_3 - Q_1 = 21 - 12 = 9$.
Considering that the last value of the series was entered wrongly and the correct value would be 80 instead of 42, the value of the range was updated to (80 - 9) = 71.
The value of the interquartile range was not affected by the modification of the dataset, as $Q_1$ and $Q_3$ are the 25% and 75% positional measures and do not depend on the extreme values. The interquartile range is the difference of $Q_3$ and $Q_1$ (Wan et al., 2014).
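The descriptive measures above can be checked with a short Python sketch. The helper names (`quartiles_by_median_split`, `percentile_by_position`, `summarize`) are assumptions introduced here. The quartile and percentile rules follow the procedure described in the answer (split at the median; position $i = p \cdot n / 100$), not the default method of any particular library. Both standard deviation conventions are reported, because the divisor used in the original answer is not stated.

```python
import statistics

def quartiles_by_median_split(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[len(ordered) - half:]
    return statistics.median(lower), statistics.median(upper)

def percentile_by_position(values, p):
    """p-th percentile (integer p, 0 < p < 100) using the position i = p*n/100."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # i is an integer: average the i-th and (i+1)-th ordered values (1-indexed).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise round i up and take that ordered value.
    return ordered[whole]

def summarize(values):
    q1, q3 = quartiles_by_median_split(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd_sample": statistics.stdev(values),       # divisor n - 1
        "sd_population": statistics.pstdev(values),  # divisor n
        "q1": q1,
        "q3": q3,
        "p80": percentile_by_position(values, 80),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }

original = [9, 11, 12, 13, 13, 15, 16, 21, 28, 42]
corrected = original[:-1] + [80]  # last value re-entered as 80 instead of 42

stats_original = summarize(original)
stats_corrected = summarize(corrected)

for name in ("range", "iqr"):
    print(name, stats_original[name], "->", stats_corrected[name])
```

Replacing 42 with 80 changes the range but leaves the interquartile range unchanged, which illustrates why the interquartile range is robust to an extreme value.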
- (a) Using the Z-table the value of was evaluated. The probability between and was the area between the two points. The region has been shaded in figure 1.
Figure 1: in Standard normal curve
(b) Using the inverse Z-table the value of was evaluated. The probability between and was the area between the two points, and. The region has been shaded in figure 2
Figure 2: in Standard normal curve
Question 2: Descriptive Statistics and Dataset Analysis
(c) Using the inverse Z-table the value of was evaluated. The probability between and was the area between the two points, and. The region has been shaded in figure 3.
Figure 3: in Standard normal curve
(d) As the standard normal curve is symmetric about Z=0, the probability of or are same. Using the Z-table the value of was evaluated. The probability between and was the area between the two points, and. The region has been shaded in figure 4.
Figure 4: or in Standard normal curve
(e) Using the Z-table the value of was evaluated. The probability between and was the area between the two points, and. The region has been shaded in figure 5.
Figure 5: in Standard normal curve
(f) Using the inverse Z-table, the probability to the left of -0.85 was found to be 0.1977 and the probability to the left of -0.84 was found to be 0.2005. Hence the Z value with a left-tail probability of 0.2 was interpolated to be -0.842.
Figure 6: Probability of 0.2 left of -0.842 in the standard normal curve
(g) The symmetric Z values that contained 95% of all observations were found using both the Z-table and the inverse Z-table. The standard normal curve is symmetric about Z=0, and hence a probability of 0.95 implied a probability of 0.475 on each side of zero. It was cross-verified from the Z-table that $P(0 \le Z \le 1.96) = 0.475$.
Similarly, from the inverse table, the Z value with a cumulative probability of 0.975 was 1.96.
Therefore $P(-1.96 \le Z \le 1.96) = 0.95$.
Figure 7: Standard Normal Curve containing 95% of all observations
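The table lookups in parts (f) and (g) can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper `interpolate_z` is a hypothetical name introduced here for the linear interpolation step, and the table entries passed to it are the ones quoted in part (f).

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def interpolate_z(target, z_low, p_low, z_high, p_high):
    """Linearly interpolate the z value whose left-tail probability is target."""
    fraction = (target - p_low) / (p_high - p_low)
    return z_low + fraction * (z_high - z_low)

# Part (f): table entries P(Z < -0.85) = 0.1977 and P(Z < -0.84) = 0.2005.
z_interpolated = interpolate_z(0.2, -0.85, 0.1977, -0.84, 0.2005)
z_exact = std_normal.inv_cdf(0.2)

# Part (g): symmetric limits that enclose 95% of the observations.
z_95 = std_normal.inv_cdf(0.975)
central_area = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

print(f"interpolated z = {z_interpolated:.3f}, inverse CDF z = {z_exact:.3f}")
print(f"z for 95% = {z_95:.2f}, area within ±1.96 = {central_area:.4f}")
```

The interpolated value agrees with the inverse CDF to three decimal places, which is about the precision a four-digit Z-table supports.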
- (a) A type I error was to revoke the license of a repair workshop that had issued okay certificates only to cars that were meeting the standards.
(b) A type II error was to maintain the license of a workshop that had issued okay certificates to cars that were not meeting the standards.
(c) From the point of view of the shop owner, a type I error would be more serious than a type II error. Revoking the license of a repair workshop that had issued okay certificates only to cars meeting the standards would be a fatal decision for the business.
(d) From the point of view of the environmentalists, a type II error would be more serious. Maintaining the license of a workshop that had issued okay certificates to cars not meeting the standards would have a disastrous effect on the environment.
- 5. Choosing the sample size is a fundamental prerequisite when planning a statistical study, to avoid bias in the interpretation of the experimental results. Too few observations in a sample will produce an inconclusive decision about the population, whereas more observations than required will waste resources.
Standard deviation represents the variability of the sample data set. For a population with low variability (a homogeneous population), fewer observations in a sample will be sufficient. A population with higher dispersion requires a larger sample to detect a given effect size and to distinguish between the inferences. Hence standard deviation plays an important role in the choice of sample size (Sullivan & Feinn, 2012).
When the standard deviation of the population is unknown, it is estimated from the sample and Student's t-distribution is used in place of the normal distribution. The population is assumed to be normal, and the test statistic computed with the estimated standard deviation follows Student's t-distribution, which closely resembles the normal distribution for large samples. The sample standard deviation is calculated and the corresponding unbiased estimate for the population is found. This estimate is used as an approximate measure of the population S.D. as $\hat{\sigma} = \sqrt{\frac{n}{n-1}}\, S$, where $n$ is the sample size and $S$ is the sample standard deviation (computed with divisor $n$).
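The link between standard deviation and sample size can be illustrated with the common textbook formula $n = (z \sigma / E)^2$ for estimating a mean to within a margin of error $E$. This formula, the function names, and the numeric inputs below are assumptions introduced for illustration; they do not appear in the original answer. The second helper assumes the correction for the population S.D. takes the form $\sqrt{n/(n-1)}\, S$ with $S$ computed using divisor $n$.

```python
from math import ceil, sqrt
from statistics import NormalDist, pstdev, stdev

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

def corrected_sd(values):
    """Estimate of the population S.D. as sqrt(n / (n - 1)) * S, S with divisor n."""
    n = len(values)
    return sqrt(n / (n - 1)) * pstdev(values)

# Hypothetical planning values: doubling sigma roughly quadruples the sample size.
n_low_spread = required_sample_size(sigma=10, margin=2)
n_high_spread = required_sample_size(sigma=20, margin=2)
print(n_low_spread, n_high_spread)

# The corrected estimate coincides with the usual n - 1 sample standard deviation.
data = [9, 11, 12, 13, 13, 15, 16, 21, 28, 42]
print(round(corrected_sd(data), 4), round(stdev(data), 4))
```

Because the required sample size grows with the square of the standard deviation, a more dispersed population needs a disproportionately larger sample for the same precision.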
- 6. Testing the null hypothesis against the alternate hypothesis, based on a sample collected from the population, provides knowledge about the characteristics of the population. Hypothesis testing does not guarantee that the truth of the hypothesis is determined. It merely assesses whether the existing evidence is sufficient to reject the null hypothesis. The alternate hypothesis is accepted based on the confidence level of the experiment, but the null hypothesis is always either rejected or not rejected based on the value of the test statistic. The hypothesis test is set up so that the probability of a type-I error (rejection of a true null hypothesis) is held at the chosen significance level. A test statistic that falls in the critical region (region of rejection) at the 5% level, for example, means that a result at least this extreme would occur no more than 5% of the time if the null hypothesis were true, so the null hypothesis is rejected based on the current sample. On the contrary, if the value of the test statistic falls in the acceptance region, the evidence is insufficient to reject the null hypothesis. The failure to reject the null hypothesis does not mean that the null hypothesis is correct; it only signifies the absence of enough evidence to reject it.
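The reject or fail-to-reject logic can be sketched with a two-sided one-sample z-test. The function name `two_sided_z_test` and all numeric inputs are hypothetical, and the population standard deviation is assumed known so that the standard library's `NormalDist` suffices; with an unknown standard deviation a t-test would be used instead.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, decision) for H0: mu = mu0 against H1: mu != mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# Hypothetical samples of size 25 from a population with sigma = 10, H0: mu = 50.
z_a, p_a, decision_a = two_sided_z_test(sample_mean=54.5, mu0=50, sigma=10, n=25)
z_b, p_b, decision_b = two_sided_z_test(sample_mean=52.0, mu0=50, sigma=10, n=25)

print(f"z={z_a:.2f}, p={p_a:.3f}: {decision_a}")
print(f"z={z_b:.2f}, p={p_b:.3f}: {decision_b}")
```

The second outcome does not show that the null hypothesis is true; it only shows that this sample does not provide enough evidence against it.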
References
Anderson, D.R., Burnham, K.P. and Thompson, W.L., 2000. Null hypothesis testing: problems, prevalence, and an alternative. The journal of wildlife management, pp.912-923.
Sullivan, G.M. and Feinn, R., 2012. Using effect size—or why the P value is not enough. Journal of graduate medical education, 4(3), pp.279-282.
Wan, X., Wang, W., Liu, J. and Tong, T., 2014. Estimating the sample mean and standard deviation from the sample size, median, range and/or Interquartile range. BMC medical research methodology, 14(1), p.135.