Relationship between Reality and Test Result
- The descriptive statistics results (data 1)
| | negative | positive |
|---|---|---|
| Actually pregnant count | 6 | 36 |
| Not actually pregnant count | 46 | 12 |
| Column total | 52 | 48 |
- Describing the relationship between variables.
The relationship is assessed by comparing sample proportions. The resulting p-value from the dataset summariser is 1.3288E-10, which is below the 5% level of significance, leading to the conclusion that there is a strong relationship between reality and the test result.
- Descriptive statistics results (data 2)
| | negative | positive |
|---|---|---|
| Actually pregnant count | 6 | 39 |
| Not actually pregnant count | 51 | 4 |
| Column total | 57 | 43 |
- Describing the relationship between variables.
The relationship is assessed by comparing sample proportions. The resulting p-value from the dataset summariser is 1.4852E-15, which is below the 5% level of significance, leading to the conclusion that there is a strong relationship between reality and the test result.
- Better version
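Of the two versions, version 2 (dataset 2) is the better one. Its p-value (1.4852E-15) is smaller than that of dataset 1 (1.3288E-10), indicating stronger evidence of a relationship between the two variables. It also misclassified fewer cases: 10 of 100 (6 false negatives and 4 false positives) against 18 of 100 (6 false negatives and 12 false positives) for version 1.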
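The two p-values can be checked by hand with a pooled two-proportion z-test. The sketch below assumes that the dataset summariser compares the proportion of negative results among actually pregnant women with the proportion among women who are not pregnant; the function name `two_proportion_z` is made up for illustration.

```python
from math import sqrt, erfc

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p_value

# Negative results among actually pregnant vs. not actually pregnant women
z1, pval1 = two_proportion_z(6, 42, 46, 58)  # version 1 (data 1)
z2, pval2 = two_proportion_z(6, 45, 51, 55)  # version 2 (data 2)

print(f"version 1: z = {z1:.4f}, p = {pval1:.4e}")
print(f"version 2: z = {z2:.4f}, p = {pval2:.4e}")
```

Under this assumption the version 1 statistic agrees with the test statistic of -6.423833 reported in the relationship output.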
- Descriptive statistics (variables 1 and 2 data 3)
Answer
| | Country A | Country B |
|---|---|---|
| Mean | 1478.82 | 3983.08 |
| Standard Deviation | 313.2581702 | 581.1986502 |
| Minimum | 1032 | 3043 |
| Maximum | 1957 | 4998 |
- Description of the relationship
Since the number of tests is compared across two countries, the appropriate test is a difference between means, i.e. an independent-samples t-test. The resulting p-value is 3.53971E-40, which is below the 5% level of significance, so we conclude that, on average, country A had significantly fewer tests than country B.
- Graph of predicted shapes of histograms
Answer
Fig 1. Histogram for country A
Fig 2. Histogram for country B
In both histograms the number-of-tests values are skewed to the right, so normality was not achieved.
- Suppose you know the quantitative variable is normally distributed for both groups, make a comment about part c)
If the variable is normally distributed in both groups, the mean, median and mode of each group are expected to coincide. The normality assumption behind the independent-samples t-test in part c) is then met, so its conclusion can be relied on.
- The descriptive statistics into the word file
Answer
Table 4. Descriptive statistics
| | Number of tests | Number of people needing tech support |
|---|---|---|
| Average | 2730.95 | 174.56 |
| stdev | 1341.42679 | 41.20099 |
- Looking at the graph does there appear to be one linear relationship or two linear relationships?
Fig 3. Scatter plot for number of tests and the number of people in need of technical support
There is a positive linear relationship between the number of tests and the number of people needing technical support. The number of tests explains 71.38% of the variation in the number of people needing technical support, which indicates a strong relationship.
- Repeat part a) for country A
Answer
Table 5. Descriptive statistics for country A number of tests
| | Number of tests | Number of people needing tech support |
|---|---|---|
| Average | 1478.82 | 148.26 |
| stdev | 313.25817 | 32.48724 |
- Using the output from part c), describe the relationship between the variables using one of the following numbers; select the correct option.
Answer
The appropriate measure is the correlation coefficient r, illustrated by the scatter plot below.
Fig 4. Scatter plot for number of tests and the number of people in need of technical support (country A)
There is a strong positive linear association between the number of tests and the number of people requiring technical support in country A. With the number of tests explaining 92.04% of the variation, this relationship is stronger than when both countries are analysed together.
- Using the information in part c) Write an equation that lets you predict the number of people needing tech support Y given the number of tests.
Answer
- Use the information in part (d) to predict number of people needing tech support if the number of tests is 1000
Answer
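The answers to these two parts did not survive in the text. As a rough sketch only, a line for country A can be rebuilt from values reported elsewhere in this document, under two assumptions: the slope estimate lies at the centre of its reported 95% interval (0.090998 to 0.107986), and the least-squares line passes through the point of means (1478.82, 148.26). The helper `predict_support` is hypothetical.

```python
# Reported summary values for country A, taken from the tables in this document
mean_tests, mean_support = 1478.82, 148.26
slope_ci = (0.090998, 0.107986)

# Assumption: the slope estimate is the midpoint of its 95% confidence interval
slope = sum(slope_ci) / 2
# A least-squares line always passes through the point of means
intercept = mean_support - slope * mean_tests

def predict_support(tests):
    """Predicted number of people needing tech support (hypothetical helper)."""
    return intercept + slope * tests

print(f"Y = {intercept:.4f} + {slope:.6f} * X")
print(f"prediction at X = 1000: {predict_support(1000):.2f}")
```

Under these assumptions the prediction for 1000 tests comes out at roughly 100.6 people; the original regression output should take precedence if it is available.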
- Just considering the information from country A
- What is the estimate of the population mean number of tests?
1478.82
- What is the standard error of this estimate?
44.3014
- Just considering the people country B
- What is the estimate of the population mean number of tests?
3983.08
- What is the standard error of this estimate?
82.1939
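The standard error of a sample mean is the sample standard deviation divided by the square root of the sample size. The sample sizes are not stated in the text, so the sketch below backs them out from the reported values; the result (50 per country) is therefore an inference, not a quoted figure.

```python
from math import sqrt

# Reported sample standard deviations and standard errors of the mean
stats = {
    "Country A": {"sd": 313.2581702, "se": 44.3014},
    "Country B": {"sd": 581.1986502, "se": 82.1939},
}

for country, s in stats.items():
    # SE = sd / sqrt(n)  =>  n = (sd / SE)^2
    implied_n = (s["sd"] / s["se"]) ** 2
    s["n"] = round(implied_n)
    s["se_check"] = s["sd"] / sqrt(s["n"])
    print(f"{country}: implied n = {implied_n:.2f}, "
          f"SE recomputed with n = {s['n']}: {s['se_check']:.4f}")
```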
- For version 1 of the test find a 95% confidence interval for the proportion of pregnant women that test positive
Answer
We are 95% confident that the true population proportion of pregnant women that test positive lies somewhere between 0.266 and 0.454.
- For version 2 of the test find a 95% confidence interval for the proportion of pregnant women that test positive
Answer
We are 95% confident that the true population proportion of pregnant women that test positive lies somewhere between 0.294 and 0.486.
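The reported bounds for both versions can be reproduced with a normal-approximation (Wald) interval when the positive-testing pregnant women are counted out of all 100 sampled women (36/100 and 39/100). The sketch below shows this and, for comparison, also computes the interval with only the actually pregnant women in the denominator (36/42 and 39/45). The second pair is an alternative reading of the question, not a result reported in the source, and `wald_interval` is an illustrative name.

```python
from math import sqrt

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) 95% interval for a proportion."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Denominator = all 100 sampled women (matches the reported bounds)
v1_all = wald_interval(36, 100)
v2_all = wald_interval(39, 100)

# Denominator = actually pregnant women only (alternative reading)
v1_pregnant = wald_interval(36, 42)
v2_pregnant = wald_interval(39, 45)

for label, (low, high) in [("version 1, n = 100", v1_all),
                           ("version 2, n = 100", v2_all),
                           ("version 1, n = 42", v1_pregnant),
                           ("version 2, n = 45", v2_pregnant)]:
    print(f"{label}: ({low:.3f}, {high:.3f})")
```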
- Relationship output
Answer
hypothesis testing
| test stat | two sided p-value |
|---|---|
| -6.423833 | 1.329E-10 |
To calculate the p-value, H0: p1 = p2 is assumed to be true.
Since the test is two sided, H1 is H1: p1 ≠ p2.
- Comment on the confidence interval
Confidence interval: we are 95% confident that p1-p2 is between -0.8486454 and -0.4518472.
0 is NOT in the confidence interval, so there is strong evidence that there is a difference in proportions.
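As a check on these bounds, the sketch below recomputes the interval for p1 - p2 from the data 1 counts, taking p1 and p2 as the proportions of negative results among actually pregnant and not actually pregnant women (an assumption about how the tool defines them). The reported bounds are matched when the pooled standard error is used with z = 1.96. The unpooled standard error, which is the more common choice for a confidence interval, gives a narrower interval that also excludes 0.

```python
from math import sqrt

x1, n1 = 6, 42    # negative results among actually pregnant women
x2, n2 = 46, 58   # negative results among women not actually pregnant
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = 1.96  # critical value for a 95% interval
ci_pooled = (diff - z * se_pooled, diff + z * se_pooled)
ci_unpooled = (diff - z * se_unpooled, diff + z * se_unpooled)

print(f"pooled SE:   ({ci_pooled[0]:.7f}, {ci_pooled[1]:.7f})")
print(f"unpooled SE: ({ci_unpooled[0]:.7f}, {ci_unpooled[1]:.7f})")
```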
- Comment on the p-value
Answer
The p-value is below the 5% level of significance, indicating a statistically significant relationship between actual pregnancy status and the test result.
- Inferential statistics results
Difference between Two Countries
Answer
| xbar1-xbar2 | standard error of estimate | t test stat | df | two sided p-value |
|---|---|---|---|---|
| -2504.26 | 93.37264613 | -26.82006031 | 75 | 3.53971E-40 |
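The standard error, t statistic and degrees of freedom in this output are consistent with an unequal-variance (Welch) t-test computed from the summary statistics. The sketch below assumes n = 50 per country, a value inferred from the reported standard errors rather than stated in the text.

```python
from math import sqrt

# Reported summaries; n = 50 per country is an assumption inferred from the
# reported standard errors (sd / sqrt(n)), not a figure given in the text
mean_a, sd_a, n_a = 1478.82, 313.2581702, 50
mean_b, sd_b, n_b = 3983.08, 581.1986502, 50

var_a, var_b = sd_a**2 / n_a, sd_b**2 / n_b  # squared standard errors of each mean
diff = mean_a - mean_b
se_diff = sqrt(var_a + var_b)
t_stat = diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))

print(f"difference = {diff:.2f}, SE = {se_diff:.4f}, t = {t_stat:.4f}, df = {df:.2f}")
```

The degrees of freedom come out slightly above 75, which matches the reported df of 75 if the tool rounds down.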
- Comment on the confidence interval
Answer
We are 95% confident that the true mean difference in the number of tests between the two countries (country B minus country A) lies between 2318.965 and 2689.555.
- Comment on the p-value
Answer
The p-value is less than 5% level of significance hence there is a significant difference between the number of tests for countries A and B.
- Inferential statistics claim using whole population
| correlation r | r squared | p-value |
|---|---|---|
| 0.844842063 | 0.713758111 | 2.27E-28 |
- Inferential statistics for country A
| correlation r | r squared | p-value |
|---|---|---|
| 0.959 | 0.92 | 5.06E-28 |
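The link between r, r squared and the significance test can be sketched as follows. The t statistic for H0: rho = 0 is r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom. The sample sizes used here (100 for both countries, 50 for country A) are assumptions inferred from the reported standard errors, and `correlation_t` is an illustrative name.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r**2))

# (r, n) pairs; the sample sizes are assumptions, not values stated in the text
cases = {"both countries": (0.844842063, 100), "country A": (0.959, 50)}

results = {}
for label, (r, n) in cases.items():
    results[label] = {"r2": r**2, "t": correlation_t(r, n), "df": n - 2}
    print(f"{label}: r^2 = {r**2:.4f}, t = {results[label]['t']:.2f}, df = {n - 2}")
```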
- Which case has a lower standard error the output from part a) or the output from part b)
Country A resulted in a lower standard error, since the independent variable explains a larger percentage of the variation in the model.
- In both part a) and part b) the computer is trying to find a single linear relationship between the variables, based on your previous work in which case is the output trustworthy?
Answer
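The output is trustworthy for the country A results, where the independent variable explains about 92% of the variation, compared with 71.38% when both countries are used. Both p-values are far below the 5% level of significance (5.06E-28 for country A and 2.27E-28 for both countries), so the p-value does not separate the two cases.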
- Comment on the confidence interval in part b)
| Lower 95% | Upper 95% |
|---|---|
| 0.090998 | 0.107986 |
We are 95% confident that the true coefficient (slope) for the number of tests lies between 0.090998 and 0.107986.
- Comment on the p-value in part b)
The p-value is <.001, which is below the 5% level of significance, hence the number of tests was a significant predictor of the number of people who needed technical assistance.
The video used a typical Likert-format dataset, such as an employee survey with responses ranging from strongly disagree to strongly agree, to explain ways of presenting survey results visually. The example was satisfaction data with the response options strongly disagree, disagree, neutral, agree and strongly agree, applied to five statements: I feel valued in my team; the work is distributed evenly in the team; I can communicate openly; my manager values my feedback; and I enjoy my work.
The first option for visualizing this data was a stacked bar chart. The data was selected and the rows and columns were switched to put the statements on the axis. The second method was a diverging stacked bar chart, which centres the neutral responses. This is important for showing which category drew the most positive and the most negative responses.
To develop a diverging stacked bar chart, the negative values and positive values are separated by the neutral values. This is done by creating a proportion table: the original table is copied and pasted as links. A negative sign is then added to the strongly disagree and disagree values, and two neutral columns are created, one negative and one positive, each holding the neutral value divided by 2. The new table is highlighted and, using the Insert function, a 2D stacked bar chart is created, after which the rows and columns are switched. The axis is then selected and Ctrl+1 pressed to open the axis properties, where the low label position is chosen. In the resulting graph the negative neutral and strongly agree series appear in interchanged positions, so their columns are interchanged in the proportion table to give the desired order. Colours are changed for each box by clicking the box. Finally, the numbers are formatted by going to the label options, then Number, typing a format code and clicking Add.
Getting the legend into the correct order (strongly disagree before disagree) involves adding a dummy disagree column with empty values to the right of strongly disagree; the series is then moved up to the required position. Its colour is changed to match the original disagree series, after which the original disagree entry is removed to leave the desired stacked bar chart.
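The proportion-table step described above (negating the two disagree categories and splitting the neutral share in half) can be sketched outside the spreadsheet as well. The percentages below are hypothetical, and `diverging_row` is an illustrative helper, not something taken from the video.

```python
# Hypothetical response percentages for one survey statement (illustrative only)
responses = {
    "Strongly disagree": 10,
    "Disagree": 20,
    "Neutral": 30,
    "Agree": 25,
    "Strongly agree": 15,
}

def diverging_row(row):
    """Build one helper-table row for a diverging stacked bar chart."""
    half_neutral = row["Neutral"] / 2
    return {
        # Negative side: plotted to the left of the zero line
        "Strongly disagree": -row["Strongly disagree"],
        "Disagree": -row["Disagree"],
        "Neutral (negative)": -half_neutral,
        # Positive side: plotted to the right of the zero line
        "Neutral (positive)": half_neutral,
        "Agree": row["Agree"],
        "Strongly agree": row["Strongly agree"],
    }

helper = diverging_row(responses)
left = sum(v for v in helper.values() if v < 0)    # total extent left of zero
right = sum(v for v in helper.values() if v > 0)   # total extent right of zero
print(helper)
print(f"left extent = {left}, right extent = {right}")
```

Splitting the neutral share evenly is what keeps the neutral block centred on zero, so the bar lengths on either side show the balance of negative and positive responses.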