BUS 105, trimester 1 2018 Instructions for the computing assignment worth 20% of your final grade due week 10
*A sample assignment is provided on pages 7 to 15 of this document. It is forbidden to ask your friends for their assignments so that you can use them as a guide.
*Last semester many students got 0 out of 20 because they used a friend's sample or copied the written sections (this semester the written sections are sections 1, 7 and 8d). Submit drafts of sections 2a, 3a, 4a and 6a and check the lecturer's feedback before looking at the rest of these instructions.
*Last semester many students got 0 out of 20 because they had a friend do the assignment using advanced methods the friend had learnt in other courses, and they could not reproduce the results themselves. Be aware that if you use any advanced methods you will HAVE to show your tutor in the tutorial that you can reproduce the results.
*Read the instructions carefully and make sure your answers demonstrate that you have learnt something. If this requires you to go under or over the word limit, that is not a problem. Definitely do not add words just to make the word count large.
*There are 8 sections. You must put the answers to all sections into a single document and submit it to Turnitin. Note that your document must be a Microsoft Word document or a PDF. If you use an Apple computer, DO NOT SUBMIT A “.PAGES” document; save it as a PDF before submitting.
*For sections 2,3,4 and 6 you must use your allocated sample number available from
https://app.box.com/s/uhxuhxw8z8pv937ituzs5t2bos9bcr5y
*Note that there are 5 preparation quizzes on Moodle that are also due in week 10. These will help prepare you for the assignment, so you should finish them before finishing the assignment.
The exact instructions for each of the 8 sections are given below.
Section 0(Cover page)
“title : semester 1, 2018 bus105 computing assignment” “Name:” “Student number:” “Allocated sample:”
Section 1
Read the sample statistical report from last semester https://app.box.com/s/q349upd1tt9mzxqay9xc8e8mq7mlumo5
And skim through the guide to summarizing datasets
https://app.box.com/s/rdifq3re479ymn4zgxwtdvef698g162c
Describe how the author of the sample statistical report used the information in the guide to summarizing datasets to write the sample statistical report
Notes
*You do not have to give your discussion in essay form.
*Make sure you show that you understand what a variable is and what a dataset is; you are encouraged to discuss the examples provided. 400 words of explanation should be more than enough. Note that anything that shows you have made a real effort to understand variables and datasets gets the marks. For example, most students lost marks on the week 6 test because they made mistakes when discussing variables; you could explain these mistakes.
Section 2 Use the dataset given below https://app.box.com/s/gx52n61zap2o79mrz4k2y37xxhf4uqm7
A) Use the PivotTable feature in Excel to find appropriate summary statistics that let you investigate the relationship between the variables “age, old or young” and “do they like the product, like or hate”. This will probably require two PivotTables. You should paste both into Word; you do not need to submit the Excel file.
Make sure the PivotTable (or PivotTables) include the following statistics
*Just considering the old people, what is the sample size n1 and the proportion of people that would say “yes, they like the product”?
*Just considering the young people, what is the sample size n2 and the proportion of people that would say “yes, they like the product”?
B) Make a simple comment about the relationship between the variables
C) Using your sample, what is the estimate for p1 - p2? In other words, what is the difference between the sample proportions?
Section 3 Use the dataset given below. You must use your own sample.
https://app.box.com/s/gx52n61zap2o79mrz4k2y37xxhf4uqm7
A) Use the PivotTable feature in Excel to find appropriate summary statistics for your sample. The following sample statistics must be found
*Just considering the old people, what is the sample size n1, the sample average profit per bet from machine A, and the sample standard deviation s1?
*Just considering the young people, what is the sample size n2, the sample average profit per bet from machine B, and the sample standard deviation s2?
Paste the PivotTable into the Word document; you do not need to submit the Excel file.
B) Make a simple comment about the relationship between the variables using the answers to (A)
C) Using your sample, what is the estimate for µ1 - µ2? In other words, what is the difference between the sample means?
Section 4
Use the dataset given below https://app.box.com/s/jg6n5yxec1alcrz6mfuo7alz697h2w0t
Note that for section 4 the answers are provided so you can check your work; the answers will not be provided for the other sections.
A) Paste the scatterplot for your sample into your Word document
B) Give a simple comment about the relationship between the variables
C) Estimate the profit of a casino when there are 1000 bets.
Section 5
A) Using the answer in section 2, test the claim that there is a difference in the proportions. Use a 5% level of significance.
i) State an appropriate H0 and H1
ii) Find the p-value using only the answers to part (A) and the webpage http://epitools.ausvet.com.au/content.php?page=z-test-2
or https://www.medcalc.org/calc/comparison_of_proportions.php
Do NOT use any other method to find the p-value. Do NOT use any other software package such as SPSS or the Analysis ToolPak.
iii) State whether or not you reject H0
iv) Give a conclusion in plain English
B) Using the answer in section 3, test the claim that there is a difference between the means. Use a 5% level of significance.
i) State an appropriate H0 and H1
ii) Find the p-value using the answers to part (A) and the webpage https://www.medcalc.org/calc/comparison_of_means.php
Do NOT find the p-value using any other method. Do NOT use any other software package such as SPSS or the Analysis ToolPak.
iii) State whether or not you reject H0
iv) Give a conclusion in plain English
Section 6 Use the dataset given below. You must use your own sample. https://app.box.com/s/kzc6ivy10gvy4vz6d0pgy0lzh929ivx9 Suppose a business has conducted an opinion poll to find out if its customers support a change to the business.
a) Use the PivotTable feature in Excel to find appropriate summary statistics for your sample. You should paste the PivotTable into Word; you do not need to submit the Excel file.
This PivotTable must have the number of people that answer yes and the number of people that answer no.
b) What is the sample size and the sample proportion of people that support the change? Note that this is the estimate for the population proportion p.
c) Find a 90% confidence interval for the proportion of people that support the change.
Section 7
DO YOUR OWN WORK. DO NOT COPY ANY PART FROM ANYONE.
a) Use Google to find an example of a back to back histogram (or something like a back to back histogram: any webpage that uses different histograms with the same scale to compare different groups) and discuss it. You have to find your own example that you understand; you cannot use the examples given on Moodle.
Paste the histograms into your assignment.
b) Give a description of both variables. For each variable explain if it is categorical or quantitative. Hint: A variable can be expressed in terms of a question. If the question is “which category” it is a categorical variable; if the question is “how many or how much” it is a quantitative variable. A sample has n things and you ask the question for each of the n things in the sample. WARNING: last semester many students said both variables were quantitative. This is incorrect: the frequency is NOT a quantitative variable. One variable is quantitative and one variable is categorical.
c) Describe the relationship between the variables.
d) Consider the histogram you found yourself and discussed in parts (a), (b) and (c). Would the discussion be useful in business? Give a reason for your answer.
e) Consider the following discussion taken from the sample report you had to read in section 1. Would the discussion be useful in business? Give a reason for your answer.
A sample of 100 people were given a snack food to taste and given a survey.
The histogram below lets you see the relationship between the variables “Gender” and “amount they would pay”.
The variable “Gender” is a categorical variable because it is a question “Which category are you in, male or female?”
The variable “amount they would pay” is a quantitative variable because it is a question “How much would you pay?”
The amount people would pay for the snack food is between $0 and $3.
Males and females have a similar distribution. Slightly more females would pay a low amount (between $0 and $0.50), and slightly fewer females would pay in the range between $2.50 and $3.00.
Section 8
This section is abstract so you are encouraged to try and roughly understand the following before attempting the task
https://app.box.com/s/3e8pxh994ixhwj50je849xz1gzxcsen3
a) Using section 2
i) Find the zscore of the estimate in section 2c. Note that the average of the estimates is 0.14 with standard deviation 0.088.
ii) Using part (i), find P(Z<zscore) using www.wolframalpha.com. For example, if the zscore is 0.5, type “P(Z<0.5)” into wolframalpha.com.
iii) If there was a list of 1000 estimates ranked from lowest to highest, roughly what rank would you expect your estimate to have? Hint: just use the formula expected rank = P(Z<zscore)*1000
iv) Complete the following table using https://app.box.com/s/2to195ysj0deo5wawwjp53e9jlt4peqp
 | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
Lowest estimate | | 1 | | |
Estimate from allocated sample | | | | |
Highest estimate | | 1000 | | |
b) Using section 3
i) Find the zscore of the estimate in section 3c. Note that the average of the estimates is 0.408 with standard deviation 0.26.
ii) Using part (i), what is P(Z<zscore)? You can find the answer using www.wolframalpha.com. For example, if the zscore is -1, type “P(Z<-1)” into wolframalpha.com.
iii) If there was a list of 1000 estimates ranked from lowest to highest, what rank do you think yours would be close to? Hint: just use the formula expected rank = P(Z<zscore)*1000
iv) Complete the following table using https://app.box.com/s/kiqemn0h0m3d03uygo1dhemvx4e5uf6r
 | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
Lowest estimate | | 1 | | |
Estimate from allocated sample | | | | |
Highest estimate | | 1000 | | |
c) Using section 4
i) Find the zscore of the slope estimate in section 4a. Note that the average of the estimates is 0.952 with standard deviation 0.237.
ii) Using part (i), what is P(Z<zscore)? You can find the answer using www.wolframalpha.com. For example, if the zscore is -1, type “P(Z<-1)” into wolframalpha.com.
iii) If there was a list of 1000 estimates ranked from lowest to highest, what rank do you think yours would be close to? Hint: just use the formula expected rank = P(Z<zscore)*1000
iv) Complete the following table
 | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
Lowest estimate | | 1 | | |
Estimate from allocated sample | | | | |
Highest estimate | | 1000 | | |
d) For parts a, b and c, compare the predicted rank for your sample in part (iii) to the actual rank in part (iv).
e) Comment on the connection between the following facts
*“Part (d) shows that totally different datasets have the same sampling distribution (the normal distribution)”
*“Hypothesis testing uses a sampling distribution; the p-value is a shaded area on the sampling distribution”
Assignment template (INSTRUCTIONS WITH SAMPLE ANSWERS)
Section 0(Cover page)
“title : semester 1, 2018 bus105 computing assignment” “Name: Mat Maccallum” “Student number: 1070004” “Allocated sample: 1 ”
Section 1
Read the sample statistical report from last semester https://app.box.com/s/q349upd1tt9mzxqay9xc8e8mq7mlumo5
And skim through the guide to summarizing datasets
https://app.box.com/s/rdifq3re479ymn4zgxwtdvef698g162c
Describe how the author of the sample statistical report used the information in the guide to summarizing datasets to write the sample statistical report
Notes
*You do not have to give your discussion in essay form.
*Make sure you show that you understand what a variable is and what a dataset is; you are encouraged to discuss the examples provided. 400 words of explanation should be more than enough. Note that anything that shows you have made a real effort to understand variables and datasets gets the marks. For example, most students lost marks on the week 6 test because they made mistakes when discussing variables; you could explain these mistakes.
Section 2
A) PivotTables that let you investigate the relationship between the variables “old or young” and “do they like the product? hate or like”
sample collector id | 1 | ||
Count of do they like product ? | Column Labels | ||
Row Labels | hate | like | Grand Total |
old | 15 | 47 | 62 |
young | 15 | 23 | 38 |
Grand Total | 30 | 70 | 100 |
Count of do they like product ? | Column Labels | ||
Row Labels | like | hate | Grand Total |
old | 75.81% | 24.19% | 100.00% |
young | 60.53% | 39.47% | 100.00% |
Grand Total | 70.00% | 30.00% | 100.00% |
B) Make a simple comment
C) Using your sample, what is the estimate for p1 - p2? In other words, what is the difference between the sample proportions?
Answer
0.7581-0.6053=0.1528
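The subtraction above can be cross-checked from the raw counts in the first PivotTable. The following is a minimal Python sketch, offered only as a check and not as part of the required Excel workflow; the function name `proportion_difference` is made up for illustration.

```python
def proportion_difference(likes_1, n_1, likes_2, n_2):
    """Return both sample proportions and their difference (group 1 minus group 2)."""
    p1_hat = likes_1 / n_1
    p2_hat = likes_2 / n_2
    return p1_hat, p2_hat, p1_hat - p2_hat


# Counts from the sample PivotTable (sample 1):
# 47 of the 62 old people and 23 of the 38 young people like the product.
p_old, p_young, diff = proportion_difference(47, 62, 23, 38)
print(round(p_old, 4), round(p_young, 4), round(diff, 4))  # 0.7581 0.6053 0.1528
```

Working from counts rather than the rounded percentages avoids small rounding differences in the final estimate.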
Section 3
A) A PivotTable that lets you investigate the relationship between the variables “old or young” and “how much they would pay for the product”
sample collector id | 1 | ||
Row Labels | Average of how much would pay ? | StdDev of how much would pay ? | Count of are they old? |
old | 2.44 | 1.258995134 | 62 |
young | 1.99 | 1.448423384 | 38 |
Grand Total | 2.268 | 1.345240769 | 100 |
B) Make a simple comment about the relationship between the variables
C) Using your sample, what is the estimate for µ1 - µ2? In other words, what is the difference between the sample means?
Answer
2.44-1.99=0.45
Section 4
A)
B) Make a simple comment about the relationship between the variables
C) The estimated profit for the casino when there are 1000 bets is 0.8388*1000+107.96=946.76
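The estimate in part (C) plugs the number of bets into the trendline equation from the sample scatterplot, y = 0.8388x + 107.96. A minimal Python sketch of that substitution is given below; the names `SLOPE`, `INTERCEPT` and `predict_profit` are illustrative only, and your own sample will have a different slope and intercept.

```python
SLOPE = 0.8388      # trendline slope from the sample scatterplot (sample 1)
INTERCEPT = 107.96  # trendline intercept from the sample scatterplot (sample 1)


def predict_profit(number_of_bets):
    """Estimate profit from the fitted line: profit = slope * bets + intercept."""
    return SLOPE * number_of_bets + INTERCEPT


estimate = predict_profit(1000)
print(round(estimate, 2))  # 946.76
```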
Section 5A (refer to the 7th and 8th lessons or the sample exam to find out what to do)
A) Using the answer in section 2
sample collector id | 1 | ||
Count of do they like product ? | Column Labels | ||
Row Labels | hate | like | Grand Total |
old | 15 | 47 | 62 |
young | 15 | 23 | 38 |
Grand Total | 30 | 70 | 100 |
Count of do they like product ? | Column Labels | ||
Row Labels | like | hate | Grand Total |
old | 75.81% | 24.19% | 100.00% |
young | 60.53% | 39.47% | 100.00% |
Grand Total | 70.00% | 30.00% | 100.00% |
Test the claim that there is a difference in the proportions. Use a 5% level of significance.
i) State an appropriate H0 and H1
ii) Find the p-value using only the answers to part (A) and one of the following webpages http://epitools.ausvet.com.au/content.php?page=z-test-2
or
https://www.medcalc.org/calc/comparison_of_proportions.php
Do NOT use any other method to find the p-value. Do NOT use any other software package such as SPSS or the Analysis ToolPak.
iii) State whether or not you reject H0
iv) Give a conclusion in plain English
Section 5B (refer to the 7th and 8th lessons or the sample exam to find out what to do)
using the information in section 3
sample collector id | 1 | ||
Row Labels | Average of how much would pay ? | StdDev of how much would pay ? | Count of are they old? |
old | 2.44 | 1.258995134 | 62 |
young | 1.99 | 1.448423384 | 38 |
Grand Total | 2.268 | 1.345240769 | 100 |
Test the claim that there is a difference between the means. Use a 5% level of significance.
i) State an appropriate H0 and H1
ii) Find the p-value using the answers to part (A) and the webpage https://www.medcalc.org/calc/comparison_of_means.php
Do NOT find the p-value using any other method. Do NOT use any other software package such as SPSS or the Analysis ToolPak.
iii) State whether or not you reject H0
iv) Give a conclusion in plain English
Section 6 (refer to the 9th lesson; you must use your own sample)
which sample | 1 |
Row Labels | Count of Do you support the change |
n | 378 |
y | 622 |
Grand Total | 1000 |
b) The sample size n is 1000 and the sample proportion is 622/1000=0.622
c) Find a 90% confidence interval for the proportion of people that support the change
standard error = sqrt(0.622*(1-0.622)/1000) = 0.0153
Using the z distribution, 90% of sample proportions are within 1.645 standard errors of the population proportion, so the 90% confidence interval for the population proportion is between 0.622-1.645*0.0153=0.597 and 0.622+1.645*0.0153=0.647
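The confidence interval calculation above can be reproduced step by step. The sketch below is one possible way to check the arithmetic in Python, assuming the usual standard error formula for a sample proportion, sqrt(p*(1-p)/n); the function name `proportion_confidence_interval` is made up for illustration.

```python
from math import sqrt


def proportion_confidence_interval(successes, n, z):
    """Return (sample proportion, standard error, lower limit, upper limit)."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error  # z = 1.645 for a 90% interval
    return p_hat, standard_error, p_hat - margin, p_hat + margin


# Sample 1: 622 of the 1000 people answered yes.
p_hat, se, lower, upper = proportion_confidence_interval(622, 1000, 1.645)
print(round(se, 4), round(lower, 3), round(upper, 3))  # 0.0153 0.597 0.647
```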
Section 7
DO YOUR OWN WORK. DO NOT COPY ANY PART FROM ANYONE.
a) Use Google to find an example of a back to back histogram (or something like a back to back histogram: any webpage that uses different histograms with the same scale to compare different groups) and discuss it. You have to find your own example that you understand; you cannot use the examples given on Moodle.
Paste the histograms into your assignment.
b) Give a description of both variables. For each variable explain if it is categorical or quantitative. Hint: A variable can be expressed in terms of a question. If the question is “which category” it is a categorical variable; if the question is “how many or how much” it is a quantitative variable. A sample has n things and you ask the question for each of the n things in the sample. WARNING: last semester many students said both variables were quantitative. This is incorrect: the frequency is NOT a quantitative variable. One variable is quantitative and one variable is categorical.
c) Describe the relationship between the variables.
d) Consider the histogram you found yourself and discussed in parts (a), (b) and (c). Would the discussion be useful in business? Give a reason for your answer.
e) Consider the following discussion taken from the sample report you had to read in section 1. Would the discussion be useful in business? Give a reason for your answer.
A sample of 100 people were given a snack food to taste and given a survey.
The histogram below lets you see the relationship between the variables “Gender” and “amount they would pay”.
The variable “Gender” is a categorical variable because it is a question “Which category are you in, male or female?”
The variable “amount they would pay” is a quantitative variable because it is a question “How much would you pay?”
The amount people would pay for the snack food is between $0 and $3.
Males and females have a similar distribution. Slightly more females would pay a low amount (between $0 and $0.50), and slightly fewer females would pay in the range between $2.50 and $3.00.
Section 8
This section is abstract so you are encouraged to try and roughly understand the following before attempting the task
https://app.box.com/s/3e8pxh994ixhwj50je849xz1gzxcsen3
a) Using section 2
i) Find the zscore of the estimate in section 2. Note that the average of the estimates is 0.14 with standard deviation 0.088.
Answer, using section 2
Count of do they like product ? | Column Labels | ||
Row Labels | like | hate | Grand Total |
old | 75.81% | 24.19% | 100.00% |
young | 60.53% | 39.47% | 100.00% |
Grand Total | 70.00% | 30.00% | 100.00% |
The estimate is 0.7581-0.6053=0.153, so zscore = (0.153-0.14)/0.088=0.151
ii) P(Z<zscore) using www.wolframalpha.com is P(Z<0.151)=0.56
iii) expected rank = P(Z<zscore)*1000=0.56*1000=560
iv) Some of the 1000 estimates; the full list is available from https://app.box.com/s/2to195ysj0deo5wawwjp53e9jlt4peqp
 | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
Lowest estimate | 475 | 1 | -0.14306 | -3.19465 |
Estimate from allocated sample | 1 | 567 | 0.153 | 0.151 |
Highest estimate | 663 | 1000 | 0.543672 | 4.570203 |
b) Using section 3
i) Find the zscore of the estimate in section 3. Note that the average of the estimates is 0.408 with standard deviation 0.26.
Answer, using the answer to section 3
Row Labels | Average of how much would pay ? |
old | 2.44 |
young | 1.99 |
The estimate is 2.44-1.99=0.45
so the zscore is (0.45-0.408)/0.26=0.176
ii) using www.wolframalpha.com P(Z<zscore) = P(Z<0.176)=0.570
iii) expected rank = P(Z<zscore)*1000=0.570*1000=570
iv) Summary of some of the 1000 samples; the full list is available from https://app.box.com/s/kiqemn0h0m3d03uygo1dhemvx4e5uf6r
 | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
Lowest estimate | 475 | 1 | -0.43474 | -3.23897 |
Estimate from allocated sample | 1 | 571 | 0.45 | 0.176 |
Highest estimate | 663 | 1000 | 1.607576 | 4.613465 |
c) Using section 4
i) Find the zscore of the slope estimate in section 4a. Note that the average of the estimates is 0.952 with standard deviation 0.237.
Answer: (0.8388-0.952)/0.237=-0.478
ii) Using www.wolframalpha.com P(Z<zscore)=P(Z<-0.478)=0.316
iii) expected rank = P(Z<zscore)*1000=0.316*1000=316
iv) Summary of some of the 1000 estimates; the full list of estimates is available from https://app.box.com/s/35a0x0hnxcqq2qh6krzua6qp587fke51
 | Which sample | Rank lowest to highest | Estimate X | Zscore=(X-mean)/stdev |
Lowest estimate | 141 | 1 | -0.003480103 | -4.03134 |
Estimate from allocated sample | 1 | 290 | 0.838828357 | -0.478 |
Highest estimate | 683 | 1000 | 3.878984 | 3.876998 |
d) For parts a, b and c, compare the predicted rank for your sample in part (iii), found using P(Z<zscore), to the actual rank in part (iv).
e) Comment on the connection between the following facts
*“Part (d) shows that totally different populations with totally different variables have the same sampling distribution (the normal distribution)”
*“Hypothesis testing uses a sampling distribution; the p-value is a shaded area on the sampling distribution”
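The three steps used in parts a, b and c (zscore, then P(Z<zscore), then expected rank) follow the same pattern each time. The sketch below shows that pattern in Python for the section 4 sample numbers. It is only an illustration for checking your understanding: the instructions ask you to find P(Z<zscore) with www.wolframalpha.com, and here the standard library class `statistics.NormalDist` is swapped in for that lookup. The function name `expected_rank` is made up for illustration.

```python
from statistics import NormalDist


def expected_rank(estimate, mean, stdev, list_size=1000):
    """Return (zscore, P(Z<zscore), expected rank in a ranked list of list_size estimates)."""
    z = (estimate - mean) / stdev
    probability = NormalDist().cdf(z)  # area to the left of z under the standard normal curve
    return z, probability, probability * list_size


# Section 8c sample answer: slope estimate 0.8388, mean of estimates 0.952, stdev 0.237.
z, prob, rank = expected_rank(0.8388, 0.952, 0.237)
print(round(z, 3), round(prob, 3), round(rank))
```

The printed values should be close to the zscore, probability and expected rank given in the sample answer for part c.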
[Figure for section 4A: scatterplot of Profit against Number of bets, with fitted trendline y = 0.8388x + 107.96, R² = 0.6794]
Profit values shown on the plot: 1029 981 989 988 951 1013 987 984 970 946 957 953 944 947 897 953 920 942 937 882