Data Cleaning and Analysis
Discuss the Confidence Interval and Statistical Significance.
The researcher examines a data set collected in an experimental study of footwear, together with physical attributes of the individuals such as gender, weight, shoe size and height. The variables considered are: 1) “BodyVisualization_LOG” (the perceived body weight after each exposure, captured by having the user adjust an image of a body until it matched the body weight the individual perceived), 2) “GSR_Zscore” (galvanic skin response), 3) “Valence” (valence), 4) “Questionnaire_Speed” (speed perception), 5) “Questionnaire_Weight” (weight perception), 6) “Questionnnaire_Strength” (strength perception), 7) “Questionnaire_Straightness” (body straightness perception), 8) “Questionnaire_Vividness” (vividness of body feelings) and 9) “Questionnnaire_Surprise” (unexpected body feelings). The analysis was carried out in MS Excel on measures extracted from experiments in which the frequency of the footwear's audio interface was varied and the resulting variations in users' reactions were observed. While walking in the prototype shoes, the participants received high-frequency, low-frequency and control audio feedback. For each condition of the experiment, the researcher captured the participants' perceptions of their body weight, mood, emotions and changes in behaviour along three dimensions.
Data cleaning reduces waste and corrects or consolidates dirty or inaccurate data. A data analyst should clean the data first so that the information is correct before it is used for analysis. Analysing uncleaned data may lead to a range of problems, including linkage problems, errors in parameter estimation, model misspecification and linkage bias, and may therefore lead to a false conclusion. The data set in the data sheet has a total of 21 variables. Samples no. 2 and 8 have missing values in the range “AD3” to “AF2”. Samples no. 16, 11 and 10 also have some missing values. Sample no. 4 has many missing values, running from “L5” to “AC5”. The missing values and empty cells were therefore removed as described below.
Where a sample has a single empty response, the analyst manually entered the mean value of the column in the blank cell. Where a sample has more than 3 missing values, the analyst removed the whole sample from the data set. For example, the missing high-frequency “GSR_Zscore” value of sample no. 2 was replaced by the average of the entire “AD” column, whereas sample no. 4 was removed from the data set altogether.
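The cleaning rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually performed in MS Excel: the function name `clean_rows`, the use of `None` for an empty cell and the example values are all hypothetical. It also assumes that a sample with up to 3 gaps is imputed rather than dropped, and that column means are taken over the retained samples, neither of which the text states explicitly.

```python
from statistics import mean

MAX_MISSING = 3  # from the text: samples with more than 3 missing values are dropped


def clean_rows(rows, max_missing=MAX_MISSING):
    """Drop rows with too many gaps, then mean-impute the remaining gaps.

    `rows` is a list of equal-length lists; a missing cell is None.
    """
    # Step 1: discard any sample with more than `max_missing` empty cells.
    kept = [r for r in rows if sum(v is None for v in r) <= max_missing]

    # Step 2: compute each column mean from the observed values only
    # (assumption: the mean is taken over the retained samples).
    n_cols = len(kept[0]) if kept else 0
    col_means = []
    for c in range(n_cols):
        observed = [r[c] for r in kept if r[c] is not None]
        col_means.append(mean(observed) if observed else None)

    # Step 3: fill every remaining gap with its column mean.
    return [
        [col_means[c] if v is None else v for c, v in enumerate(r)]
        for r in kept
    ]


# Hypothetical example data, not taken from the study.
rows = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [None, 4.0, 5.0, 6.0, 7.0],       # one gap: imputed
    [None, None, None, None, 9.0],    # four gaps: sample dropped
    [3.0, 6.0, 7.0, 8.0, 9.0],
]
cleaned = clean_rows(rows)
print(len(cleaned), cleaned[1][0])  # 3 2.0
```

Mean imputation keeps the column average unchanged but shrinks its variance slightly, which is worth bearing in mind when reading the standard deviations reported later.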
Descriptive Statistics
The responses of the participants were gathered through the survey questionnaires. The descriptive statistics show that, out of 21 samples, 17 are female and 4 are male.
- The average age of the samples is 24.36 years with a standard deviation of 4.86 years [2]. The 95% confidence interval for the mean age is 22.21 years to 26.52 years. The median age is 22.5 years. The ages of the samples range from 18 to 35 years.
- The mean height of the sampled responders is 164.818 cm with a standard deviation of 6.905 cm. The median and the mode of the heights are both 165 cm, that is, 165 cm is the most common height among the responders. The estimated mean height lies in the interval from 161.757 cm to 167.880 cm.
- The average weight of the samples is 59.25 kg with a standard deviation of 10.546 kg. The 95% confidence interval for the mean weight is 54.574 kg to 63.926 kg. The median weight of the responders is 57 kg. The maximum weight of any responder is 89 kg and the minimum is 47 kg.
- The average shoe size of the samples is 6.023 units with a standard deviation of 1.829 units. The estimated mean shoe size lies in the interval from 5.21 units to 6.83 units. The median shoe size is 5.75 units. The shoe sizes range from 3 units to 10 units.
- The “BodyVisualization_LOGscore” has a mean of 1.7422 and a standard deviation of 0.0955. The estimated mean of the body-visualization log scores lies between 1.699 and 1.786. The median log score is 1.717, so half of the samples lie above it and half below.
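The intervals in the list above can be recomputed from the summary statistics alone. The sketch below assumes the usual formula, mean ± critical value × SD / √n. The text does not state which critical value was used in Excel, so the function name `mean_confidence_interval` and the choice of a Student-t critical value of 2.086 (20 degrees of freedom, for n = 21) are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, confidence=0.95, critical=None):
    """Return (lower, upper) for a mean, given summary statistics.

    If `critical` is None, a normal (z) critical value is used; pass a
    Student-t critical value instead for a small-sample interval.
    """
    if critical is None:
        critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = critical * sd / sqrt(n)
    return mean - margin, mean + margin


# Body-visualization log score: mean and SD as reported, n = 21 samples.
# Assumption: t critical value for 20 degrees of freedom at 95% is 2.086.
lower, upper = mean_confidence_interval(1.7422, 0.0955, 21, critical=2.086)
print(round(lower, 3), round(upper, 3))  # 1.699 1.786
```

Under these assumptions the result agrees with the interval reported for the body-visualization log score. Leaving `critical` unset falls back to the normal approximation (about 1.96), which gives a slightly narrower interval for a sample of this size.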
Among the 21 chosen samples, the GSR Z-score has the highest average for high frequency, followed by control frequency and then low frequency. The average Z-scores for high, low and control audio frequency are 0.127, -0.140 and -0.024 respectively.
- The 95% confidence interval for the mean galvanic skin response under high audio frequency is 0.043 to 0.21 [1].
- The 95% confidence interval for the mean galvanic skin response under low audio frequency is -0.298 to 0.018.
- The 95% confidence interval for the mean galvanic skin response under control audio frequency is -0.312 to 0.263.
The estimated interval is widest for the control frequency and narrowest for the high frequency.
A Valence response is counted as positive when it is greater than 5. Among the 21 chosen samples, 17 give a positive valence rating under high audio frequency, 11 under low audio frequency and 13 under control audio frequency. The proportions of positive Valence responses under high, low and control audio frequency are 0.81, 0.52 and 0.62 respectively.
- The 95% confidence interval for the proportion of positive responses under high audio frequency is 0.64 to 0.98.
- The 95% confidence interval for the proportion of positive responses under low audio frequency is 0.31 to 0.74.
- The 95% confidence interval for the proportion of positive responses under control audio frequency is 0.41 to 0.83.
- The bar plot shows that the proportion of positive valence is highest for high frequency, followed by control frequency, and lowest for low frequency.
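The valence intervals above are consistent with a normal-approximation (Wald) interval for a proportion, p ± z·√(p(1 − p)/n). The sketch below recomputes them from the reported counts. The function name `wald_interval` is illustrative, and the text does not confirm that this was the formula used in Excel.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1]."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)


# Counts of positive valence responses (> 5) out of 21, as reported in the text.
positive_counts = {"High": 17, "Low": 11, "Control": 13}

for condition, count in positive_counts.items():
    lower, upper = wald_interval(count, 21)
    print(f"{condition}: p = {count / 21:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# High: p = 0.81, 95% CI = (0.64, 0.98)
# Low: p = 0.52, 95% CI = (0.31, 0.74)
# Control: p = 0.62, 95% CI = (0.41, 0.83)
```

With only 21 samples the normal approximation is coarse, especially for proportions near 0 or 1, which is why the sketch clips the bounds to the valid range.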
A Straightness response is counted as positive when its measured value is from 4 to 7. The proportion of positive Straightness responses is highest for “High” audio frequency and lowest for “Control” audio frequency. The estimated proportion of positive Straightness responses for “High” audio frequency lies between 0.78 and 1.0 [3]. The estimated proportion for “Control” audio frequency lies between 0.41 and 0.83.
The bar chart shows that the “Straightness” proportion is highest for high frequency, followed by low frequency. The proportion and its estimated range are lowest for the control frequency.
A Surprise response is counted as positive when its measured value is from 4 to 7. The proportion of positive surprise responses is higher for “High” audio frequency and lower for “Control” audio frequency. The estimated proportion of positive surprise responses for “High” audio frequency lies between 0.31 and 0.74. The estimated proportion for “Control” audio frequency lies between 0.17 and 0.58.
The bar chart shows that the positive surprise proportions and their 95% estimated ranges are ordered High, Low, Control.
Research Question: Are the proportions of positive responses for the three variables “Speed”, “Weight” and “Strength” equal across the three types of audio frequency, namely High, Low and Control?
The proportions for “Speed” are as follows:
- The proportion of positive speed perception in high frequency is 0.762.
- The proportion of positive speed perception in low frequency is 0.476.
- The proportion of positive speed perception in controlled frequency is 0.476.
The proportions for “Weight” are as follows:
- The proportion of positive weight perception in high frequency is 0.238.
- The proportion of positive weight perception in low frequency is 0.524.
- The proportion of positive weight perception in controlled frequency is 0.476.
The proportions for “Strength” are as follows:
- The proportion of positive strength perception in high frequency is 0.667.
- The proportion of positive strength perception in low frequency is 0.381.
- The proportion of positive strength perception in controlled frequency is 0.429.
The bar chart shows that positive speed perception is higher for high frequency than for low or control frequency. Positive weight perception is higher for low audio frequency than for control or high frequency. Lastly, positive strength perception is highest for high frequency, ahead of both low and control frequency. Hence, the rate of positive perception across the three types of frequency does not follow the same pattern for the three variables “Speed”, “Weight” and “Strength”.
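The comparison of patterns can be checked directly from the reported proportions. The short sketch below ranks the three conditions for each variable and tests whether the rankings coincide. The helper name `ranking` is illustrative. This compares point estimates only and is not a test of statistical significance.

```python
# Proportions of positive responses as reported in the text.
proportions = {
    "Speed": {"High": 0.762, "Low": 0.476, "Control": 0.476},
    "Weight": {"High": 0.238, "Low": 0.524, "Control": 0.476},
    "Strength": {"High": 0.667, "Low": 0.381, "Control": 0.429},
}


def ranking(by_condition):
    """Return the conditions ordered from highest to lowest proportion."""
    return sorted(by_condition, key=by_condition.get, reverse=True)


rankings = {variable: ranking(p) for variable, p in proportions.items()}
for variable, order in rankings.items():
    # Note: Low and Control tie for Speed, so their relative order is arbitrary.
    print(variable, "->", ", ".join(order))

# True only if every variable ranks the three conditions identically.
same_pattern = len({tuple(order) for order in rankings.values()}) == 1
print("Same pattern for all variables:", same_pattern)
```

For these figures the weight ranking starts with Low while the speed and strength rankings start with High, so `same_pattern` is `False`, in line with the conclusion drawn from the bar chart.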
Reference List:
[1] J. Bartlett, J. Kortlik and C. Higgins, “ProQuest Statistical Abstract of the USA2014 134 ProQuest Statistical Abstract of the USA Ann Arbor, MI ProQuest 2013-“, Reference Reviews, vol. 28, no. 4, pp. 22-24, 2014.
[2] J. Gandhi, “Political Institutions under Dictatorship”, Langtoninfo.com, 2018. [Online]. Available: https://www.langtoninfo.com/web_content/9780521897952_frontmatter.pdf.
[3] S. Nakagawa and I. Cuthill, “Effect size, confidence interval and statistical significance: a practical guide for biologists”, Lira.pro.br, 2018. [Online]. Available: https://lira.pro.br/wordpress/wp-content/uploads/downloads/2012/03/nakagawa-e-cuthill-2007.pdf.