Missing Data and Data Cleaning
This report presents the analysis of quantitative data from a footwear user study, collected from 22 participants. The survey recorded information on the user interfaces of footwear. The aim of the study is to analyse the audio interface of the footwear quantitatively and, ultimately, to assess how user behaviour changes when the audio interface changes, based on the available information [6].
The responses of the 22 individuals on 98 variables were arranged in columns. However, some respondents did not answer certain items, so the dataset contained missing values as well as some wrongly entered values [1].
Data Cleaning – Treatment for missing values
In total, 30 cells in the dataset had missing data. The usual first choice would be to remove the entries with missing data, but the dataset was small (only 22 participants took part in the study). Instead, each missing value was replaced by the mean of the observed values in its column; some columns had more than one missing value. Mean imputation leaves each column mean unchanged, so it was not expected to materially affect the results of the study [7]. The imputed values are highlighted in yellow in the Excel sheet.
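The mean-replacement step described above can be sketched as follows. This is a minimal illustration, not the spreadsheet procedure actually used in the study: the function name `impute_column_mean` and the sample column are hypothetical, and missing cells are assumed to be represented as `None`.

```python
from statistics import mean


def impute_column_mean(column):
    """Replace None entries with the mean of the observed values in the column."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if value is None else value for value in column]


# Hypothetical column with two missing responses (not data from the study).
scores = [4.0, None, 5.0, 3.0, None]
imputed = impute_column_mean(scores)
print(imputed)
```

Because every gap is filled with the column mean, the mean of the imputed column equals the mean of the observed values, while the spread of the column shrinks slightly.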
Descriptive Statistics
Descriptive statistics for five chosen variables, computed over the 22 respondents, are shown below [8]:
- ToePressure_Zscore_HighFrequency_Repetition1
ToePressure_Zscore_HighFrequency_Repetition1
Statistic | Value
Mean | 0.126706903
Standard Error | 0.179428664
Median | 0.126706903
Mode | 0.126706903
Standard Deviation | 0.841595034
Sample Variance | 0.708282202
Kurtosis | -0.612582963
Skewness | -0.066815821
Range | 2.98311421
Minimum | -1.411457729
Maximum | 1.571656481
Sum | 2.787551859
Count | 22
Largest(1) | 1.571656481
Smallest(1) | -1.411457729
Confidence Level(95.0%) | 0.373142334
- BodyVisualization_LOGscore_HighFrequency_Repetition1
- HeelPressure_Zscore_HighFrequency_Repetition2
- FootAcceleration_Zscore_LowFrequency_Repetition1
- GSR_Zscore_HighFrequency
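Several of the figures reported for ToePressure_Zscore_HighFrequency_Repetition1 follow from one another, and the relationships can be cross-checked with a short sketch. The input numbers are copied from the table above; the critical value `T_CRIT` is an assumption (the two-sided 95% Student's t value for 21 degrees of freedom, rounded to 2.0796), and the variable names are illustrative only.

```python
import math

# Values copied from the reported descriptive statistics.
count = 22
total = 2.787551859
std_dev = 0.841595034

# Assumption: two-sided 95% critical value of Student's t with 21 degrees of freedom.
T_CRIT = 2.0796

sample_mean = total / count                 # Mean = Sum / Count
std_error = std_dev / math.sqrt(count)      # Standard Error = SD / sqrt(n)
variance = std_dev ** 2                     # Sample Variance = SD squared
half_width = T_CRIT * std_error             # "Confidence Level(95.0%)" half-width

# 95% confidence interval for the mean.
ci_lower = sample_mean - half_width
ci_upper = sample_mean + half_width
print(sample_mean, std_error, variance, half_width, ci_lower, ci_upper)
```

The "Confidence Level(95.0%)" entry is the half-width of the interval, so the interval itself is the mean minus and plus that value.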
The following table shows the 95% confidence intervals of the Galvanic skin response (GSR) Z-scores for the high, low, and control frequencies, with the lower and upper bound of each interval [4].
A bar chart is shown below to compare the Galvanic skin response for the three frequencies.
The bar chart plots the mean GSR Z-score for each of the three frequencies (high, low, and control), with error bars showing the 95% confidence intervals. The low-frequency GSR has the widest confidence interval of the three, which indicates the least precise estimate of the population mean.
To calculate the confidence intervals of the proportion of positive valence for the high, low, and control frequencies, the formula for the confidence interval of a one-sample proportion is used:
$$\hat{p} \pm z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where $\hat{p}$ = sample proportion, $n$ = sample size, and $z$ = confidence coefficient, which is equal to 1.96 for a 95% confidence interval.
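The one-sample proportion interval can be sketched in a few lines. The function name `proportion_ci` and the example proportion of 0.5 are hypothetical and not taken from the study; the sample size of 22 is assumed to be the number of participants.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a single proportion."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    # Margin of error: z * sqrt(p_hat * (1 - p_hat) / n)
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical proportion of positive valence (not a value from the study).
lower, upper = proportion_ci(0.5, 22)
print(lower, upper)
```

With a sample as small as 22, the normal approximation behind this formula is rough, and the interval can extend beyond 0 or 1 when the proportion is near either end.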
A bar chart is displayed below to compare the confidence intervals of the proportion of positive valence for the high, low, and control frequencies. The data were collected in two trials for each frequency, and the proportions from the two repetitions were averaged before the confidence intervals were calculated [5].
The bar chart shows that the proportion of positive valence is largest for high frequency. The proportion gradually decreases from high frequency to control frequency.
Does Weight change with different audio frequencies?
To check whether the Weight variable changes with the audio frequency, a one-way ANOVA is carried out in which the factor is the frequency level (High, Low, and Control) and the response is the Weight variable. The null hypothesis is that the mean Weight is the same for all three frequency levels, so that any observed differences are due to chance. The values of repetitions 1 and 2 are averaged before the analysis [2].
Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
Questionnaire_Weight_HighFrequency | 22 | 80.5 | 3.659091 | 0.98539
Questionnaire_Weight_LowFrequency | 22 | 97 | 4.409091 | 1.372294
Questionnaire_Weight_Control | 22 | 91.5 | 4.159091 | 1.223485
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 6.416667 | 2 | 3.208333 | 2.68767 | 0.075839 | 3.142809
Within Groups | 75.20455 | 63 | 1.193723 | | |
Total | 81.62121 | 65 | | | |
From the ANOVA table, the p-value (0.0758) exceeds the significance level α = 0.05, and F = 2.68767 is below F crit = 3.142809. Thus, the null hypothesis cannot be rejected: the data provide no significant evidence that the different audio frequencies affect the Weight variable.
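The ANOVA figures for Weight can be reproduced from the SUMMARY block alone, since the sums of squares depend only on each group's count, average, and variance. The sketch below is illustrative: the function name `one_way_anova_from_summary` is hypothetical, the group statistics and F crit are copied from the table above, and the p-value is not recomputed because that would need an F-distribution routine outside the standard library.

```python
def one_way_anova_from_summary(groups):
    """One-way ANOVA from (count, mean, variance) tuples, one per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: spread of the group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares: pooled spread inside each group.
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# (count, average, variance) for High, Low, and Control, copied from the SUMMARY block.
weight_groups = [
    (22, 3.659091, 0.98539),
    (22, 4.409091, 1.372294),
    (22, 4.159091, 1.223485),
]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova_from_summary(weight_groups)

F_CRIT = 3.142809  # critical value reported in the ANOVA table for alpha = 0.05
reject_null = f_stat > F_CRIT
print(ss_b, ss_w, df_b, df_w, f_stat, reject_null)
```

Comparing the F statistic with F crit gives the same decision as comparing the p-value with α, so this check is enough to confirm the conclusion drawn from the table.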
Does Dominance change with different audio frequencies?
The Dominance variable was measured at three frequency levels and recorded in two trials (repetitions 1 and 2). Whether it changes with frequency is checked with a one-way ANOVA on the Dominance variable, after averaging the two trials for each frequency level.
Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
Dominance_HighFrequency | 22 | 132.5 | 6.022727 | 1.511364
Dominance_lowFrequency | 22 | 116.5 | 5.295455 | 2.301407
Dominance_Control | 22 | 116 | 5.272727 | 1.969697
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 8.007576 | 2 | 4.003788 | 2.077204 | 0.133777 | 3.142809
Within Groups | 121.4318 | 63 | 1.927489 | | |
Total | 129.4394 | 65 | | | |
In the above table, the p-value (0.133777) exceeds the significance level of 0.05. Thus, the null hypothesis cannot be rejected: there is no significant evidence that the mean of the Dominance variable differs across the audio frequencies.
Does Straightness change with different audio frequencies?
Here a one-way ANOVA is carried out in which the factor is the audio frequency (high frequency, low frequency, and control) and the response is the Straightness variable. The data for each frequency were recorded in two trials, so the values of repetition 1 and repetition 2 are averaged first. The one-way ANOVA then checks whether the mean Straightness differs between the audio frequencies. The null hypothesis is that there is no difference in the means among the frequency levels [3]. The ANOVA table is shown below.
Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
Questionnaire_Straightness_HighFrequency | 22 | 117.5 | 5.340909 | 1.390152
Questionnaire_Straightness_LowFrequency | 22 | 104.5 | 4.75 | 1.565476
Questionnaire_Straightness_Control | 22 | 107.5 | 4.886364 | 1.665043
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 4.212121 | 2 | 2.106061 | 1.367373 | 0.262232 | 3.142809
Within Groups | 97.03409 | 63 | 1.540224 | | |
Total | 101.2462 | 65 | | | |
From the ANOVA table, the p-value (0.262232) exceeds the significance level α = 0.05. Thus, the null hypothesis cannot be rejected: there is no significant evidence that the mean of the Straightness variable changes with frequency, and the observed differences are consistent with chance.
Conclusion
This report presents the results of an empirical quantitative footwear user study, analysed on the basis of the responses of 22 participants. The study aims to find out how the behaviour of the user changes when the audio interface of the footwear changes between the high, low, and control frequencies, and whether the variables taken into account are affected by the change in audio frequency. The one-way ANOVAs found no significant differences in Weight, Dominance, or Straightness across the frequencies at the 5% level, while the proportion of positive valence was largest for the high frequency. Finally, it is concluded that the high frequency can change the behaviour of the user most effectively.
References
[1] G. Cumming, Understanding the new statistics. New York: Routledge, 2012.
[2] J. Hox, M. Moerbeek and R. Van de Schoot, Multilevel Analysis.
[3] “Quantitative Data Analysis”, Google Books, 2018. [Online]. Available: https://books.google.co.in/books?hl=en&lr=&id=c-fOAgAAQBAJ&oi=fnd&pg=PR5&dq=quantitative+analysis+in+statistics&ots=gR2k6yr_LG&sig=RrlmJvyaBJOFA1MZWAney7EIIuM&redir_esc=y#v=onepage&q=quantitative%20analysis%20in%20statistics&f=false. [Accessed: 18- May- 2018].
[4] J. Muller, W. Pet, E. Pet-Reatsch, R. Servaas, F. Ansems, D. Schwander, G. Firer, H. Lothaller and P. Endler, “Repeatability of Measurements of Galvanic Skin Response – A Pilot Study”, Ww.inter-uni.net, 2013. [Online]. Available: https://ww.inter-uni.net/static/download/publication/komplementaer/p_Muller_et_al_+OCMJ_2013+_Repeatability_Galvanic_Skin_Response.pdf. [Accessed: 18- May- 2018].
[5] D. Treiman, “Quantitative Data Analysis”, Google Books, 2018. [Online]. Available: https://books.google.co.in/books?hl=en&lr=&id=c-fOAgAAQBAJ&oi=fnd&pg=PR5&dq=quantitative+analysis+in+statistics&ots=gR2k6yr_LG&sig=RrlmJvyaBJOFA1MZWAney7EIIuM&redir_esc=y#v=onepage&q=quantitative%20analysis%20in%20statistics&f=false. [Accessed: 18- May- 2018].
[6] J. Lewis, “Usability: Lessons Learned … and Yet to Be Learned”, International Journal of Human-Computer Interaction, vol. 30, no. 9, pp. 663-684, 2014.
[7] A. Hayes and K. Preacher, “Statistical mediation analysis with a multicategorical independent variable”, 2014. [Online]. Available: https://pdfs.semanticscholar.org/dd3f/1055005bda517a081c4949ad9a348b7127b3.pdf. [Accessed: 18- May- 2018].
[8] M. Triola, Elementary Statistics Using Excel. Pearson, 2013.