Executive Summary
In recent years, eCommerce has captured the attention of the whole world, and online shopping is one of its main parts. As eCommerce business has grown rapidly, it has brought new challenges for service providers, chief among them business competition and customer satisfaction.
Service providers use different tools, techniques and strategies to attract customers. Business depends on the attraction, quality and service that the provider offers. We have data on 1180 clothing products (jackets, jeans and suits). We considered the following attributes / variables: Product Name, Product Price (in $), Sale Price (in $), Profit (in $), Number of customers who bought the product, Shipping Type (Free or Paid), Customer Type (New or Existing), Region (NSW, QLD, SA, TAS, VIC, WA), Product Material (Wool or Cotton) and Product Colour (Black, Blue, Pink, Red or White).
We observed that the company earns about 7.95% profit overall and that the profit percentage differs little across the levels of any attribute. Among the regions, WA gives the highest profit percentage (8.23%) and QLD the lowest (7.75%). On average, each product has 11.81 customers, with a standard deviation of 3.82.
Only shipping type and material are significantly associated at the 5% level of significance, and customer type and material are associated at the 10% level; no other pair of attributes is associated. The mean number of customers is significantly higher for existing customers than for new customers. The mean number of customers for products shipped free is significantly higher than for products with paid shipping. Wool products are preferred to cotton products, as the mean number of customers for wool products is significantly higher. The mean number of customers differs significantly between regions, but not between colours at the 5% level. QLD has the highest mean number of customers of all regions.
From the correlation analysis, product price and profit are positively correlated, while the number of customers is negatively correlated with both profit and product price. Regression analysis suggests a significant relation between total profit and number of customers. The R² of 0.74 suggests that the fit is good. The slope for number of customers suggests that each customer contributes, on average, $2.3592 of profit to the company. We also give recommendations based on the analysis and a plan for implementing them.
Table of Contents
1. List of Abbreviations and assumptions made
2. Introduction – What is the problem?
3. Research Methodology
4. Analytical Findings
5. Recommendations to the company
6. An implementation plan based on the recommendations you have provided
7. Conclusion
8. List of References
9. Appendix
List of Abbreviations
Max : Maximum
Min : Minimum
NSW : New South Wales
QLD : Queensland
SA : South Australia
TAS : Tasmania
VIC : Victoria
WA : Western Australia
Introduction
In recent years, eCommerce has captured the attention of the whole world, and online shopping is one of its main parts. As eCommerce business has grown rapidly, it has brought new challenges for service providers, chief among them business competition and customer satisfaction.
Service providers use different tools, techniques and strategies to attract customers. Business depends on the attraction, quality and service that the provider offers.
We have data on 1180 clothing products (jackets, jeans and suits). We considered the following attributes / variables:
i) Product Name
ii) Product Price (in $)
iii) Sale Price (in $)
iv) Profit (in $)
v) Number of customers who bought the product
vi) Shipping Type (Free or Paid)
vii) Customer Type (New or Existing)
viii) Region (NSW, QLD, SA, TAS, VIC, WA)
ix) Product Material (Wool and Cotton)
x) Product Colour (Black, Blue, Pink, Red and White)
From the variables above, we define the following variables for our analysis:
Total monthly sale amount (in $) = Sale Price (in $) × Number of customers
Total monthly profit (in $) = Profit (in $) × Number of customers
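The two derived variables can be sketched in a few lines of Python. This is only an illustration: the field names and the three sample rows are hypothetical and are not records from the data set or code from the appendix.

```python
# Toy rows for illustration only; they are NOT records from the report's data set.
# The field names (sale_price, profit, num_customers) are hypothetical.
products = [
    {"sale_price": 50.0, "profit": 4.0, "num_customers": 10},
    {"sale_price": 120.0, "profit": 9.5, "num_customers": 12},
    {"sale_price": 80.0, "profit": 6.0, "num_customers": 8},
]

for row in products:
    # Total monthly sale amount = Sale Price x Number of customers
    row["total_sale"] = row["sale_price"] * row["num_customers"]
    # Total monthly profit = Profit x Number of customers
    row["total_profit"] = row["profit"] * row["num_customers"]

total_sales = [row["total_sale"] for row in products]
total_profits = [row["total_profit"] for row in products]
print(total_sales, total_profits)
```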
We are interested in the following:
- Profit analysis by shipping type, customer type, region, material and colour.
- Whether there is any association between shipping type, customer type, region, material and colour.
- Whether the mean number of customers differs significantly across shipping type, customer type, region, material and colour.
- Correlation analysis of the variables.
- Regression analysis for total monthly profit.
Research Methodology
Data analysis is incomplete without statistical tools and techniques, and selecting the proper ones is an important aspect of the analysis. We carried out the profit analysis for shipping type, customer type, region, material and colour by summarising the total sale amount and total profit. We tested the association between the attributes shipping type, customer type, region, material and colour with the chi-squared test for association. We used the two-sample t-test and one-way ANOVA to compare the mean number of customers across shipping type, customer type, region, material and colour. We carried out the correlation analysis for the variables product price, profit and number of customers. We used regression analysis to predict total profit. We ran the Python code given in the appendix and report the formatted output.
Analytical Findings
Profit Analysis
Table 1 presents the profit analysis for shipping type, customer type, region, material and colour. For each level it reports the total sales amount, the total profit and the profit percentage.
Table 1: Profit analysis according to different attributes
Attributes | Levels | Total Sales ($) | Total Profit ($) | Profit %
Shipping Type | Free | 143591.9 | 11427.7 | 7.96%
Shipping Type | Paid | 272070.3 | 21623.5 | 7.95%
Customer Type | Existing | 194472.8 | 15667.8 | 8.06%
Customer Type | New | 221189.4 | 17383.5 | 7.86%
Region | NSW | 106339.2 | 8582.3 | 8.07%
Region | QLD | 45850.0 | 3552.9 | 7.75%
Region | SA | 62913.1 | 4927.5 | 7.83%
Region | TAS | 79070.6 | 6136.9 | 7.76%
Region | VIC | 43351.3 | 3422.1 | 7.89%
Region | WA | 78138.0 | 6429.6 | 8.23%
Material | Cotton | 198617.0 | 15895.1 | 8.00%
Material | Wool | 217045.2 | 17156.1 | 7.90%
Colour | Black | 83913.8 | 6687.3 | 7.97%
Colour | Blue | 82727.4 | 6577.5 | 7.95%
Colour | Pink | 86665.8 | 6801.6 | 7.85%
Colour | Red | 80847.7 | 6450.1 | 7.98%
Colour | White | 81507.4 | 6534.7 | 8.02%
Total | | 415662.1 | 33051.2 | 7.95%
We observed that the company earns about 7.95% profit overall and that the profit percentage differs little across the levels of any attribute. Among the regions, WA gives the highest profit percentage (8.23%) and QLD the lowest (7.75%).
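The profit percentages can be recomputed from the regional totals in Table 1 as total profit divided by total sales. The sketch below does this in Python; the function and variable names are illustrative and are not taken from the appendix code.

```python
# Total sales and total profit by region (in $), copied from Table 1.
region_totals = {
    "NSW": (106339.2, 8582.3),
    "QLD": (45850.0, 3552.9),
    "SA": (62913.1, 4927.5),
    "TAS": (79070.6, 6136.9),
    "VIC": (43351.3, 3422.1),
    "WA": (78138.0, 6429.6),
}


def profit_percent(total_sales, total_profit):
    """Profit expressed as a percentage of sales."""
    return 100.0 * total_profit / total_sales


region_pct = {
    region: profit_percent(sales, profit)
    for region, (sales, profit) in region_totals.items()
}
best_region = max(region_pct, key=region_pct.get)
worst_region = min(region_pct, key=region_pct.get)

# Overall percentage from the summed regional totals.
overall_sales = sum(sales for sales, _ in region_totals.values())
overall_profit = sum(profit for _, profit in region_totals.values())
overall_pct = profit_percent(overall_sales, overall_profit)

for region, pct in region_pct.items():
    print(f"{region}: {pct:.2f}%")
print(f"highest: {best_region}, lowest: {worst_region}, overall: {overall_pct:.2f}%")
```

Because the table values are rounded to one decimal place, the summed totals can differ from the Total row in the last digit.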
Total sales and profit depend mainly on the number of customers. Table 2 presents summary statistics for the number of customers by shipping type, customer type, region, material and colour: size, mean, standard deviation, minimum and maximum.
Table 2: Summary statistics for number of customers
Attributes | Levels | Size | Mean | SD | Min | Max
Shipping Type | Free | 349 | 13.88 | 3.90 | 5 | 32
Shipping Type | Paid | 831 | 10.94 | 3.44 | 3 | 23
Customer Type | Existing | 544 | 12.08 | 3.97 | 3 | 32
Customer Type | New | 636 | 11.58 | 3.68 | 3 | 26
Region | NSW | 308 | 11.64 | 3.73 | 3 | 23
Region | QLD | 106 | 14.49 | 4.46 | 4 | 32
Region | SA | 177 | 11.88 | 3.30 | 5 | 23
Region | TAS | 236 | 11.18 | 3.88 | 3 | 29
Region | VIC | 122 | 11.89 | 3.90 | 3 | 23
Region | WA | 231 | 11.34 | 3.41 | 4 | 24
Material | Cotton | 574 | 10.98 | 3.81 | 3 | 26
Material | Wool | 606 | 12.59 | 3.68 | 3 | 32
Colour | Black | 238 | 11.92 | 3.95 | 3 | 24
Colour | Blue | 226 | 12.19 | 3.87 | 5 | 26
Colour | Pink | 255 | 11.30 | 3.43 | 3 | 19
Colour | Red | 233 | 11.61 | 3.57 | 3 | 21
Colour | White | 228 | 12.07 | 4.25 | 3 | 32
Total | | 1180 | 11.81 | 3.82 | 3 | 32
On average, each product has 11.81 customers, with a standard deviation of 3.82. We observed that the average number of customers is
i) higher for free shipping than for paid shipping.
ii) higher for the QLD region than for any other region.
iii) higher for wool than for cotton.
iv) higher for blue than for any other colour.
Table 3 shows the chi-square statistic and p-value of the chi-squared test of association for shipping type, customer type, region, material and colour. The null hypothesis is that there is no association between the two attributes; the alternative hypothesis is that there is an association. We tested the following pairs of attributes:
- shipping type and customer type.
- shipping type and region.
- shipping type and material.
- shipping type and colour.
- customer type and region.
- customer type and material.
- customer type and colour.
- region and material.
- region and colour.
- material and colour.
Table 3: Chi-squared test for association
Pairs of attributes | Chi-Square Statistic | P-Value
Shipping type and customer type | 0.248 | 0.618
Shipping type and region | 7.598 | 0.180
Shipping type and material | 5.737 | 0.017
Shipping type and colour | 4.889 | 0.299
Customer type and region | 4.364 | 0.489
Customer type and material | 3.333 | 0.068
Customer type and colour | 3.133 | 0.536
Region and material | 6.745 | 0.240
Region and colour | 23.598 | 0.260
Material and colour | 0.922 | 0.911
Only shipping type and material are significantly associated at the 5% level of significance (p = 0.017), and customer type and material are associated at the 10% level (p = 0.068). No other pair of attributes is associated.
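A p-value in Table 3 follows from its chi-square statistic and the degrees of freedom, which for a table of r rows and c columns are (r − 1)(c − 1). The sketch below recomputes a few of them, assuming the level counts listed earlier (2 shipping types, 2 materials, 5 colours and 6 regions). The survival function is hand-written on the standard library for illustration and is not the appendix code; in practice a statistics library would normally do this.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability P(X > stat) for a chi-squared variable with df degrees of freedom.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    starting from Q(1/2, y) = erfc(sqrt(y)) for odd df or Q(1, y) = exp(-y) for even df.
    """
    y = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def association_df(n_rows, n_cols):
    """Degrees of freedom for a chi-squared test of association."""
    return (n_rows - 1) * (n_cols - 1)


# (statistic from Table 3, number of levels of each attribute)
pairs = {
    "shipping type and material": (5.737, 2, 2),
    "shipping type and colour": (4.889, 2, 5),
    "region and colour": (23.598, 6, 5),
}

p_values = {
    name: chi2_sf(stat, association_df(rows, cols))
    for name, (stat, rows, cols) in pairs.items()
}
for name, p in p_values.items():
    print(f"{name}: p = {p:.3f}")
```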
In this section, we carried out the two-sample t-test for equality of the mean number of customers for shipping type (free and paid), customer type (new and existing) and material (wool and cotton). We tested the following null and alternative hypotheses:
Shipping Type
Null Hypothesis: There is no significant difference between the mean number of customers for free shipping and paid shipping.
Alternative Hypothesis: There is a significant difference between the mean number of customers for free shipping and paid shipping.
Customer Type
Null Hypothesis: There is no significant difference between the mean number of customers for new and existing customers.
Alternative Hypothesis: There is a significant difference between the mean number of customers for new and existing customers.
Material
Null Hypothesis: There is no significant difference between the mean number of customers for wool and cotton products.
Alternative Hypothesis: There is a significant difference between the mean number of customers for wool and cotton products.
Table 4 presents the results of the two-sample t-test for shipping type, customer type and material. It reports the test statistic and p-value for each attribute.
Table 4: Two-sample independent t-test for shipping type, customer type and material
Attributes | Levels | Test Statistic | P value
Shipping Type | Free and Paid | 12.25 | 0.000
Customer Type | New and Existing | 2.22 | 0.026
Material | Wool and Cotton | -7.39 | 0.000
From Table 4, the p-values for shipping type, customer type and material are all below 0.05, which suggests a significant difference in the mean number of customers between the levels of each attribute. The mean number of customers is higher for existing customers (12.08) than for new customers (11.58). The mean number of customers for products shipped free is significantly higher than for products with paid shipping. Wool products are preferred to cotton products, as the mean number of customers for wool products is significantly higher than for cotton products.
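The test statistics in Table 4 can be approximated from the summary statistics in Table 2. The sketch below assumes an unequal-variance (Welch) two-sample t statistic; the report does not state which variant was used, so this is an assumption. The groups are ordered alphabetically (first minus second) only so that the signs line up with Table 4.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic without assuming equal variances (assumed variant)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# (mean, SD, size) for each level, copied from Table 2.
comparisons = {
    "Shipping Type (Free - Paid)": ((13.88, 3.90, 349), (10.94, 3.44, 831)),
    "Customer Type (Existing - New)": ((12.08, 3.97, 544), (11.58, 3.68, 636)),
    "Material (Cotton - Wool)": ((10.98, 3.81, 574), (12.59, 3.68, 606)),
}

t_stats = {
    name: welch_t(*first, *second)
    for name, (first, second) in comparisons.items()
}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```

Small differences from Table 4 are expected because the inputs are rounded to two decimal places.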
We tested whether the mean number of customers differs significantly across the levels of
i) Region (NSW, QLD, SA, TAS, VIC and WA)
ii) Product Colour (Black, Blue, Pink, Red and White)
We tested the following null and alternative hypotheses:
Region (NSW, QLD, SA, TAS, VIC and WA)
Null Hypothesis: There is no significant difference in the mean number of customers between regions.
Alternative Hypothesis: At least one region has a different mean number of customers.
Product Colour (Black, Blue, Pink, Red and White)
Null Hypothesis: There is no significant difference in the mean number of customers between colours.
Alternative Hypothesis: At least one colour has a different mean number of customers.
Table 5 shows the output of the one-way ANOVA for region and colour.
Table 5: Output of one-way ANOVA for region and colour
Attributes | Levels | F Statistic | P Value
Region | NSW, QLD, SA, TAS, VIC and WA | 13.21 | 0.000
Colour | Black, Blue, Pink, Red and White | 2.17 | 0.070
From Table 5, we conclude that the mean number of customers differs significantly between regions, whereas the difference between colours is not significant at the 5% level (p = 0.070). QLD has the highest mean number of customers of all regions.
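The F statistics in Table 5 can be approximated from the group sizes, means and standard deviations in Table 2, since one-way ANOVA only needs the between-group and within-group sums of squares. The following is a minimal sketch with illustrative names, not the appendix code.

```python
def anova_f_from_summary(groups):
    """One-way ANOVA F statistic from (size, mean, SD) triples.

    Returns (F, df_between, df_within).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (size, mean, SD) copied from Table 2.
region_groups = [
    (308, 11.64, 3.73),  # NSW
    (106, 14.49, 4.46),  # QLD
    (177, 11.88, 3.30),  # SA
    (236, 11.18, 3.88),  # TAS
    (122, 11.89, 3.90),  # VIC
    (231, 11.34, 3.41),  # WA
]
colour_groups = [
    (238, 11.92, 3.95),  # Black
    (226, 12.19, 3.87),  # Blue
    (255, 11.30, 3.43),  # Pink
    (233, 11.61, 3.57),  # Red
    (228, 12.07, 4.25),  # White
]

f_region, df1_region, df2_region = anova_f_from_summary(region_groups)
f_colour, df1_colour, df2_colour = anova_f_from_summary(colour_groups)
print(f"Region: F({df1_region}, {df2_region}) = {f_region:.2f}")
print(f"Colour: F({df1_colour}, {df2_colour}) = {f_colour:.2f}")
```

The results differ slightly from Table 5 because the summary statistics are rounded.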
In this section, we calculate correlation coefficients to study the association between product price, profit and number of customers. Table 6 presents the Pearson correlation matrix for these three variables.
Table 6: Pearson’s correlation coefficients for Product Price, Profit and Number of customers
Variable | Product Price | Profit | Number of customers
Product Price | 1 | 0.161 | -0.100
Profit | 0.161 | 1 | -0.045
Number of customers | -0.100 | -0.045 | 1
From the correlation analysis, product price and profit are positively correlated (0.161). The number of customers is negatively correlated with both profit (-0.045) and product price (-0.100). All three correlations are weak.
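As a rough way to gauge how strong the coefficients in Table 6 are, each can be converted to a t statistic with n − 2 degrees of freedom, where n = 1180 products. The sketch below is not part of the report's appendix code; the two-sided p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only because n is large.

```python
import math

N_PRODUCTS = 1180  # number of observations, from Table 2


def correlation_t(r, n):
    """t statistic for testing that a Pearson correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation (assumes large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Coefficients copied from Table 6.
correlations = {
    "product price vs profit": 0.161,
    "product price vs number of customers": -0.100,
    "profit vs number of customers": -0.045,
}

results = {}
for name, r in correlations.items():
    t = correlation_t(r, N_PRODUCTS)
    results[name] = (t, approx_two_sided_p(t))
    print(f"{name}: r = {r:+.3f}, t = {t:.2f}, approx p = {results[name][1]:.4f}")
```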
Regression Analysis:
In this section, we fit a regression model to total profit, using the number of customers as the predictor variable. Table 7 shows the output of the regression analysis.
Table 7: Output of regression analysis
Regression Statistics
Multiple R | 0.859341
R Square | 0.738467
Adjusted R Square | 0.738245
Standard Error | 5.371418
Observations | 1180
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 95968.32 | 95968.32 | 3326.212 | 0
Residual | 1178 | 33987.81 | 28.85213 | |
Total | 1179 | 129956.1 | | |
Coefficients
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0.154577 | 0.50766 | 0.304488 | 0.76081 | -0.84144 | 1.150595
No. of Customers | 2.359233 | 0.040907 | 57.67332 | 0 | 2.278974 | 2.439491
The p-value (Significance F) is below 0.05, which suggests a significant relation between total profit and number of customers. The R² of 0.74 suggests that the fit is good. The slope for number of customers suggests that each additional customer contributes, on average, $2.3592 of profit to the company.
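The summary figures in Table 7 are linked by simple arithmetic: R² is the regression sum of squares divided by the total sum of squares, and F is the ratio of the regression and residual mean squares. The sketch below recomputes them and uses the fitted line to predict total profit. The prediction helper and the 12-customer example are illustrative only.

```python
# Values copied from Table 7.
ss_regression = 95968.32
ss_residual = 33987.81
n_obs = 1180
n_predictors = 1
intercept = 0.154577
slope = 2.359233

ss_total = ss_regression + ss_residual
df_residual = n_obs - n_predictors - 1

r_squared = ss_regression / ss_total
adjusted_r_squared = 1.0 - (1.0 - r_squared) * (n_obs - 1) / df_residual
f_statistic = (ss_regression / n_predictors) / (ss_residual / df_residual)


def predict_total_profit(num_customers):
    """Predicted total profit (in $) from the fitted line in Table 7."""
    return intercept + slope * num_customers


print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adjusted_r_squared:.4f}")
print(f"F = {f_statistic:.1f}")
print(f"Predicted total profit for 12 customers: ${predict_total_profit(12):.2f}")
```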
Recommendations to the Company
- The profit analysis shows little difference in profit percentage across attributes, but total sales and total profit do differ. This suggests that the company should attract more customers in order to increase total profit.
- As the mean number of customers differs significantly between new and existing customers, the company should focus on attracting existing customers.
- As the mean number of customers for products with free shipping is significantly higher than for products with a shipping charge, the company should offer shipping free of charge so that the number of customers increases.
- Wool products are preferred to cotton products, which suggests that the company should make wool products available to every customer who wants them.
- The mean number of customers is higher in QLD than in the other regions, which suggests that the company should attract more customers from the other regions.
Implementation Plan
1. To attract more customers, the company should focus on quality and service.
2. The company can offer free shipping on most products by appointing more staff to the shipping department.
3. The company should keep wool products in stock so that every customer gets their desired products.
4. The company should use advertising boards and online advertisements to attract customers.
5. The company can attract customers with offers such as free delivery and cash back.
Conclusions
We observed that the company earns about 7.95% profit overall and that the profit percentage differs little across the levels of any attribute. Among the regions, WA gives the highest profit percentage (8.23%) and QLD the lowest (7.75%). On average, each product has 11.81 customers, with a standard deviation of 3.82.
Only shipping type and material are significantly associated at the 5% level of significance, and customer type and material are associated at the 10% level; no other pair of attributes is associated. The mean number of customers is significantly higher for existing customers than for new customers. The mean number of customers for products shipped free is significantly higher than for products with paid shipping. Wool products are preferred to cotton products, as the mean number of customers for wool products is significantly higher. The mean number of customers differs significantly between regions, but not between colours at the 5% level. QLD has the highest mean number of customers of all regions.
From the correlation analysis, product price and profit are positively correlated, while the number of customers is negatively correlated with both profit and product price. Regression analysis suggests a significant relation between total profit and number of customers. The R² of 0.74 suggests that the fit is good. The slope for number of customers suggests that each customer contributes, on average, $2.3592 of profit to the company. We have also given recommendations based on the analysis and a plan for implementing them.