Importance of Data Analytics in E-commerce
An e-commerce store collects and stores many types of data about the customers who visit it and the products they purchase. The data collected from visitors is mainly unstructured (Van Donselaar et al. 2016). Therefore, to extract meaningful information from it, data analysis techniques are needed that help recognize patterns among the data elements of a specific data set (Akter and Wamba 2016). In addition, statistical models can be used to predict future trends, so that a business organization can recognize and exploit business opportunities and gain an advantage over its competitors.
The following report presents an exploratory analysis of a given data set, the development of a predictive model, and a prediction of product sales based on the trends recognizable in the dataset.
For this data analytics project, the collected data is first checked and processed so that it is clean and consistent. In particular, the dataset is checked for null values so that the analysis methods can be applied to it.
Typically, this kind of analysis presents the data in a way that clarifies the influence of one factor of the dataset on another. The goal of this approach is to predict the trend of change in the dataset with the help of selected features or factors (Montgomery, Jennings and Kulahci 2015). In statistical experimental designs, an experiment is designed and data is collected as a result. This allows data to be generated in a form that a statistical model can use, under assumptions such as independence of the selected variables and randomization.
Different types of customers engage with an e-commerce site in different ways. Predictive analysis can therefore help an e-commerce organization examine the variables and the relations between them in order to produce the preferred engagement from customers of different segments and geographic regions. For example, it can help the organization encourage customers to sign up for a newsletter on the website, take up promotional offers on products, or engage in other ways (Omondi and Mbugua 2017).
The main objectives of this data analytics project are to find the factors that influence product sales and to recognize the trends and patterns in the given data set, so that customer behaviour can be predicted and product demand met in the regions where the organization does business (Montgomery, Jennings and Kulahci 2015). Two data analysis techniques are used: association analysis between the data elements of the dataset, and regression analysis to predict monthly sales from the most influential factor.
Exploring the Sales Data
Before the dataset is analysed, it is checked for null values. The check, carried out in Python, finds no null values in the selected dataset. The summary statistics of the e-commerce data are then examined. They are given in the following table.
Statistic | Prodcut_Price($) | Monthly_Sale($) | Number_ofCustomers
count | 1399.00 | 1399.00 | 1399.00
mean | 593.54 | 3902.64 | 27.44
std | 223.51 | 1529.53 | 10.58
min | 220.00 | 1200.00 | 10.00
max | 1000.00 | 6484.00 | 45.00
The dataset contains 1399 rows, and the columns Prodcut_Price($), Monthly_Sale($) and Number_ofCustomers hold numerical data. The minimum, mean and maximum product prices are $220, $593.54 and $1000.
Similarly, the minimum, mean and maximum monthly sales are $1200, $3902.64 and $6484, and the corresponding numbers of customers are 10, 27.44 and 45.
Among the other columns, the dataset contains 5 different product names, 6 geographic regions from which orders are placed, and two types each of shipping and customers.
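The null-value check and the summary statistics described above can be sketched in a few lines of Python. The report does not name the modules it used, so this minimal sketch relies only on the standard library. The three sample rows and the helper names `count_nulls` and `describe` are made up for illustration and are not taken from the dataset.

```python
import statistics

# Hypothetical sample rows; the values are illustrative, not rows from the dataset.
rows = [
    {"Prodcut_Price($)": 220.0, "Monthly_Sale($)": 1200.0, "Number_ofCustomers": 10},
    {"Prodcut_Price($)": 600.0, "Monthly_Sale($)": 3900.0, "Number_ofCustomers": 27},
    {"Prodcut_Price($)": 1000.0, "Monthly_Sale($)": 6484.0, "Number_ofCustomers": 45},
]


def count_nulls(rows):
    """Count missing (None or empty-string) values in each column."""
    nulls = {}
    for row in rows:
        for column, value in row.items():
            nulls.setdefault(column, 0)
            if value is None or value == "":
                nulls[column] += 1
    return nulls


def describe(rows, column):
    """Return count, mean, sample standard deviation, min and max of a column."""
    values = [row[column] for row in rows if row[column] not in (None, "")]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


null_counts = count_nulls(rows)
price_summary = describe(rows, "Prodcut_Price($)")
print(null_counts)
print(price_summary)
```

A column with a non-zero entry in `null_counts` would need cleaning before any of the later analysis steps are applied.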
The best-selling products are also investigated on the basis of the dataset values. Plotting the data with Python produces the following bar chart.
The chart shows that the digital camera is the best-selling product in the e-commerce store. The second best-selling product is the Kindle Reader.
When the number of orders placed by customers in each region is plotted, the following bar graph is produced, showing the total number of orders from the different regions.
Further analysis produces the following table,
Geographic_Region | Prodcut_Price($) | Monthly_Sale($) | Number_ofCusrtomers
ACT | 131025 | 846486 | 6177
HIMI | 141997 | 890572 | 6359
JBT | 136209 | 935142 | 6223
NSW | 141084 | 937823 | 6476
SA | 135278 | 919701 | 6568
WA | 144782 | 930083 | 6586
The table shows that the ACT region has the fewest customers (6177), followed by JBT (6223). It is also noticeable that HIMI, with more customers than JBT (6359), generates less revenue ($890572 against $935142), and that ACT generates the least revenue of all regions ($846486). JBT therefore earns more revenue per customer than either of these regions, so to improve business performance it is suggested to attract new customers in the JBT region.
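One way to compare the regions in the table above is to divide monthly sales by the number of customers. The following sketch does this with the table's own figures. Revenue per customer is a derived metric introduced here for illustration, not a column of the dataset.

```python
# Regional totals copied from the table above: (Monthly_Sale($), Number_ofCusrtomers).
region_totals = {
    "ACT": (846486, 6177),
    "HIMI": (890572, 6359),
    "JBT": (935142, 6223),
    "NSW": (937823, 6476),
    "SA": (919701, 6568),
    "WA": (930083, 6586),
}

# Revenue per customer is a derived metric, not a column of the dataset.
revenue_per_customer = {
    region: sales / customers
    for region, (sales, customers) in region_totals.items()
}

fewest_customers = min(region_totals, key=lambda r: region_totals[r][1])
highest_per_customer = max(revenue_per_customer, key=revenue_per_customer.get)

# Print the regions from highest to lowest revenue per customer.
for region, value in sorted(
    revenue_per_customer.items(), key=lambda kv: kv[1], reverse=True
):
    print(f"{region}: {value:.2f} per customer")
print("Fewest customers:", fewest_customers)
print("Highest revenue per customer:", highest_per_customer)
```

On the table's figures, JBT earns the most per customer, which is consistent with the suggestion to acquire more customers there.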
Factors Influencing Sales
When the data set is grouped by the customers' shipping preferences, the following result is found.
Shpping _Type | Monthly_Sale($) | Number_ofCusrtomers
CustomerPaid | 2761876 | 18879
FREE | 2697931 | 19510
The table shows that more customers prefer free shipping than customer-paid shipping. There are 18879 customers for customer-paid orders and 19510 for free-shipped orders, which is about 3% higher.
To prioritize the products according to their sales in the different regions, the following results are found.
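The comparison between the two shipping types can be worked through directly from the table. The sketch below computes each type's share of customers and its sales per customer. Both are derived figures introduced here for illustration, and the variable names are hypothetical.

```python
# Totals copied from the shipping table above: (Monthly_Sale($), Number_ofCusrtomers).
shipping_totals = {
    "CustomerPaid": (2761876, 18879),
    "FREE": (2697931, 19510),
}

total_customers = sum(customers for _, customers in shipping_totals.values())

# Derived figures: share of all customers and monthly sales per customer.
shipping_summary = {}
for shipping_type, (sales, customers) in shipping_totals.items():
    shipping_summary[shipping_type] = {
        "customer_share": customers / total_customers,
        "sales_per_customer": sales / customers,
    }

for shipping_type, summary in shipping_summary.items():
    print(
        f"{shipping_type}: {summary['customer_share']:.1%} of customers, "
        f"{summary['sales_per_customer']:.2f} sales per customer"
    )
```

Free shipping accounts for slightly more than half of the customers, while customer-paid orders bring in somewhat more sales per customer, so the two measures should be read together.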
For the ACT region,
Product_Name | Monthly_Sale($) | Number_ofCusrtomers
AMAZON ECHO | 134565 | 1050
APPLENOTEBOOK | 204084 | 1243
DIGITALCAMERA | 200958 | 1540
KINDLE READER | 135344 | 1041
PLAYSTATION | 171535 | 1303
To improve revenue from the ACT region, it is suggested to prioritize the sales of “AMAZON ECHO” and “KINDLE READER”, as both products generate less revenue than the other products.
For the HIMI region,
Product_Name | Monthly_Sale($) | Number_ofCusrtomers
AMAZON ECHO | 170524 | 1276
APPLENOTEBOOK | 141403 | 1053
DIGITALCAMERA | 199753 | 1311
KINDLE READER | 204513 | 1489
PLAYSTATION | 174379 | 1230
For the HIMI region, the products “AMAZON ECHO” and “APPLENOTEBOOK” should be prioritized.
For the JBT region,
Product_Name | Monthly_Sale($) | Number_ofCusrtomers
AMAZON ECHO | 169681 | 1111
APPLENOTEBOOK | 171258 | 1149
DIGITALCAMERA | 164612 | 1323
KINDLE READER | 223826 | 1395
PLAYSTATION | 205765 | 1245
In the JBT region, “AMAZON ECHO” and “DIGITALCAMERA” should be prioritized.
For the NSW region,
Product_Name | Monthly_Sale($) | Number_ofCusrtomers
AMAZON ECHO | 179187 | 1097
APPLENOTEBOOK | 191564 | 1224
DIGITALCAMERA | 207374 | 1574
KINDLE READER | 190851 | 1343
PLAYSTATION | 168847 | 1238
In the NSW region, PLAYSTATION and AMAZON ECHO should be prioritized in order to improve revenue from the region.
For the SA region,
Product_Name | Monthly_Sale($) | Number_ofCusrtomers
AMAZON ECHO | 209083 | 1704
APPLENOTEBOOK | 211006 | 1373
DIGITALCAMERA | 169897 | 1127
KINDLE READER | 143054 | 1078
PLAYSTATION | 186661 | 1286
In the case of SA, KINDLE READER and DIGITALCAMERA should be prioritized. It is also evident that the monthly sales of AMAZON ECHO and APPLENOTEBOOK are almost the same ($209083 and $211006). This may indicate that the two products are often bought together, although regional totals alone cannot confirm it.
For the WA region,
Product_Name | Monthly_Sale($) | Number_ofCusrtomers
AMAZON ECHO | 140574 | 895
APPLENOTEBOOK | 183840 | 1545
DIGITALCAMERA | 246412 | 1541
KINDLE READER | 182000 | 1261
PLAYSTATION | 177257 | 1344
To improve the revenue from the WA region, the AMAZON ECHO and PLAYSTATION products need to be prioritized.
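The regional suggestions above follow one rule: in each region, the two products with the lowest monthly sales are prioritized. The rule can be sketched as follows, using the figures from the six regional tables. The function name `lowest_revenue_products` is hypothetical.

```python
# Monthly_Sale($) per product and region, copied from the six tables above.
# The KINDLE READER label for ACT is inferred from the text that follows the ACT table.
sales_by_region = {
    "ACT": {"AMAZON ECHO": 134565, "APPLENOTEBOOK": 204084,
            "DIGITALCAMERA": 200958, "KINDLE READER": 135344, "PLAYSTATION": 171535},
    "HIMI": {"AMAZON ECHO": 170524, "APPLENOTEBOOK": 141403,
             "DIGITALCAMERA": 199753, "KINDLE READER": 204513, "PLAYSTATION": 174379},
    "JBT": {"AMAZON ECHO": 169681, "APPLENOTEBOOK": 171258,
            "DIGITALCAMERA": 164612, "KINDLE READER": 223826, "PLAYSTATION": 205765},
    "NSW": {"AMAZON ECHO": 179187, "APPLENOTEBOOK": 191564,
            "DIGITALCAMERA": 207374, "KINDLE READER": 190851, "PLAYSTATION": 168847},
    "SA": {"AMAZON ECHO": 209083, "APPLENOTEBOOK": 211006,
           "DIGITALCAMERA": 169897, "KINDLE READER": 143054, "PLAYSTATION": 186661},
    "WA": {"AMAZON ECHO": 140574, "APPLENOTEBOOK": 183840,
           "DIGITALCAMERA": 246412, "KINDLE READER": 182000, "PLAYSTATION": 177257},
}


def lowest_revenue_products(product_sales, n=2):
    """Return the n product names with the smallest monthly sales, lowest first."""
    return sorted(product_sales, key=product_sales.get)[:n]


priorities = {
    region: lowest_revenue_products(products)
    for region, products in sales_by_region.items()
}

for region, products in priorities.items():
    print(f"{region}: prioritize {' and '.join(products)}")
```

Ranking by monthly sales alone is a simplification. A product may also be ranked by its number of customers, which can give a different order.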
A prediction model based on linear regression is then set up to predict the monthly sales. The aim is to find the best-fit model for predicting the monthly sales of the products and the revenue from them.
From the above figure it is evident that products priced between $500 and $600 can help improve the monthly sales.
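The report does not show the regression model itself. As a minimal sketch of the technique, the code below fits a simple least-squares line to predict monthly sales from one predictor. The choice of the number of customers as predictor, the five sample observations and the function names are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical observations (not rows from the dataset):
# number of customers and the corresponding monthly sales in dollars.
customers = [10, 18, 25, 33, 45]
monthly_sales = [1200.0, 2500.0, 3700.0, 4600.0, 6484.0]

slope, intercept = fit_line(customers, monthly_sales)
fit_quality = r_squared(customers, monthly_sales, slope, intercept)
predicted_sales = intercept + slope * 30  # prediction for 30 customers

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={fit_quality:.3f}")
print(f"predicted monthly sales for 30 customers: {predicted_sales:.2f}")
```

With more than one influential factor, the same idea extends to multiple linear regression, where each factor gets its own coefficient.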
Recommendations
Acquiring new customers through promotional offers: The above analysis shows that the ACT region has the fewest customers. It is therefore suggested to use promotional offers in this region so that new customers can be acquired there.
Personalized recommendations for the customers: Many products are bought together by customers. These product sets can be recommended to other customers so that they buy them together, which would improve business and revenue in regions such as ACT and HIMI.
Use of promotional campaigns: It is suggested to run promotional campaigns for slow-selling products and to clear slow-moving products out of the warehouses.
Free shipping on all the products: In addition, the organization should ship all products free of charge in the regions listed in the data set. This will help attract customers to buy repeatedly. The reason is the result of segmenting the customers by shipping type, which showed that more customers choose free shipping.
Using customer feedback on products and services for sentiment analysis: The organization should collect feedback from customers about the services and products it offers. One of the most important benefits of collecting this feedback is that it can be used for sentiment analysis of the services and products.
Sentiment analysis is a technique for finding the polarity of opinions about services and products. Using text analytics together with machine learning techniques, reviews can be categorised as negative or positive in order to gather useful insights about customer satisfaction, which indicates the overall performance of the e-commerce site for a given product segment or region. This kind of processed data is a very useful resource for e-commerce organizations to target and acquire customers while improving performance.
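The idea of polarity can be illustrated with a very small example. The sketch below uses a lexicon-based score rather than a trained machine learning model, which is a deliberate simplification. The word lists, the sample reviews and the function name `polarity` are all hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical word lists; a real system would use a trained model
# or a much larger sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "fast", "love", "happy"}
NEGATIVE = {"bad", "poor", "slow", "broken", "late", "disappointed"}


def polarity(review):
    """Label a review positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical customer reviews, not taken from any dataset.
reviews = [
    "Great camera and fast delivery",
    "The reader arrived late and the screen was broken",
    "It is a notebook",
]

labels = [polarity(review) for review in reviews]
label_counts = Counter(labels)
print(labels)
print(label_counts)
```

Counting the labels per product or per region gives a rough satisfaction indicator of the kind described above.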
Increasing sales using product recommendations: The above analysis suggests that there are products that customers buy together. The organization can use an analysis of these sales to provide product recommendations based on product browsing history and purchasing behaviour.
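Recommendations based on products bought together can be sketched with simple co-purchase counts. The example below ranks products by how often they share a basket with a given product. The baskets are hypothetical, since the dataset described in this report does not list individual orders, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (sets of products bought in one order).
baskets = [
    {"AMAZON ECHO", "APPLENOTEBOOK"},
    {"AMAZON ECHO", "APPLENOTEBOOK", "PLAYSTATION"},
    {"DIGITALCAMERA", "KINDLE READER"},
    {"AMAZON ECHO", "KINDLE READER"},
]


def co_purchase_counts(baskets):
    """Count how often each unordered pair of products shares a basket."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def recommend(product, baskets, top_n=2):
    """Rank other products by confidence: P(other in basket | product in basket)."""
    with_product = [b for b in baskets if product in b]
    if not with_product:
        return []
    counts = Counter(item for b in with_product for item in b if item != product)
    # Sort by count (descending), then by name so ties are resolved predictably.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, n / len(with_product)) for item, n in ranked[:top_n]]


pair_counts = co_purchase_counts(baskets)
echo_recommendations = recommend("AMAZON ECHO", baskets)
print(pair_counts.most_common(1))
print(echo_recommendations)
```

The confidence value is the same measure used in association rule learning, so the sketch is a small instance of the association analysis mentioned earlier in the report.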
Conclusion
An ever-increasing amount of data about customers and their buying behaviour is streaming from e-commerce websites. This data is becoming more complicated because it is generated continuously from many devices, such as smartphones and personal computers, as well as from social media.
A prediction model can benefit e-commerce organizations by helping them explore patterns among the elements of a dataset that relate to customers' buying behaviour in a region. By analysing the dataset, the organization can identify the changes needed in its business strategy for a given region in which it operates.
The analysis shows that there are regions in which the organization needs to acquire new customers as well as retain the existing ones. The recommendations above can be used for this purpose so that business performance can be improved.
References
Akter, S. and Wamba, S.F., 2016. Big data analytics in E-commerce: a systematic review and agenda for future research. Electronic Markets, 26(2), pp.173-194.
Baardman, L., Levin, I., Perakis, G. and Singhvi, D., 2017. Leveraging Comparables for New Product Sales Forecasting.
Montgomery, D.C., Jennings, C.L. and Kulahci, M., 2015. Introduction to time series analysis and forecasting. John Wiley & Sons.
Omondi, A.O. and Mbugua, A.W., 2017. An Application of association rule learning in recommender systems for e-Commerce and its effect on marketing.
Van Donselaar, K.H., Peters, J., De Jong, A. and Broekmeulen, R.A.C.M., 2016. Analysis and forecasting of demand during promotions for perishable items. International Journal of Production Economics, 172, pp.65-75.