Exploratory dataset analysis
In an increasingly competitive online market, it is important to recognize and exploit every possible business opportunity. To do this, a business needs to identify the market demand for its products and other patterns in its business data, which in most cases is unstructured.
This report analyses a dataset from an e-commerce business covering product sales in different regions. It includes an exploratory analysis of the dataset to identify patterns in the data.
For this task, the main objective is to predict the monthly sales for the given dataset of an e-commerce organization. In addition, the impact of different factors on product sales is explored.
Numerous data analysis methods can be used to explore the features of the dataset, including descriptive analysis, regression analysis, predictive analysis and decision trees. For this dataset, we selected regression analysis to predict sales and descriptive analysis to explore the patterns in the data. A linear regression model is used to predict the monthly sales of the products, because linear regression helps uncover previously unexplored relationships between variables in a dataset.
For example, monthly sales data, geographic region and the customer count from a region can affect the total sales of a product and create patterns across the business. Such patterns may appear as increased demand for a certain type of product or increased sales in a certain geographic region.
| | Prodcut_Price($) | Monthly_Sale($) | Customer_Number |
|---|---|---|---|
| count | 1201.00 | 1201.00 | 1201.00 |
| mean | 16.85 | 86.31 | 21.25 |
| std | 6.09 | 34.75 | 8.34 |
| min | 7.00 | 25.00 | 7.00 |
| 25% | 12.00 | 57.00 | 14.00 |
| 50% | 17.00 | 86.00 | 21.00 |
| 75% | 22.00 | 116.00 | 28.00 |
| max | 27.00 | 145.00 | 35.00 |
Table 1: Statistical data about the dataset
The statistics above show that there are 1201 records in the selected dataset. The minimum, mean and maximum values of the product price are $7, $16.85 and $27. For monthly sales they are $25, $86.31 and $145, and for Customer_Number they are 7, 21.25 and 35.
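The summary statistics in Table 1 can be reproduced for any numeric column with a few lines of code. The following is a minimal sketch using only the Python standard library. It assumes the reported std is the sample standard deviation and that quartiles are linearly interpolated; the function name `summarize` and the price values are hypothetical and are not taken from the dataset.

```python
import statistics

def summarize(values):
    """Return count, mean, std, min, quartiles and max for one numeric column."""
    ordered = sorted(values)
    # method="inclusive" interpolates linearly between data points and treats
    # the smallest and largest values as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation (n - 1)
        "min": ordered[0],
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": ordered[-1],
    }

# Hypothetical product prices in dollars, for illustration only.
sample_prices = [7, 12, 17, 22, 27]
summary = summarize(sample_prices)
for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```

Applying the same function to each of the three numeric columns would give one column of Table 1 per call.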
When the number of customers is plotted by region, the following plot is generated.
Here, it is evident that most of the customers are from the NSW geographic region, and the lowest number of customers for the e-commerce company is from the VIC geographic region. On further investigation, the counts by region are 271 for NSW, 238 for QLD, 232 for SA, 229 for VIC and 231 for WA. These counts sum to the 1201 records in the dataset, so they appear to count records per region rather than the Customer_Number totals reported in Table 4.
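As a quick check of the regional counts quoted above, the sketch below totals them and reports each region's share. The counts are copied from the text; the variable names are illustrative only.

```python
# Counts per region as quoted in the text above.
region_counts = {"NSW": 271, "QLD": 238, "SA": 232, "VIC": 229, "WA": 231}

total = sum(region_counts.values())
largest = max(region_counts, key=region_counts.get)
smallest = min(region_counts, key=region_counts.get)

# Print regions from the largest to the smallest count with their share.
for region, count in sorted(region_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region}: {count} ({count / total:.1%})")
print(f"total={total}, largest={largest}, smallest={smallest}")
```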
Prediction of monthly sales using Linear Regression model
To find out the impact of the shipping type on customer purchase behaviour, the numbers of customer-paid and free-delivery orders are plotted, which gives the bar chart in the following figure.
From the above chart it can be seen that the maximum number of both customer-paid and free orders is from the NSW region. On the other hand, the minimum numbers of customer-paid and free orders are from the SA and VIC regions respectively. Detailed statistics are provided in Table 3.
At this stage, the sales of the different products in the different regions are compared. When the total value of the sold products is calculated for each region, we find that in the NSW region, Notebook is the most ordered product on the e-commerce site.
For the SA region, Butter is the top-selling product. For the VIC, QLD and WA regions, the top-selling products are Toy car, Slipper and Notebook respectively.
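The per-region comparison described above amounts to summing the sales value per region and product and then picking the largest total in each region. The following is a hedged sketch of that grouping step. The function name `top_product_by_region` and the sample records are hypothetical and do not come from the dataset.

```python
from collections import defaultdict

def top_product_by_region(records):
    """Map each region to the product with the highest total sales value."""
    totals = defaultdict(lambda: defaultdict(float))
    for region, product, sale in records:
        totals[region][product] += sale
    return {
        region: max(products, key=products.get)
        for region, products in totals.items()
    }

# Hypothetical (region, product, monthly sale in $) records, for illustration only.
sample_records = [
    ("NSW", "NoteBook", 120.0),
    ("NSW", "Bread", 60.0),
    ("NSW", "NoteBook", 45.0),
    ("SA", "Butter", 90.0),
    ("SA", "ToyCar", 70.0),
]
top = top_product_by_region(sample_records)
print(top)
```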
Products to be prioritized for increase in sale
To prioritize products for improved revenue, the revenue from the different products needs to be investigated, which produces the following result.
| Product_Name | Prodcut_Price($) | Monthly_Sale($) | Customer_Number |
|---|---|---|---|
| Bread | 3762 | 20679 | 4704 |
| Butter | 4337 | 22696 | 5381 |
| NoteBook | 4376 | 21579 | 5560 |
| Slipper | 4117 | 20170 | 5284 |
| ToyCar | 3653 | 18535 | 4599 |
Table 2: comparison of sales of products
It is evident from the above table that the monthly sales of the Toy car are the lowest among all the products in the dataset, with a total revenue of $18535. Slippers are the second lowest, with a total of $20170.
Therefore, it is suggested to use promotional offers and marketing campaigns for the Toy car and Slippers in order to improve the sales of these products.
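The ranking behind this suggestion can be sketched as follows, using the Monthly_Sale($) totals copied from Table 2. Taking the two lowest products as promotion candidates mirrors the choice made in the text and is not a general rule; the variable names are illustrative.

```python
# Total monthly sales in dollars per product, copied from Table 2.
product_sales = {
    "Bread": 20679,
    "Butter": 22696,
    "NoteBook": 21579,
    "Slipper": 20170,
    "ToyCar": 18535,
}

# Sort product names from the lowest to the highest total sales.
ranked = sorted(product_sales, key=product_sales.get)
promotion_candidates = ranked[:2]
best_seller = ranked[-1]

print("promotion candidates:", promotion_candidates)
print("best seller:", best_seller)
```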
From the analysis of the e-commerce dataset, the sales and customer numbers for orders with free and customer-paid shipping are as follows.
| Shpping _Type | Geographic_Region | Monthly_Sale($) | Customer_Number |
|---|---|---|---|
| CustomerPaid | NSW | 10861 | 2644 |
| | QLD | 10208 | 2596 |
| | SA | 9162 | 2134 |
| | VIC | 10286 | 2555 |
| | WA | 9258 | 2351 |
| FREE | NSW | 12606 | 2976 |
| | QLD | 10596 | 2710 |
| | SA | 11066 | 2729 |
| | VIC | 9169 | 2259 |
| | WA | 10447 | 2574 |
Table 3: Customers with different shipping types
The highest number of customers with customer-paid shipping is 2644 and with free shipping is 2976, both from the NSW region. The lowest number with customer-paid shipping is 2134, from the SA region, and the lowest with free shipping is 2259, from the VIC region.
From the above table, it can be stated that in every region except VIC, fewer customers ordered with customer-paid shipping than with free shipping.
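The comparison between the two shipping types can be checked directly from the Customer_Number column of Table 3. The sketch below is illustrative only; the dictionary and variable names are not from the original analysis.

```python
# Customer_Number per region and shipping type, copied from Table 3.
customer_paid = {"NSW": 2644, "QLD": 2596, "SA": 2134, "VIC": 2555, "WA": 2351}
free_shipping = {"NSW": 2976, "QLD": 2710, "SA": 2729, "VIC": 2259, "WA": 2574}

# A positive difference means more customers used free shipping in that region.
difference = {r: free_shipping[r] - customer_paid[r] for r in customer_paid}
paid_exceeds_free = [r for r, d in difference.items() if d < 0]

total_paid = sum(customer_paid.values())
total_free = sum(free_shipping.values())

for region, diff in difference.items():
    print(f"{region}: free - paid = {diff:+d}")
print(f"paid total={total_paid}, free total={total_free}")
print("regions where paid exceeds free:", paid_exceeds_free)
```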
Most likely geographic region to target new customers
To target new customers, it is important to find out which regions have fewer customers. To do this, we calculated the total number of customers in each region, which resulted in the following table.
| Geographic_Region | Monthly_Sale($) | Customer_Number |
|---|---|---|
| NSW | 23467 | 5620 |
| QLD | 20804 | 5306 |
| SA | 20228 | 4863 |
| VIC | 19455 | 4814 |
| WA | 19705 | 4925 |
Table 4: Region wise sales
Recommendations for improved business performance
From the above table it is clear that, among all the regions, SA and VIC have the lowest numbers of customers, with 4863 and 4814 customers respectively who bought products from the e-commerce site. Therefore, it is suggested to target new customers in the VIC and SA regions in order to improve the business.
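To make the regional comparison concrete, the sketch below ranks the regions by Customer_Number using the totals copied from Table 4 and reports each region's share of all customers. Picking the two lowest regions follows the suggestion in the text; the names used are illustrative.

```python
# (Monthly_Sale($), Customer_Number) totals per region, copied from Table 4.
region_totals = {
    "NSW": (23467, 5620),
    "QLD": (20804, 5306),
    "SA": (20228, 4863),
    "VIC": (19455, 4814),
    "WA": (19705, 4925),
}

total_customers = sum(customers for _, customers in region_totals.values())

# Sort region names from the fewest to the most customers.
by_customers = sorted(region_totals, key=lambda r: region_totals[r][1])
target_regions = by_customers[:2]

for region in by_customers:
    customers = region_totals[region][1]
    print(f"{region}: {customers} customers ({customers / total_customers:.1%})")
print("suggested target regions:", target_regions)
```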
To predict the monthly sales of the products, we used the linear regression method on the selected dataset. Linear regression is one of the most frequently used techniques in predictive analysis for sales forecasting. The regression serves two objectives: first, to check how well a set of predictor variables predicts the outcome, and second, to determine which of those variables have a significant impact on the outcome variable. The following is the result of predicting monthly sales from the customer count variable.
From the above plot, monthly sales are expected to increase slightly with the number of customers on the e-commerce site.
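A simple linear regression of monthly sales on customer count can be sketched with ordinary least squares, where the slope is the covariance of the two variables divided by the variance of the predictor. The code below is a minimal illustration under that assumption and is not the model fitted in this report. The function names `fit_line` and `r_squared` and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical customer counts and monthly sales in dollars, for illustration only.
customers = [10, 15, 20, 25, 30]
sales = [42.0, 61.0, 83.0, 99.0, 125.0]

slope, intercept = fit_line(customers, sales)
fit_quality = r_squared(customers, sales, slope, intercept)
predicted = intercept + slope * 35  # predicted monthly sale for 35 customers
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={fit_quality:.3f}")
print(f"predicted sale for 35 customers: {predicted:.2f}")
```

The slope addresses the second objective above (how strongly the predictor affects the outcome), while the R^2 value addresses the first (how well the predictor explains the outcome).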
According to the prediction model, there is a slight increase in the monthly sales of the e-commerce organization. The following recommendations may help improve business performance.
- To increase product sales, the organization should provide free shipping for all products, as more customers ordered with free shipping than with customer-paid shipping in every region except VIC (Table 3).
- As the VIC region has the fewest customers, it is suggested to use promotional offers and other marketing campaigns to increase the customer base in that region.
- The "Toy car" product has the fewest customers as well as the lowest revenue, so it is suggested to promote the product or provide offers to attract customers and increase its sales.
For slow-moving products, if sales do not increase even after promotional offers and marketing campaigns, it is suggested to close out those products. The organization should also try to collect feedback on its products and services, so that positive feedback can influence other customers.
On the other hand, negative feedback can help detect issues in the business process so that those flaws can be fixed.
Finally, showing personalized recommendations every time a customer returns to the e-commerce store would improve their buying experience on the site.
Conclusion
Data analytics techniques help create a knowledge base by reducing a large amount of unstructured data to usable and recognizable information for a business organization. In predicting monthly sales, linear regression analysis helps in making smarter and more accurate business decisions that improve business performance as well as revenue. With the above recommendations, it should be possible for the organization to increase the revenue from its business.