Objective
The Airport Quality Agency (AQA) keeps track of the facilities provided at airports and during flights. The agency wants to evaluate how far existing customers recommend air travel over other modes of transport to prospective customers, and to identify the factors that lead them to make that recommendation. To conduct this research, the AQA hired a company named Skytrax to survey existing passengers. In the survey, passengers were asked to rate different aspects of their journey under four headings: the airport, the airline, the lounge and the seats. Skytrax collected ratings on various factors related to these four aspects and presented them to the AQA for analysis.
This data collection has proven very costly for the AQA. If the research is successful, the agency will expand it and collect further data from social media websites such as Facebook or Twitter. Here, the data collected by Skytrax is analyzed with RapidMiner, and the analysis is restricted to the seats data. The working assumption is that if the seating facilities in airports and on flights are satisfactory, travel by air may increase.
Thus, the main research objective is to analyze the “Seats” data collected by Skytrax and identify the factors that influence customers to recommend air travel.
The “Seats” dataset collected by Skytrax contains numerous variables. Of these, only the customer ratings are considered in the analysis, on the assumption that satisfaction ratings matter most to whether a customer recommends travel by air. Customers rated the following aspects: the seat overall, legroom, recline, seat width, aisle space, TV viewing, power supply and seat storage. All of these ratings are used as inputs to analyze and predict recommendations for travel by air. The data is incomplete and contains various missing values, which are handled while running the analysis process in RapidMiner.
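The report does not state how the missing ratings were treated. One common option, shown here purely as an illustration, is to replace each missing rating with the mean of the observed values in the same column. The sketch below uses only the Python standard library; the column names and the three rows are hypothetical and are not taken from the Skytrax data.

```python
from statistics import mean

# Hypothetical rows (not Skytrax data): None marks a rating left blank.
rows = [
    {"seat_legroom": 4, "seat_width": 3, "power_supply": None},
    {"seat_legroom": 2, "seat_width": None, "power_supply": 1},
    {"seat_legroom": None, "seat_width": 5, "power_supply": 3},
]

def impute_column_means(rows):
    """Return new rows in which every None is replaced by its column mean.

    Assumes every column has at least one observed (non-None) value.
    """
    columns = list(rows[0].keys())
    # Mean of the observed values in each column.
    means = {
        col: mean(r[col] for r in rows if r[col] is not None)
        for col in columns
    }
    return [
        {col: (r[col] if r[col] is not None else means[col]) for col in columns}
        for r in rows
    ]

filled = impute_column_means(rows)
for row in filled:
    print(row)
```

Mean imputation keeps every respondent in the sample, but it shrinks the variance of the imputed columns, so dropping incomplete rows is a reasonable alternative when only a few values are missing.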
Data Collection and Analysis
The main aim of this research is to find out whether customers recommend air travel over other modes of travel. Although many other factors play a part, customer satisfaction ratings are treated as one of the most important drivers of this recommendation. A histogram of the overall rating given by customers was produced in RapidMiner. The histogram shows that most people gave the seats a very low rating. The AQA should therefore address the problems passengers face with seats as a priority, so that satisfaction increases. Further, the scatter diagram shows that the people who recommended travel by air gave considerably higher overall satisfaction ratings.
Figure 1: Histogram showing overall satisfaction
Figure 3: Correlation Table
The correlation table shows a positive correlation between recommendation and every other satisfaction rating. This indicates that the higher the ratings people give, the more likely they are to recommend air travel. The number of customers travelling by flight is then expected to increase, which will in turn influence the profit of the AQA.
In this research, the recommendation of travel by air has to be determined. The data has therefore been divided into two clusters: one for the people who recommend travel by air and one for the people who do not. In the figure below, the red cluster indicates customers who recommend travelling by air and the blue cluster indicates those who do not. The graph shows that the people who gave higher ratings have mostly recommended travel by air to other people.
Three different methods are used here to build a model: logistic regression, a decision tree and k-NN. The following sections discuss the process, the results and the performance of each model.
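As an illustration of what one cell of such a correlation table measures, the sketch below computes the Pearson correlation between a single rating and the 0/1 recommendation flag (with a binary variable this is the point-biserial correlation). The six data points are hypothetical and chosen only to show a positive relationship; they are not values from the study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings and recommendation flags (1 = recommends air travel).
legroom = [1, 2, 2, 4, 5, 5]
recommended = [0, 0, 0, 1, 1, 1]

r = pearson(legroom, recommended)
print(f"correlation between legroom rating and recommendation: {r:.2f}")
```

A positive coefficient only says that higher ratings and recommendations tend to occur together; it does not by itself show that improving a rating causes more recommendations.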
Process of Running the Model
In this research, the recommendation of the customers has to be predicted from the other customer satisfaction ratings: the overall seat rating, legroom, recline, seat width, aisle space, TV viewing, power supply and seat storage. Because recommendation is a dichotomous variable with only two outcomes, “0” (“not recommend”) and “1” (“recommend”), logistic regression is used for the prediction. It is regarded as one of the most important prediction models. Logistic regression can, however, produce its predictions and the accompanying explanatory tables only if the dataset contains no missing values.
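To make the idea concrete, the following is a minimal sketch of logistic regression fitted by batch gradient descent, written in plain Python. It is not the RapidMiner operator used in the study: the two features, the six training rows, the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
from math import exp

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    """Estimated probability that a customer with ratings x recommends."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def fit_logistic(X, y, learning_rate=0.1, epochs=3000):
    """Fit weights and bias by minimising the mean log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Hypothetical complete rows: [legroom rating, seat width rating].
X = [[1, 1], [2, 1], [2, 2], [4, 4], [5, 4], [5, 5]]
y = [0, 0, 0, 1, 1, 1]  # 1 = recommends, 0 = does not recommend

weights, bias = fit_logistic(X, y)
print("P(recommend | ratings 5, 5):", round(predict_proba(weights, bias, [5, 5]), 3))
print("P(recommend | ratings 1, 1):", round(predict_proba(weights, bias, [1, 1]), 3))
```

Every row passed to `fit_logistic` must be complete, which is why rows with missing ratings have to be dropped or imputed before this kind of model is fitted.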
Recommendation of Air Travel
When all the variables are included in the regression, not all of them are found to be significant.
Process of Running the Model
Another method that can be used efficiently to predict the recommendations is the decision tree. It is another important method for predicting categorical variables, and dichotomous variables in particular. Like logistic regression, it produces a predictive model. Unlike logistic regression, however, a decision tree can be built even when data is missing, and this tolerance of missing values is its main advantage over logistic regression.
The tree provided below shows that when the seat legroom rating is above 3.5, the next split is on seat width. When both the seat legroom and the seat width are satisfactory, most customers are likely to recommend travelling by air. When the seat width rating is not satisfactory, the TV viewing rating becomes important: the more satisfied people are with TV viewing, the more inclined they are to recommend flight travel. On the other hand, when the seat legroom rating is below 3.5 and the overall rating is above 6.5, there is a clear indication of recommendation by the customers.
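The splits described above can be written out as nested rules. The sketch below is a reading of the description rather than an export of the fitted tree. Only the 3.5 legroom cut-off and the 6.5 overall cut-off come from the text. The seat width and TV viewing cut-offs are not stated, so they are left as required parameters, and the values passed in the usage lines are hypothetical. Treating the remaining low-legroom branch as "not recommend" is also an assumption.

```python
def recommends(seat_legroom, seat_width, tv_viewing, overall,
               *, width_cutoff, tv_cutoff):
    """Apply the described splits and return True for "recommend".

    width_cutoff and tv_cutoff are NOT given in the report; callers must
    supply them, and any value used here is illustrative only.
    """
    if seat_legroom > 3.5:               # cut-off stated in the text
        if seat_width > width_cutoff:    # legroom and width both satisfactory
            return True
        return tv_viewing > tv_cutoff    # width unsatisfactory: TV viewing decides
    # Low legroom: the text says an overall rating above 6.5 still signals
    # a recommendation; the opposite outcome is assumed here.
    return overall > 6.5

# Hypothetical cut-offs for the two splits whose thresholds are not reported.
cutoffs = {"width_cutoff": 3.5, "tv_cutoff": 3.5}

print(recommends(4, 4, 1, 3, **cutoffs))  # good legroom and good width
print(recommends(4, 2, 1, 9, **cutoffs))  # good legroom, poor width, poor TV
print(recommends(2, 5, 5, 8, **cutoffs))  # poor legroom, high overall rating
```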
Figure 6: Decision Tree Model
Process of Running the Model
The k-NN (k-nearest neighbors) model is one of the most important data mining models. It can be used for both regression and classification, although it is most widely applied as a classifier.
Thus, the analysis conducted so far shows that the demand for air travel is likely to increase as customer satisfaction increases in all the factors considered. The correlation analysis shows that all the variables have a positive relationship with recommendation. It is therefore important for the Airport Quality Agency to keep its services up to date so that customers are satisfied; otherwise its business will suffer. If customers travelling by air are not satisfied with the services provided, they are likely to opt for another, more satisfactory mode of transport. It is thus highly recommended to improve the quality of the rated factors: the seats overall, legroom, recline, seat width, aisle space, TV viewing, power supply and seat storage. Doing so will benefit the agency.
This research was performed on a very small portion of the population, so further research can be conducted with a larger sample. As the agency already plans, social networking sites can be used to collect the data. Data would then be available from people all around the globe, and the results would be more helpful to the Agency in developing its services.
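As a rough illustration of k-NN used as a classifier, the sketch below labels a new customer by majority vote among the k closest training rows, using Euclidean distance. The training rows, the choice of two features and the value k = 3 are assumptions made for the example; the report does not state the settings used in RapidMiner.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical rows: [legroom rating, seat width rating].
X = [[1, 1], [2, 1], [2, 2], [4, 4], [5, 4], [5, 5]]
y = [0, 0, 0, 1, 1, 1]  # 1 = recommends, 0 = does not recommend

print(knn_predict(X, y, [5, 4]))  # query close to the high-rating rows
print(knn_predict(X, y, [1, 2]))  # query close to the low-rating rows
```

An odd k avoids tied votes when there are only two classes. Because the method relies on distances, ratings measured on different scales (for example a 1 to 5 item and a 1 to 10 overall score) would normally be rescaled first.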