Data Collection and Analysis
a) Due to growth in city size and population, the public transport infrastructure requires maintenance and capital investment to provide mobility options that are efficient yet affordable. One way to improve efficiency and accessibility is to alter timetables and overhaul routes, so that the network covers more people and travel time is minimised for a larger share of the population. These exercises are carried out after market research on travellers' usage patterns, and specialised agencies provide key input in this regard (Meyers, 2017). The resulting changes may not benefit everyone, but they aim to maximise the efficiency and utility of the transport network while extending coverage, especially to remote locations. This report analyses two datasets using statistical tools in order to understand the behaviour and preferences of travellers and to offer advice for future surveys.
b) The first step is to determine whether Dataset 1 is primary or secondary. For a dataset to be primary, the data must be collected by the researcher directly from the subjects or respondents. That is not the case here. The university did not collect the data; it merely sourced the data and provided it to us. The dataset is therefore secondary rather than primary (Eriksson and Kovalainen, 2015). It contains six variables and 1000 observations, which are described briefly below.
- Mode – The public transport mode used for the trip; a categorical variable. Since its categories have no natural order, it is measured on a nominal scale.
- Date – The date of travel for the trip. Dates can be placed in chronological order, and the gap between two dates (in days) is meaningful, but there is no true zero. The appropriate measurement scale is therefore interval.
- Tap – Records whether the event is a tap on or a tap off for the trip; a binary categorical variable. Since the two states have no natural order, it is measured on a nominal scale.
- Time – The time of the tap event; a quantitative variable. It is measured on an interval scale, since differences between times are meaningful but there is no true zero.
- Count – The frequency (number of trips) associated with the record; a numerical or quantitative variable. Its measurement scale is ratio, as an absolute zero is defined.
- Location – The station at which the tap on or tap off takes place at the recorded time; a categorical variable. Since stations have no natural order, it is measured on a nominal scale.
Comparing records across these variables (for example, by mode, station or tap type) reveals the key patterns in travellers' behaviour and preferences.
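A short sketch can show why the measurement scale matters for the analysis that follows. The scale of a variable decides which summary statistics are meaningful for it. The code below encodes Stevens' four levels of measurement and lists the statistics each level permits. The `Scale` enum, the `permissible_stats` helper and the grouping of statistics are illustrative assumptions, not part of the original analysis. Date is left out so that the sketch does not depend on how that variable is classified.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens' levels of measurement; each level keeps the operations of those below it."""
    NOMINAL = 1   # categories only
    ORDINAL = 2   # adds a meaningful order
    INTERVAL = 3  # adds meaningful differences
    RATIO = 4     # adds a true zero


# Summary statistics that first become meaningful at each level (illustrative grouping).
_NEW_STATS = {
    Scale.NOMINAL: ["frequency", "mode"],
    Scale.ORDINAL: ["median", "percentiles"],
    Scale.INTERVAL: ["mean", "standard deviation"],
    Scale.RATIO: ["ratio", "coefficient of variation"],
}


def permissible_stats(scale: Scale) -> list[str]:
    """Return every summary statistic that is meaningful for a variable on `scale`."""
    return [s for level in Scale if level <= scale for s in _NEW_STATS[level]]


# Scales of several Dataset 1 variables, following the classification in the text.
EXAMPLES = {
    "Mode": Scale.NOMINAL,
    "Tap": Scale.NOMINAL,
    "Location": Scale.NOMINAL,
    "Time": Scale.INTERVAL,
    "Count": Scale.RATIO,
}

for name, scale in EXAMPLES.items():
    print(f"{name:8} {scale.name:9} {', '.join(permissible_stats(scale))}")
```

For example, a mean is meaningful for Count but not for Mode. This is why the mode variables are summarised with frequencies rather than averages.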
c) For Dataset 2, 30 respondents were selected and information was collected from them on only two variables: the respondent's gender and the public transport mode used. Dataset 2 is primary data, because it was not taken from another primary or secondary source but was collected by me through a survey (Hair et al., 2015). Being primary does not, however, make this dataset more accurate than Dataset 1, for two reasons. First, the respondents were chosen by convenience sampling, a non-probability technique that does not ensure faithful representation of the population. Second, a sample of 30 is too small to represent the population accurately. Results obtained from this data may therefore lack reliability (Eriksson and Kovalainen, 2015). Gender is a categorical variable measured on a nominal scale, since its categories have no natural order. The same applies to mode of transport (Hillier, 2016).
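The effect of sample size can be made concrete with the worst-case 95% margin of error for a proportion, z·√(p(1−p)/n) with p = 0.5 and z = 1.96. The sketch below compares n = 30 with n = 1000. It assumes simple random sampling, which Dataset 2's convenience sample does not satisfy. The real uncertainty for Dataset 2 is therefore likely larger than the figure computed here. The function name is illustrative.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% confidence interval for a proportion.

    p = 0.5 gives the widest (worst-case) interval. The formula assumes a simple
    random sample, which a convenience sample does not provide.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (30, 1000):
    print(f"n = {n:4}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

This gives roughly ±17.9 percentage points for n = 30 against about ±3.1 for n = 1000. Any difference between genders smaller than that range in Dataset 2 could easily be sampling noise.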
Results and Discussion
a) The summary of public transport usage by mode in Dataset 1 is provided below.
The corresponding graphical summary of the table above is shown below.
The mode summary for Dataset 1 shows that train is the most commonly used public transport mode, with the highest frequency among the four modes. Bus is also used frequently, only slightly less than train. The other two modes (light rail and ferry) have limited ridership and are far less popular. From the government's perspective, it is therefore important to invest in expanding train and bus infrastructure so that growing traffic can be handled without congestion. The government should also look for ways to increase use of the two less popular modes, which would reduce the current load on train and bus infrastructure.
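The frequency table behind this summary is a simple count of the Mode variable plus each mode's percentage share. The sketch below shows that computation. The trip list is hypothetical and only stands in for the Mode column of Dataset 1. If each record carried a Count greater than one, the tallies would need to be weighted by Count instead.

```python
from collections import Counter


def mode_summary(modes: list[str]) -> list[tuple[str, int, float]]:
    """Frequency and percentage share of each transport mode, most frequent first."""
    counts = Counter(modes)
    total = sum(counts.values())
    return [(mode, k, 100 * k / total) for mode, k in counts.most_common()]


# Hypothetical trip records, invented for illustration; not the real Dataset 1 values.
trips = ["Train"] * 5 + ["Bus"] * 4 + ["Light rail", "Ferry"]
for mode, k, share in mode_summary(trips):
    print(f"{mode:10} {k:3} {share:5.1f}%")
```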
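b) The appropriate statistical technique here is a hypothesis test on the proportion of trips made by train, where p denotes that proportion in the population. The hypotheses are:
H0: p ≤ 0.5 (train accounts for at most 50% of trips)
H1: p > 0.5 (train captures more than 50% of trips)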
The sample proportion is computed from a sample size of 1000, of which 484 trips involved trains, giving 484/1000 = 0.484. The test output generated in Excel is shown below.
The p-value approach is used to reach a decision. The p-value in the Excel output is 0.8517, far above conventional significance levels such as 0.05. The evidence is therefore insufficient to reject the null hypothesis, and the alternative hypothesis is not supported (Flick, 2015).
Hence, the sample data provide no statistical support for the claim that train captures more than 50% of the market. This is in line with expectations, since the sample proportion of 0.484 is itself below 0.5 and train and bus are close in popularity and usage. Bus takes a share close to that of train, and light rail and ferry account for the remainder. It is therefore unlikely that any single mode holds more than half of all trips.
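The reported p-value can be checked outside Excel. The sketch below computes two versions of the one-sided test of H1: p > 0.5 for 484 train trips out of 1000:

- an exact binomial upper-tail probability, P(X ≥ 484) with X ~ Binomial(1000, 0.5);
- a normal-approximation z-test without continuity correction.

It is an assumption that the Excel output used an exact or continuity-corrected calculation. That assumption rests only on the exact tail landing close to the reported 0.8517. In Excel the same tail could be obtained as `1 - BINOM.DIST(483, 1000, 0.5, TRUE)`. The function names are illustrative.

```python
import math
from statistics import NormalDist


def binomial_upper_tail(k: int, n: int, p0: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p0), with 0 < p0 < 1.

    Terms are summed in log space to avoid floating-point underflow for large n.
    """
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p0) + (n - i) * math.log(1 - p0)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


def z_test_upper(k: int, n: int, p0: float) -> float:
    """Normal-approximation p-value for H1: p > p0, without continuity correction."""
    p_hat = k / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)


n, k, p0 = 1000, 484, 0.5  # Dataset 1: 484 of 1000 trips were by train
print(f"sample proportion      = {k / n:.3f}")
print(f"exact binomial p-value = {binomial_upper_tail(k, n, p0):.4f}")
print(f"z-test p-value         = {z_test_upper(k, n, p0):.4f}")
```

The exact tail comes out close to the 0.8517 reported above, while the uncorrected z-test gives about 0.84. Either way the p-value is far above 0.05, so the decision not to reject H0 does not depend on the method.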
a) A numerical summary of train usage by station is presented in the table below.
The corresponding graphical summary of the table above is shown below.
Both the numerical and the graphical summaries show that Parramatta train station handles the most traffic. Its values are also considerably higher than those of the other selected train stations.
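Identifying the busiest station amounts to summing the Count variable per Location and ranking the totals. The sketch below shows this aggregation. The (Location, Count) pairs are invented for illustration. "Station A" and "Station B" are placeholder names, not stations from the dataset.

```python
from collections import defaultdict


def station_totals(records):
    """Sum the Count field per station and return (station, total) pairs, busiest first."""
    totals = defaultdict(int)
    for station, count in records:
        totals[station] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (Location, Count) pairs; the figures are invented for illustration only.
records = [
    ("Parramatta", 120),
    ("Station A", 40),
    ("Parramatta", 95),
    ("Station B", 55),
    ("Station A", 30),
]
ranked = station_totals(records)
print(ranked)
print("Busiest station:", ranked[0][0])
```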
b) The appropriate statistical technique here is a hypothesis test, with the hypotheses summarised below.
The significance level for the test is taken as 0.05 (5%). Because the test compares group means using a one-way ANOVA, the relevant test statistic is F. The ANOVA output for the sample data is shown below.
The p-value for the ANOVA test is 0.66, which exceeds the significance level of 0.05. The evidence from the sample data therefore does not warrant rejection of the null hypothesis (Eriksson and Kovalainen, 2015). The conclusion is that there is no statistically significant difference between the numbers of tap-on and tap-off travellers.
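The F statistic in an ANOVA output of this kind is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it from scratch. The two groups of values are hypothetical and stand in for whatever tap-on and tap-off figures were compared.

The sketch stops at F and its degrees of freedom. Turning these into a p-value needs the F distribution, for example SciPy's `scipy.stats.f.sf(f_stat, df_between, df_within)`. Alternatively, `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on two or more groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for the two groups (invented figures, not Dataset 1).
tap_on = [1, 2, 3]
tap_off = [2, 3, 4]
print(one_way_anova_f(tap_on, tap_off))
```

With only two groups, this F equals the square of the pooled two-sample t statistic. The ANOVA is therefore equivalent to a two-sided t-test here.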
c) Based on the above analysis, Parramatta station appears to be the best choice for connecting the proposed underground train line. This would maximise use of the new line and hence justify the government's investment. It would also allow better service and reduce overcrowding during peak hours.
For the primary Dataset 2, the numerical summary of transport mode preferences by gender is shown below.
The corresponding graphical summary of the table above is shown below.
The summary shows no notable gender difference for light rail and bus, but a stark difference for train and ferry. More than 50% of female respondents travel by train, compared with 25% of male respondents. Conclusions drawn from this summary must be treated with care, however. Dataset 2 is likely to be biased and not representative of the underlying population, owing to its small sample size and the use of non-probability (convenience) sampling to select respondents.
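Statements such as "more than 50% of female respondents travel by train" are row percentages. Each gender's mode counts are divided by that gender's own total, so groups of unequal size can still be compared. The sketch below computes such a table. The survey rows are hypothetical and are not the actual Dataset 2 responses.

```python
from collections import Counter, defaultdict


def row_percentages(responses):
    """For each gender, the percentage of that gender's respondents using each mode."""
    by_gender = defaultdict(Counter)
    for gender, mode in responses:
        by_gender[gender][mode] += 1
    return {
        gender: {mode: 100 * k / sum(modes.values()) for mode, k in modes.items()}
        for gender, modes in by_gender.items()
    }


# Hypothetical (gender, mode) survey rows, invented for illustration only.
responses = (
    [("F", "Train")] * 3
    + [("F", "Bus")]
    + [("M", "Train"), ("M", "Bus"), ("M", "Ferry"), ("M", "Light rail")]
)
table = row_percentages(responses)
print(table)
```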
Conclusion
The analysis above shows that train is the most common transport mode in the sample data provided from an external source. Bus is also prominent, with usage only slightly below that of train, while ferry and light rail account for only a small share of passengers. For the new underground train line proposed by the government, Parramatta railway station appears to be a suitable connection point, owing to its role as a hub serving a higher number of travellers. Dataset 2 suggests that, among the NSW public transport users surveyed, females prefer train while males show no particular preference for any mode. However, since this data is potentially biased, more research on gender preferences is needed.
For future research, the time factor should be considered: similar data should be collected in different months so that consistent trends in preferences and usage can be identified. The capital expenditure involved in building new infrastructure is sizeable. Factors such as seasonal trends and any discounts offered on a given mode of public transport therefore need to be taken into account. In addition, further research is needed to understand the precise reasons behind preferences among the available public transport modes, so that the relevant authority can introduce suitable changes to maximise efficiency.
References
Eriksson, P. and Kovalainen, A. (2015) Quantitative methods in business research. 3rd ed. London: Sage Publications.
Flick, U. (2015) Introducing research methodology: A beginner’s guide to doing a research project. 4th ed. New York: Sage Publications.
Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P., and Page, M. J. (2015) Essentials of business research methods. 2nd ed. New York: Routledge.
Hillier, F. (2016) Introduction to Operations Research. 6th ed. New York: McGraw Hill Publications.
Mayers, L. (2017) Greater Sydney and NSW public transport undergo state’s ‘largest’ timetable overhaul ever. [online] Available at: https://www.abc.net.au/news/2017-11-26/new-sydney-and-nsw-public-transport-timetable-launched/9194538 (Accessed September 21, 2018).