Project Details
In the present digital era, big data analytics has become a necessity for business organizations. Big data analytics can help organizations uncover hidden patterns in the data collected from their business processes and from their clients. Companies can use these patterns to exploit available market opportunities and gain a competitive advantage over rival companies.
The sections of this report discuss the insights obtained with the Watson Analytics tool, recommendations for the manager of the ABC online multimedia company, and a research brief covering the dashboards.
This data analytics project analyses the selected dataset to uncover hidden insights, working from a set of guided questions and extending to advanced insights.
The selected dataset, downloaded from https://data.world/iamdilan/youtube-dataset, contains 18 columns, including video_id (the ID of the uploaded video), category_id, publish_date, title, channel_title (the channel that uploaded the video), time_frame of the video, trending_date, the day of the week on which the video was published, country, tags used for the video, number of views, likes and dislikes counts, and the count of comments on the uploaded video.
- The video_id, country, and title columns may contain repeated values, because the dataset records viewer details for different videos on YouTube.
- A value of True in video_error_removed indicates that the video has been removed from the YouTube platform.
- The dataset may not contain videos from every category.
- The tags for the videos may contain multiple redundant values.
Answer 1
There are a total of 55885 distinct uploaded videos in the selected dataset.
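Because video_id values can repeat across rows, a distinct count has to ignore duplicates. The following is a minimal sketch of that idea in Python, not the method used in Watson Analytics. The sample rows and the helper name count_distinct are hypothetical and only illustrate the calculation.

```python
import csv
import io

# Hypothetical sample rows; the real dataset has 18 columns and far more rows.
SAMPLE_CSV = """video_id,country,title
a1,US,First video
a1,GB,First video
b2,FRANCE,Second video
c3,CANADA,Third video
c3,US,Third video
"""


def count_distinct(rows, column):
    """Count the distinct non-empty values found in one column."""
    return len({row[column] for row in rows if row[column]})


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
distinct_videos = count_distinct(rows, "video_id")
print(distinct_videos)
```

The same helper can be applied to any other column, for example country or channel_title.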
Answer 2
The dataset contains videos mainly related to 18 categories.
Answer 3
The selected dataset covers 4 countries: FRANCE, US, GB and CANADA.
Answer 4
The number of unique YouTube channels in the selected dataset is 12360.
Answer 5
The top three countries by number of channels are FRANCE, CANADA and US.
Answer 6
GB has the lowest number of channels in the selected dataset.
Answer 7
US has 2207 unique YouTube channels.
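The per-country channel counts in Answers 5 to 7 amount to counting unique channel titles within each country. A minimal Python sketch of that grouping is shown below. The records and the function name channels_per_country are hypothetical, and the figures do not come from the dataset.

```python
from collections import defaultdict

# Hypothetical (country, channel_title) pairs, one per dataset row.
records = [
    ("US", "Channel A"),
    ("US", "Channel A"),
    ("US", "Channel B"),
    ("GB", "Channel A"),
    ("FRANCE", "Channel C"),
    ("FRANCE", "Channel D"),
    ("FRANCE", "Channel E"),
]


def channels_per_country(pairs):
    """Map each country to its number of unique channel titles."""
    channels = defaultdict(set)
    for country, channel in pairs:
        channels[country].add(channel)
    return {country: len(names) for country, names in channels.items()}


counts = channels_per_country(records)
# Sort from the most channels to the fewest.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

A channel that appears in more than one country is counted once in each of them, so the per-country counts can add up to more than the overall number of unique channels.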
Answer 8
Answer 9
Answer 10
The dataset contains 13 years of uploaded video data.
Answer 11
The number of videos uploaded in the month of December is 8544; the corresponding dashboard follows.
Answer 12
The largest number of videos was uploaded in GB in the year 2018.
Answer 13
The largest number of videos was uploaded to YouTube in the time frame 16:00 to 16:59.
Answer 14
The top three viewed video categories were 10, 24 and 1.
Answer 15
The least viewed categories are 44, 30 and 43.
Answer 16
The video that has the highest percentage of likes is “Childish Gambino-This is America.”
Analysis Tasks and Visualization
Answer 17
The highest percentage of dislikes is recorded against the video titled “So Sorry.”
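The report does not state how the percentage of likes or dislikes is defined. The sketch below assumes one plausible definition, likes as a share of all like and dislike reactions, and uses hypothetical video records. It is an illustration under that assumption, not the calculation performed by Watson Analytics.

```python
def like_percentage(likes, dislikes):
    """Return likes as a percentage of all reactions, or None if there are none."""
    total = likes + dislikes
    if total == 0:
        return None
    return 100.0 * likes / total


# Hypothetical videos; titles and counts are made up for illustration.
videos = [
    {"title": "Video A", "likes": 90, "dislikes": 10},
    {"title": "Video B", "likes": 30, "dislikes": 70},
    {"title": "Video C", "likes": 0, "dislikes": 0},
]

# Videos without any reactions have no defined percentage and are skipped.
rated = [v for v in videos if like_percentage(v["likes"], v["dislikes"]) is not None]
most_liked = max(rated, key=lambda v: like_percentage(v["likes"], v["dislikes"]))
most_disliked = min(rated, key=lambda v: like_percentage(v["likes"], v["dislikes"]))
print(most_liked["title"], most_disliked["title"])
```

Under this definition the percentage of dislikes is simply 100 minus the percentage of likes, so the video with the lowest like percentage is also the one with the highest dislike percentage.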
Answer 18
Friday is the day on which the largest number of videos was uploaded on YouTube.
Answer 19
The smallest number of videos was uploaded on Saturday.
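The busiest and quietest upload days can be found by mapping each publish date to its weekday and counting. The sketch below assumes publish dates in ISO format (YYYY-MM-DD), which may differ from the format in the dataset, and the dates themselves are hypothetical.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical publish dates, assumed to be in ISO format.
publish_dates = ["2018-05-04", "2018-05-11", "2018-05-05",
                 "2018-05-07", "2018-05-18"]


def weekday_name(iso_date):
    """Return the English weekday name for an ISO-formatted date string."""
    return DAY_NAMES[date.fromisoformat(iso_date).weekday()]


uploads_by_day = Counter(weekday_name(d) for d in publish_dates)
busiest_day, busiest_count = uploads_by_day.most_common(1)[0]
print(busiest_day, busiest_count)
```

Weekdays with no uploads at all do not appear in the counter, so finding the quietest day reliably requires checking all seven names in DAY_NAMES.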
Answer 20
The dashboard below shows the monthly breakdown of the uploaded videos.
From this dashboard, it can be stated that there was a significant increase in the rate of video uploads from November 2017. The upload rate then decreased by 50% in May 2018.
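A month-over-month change such as the 50% decrease can be expressed as the difference between two monthly counts divided by the earlier count. The sketch below illustrates the arithmetic with hypothetical monthly totals. The numbers are placeholders and are not taken from the dashboard.

```python
def percent_change(previous, current):
    """Return the percentage change from previous to current, or None if undefined."""
    if previous == 0:
        return None
    return 100.0 * (current - previous) / previous


# Hypothetical monthly upload counts, keyed by year and month.
monthly_uploads = {"2018-03": 800, "2018-04": 1000, "2018-05": 500}

months = sorted(monthly_uploads)
changes = {
    later: percent_change(monthly_uploads[earlier], monthly_uploads[later])
    for earlier, later in zip(months, months[1:])
}
print(changes)
```

A negative value indicates a decrease, so a result of -50.0 corresponds to uploads halving relative to the previous month.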
Insight 1
This advanced insight identifies the day on which users commented most on videos on the YouTube platform.
From the above dashboard, it can be stated that users commented on videos mostly on Fridays. Furthermore, the smallest number of comments was recorded on Saturdays.
Insight 2
In this dashboard, the total number of comments is compared for each country.
From the above dashboard, it is evident that the largest number of comments was recorded from GB. The second largest number of comments was recorded from US. The smallest number of comments was recorded from FRANCE.
Insight 3
This dashboard is designed to find the top 10 channels that were most disliked by viewers.
The top 10 disliked channels are YouTube Spotlight, LucasLucco, Logan Paul Vlogs, ChildishGambinoVEVO, Daily Caller, Bad Bunny, ShakiraVEVO, Salman Khan Films and two other channels.
Insight 4
In this dashboard, the most disliked video tags are identified in order to manage the content on the platform.
The most disliked tag is “BIGHIT|"빅히트"|"방탄소년단"|"BTS"|"BANGTAN"|"방탄".” The next two most disliked video tags are “Logan Paul Vlogs” and “Rewind, Rewind 2017.”
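The most disliked tag value quoted above appears to be a single string of several tags separated by the "|" character. As an alternative to treating the whole string as one tag, the sketch below splits each string into individual tags and totals the dislikes per tag. It assumes pipe-separated tags with optional surrounding quotes, and the records are hypothetical.

```python
from collections import Counter


def split_tags(raw):
    """Split a pipe-separated tag string and strip quotes and whitespace."""
    cleaned = (part.strip().strip('"') for part in raw.split("|"))
    return [tag for tag in cleaned if tag]


# Hypothetical (tags, dislikes) pairs, one per video.
videos = [
    ('BIGHIT|"BTS"|"BANGTAN"', 500),
    ('"BTS"|"music"', 200),
    ("vlog", 50),
]

dislikes_by_tag = Counter()
for raw_tags, dislikes in videos:
    # A set avoids counting a tag twice when it repeats within one video.
    for tag in set(split_tags(raw_tags)):
        dislikes_by_tag[tag] += dislikes

top_tag, top_dislikes = dislikes_by_tag.most_common(1)[0]
print(top_tag, top_dislikes)
```

Splitting the strings in this way attributes dislikes to each individual tag, which can give a different ranking from one based on the full tag string.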
Insight 5
In the following dashboard, the relation between dislikes and removed videos is analysed in order to find out whether uploaded videos were removed, with an error recorded, because of dislikes from viewers.
The above dashboard shows that the largest number of disliked videos was removed from channels in GB, and the second largest number was removed from channels in CANADA.
While designing the dashboards in IBM Watson, it was found that although GB has fewer channels than the other three countries, viewership from GB is higher than from FRANCE and CANADA.
Advanced Insights
In addition, along with the highest number of views, the number of comments on videos from GB is also higher than from the other countries.
This dashboard shows that the top three countries are GB, US and CANADA. Pie charts are used in the dashboards because this kind of visual representation helps display the comparative proportions of the sections of the dataset. In addition, because the whole pie is proportional to the total of the data or attributes, it helps summarize a large amount of data visually. Finally, a pie chart can be visually simpler to understand than other graphs.
- For GB, it is evident from the dashboards that viewership increased sharply from December 2017 and kept increasing through 2018. In comparison, the other three countries (US, FRANCE and CANADA) show slow and steady growth in the number of video views. Therefore, it is suggested to encourage channels to upload videos of relevant interest more frequently.
- Dislikes on videos lead to the removal of videos from the platform. Therefore, it is suggested to minimize dislikes by restricting the types and tags of videos that are not of interest to users, and to remove content that is disliked by viewers.
- The number of channels from GB should be increased in order to maintain the current trend, as views from this country are increasing exponentially.
- The growth rate for FRANCE, CANADA and US is slow compared to GB, so it is suggested to promote liked videos to improve the rate of growth.
To
The Manager
ABC online Multimedia Company
Respected Sir,
This letter is intended to brief you on the insights obtained with the Watson Analytics tool from the selected YouTube dataset. The dataset contains 13 years of data (2006 to 2018) and viewers from four countries: GB, FRANCE, US, and CANADA.
The analysis found that the most liked category was 10, which covers Music-related videos. Furthermore, the most liked video was titled “Childish Gambino – This Is America (Official Video),” and the most disliked video was titled “So Sorry.”
Considering the year 2018, the rate of video uploads in May decreased by 50% compared to the previous month.
From the analysis of the dataset, it is recommended to increase the number of channels in GB, as viewership there is increasing very rapidly and this can be very helpful in improving the business. Furthermore, the quality of vlog content must be managed relative to other categories.
Finally, we would like to request that you go through the recommendations and insights in order to improve the business of the organization.
Conclusion
With its natural language processing ability, Watson can find implicit insights and patterns in the dataset to get started with the analytics. Alongside its advantages, we found certain limitations while working with this tool. One is that the tool is not able to process structured data directly from the source. In addition, it cannot handle an increasing rate of data with the limited resources provided with it.
Watson Analytics is helpful in leveraging the benefits of advanced analytics without the complexity of writing extensive lines of code in a programming language. After uploading the data, we can select from the available starting points in order to create our own visualisations from the data. After selecting one of the starting points, we can create discoveries, each of which is fundamentally a collection of multiple visualisations. These visualisations can be modified by adding other data attributes and measures available for the selected dataset. The elements needed to make these changes are available in the data tray along the bottom of the screen.