According to Statista.com, there were 970 million social media users worldwide in 2010. That is far below the figure for the present year, which has risen to 2.77 billion. These statistics show that social media is a platform where more than a third of the world’s population meets, regardless of race, age, sexuality or social status. It is a free, real-time and popular platform that most people can access anytime and anywhere.
Most of the time, social media entertains its users. In some cases it has also served as a platform for spreading false information to the public. However, its massive user base can be beneficial when users share accurate information with one another, especially in times of disaster.
Twitter, introduced in 2006, offers microblogging, which allows a user to create short messages and share them online with other users. Twitter also popularized the hashtag, which became the most used and most recognizable feature of the platform.
Although tweets are limited to 280 characters, today’s users can also share videos, images, audio clips, GIFs and links.
Twitter’s more than a hundred million users can post anything they want to share. They may even use words that are related to disasters but carry a different meaning in context. This is the main goal of the research paper: to filter tweets, or microblogs, and separate those that are disaster related from those that are not. A simple keyword-based search for disaster-related microblogs returns a massive proportion of false positive tweets.
Finding useful tweets can be like looking for a needle in a haystack.
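The weakness of keyword search described above can be illustrated with a small sketch. The keyword list and sample tweets below are invented for illustration and are not taken from the paper; the point is only that words such as “fire” or “flood” often appear in tweets that have nothing to do with a disaster.

```python
import re

# Hypothetical keyword list and hand-labelled sample tweets (not from the paper).
DISASTER_KEYWORDS = {"fire", "flood", "storm", "earthquake"}

SAMPLE_TWEETS = [
    ("Fire crews are evacuating homes near the highway", True),
    ("This new mixtape is fire", False),
    ("Storm damage has cut power across the coast", True),
    ("My inbox is a flood of birthday wishes", False),
    ("Earthquake felt downtown, buildings shaking", True),
    ("Brainstorm session went well", False),
]


def keyword_match(tweet: str) -> bool:
    """Return True if the tweet contains any disaster keyword as a whole word."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    return not words.isdisjoint(DISASTER_KEYWORDS)


# Tweets the keyword filter flags even though they are labelled non-disaster.
false_positives = [
    text for text, is_disaster in SAMPLE_TWEETS
    if keyword_match(text) and not is_disaster
]

for text in false_positives:
    print("false positive:", text)
```

In this toy sample, two of the five tweets that match a keyword are not about disasters, which is the kind of noise a trained classifier is meant to reduce.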
To deal with this problem, the research used a machine learning approach to classify tweets, or microblogs, by disaster type. Unlike some previous studies, which focused on extracting information from foreseeable events or applied post hoc analysis to specific incidents, this study handles the problem in a general manner. The researchers also considered a more realistic setting in which the classifier must recognize tweets from unseen events.
The research makes three main contributions. The first is an analysis of the discriminative power of different features in classifying disaster microblogs. The second is a demonstration of the potential of cross-disaster-type training of classifiers. The last is to point out that conventional evaluation methods generate biased results.
Most studies on tweet classification suggest adding other sources of information to the bag-of-words approach in order to compensate for the brevity of the data. This research covers multiple types of events from different geographical locations and does not rely on specific locations only.
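For readers unfamiliar with the bag-of-words approach, the sketch below shows the basic idea: each tweet becomes a vector of word counts over a shared vocabulary. The tokenizer, function names and example tweets are assumptions made for illustration, not the feature extraction used in the paper.

```python
import re
from collections import Counter


def bag_of_words(tweet: str) -> Counter:
    """Count lower-cased tokens in a tweet; '#' and '@' are kept so that
    hashtags and mentions remain distinct tokens."""
    return Counter(re.findall(r"[a-z0-9#@']+", tweet.lower()))


def vectorize(tweets):
    """Return a sorted vocabulary and one count vector per tweet."""
    bags = [bag_of_words(t) for t in tweets]
    vocabulary = sorted(set().union(*bags))
    vectors = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, vectors


# Hypothetical example tweets.
vocabulary, vectors = vectorize([
    "Flood warning for the river",
    "the river is rising fast",
])
print(vocabulary)
print(vectors)
```

Because a tweet contains only a handful of words, most entries in each vector are zero, which is why studies look to additional sources of information.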
This research focused on two microblog classification tasks. The first is binary: “disaster or not”. The second is the disaster type, where tweets are assigned to classes such as non-disaster, storm, flooding, earthquake, fire and other. The researchers used Support Vector Machines (SVMs) in their tweet classification experiments. They randomly sampled 6,500 tweets posted over a two-year period. Before the experiments, the tweets were annotated manually by three annotators hired through CrowdFlower, a crowdsourcing service built on Amazon Mechanical Turk. Of the 6,500 tweets, 5,747 remained in the final data set: 2,850 were considered disaster related and the remaining 2,897 were not.
The researchers also ran experiments to test how effectively the classifiers identify whether a tweet is disaster related. They considered three factors that affect the effectiveness of classification. The first is the effect of training on preceding incidents on the classification of new ones. The second is the effect of using incident-specific or generic features on classification accuracy. The third is the effect of training set size.
In this study, the researchers examined the difficulty of filtering microblogs. They presented a classification framework for separating disaster-related microblogs from unrelated ones. They also classified tweets according to disaster type, such as earthquake, fire, flooding, storm and other. Comparing features of a more general nature with incident-related features, they found that the two kinds have different applications: general features suit early identification of disaster content, while incident-related features suit gathering Twitter information on a known event.
The researchers also highlighted an evaluation issue that matters for similar studies. Conventional evaluation methods such as cross-validation are not fair because they treat the complete data set as a mixed bag. The researchers recommended time-split evaluation, in which only earlier tweets are used for training and later tweets are held out for testing.
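The difference between the two evaluation styles can be sketched as follows. This is a minimal illustration that assumes each tweet is a (date, text, label) tuple; the data, the cutoff date and the function names are hypothetical and do not reproduce the paper’s experimental setup.

```python
import random
from datetime import date

# Hypothetical (date, text, label) records.
TWEETS = [
    (date(2011, 1, 12), "flood waters rising in the city", "flooding"),
    (date(2011, 2, 22), "huge earthquake just hit", "earthquake"),
    (date(2011, 6, 3), "what a storm of emails today", "non-disaster"),
    (date(2012, 1, 5), "bushfire near the national park", "fire"),
    (date(2012, 3, 9), "river has burst its banks", "flooding"),
    (date(2012, 7, 1), "this party is on fire", "non-disaster"),
]


def time_split(tweets, cutoff):
    """Train on tweets posted before the cutoff, test on the rest."""
    ordered = sorted(tweets, key=lambda t: t[0])
    train = [t for t in ordered if t[0] < cutoff]
    test = [t for t in ordered if t[0] >= cutoff]
    return train, test


def random_split(tweets, test_fraction=0.5, seed=0):
    """Mixed-bag split that ignores time, as in one fold of cross-validation."""
    shuffled = list(tweets)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def leaks_future(train, test):
    """True if any training tweet is dated on or after the earliest test tweet."""
    return max(t[0] for t in train) >= min(t[0] for t in test)


train, test = time_split(TWEETS, cutoff=date(2012, 1, 1))
print("time split leaks future data:", leaks_future(train, test))

r_train, r_test = random_split(TWEETS)
print("random split leaks future data:", leaks_future(r_train, r_test))
```

A time split can never place a later tweet in the training set, whereas a random split usually does, which lets the classifier see tweets from the very incidents it is later tested on.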
In the concluding part of the study, the researchers noted that classifying short messages without context is hard. Providing contextual information from other tweets and knowledge from other sources improves the accuracy of classification.
Twitter can be a helpful platform for everyone in society in times of disaster. In this era of social innovation, asking for help and helping others during an incident will be much easier. Hashtags and mentions can be beneficial tools for doing good. We must first, however, organize and classify tweets to find the ones that are useful.
References
- Karimi, S., Yin, J., & Paris, C. (2013). Classifying microblogs for disasters. Proceedings of the 18th Australasian Document Computing Symposium (ADCS ’13). doi:10.1145/2537734.2537737