Discussion of Concern Regarding Big Data Mining
Data mining is the technique used to extract specific information from a volume of data. Big Data mining is the same technique applied when the volume of data is very large, such as organizational or national data. It is used for a number of purposes, including the extraction of past statistics, the identification of a particular person from available identity information, and the retrieval of specific details about an individual, such as a phone number. While Big Data mining is an important technique for purposes such as crime investigation and statistical data analysis, some people have raised concerns about the privacy issues that may arise from it.
The main aim of this essay is to discuss the issues that may arise from Big Data mining and how they can be overcome. The quote under discussion is “As long as you have not done anything wrong, you shouldn’t be concerned about big data mining today”. The essay examines Big Data mining in order to support or refute this quotation.
Big Data, Data Mining, Big Data Mining
Big data and data mining are two different concepts, although both relate to the use of large data sets for collecting data and for reporting it to the business or other recipients. The two terms refer to different elements of data operations. Big data refers to data sets that are large in size, so large that they outgrow the simple databases and data handling architectures used in earlier times, when handling data at this scale was more expensive and less feasible. Data mining, by contrast, refers to the activity of searching large data sets for data that is relevant or pertinent. This activity is a good example of the old axiom of “looking for a needle in a haystack”. The basic idea is that businesses collect massive sets of data, either automatically or homogeneously. Decision makers then need to access smaller, more specific pieces of data from the larger sets. They use data mining to uncover the pieces of information that will inform the leadership and help chart a course for the business. Big data mining may involve various kinds of software packages, including analytic tools.
Applications of Big Data Mining
Clear Goals: The business should be clear about what it wants to gain from data mining. Success in big data mining is possible only when there are a few tangible, definite goals, such as finding out which products are least popular and how the situation can be improved. Analysing customers’ activity on social media, together with their feedback on loyalty programs, can provide a trove of information about how well the inventory matches customers’ needs and requirements.
Relevant Data Sources: The sources of data should be relevant, so that the data contains no duplicates and the results are not unimportant. Big data mining provides credible results only when the data collected is relevant and comes from relevant sources. Determining which information is relevant is not enough, however; the size of the dataset should also stay close to the minimally appropriate level. Where duplicates exist, semantic analysis should be used to locate keywords and detect any plagiarism. Duplicate publications can also be compared to find the earliest one.
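The duplicate check described above can be illustrated with a minimal Python sketch. It is only an approximation: a simple keyword-overlap score (Jaccard similarity) stands in for full semantic analysis, and the record fields `id`, `text` and `published`, the function names and the 0.8 threshold are all assumptions made for this example rather than part of any particular tool.

```python
import re
from datetime import date


def keywords(text, min_len=4):
    """Return the set of lower-cased words of at least min_len letters.

    A crude stand-in for semantic analysis: short words are ignored.
    """
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len}


def jaccard(a, b):
    """Share of keywords common to both sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(records, threshold=0.8):
    """Return (earlier_id, later_id) pairs whose keyword overlap meets the threshold.

    Each record is a dict with hypothetical fields 'id', 'text' and 'published'.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if jaccard(keywords(a["text"]), keywords(b["text"])) >= threshold:
                # Order the pair so that the earliest publication comes first.
                earlier, later = sorted((a, b), key=lambda r: r["published"])
                pairs.append((earlier["id"], later["id"]))
    return pairs
```

Comparing every pair of records is adequate for a small sample but would not scale to genuinely large data sets, where some form of indexing or hashing would be needed.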
The major concerns regarding Big Data have also been highlighted by various researchers, who recommend rethinking how Big Data mining is applied in order to maintain the discretion and privacy of users.
According to Andrejevic, Big Data is now applied in a wide range of systems, such as social media, IoT devices and mobile services. In many of these systems, users enter entirely personal information that they do not want to make public. These websites and systems are designed to provide maximum privacy to their customers; hence, it is important to maintain the privacy of these users rather than use data mining techniques to extract their personal data.
Lee and Lee studied the data privacy problems that are the main sources of concern for general users. All the information is stored in a database of the website within a Big Data system. Users enter this information only when they trust the website, and they intend it to be stored for profile verification only. However, since all of this data is entered through an online interface, it is comparatively easy to breach the security of the online storage (cloud) and steal information from it. Many third-party entities have strong big data mining techniques that can break through minor protections in the online storage and steal information from it. As a result, online users not only fear exposure of their private information but may also suffer large losses if their bank accounts and payment cards are breached and misused.
Applications of Big Data Mining
The major applications of Big Data mining in various sectors are discussed as follows.
Crime Investigation – One of the main applications of Big Data mining is crime investigation. Data mining helps investigators match evidence with suspected objects or people, which in turn helps to identify the criminal or build a suitable case against him. This process is further facilitated by digitalization, which requires all citizens of a country to upload and store their identity in a national digital database.
Terrorism Control – Another major application of Big Data mining that has recently come into use is terrorism control. Government agencies use data mining techniques to detect the locations of terrorists with the help of various databases. For instance, following the 9/11 incident, extensive data mining was conducted in order to identify and locate the terrorists behind it. These terrorists were later caught or killed, but data mining did not put a stop to terrorist activities, which continued afterwards.
Data mining also has many other applications, such as the VAR system in football matches, market analytics, machine learning and e-commerce (online marketplaces).
Privacy and Ethical Concerns Associated with Big Data Mining
In spite of all the benefits and applications of Big Data mining, several concerns have been raised about the privacy and security problems that may arise from it. The major concerns are explained below.
Security of Personal Data – When using websites such as social media, professional and service websites, people need to provide personal information such as phone number, email address, and bank account and payment card details (if online payment is involved). All the information is stored in a database of the website within a Big Data system. Users enter this information only when they trust the website, and they intend it to be stored for profile verification only. Hence, users will always prefer not to disclose any such information to a third party. However, a third party can use Big Data mining techniques to extract such data from the website and use it for unethical purposes. It is therefore natural for users to feel concerned about Big Data mining when large amounts of their personal data are uploaded through online interfaces.
Mining for Ethical Reasons / Investigations – Authorized data mining is generally conducted as part of an investigation, in which Big Data is mined to extract information about a suspect or other people related to the investigation. For instance, in a crime investigation with one or more suspects, their data and information (e.g. call records, evidence pictures and videos) are mined from Big Data such as the telephone records of the cellular network service provider, data stored on devices like phones and tablets, and other similar sources. The quotation mentioned earlier says that there is no need to be concerned if a person has not done anything wrong. This is partially correct, as that person will not be convicted of anything if he is proved innocent after data mining. On the other hand, even if there is nothing to fear, there is private and personal information that users do not want to show anyone else, including personal family photos, videos and messages. Thus, even if data mining is done in an ethical manner and for a specific purpose, users will feel threatened that their personal activities will be exposed to the public. Moreover, the fear of exposure is the main reason why users protect their personal profiles and folders with passwords. It is therefore justified for users to feel concerned about Big Data mining irrespective of whether they have done anything wrong.
Medical Data – Modern medical information systems have been developed to keep online medical records of patients so that, during an appointment, the doctor can easily access the patient’s online profile and read the medical documents in it. However, these documents may contain sensitive information, such as a deformity in the patient’s body or a sexually transmitted disease, that the patient does not want to share with anyone other than the doctor. In the event of a medical data breach, this data may be compromised and the patients’ personal information exposed. Hence, users can rightly be concerned about Big Data mining.
According to Mehmood et al., the concerns regarding Big Data mining are more ethical than technical. While there are valid concerns about the security of the information stored by online Big Data systems, the main issues are the privacy of the information and the ethical implications that may hurt the interests of users. Even when there are valid reasons for Big Data mining, users may raise the following questions about the procedure:
Who will be responsible for ensuring privacy of the users?
How is the application of Big Data Mining approved and under what circumstances?
If there are other ways to collect necessary information, why is it necessary to extract personal data of the users?
If the data mined from the Big Data is later proved to be useless, who will be responsible for having disclosed personal information to the public?
What is the meaning of privacy if such information is mined for use?
All of these questions reflect the ethical nature of the concerns that users have about Big Data mining. Certain websites and applications are meant to provide users with absolute security and privacy when they enter their own information. Furthermore, some users capture personal moments or keep entirely personal things that they do not want other people to view or know about, in order to avoid embarrassment or detailed queries. In addition, these personal captures often involve other people. Hence, it is important to ensure that their private information remains private and secure from access by any other person. Users are provided with passwords and other authentication protections so that no other user can access their personal data. However, if data mining is carried out by a higher authority such as the police or the government, it is possible to extract the information from the profiles, and even if the user has committed no crime or wrong, his privacy will be breached by the Big Data mining process. Furthermore, data mining gives no assurance of anything; it only provides the miner with some pieces of data, and the rest of the outcome depends on the analysis method and the decisions made on the basis of the analysis.
Conclusion
This essay has discussed Big Data, Big Data mining and its ethical implications for users’ privacy. As a reference, a common quote about the privacy of users in Big Data mining has been examined. The quote states that users have no reason for concern about Big Data mining if they are doing nothing wrong. However, the argument of this essay is that Big Data mining breaches the basic right to privacy of ordinary people irrespective of whether they have done anything wrong. For the sake of argument, some may say that Big Data mining is done for valid purposes and to achieve a particular approved goal, but justifying the reason for mining is not sufficient, as it has deep-lying ethical implications. If sufficient information privacy is not provided to users, the basic rights of a human being have no value. One recommended process for allowing Big Data mining is to obtain approval from users on whether their personal information and files may be accessed in case of urgent requirements such as investigations. It is also important to take suitable steps to ensure that personal information is not disclosed to the public and that security is maintained throughout the mining process.
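The two recommendations above, obtaining the user's approval and keeping personal information undisclosed during mining, could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions: the `consent` flag, the field names, the `SECRET_KEY` value and both function names are hypothetical, and replacing identifiers with keyed hashes (pseudonymization) reduces exposure but does not by itself make the data anonymous.

```python
import hashlib
import hmac

# Hypothetical secret held by the data custodian; a real key would be
# generated randomly and stored securely, not written in the source code.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(value, key=SECRET_KEY):
    """Replace an identifier with a keyed hash so the raw value is not disclosed."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def release_for_mining(records, direct_identifiers=("name", "phone")):
    """Return only the records of users who gave approval, with identifiers hidden.

    Each record is a dict; the 'consent' field is an assumed flag that records
    whether the user has approved access to their data.
    """
    released = []
    for rec in records:
        if not rec.get("consent", False):
            continue  # no approval: the record is never handed to the miner
        safe = {}
        for field, value in rec.items():
            if field == "consent":
                continue
            if field in direct_identifiers:
                safe[field] = pseudonymize(str(value))
            else:
                safe[field] = value
        released.append(safe)
    return released
```

Because the same input always yields the same pseudonym under a given key, records belonging to one person can still be linked for analysis without revealing who that person is.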
References
Aggarwal, Charu C., and Jiawei Han, eds. Frequent pattern mining. Springer, 2014.
Al Nuaimi, Eiman, Hind Al Neyadi, Nader Mohamed, and Jameela Al-Jaroodi. “Applications of big data to smart cities.” Journal of Internet Services and Applications 6, no. 1 (2015): 25.
Andrejevic, Mark. “Big data, big questions| the big data divide.” International Journal of Communication 8 (2014): 17.
Bello-Orgaz, Gema, Jason J. Jung, and David Camacho. “Social big data: Recent achievements and new challenges.” Information Fusion 28 (2016): 45-59.
Crawford, Kate, and Jason Schultz. “Big data and due process: Toward a framework to redress predictive privacy harms.” BCL Rev. 55 (2014): 93.
Cuzzocrea, Alfredo. “Privacy and security of big data: current challenges and future research perspectives.” In Proceedings of the First International Workshop on Privacy and Secuirty of Big Data, pp. 45-47. ACM, 2014.
Gudivada, Venkat N., Ricardo Baeza-Yates, and Vijay V. Raghavan. “Big data: Promises and problems.” Computer 3 (2015): 20-23.
Hashem, Ibrahim Abaker Targio, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Mokhtar, Abdullah Gani, and Samee Ullah Khan. “The rise of “big data” on cloud computing: Review and open research issues.” Information Systems 47 (2015): 98-115.
Hashem, Ibrahim Abaker Targio, Victor Chang, Nor Badrul Anuar, Kayode Adewole, Ibrar Yaqoob, Abdullah Gani, Ejaz Ahmed, and Haruna Chiroma. “The role of big data in smart city.” International Journal of Information Management 36, no. 5 (2016): 748-758.
Kambatla, Karthik, Giorgos Kollias, Vipin Kumar, and Ananth Grama. “Trends in big data analytics.” Journal of Parallel and Distributed Computing 74, no. 7 (2014): 2561-2573.
Lee, In, and Kyoochun Lee. “The Internet of Things (IoT): Applications, investments, and challenges for enterprises.” Business Horizons 58, no. 4 (2015): 431-440.
Mehmood, Abid, Iynkaran Natgunanathan, Yong Xiang, Guang Hua, and Song Guo. “Protection of big data privacy.” IEEE access 4 (2016): 1821-1834.
Terzi, Duygu Sinanc, Ramazan Terzi, and Seref Sagiroglu. “A survey on security and privacy issues in big data.” In Internet Technology and Secured Transactions (ICITST), 2015 10th International Conference for, pp. 202-207. IEEE, 2015.
Thuraisingham, Bhavani. “Big data security and privacy.” In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, pp. 279-280. ACM, 2015.
Tsai, Chun-Wei, Chin-Feng Lai, Han-Chieh Chao, and Athanasios V. Vasilakos. “Big data analytics: a survey.” Journal of Big Data 2, no. 1 (2015): 21.
Wamba, Samuel Fosso, Shahriar Akter, Andrew Edwards, Geoffrey Chopin, and Denis Gnanzou. “How ‘big data’ can make big impact: Findings from a systematic review and a longitudinal case study.” International Journal of Production Economics 165 (2015): 234-246.
Wu, Xindong, Xingquan Zhu, Gong-Qing Wu, and Wei Ding. “Data mining with big data.” IEEE transactions on knowledge and data engineering 26, no. 1 (2014): 97-107.
Xu, Lei, Chunxiao Jiang, Jian Wang, Jian Yuan, and Yong Ren. “Information security in big data: privacy and data mining.” IEEE Access 2 (2014): 1149-1176.
Xu, Lei, Chunxiao Jiang, Yan Chen, Jian Wang, and Yong Ren. “A framework for categorizing and applying privacy-preservation techniques in big data mining.” Computer 49, no. 2 (2016): 54-62.
Yoon, Kyunghee, Lucas Hoogduin, and Li Zhang. “Big Data as complementary audit evidence.” Accounting Horizons 29, no. 2 (2015): 431-438.