Background of the study
Big data refers to the massive volumes of data that digital information organizations and governments need to process. It has been estimated that approximately 2.5 quintillion bytes of data are processed worldwide each day, and that this volume is increasing at a geometric rate[1]. The growth rate is so high that around 90% of all data has been created since 2015[2]. Security in Big data comprises cryptographic encryption, securing data held by third-party cloud vendors, real-time monitoring and audit logs, granular access control, data provenance, and scalable privacy-preserving filtering over digitally secured communication.
The aim of this research is to analyse, evaluate and assess security in Big data, and to identify the security and privacy challenges of implementing it.
- To analyse security measures in Big data
- To evaluate the implementation process of security in Big data
- To assess the degree of total security that such security measures can achieve for Big data
- To identify privacy and security challenges of implementing security measures in Big data
With the increased use of cloud-based Big data, security and privacy issues are being magnified by attacks from external hackers and by internal personnel who access data they are not authorized to see, both of which pose serious threats to Big data.
Since companies store important and sensitive information in Big data, it is necessary to protect it with security features. If such features are not implemented, hackers might obtain internal information, leading to serious damage to the company.
Approximately 70% of companies handle sensitive data such as payment card information, personal identity information and protected health information via cloud services. 52% of organizations use encryption and 21% use tokenization. 26% of companies use a Hadoop distribution, 27% use Cloudera, 15% use Hortonworks, 10% use MapR and 3% use an IBM system[3]. The remaining 19% of companies do not use any form of security feature to protect their Big data. Hence, their data is susceptible to attacks from hackers and needs security.
The current research will analyse security measures in Big data and evaluate the process of implementing them. It will also assess the total security achieved through such measures and identify the privacy and security challenges that arise in implementing them.
This chapter provides a brief overview of the existing problems and issues as well as the current solutions used to mitigate them effectively.
Regular reviews of user access to data | Companies using Big data should regularly review data access with their stakeholders. Everyone authorized to access data should be reviewed against their job description to ensure that their access does not harm the security of the Big data[4]. Such a review can also be based on the person's work or employment responsibilities within the company. Furthermore, employees leaving the company should no longer be given access to such Big data. Additionally, employees should be prohibited from accessing confidential data outside the company demilitarized zone (DMZ) for any unofficial purpose or without prior notification[5].
Masking of data | Sensitive company data, especially names, social security numbers and similar identifiers, should be masked or redacted. Such masking ensures that the sensitive values in the Big data are not disclosed outside the company. Data masking is essential where companies sell their Big data to third parties, since masking is what protects the data in such cases.
Data encryption | Data should be encrypted so that unauthorized personnel cannot read it without the proper decryption key. Encrypting Big data is essential for keeping the data usable whenever required while maintaining adequate security[6]. Encryption also helps protect data from interception or tampering while it is transferred over the internet.
User behavior monitoring | Big data is also secured by monitoring user behavior: the system observes the habits of each user and constructs a model for each user accessing the data[7]. Any anomaly in a user's access is treated as a threat by the user behavior monitoring system. When a threat is detected from the usage pattern, immediate action is taken to prevent that user from accessing the Big data. On average, a company is notified of a security breach only after about 250 days, and this notification is usually facilitated by user behavior monitoring, which tracks each person's access to data against their usual access pattern[8].
Prioritising data security | Data security is given prime importance through governance of data by the proper authorities, which ensures that it is treated as the highest priority and maintained on a regular basis. Data security is also kept at the top of the agenda in company meetings and in the company's data policy, which is updated annually.
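The masking measure listed above can be illustrated with a minimal Python sketch. It assumes social security numbers in the US `AAA-GG-SSSS` layout and records held as dictionaries with hypothetical `name` and `ssn` fields; the function names are illustrative only and are not taken from any product discussed in this proposal.

```python
import re

# Assumption: SSNs follow the US "AAA-GG-SSSS" layout.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Replace every SSN found in text, keeping only the last four digits."""
    return SSN_PATTERN.sub(r"***-**-\1", text)


def mask_name(name: str) -> str:
    """Keep the first letter of each name part and star out the rest."""
    return " ".join(part[0] + "*" * (len(part) - 1) for part in name.split())


def mask_record(record: dict) -> dict:
    """Return a masked copy of a record; the input record is left unchanged."""
    masked = dict(record)
    if "name" in masked:
        masked["name"] = mask_name(masked["name"])
    if "ssn" in masked:
        masked["ssn"] = mask_ssn(masked["ssn"])
    return masked


# Hypothetical record used only to demonstrate the functions.
original = {"name": "Jane Doe", "ssn": "123-45-6789", "city": "Leeds"}
print(mask_record(original))
# {'name': 'J*** D**', 'ssn': '***-**-6789', 'city': 'Leeds'}
```

Masking of this kind is one-way: the original values cannot be recovered from the masked copy, which is why it suits data that leaves the company.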
Big data can be secured through Vormetric solutions, which maximize the security of sensitive data used in Big data analytics and thus help address compliance requirements. Granular control, comprehensive coverage and robust encryption are required to secure Big data in a volatile environment, as result analysis indicates[9]. A single security solution covers the major areas, which helps in controlling and optimizing compliance. Vormetric data security provides encryption of Big data, access control and key management, offered as products built on an extensible infrastructure.
Vormetric data security also helps generate data security intelligence about processes, users and applications. A broad range of structured and unstructured sources used in Big data initiatives is covered when implementing security features in a company database[10]. Depending on the data security needs, two types of Vormetric data encryption can be implemented: transparent and application-based.
Aim of the study
Vormetric transparent encryption helps in controlling access to data and encrypting it at the file-system level. This kind of encryption solution is easy to deploy, as no significant changes to applications are required[11]. Transparent encryption is easily deployed on main servers encrypting outputs of big data controlling
Vormetric application encryption encrypts specific columns in an application before those fields are written to the database. This type of encryption ensures that a specific column remains inaccessible and cannot be read once it is incorporated into the Big data framework. Analytic applications are built with the encryption product applied to specific fields.
Audit logs are maintained, analysed and monitored across Big data clusters, using features from Apache Oozie. This is usually done by examining and monitoring the logs created when users access files. Software and hardware should also be configured according to the server and system architecture, so that machines are patched and upgraded while privileges are restricted to a small number of users. Automation frameworks such as Puppet and other automated configuration systems are also applied to Big data servers to keep them secure and uniform across the enterprise.
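The examination of logs created from user-accessed files can be sketched as follows. This is a minimal illustration, not the Apache Oozie mechanism itself: the log format, the sample entries, the per-user baseline and the `factor` threshold are all assumptions made for the example.

```python
from collections import Counter

# Assumption: each audit-log line reads "<timestamp> <user> <action> <path>".
SAMPLE_LOG = """\
2017-03-01T09:00:01 alice READ /data/sales/q1.csv
2017-03-01T09:00:05 bob READ /data/hr/payroll.csv
2017-03-01T09:00:09 bob READ /data/hr/reviews.csv
2017-03-01T09:00:12 bob READ /data/finance/cards.csv
2017-03-01T09:01:30 alice WRITE /data/sales/q1.csv
"""


def parse_line(line: str) -> dict:
    """Split one log line into its four fields."""
    timestamp, user, action, path = line.split(maxsplit=3)
    return {"timestamp": timestamp, "user": user, "action": action, "path": path}


def accesses_per_user(log_text: str) -> Counter:
    """Count how many file accesses each user made."""
    return Counter(
        parse_line(line)["user"] for line in log_text.splitlines() if line.strip()
    )


def flag_unusual_users(counts: Counter, baseline: dict, factor: float = 2.0) -> list:
    """Return users whose access count exceeds factor times their usual count.

    Users missing from the baseline have a usual count of 0 and are always flagged.
    """
    return sorted(u for u, n in counts.items() if n > factor * baseline.get(u, 0))


counts = accesses_per_user(SAMPLE_LOG)
# Hypothetical baseline: usual number of accesses per user in the same period.
print(flag_unusual_users(counts, {"alice": 2, "bob": 1}))  # ['bob']
```

A fixed multiple of a per-user baseline is the simplest possible rule; a production system would more likely build a statistical model per user, as the user behavior monitoring measure describes.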
Programming tools are used to encrypt data stores including Hadoop and NoSQL, along with other application software such as Apache Accumulo .20.20x version, to address this challenge. DataStax Enterprise and Cloudera Sentry are application software that help strengthen the application layer[12]. Compared with a high-end database server, a Big data environment is found to be more complicated and more vulnerable to external attack.
Security in Big data is achieved by using security information and event management (SIEM) along with technologies of a similar grade. The security tools used for Big data analysis fall into two functional categories: SIEM, and performance and availability monitoring, abbreviated as PAM[13]. Management operations focus on log management, behavioral analysis and application monitoring. Big data analytics tools integrate PAM and SIEM, which helps in collecting huge volumes of data, integrating them and analysing them in real time, with additional capabilities.
The security achieved through Big data analytics improves the quality of the data collected and integrates security tools with third-party LDAP and Active Directory servers, supporting incident response workflows among SIEM tools. A major feature of these security measures is scalability, which helps in collecting data in real time through deep packet analysis across the network and the system[14]. Events are correlated across platforms, which helps in obtaining system logging information and in differentiating events that occur within a short period.
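The idea of correlating events across platforms within a short period can be sketched as below. This is a simplified illustration rather than the behavior of any particular SIEM tool: the `Event` fields, the five-minute window and the sample events are assumptions, and the rule used (same user, more than one source, inside one window) is only one of many possible correlation rules.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    time: datetime
    source: str  # platform that logged the event, e.g. "firewall"
    user: str
    kind: str


def correlate(events, window=timedelta(minutes=5)):
    """Group each user's events that fall within `window` of the group's first
    event, and keep only groups that span more than one source."""
    by_user = {}
    for event in sorted(events, key=lambda e: e.time):
        by_user.setdefault(event.user, []).append(event)

    incidents = []
    for user_events in by_user.values():
        group = [user_events[0]]
        for event in user_events[1:]:
            if event.time - group[0].time <= window:
                group.append(event)
                continue
            if len({g.source for g in group}) > 1:
                incidents.append(group)
            group = [event]  # start a new window at this event
        if len({g.source for g in group}) > 1:
            incidents.append(group)
    return incidents


# Hypothetical events from three platforms.
events = [
    Event(datetime(2017, 3, 1, 10, 0), "firewall", "mallory", "port_scan"),
    Event(datetime(2017, 3, 1, 10, 0), "ldap", "alice", "login"),
    Event(datetime(2017, 3, 1, 10, 2), "hadoop", "mallory", "bulk_read"),
    Event(datetime(2017, 3, 1, 10, 4), "ldap", "mallory", "failed_login"),
    Event(datetime(2017, 3, 1, 11, 0), "ldap", "alice", "login"),
    Event(datetime(2017, 3, 1, 12, 0), "firewall", "mallory", "port_scan"),
]
incidents = correlate(events)
for incident in incidents:
    print(incident[0].user, [e.source for e in incident])
# mallory ['firewall', 'hadoop', 'ldap']
```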
Visualization and reporting help security professionals support compliance and operations reporting. Dashboards are available that show key performance indicators such as the degree of security provided, web server packet details and network analysis. Because storage in Big data is persistent, it also enables long-term archival storage. The batch-processing computational model is resistant to failure and suited to I/O-intensive processing[15]. The Splunk Hunk platform helps in visualizing and analysing NoSQL databases and Hadoop; it sits between the non-relational data stores and the rest of the application environment, integrating the two[16]. The measures can also detect intrusions that destabilize the environment and system behavior through malicious activity.
Objectives of the study
The security features provided by security measures can leave the web interface insecure. Exploiting the administrative web interface allows an attacker to gain unauthorized control over Internet of Things (IoT) devices[18]. Insufficient authorization also allows attackers to exploit poor password policies and break weak passwords to gain privileged access to IoT devices, which lets malware jump from the network to those devices.
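To make the notion of a poor password policy concrete, the following minimal Python sketch checks a candidate password against a few rules. The rule set, the minimum length of 12 and the short list of common passwords are assumptions chosen for illustration; they are not drawn from the cited sources or from any specific standard.

```python
import string
from typing import List

# Assumptions for illustration only: minimum length and a tiny deny-list.
MIN_LENGTH = 12
COMMON_PASSWORDS = {"password", "admin", "123456", "letmein"}


def policy_violations(password: str) -> List[str]:
    """Return the list of policy rules that the password breaks."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append("too short")
    if password.lower() in COMMON_PASSWORDS:
        problems.append("commonly used password")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c in string.punctuation for c in password):
        problems.append("no punctuation character")
    return problems


print(policy_violations("admin"))
# ['too short', 'commonly used password', 'no uppercase letter', 'no digit', 'no punctuation character']
print(policy_violations("Correct-Horse-Battery-9"))
# []
```

A default credential such as `admin` fails most of these rules at once, which is the kind of weakness the paragraph above describes attackers exploiting.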
Cloud security features provided by third-party licensed software can also conflict with the existing infrastructure and system architecture, crashing the entire database because of data and system incompatibility. With insufficient knowledge of the host system's architecture, the security application may also take unwarranted steps, such as deleting an existing database on the mere suspicion of malicious activity. An insecure mobile interface also allows attackers to breach the demilitarized zone (DMZ) and take control of the device through its configuration mechanism. Unauthenticated and unencrypted connections can be exploited to take control of software or firmware by hijacking IoT device updates. If malicious attackers gain access to the system, they might corrupt the entire database and crash system memory. Attackers can also reach the host system by jumping the air gap through flash drives that unauthorized personnel connect to the system network.
As security measures are enhanced, deploying open source tools has become an emerging trend, and single applications are often distributed across many machines in a cluster, which makes configuration management challenging. Production analytics in a Big data cluster is spread across numerous, incompatible JSON, XML and text files. A further complication is that clusters must be set up, configured and patched so as to close any security holes. Organisations concerned about the security of their Big data have started deploying Apache Ranger to fill the current vacuum. In the perimeter-based security model, network security within Big data is lacking and depends on perimeter-based security sources. Such sources assume that any service provider outside the perimeter cannot be trusted and hence cannot be used by any means[19]. NoSQL database servers are often installed to meet the security needs of Big data but, owing to rapid market growth and lack of knowledge, critical components are not optimized. NoSQL cannot be configured effectively because of its program and application complexity.
This chapter also provides a brief outline of the research method to be adopted. A research onion is provided to highlight the steps of the research to be carried out. The research philosophy is presented to help judge which paradigm should be adopted for this research. The research approach, research design, data collection process, sample size and sampling techniques proposed for the research are also discussed. The ethical considerations to be followed in the study are also provided.
The research philosophy proposed for this research is post-positivism, with a deductive research approach. The research design that will be followed is exploratory. Data collection will combine quantitative data collection via surveys and questionnaires with qualitative research via interviews with former professional IT managers and experts. Probability sampling will be followed, with inclusive and exclusive sampling. The sample will consist of 51 employees using Big data in their respective organizations and 5 former IT managers and experts with vast experience at reputable organizations.
Rationale of the study
The research onion provides a detailed outline of the various research methods that need to be followed. Hence, it provides a planned sequence of progression throughout the research. The different stages are described in detail in order, helping the researcher to adopt a suitable research method for a particular piece of research.
The research philosophy adopted for this research will be post-positivism, which helps in critically determining reality and ensures that all theories can be justified with the necessary support. It also holds that every theory contains errors and can be revised because of its fallible nature[21].
Other philosophies are not selected because their scope is irrelevant or too limited for this research. Philosophies like realism cannot give accurate results here, as they consider the incommensurability of various perspectives, whereas different Big data security systems have different working procedures and security features.
A deductive research approach will be selected as it is a narrower approach. In a deductive approach, the research starts from a generalized view and progresses towards a specific conclusion. Thus, a deductive approach is important for obtaining the desired results by exploring different security measures in Big data.
By contrast, the inductive approach will not be selected as it is a wider approach. In an inductive approach, research proceeds from a specific view to a more generalized one; that is, the research is drawn from previous research and progresses towards a conclusion drawn from those earlier studies. Hence, the inductive approach will be rejected in this research.
An exploratory research design will be selected as it focuses on discovering insights and ideas about Big data security rather than on statistically accurate data collection. Exploratory research is well suited to implementing the research plan, as it is appropriate for beginning a piece of research[22].
An explanatory research design is not selected as it is oriented towards explaining terms, which is not relevant to this research. Explanatory research also tends to describe and explain the research rather than provide insight and analyse the research findings.
Data will be collected by both primary and secondary methods. Quantitative primary data will be collected through surveys and closed-type online questionnaires, which will be sent to participants electronically by email. The surveys and questionnaires will be administered to 51 employees of different organizations using Big data. The analysed responses will provide detailed insight into these matters. Qualitative primary data will be collected through interviews with retired professionals and experts of the IT sector, including 5 managers.
Secondary data will be collected from databases of books, journals and websites of various Big data software, as well as archives containing information about their security features and impacts.
Probability sampling will be used as the sampling method, with inclusive and exclusive sampling as the sampling techniques. Inclusive sampling will help the researcher to include the relevant data, and exclusive sampling will enable the researcher to dismiss the irrelevant data[23].
Literature Review
Inclusive sampling is used to select data from the range of data to be collected in the research. Exclusive sampling is required to exclude irrelevant data collected through both quantitative and qualitative techniques.
The sample selected for this research includes 51 employees of IT firms. Qualitative data will be collected from 5 retired IT experts and professionals of reputed organizations.
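The combination of inclusion criteria, exclusion criteria and probability sampling described above can be sketched in Python as follows. The sampling frame, the `uses_big_data` and `is_intern` fields and the two criteria are hypothetical; only the sample size of 51 comes from this proposal. Simple random sampling is used here as one common form of probability sampling.

```python
import random


def draw_sample(frame, include, exclude, size, seed=None):
    """Apply inclusion and exclusion criteria, then draw a simple random sample."""
    eligible = [person for person in frame if include(person) and not exclude(person)]
    if len(eligible) < size:
        raise ValueError(f"only {len(eligible)} eligible, need {size}")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(eligible, size)


# Hypothetical sampling frame of 200 employees.
frame = [
    {"id": i, "uses_big_data": i % 2 == 0, "is_intern": i % 10 == 0}
    for i in range(200)
]

sample = draw_sample(
    frame,
    include=lambda p: p["uses_big_data"],  # inclusion criterion (assumed)
    exclude=lambda p: p["is_intern"],      # exclusion criterion (assumed)
    size=51,
    seed=42,
)
print(len(sample))  # 51
```

Every eligible employee has the same chance of being selected, and `random.Random.sample` draws without replacement, so nobody appears in the sample twice.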
All data will be collected by ethical means; unethical ways will not be adopted for data collection. The data collected will not be used for commercial purposes. The Data Protection Act will be followed when collecting data regarding Big data. Any relevant company policies will be followed accordingly.
Week | Task
1 | Will obtain some knowledge about the research, e.g. how to read papers, how to evaluate papers
2 | Will collect the necessary information regarding the research through journals, books, online articles, YouTube videos, company websites etc.
3 | Will prepare the electronic handout that will be provided to the employees for conducting surveys which will consist of open and closed ended questions
4 | Start interviewing retired IT professionals and experts of reputed organizations and conduct the online surveys and questionnaire
5 | Collect data from archives, database, library and various websites
6 | Analyse the data collected
7 | Incorporate the findings and analysis and prepare the final research document
On completion of this research, it is anticipated that the security measures in Big data will have been analysed. The implementation process of security in Big data will be evaluated, highlighting the steps of such implementation. The total degree of security achieved through security measures will also be assessed so that any possible risks are identified through risk analysis. Finally, the privacy and security challenges faced during implementation of security measures in Big data will also be identified.
Reference List
C. Aradau and T. Blanke, “The (Big) Data-security assemblage: Knowledge and critique”, Big Data & Society, vol. 2, no. 2, p. 205395171560906, 2015.
C. Constantine, “Big data: an information security context”, Network Security, vol. 2014, no. 1, pp. 18-19, 2014.
C. Tankard, “Encryption as the cornerstone of big data security”, Network Security, vol. 2017, no. 3, pp. 5-7, 2017.
E. Dumbill, “Big Data is Rocket Fuel”, Big Data, vol. 1, no. 2, pp. 71-72, 2013.
F. Ohlhorst, Big data analytics, 1st ed. Hoboken, N.J.: Wiley, 2013.
Ganelin, E. Orhian, K. Sasaki and B. York, Spark, 1st ed. Indianapolis, IN: Wiley, 2016.
“Gartner warns of big data security problems”, Network Security, vol. 2014, no. 6, p. 20, 2014.
Information Security Analytics, 1st ed. Amsterdam: Syngress Media Inc, 2015.
J. Bedell-Pearce, “When big data and Brexit collide”, Network Security, vol. 2017, no. 2, pp. 8-9, 2017.
M. Hoskins, “Common Big Data Challenges and How to Overcome Them”, Big Data, vol. 2, no. 3, pp. 142-143, 2014.
M. Lesk, “Big Data, Big Brother, Big Money”, IEEE Security & Privacy, vol. 11, no. 4, pp. 85-89, 2013.
R. Jiang, S. Al-madeed, A. Bouridane, D. Crookes and A. Beghdadi, Biometric security and privacy, 1st ed. Cham, Switzerland: Springer, 2017.
S. Hipgrave, “Smarter fraud investigations with big data analytics”, Network Security, vol. 2013, no. 12, pp. 7-9, 2013.
T. Protasowicki and J. Stanik, “Big data within national security threat analysis”, Ekonomiczne Problemy Usług, vol. 123, pp. 275-286, 2016.