Research Questions
1. Background
“Big Data”, as its name indicates, is a collection of huge amounts of largely unstructured data that has little meaning until it is processed. It is generated by large-scale software applications of many kinds, such as social networks, scientific computing applications, medical information systems, e-government applications, and many more. Research has shown that the data used and processed by these different applications share some common attributes (Davis & Patterson, 2012). These common characteristics include large scale (which concerns the size and distribution of data stores) and scalability issues (which concern the functionality and features required of software applications that process huge data repositories). Scientific computing is believed to be one of the most important application areas because, in this domain, academic researchers and scientists create huge amounts of data every day as the results of their experiments and tests (consider, for instance, fields such as astronomy, high-energy physics, biomedicine, and biology). However, extracting valuable information and knowledge for useful tasks from these huge, comprehensive data stores seems to be impracticable for common database management systems and similar analysis tools.
2. Research questions
- What are the common characteristics of Big Data?
- Identify the advantages and challenges associated with the use of Big Data.
- Review literature on Big Data.
3. Literature Review
3.1 Examples of Companies using Big Data
When dealing with big data, two important things are usually taken into account: data generation and data analytics. Data analytics deals with the analysis of data and occurs at two levels. Level 1 deals with data collection and regeneration, while level 2 is predictive and prescriptive (Feinleib, 2014).
Disney has invested 1 million dollars in data generation and handling. The company has developed a wrist pin that is given to customers; it collects data and relays it to a large central server, where it is analyzed. This data is used in marketing and in improving customer service (Wang, 2017). IBM, on the other hand, has invested 24 billion dollars in data analytics and, through its Watson unit, employs about 15,000 analytics practitioners to handle data collection and analysis. This data is used to analyze the market and improve the business. Another example of the use of data analytics in business is BMW. A survey showed that customers getting into their cars in winter often found the windows broken as a result of ice accumulation (Xu & Zhou, 2015). To improve customer confidence in its product, the company took it upon itself to wash parked cars and return them to customers whenever they wanted to leave. In this way, customer confidence in BMW improved. Facebook uses data analytics to conduct surveys and to improve its business and the quality of service it offers to its customers (Fontichiaro, 2018). Recently, Facebook conducted a survey asking whether men or women spend more time sharing photographs on Facebook, and the data collected showed that women spend more time sharing photographs than men. About 350 million photographs were shared daily on Facebook.
According to Walker (2015), a data scientist must have data handling skills such as programming, database creation and analysis, mathematical modeling, and statistical analysis and, above all, must be creative. An analysis of how large companies use big data shows that companies are hesitant to invest in it. About 55-60% of investments in big data fail. This can be attributed to the fact that companies start with the technology rather than with an understanding of the fundamentals of the business (Yadav, 2017).
Today, there is very high demand for data in business performance and market analysis, and hence a need for companies to invest in big data. However, a major setback in handling big data is the shortage of data scientists to work in this field. This poses a challenge to educational institutions to train experts to work with such large data (Frampton, 2015).
Data mining is the process of collecting and analysing large data volumes or data sets in order to discover the relationships within them. The term ‘Big data’, on the other hand, describes a massive volume of structured and unstructured data that is too complex to process using common or traditional database and software tools.
A close look at real-world commercial implementations shows that International Business Machines (IBM) stands out as one of the organizations with a high-quality ‘Big data’ hub. This hub handles large volumes of information that are very hard to process in a traditional database. The data hub is composed of integrated data mining engines that make the data easier to handle.
The integration of data mining at IBM has made it easy and fast for the company to manage and process data in its immense, globally distributed data warehouses (using cloud technology). This suggests that, although the data is large, data mining engines handle such volumes without difficulty. In this sense, I agree that the term ‘Big data’ is an over-hyped buzzword for data mining.
Microsoft Corporation is one of the most successful software companies globally. Because of the large data volumes handled at the company, ‘Big data’ has also been a subject of concern there. Issues related to ‘Big data’ have usually arisen where the organization’s traditional database system is overwhelmed by ever-increasing data volumes, including operating system files, cache files, customer data, and management information system data. However, by adopting data mining engines, Microsoft has smoothly handled all the large amounts of data that it shares globally with clients and partners. This case study therefore further leads me to agree with the statement.
Approaches for Handling Big Data
Facebook, Inc. is a social networking website that manages online communication for over a billion global users a month. These users share messages and photos, poke one another, post status updates, and store personal data. In essence, the company handles immense and complex data volumes that cannot be manipulated in a simple traditional database. However, through the integration of data mining engines, the website allows users to perform all their desired operations easily and quickly.
PayPal is one of the top companies globally in electronic commerce transactions. Considering the huge volume of customer, transaction, and management information system data from all around the globe, it is clear that it handles terabytes of information per hour. Operating a database that handles terabytes of data is very challenging. This fast, large-scale data handling is facilitated by the integration of data mining engines in its online system. This means that, although the data is very large and complex, it is simple for the data mining engines to handle. Thus, I respond ‘Yes’ to the statement that ‘Big data’ is an over-hyped buzzword for data mining.
3.2 Characteristics of Big Data
The term “big data” is used in different ways in different disciplines. However, Holdaway (2014) defines some common characteristics of the big data idea as they relate to analytics:
- Investigating unstructured data and text to determine if these sources can be used to get useful information for the particular purpose (depending on the field for which this data is being collected).
- Minimizing the time difference between collecting data and performing operations on it so that the analysis remains effective, which is sometimes known as near real-time business analytics (Thomas & McSharry, 2015).
- Making use of these analytics to drive business decisions beyond the capabilities provided by the traditional business intelligence (BI) stack.
- Searching for and implementing economical, extremely scalable analytics frameworks (Tanik & Fielder, 2017).
Without a doubt, big data gives a wide variety of organizations a valuable opportunity to draw on many sources of data, both structured and unstructured (Strong, 2015). Web data, in particular, have distinctive characteristics: their scale, their statistical redundancy, and the availability of user responses and feedback (through click information and query logs) have made the collection and extraction of easy-to-use data (such as entities) from the web particularly attractive (Hurwitz, Nugent, Halper, & Kaufman, 2013). In fact, a wide variety of tools and techniques for entity extraction have already been used productively to identify references to products, people, or locations mentioned on websites. For instance, enterprise “voice of the customer” applications are used to discover opinions, sentiment, and useful trends regarding a particular set of products and services discussed on websites and blogs. These applications allow business organizations to develop products in light of results derived from these huge amounts of data (Plunkett, 2014). Another useful example is product data conflation derived from web information, for instance identifying frequent synonyms of products from click information and web query logs. The identification of such high-value services, the provision of the resulting data assets, and the offer of easy-to-use tools and applications for information extraction can together help develop a platform capable of effectively utilizing social media and web data for a wide variety of applications (Dey, Hassanien, Bhatt, Ashour, & Satapathy, 2018).
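The idea of identifying product synonyms from click information can be illustrated with a minimal Python sketch. It assumes a hypothetical click log of (query, clicked product ID) pairs and treats queries that repeatedly lead to clicks on the same product as synonym candidates. The function name, the log format, and the `min_clicks` threshold are illustrative assumptions, not taken from the cited works.

```python
from collections import defaultdict


def synonym_candidates(click_log, min_clicks=2):
    """Group queries that repeatedly led to clicks on the same product.

    click_log: iterable of (query, product_id) pairs (hypothetical format).
    Returns {product_id: sorted list of queries}, keeping only products
    reached through more than one sufficiently frequent query.
    """
    # counts[product_id][query] = number of clicks on that product from that query
    counts = defaultdict(lambda: defaultdict(int))
    for query, product_id in click_log:
        counts[product_id][query.strip().lower()] += 1

    candidates = {}
    for product_id, per_query in counts.items():
        frequent = sorted(q for q, n in per_query.items() if n >= min_clicks)
        if len(frequent) > 1:  # a single query has nothing to be a synonym of
            candidates[product_id] = frequent
    return candidates


sample_log = [
    ("galaxy s9", "p1"), ("Samsung S9", "p1"), ("galaxy s9", "p1"),
    ("samsung s9", "p1"), ("s9 phone", "p1"),
    ("iphone x", "p2"), ("iphone x", "p2"),
]
print(synonym_candidates(sample_log))
```

A production system would also need to normalize spelling variants and weight clicks by position, which this sketch deliberately leaves out.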
3.3 Advantages of Big Data
One of the most important advantages of big data is that it allows business organizations to discover deep insights in data that can be used effectively for decision making (Paley, 2017). Machine learning is believed to be one of the most useful technologies for uncovering these insights. In fact, machine learning has already been used for a long time in a wide variety of areas (for instance, internet search, fraud detection, and marketing and advertising) (Simpson, 2016).
At present, in order to make effective use of these huge amounts of data, business organizations and users need adequate knowledge of data querying and must apply the latest techniques to explore data. However, they face difficulties when searching for deep insights in data (Kimball & Ross, 2013). The challenges include how to identify relevant data sets easily among a large number of data sources; which data clean-up tools and techniques to use, and how (Schutt & O’Neil, 2013), for instance to establish approximate links between different data sources; how to select samples and return the results of a query progressively; and how to present the results effectively. Additionally, developing and using these applications effectively requires strong systems skills and raises algorithmic problems in each of the above-mentioned challenges (Molaro, 2013).
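The challenge of sampling data and returning query results progressively can be sketched as follows. This is a minimal illustration, assuming the query is a simple average over an in-memory list: the data is shuffled once and a running estimate is reported after each batch, so a user sees an approximate answer early and the exact answer at the end. The function name and parameters are hypothetical.

```python
import random


def progressive_mean(values, batch_size=100, seed=0):
    """Yield (rows_seen, running_mean) after each randomly sampled batch."""
    rng = random.Random(seed)   # fixed seed so the sketch is repeatable
    order = list(values)
    rng.shuffle(order)          # random order makes each prefix a random sample

    total = 0.0
    seen = 0
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        total += sum(batch)
        seen += len(batch)
        yield seen, total / seen


data = list(range(1, 1001))
for rows_seen, estimate in progressive_mean(data, batch_size=250):
    print(rows_seen, estimate)
```

The final estimate equals the exact mean because every row has been read by then; the earlier estimates are approximations whose accuracy depends on the sample drawn.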
3.4 Challenges of Big Data
Krishnan (2013) discusses a number of challenges that users and organizations can face while using big data analytics. Big data is collected from various sources, and some of them, such as sensor networks, can create astounding amounts of unstructured data. This data can, however, be filtered and compressed by orders of magnitude. One of the biggest challenges for data scientists is to define these filtering and compression processes in such a way that they do not discard valuable facts (Mohanty, Jagadeesh, & Srivatsa, 2013). In addition, mining these huge amounts of data requires clean, integrated, efficiently accessible, and reliable data; declarative mining and query interfaces; effective mining algorithms; and a big data processing environment. A serious problem here is the lack of coordination among the various database systems used to maintain the data and offer SQL querying.
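One way to compress sensor data without losing valuable facts can be sketched in Python. The sketch is only an illustration under stated assumptions: readings arrive as a list of numbers, each fixed-size window is summarized by its median, and any reading that deviates from that median by more than a chosen threshold is kept verbatim. The window size, the threshold, and the function name are hypothetical choices, not a method described in the cited sources.

```python
from statistics import median


def compress_readings(readings, window=4, threshold=10.0):
    """Summarize each window by its median, keeping unusual readings intact."""
    summary = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        mid = median(chunk)
        # Keep (index, value) for readings far from the window's median,
        # so a spike is not averaged away by the compression step.
        anomalies = [
            (start + offset, value)
            for offset, value in enumerate(chunk)
            if abs(value - mid) > threshold
        ]
        summary.append({"start": start, "median": mid, "anomalies": anomalies})
    return summary


readings = [20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 80.0, 20.0]
for row in compress_readings(readings):
    print(row)
```

The median is used rather than the mean because a single spike shifts the mean enough to make ordinary readings look anomalous as well.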
At present, a wide variety of effective and efficient tools can be used to deal with big data-related problems. For instance, Manning (2013) discusses a tool known as Glade, which presents an easy-to-use interface for a wide variety of analytical tasks and a dedicated runtime for aggregation. The interface is a standard user-defined aggregate (UDA) interface, while the runtime resembles the relational aggregate operator.
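The text does not spell out the interface itself, so the following Python sketch assumes the conventional shape of a user-defined aggregate: a state that is initialized, accumulates one value at a time, merges with other partial states, and finally produces a result. The class `AverageUDA` and the helper `run_aggregate` are hypothetical names used for illustration only and are not Glade's actual API.

```python
class AverageUDA:
    """A user-defined aggregate computing an average (illustrative only)."""

    def __init__(self):
        # Init: start from an empty aggregate state.
        self.total = 0.0
        self.count = 0

    def accumulate(self, value):
        # Accumulate: fold one input value into the state.
        self.total += value
        self.count += 1

    def merge(self, other):
        # Merge: combine a partial state computed on another partition.
        self.total += other.total
        self.count += other.count

    def terminate(self):
        # Terminate: turn the final state into the query result.
        return self.total / self.count if self.count else None


def run_aggregate(uda_cls, partitions):
    """Play the role of the runtime: aggregate each partition, then merge."""
    partials = []
    for partition in partitions:
        state = uda_cls()
        for value in partition:
            state.accumulate(value)
        partials.append(state)

    final = uda_cls()
    for partial in partials:
        final.merge(partial)
    return final.terminate()


print(run_aggregate(AverageUDA, [[1, 2, 3], [4, 5]]))
```

Because each partition is aggregated independently before merging, the same aggregate definition can run on one machine or across many, which is what makes this style of interface attractive for large data sets.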
4. Methodology Review
4.1 SECTION 1 – REVIEW
The research approach is an essential part of any study, and an appropriate choice of approach forms the basis of a sound investigation. In this proposed study, a systematic approach will be followed in order to achieve the desired aims and objectives effectively. A qualitative research approach will be used, as it is a way of eliciting the attitudes, perceptions, and states of mind of individuals in greater depth. A qualitative approach will help in gaining a broad understanding of big data (Rossi, Wright, & Anderson, 2013).
The collected data will be analyzed through a qualitative approach in order to obtain findings that reflect important information about competition, prices, and other market trends in electronic markets. Additionally, this proposed study will use an inductive approach with the intention of gaining insight into electronic markets from a broader perspective. The inductive approach will also assist in understanding the different factors responsible for market trends and current conditions in electronic markets. Moreover, the analysis of data will aid in measuring the extent of, and trends in, customer satisfaction and customer sentiment in relation to products and/or services (Rossi, Wright, & Anderson, 2013).
4.2 SECTION 2 – SELECTION
In this research, an online survey will be the method used. This will be done by creating an account with SurveyMonkey and then designing a survey. Once the survey has been fully designed, a web link is generated. This link is sent via SMS, email, or any chat service that my target population uses. Respondents will be required to click on the link and fill in the questions, which should take under five minutes.
A few advantages of using online surveys include:
- Convenience – Participants can fill out questionnaires when they choose to, and can start and stop a survey at their leisure. This gives the individual control over completing the survey, which can increase engagement and response rates.
- Cost-effective – No money is required for printing or even for making calls. You simply need to send the survey link to your target respondents.
- Saves time – Through an online feedback management system like SurveyMonkey, you can quickly create, administer, collect, and analyze surveys. Performing these functions in one integrated web system saves a considerable amount of time.
5. Conclusion
In conclusion, this paper has presented a detailed analysis of a recently emerging IT concept known as big data analytics. It is widely acknowledged that organizations depend heavily on huge amounts of data to derive useful facts in order to improve their business performance. In this process, they can face a number of serious problems; however, there are a number of tools that can be used to deal with these problems effectively. The research reviewed suggests that the use of big data analytics will grow further and expand into other areas of life. Big data analytics seems to be an up-and-coming type of knowledge work that offers a large number of capabilities and opportunities from both academic and business perspectives.
6. References
Davis, K., & Patterson, D. (2012). Ethics of big data. Sebastopol, CA: O’Reilly Media.
Feinleib, D. (2014). Big Data Bootcamp: What Managers Need to Know to Profit from the Big Data Revolution. Berkeley, CA: Apress.
Fontichiaro, K. (2018). Big data.
Frampton, M. (2015). Big data made easy: A working guide to the complete Hadoop toolset.
Holdaway, K. R. (2014). Harness oil and gas big data with analytics: Optimize exploration and production with data driven models.
Hurwitz, J., Nugent, A., Halper, F., & Kaufman, M. (2013). Big data for dummies. Hoboken, NJ: Wiley.
Dey, N., Hassanien, A. E., Bhatt, C., Ashour, A., & Satapathy, S. C. (Eds.). (2018). Internet of things and big data analytics toward next-generation intelligence.
Kimball, R., & Ross, M. (2013). The data warehouse toolkit: The definitive guide to dimensional modeling.
Krishnan, K. (2013). Data warehousing in the age of big data.
Manning, P. (2013). Big data in history. Houndmills: Palgrave Macmillan.
Mohanty, S., Jagadeesh, M., & Srivatsa, H. (2013). Big Data Imperatives: Enterprise Big Data Warehouse, BI Implementations and Analytics. Berkeley, CA: Apress.
Molaro, C. (2013). DB2 11: Database for big data and analytics.
Paley, N. (2017). Leadership Strategies in the Age of Big Data, Algorithms, and Analytics.
Plunkett, T. (2014). Oracle big data handbook: Plan and implement an enterprise big data infrastructure. New York: McGraw-Hill.
Rossi, P. H., Wright, J. D., & Anderson, A. B. (2013). Handbook of Survey Research. Burlington: Elsevier Science.
Schutt, R., & O’Neil, C. (2013). Doing data science.
Simpson, D. (2016). The Use of Big Data. Hauppauge: Nova Science Publishers, Inc.
Strong, C. (2015). Humanizing big data: Marketing at the meeting of data, social science and consumer insight.
Tanik, U. J., & Fielder, D. (2017). Transdisciplinary Benefits of Convergence in Big Data Analytics. Big Data and Visual Analytics, 165-179. doi:10.1007/978-3-319-63917-8_9
Thomas, R., & McSharry, P. E. (2015). Big Data revolution: What farmers, doctors and insurance agents teach us about discovering Big Data patterns. Chichester: John Wiley & Sons.
Walker, R. (2015). Benefits of Scale and Velocity in Big Data. From Big Data to Big Profits, 35-60. doi:10.1093/acprof:oso/9780199378326.003.0002
Wang, B. (2017). Creativity and data marketing: A practical guide to data innovation.
Xu, C., & Zhou, A. (2015). Quality-aware scheduling for key-value data stores.
Yadav, V. (2017). Processing big data with Azure HDInsight: Building real-world big data systems on Azure HDInsight using the Hadoop ecosystem. New York: Apress.