Major milestones of data mining evolution
As the adoption of technology grows rapidly, the number of smart devices is also increasing all over the world. These devices are used by an enormous number of people to transfer data, automate tasks, and communicate with other devices. As the applications of such technologies grow, total data usage increases as well. Devices such as phones and computers generate data constantly. Generally, huge amounts of raw data are stored in a physical database or in the cloud. In both cases, the data must be stored, organized, summarized, analyzed, and synthesized to support decision-making. Extracting useful and relevant information from such a large volume of data is very difficult.
Data mining is essentially an application of statistics, although its methods differ somewhat from classical statistical methods. Data mining has gained popularity in this century, but its evolution began long before Moneyball and Edward Snowden (Witten et al. 2016). The following are the major milestones in how data mining evolved alongside data science and big data.
In 1763, Thomas Bayes' paper on relating current probability to prior probability was published posthumously. Its result is known as Bayes' theorem and is considered fundamental to data mining because it helps in understanding complex realities on the basis of estimated probabilities.
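As a minimal illustration of Bayes' theorem, P(H|E) = P(E|H)P(H) / P(E), the sketch below updates a prior probability after observing evidence. The function name and all numbers (a 1% base rate, a 90% detection rate, a 5% false alarm rate) are hypothetical and chosen only for the example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' theorem.

    prior:               P(H), probability of the hypothesis before evidence
    likelihood:          P(E | H), probability of the evidence if H is true
    false_positive_rate: P(E | not H), probability of the evidence if H is false
    """
    # Total probability of the evidence, P(E)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of records are fraudulent, a detector flags
# 90% of fraudulent records and 5% of legitimate ones.
p_fraud_given_flag = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p_fraud_given_flag, 3))  # about 0.154
```

Even with a fairly accurate detector, the low prior keeps the posterior small, which is the kind of estimate the theorem makes explicit.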
In 1805, Adrien-Marie Legendre published the method of least squares, which Carl Friedrich Gauss also developed, and regression was used to determine the orbits of bodies around the Sun, such as planets and comets. Regression analysis estimates the relationships among variables, and it is still considered a key tool in data mining.
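To make the idea of estimating a relationship between variables concrete, the following is a small sketch of a least-squares fit of a straight line. The function name and the sample data are assumptions for illustration, not taken from the cited sources.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```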
In 1936, Alan Turing published a paper introducing the idea of a universal machine capable of performing computations like those of modern-day computers.
In 1943, Warren McCulloch and Walter Pitts created the first conceptual model of a neural network. Such a network can receive inputs, process them, and generate an output.
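The receive, process, and output behaviour described above can be sketched with a single threshold unit in the spirit of the McCulloch-Pitts model. This is a simplified illustration: the function names, weights, and thresholds are assumptions chosen for the example.

```python
def mp_neuron(inputs, weights, threshold):
    """Return 1 (fire) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def logical_and(a, b):
    # Both inputs must be active to reach a threshold of 2
    return mp_neuron([a, b], weights=[1, 1], threshold=2)


def logical_or(a, b):
    # A single active input is enough to reach a threshold of 1
    return mp_neuron([a, b], weights=[1, 1], threshold=1)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b))
```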
In the 1970s, modern database management systems made it possible to store and query large volumes of data, up to terabytes and petabytes. Data warehouses of this kind allow users to view data analytically (Larose and Larose 2014).
In the 1980s, HNC introduced the phrase “database mining” for its Database Mining Workstation, a tool intended for building neural network models.
In 1989, Gregory Piatetsky-Shapiro coined the term “Knowledge Discovery in Databases” (KDD).
At present (2018), one of the most actively explored techniques is deep learning.
Nowadays, data mining is used in many sectors, including education, health, the public sector, telecommunications, construction, and science and engineering.
Impact on different sectors
Education Sector: Data mining is used in various studies in the education sector including:
- Determining the relationships between the socioeconomic level of students and the level of academic learning.
- Determining the relationship between academic success and participation in extracurricular activities of university students.
- Preventing students from failing and determining the factors that affect success.
- Determining the profiles and preferences of students taking the university entrance exam.
- Choosing a profession according to demographic and personal characteristics.
Health Sector:
- to prevent corruption in hospital expenditures
- to determine the high-risk factors in surgeries
- to classify patient data according to factors such as age, gender, race, and treatment
- to assess the success of the treatment methods applied in the hospital
- to estimate resource use and patient numbers in the hospital
- to determine the treatment method to be applied to a disease
Telecommunication Sector:
- to identify the characteristics of customers who need special action, such as suspension or deactivation
- to determine user templates for social network usage
- to determine the future movements of mobile users
- to model preferences for university departments and determine the factors that influence the preference order of newly enrolled students
- to determine whether students pass or fail
Public Sector:
- to determine which offenders are likely to commit crimes, in terms of safety
- to forecast the future of public investment and analyze data in the defense industry
- to classify public expenditures and plan the correct use of resources
- to measure the performance of employees and manage business processes
Construction Sector:
- to determine the place of data mining methods in construction management
- to determine which offenders are likely to commit crimes, in terms of safety
- to determine the leadership-motivation relationship between the chief and the worker
- to forecast the future of public investment and analyze data in the defense industry
- to classify public expenditures and plan the correct use of resources
- to measure the performance of employees and manage business processes
- to estimate population, forecast the weather, and determine new job opportunities
Data mining is used to find patterns and similarities that reveal valuable business information in large databases. It provides an effective way to analyze data that has been collected and managed. It can also create new business opportunities in the near future, in areas such as the following:
Collective or distributed mining: Collective and distributed data mining is attracting a great deal of research attention. Most researchers have focused on physical databases and data warehouses as ways to store and collect information more effectively (Wu et al. 2014). Problems arise when the warehouses are located in different places, because it is hard to gather information from different locations. Mining data in this setting is commonly known as distributed data mining (DDM). For instance, when data come from different firms or from different branches of a corporation, it is very difficult to analyze the sources and extract the desired data from them. DDM builds on traditional approaches by combining local data analyses (Romero and Ventura 2013): data models are generated locally at each site, and the models from the various data sites are then combined to develop a global model.
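The pattern of building local models and combining them can be sketched as follows. This is only one simple way to do it, assuming each site shares summary statistics rather than raw records; the class name, function names, and site data are hypothetical and are not drawn from the cited works.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    """Summary statistics computed at one data site (the 'local model')."""
    count: int
    total: float
    total_sq: float


def summarize(values):
    """Run at each site; only this summary leaves the site, not the raw data."""
    return LocalSummary(len(values), sum(values), sum(v * v for v in values))


def combine(summaries):
    """Merge the local summaries into a global mean and population variance."""
    n = sum(s.count for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = total_sq / n - mean ** 2
    return mean, variance


# Hypothetical sales figures held at two branches of one corporation
site_a = [10.0, 12.0, 14.0]
site_b = [20.0, 22.0]
global_mean, global_var = combine([summarize(site_a), summarize(site_b)])
print(global_mean, global_var)
```

The combined result matches what a single analysis over the pooled records would give, while each branch transmits only three numbers.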
Ubiquitous data mining (UDM): The spread of mobile phones, laptops, and palmtops makes ubiquitous access to large quantities of data possible. Advanced analysis is needed to extract useful information from these data, but accessing and analyzing data from such devices poses many challenges. For example, UDM incurs additional costs for computation, communication, and security (Fan and Bifet 2013). One objective of UDM is therefore to mine data while minimizing the cost to the device. The second challenge is human-computer interaction: visualizing patterns such as clusters, classifiers, and associations on small displays is a serious challenge for an interactive data mining environment (Freitas 2013). Data management in a mobile environment is also a major challenge. The main issues considered include UDM itself, algorithms for distributed and mobile settings, and mobile communication. Markup languages and other data representation techniques are also available for UDM.
Future scope
Hypertext and hypermedia data mining: Hypertext and hypermedia data mining covers data that include hyperlinks, text markup, and other forms of hypertext information. It is closely related to web mining and multimedia mining. In supervised learning, or classification, the process starts by reviewing training data in which items are marked as belonging to a class or group. These data are the basis on which the algorithm is constructed (Demšar et al. 2013). Data mining can thus offer effective solutions for supervised learning and for analyzing social networks.
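The supervised learning step described above, learning from items already marked with a class, can be illustrated with a small nearest-centroid classifier. This is a sketch under stated assumptions: the two features (number of hyperlinks and number of images per page), the labels, and the function names are all hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: mean feature vector}."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in vec] for label, vec in sums.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labelled training data: [hyperlink count, image count] per page
training = [
    ([40, 2], "portal"),
    ([35, 4], "portal"),
    ([3, 1], "article"),
    ([5, 0], "article"),
]
centroids = train_centroids(training)
print(predict(centroids, [30, 3]))
print(predict(centroids, [2, 1]))
```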
Conclusion:
Data mining is essentially an application of statistics, although its methods differ somewhat from classical statistical methods. It provides an effective way to analyze data that has been collected and managed. Nowadays, it is widely adopted by organizations that want to take advantage of automation (Baker and Inventado 2014). Analyzing data and extracting the desired information typically takes considerable expense and time. Data mining is very useful and has been adopted in the health, public, telecommunication, construction, and science and engineering sectors (Braha 2013). Since its early development, data mining has provided many opportunities, and the list is still growing. Data mining also offers some key advantages over a typical neural network used alone. This paper has given a basic description of data mining and the important milestones in its development, followed by its impact on various sectors and its possible future scope.
References:
Witten, I.H., Frank, E., Hall, M.A. and Pal, C.J., 2016. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann.
Larose, D.T. and Larose, C.D., 2014. Discovering knowledge in data: an introduction to data mining. John Wiley & Sons.
Wu, X., Zhu, X., Wu, G.Q. and Ding, W., 2014. Data mining with big data. IEEE transactions on knowledge and data engineering, 26(1), pp.97-107.
Fan, W. and Bifet, A., 2013. Mining big data: current status, and forecast to the future. ACM SIGKDD Explorations Newsletter, 14(2), pp.1-5.
Demšar, J., Curk, T., Erjavec, A., Gorup, Č., Hočevar, T., Milutinovič, M., Možina, M., Polajnar, M., Toplak, M., Starič, A. and Štajdohar, M., 2013. Orange: data mining toolbox in Python. The Journal of Machine Learning Research, 14(1), pp.2349-2353.
Freitas, A.A., 2013. Data mining and knowledge discovery with evolutionary algorithms. Springer Science & Business Media.
Braha, D. ed., 2013. Data mining for design and manufacturing: methods and applications (Vol. 3). Springer Science & Business Media.
Romero, C. and Ventura, S., 2013. Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 3(1), pp.12-27.
Tang, Q.Y. and Zhang, C.X., 2013. Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research. Insect Science, 20(2), pp.254-260.
Roiger, R.J., 2017. Data mining: a tutorial-based primer. Chapman and Hall/CRC.
Baker, R.S. and Inventado, P.S., 2014. Educational data mining and learning analytics. In Learning analytics (pp. 61-75). Springer, New York, NY.
Tan, P.N., Steinbach, M. and Kumar, V., 2013. Data mining cluster analysis: basic concepts and algorithms. Introduction to data mining.