Data Mining and Its Purpose
This assignment conducts analytical research on the chosen topic: Data Mining. Its aim is to show how the research was carried out in practice: first through a broad scan, and then through a focused review that filtered out the appropriate and relevant information on the topic.
The later sections of the paper cover the literature review on Data Mining, in which the two most relevant papers filtered out from the whole research were selected and evaluated to answer questions such as the research problem, subject and instrumentation.
The research in this paper is based on ‘Data Mining methods to identify and detect data stealing and frauds’. A broad scan was conducted in which I reviewed more than 20 papers and found more than 10 of them directly related to the topic and containing relevant information.
The two papers chosen for the research cover both the conventional approach and the technology-based approach: the kinds of fraud that exist and the various data mining techniques for overcoming detected frauds.
I researched the most common technology-related problems the world faces today and learned about online fraud involving credit cards and personal data, which is the most sensitive kind of data. I therefore chose as my topic ‘Data mining in the detection of crime and the various data mining methods used to detect identity crime’. I researched the topic on different platforms, such as PDFs from Google Scholar and IEEE papers. A number of papers have been published on this topic, and all of them are referenced in the bibliographic section.
Table 1: Research Journal
| Date | Task | Action | Comment |
| 28/08/2018 | Conducted a search for a topic | Read about various problems related to technology | Shortlisted two topics: Data Mining and Social Media Issues |
| 29/08/2018 | Conducted a search to choose between the two topics shortlisted the previous day | Read a number of research publications and papers on both topics | Selected Data Mining as the final topic |
| 30/08/2018 | Looked for relevant information and papers on Data Mining | Explored Google, Google Scholar, IEEE reference papers and some libraries | Selected approximately 25 papers containing important and relevant information on the topic |
| 31/08/2018 | Literature reading | Skimmed 4 papers and journal articles | Gained some background knowledge of the topic |
| 1/09/2018 | Literature reading | Skimmed 5 more papers and journal articles | Gained a clearer understanding of the topic and of how to write a literature review |
| 2/09/2018 | Literature reading | Read a few more papers on the topic | Understood the topic thoroughly and discarded papers that were unrelated or not useful |
| 3/09/2018 | Filtered the remaining papers | Read each paper retained the previous day thoroughly | Selected the two final papers for reference |
| 4/09/2018 | Learned how to write a literature review | Studied a few papers and guides on the steps for writing a literature review | Collected all the information on the topic and on how to start the assignment |
| 5/09/2018 | Started the assignment | Cited the works and sources | Included the citations relevant to my write-up in the assignment |
| 6/09/2018 | Reviewed both selected papers | Noted all topics and information to be included in the literature review section | Paraphrased them in my own words |
| 7/09/2018 | Wrote the literature review section | Covered all the requirements | Completed the assignment before the deadline |
Filing Journal
Table 2: Filing System
| Source | Keywords Used | No. of Returned Literatures | No. of Collected Literatures |
|  | Most Common Technology Problems | 18000 | 2 |
|  | Current Online Issues | 7654 | 1 |
|  | Online Threats | 5436 | 2 |
| Google Scholar | Data Mining Tools | 17896 | 3 |
| Google Scholar | Magical Thinking in Data Mining | 14567 | 3 |
| Google Scholar | A Review on Data Mining Methods | 9078 | 4 |
| IEEE | Data Stealing Process | 432 | 2 |
| IEEE | Data Mining Tools | 235 | 6 |
| VU Library | Review on Data Mining | 34000 | 2 |
| VU Library | Measures taken to detect frauds on credit cards | 1176 | 3 |
After reviewing the papers and documents listed in the bibliographic section of the broad scan, I concluded that I had sufficient information to continue the assignment and did not need to research the topic further. I also found that a few of the papers were not very relevant to the topic and offered little useful information, so I discarded them. The updated filing system and bibliography of the papers I chose to consult for further reading are presented here.
Updated Filing System
Table 3: Updated Filing System
| Source | Keywords Used | No. of Returned Literatures | No. of Collected Literatures |
| Google Scholar | Data Mining Tools | 17896 | 2 |
| Google Scholar | Magical Thinking in Data Mining | 14567 | 2 |
| Google Scholar | A Review on Data Mining Methods | 9078 | 1 |
| IEEE | Data Mining Tools | 235 | 2 |
| VU Library | Review on Data Mining | 34000 | 2 |
| VU Library | Measures taken to detect frauds on credit cards | 1176 | 1 |
Identity crime is any fraud or crime involving someone’s personal data that has been obtained through wrongful means. It is committed mainly for economic gain. The use of fraudulent identity documents and falsified details are major enablers of identity fraud, and e-commerce is badly affected by these crimes. Section 1.3.1 introduces identity crime: how it is committed and the major areas in which it is prominent. Section 1.3.2 gives an overview of data mining and the purpose of data mining tools. Section 1.3.3 describes various data mining methods to identify and detect identity crime, and section 1.3.4 gives a general discussion. Section 1.4 discusses the different kinds of credit card fraud, the mining techniques used to detect them and the precautions taken to prevent them.
Data Mining Methods for Detection of Identity and Credit Card Fraud
Identity Crime
An identity crime takes place when a person’s personal information is stolen and used for some unlawful activity, for example using a person’s credit card account information to pay for products. This crime can be committed in two ways: through mock identity crime, which uses a believable but made-up identity, or by illegally using someone’s real identity details. The first kind is easy to obtain but difficult to use, while the second kind is easy to use but difficult to obtain.
Figure 1: Identity Crime Examples
Identity crime is a major concern in two domains: transactions and credit applications. The transaction domain concerns the monetary transactions initiated when an online purchase is made, in which the victim’s card details have been obtained by some means and used without authorization. The credit application domain concerns loans and credit cards issued on false identities.
Data mining is concerned with discovering knowledge that is statistically reliable, previously unknown and significant. The data must be sufficient and clean, and the problem must be defined precisely: it is a problem that cannot be answered with reporting and query tools alone and must instead be solved through a process model. Once the useful data has been collected, data mining classifies it, clusters it into groups and segments it to reveal patterns of action that may have been used to commit fraud. The resulting information helps to detect anomalies. All of this is done with the help of data mining tools.
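As a minimal sketch of the clustering and segmentation step, the Python below groups a handful of transaction amounts into spending segments with a one-dimensional k-means. The amounts, the choice of k = 3 and the function name `kmeans_1d` are illustrative assumptions, not taken from the reviewed papers; real fraud data would have many more attributes.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters; returns (centroids, labels)."""
    # Start from k evenly spaced values of the sorted data (a simple, deterministic choice).
    ordered = sorted(values)
    step = max(1, len(ordered) // k)
    centroids = [ordered[i * step] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical transaction amounts for one card.
amounts = [12.5, 15.0, 9.9, 14.2, 220.0, 240.5, 231.0, 11.0, 2500.0]
centroids, labels = kmeans_1d(amounts, k=3)
print(centroids)  # roughly [12.52, 230.5, 2500.0]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 0, 2]
```

The single 2500.0 purchase ends up in a segment of its own, which is the kind of pattern an analyst would then inspect as a possible anomaly.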
This process can be categorized into three activities:
- Detection: searching the database for hidden patterns.
- Analytical modeling: learning patterns from historical data and then looking for the same patterns in the live database.
- Material modeling: applying the patterns extracted from the database to find unusual or anomalous records.
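The three activities can be illustrated with a small Python sketch under simplifying assumptions: detection groups each customer's past purchase amounts, analytical modeling turns each group into an upper limit (mean plus three standard deviations, a rule of thumb chosen only for illustration), and material modeling applies those limits to new records. The customer IDs, amounts and function names are hypothetical.

```python
from statistics import mean, pstdev


def discover_profiles(records):
    """Detection: scan the database and group each customer's past amounts."""
    profiles = {}
    for customer, amount in records:
        profiles.setdefault(customer, []).append(amount)
    return profiles


def build_model(profiles, z=3.0):
    """Analytical modeling: turn each historical pattern into an upper limit
    (mean + z population standard deviations)."""
    return {c: mean(a) + z * pstdev(a) for c, a in profiles.items()}


def forensic_scan(model, new_records):
    """Material modeling: apply the extracted limits to new records and return
    the unusual ones. Customers without history are not flagged in this sketch."""
    return [(c, amt) for c, amt in new_records if amt > model.get(c, float("inf"))]


history = [("c1", 20.0), ("c1", 25.0), ("c1", 22.0), ("c1", 18.0),
           ("c2", 300.0), ("c2", 280.0), ("c2", 320.0), ("c2", 310.0)]
model = build_model(discover_profiles(history))
flagged = forensic_scan(model, [("c1", 24.0), ("c1", 150.0), ("c2", 330.0), ("c3", 50.0)])
print(flagged)  # [('c1', 150.0)]
```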
As discussed above, identity crime affects two domains: transactions and credit applications. Each domain has its own mining methods. The methods for the transaction domain are as follows:
Outlier Detection: Data objects that are abnormal and do not comply with the general pattern or behavior of the data are called outliers. Two kinds of procedure are generally used to detect them: block procedures and consecutive procedures. In a block procedure, all suspect objects are either declared outliers together or accepted as consistent together; a consecutive procedure works from the inside out, testing the least extreme suspect first. Data mining algorithms for this task include cell-based, nested-loop and index-based algorithms.
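The nested-loop algorithm can be sketched as follows. The sketch assumes the distance-based outlier definition of Knorr and Ng, on which the cell-based, nested-loop and index-based algorithms build: an object is a DB(p, d)-outlier if at least a fraction p of the other objects lie farther than distance d from it. The points and parameter values are hypothetical.

```python
import math


def nested_loop_outliers(points, d, p):
    """Return the indices of DB(p, d)-outliers found with a simple nested loop.

    An object is a DB(p, d)-outlier when at least a fraction p of the other
    objects lie farther than distance d from it.
    """
    n = len(points)
    # Largest number of neighbors within d that an outlier may still have.
    max_neighbors = math.floor((n - 1) * (1 - p))
    outliers = []
    for i, a in enumerate(points):
        count = 0
        for j, b in enumerate(points):
            if i != j and math.dist(a, b) <= d:
                count += 1
                if count > max_neighbors:
                    break  # too many close neighbors: cannot be an outlier
        if count <= max_neighbors:
            outliers.append(i)
    return outliers


# Hypothetical 2-D transaction features, e.g. (scaled amount, scaled hour of day).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (10, 10)]
print(nested_loop_outliers(points, d=2.0, p=0.9))  # [5]
```

The early `break` is what keeps the nested loop practical: most normal objects are rejected after finding only a few close neighbors.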
Credit Card-Related Frauds and Mining Techniques to Detect Them
Neural Networks: This method is loosely modeled on how the human brain functions. It uses interconnected nodes arranged in layers, with weighted connections between nodes in adjacent layers. Each node receives inputs from the nodes connected to it in the previous layer, processes them with a predefined function and passes its output on to the next layer.
An example of this method is an MLP (multilayer perceptron) based classifier, which acts only on the data of the current transaction and its immediate past history.
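To make the layer-by-layer computation concrete, here is a minimal forward pass of a tiny MLP in Python. The weights are invented for illustration and no training is shown; the three input features, standing for the transaction itself and its recent history, are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense_layer(inputs, weights, biases):
    """Fully connected layer: every node takes a weighted sum of all outputs of
    the previous layer, adds its bias and applies the sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
            for node_weights, b in zip(weights, biases)]


# Hypothetical hand-picked weights; a real MLP would learn them from labeled data.
HIDDEN_WEIGHTS = [[0.8, 1.5, -0.5], [-1.2, 0.4, 2.0]]
HIDDEN_BIASES = [-1.0, 0.5]
OUTPUT_WEIGHTS = [[2.0, -1.5]]
OUTPUT_BIASES = [-0.5]


def fraud_score(features):
    """features (hypothetical): [scaled amount, amount / recent average, is_foreign]."""
    hidden = dense_layer(features, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return dense_layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


print(fraud_score([0.2, 0.9, 0.0]))  # a value between 0 and 1
```

Because the output node also uses a sigmoid, the score always lies between 0 and 1 and can be compared against a chosen alert threshold.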
Decision Trees: This method takes a statistical approach to data mining. A tree-shaped structure is built in which each internal node tests one of the independent attributes, so that every path from the root to a leaf is a logical AND of these tests, and each leaf gives a value of the single dependent attribute. The main goal of this method is to reduce the complexity of the problem by dividing it into several simpler problems.
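The following Python sketch grows a small decision tree. It uses Gini impurity as the split criterion, a common CART-style choice swapped in here for illustration rather than the method of any reviewed paper; the training rows, the `is_night` attribute and the depth limit are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (feature, threshold) whose test 'row[feature] <= threshold'
    gives the lowest weighted Gini impurity."""
    best_f, best_t, best_score = None, None, float("inf")
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_f, best_t, best_score = f, t, score
    return best_f, best_t


def majority(labels):
    return max(sorted(set(labels)), key=labels.count)


def build_tree(rows, labels, depth=0, max_depth=2):
    """Recursively divide the problem into simpler sub-problems."""
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: value of the dependent attribute
    f, t = best_split(rows, labels)
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    if not left or not right:
        return majority(labels)

    def subtree(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          depth + 1, max_depth)

    return (f, t, subtree(left), subtree(right))


def predict(tree, row):
    """Follow one root-to-leaf path; the path is an AND of attribute tests."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical training data: [amount, is_night] -> label.
rows = [[20, 0], [35, 0], [50, 1], [900, 1], [1200, 1], [40, 1], [1100, 0], [25, 0]]
labels = ["ok", "ok", "ok", "fraud", "fraud", "ok", "ok", "ok"]
tree = build_tree(rows, labels)
print(tree)                      # (0, 50, 'ok', (1, 0, 'ok', 'fraud'))
print(predict(tree, [1000, 1]))  # fraud
```

The learned path to the "fraud" leaf reads "amount > 50 AND is_night > 0", which is exactly the conjunction of tests described above.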
However, the above methods are not effective in the credit application domain, which has its own data mining methods.
Case-Based Reasoning (CBR): CBR analyses the hardest cases, those that have been misclassified by existing techniques and methods. Its retrieval step uses thresholded nearest-neighbor matching. This method reportedly achieves around twenty percent higher true positive and true negative rates than other techniques applied to credit applications.
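Thresholded nearest-neighbor retrieval, the core of the CBR step, might look like the following Python sketch. The case base, its three scaled features and the threshold values are hypothetical; a full CBR system would also adapt and store the retrieved cases.

```python
import math

# Hypothetical case base of past, hard-to-classify credit applications:
# (feature vector scaled to [0, 1], outcome).
CASE_BASE = [
    ((0.9, 0.1, 0.8), "fraud"),
    ((0.2, 0.7, 0.1), "legitimate"),
    ((0.85, 0.2, 0.9), "fraud"),
    ((0.3, 0.6, 0.2), "legitimate"),
]


def retrieve(query, case_base, threshold):
    """Thresholded nearest-neighbor retrieval: return (outcome, distance) of the
    closest stored case, or (None, distance) if no case is within the threshold."""
    best_outcome, best_dist = None, float("inf")
    for features, outcome in case_base:
        dist = math.dist(query, features)
        if dist < best_dist:
            best_outcome, best_dist = outcome, dist
    if best_dist > threshold:
        return None, best_dist  # no sufficiently similar precedent
    return best_outcome, best_dist


print(retrieve((0.88, 0.15, 0.85), CASE_BASE, threshold=0.3))  # outcome 'fraud'
print(retrieve((0.5, 0.5, 0.5), CASE_BASE, threshold=0.1))     # outcome None
```

Returning `None` when nothing is close enough lets the system hand genuinely novel applications to another method or a human reviewer instead of forcing a weak match.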
Spike Detection and Communal Detection: Communal detection is a whitelist-oriented approach on a fixed set of attributes. It finds genuine social relationships to reduce the suspicion score and is tamper-resistant to fake social relationships. To strengthen communal detection, spike detection is layered on top of it; spike detection is an attribute-oriented approach on a variable set of attributes. Together, communal and spike detection can detect more crime, better account for changes in legitimate behavior and remove redundant attributes.
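A heavily simplified Python sketch of the two layers is given below. It assumes that a "link" is any match between a new application and a recent one on the fixed attributes, that whitelisted link patterns are not counted against the applicant, and that the spike score is a raw count of repeated values on a chosen attribute set within a sliding window. The published algorithms use weighted scores over time windows; the attribute names, whitelist and window size here are hypothetical.

```python
from collections import deque

# Communal detection works on a fixed attribute set.
FIXED_ATTRS = ("name", "phone", "address")
# Hypothetical whitelist of link patterns treated as genuine relationships,
# e.g. a relative sharing the same phone and address but not the same name.
WHITELIST = {frozenset({"phone", "address"})}


def communal_score(app, recent, whitelist=WHITELIST):
    """Count links to recent applications, skipping whitelisted link patterns."""
    score = 0
    for prev in recent:
        shared = frozenset(a for a in FIXED_ATTRS if app[a] == prev[a])
        if shared and shared not in whitelist:
            score += 1
    return score


def spike_score(app, recent, attrs):
    """For a chosen (variable) attribute set, count repeats of each value in the window."""
    return sum(1 for a in attrs for prev in recent if prev[a] == app[a])


def score_stream(apps, window=3, spike_attrs=("phone",)):
    """Score each application against a sliding window of earlier ones."""
    recent = deque(maxlen=window)
    scores = []
    for app in apps:
        scores.append((communal_score(app, recent), spike_score(app, recent, spike_attrs)))
        recent.append(app)
    return scores


apps = [
    {"name": "A. Lee", "phone": "555-0101", "address": "1 High St"},
    {"name": "B. Lee", "phone": "555-0101", "address": "1 High St"},  # whitelisted link
    {"name": "C. Kim", "phone": "555-0199", "address": "9 Low Rd"},
    {"name": "D. Roe", "phone": "555-0199", "address": "4 Elm Ave"},
    {"name": "E. Fox", "phone": "555-0199", "address": "7 Oak Ct"},
]
print(score_stream(apps))  # [(0, 0), (0, 1), (0, 0), (1, 1), (2, 2)]
```

The second application shares two attributes with the first but scores 0 on the communal layer because its link pattern is whitelisted, while the burst of unrelated applicants reusing one phone number raises both scores.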
Credit card fraud has become a primary threat to cardholders. Fraud techniques keep evolving, and so must the countermeasures: the usual mining patterns are sometimes unable to identify new kinds of fraud, so algorithms are updated regularly to prevent and combat them. Data mining is one of the newer techniques for detecting credit card fraud and crime. Compared with the data mining approach, Big Data is more advanced and offers a new perspective on real-time fraud detection and prevention.
Credit card fraud is committed in the following ways:
- Stealing genuine cards
- Falsifying records or personal data
- Using an account in an unauthorized manner, or illegally for personal gain
Credit card fraud can take place both online and offline:
- If an unauthorized user makes the transaction using the PIN, it counts as online fraud.
- If the transaction is made without the PIN, it counts as offline fraud.
Figure 2: Types of Credit card Frauds
Detection of Fraud Using Big Data Techniques vs. Data Mining Techniques
As discussed earlier, data mining detects fraud by classifying the data, clustering it into groups and segmenting it to reveal possible patterns of action used to commit fraud.
Figure 3: Data Mining Approach
Big Data, by contrast, has three primary ways of detecting credit card fraud, as shown in Figure 4.
Figure 4: Big Data Approach
Discussion and Conclusion
On the one hand, global networking has helped business growth and opportunities and has made services mobile and easy to access; on the other hand, it has opened up many new avenues for fraudsters. Building a robust, secure and accurate environment with a user-friendly interface is therefore a major task for banks, and data mining and Big Data are contributing substantially to this goal. However, the use of Big Data is still at an early stage and needs more time and effort to meet the needs of users and banks. Even so, it is the best alternative among the available options, as it can operate on large data sets in a real-time environment.
The research in this paper is based on ‘Data Mining methods to identify and detect data stealing and frauds’, and on how credit card holders experience various kinds of theft involving their cards and which techniques are used to overcome these frauds.
The broad scan covered more than 20 papers, more than 10 of which were directly related to the topic. The research draws on both conventional and technology-based approaches to the kinds of fraud that exist and to the data mining techniques that can overcome detected frauds.
References
Lavrac, N., Motoda, H., Fawcett, T., Holte, R., Langley, P. & Adriaans, P. (2014). Introduction: Lessons learned from data mining applications and collaborative problem solving. Machine Learning, 57(1-2), 13-34.
Brause, Langsdorf, T. & Hepp, M. (2014). Credit card fraud detection by adaptive neural data mining. In Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, pp. 103-106.
Bhattacharyya, Jha, S., Tharakunnel, K. & Christopher, J. (2017). Data mining for credit card fraud: A comparative study. Decision Support Systems, 50, 602-613.
Kerly, S. (2012). A comparative assessment of supervised data mining techniques for fraud prevention. TIST Int. J. Sci. Tech. Res., 1-6.