ETL in Credit Card Fraud Detection
ETL denotes extracting, transforming and loading data from various sources into a single consolidated data set that is stored in a data warehouse or other target systems. Prior to data analysis, the extraction step collects data from websites and other online sources (Alshurideh et al., 2020). The extracted data is then transformed by applying procedures such as calculations and concatenations, and the final data is loaded into the data warehouse systems.
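The three ETL stages described above can be sketched as follows. This is a minimal illustration rather than the pipeline of any particular system: the CSV columns, the table name `transactions` and the use of an in-memory SQLite database as a stand-in for the data warehouse are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; a real pipeline would read from files or online sources.
RAW_CSV = """transaction_id,first_name,last_name,amount_cents
1,Ann,Lee,1250
2,Bob,Ray,99900
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: a calculation (cents to units) and a concatenation (full name)."""
    return [
        (
            int(r["transaction_id"]),
            r["first_name"] + " " + r["last_name"],
            int(r["amount_cents"]) / 100,
        )
        for r in rows
    ]

def load(records, conn):
    """Load: write the transformed records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(transaction_id INTEGER PRIMARY KEY, cardholder TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(text, conn):
    """Run extract, transform and load in order, then return the stored rows."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT * FROM transactions ORDER BY transaction_id"
    ).fetchall()
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` runs the whole chain once. Keeping each stage as its own function makes it easy to swap the source or the target without touching the transformation rules.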
Exploratory data analysis plays an important role in credit card fraud detection systems. Predictive analytics expands the overall capability to detect fraudulent transactions, while descriptive analytics tracks historical data on fraudulent transactions over a period of time. Techniques such as decision tree analysis and logistic regression are used to predict fraud (Kirasich, Smith and Sadler, 2018). The credit card fraud detection feature also uses location scanning to check for patterns of fraud.
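As an illustration of the descriptive step, the sketch below counts fraudulent transactions per month from a labelled history. The `HISTORY` records and the `(date, is_fraud)` layout are hypothetical and only serve to show the idea of tracking fraud over time.

```python
from collections import Counter

# Hypothetical labelled history: (ISO date, is_fraud) pairs.
HISTORY = [
    ("2021-01-05", 0),
    ("2021-01-17", 1),
    ("2021-02-02", 0),
    ("2021-02-09", 1),
    ("2021-02-21", 1),
    ("2021-03-30", 0),
]

def fraud_rate_by_month(history):
    """Return {month: (fraud_count, total_count, fraud_rate)} keyed by YYYY-MM."""
    totals = Counter()
    frauds = Counter()
    for date, is_fraud in history:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[month] += 1
        frauds[month] += is_fraud
    return {m: (frauds[m], totals[m], frauds[m] / totals[m]) for m in sorted(totals)}
```

A summary of this kind is usually the first thing inspected before any predictive model is trained, because it shows whether the fraud rate is stable or drifting over time.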
The EDA process focuses mainly on two predictors: time and the number of transactions that take place on a yearly basis. The total number of transactions is quite small, and only a small fraction of transactions comes close to the maximum.
The proposed system uses logistic regression to build a classifier that helps prevent fraud across all types of credit card transactions. The main strength of the logistic regression model is that it analyzes the data with a high degree of detection accuracy once a data pre-processing step has been applied to the transactions. To detect fraud, the logistic regression model takes the collected data and the main features extracted from it (Mienye, Sun and Wang, 2019). The model is trained on training sets to compute the probability of fraud. When the predicted probability is greater than 50%, the instance is assigned to the fraud class. The system also draws on other fraud detection techniques, such as decision trees and genetic algorithms, to simplify the prediction of fraud.
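The 50% decision rule can be sketched in a few lines. The feature layout (a scaled amount and a foreign-transaction flag), the coefficients in `WEIGHTS` and the value of `BIAS` are hypothetical; in practice they would be learned from the training set rather than written by hand.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fraud_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def classify(features, weights, bias, threshold=0.5):
    """Assign the fraud class (1) only when the probability exceeds the threshold."""
    return 1 if fraud_probability(features, weights, bias) > threshold else 0

# Hypothetical coefficients for [scaled_amount, is_foreign]; not fitted to real data.
WEIGHTS = [2.0, 1.5]
BIAS = -3.0
```

With these illustrative values, `classify([1.0, 1.0], WEIGHTS, BIAS)` flags the transaction, while `classify([0.1, 0.0], WEIGHTS, BIAS)` does not. The threshold is exposed as a parameter because fraud systems often lower it below 0.5 to catch more fraud at the cost of more false alarms.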
The main limitation of logistic regression in fraud detection models is that the model is unable to identify the proper data sources. This creates accuracy problems for the model when it is confronted with such fraudulent activities.
The main strength of decision tree classifiers in financial fraud detection is their ability to handle the main categorical features of credit card transactions. The decision tree performs well and acts as the core component for fraud detection (Mahesh, 2020). The model can handle large amounts of information and is used for data processing activities. The decision tree algorithm also handles complex, high-level data features, and fraud detection engines pair it with advanced techniques such as feature selection. Customers may in turn use various adaptive techniques against the model.
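To show how a decision tree can work directly with categorical features, the sketch below scores candidate splits by weighted Gini impurity and picks the best one, which is the core step repeated at every node of a tree. The rows and the feature names `channel` and `card_type` are hypothetical, and Gini impurity is one common splitting criterion assumed here for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(rows, feature):
    """Weighted Gini impurity after grouping rows by one categorical feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row["fraud"])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())

def best_feature(rows, features):
    """Pick the feature whose split leaves the lowest weighted impurity."""
    return min(features, key=lambda f: split_impurity(rows, f))

# Hypothetical labelled transactions with two categorical features.
ROWS = [
    {"channel": "online", "card_type": "debit", "fraud": 1},
    {"channel": "online", "card_type": "credit", "fraud": 1},
    {"channel": "in_store", "card_type": "debit", "fraud": 0},
    {"channel": "in_store", "card_type": "credit", "fraud": 0},
]
```

In this toy data `best_feature(ROWS, ["channel", "card_type"])` selects `channel`, because splitting on it separates fraud from non-fraud completely while `card_type` leaves both groups mixed.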
The main limitation of decision tree classifiers is that a large amount of data must be handled every day, so the model needs to respond to fraud quickly (Anghel et al., 2018). Data imbalance makes fraudulent transactions difficult to detect. Misclassification is a further major challenge and one of the main limitations of fraud detection methods.
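The effect of data imbalance can be seen with a small worked example. The data below is hypothetical (2 fraudulent transactions in 1,000) and the "model" simply labels everything as legitimate; the point is that accuracy alone can look excellent while every fraud is missed.

```python
def accuracy(actual, predicted):
    """Share of predictions that match the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def recall(actual, predicted, positive=1):
    """Share of truly positive cases (fraud) that the model caught."""
    true_pos = sum(
        1 for a, p in zip(actual, predicted) if a == positive and p == positive
    )
    actual_pos = sum(1 for a in actual if a == positive)
    return true_pos / actual_pos if actual_pos else 0.0

# Hypothetical imbalanced data set: 2 frauds among 1000 transactions.
ACTUAL = [1] * 2 + [0] * 998
# A trivial "model" that predicts legitimate for every transaction.
ALWAYS_LEGIT = [0] * 1000
```

Here `accuracy(ACTUAL, ALWAYS_LEGIT)` is 99.8% while `recall(ACTUAL, ALWAYS_LEGIT)` is zero, which is why fraud models are normally judged on recall or similar class-aware measures in addition to accuracy.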
Exploratory Data Analysis (EDA) in Credit Card Fraud Detection
Logistic regression is used to predict the probability that a transaction belongs to a specific class, such as fraud. Machine learning prediction models evaluate patterns that support fraud detection algorithms. Along with its increased productivity, artificial intelligence has thus emerged as a powerful tool for preventing financial crime (Probst, Boulesteix and Bischl, 2019). It can be used to analyse massive amounts of data to reveal fraud patterns, which can then be used to detect fraud in real time. The idea behind machine learning is that fraudulent transactions display distinctive patterns that separate them from genuine ones. These patterns are recognised by machine learning algorithms, which can then distinguish between fraudsters and genuine customers.
Accuracy Metrics, Correct Predictions, Incorrect Predictions
The accuracy of the logistic regression model is 92.6%.
Correct predictions account for almost 89.6% and incorrect predictions for almost 3%.
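For reference, the sketch below shows how counts of correct and incorrect predictions are usually turned into percentages, under the standard definition in which accuracy is the share of correct predictions. The labels in `ACTUAL` and `PREDICTED` are hypothetical and are not the data behind the figures quoted in this report.

```python
def evaluate(actual, predicted):
    """Summarise predictions as counts and percentages of the evaluated set."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = len(actual)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    incorrect = total - correct
    return {
        "total": total,
        "correct": correct,
        "incorrect": incorrect,
        "accuracy_pct": 100.0 * correct / total,
        "error_pct": 100.0 * incorrect / total,
    }

# Hypothetical true labels and model outputs (1 = fraud, 0 = legitimate).
ACTUAL = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
PREDICTED = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
```

Calling `evaluate(ACTUAL, PREDICTED)` returns the counts alongside the two percentages, so the same helper can be applied to the outputs of either classifier on a held-out test set.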
Critical Evaluation of Decision Tree Classifier
Explanation
Decision tree methods for fraud prevention and risk management begin by gathering and organising as much previously recorded data as possible. Machine learning algorithms can examine huge amounts of data in a very short time, and they can collect and analyse new data in a systematic manner (Rahman et al., 2019). As the speed and volume of eCommerce increase, speed becomes increasingly important. As the volume of transactions for banks grows, so does the strain on rule-based systems and human review. This entails an increase in costs and time, as well as a reduction in precision. The opposite holds for a machine learning algorithm: the program improves as more data comes in, enabling it to detect fraud faster and with more precision.
Accuracy Metrics, Correct Predictions, Incorrect Predictions
The accuracy of the decision tree classifier is 86.3%.
Correct predictions are approximately 80.3%.
Incorrect predictions are almost 6%.
Metric | Logistic Regression | Decision Tree Classifier
Accuracy | 92.6% | 86.3%
Correct predictions | 89.6% | 80.3%
Incorrect predictions | 3% | 6%
Data classification | Machine learning algorithms applied to the training and test data sets | Decision tree classifier model used to detect credit card fraud
It is highly recommended to use logistic regression, together with data mining and predictive modelling techniques, to predict the overall outcomes. The main AI techniques comprise data mining activities such as clustering, classification and segmentation as they relate to credit card fraud (Watson et al., 2019). The logistic regression model usually depends on variables such as “Transaction Id”, “Time”, “Age” and “Fraud”.
References
Alshurideh, M., Al Kurdi, B., Salloum, S.A., Arpaci, I. and Al-Emran, M., 2020. Predicting the actual use of m-learning systems: a comparative approach using PLS-SEM and machine learning algorithms. Interactive Learning Environments, pp.1-15.
Anghel, A., Papandreou, N., Parnell, T., De Palma, A. and Pozidis, H., 2018. Benchmarking and optimization of gradient boosting decision tree algorithms. arXiv preprint arXiv:1809.04559.
Kirasich, K., Smith, T. and Sadler, B., 2018. Random forest vs logistic regression: binary classification for heterogeneous datasets. SMU Data Science Review, 1(3), p.9.
Mahesh, B., 2020. Machine learning algorithms-a review. International Journal of Science and Research (IJSR), 9, pp.381-386.
Mienye, I.D., Sun, Y. and Wang, Z., 2019. Prediction performance of improved decision tree-based algorithms: a review. Procedia Manufacturing, 35, pp.698-703.
Probst, P., Boulesteix, A.L. and Bischl, B., 2019. Tunability: importance of hyperparameters of machine learning algorithms. The Journal of Machine Learning Research, 20(1), pp.1934-1965.
Rahman, A.S., Shamrat, F.J.M., Tasnim, Z., Roy, J. and Hossain, S.A., 2019. A comparative study on liver disease prediction using supervised machine learning algorithms. International Journal of Scientific & Technology Research, 8(11), pp.419-422.
Watson, D.S., Krutzinna, J., Bruce, I.N., Griffiths, C.E., McInnes, I.B., Barnes, M.R. and Floridi, L., 2019. Clinical applications of machine learning algorithms: beyond the black box. BMJ, 364.