Problem Domain and Research Questions
Today the world relies heavily on data. Every process, project and individual generates and depends on specific data for specific purposes. People are active on social media and use many email servers and clients to exchange information with others. This generates a large amount of data every day, containing all kinds of information, both public and private.
Despite all this social activity and openness, everyone is concerned about the privacy and security of their private data. To handle this large amount of data and analyse its privacy level, researchers and developers use Machine Learning and Deep Learning, applying AI and neural networks as data mining algorithms.
Various techniques exist for removing sensitive data from the datasets used to train machine learning models. This report mainly highlights strategies for identifying and protecting sensitive information, together with the processes that help address the security measures required for machine learning data. Sensitive information is data that either the user or the user's legal counsel needs to protect with additional measures such as restricted access or encryption. Examples include a field name, an email address, or billing information that a cloud platform exposes to a data engineer and from which sensitive data could be deduced.
1. Problem domain and research questions
Nowadays, everyone is connected through the internet and shares personal data over shared spaces, intentionally or unknowingly. For the owner of a shared space it is very difficult to distinguish public from private data and apply privacy controls accordingly, because a large amount of data is updated every day. The client needs information about all the data kept on the server as private data, so that special security features can then be applied to it.
2.1 Problem domain
Statistical Predicate Invention
Predicate invention in ILP and hidden variable discovery in statistical learning are two faces of the same problem. Researchers generally accept that this is a key issue in machine learning: once predicates have been invented, learning becomes simpler.
Generalizing across domains
Usually, machine learning is defined as generalizing across tasks from the same domain, and this is what has been done over the past few decades. The main difference between people and machine learners is that people can easily generalize across domains.
Studying maximum structuring levels
Until now, algorithms for statistical relational learning have handled structured inputs and outputs, but they have not been used for learning internal representations. In both statistical learning and ILP, the learned models typically have just two levels of structure: in support vector machines, for example, the two levels are the kernels and their weighted combination, while in ILP they are clauses and the conjunctions of literals within them. Although two levels are in principle enough to represent any function of interest, they are not an efficient way to represent most functions.
Learning combination and inference
Inference is the most crucial factor in structured learning, and it can easily become the bottleneck: one can end up spending more time on inference than on learning itself. Learners must therefore be biased toward models over which inference is efficient, and should be designed from scratch to learn models that are powerful yet still admit efficient inference. Important problems here include entity resolution, schema matching, and concept alignment.
2.2 Research questions
Q-1: How can machine learning algorithms be used to identify sensitive data?
Machine learning workflows rely on higher-level compliance frameworks such as PCI-DSS and HIPAA, which define best practices for protecting sensitive data and inform clients and customers about the various ways sensitive data should be handled. These certifications also allow clients to make informed decisions about the security of their data.
Q-2: How can sensitive data be removed or masked in machine learning?
In some cases, removing the sensitive data reduces the value of the dataset; whenever this happens, the sensitive data must instead be masked using one or more processes. Depending on the structure of the dataset, different removal approaches apply.
Whenever we are unable to remove the sensitive information, we need to mask it instead. Machine learning offers various techniques for masking data.
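As a minimal sketch of what masking can look like in practice (assuming Python and the standard re module), the snippet below replaces detected values with placeholder tokens; the patterns and tokens are illustrative assumptions rather than the project's actual masking rules.

```python
import re

# Illustrative masking rules only; a real deployment would use vetted,
# context-aware detection rather than these simple regular expressions.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def mask_sensitive(text: str) -> str:
    """Replace detected sensitive values with fixed placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = CARD.sub("[CARD_NUMBER]", text)
    return text

print(mask_sensitive("Contact jane.doe@example.com, card 4111 1111 1111 1111"))
```

Masking in this way preserves the row for model training while removing the raw value itself.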
Q-3: Is it difficult to handle sensitive data in large data sets?
The concept of ownership becomes complicated for machine learning datasets, which contain data from many users. Data engineers must usually be granted access to the complete dataset in order to work with it. Reducing the resolution of data fields or encrypting them is often used as a deterrent measure, but on its own it is not sufficient for a machine learning dataset.
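As a hedged illustration of what "reducing the resolution" of data fields can mean, the sketch below coarsens quasi-identifiers and drops direct identifiers before a dataset is handed over; the field names and bucket sizes are assumptions made for the example, not project requirements.

```python
# Sketch of resolution reduction: coarsen quasi-identifiers and drop direct
# identifiers before granting engineers access to a dataset.
def reduce_resolution(record: dict) -> dict:
    coarse = dict(record)
    coarse["age"] = (record["age"] // 10) * 10           # 37 -> 30 (decade bucket)
    coarse["postcode"] = record["postcode"][:2] + "**"    # keep only the region
    coarse.pop("email", None)                             # remove direct identifier
    return coarse

print(reduce_resolution({"age": 37, "postcode": "2150", "email": "a@b.com"}))
# {'age': 30, 'postcode': '21**'}
```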
2. Background and Project Objective
Machine Learning and Deep Learning are methods and branches of Artificial Intelligence (AI) widely used by researchers and developers to solve real-life problems. The approach extracts information from data using learning strategies instead of writing lengthy hand-coded rules. Machine learning for data science is one of the trending research topics these days, and extracting information about sensitive data from huge databases is the key benefit of this research.
3.1 Summary of Literature Review
Machine learning is used in a wide variety of computing tasks where designing explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders, and computer vision. An edge-based approach basically does not work in this day and age: an enterprise must embed security controls in the data itself, or ensure they stay attached to the data. However, there are not unlimited resources to address cybersecurity threats, so it is essential to understand what is sensitive, and this requires classification tools. We have built a strategy that handles scalable data classification using machine learning. By automating processes, classifying documents and other kinds of unstructured data, and setting up restrictions (for example, what is for internal use and what can be shared with the rest of the world), it is possible to approach cybersecurity more effectively.
In order to automate the classification process and make it more scalable, it is necessary to give the software classifier samples of sensitive data and samples of non-sensitive data. Machine learning enables the classifier to learn from these examples and extract rules about what makes documents sensitive or not sensitive. Once the classifier has built models for sensitive and highly confidential documents, it can handle the classification process on its own. The next step is to set up the appropriate security controls. This includes layering additional security over existing controls, for example dynamic authentication, which uses cognitive fingerprinting along with other behavioural techniques to recognize anomalies. It is also possible to use the classification system to automate other tasks; for instance, the framework can encrypt appropriate data through an algorithm or add extra authentication requirements, such as multi-factor authentication, for specific kinds of documents.
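As a minimal sketch of this train-from-examples workflow (not the project's actual pipeline), the snippet below trains a simple scikit-learn text classifier on a handful of hypothetical sensitive and non-sensitive documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Purely illustrative training samples; real training data would be labelled
# documents drawn from the organization's own repositories.
documents = [
    "Customer card number 4111 1111 1111 1111, expiry 09/26",
    "Patient diagnosis and treatment plan attached",
    "Agenda for the weekly team stand-up meeting",
    "Public press release about the new office opening",
]
labels = ["sensitive", "sensitive", "non-sensitive", "non-sensitive"]

# TF-IDF features plus a linear classifier learn simple lexical rules about
# what makes a document sensitive.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(documents, labels)

print(classifier.predict(["Invoice with the client's billing address"]))
```

Once such a classifier performs acceptably on held-out documents, its predictions can feed the downstream controls (encryption, extra authentication) described above.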
3.2 Objectives of the Project
The main methodologies involved in machine learning identification of sensitive private data are:
- Diagnosis System
- Reliability of Data
- Supervised Learning
- Classification of Data
- Optimization of Data
The software and hardware requirements necessary to find sensitive data in a large database are listed here [15, 26]. To handle data from the backend we need to parse some SOAP APIs. To verify these APIs we will use popular API tools such as Postman and SoapUI, which exercise the APIs with different methods such as POST, GET and DELETE. Each call returns JSON in object and array form, through which we verify the data.
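The same POST/GET/DELETE checks that Postman or SoapUI perform interactively can also be scripted. The sketch below assumes a hypothetical REST endpoint that returns JSON records with an id field, purely to illustrate the verification flow.

```python
import requests

# Hypothetical endpoint used only to illustrate the verification flow.
BASE_URL = "http://localhost:8080/api/records"

# POST: create a record and read the JSON object returned by the server.
created = requests.post(BASE_URL, json={"field_name": "email", "value": "user@example.com"})
record = created.json()
print(created.status_code, record)

# GET: fetch the record back and verify the stored data.
fetched = requests.get(f"{BASE_URL}/{record['id']}")
assert fetched.json()["field_name"] == "email"

# GET on the collection returns a JSON array of records.
print(len(requests.get(BASE_URL).json()))

# DELETE: remove the record again.
print(requests.delete(f"{BASE_URL}/{record['id']}").status_code)
```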
4.1 Hardware Requirements
This is the hardware that will be used in this project to prepare the test system, run the ML scanner, carry out performance testing, and install all related tools.
Table-1: Hardware Specifications

Name of Hardware | Quantity | Project Requirement | Cost Estimate
Test System (RAM: 8 GB or above, HDD: 500 GB or above, Processor: i3 or above) | 1 | Test system for data analyzer and ML design | 500 USD
Wireless Router | 1 | To prepare the single wireless network for testing, enabled with a WPA password | 100 USD
Android Mobile | 1 | To generate runtime sensitive private data to test its identification by the ML system | 200 USD
4.2 Software Specifications
These are the software tools and programming languages that will be used in this project to prepare the test system, run the ML scanner, carry out performance testing, and install all related tools.
Table-2: Software Specifications

Operating System | Linux 64-bit
Programming Language | Python and R
Documents | MS Word, Adobe Reader
Analysis | ML algorithms, R Studio
Server | Apache Server
API Tool | Postman, SoapUI
The project flow and time durations are listed here. The flow consists of eleven levels of work divided into twelve weeks. As the duration table shows, most of the work has already been completed according to this schedule.
Table-3: Weekly Activities

Week No. | Activities
1 | Project plan and analysis phase (machine learning to identify sensitive private data): carried out research work on machine learning; analysed and discussed each aspect of machine learning as a group
2 | Studied the GDPR (EU General Data Protection Regulation), the rules applied to companies to protect people's private data, under which companies must keep users' personal information safe and hidden from other people; studied API tools, which help hide people's personal data by showing only the information that is necessary and concealing all of the user's sensitive information
3 | Designing phase (interface): selected five different machine learning API tools; each of the five members prepared one API tool as a design approach for the project, with one tool then to be selected for the final project
4 | Implementation phase according to the paper (application design): installed and ran each tool from the design approaches (Postman, Karate, NLTK, Swagger, REST); after comparing the tools, all members found NLTK the best tool for identifying sensitive data, so NLTK was selected as the final project tool; designed the simulation of the different machine learning and regression sections; prepared the final design template on the basis of NLTK and started the final work on NLTK
5 | Design of simulation of different sections in machine learning and regression techniques
6 | Implementation of supervised learning and prediction model
7 | Calculate and analyse results
8 | Evaluate the scenario for data mining and identification
9 | Compare results using graphs and tables
10 | Testing
11 | Documentation work
12 | Project presentation
Table-4: Roles and Responsibilities of each team member

Project Plan and Analysis Phase (Machine Learning to Identify Sensitive Private Data): Kiran reviewed privacy-preserving machine learning; Mandeep reviewed sensitive information acquisition; Kusum reviewed intelligent assistance for the data mining process; Sandeep reviewed automating anomaly detection; Rajbeer reviewed risk analysis of protocols for prediction.

Designing Phase (Interface): each team member discussed the summary of his or her literature review.

Implement Phase according to the paper (Application Design): discussed and written by all team members.

Design of simulation of different sections in Machine Learning and Regression Techniques: Kiran prepared the system and installed the ML tool and Apache Server; Mandeep prepared the background section, including GDPR; Kusum prepared the approach for creating the MLTK framework to work on machine learning; Sandeep worked on the input of data in the form of sensitive-data output; Rajbeer worked on the Postman API tool.

Implementation of Supervised Learning and Prediction Model: Kiran managed the server hosting the private sensitive data; Mandeep managed the server hosting the private sensitive data; Kusum applied the approaches given in the reference papers to implement the prediction model; Sandeep worked on different approaches to choose the best one; Rajbeer described the different methodologies regarding the prediction model.

Calculate and Analysis Result: Kiran followed the Swagger API testing procedure to describe the structure of the API so that the machine could read it; Mandeep used BI approaches to analyse the unstructured data with Hadoop and the NoSQL database Cassandra, as it is well suited to streaming data; Kusum prepared the flow chart for the algorithm that generates runtime private data to test its identification by the ML system; Sandeep worked on the Karate tool, released as an open-source tool by Intuit; Rajbeer ran the Postman tool to test the REST and SOAP APIs prepared to test data integrity.

Evaluate the scenario for data mining and identification: all team members worked together on the evaluation of data mining and identification, prepared the business case study for the proposed methodologies of the ML-based identity system, and prepared the plan to obtain data from the R-based data mining system.

Compare result using Graph and Table: all team members worked together on comparing the results using graphs and tables, described all implementation steps in the detailed design diagram, and prepared the result graphs and tables to compare with the base model.
The Gantt chart characterizes the work we have completed so far. We have arranged a three-month (twelve-week) plan for the project, and all the tasks completed to date are listed alongside the project Gantt chart.
The structure of the project is aimed at understanding the Australian Privacy Act and other international data privacy requirements, identifying the detailed compliance needs, and developing a customized AI-based solution for those compliance needs.
The project outcomes:
Team members can work on challenging platforms and various innovative features. The AI solutions of CBA are also used by leading businesses in Australia. There is the opportunity to work with people living in Australia and to live and breathe the product, design and technology, so that our work has a real impact. Team members can also learn technologies that are fundamentally changing the IT and business markets. Hands-on training and guidance for self-learning would be given. There is also potential job placement and an opportunity for career growth with the company, and each team member would receive a referral and an experience letter on project completion.
When we have completed this project, we will understand how to:
Build a custom machine learning model using the easy graphical tools from the IBM Watson stack, and use Natural Language Understanding (NLU) with the model to identify personal and private data.
Make use of regular expressions to augment the NLU for the identification of metadata. Configure which types of personal data need to be identified, assign a weight to each type, and compute a score. View the score and the identified personal data in a tree structure for better visualization, and make the output consumable by other applications. The Cognitive Compliance Processor solution addresses the requirement of identifying personal data in various unstructured documents.
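A hedged sketch of how regular expressions can augment NLU output with per-category weights and an overall score is shown below; the categories, patterns and weights are assumptions made for illustration, not the configuration of the Cognitive Compliance Processor.

```python
import re

# Illustrative categories, patterns and weights; a real configuration would be
# driven by the compliance requirements being enforced.
PATTERNS = {
    "email":    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), 3),
    "phone":    (re.compile(r"\+?\d[\d\s-]{7,}\d"), 2),
    "tax_file": (re.compile(r"\b\d{3}\s?\d{3}\s?\d{3}\b"), 5),
}

def score_document(text: str) -> dict:
    """Return detected personal data per category and an overall weighted score."""
    findings, score = {}, 0
    for category, (pattern, weight) in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[category] = matches
            score += weight * len(matches)
    return {"score": score, "findings": findings}

print(score_document("Call +61 2 9999 9999 or email jane@example.com"))
```

The per-category findings map naturally onto the tree view described above, with the weighted score summarizing how sensitive a document is.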
The solution relies heavily on advanced natural language processing, machine learning and text analytics to precisely identify sensitive, personal or private data in large volumes of unstructured data, and then alerts the other systems. It will enable organizations to automatically detect private and sensitive data at a scale and speed that would be difficult to accomplish even with a large workforce.
The overall objective of the project is to develop an MVP application that demonstrates the viability of applying AI capabilities to process large volumes of unstructured information at scale. The project also uses a "boiler-plate" for fast development. The Cognitive Compliance Processor would employ a combination of logical rules, text pattern recognition, machine learning and language processing to identify private and sensitive data in unstructured data sources, such as documents, enterprise applications like ERP and CRM systems, service call logs, and customer emails, that might violate compliance requirements such as privacy acts or the GDPR.
5.5 Individual design approach 1
NLTK
We easily forget the amount of data stored in our daily conversations; with advances in the digital landscape, Natural Language Processing has become an expanding field within machine learning and artificial intelligence. Text arrives in various forms, from individual word lists to paragraphs and sentences containing special characters. Transforming this text into something an algorithm can use is a difficult process that involves several stages: cleaning, followed by annotation, normalization and then analysis. There are various methods for pre-processing, including the following:
Capitalization: Text contains various capitalizations that mark the start of a sentence or emphasize proper nouns. The common approach is to reduce everything to lower case for simplicity but, at the same time, keep in mind that this can change meaning: if the word "US" is changed to "us", the meaning of the whole sentence changes.
Tokenization: This describes breaking paragraphs up into sentences and then sentences into individual words. For language-specific algorithms, the Punkt models from NLTK can be used.
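A minimal sketch of these two pre-processing steps with NLTK follows; it assumes the Punkt tokenizer models have already been downloaded, and the sample sentence is purely illustrative.

```python
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

# One-time setup (run once): nltk.download("punkt")
text = "The US team met in Sydney. They reviewed the billing data."

sentences = sent_tokenize(text)                  # paragraph -> sentences
tokens = [word_tokenize(s) for s in sentences]   # sentences -> words

# Lower-casing simplifies matching, but note that "US" -> "us" changes meaning.
lowered = [[w.lower() for w in sent] for sent in tokens]
print(sentences)
print(tokens)
print(lowered)
```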
Even though preparing the text is a complicated process that requires choosing the right tools, many pre-built libraries and services can be used instead of mapping words and terms manually. Once the dataset is prepared, the machine learning techniques can be applied.
The API tools work with the standard HTTP request methods, which are as below:
POST Request – For Creating or Updating data,
PUT Request – For Updating data,
GET Request – For Retrieving/Fetching data and
DELETE Request – For Deleting data.
Request URL – the address to which the HTTP request is made.
Request Headers – the request headers contain key-value pairs for the application. I have mainly used two key-value pairs, which are as follows:
Content-Type – describes the format of the object data. The content type I have used most for requests and responses is application/json.
Authorization – an authorization token, included with requests, used to identify the requester.
Request Body – contains the data, if any (depending on the request method), to be sent with the request. I have used the raw form of data for sending requests.
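Put together, these request components can be exercised from a script as well as from Postman; the URL and token below are hypothetical placeholders used only to show how the pieces fit.

```python
import requests

url = "http://localhost:8080/api/records"        # Request URL (placeholder)
headers = {
    "Content-Type": "application/json",           # format of the request body
    "Authorization": "Bearer <token>",            # identifies the requester
}
body = {"field_name": "email", "value": "user@example.com"}  # Request body (raw JSON)

# POST request: create or update data on the server.
response = requests.post(url, headers=headers, json=body)
print(response.status_code, response.json())
```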
Swagger is an approach that allows the developer to describe the structure of an API so that a machine can read it. There are different APIs and tools for following the Swagger approach. The steps to be followed are:
- Open the swagger editor
- Write swagger definitions according to the API in the swagger editor.
- The Swagger definition generates the code for the API together with the user interface (UI). The generated UI and the Swagger Codegen output are then processed by the system automatically. The compiled definition is deployed to the API so that the machine can perform operations on its own by learning about the definition described in Swagger, using different libraries and APIs. Swagger follows a top-down, or stepwise, approach; a minimal example of a Swagger definition is sketched below.
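As a hedged illustration of the "write Swagger definitions" step, the snippet below emits a minimal Swagger 2.0 definition for a hypothetical records API as JSON; the paths and fields are assumptions made for the example.

```python
import json

# Minimal Swagger 2.0 definition for a hypothetical records API.
swagger_definition = {
    "swagger": "2.0",
    "info": {"title": "Sensitive Data Records API", "version": "1.0.0"},
    "paths": {
        "/records": {
            "get": {
                "summary": "List stored records",
                "responses": {"200": {"description": "A JSON array of records"}},
            },
            "post": {
                "summary": "Create a record",
                "responses": {"201": {"description": "Record created"}},
            },
        }
    },
}

# The resulting JSON can be pasted into the Swagger editor or fed to codegen.
print(json.dumps(swagger_definition, indent=2))
```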
Karate is a REST API testing tool used here to test the machine learning APIs. It was released as an open-source tool by Intuit and automates API testing, providing all the required features and dependencies for testing an API. It uses the Java language to process and test the ML API. An important feature of Karate is that you do not need to write step definitions: Karate creates them automatically, because it is a DSL that works on Cucumber-JVM. After testing of the API is completed, Karate provides a report of the generated test results.

5.9 Individual design approach 5
In order to analyse unstructured data, the first step is to save the data in a proper database. For storing the data, we build algorithms to process it on platforms such as Hadoop (the Hadoop Distributed File System) and the NoSQL database Cassandra, as it is well suited to streaming data.
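A hedged sketch of this storage step using the DataStax Python driver is shown below, assuming a local Cassandra node and an already-created keyspace named project_data; the table and column names are illustrative only.

```python
from uuid import uuid4
from cassandra.cluster import Cluster

# Connect to a local Cassandra node and an existing keyspace (assumption).
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("project_data")

# Table for raw unstructured documents awaiting analysis.
session.execute("""
    CREATE TABLE IF NOT EXISTS raw_documents (
        doc_id uuid PRIMARY KEY,
        source text,
        content text
    )
""")

# Insert one streaming record; parameters are bound with %s placeholders.
session.execute(
    "INSERT INTO raw_documents (doc_id, source, content) VALUES (%s, %s, %s)",
    (uuid4(), "customer_email", "Please update my billing address ..."),
)
```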
The second step is to fetch and analyse the sensitive or important data from the datasets stored in the databases. For that, a powerful business intelligence tool is needed to process the data and provide values or output in the form of graphs and GUI interfaces.
The output generated would be in the form of graphs and images, which could be further deployed on a dashboard or given to the client according to the requirements.
5.10 Budget and References
The estimated project costs are included below.
Important note: the Linux operating system is open source, so the free version can be used. The costs below cover the hardware used for the simulation.
Table-5: Hardware Cost

Name of Hardware | Quantity | Project Requirement | Cost Estimate
Test System | 1 | Test system for data analyser and ML design | 500 AUD
Wireless Router | 1 | To prepare the single wireless network for testing, enabled with a WPA password | 100 AUD
Android Mobile | 1 | To generate runtime sensitive private data to test its identification by the ML system | 200 AUD
Research methods to be used for the next stage of the project
There are two main types of data collection process: the qualitative approach and the quantitative approach. In the qualitative approach, the research is based on evaluating aspects of the topic related to the subject of the research. It draws on the points of view of the people associated with the project, enabling the analyst to conduct an in-depth analysis of the field in which the project is being developed.
The main instruments for this are interviews and questionnaires. Quantitative analysis, by contrast, depends on the collection of data that is processed in a formal, numerical way. In this study, the data will be analysed with the help of the qualitative data collection process in order to understand the importance of machine learning techniques and artificial intelligence.
Data Collection Technique:
Two techniques can be considered for data collection: the primary data collection method and the secondary data collection method. Primary data collection is basically done through interviews and questionnaires; it captures the points of view of a certain group of people associated with the project, and to some extent the targeted respondents are experts in the field of research. In secondary data collection, the data used has already been collected and published by other researchers. For this research in the field of machine learning and artificial intelligence, secondary data will be used.
Structure of Secondary Data:
Secondary data will be used for the development of the research. Such data is valuable because the underlying research has already been performed by someone else, so research time can instead be spent on further analysis of additional resources. The research can thereby be refined considerably, and the results obtained from these resources can be approximated more accurately. Hence, secondary data will be used for the analysis in this research.
Conclusion, limitations and future work
In conclusion, machine learning tools can be used to develop a process for removing very sensitive data from datasets. The identification and protection of such data has been highlighted in this research. This project shows that sensitive data can be protected efficiently, and further research would increase the possibilities of improved security measures, such as encryption procedures, which would in turn enhance processes such as billing and data protection plans. The NLTK tool has been identified as the final project tool; the final design will be prepared on that template, and future improvements to NLTK will also be utilized in the project.
There has been huge improvement in the fields of Machine Learning and Artificial Intelligence. NLTK is an open-source tool used for analytics; like any other tool, it can be used to build larger applications and visions. A great deal of knowledge can be clustered together to get at the meaning and relationships in documents. Previously the internet was obsessed with search alone (which is what made Google such a giant today). The natural language processing implementation is used for cleaning the data; the pre-processing steps used with NLTK (capitalization handling and tokenization) are the same as those described in the individual design approach above.
Although preparing the text is a complicated process that requires the right tools, a large number of pre-built libraries are available for the purpose. The machine learning process can be applied once the dataset is prepared. NLTK is a batteries-included pedagogical resource: it comes with over 50 corpora and lexical resources that map well onto machine learning algorithms, particularly classification algorithms using Maximum Entropy. In addition, NLTK ships with many utilities, including streaming reader classes that let you access corpora from disk in a memory-safe and efficient manner, as well as utilities for frequency analysis and tree construction.
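For example, the bundled frequency-analysis utility can be used as sketched below, assuming the Punkt models are available; the sample sentence is illustrative.

```python
from nltk import FreqDist
from nltk.tokenize import word_tokenize

# One-time setup (run once): nltk.download("punkt")
text = "Email the client. The client email contains the billing address."
tokens = [w.lower() for w in word_tokenize(text) if w.isalpha()]

freq = FreqDist(tokens)        # frequency analysis over the token stream
print(freq.most_common(3))     # e.g. [('the', 3), ('client', 2), ('email', 2)]
```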
Most NLP libraries are written in Java or Python, and there is no compelling reason to re-implement these algorithms in another language. Currently, most applications are web-based or mobile-based, and compiling to a native executable only gains some performance, which is not important for every application. NLTK also comes bundled with many parser algorithms, including the modern and excellent Viterbi parser.
NLTK was created by academics who were more interested in teaching the principles of NLP than in production implementations. NLTK is not production ready (except for its stemmers, lemmatizers, tokenizers, and utility code such as FreqDist, ConditionalFreqDist, Tree, CorpusView, and StreamBackedCorpusReader), it is not lightweight, it is not generally applicable (it requires domain knowledge), and it is not magic, but it is a very useful tool.
NLTK can be used with a big data processing engine such as Apache Spark, but only in a standalone mode, that is, on a single machine with a single consolidated dataset; NLTK cannot be used for distributed processing. NLTK includes support for the Malt Parser (nltk.parse.malt.MaltParser), which provides dependency parses. Unfortunately, NLTK does not really support multilingual chunking and tagging out of the box, i.e. there are no pre-trained POS taggers for languages other than English.
References
[1] J. D. Procaccino, J. M. Verner, K. M. Shelfer, and D. Gefen, “What do software practitioners really think about project success: an exploratory study” Journal of Systems and Software, Vol. 78, no. 2, pp. 194-203, 2005.
[2] K. Schwalbe, Information Technology Project Management, 3rd ed. Boston: Course Technology, 2004.
[3] J. D. Procaccino and J. M. Verner, “Software project managers and project success: An exploratory study” Journal of Systems and Software, Vol. 79, no. 11, pp. 1541-1551, 2006.
[4] A.-P. Bröhl and W. Dröschel, The V-Model. Oldenbourg-Verlag, 1995.
[5] T. Abdel-Hamid and S. Madnick, Software Project Dynamics. Prentice Hall, 1991.
[6] M.Deininger and K. Schneider, “Teaching Software Project Management by Simulation – Experiences with a Comprehensive Model” In Conference on Software Engineering Education (CSEE), ser. Lecture Notes in Computer Science 750, Austin, Texas, 1994, pp. 227-242.
[7] A. Jain and B. Boehm, “SimVBSE: Developing a Game for Value-Based Software Engineering” In Software Engineering Education and Training, 2006. Proceedings. 19th Conference on, 2006, pp. 103-114.
[8] K. Schneider, Ausführbare Modelle der Software-Entwicklung: Struktur und Realisierung eines Simulationssystems. vdf, 1994.
[9] M. Deininger and K. Schneider, “Teaching Software Project Management by Simulation – Experiences with a Comprehensive Model” In Conference on Software Engineering Education (CSEE), ser. Lecture Notes in Computer Science 750, Austin, Texas, 1994, pp. 227-242.
[10] D. Rodriguez, M. Satpathy, and D. Pfahl, “Effective software project management education through simulation models. An externally replicated experiment” In International Conference on Product Focused Software Process Improvement (PROFES), ser. Lecture Notes in Computer Science 3009. Kansai Science City, Japan: Bomarius, F., 2004.
[11] K. Schwalbe, Information Technology Project Management, 3rd ed. Boston: Course Technology, 2004.
[12] K. Schneider, “A Descriptive Model of Software Development to Guide Process Improvement” In Conquest. Nürnberg, Germany: ASQF, 2004.
[13] J. Klünder, K. Schneider, F. Kortum, J. Straube, L. Handke, and S. Kauffeld, Human-Centered and Error-Resilient Systems Development, vol. 9856, pp. 111, 2016.
[14] W. D. Scott & Co, Information Technology in Australia: Capacities and opportunities: A report to the Department of Science and Technology. [Microform]. W. D. Scott & Company Pty. Ltd. in association with Arthur D. Little Inc. Canberra: Department of Science and Technology, 1984.
[15] “Functional Organizational Structure Advantages”, Smallbusiness.chron.com, 2018. [Online]. Available: https://smallbusiness.chron.com/functional-organizational-structure-advantages-3721.html. [Accessed: 11- Sep- 2018].
[16] “Consensus-based transfer linear support vector machines for decentralized multi-task multi-agent learning”, Rui Zhang and Quanyan Zhu, 2018 52nd Annual Conference on Information Sciences and Systems (CISS), 2018, pp. 1-6.
[17] “Preserving Model Privacy for Machine Learning in Distributed Systems”, Qi Jia, Linke Guo, Zhanpeng Jin and Yuguang Fang, IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 8, pp. 1808-1822, 2018.
[18] “Scrutinizing action performed by user on mobile app through network using machine learning techniques: A survey”, Rutuja A. Kulkarni, 2018 2nd International Conference on Inventive Systems and Control (ICISC), 2018, pp. 860-863.
[19] “Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting”, Samuel Yeom, Irene Giacomelli, Matt Fredrikson and Somesh Jha, 2018 IEEE 31st Computer Security Foundations Symposium (CSF), 2018, pp. 268-282.
[20] “Distributed Differentially Private Stochastic Gradient Descent: An Empirical Study”, István Hegedus and Márk Jelasity, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2016, pp. 566-573.
[21] S. Samet and A. Miri, “Privacy-preserving protocols for perceptron learning algorithm in neural networks”, 2008, pp. 10-65 – 10-70.
[22] K. Lin and M. Chen, “On the Design and Analysis of the Privacy-Preserving SVM Classifier“, IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 11, pp. 1704-1717, 2011.
[23] N. Chen, B. Ribeiro, A. Vieira, J. Duarte and J. Neves, “Extension of Learning Vector Quantization to Cost-sensitive Learning”, International Journal of Computer Theory and Engineering, pp. 352-359, 2011.
[24] A. Sarwate and K. Chaudhuri, “Signal Processing and Machine Learning with Differential Privacy: Algorithms and Challenges for Continuous Data”, IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 86-94, 2013.
[25] C. Guivarch and S. Hallegatte, “2C or Not 2C?”, SSRN Electronic Journal, 2012.
[26] M. Senekane and B. Taele, “Prediction of Solar Irradiation Using Quantum Support Vector Machine Learning Algorithm”, Smart Grid and Renewable Energy, vol. 07, no. 12, pp. 293-301, 2016.
[27] “Confusion-Matrix-Based Kernel Logistic Regression for Imbalanced Data Classification”, Miho Ohsaki, Peng Wang, Kenji Matsuda, Shigeru Katagiri, Hideyuki Watanabe and Anca Ralescu.
[28] “Learning privately: Privacy-preserving canonical correlation analysis for cross-media retrieval”.
[29] “An approach to identifying cryptographic algorithm from ciphertext”, Cheng Tan and Qingbing Ji, 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN), 2016, pp. 19-23.
[30] “Automated big security text pruning and classification”, Khudran Alzhrani, Ethan M. Rudd, C. Edward Chow and Terrance E. Boult, 2016 IEEE International Conference on Big Data (Big Data), 2016, pp. 3629-3637.
[31] “Censoring Sensitive Data from Images”, Stefan Postavaru and Ionut-Mihaita Plesea, 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2016, pp. 443-448.
[32] “Insider Threat Detection with Face Recognition and KNN User Classification”, M. Subrahmanya Sarma, Y. Srinivas, M. Abhiram, Lakshminarayana Ullala, M. Sahithi Prasanthi and J. Rojee Rao, 2017 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), 2017, pp. 39-44.
[33] “SmarPer: Context-Aware and Automatic Runtime-Permissions for Mobile Devices”, Katarzyna Olejnik, Italo Dacosta, Joana Soares Machado, Kévin Huguenin, Mohammad Emtiyaz Khan and Jean-Pierre Hubaux, 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 1058-1076.
[34] “Cloak and Swagger: Understanding Data Sensitivity through the Lens of User Anonymity”, Sai Teja Peddinti, Aleksandra Korolova, Elie Bursztein and Geetanjali Sampemane, 2014 IEEE Symposium on Security and Privacy, 2014, pp. 493-508.
[35] “A Framework of Privacy Decision Recommendation for Image Sharing in Online Social Networks”, Donghui Hu, Fan Chen, Xintao Wu and Zhongqiu Zhao, 2016 IEEE First International Conference on Data Science in Cyberspace (DSC), 2016, pp. 243-251.
[36] “Toward intelligent assistance for a data mining process: an ontology-based approach for cost-sensitive classification”, A. Bernstein, F. Provost and S. Hill, IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 503-518, 2005.
[37] “Biased locality-sensitive support vector machine based on density for positive and unlabeled examples learning”, Lujia Song, Bing Yang, Ting Ke, Xinbin Zhao and Ling Jing, 11th International Symposium on Operations Research and its Applications in Engineering, Technology and Management (ISORA 2013), 2013, pp. 1-6.
[38] “Protecting data from malware threats using machine learning technique”, Mozammel Chowdhury, Azizur Rahman and Rafiqul Islam, 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2017, pp. 1691-1694.
[39] “Sensitive Information Acquisition Based on Machine Learning”, Wenqian Shang, Hongjia Liu and Rui Lv, 2012 International Conference on Industrial Control and Electronics Engineering, 2012, pp. 1117-1119.
[40] “Privacy preserving extreme learning machine classification model for distributed systems”, Ferhat Özgür Çatak, Ahmet Fatih Mustaçoğlu and Ahmet Ercan Topçu, 2016 24th Signal Processing and Communication Application Conference (SIU), 2016, pp. 313-316.
[41] “Privacy Preserving Decision Tree Learning Using Unrealized Data Sets”, Pui Kuen Fong and Jens H. Weber-Jahnke, IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 2, 2012.
[42] “Differentially private query learning: From data publishing to model publishing”, Tianqing Zhu, Ping Xiong, Gang Li, Wanlei Zhou and Philip S. Yu, 2017 IEEE International Conference on Big Data (Big Data), 2017, pp. 1117-1122.
[43] “Private and Scalable Personal Data Analytics Using Hybrid Edge-to-Cloud Deep Learning”, Seyed Ali Osia, Ali Shahin Shamsabadi, Ali Taheri, Hamid R. Rabiee and Hamed Haddadi, Computer, vol. 51, no. 5, pp. 42-49, 2018.
[44] “Privacy-Preserving Data Classification and Similarity Evaluation for Distributed Systems”, Qi Jia, Linke Guo, Zhanpeng Jin and Yuguang Fang, 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), 2016, pp. 690-699.