Background and Literature Review
The Job of a Data Scientist
The job of a data scientist is evolving with each passing day. My MSc project aims to help data scientists approach a data science project better. The project will work as a consultative system that assists a data science expert. This advisory system will be based on a domain ontology. Ontologies, in general, formalize the common characteristics of a particular field of research so that research and development work becomes easier (Chiff, 2018).
Before digging deep into the concepts of domain ontologies, the lifecycle of the project, and the various techniques involved, there are some basic terms and characteristics that should be thoroughly examined.
What is an Ontology?
In this era of information and technology, every aspect of technology is evolving at every passing moment (Maxim, 2018). The relationships, values, and properties of a technology also change as part of its evolution. The word ‘ontology’ has a broader meaning than its technical one: in the dictionary sense, ontology is the study of the existence of things.
In the field of information technology, an ontology is a set of generally accepted statements in a field of study. In artificial intelligence and data science, an ontology is the set of concepts, interactions, relations, and events through which researchers and analysts share information with each other. They are the pre-agreed ground rules of the respective field of study (Wang, 2016).
Every field, as it evolves, creates complexities and differences of opinion between experts. In the field of technology especially, it has been observed that researchers, data scientists, and analysts often come up with modified definitions, rules, and regulations for a field of study. Machine learning, artificial intelligence, big data, and data science are some of the developing industries under information technology. Hence, experts believe that there should be predefined values, arguments, and interactions for each field of study, based on which experts can share their ideas and innovate products and revolutionary technologies (Tandon, 2018). This idea of streamlining the foundation of the thought process is known as ontology.
Ontologies limit, and may overcome, the complexity of ideas and fundamentals in a field of study. An ontology can also be seen as the controlled vocabulary of a field, maintained globally across an area of technology (Zheng, 2017). This accepted vocabulary is the foundation on which researchers develop research papers that improve the field. Because new research papers are developed on top of shared ontologies, translating those papers becomes easier, which matters for their global acceptance (Lohr, 2018).
Domain Ontologies
Ontologies provide the channel, or foundation, on which further studies can be conducted for a particular domain. Consider a fact-based process in the field of data science (Zhao, 2014), and treat it as a fundamental procedure of the field. If we try to explain it or give it a name for future use, that name belongs in the ontology. An ontology is therefore the formal naming and representation of a set of concepts, procedures, and relations between entities (Kandel et al., 2011).
So, ontologies can be considered the ground rules. If you are playing a sport, there is a set of rules by which you are awarded points, fouls, penalties, and so on. Likewise, to develop a data science project smoothly and successfully, you need ontologies to ensure a satisfactory outcome. These ground rules, the ontologies, must be developed carefully, as they work as the foundation of the whole project (Loat, 2015).
What is the “domain ontology”?
A domain ontology represents a set of ideas belonging to a specific universe of discourse, for example biology, politics, or information technology. These are three different domains whose ontologies are entirely distinct from one another (Tei, 2017). The relations and characteristics of the ontologies of these independent domains differ not only in meaning but in value. The term “mouse” is relevant to both biology and information technology, but its interpretation differs completely: in biology, a mouse is an animal, whereas in information technology, a mouse is a hardware input device for personal computers (Zhao, 2014).
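The contrast can be made concrete in code. Below is a minimal sketch, using Python's rdflib library, of how the same term can be declared as two distinct classes in two domain ontologies; the namespaces and class names are illustrative assumptions, not part of any published ontology.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

g = Graph()
BIO = Namespace("http://example.org/biology#")   # hypothetical biology domain
IT = Namespace("http://example.org/it#")         # hypothetical IT domain

# The same label "mouse" attaches to two unrelated classes.
g.add((BIO.Mouse, RDF.type, OWL.Class))
g.add((BIO.Mouse, RDFS.subClassOf, BIO.Animal))
g.add((BIO.Mouse, RDFS.label, Literal("mouse")))

g.add((IT.Mouse, RDF.type, OWL.Class))
g.add((IT.Mouse, RDFS.subClassOf, IT.InputDevice))
g.add((IT.Mouse, RDFS.label, Literal("mouse")))

Because each class lives in its own namespace, the two meanings of “mouse” never collide, which is exactly the disambiguation a domain ontology provides.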
The concept of a domain ontology revolves around the relations and concepts of one domain of study. Human understanding has evolved, so even within the same field of interest, ontologies can diverge because of the experts' differing perspectives (Endel et al., 2015): the authors' viewpoints, their vision, their approach to a problem, their way of executing a project, and so on. Practically, it is therefore not possible to develop a complete domain ontology for a domain; it is not only expensive but demands an investment of time and resources to keep up with the objective in this ever-changing world (Kandel et al., 2011).
Depending on the requirements of the project, it is imperative that the domain ontology is defined before development starts, so that certain concepts, relations between properties, values, and so on are determined in advance. Building on those predefined values opens the possibility of better development at every subsequent level (Brooks-Bartlett, 2018).
However, as already stated, it is practically impossible to cover a whole domain in one set of ontologies. The world changes rapidly over a short span of time, so the resources, time, and money invested in developing a set of ontologies may not help in the future, because those domain ontologies will no longer be relevant. At most, you can create a core domain ontology that covers only the basic classes of the domain. As already discussed, dynamic classes added to an ontology may not stay relevant over time, so defining just the basic classes of a domain is a good start for developing a domain ontology.
As the example above shows, the identical word is distinct across two domains. Even within the same domain, the interpretation of ontologies can differ completely between projects (Karen, 2017). Within a single project, it often happens that the domain ontologies are no longer compatible with the project as it evolves. Hence, it becomes the ontology developer's responsibility to implement a robust, scalable ontology that can cope with whatever requirements arise over the span of the project.
The ontology creator needs to be skilful enough to comprehend the whole project rigorously; only then can they reach the full potential required for developing the project's ontologies.
Protégé tool
Developing an ontology for a project is not a task for a novice. It certainly requires significant knowledge of data science, but you also need a system on which those ideas can be defined as ontologies. Just as coders use different editors for coding in specific languages, there needs to be a dedicated editor for coding ontologies.
For example, for web development there are many editors, such as Sublime Text, Notepad++, and Komodo Edit. The job of these editors is to let programmers write the code that brings web applications into existence. For Java and Android, there is an editor called Eclipse. The fundamental purpose of such editors is to enable developers to write code in a specific programming language for their projects. Furthermore, for certain languages there are predefined library files that must be imported before the functions can run (Ontologies: Practical Applications, 2018). As you explore a particular language in your development journey, you will find that many tools are available free of cost, while others are sold on an annual or monthly basis, with the software's developers charging a purchase fee as a royalty.
For ontologies, Protégé is widely considered the leading ontology engineering tool, and it is free to download. Along with editing, Protégé serves as a knowledge management system. Editors of this type provide developers with a platform on which they can write and test code through a graphical user interface. With Protégé's graphical user interface, you can define the ontologies for your project.
Protégé was created by Stanford University, which made it available for public use under the BSD 2-clause license. Earlier versions of the tool were developed collaboratively with the University of Manchester.
The tool's plugin architecture is robust enough to build not only simple ontologies but also the complex ontologies that can transform a project. To develop a wide range of productive problem-solving systems, ontology creators can feed Protégé's output into other mainstream systems. Since not everyone is familiar with the framework, Stanford University offers full support to new users through both community-based and paid channels.
Protégé has an active community of students, developers, and entrepreneurs who ask questions, discuss solutions, and share their plug-ins; the community's collective contributions provide valuable answers to all kinds of development and technical questions. Protégé supports the W3C standards. Another major advantage of Protégé is its open-source environment: it is written in the object-oriented language Java and supports a plug-and-play model, which gives users flexibility (Pierson, 2017).
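As a concrete illustration of feeding Protégé's output into another system, the sketch below loads a hypothetical RDF/XML file named project_ontology.owl, as exported from Protégé, using Python's rdflib library; the file name is an assumption, and rdflib is one consumer among many rather than a Protégé-specific API.

from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

g = Graph()
g.parse("project_ontology.owl", format="xml")  # Protégé exports RDF/XML by default

# List every class declared in the ontology, with its label where present.
for cls in g.subjects(RDF.type, OWL.Class):
    print(cls, g.value(cls, RDFS.label))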
The lifecycle of a data science project differs from project to project and organization to organization (Jacquette, 2014). The significance of any project lifecycle is that it denotes which procedure follows which, and how every phase of the project relates to the rest. The basic steps of a data science project are described below.
Business understanding
Business understanding is the foundation of the whole data science project. This step clarifies the purpose of a project: what is expected from it and why it should be developed. To achieve the project's end goal, this step helps the researcher streamline the whole methodology (Schutt & O'Neil, 2013). The researcher learns which parameters of the business are going to affect the overall outcome of the project, so that the concerned segments of the process can be modified accordingly, and which parameters' values will determine the success of the domain. Through this step, the researcher also identifies the admissible data sources to which the enterprise needs full access, or for which a framework can be built to monitor and assess the source.
Defining the goals and finding the relevant data sources for predicting the project's outcomes are the core objectives of this step. Defining the objective has the highest priority, because the overall project takes shape around it. Asking accurate questions about the reason for the project's existence is the way to define the goals: the objective of every project is to answer a question and make the overall procedure smoother. Tough questions are therefore the key ingredients in defining the foundation and objective of the project. Answering them requires analysing various data sources, which is part of the next steps.
Data Understanding & Data Preparation
Data understanding and data preparation are notorious as the most time-consuming phases of a data science project. That said, these processes have a significant impact on the overall project in the end. As its name implies, data understanding means comprehending the data in its raw format and finding the relevant attributes of the gathered data that are going to affect the overall project. The difficulty of the analysis can vary widely: the analysts may only have to filter and modify a spreadsheet, or they may have to deal with millions of data entries and hundreds of attributes stored in a large data file.
Moreover, a data analyst should be able to identify which types of data are helpful for a project and, within that data, which attributes should be used to deliver the project successfully.
After the rigorous work of understanding the data, the next step is data preparation. Data preparation is the procedure in which analysts transform the data into refined information, enabling the use of business intelligence or any other data analysis technique within an organization. Preparation can already begin during the data understanding stage. As an example, a dataset may contain information compiled in different formats or languages, or produced by different calculations (Houlihan, 2016).
During data preparation, while merging everything into one major dataset, it may turn out that the information from one of the sources is not in the expected form, or is missing entirely. In that case the anticipated use of the information is void and that data cannot be used for research. The major tasks of the data preparation phase include imputing missing data, joining the various datasets, changing data types, identifying outliers, and so on. At the end of this phase, the prepared data is verified as valid for its intended purpose and usable in the data science project.
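A minimal sketch of these tasks, using Python's pandas library, is given below; the file names, column names, and the three-standard-deviation threshold are illustrative assumptions rather than part of the project described above.

import pandas as pd

# Join two hypothetical sources on a shared key.
customers = pd.read_csv("customers.csv")
orders = pd.read_csv("orders.csv")
df = customers.merge(orders, on="customer_id", how="left")

# Impute missing numeric values with the column median.
df["order_value"] = df["order_value"].fillna(df["order_value"].median())

# Correct a data type: dates often arrive as plain strings.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

# Flag outliers more than three standard deviations from the mean.
mean, std = df["order_value"].mean(), df["order_value"].std()
df["is_outlier"] = (df["order_value"] - mean).abs() > 3 * std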
Data Modelling
After completing the initial phases, there are final datasets and information to be used for project execution. In a data science project the volume of information is comparatively high, with many variables and attributes, so at this point deciding the flow of execution also becomes very important. Even if the data has been cleaned and merged properly from thousands of sources, an unoptimized flow of execution may mean that all the hard work of the previous phase does not produce the anticipated end results (Houlihan, 2016).
Data modelling can be understood as the graphical representation of how the project will execute step by step. It can even be considered the flowchart of the whole process, in which every major and decisive phase of the procedure is considered in order to generate a successful result (Endel et al., 2015). A well-researched data model that satisfies the logical and conceptual sides of the project ensures that everything is under control, and allows researchers and analysts to make changes to the flow for an error-free, successful execution.
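One way to make such a flow explicit in code is a scikit-learn Pipeline, sketched minimally below; the choice of steps and of a logistic regression model is an illustrative assumption, not the project's actual design.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

# Each named step documents one phase of the execution flow, and the
# steps always run in this order.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # normalize features
    ("classify", LogisticRegression()),            # final estimator
])
# model.fit(X_train, y_train) would run the whole flow in order.

Because every phase is named and ordered, swapping or reordering a step changes the flow in one place rather than throughout the codebase.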
Evaluation
Evaluation bridges the gap between the producer and the end user. If this phase is done properly, it brings significant confidence that the product is likely to succeed in the market. Evaluation demands a great deal of understanding of both the market and the product. During this phase you can check what is working and what is not; if you find any error or required modification in the process, you can apply changes as many times as needed until you get the desired output.
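A minimal evaluation sketch with a held-out test set follows; the synthetic dataset and accuracy metric are illustrative assumptions standing in for the project's real, prepared data and acceptance criteria.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Compare predictions against labels the model has never seen; iterate
# on the model until the score meets the acceptance criteria.
print("accuracy:", accuracy_score(y_test, predictions))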
Deployment
Deployment is the final stage of the data science lifecycle. From collecting the data to modelling it and carrying out a final evaluation, everything comes down to deploying the result for the end user. Once the models have performed well and produced acceptable, anticipated results, deployment is the stage at which the product is released and becomes available to the end user. Depending on the type of business and industry, deployment is done in either a real-time or a production-like environment; some also prefer to deploy the application on a batch basis to ensure maximum acceptance.
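As a minimal sketch of the batch style of deployment, the snippet below loads a previously persisted model and scores a new batch of records; the file names and the record_id column are hypothetical assumptions.

import joblib
import pandas as pd

# Load the model persisted after evaluation and score a new batch.
model = joblib.load("model.joblib")
batch = pd.read_csv("new_records.csv")

features = batch.drop(columns=["record_id"])  # keep the identifier aside
batch["prediction"] = model.predict(features)
batch.to_csv("scored_records.csv", index=False)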
Data Wrangling
In this era of information, the organizations with the most data have the most power, yet it is generally not easy for companies to monitor and manage millions of data points at the same time. Data, as it is widely perceived, is a collection of facts converted into a form that a computer can comprehend (Makaranka, 2018). It is a well-known fact that data dominates the world, irrespective of the industry you are currently associated with.
From finance to healthcare, from real estate to education, from information technology to government organizations, almost every industry in the current world is driven and controlled by the phenomenon called data (Endel et al., 2015).
We humans communicate and understand everything around us in the form of information shared with each other, while computers understand the same information in the form of machine languages; this machine-readable information is known as data. It is a common observation in corporate culture that most managers spend a significant number of hours hunting for relevant data (Upside Staff, 2018). The data processed and stored by companies is raw. Used properly and efficiently, it can be turned into productive information that enhances overall knowledge of the whole process. Through data you can not only learn what has happened so far, but also improve your services and products through the deeper insights hidden inside the raw data. Ultimately, data is an enormous body of knowledge hidden in the form of numbers and statistics.
If utilized properly, data can help a business improve exponentially. The current industrial scenario demands quick adaptation and response to the data received in the back end, although a company's processes may take some time to react to a given set of data (Makaranka, 2018). Managers should therefore develop an intelligent way to process large volumes of data accurately and efficiently.
In the field of data science, where the data takes more complex forms with hundreds or thousands of variables, it becomes extremely important for researchers and business intelligence officials to cut out unnecessary information, clean the datasets, and arrive at the most helpful values that can ensure the successful execution of the project (Owl, 2018).
As discussed in the data science project lifecycle, phases such as data understanding and data preparation determine the outcome of the whole procedure; together they are known as data wrangling. Since the project title contains the phrase data wrangling, this phase of the lifecycle is one of the most crucial parts of data science here. Data wrangling is the procedure of converting raw data into a more appropriate and understandable dataset so that it can be better utilized for real-world application purposes. Globally, more and more data is generated and stored in databases across the world, so it becomes imperative that those messy, complicated datasets be refined and utilized for deeper analytical purposes. The challenge is to find the proper data acquisition channels, the sources of the data, and then run a fast filtration that not only converts the raw data but makes it more informative and user-friendly for the analysts.
The idea is to gather as much data as possible and reveal its most important parts. Since the sources of the gathered data differ, the format and the variables covered in each dataset will differ too. It is also possible that some datasets contain values that are no longer valid or have been corrupted for technical reasons, and some values may not be available in the first place. It is therefore necessary to keep every possible scenario in mind before going ahead (Houlihan, 2016). Third-party tools are available that can convert datasets from different formats into a preferred format to obtain a unified dataset. However, the unavailable values and corrupted variables remain in the unified format, so an expert researcher is needed to fill in those blank and invalid values; the person may have to manually edit a few data entries so that the overall procedure runs smoothly afterwards.
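A minimal sketch of unifying heterogeneous sources with pandas follows; the three survey files and the column mapping are illustrative assumptions.

import pandas as pd

csv_part = pd.read_csv("survey.csv")
json_part = pd.read_json("survey.json")
excel_part = pd.read_excel("survey.xlsx")   # requires the openpyxl package

# Rename columns so all three sources share one schema.
json_part = json_part.rename(columns={"respondent": "respondent_id"})

# Stack the sources into one dataset and drop exact duplicates.
combined = pd.concat([csv_part, json_part, excel_part], ignore_index=True)
combined = combined.drop_duplicates()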
Most data scientists would accept that the greater part of their time and resources is invested in data filtration, cleaning, and wrangling rather than in coding the module that will use the data. Data wrangling gives the data integrity: after the whole wrangling procedure, you obtain a set of information that is consistent and solution-oriented (Kauermann & Seidl, 2018). The old and invalid data is wiped out in the process. Depending on your project and its outcomes, you may want to add your own variables and information to the final dataset; in the already tedious task of data cleaning, doing so can increase the complexity of managing so much information. Data wrangling overcomes this by giving the analyst full control over what to include in the final dataset (Upside Staff, 2018).
In an organization, a scenario may arise in which you need to explain complex datasets to your colleagues, managers, or stakeholders. Evidently, not everyone is comfortable analysing such large datasets apart from a data scientist or data analyst (Upside Staff, 2018). At such a moment, conveying your message and your hard work to employees, management, and stakeholders smartly is what makes the difference to the future of your project. Data wrangling turns your complicated data into comprehensible, actionable datasets that convey your message to the stakeholders.
Data Quality
Data is a set of values: information gathered for reference or analysis purposes. Data is the one thing that drives the major industries and sub-industries of information technology; without it, the whole world would go into hibernation, because in this ever-changing world data is the fundamental element behind everything. The data referred to here is the data stored on the internet, or user-generated information available for research purposes (Ostertag, 2010). It is the information data scientists use while developing a new product or executing a project. Without data, analysts cannot accomplish anything productive, because they have no standpoint from which to compare their outcomes.
The quality of the data is equally important. When the domain is small and engagement is optimal, you can expect quality data; as the domain expands, however, the consistency of data quality becomes significant. Data that is reliable enough for real-world applications is known as quality data (Houlihan, 2016). The higher the quality of the data, the greater its positive impact on the execution of the project. Data can also be considered good quality if it reflects real-world scenarios accurately and efficiently (Cleveland, 2014); that representation of the real world can then be used to improve the quality of the project, and achieving such accuracy through data quality takes a project to another level.
In a data science project, the data gathered from numerous sources may be partly irrelevant at the time, for obvious reasons (Cooper, 2012). Information collected under such conditions has differing formats, corrupted values, unfilled values in the database, and so on. Anything lagging or irrelevant reduces the quality of the data, so a basic cleaning procedure should be performed to ensure its quality.
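Some of these quality checks can be automated, as in the minimal pandas audit sketched below; the file name, the age column, and the plausibility range are illustrative assumptions.

import pandas as pd

df = pd.read_csv("dataset.csv")

# Share of missing values per attribute.
missing_ratio = df.isna().mean()

# Count exact duplicate records.
duplicate_count = df.duplicated().sum()

# Simple validity rule: ages should fall in a plausible range.
invalid_ages = df[(df["age"] < 0) | (df["age"] > 120)]

print(missing_ratio, duplicate_count, len(invalid_ages), sep="\n")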
Why is quality data required?
Quality data can help analysts and data scientists analyse the information stored in the datasets and apply that knowledge to their projects, elevating the quality of a project to a whole new level (Hiltbrand, 2018). If your data is of good quality, the database entries contain productive variables and values that allow you to react to real-time scenarios. As already stated, the better the data reflects the real world, the more likely it is to be considered top-quality data. If it reflects a real-world scenario that is positively or negatively affecting your product, you can decide promptly whether to change something or improve the existing condition of the project (Dooley, 2018).
If you ever need to audit your project, quality data can be a blessing: it cuts your effort in half while ensuring the maximum return on investment. It is therefore advisable always to gather quality-rich information (Caplin, 2017).
What is an advisory system?
An advisory system, as the name suggests, is a system that helps solve a problem. Such systems target issues that are usually handled by humans who dedicate their time and intelligence to problem solving (Grow, 2018). An advisory system reduces that resource investment, because it is smart enough to supply the intelligence required to solve the problems that arise in a specific scenario. The advisory system can therefore also be described as an expert system: systems of this type replicate human intelligence by coming up with real-time solutions, so an expert's opinion can be substituted thanks to the system's excellent ability to produce solutions (Little, n.d.).
The really challenging phase comes when such an advisory system has to be developed for a particular type of project. Ideally, the human intelligence used in solving problems has to be coded into a machine-executable form that is scalable enough to cope with the many issues that may arise during execution. The code needs to be robust enough to cover similar potential scenarios that could otherwise occur (Calì, 2017). So, if you are developing the module, your code must be efficient enough, and the machine made intelligent enough, to cover several scenarios in a single snippet.
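The simplest way to encode such expertise is as explicit rules. Below is a minimal rule-based sketch in Python; the rules and the project "state" dictionary are illustrative assumptions, not the actual system built in this project.

def advise(state: dict) -> list:
    """Return advice strings for a described project state."""
    advice = []
    if state.get("missing_ratio", 0) > 0.2:
        advice.append("High share of missing values: impute or drop columns.")
    if state.get("duplicates", 0) > 0:
        advice.append("Duplicate records found: deduplicate before modelling.")
    if not state.get("objective_defined", False):
        advice.append("Define the business objective before preparing data.")
    return advice

print(advise({"missing_ratio": 0.35, "duplicates": 12}))

Each rule captures one piece of expert judgement, so a single call covers several scenarios at once, which is exactly the economy described above.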
The boon of an advisory system to novice users
Consider somebody who gives advice to the decision maker of a team. The challenge for the decision maker is to identify that a problem exists before making any decision about it. In a company hierarchy, the decision maker is usually attached to management, meaning they have other major roles besides making decisions for projects. It may well be that they do not have enough time to look for problems, come up with a solution, and confirm that solution. In this case an advisory system can be extremely beneficial to the decision maker, because half the job is done by the system itself: all they need to consider is the solution the system has found, and if it fits the matter, they can give the final confirmation (Bluttman, 2009).
However, one of the major obstacles is developing a robust advisory system with the cognitive and reasoning abilities to solve whatever problem arises in the project. Its job is to find alternative decision options with the highest chances of producing the most desirable output in the end (J., 2015). The role of the advisory system is to provide solutions to open-ended scenarios; in other words, it can be considered a highly skilled architecture that gives intelligent solutions to unstructured problems. The decision maker therefore has to be experienced enough to decide whether to accept the solution the advisory system suggests; on their skill depends the project's ability to deliver the expected results (J, 2018).
With the passage of time, real-world experience helps a novice analyst adapt and evolve within the system (Bera, 2017). A skilled analyst can grasp the functioning of the system faster with the help of a smart advisory system; their job is half done when a trustworthy advisory system constantly monitors the potential problems that might occur. The system can then forecast problems before they happen and apply the necessary changes instantly, thanks to the fine-tuned understanding between the decision maker and the system itself.
If everything goes well, a fresh analyst will quickly gain a thorough understanding of the whole procedure and can come up with creative, efficient ideas that might change the direction of the entire process (Nagrecha, 2015). The advantage of a newcomer in any organization is that they are full of enthusiasm and courage; given proper assistance at the right time, such people can work wonders in the enterprise. A smart machine and a passionate analyst are the best combination an organization can hope for at this time.
Why the Advisory System Should Be Developed as a Web Application
Since the evolution of the internet, websites and the internet itself have become an integral part of people's lifestyles (Biewald, 2018). A significant number of people, not only professionals but also college teenagers, access the internet and surf the web through websites (Bembenik, 2013). This clearly shows how comfortable people are with finding information on the internet from anywhere, at any time, with just a few taps and clicks on a smart device.
The major unique selling point of any web application is that it reduces the cost of the project. No physical installation is needed wherever or whenever you want to run a web application: every computer system today has a web browser, so you simply type the required URL, hit enter, and you are there in a single click.
Another reason for choosing a web application is its availability 24×7, 365 days a year; there is no need to wait for specific hours of the day to get the job done. Nor does a web application require any specific hardware: a standard PC or a smartphone is a good fit for the purpose.
You can access a web application anywhere and at any time in the world. As already mentioned, whenever you want the advisory system, you simply go to its web application, irrespective of your time zone or geographical location (Noy & McGuinness, 2001).
A web application usually supports a centralized data storage structure, so anything you do on the web is stored in a repository or the cloud, from which you can access the files again at any later time (Kauermann & Seidl, 2018).
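To show how little is needed, here is a minimal sketch of exposing an advisory function over the web using Python's Flask framework; the route, the payload shape, and the placeholder advise() rule are all hypothetical assumptions rather than the project's final design.

from flask import Flask, jsonify, request

app = Flask(__name__)

def advise(state: dict) -> list:
    """Placeholder for the rule-based advisory logic sketched earlier."""
    if not state.get("objective_defined", False):
        return ["Define the business objective before preparing data."]
    return []

@app.route("/advise", methods=["POST"])
def advise_endpoint():
    state = request.get_json()  # project description sent by the client
    return jsonify(advice=advise(state))

if __name__ == "__main__":
    app.run()  # reachable from any browser or HTTP client, 24×7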
References
McAfee, A., & Brynjolfsson, E. (2012, October). Big Data: The Management Revolution. Harvard Business Review. Retrieved from https://hbr.org/2012/10/big-data-the-management-revolution
Businessworld. (2018). Demand For Data Scientists Surge By 400% In India. Retrieved from https://www.businessworld.in/article/Demand-For-Data-Scientists-Surge-By-400-In-India-/11-07-2018-154540/
Bembenik, R. (2013). Intelligent tools for building a scientific information platform. Berlin: Springer.
Bera, S. D. (2017). Life cycle reliability analysis using imprecise failure data. Life Cycle Reliability and Safety Engineering, 6(4).
Biewald, L. (2018). The data science ecosystem part 2: Data wrangling. Retrieved from https://www.computerworld.com/article/2902920/the-data-science-ecosystem-part-2-data-wrangling.html
Bluttman, K. (2009). Access Hacks. O'Reilly Media.
Brooks-Bartlett, J. (2018). Here’s why so many data scientists are leaving their jobs. Retrieved from towardsdatascience.com: https://towardsdatascience.com/why-so-many-data-scientists-are-leaving-their-jobs-a1f0329d7ea4
Calì, A. (2017). Data Analytics. Cham: Springer International Publishing.
Caplin, A. (2017). Introduction to Symposium on "Engineering Data on Individual and Family Decisions over the Life Cycle". Economic Inquiry, 56(1).
Chi, Y.-L. (2007). Elicitation synergy of extracting conceptual tags and hierarchies in textual document. Expert Systems with Applications, 349-357.
Chiff, M. (2018). Timely Information: How Current Is This? Retrieved from https://tdwi.org/articles/2018/03/02/bi-all-timely-information-how-current-is-this.aspx
Cleveland, W., et al. (2014). Divide and recombine (D&R): Data science for large complex data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 7(6).
Cooper, J., et al. (2012). Commentary on issues in data quality analysis in life cycle assessment. The International Journal of Life Cycle Assessment, 17(4).
Dooley, B. J. (2018). Humans in the Loop for Machine Learning. Retrieved from https://tdwi.org/articles/2018/07/09/adv-all-humans-in-loop-for-machine-learning.aspx
Endel, F., et al. (2015). Data Wrangling: Making data useful again. IFAC-PapersOnLine, 48(1).
Lv, G., et al. (2016). Research on recommender system based on ontology and genetic algorithm. Neurocomputing, 187, 92-97. https://doi.org/10.1016/j.neucom.2015.09.113
Grow, G. (2018). Reducing the Impact of Bad Data on Your Business. Retrieved from https://tdwi.org/articles/2018/07/06/diq-all-reducing-the-impact-of-bad-data.aspx
Hiltbrand, T. (2018). 5 Advanced Analytics Algorithms for Your Big Data Initiatives. Retrieved from https://tdwi.org/articles/2018/07/02/adv-all-5-algorithms-for-big-data.aspx
Houlihan, P. (2016). Data wrangling.
J, L. (2018). Advancing science and technology with big data analytics. Statistical Analysis and Data Mining: The ASA Data Science Journal, 11(3), 97.
J., L. (2015). Big Research Data and Data Science. Data Science Journal, 14.
Jacquette, D. (2014). Ontology. Abingdon, Oxon: Routledge.
Karen. (2017). How to minimise data wrangling? Retrieved from https://dzone.com/articles/how-to-minimize-data-wrangling-and-maximize-data-i-1
Kauermann, G., & Seidl, T. (2018). Data Science: a proposal for a curriculum. International Journal of Data Science and Analytics.
Little, D. (n.d.). Fallibilism and Ontology in Tuukka Kaidesoja’s Critical Realist Social Ontology. Journal of Social Ontology, 1(2).
Loat, N. (2015). What is Data Wrangling? Retrieved from https://www.datawatch.com/what-is-data-wrangling/
Lohr, S. (2012, February 11). The age of Big Data. Retrieved from N.Y. Times: https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
Lohr, S. (2018). For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights. Retrieved from https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
Makaranka, I. (2018). Real-Time Analytics: Challenges and Solutions. Retrieved from https://tdwi.org/articles/2018/06/15/adv-all-real-time-analytics-challenges-and-solutions.aspx
Maxim, G. (2018). Data Digest: Data Science Platforms, Job Ranking, and Marketing. Retrieved from https://tdwi.org/articles/2018/03/06/adv-all-gartner-data-science-0306.aspx
Noy, N. F., & McGuinness, D. L. (2001). Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.
Nagrecha, S., et al. (2015). Quantifying decision making for data science: from data acquisition to modeling. EPJ Data Science, 5(1).
Ontologies: Practical Applications. (2018). Retrieved from Datasciencecentral.com : https://www.datasciencecentral.com/profiles/blogs/ontologies-practical-applications
Ostertag, S. (2010). Processing Culture: Cognition, Ontology, and the News Media. Sociological Forum, 25(4), 824-850.
Owl, M. (2018). Data Stories: Simplification and Abstraction. Retrieved from https://tdwi.org/articles/2018/03/14/bi-all-visualization-simple.aspx
Cimiano, P., Hotho, A., & Staab, S. (2005). Learning concept hierarchies from text corpora using formal concept analysis. Journal of Artificial Intelligence Research, 24, 305-339.
Pierson, L. (2017). Data science. Hoboken, NJ: John Wiley and Sons, Inc.
Press, G. (2016, March 23). Data preparation: Most time-consuming, least enjoyable data science task, survey says. Forbes. Retrieved from https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#732c23c36f63
Schutt, R., & O'Neil, C. (2013). Doing Data Science. Sebastopol, CA: O'Reilly Media.
Kandel, S., Heer, J., et al. (2011). Research directions in data wrangling: Visualizations and transformations for usable and credible data. Information Visualization, 10(4), 271-288.
Tei, P. (2017). Fewer Flights, Bigger Delays and a Bad Year for JetBlue: 17 Charts on the U.S. Airline Industry in 2017. Retrieved from https://medium.com/towards-data-science/data-science/home
Rattenbury, T., Hellerstein, J. M., et al. (2017). Principles of Data Wrangling: Practical Techniques for Data Preparation. O'Reilly Media, Inc.
Upside Staff. (2018). Data Digest: Applying Machine Learning, Deep Learning, and AI. Retrieved from https://tdwi.org/articles/2018/06/28/adv-all-applications-0628.aspx
Upside Staff. (2018). Data Digest: Projecting Winners and Monitoring Screen Time. Retrieved from https://tdwi.org/articles/2018/06/19/adv-all-odd-applications-0619.aspx
Wang, J. (2016). Big data cloud, mining and management. International Journal of Data Science .
Zhao, L., et al. (2014). Ontology Integration for Linked Data. Journal on Data Semantics, 3(4), 237-254.
Zheng, W. (2017). Data Wrangling Versus ETL: What’s the Difference? Retrieved from https://tdwi.org/articles/2017/02/10/data-wrangling-and-etl-differences.aspx