Identifying and Collecting Relevant Data Sources
This is one of the most essential sections of the project research. To carry out the experiments and analyze the results effectively, it is necessary to select the most appropriate sources of data. The first task I undertook was therefore to collect the data relevant to my research, which I would later use in my experiments. I first identified the relevant data sources and, from these, picked the ones that seemed most appropriate for my research. I then recorded the data from those sources in their respective tables for future reference. Finally, I stored all these records on my computer, as it was a safer method of storage (Polkinghorne, D., 2005, 137).
To ensure efficiency in my work, I found it important to gather as much information as I could about my research problem. I also made sure to choose data sources that differed from one another. That is, in choosing the sources of data, I ensured that the people involved, for example, came from different places, age sets, sexes, levels of education and lifestyles (Doan, A., 2001, 509-520).
To make data recording easier, I created a table consisting of all the fields on which I intended to collect information (Duriau, V., 2007, 5-34). The table is shown below:
Table 1 – Data Collection
Name of the data source | Data source organization (public = universities, companies, stadiums, market places and malls) | Description of the data | Format of the data | Charges for the data | Target data source
Data 1 | Public | How many people are aware of the wireless home security systems? | txt | Free | Yes
Data 2 | Public | How many people have wireless home security systems in their homes? | txt | Free | Yes
Data 3 | Public | How many people use the wireless home security system efficiently? | txt | Free | Yes
Data 4 | Public | How many people like this type of home security? | txt | Free | Yes
Data 5 | Public | How many people have a feeling that this type of home security is sufficient for them? | txt | Free | Yes
Data 6 | Public | How many people can afford this type of security? | txt | Free | Yes
In order to keep track of my research work, I recorded all relevant raw data, exactly as obtained from the sources, in a separate table. I then saved these raw data in the form of text files on my computer in a specific location (//raw data//), for later use in the experiments that I would carry out (Bennett, J., 2013, 8-19). The table below elaborates further.
Table 2 – Data Storage
Data Source Name | Date Collected | Location of the Saved Files | Name of the Saved File | Format of the Saved File | Number of Data Records
Data 1 | 10/06/2018 | //raw data// | Survey1.txt | .txt file | 2000
Data 2 | 11/06/2018 | //raw data// | Survey2.txt | .txt file | 1000
Data 3 | 12/06/2018 | //raw data// | Survey3.txt | .txt file | 2000
Data 4 | 13/06/2018 | //raw data// | Survey4.txt | .txt file | 500
Data 5 | 14/06/2018 | //raw data// | Survey5.txt | .txt file | 600
Data 6 | 15/06/2018 | //raw data// | Survey6.txt | .txt file | 3000
Data 7 | 16/06/2018 | //raw data// | Survey7.txt | .txt file | 1000
In this phase, I divided the main activity into smaller sub-tasks that were easy to handle, thereby reducing the complexity of the task. These sub-tasks included data pre-processing, dimension reduction, and the design and implementation processes (Paxton, P., 2001, 287-312).
In most cases, raw data contain noise, missing values and inconsistencies. Pre-processing of the raw data is therefore usually unavoidable. This process is aimed at increasing the quality and correctness of the data, which in turn improves the quality and reliability of the experimental results. It is one of the most crucial steps in preparing data for the experimental activities in a research project, and it entails the preparation and transformation of the initial dataset. Data pre-processing methods are divided into various categories. In my analysis, I used four of them: data cleaning, data integration, data transformation and data reduction (Kuhn, M., 2013, 27-59).
Data Recording and Storage
The data cleaning process is essential in ensuring that the data awaiting analysis are complete (with no missing values or missing attributes of interest), error-free and consistent. I carried out this task to remove errors and to ensure that the data analyzed were complete and consistent, hence providing dependable results. It involved the following: ignoring the tuple in the raw data; filling in the missing values manually; filling in the missing values with a global constant; filling in the missing values with the attribute's mean; filling in the missing values with the attribute's mean for all samples belonging to the same category; and filling in the missing values with the most probable value (Raman, V., 2001, 381-390). The figure below further illustrates the process of data cleaning.
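The filling strategies listed above can be sketched in Python as follows. This is only a minimal illustration, not the procedure actually applied to the survey files: the records and the field names (`group`, `age`, `aware`) are hypothetical, and the "most probable value" is approximated here by the most frequent observed value, which is a simplifying assumption.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records (not the study's data); None marks a missing value.
records = [
    {"group": "Student", "age": 27, "aware": "yes"},
    {"group": "Student", "age": None, "aware": "yes"},
    {"group": "Employed", "age": 41, "aware": None},
    {"group": "Employed", "age": 45, "aware": "no"},
    {"group": "Employed", "age": None, "aware": "yes"},
]

def drop_incomplete(rows):
    """Ignore every tuple (record) that has at least one missing value."""
    return [r for r in rows if None not in r.values()]

def fill_constant(rows, field, constant):
    """Fill missing values of `field` with a global constant."""
    return [{**r, field: constant if r[field] is None else r[field]} for r in rows]

def fill_mean(rows, field):
    """Fill missing values of a numeric `field` with the attribute mean."""
    avg = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: avg if r[field] is None else r[field]} for r in rows]

def fill_group_mean(rows, field, group_field):
    """Fill missing values with the mean of the records in the same category."""
    observed = {}
    for r in rows:
        if r[field] is not None:
            observed.setdefault(r[group_field], []).append(r[field])
    means = {group: mean(values) for group, values in observed.items()}
    # A category with no observed value keeps its missing value (None).
    return [
        {**r, field: means.get(r[group_field]) if r[field] is None else r[field]}
        for r in rows
    ]

def fill_most_common(rows, field):
    """Fill missing values with the most frequent observed value
    (used here as a simple stand-in for the 'most probable value')."""
    counts = Counter(r[field] for r in rows if r[field] is not None)
    most_common = counts.most_common(1)[0][0]
    return [{**r, field: most_common if r[field] is None else r[field]} for r in rows]

cleaned = fill_most_common(fill_group_mean(records, "age", "group"), "aware")
```

Each function returns a new list and leaves the raw records untouched, which matches the practice of keeping the raw data exactly as obtained from the sources.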
In the data integration step, I combined all related data from the different sources into one coherent dataset and stored it in a specified file location. Among the problems I faced in this task was the entity identification problem: it was somewhat difficult to match up equivalent raw data entities from the various data sources. Despite these problems, I managed to carry out the task efficiently (Lenzerini, M., 2002, 233-246). The figure below illustrates the process of data integration in layman's terms.
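To make the entity identification problem more concrete, the sketch below merges records from two sources that name and format the identifying field differently. It is an illustration under stated assumptions only: the source names, the field names (`respondent_id`, `RespID`, `aware`, `has_system`) and the identifier values are all hypothetical, and the matching rule (lowercase, keep only letters and digits) is just one simple way to line up equivalent entities.

```python
def normalize_key(value):
    """Reduce an identifier to a comparable form: lowercase, letters and digits only."""
    return "".join(ch for ch in str(value).lower() if ch.isalnum())

def integrate(sources, key_fields):
    """Merge records from several sources that refer to the same entity.

    `sources` maps a source name to its list of records, and `key_fields`
    maps the same source name to the field that identifies the entity.
    """
    merged = {}
    for name, rows in sources.items():
        key_field = key_fields[name]
        for row in rows:
            key = normalize_key(row[key_field])
            entity = merged.setdefault(key, {"id": key})
            for field, value in row.items():
                if field != key_field:
                    # On conflicting values, the first source seen wins.
                    entity.setdefault(field, value)
    return list(merged.values())

# Hypothetical inputs: the same respondent appears as "R-001" and "r001".
sources = {
    "survey1": [
        {"respondent_id": "R-001", "aware": "yes"},
        {"respondent_id": "R-002", "aware": "no"},
    ],
    "survey2": [
        {"RespID": "r001", "has_system": "no"},
        {"RespID": "r003", "has_system": "yes"},
    ],
}
key_fields = {"survey1": "respondent_id", "survey2": "RespID"}

integrated = integrate(sources, key_fields)
```

In this sketch, respondents that appear in only one source are kept with the fields that are available for them, so no record is lost during integration.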
Data transformation involved consolidating the data into more appropriate forms, which would make the later processes easier to undertake. The tasks associated with this process included: normalization, in which the attribute data are scaled so that they fall within a small specified range, say -2.0 to 1.5, 0.0 to 2.0 or 2.0 to 3.0; smoothing, which removes noise or errors from the data; aggregation; and generalization, in which the primitive raw data are replaced with concepts that fall under higher-level categories (Osborne, J., 2002, 10-15).
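As a rough illustration of normalization and smoothing, the sketch below rescales a list of values into a target range using min-max scaling, v' = new_min + (v - min) * (new_max - new_min) / (max - min), and smooths a series with a centred moving average. Both techniques are assumptions chosen for the example; the text above does not state which normalization or smoothing formula was used, and the sample values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=2.0):
    """Rescale values linearly so that they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map them to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def moving_average(values, window=3):
    """Smooth a series by averaging each value with its neighbours.

    The window is centred on each position and shrinks at both ends of the list.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical ages rescaled into the 0.0 to 2.0 range mentioned above.
normalized_ages = min_max_normalize([26, 35, 45, 56], new_min=0.0, new_max=2.0)

# Hypothetical noisy series: the spike at position 2 is damped by smoothing.
smoothed_series = moving_average([1, 2, 9, 2, 1], window=3)
```

Passing `new_min=-2.0, new_max=1.5` or `new_min=2.0, new_max=3.0` would give the other example ranges named in the paragraph.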
Analysis of very complex and very large amounts of data may take a very long time, and anyone carrying out a data analysis wants the final result to be dependable and to be obtained efficiently. I therefore found it important to reduce the data using appropriate data reduction techniques, so as to analyze a reduced representation of the dataset without compromising the integrity of the original, while still meeting the expected quality (Masse, L., 2005, 44-55). To this end, I employed the following data reduction strategies:
- Data cube aggregation – here, aggregation operations are applied to the data in the construction of a data cube.
- Dimension reduction – this involved removing all totally irrelevant data, less relevant data and unnecessarily duplicated data.
- Data compression – here, I applied wavelet transforms and principal component analysis as encoding mechanisms to reduce the size of the data.
- Numerosity reduction – here, the data are replaced by alternative, smaller data representations.
- Concept hierarchy generation – here, I replaced the raw data attributes with higher-level concepts.
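Two of the strategies above, the removal of duplicated or uninformative data and concept hierarchy generation, can be sketched as follows. This is a simplified illustration under assumptions: the sample records and field names are hypothetical, a field is treated as irrelevant only when it holds the same value in every record, the age bands are the ones used in the survey's demographic groups, and the "under 26" label is an added assumption for ages below the lowest band.

```python
# Age bands taken from the survey's demographic groups.
AGE_BANDS = [(26, 35, "26-35"), (36, 45, "36-45"), (46, 55, "46-55")]

def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def drop_constant_fields(rows):
    """Remove fields whose value is the same in every record.

    Such fields cannot distinguish one record from another, so they are
    used here as a simple proxy for irrelevant data.
    """
    if not rows:
        return []
    constant = {f for f in rows[0] if all(r.get(f) == rows[0][f] for r in rows)}
    return [{f: v for f, v in r.items() if f not in constant} for r in rows]

def age_band(age):
    """Replace a raw age with its higher-level concept (an age range)."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    # "under 26" is an assumed label; the survey bands start at 26.
    return "56 and over" if age >= 56 else "under 26"

# Hypothetical records, not the study's data.
rows = [
    {"age": 29, "format": "txt", "aware": "yes"},
    {"age": 29, "format": "txt", "aware": "yes"},  # exact duplicate
    {"age": 48, "format": "txt", "aware": "no"},
    {"age": 61, "format": "txt", "aware": "yes"},
]

unique_rows = drop_duplicates(rows)
slim_rows = drop_constant_fields(unique_rows)
generalized = [{**r, "age": age_band(r["age"])} for r in slim_rows]
```

After these steps, the hypothetical dataset has fewer records, fewer fields, and a coarser age attribute, which is the kind of reduced representation the strategies aim for.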
Data Pre-Processing and Dimension Reduction
As not all the data I had collected were useful, I had to select the data that were of the highest quality and the most complete. I then created a table, filled it with the results of the selection and dimension reduction, and saved it as a text file in a specific file location for use in the later stages (Liu, H., 2010, 4-13).
Date | Data Source Name | Purpose of Pre-processing | Pre-processing Method | No. of Original Data Records | No. of Resulting Data Records | No. of Original Features | No. of Resulting Features | Resulting Data File Name
15/06/2018 | Data 1 | Cleaning the missing data | Pre-fill the missing values | 2000 | 2100 | 20 | 20 | Result_Data1.txt
15/06/2018 | Data 2 | Cleaning data noise | Data filtering | 1000 | 900 | 8 | 10 | Result_Data2.txt
15/06/2018 | Data 3 | Avoid duplication of data | Data reduction by cleaning | 2000 | 1800 | 10 | 6 | Result_Data3.txt
15/06/2018 | Data 4 | Filtering the data | Data reduction | 500 | 200 | 4 | 2 | Result_Data4.txt
15/06/2018 | Data 5 | Data reduction | | 600 | 300 | 10 | 10 | Result_Data5.txt
15/06/2018 | Data 6 | Feature selection | Data integration | 3000 | 2000 | 30 | 20 | Result_Data6.txt
15/06/2018 | Data 7 | Cleaning missing values | Data integration | 1000 | 800 | 10 | 10 | Result_Data7.txt
In the methodology section, I had proposed the use of qualitative and quantitative methodology approaches for my research. After researching both of the proposed approaches, I found it more appropriate to combine them. For the sake of saving time and efficiency, I therefore decided to use a hybrid methodology approach, which combines the features of both the qualitative and quantitative approaches and is thus more capable than either approach used separately.
To make my research results reliable, I began with a data collection step: I gathered as much information as I could about wireless home security systems using various search engines and web browsers (Google, Opera Mini, Microsoft Edge, etc.), compared the available systems and, from this, derived my hypotheses. These hypotheses then helped me formulate an effective questionnaire consisting of six questions. These questions formed the basis of my research survey (Gerber, A., 2012, 5-10).
In deciding to whom I would present my questionnaire during the survey, I mainly considered the age, gender, educational level, lifestyle and geographical location of the respondents. In my survey, I adopted the statistical method of data recording, as shown below.
Date | Experiment | Purpose of Experiment | Description of Procedures | Input Data | Expected Output Value | Name of the Resultant File
14/06/2018 | Experiment 1 | Evaluating the methods used in the research | Comparing the accuracy of two data analysis methods | Pre-processed data | Which of the methods gives more accurate results | Result1.txt
14/06/2018 | Experiment 2 | Evaluating the hybrid method | Comparing the raw data collected with the hypothesis set | Raw data collected and hypothesis set | Which one of the two is correct | Result2.txt
With rapidly growing technology, there is a considerable rise in the crimes that occur in our homes on a daily basis. This means that something should be done to improve the security levels in our homes. With that in mind, the implementation of wireless home security systems is thought to be one of the best ways to deal with home insecurity appropriately. However, it is argued that such systems may not be widely adopted by home owners, perhaps due to lack of awareness and other related factors. To determine whether the idea of implementing such systems has been adopted and, if so, by how many people, I decided to undertake a survey in various places (Gerber, A., 2012, 16-20). I therefore formulated a questionnaire and went into the field to gather the data I needed. I then carried out a series of experiments on the data I had collected.
Data Cleaning, Integration, and Normalization
All processes carried out during the experiments were performed in a uniform manner in order to minimize the errors that might otherwise have resulted from the experimental activities. In addition, the research on existing home security systems was carried out in the early stages of the project, so I had already begun forming expectations about the experimental results. The experiments then provided me with the expected results in a precise manner.
The questionnaire that I formulated for my survey was based on the demographic information contained in the table below.
Gender | Male, Female, Both
The Range in Age | 26-35, 36-45, 46-55, 56 and over
Educational Levels | College Diploma, Bachelors’ Degree Undergraduate, Graduate, Masters, PhD
The Lifestyle of the Respondent | Poor, Middle-Class, Rich, Very Rich
Occupation | Student, Housewife, Business Man/Woman, Employed, Any Other
The table below shows the questions that appeared in the questionnaire I used during my survey.
Table 6 – Survey Questions
Question Number | The Question
Question 1 | How many people are aware of the wireless home security systems?
Question 2 | How many people have wireless home security systems in their homes?
Question 3 | How many people use the wireless home security system efficiently?
Question 4 | How many people like this type of home security?
Question 5 | How many people have a feeling that this type of home security is sufficient for them?
Question 6 | How many people can afford this type of security?
Question 7 |
In designing the above survey questions, I found the demographic information stated above very useful. These questions made my survey process run smoothly and as I expected. The survey met all the conditions set for the process, and the collected data were of high quality.
This phase entails the implementation of the designed survey. In this phase, I recorded all the information for the runs in a table for future use. In this way, I ensured that all the information obtained from each run was properly and systematically recorded in the respective table fields (Bakeev, K., 2010, 23-30). The table below elaborates further on the process.
Table 7 – Demographic Records and the Total Number of Participants
Structures | Percentage (%) of response (R)
Gender: Male | 1700 people (54.84%)
Gender: Female | 1400 people (45.16%)
Age range: 26-35 | 930 people (30%)
Age range: 36-45 | 1550 people (50%)
Age range: 46-55 | 465 people (15%)
Age range: 56 and above | 155 people (5%)
Level of Education: College Diploma | 310 people (10%)
Level of Education: Bachelors’ Degree Undergraduate | 310 people (10%)
Level of Education: Graduate | 930 people (30%)
Level of Education: Masters | 620 people (20%)
Level of Education: PhD | 620 people (20%)
Level of Education: Others | 310 people (10%)
Lifestyle of the Respondent: Poor | 465 people (15%)
Lifestyle of the Respondent: Middle-Class | 1550 people (50%)
Lifestyle of the Respondent: Rich | 620 people (20%)
Lifestyle of the Respondent: Very Rich | 465 people (15%)
Occupation: Student | 620 people (20%)
Occupation: Housewife | 465 people (15%)
Occupation: Business Man/Woman | 930 people (30%)
Occupation: Employed | 930 people (30%)
Occupation: Others | 155 people (5%)
In this phase, I created a table of the questions that I had used in my survey against the number of responses for each question. In this table, I expressed the responses to each question as a percentage of the total responses obtained from the field during the survey process.
Survey Question | Responses to the Question
How many people are aware of the wireless home security systems? | 1600 (51.61%)
How many people have wireless home security systems in their homes? | 2200 (70.97%)
How many people use the wireless home security system efficiently? | 1000 (32.26%)
How many people like this type of home security? | 2500 (80.65%)
How many people have a feeling that this type of home security is sufficient for them? | 2900 (93.55%)
How many people can afford this type of security? | 1200 (38.71%)
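The percentages in the table above can be reproduced from the response counts with a short calculation. The sketch below assumes that every percentage is taken over all 3100 participants (the 1700 male and 1400 female respondents reported in Table 7); the dictionary keys are hypothetical short labels for the six survey questions.

```python
# Total number of participants: 1700 male + 1400 female respondents (Table 7).
TOTAL_PARTICIPANTS = 3100

# Response counts copied from the table; the keys are hypothetical labels.
response_counts = {
    "aware": 1600,
    "have_system": 2200,
    "use_efficiently": 1000,
    "like": 2500,
    "sufficient": 2900,
    "can_afford": 1200,
}

def as_percentage(count, total=TOTAL_PARTICIPANTS):
    """Express a response count as a percentage of all participants."""
    return round(100 * count / total, 2)

percentages = {q: as_percentage(n) for q, n in response_counts.items()}

for question, share in percentages.items():
    print(f"{question}: {share}%")
```

For example, 1600 responses out of 3100 participants gives 1600 / 3100 × 100 ≈ 51.61%.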
For the purpose of clarity, I presented the responses in the form of bar graphs and pie charts, as shown below.
With the current changes in technology, the security issues facing our homes tend to increase in number. For those with good exposure to technology, this issue is not a big deal, because they are aware of the automatic and remotely controlled security systems that can be implemented in their homes. These systems, however, only benefit people who have both the awareness and the capital to implement them. This means that there are people who are aware of wireless home security systems and are very willing to implement them, but who are limited by financial factors.
Data Reduction Techniques
On the other hand, there are people who are not limited by any factor, such as finances, and who would do anything to secure their homes, yet do not have these systems implemented in their homes. This is generally due to lack of knowledge or awareness. This category of people needs to be informed about the systems, so that the only part left for them is choosing whether or not to employ such systems in their homes. Now the big question remains: “Who will really teach them?”
In society, there are also people who have implemented these systems but do not use them efficiently enough to get the best results out of them. They are well aware of what to do to counter security issues and have shown interest in taking such measures by implementing a security system, but once the system is in place, they do not really care about its operation. As a result, they make little or no follow-up regarding the updating of the security system. In this case, the systems are of no use, and it would be better if they had not been installed at all (Pfaffl, M., 2002, 36).
I conducted a survey in which I selected some specific features that people in society could have and, based on those features, formed five groups of people to target in my survey process. I formed these groups on the basis of age, educational level, occupation, gender and lifestyle.
From my survey, with each figure expressed as a percentage of all participants: 51.61% were aware of the existence of wireless home security systems; 70.97% had implemented the systems in their homes; only 32.26% used the systems efficiently to get good results; 80.65% liked the way these systems work; 93.55% believed that these systems are sufficient for them; but only 38.71% could afford to run such a system in their homes efficiently and get the best outcomes.
References
Polkinghorne, D.E., 2005. Language and meaning: Data collection in qualitative research. Journal of counseling psychology, 52(2), p.137.
Doan, A., Domingos, P. and Halevy, A.Y., 2001, May. Reconciling schemas of disparate data sources: A machine-learning approach. In ACM Sigmod Record (Vol. 30, No. 2, pp. 509-520). ACM.
Duriau, V.J., Reger, R.K. and Pfarrer, M.D., 2007. A content analysis of the content analysis literature in organization studies: Research themes, data sources, and methodological refinements. Organizational research methods, 10(1), pp.5-34.
Bennett, J.C., Violin Memory Inc, 2013. Method and system for storage of data in non-volatile media. U.S. Patent 8,452,929.
Paxton, P., Curran, P.J., Bollen, K.A., Kirby, J. and Chen, F., 2001. Monte Carlo experiments: Design and implementation. Structural Equation Modeling, 8(2), pp.287-312.
Kuhn, M. and Johnson, K., 2013. Data pre-processing. In Applied Predictive Modeling (pp. 27-59). Springer, New York, NY.
Raman, V. and Hellerstein, J.M., 2001, September. Potter’s wheel: An interactive data cleaning system. In VLDB (Vol. 1, pp. 381-390).
Lenzerini, M., 2002, June. Data integration: A theoretical perspective. In Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (pp. 233-246). ACM.
Osborne, J.W., 2002. The Effects of Minimum Values on Data Transformations.
Masse, L.C., Fuemmeler, B.F., Anderson, C.B., Matthews, C.E., Trost, S.G., Catellier, D.J. and Treuth, M., 2005. Accelerometer data reduction: a comparison of four reduction algorithms on select outcome variables. Medicine and science in sports and exercise, 37(11 Suppl), pp.S544-54.
Liu, H., Motoda, H., Setiono, R. and Zhao, Z., 2010, May. Feature selection: An ever evolving frontier in data mining. In Feature Selection in Data Mining (pp. 4-13).
Gerber, A.S. and Green, D.P., 2012. Field experiments: Design, analysis, and interpretation. WW Norton.
Bakeev, K.A. ed., 2010. Process analytical technology: spectroscopic tools and implementation strategies for the chemical and pharmaceutical industries. John Wiley & Sons.
Pfaffl, M.W., Horgan, G.W. and Dempfle, L., 2002. Relative expression software tool (REST©) for group-wise comparison and statistical analysis of relative expression results in real-time PCR. Nucleic acids research, 30(9), pp.e36-e36.