Validity and Reliability in Educational and Psychological Assessment
Although case studies and research are valuable for assessing the applicability of various theories of development, a lack of validity and reliability can render an entire study inoperative. A study must pass various tests of validity and reliability to ensure its practicability. Many measures of study quality require investigators to quantify abstract, intangible constructs and hypotheses that are not directly observable, so such quantification must be drawn from inference. This paper therefore examines the principles that lecturers can follow to assess and ensure the validity and reliability of tests within the scope of their workplace. In general, validity refers to the extent to which a test measures the actual construct it was intended to measure, while reliability refers to the consistency of the results obtained.
For an experimental treatment to reach its objectives, investigators must statistically establish that the design and execution of the study ruled out all alternative explanations that could have produced the observed results. The process of filtering out these alternatives is what researchers call assessing internal validity. Greenberg, Daniels, Flanders, Eley, and Boring (2012) state that internal validity is the degree to which a study's results accurately reflect the actual situation in the studied population. They add that where the results deviate from the actual situation in the studied population, there is no reason to believe that the results apply to the rest of the population (external validity). In other words, researchers must ensure that the results are caused by the treatment provided in the study and not by any other cause. Regarding external validity, Ihantola and Kihn (2011) state that a study has external validity if its results can be applied to the real world in other analogous programs and settings. According to Matthews and Kostelis (2011), external validity affects observers' ability to credit research findings with generality, depending on the procedures used.
Arguably, a test is valid if it stays true to its sole purpose, the core question that sparked the idea for the test. A test with high validity has not diverted from its original intention. A test with poor or low validity has diverted from that intention, which means there is no justification for its use or for relying on its results. It follows that there must be a method for determining what is valid and what is not, which introduces the concept of tests of validity.
To know whether a study has validity, a lecturer must know how to test for it; only with this knowledge can someone ensure and maintain a valid study. Anthoine, Moret, Regnault, Sebille, and Hardouin (2014) explain this by describing validation as a process that requires the collection of comprehensive evidence across measurement properties that include construct validity, face validity, consensual validity, predictive validity, content validity, and criterion validity.
Face Validity
Even though face validity serves to measure validity, it is the weakest method; it is more of a logical test (Culyer, 2014). This method asks whether the question or study seems, on its face, to relate to the intended concept (Maroof, 2012). In other words, does the manifest content of the study instrument really address its concept? For instance, of the following two questions in a caregiver study, the first has higher face validity than the second. First question: "How many times a week do you check on the patient's diet?" Second question: "Do you love the patient?" One key point is that the wording of the content does not matter; what matters is the concept. If the study is about giving care, loving the patient does not seem closely related to that concept.
Content validity refers to the 'core' of the measure: whether the content of the study instrument reflects the full domain of knowledge within the construct (Culyer, 2014). For instance, a study on parent-child communication has to address all forms of communication: written, verbal, and non-verbal (Greenberg, Daniels, Flanders, Eley, & Boring, 2012). Similarly, a study on the effects of tobacco should cover all uses of tobacco rather than relying on cigarettes alone. To ensure content validity, a lecturer can use the most common method of engaging an expert in the field to confirm that the content taps the intended domain (Drost, 2011). For instance, someone studying the impact of stress on students' performance can ask an expert such as a counselor to assess whether the questionnaire covers the entire domain.
Consensual validity is similar to predictive or concurrent validation, differing in that it uses two steps of validation (Thanasegaran, 2009). The first step is the assessment of self-report questionnaires from the participants. The second step is the comparison of the survey scores against ratings from observers who know those participants. This approach works by assuming that the observer ratings and the self-report questionnaires have different limitations. However, it has disadvantages. First, people tend to favor themselves when answering a self-report questionnaire and thus give an overly positive impression of themselves. Second, observers might not have background information on each participant.
Criterion validity is the capacity of a measure or predictor to predict an outcome or criterion. In other words, criterion validity concerns validating a measurement instrument against an external criterion (Newton & Shaw, 2014). It relates to the facets of construct validity by demonstrating how the instrument and criterion results match theoretical predictions. The criterion must be a well-accepted concept, termed the "gold standard" measure. For instance, if a study measures liver rejection after a transplant, the gold standard that rules on liver-transplant rejection is the histological manifestation of rejection examined in a biopsy tissue sample. If the new measure correlates well, showing high sensitivity and specificity against the gold standard, then criterion validity is established.
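The sensitivity-and-specificity comparison described above can be sketched in a few lines of Python. All labels below are hypothetical, standing in for the gold-standard biopsy results and the new measure's results:

```python
# Sketch: sensitivity and specificity of a new measure against a
# "gold standard" criterion. All labels here are hypothetical.

def sensitivity_specificity(test, gold):
    """test, gold: parallel lists of 0/1 labels (1 = condition present)."""
    tp = sum(1 for t, g in zip(test, gold) if t == 1 and g == 1)  # true positives
    tn = sum(1 for t, g in zip(test, gold) if t == 0 and g == 0)  # true negatives
    fp = sum(1 for t, g in zip(test, gold) if t == 1 and g == 0)  # false positives
    fn = sum(1 for t, g in zip(test, gold) if t == 0 and g == 1)  # false negatives
    return tp / (tp + fn), tn / (tn + fp)

gold = [1, 1, 1, 1, 0, 0, 0, 0]   # criterion: histological rejection
test = [1, 1, 1, 0, 0, 0, 0, 1]   # new measure under validation
sens, spec = sensitivity_specificity(test, gold)
print(sens, spec)  # -> 0.75 0.75
```

The closer both values are to 1.0 against the gold standard, the stronger the evidence of criterion validity.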
Predictive and Concurrent Validity
According to Gomm (2008), criterion validation works by using statistical measures. Where the external criterion occurs at a future point in time, the validation is called predictive validity. On the other hand, where the external criterion is used to compare data acquired from the respondents a short while later, it is called concurrent validity.
As discussed above, predictive validity is a form of criterion validation that addresses the relationship between the measurement instrument and a criterion assessed at a future point in time (Gomm, 2008). In other words, it is the extent to which a test can accurately predict a criterion that is expected to occur at some time in the future. One application of predictive validity is predicting a student's score in a future class. For instance, a new intelligence assessment might predict, when a child is 12 years old, that the child will score highly at university. If the prediction comes true when the child reaches university, the test had predictive validity at the time of the prediction.
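Predictive validity is commonly quantified as the correlation between the earlier test and the later criterion. The scores below are invented purely for illustration:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

age12_scores = [85, 90, 75, 60, 95]   # hypothetical test scores at age 12
uni_scores   = [80, 88, 70, 65, 92]   # the same students' later university scores
r = pearson(age12_scores, uni_scores)
print(round(r, 2))  # -> 0.96
```

A correlation this high between the early test and the later outcome would be evidence of predictive validity.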
Construct validity looks at the extent to which a test captures a particular theoretical framework and the way the test overlaps with other facets of validity (Burton & Mazerolle, 2011). The idea is to ensure that the test scores are consistent with the general knowledge of the theoretical framework and with existing observations; the test must be based on an existing phenomenon (Matthews & Kostelis, 2011). For instance, a test targeting intelligence must rely on theories of intelligence. The more evidence the researcher brings to the test of construct validity, the stronger the validation. Since no single rule explains what construct validity for a test looks like, researchers combine different methods and models to make the construct validity of the test all-inclusive.
A lecturer must control the threats to both internal and external validity to increase confidence in the results. If threats compromise internal validity, the investigators' confidence in asserting a relationship between the dependent and independent variables is also compromised. If threats affect external validity, the investigators' confidence in asserting that the results apply to the general population is compromised. A lecturer who needs to ensure and maintain the validity of a test must therefore understand the major threats and how to control them.
The threat of maturation refers to an alternative explanation created by natural change (McDermott, 2011). Some problems, such as depression, can disappear naturally as people grow. Suppose a study is being done to find out whether Solution X treats depression. Even if someone does not take the treatment, the depression may still go away, so any improvement attributed to Solution X is open to another explanation: natural change. That natural change is the threat of maturation. To ensure this threat does not affect validity, a lecturer can introduce a control group, measured at the same time but not exposed to the hypothesized cause. Both groups should then mature or change to the same degree, and any difference between the groups can be connected to the hypothesis being tested.
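The control-group logic above can be illustrated with a simple difference-in-changes calculation; all scores here are invented:

```python
# Sketch: subtracting natural change (maturation), as measured in a
# control group, from the change observed in the treated group.
# All depression scores are hypothetical (higher = more depressed).

def mean(xs):
    return sum(xs) / len(xs)

treated_before = [20, 22, 19, 21]   # before Solution X
treated_after  = [10, 12, 11, 9]    # after Solution X
control_before = [21, 20, 22, 19]   # no treatment, same period
control_after  = [17, 16, 18, 15]   # natural change only

treated_change = mean(treated_after) - mean(treated_before)   # -10.0
control_change = mean(control_after) - mean(control_before)   # -4.0
effect = treated_change - control_change  # change beyond maturation
print(effect)  # -> -6.0
```

Since both groups matured over the same period, the remaining 6 points of improvement can be connected to Solution X rather than to natural change.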
Threat of History
The threat of history means that an observed effect does not generalize to other periods (Onraet, Van Hiel, Dhont, & Pattyn, 2013). For instance, a study on HIV/AIDS conducted in 1990 may show that a low percentage of people had knowledge about the disease, whereas the same study done today would show a far greater percentage than the 1990 study.
The threat termed instrumentation occurs when the instrument employed by the researcher is changed in the course of the investigation (Yu & Ohlund, 2010). For instance, while testing whether Solution X treats depression, the researcher starts with type A of the solution but later introduces type B, perhaps because of a development in technology. A change in depression might then be attributed not to Solution X but to the change of instrument.
Another name for this threat is sensitization. The principle behind it is that a test can itself affect the people being studied (Yu & Ohlund, 2010). The threat arises when the instrument creates a sensitizing effect on the respondents, so that the results acquired are subject to that sensitivity. For instance, a depression pretest might make people understand their feelings and become more proactive about improving their emotional state. A lecturer or anyone else undertaking a study can reduce this threat by using a control group: the sensitizing effect then falls on both groups, leaving the hypothesized cause as the only dissimilar effect. However, a control group may not fully help, since people are affected differently; a stronger solution is to add an experimental group and a control group that were not given a pretest.
The threat of selection refers to any systematic difference in subject characteristics between groups other than the manipulated cause (Polit & Beck, 2010). Unlike the threat of maturation, which is solved by introducing a control group, this threat cannot be solved by a control group. For instance, in a study to find whether Solution X treats depression, suppose the experimental group ended up less depressed than the control group; instead of Solution X being the only explanation, it could be that the people in the experimental group were simply more physically fit. This systematic difference in subject characteristics can introduce an interpretation other than the hypothesis under test. To eliminate the threat and ensure the validity of the test, a lecturer can introduce randomization.
The threat of selection-maturation interaction is a combination of the threats of maturation and selection. It occurs when groups systematically differ in their rates of maturation (Polit & Beck, 2010). For example, suppose the effectiveness of Solution X was examined in an experimental group consisting of volunteers who are open to new things, while the control group consisted of more conservative people who dislike change, and participants were selected based on their level of depression. If the experimental group shows a lower level of depression, it could be because open-minded people naturally get over their depressive feelings more quickly than conservative people do. This threat can also be eliminated by randomization.
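Randomization can be sketched very simply: shuffle the participant list and split it, so that open-minded and conservative people (and every other characteristic) are spread across both groups by chance. The identifiers below are placeholders:

```python
import random

participants = ["p01", "p02", "p03", "p04", "p05", "p06", "p07", "p08"]

random.seed(42)               # fixed seed only so the split is reproducible
random.shuffle(participants)  # random order breaks any systematic difference
half = len(participants) // 2
experimental = participants[:half]  # receives Solution X
control = participants[half:]       # receives no treatment
print(experimental, control)
```

With random assignment, any systematic difference between the groups is expected to vanish as the sample grows, removing both the selection and the selection-maturation threats.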
Threat of Regression
Another name for this threat is regression towards the mean. In research design, the threat of regression results from selecting subjects on the basis of their extreme scores or characteristics (Heale & Twycross, 2015). For instance, while testing Solution X for depression, if the selection consisted of people who were critically affected by depression, they will be resistant to almost any form of intervention; on the other hand, a selection composed of people with mild signs of depression might be responsive to Solution X.
According to Barker and Pistrang (2015), reliability is the degree to which a research instrument, such as a survey, produces consistent results. A good example is a scale for checking weight. If the scale shows that a person weighs 110 pounds and, one week later, reads 140 pounds for the same person, the scale is inconsistent and therefore not reliable. If, on the other hand, the first reading is 110 pounds and the reading in the second week is the same 110 pounds, the scale is reliable. One thing to note is that the reliability of a research instrument cannot be measured exactly; a researcher can, however, estimate it in various ways, as discussed below.
Inter-rater reliability applies where different raters provide feedback or decisions, and those decisions are compared to evaluate the raters' consistency (Heale & Twycross, 2015). For instance, users of an online product can rate the product. Sometimes a product is rated the same way by all users, and sometimes some users give a positive rating while others give a negative one. Inter-rater reliability compares the ratings from each rater or judge to see how consistent they are: if a product always gets positive ratings, there is good inter-rater reliability among the raters; if some raters love the product while others do not, inter-rater reliability is low.
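A minimal way to quantify inter-rater reliability is the simple agreement rate between two raters (more refined, chance-corrected indices exist but are beyond the scope of this discussion). The ratings below are hypothetical:

```python
# Two raters judging the same six products; all ratings are hypothetical.
rater_a = ["pos", "pos", "neg", "pos", "neg", "pos"]
rater_b = ["pos", "neg", "neg", "pos", "neg", "pos"]

agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
agreement_rate = agreements / len(rater_a)
print(agreement_rate)  # 5 of 6 judgments agree, about 0.83
```

The closer the agreement rate is to 1.0, the higher the inter-rater reliability.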
The test-retest method works by issuing the study instrument, such as a questionnaire, to a study group at two different times. After collecting the results, the investigators compare the results of the two phases for consistency, aiming to see whether both administrations yield similar results (Rumrill, Cook, & Wiley, 2011).
In the parallel-forms method, the investigator provides two versions of the same instrument to the group (Picardi & Masick, 2013). The two versions must measure the same thing. The results from the two versions are then compared to determine their consistency.
In the split-half reliability test, the investigator administers the instrument to the study group and then analyses the items within the instrument by splitting them into two halves (Burton & Mazerolle, 2011). The idea is to form two groupings of items instead of two groups of people. The investigator then compares each half with the other to assess their consistency.
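A split-half check can be sketched as follows. The item scores are invented, and the final step applies the Spearman-Brown correction, a standard adjustment (not named in the text) that compensates for each half being only half the length of the full instrument:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: five respondents, six items each.
responses = [
    [3, 4, 2, 5, 4, 3],
    [2, 2, 1, 3, 2, 2],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
    [4, 3, 4, 4, 5, 4],
]

odd_half  = [sum(row[0::2]) for row in responses]  # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6
r_half = pearson(odd_half, even_half)
reliability = 2 * r_half / (1 + r_half)  # Spearman-Brown correction
print(round(r_half, 2), round(reliability, 2))  # -> 0.91 0.95
```

Splitting into odd and even items, rather than first half versus second half, is a common choice because it spreads any ordering effects evenly across the two halves.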
Internal Consistency Reliability
Internal consistency reliability is assessed when dissimilar instrument items intended to measure the same or similar constructs are analyzed for their capacity to produce comparable results (Greenberg, Daniels, Flanders, Eley, & Boring, 2012). Internal consistency testing generally has two forms: average inter-item correlation and split-half reliability, as discussed above (Drost, 2011). Drost states that in average inter-item correlation, a group of people takes the instrument and provides responses; once they finish, the responses to the items are compared pair by pair, and an overall average of all the pairwise comparisons is then computed.
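The average inter-item correlation Drost describes can be sketched directly: correlate every pair of items and average the results. The responses below are hypothetical:

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two score sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: five respondents answering three items that are
# all meant to measure the same construct.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]

items = list(zip(*responses))         # transpose: one tuple per item
pairs = list(combinations(items, 2))  # every pair of items
avg_r = sum(pearson(a, b) for a, b in pairs) / len(pairs)
print(round(avg_r, 2))  # -> 0.83
```

A high average such as this suggests the items hang together and measure the same construct.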
Conclusion
This paper has aimed to show the approach a lecturer can take to ensure that a test has internal and external validity while maintaining its reliability. It discussed the various methods of assessing whether a test has validity, recognizing that one must know the methods of measuring validity in order to ensure it. The paper also covered the threats that can compromise validity; knowledge of these threats allows someone to avoid them and so preserve the validity of a test. Lastly, the paper covered the various tests of reliability; as with validity, knowledge of these tests helps someone maintain reliability.
References
Anthoine, E., Moret, L., Regnault, A., Sebille, V., & Hardouin, J.-B. (2014). Sample size used to validate a scale: a review of publications on newly-developed patient reported outcomes measures. Health and quality of life outcomes, 12, 2. doi:10.1186/s12955-014-0176-2
Barker, C., & Pistrang, N. (2015). Research methods in clinical psychology: An introduction for students and practitioners. John Wiley & Sons.
Burton, L. J., & Mazerolle, S. M. (2011). Survey instrument validity part I: Principles of survey instrument development and validation in athletic training education research. Athletic Training Education Journal, 6, 27-35. doi:10.4085/1947-380X-6.1.27
Culyer, A. J. (2014). Encyclopedia of health economics. Newnes.
Drost, E. A. (2011). Validity and reliability in social science research. Education Research and perspectives, 38, 105.
Gomm, R. (2008). Social research methodology: A critical introduction. Palgrave Macmillan.
Greenberg, R., Daniels, S., Flanders, W., Eley, J., & Boring, J. (2012). Medical Epidemiology. McGraw-Hill Education.
Heale, R., & Twycross, A. (2015). Validity and reliability in quantitative studies. Evidence-Based Nursing. doi:10.1136/eb-2015-102129
Ihantola, E.-M., & Kihn, L.-A. (2011). Threats to validity and reliability in mixed methods accounting research. Qualitative Research in Accounting & Management, 8, 39-58. doi:10.1108/11766091111124694
Maroof, D. A. (2012). Validity. In Statistical Methods in Neuropsychology: Common Procedures Made Comprehensible (pp. 5-16). Boston, MA: Springer US. doi:10.1007/978-1-4614-3417-7_2
Matthews, T. D., & Kostelis, K. T. (2011). Designing and Conducting Research in Health and Human Performance. Wiley.
McDermott, R. (2011). Internal and external validity. Cambridge handbook of experimental political science, 27-40.
Newton, P., & Shaw, S. (2014). Validity in educational and psychological assessment. Sage.
Onraet, E., Van Hiel, A., Dhont, K., & Pattyn, S. (2013). Internal and External Threat in Relationship With Right-Wing Attitudes. Journal of Personality, 81, 233-248. doi:10.1111/jopy.12011
Picardi, C. A., & Masick, K. D. (2013). Research methods: Designing and conducting research with a real-world focus. SAGE Publications.
Polit, D. F., & Beck, C. T. (2010). Essentials of nursing research: Appraising evidence for nursing practice. Lippincott Williams & Wilkins.
Rumrill, P. D., Cook, B. G., & Wiley, A. L. (2011). Research in special education: Designs, methods, and applications. Charles C Thomas Publisher.
Thanasegaran, G. (2009). Reliability and Validity Issues in Research. Integration & Dissemination, 4. Retrieved from https://aupc.info/wp-content/uploads/35-40-ganesh.pdf
Yu, C.-h., & Ohlund, B. (2010). Threats to validity of research design. Retrieved January 12, 2012, from https://www.creative-wisdom.com/teaching/WBI/threat.shtml