Hierarchy of Scientific Evidence
The scientific method is arguably the most valuable instrument for knowledge that we have ever invented. Nonetheless, it is not perfect: bad research does sometimes get published, and the evidence generated is not all created equal because study designs vary; that is, certain designs are more robust than others. As a result, you may find yourself facing two well-designed studies that reach completely different conclusions. To resolve this ostensible paradox, it is important to examine the experimental design of each study and consider its strength. In other words, where does each of these studies fall in the hierarchy of scientific evidence?
To help you better understand this hierarchy, below is a description of the more common study designs, starting with the weakest and finishing with the most robust. Before beginning the journey through the evidence hierarchy, however, we must first address what doesn't constitute scientific evidence.
Note, this hierarchy is mostly relevant to issues of human health (e.g., pharmaceutical efficacy, disease prevalence). However, many other scientific disciplines use similar methodologies. Further, the hierarchy presented below is a guideline, not an absolute rule; there will be nuanced differences in the ranking depending on the source.
NOT Scientific Evidence
Below is a list of sources that do NOT qualify as scientific evidence:
Vani Hari (a.k.a., Food Babe)
“Mother knows best” (a.k.a., parental instincts)
Note, this list is not exhaustive and there are many more sources of non-scientific evidence out there, so you're going to have to use your best judgment when you encounter a new information source. Moreover, the individuals and websites listed here are cesspools of misinformation lauded as credible because “the science” supports their position; nothing could be further from the truth. These sources are anathema to the scientific community as they often avow positions that are antithetical to the scientific consensus and are an insult to science in general (i.e., their positions are not even scientifically plausible).
If you encounter an argument where these sources of information are used in the premise(s), the argument should automatically be considered bad and rejected. However, it is important to stay vigilant when implementing this strategy so as not to fall victim to the genetic fallacy. Apply the principle of charity and ask your interlocutor to provide references (i.e., the scientific evidence). Reformulate the argument using those references and then proceed with the discourse from there. That said, if your interlocutor cannot provide further evidence, you can either end the discussion there (not very charitable) or table the discussion until adequate evidence is presented (more charitable).
*In general, YouTube is NOT considered scientific evidence. However, there are extenuating circumstances where the video may be of a researcher or science communicator who is presenting scientific evidence. In this case, it is your responsibility to validate the references to ensure the authenticity of the communicated results. Further, YouTube videos can be very useful for instructive purposes if the information can be verified as credible; candidly, I often use them this way myself. That said, please be cautious when choosing to use YouTube videos as scientific evidence.
Anecdotal & Expert Opinions
An anecdote is a person's own experience with a lifestyle change, supplement, therapy, etc., which isn't necessarily representative of the average experience or of what more authoritative studies conclude. Furthermore, an expert's opinion, whether provided in a blog, news article, etc., shouldn't be conflated with actual scientific evidence. The expert may be explaining the current body of research and giving his/her opinion on potential research directions, but this still doesn't qualify as evidence. It is perfectly acceptable for expert opinion to guide future research, but under no circumstances should it be recognized as scientific evidence. Again, both anecdotes and expert opinions are the weakest forms of evidence, to the point where they are NOT considered scientific evidence. Any argument structured with an anecdote as evidence is bad and should be rejected.
Case Reports
A case report focuses on a single patient in medicine and documents the side effects, disease progress, symptomatology, diagnosis, etc. It may contain a demographic profile of the patient as well, but it usually just describes a novel or prominent occurrence. Essentially, case reports are written-down anecdotes. Although low on the hierarchy of evidence, they can be a useful starting point for further investigation. That said, they are simply documented observations that must be investigated further using more rigorous methodology before any compelling conclusion can be reached. Under no circumstances should you structure an argument containing a case report as evidence; such an argument would be considered bad.
Animal Studies
Research on animals is a useful strategy in science as it can predict effects also seen in humans. However, since human physiology differs from that of other animals, the effects of a drug, food, etc. may differ as well. As such, it's paramount that human trials are subsequently conducted to ensure that the observed effect(s) also occur in humans. This is why animal studies are placed near the bottom of the hierarchy.
Similar to case reports, the limited scope of this type of study means it is generally used as a starting point for future research. For example, within the pharmaceutical industry, the development of a new drug will generally go through animal studies before entering human trials. Only when a drug shows safety and efficacy in animal trials will it move forward to human trials. As with case reports, under no circumstances should you structure arguments using animal studies as your premise.
Note, animal studies allow us to conduct experiments that would be unethical as human trials. For example, animal studies are often used in toxicity studies when attempting to establish thresholds for lethality or severe acute toxicity.
In Vitro Studies
The phrase In Vitro is Latin for “in glass” and is used to refer to studies conducted in a petri dish, test tube, etc. These studies use isolated cells, compounds, etc. instead of multi-cellular organisms. For example, if we wanted to investigate Alzheimer's disease and how it responded to a new pharmaceutical treatment, for an in vitro study, we would start by growing brain cells in a petri dish that were genetically manipulated to show the hallmarks of Alzheimer's disease (i.e., plaques and tangles). Once the petri dish had reached a critical threshold for the number of brain cells established by the experimenters, then the pharmaceutical could be introduced and the subsequent effects recorded.
The reason why this type of study isn't further up in the hierarchy is that it provides a very simplistic view that doesn't reflect how the drug, intervention, etc. will necessarily behave once introduced into a multi-cellular organism, which is vastly more complex. Returning to the example above, in practice, the pharmaceutical isn't directly exposed to the affected brain cells. The patient would most likely ingest a pill that must be digested, absorbed in the intestines, travel through the blood to the brain, cross the blood-brain barrier (not easy to accomplish), and finally reach the affected cells. Along the way, there are many other chemicals that the drug is going to encounter, as the human body is incredibly complex with countless processes occurring all the time. For the drug to be effective, not only does it have to navigate this labyrinth to arrive at the brain cells, but it has to remain benign to all the other cells that it encounters along the way. Plus, it has to work once it gets there.
As you can see, the complexity gap between in vitro systems and multi-cellular life is considerable, which is why the results of these studies aren't taken more seriously. That said, similar to case reports and animal studies, in vitro studies provide a valuable first step towards galvanizing more powerful research. Still, due to their low placement in the hierarchy, any argument using an in vitro study as evidence should be considered bad.
Cross-Sectional Studies
These types of studies assess the prevalence of a particular trait, disease, etc. in a population at some moment in time. In general, these types of studies are observational only (i.e., data is collected without patient interference) and conducted via questionnaires or medical records. Moreover, these studies are simply looking for prevalence and correlation.
For example, suppose that I wanted to determine the prevalence of type II diabetes within a given population. Concurrently, I would collect data on lifestyle habits such as diet, exercise routines, etc. to see how these factors correlate with type II diabetes. However, there are problems with this approach from a point of robustness. First, if you're implementing the questionnaire strategy, there are no guarantees that the individuals are going to be telling the truth or can remember all the details surrounding your inquiries. Second, randomization is not present here, which aids in accounting for confounding variables (i.e., an extra variable that you didn't account for, which can ruin your entire experiment). Last and most salient, these studies cannot be used to establish cause and effect as they lack adequate rigor. Thus, they should really only be used as a starting point for future research.
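To make the prevalence-and-correlation idea concrete, here is a minimal sketch in Python. The survey records, numbers, and variable names are all invented for illustration; a real cross-sectional study would involve far larger samples and proper statistical testing.

```python
# Hypothetical survey records: (has_type_2_diabetes, daily_sugar_grams)
records = [
    (True, 120), (False, 40), (True, 95), (False, 60),
    (False, 55), (True, 110), (False, 30), (False, 70),
]

# Prevalence: the fraction of the sampled population with the condition.
cases = sum(1 for has_t2d, _ in records if has_t2d)
prevalence = cases / len(records)
print(f"Prevalence: {prevalence:.1%}")  # 3 of 8 -> 37.5%

# A cross-sectional study can also report a simple association, e.g.,
# mean sugar intake among cases vs. non-cases. Note this is
# correlation only, never causation.
mean_cases = sum(s for t, s in records if t) / cases
mean_controls = sum(s for t, s in records if not t) / (len(records) - cases)
print(f"Mean sugar intake (cases): {mean_cases:.1f} g")
print(f"Mean sugar intake (controls): {mean_controls:.1f} g")
```

Even with a stark difference between the group means, nothing here tells us whether sugar causes the disease, whether the disease changes eating habits, or whether some third factor drives both.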
As with the previous studies discussed thus far, we have yet to reach a point where these study types can be used responsibly as evidence in a good argument. Therefore, any argument that you find using this type of study as evidence is considered bad.
Case-Control Studies
These are retrospective studies that involve two groups of subjects: one group will express a particular condition, symptom, etc. and the other will act as the control. The goal of the study is to track backwards in an attempt to establish a correlation between the particular condition or symptom and a possible cause. Note, as with cross-sectional studies, this type of study establishes correlation only, not causation.
Suppose that we again want to look at the prevalence of type II diabetes within a community and how it relates to lifestyle. To start, we would set up one group from the community that has type II diabetes (i.e., the outcome of interest) and another that doesn't (i.e., the control). Note, it is important to control confounding factors at this point otherwise the results of the study are equivocal. Next, we would begin to look at certain lifestyle factors (e.g., diet, exercise, etc.) within each group and compare how this factor impacts type II diabetes. Last, we then analyze the results and draw correlations.
Let's use sugar intake as an example. Here, after separating the community into the two groups (i.e., one with type II diabetes and the other without), we would then look at the daily intake of sugar within each group. If sugar were the cause of the diabetes, we would expect to see higher daily sugar intake within the group with type II diabetes. On the other hand, if sugar is not the cause, we would expect sugar intake levels to be about the same across both groups. Once more, this type of study can establish correlation only and should be used to spur further investigation, not to draw conclusions. Accordingly, any argument structured around this type of evidence is considered bad.
Cohort Studies
These types of studies are similar to case-control studies in that they involve two groups of subjects (one group expresses a particular symptom, condition, etc. and the other acts as the control) that are compared over time, noting any differences between the two. However, unlike case-control studies, which are conducted solely in a retrospective manner, cohort studies can be organized either prospectively or retrospectively.
Unlike a retrospective study, a prospective study is one where you select a group of individuals, none of whom has yet developed the symptom, condition, etc., who currently differ in their exposure to a potential cause of interest. The group is then followed for a given period of time to see if any develop the outcome of interest. Returning to the type II diabetes example, we would select a group of individuals in which no one had yet developed the disease. Then, we would follow them for a set amount of time to see if any eventually developed it; concomitantly, we would record the daily sugar intake of each individual. Note, as with case-control studies, this is an observational study only, where the experimenter's role is solely that of data collection (i.e., you don't actively expose the group to the potential cause of interest).
We are finally at a point in the hierarchy where causal relationships can be established. This is possible because you can actually follow the progress and observe whether or not the potential cause preceded the outcome. Of course, confounding factors must still be accounted for, but, when this is done properly, causation can be established. Thus, when observed in an argument, we are now in the regime of good arguments.
Randomized Controlled Trial
This is the “gold standard” of scientific research, where subjects are randomly assigned to the test group (i.e., the group that receives treatment) or a control group (i.e., the group that receives a placebo). When the study is “blind,” the subjects are unaware of which group they are in. When the study is “double-blind,” the experimenters are also unaware of which subjects are in which group. All of this is done in an attempt to remove any human bias, as bias can adulterate the results.
Let's return to our well-worn example and suppose that we want to test treatment X's effects on type II diabetes. First, we select a group of individuals who have type II diabetes, trying to minimize the number of confounding variables in the process. That is, we select individuals who are as close to the same age as possible, the same sex, economic status, etc. All this is done in an attempt to ensure that the experiment is communicating the results of treatment X on type II diabetes. We don't want other factors mixed in, as this would tarnish the findings.
Next, we randomly select half the people and place them into the treatment group while the other half is placed into the control group. This randomization adds to the robustness of this experimental design as it eliminates bias from the test subjects (i.e., they all think that they're receiving treatment) and further aids in controlling confounding factors (i.e., the physical, economic, etc. characteristics of the individuals are spread randomly between both groups). Note, with this study design, it is imperative that the control group receive a placebo to ensure that the results are free from the influence of the placebo effect. Further, it is ideal if the experimenters are blinded as well as this eliminates researcher bias.
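The random assignment step itself can be sketched in a few lines of Python. The participant IDs and group sizes here are placeholders; real trials use formal allocation protocols (e.g., block or stratified randomization) rather than a simple shuffle.

```python
import random

random.seed(42)  # fixed seed only so the example is reproducible

# Hypothetical enrollees, already screened for eligibility.
participants = [f"P{i:03d}" for i in range(20)]

# Shuffle, then split in half: every participant has an equal chance
# of landing in either arm, so confounders (age, sex, economic status,
# etc.) are spread evenly between the groups on average.
shuffled = participants[:]
random.shuffle(shuffled)
treatment_group = shuffled[: len(shuffled) // 2]
control_group = shuffled[len(shuffled) // 2 :]

print(f"Treatment arm ({len(treatment_group)}):", treatment_group)
print(f"Control arm   ({len(control_group)}):", control_group)
```

The key property is not any particular split but that the assignment is independent of every participant characteristic, measured or not.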
All of these meticulous design details contribute to the overall robustness of this type of study. Unlike the previously discussed study designs, simply being able to select your study participants gives you control over confounding factors to a degree that's unparalleled by the others. Moreover, the added randomization and double-blinding increase the design's power to a point where other study designs can't compete. All of these factors make randomized controlled trials the most powerful experimental design available. As such, good arguments are structured using this type of scientific evidence in support of the conclusion.
Systematic Review & Meta-Analyses
Systematic reviews and meta-analyses are the most powerful forms of scientific evidence available. However, it is important to note that they do not actually conduct any experiments themselves; rather, they review and analyze the literature available on a topic of interest. In particular, systematic reviews collate the results from a large number of studies (in general, randomized controlled trials) on a given topic in an attempt to distill the salient points into a single paper. A meta-analysis goes further and actually collates the data sets from all the papers and conducts its own statistical analysis.
One of the reasons these are considered the best evidence available is that they use a large number of studies to draw conclusions. One of the traps to avoid when analyzing the scientific evidence on a given topic is cherry picking. That is, we should be very skeptical of a single study on a given issue and avoid making grandiose claims or using it to support our arguments before more research is available. Why? Well, bad studies can slip through the cracks and get published (e.g., Wakefield's paper claiming a causal link between the MMR vaccine and autism). To avoid this pitfall, we need to think holistically and view any one study against the background of all the literature available on the topic. Both systematic reviews and meta-analyses achieve this goal for us.
Let's once again return to the example of treatment X and type II diabetes. Suppose that there are 100 papers in the literature investigating its efficacy. Now, of those 100 studies, only 5 show efficacy while the other 95 fail to meet statistical thresholds for significance (i.e., they show no link between treatment X and an improvement in type II diabetes). Humans have a tendency towards irrationality, and those who really want treatment X to be the miracle drug for type II diabetes will latch onto those 5 studies devoutly. Now enter systematic reviews and meta-analyses. The systematic review allows us to view those 5 studies in the broader context (i.e., there are just 5 studies out of 100 supporting efficacy), while the meta-analysis, going a step further, will collate all of the data from the 100 studies and then run an independent statistical analysis.
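The contrast between cherry picking and pooling can be sketched numerically. The study counts below mirror the hypothetical example (95 null studies, 5 "positive" ones); the per-study patient numbers are invented, and a real meta-analysis would weight studies by size and quality and model heterogeneity rather than naively pool raw counts.

```python
# Hypothetical literature on treatment X: (patients_improved, patients_total).
# 95 null-ish studies (~25% improvement) and 5 "positive" ones (~55%).
studies = [(10, 40)] * 95 + [(22, 40)] * 5

# Cherry picking: count only the studies where a majority improved.
positive_studies = [s for s in studies if s[0] / s[1] > 0.5]
print(f"'Positive' studies: {len(positive_studies)} of {len(studies)}")

# Naive pooled view: combine all the raw data before judging efficacy.
pooled_improved = sum(k for k, _ in studies)
pooled_total = sum(n for _, n in studies)
print(f"Pooled improvement rate: {pooled_improved / pooled_total:.1%}")  # 26.5%
```

Someone citing only the five "positive" studies sees majorities improving; the pooled data tell a much more modest story.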
It is important to note that these papers should list their inclusion/exclusion criteria, and it is prudent to evaluate them carefully. For example, again suppose there are 100 papers available on treatment X and type II diabetes. Now, suppose we structure a systematic review with exclusion criteria that admit only 80 of the 100 papers, while the 5 studies showing efficacy are all still included. Instead of having 5 studies out of 100 showing efficacy (i.e., 5%), we now have 5 out of 80 (i.e., 6.25%). Thus, just by how we structured our review, we were able to inflate the apparent support for treatment X's efficacy. As you can see, inclusion/exclusion criteria are important and worth evaluating carefully.
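The denominator effect described above is pure arithmetic, but it is worth seeing explicitly (the counts are the hypothetical ones from the example):

```python
positive = 5  # studies showing efficacy, retained in both scenarios

share_all = positive / 100      # all 100 papers included
share_trimmed = positive / 80   # 20 papers excluded; positives all kept

print(f"{share_all:.2%} vs {share_trimmed:.2%}")  # prints "5.00% vs 6.25%"
```

The positive studies never changed; shrinking the denominator alone made the support look 25% larger, which is why exclusion criteria deserve scrutiny.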
The overarching goal of these criteria is to increase the robustness of the results. That is, when implemented correctly, they should give us even stronger faith in the results of the study. However, this is also an opportunity for researcher bias to creep in. If the researchers designing the study are biased one way or another, their selection of inclusion/exclusion criteria can muddle the study outcome, which is why we must be vigilant. What is more, even after reviewing the inclusion/exclusion criteria for the review/analysis, we should evaluate the papers that were eliminated to see how they fit into the overall picture (e.g., was their exclusion justified? How do their results compare to those of the included studies?).
Last, as systematic reviews and meta-analyses find themselves positioned at the top of the hierarchy, feel free to use these as evidence in any argument.
Choosing a Study Design
How do we choose the study design? Why don't we just use randomized controlled trials or cohort studies all the time (i.e., the studies that can actually establish causation)? As with many things in life, you can't take a “one-size-fits-all” approach when it comes to scientific inquiry.
For example, randomized controlled trials tend to be very costly and time consuming, so a research group may face the hard decision of choosing a different study design because of fiscal and/or temporal limitations. Moreover, in medicine, medical records are readily available, inexpensive, and a very valuable source of information, so it's reasonable to try to gain as much information from them first before exploring other options. Further, if the outcome of interest is extremely rare (e.g., 1 out of 50,000), the sample size needed to observe the effect may be beyond the scope of available resources. In other words, the number of individuals required in order to observe the outcome of interest is too large to be pragmatic, and a different study design, one suited to rare outcomes, must be chosen instead.
Scientific evidence comes in a spectrum because study designs vary, and the robustness of the resulting evidence is therefore structured into a hierarchy. It is imperative that you be familiar with this hierarchy, as the strength of an argument depends on where its supporting studies lie within it. You should only structure arguments using cohort studies, randomized controlled trials, systematic reviews, or meta-analyses, as these are powerful enough to show causation. An argument using the other study designs (i.e., anecdotes, animal studies, in vitro studies, etc.) as the sole form(s) of evidence is considered bad. As a Critical Thinker, your goal is not only to better understand science, but to know how to properly use science to structure good arguments, because the decisions that direct your life start with the arguments that you tell yourself. Good arguments lead to good decisions, which lead to better life outcomes.