The short answer is yes. But why, and how much? As a psychology researcher, you will likely be asking questions and conducting studies on topics that directly affect society: mental health treatment plans, learning, memory, cognitive biases, child development, even eating. What if you do not plan on conducting psychology research? Even then, clinicians still need to read and interpret the latest findings. In the following sections, we’ll discuss the power of statistical techniques as tools to guide our theories, along with their realistic limitations (i.e., what we can and cannot conclude from our results). To make things more concrete, we’ll also go through a few examples of advanced statistical techniques, what they do and do not share with more basic methods, and their honest limitations. As researchers, we are responsible for conscientiously reporting accurate and honest results. To do so, you need at least a basic (but not oversimplified) understanding of how statistical results are constructed and what they can tell us.
Imagine that you’ve devised a brilliant study design and collected a good amount of data. Now it’s time to make sense of all that information. It will likely look like a bunch of numbers with a few strings of sentences here and there. Though often underappreciated, statistics is what helps us tell the story of our data. However, we must not fall prey to the idea that statistics carves nature at its joints. Statistics is a powerful tool for summarizing the patterns in our data, but on its own it tells us nothing about cause and effect. Only our theory and study design can license that type of inference. Lastly, no matter how sophisticated your statistical model, you cannot “prove” a theory.
Mediation analysis is a common and fairly popular method for testing whether X affects Z through Y [Mediating and Moderating Variables Explained]. For example, suppose we’re interested in the idea that financial stability leads to happiness by reducing stress. To examine this relationship statistically, we essentially multiply the regression coefficient going from financial stability to stress by the coefficient going from stress to happiness, then run an analysis to see whether that product is significant. However, [linear] regression coefficients are directly calculated from the correlation ( r ) between the two variables. In fact, one can show that the linear regression coefficient describing the linear relationship between two variables X and Y can be written as b = r ( s_Y / s_X ), where s_X and s_Y are the standard deviations of X and Y. It is important to understand this fact in order to interpret your results accurately. The statistics used for experimental and observational studies are the same. However, we have all heard the phrase, “You cannot infer causation from correlation.” Instead, our experimental design – comparing outcomes when some critical factor is present versus absent – is what allows for causal inference. Statistics, on its own, implies nothing about cause. This is actually a huge debate among cognitive scientists and among AI researchers in computer science and statistics – whether AI can ever genuinely understand or model cause without stringent human input. This leads into our next example.
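To make the product-of-coefficients idea concrete, here is a minimal sketch in Python using simulated data. The variable names and effect sizes are purely illustrative (they are not from any real study), and the last lines numerically verify that a regression slope equals r times the ratio of standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical simulated data: financial stability (X) lowers stress (M),
# and lower stress raises happiness (Y). Effect sizes are made up.
X = rng.normal(size=n)
M = -0.5 * X + rng.normal(size=n)   # path a: X -> stress
Y = -0.6 * M + rng.normal(size=n)   # path b: stress -> happiness

# Path a: slope from regressing M on X.
a = np.polyfit(X, M, 1)[0]

# Path b: coefficient of M when regressing Y on both M and X
# (controlling for X, as mediation analysis requires).
b = np.linalg.lstsq(np.column_stack([M, X, np.ones(n)]), Y, rcond=None)[0][0]

# The product-of-coefficients mediation estimate (the indirect effect).
indirect_effect = a * b

# Sanity check of the identity: slope = r * (s_M / s_X).
r = np.corrcoef(X, M)[0, 1]
slope_from_r = r * (M.std() / X.std())
print(round(a, 3), round(slope_from_r, 3), round(indirect_effect, 3))
```

Note that the two negative paths multiply to a positive indirect effect, matching the story that financial stability raises happiness through reduced stress.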
Neural networks are advanced statistical methods that allow us to examine the relationships among many variables. For this reason, they are very popular in industry, where large corporations like Facebook use the vast amounts of data they collect from users to tailor ads. When I first heard about neural networks, I thought we had cracked the code – that we could now know how everything causes everything else. That was naïve. It wasn’t long before I learned that the weights (connections) “learned” by the algorithm again capture the relationship (not necessarily causal) between variables, much like our simple linear regression models. Therefore, these powerful methods also cannot tell us anything about cause and effect.
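A minimal sketch of why learned weights reflect association rather than cause: below, a single linear unit (the simplest possible “network”) is trained by gradient descent on simulated data in which a confounder Z drives both X and Y, while X has no causal effect on Y at all. The setup and numbers are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated, confounded data: Z causes both X and Y; X does NOT cause Y.
Z = rng.normal(size=n)
X = Z + 0.3 * rng.normal(size=n)
Y = Z + 0.3 * rng.normal(size=n)
X = X - X.mean()   # center so we can drop the intercept term
Y = Y - Y.mean()

# One linear unit trained by gradient descent on mean squared error.
w = 0.0
lr = 0.1
for _ in range(1000):
    grad = -2 * np.mean((Y - w * X) * X)
    w -= lr * grad

# The learned weight converges to the ordinary least-squares slope: it
# captures the X-Y association created by the confounder Z, not a causal
# effect of X on Y (which is zero by construction).
ols_slope = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
print(round(w, 3), round(ols_slope, 3))
```

The weight comes out strongly positive even though manipulating X would do nothing to Y – exactly the correlation-versus-causation trap, just wearing a neural-network costume.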
Lastly, when can we “prove” a theory? Technically speaking, never. We can, however, gather support and evidence for a theory. So what does the p-value tell us? It tells us how unlikely our data would be if the null hypothesis were true. For example, your theory might predict that meditation will increase mindfulness (change in mindfulness score > 0), while the null hypothesis predicts no change (change in mindfulness score = 0). Showing the implausibility of the null does not prove your theory. Instead, a small p-value tells us that our data are unlikely under the null, and that we therefore likely need another hypothesis to better “explain” the data. It is important to keep in mind that many alternative hypotheses can explain or predict the same results. So, take your findings with a grain of salt.
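One way to see what a p-value does (and does not) say is a small simulation. The sketch below runs a sign-flip permutation test on hypothetical change-in-mindfulness scores; the sample size and effect size are invented for illustration, not drawn from any real meditation study.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40

# Hypothetical observed change-in-mindfulness scores after meditation.
observed = rng.normal(loc=0.4, scale=1.0, size=n)
t_obs = observed.mean() / (observed.std(ddof=1) / np.sqrt(n))

# Under the null ("no change"), the change scores are symmetric around
# zero, so randomly flipping signs generates data consistent with H0.
null_ts = []
for _ in range(10_000):
    flipped = observed * rng.choice([-1, 1], size=n)
    null_ts.append(flipped.mean() / (flipped.std(ddof=1) / np.sqrt(n)))

# p-value: how often the null produces a statistic at least this extreme.
# A small p says the data are surprising under H0 -- nothing more. It does
# not prove the meditation theory; many alternatives predict the same data.
p = np.mean(np.abs(null_ts) >= abs(t_obs))
print(round(t_obs, 2), round(p, 4))
```

Notice that the computation never references the researcher’s theory at all – only the null. That is exactly why a small p-value cannot, by itself, single out one alternative hypothesis as true.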
Up to this point, we’ve discussed what statistics cannot tell us about cause and effect. As researchers, however, causal relationships are exactly what we are interested in. This does not make statistics useless. Instead, we can use these tools to make disciplined arguments about what we theorize the causal relationships between variables to be. As a researcher, you may have a theory in mind about how certain variables are causally related in the real world. If you are lucky, you may be able to pull off an ethical experiment. Your theory should predict what will happen (the effect) when a critical (causal) variable is manipulated. Once you open your data, you will quickly see that reporting raw data is neither helpful nor informative to readers. Instead, we need a way to quantify the relationships within our dataset. Statistics is a powerful way to do that. Along with quantifying the relationships between variables, it lets us quantify the probability of spurious results (p-values) [since we were only able to collect information from a small sample of some larger population]. Its results therefore serve to undergird our theories.
Data analysis is also an art. There will often be numerous ways to summarize and describe the patterns in your data, but some methods are more sensible than others depending on your research question and design. It is also possible to accidentally select the wrong model (if your data fail to meet certain assumptions). It is important to be mindful that statistical methods are grounded in advanced mathematics (which you don’t need to know in detail), and their results may rest on very stringent assumptions about the data. For example, in a number of cases (e.g., simple linear regression), the model can only accurately estimate the likelihood of our data under some hypothetical distribution if we assume all our observations are independent and identically distributed (i.i.d.). This rests on a fundamental law of probability: A is independent of B if Pr(A | B) = Pr(A), and thus Pr(A and B) = Pr(A) × Pr(B). As a user of these methods, you will find that the models (tools) often come with instruction manuals telling you which assumptions to check. These assumptions (instructions) are there for a reason: if they are not met, the model may not work as expected, which means the results of your very important study may be difficult to interpret and replicate. Even better, we can challenge ourselves to gain a basic but accurate understanding of the rationale behind them.
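The independence rule can be checked exhaustively for a simple example. The sketch below enumerates all outcomes of two fair dice and confirms Pr(A and B) = Pr(A) × Pr(B) for A = “first die is even” and B = “the sum is 7” (a classic pair of independent events).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def pr(event):
    """Exact probability of an event over the 36 outcomes."""
    return Fraction(sum(event(o) for o in outcomes), len(outcomes))

def A(o):          # first die shows an even number
    return o[0] % 2 == 0

def B(o):          # the two dice sum to 7
    return o[0] + o[1] == 7

def A_and_B(o):
    return A(o) and B(o)

# Independence: knowing A occurred does not change the probability of B,
# so the joint probability factors into the product of the marginals.
print(pr(A), pr(B), pr(A_and_B))   # 1/2, 1/6, 1/12
```

Exact fractions make the factorization visible at a glance: 1/2 × 1/6 = 1/12. When observations in a dataset fail to factor this way (e.g., repeated measures from the same participant), the i.i.d. assumption is violated.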
To wrap things up, we’ve discussed statistics’ limitations (its inability, on its own, to establish cause and effect) as well as its virtues (its ability to rigorously quantify chance and interesting relationships). It is important that we challenge ourselves to accurately understand our results and how we arrived at them. So, the next time you are in a statistics course, don’t overlook the statistical theory motivating the models and tools we use. It may be challenging, but challenge is part of the learning process. Lastly, take advantage of cutting-edge models to help you answer your more involved questions. Data are getting more complicated by the day, and we need to keep up.