Primer on binary logistic regression
Jenine K Harris
Correspondence to Dr Jenine K Harris; [email protected]
Corresponding author.
Collection date 2021.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .
Family medicine has traditionally prioritised patient care over research. However, recent recommendations to strengthen family medicine include calls to focus more on research including improving research methods used in the field. Binary logistic regression is one method frequently used in family medicine research to classify, explain or predict the values of some characteristic, behaviour or outcome. The binary logistic regression model relies on assumptions including independent observations, no perfect multicollinearity and linearity. The model produces ORs, which suggest increased, decreased or no change in odds of being in one category of the outcome with an increase in the value of the predictor. Model significance quantifies whether the model is better than the baseline value (ie, the percentage of people with the outcome) at explaining or predicting whether the observed cases in the data set have the outcome. One model fit measure is the count-R², which is the percentage of observations where the model correctly predicted the outcome variable value. Related to the count-R² are model sensitivity—the percentage of those with the outcome who were correctly predicted to have the outcome—and specificity—the percentage of those without the outcome who were correctly predicted to not have the outcome. Complete model reporting for binary logistic regression includes descriptive statistics, a statement on whether assumptions were checked and met, ORs and CIs for each predictor, overall model significance and overall model fit.
Keywords: education, epidemiology, public health
Introduction
From its inception, the field of family medicine has prioritised patient care over research. 1 However, research has an important place in family medicine to improve quality, responsiveness and innovation in patient care. 2 As a result, there have been numerous calls in recent years 3 for family and community medicine practitioners around the world 4 5 to become more involved in research. 6 Among the recommendations for improving family medicine research is strengthening the use of appropriate research methods. 6
Binary logistic regression is one method that is particularly appropriate for analysing survey data in the widely used cross-sectional and case–control research designs. 7–9 In the Family Medicine and Community Health (FMCH) journal, 35 out of the 142 (24.6%) peer-reviewed published original research papers between 2013 and 2020 reported using binary logistic regression as one of the analytical methods. Given the high percentage of FMCH publications that include binary logistic regression, understanding this method is important for FMCH authors and reviewers.
The binary logistic regression model is part of a family of statistical models called generalised linear models. The main characteristic that differentiates binary logistic regression from other generalised linear models is the type of dependent (or outcome) variable. 10 A dependent variable in a binary logistic regression has two levels. For example, a variable that records whether or not someone has ever been diagnosed with a health condition like lung cancer could be measured in two categories, yes and no. Likewise, someone might have coronary heart disease or not, be physically active or not, be a current smoker or not, or have any one of thousands of diagnoses or personal behaviours and characteristics that are of interest in family medicine.
In addition to a binary dependent variable, a binary logistic regression has at least one independent variable that is used to explain or predict values of the dependent variable. For the example of lung cancer diagnosis, some logical independent variables could be age or smoking status. People who are smokers have higher odds of lung cancer, as do people who are older. Unlike the dependent variable, independent variables are not limited to being binary and can have two or more categories or be continuous.
There are many ways to identify and select variables that are important to include in a logistic regression model and researchers should carefully consider which variables to include. Some suggested strategies for variable identification and selection in logistic regression are included in a 2019 paper by Shipe et al 11 and other strategies for selecting variables are included in a 2018 paper by Heinze et al . 12 For those researchers new to logistic regression, collaboration with experienced researchers or methodologists is recommended. 6
The following sections are a step-by-step demonstration of how to conduct and interpret a binary logistic regression model. The analyses in this paper were conducted in R V.4.1.1 13 using the following packages: tidyverse, 14 odds.n.ends, 15 car, 16 finalfit, 17 knitr 18 and table1. 19 The statistical code for reproducing the results or for adapting the code to use to conduct analysis on other data is available at this URL: https://github.com/jenineharris/logistic-regression-tutorial
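For readers following along in R, a minimal setup might look like the sketch below. The file name lungCancerData.csv and the object name lungCancerData are assumptions used for illustration in this and later code sketches; the repository linked above contains the author's actual code and data.

# load the packages used in this tutorial (install each once with install.packages() if needed)
library(tidyverse)    # data management and graphing
library(odds.n.ends)  # ORs, contingency tables and model significance from a glm object
library(car)          # generalised variance inflation factors for assumption checking
library(finalfit)     # formatted regression results tables
library(knitr)        # report generation
library(table1)       # descriptive statistics tables

# import the example data (file name is an assumption)
lungCancerData <- read_csv("lungCancerData.csv")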
Step 1: exploratory data analysis
Before a binary logistic regression model is estimated, it is important to conduct exploratory data analysis (EDA). EDA can include descriptive statistics and/or graphs. EDA serves multiple purposes, including: confirmation that the data were measured and labelled correctly, identification of potential problems with data distributions (eg, no cases in an important category), a preview of what model results might show, and information that can be used in reproducing statistical results. 20
As an example, consider a small data set with the survey responses of 32 long-term smokers. The data set includes three variables: lungCancer, yearsSmoke and bmi. The lungCancer variable is an indicator of whether the survey participant has ever been diagnosed with lung cancer; it has a value of 1 for yes and 0 for no. The yearsSmoke variable is the number of years the survey participant has been a smoker, and the bmi variable is the category of body mass index (BMI) that the participant is in, which includes two categories: underweight or normal BMI and overweight or obese BMI. If the goal is to build a logistic regression model from these data where lung cancer diagnosis is the outcome variable and is predicted by years of smoking and BMI category, the first step would be to conduct EDA that first explores each variable and then explores the intersection of each predictor with the lung cancer outcome variable.
One way to explore each variable separately before modelling is to produce a table of descriptive statistics, choosing the most appropriate statistics for each variable type. Since years of smoking is closer to being a continuous variable (rather than categorical), the best descriptive statistics would be either mean and SD or median and IQR. The way to choose between these two options is to determine whether the years of smoking data are normally distributed or not. Continuous variables that are relatively normally distributed are best described by mean and SD while those that are not normally distributed are more appropriately described by median and IQR.
The histogram (figure 1) suggests that the variable is right skewed rather than normal, so median and IQR would be a more appropriate choice of descriptive statistics for the yearsSmoke variable. The other two variables, bmi and lungCancer, are both categorical, so the most appropriate descriptive statistics are percentages and frequencies. Table 1 shows an example of a useful data exploration prior to binary logistic regression modelling.
Figure 1. Histogram showing distribution of years smoking for a sample of 32 smokers.
Table 1. Example table showing characteristics of people in a small data set (n=32)
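Figure 1 and a descriptive table like table 1 might be produced along the following lines; the number of histogram bins, the axis labels and the factor labels are assumptions, and the author's code in the repository linked above may differ.

# histogram of years smoking (figure 1)
ggplot(data = lungCancerData, aes(x = yearsSmoke)) +
  geom_histogram(bins = 10) +
  labs(x = "Years smoking", y = "Number of participants")

# make labelled factors so categorical variables are summarised as frequencies
lungCancerData <- lungCancerData %>%
  mutate(bmi = factor(bmi),
         lungCancerFactor = factor(lungCancer, levels = c(0, 1),
                                   labels = c("No lung cancer", "Lung cancer")))

# descriptive statistics table (table 1)
table1(~ yearsSmoke + bmi + lungCancerFactor, data = lungCancerData)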
Table 1 shows that fewer than half of participants had ever been diagnosed with lung cancer, about 40% are overweight, and the median number of years smoking is just over 19. At this point, if something in the descriptive statistics seemed inconsistent with what you know about the sampling or the measurement, you could review the data and any data management steps to ensure everything was correctly recorded and labelled. Once satisfied with the univariate descriptive statistics, the next step might be computing descriptive statistics by outcome group. This step provides some insight into what the statistical modelling might find.
It is clear from table 2 that the people in the data who were diagnosed with lung cancer were smokers for a higher median number of years. It also appears that the distribution of people across BMI groups is different for the lung cancer and no lung cancer groups. For those without lung cancer, a higher percentage were in the underweight or normal BMI group and fewer in the overweight or obese BMI group compared with people being evenly split into these two BMI groups for those participants with lung cancer.
Table 2. Example of a stratified table showing characteristics of people by lung cancer status in a small data set (n=32)
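A stratified table like table 2 can be requested with the grouping syntax of table1; this sketch reuses the factor created in the earlier code sketch.

# descriptive statistics by lung cancer status (table 2)
table1(~ yearsSmoke + bmi | lungCancerFactor, data = lungCancerData)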
Tables 1 and 2 provide a few pieces of information useful for the regression modelling. First, the data seem to be cleaned and appropriately labelled. Second, the data suggest that the model could show that the odds of lung cancer are higher with more years of smoking. The model might also find higher odds of lung cancer in those who are overweight or obese compared with underweight or normal BMI, but this is less clear from the descriptive analyses. If the model results are very different from what the descriptive statistics suggest will happen, it is worth taking the time for further exploration of the data to ensure there are no mistakes in how the data were recorded and managed and that the model settings are as expected.
Step 2: check binary logistic regression assumptions
Statistical models like binary logistic regression are developed with certain underlying assumptions about the data. Assumptions are features of the data that are required for the model to work as expected and, when one or more assumptions are not met, the model may produce misleading results. For example, consider the mean as a basic statistical model. The mean is one way to explain where the middle is in a set of continuous numbers. For the mean to work as intended and produce a value that is in the middle, the numbers are assumed to follow a normal distribution. If the numbers are skewed to the right, like the years of smoking variable in figure 1 , the calculated mean will be higher than the centre of the data. If the numbers are skewed to the left, the calculated mean will be lower than the centre of the data. With non-normal data, a different model, like the median, is likely to be a more accurate measure of central tendency.
Binary logistic regression relies on three underlying assumptions to be true:
The observations must be independent.
There must be no perfect multicollinearity among independent variables.
Continuous predictors are linearly related to a transformed version of the outcome (linearity).
Before conducting a logistic regression analysis, check these three assumptions. The model must meet all assumptions to be reported as unbiased and generalisable outside the sample.
Checking the assumptions
The independence of observations assumption requires that each of the observations in a data set is unrelated to the other observations in the data set. There are at least two different ways that data commonly fail this assumption. The first way is that a data set includes multiple observations from the same person (or mouse, or organisation, or whatever the type of observation is). The second way is where data include some sort of grouping like multiple family members who live in the same residence, multiple people from the same class in a school, or several people who live close together in the same neighbourhood. When people are in the same family, class or neighbourhood, they are more likely to share characteristics, which can limit the amount of variability in the data and introduce bias into the results. Checking this assumption requires knowing how the data were collected to ensure that the observations are unrelated.
The no perfect multicollinearity assumption requires that the independent variables are not perfectly correlated with each other. Variables that are highly, or perfectly, correlated with each other are statistically measuring the same thing (or similar things) and so are essentially redundant. Including variables in a model that are redundant can result in unstable model results. Correlation coefficients are often used to check for correlation among independent variables; two variables that are correlated at r=0.7 or higher share 49% or more variance and are considered somewhat redundant and problematic to include together in a single model as separate independent predictors.
There are several ways of checking the no perfect multicollinearity assumption. One that is commonly used is the Variance Inflation Factor or VIF. The VIF score for a variable quantifies how well that variable is explained by the other variables in the model. For binary logistic regression, the VIF score is generalised (GVIF) and takes on larger values. 21 To use the GVIF in a similar way as the VIF, a new value is often computed: GVIF^(1/(2×Df)). Although there does not seem to be consensus on a cut-off value for GVIF^(1/(2×Df)), one commonly used cut-off is two. If this is used, variables with a GVIF^(1/(2×Df)) value of two or higher might be considered problematic, while those with GVIF^(1/(2×Df)) less than two do not have any multicollinearity problems. In R, the vif() function in the car package prints the GVIF^(1/(2×Df)) for logistic regression models. The output gives the value for each variable, like this:
## yearsSmoke bmi
## 1.783835 1.783835
The two GVIF^(1/(2×Df)) values are below two and so are not problematic. For this model, the no perfect multicollinearity assumption is met.
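Output of this form could be produced with a call along the following lines; the model object name is an assumption, and the model shown includes the two predictors used later in the tutorial.

# fit the model with both predictors, then check multicollinearity with the car package
lungCancerModelFull <- glm(lungCancer ~ yearsSmoke + bmi,
                           data = lungCancerData,
                           family = binomial("logit"))
vif(lungCancerModelFull)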
The linearity assumption requires that continuous independent variables, or predictors, have a linear relationship with the log-odds of the predicted probabilities for the outcome. Linear relationships are relationships that seem to follow a relatively straight line. One way to check this relationship is to create a scatterplot with the continuous predictor on the x-axis and the log-odds of the predicted probabilities on the y-axis. Add a loess curve and a line representing a linear relationship between the two variables to the scatterplot. The loess curve shows the relationship between the predictor and the transformed outcome in a more nuanced way, while the fitted line shows what the relationship between the two would be if it were linear. If the loess curve and the fitted line are approximately the same, the linearity assumption is met. If the loess curve deviates from the line, the linearity assumption fails.
The loess curve is very close to the linear relationship so the linearity assumption appears to be met ( figure 2 ). Assuming that these data were collected using an acceptable sampling frame without related observations (independence of observations assumption), the data meet the assumptions to report the model as unbiased.
Figure 2. Checking the linearity assumption graphically.
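A plot like figure 2 might be built as sketched below, using predicted probabilities from the fitted model object defined in the multicollinearity sketch above; the exact code behind the published figure may differ.

# log-odds of the predicted probabilities for each observation
linearityCheck <- lungCancerData %>%
  mutate(predictedProb = predict(lungCancerModelFull, type = "response"),
         logOdds = log(predictedProb / (1 - predictedProb)))

# scatterplot with a loess curve (solid) and a linear fit (dashed)
ggplot(data = linearityCheck, aes(x = yearsSmoke, y = logOdds)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  labs(x = "Years smoking", y = "Log-odds of predicted probability")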
Step 3: estimate the binary logistic regression model
The dependent variable for binary logistic regression is a categorical variable with two categories (denoted as y in equation 1 ). In the statistical model it is transformed using the logit transformation into a probability ranging from 0 to 1 ( equation 1 ).
Equation 1. A statistical form of the binary logistic regression model:

p(y) = 1 / (1 + e^−(b0 + b1x1 + b2x2 + … + bkxk))
In equation 1 , the p(y) stands for the probability of one category (often the presence of a behaviour or condition) of the dependent variable y , the b are coefficients of the independent variables or predictors, and the x are the independent variables. Those who are familiar with linear regression might notice that the statistical form of the linear regression model is inside the parentheses of the exponent of e in the denominator of the right-hand side of the equation.
Visualising the logistic function can help to clarify why this statistical form is useful for examining a binary outcome. Figure 3 shows the logistic function as the curve connecting the data points. Each data point is plotted with a value of the outcome along the y-axis. Because the outcome is binary with the two values of 0 and 1, the points are plotted at y=0 and y=1. The predictor variable is shown along the x-axis and appears to be continuous. Each data point takes a value of x which seems to range from about 10 to about 35. It is clear that the data points in the y=0 category of the outcome generally have lower values of x than the data points in the y=1 category. This pattern suggests that, as x increases, the probability of a person having the outcome value of y=1 also increases.
Figure 3. The logistic function with example data.
The grey logistic function line is the logistic regression model for these data. The line identifies the predicted probability of y=1 for each value of x. For example, if x=17, the predicted probability of y=1 would be 0.18. This might be translated into a percentage with a statement like, there is an 18% probability that someone with an x value of 17 would have a y value of 1. A more concrete example might be to think of the x value as years a person has smoked cigarettes daily and y as their probability for being diagnosed with lung cancer. So, a person who has smoked daily for 17 years has an 18% probability of being diagnosed with lung cancer. Please note that these data are not actual lung cancer data; this is just an example to assist in developing intuition about the logistic function. If these data were years of smoking predicting lung cancer diagnosis, equation 1 might be rewritten as equation 2:
Equation 2. Applying the statistical form of the binary logistic regression model:

p(lung cancer) = 1 / (1 + e^−(b0 + b1 × years smoked))
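As a rough check on this reading of the curve, substituting the coefficient estimates reported in step 4 below (intercept −8.8331, years smoking coefficient 0.4304) into the logistic function for x=17 gives 1/(1 + e^−(−8.8331 + 0.4304×17)) = 1/(1 + e^1.52) ≈ 0.18, consistent with the 18% probability described above.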
Step 4: compute ORs and report the results
While the predicted probabilities from the logistic function can be useful in measuring how well the model is predicting or explaining the outcome, the results of logistic regression are usually reported with ORs and CIs. Similar to the interpretation of a coefficient in linear regression, ORs quantify the change in the odds of having the outcome (ie, the odds that an observation has the value of 1 for the outcome variable) with a one-unit change in the predictor. Odds are computed using probabilities ( equation 3 ).
Equation 3. Computing odds from probabilities:

odds = p(y) / (1 − p(y))
Because the logistic function is used to compute probabilities (see figure 3), substituting the logistic model from equation 1 into equation 3 gives equation 4, showing how odds are computed for a logistic regression model.
Equation 4. Computing odds from a logistic regression model:

odds = [1 / (1 + e^−(b0 + b1x))] / [1 − 1 / (1 + e^−(b0 + b1x))], which simplifies to odds = e^(b0 + b1x)
Once b0 and b1 are estimated using a statistical software package like SAS, R or SPSS, these values can be substituted into the simplified version of equation 4 to compute odds. This is not the final step, however, since odds and ORs are different. An OR is a ratio of two odds and is computed by dividing the odds of the outcome at one value of a predictor by the odds of the outcome at the previous value. So, for example, to compute the OR for lung cancer in our previous example, divide the odds of someone who has smoked for 15 years by the odds for someone who has smoked for 14 years. The result will be the increased or decreased odds of lung cancer with every 1 year increase in years of smoking. Equation 5 shows the statistical form of this computation.
Equation 5. Using odds to compute ORs from a logistic regression model:

OR = odds(x + 1) / odds(x) = e^(b0 + b1(x + 1)) / e^(b0 + b1x) = e^b1
As an example, consider the output from R showing the estimates for the regression model used in figure 2 .
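A call of roughly the following form would produce the output shown below; the object and data frame names are the assumptions used in the earlier code sketches.

# fit the single-predictor model and print the coefficient table
lungCancerModel <- glm(lungCancer ~ yearsSmoke,
                       data = lungCancerData,
                       family = binomial("logit"))
summary(lungCancerModel)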
## glm(formula = lungCancer ~ yearsSmoke, family = binomial("logit"),
##     data = lungCancerData)
## Deviance Residuals:
##     Min      1Q  Median      3Q     Max
## -2.2127 -0.5121 -0.2276  0.6402  1.6980
## Coefficients:
##             Estimate     SE z value Pr(>|z|)
## (Intercept)  -8.8331 3.1623  -2.793  0.00522 **
## yearsSmoke    0.4304 0.1584   2.717  0.00659 **
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for binomial family taken to be 1)
## Null deviance: 43.860 on 31 degrees of freedom
## Residual deviance: 25.533 on 30 degrees of freedom
## AIC: 29.533
## Number of Fisher Scoring iterations: 6
The coefficient for years smoking is 0.4304. Substitute this value into equation 5, OR = e^0.4304, to get an OR of 1.54. So, for every 1 year increase in time spent as a smoker, the odds of lung cancer for a participant in our sample are approximately 1.54 times higher. While the OR is useful to understand the direction and magnitude of the relationship between a predictor and the outcome, more information is needed to understand whether the OR for the sample suggests a relationship in the population that the sample came from. To understand this, a 95% CI is typically computed and reported with each OR.
A 95% CI for an OR shows the range of values where the true population value of the OR likely lies. That is, if 100 samples were selected from the population and a 95% CI were computed using the data from each sample, 95 of those CIs would contain the true value of the OR (given appropriate research practices). Most statistical software packages compute 95% CIs with ORs as part of the logistic regression output. For example, the lung cancer model output might look like this:
## (Intercept) 0.0001458295 0.00000005391304 0.02024744
## yearsSmoke 1.5378933421 1.20314425841351 2.28063266
This output includes the 1.54 OR for years of smoking along with the 95% CI 1.2 to 2.28. So, the odds of lung cancer increase by approximately 1.54 times for every year longer a participant smokes and, in the population that this sample came from, the true OR likely lies between 1.20 and 2.28. Because the range of the 95% CI does not include 1, this indicates that the OR is statistically significantly different from 1. If the CI had included 1, the OR would not be statistically significantly different from 1. An OR of 1 indicates that there is no difference in odds. So, for example, someone with 14 years of smoking would have no higher nor lower odds of lung cancer than someone with 15 years of smoking if the 95% CI for the OR included 1.
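ORs and CIs of this form can be obtained from the fitted model object; one possible sketch is below (the odds.n.ends() function from the odds.n.ends package reports ORs and CIs in a similar layout).

# ORs and profile-likelihood 95% CIs for the single-predictor model
exp(cbind(OR = coef(lungCancerModel), confint(lungCancerModel)))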
A logistic regression model with a single predictor in it produces unadjusted ORs demonstrating the relationship between the predictor and the outcome without taking into account other independent predictors or confounding variables. Reporting the unadjusted ORs for the main predictor or predictors of interest may contribute to understanding how covariates influence the relationship between the predictor and outcome. 22 23
Logistic regression models can also include categorical predictors. For example, adding a BMI variable with two categories, underweight or normal BMI and overweight or obese BMI, to the lung cancer model results in the following output:
## (Intercept) 0.000003035556 0.00000000001397127 0.003054554
## yearsSmoke 1.975695580376 1.35547578843285321 3.860461189
## bmiOverwei 0.049426238402 0.00109959322745114 0.739726723
Both of the CIs indicate that the association between the predictor and lung cancer is statistically significant. For every additional year of smoking, the odds of lung cancer are approximately 1.98 times higher (95% CI 1.36 to 3.86). Compared with people in the under or normal weight BMI group, those who are classified as having an overweight or obese BMI have approximately 0.05 times the odds of having lung cancer (95% CI 0.001 to 0.74). When an OR is less than one, another way to report the OR is to subtract the value from one and report the result as a percent decrease in odds, like this: Compared with people in the under or normal weight BMI group, those who are classified as having an overweight or obese BMI have approximately 95.06% lower odds of having lung cancer (95% CI 0.001 to 0.74). Remember that the data shown here are for demonstration purposes only and these model results should not be taken as true relationships between the predictors and lung cancer.
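A sketch of a call that would produce adjusted ORs and CIs like those shown above, reusing the two-predictor model object fitted during assumption checking:

# adjusted ORs and 95% CIs from the model with both predictors
exp(cbind(OR = coef(lungCancerModelFull), confint(lungCancerModelFull)))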
Model significance and model fit
In addition to reporting the results of assumption checking and the ORs and CIs, the model significance and model fit are useful tools to understand how well your model is reflecting what was observed in the data. First, model significance determines if your model explains the data better than the baseline percentage of people with the outcome would explain the data. Model significance is determined by a χ 2 statistic that is computed by comparing a null model that has no predictors in it (and thus is the percentage of people with the outcome of interest) to the model with predictors in it. The χ 2 statistic is computed by taking the probability of the outcome and subtracting the value of the outcome for each participant. So, with the lung cancer data, the percentage of people who have lung cancer is 43.75%, so the predicted probability for each person in the data set to have lung cancer would be 0.4375. This value is subtracted from each person’s actual value for the outcome (0 or 1) and the result is squared. All of these squared values are then added up into a value called Null Deviance. The Null Deviance quantifies how far the predicted probabilities from a model with no predictors (null model) were from the true values of the outcome. The same process is then repeated for the predicted probabilities from the model with predictors. This is the model deviance.
The difference between the null deviance and the model deviance follows a χ 2 distribution with the number of df being the number of coefficients in the model. If the χ 2 is statistically significant, this indicates that the model is doing a significantly better job at predicting the probability that someone has the outcome compared with just using the percentage of people with the outcome as a model. Most statistical software will provide the model χ 2 and its significance. For example, the R package odds.n.ends gives model significance like this:
## 23.214 2 <0.001
The model using BMI category and years of smoking to explain lung cancer status is statistically significantly better than the baseline at predicting lung cancer status [χ²(2)=23.214; p<0.001].
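The same model chi-squared can also be computed directly from the fitted glm object; the sketch below uses the two-predictor model object assumed in the earlier code sketches.

# model chi-squared: null deviance minus residual deviance
modelChiSquared <- lungCancerModelFull$null.deviance - lungCancerModelFull$deviance
modelDf <- length(coef(lungCancerModelFull)) - 1  # two predictor coefficients
pchisq(modelChiSquared, df = modelDf, lower.tail = FALSE)  # p value for the model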
While model significance suggests whether a model is better than the baseline percentage of people with the outcome, model fit metrics are useful for knowing how much better than the baseline a model is at predicting the values of the outcome. One way to understand model fit for binary logistic regression is to compute the percentage of observed values of the outcome that your model correctly predicted. The contingency table used here computes predicted probabilities based on the model and then classifies the probabilities using a cut-off of 0.5. So, any predicted probability of 0.5 or greater is classified as having the outcome and any predicted probability below 0.5 is classified as not having the outcome. With the lung cancer example, what percentage of people who had lung cancer were predicted to have lung cancer and what percentage of people without were predicted to be without. An examination of the contingency table, or the table showing observed and predicted values, can help understand how well the model did in explaining the observed data ( table 3 ).
Table 3. Contingency table showing observed and predicted values of the outcome for the lung cancer model
The contingency table shows 15 people who did not have the outcome (observed=0) were correctly predicted to not have the outcome (predicted=0). Three people who did not have the outcome (observed=0) were incorrectly predicted to have the outcome (predicted=1). Ten people who had the outcome were correctly predicted, while four people who had the outcome were incorrectly predicted. Altogether, 15+10 or 25 of the 32 observations had the outcome correctly predicted by the model for a per cent correctly predicted of 78.12%. So, for 78.12% of the people in the data set used to estimate the lung cancer model, the model correctly predicted whether or not the participants had lung cancer.
The overall per cent correctly predicted gives a sense of how well the model did explaining or predicting the value of the outcome for all the participants. Sometimes it might be valuable to know how well the model did for those with the outcome or how well it did for those without the outcome. The term for how well a model predicts those with the outcome is sensitivity while specificity is how well the model predicts those without the outcome. In this case, 10 out of 14 of the people with lung cancer were correctly identified by the model for a sensitivity of 0.714 or 71.4%. The specificity of the model was higher, with 15 out of 18 (83.3%) of people without the outcome correctly predicted by the model.
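The counts in table 3 and the fit statistics just described might be computed as in the sketch below; this assumes the lungCancer variable is coded 0/1 in the data, and the odds.n.ends() function reports a similar contingency table automatically.

# classify each observation using a 0.5 cut-off on the predicted probability
predictedClass <- as.numeric(predict(lungCancerModelFull, type = "response") >= 0.5)
contingency <- table(observed = lungCancerData$lungCancer, predicted = predictedClass)
contingency

# percent correctly predicted, sensitivity and specificity
sum(diag(contingency)) / sum(contingency)        # about 0.78 overall
contingency["1", "1"] / sum(contingency["1", ])  # sensitivity, about 0.71
contingency["0", "0"] / sum(contingency["0", ])  # specificity, about 0.83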
The final model report should include:
Descriptive statistics on the outcome variable and each of the predictors.
Information on which assumptions were checked and whether they were met.
A statement about model significance.
A statement about model fit.
The model estimates including ORs and their 95% CIs.
An interpretation of the findings.
As an example, the lung cancer model shown here might be reported as follows:
We used binary logistic regression to examine whether years of smoking and BMI helped to explain lung cancer diagnosis in a sample of 32 people. The sample included 14 people with lung cancer and 18 without. The data met the binary logistic regression assumptions of independent observations, no perfect multicollinearity, and a linear relationship between the continuous predictor (years smoking) and the logit of the outcome. The model was statistically significantly better than the baseline at explaining lung cancer status [χ²(2)=23.214; p<0.001] and correctly predicted the lung cancer status of 78.1% of participants, including 71.4% of those with lung cancer and 83.3% of those without. Model estimates suggested that, for every additional year of smoking, the odds of lung cancer are approximately 1.98 times higher (95% CI 1.36 to 3.86). In addition, compared with people in the under or normal weight BMI group, those who are classified as having an overweight or obese BMI have approximately 0.05 times the odds of having lung cancer (95% CI 0.001 to 0.74).
ORs and CIs are often reported in tables for larger models, but for a model with just a few predictors, including the ORs and CIs in the text provides the same information and uses less space.
Researchers using logistic regression should note that logistic regression results, regardless of the size, direction or significance of the ORs, do not imply a causal relationship between the predictors and the outcome. 24 Also, while this tutorial describes the basics of conducting and reporting a logistic regression analysis, there are many more details to know about these models and their appropriate uses. 7–9 25–27
Twitter: @jenineharris
Contributors: JKH is the guarantor of this work and conceptualised and developed all aspects of this paper.
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests: None declared.
Provenance and peer review: Not commissioned; externally peer reviewed.
Data availability statement
Data are available in a public, open access repository at https://github.com/jenineharris/logistic-regression-tutorial .
Ethics statements
Patient consent for publication.
Not applicable.
Ethics approval
This study does not involve human participants.
- 1. Gotler RS. Unfinished business: the role of research in family medicine. Ann Fam Med 2019;17:70–6. 10.1370/afm.2323
- 2. Ravi T, Cruz I, Ali F, et al. Outcomes of a scholarly activity curriculum for family medicine residents. Fam Med 2021;53:285–8. 10.22454/FamMed.2021.812680
- 3. Jantsch AG. Pesquisa científica, atenção primária e medicina de família. Revista Brasileira de Medicina de Família e Comunidade 2020;15:2466. 10.5712/rbmfc15(42)2466
- 4. Ponka D, Coffman M, Fraser-Barclay KE, et al. Fostering global primary care research: a capacity-building approach. BMJ Glob Health 2020;5:e002470. 10.1136/bmjgh-2020-002470
- 5. Rosser WW, van Weel C. Research in family/general practice is essential for improving health globally. Ann Fam Med 2004;2(Suppl 2):S2–4. 10.1370/afm.145
- 6. Fontenelle LF, Dias Sarti T. Pesquisar para quê? Revista Brasileira de Medicina de Família e Comunidade 2020;15:2319–9. 10.5712/rbmfc15(42)2369
- 7. Lee J, Tan CS, Chia KS. A practical guide for multivariate analysis of dichotomous outcomes. Ann Acad Med Singap 2009;38:714–9.
- 8. Labrecque JA, Hunink MMG, Ikram MA, et al. Do case-control studies always estimate odds ratios? Am J Epidemiol 2021;190:318–21. 10.1093/aje/kwaa167
- 9. Barros AJD, Hirakata VN. Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Med Res Methodol 2003;3:21. 10.1186/1471-2288-3-21
- 10. Harris JK. Statistics with R: solving problems using real-world data. SAGE Publications, 2020.
- 11. Shipe ME, Deppen SA, Farjah F, et al. Developing prediction models for clinical use using logistic regression: an overview. J Thorac Dis 2019;11:S574–84. 10.21037/jtd.2019.01.25
- 12. Heinze G, Wallisch C, Dunkler D. Variable selection - a review and recommendations for the practicing statistician. Biom J 2018;60:431–49. 10.1002/bimj.201700067
- 13. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2021. https://www.R-project.org/
- 14. Wickham H, Averick M, Bryan J, et al. Welcome to the Tidyverse. J Open Source Softw 2019;4:1686. 10.21105/joss.01686
- 15. Harris J. odds.n.ends: odds ratios, contingency table, and model significance from a generalized linear model object, 2021.
- 16. Fox J, Weisberg S. An R companion to applied regression. 3rd edn. Thousand Oaks, CA: Sage, 2019. https://socialsciences.mcmaster.ca/jfox/Books/Companion/
- 17. Harrison E, Drake T, Ots R. finalfit: quickly create elegant regression results tables and plots when modelling, 2021. Available: https://CRAN.R-project.org/package=finalfit
- 18. Xie Y. knitr: a general-purpose package for dynamic report generation in R, 2021. Available: https://yihui.org/knitr/
- 19. Rich B. table1: tables of descriptive statistics in HTML, 2021. Available: https://CRAN.R-project.org/package=table1
- 20. Harris JK, Johnson KJ, Carothers BJ, et al. Use of reproducible research practices in public health: a survey of public health analysts. PLoS One 2018;13:e0202447. 10.1371/journal.pone.0202447
- 21. Fox J, Monette G. Generalized collinearity diagnostics. J Am Stat Assoc 1992;87:178–83. 10.1080/01621459.1992.10475190
- 22. LaValley MP. Logistic regression. Circulation 2008;117:2395–9. 10.1161/CIRCULATIONAHA.106.682658
- 23. Norton EC, Dowd BE, Maciejewski ML. Odds ratios - current best practice and use. JAMA 2018;320:84. 10.1001/jama.2018.6971
- 24. Reichenheim ME, Coutinho ESF. Measures and models for causal inference in cross-sectional studies: arguments for the appropriateness of the prevalence odds ratio and related logistic regression. BMC Med Res Methodol 2010;10:66. 10.1186/1471-2288-10-66
- 25. Peng C-YJ, Lee KL, Ingersoll GM. An introduction to logistic regression analysis and reporting. J Educ Res 2002;96:3–14. 10.1080/00220670209598786
- 26. Ranganathan P, Pramesh CS, Aggarwal R. Common pitfalls in statistical analysis: logistic regression. Perspect Clin Res 2017;8:148. 10.4103/picr.PICR_87_17
- 27. Connelly L. Logistic regression. Medsurg Nurs 2020;29:353–4.
- Open access
- Published: 28 December 2018
A logistic regression investigation of the relationship between the Learning Assistant model and failure rates in introductory STEM courses
- Jessica L. Alzen ORCID: orcid.org/0000-0002-1706-2975 1 ,
- Laurie S. Langdon 1 &
- Valerie K. Otero 1
International Journal of STEM Education, volume 5, Article number: 56 (2018)
Large introductory STEM courses historically have high failure rates, and failing such courses often leads students to change majors or even drop out of college. Instructional innovations such as the Learning Assistant model can influence this trend by changing institutional norms. In collaboration with faculty who teach large-enrollment introductory STEM courses, undergraduate learning assistants (LAs) use research-based instructional strategies designed to encourage active student engagement and elicit student thinking. These instructional innovations help students master the types of skills necessary for college success such as critical thinking and defending ideas. In this study, we use logistic regression with pre-existing institutional data to investigate the relationship between exposure to LA support in large introductory STEM courses and general failure rates in these same and other introductory courses at University of Colorado Boulder.
Our results indicate that exposure to LA support in any STEM gateway course is associated with a 63% reduction in odds of failure for males and a 55% reduction in odds of failure for females in subsequent STEM gateway courses.
Conclusions
The LA program appears related to lower course failure rates in introductory STEM courses, but each department involved in this study implements the LA program in different ways. We hypothesize that these differences may influence student experiences in ways that are not apparent in the current analysis, but more work is necessary to support this hypothesis. Despite this potential limitation, we see that the LA program is consistently associated with lower failure rates in introductory STEM courses. These results extend the research base regarding the relationship between the LA program and positive student outcomes.
Science, technology, engineering, and mathematics (STEM) departments at institutes of higher education historically offer introductory courses that can serve up to 1000 students per semester. Introductory courses of this size, often referred to as “gateway courses,” are cost-effective due to the number of students able to receive instruction in each semester, but they often lend themselves to lecture as the primary method of instruction. Thus, there are few opportunities for substantive interaction between the instructor and students or among students (Matz et al., 2017; Talbot, Hartley, Marzetta, & Wee, 2015). Further, these courses typically have high failure rates (Webb, Stade, & Grover, 2014) and lead many students who begin as STEM majors to either switch majors or drop out of college without a degree (Crisp, Nora, & Taggart, 2009). In efforts to address these issues, STEM departments across the nation now implement active engagement strategies in their classes such as peer instruction and interactive student response systems (i.e., clicker questions) during large lecture meetings (Caldwell, 2007; Chan & Bauer, 2015; Mitchell, Ippolito, & Lewis, 2012; Wilson & Varma-Nelson, 2016). In addition to classroom-specific active engagement interventions, there are programs designed to guide larger instructional innovations at the institution level, such as the Learning Assistant (LA) model.
The LA model was established at University of Colorado Boulder in 2001. The program represents an effort to change institutional values and practices through a low-stakes, bottom-up system of course assistance. The program supports faculty to facilitate increased learner-centered instruction in ways that are most valued by the individual faculty member. A key component of the LA model is undergraduate learning assistants (LAs). LAs are undergraduate students who, through guidance, encourage active engagement in classes. LAs facilitate discussions, help students manage course material, offer study tips, and motivate students. LAs also benefit as they develop content mastery, teaching, and leadership skills. LAs get a monthly stipend for working 10 h per week, and they also receive training in teaching and learning theories by enrolling in a math and science education seminar taught by discipline-based education researchers. In addition, LAs meet with faculty members once a week to develop deeper understanding of the content, share insights about how students are learning, and prepare for future class meetings (Otero, 2015 ).
LAs are not peer tutors and typically do not work one-on-one with students. They do not provide direct answers to questions or systematically work out problems with students. Instead, LAs facilitate discussion about conceptual problems among groups of students and they focus on eliciting student thinking and helping students make connections between concepts. This is typically done both in the larger lecture section of the course as well as smaller meetings after the weekly lectures, often referred to as recitation. LAs guide students in learning specific content, but also in developing and defending ideas—important skills for higher-order learning in general. The model for training LAs and the design of the LA program at large are aimed at making a difference in the ways students think and learn in college overall and not just in specific courses. That is, we expect exposure to the program to influence student success in college generally.
Prior research indicates a positive relationship between exposure to LAs and course learning outcomes in STEM courses (Pollock, 2009 ; Talbot et al., 2015 ). Other research suggests that modifying instruction to be more learner-centered helps to address high failure rates (Cracolice & Deming, 2001 ; Close, Mailloux-Huberdeau, Close, & Donnelly, 2018 ; Webb et al., 2014 ). This study seeks to further understand the relationship between the LA program and probability of student success. Specifically, we answer the following research question: How do failure rates in STEM gateway courses compare for students who do and do not receive LA support in any STEM gateway course? We investigate this question because, as a model for institutional change, we expect that LAs help students develop skills and dispositions necessary for success in college such as higher-order thinking skills, navigating course content, articulating and defending ideas, and feelings of self-efficacy. Since skills such as these extend beyond a single course, we investigate the extent to which students exposed to the LA program have lower failure rates in STEM gateway courses generally than students who are not exposed to the program.
Literature review
The LA model is not itself a research-based instructional strategy. Instead, it is a model of social and structural organization that induces and supports the adoption of existing (or creation of new) research-based instructional strategies that require increased teacher-student ratio. The LA program is at its core, a faculty development program. However, it does not push specific reforms or try to change faculty directly. Instead, the opt-in program offers resources and structures that lead to changes in values and practices among faculty, departments, students, and the institution (Close et al., 2018 ; Sewell, 1992 ). Faculty members write proposals to receive LAs (these proposals must involve course innovation using active engagement and student collaboration), students apply to be LAs, and departments match funding for their faculty’s requests for LAs. Thus, the LA program has become a valued part of the campus community.
The body of research that documents the relationship between student outcomes and the LA program is growing. Pollock ( 2006 ) provided evidence regarding the relationship between instructional innovation including LAs and course outcomes in introductory physics courses at University of Colorado Boulder by comparing three different introductory physics course models (outlined in Table 1 ).
Pollock provides two sources of evidence related to student outcomes regarding the relative effectiveness of these three course models. First, he discussed average normalized learning gains on the force and motion concept evaluation (FMCE; Thornton & Sokoloff, 1998 ) generally. The FMCE is a concept inventory commonly used in undergraduate physics education to provide information about student learning on the topics of force and motion. Normalized learning gains are calculated by finding the difference in average post-test and pre-test in a class and dividing that value by the difference between 100 and the average pre-test score. It is conceptualized as the amount the students learned divided by the amount they could have learned (Hake, 1998 ).
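Written out, the normalized gain described here is g = (post − pre) / (100 − pre), where post and pre are the class-average post-test and pre-test scores expressed as percentages.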
Prior research suggests that traditional instructional strategies yield an average normalized learning gain of about 15% and research-based instructional methods such as active engagement and collaborative learning yield on average about 63% average normalized learning gains (Thornton, Kuhl, Cummings, & Marx, 2009 ). The approach using the University of Washington Tutorials with LAs saw a normalized learning gain of 66% on the FMCE from pre-test to post-test. Average learning gains for the approach using Knight’s ( 2004 ) workbooks with TAs were about 59%, and average normalized learning gains for the traditional approach were about 45%. The average normalized learning gains for all three methods in Pollock’s study are much higher than what the literature would expect from traditional instruction, but the course model including LAs is aligned with what is expected from research-based instructional strategies. Second, Pollock further investigated the impact of the different course implementations on higher and lower achieving students on FMCE scores. To do this, he considered students with high pre-test scores (those with pre-test scores > 50%) and students with low pre-test scores (those with pre-test scores < 15%). For both groups of students, the course implementation that included recitation facilitated by trained TAs and LAs had the highest normalized learning gains as measured by the FMCE.
In a similar study at Florida International University, Goertzen et al. ( 2011 ) investigated the influence of instructional innovations through the LA program in introductory physics. As opposed to the University of Washington Tutorials in the Pollock ( 2006 ) study, the research-based curriculum materials used by Florida International University were Open Source Tutorials (Elby, Scherr, Goertzen, & Conlin, 2008 ) developed at University of Maryland, College Park. Goertzen et al. ( 2011 ) used the Force Concept Inventory (FCI; Hestenes, Wells, & Swackhamer, 1992 ) as the outcome of interest in their study. Despite the different curriculum from the Pollock ( 2006 ) context, Goertzen et al. found that those students exposed to the LA-supported courses had a 0.24 increase in mean raw gain in scores from pre-test to post-test while students in classes that did not include instructional innovations only saw raw gains of 0.16.
In an attempt to understand the broader relationship between the LA program and student outcomes, White et al. (2016) investigated the impacts of the LA model on student learning in physics across institutions. In their study, White et al. used paired pre-/post-tests from four concept inventories (FCI, FMCE, Brief Electricity and Magnetism Assessment [BEMA; Ding, Chabay, Sherwood, & Beichner, 2006], and Conceptual Survey of Electricity and Magnetism [CSEM]) at 17 different institutions. Researchers used data contributed to the Learning Assistant Alliance through their online assessment tool, Learning About STEM Student Outcomes (LASSO). This platform allows institutions to administer several common concept inventories, with data securely stored on a central database to make investigation across institutions possible (Learning Assistant Alliance, 2018). In order to identify differences in learning gains for students who did and did not receive LA support, White et al. tested differences in course mean effect sizes between the two groups using a two-sample t test. Across all of the concept inventories, White et al. found average Cohen's d effect sizes 1.4 times higher for LA-supported courses compared to courses that did not receive LA support.
The research about the LA model shows that students exposed to the model tend to have better outcomes than those in more traditional lecture-based learning environments. However, due to the design of the program and the goals of the LA model, there is a reason to expect that there are implications for more long-term outcomes. LAs are trained to help students develop skills such as developing and defending ideas, making connections between concepts, and solving conceptual problems. Prior research suggests that skills such as these develop higher-order thinking for students. Martin et al. ( 2007 ) compared learning outcomes and innovative problem-solving for biomedical engineering students in inquiry-based, active engagement and traditional lecture biotransport courses. They found that both groups reached similar learning gains but that the active engagement group showed greater improvement in innovative thinking abilities. In a similar study, Jensen and Lawson ( 2011 ) investigated achievement and reasoning gains for students in either inquiry-based, active engagement or lecture-based, didactic instruction in undergraduate biology. Results indicated that students in active engagement environments outperformed students in didactic environments on more cognitively demanding items, while the groups performed equally well on items requiring low levels of cognition. In addition, students in active engagement groups showed greater ability to transfer reasoning among contexts.
This research suggests that active engagement such as what is facilitated with the LA model may do more than help students gain knowledge in a particular discipline in a particular course. Over and above, active engagement helps learners grow in reasoning and transfer abilities generally. This increase in higher-order thinking may help students to develop skills that extend beyond the immediate course. However, there is only one study focused on the LA model that investigates long-term outcomes related to the program. Pollock ( 2009 ) investigated the potential long-term relationship between exposure to the LA program and conceptual understanding in physics. In this line of inquiry, Pollock compared BEMA assessment scores for those upper-division physics majors who did and did not receive LA support in their introductory Physics II course, the course in which electricity and magnetism is first covered. Pollock’s results indicate that those students who received LA support in Physics II had higher BEMA scores following upper-division physics courses than those students who did not receive LA support in Physics II. This research provides some evidence to the long-term relationship between exposure to the LA program and conceptual learning. In the current study, we continue this line of inquiry by investigating the relationship between receiving LA support in a gateway course and the potential relationship to course failure in subsequent gateway courses. This study also contributes to the literature on the LA program as no prior research attempts to examine the relationship between taking LA-supported courses and student outcomes while controlling for variables that may confound this relationship. This study thus represents an extension of the previous work regarding the LA model in terms of both the methodology and the outcome of interest.
Data for this study come from administrative records at University of Colorado Boulder. We focus on 16 cohorts of students who entered the university as full-time freshmen for the first time each fall semester from 2001 to 2016 and took Physics I/II, General Chemistry I/II, Calculus I/II (Math department), and/or Calculus I/II for Engineers (Applied Math department). The dataset includes information for 32,071 unique students, 23,074 of whom took at least one of the above courses with LA support. Student-level data includes information such as race/ethnicity, gender, first-generation status, and whether a student ever received financial aid. Additional variables include number of credits upon enrollment, high school grade point average (GPA), and admissions test scores. We translate SAT total scores to ACT Composite Scores using a concordance table provided by the College Board to have a common admissions test score for all students (College Board, 2016 ). We exclude students with no admissions test scores (about 6% of the sample). We also have data on the instructor of record for each course. The outcome of interest in this study is failing an introductory STEM course. We define failing as receiving either a D or an F or withdrawing from the course altogether after the university drop date (i.e., “DFW”).
An important consideration in creating the data set for this study is timing of receiving LA support relative to taking any STEM gateway course. The data begin with all students who took at least one of the courses included in this study. We keep all students who took all of their STEM gateway courses either entirely with or entirely without LA support. We also include all students who received LA support in the very first STEM gateway course they took, regardless of whether they had LA support in subsequent STEM gateway courses. We would exclude any student who took a STEM gateway course without LA support and then took another STEM gateway course in a subsequent semester with LA support.
This data limitation ensures that exposure to the LA program happened before or at the same time as the opportunity to fail any STEM gateway course. If it were the case that a student failed a STEM gateway course without LA support, say, in their first year and then took LA-supported courses in the second year, this student would be indicated as an LA student in the data, but the courses taken during the first year would not have been affected by the LA program. Students with experiences such as this would misrepresent the relationship between being exposed to the LA program and probability of course failure. Conveniently, there were not any students with this experience in the current dataset. In other words, for every student in our study who took more than one of the courses of interest, their first experience with any of the STEM gateway courses under consideration included LA support if there was ever exposure to the LA program. Although we did not have to exclude any students from our study for timing reasons, other institutions carrying out similar studies should carefully consider such cases when finalizing their data for analysis.
We provide Fig. 1 as a way for readers to gain a better understanding of the adoption of the LA program in each of the departments in this study. This figure also gives information regarding the number of students exposed to LAs or not in each department, course, and term in our study.
Fig. 1 Course enrollment over time by LA exposure
Ideally, we would design a controlled experiment to estimate the causal effect of LA exposure on the probability of failing introductory STEM courses. To do this, we would need two groups of students: those who were exposed to LA support in a STEM gateway course, and a group comparable on average that differed only in that its members were not exposed to LA support in any STEM gateway course. However, many institutions do not begin their LA programs with such studies in mind, so the available data do not come from a controlled experiment. Instead, we must rely on historical institutional data that were not gathered for this type of study. Thus, this study not only contributes to the body of literature regarding the relationship between LA exposure and student outcomes, but it also serves as a model for other institutions with LA programs that would like to use historical institutional data for similar investigations.
Selection bias
The ways students are assigned to receive LA support in each of the departments represented in this study are not random, and the ways LAs are used in each department are not identical. These characteristics of pre-existing institutional data manifest themselves as issues related to selection bias within a study. For example, in the chemistry department, LA support was only offered in the “on semester” sections of chemistry from 2008 to 2013. “On semester” indicates General Chemistry I in the fall and General Chemistry II in the spring. Thus, there were few opportunities for those students who took the sequence in the “off semester,” or General Chemistry I in the spring and General Chemistry II in the fall to receive LA support in these courses during the span of time covered in this analysis. The most typical reasons why students take classes in the “off semester” are that they simply prioritize other courses more in the fall semester, so there is insufficient space to take General Chemistry I; they do not feel prepared for General Chemistry I in the fall and take a more introductory chemistry class first; or they fail General Chemistry I the first time in the fall and re-take General Chemistry I in the spring. This method of assignment to receiving LA support may overstate the relationship between receiving LA support and course failure in this department. That is, it might be the case that those students who received LA support were those who were more likely to pass introductory chemistry to begin with. Our analysis includes prior achievement variables (described below) to attempt to address these selection bias issues.
In chemistry, LAs attend the weekly lecture meetings and assist small groups of students during activities such as answering clicker questions. Instructors present questions designed to elicit students' levels of conceptual understanding; students discuss the questions in groups and then respond with individual clickers by selecting one of several multiple-choice options. LAs help students think about and answer these questions in the large lecture meetings. In addition, every student enrolled in General Chemistry I and II is also enrolled in a recitation section. Recitations are smaller group meetings of approximately 20 students. In these recitation sections, LAs work with graduate TAs to facilitate small group activities related to the weekly lecture material. The materials for these recitation sections are created by the lead instructor for the course and are designed to help students investigate common areas of confusion related to the weekly material.
In the physics and math departments, the introductory courses went from no LA support in any section in any semester to all sections in all semesters receiving LA support. This historical issue affects selection bias in a different way than the off-semester chemistry sequence. One interpretation of decreased course failure rates could be that LA support caused the difference. However, we could not rule out the possibility that failure rates decreased due to other factors that also changed over time. It could be that the university implemented other student supports in addition to the LA model at the same time or that the types of students who enrolled in STEM courses changed. There is no way to determine conclusively which of these (or other) factors may have caused changes in failure rates. Thus, causal estimates of the effect of LA support on failure rates would be threatened by any historic changes that occurred. We have no way of knowing if we might over or underestimate the relationship between LA exposure and course failure rates due to the ways students were exposed (or not) to the LA program in these departments. In order to address this issue, we control for student cohort. This adjustment, described below, attempts to account for differences that might exist among cohorts of students that might be related to probability of failing a course.
The use of LAs in the math department occurs only during weekly recitation meetings. During this weekly meeting, students work in small groups to complete carefully constructed activities designed to enhance conceptual understanding of the material covered during the weekly lecture. An anomaly in the math department is that, although Calculus I/II are considered gateway courses, the department at this institution is committed to keeping course enrollment under 40. This means that LA support is tied to smaller class sizes in this department. However, since this condition is constant across the timeframe of our study, it does not introduce additional selection bias.
Similar to the math department, the physics department uses LAs only in the weekly recitation meeting. An additional anomaly in physics is that, not incidentally, the switch to the LA model happened concurrently with the adoption of the University of Washington Tutorials in introductory physics (McDermott & Shaffer, 2002). LAs facilitate small group work with the University of Washington Tutorials materials during recitation meetings. In other words, it is not possible in this department to separate the effects of the content presentation in the Tutorials from the effects of the LAs facilitating the learning of that content. Thus, data from this department might overestimate the relationship between receiving LA support and course failure. It should be noted, however, that the University of Washington Tutorials require a low student-teacher ratio, and proper implementation of this curriculum would not be feasible without the undergraduate LAs who help achieve that ratio.
Finally, every student in every section of Calculus I and II in the applied math department had the opportunity to be exposed to LA support. This is because LAs are not used in lecture or required recitation meetings, but instead facilitate an additional weekly one-unit course, called workgroup, that is open to all students. Thus, students who sign up for workgroup not only gain exposure to LA support, but they also gain an additional 90 min of time each week formally engaging in calculus material. It is not possible to know if lower failure rates might be due to the additional time on task generally, or exposure to LAs during that time specifically. This might cause us to overestimate the relationship between LA support and course failure. Additionally, those students who are expected to struggle in calculus (based on placement scores on the Assessment and LEarning in Knowledge Spaces [ALEKS] assessment) or are not confident in their own math abilities are more strongly encouraged to sign up for the weekly meeting by their instructors and advisors. Thus, those students who sign up for LA support might be more likely to fail calculus. This might lead us to underestimate the relationship between LA exposure and course failure. Similar to the chemistry department, we use prior achievement variables (described below) to address this issue to the best of our abilities.
We mention one final assumption about the LA model before describing our methods of statistical adjustment. Our data span 32 semesters of 8 courses (see Fig. 1). Although the LA model surely adapted and changed in some ways over this time, we assume that the program was relatively stable within each department throughout the period represented in this study.
Statistical adjustment
Although we do not have a controlled experiment that warrants causal claims, our goal is to approximate a causal estimate as closely as the data allow. The current study includes a control group, but it is not an ideal one because of the potential selection bias in each department described above. However, this study is warranted because it takes advantage of historical data. Our analytic approach is to control for some sources of selection bias. Specifically, we estimate logistic regression models in R that control for standardized high school GPA, standardized admissions test scores, and standardized credits at entry to account for issues related to prior achievement. This helps to address the selection bias issues in the chemistry and applied math departments. Additionally, we control for student cohort to account for some of the historical bias in the physics and math departments. We also control for instructor and course as well as gender (coded 1 = female; 0 = male), race/ethnicity (coded 1 = nonwhite; 0 = white), first-generation status (coded 1 = first-generation college student; 0 = not first-generation college student), and financial aid status (coded 1 = received financial aid ever; 0 = never received financial aid) to disentangle other factors that might bias our results in any department. Finally, we consider possible interaction effects between exposure to LA support and student characteristics. Table 2 presents the successive model specifications explored in this study. Model 1 controls only for student characteristics. Model 2 adds course, cohort, and instructor factor variables. Model 3 adds an interaction between exposure to the LA program and gender to the model 2 specification.
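A minimal sketch of these three specifications in R is shown below, fit to simulated data; all variable names are hypothetical stand-ins for the administrative records described above, not the study's actual code.

```r
# Simulated analytic file with hypothetical variable names
set.seed(1)
n <- 500
analytic <- data.frame(
  la         = rbinom(n, 1, 0.60),                      # LA exposure indicator
  female     = rbinom(n, 1, 0.45),
  nonwhite   = rbinom(n, 1, 0.30),
  firstgen   = rbinom(n, 1, 0.20),
  finaid     = rbinom(n, 1, 0.50),
  hs_gpa_z   = rnorm(n),                                # standardized HS GPA
  act_z      = rnorm(n),                                # standardized admissions score
  credits_z  = rnorm(n),                                # standardized credits at entry
  course     = factor(sample(c("PHYS1", "CHEM1", "CALC1"), n, replace = TRUE)),
  cohort     = factor(sample(2001:2016, n, replace = TRUE)),
  instructor = factor(sample(paste0("inst", 1:5), n, replace = TRUE)),
  dfw        = rbinom(n, 1, 0.15)                       # 1 = D, F, or withdrawal
)

# Model 1: student characteristics only
m1 <- glm(dfw ~ la + female + nonwhite + firstgen + finaid +
                hs_gpa_z + act_z + credits_z,
          family = binomial, data = analytic)

# Model 2: add course, cohort, and instructor factors
m2 <- update(m1, . ~ . + course + cohort + instructor)

# Model 3: add the LA-by-gender interaction
m3 <- update(m2, . ~ . + la:female)
```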
The control variables in Table 2 help to account for the selection bias described above as well as other unobserved bias in our samples, but we are limited by the availability of observed covariates. Thus, the results presented here lie somewhere between "true" causal effects and simple correlations. Our results tell us more than simple correlations would, but we are surely missing key control variables that are typically not collected by institutions of higher education, such as measures of student self-efficacy, social and emotional health, or family support. We therefore anticipate weak model fit, and the results presented here are not direct causal effects. Instead, they provide information about the partial association between course failure and LA support.
We begin our analysis by providing raw counts of failure rates for the students who did and did not receive LA support in STEM gateway courses. Next, we describe the differences between those students who did and did not receive LA support with respect to the available covariates. If we see large differences in the covariates between these two groups, we expect that controlling for those factors in the regression analysis will affect our results in meaningful ways. Thus, we close by estimating logistic regression models to disentangle some of the relationship between LA support and course failure. The variable of most interest in this analysis is the indicator for exposure to the LA program. A student received a "1" for this variable if they were exposed to the LA program either concurrently with or prior to taking STEM gateway courses, and a "0" if they took any classes in the study but never had LA support in those classes.
Table 3 includes raw pass and failure rates across all courses. Students are counted every time they enrolled in one of the courses included in our study. We see that students who were exposed to the LA program in at least one STEM gateway course had 6% lower failure rates in concurrent or subsequent STEM gateway courses. We also provide the unadjusted odds ratio for ease of comparison with the logistic regression results. The odds ratio represents the odds that course failure will occur given exposure to the LA program, compared to the odds of course failure occurring without LA exposure. An odds ratio equal to 1.0 indicates that the odds of failure are the same for both groups. Odds ratios less than 1.0 indicate that exposure to LA support is associated with a lower chance of failing, while odds ratios greater than 1.0 indicate that exposure to LA support is associated with a higher chance of failing. Thus, the odds ratio of 0.65 in Table 3 indicates a lower chance of failure with LA exposure compared to no LA exposure.
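For the unadjusted comparison, the odds ratio can be computed directly from the cell counts of a two-by-two table. The counts below are invented for illustration and are not the Table 3 values.

```r
# Unadjusted odds ratio from a 2x2 table (illustrative counts only)
fail_la <- 300; pass_la <- 2700   # attempts with LA exposure
fail_no <- 450; pass_no <- 2550   # attempts without LA exposure

(fail_la / pass_la) / (fail_no / pass_no)   # about 0.63: lower odds of failure with LAs
```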
Although the raw data indicate that students exposed to LA support have lower course failure rates, these differences could be due, at least in part, to factors outside of LA support. To explore this possibility, we next examine demographic and academic achievement differences between the groups. In Table 4, we present the mean values of all predictor variables for students who did and did not receive LA support. The top panel presents the binary variables, so averages indicate the percentage of students who identify with the respective characteristic. The bottom panel shows the averages for the continuous variables. The p values come from t tests comparing means across the two groups for each variable. Table 4 indicates that students exposed to the LA program were more likely to be male, nonwhite, non-first-generation students who did not receive financial aid. They also had more credits at entry, higher high school GPAs, and higher admissions test scores. These higher prior achievement values might lead us to think that students exposed to LA support were more likely to pass STEM gateway courses to begin with. If this is true, then the relationship between LA exposure and failure in Table 3 may overestimate the actual relationship between exposure to LAs and probability of course failure. Thus, we next use logistic regression to control for potentially confounding variables and investigate any resulting change in the odds ratio.
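Each comparison in Table 4 amounts to a two-group t test on one covariate; a generic sketch on simulated values (not the study's data) is:

```r
# Two-sample t test comparing a standardized covariate across exposure groups
set.seed(2)
act_z <- c(rnorm(200, mean = 0.1), rnorm(200, mean = -0.1))   # simulated scores
la    <- rep(c(1, 0), each = 200)                             # exposure indicator
t.test(act_z ~ la)                                            # reports group means and p value
```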
R calculates logistic regression estimates in logits, but these estimates are often expressed as odds ratios. We present abbreviated logit estimates in the Appendix and abbreviated odds ratio estimates in Table 5. Estimates for all factor variables (i.e., course, cohort, and instructor) are suppressed in these tables for ease of presentation. To transform logits to odds ratios, the logit estimates were exponentiated. For example, the logit estimate for exposure to LAs in model 1 from the Appendix converts to the odds ratio estimate in Table 5 by computing exp(−1.41) = 0.24.
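Continuing from the model-fitting sketch above (so m3 refers to that hypothetical fit, not the study's actual model object), the logit-to-odds-ratio conversion and the corresponding confidence intervals are one-line exponentiations:

```r
# Odds ratio point estimate and CI for the LA coefficient from the sketch fit
exp(coef(m3))["la"]         # exponentiated logit = odds ratio
exp(confint(m3))["la", ]    # profile-likelihood CI on the odds ratio scale

# The hand conversion reported in the text follows the same rule:
exp(-1.41)                  # = 0.244
```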
We begin by discussing the results for model 3 because it is the full model for this analysis. Discussion of models 1 and 2 is saved for the treatment of model fit below. The results for model 3 describe what we can expect, on average, across all courses and instructors in the sample. We include confidence intervals with the odds ratios; confidence intervals that include 1.0 indicate results that are not statistically significant (Long, 1997). The odds ratio estimate in Table 5 for LA exposure in model 3 is 0.367, with a confidence interval of (0.337, 0.400). Since the odds ratio is less than 1.0, LA exposure is associated with a lower probability of failing, on average, and the relationship is statistically significant because the confidence interval does not include 1.0. Compared to the odds ratio in Table 3 (0.65), these results indicate that covariate adjustment has a large impact on this odds ratio. Failing to adjust for possible confounding variables leads to an understatement of the "effect" of exposure to the LA program on course failure.
Our results show that LA exposure is associated with lower odds of failing STEM gateway courses. We also see that the interaction between exposure to the LA program and gender is statistically significant. The odds ratio of 0.37 for exposure to LA support in Table 5 applies to male students. To find the corresponding value for female students, we exponentiate the sum of the logit estimates for exposure to the LA program, female, and the interaction between the two variables (i.e., exp(−1.002 − 0.092 + 0.297) = 0.45; see the Appendix). This means that the LA program lowers the odds of failing for male students slightly more than for female students. Recall that Table 3 showed a raw odds ratio of 0.65 for failure when exposed to LA support. After controlling for possible confounding variables, the estimated relationship between LA support and the odds of course failure is stronger (i.e., lower odds of failure) for both male (0.37) and female (0.45) students.
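The same calculation can be written directly in R, again using the hypothetical m3 from the fitting sketch and mirroring the hand computation in the text:

```r
# Odds ratio of failure associated with LA exposure for female students, following
# the paper's computation: exponentiate the sum of the la, female, and la:female logits
b <- coef(m3)                               # m3 from the fitting sketch above
exp(b["la"] + b["female"] + b["la:female"])

exp(-1.002 - 0.092 + 0.297)                 # the reported values: about 0.45
```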
Discussion and limitations
Throughout this paper, we have been upfront about the limitations of the current analysis. Secondary analysis of institutional data for longstanding programs is complex and difficult. In this penultimate section, we mention a few other limitations of the study and identify some ideas for future research that could bolster the results found here or identify where this analysis may have gone astray.
First, and most closely related to the results presented above, is model fit. The McFadden pseudo R-squared (Verbeek, 2008) values for the three models are 0.0708, 0.1793, and 0.1797, respectively. These values indicate two things: (1) the data do not fit any of the models well, and (2) the addition of the interaction term does little to improve model fit. This is also seen in the comparison of AIC and log-likelihood values in Table 5. We spend significant time at the front end of this paper describing why these data are not ideal for understanding the relationship between exposure to the LA program and probability of failing, so we do not spend additional time here discussing this lack of goodness of fit. Instead, we acknowledge it as a limitation of the current analysis and reiterate the desire to conduct a similar type of analysis with data more likely to fit the model. Such situations would include institutions that can compare, for example, large samples of students with and without LA exposure within the same semester, course, and instructor. Another way to improve such data would be to include a way to control for student confidence and feelings of self-efficacy. For example, the descriptions of selection bias above indicate that students in Applied Math might systematically differ in terms of self-confidence. Data that could control for such factors would better facilitate understanding of the relationship between exposure to LA support and course failure. Alternatively, it may be more appropriate to consider the nested structure of the data (i.e., students nested within courses nested within departments) in a context with data better suited for such analysis. Hierarchical linear modeling might even be appropriate for a within-department study, treating students as nested within classes, provided there were sufficient sample size at the instructor level.
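For reference, McFadden's pseudo R-squared compares the fitted log-likelihood with that of an intercept-only model; a sketch using the hypothetical objects from the fitting example above is:

```r
# McFadden pseudo R-squared: 1 - logLik(model) / logLik(intercept-only model)
null_model <- glm(dfw ~ 1, family = binomial, data = analytic)
1 - as.numeric(logLik(m3)) / as.numeric(logLik(null_model))

# Fit comparisons of the kind reported in Table 5
AIC(m1, m2, m3)
logLik(m1); logLik(m2); logLik(m3)
```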
Second, in addition to a measure of student self-efficacy, there are other variables that might be interesting to investigate, such as transfer, out-of-state, or international student status; whether students live on campus; and a better measure of socioeconomic status than ever receiving financial aid. These are other important student characteristics that might uncover differential relationships between the LA program and particular types of students. Such analysis is important because persistence and retention in gateway courses, particularly for students from traditionally marginalized groups, are an important concern for institutions generally and STEM departments specifically. If we are to maintain and even build diversity in these departments, it is crucial that we have solid and clear work in these areas.
Third, although this study controls for course- and instructor-level factors, there are surely complications introduced by the differential ways the LA program is implemented in each department. A more careful study within a department is another interesting and valuable approach to understanding the influence of the LA program, but one for which these data are not well suited. Again, data that include students exposed and not exposed to the LA program within the same term, course, and instructor are needed to better disentangle the relationship. Due to the way the LA program was taken up at University of Colorado Boulder, we do not have the appropriate data for such an analysis.
Finally, an interesting consideration is the choice of outcome variable made in this analysis. Course failure rates are particularly important in gateway courses because failing such a course can lead students to switch majors or drop out of college. We do see a relationship between the LA model and lower failure rates in the current analysis. However, other approaches to course outcomes include course grades, pass rates, average GPA in other courses, and average grade anomaly (Freeman et al., 2014; Haak et al., 2011; Matz et al., 2017; Webb, Stade, & Grover, 2014). Similar investigations with other course outcomes are also of interest. For example, course grades would provide more nuanced information about how the LA model influences student outcomes. A measure such as Matz et al.'s (2017) average GPA in other courses could provide more information about how the LA program affects courses other than the ones in which the LA exposure occurred. In either of these situations, it would be interesting to see whether the LA program would continue to appear to have a greater impact for male students than for female students. In short, there is a wide variety of student outcomes that have yet to be fully investigated with data from the LA model, and more nuanced information would be a valuable contribution to the research literature.
In this study, we attempt to disentangle the relationship between LA support and course failure in introductory STEM courses. Our results indicate that failing to control for confounding variables underestimates the relationship between exposure to the LA program and course failure. The results here extend the prior literature regarding the LA model by providing evidence that exposure to the program improves student outcomes in subsequent as well as current courses. Programs such as the LA model, which facilitate instructional innovations in which students are more likely to be successful, increase student retention.
Preliminary qualitative work suggests potential hypotheses for the relationship between LA support and student success. Observations of student-LA interactions indicate that LAs develop safe yet vulnerable environments necessary for learning. Undergraduates are more comfortable revealing their thinking to LAs than to TAs and instructors and are therefore better able to receive input about their ideas. Researchers find that LAs exhibit pedagogical skills introduced in the pedagogy course and course experience that promote deep understanding of relevant content as well as critical thinking and questioning needed in higher education (Top, Schoonraad, & Otero, 2018 ). Also, through their interactions with LAs, faculty seem to be learning how to embrace the diversity of student identities and structure educational experiences accordingly. Finally, institutional norms are changing as more courses adopt new ways of teaching students. For example, the applied math department provides additional time on task because of the LA program. Although we do not know if it is the additional time on task, the presence of LAs, or a combination of both that drives the relationship between LA exposure and lower course failure rates, both the additional time and LA exposure occur because of the LA program generally.
Further work is necessary to more fully understand the relationship between the LA program and student success. Although we controlled for several student-level variables, we surely missed key variables that contribute to these relationships. Despite this limitation, the regression analysis represents an improvement over unadjusted comparisons. We used the available institutional data to control for variables related to the selection bias present in each department's method of assigning students to receive LA support. More research is needed to identify whether the themes emerging in the present study are apparent at other institutions. Additional research with data better suited to isolating potential causal effects is also needed to bolster the results presented here. Despite the noted limitations, the current findings are encouraging for further development and implementation of the LA program in STEM gateway courses. Identifying relationships between models for change and lower course failure rates is helpful for informing future decisions regarding those models.
For more information about joining LASSO and resources available to support LA programs, visit https://www.learningassistantalliance.org /
Abbreviations
BEMA: Brief Electricity and Magnetism Assessment
CSEM: Conceptual Survey of Electricity and Magnetism
FCI: Force Concept Inventory
FMCE: Force and Motion Concept Evaluation
LA model: Learning Assistant model
LAs: Learning assistants
PLTL: Peer-led team learning
STEM: Science, technology, engineering, and mathematics
Caldwell, J. E. (2007). Clickers in the large classroom: current research and best-practice tips. CBE-Life Sci Educ, 6 (1), 9–20.
Chan, J. Y., & Bauer, C. F. (2015). Effect of peer-led team learning (PLTL) on student achievement, attitude, and self-concept in college general chemistry in randomized and quasi experimental designs. J Res Sci Teach, 52 (3), 319–346.
Close, E. W., Mailloux-Huberdeau, J. M., Close, H. G., & Donnelly, D. (2018). Characterization of time scale for detecting impacts of reforms in an undergraduate physics program. In L. Ding, A. Traxler, & Y. Cao (Eds.), AIP Conference Proceedings: 2017 Physics Education Research Conference .
College Board. (2016). Concordance tables. Retrieved from https://collegereadiness.collegeboard.org/pdf/higher-ed-brief-sat-concordance.pdf
Cracolice, M. S., & Deming, J. C. (2001). Peer-led team learning. Sci Teach, 68 (1), 20.
Crisp, G., Nora, A., & Taggart, A. (2009). Student characteristics, pre-college, college, and environmental factors as predictors of majoring in and earning a STEM degree: an analysis of students attending a Hispanic serving institution. Am Educ Res J, 46 (4), 924–942 Retrieved from http://www.jstor.org/stable/40284742 .
Ding, L., Chabay, R., Sherwood, B., & Beichner, R. (2006). Evaluating an electricity and magnetism assessment tool: brief electricity and magnetism assessment. Physical Rev Special Topics Physics Educ Res, 2 (1), 010105.
Elby, A., Scherr, R. E., Goertzen, R. M., & Conlin, L. (2008). Open-source tutorials in physics sense making. Retrieved from http://umdperg.pbworks.com/w/page/10511238/Tutorials%20from%20the%20UMd%20PERG
Freeman, S., Eddy, S. L., McDonough, M., Smith, M. K., Okoroafor, N., Jordt, H., & Wenderoth, M. P. (2014). Active learning increases student performance in science, engineering, and mathematics. Proc Nat Acad Sci, 111 (23), 8410–8415.
Goertzen, R. M., Brewe, E., Kramer, L. H., Wells, L., & Jones, D. (2011). Moving toward change: institutionalizing reform through implementation of the Learning Assistant model and Open Source Tutorials. Physical Rev Special Topics Physics Education Research, 7 (2), 020105.
Haak, D. C., HilleRisLambers, J., Pitre, E., & Freeman, S. (2011). Increased structure and active learning reduce the achievement gap in introductory biology. Science, 332 (6034), 1213–1216.
Hake, R. R. (1998). Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. Am J Physics, 66 (1), 64–74.
Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. Physics Teach, 30 (3), 141–158.
Jensen, J. L., & Lawson, A. (2011). Effects of collaborative group composition and inquiry instruction on reasoning gains and achievement in undergraduate biology. CBE-Life Sci Educ, 10 (1), 64–73.
Knight, R. (2004). Physics for scientists and engineers: A strategic approach. Upper Saddle River, NJ: Pearson/Addison Wesley.
Learning Assistant Alliance. (2018). About LASSO. Retrieved from https://www.learningassistantalliance.org/modules/public/lasso.php
Long, J. S. (1997). Regression models for categorical and limited dependent variables (Advanced Quantitative Techniques in the Social Sciences Series, Vol. 7). Thousand Oaks, CA: Sage Publications.
Martin, T., Rivale, S. D., & Diller, K. R. (2007). Comparison of student learning in challenge-based and traditional instruction in biomedical engineering. Annals of Biomedical Engineering, 35 (8), 1312–1323.
Matz, R. L., Koester, B. P., Fiorini, S., Grom, G., Shepard, L., Stangor, C. G., et al. (2017). Patterns of gendered performance differences in large introductory courses at five research universities. AERA Open, 3 (4), 2332858417743754.
McDermott, L. C., & Shaffer, P. S. (2002). Tutorials in introductory physics. Upper Saddle River, NJ: Prentice Hall.
Mitchell, Y. D., Ippolito, J., & Lewis, S. E. (2012). Evaluating peer-led team learning across the two semester general chemistry sequence. Chemistry Education Research and Practice, 13 (3), 378–383.
Otero, V. K. (2015). Effective practices in preservice teacher education. In C. Sandifer & E. Brewe (Eds.), Recruiting and educating future physics teachers: case studies and effective practices (pp. 107–127). College Park: American Physical Society.
Pollock, S. J. (2006). Transferring transformations: Learning gains, student attitudes, and the impacts of multiple instructors in large lecture courses. In P. Heron, L. McCullough, & J. Marx (Eds.), Proceedings of 2005 Physics Education Research Conference (pp. 141–144). Salt Lake City, Utah.
Pollock, S. J. (2009). Longitudinal study of student conceptual understanding in electricity and magnetism. Physical Review Special Topics-Physics Education Research, 5 (2), 1–8.
Talbot, R. M., Hartley, L. M., Marzetta, K., & Wee, B. S. (2015). Transforming undergraduate science education with learning assistants: student satisfaction in large-enrollment courses. J College Sci Teach, 44 (5), 24–30.
Thornton, R. K., & Sokoloff, D. R. (1998). Assessing student learning of Newton’s laws: the force and motion conceptual evaluation and the evaluation of active learning laboratory and lecture curricula. Am J Physics, 66 (4), 338–352.
Thornton, R. K., Kuhl, D., Cummings, K., & Marx, J. (2009). Comparing the force and motion conceptual evaluation and the force concept inventory. Physical Review Special Topics - Physics Education Research, 5(1), 010105.
Top, L., Schoonraad, S., & Otero, V. (2018). Development of pedagogical knowledge among learning assistants. Int J STEM Educ, 5 (1). https://doi.org/10.1186/s40594-017-0097-9 .
Verbeek, M. (2008). A guide to modern econometrics . West Sussex: Wiley.
Webb, D. C., Stade, E., & Grover, R. (2014). Rousing students’ minds in postsecondary mathematics: the undergraduate learning assistant model. J Math Educ Teach College, 5 (2).
White, J. S. S., Van Dusen, B., & Roualdes, E. A. (2016). The impacts of learning assistants on student learning of physics. arXiv preprint arXiv:1607.07469. Retrieved from https://arxiv.org/ftp/arxiv/papers/1607/1607.07469.pdf
Wilson, S. B., & Varma-Nelson, P. (2016). Small groups, significant impact: a review of peer-led team learning research with implications for STEM education researchers and faculty. J Chem Educ, 93 (10), 1686–1702.
Sewell, W. H. (1992). A theory of structure: Duality, agency, and transformation. American Journal of Sociology, 98(1), 1–29.
Acknowledgements
There is no funding for this study.
Availability of data and materials
The datasets generated and/or analyzed during the current study are available in the LAs and Subsequent Course Failure repository, https://github.com/jalzen/LAs-and-Subsequent-Course-Failure .
Author information
Authors and Affiliations
University of Colorado Boulder, 249 UCB, Boulder, CO, 80309, USA
Jessica L. Alzen, Laurie S. Langdon & Valerie K. Otero
Contributions
JLA managed the data collection and analysis. All authors participated in writing, revising, and approving the final manuscript.
Corresponding author
Correspondence to Jessica L. Alzen.
Ethics declarations
Ethics approval and consent to participate
The IRB at University of Colorado Boulder (FWA 00003492) determined that this study did not involve human subjects research. The approval letter specifically stated the following:
The IRB determined that the proposed activity is not research involving human subjects as defined by DHHS and/or FDA regulations. IRB review and approval by this organization is not required. This determination applies only to the activities described in the IRB submission and does not apply should any changes be made. If changes are made and there are questions about whether these activities are research involving human subjects in which the organization is engaged, please submit a new request to the IRB for a determination.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Alzen, J.L., Langdon, L.S. & Otero, V.K. A logistic regression investigation of the relationship between the Learning Assistant model and failure rates in introductory STEM courses. IJ STEM Ed 5, 56 (2018). https://doi.org/10.1186/s40594-018-0152-1
Received: 29 August 2018
Accepted: 10 December 2018
Published: 28 December 2018
DOI: https://doi.org/10.1186/s40594-018-0152-1
Keywords
- Learning assistant
- Underrepresented students