Single-case experimental design studies are common in aphasia research. These studies (also referred to as single-subject experimental designs) focus on treatment response at the individual level and establish experimental control within each participant rather than relying on a control group. Single-case experimental design studies typically include 2-4 participants, but the methods employed in these studies are often extended to within-subject case-series designs with upwards of 30 participants (e.g., Gilmore et al., 2018), which can be used to test theories and explore individual differences in treatment response. While not a replacement for group-level clinical trials, single-case designs and their case-series extensions are a cost-effective method for establishing preliminary treatment efficacy in early-phase research.
Effect sizes are an essential measure of treatment efficacy in these studies, describing the magnitude of treatment response. They help establish the clinical relevance of a treatment and are often used in within-subject case-series design studies to explore underlying cognitive mechanisms. In both cases, effect size accuracy and precision are important.
These 6 effect sizes each have their own strengths and weaknesses. In the plot below, scatterplots visualize the relationships between effect sizes (lower triangle), and coefficients of determination (R²) describe the amount of shared variance between each pair of measures (upper triangle).
These relationships likely change depending on the treatment design. In this case, the measures were calculated from a large simulated dataset of 500 participants under an AB design with 5 baseline and 10 treatment sessions. For this dataset, we created a distribution of participant ability estimates and item difficulty estimates based on Fergadiotis et al. (2015). Items were assigned to participants such that baseline performance would average 30% correct regardless of participant ability. We then simulated item-level treatment response, where the degree of treatment response was randomly assigned to participants.
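A simplified base-R sketch of this kind of data-generating process is shown below. This is illustrative only: the number of items per session, the noise terms, and the range of treatment effects are assumptions, not the values used to build the dataset above.

```r
set.seed(1)

n_items     <- 20  # items probed per session (assumed)
n_baseline  <- 5
n_treatment <- 10

# Simulate one participant under a Rasch-style logistic model, with item
# difficulties chosen so baseline accuracy averages ~30% regardless of ability
simulate_participant <- function(ability, tx_effect) {
  difficulty <- ability - qlogis(0.30) + rnorm(n_items, 0, 0.5)
  phase <- rep(c(0, 1), c(n_baseline, n_treatment))
  do.call(rbind, lapply(seq_along(phase), function(s) {
    # probability correct: baseline logit plus a boost during treatment
    p <- plogis(ability - difficulty + tx_effect * phase[s])
    data.frame(session = s, phase = phase[s], item = seq_len(n_items),
               correct = rbinom(n_items, 1, p))
  }))
}

# degree of treatment response randomly assigned to the participant
dat <- simulate_participant(ability = rnorm(1), tx_effect = runif(1, 0, 3))
```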
(1) The relationship between SMD and the other measures is characterized by increasing dissimilarity as effect sizes increase. This is likely due to the influence of baseline variance in the denominator of SMD. In this sample, baseline variance accounts for 34% of the variance in SMD scores.
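To see why, recall that SMD scales the baseline-treatment mean difference by the baseline standard deviation, so two participants with identical phase means can receive very different scores. A minimal helper (session-level proportions assumed):

```r
# SMD: mean difference scaled by the baseline standard deviation
smd <- function(baseline, treatment) {
  (mean(treatment) - mean(baseline)) / sd(baseline)
}

# identical phase means, but a noisier baseline shrinks the effect size
smd(c(.30, .30, .32, .28, .30), rep(.60, 10))  # ~21
smd(c(.10, .50, .30, .45, .15), rep(.60, 10))  # ~1.7
```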
(2) The difference between NAP/Tau-U and the other effect size measures is clear: because both reach their ceiling once the baseline and treatment phases no longer overlap, they cannot capture differences in the magnitude of treatment effects beyond that point. However, they are highly similar to each other.
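The ceiling is easy to demonstrate. NAP is the proportion of all baseline-treatment pairs in which the treatment observation exceeds the baseline observation, with ties counted as half; a minimal sketch:

```r
# NAP: proportion of baseline-treatment pairs where treatment > baseline,
# with ties counted as 0.5
nap <- function(baseline, treatment) {
  d <- outer(treatment, baseline, "-")
  (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)
}

# both return 1 despite very different gains, because neither overlaps
nap(c(.28, .30, .32), c(.50, .55, .60))  # 1
nap(c(.28, .30, .32), c(.85, .90, .95))  # 1
```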
(3) PMG approximates the BMEM effect size, which is expected, since the BMEM effect size reflects absolute change between baseline and treatment when baseline performance is neutralized through stimulus selection.
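PMG expresses the observed gain as a proportion of the maximum gain possible given baseline performance. A one-line version, assuming accuracy is scored as a proportion with a ceiling of 1:

```r
# PMG: observed gain relative to the maximum possible gain
pmg <- function(baseline, treatment, max_score = 1) {
  (mean(treatment) - mean(baseline)) / (max_score - mean(baseline))
}

pmg(c(.30, .28, .32, .30, .30), rep(.65, 10))  # 0.5: half the possible gain
```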
(4) Differences between the GLMM and BMEM measures may be explained by the fact that the GLMM effect size implemented here does not take baseline trends into account. Furthermore, the GLMM effect size may not fully account for substantial level changes (immediate improvements at the start of the treatment phase).
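For intuition, an item-level Bayesian mixed-effects model of this kind might be specified in brms (Bürkner, 2018) roughly as follows. This is a sketch only, not the exact specification behind the BMEM effect size here, and it assumes a multi-participant data frame with columns `correct`, `phase`, `session_c` (centered session, to capture trends), `participant`, and `item` (all names are assumptions):

```r
library(brms)

# illustrative logistic mixed-effects model of item-level accuracy;
# `phase` is 0 = baseline / 1 = treatment
fit <- brm(
  correct ~ phase + session_c + (phase + session_c | participant) + (1 | item),
  family = bernoulli(),
  data   = dat,
  chains = 4, cores = 4
)

# population-level treatment (phase) effect in log-odds; an effect size
# could be summarized from this posterior
fixef(fit)["phase", ]
```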
Antonucci, S., & Gilmore, N. (2019). Do aphasia core outcome sets require core analysis sets: Where do we go from here in single subject design research? 49th Clinical Aphasiology Conference.
Beeson, P. M., & Robey, R. R. (2006). Evaluating single-subject treatment research: Lessons learned from the aphasia literature. Neuropsychology Review, 16(4), 161–169. https://doi.org/10.1007/s11065-006-9013-7
Bürkner, P. C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. https://doi.org/10.32614/rj-2018-017
Creet, E., Morris, J., Howard, D., & Nickels, L. (2019). Name it again! Investigating the effects of repeated naming attempts in aphasia. Aphasiology, 33(10), 1202–1226. https://doi.org/10.1080/02687038.2019.1622352
Evans, W. S., Cavanaugh, R., Quique, Y., Boss, E., Dickey, M. W., Doyle, P. J., Starns, J. J., & Hula, W. D. (2020). BEARS - Balancing Effort, Accuracy, and Response Speed in semantic feature verification anomia treatment. Abstract for Platform Presentation, Annual Clinical Aphasiology Conference (Conference Cancelled).
Goldfeld, K. (2019). simstudy: Simulation of Study Data (R package). https://cran.r-project.org/package=simstudy
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446. https://doi.org/10.1016/j.jml.2007.11.007
King, T. S., & Chinchilli, V. M. (2001). A generalized concordance correlation coefficient for continuous and categorical data. Statistics in Medicine, 20(14), 2131–2147. https://doi.org/10.1002/sim.845
Lambon Ralph, M. A., Snell, C., Fillingham, J. K., Conroy, P., & Sage, K. (2010). Predicting the outcome of anomia therapy for people with aphasia post CVA: both language and cognitive status are key predictors. Neuropsychological Rehabilitation, 20(2), 289–305. https://doi.org/10.1080/09602010903237875
Landis, J. R., & Koch, G. G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33(2), 363–374. https://doi.org/10.2307/2529786
Lee, J. B., & Cherney, L. R. (2018). Tau-U: A quantitative approach for analysis of single-case experimental data in aphasia. American Journal of Speech-Language Pathology, 27(1S), 495–503. https://doi.org/10.1044/2017_AJSLP-16-0197
Lin, L. I.-K. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45(1), 255–268. https://doi.org/10.2307/2532051
Manolov, R., & Solanas, A. (2008). Comparing N = 1 effect size indices in presence of autocorrelation. Behavior Modification, 32(6), 860–875. https://doi.org/10.1177/0145445508318866
Parker, R. I., & Vannest, K. (2009). An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy, 40(4), 357–367. https://doi.org/10.1016/j.beth.2008.10.006
Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-U. Behavior Therapy, 42(2), 284–299. https://doi.org/10.1016/j.beth.2010.08.006
R Core Team. (2020). R: A language and environment for statistical computing (4.0.2). R Foundation for Statistical Computing. https://www.r-project.org/
Wiley, R. W., & Rapp, B. (2018). Statistical analysis in small-N designs: Using linear mixed-effects modeling for evaluating intervention effectiveness. Aphasiology, 33(1), 1–30. https://doi.org/10.1080/02687038.2018.1454884
This work was inspired by the 2019 CAC roundtable led by Natalie Gilmore and Sharon Antonucci. Thanks to Natalie and Sam Harvey (La Trobe University) for their extremely helpful feedback on this vignette.
Did I goof somewhere? Do you have recommendations or questions? Contact me here:
2021 Robert Cavanaugh