
closes #71: Function EFA_POOLED() #72

Open

andreassoteriadesmoj wants to merge 4 commits into mdsteiner:master from moj-analytical-services:efa-pooled

Conversation

@andreassoteriadesmoj

EFA_POOLED() is similar to psych::fa.pooled() and can be used to perform EFA on multiple data imputations using the EFA() function. This PR creates EFA_POOLED() (55fcab5) and a few helper functions (2e022b8) that EFA_POOLED() uses behind the scenes.

EFA_POOLED() returns some of the standard objects that EFA() also returns, including communalities, unrotated and rotated pattern loadings, structure loadings and interfactor correlations (for oblique solutions), and fit indices. Note, however, that in the case of EFA_POOLED(), these results are based on the pooled estimates (see function documentation for details). In addition, EFA_POOLED() also returns confidence intervals for the pooled estimates, as well as the individual EFA() fits for each imputation.

@mdsteiner
Owner

Thanks very much! Just so you know what to expect: It will probably take a couple of weeks until I find the time to really go through it and check it, but I will get back to you.

@andreassoteriadesmoj
Author

> Thanks very much! Just so you know what to expect: It will probably take a couple of weeks until I find the time to really go through it and check it, but I will get back to you.

That's fine, thanks for letting me know!

@mdsteiner
Owner

Quick question: In psych::fa.pooled the CI is calculated differently (i.e., without dividing by sqrt(m), see https://github.com/cran/psych/blob/6c1403fe9c2911552e6bdbabae4730679b3ffea6/R/fa.pooled.r#L59). So the psych intervals will be broader than the CIs from EFA_POOLED. Do you have a rationale for doing it differently?
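To make the difference concrete, here is an illustration with made-up loading estimates (not the package's actual code): psych scales the half-width by sd(x), while the SE-based approach scales it by sd(x)/sqrt(m), so the psych interval is exactly sqrt(m) times wider.

```r
# Illustration of the two CI constructions under discussion
# (hypothetical loading estimates across m = 5 imputations).
loadings <- c(0.62, 0.58, 0.65, 0.60, 0.63)
m <- length(loadings)
z <- qnorm(1 - 0.05 / 2)

half_psych <- z * sd(loadings)            # SD-based, as in psych::fa.pooled
half_se    <- z * sd(loadings) / sqrt(m)  # SE-based (divide by sqrt(m))

half_psych / half_se
# [1] 2.236068  (= sqrt(5), regardless of the SD)
```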

@andreassoteriadesmoj
Author

> Quick question: In psych::fa.pooled the CI is calculated differently (i.e., without dividing by sqrt(m), see https://github.com/cran/psych/blob/6c1403fe9c2911552e6bdbabae4730679b3ffea6/R/fa.pooled.r#L59). So the psych intervals will be broader than the CIs from EFA_POOLED. Do you have a rationale for doing it differently?

It's not clear to me why psych uses the SD instead of the SE. Function .calc_cis() divides the SD by the square root of the number of imputations to get the SE, which is the definition of CIs that I'm familiar with from the literature.

@mdsteiner
Owner

If we divide by sqrt(m), the SE becomes smaller, so the assumption would be that we gain precision by adding more and more imputations. I'm not sure that's correct. In Hayes and Enders (2022), eq. 11, they use a more complicated expression for deriving the SEs, but I have no expertise at all in this area. A quick chat with ChatGPT (which by no means has to be correct, but that's what I could do in the time given) tells me that probably only the implementation in Hayes and Enders (2022) is correct. Briefly, it states that

  • Dividing by sqrt(m) provides the standard error of the mean of the imputation-specific loadings, but that is not the goal of MI; rather, we want the sampling uncertainty of the loading estimate.
  • The psych version provides an estimate of how much solutions vary across imputations, and thus a typical range of plausible loadings across imputations, but again not the sampling uncertainty of the loading estimate.

Hayes, T., & Enders, C. K. (2022). Maximum likelihood and multiple imputation missing data handling: How they work, and how to make them work in practice. In H. Cooper (Ed.), APA Handbook of research methods in psychology (2nd ed., pp. 27-51). Washington, DC: American Psychological Association.
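For reference, the standard Rubin's-rules pooled standard error (which, to my understanding, is the kind of expression Hayes and Enders build on; I haven't verified their eq. 11) combines the within-imputation and between-imputation variance. A minimal R sketch with made-up loading estimates and hypothetical per-imputation SEs:

```r
# Rubin's rules for pooling one estimate across m imputations
# (standard MI pooling; illustrative numbers, not the PR's actual code).
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)               # pooled point estimate
  w    <- mean(se^2)              # within-imputation variance
  b    <- var(est)                # between-imputation variance
  t    <- w + (1 + 1 / m) * b    # total variance
  c(estimate = qbar, se = sqrt(t))
}

# Five imputation-specific loadings with hypothetical SEs
est <- c(0.62, 0.58, 0.65, 0.60, 0.63)
se  <- c(0.05, 0.05, 0.06, 0.05, 0.05)
pool_rubin(est, se)
```

Note that sqrt(t) does not shrink toward zero as m grows, because the within-imputation variance w stays; that is the key difference from sd(est)/sqrt(m).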

@andreassoteriadesmoj
Author

andreassoteriadesmoj commented Feb 11, 2026

I don't have access to Hayes and Enders (2022), so I can't see the equation. I have requested the chapter via interlibrary loan, which can take a while. Meanwhile, I'll try to answer your comments without having seen eq. 11.

> If we divide by sqrt(m), the SE becomes smaller, so the assumption would be that we gain precision by adding more and more imputations. I'm not sure that's correct.

I see what you mean. However, the SE is a function of both the SD and m; the SD isn't constant. Of course, you would (hopefully) expect your multiple imputations not to produce model parameter estimates that are all over the place, so the SD shouldn't change much as you add more imputations. In theory, then, you could keep increasing the number of imputations to minimize the SE, which is the very process that you rightfully have doubts about. But maybe that's exactly the point: the more imputations you have, the more confident you are that your pool of estimated parameters is consistent.

Meanwhile, the answer from ChatGPT doesn't seem very clear to me. If we want the sampling uncertainty of the loading estimate, don't we need the SE of the mean of the imputation-specific loadings to measure that uncertainty?

I wonder if it would be more pragmatic to use qt() instead of qnorm() in the construction of the CIs, given that the t distribution resembles the normal distribution, although its precise shape depends on the sample size. You'd still use the SE in the construction of the CIs; however, the critical value wouldn't be the z-value but the critical value of the t distribution, which depends on m. E.g.:

```r
x <- 1 - 0.05 / 2

qnorm(x)
# [1] 1.959964

# 10^8 "imputations" (if that's ever possible!)
qt(x, 10^8)
# [1] 1.959964

# Five imputations, so 5 - 1 degrees of freedom
qt(x, 5 - 1)
# [1] 2.776445
```
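Putting the two pieces together, a t-based CI around the pooled mean would look something like this (again an illustrative sketch with made-up numbers, not the function's actual code):

```r
# Sketch of a t-based CI for a pooled loading (hypothetical estimates)
loadings <- c(0.62, 0.58, 0.65, 0.60, 0.63)
m    <- length(loadings)
se   <- sd(loadings) / sqrt(m)          # SE of the mean across imputations
crit <- qt(1 - 0.05 / 2, df = m - 1)    # t critical value, df = m - 1
mean(loadings) + c(lower = -1, upper = 1) * crit * se
```

With few imputations the t critical value widens the interval noticeably relative to qnorm(), which partly addresses the concern about overly narrow CIs.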

Please note that I will be off in a couple of days, and I will pick this up as soon as I'm back.
