
closes #71: Function EFA_POOLED() #72

Open

andreassoteriadesmoj wants to merge 4 commits into mdsteiner:master from moj-analytical-services:efa-pooled

Conversation

@andreassoteriadesmoj

EFA_POOLED() is similar to psych::fa.pooled() and can be used to perform EFA on multiple data imputations using the EFA() function. This PR creates EFA_POOLED() (55fcab5) and a few helper functions (2e022b8) that EFA_POOLED() uses behind the scenes.

EFA_POOLED() returns some of the standard objects that EFA() also returns, including communalities, unrotated and rotated pattern loadings, structure loadings and interfactor correlations (for oblique solutions), and fit indices. Note, however, that in the case of EFA_POOLED(), these results are based on the pooled estimates (see function documentation for details). In addition, EFA_POOLED() also returns confidence intervals for the pooled estimates, as well as the individual EFA() fits for each imputation.

@mdsteiner
Owner

Thanks very much! Just so you know what to expect: It will probably take a couple of weeks until I find the time to really go through it and check it, but I will get back to you.

@andreassoteriadesmoj
Author

> Thanks very much! Just so you know what to expect: It will probably take a couple of weeks until I find the time to really go through it and check it, but I will get back to you.

That's fine, thanks for letting me know!

@mdsteiner
Owner

Quick question: In psych::fa.pooled the CI is calculated differently (i.e., without dividing by sqrt(m), see https://github.com/cran/psych/blob/6c1403fe9c2911552e6bdbabae4730679b3ffea6/R/fa.pooled.r#L59). So the psych intervals will be broader than the CIs from EFA_POOLED. Do you have a rationale for doing it differently?
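To make the difference concrete, here is an illustration with made-up loading estimates (not the package's actual code): psych scales the half-width by sd(x), while the SE-based approach scales it by sd(x)/sqrt(m), so the psych interval is exactly sqrt(m) times wider.

```r
# Illustration of the two CI constructions under discussion
# (hypothetical loading estimates across m = 5 imputations).
loadings <- c(0.62, 0.58, 0.65, 0.60, 0.63)
m <- length(loadings)
z <- qnorm(1 - 0.05 / 2)

half_psych <- z * sd(loadings)            # SD-based, as in psych::fa.pooled
half_se    <- z * sd(loadings) / sqrt(m)  # SE-based (divide by sqrt(m))

half_psych / half_se
# [1] 2.236068  (= sqrt(5), regardless of the SD)
```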

@andreassoteriadesmoj
Author

> Quick question: In psych::fa.pooled the CI is calculated differently (i.e., without dividing by sqrt(m), see https://github.com/cran/psych/blob/6c1403fe9c2911552e6bdbabae4730679b3ffea6/R/fa.pooled.r#L59). So the psych intervals will be broader than the CIs from EFA_POOLED. Do you have a rationale for doing it differently?

It's not clear to me why psych uses the SD instead of the SE. Function .calc_cis() divides the SD by the square root of the number of imputations to get the SE, which is the definition of CIs that I'm familiar with from the literature.

@mdsteiner
Owner

If we divide by sqrt(m), the SE becomes smaller, so the assumption would be that we gain precision by adding more and more imputations. I'm not sure that's correct. In Hayes and Enders (2022), eq. 11, they use a more complicated expression for deriving the SEs, but I have no expertise at all in this area. A quick chat with ChatGPT (which by no means has to be correct, but that's what I could do in the time given) tells me that probably only the implementation in Hayes and Enders (2022) is correct. Briefly, it states that

  • Dividing by sqrt(m) provides the standard error of the mean of the imputation-specific loadings, but that is not the goal of MI; rather, we want the sampling uncertainty of the loading estimate.
  • The psych version provides an estimate of how much solutions vary across imputations, and thus a typical range of plausible loadings across imputations, but again not the sampling uncertainty of the loading estimate.

Hayes, T., & Enders, C. K. (2022). Maximum likelihood and multiple imputation missing data handling: How they work, and how to make them work in practice. In H. Cooper (Ed.), APA Handbook of research methods in psychology (2nd ed., pp. 27-51). Washington, DC: American Psychological Association.
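For reference, the standard Rubin's-rules pooled standard error (which, to my understanding, is the kind of expression Hayes and Enders build on; I haven't verified their eq. 11) combines the within-imputation and between-imputation variance. A minimal R sketch with made-up loading estimates and hypothetical per-imputation SEs:

```r
# Rubin's rules for pooling one estimate across m imputations
# (standard MI pooling; illustrative numbers, not the PR's actual code).
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)               # pooled point estimate
  w    <- mean(se^2)              # within-imputation variance
  b    <- var(est)                # between-imputation variance
  t    <- w + (1 + 1 / m) * b    # total variance
  c(estimate = qbar, se = sqrt(t))
}

# Five imputation-specific loadings with hypothetical SEs
est <- c(0.62, 0.58, 0.65, 0.60, 0.63)
se  <- c(0.05, 0.05, 0.06, 0.05, 0.05)
pool_rubin(est, se)
```

Note that sqrt(t) does not shrink toward zero as m grows, because the within-imputation variance w stays; that is the key difference from sd(est)/sqrt(m).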

@andreassoteriadesmoj
Author

andreassoteriadesmoj commented Feb 11, 2026

I don't have access to Hayes and Enders (2022), so I can't see the equation. I have requested the chapter via interlibrary loan, which can take a while. Meanwhile, I'll try to answer your comments without having seen eq. 11.

> If we divide by sqrt(m), the SE becomes smaller, so the assumption would be that we gain precision by adding more and more imputations. I'm not sure that's correct.

I see what you mean. However, the SE is a function of both the SD and m; the SD isn't constant. Of course, you would (hopefully) expect your multiple imputations not to produce model parameter estimates that are all over the place, so the SD shouldn't change much as you add more imputations. In theory, then, you could keep increasing the number of imputations to minimize the SE, which is the very process that you rightfully have doubts about. But maybe that's exactly the point: the more imputations you have, the more confident you are that your pool of estimated parameters is consistent.

Meanwhile, the answer from ChatGPT doesn't seem very clear to me. If we want the sampling uncertainty of the loading estimate, don't we need the SE of the mean of the imputation-specific loadings to measure that uncertainty?

I wonder if it would be more pragmatic to use qt() instead of qnorm() in the construction of the CIs, given that the t distribution resembles the normal distribution, although its precise shape depends on the sample size. You'd still use the SE in the construction of the CIs; however, the critical value wouldn't be the z-value but the critical value of the t distribution, which depends on m. E.g.:

```r
x <- 1 - 0.05 / 2

qnorm(x)
# [1] 1.959964

# 10^8 "imputations" (if that's ever possible!)
qt(x, 10^8)
# [1] 1.959964

# Five imputations, so 5 - 1 degrees of freedom
qt(x, 5 - 1)
# [1] 2.776445
```
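Putting the two pieces together, a t-based CI around the pooled mean would look something like this (again an illustrative sketch with made-up numbers, not the function's actual code):

```r
# Sketch of a t-based CI for a pooled loading (hypothetical estimates)
loadings <- c(0.62, 0.58, 0.65, 0.60, 0.63)
m    <- length(loadings)
se   <- sd(loadings) / sqrt(m)          # SE of the mean across imputations
crit <- qt(1 - 0.05 / 2, df = m - 1)    # t critical value, df = m - 1
mean(loadings) + c(lower = -1, upper = 1) * crit * se
```

With few imputations the t critical value widens the interval noticeably relative to qnorm(), which partly addresses the concern about overly narrow CIs.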

Please note that I will be off in a couple of days, and I will pick this up as soon as I'm back.
