Hi authors,
Thanks for sharing this impressive work.
While reading the paper, I noticed a discrepancy between two reported scores for EHR-R1-72B on the Risk Prediction tasks.
- In Figure 1c, the score for EHR-R1-72B is reported as 0.9328.
- However, in Figure 5, the result for EHR-R1-72B is stated as 0.9523.
Could you clarify whether this is a reporting error in one of the figures, or whether the two figures use different evaluation settings that I have misunderstood?
Thanks!