First of all, thank you for open-sourcing such an insightful piece of work! After reading your paper, I have a few questions I hope you can help clarify.
1. Perturbed States and Recovery
On page 5, you mention:
"Perturbed states for one variable (e.g., C) could be recovered by incorporating information from uncorrupted portions of the other variables."
From my understanding of your method, C, F, and E are diffused independently, as stated in the paper, and each has its own Q. Specifically, for C, its corresponding $Q^C$ should be of dimension $\mathbb{R}^{(K_c+2) \times (K_c+2)}$, and the forward diffusion process is simply $Q^C \times C$. Therefore, the states of F and E should not influence the forward diffusion of C.
Thus, I assume that you might be referring to the backward diffusion process, where the denoising graph transformer could infer the value of one variable (e.g., C) based on the uncorrupted portions of the other variables (e.g., F and E). Am I correct in this interpretation?
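To make my reading concrete, here is a minimal sketch of how I understand the independent forward process (all names and values are illustrative, not from your code): each categorical variable is perturbed by its own row-stochastic transition matrix, so corrupting C never touches F or E.

```python
# Sketch of my understanding of the forward process: each categorical
# variable diffuses independently under its own transition matrix Q.
# (Hypothetical names/values; a uniform-noise kernel is used for illustration.)
import numpy as np

def forward_step(x_onehot, Q):
    """One forward diffusion step for a one-hot categorical state.

    x_onehot: (num_nodes, K) one-hot rows
    Q: (K, K) row-stochastic transition matrix
    Returns the categorical distribution over perturbed states.
    """
    return x_onehot @ Q  # F and E play no role here

rng = np.random.default_rng(0)
K_c = 4                                         # illustrative; paper uses K_c + 2
C = np.eye(K_c)[rng.integers(0, K_c, size=3)]   # 3 nodes, one-hot
alpha = 0.9                                     # keep-probability
Q_C = alpha * np.eye(K_c) + (1 - alpha) / K_c   # uniform-noise kernel
probs = forward_step(C, Q_C)
assert np.allclose(probs.sum(axis=1), 1.0)      # rows stay stochastic
```

If this matches your setup, then cross-variable information can only enter through the denoiser in the reverse process, which is the interpretation I am asking you to confirm.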
2. Decoding Using the Graph Prior
When decoding with the graph prior, from what I understand in your paper, you append Gaussian noise (with dimension $\sum_{x \in \{ t, s, r \} }{dim(x)}$) to the node embedding and denoise these attributes. However, the node embedding in this case is just the graph prior and contains three categorical variables (C, F, E), which are inherently discrete variables. Mixing these discrete variables with continuous Gaussian noise seems problematic.
How do you handle this mismatch? Do you include an embedding layer to transform the discrete variables (C, F, E) into continuous variables, or do you simply ignore the mismatch and let the model learn this directly?
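To illustrate the first option I have in mind (an embedding layer), here is a small sketch; the table sizes, embedding width, and attribute dims are all hypothetical placeholders, not values from your paper:

```python
# Sketch of the embedding-layer option I asked about: map each discrete
# variable (C, F, E) to a continuous vector via a lookup table, then append
# Gaussian noise of size sum(dim(x)) for the continuous attributes (t, s, r).
# (All sizes below are illustrative assumptions.)
import numpy as np

rng = np.random.default_rng(0)
K_c, K_f, K_e = 5, 4, 3          # category counts (hypothetical)
d_embed = 8                      # embedding width (hypothetical)
dims = {"t": 2, "s": 1, "r": 3}  # dims of the continuous attributes (hypothetical)

# one embedding table per discrete variable
tables = {k: rng.normal(size=(K, d_embed))
          for k, K in [("C", K_c), ("F", K_f), ("E", K_e)]}

def node_input(c, f, e):
    """Continuous node embedding with appended Gaussian noise of size sum(dim(x))."""
    emb = np.concatenate([tables["C"][c], tables["F"][f], tables["E"][e]])
    noise = rng.normal(size=sum(dims.values()))
    return np.concatenate([emb, noise])

x = node_input(c=0, f=1, e=2)
assert x.shape == (3 * d_embed + sum(dims.values()),)
```

If instead you feed the raw one-hot (or integer) categories alongside the Gaussian noise and let the transformer learn the mismatch, I would appreciate knowing that too.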
Looking forward to your explanation! It would greatly help me understand the nuances of your approach.