A Summer of Research
I spent most of this summer as a researcher, working on AI for medical imaging. The specific project involved simulating breathing cycles from static CT images using diffusion models, but I feel that the most important insights from the project are broader than that. For that reason, I’m sharing them here.
There are no adults in the room
In life and in education, most of us know there’s someone to fall back on. There’s always someone who knows what to do, whether that’s a CEO, a politician, a doctor, or a scientist.
In research, this quickly ceases to be true. Although we had a great supervisor, he couldn’t solve our problems for us. No one could; the project was too specialised. This made the success or failure of the project incredibly uncertain all the way up to the very end. A couple of days before the abstract submission deadline, we hardly had any results at all. We were stressing, looking for flaws in the logic, scared that we’d made some major error at the beginning of the project. Frantically, we tried tweaking this or that parameter, getting nowhere. Things looked bleak, and we felt like we’d wasted a lot of time. On the day of the deadline, just as we were gearing up for a long day and night of debugging, the submission deadline was extended by two weeks. This was a godsend to us, and we could work on the issues at hand less hurriedly. Within a few days, we found two issues in our code and models1 that made all the difference.
In hindsight, I think we were running too close to the edge before the deadline extension. Being a day or two away from a deadline with no real results seems very likely to induce a state of “let’s just try changing this and wait to see if it works”. It’s hard to sit down, analyse the data for a few hours, sit some more and ruminate for a while, and then calmly write down a hypothesis for how to solve the issue when you only have 24 hours in total. The two-week extension struck a nice balance between urgency and calm. We were able to think clearly, inspect what was really going on and form high-quality hypotheses before spending masses of time testing them. At the same time, a deadline was still looming, and that was good for our productivity. The key conclusion from this experience: always have a good reason, backed by data where possible, to believe that your proposed solution will actually solve the underlying problem.
Making research work feels like magic
Since there are no clear answers, you’re left doubting the project’s validity for a long time. Sure, you might have thought the design of your model architecture was great when you first started, but now you’re beginning to wonder: did we waste our time by setting up the problem incorrectly? Is it impossible to solve the problem with this method? Is the dataset too noisy? Again, no one can answer that for you, because no one has done the same thing before. If they had, it wouldn’t be novel research. Despite all this doubt, you keep moving forward for weeks and incrementally get some partial results that at least stop doubt from making you give up for the time being. When this march leads to success in the end, it feels like magic. “Wait, our ideas actually worked?” was the major thought in my mind. The result is an incredible sense of pride and ownership in the work.
Collaboration is invigorating
Throughout the project, I felt very happy and thankful to have my collaborators with me. They pushed me to work harder and come up with new ideas. Working together on a hard technical problem, reading papers, discussing detailed issues, drawing up equations and figures on the whiteboard, and telling each other about new insights was incredibly fun, and I’m sure we’ll do it again soon.
I’d like to thank Edvard and Hugo for working with me through the ambiguous nature of research and finding all kinds of clever solutions to problems, as well as Mats P. for being a great supervisor.
Note to self: read more papers
A more practical insight from this project is that I should spend more time reading the literature. Most of our major errors could have been avoided if we’d had a firmer grasp of the existing papers in the field. In particular, understanding the nitty-gritty details of the most seminal papers is a huge advantage and seems well worth the time invested. I found so many insights just from reading appendices. In future work, I’ll strive to spend more time reading during the initial phases of a research project.
Appendix: details on technical issues encountered
Similarly to the original paper on latent diffusion models2, we made use of a VAE trained with KL-divergence to regularise the latent space. Contrary to their findings, a regularisation weight of $w_{\mathrm{KL}} = 10^{-6}$ pushed our VAE towards outputting values very near zero. We initially experimented with KL weights two to three orders of magnitude smaller, and according to our histograms this made the latent space look very close to a Gaussian distribution. Unfortunately, the diffusion model was very inconsistent at producing samples with such low KL weights, as the VAE learned to store a few (~10-100 out of 450,000) values very far out in the tails, at roughly $\pm 20$ for an $N(0,1)$ distribution. When these values were clipped, reconstruction quality instantly became terrible. Our current best VAE ended up using a KL schedule: 80 epochs with no KL-divergence loss, followed by a linear warmup to $w_{\mathrm{KL}} = 10^{-7}$ over 60 epochs. This setup regularised the latent space without producing extreme values in the tails. We hypothesise that the difference in KL regularisation behaviour arises because the vector fields we trained the VAE to reconstruct are much more information-sparse than images.
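For concreteness, a minimal sketch of that KL-weight schedule might look like the following. The helper name and signature are illustrative, not our actual training code; only the epoch counts and target weight come from the setup described above.

```python
# Hypothetical helper (names and defaults are illustrative, not our actual
# training code): no KL loss for the first 80 epochs, then a linear warmup
# to w_KL = 1e-7 over the following 60 epochs.
def kl_weight(epoch: int,
              free_epochs: int = 80,
              warmup_epochs: int = 60,
              target_weight: float = 1e-7) -> float:
    """Return the KL-divergence loss weight for the given epoch."""
    if epoch < free_epochs:
        return 0.0
    progress = min((epoch - free_epochs) / warmup_epochs, 1.0)
    return progress * target_weight

# The total VAE loss would then be something like:
#   loss = reconstruction_loss + kl_weight(epoch) * kl_divergence
```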
The other issue is a major reason for my note to self to read more papers. After a long time debugging our diffusion model, we found an issue in our noise scheduling. We used cosine scheduling3, but had made a crucial error. We had read in the DDPM paper4 that they set $\beta_1 = 10^{-4}, \beta_T = 0.02$ and thought we should implement that, so we ended up clamping our $\beta$-values to $0.02$. This is a huge problem, because those values belong to the DDPM paper’s linear noise schedule: the cosine schedule’s $\beta_t$ values grow far beyond $0.02$ towards the end of the forward process (the cosine paper clips them at $0.999$), so our clamp flattened the schedule and the forward process never got close to pure noise. Resolving this issue made a huge difference to output quality.
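For reference, here is roughly what the corrected cosine schedule looks like, alongside the mistaken clamp. This is an illustrative sketch based on the formula in the cosine-schedule paper, not our exact implementation:

```python
import math

def cosine_betas(T: int = 1000, s: float = 0.008, max_beta: float = 0.999):
    """Cosine noise schedule (Nichol & Dhariwal, 2021):
    alpha_bar(t) = cos(((t/T + s) / (1 + s)) * pi/2)^2,
    beta_t = 1 - alpha_bar(t) / alpha_bar(t - 1), clipped at max_beta."""
    def alpha_bar(t: float) -> float:
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    return [min(1 - alpha_bar(t) / alpha_bar(t - 1), max_beta)
            for t in range(1, T + 1)]

# The correct schedule lets beta_t grow towards ~1 near t = T ...
correct_betas = cosine_betas()
# ... whereas our bug effectively did this, re-using the *linear*
# schedule's endpoint of 0.02 as a clamp and flattening the late steps:
buggy_betas = [min(b, 0.02) for b in cosine_betas()]
```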
See appendix for a description of the technical issues we encountered. ↩︎
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. CVPR 2022. arXiv:2112.10752 ↩︎
A. Nichol, P. Dhariwal. Improved Denoising Diffusion Probabilistic Models. ICML 2021. arXiv:2102.09672 ↩︎
J. Ho, A. Jain, P. Abbeel. Denoising Diffusion Probabilistic Models. NeurIPS 2020. arXiv:2006.11239 ↩︎