Hello! I’m Thanawat (or Sky). I’m a PhD student at The University of Tokyo under the supervision of Prof. Takashi Ishida and Prof. Masashi Sugiyama.
I’m currently interested in reliable evaluation, robust learning, and understanding reward hacking in language models.