As a Software Engineer within the Autopilot AI Infrastructure team, you will work on reinforcing, optimizing, and scaling our infrastructure components supporting AI research activities for Autopilot and the Tesla Bot. At the core of our autonomy capabilities are neural networks that the research team is designing to train on very large amounts of data, across large-scale GPU clusters and our supercomputer Dojo. Robustly training these models at scale and in the shortest amount of time is critical to our mission. We are optimizing the communication collectives used in AI training and inference workloads to ensure they are robust and performant while improving observability.