AWS Neuron is the complete software stack for the AWS Inferentia and Trainium cloud-scale machine learning accelerators and the servers that use them. This role is for a software engineer on the Machine Learning Inference Model Enablement and Generality team for AWS Neuron at Annapurna Labs. The role is responsible for the development, enablement, and performance tuning of a wide variety of model families, including massive-scale large language models such as the Llama family and DeepSeek, as well as Stable Diffusion, vision transformers, and many more. The Inference Model Enablement and Generality team works side by side with compiler and runtime engineers to create, build, and tune distributed inference solutions on Trainium and Inferentia. Experience optimizing LLM inference performance for both latency and throughput is highly desired. Experience with distributed inference libraries such as vLLM is a bonus.