OpenAI
San Francisco, CA

About the position

On the Accelerators team, you will help OpenAI evaluate and bring up new compute platforms that can support large-scale AI training and inference. Your work will range from prototyping system software on new accelerators to enabling performance optimizations across our AI workloads. You'll work across the stack, collaborating with both hardware and software teams on kernels, sharding strategies, scaling across distributed systems, and performance modeling. You'll help adapt OpenAI's software stack to non-traditional hardware and drive efficiency improvements in core AI workloads. This is not a compiler-focused role; rather, it bridges ML algorithms and system performance, especially at scale.
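
For a flavor of the sharding work described above, here is a minimal sketch of a row-sharded (tensor-parallel) linear layer built on PyTorch's distributed collectives. It is illustrative only, not OpenAI's stack: the function name, the shapes, and the assumption that a process group is already initialized (e.g., via torchrun) are all hypothetical.

```python
# Minimal tensor-parallel (row-sharded) matmul sketch using PyTorch
# collectives. Assumes torch.distributed is already initialized
# (e.g., via torchrun); names and shapes are illustrative only.
import torch
import torch.distributed as dist

def row_sharded_linear(x: torch.Tensor, w_shard: torch.Tensor) -> torch.Tensor:
    """Each rank holds a slice of the weight's input dimension.

    x:       (batch, in_features // world_size)          local input slice
    w_shard: (in_features // world_size, out_features)   local weight slice
    """
    # Local partial product on this rank's shard.
    partial = x @ w_shard
    # Sum the partial products across ranks to recover the full matmul.
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)
    return partial
```

The design choice here, sharding the contraction dimension, trades one all-reduce per layer for weights that fit in each accelerator's memory; column sharding instead defers communication to an all-gather on the output.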

Responsibilities

  • Prototype and enable OpenAI's AI software stack on new, exploratory accelerator platforms.
  • Optimize large-scale model performance (LLMs, recommender systems, distributed AI workloads) for diverse hardware environments.
  • Develop kernels, sharding mechanisms, and system scaling strategies tailored to emerging accelerators.
  • Collaborate on optimizations at the model code level (e.g., PyTorch) and below to enhance performance on non-traditional hardware.
  • Perform system-level performance modeling, debug bottlenecks, and drive end-to-end optimization.
  • Work with hardware teams and vendors to evaluate alternatives to existing platforms and adapt the software stack to their architectures.
  • Contribute to runtime improvements, compute/communication overlap, and scaling efforts for frontier AI workloads (a toy sketch of overlap follows this list).
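
The compute/communication overlap mentioned in the last bullet can be illustrated with a toy example (again illustrative, not OpenAI's runtime): PyTorch's async collectives return a handle, so independent compute can run while the reduction is in flight.

```python
# Toy sketch of overlapping a gradient all-reduce with independent compute
# using PyTorch's async collectives. Assumes a process group is initialized;
# the function and tensor names are hypothetical.
import torch
import torch.distributed as dist

def overlapped_step(grad: torch.Tensor, activations: torch.Tensor) -> torch.Tensor:
    # Launch the all-reduce without blocking; communication proceeds
    # while the accelerator stays busy with unrelated work.
    handle = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)

    # Independent compute that does not depend on the reduced gradient.
    out = torch.relu(activations @ activations.T)

    # Block only at the point where the reduced gradient is actually needed.
    handle.wait()
    return out
```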

Requirements

  • 3+ years of experience working on AI infrastructure, including kernels, systems, or hardware-software co-design.
  • Hands-on experience with accelerator platforms for AI at data center scale (e.g., TPUs, custom silicon, exploratory architectures).
  • Strong understanding of kernels, sharding, runtime systems, or distributed scaling techniques.
  • Familiarity with optimizing LLMs, CNNs, or recommender models for hardware efficiency.
  • Experience with performance modeling, system debugging, and software stack adaptation for novel architectures (a back-of-envelope roofline sketch follows this list).
  • Exposure to mobile accelerators is welcome, but experience enabling data center-scale AI hardware is preferred.
  • Ability to operate across multiple levels of the stack, rapidly prototype solutions, and navigate ambiguity in early hardware bring-up phases.
  • Interest in shaping the future of AI compute through exploration of alternatives to mainstream accelerators.
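
For the performance-modeling requirement above, a common first step is a roofline-style back-of-envelope: compare a kernel's arithmetic intensity against the machine's ridge point. The sketch below uses hypothetical hardware numbers, not any vendor's specs.

```python
# Roofline back-of-envelope: is a GEMM compute-bound or memory-bound?
# All hardware numbers below are hypothetical placeholders.

def gemm_arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * m * n * k                                # multiply-accumulates
    traffic = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C
    return flops / traffic                               # FLOPs per byte moved

peak_flops = 400e12      # hypothetical: 400 TFLOP/s peak compute
hbm_bw = 1.5e12          # hypothetical: 1.5 TB/s memory bandwidth
ridge = peak_flops / hbm_bw                              # ~267 FLOPs/byte

ai = gemm_arithmetic_intensity(4096, 4096, 4096)         # ~1365 FLOPs/byte
print(f"intensity={ai:.0f} FLOPs/B, ridge={ridge:.0f} -> "
      f"{'compute-bound' if ai > ridge else 'memory-bound'}")
```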