Datadog · Posted 17 days ago
Full-time • Senior
New York, NY
Publishing Industries

About the position

We're looking for a Staff Software Engineer with deep expertise in ML infrastructure and evaluation to join Datadog's AI Platforms team. This team builds the foundation for AI and ML development across the company, enabling teams to prototype, evaluate, deploy, and monitor models at scale. From model experimentation to evaluation and inference, we're designing the core building blocks that make AI reliable, observable, and performant across all Datadog products.

In this role, you'll serve as a technical leader within the AI Platforms organization, with a particular focus on model evaluation. You'll collaborate with AI researchers, platform engineers, and product teams to build scalable systems for evaluating models and agents. We're looking for a systems-minded engineer who understands ML fundamentals and thrives in fast-paced, high-scale environments.

At Datadog, we value our office culture: the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace so that our Datadogs can create the work-life harmony that best fits them.

Responsibilities

  • Act as a technical leader on the AI Platforms team, focused on building robust ML infrastructure and evaluation systems.
  • Design and implement scalable, reproducible, and easy-to-use evaluation frameworks that will power the AI platform.
  • Build backend systems to power large-scale model experimentation, leveraging Ray.io and Datadog's in-house tooling.
  • Develop tooling and infrastructure to support dataset versioning, model evaluation, and test set management across multiple use cases.
  • Collaborate closely with AI researchers and application teams to enable rapid iteration and rigorous evaluation.
  • Guide technical direction across multiple projects, ensuring scalability, reliability, and long-term maintainability.
  • Mentor other engineers, contribute to design reviews, and help shape the culture of the AI Platforms team.

Requirements

  • You have a BS/MS/PhD in Computer Science or a related field, or equivalent experience.
  • 10+ years of relevant engineering experience, including backend systems and platform-level infrastructure.
  • Deep experience building ML infrastructure or ML platforms that support training, evaluation, and deployment at scale.
  • Strong understanding of machine learning principles and familiarity with model evaluation workflows and challenges.
  • Proven ability to drive cross-functional initiatives and operate in high-ambiguity environments.
  • Experience building and operating production-grade systems on modern cloud infrastructure (e.g., Kubernetes, GCP, AWS).
  • You're product-minded, collaborative, and thrive in fast-paced environments.

Benefits

  • Build tools for software engineers like yourself, and use those same tools to accelerate our own development.
  • Have significant influence on product direction and a direct impact on the business.
  • Work with skilled, knowledgeable, and kind teammates who are happy to teach and learn.
  • Competitive global benefits.
  • Continuous professional development.