Rivian, posted 17 days ago
$129,300 - $161,600/Yr
Full-time • Mid Level
Palo Alto, CA
Transportation Equipment Manufacturing

About the position

Rivian is on a mission to keep the world adventurous forever. This goes for the emissions-free Electric Adventure Vehicles we build, and the curious, courageous souls we seek to attract. As a company, we constantly challenge what's possible, never simply accepting what has always been done. We reframe old problems, seek new solutions, and operate comfortably in areas that are unknown. Our backgrounds are diverse, but our team shares a love of the outdoors and a desire to protect it for future generations.

The Autonomy org at Rivian is seeking a Software Engineer, MLOps to join the Data team, someone who can provide expertise in cloud and data engineering and collaborate with technical and business users. The candidate needs a strong understanding of the AWS Cloud Platform and the MLOps processes that help build, test, and release complex, mission-critical infrastructure services for Rivian's ADAS team on AWS.

In this role you will work with the Autonomy Cloud, Data, Perception, Simulation, Vehicle Integrations, and Vehicle Cloud teams, Product Management, and other Technology Partners to apply best practices and reference architectures for the AWS Cloud Platform and Data/Dev/ML Ops.

Responsibilities

  • Build, test and release complex mission-critical ML infrastructure services for Rivian's Autonomy team on cloud and/or on-prem.
  • Set up fault-tolerant, multi-region environments for ML/data operations.
  • Own the CI/CD pipeline for apps, data, and machine learning projects.
  • Define on-call strategy and participate in on-call rotations.
  • Make developers' lives smooth via automated workflows.
  • Build and optimize highly reliable, scalable, and distributed infra using microservice architecture.
  • Collaborate with the security & privacy team to perform audits and mitigate any findings.
  • Collaborate with cross-functional ADAS teams for development and integrations.
  • Optimize AWS costs across multiple accounts and services.

Requirements

  • 2+ Yrs. of experience in software engineering or in an ML/Dev/Data Ops role.
  • 2+ Yrs. of experience authoring, scaling, and managing production infrastructure, especially related to ML/training.
  • 2+ Yrs. of experience with Kubernetes, AWS CI/CD tools, the AWS networking stack, S3, Lambda, EKS, ECS, RDS, Systems Manager, Secrets Manager, CloudTrail, etc.
  • 1+ Yrs. of experience managing distributed training infrastructure in PyTorch or TensorFlow.
  • 2+ Yrs. of experience with infrastructure as code and configuration management (Terraform, AWS CloudFormation, AWS CDK).
  • 2+ Yrs. of experience monitoring applications in the cloud using Grafana, AWS CloudWatch, or Prometheus.
  • 2+ Yrs. of experience debugging production systems and performing root cause analysis (RCA) on incidents.
  • 1+ Yrs. of hands-on experience with Python, Go, or Java, and with GitLab for automation.
  • 1+ Yrs. of experience with CI/CD and/or GitOps patterns (using GitLab, Jenkins, Allure, etc.).
  • 2+ Yrs. of experience with data processing and ML technologies such as AWS Batch or Ray.
  • 2+ Yrs. of experience with microservice-oriented architectures (using Kubernetes (EKS), AWS ECS, or Docker Swarm).
  • 2+ Yrs. of experience with Agile development of accessible software tools.
  • Linux internals, networking, and distributed computing are a plus.
  • AWS or Cloud Native certification is a plus.

Benefits

  • Robust medical/Rx, dental and vision insurance packages for full-time employees, their spouse or domestic partner, and children up to age 26. Coverage is effective on the first day of employment.