Meta Platforms
Posted 17 days ago
Mid Level
Redmond, WA
Broadcasting and Content Providers

About the position

XR Codec Interactions and Avatars (XRCIA) brings together a highly interdisciplinary team of researchers and engineers to create the future of augmented and virtual reality. On the Research Oriented Cluster Foundations team, you'll work on building and maintaining tools, libraries, and frameworks that will help researchers collaborate with each other and empower their research towards the generation of Codec Interactions and Avatars. Our team cultivates an honest and considerate environment where self-motivated individuals thrive. We encourage ownership and embrace the ambiguity that comes with working on the frontiers of research. In this software engineer role, you will serve as the point of contact for Meta's research GPU super clusters. You are a hybrid software/systems/infrastructure engineer who ensures that Meta's Research Super Clusters run smoothly and have the capacity for future growth.

Responsibilities

  • Serve as the point of contact for Meta's research GPU super clusters.
  • Ensure that Meta's Research Super Clusters run smoothly and have the capacity for future growth.
  • Build and maintain tools, libraries, and frameworks for researchers.
  • Collaborate with team members to improve user experience and develop long-term strategies for compute/storage needs.

Requirements

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
  • 3+ years of experience in UNIX/Linux and a clear understanding of TCP/IP network fundamentals.
  • 5+ years of experience coding in at least one of the following languages: C++, Python, or Rust.
  • Experience with software development practices such as source control, code reviews, unit testing, debugging and profiling.
  • Experience with Internet service architecture capacity planning and/or handling needs for urgent capacity augmentation.
  • Knowledge of common web technologies and/or Internet service architectures (such as LAMP or MEAN stacks, CDN, Load Balancing techniques, etc.).
  • Experience configuring and running infrastructure-level applications, such as Kubernetes, Terraform, MySQL, SLURM, etc.
  • Thorough understanding of the Linux operating system, including the networking subsystem.
  • Experience in distributed system performance measurement, logging, and optimization.

Nice-to-haves

  • Prior experience in cluster oncall operations, including troubleshooting server/scheduler/storage errors, maintaining compute/storage environments/libraries/tools, helping onboard users to the cluster, and answering general questions from users.
  • Prior experience in cluster coordination and strategy planning, including collecting/understanding needs of users, developing tools to improve user experience, providing guidance on best practices, forecasting compute/storage needs, and developing long-term user experience/compute/storage strategies.
  • Prior experience building tooling for monitoring and telemetry.
  • Prior experience in developing/managing distributed network file systems.
  • Prior experience in network security.