KLA • Posted 11 days ago
$114,100 - $194,000/Yr
Full-time • Entry Level
Hybrid • Milpitas, CA
Computer and Electronic Product Manufacturing

About the position

Are you inquisitive, resourceful, resilient, and persistent? Do you flourish when challenged and enjoy problem solving? If so, the High-Performance Computing (HPC) team in the Broadband Plasma (BBP) Division at KLA is a perfect place to make your contributions and amplify them by collaborating with a diverse, dynamic, and high-performing team. We are seeking passionate individuals to join us. Our team provides a cutting-edge HPC computational platform that executes image processing algorithms and enables real-time wafer inspection.

As a Product Development Engineer for an Embedded Linux HPC Cluster, which is part of the KLA wafer inspection tool, you will design, qualify, and tune that cluster; the specific duties are listed under Responsibilities.

Responsibilities

  • Review requirements and translate them to design an optimized HPC cluster.
  • Create Operating System Golden Images to enable HPC Application workloads.
  • Work with multiple stakeholders to drive Hardware/OS stack qualification.
  • Work towards performance tuning, compute optimization and diagnostics development.
  • Manage design development efforts, assist with documentation for Manufacturing and Service teams, and support L4 escalations.
  • Travel internationally as needed, approximately 2-3 times per year.

Requirements

  • In-depth knowledge of one or more Linux distributions (SUSE, Red Hat, CentOS, Ubuntu), including experience with systemd, network boot (PXE), and Linux HA.
  • Experience with one or more configuration management utilities (Salt, Chef, Puppet, etc.).
  • Proficiency in shell scripting (Bash) and Python, with a strong understanding of object-oriented concepts.
  • Strong understanding of TCP/IP fundamentals and knowledge of DNS, DHCP, and InfiniBand fabric troubleshooting.
  • Good working knowledge of x86 hardware platforms and proven ability to benchmark performance across various hardware platforms with GPUs.
  • Familiarity with observability tools and proven ability to collect metrics, create visuals, and analyze them for data-based decision-making.
  • Excellent written and verbal communication skills.

Nice-to-haves

  • Degree in Computer Science, Data Science, Computer Engineering, Electrical Engineering, or related fields.
  • DevOps focus: Knowledge of setting up a continuous development pipeline (Jenkins), repository software (Git-based), and Docker containers.
  • Knowledge of Apache/Nginx, setting up proxy/reverse proxy, application server routing, and load balancing (HAProxy).
  • Working knowledge of Prometheus/Grafana.
  • Knowledge of PKI & SSL/TLS certificate management.

Benefits

  • Medical, dental, vision, life, and other voluntary benefits.
  • 401(k) including company matching.
  • Employee stock purchase program (ESPP).
  • Student debt assistance.
  • Tuition reimbursement program.
  • Development and career growth opportunities and programs.
  • Financial planning benefits.
  • Wellness benefits including an employee assistance program (EAP).
  • Paid time off and paid company holidays.
  • Family care and bonding leave.