Nubank • posted 17 days ago
Full-time • Mid Level
Remote • Durham, NC
Credit Intermediation and Related Activities

About the position

At Nubank, one of our engineering principles is "Leverage Through Platforms": we believe platforms are an efficient way of solving complex concerns shared across products and teams. The AI Infrastructure Squad within the AI Core BU builds and scales the foundational cloud, data, and AI infrastructure that powers machine learning workloads across the organization. We focus on performance, reliability, and scalability in AI systems, working on everything from training infrastructure to low-latency inference.

As a Software Engineer in the AI Core BU, we expect you to bring deep experience with GPU programming (CUDA, Triton, or OpenCL), with a focus on performance optimization for deep learning workloads; a strong understanding of large language model architectures (e.g., Transformer variants) and experience profiling and tuning their performance; familiarity with memory management, kernel fusion, quantization, tensor parallelism, and GPU-accelerated inference; experience with PyTorch internals or custom kernel development for AI workloads; hands-on knowledge of low-level optimizations in training and inference pipelines, such as FlashAttention, fused ops, and mixed-precision computation; and proficiency in Python and C++, along with familiarity with inference acceleration frameworks such as TensorRT, DeepSpeed, vLLM, or ONNX Runtime.
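For a sense of what "kernel fusion" and "fused ops" mean in practice, the sketch below shows a minimal fused multiply-add elementwise kernel in Triton. It is an illustrative example under assumed names and sizes, not Nubank code: fusing the multiply and add into one kernel launch means the intermediate product never round-trips through GPU memory.

    # Illustrative sketch of a fused elementwise op in Triton.
    # All names (fused_mul_add, BLOCK_SIZE=1024) are assumptions for illustration.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def fused_mul_add_kernel(x_ptr, y_ptr, z_ptr, out_ptr, n_elements,
                             BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        z = tl.load(z_ptr + offsets, mask=mask)
        # Compute x * y + z in a single kernel, so the x * y intermediate
        # is never materialized in GPU memory.
        tl.store(out_ptr + offsets, x * y + z, mask=mask)

    def fused_mul_add(x, y, z):
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        fused_mul_add_kernel[grid](x, y, z, out, n, BLOCK_SIZE=1024)
        return out

Compared with out = x * y + z in eager PyTorch, which launches separate multiply and add kernels with an intermediate tensor between them, the fused version reads each input once and writes the output once.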

Responsibilities

  • Build and scale foundational cloud, data, and AI infrastructure for machine learning workloads.
  • Focus on performance, reliability, and scalability in AI systems.
  • Profile and debug GPU performance bottlenecks in LLM training or inference pipelines (a minimal profiling sketch follows this list).
  • Optimize large-scale ML workloads for throughput, latency, or cost.
  • Contribute to or implement custom GPU kernels for high-impact components.
  • Design infrastructure that scales across hundreds or thousands of GPUs.
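As a hedged illustration of the profiling responsibility above, a first pass at locating a GPU bottleneck often starts with torch.profiler. The model and input below are placeholder stand-ins, not a real workload.

    # Illustrative sketch: surfacing hot GPU kernels with torch.profiler.
    # The Linear layer and batch are stand-ins for a real inference pipeline.
    import torch
    from torch.profiler import profile, ProfilerActivity

    model = torch.nn.Linear(4096, 4096).cuda().half()
    x = torch.randn(8, 4096, device="cuda", dtype=torch.half)

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        with torch.no_grad():
            for _ in range(10):
                model(x)
        torch.cuda.synchronize()

    # Sorting by CUDA time surfaces the kernels worth optimizing first.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))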

Requirements

  • Deep experience with GPU programming (CUDA, Triton, or OpenCL).
  • Strong understanding of large language model architectures.
  • Experience profiling and tuning performance of LLMs.
  • Familiarity with memory management, kernel fusion, quantization, tensor parallelism.
  • Experience with PyTorch internals or custom kernel development.
  • Hands-on knowledge of low-level optimizations in training and inference pipelines.
  • Proficiency in Python and C++.
  • Familiarity with inference acceleration frameworks like TensorRT, DeepSpeed, vLLM, or ONNX Runtime.

Nice-to-haves

  • Experience in a fast-paced environment.
  • Ability to work across research and engineering teams.
  • Curiosity and a self-driven mindset.

Benefits

  • Remote work with quarterly trips to São Paulo.
  • Top Tier Medical Insurance.
  • Top Tier Dental and Vision Insurance.
  • 20 days of time off and 14 company holidays.
  • Life Insurance and AD&D.
  • Extended maternity and paternity leave.
  • Nucleo - Our course-based learning platform.
  • NuLanguage - Our language learning program.
  • NuCare - Our mental health and wellness assistance program.
  • 401(k).
  • Savings Plans - Health Savings Account and Flexible Spending Account.