Nubank • posted 17 days ago
Full-time • Mid Level
Remote • Durham, NC
Credit Intermediation and Related Activities

About the position

At Nubank, one of our engineering principles is "Leverage Through Platforms": we believe platforms are an efficient way of solving complex concerns shared across products and teams. The AI Infrastructure Squad within the AI Core BU builds and scales the foundational cloud, data, and AI infrastructure that powers machine learning workloads across the organization. We focus on performance, reliability, and scalability in AI systems, working on everything from training infrastructure to low-latency inference.

As a Software Engineer in the AI Core BU, we expect you to bring deep experience with GPU programming (CUDA, Triton, or OpenCL), with a focus on performance optimization for deep learning workloads; a strong understanding of large language model architectures (e.g., Transformer variants) and experience profiling and tuning their performance; familiarity with memory management, kernel fusion, quantization, tensor parallelism, and GPU-accelerated inference; experience with PyTorch internals or custom kernel development for AI workloads; hands-on knowledge of low-level optimizations in training and inference pipelines, such as FlashAttention, fused ops, and mixed-precision computation; and proficiency in Python and C++, along with familiarity with inference acceleration frameworks such as TensorRT, DeepSpeed, vLLM, or ONNX Runtime.
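For a sense of what "kernel fusion" and "fused ops" mean in practice, the sketch below shows a minimal fused multiply-add elementwise kernel in Triton. It is an illustrative example under assumed names and sizes, not Nubank code: fusing the multiply and add into one kernel launch means the intermediate product never round-trips through GPU memory.

    # Illustrative sketch of a fused elementwise op in Triton.
    # All names (fused_mul_add, BLOCK_SIZE=1024) are assumptions for illustration.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def fused_mul_add_kernel(x_ptr, y_ptr, z_ptr, out_ptr, n_elements,
                             BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        z = tl.load(z_ptr + offsets, mask=mask)
        # Compute x * y + z in a single kernel, so the x * y intermediate
        # is never materialized in GPU memory.
        tl.store(out_ptr + offsets, x * y + z, mask=mask)

    def fused_mul_add(x, y, z):
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        fused_mul_add_kernel[grid](x, y, z, out, n, BLOCK_SIZE=1024)
        return out

Compared with out = x * y + z in eager PyTorch, which launches separate multiply and add kernels with an intermediate tensor between them, the fused version reads each input once and writes the output once.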

Responsibilities

  • Build and scale foundational cloud, data, and AI infrastructure for machine learning workloads.
  • Focus on performance, reliability, and scalability in AI systems.
  • Profile and debug GPU performance bottlenecks in LLM training or inference pipelines (a minimal profiling sketch follows this list).
  • Optimize large-scale ML workloads for throughput, latency, or cost.
  • Contribute to or implement custom GPU kernels for high-impact components.
  • Design infrastructure that scales across hundreds or thousands of GPUs.
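As a hedged illustration of the profiling responsibility above, a first pass at locating a GPU bottleneck often starts with torch.profiler. The model and input below are placeholder stand-ins, not a real workload.

    # Illustrative sketch: surfacing hot GPU kernels with torch.profiler.
    # The Linear layer and batch are stand-ins for a real inference pipeline.
    import torch
    from torch.profiler import profile, ProfilerActivity

    model = torch.nn.Linear(4096, 4096).cuda().half()
    x = torch.randn(8, 4096, device="cuda", dtype=torch.half)

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        with torch.no_grad():
            for _ in range(10):
                model(x)
        torch.cuda.synchronize()

    # Sorting by CUDA time surfaces the kernels worth optimizing first.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))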

Requirements

  • Deep experience with GPU programming (CUDA, Triton, or OpenCL).
  • Strong understanding of large language model architectures.
  • Experience profiling and tuning performance of LLMs.
  • Familiarity with memory management, kernel fusion, quantization, tensor parallelism.
  • Experience with PyTorch internals or custom kernel development.
  • Hands-on knowledge of low-level optimizations in training and inference pipelines.
  • Proficiency in Python and C++.
  • Familiarity with inference acceleration frameworks like TensorRT, DeepSpeed, vLLM, or ONNX Runtime.

Nice-to-haves

  • Experience in a fast-paced environment.
  • Ability to work across research and engineering teams.
  • Curiosity and a self-driven mindset.

Benefits

  • Remote work with quarterly trips to São Paulo.
  • Top Tier Medical Insurance.
  • Top Tier Dental and Vision Insurance.
  • 20 days of time off and 14 company holidays.
  • Life Insurance and AD&D.
  • Extended maternity and paternity leave.
  • Nucleo - Our course-based learning platform.
  • NuLanguage - Our language learning program.
  • NuCare - Our mental health and wellness assistance program.
  • 401(k).
  • Savings Plans - Health Savings Account and Flexible Spending Account.