At Nubank, one of our engineering principles is "Leverage Through Platforms": we believe platforms are an efficient way to solve complex, cross-cutting concerns shared by many products and teams. The AI Infrastructure Squad within the AI Core BU builds and scales the foundational cloud, data, and AI infrastructure that powers machine learning workloads across the organization. We focus on performance, reliability, and scalability in AI systems, working on everything from training infrastructure to low-latency inference.

As a Software Engineer in the AI Core BU, we expect you to demonstrate:

- Deep experience with GPU programming (CUDA, Triton, or OpenCL), with a focus on performance optimization for deep learning workloads.
- A strong understanding of large language model architectures (e.g., Transformer variants), and experience profiling and tuning their performance.
- Familiarity with memory management, kernel fusion, quantization, tensor parallelism, and GPU-accelerated inference.
- Experience with PyTorch internals or custom kernel development for AI workloads.
- Hands-on knowledge of low-level optimizations in training and inference pipelines, such as FlashAttention, fused ops, and mixed-precision computation.
- Proficiency in Python and C++, and familiarity with inference acceleration frameworks such as TensorRT, DeepSpeed, vLLM, or ONNX Runtime.
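To give a flavor of one of the techniques mentioned above, here is a minimal, illustrative sketch of per-tensor symmetric int8 quantization, a common way to shrink model weights for faster inference. It is plain Python for readability (real pipelines operate on GPU tensors), and the function names are hypothetical, not part of any Nubank codebase.

```python
# Illustrative sketch of symmetric int8 quantization: map floats into
# [-128, 127] using a single per-tensor scale, then recover approximations.
# Hypothetical helper names; real systems use tensor libraries on the GPU.

def quantize_int8(values):
    """Quantize floats to int8 with a symmetric per-tensor scale."""
    # Scale so the largest magnitude maps to 127; guard against all-zeros.
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes and the scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
```

The per-tensor symmetric scheme shown here is the simplest variant; production inference stacks typically use per-channel scales and calibration data to keep accuracy loss small.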