Datadog
Posted 18 days ago
Mid Level
New York, NY
Publishing Industries

About the position

The Observability Infrastructure SRE Team is part of the Core Observability SRE group within the broader SRE/Security organization. Core Observability manages Datadog's internal observability tooling and practices, which includes using a variety of Datadog products. The group focuses on the adoption and efficiency of observability throughout Datadog's organization, a key part of an effective SRE approach that helps engineers build and operate systems more effectively.

The Observability Infrastructure team owns the Datadog telemetry data plane, which collects large volumes of observability data (metrics, logs, and traces, as well as various security-related data types) across all Datadog environments. This newly formed team focuses on the performance and scalability of that pipeline in a multi-region, multi-cloud provider ecosystem. Within this pipeline, the team works on libraries, a fleet of agents, data processors, and endpoints, partnering closely with the engineering teams who build these components. We routinely work together on testing or building new functionality, improving the product for both Datadog internally and our customers.

As a Software Engineer on this team, you'll help own critical observability systems to ensure scalability, reliability, and efficiency across Datadog. We aim to integrate the latest Datadog features, drive product innovation, and maintain operational excellence.

At Datadog, we place value in our office culture - the relationships and collaboration it builds and the creativity it brings to the table. We operate as a hybrid workplace to ensure our Datadogs can create a work-life harmony that best fits them.

Responsibilities

  • Lead projects that address Datadog-specific observability, instrumentation, or telemetry collection problems by designing and implementing internal solutions or contributing to our products.
  • Build, scale and operate a robust telemetry data plane, processing a large amount of data with strict performance and reliability objectives.
  • Work hand in hand with our infrastructure team to design an architecture for all of Datadog's environments, across many different cloud providers, regions of the world, and technology stacks.
  • Support the scale and growth of Datadog in all of these dimensions.
  • Gather requirements for operational use cases from the relevant stakeholders (developers, infrastructure teams, on-call/incident response, security investigations and incidents, compliance, etc.) and implement the supporting telemetry collection, processing, and configuration.

Requirements

  • 5+ years of experience in software engineering, running production systems at scale.
  • Expertise in building telemetry pipelines or observability platforms in a cloud-native environment.
  • Comfortable navigating complex technical challenges to propose efficient, easy-to-adopt solutions.
  • Strong programming skills in a structured programming language (Go, Python) and the ability to write robust, maintainable code.
  • Strong communication skills and experience working in cross-team programs/projects.
  • Experience leading the adoption of programs/projects with wide impact across Engineering.

Benefits

  • New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
  • Continuous professional development, product training, and career pathing
  • Intradepartmental mentor and buddy program for in-house networking
  • An inclusive company culture, with the ability to join our Community Guilds (Datadog employee resource groups)
  • Access to Inclusion Talks, our internal panel discussions
  • Free, global mental health benefits for employees and dependents age 6+
  • Competitive global benefits