Jobgether · posted 1 day ago
$146,880 – $220,320/yr
Full-time • Senior
Washington, DC

About the position

This position is posted by Jobgether on behalf of the Allen Institute for AI (AI2). AI2 is currently looking for a Senior Software Engineer, Data, in Washington (USA). Join a mission-driven team focused on advancing scientific discovery through large-scale data engineering. As a Senior Software Engineer, Data, you will help design and build sophisticated data pipelines and machine learning–powered services that integrate large patent and academic datasets. This role is ideal for engineers who thrive in dynamic, collaborative environments and are passionate about data quality, scalability, and long-term maintainability. You’ll work on impactful tools used by millions of researchers worldwide, contributing directly to the future of open science and innovation.

Responsibilities

  • Build and maintain scalable data pipelines using Airflow for integrating complex corpora and resolving citations.
  • Develop and deploy lightweight machine learning models to disambiguate inventors/authors and classify patents.
  • Train or adapt topic models to label patents using text fields such as claims, titles, and abstracts.
  • Extend and maintain REST APIs to provide structured access to linked metadata and classification results.
  • Create dashboards and internal tools to evaluate data quality and model performance.
  • Collaborate closely with other engineers to ensure strong testing practices, documentation, and operational stability.
  • Contribute to architecture and design discussions with a focus on maintainability, performance, and scale.

Requirements

  • Bachelor’s degree and 8+ years of relevant technical experience (or equivalent combination).
  • Expertise in Python for data engineering, including pipeline development and automation.
  • Proficiency in SQL and production-grade schema design (PostgreSQL preferred).
  • Hands-on experience with ML pipelines: training, fine-tuning, and inference for structured data.
  • Strong familiarity with structured data formats (JSON, XML, Parquet) and ETL practices.
  • Experience with Airflow or similar workflow orchestration tools, as well as AWS and container technologies like Docker.
  • Strong ownership mindset and communication skills.

Nice-to-haves

  • Experience with entity resolution, author disambiguation, or record linkage at scale.
  • Familiarity with vector similarity techniques or topic modeling in real-world data scenarios.
  • Experience with scholarly, patent, and citation datasets (e.g., USPTO, arXiv, OpenAlex).
  • Experience building internal APIs and dashboards for ML or data QA.

Benefits

  • Base salary range: $146,880 – $220,320, with additional performance-based annual bonuses.
  • Comprehensive medical, dental, and vision insurance for you and your family.
  • Flexible spending accounts (FSA), HSA, and HRA plans available.
  • 401(k) retirement plan with employer contributions.
  • Monthly stipends: $125 for internet/commuting and $200 for fitness/wellbeing.
  • Generous PTO policy: up to 20 vacation days, 7 personal days, 10 sick days, and 12 paid holidays annually.
  • Remote work flexibility (within the U.S.).
  • Supportive work environment that emphasizes work-life balance, inclusion, and personal growth.
© 2024 Teal Labs, Inc