End-to-end solutions for the entire ML lifecycle

DevOps Engineer - Federal

Location: Washington, DC, USA
Job Type: Full-time
Experience: 3+ years

About the role

About Us

At Scale, our mission is to accelerate the development of Machine Learning and AI applications across multiple markets. Our first product is a suite of APIs that allow AI teams to generate high-quality ground truth data. Our customers include OpenAI, General Motors, NVIDIA, Pinterest, Airbnb, and many more.

Open to candidates located in the DC, St. Louis, and San Francisco Bay areas.

Scale is building out one of the largest hybrid human-machine systems. Scale’s self-regulating system automatically trains workers and ensures continuous quality and optimal allocation. We have thousands of human labelers that complete millions of tasks a month, and that comes with a host of interesting technical challenges. From product to systems to infrastructure engineering, we’re tackling it all to accelerate the development of AI.

Responsibilities

  • Create, design, and build cloud computing architectures for our federal customers, and provide the related education and training.
  • Work directly with our federal clients to create backend and infrastructure solutions to meet their challenging data and security needs.
  • Define standard practices and build tooling around incidents, postmortems, changes, and capacity, and work with other Engineering teams to help them adopt these practices to improve their services.
  • Fix support escalation issues, conduct post-incident reviews, and optimize on-call rotations and processes.
  • Propose, design, build, and deploy security improvements across Scale’s federal environments.
  • Work with our advisors and third-party vendors and auditors on security compliance, pen tests, and mitigations.
  • Build systems capable of handling millions of frames of data every day, making it available to both our workforce and our internal teams with high availability.

This role could be a fit if you have

  • 2-7 years of post-graduation industry experience as a DevOps or software engineer.
  • Systems engineering experience with real-time and distributed system architecture.
  • Experience working with systems that process large volumes of data.
  • Experience or interest in using the following: AWS, TypeScript, Node, Mongo, MLflow, Spark, Presto, Python (note that we are mostly language-agnostic and are open to using whatever is the best tech for the problem at hand).
  • At least a Bachelor’s degree (or equivalent) in a relevant field.
  • Residence in the DC, St. Louis, or SF metro areas and/or willingness to travel.
  • A security clearance or the ability to hold one.

Nice to haves:

  • Prior startup experience to help us grow responsibly
  • Experience with core AWS technologies such as VPC, EC2, ALB, ASG, Spot Instances
  • Experience operating or managing infrastructure such as Spark, Presto, Hive
  • Experience working with Docker, Kubernetes, and infrastructure as code (e.g., Terraform), especially for running GPU/ML workloads
  • Experience with compliance programs such as SOC 2, ISO 27001, FedRAMP, HIPAA, PCI, or operating within a compliance-driven environment
  • Experience mentoring and growing members of your team or serving as a tech lead on large projects

Scale AI is an equal opportunity employer. We aim for every person at Scale to feel like they matter, belong, and can be their authentic selves so they can do their best work. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Scale AI is committed to working with and providing reasonable accommodations to applicants with physical and mental disabilities. If you need assistance and/or a reasonable accommodation in the application or recruiting process due to a disability, please contact us at accommodations@scale.com. Please see the United States Department of Labor's EEO poster and EEO poster supplement for additional information.

Why you should join Scale AI

Scale's mission is to accelerate the development of AI applications.

Scale is a vital part of bringing machine-learning-enabled technologies to the world, from autonomous driving to drones, robots, and VR/AR. Our suite of managed labeling services combines manual labeling with best-in-class tools and machine-driven checks to yield highly accurate training data. Scale is committed to continual innovation in combining humans with AI to prepare intelligent data, passing on these improvements to our customers (such as Alphabet (Google), Zoox, Lyft, Pinterest, Airbnb, and nuTonomy) and powering a growing future of AI applications.

Scale's team consists of 120 people, including 35 engineers, with industry veterans from Dropbox, Google, Palantir, Quora, OpenAI, and others.

Scale AI
Founded:
Team Size: 440
Location: San Francisco
Founders
Alexandr Wang
CEO