Production Engineer

Covariant

Location: Emeryville, CA, USA

Date: 2024-09-26T05:40:46Z

Job Description:
Emeryville, CA
Engineering – Software / Full-time / Hybrid

THE COMPANY

Our mission is to build the Covariant Brain, a universal AI to give robots the ability to see, reason and act on the world around them. Bringing AI from research in the lab to the infinite variability and constant change of our customers' real-world operations requires new ideas, approaches and techniques.

Success in the real world requires a team that represents that world: diversity of backgrounds, points of view, and experiences. Our common denominator: ambitious expectations, love of learning, empathy for those around us, and a team-first mindset.

THE ROLE

Production Engineers at Covariant play a mission-critical role in ensuring our services' seamless operation and future scalability. In this role, you'll be at the forefront of every significant engineering endeavor, embedded within our production and research teams. As a production engineer, you will drive innovation and efficiency in our projects by applying your expertise in AWS, Docker, Kubernetes, Puppet, and Terraform to architect scalable and resilient infrastructure for our innovative AI robotics systems.

AREAS OF FOCUS

- Own and orchestrate large GPU clusters across different cloud providers, using infrastructure as code (IaC) and scripts to provide researchers with a single cohesive interface
- Help other teammates architect and build scalable tooling for our edge robot fleet
- Collaborate with brilliant researchers to evolve our training and inference tooling to be state-of-the-art

YOU WILL

- Design, build, manage and monitor the infrastructure we use to deploy our AI software and robotics solutions
- Develop and evolve software engineering and operational practices for the unique needs of distributed AI-powered cyber-physical systems
- Identify and establish healthy engineering and operational culture and processes
- Deliver previously impossible robotics capabilities that solve real needs for our partners and customers
- Collaborate with, learn from, and support a diverse and cross-functional team, including mechanical, electrical, and robotics engineers, AI/ML researchers, and business development

YOU HAVE

- Substantial previous experience in operating and automating production systems in both cloud and bare metal, deploying and administering Linux systems and/or wide-area networks, and building new tools and/or extending existing tools to add new capabilities
- A track record of accelerating developer productivity through improved tooling, automation, and education
- A track record of partnering with stakeholders to deliver solutions throughout the development process
- A solid foundation in Python, Linux, and networking
- Commitment to continuous learning and willingness to pick up new languages or technologies as needed, to solve real problems and deliver business impact

NICE TO HAVES

- Desire to work with a small collaborative team, with a high degree of autonomy and responsibility
- Are motivated to work on challenging real-world engineering problems without prior solutions
- Are excited to join coworkers who strive to be inclusive, thoughtful, and down-to-earth
- Are self-directed and enjoy figuring out which problem is the most important to work on
- Have previously done one or more of the following: deployed client-side software, including protecting source code, establishing secure licensing, and performing release engineering; set up and scaled developer tooling and CI/CD systems; built ML or IoT data pipelines processing images and metadata from live deployments; or managed high-bandwidth deep learning or supercomputing hardware

SAMPLE WEEK IN THE LIFE

Monday: Start the week with a team meeting to discuss ongoing projects and explore potential collaborations. Resume work on the rollout of BigProxy v2 in the development environment, refining probing tests to enhance its reliability. Also, schedule a discussion with our Tailscale account representative to renew our contract.

Tuesday: Address an urgent issue with the networking backplane of one of our GPU clusters not performing optimally. Conduct a troubleshooting session with the cluster provider to adjust the NCCL topology file, following unexpected changes on their end.

Wednesday: Develop a new alert in Datadog to monitor the performance of the GPU cluster backplane, ensuring it is adaptable for use with various providers.

Thursday: Collaborate with a colleague on deploying a PyPI server in our cloud infrastructure. Continue the implementation and testing of BigProxy v2, which was paused on Tuesday.

Friday: Lead a presentation at the weekly engineering deep dive to discuss the features and potential rollout of BigProxy v2, which consolidates all connections from remote deployments to the cloud through a single channel and simplifies SSH access to GPU clusters outside AWS/GCP. Gather and incorporate feedback from the team to finalize the deployment strategy.

SALARY RANGE

$165,000 - $210,000 a year

Base pay is one element of our total rewards package, which may also include comprehensive benefits and equity, depending on eligibility. The annual base salary range for this position is from $165,000 to $210,000. The actual base pay offered will be based on factors such as years of relevant experience, skills, and education. Decisions will be made on a case-by-case basis.

COMPANY CORE VALUES

- Learning constantly
- Striving for empathy
- Taking on the impossible, together

BENEFITS (US)

- Health, dental, and vision insurance for you and your family
- Unlimited PTO and flexible work hours
- 401(k) plan and company match
- Lunch and dinner each day (for on-site employees)
- Monthly Health & Wellness budget
- Quarterly Learning budget

At covariant.ai we don't just accept difference: we celebrate it, we support it, and we thrive on it for the benefit of our employees, our products, and our community. Covariant.ai is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status.