About the role
We are looking for Research Engineers to help design and build safety and oversight algorithms for our AI models and products. As a Trust and Safety Research Engineer, you will work to design and train ML models based on research progress, which detect harmful user/model behaviors and help ensure society's well-being. You will apply your research skills to uphold our principles of safety, transparency, and oversight while enforcing our terms of service and acceptable use policies.
What you will be working on:
- Design, iterate on, and build ML models to detect unwanted or anomalous behaviors from both users and LLMs
- Work with T&S ML engineers to review and iterate on experiment ideas. Co-author the experiment success criteria and production deployment roadmaps
- Partner with T&S Policy and Enforcement cross-functional teams to understand emerging and sustained abuse patterns from user prompts and behaviors. Incorporate the insights into T&S research datasets
- Surface abuse patterns to sibling research teams in the company. Collaborate to harden Anthropic's LLMs at the pre-training and post-training stages
- Stay current with state-of-the-art research in AI and machine learning, and propose ways to apply these advancements to T&S systems
You may be a good fit if you:
- Have 4+ years of experience in a research engineering or applied research scientist position, preferably with a focus on trust and safety
- Have significant Python programming experience and machine learning experience
- Have proficiency in building trustworthy and safe AI technology
- Have strong communication skills and ability to explain complex technical concepts to non-technical stakeholders
- Care about the societal impacts and long-term implications of your work and are results-oriented
Strong candidates may also:
- Have experience fine-tuning large language models with supervised learning or reinforcement learning
- Have experience with machine learning frameworks like scikit-learn, TensorFlow, or PyTorch
- Have experience authoring research papers in machine learning, NLP, or AI alignment, or have similar industry experience
- Have developed evaluations for language models