Site Reliability Engineer

Grafbase

Location: New York, NY, USA

Date: 2024-10-01T05:47:52Z

Job Description:

We are looking for a Site Reliability Engineer (SRE) to join our Engineering team. As an SRE, you will play a crucial role in ensuring the reliability, availability, and performance of our systems and services. You will collaborate across teams to design, implement, and maintain infrastructure and automation solutions, supporting the continuous improvement of our platform's reliability and scalability.

What you will do:
  • Work across teams to ensure software is developed and deployed for maximum reliability
  • Develop, run and improve processes and tools
  • Build automation to support reliability efforts for all of our production services
  • Join incidents, help resolve them, and assist in drafting RCAs and other documentation that is provided directly to customers
About You:
  • You have at least 8 years of experience working with production systems
  • Experienced in managing large-scale production systems
  • Strong proficiency in the Rust programming language
  • Hands-on experience with container and orchestration technologies such as Docker, Kubernetes, or Helm
  • Solid experience with cloud platforms such as AWS, Azure, or Google Cloud
  • Knowledgeable about network protocols, load balancing, and DNS management
  • Familiar with monitoring and logging tools and best practices
  • Experience deploying and monitoring systems using infrastructure as code
  • Excellent problem-solving and troubleshooting skills