Site Reliability Engineer

EVONA

Location: Santa Rosa, CA, USA

Date: 2025-01-06T04:10:15Z

Job Description:

Site Reliability Engineer (SRE)

Location: San Francisco Bay Area

Role Overview:

We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation and optimizing cloud infrastructure. This role offers the opportunity to work with cutting-edge AI/ML technologies, leveraging them to solve complex challenges in cloud infrastructure management and performance optimization.

Key Responsibilities:

  • System Reliability & Performance: Design, implement, and maintain scalable systems, ensuring high availability, performance, and disaster recovery across production environments.
  • Automation & Tool Development: Develop automation tools to streamline operations, improve system reliability, and reduce manual interventions.
  • Cloud Infrastructure Management: Create and manage cloud environments (e.g., dev, staging, production) on AWS, GCP, or Azure, optimizing infrastructure performance and cost.
  • Integration of AI/ML Models: Collaborate with engineering teams to integrate machine learning models into production environments, ensuring that these models scale efficiently and perform optimally.
  • Incident Management: Respond to and resolve incidents, minimizing downtime and ensuring quick recovery. Lead post-incident reviews and implement preventive measures.
  • Continuous Improvement: Identify areas of improvement and drive initiatives to enhance system reliability, performance, and security.
  • Security & Compliance: Ensure that infrastructure and applications adhere to security best practices and compliance standards.

Qualifications:

  • Educational Background: Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • Experience: Proven experience as a Site Reliability Engineer or in a similar role within a SaaS environment, including managing and optimizing cloud infrastructure (preferably AWS, GCP, or Azure), plus familiarity with integrating AI and machine learning technologies.
  • Technical Skills:
      • Proficiency in programming and scripting languages such as Python, Go, or Bash.
      • Experience with containerization and orchestration tools like Docker and Kubernetes.
      • Solid understanding of networking, security, and performance optimization practices.
      • Knowledge of CI/CD pipelines and DevOps practices to ensure smooth development and deployment cycles.
  • Problem-Solving: Strong analytical and problem-solving skills with attention to detail.
  • Collaboration & Communication: Excellent interpersonal skills, with the ability to work collaboratively in cross-functional teams and communicate technical concepts clearly.

Benefits:

  • Competitive Salary: Attractive compensation package, including equity options.
  • Health & Wellness: Comprehensive health, dental, and vision insurance, along with other benefits.
  • Work Environment: A collaborative and innovative work environment within a growing company.
  • Growth Opportunities: Opportunities for career growth, professional development, and a chance to shape the future of the company's technology and infrastructure.
