Zilliz
Location: California, MO, USA
Date: 2024-12-20T01:22:05Z
Job Description:
What you will do:
- Work at the intersection of development and site reliability, creating SRE tools and systems and supporting existing infrastructure and platforms.
- Ensure the reliability, availability, and performance of Zilliz's distributed database systems.
- Develop and implement strategies for monitoring, incident management, and disaster recovery.
- Automate system operations and maintenance tasks to improve efficiency and reduce manual intervention.
- Design and build tools to manage and monitor infrastructure, ensuring scalability and robustness.
- Collaborate with software engineers to enhance system reliability, scalability, and performance.
- Maintain and improve the CI/CD pipeline to ensure smooth and rapid deployment of changes.
- Actively contribute to the Milvus open-source community, focusing on improving reliability and operational efficiency.
What we are looking for:
- 4+ years of experience in site reliability engineering or similar roles with a focus on cloud-native systems.
- Proficiency in programming languages such as Python, Go, or Java.
- Strong knowledge of container and orchestration technologies such as Docker and Kubernetes.
- Expertise with cloud platforms such as AWS, GCP, or Azure, and their respective monitoring and management tools.
- Experience with infrastructure as code tools such as Terraform or Ansible.
- Familiarity with CI/CD tools such as Jenkins, GitLab CI, or Argo.
- Proven ability to troubleshoot complex distributed systems and resolve issues promptly.
- Bachelor's degree or above in computer science, software engineering, or other relevant disciplines.
- Ability to thrive in a fast-paced startup environment and handle multiple projects simultaneously.