Site Reliability Engineer ***W2 only***
Position: Site Reliability Engineer (SRE)
Work Authorization: All Work Authorizations
Location: Richmond, VA
Contract: 24 months
As one of the Site Reliability Engineers, you'll work closely with customers, product management, and other subject matter experts in the technology industry to drive solutions that have an immediate impact on the day-to-day ability of data scientists and machine learning engineers to productionize their models, by iteratively improving how we operate and scale our cloud-based containerized service.
What You'll Do
- Develop, deploy, and operate our secure infrastructure built on cloud services (AWS, Kubernetes, etc.).
- Ensure the high availability, resiliency, performance, business continuity and compliance capabilities of our cloud services.
- Define SLA standards for SaaS solutions that are used by several groups within the company.
- Work with our engineering teams to deploy and operate cloud services, and scale our development, QA, and production environments.
- Build solutions for developer productivity. Develop and operate our build automation and continuous delivery systems.
- Participate in an on-call rotation, drive incident resolution, and improve platform resiliency.
Basic Qualifications
- Experience with container management technologies including Docker and Kubernetes.
- Experience with AWS, including EKS, ECS, IAM, S3, RDS, Security Groups, Route 53, VPC Flow Logs, etc.
- Experience with automation/configuration management using Terraform or similar solutions.
- Experience with CI tools such as Jenkins.
- Experience with operational monitoring tools such as Datadog, New Relic, and Splunk.
- Proficient in Linux tools and shell scripting or other Linux automation.
- An interest in designing, analyzing and troubleshooting large-scale distributed systems.
- Well versed in the entire software development lifecycle, DevOps, and SRE practices.
Preferred Qualifications
- Experience with automated unit and integration testing of infrastructure code
- Experience with container security and vulnerability management
- Experience in one or more languages such as Python or Go
- Certified Kubernetes Administrator (CKA)