Senior Site Reliability Engineer

BlankFactor

Location: Charlotte, NC, USA

Date: 2025-01-07T03:26:54Z

Job Description:

Please apply only if you are local to the Charlotte, NC area. This is a hybrid role: 3 days in office, 2 days remote per week. This is a W2 full-time position; we are unable to work via C2C, C2H, or other contracting options. Please apply only if you have valid work authorization.

What to expect in this role

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team and ensure the reliability, performance, and scalability of our cloud-based infrastructure. The ideal candidate will possess strong expertise in AWS architecture, observability tools, event-driven systems, and infrastructure as code, with a proven ability to automate processes and enhance system resilience.

  • Monitoring & Observability: Design, implement, and manage robust monitoring systems using CloudWatch, CloudTrail, Splunk, and other observability tools.
  • Incident Management: Lead incident response efforts, diagnose root causes, and implement long-term fixes to prevent future occurrences.
  • System Performance: Optimize system reliability and performance by identifying bottlenecks and proactively implementing solutions.
  • Alerting & Logging: Develop and manage alerting mechanisms to ensure early detection and resolution of system anomalies.
  • Disaster Recovery: Create and maintain disaster recovery plans and conduct regular testing to ensure system availability in case of failures.
  • Event-Driven Architectures: Design and manage event-driven architectures using AWS services such as AWS Events, Custom Events, and other cloud-native tools.
  • Infrastructure as Code (IaC): Utilize Terraform (HCL) and Ansible for provisioning and maintaining cloud infrastructure.
  • Security & Compliance: Implement security best practices using tools like KMS and Access Analyzer, and ensure systems adhere to compliance standards.
  • Automation: Build and maintain automation scripts using Python, Go, and Rego to streamline workflows and reduce manual intervention.

Qualifications And Tech Proficiency

  • Cloud Expertise: Deep knowledge of AWS architecture, including observability tools like CloudWatch, CloudTrail, and security services such as KMS and Access Analyzer.
  • Event-Driven Systems: Proficient in managing event-driven architectures (EDS) and tools like AWS Events and Custom Events.
  • Observability & Analysis: Hands-on experience with monitoring and log analysis tools such as Splunk, and with designing Custom Metrics/Events.
  • IaC & Automation: Proficiency in writing and managing infrastructure using Terraform (HCL) and Ansible.
  • Programming Languages: Advanced skills in Python, Go, and Rego for automation and tool development.
  • Problem-Solving: Strong troubleshooting skills with a focus on system performance optimization and root cause analysis.

Preferred Qualifications

  • Experience with MRF and DAP systems in AWS.
  • Knowledge of compliance and audit processes in cloud environments.
  • Strong interpersonal skills for collaboration with cross-functional teams.

What We Offer

  • Competitive salary with an attractive benefits package.
  • Work across wide-ranging and interesting areas with highly skilled professionals, international clients, and the latest technologies.
  • A culture of collaboration and continuous learning in a work environment invested both in nurturing your skills and shaping our common technological future.

We believe that diversity of experience and background contributes to more robust ideas and a stronger team. All qualified applicants will receive consideration for employment without regard to religion, race, sex, sexual orientation, gender identity, national origin, or disability.
