Site Reliability Engineer

Thought Machine

Location: New York, NY, USA

Date: 2024-07-26T06:56:34Z

Job Description:

Thought Machine's mission is bold: to properly and permanently rid the world's banks of legacy technology. To achieve this, we have developed the foundations of modern banking through core and payments technology which run natively in the cloud. What we are attempting is hard and means we need great people working together to build great technology.

We have grown rapidly in the past few years, expanding our team to more than 550 individuals across offices in London, New York, Singapore and Sydney. We have raised more than $500m in funding and are now valued at $2.7bn. Our investors include Molten Ventures, Eurazeo, Intesa Sanpaolo, Temasek, Nyca Partners, JPMorgan Chase Strategic Investments, Standard Chartered Ventures, and more.

We have created a culture that enables our team to produce the best work in the industry while ensuring we have fun along the way. We're regularly cited as having a fantastic workplace culture and have been recognised by Sifted magazine as having one of the highest Glassdoor ratings for a UK fintech company and the industry's most generous employee share package. Global Finance Magazine named us one of the world's most innovative fintechs, and the Financial Times recognised us as one of Europe's fastest-growing companies in 2023 and 2024.

We are spinning up a new regional SaaS platform team responsible for providing a world-class SaaS offering by continuously improving and maintaining our SaaS platform. The team will be geographically distributed across our two main hubs: the UK and Singapore. Joining this team is an excellent opportunity to get exposure to how mission-critical systems are run in production. You will be part of a team that owns the system end-to-end and will gain a deeper understanding of exactly how our clients use the system (for example, by extracting usage analytics).

The team will own the platform end-to-end, making use of existing infrastructure, improving core Terraform modules, and developing operators, tooling and additional infrastructure where appropriate. They will also be responsible for L2 support (for client-initiated support requests) and L1 support (for alerting-based incidents). Support will be provided during working hours, with a follow-the-sun model and handovers happening between the 3 regions.

Definition and development of the SaaS roadmap is another critical responsibility of this team. Alongside the Product Management function, they will define technical requirements and features and implement them, with the goal of offering an excellent SaaS experience to our clients.

Duties

Provision SaaS environments as new clients are onboarded.

Be part of the on-call rota (during business hours), responsible for resolving alerts generated by proactive monitoring and working closely with CANs to provide L2 support for client-initiated support requests.

Define and implement the feature roadmap to improve the SaaS platform, for example by implementing self-service functionality, exposing metrics to clients, improving automation and self-healing properties of the system.

Improve the scalability, security and performance of the SaaS platform by implementing automated compliance and controls, testing different Kafka and DB setups (e.g. Aurora vs RDS) and running load tests at every level of the stack.

Implement and regularly test DR strategies to ensure the highest level of resilience and fault tolerance of the platform.

Requirements

Essential

Strong background in Linux/Unix administration, e.g. Ubuntu, Debian

A strong background in at least one of Go, Python or Java

A strong background in one of the following: database administration, Kafka, observability tools (such as Prometheus or Zipkin) or infrastructure automation.

Experience with AWS or GCP is essential

Experience or knowledge of container orchestration tools, e.g. Kubernetes

Desirable

Experience in supporting production systems
Experience with automation/configuration management, e.g. Terraform, Puppet, Chef, Ansible

Benefits

Highly competitive salary
Pension plan (match up to 7%)
Life insurance - three times annual salary
Competitive maternity (six months fully paid) and paternity leave (four weeks fully paid)
Shared parental leave (matched to our maternity leave for the same point in time)
25 days holiday and bank holidays
Private health insurance with Bupa for you and your family
Health cash plan (including dental and optical)
Flexible working hours
Cycle-to-work scheme
Electric car scheme
Season ticket loan
Access to outstanding learning materials and courses
Sports and hobby clubs, subsidised by Thought Machine
All the latest tech you need
Start the day properly with fresh fruit and cereals
Huge range of healthy (and not-so-healthy) snacks, smoothies and drinks
A talented and experienced team as your colleagues
An environment where we encourage learning and progress
Two charity days a year
Weekly food pop-up

Thought Machine is committed to making a measurable positive impact on people's everyday lives. We are an equal-opportunity employer and value diversity at our company. We actively hire candidates who demonstrate technical excellence in their field and welcome people of all ages and backgrounds, providing everyone with equal access to professional development. You are encouraged to apply even if your experience doesn't exactly match the job description. We also encourage applications from those with different abilities, including candidates with ADHD, autism, dyslexia or dyspraxia.
