***This is an in-office position (5 days per week) located in NYC's Flatiron neighborhood and offered on a full-time, permanent basis.***
A Next Ventures client is seeking a Staff Level Data Engineer with infrastructure experience.
As a leader on the Data & Infrastructure team, you will design, build, and maintain the scalable infrastructure that supports AI research and products. In addition to architecting systems to handle large-scale data and computation, you will play a critical role in shaping our engineering culture and growing the early engineering team.
This role offers the opportunity to collaborate closely with co-founders and the head of engineering to define the vision for company infrastructure and ensure systems meet both internal and customer needs.
Key Responsibilities:
- Define and own the vision for data and infrastructure, ensuring alignment with business and product goals.
- Architect and implement scalable server infrastructure to meet customer demands and internal requirements.
- Build and optimize ETL pipelines for efficient data movement, transformation, and storage.
- Design and configure systems for optimal scaling, performance, and fault tolerance.
- Develop alerting and monitoring systems to ensure reliability and proactively identify issues.
- Collaborate with cross-functional teams, including product managers, researchers, and co-founders, to align infrastructure priorities.
- Foster a strong engineering culture and mentor early team members as the team scales.
Requirements:
- 5-10 years of experience designing and building data and/or core infrastructure; candidates with fewer years but evidence of exceptional ability will also be considered.
- Infrastructure Design and Scaling:
  - Proven experience designing scalable, fault-tolerant systems for large-scale data processing and real-time applications.
  - Expertise in server architecture and configuration for high availability and scalability.
- Data Engineering:
  - Proficiency in building ETL pipelines using modern tools such as Apache Airflow, dbt, or similar frameworks.
  - Experience with distributed data systems (e.g., Kafka, Spark, Hadoop) for batch and real-time processing.
- Cloud Platforms:
  - Deep knowledge of cloud services (e.g., AWS, GCP, or Azure), including compute, storage, and networking components.
  - Hands-on experience with containerization and orchestration (e.g., Docker, Kubernetes).
- Monitoring and Reliability:
  - Familiarity with observability tools such as Prometheus, Grafana, and the ELK stack for performance monitoring.
  - Experience setting up automated alerts and self-healing mechanisms for production systems.
- Programming and Automation:
  - Strong programming skills in Python, Go, or a similar language, with a focus on building maintainable, production-grade systems.
  - Expertise in Infrastructure-as-Code tools (e.g., Terraform, CloudFormation) for managing cloud infrastructure.
Preferred Qualifications:
- Experience working in fast-paced startup environments, ideally with early-stage teams.
- Familiarity with AI/ML infrastructure requirements, such as distributed training or model serving pipelines.
- A track record of mentoring engineers and helping shape technical culture within a growing team.