Location: Sacramento, CA, USA (hybrid role)
Duration: 1 year
The Data Engineer for Diagnostics is responsible for the development and implementation of end-to-end Ops pipelines that support ML model deployment throughout the entire ML lifecycle. This position is part of the data science team located in Sacramento, California, and is a hybrid role. The data engineer will be part of the development science functional group and report to the data science manager. If you thrive in a cross-functional team and want to help build a world-class biotechnology organization, read on.

Responsibilities
• Collaborate with stakeholders to understand data requirements for ML, Data Science, and Analytics projects.
• Assemble large, complex data sets from disparate sources, writing code, scripts, and queries as appropriate to efficiently extract, QC, clean, harmonize, and visualize Big Data sets.
• Write pipelines for optimal extraction, transformation, and loading of data from a wide variety of data sources using Python, SQL, Spark, and AWS 'big data' technologies.
• Design and develop data schemas to support the Data Science team's development needs.
• Identify, design, and implement continuous process improvements, such as automating manual processes and optimizing data delivery.
• Design, develop, and maintain a dedicated ML inference pipeline on the AWS platform (SageMaker, EC2, etc.).
• Deploy inference on a dedicated EC2 instance or Amazon SageMaker.
• Establish a data pipeline to store and maintain inference output results in order to track model performance and KPI benchmarks.
• Document data processes, write recommended data management procedures, and create training materials on data management best practices.

Required Qualifications
• BS or MS in Computer Science, Computer Engineering, or equivalent experience.
• 5-7 years of Data and MLOps experience developing and deploying Data and ML pipelines.
• 5 years of experience deploying ML models via AWS SageMaker, AWS Bedrock.
• 5 years of programming and scripting experience using Python, SQL, and Spark.
• Deep knowledge of AWS core services such as RDS, S3, API Gateway, EC2/ECS, Lambda, etc.
• Hands-on experience with model monitoring, drift detection, and automated retraining processes.
• Hands-on experience with CI/CD pipeline implementation using tools like GitHub (Workflows and Actions), Docker, Kubernetes, Jenkins, and Blue Ocean.
• Experience working in an Agile/Scrum-based software development structure.
• 5 years of experience with data visualization and/or API development for data science users.