About the job:
We are developing an advanced learning-based robot designed for real-world deployment in industries such as manufacturing, warehousing, and logistics to address critical labor shortages and help businesses optimize operations.
WHO WE'RE LOOKING FOR:
We're looking for a highly talented Research Scientist with deep expertise in developing and fine-tuning foundation models, particularly in the areas of VLMs, VLAs, Vision Transformers, and multi-modal data. The ideal candidate is passionate about tackling difficult AI challenges, such as enabling a learning-based robot to perform dynamic and dexterous tasks in unstructured environments.
Key Responsibilities
- Build foundation models for vision, language, and action that exhibit strong reasoning and maneuvering capabilities, drawing on a deep understanding of transformer-based ML architectures.
- Design, train, and deploy learning-based perception models for on-robot perception systems, using multi-modal learning to capture semantics such as segmentation, object detection, scene understanding, and tracking.
- Work with ML infrastructure engineers to assess and monitor model performance, and to analyze and resolve performance bottlenecks.
- Collaborate with cross-functional teams to understand real-world problems, define tasks, and incorporate those insights into ML products.
- Produce high-quality code, participate in code reviews to ensure code quality, and share knowledge with the team.
- Work comfortably with SQL queries and ETL logic for data ingestion.
Qualifications
- MS/PhD in Computer Science or a similar technical field with a focus on ML/DL or Robotics, plus a minimum of 5 years of industry experience, or equivalent practical experience.
- Minimum of 2 years of industry experience training and shipping ML models to production and managing their lifecycle.
- Deep understanding of the fundamentals of computer vision, machine learning, and deep learning.
- Strong C++ and Python programming skills for writing efficient and robust code.
- Experience with multiple sensor modalities such as LiDAR, mono/stereo cameras, IMU, etc.
- Strong communication skills.
What makes you stand out
- Publications at top conferences or in journals such as CVPR, NeurIPS, ICCV, TPAMI, TRO, etc.
- Demonstrated proficiency in tackling robotics and computer vision challenges in at least two of the following domains: multi-sensor feature extraction and fusion, object detection and tracking, 3D estimation, and embodied AI with transformer-based models.
- Familiarity with edge-device perception stack deployment, and experience with NVIDIA software libraries such as CUDA or TensorRT.
- Contributions to open-source projects.
- Experience with GCP or AWS, Kubernetes, and Docker.