Pay rate range: $60 - $62
Job Title: Cloud Operations Lead
Location: Rockville, MD - Onsite from day 1
Duration: 6 months - Contract to hire
Job Description:
- Oversee the management and maintenance of cloud infrastructure, ensuring high availability and reliability. Act as the primary point of contact for all cloud infrastructure-related issues and escalations.
- Ensure cloud resources are optimally configured and managed to meet performance and cost objectives.
- Implement and maintain monitoring solutions to track the health and performance of cloud infrastructure.
- Drive major and potential incidents end to end, providing periodic updates to client stakeholders for approvals and recommendations.
- Ensure due diligence and impact analysis for all the changes that get implemented in the cloud platforms.
- Lead and mentor a team of cloud engineers and administrators, fostering a collaborative and high-performing work environment.
- Provide guidance and support to team members, facilitating their professional development and growth.
- Coordinate and manage the team's daily activities, ensuring alignment with organizational goals and priorities.
- Lead the response to cloud-related incidents, ensuring timely resolution and minimal impact on business operations.
- Develop and implement incident management processes and procedures.
- Perform root cause analysis and implement preventive measures to avoid recurrence of issues.
- Identify opportunities to automate repetitive tasks and processes to improve efficiency and reduce operational overhead.
- Develop and implement automation scripts and tools, leveraging Infrastructure as Code (IaC) practices.
- Continuously evaluate and improve cloud operations processes and procedures.
- Ensure cloud infrastructure adheres to security policies, standards, and best practices.
- Implement and maintain security controls to protect cloud resources and data.
- Ensure compliance with regulatory requirements and industry standards (e.g., GDPR, HIPAA).
- Monitor and analyze cloud resource usage, ensuring efficient utilization and avoiding over-provisioning.
- Conduct capacity planning to support future growth and demand.
- Implement cost management strategies to optimize cloud spending.
- Develop and implement disaster recovery and business continuity plans for cloud infrastructure.
- Ensure regular testing and validation of disaster recovery procedures.
- Ensure cloud infrastructure is resilient and can recover quickly from failures or disruptions.
- Work closely with other IT teams, business units, and stakeholders to understand requirements and deliver cloud solutions that meet their needs.
- Collaborate with vendors and service providers to evaluate and integrate new cloud technologies and services.
- Communicate effectively with stakeholders, providing regular updates on cloud operations and performance.
- Maintain comprehensive documentation of cloud infrastructure, configurations, processes, and procedures.
- Generate regular reports on cloud performance, incidents, and operational metrics.
- Ensure documentation is up-to-date and accessible to relevant stakeholders.
Here are some of the detailed responsibilities, primarily for the AWS environment, followed by the Azure and OCI environments.
IAM and User Management
- IAM administration for new and existing users.
- Managing IAM and cloud SSO/Organization/Permission Sets.
Cloud Services Management
- Managing Cloud Services (EC2, EKS, ELB, etc.).
- Managing Cloud Native Network (VPC, Transit Gateway, Route 53, API Gateways, CDN).
- Managing Cloud Native Storage (FSx, EFS, Lustre, EBS, and other options).
- Managing cloud native autoscaling and load balancers.
- Managing public Cloud WAF/Imperva administration.
- Managing CloudTrail, Event Hub, Guardrails.
- Managing Cloud STS Token Services.
- Managing cloud Management Services (ARM, CFT, Systems Manager functions).
- Managing cloud Config.
- Managing SQS, SNS, Kinesis.
- Managing cloud SIEM integrations.
- Managing Cloud Patching.
Cloud Resource and Cost Management
- Managing Cloud Cost Management.
- Managing Cost Explorers.
- Managing Cloud Management Group, IDCS, Organizations, user Groups.
- Managing Auto Scaling Policies.
- Managing AWS/Azure/Oracle Backups.
- Managing cloud Log Insights, log groups, CloudWatch services, and other cloud monitoring services.
- Managing Step Functions.
- Managing CodePipeline, CodeBuild.
- Managing application migration services.
- Managing AWS Lambda.
- Managing DynamoDB and cloud Database Management (infrastructure).
- Managing Cloud Object Storages.
Cloud Automation and DevOps
- Managing cloud Tags.
- Managing Cloud Rekognition services.
- Managing Cloud Transcribe Services.
- Managing Cloud Comprehend Services.
- Managing EKS and other Micro services across the platform.
- Managing cloud Monitoring and Logging across cloud (AWS/AZ/OCI).
- Managing Cloud Elasticsearch.
- Managing Cloud API management.
- Managing Serverless Computing.
- Managing cloud CDN.
- Managing Machine learning and AI.
- Managing Cloud Data Management and Analytics.
Project Activities
- Disaster Recovery Test activities.
- Additional DR Test activities due to Customer Application Requirements.
- Automation on cloud infra Build/Image Build Process.
- Automation on Patch/Inventory/Tag Management.
- Additional Application Deployment.
- Implementing, Managing, and Automating all Cloud enabled Services.
- Managing Image Builder - Create Process for Regular Updates.
- Managing Azure Automation for creating and managing automated tasks and runbooks.
- Managing Automation via Azure DevOps and GitHub.
- Managing Azure DevOps Pipelines.
- Managing Azure DevOps, Repos, Projects and Organizations.
- Managing GitHub and GitLab.
Cloud Security Services
- AWS GuardDuty
- AWS WAF / Imperva WAF
- AWS Inspector
- Key Vault + KMS + Secrets management
- AWS Macie
- Security groups
- Rapid7
- Azure - NSG, Routing tables.
- OCI - VCN, Native Firewalls
OCI Cloud Operations
- OCI Infrastructure Management, GoldenGate, RAC Clusters, Guard Duty and Essbase Services support.
Azure Cloud Operations
- Azure Subscription, Resource groups, Storage Accounts, Networking and License Management, SSO, App proxy, NDES, AAD Sync.
Qualifications we seek in you!
Minimum Qualifications / Skills
- Bachelor's degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Advanced degrees or relevant professional training are a plus.
- Solid experience in system administration and cloud operations, including time in a leadership or senior technical role.
- Proficiency in the AWS cloud platform. Strong working experience in the Azure and OCI cloud platforms.
- Strong understanding of cloud architecture, services, and best practices.
- Experience with cloud management and monitoring tools.
- Proficiency in scripting and automation (e.g., PowerShell, Python, Terraform, Ansible Playbooks/Ansible Tower, CloudFormation, Puppet, Chef).
- Strong knowledge of cloud security principles and practices.
- Proficiency in Windows/Linux Server administration and management.
- Proficiency and working experience in VMware/AD and Azure AD SSO platforms.
- Strong networking skills - DNS, DHCP, PKI and LAN/WAN protocol understanding.
- Effective communication and interpersonal skills, with the ability to interact with stakeholders at all levels.
- Experience in vendor management and contract negotiations.
- A proactive approach to continuous improvement and innovation in data center operations.
Preferred Certifications and experience
- Cloud certifications such as AWS Certified Solutions Architect - Associate or Professional, or Microsoft Certified: Azure Solutions Architect Expert.
- Experience with DevOps practices and tools (CI/CD, Jenkins, Git).
- Familiarity with ITIL or other IT service management frameworks.
- Excellent communication and collaboration skills, with the ability to effectively interact with technical and non-technical stakeholders at all levels of the organization.
- Strong analytical and problem-solving skills, with the ability to identify root causes of issues and implement effective solutions in a timely manner.
- Proven ability to work independently as well as part of a team, with a proactive and self-motivated attitude towards achieving project goals.