DevOps Site Reliability Engineer - India - Steadfast IT Consulting
Job Title: DevOps Site Reliability Engineer (SRE)
Location: Remote
Overview:
We are seeking a skilled and experienced DevOps Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a strong background in software engineering, system administration, and automation, with a focus on ensuring the reliability, scalability, and performance of our systems. As a DevOps SRE, you will play a critical role in designing, implementing, and maintaining our infrastructure, as well as optimizing our development and deployment processes.
Responsibilities:
- System Architecture and Design:
  - Design, implement, and maintain scalable and reliable infrastructure solutions.
  - Work closely with development teams to architect applications for optimal performance and reliability.
- Automation and Continuous Integration/Continuous Deployment (CI/CD):
  - Develop and maintain automated deployment pipelines for applications and infrastructure.
  - Implement and manage CI/CD processes to enable rapid and reliable software releases.
- Monitoring and Alerting:
  - Establish and maintain monitoring and alerting systems to ensure the health and performance of our applications and infrastructure.
  - Respond to alerts, troubleshoot issues, and implement proactive solutions to prevent recurrence.
- Incident Response and Disaster Recovery:
  - Participate in incident response activities, including root cause analysis, mitigation, and post-mortem reviews.
  - Develop and maintain disaster recovery plans and procedures to ensure business continuity.
- Capacity Planning and Performance Optimization:
  - Monitor system performance and resource utilization, and plan for capacity upgrades as needed.
  - Identify and implement optimizations to improve system performance and efficiency.
- Security and Compliance:
  - Implement and maintain security best practices and controls to protect against threats and vulnerabilities.
  - Ensure compliance with relevant regulations and standards, such as GDPR and SOC 2.
- Documentation and Knowledge Sharing:
  - Create and maintain detailed documentation of systems, processes, and procedures.
  - Share knowledge and best practices with team members to foster collaboration and continuous improvement.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or related field.
- 3+ years of experience in a DevOps or SRE role, with a focus on designing, implementing, and maintaining infrastructure and automation solutions.
- Strong proficiency in scripting and programming languages such as Python, Shell, or Go.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Hands-on experience with containerization and orchestration technologies such as Docker and Kubernetes.
- Proficiency in configuration management tools such as Ansible, Puppet, or Chef.
- Solid understanding of networking concepts, including TCP/IP, DNS, and HTTP(S).
- Excellent problem-solving skills and the ability to troubleshoot complex issues under pressure.
- Strong communication and collaboration skills, with the ability to work effectively in a fast-paced team environment.
Preferred Qualifications:
- Master's degree in Computer Science, Engineering, or related field.
- Relevant certifications such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or Certified Site Reliability Engineer (SRE).
- Experience with infrastructure as code (IaC) tools such as Terraform or CloudFormation.
- Knowledge of logging and monitoring solutions such as ELK Stack, Prometheus, or Grafana.
- Familiarity with Agile methodologies and DevOps practices.
- Experience with database administration and optimization.
How to Apply:
Please send your resume and cover letter to
In your cover letter, please highlight your relevant experience and why you are interested in joining our team as a DevOps Site Reliability Engineer. We look forward to hearing from you.