Site Reliability Engineer - Bengaluru, India - Mitr HR Solution

    2 weeks ago

    Full time
    Description

    Job Overview:

    Design and build reliable systems: SREs are responsible for designing and building IT systems and applications that are reliable, scalable, and efficient.

    Monitor system performance: SREs monitor system performance and proactively identify and resolve issues before they impact users.

    Develop and maintain automation tools: SREs develop and maintain automation tools to streamline IT operations and reduce manual intervention.

    Collaborate with development teams: SREs collaborate closely with development teams to ensure that systems are designed and built with reliability and scalability in mind.

    Primary Skills:

    Bachelor's degree in Computer Science, Information Technology, or a related field.

    Minimum of [insert number] years of experience as an SRE, with a focus on AWS.

    Strong knowledge of AWS services, such as EC2, RDS, S3, and CloudWatch.

    Experience with infrastructure as code tools such as Terraform, CloudFormation, and AWS CLI.

    Strong experience with automation and scripting tools and languages such as Ansible, Go, and Java.

    Excellent analytical and problem-solving skills, with the ability to quickly identify and resolve issues.

    Strong communication and collaboration skills, with the ability to work effectively with other IT teams and stakeholders.

    AWS certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or AWS Certified SysOps Administrator are a plus.

    Good to have Skills:

    List additional skills which might help

    Responsibilities and Duties:

    Design, build, and maintain highly available and scalable AWS infrastructure, including EC2, RDS, S3, and other services as needed.

    Implement and maintain automation tools for AWS infrastructure, using tools such as Terraform, CloudFormation, and the AWS CLI, and languages such as Go or Java.

    Monitor and analyze system performance, identifying and resolving issues proactively.

    Develop and maintain monitoring and alerting systems for AWS infrastructure, using tools such as CloudWatch and Prometheus.

    Collaborate closely with development teams to ensure that applications are designed and built with reliability and scalability in mind.

    Implement effective incident response processes, ensuring that incidents are quickly detected, escalated, and resolved.

    Work closely with other IT teams, such as network and security teams, to ensure that AWS infrastructure is secure, stable, and performant.

    Continuously evaluate and improve AWS infrastructure, identifying areas for optimization and implementing improvements to increase reliability and efficiency.

    Provide support to internal and external customers, responding to inquiries and resolving issues in a timely and effective manner.

    Stay up to date with AWS trends and best practices, continuously learning and incorporating new approaches and technologies to improve IT operations.

    Keywords

    AWS, NoSQL, Cassandra, Aerospike, Kubernetes, Datadog, CI/CD, Jenkins