Site Reliability Engineer - Mumbai, India - RXO, Inc.

    RXO, Inc. Mumbai, India

    1 week ago

    Transportation / Logistics
    Description

    We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our systems and applications. You will work closely with cross-functional teams to design, implement, and maintain robust infrastructure and automation solutions.

    Responsibilities:

    Design, build, and maintain scalable and reliable infrastructure solutions.

    Implement automation tools and processes to streamline operations and improve efficiency.

    Monitor system performance and troubleshoot issues to ensure high availability and reliability.

    Collaborate with development teams to design and deploy applications in production environments.

    Conduct root cause analysis (RCA) and implement preventive measures to minimize downtime and outages.

    Develop and maintain documentation, runbooks, and playbooks for operational processes.

    Participate in on-call rotations and respond promptly to incidents and emergencies.

    Implement best practices for security, compliance, and disaster recovery.

    Continuously evaluate and improve system performance, reliability, and scalability.

    Requirements:

    Bachelor's degree in Computer Science, Engineering, or a related field.

    Proven experience as a Site Reliability Engineer or in a similar role.

    Strong knowledge of cloud platforms, such as Google Cloud Platform (GCP) or Amazon Web Services (AWS).

    Experience with infrastructure as code (IaC) tools, such as Terraform or Puppet.

    Proficiency in scripting and programming languages, such as Python, Go, or Bash.

    Hands-on experience with monitoring and observability tools, such as New Relic, Grafana, or Kibana.

    Solid understanding of containerization technologies, such as Docker and Kubernetes.

    Excellent troubleshooting and problem-solving skills.

    Strong communication and collaboration skills.

    Ability to work effectively in a fast-paced and dynamic environment.

    Preferred Qualifications:

    Experience with chaos engineering tools, such as Gremlin or Chaos Mesh.

    Knowledge of machine learning and artificial intelligence (AI) technologies.

    Certification in cloud platforms, such as Google Cloud Certified Professional Cloud Architect or AWS Certified Solutions Architect.