Site Reliability Engineer - Indore, India - Innogent Technologies

    Innogent Technologies Indore, India

    2 weeks ago

    Description

    POSITION SUMMARY:

    We are looking for an SRE to join our SaaS Technology and Operations (STO) team. This role is a match for you if you love building highly scalable, resilient, and automated services.

    We are an innovative team that aims to provide exceptional customer experience by leveraging best-in-class automation and orchestration practices for our SaaS platform. As a Site Reliability Developer 1, you will be part of the first level of contact with the STO team, focusing on improving the observability, scalability, stability, and security of our SaaS platform.

    We strive to hire people who are looking to make an impact and thrive in a flexible work environment driven by business objectives.

    RESPONSIBILITIES:

    · Support key ITIL processes, including incident management, request management, problem management, and change management.

    · Define and document runbooks and standard operating procedures.

    · Field operational requests from our Application Support team and other internal stakeholders.

    · Triage and solve issues within defined SLAs to ensure an excellent customer experience and to unblock other development and support teams.

    · Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.

    · Identify and troubleshoot problems, investigate root causes, and champion fixes across the organization.

    · Work with infrastructure-as-code (IaC) with a focus on continuous improvement.

    · Collaborate with cross-functional team members on features and implementation within an agile environment.

    · Report on SLAs and performance metrics as part of the Operations function.

    · Participate in on-call rotation.

    WHAT WE USE:

    Please note this reflects only a portion of our current technical stack, and we are constantly evolving and revisiting our stack as we grow:

    · A modern AWS cloud infrastructure managed through infrastructure-as-code (Terraform), configuration-as-code (Ansible), and CI/CD (Jenkins)

    · RDS MySQL, Redshift, Redshift Spectrum, MongoDB, and Elasticsearch

    · Kinesis, SQS, and RabbitMQ

    · DevOps tools written in Python

    · Back-end applications written using Java, Dropwizard, Spring Boot, and Hibernate

    · Front-end applications written using TypeScript, JavaScript, React (Context API and Hooks), and Redux

    · Monitoring with Datadog and CloudWatch

    SKILLS / KNOWLEDGE / EXPERIENCE / EDUCATION:

    · Bachelor's degree in computer science, software engineering, or equivalent experience.

    · 2+ years of experience in an IT Operations, DevOps, SRE, or Software Engineering role.

    · Experience with cloud computing services (AWS and Azure) and developing-level knowledge of setting up and managing cloud infrastructure.

    · You can write code - in any language. You have implemented your work in a production environment and can back it up with examples.

    · Experience with tools and platforms such as Ansible, build/release pipelines, Docker, GitHub, Terraform, etc.

    · Developing-level knowledge of distributed systems in the cloud, using observability and telemetry to oversee code deployments and service level objectives (SLOs).

    · Developing experience with the operational aspects of software systems using telemetry, centralized logging, and alerting with tools such as CloudWatch, Datadog, Prometheus, etc.

    DECISION MAKING & AUTHORITY:

    · Works on problems of limited scope. Follows standard practices and procedures in analyzing situations.

    · Executes specified routine tasks. Erroneous decisions or failure to achieve results will cause delays in schedules. Meets unit targets, typically against a weekly plan.