Senior Site Reliability Engineer - Hyderabad / Secunderabad, Telangana - confidential

    confidential
    confidential Hyderabad / Secunderabad, Telangana

    2 days ago

    Full time
    Description

    About the Role and the Team

    We are looking for an intelligent, resourceful, and highly skilled Senior Site Reliability Engineer (SRE) to join our Platform Site Reliability Engineering (PSRE) team. This team plays a critical role in ensuring the stability, reliability, and availability of mission-critical production applications on the Arcesium platform.

    The PSRE team is responsible for:

    1. Observability, monitoring, logging, and tracing to proactively detect and prevent issues.
    2. Building tools and infrastructure that enhance system stability and resilience.
    3. Troubleshooting live production issues with a deep focus on rapid incident resolution.
    4. Governing, declaring, managing, and recovering from platform-wide incidents to minimize downtime and business impact.

    As an SRE on this high-impact team, you will work under tight timelines in a high-pressure environment, where every second counts in resolving critical production incidents. This means you must be quick-thinking, highly analytical, and proactive in preventing and resolving disruptions.

    What You Will Do:

    • Incident Management: Serve as a primary contact and leader for incidents and critical issues impacting our platform during NY business hours. Take ownership of incidents, drive effective communication, and facilitate swift resolution by collaborating with relevant engineering teams.
    • Proactive Monitoring & Analysis: Continuously monitor the health and performance of our applications and infrastructure. Analyze trends, identify potential risks, and proactively implement measures to prevent incidents and improve overall system reliability.
    • Troubleshooting & Problem Solving: Troubleshoot complex technical issues across various layers of the stack (application, infrastructure, network). Use your analytical skills and technical expertise to identify root causes and implement effective solutions.
    • Collaboration & Communication: Work closely with engineering, development, and operations teams to ensure seamless collaboration during incident response and in proactive reliability initiatives. Communicate effectively with stakeholders at all levels, providing clear and concise updates on incidents and system status.
    • Automation & Optimization: Identify opportunities to automate tasks, improve operational efficiency, and enhance the resilience of our systems. Develop tools and scripts as needed to streamline processes and reduce manual intervention.
    • Continuous Improvement: Contribute to the ongoing development and improvement of our SRE practices, tools, and processes. Share your knowledge and expertise with the team to foster a culture of learning and growth.

    What We're Looking For:

    • Up to 5 years of experience in a Site Reliability Engineering (SRE), DevOps, or Production Engineering role, with a deep understanding of SRE principles and best practices.
    • Incident management expertise, including triaging, escalation, and resolution of high-severity outages.
    • Proficiency in at least one programming language (Python or Java) for automation and debugging.
    • Hands-on experience with Kubernetes (K8s) for managing and orchestrating containerized applications.
    • Cloud experience (AWS preferred) with exposure to key services like EC2, S3, Lambda, and CloudWatch.
    • Excellent communication skills to articulate technical challenges and solutions effectively.
    • Strong troubleshooting and problem-solving skills, with experience diagnosing complex production issues.
    • Ability to stay calm under pressure, multitask, and prioritize effectively in fast-moving environments.
    • Must be available to work in shifts.
    • Fluency in English (spoken and written) is required.
    • Must have the legal right to work in the country.

    Nice-to-Have Skills:

    • Experience with Terraform or CloudFormation for infrastructure-as-code.
    • Experience with monitoring tools (e.g., Datadog, Prometheus, Grafana).
    • Familiarity with web application architectures and best practices.
    • Exposure to CI/CD pipelines and DevOps workflows.

