Lead Site Reliability Engineer T500-23189 - Hyderabad - Inspire

    Inspire
    Inspire Hyderabad

    1 week ago

    Tourism / Travel / Hospitality
    Description

    About Inspire Brands:

    Inspire Brands is disrupting the restaurant industry through digital transformation and operational efficiencies. The company's technology hub, Inspire Brands Hyderabad Support Center, India, will lead technology innovation and product development for the organization and its portfolio of distinct brands. The Inspire Brands Hyderabad Support Center will focus on developing new capabilities in data science, data analytics, eCommerce, automation, cloud computing, and information security to accelerate the company's business strategy. Inspire Brands Hyderabad Support Center will also host an innovation lab and collaborate with start-ups to develop solutions for productivity optimization, workforce management, loyalty management, payments systems, and more.

    POSITION SUMMARY:

    Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems enabling online ordering for thousands of restaurants across multiple brands. SRE ensures that Inspire Digital Platform (IDP) services have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance and are responsible for performing regular capacity planning exercises. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating toil through automation.

    ESSENTIAL JOB RESPONSIBILITIES:

    • Responsibility
    • Technical
    • Mentoring / Technical Escalations
    • Education

    Knowledge and Skills (General and Technical):

    • Review current workload patterns, understand the business case and prioritize areas of weakness within the platform through log and metric investigation as well as application profiling.
    • Work with senior engineering and testing team members to build tools and recommend testing strategies for problem prevention and detection.
    • Employ deep troubleshooting skills to improve availability, performance, and security, ensuring services are designed for 24/7 availability, operational readiness, and rigor.
    • Perform in-depth postmortems on production incidents to assess business impact and to help Engineering learn from them.
    • Create dashboards and alerts for monitoring the IDP platform: define key metrics and service level indicators, and ensure relevant metric data is collected to create actionable alerts for SRE and the Network Operations Center.
    • Participate in the 24/7 on-call rotation.
    • Automate toil by building software and automation for seamless application deployment and third-party tool integration.
    • Ensure the platform holds a high degree of reliability, at least three nines (99.9% availability).
    • Define non-functional requirements as part of the product lifecycle to influence the new designs, standards, and methods for scalable, highly available distributed systems.
    • Own technically intricate issues that cut across DevOps, databases, networking, code, infrastructure, and people; drive them to satisfactory completion.
    • Provide recommendations and feedback in design reviews and review sessions.
    • Mentor and guide junior members of the team.
    • Identify gaps and create a curated technology learning path for team members.
    • Troubleshoot and triage technical roadblocks for scheduled deliverables.

    KNOWLEDGE, SKILLS AND ABILITIES:

    • 4-year degree in Computer Science, Information Technology, or a related field.
    • Minimum 10 years of experience as a Software Engineer, Platform, SRE, or DevOps engineer supporting large-scale SaaS production B2C or B2B cloud platforms.
    • Hands-on problem-solving and troubleshooting.
    • Development skills in Java, TypeScript, and Python; OOP expertise is a must.
    • Hands-on Azure Cloud experience, particularly with AKS, API Management, Azure Cache for Redis, Azure Blob Storage, Cosmos DB, Service Bus, and Azure Functions.
    • Proficiency in monitoring, APM, and profiling tools such as New Relic, Splunk, Prometheus, and Grafana.
    • Working experience with containers, Kubernetes and Helm.
    • Functional knowledge of cloud networking, firewalls, ingress and egress controllers, and service mesh, plus experience with Auth0, secret management, and Cloudflare features (CDN, Load Balancer, Cache, Firewall, Workers).
    • Experience with Argo CD, GitLab, CI/CD, Terraform, and Infrastructure as Code.
    • Strong communication skills and the ability to explain technical concepts clearly.
    • Willingness to dive into understanding, debugging, and improving any layer of the stack.

    Technical Skills:

    Competency level of 4 on a scale of 5 for the skills listed below.

    • Java Application Support / Development.
    • Cloud Provider: Azure.
    • Core Services: Elastic pool, SQL, Application Gateway, API Management (APIM), Key Vaults, AKS (Azure Kubernetes Service), VMSS (Virtual Machine Scale Sets), VM.
    • Networking: NSG (Network Security Groups), Private Endpoints, Private Link Service, VNet, Subnets, WAF (Web Application Firewall), Geo Replication.
    • Storage: Storage Accounts.
    • Messaging and Events: Event Hubs, Event Grid, Azure Service Bus (Namespaces, Queues, Topics).
    • Identity and Security: Managed Identities / Workload Identities, Private DNS, Auth0.

    Containerization and Orchestration:

    • Kubernetes (K8s): For container orchestration.

    Monitoring and Observability:

    • New Relic / Splunk.

    Automation and Scripting:

    • PowerShell
    • Python

    Other requirements (licenses, certifications, specialized training):

    Good to have certifications:

    • Certified Kubernetes Administrator / Developer
    • AZ-104 (Microsoft Certified: Azure Administrator Associate)
    • AZ-305: Designing Microsoft Azure Infrastructure Solutions
