Site Reliability Engineer - Gurugram - ValueFirst

    ValueFirst
    ValueFirst Gurugram

    9 hours ago

    Technology / Internet
    Description

    About the Job

    The Site Reliability Engineering (SRE) team is responsible for ensuring the reliability, scalability, and performance of large-scale telecom and CPaaS platforms. This role combines software engineering and systems operations to build resilient, observable, and automated infrastructure that supports high-throughput messaging services. The team operates in a 24/7 environment and works closely with Engineering, CX and Products to maintain carrier-grade service reliability.

    What you'll be responsible for

    • Ensure high availability, performance, and reliability of CPaaS production systems spread across multiple locations, hosted in the cloud and in data centers.
    • Own and improve SLIs, SLOs, and SLAs for messaging platforms and supporting services.
    • Monitor system health, latency, TPS, error rates, and delivery metrics using observability tools.
    • Participate in on-call rotations and handle production incidents with a focus on fast recovery and root cause analysis.
    • Deploy, configure, and optimize systems for high-throughput messaging across multiple channels.
    • Troubleshoot telecom-specific issues including DLR failures, encoding problems, TPS drops and routing issues.
    • Work directly with multiple teams for integrations, testing, and incident resolution.
    • Perform packet-level analysis using tcpdump and Wireshark to diagnose network and protocol-level issues.
    • Write and maintain shell scripts and automation to eliminate repetitive operational tasks and reduce human intervention.
    • Contribute to infrastructure automation using tools like Ansible and CI/CD pipelines where applicable.
    • Improve deployment, configuration, and rollback processes for messaging services.
    • Design and enhance monitoring, alerting, and dashboards using tools such as Datadog, Site24x7, ELK and Grafana.
    • Administer and troubleshoot Linux-based servers in production environments.
    • Manage and optimize MySQL and MongoDB databases including performance tuning, backups, and recovery.
    • Work on APIs and webhooks across products and services, including their enhancement and troubleshooting.
    • Maintain web and application servers such as Apache, Nginx, and JBoss (WildFly).
    • Support cloud-based and virtualized environments with exposure to auto-scaling and containerization concepts.
    • Collaborate with engineering teams on release planning, production deployments, and post-release validation.
    • Lead or contribute to incident response & RCA focusing on long-term reliability improvements.
    • Track issues, changes, and reliability work using Jira and related tools.

    What you'd have

    • B.Tech / B.E in Computer Science or related field with 2–3 years of experience in SRE, DevOps, telecom, or CPaaS operations.
    • Hands-on experience with SMS gateways and messaging workflows.
    • Solid understanding of Linux systems, networking fundamentals, and production troubleshooting.
    • Strong experience with MySQL & MongoDB administration, queries, and performance optimization.
    • Proficiency in shell scripting and a mindset toward automation and reliability engineering.
    • Hands-on experience with tcpdump, Wireshark, and protocol-level troubleshooting.
    • Experience with monitoring, logging, and alerting systems (Datadog, ELK, Grafana, Site24x7, etc.).
    • Familiarity with configuration management tools like Ansible and version control systems (Git).
    • Working knowledge of cloud platforms, virtualization, auto-scaling, and containerization.
    • Strong incident management, analytical thinking, and communication skills.
    • Certifications such as RHCE, AWS, or SRE-related credentials are a plus.

  • Work in company

    Reliability Engineer

    We are looking for experienced SREs who can deliver insights into system bottlenecks and ensure system reliability and scalability. · ...

    Gurugram, India

    1 week ago

  • Work in company

    Reliability Engineer

    Tower Research Capital is a leading quantitative trading firm founded in 1998. Tower has built its business on a high-performance platform and independent trading teams. We have a 25+ year track record of innovation and a reputation for discovering unique market opportunities. · ...

    Gurgaon, Haryana

    1 month ago

  • Work in company

    Reliability and Maintainability Engineer

    We are looking for an experienced Reliability and Maintainability (RAM) engineer proficient in doing reliability calculations. · Graduate in reliability engineering or Mechanical/Electrical graduate with a certification in Reliability engineering. · 5 years of experience required ...

    Gurugram, Haryana, India

    1 week ago

  • Work in company

    Site Reliability Engineer

    We are looking for a seasoned DevOps Engineer with a strong background in Kubernetes (k8s), AWS Cloud, and infrastructure automation to join our growing engineering team. · Manage and optimize multi-region Kubernetes clusters across multiple cloud platforms (e.g. AWS, Azure) to ensure ...

    Gurgaon, Haryana, India

    3 days ago

  • Work in company

    Application Reliability Engineer

    Graviton is a privately funded quantitative trading firm striving for excellence in financial markets research. We are seeking a skilled Application Reliability Engineer to be the first line of defense for ensuring the reliability, availability, and performance of our databases, ...

    Gurgaon, Haryana

    2 weeks ago

  • Work in company

    Site Reliability Engineer

    We are seeking an experienced Site Reliability Engineer to ensure the stability, scalability, and performance of our Enterprise Agentic AI platform. · ...

    Gurgaon, Haryana, India

    1 week ago

  • Work in company

    Site Reliability Engineer

    We're looking for a Site Reliability Engineer to help design and operate large-scale, distributed technology systems that power our identity applications. · Key Responsibilities · Reliability Engineering · Define and measure SLOs, SLIs, and error budgets for key services. · ...

    Gurugram, Hyderabad

    2 weeks ago

  • Work in company

    Application Reliability Engineer

    We are seeking a skilled Application Reliability Engineer to ensure the reliability, availability, and performance of our databases, services, and trading support systems. · Monitor production services and respond quickly to alerts. · Triage issues across trading support services ...

    Gurugram

    1 month ago

  • Work in company

    Site Reliability Engineer

    Join as a Site Reliability Engineer in an inclusive team with collaborative ethos and commitment to innovation and professional development. · ...

    Gurugram, India

    1 month ago

  • Work in company

    Site Reliability Engineer

    In this role you will play a crucial part in shaping the firm's infrastructure reliability and efficiency by implementing robust Site Reliability Engineering practices. · ...

    Gurugram, Panchkula

    1 month ago

  • Work in company

    Site Reliability Engineer

    We are looking for a skilled Site Reliability Engineer (SRE) to join our custom software engineering team. · ...

    Gurugram

    4 weeks ago

  • Work in company

    Site Reliability Engineer

    We are looking for a seasoned DevOps Engineer with a strong background in Kubernetes (k8s), AWS Cloud, and infrastructure automation to join our growing engineering team. · ...

    Gurugram

    1 month ago

  • Work in company

    Site Reliability Engineer

    This role involves implementing Site Reliability Engineering practices to ensure infrastructure reliability and efficiency. The candidate will work towards enhancing the reliability of systems and minimizing downtime. Key responsibilities include ensuring uptime and stability of ...

    Gurugram, Haryana

    1 month ago

  • Work in company · Remote job

    Site Reliability Engineer

    Job summary · Site Reliability Engineer Job Description · As a Site Reliability Engineer, you will play a key role in ensuring our systems remain reliable, available, and performant for both our customers and internal teams. Your expertise will directly impact our users' experience an ...

    Gurgaon, Haryana

    3 weeks ago

  • Work in company

    Site Reliability Engineer

    Join us as a Site Reliability Engineer in Gurugram. In this key role, you'll support improvement of non-functional characteristics like availability and performance. · You'll work alongside colleagues to meet defined service level objectives. · You'll contribute new ideas and i ...

    Gurugram

    1 month ago

  • Work in company

    Site Reliability Engineer

    The Site Reliability Engineering (SRE) team is responsible for ensuring the reliability, scalability, and performance of large-scale telecom and CPaaS platforms. This role combines software engineering and systems operations to build resilient, observable, and automated infras ...

    Gurugram

    5 days ago

  • Work in company

    Site Reliability Engineer

    We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization. We are responsible for monitoring the stability and availability of mission-critical production systems, · The experienced SRE will play a crucial role in ensuring the reliabi ...

    Gurugram

    3 weeks ago

  • Work in company

    Site Reliability Engineer

    Join us as a Site Reliability Engineer to support the improvement of non-functional and operational characteristics of our products and services. You'll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to deliver c ...

    Gurugram Full time

    1 month ago

  • Work in company

    Site Reliability Engineer

    Job summary · We are seeking a seasoned Site Reliability Engineer with a solid background in payment systems and high-availability architectures. The ideal candidate will have hands-on experience managing large-scale, distributed systems in production, · Responsibilities · Design, ...

    Gurugram

    1 month ago

  • Work in company

    Site Reliability Engineer

    We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization. · The team also manages and maintains internal tools/infra which is consumed by other development teams. · The experienced SRE will play a crucial role in ensuring the reliabi ...

    Gurugram

    3 weeks ago

  • Work in company

    Site Reliability Engineer

    confidential

    Work with customers to implement Observability solutions, build scalable systems, develop monitoring tools. · ...

    Gurgaon / Gurugram Full time

    4 days ago
