    Site Reliability Engineer - Pune, India - PubMatic

    PubMatic Pune, India

    Found in: Talent IN 2A C2 - 3 days ago

    Description

    As a Site Reliability Engineer (SRE), you will be responsible for the Activate and Production infrastructure. Your essential duties are to ensure the seamless operation and optimal performance of large-scale distributed software applications. The role centers on maintaining a robust, high-performing environment, contributing to the reliability of our services, and developing solutions that support 24/7 availability. By applying your technical expertise, you will help maintain a seamless experience for our users while upholding high standards of operational excellence. Your specific responsibilities include:

    Role and Responsibilities:

    1. Monitoring and Alerting

    a. Review existing monitoring tools and systems, and set up new ones as needed, to track system performance and key metrics.

    2. Incident Management

    a. Monitor alerts and logs to promptly identify incidents or anomalies.

    b. Prioritize incidents based on severity and potential impact on stability and reliability.

    c. Engage in effective incident resolution, applying necessary fixes and mitigations to restore normal operations.

    3. On-Call Responsibilities

    a. Organize on-call schedules to ensure 24/7 coverage for incident response.

    b. Respond to alerts, troubleshoot issues, and coordinate with NOC and Engineering teams for incident resolution.

    c. Conduct post-incident reviews to identify root causes, learn from incidents, and implement preventive measures.

    4. Automation and Tooling

    a. Review existing automation scripts and tools, and build new ones as needed, to streamline repetitive tasks, enhance efficiency, and reduce manual errors.

    b. Regularly update and maintain tools used for monitoring, deployment, and incident management to align with evolving needs.

    5. Performance Optimization

    a. Analyze application performance using profiling and monitoring tools to identify bottlenecks and areas for improvement.

    b. Work on optimizations, infrastructure upgrades, and architectural improvements to enhance system performance and efficiency.

    6. Capacity Planning and Scaling

    a. Monitor resource utilization and trends to predict capacity needs and plan for scaling.

    b. Scale resources, such as servers and databases, based on usage patterns and anticipated growth to maintain performance and reliability. Also, automate the entire sizing process.

    7. Disaster Recovery and Redundancy

    a. Develop and maintain disaster recovery plans and procedures to ensure business continuity in case of failures or disasters.

    b. Implement redundancy and failover strategies to minimize downtime and maintain service availability during failures.

    8. Knowledge Sharing and Documentation

    a. Create and maintain comprehensive documentation for configurations, procedures, incidents, and best practices.

    b. Foster a culture of knowledge sharing within the team, conducting regular knowledge-sharing sessions and training programs.

    9. Feedback Loop and Continuous Improvement

    a. Collect feedback from incidents, post-mortems, and NOC/Dev team interactions to identify areas for improvement.

    b. Iterate on processes, tools, and systems based on feedback and lessons learned to drive continuous improvement.

    10. Collaboration and Communication

    a. Collaborate closely with Engineering and DC/NOC teams to align goals and priorities.

    b. Ensure open and transparent communication within the team and with stakeholders, providing regular updates on incidents, progress, and initiatives.

    Required Skills and Qualifications

    • Bachelor's degree in computer science or related disciplines
    • 3+ years of total experience in software application/product support
    • Ability to program in languages like Go and in scripting languages like Shell or Python
    • Prior experience in technical engineering is good to have
    • A proactive approach to identifying problems, performance bottlenecks, and areas for improvement
    • Must know networking, database (MySQL), and Linux system concepts, as well as debugging and analyzing core dumps
    • Hands-on experience with monitoring and observability tools like Grafana, Nagios, Influx, ELK, etc.
    • Familiarity with orchestration tools like Docker and Grafana and incident management systems like Zenduty
    • Excellent communication and collaboration skills, with the ability to work effectively across teams.
    • A self-motivated, positive mindset when examining incidents

  • UST

    Reliability Engineer

    Found in: Talent IN 2A C2 - 3 days ago


    UST Pune, India

    Mandatory Skills – Reliability Test Planning and Reporting, Ultrasound (Plus), Medical Device Domain, · Experienced in R&D environment. Preferable in a Regulated environment (Medical, Automotive or Aerospace/Defense). · Strong in Reliability Engineering Fundamentals, proficient ...

  • Philips

    Reliability Engineer

    Found in: beBee S2 IN - 5 days ago


    Philips Pune, India Full time

    Job Title · Reliability Engineer · Job Description · Your challenge · Do you want to be a transformation leader, teaming up with over 20 Quality and Reliability professionals to support all Philips businesses globally with End-to-end (E2E) product Quality and Reliability? · Do you ...

  • Seagate

    Engineer - Reliability

    Found in: Talent IN C2 - 6 days ago


    Seagate Pune, India Regular

    The Reliability Engineering Team (part of the Product Assurance & Customer Advocacy organization) is accountable and responsible for reliability prediction, modeling, Design for Reliability (DFR), and Design Quality Assurance (DQA) of NPI products in Seagate's Systems, SSD, and H ...

  • SLB

    Site Reliability Engineer

    Found in: Talent IN 2A C2 - 3 days ago


    SLB Pune, India

    About us · We are a global technology company, driving energy innovation for a balanced planet. · Together, we create amazing technology that unlocks access to energy for the benefit of all. · Our inclusive culture is the key to our success. We collaborate with our internal com ...

  • HCLSoftware

    Site Reliability Engineer

    Found in: Talent IN 2A C2 - 3 days ago


    HCLSoftware Pune, India

    The Role: · HCL BigFix is looking for a Site Reliability Engineer to work on infrastructure for a new · product that will help keep our customers' end points secure. You will be a part of a team · that leverages modern technological solutions to drive growth and efficiency. Your ...

  • PhonePe

    Site Reliability Engineer

    Found in: Talent IN 2A C2 - 14 hours ago


    PhonePe Pune, India

    SRE SYSTEMS · JOB DESCRIPTION: · We are looking for engineers who are passionate about reliability, performance, and efficiency, · and with experience in building tools, services, and automation to manage and improve · production services. · Systems internals/security, Linux, Net ...

  • LTIMindtree

    Site Reliability Engineer

    Found in: Talent IN 2A C2 - 3 days ago


    LTIMindtree Pune, India

    About the Job: · Position: SRE Devops · Location: Chennai/Bangalore/Hyderabad/Pune/Mumbai · Experience: 5 to 8 Years only · Primary Skill- SRE, Dynatrace, Prometheus, Grafana, Kubernetes, AWS Native components, CloudWatch, (Puppet/ Chef/Ansible), CDK · Responsibilities · • Engage ...

  • Arista Networks

    Site Reliability Engineer

    Found in: Appcast Linkedin IN C2 - 1 day ago


    Arista Networks Pune, India

    Site Reliability Engineers at Arista are critical team members that have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden and support our service delivery processes. This can include building and managing CI/C ...

  • SLB

    Site Reliability Engineer

    Found in: Talent IN 2A C2 - 14 hours ago


    SLB Pune, India

    About us · We are a global technology company, driving energy innovation for a balanced planet. · Together, we create amazing technology that unlocks access to energy for the benefit of all. · Our inclusive culture is the key to our success. We collaborate with our internal commun ...

  • Roche

    Site reliability engineer

    Found in: Talent IN C2 - 3 days ago


    Roche Pune, India Full time

    The Position · KEY ROLES & RESPONSIBILITIES (required): · Responsibilities: · Design, implement, and maintain site reliability engineering (SRE) practices that ensure the reliability and performance of our production systems. · Design and implement SRE practices that align wit ...

  • Mobile Programming LLC

    Site Reliability Engineer

    Found in: Talent IN 2A C2 - 3 days ago


    Mobile Programming LLC Pune/Maharashtra, India permanent

    Location : Pune · NP : Immediate / Serving Notice Period · Years of Experience : 12+ · Role : Site Reliability Engineer · Mandatory Skill : Java, GCP, AWS, CICD · Job Description : · Requirements : · Minimum 12+ years experience as a Site Reliability engineer supporting diff ...

  • Ensono

    Site Reliability Engineer

    Found in: Talent IN C2 - 5 days ago


    Ensono Pune, India

    About Us (Ensono) · Ensono is an expert technology adviser and managed service provider. As a relentless ally, we accelerate clients' digital transformation to achieve business outcomes that stand to last. Our dedicated team helps organizations optimize today's systems across an ...

  • TripleLift

    Application Reliability Engineer

    Found in: Talent IN 2A C2 - 3 days ago


    TripleLift Pune, India permanent

    TripleLift is seeking an Application Reliability Engineer to contribute to our technical escalations operations. This candidate will focus on ensuring that external clients are incredibly satisfied with TL platform and internal stakeholders are properly leveraged to execute. This ...

  • TSYS

    Site Reliability Engineer

    Found in: Talent IN C2 - 5 days ago


    TSYS Pune, India Full time

    Every day, Global Payments makes it possible for millions of people to move money between buyers and sellers using our payments solutions for credit, debit, prepaid and merchant services. Our worldwide team helps over 3 million companies, more than 1,300 financial institutions an ...

  • GfK

    Site Reliability Engineer

    Found in: Talent IN C2 - 5 days ago


    GfK Pune, India Full time

    Description · About You · You are a DevOps or Site Reliability Engineer with a passion for cloud infrastructure and automation. You're a self-starter and you love keeping up to date with the latest developments in cloud, configuration management and container technologies. You u ...

  • Hansen Technologies

    Site Reliability Engineer

    Found in: Talent IN C2 - 6 days ago


    Hansen Technologies Pune, India Full time

    About The Role · If you are an experienced Site Reliability Engineer join our team in Pune location to become a driving force in ensuring the reliability, performance, and scalability of our systems. As an SRE, you'll be more than just a technical expert, you'll be a creative p ...

  • FIS

    Site Reliability Engineer

    Found in: Talent IN C2 - 3 days ago


    FIS Pune, India Experienced (relevant combo of work and education)

    Position Type : · Full time Type Of Hire : · Experienced (relevant combo of work and education) Education Desired : · Associate's Degree Travel Percentage : · 0% Site Reliability Engineer (SRE) · Are you curious, motivated, and forward-thinking? At FIS you'll have ...

  • Jobs for Humanity

    Site Reliability Engineer

    Found in: Talent IN C2 - 3 days ago


    Jobs for Humanity Pune, India Full time

    Job Description · Position Type : · Full time Type Of Hire : · Experienced (relevant combo of work and education) Education Desired : · Associate's Degree Travel Percentage : · 0% Site Reliability Engineer (SRE) · Are you curious, motivated, and forward-thinking? At FIS you'll ...

  • PubMatic

    Site Reliability Engineer

    Found in: Talent IN C2 - 3 days ago


    PubMatic Pune, India

    PubMatic (Nasdaq: PUBM) is an independent technology company maximizing customer value by delivering digital advertising's supply chain of the future. · PubMatic's sell-side platform empowers the world's leading digital content creators across the open internet to control access ...

  • TripleLift

    Application Reliability Engineer

    Found in: Talent IN C2 - 3 days ago


    TripleLift Pune, India

    The Role · TripleLift is seeking an Application Reliability Engineer to contribute to our technical escalations operations. This candidate will focus on ensuring that external clients are incredibly satisfied with TL platform and internal stakeholders are properly leveraged to e ...