Monitoring Engineer - Chennai, India - Photon

    Photon
    Description

    The Monitoring Engineer will be responsible for designing, configuring, implementing, monitoring, and maintaining our observability solutions, and for troubleshooting IT systems and applications to ensure optimal performance and reliability. You will work closely with cross-functional teams to identify potential issues and provide insights that improve system performance, stability, and availability. The engineer will also be responsible for automating alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.

    Responsibilities:

    Configure and maintain monitoring and observability tools and systems (Azure and Dynatrace).

    Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.

    Automate alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.

    Conduct capacity planning and forecasting to ensure scalability and optimal performance of IT systems and applications.

    Collaborate with cross-functional teams to support incident management, change management, and problem management processes.

    Create and enhance our customer experience through application monitoring, identifying and resolving issues in real time before they affect users.

    Implement APM (Application Performance Monitoring) solutions in cloud and on-premises infrastructure.

    Own and create monitoring solutions that cover all platforms, applications, jobs, and the IA ecosystem.

    Qualifications:

    Mandatory:

    3+ years of experience with Dynatrace, including installing agents and forwarders, working with APIs, and setting up alerts, dashboards, and data trend analysis in a performance monitoring tool.

    Understanding of data platforms, batch jobs, and ETL processes.

    Experience performing system health checks to ensure 100% availability of Splunk, Dynatrace, and Zabbix applications.

    Strong troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex issues.

    Preferred:

    Deep understanding of IT infrastructure monitoring and observability best practices.

    Strong analytical skills, with the ability to analyze large amounts of data and identify patterns and trends.

    Programming skills in languages such as Perl, Shell, or JavaScript.

    Experience with automation tools such as Ansible, Puppet or Terraform.

    Experience with container orchestration tools like Kubernetes.

    Experience with cloud platforms such as AWS, GCP, or Azure.

    Experience with CI/CD tools like Jenkins.