Monitoring Engineer - Chennai, India - Photon
Description
The Monitoring Engineer will be responsible for designing, configuring, implementing, and maintaining our observability solutions, and for monitoring and troubleshooting IT systems and applications to ensure optimal performance and reliability. You will work closely with cross-functional teams to identify potential issues and provide insights that improve system performance, stability, and availability. The engineer will also be responsible for automating alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
Responsibilities:
Configure and maintain monitoring and observability tools and systems (Azure and Dynatrace).
Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.
Automate alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
Conduct capacity planning and forecasting to ensure scalability and optimal performance of IT systems and applications.
Collaborate with cross-functional teams to support incident management, change management, and problem management processes.
Enhance the customer experience through application monitoring by identifying and resolving issues in real time, before they have an impact.
Implement APM (Application Performance Monitoring) solutions in cloud and on-premises infrastructure.
Own and create monitoring solutions that cover all platforms, applications, jobs, and the IA ecosystem.
Qualifications:
Mandatory:
3+ years of experience with Dynatrace, including installing agents and forwarders, working with APIs, and building alerts, dashboards, and data trend analyses in a monitoring tool.
Understanding of data platforms, batch jobs, and ETL processes.
Experience running system health checks to ensure 100% availability of Splunk, Dynatrace, and Zabbix applications.
Strong troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex issues.
Preferred:
Deep understanding of IT infrastructure monitoring and observability best practices.
Strong analytical skills, with the ability to analyze large amounts of data and identify patterns and trends.
Programming skills in languages such as Perl, Shell, or JavaScript.
Experience with automation tools such as Ansible, Puppet or Terraform.
Experience with container orchestration tools like Kubernetes.
Experience with cloud platforms such as AWS, GCP, or Azure.
Experience with CI/CD tools like Jenkins.