
    Sr System Reliability Engineer Automation Monitoring Tool - Pune, India - Fulcrum Digital

    Description

    Who are we:
    Fulcrum Digital is an agile and next-generation digital accelerating company providing digital transformation and technology services right from ideation to implementation. These services have applicability across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.

    The Role:

    Plan, manage, and oversee all aspects of a production environment.
    Define strategies for application performance monitoring and optimization in the Prod environment.
    Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time.
    Support deployment of code into multiple lower environments. Support current processes with an emphasis on automating everything as soon as possible.
    Design, develop, and standardize monitoring and alerting mechanisms for the supported applications.
    Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
    Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
    Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
    Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
    Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices.
    Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
    Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
    Work with a global team spread across tech hubs in multiple geographies and time zones.
    Share knowledge and explain processes and procedures to others.
    Share knowledge and mentor junior resources.
    Perform on-call duties on a rotational basis.
    Occasional off-hours work is required.
    The candidate should have an inclination for training, be a good trainer, and be ready to mentor others.

    Requirements

    Must-Have:

    • Linux
    • Shell Scripting
    • ITIL/ITSM
    • Any monitoring tool (preferred: Splunk/Dynatrace)
    • Jenkins CI/CD
    • Groovy Scripting/YAML
    • Git basic/Bitbucket

    Good To Have:

    • Even Framework architecture
    • PL/SQL
    • Application Troubleshooting
    • Ansible/Chef


    Requirements

    Must-Have:

    • Linux
    • Shell Scripting
    • ITIL/ITSM - Basic
    • PL/SQL - Basic
    • Application Troubleshooting - Basic
    • Monitoring Tool - Preferred: Splunk, Dynatrace. Any other monitoring tool
    • Jenkins - CI/CD
    • Groovy Scripting/YAML
    • Git basic/Bitbucket
    • Chef
    • DevOps - CI/CD, overview of Git, Bitbucket, SonarQube, CI (Jenkins), Chef

    Good To Have:

    • Log Monitoring Tool - Splunk
    • Application Monitoring Tool - Dynatrace
    • Ticketing incident/problem management tool - Remedy

