- Own the infrastructure and APM, and work with developers and systems engineers to build, release, monitor, and run the service reliably, exceeding the agreed SLAs.
- Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JS, React, Node, Go, and Python.
- Write automation to reduce toil and eliminate manual, repeatable tasks.
- Work with Ansible, Puppet, Chef, Terraform, or another config management/orchestration suite; know where it is broken, work toward fixing it, and explore new alternatives.
- Define and accelerate the implementation of support processes, tools, and best practices.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system reliability.
- Handle cross-team performance issues, from identifying the cause to determining the areas of improvement and driving those actions to closure.
- Baseline the performance and maturity of systems, tools, coverage, metrics, technology, and engineering practices.
- Define, measure, and improve reliability metrics (SLO/SLI), observability (monitoring, logging, and tracing solutions), and ops processes (incident and problem management), and streamline and automate release management.
- Build dashboards to provide visibility into the performance of the applications.
- Purposefully create chaos in the production environment in a controlled manner to validate the reliability of systems.
- Mentor and coach other SREs in the organization.
- Provide written and verbal updates to executives and the stakeholders of the application in the organization.
- Understand the current processes and system setup, and propose the improvements in process and technology needed for the application to exceed the desired Service Level Objective.
- Troubleshoot, debug and diagnose operational issues and drive them to closure.
- Understanding of software delivery life cycles, particularly Agile/Lean, and DevOps.
- A strong believer in automation as a way to sustain continuous improvement: automating toil and runbooks, and improving the ability of applications to auto-heal, leading to improved reliability.
- 15+ years of experience in the development and operations of applications/services in production that have uptime over 99.9%.
- 8+ years of experience as an SRE handling web-scale applications.
- Strong hands-on coding experience in one or more programming languages such as Python, Golang, Java, Bash, etc.
- Good understanding of Observability (monitoring, logging, tracing, metrics) and chaos engineering concepts.
- Proficiency in using observability tools (for example, New Relic or Datadog) for monitoring, logging, and tracing.
- Expert-level hands-on knowledge of a public cloud platform: AWS and/or Google Cloud Platform.
- A professional-level certificate in one of the public clouds is highly desirable.
- Must have hands-on experience in using configuration management systems such as Ansible or SaltStack and infrastructure automation tools like Terraform or CloudFormation.
- Should have used alerting systems such as PagerDuty.
- Should have implemented solutions around Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for services.
- These measurements should have covered both individual systems and interactions across systems in a distributed environment.
- Should have supported Production Incidents (PIs) on critical applications of a company.
- Proven experience in handling large-scale and growing infrastructure across data centers and heterogeneous cloud platforms.
- Experience as a service owner in managing large geographically diverse stakeholders.
- Ability to work with creative, fast-growing engineering teams and motivate them to deliver their best work.
- History of driving innovation.
Site Reliability Engineer/Architect - Bangalore/Any Location, India - Grizmo Labs
Description
Responsibilities :
Requirements :