- Design new services, tools, and monitoring to be implemented by the entire team.
- Analyze the tradeoffs of the proposed design and make recommendations based on these tradeoffs.
- Mentor new engineers to achieve more than they thought possible. You enjoy making other teams successful and are fulfilled through the success of others.
Work on database reliability projects, including:
- HA, Business Continuity Planning, disaster recovery, backup/restore, RTO, RPO
- Database uptime and performance
- Capacity management & planning
- SLIs, SLOs, error budgets, and monitoring dashboards
- Responsible for deployment and operations of large-scale distributed data stores and streaming services
- Establishing design patterns for monitoring and benchmarking
- Establishing and documenting production run books and guidelines for developers
- Tooling, toil reduction, runbooks & automation to manage production environments
- Incident management and improving MTTD/MTTR for services
- Cloud cost optimization
Qualifications
Mandatory
- 3+ years of experience with deployment, operations, and performance management of large-scale Cassandra clusters along with ZooKeeper.
- 2+ years of experience in managing large-scale cloud-native microservices platforms.
- Experience with infrastructure automation and scripting using Python and/or bash scripting.
- Excellent problem-solving, troubleshooting, and debugging skills in large-scale distributed systems
Preferred
- Confluent Certified Administrator for Apache Kafka and/or Apache Cassandra Administrator Associate certifications are preferred
- AWS Solutions Architect certification preferred.
- Strong hands-on experience deploying, managing, and monitoring large-scale Kubernetes clusters in the public cloud, specifically AWS or GCP
- Experience with deployment, operations, and performance management of one or more large-scale clusters such as Cassandra, Kafka, Elastic/OpenSearch, MongoDB, ZooKeeper, Redis, etc.
- Experience with Infrastructure-as-Code using Terraform, CloudFormation, Google Deployment Manager, Pulumi, Packer, ARM, etc.
- Proven skills to effectively work across teams and functions to influence the design, operations, and deployment of highly available software.
- Bachelor's/Master's in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
-
Site Reliability Engineer
1 day ago
Wall Street Consulting Services LLC Hyderabad, India · Role: SRE · Exp: 6+ years · Location: Pune, Bengaluru, Chennai · JOB DESCRIPTION: · The Role: · As a Site Reliability Engineer, you will be critical in ensuring our software products' reliability, scalability, and performance. You will be responsible for designing and implementin ...
-
Site Reliability Engineer
1 day ago
Quiktrak, LLC Hyderabad, India · Job Title: Azure Site Reliability Engineer (SRE) / DevOps Engineer · Job Description: · Summary: · As an Azure Site Reliability Engineer (SRE) / DevOps Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure on the ...
-
Site Reliability Engineer
2 days ago
ValueLabs Hyderabad, India · Experienced in SRE or Site Reliability Engineer · Design, implement, and maintain automated processes for deploying, monitoring, and managing applications on Azure DevOps. · Collaborate with cross-functional teams to optimize system performance, reliability, and scalability. · D ...
-
Site Reliability Engineer
5 days ago
PURVIEW Hyderabad, India · Role: Site Reliability Engineer · Location: Hyderabad · Job Type: Contract (Permanent to Purview Services) · NOTE: Client is looking for immediate joiners or 1 Month Notice Period candidates only. · Job Description: · Primary job responsibilities · Ability to operate and maintai ...
-
Site Reliability Engineer
6 days ago
SID Global Solutions Hyderabad, India · Job Title: Site Reliability Engineer · Location: Hyderabad - Onsite · Work Mode: 5 Days Working from Office · JOB DESCRIPTION · • Experience in Cloud administration and troubleshooting (GCP is recommended) or AWS or AZURE · • Experience with Kubernetes or comparable technolog ...
-
Site Reliability Engineer
1 day ago
DATAMTX LLC Hyderabad, India · The Company · Datamtx (formerly Datamatics), established in 1993 and globally HQ'd in Atlanta, has a stellar history supporting both Tier 1 and 2 ERP rollouts ranging from implementations, data cleanse, migrations, customization, hypercare and Day 1 support. We are also nationall ...
-
Site Reliability Engineer
6 days ago
Wall Street Consulting Services LLC Hyderabad, India · Role: SRE · Exp: 6+ years · Location: Pune, Bengaluru, Chennai · JOB DESCRIPTION: · The Role: · As a Site Reliability Engineer, you will be critical in ensuring our software products' reliability, scalability, and performance. You will be responsible for designing and implementing highly ava ...
-
Service Reliability Engineer
17 hours ago
Banyan Cloud Hyderabad, India · About US · Honest Data technologies Pvt Ltd is a wholly owned subsidiary of Banyan Cloud, USA, the Cyber Security Product Company, headquartered in San Jose, California, USA, owning the SaaS product "Banyan Cloud", first of its kind Cyber Security CNAP Platform that simplifies t ...
-
Database Reliability Engineer
1 day ago
Zyoin group Hyderabad, India · We are seeking a highly skilled and experienced Database Reliability Engineer (DBRE) to join our team and play a crucial role in ensuring the performance, scalability, and high availability of our customer database services on the Tessell Platform. · Minimum Requirements: · year ...
-
Site Reliability Engineer
1 day ago
Coforge Hyderabad, India · Role: Site Reliability Engineer · Location: Hyderabad · Work Mode: WFO · Experience: 6-10 yrs · Job Description: · Deep knowledge of version control. · Sound knowledge of operating systems (like Linux). · Should be aware of DevOps concepts and best practices. · CI/CD implementati ...
-
Site Reliability Engineer
6 days ago
WaferWire Cloud Technologies Hyderabad, India · Hi, this is Sundeep from Waferwire Technologies and we are hiring Site Reliability Engineer (SRE). · Role: Site Reliability Engineer (SRE) · Location: Hyderabad Office · Experience: 5 to 8 Years · Role Description: · Responsibilities: · Implement and maintain robust DevOps practices wit ...
-
Site Reliability Engineer
2 days ago
Shining Sheroes Bangalore/Hyderabad, India · permanent · Primary job responsibilities: · - Ability to operate and maintain various OS platforms, with a focus on debugging, automation, availability, performance, and scale. · - Diagnose and troubleshoot complex distributed systems. · - Work and collaborate across teams, such as OS, Appl ...
-
Site Reliability Engineer
4 days ago
PURVIEW Hyderabad, India · Role: Site Reliability Engineer · Location: Hyderabad · Job Type: Contract (Permanent to Purview Services) · NOTE: Client is looking for immediate joiners or 1 Month Notice Period candidates only. · Job Description: · Primary job responsibilities · Ability to operate and maintain various ...
-
Site Reliability Engineer
5 days ago
Kiash Solutions LLP Hyderabad, India · Full time · Considering candidates with 7+ yrs of exp. · MUST be able to onboard in 0-15 days, MAX 20 days. · Location - Hyderabad (work from office) · Timings - IST · Budget - open to discussion · Please find the JD for reference. · Design and develop tools that will aid in improving reli ...
-
Site Reliability engineer
10 hours ago
Virtusa Hyderabad, India · Site Reliability engineer - CREQ188641 · Description · Position: SRE · Primary skills: DevOps CI/CD pipeline · Location: Hyderabad · Should have proficiency in understanding of application monitoring stack (Logs, Events, Metrics and Alerts) and ability to visualize and setup end-to- ...
-
Site Reliability Engineer
1 day ago
Electronic Arts Hyderabad, India · Pogo has been the leader in online casual games since 1998. Featuring a growing library of 60+ titles spanning popular genres like Solitaire, Mahjong, Match 3, and more, Pogo exists to be the best destination for online casual games. We strive to produce high-quality HTML5-po ...
-
Site Reliability Engineer
1 day ago
Oriontek INC Hyderabad, India · Full time · The Role · You will be responsible for: · Gathering and evaluating user feedback. · Providing code documentation and other inputs to technical documents. · Supporting continuous improvement by investigating alternatives and new technologies and presenting these for architectur ...
-
Site Reliability Engineer
1 day ago
Alter Domus Hyderabad, India · ABOUT US · We are Alter Domus. Meaning "The Other House" in Latin, Alter Domus is proud to be home to 85% of the top 30 asset managers in the alternatives industry, and more than 5,000 professionals across 23 countries. · With a deep understanding of what it takes to succeed in ...
-
Site Reliability Engineer
5 days ago
Ampcus Tech Pvt. Ltd Hyderabad, India · Full time · Job Title: Site Reliability Engineer (SRE) · Work Location: Hyderabad (work from office) · Type: Full-time Permanent · Notice Period: 30 Days · Experience: 5+ Years · JD: · Responsibilities: · Build and maintain our cloud platforms and support applications, demonstrating agile and ...
-
Site Reliability Engineering
1 day ago
Wipro Hyderabad, India · Requirement: SRE · Experience: 6+ Years · Location: Pan India · Key responsibilities · Review monitoring & alerts to provide recommendations for enhancement towards 360° coverage · Create dashboards, set up synthetic and real user monitoring, visualize large data sets with inter ...