Site Reliability Engineer - Pune, India - HCLSoftware

    Technology / Internet
    Description

    The Role:

    HCL BigFix is looking for a Site Reliability Engineer to work on infrastructure for a new product that will help keep our customers' endpoints secure. You will be part of a team that leverages modern technologies to drive growth and efficiency. Your responsibilities will center on HCL BigFix's cloud infrastructure, with daily tasks focused on improving scalability, reliability, and observability.

    The ideal candidate will have a strong background in software engineering and system administration, proficiency in modern infrastructure tools (e.g., Kubernetes, Docker, AWS/GCP/Azure), and a passion for designing, implementing, and maintaining reliable and scalable systems. This role includes on-call duties.

    What You Do:

    • Collaborate with development and operations teams to design, implement, and maintain scalable and reliable infrastructure solutions.
    • Manage monitoring, alerting, and logging systems to ensure that issues are identified and resolved.
    • Work on the automation of infrastructure provisioning.
    • Perform regular system and application performance analysis, tuning, and planning.
    • Improve the cost efficiency and efficacy of a complex, multi-cloud product, and tackle cost-minimization efforts.

    • Ensure the availability of new and existing developer tools.
    • Migrate large-scale, distributed diagnostics applications to cloud-native microservices.
    • Plan for capacity management and lead infrastructure changes for cloud-based services.
    • Work with software engineering (SWE) counterparts to identify and mitigate production issues.
    • Document and implement failover/disaster recovery plans.

    • Participate in code reviews and contribute to the technical architecture.
    • Participate in team on-call rotations.

    What You Bring:

    • A degree in Computer Science or a related technical field, or proof of exceptional skills in related fields with practical software engineering experience.
    • Knowledge of cloud operating system internals, filesystems and related technologies, storage protocols, and the networking stack.
    • Experience leading troubleshooting and full-cycle incident response, including correction and prevention.
    • 3+ years of experience managing services in distributed systems.
    • Years of experience with common containerization tools, such as Docker.
    • Knowledge of at least one higher-level language, such as Python.
    • Expert knowledge of CI/CD tools, such as Jenkins or GitHub Actions.