Senior DevOps Engineer - Udaipur, India - GKM IT
Description
Company Introduction:
GKM IT is an outsourcing company specializing in product development and technical consulting.
We are consultants covering all aspects of product development: design, backend, frontend, mobile, digital, DevOps, etc. We have a presence in Silicon Valley, Europe, Australia, and India. The major domains we specialize in are Fintech, EdTech, Healthcare, and Hospitality.
Responsibilities:
- Deploy updates and fixes.
- Implement and own CI.
- Manage CD tooling.
- Implement and maintain monitoring and alerting.
- Build and maintain highly available production systems.
- Build tools to reduce the occurrence of errors and improve customer experience.
- Develop software to integrate with internal back-end systems.
- Perform root cause analysis for production errors.
- Investigate and resolve technical issues.
- Develop scripts to automate visualization.
- Design procedures for system troubleshooting and maintenance.
Required Skills:
- Linux - Strong knowledge of Linux process management, system administration, network troubleshooting (telnet, netstat, ping, nmap, ngrep, nslookup, tcpdump, traceroute), file system and disk management, DNS, and IP addressing (subnetting, masking).
- Scripting - Bash (various kinds of automation and backups), Groovy, and Python.
- Terraform - Must have implemented highly available, scalable, and fault-tolerant multi-tier Azure and AWS environments spanning multiple availability zones using Terraform (including Terraform workspaces for different environments).
- DR - Strong understanding of RTO and RPO, with disaster recovery planning and implementation to achieve zero-downtime deployments (including alerting and monitoring for service downtime and appropriate responses to it).
- Docker - Excellent knowledge of microservice architecture, Docker, docker-compose, networking, storage, and Dockerization of different application stacks.
- Jenkins - Excellent knowledge of Jenkins and the CI/CD process (including master-slave architecture, scripted and declarative pipelines, tool integrations, RBAC for user management, parameterized and scheduled jobs, and shared libraries). GitHub Actions knowledge is a plus.
- Database - Experience in database management and optimization (creation, administration, debugging, and monitoring) for PostgreSQL, MySQL, MongoDB, and Redis, both on-prem and on cloud services such as RDS, DynamoDB, ElastiCache, or Azure managed databases.
- Ansible - Experience with Ansible and roles for different kinds of automation, configuration management, server setup, patching, etc.
- Monitoring - Good experience with Prometheus, Grafana, CloudWatch, Graylog, or any equivalent stack or tools.
- Logging - Grafana Loki with Promtail, ELK, or any equivalent stack or tools.
- Web and proxy servers - Proficient in configuring and optimizing web servers and reverse proxy servers (Nginx, Apache, HAProxy).
- Security and Compliance - Knowledge of compliance and security practices for both Azure and AWS.
- Dive deep into the software stack to troubleshoot and resolve issues related to application development, deployment, and operations.
- Performance tuning, monitoring, and maintenance of fault-tolerant/HA infrastructure to deliver highly scalable services.
Cloud Knowledge
1) Azure
Must have experience with Azure services such as:
Active Directory
Virtual Network
Virtual Machine and scale sets
Load Balancer and Application Gateway
Web Apps / App Service
Container Apps
Functions
Storage Account and Blobs
Azure Database for PostgreSQL and MySQL
Computer Vision / Azure OpenAI
Cognitive Services
Azure Kubernetes Service (AKS) - Including setup with Terraform, deployment best practices (Helm charts and Argo CD), logging and monitoring setup, security, scalability, rolling updates and rollback strategies to maintain application availability, and auditing and upgrading the cluster to the latest versions.
Good experience with Azure DevOps pipelines for deployment and automation tasks
2) AWS
IAM (Users, Groups, Policies, Roles)
VPC (NAT, Subnets, IGW, Security Groups, NACLs, Endpoint Services, Peering)
EC2, Auto-scaling
S3, RDS, EFS, ElastiCache, Redshift
ALB, NLB, Transit-Gateway
ECS (Fargate and EC2), EKS (including deployment best practices with add-ons)
CloudFront (CDN), Route53
Serverless (Lambda, DynamoDB, API Gateway, EventBridge)
SNS, SES, SQS
AWS DevOps (CodePipeline, CodeDeploy)
CloudTrail, CloudWatch
AWS CLI (automation)
Control Tower, Landing Zone, AWS SSO