Job Specification: Site Reliability Engineer (Mid-Level)
Role Overview
We are seeking a Site Reliability Engineer (Mid-level) with strong expertise in AWS cloud infrastructure, containerized platforms, and Azure DevOps CI/CD pipelines. The successful candidate will focus on improving system reliability, availability, performance, and scalability while enabling engineering teams to deliver high-quality services efficiently.
This role blends software engineering with operational excellence, emphasizing automation, observability, incident response, and continuous improvement across cloud-native environments.
Note: This is a reliability-focused engineering role with on-call responsibilities and involvement in platform modernization initiatives.
Qualifications:
Key Responsibilities
- Design, build, and operate highly available AWS infrastructure using Infrastructure as Code (Terraform / CloudFormation).
- Develop and maintain CI/CD pipelines to support automated deployments and testing.
- Implement and manage EC2 / containerised workloads using Docker and Kubernetes (EKS/ECS).
- Improve system reliability through automation, monitoring, alerting, and self-healing mechanisms.
- Define and track SLIs/SLOs and error budgets for critical services.
- Participate in incident response, lead root cause analysis, and drive post-incident improvements.
- Build observability platforms using CloudWatch, Prometheus, Grafana, ELK, or similar tooling.
- Automate operational tasks to reduce toil and improve deployment consistency.
- Optimise AWS environments for performance, scalability, and cost efficiency.
- Implement security best practices, including IAM, secrets management, and network segmentation.
- Collaborate with development teams to improve application reliability and deployment strategies.
- Maintain runbooks, architectural documentation, and operational playbooks.
Key Characteristics
- Reliability-driven: Focused on uptime, performance, and resilience.
- Automation-first mindset: Actively reduces manual effort and operational toil.
- Ownership mentality: Takes responsibility for services from design through production.
- Strong communicator: Clearly articulates incidents, improvements, and technical concepts.
- Collaborative: Works closely with platform, security, and application teams.
- Continuous learner: Keeps pace with SRE practices and cloud-native technologies.
Core Experience & Technical Skills
- 5-7 years of IT experience, with at least 3 years in SRE, DevOps, or Cloud Engineering roles.
- Strong hands-on experience with AWS services, including EC2, VPC, IAM, S3, RDS, CloudWatch, ALB/ELB, and Route53.
- Proven experience creating, managing, and optimising CI/CD pipelines using Azure DevOps.
- Solid Linux/Windows system administration and troubleshooting skills across production environments.
- Hands-on experience with Docker for containerization and working knowledge of Kubernetes and ECS/EKS, including container networking, scaling, rolling deployments, and service mesh concepts.
- Strong experience implementing Infrastructure as Code using Terraform and/or CloudFormation.
- Scripting proficiency in Bash and Python for automation and operational tooling.
- Experience automating infrastructure provisioning, deployments, and operational workflows.
- Practical experience implementing observability platforms, including monitoring, logging, and alerting solutions.
- Strong understanding of SRE principles, including SLIs, SLOs, error budgets, incident management, postmortems, and capacity planning.
- Familiarity with performance tuning, load testing, and reliability optimisation techniques.
Additional Information:
D&I statement
Remote Work:
No
Employment Type:
Full-time