Site Reliability Engineer

Dublin - Ireland

Monthly Salary: Not Disclosed

Posted on: 2 days ago

Vacancies: 1

Job Summary

Job Specification: Site Reliability Engineer (Mid-Level)

Role Overview

We are seeking a Site Reliability Engineer (Mid-Level) with strong expertise in AWS cloud infrastructure, containerized platforms, and Azure DevOps CI/CD pipelines. The successful candidate will focus on improving system reliability, availability, performance, and scalability while enabling engineering teams to deliver high-quality services efficiently.

This role blends software engineering with operational excellence, emphasizing automation, observability, incident response, and continuous improvement across cloud-native environments.

Note: This is a reliability-focused engineering role with on-call responsibilities and involvement in platform modernization initiatives.

Qualifications:

Key Responsibilities

Design, build, and operate highly available AWS infrastructure using Infrastructure as Code (Terraform / CloudFormation).
Develop and maintain CI/CD pipelines to support automated deployments and testing.
Implement and manage EC2 and containerised workloads using Docker, Kubernetes (EKS), and ECS.
Improve system reliability through automation, monitoring, alerting, and self-healing mechanisms.
Define and track SLIs/SLOs and error budgets for critical services.
Participate in incident response, lead root cause analysis, and drive post-incident improvements.
Build observability platforms using CloudWatch, Prometheus, Grafana, ELK, or similar tooling.
Automate operational tasks to reduce toil and improve deployment consistency.
Optimise AWS environments for performance, scalability, and cost efficiency.
Implement security best practices, including IAM, secrets management, and network segmentation.
Collaborate with development teams to improve application reliability and deployment strategies.
Maintain runbooks, architectural documentation, and operational playbooks.

Key Characteristics

Reliability-driven: Focused on uptime, performance, and resilience.
Automation-first mindset: Actively reduces manual effort and operational toil.
Ownership mentality: Takes responsibility for services from design through production.
Strong communicator: Clearly articulates incidents, improvements, and technical concepts.
Collaborative: Works closely with platform, security, and application teams.
Continuous learner: Keeps pace with SRE practices and cloud-native technologies.

Core Experience & Technical Skills

5-7 years of IT experience, with at least 3 years in SRE, DevOps, or Cloud Engineering roles.
Strong hands-on experience with AWS services, including EC2, VPC, IAM, S3, RDS, CloudWatch, ALB/ELB, and Route 53.
Proven experience creating, managing, and optimising CI/CD pipelines using Azure DevOps.
Solid Linux/Windows system administration and troubleshooting skills across production environments.
Hands-on experience with Docker for containerization and working knowledge of Kubernetes and ECS/EKS, including container networking, scaling, rolling deployments, and service mesh concepts.
Strong experience implementing Infrastructure as Code using Terraform and/or CloudFormation.
Scripting proficiency in Bash and Python for automation and operational tooling.
Experience automating infrastructure provisioning, deployments, and operational workflows.
Practical experience implementing observability platforms, including monitoring, logging, and alerting solutions.
Strong understanding of SRE principles, including SLIs, SLOs, error budgets, incident management, postmortems, and capacity planning.
Familiarity with performance tuning, load testing, and reliability optimisation techniques.

Employment Type: Full-time

Key Skills

Kubernetes
FMEA
Continuous Improvement
Elasticsearch
Go
Root Cause Analysis
Maximo
CMMS
Maintenance
Mechanical Engineering
Manufacturing
Troubleshooting

About Company

Mars Capital

Due to the continued growth of our servicing platform, we are looking for a Team Leader to support the business through this period of growth. The successful candidate will act as team leader for a team of Customer Service Executives and Asset Managers working within th ...
