Site Reliability Engineer

Mars Capital

Job Location:

Dublin - Ireland

Monthly Salary: Not Disclosed
Posted on: 2 days ago
Vacancies: 1

Job Summary

Job Specification: Site Reliability Engineer (Mid-Level)

Role Overview

We are seeking a Site Reliability Engineer (Mid-Level) with strong expertise in AWS cloud infrastructure, containerised platforms, and Azure DevOps CI/CD pipelines. The successful candidate will focus on improving system reliability, availability, performance, and scalability while enabling engineering teams to deliver high-quality services efficiently.

This role blends software engineering with operational excellence, emphasising automation, observability, incident response, and continuous improvement across cloud-native environments.

Note: This is a reliability-focused engineering role with on-call responsibilities and involvement in platform modernisation initiatives.


Qualifications:

Key Responsibilities

  • Design, build, and operate highly available AWS infrastructure using Infrastructure as Code (Terraform / CloudFormation).
  • Develop and maintain CI/CD pipelines to support automated deployments and testing.
  • Implement and manage EC2 and containerised workloads using Docker and Kubernetes (EKS/ECS).
  • Improve system reliability through automation, monitoring, alerting, and self-healing mechanisms.
  • Define and track SLIs/SLOs and error budgets for critical services.
  • Participate in incident response, lead root cause analysis, and drive post-incident improvements.
  • Build observability platforms using CloudWatch, Prometheus, Grafana, ELK, or similar tooling.
  • Automate operational tasks to reduce toil and improve deployment consistency.
  • Optimise AWS environments for performance, scalability, and cost efficiency.
  • Implement security best practices, including IAM, secrets management, and network segmentation.
  • Collaborate with development teams to improve application reliability and deployment strategies.
  • Maintain runbooks, architectural documentation, and operational playbooks.

Key Characteristics

  • Reliability-driven: Focused on uptime, performance, and resilience.
  • Automation-first mindset: Actively reduces manual effort and operational toil.
  • Ownership mentality: Takes responsibility for services from design through production.
  • Strong communicator: Clearly articulates incidents, improvements, and technical concepts.
  • Collaborative: Works closely with platform, security, and application teams.
  • Continuous learner: Keeps pace with SRE practices and cloud-native technologies.

Core Experience & Technical Skills

  • 5-7 years of IT experience, with at least 3 years in SRE, DevOps, or Cloud Engineering roles.
  • Strong hands-on experience with AWS services, including EC2, VPC, IAM, S3, RDS, CloudWatch, ALB/ELB, and Route53.
  • Proven experience creating, managing, and optimising CI/CD pipelines using Azure DevOps.
  • Solid Linux/Windows system administration and troubleshooting skills across production environments.
  • Hands-on experience with Docker for containerisation and working knowledge of Kubernetes (EKS/ECS), including container networking, scaling, rolling deployments, and service mesh concepts.
  • Strong experience implementing Infrastructure as Code using Terraform and/or CloudFormation.
  • Scripting proficiency in Bash and Python for automation and operational tooling.
  • Experience automating infrastructure provisioning, deployments, and operational workflows.
  • Practical experience implementing observability platforms, including monitoring, logging, and alerting solutions.
  • Strong understanding of SRE principles, including SLIs, SLOs, error budgets, incident management, postmortems, and capacity planning.
  • Familiarity with performance tuning, load testing, and reliability optimisation techniques.

Additional Information:

D&I statement


Remote Work:

No


Employment Type:

Full-time

Key Skills

  • Kubernetes
  • FMEA
  • Continuous Improvement
  • Elasticsearch
  • Go
  • Root Cause Analysis
  • Maximo
  • CMMS
  • Maintenance
  • Mechanical Engineering
  • Manufacturing
  • Troubleshooting

About Company

Due to continued growth of our servicing platform, we are looking for a Team Leader to support the business as it goes through this current period of growth. The successful candidate will act as team leader for a team of Customer Service Executives and Asset Managers working within th ...