An engineer who keeps large-scale services reliable, scalable, and observable using software engineering practices.
What site reliability engineers do
Site reliability engineers set and monitor service-level objectives, build observability tooling, run incident response, conduct postmortems, plan capacity, and improve deployment safety. They also write code that automates toil and prevents repeat outages.
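As a rough illustration of monitoring a service-level objective, the sketch below computes an error budget from request counts. The function name `error_budget`, its parameters, and the request-based definition of availability are assumptions made for this example; real teams may define SLOs over latency, time windows, or other indicators.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget usage for a request-based availability SLO.

    slo_target is the fraction of requests that must succeed (e.g. 0.999).
    All names here are hypothetical and chosen only for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or failed_requests < 0 or failed_requests > total_requests:
        raise ValueError("request counts are inconsistent")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining = allowed_failures - failed_requests

    # Fraction of the budget already spent; guard against an empty window.
    if allowed_failures > 0:
        consumed = failed_requests / allowed_failures
    else:
        consumed = 0.0

    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining,
        "budget_consumed": consumed,
        "slo_met": remaining >= 0,
    }


if __name__ == "__main__":
    report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    for key, value in report.items():
        print(f"{key}: {value}")
```

With a 99.9% target over 1,000,000 requests, the budget is roughly 1,000 failed requests, so 250 failures would use about a quarter of it. A team might slow risky deployments as the consumed fraction approaches 1.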
Training path
Most US SREs hold a bachelor’s degree in computer science or a related field, and many start as software engineers or systems administrators. A strong understanding of distributed systems, networking, and operating system internals is critical.
Tooling
Common tools include Prometheus, Grafana, Datadog, Splunk, Kubernetes, Istio, Terraform, and Argo, along with on-call platforms such as PagerDuty. SRE teams often own the service mesh, deployment pipelines, and chaos engineering programs.