#general

Endre Karlson

04/25/2023, 12:44 PM
Hey y'all, do platform teams typically have on-call ?

Shubham Girdhar

04/25/2023, 12:45 PM
Yes - as far as I could tell.

Louise Ogilvy

04/25/2023, 12:48 PM
I would say it's more common to be on-call than not, though it depends on the size of the company. A lot of fully remote companies use a follow-the-sun system, so that someone is always within their working hours to support the customers.

Steve Fenton

04/25/2023, 1:07 PM
They should be on-call, but specifically for incidents relating to the platform. They shouldn't be taking any on-call away from stream-aligned teams, which ought to respond to issues with their own applications. There is usually an escalation path from the on-call stream-aligned folks to the on-call platform folks.

Elliot Partridge

04/25/2023, 1:21 PM
@Louise Ogilvy What's the sun system?

Endre Karlson

04/25/2023, 1:22 PM
Follow the sun - 24/7 coverage with teams around the globe
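To make the follow-the-sun idea concrete, here is a minimal sketch of picking which region is on shift at a given moment. The region names and the three 8-hour UTC windows are assumptions chosen for illustration; nobody in the thread describes a specific rota.

```python
from datetime import datetime, timezone

# Hypothetical regions and shift windows in UTC
# (start hour inclusive, end hour exclusive). Illustration only.
SHIFTS = [
    ("APAC", 0, 8),
    ("EMEA", 8, 16),
    ("AMER", 16, 24),
]


def region_on_shift(moment: datetime) -> str:
    """Return the region whose shift window covers `moment`.

    `moment` should be timezone-aware; it is converted to UTC first.
    """
    hour = moment.astimezone(timezone.utc).hour
    for region, start, end in SHIFTS:
        if start <= hour < end:
            return region
    # Only reachable if SHIFTS leaves a gap in the 24-hour day.
    raise ValueError(f"no shift covers UTC hour {hour}")


if __name__ == "__main__":
    print(region_on_shift(datetime.now(timezone.utc)))
```

Because the windows tile the full 24 hours, every alert lands in someone's working day, which is the point of the model.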

Louise Ogilvy

04/25/2023, 1:25 PM
Hi @Elliot Partridge, absolutely that! It's more common in remote teams, where they will hire engineers across all time zones so that someone is available during working hours to support customers. It can help remove the need for engineers to be on-call :)

Bob Eckert

04/25/2023, 1:59 PM
At Slice our DevOps/Platform/SRE function is on call as we are the only ones who have a fully holistic view of all the services.

Azy Sir

04/25/2023, 2:12 PM
I wouldn’t go signing a large and expensive support contract, I’ll put it that way. It really depends on your culture. If everyone takes ownership of their work and the platform is designed properly with a self-service setup, then your platform team has most likely also designed for self-healing applications, rolling updates, etc. In that case I’d argue that one person from the platform team on call should suffice, though it really does depend on team dynamics, platform capabilities, culture, etc.
I'm having this issue right now with a client we have a support contract with. If they just built the platform properly they could get rid of us (trying to automate myself out of a job - that’s always the aim of the game!). However, a lot of it is falling on deaf ears, which highlights a huge cultural problem (it could be backed by financial issues that I have no visibility on).

Krzysztof H.

04/25/2023, 2:47 PM
We have ;)

Bob Eckert

04/25/2023, 2:51 PM
"(trying to automate myself out of a job - that’s always the aim of the game!)"
Always this.
"However a lot of it is falling on deaf ears - which highlights a huge cultural problem"
People / Politics are the 8th layer of the 7-layer OSI model.

Azy Sir

04/25/2023, 2:58 PM
@Bob Eckert if I write that on LinkedIn, should I expect calls from lawyers? That is bloody terrific

Jessica Fink

04/26/2023, 5:09 AM
One takeaway is also to have platform on-calls, but not to page them for every outage, since they may not be the cause. Especially for internal dev platforms, a team needs to distinguish the cause from the owner, i.e. a broken user flow caused by a wrong config switch has to be fixed by the config owner, not the person running the platform.
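One way to picture the cause-versus-owner distinction is an ownership map that routes each alert to whoever owns the failing component. This is a rough sketch; the component names, rota names, and the `triage` fallback are all hypothetical.

```python
# Hypothetical ownership map: component -> on-call rota that owns it.
OWNERS = {
    "platform/cluster": "platform-oncall",
    "platform/ci-runners": "platform-oncall",
    "checkout/feature-flags": "checkout-oncall",
    "search/index-config": "search-oncall",
}


def page_target(component: str) -> str:
    """Return the rota to page for an alert on `component`.

    Unknown components go to a hypothetical 'triage' queue instead of
    defaulting to the platform team, so platform on-call is not paged
    for outages it does not own.
    """
    return OWNERS.get(component, "triage")


if __name__ == "__main__":
    # A wrong config switch breaks a user flow: page the config owner.
    print(page_target("checkout/feature-flags"))
```

The design choice that matters here is the fallback: sending unowned alerts to the platform rota by default would recreate the problem described above.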

Vivek Dwivedi

04/26/2023, 10:14 AM
Following up on this, how do you manage your on-call rota? Is there a widely adopted tool out there, or is this just an entry in calendars?

Steve Fenton

04/26/2023, 10:16 AM
I've used PagerDuty to manage schedules and to sound the alarm. It escalates if the on-call doesn't acknowledge, so you can handle issues with people being out-of-signal (or really heavy sleepers!)
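The escalate-on-no-acknowledgement behaviour can be sketched in a few lines. This is not PagerDuty's API or data model, just a toy model of the idea; the responder names and timeouts are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    responder: str
    timeout_minutes: int  # wait this long for an ack before escalating


# Hypothetical policy, for illustration only.
POLICY = [
    EscalationLevel("primary on-call", 5),
    EscalationLevel("secondary on-call", 10),
    EscalationLevel("engineering manager", 15),
]


def responders_paged(policy, minutes_unacknowledged):
    """List everyone paged so far for an alert nobody has acknowledged."""
    paged = []
    elapsed = 0  # minute at which the current level gets paged
    for level in policy:
        if minutes_unacknowledged < elapsed:
            break
        paged.append(level.responder)
        elapsed += level.timeout_minutes
    return paged


if __name__ == "__main__":
    print(responders_paged(POLICY, 7))
```

With this policy, an alert left unacknowledged for 7 minutes has reached the primary and the secondary, but not yet the manager.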

Jessica Fink

04/26/2023, 10:16 AM
From the bottom of my heart, do not try and build your own paging platform. There are great SaaS solutions out there. 😄 On-call work is not just alerting. It's noise management, it's incident analytics, it's an auto-recovery enabler, a great gateway to efficient automated incident comms, and at its core a schedule and escalation tool for your on-call teams.

Vivek Dwivedi

04/26/2023, 10:18 AM
Oh no, I am not building one 😅. Just trying to understand widely used systems so that we can integrate that in our product.

Azy Sir

04/26/2023, 3:51 PM
@Vivek Dwivedi we use PagerDuty also - though an ugly beast, it's a working beast 🙂