Queues in healthcare
October 09, 2025
Healthcare Queueing Models¶
HPDM139 Coding for ML and DS
Introduction: Queueing in Healthcare¶
Waiting lines are common in healthcare systems. For example:
- In online triage, patient requests arrive at unpredictable times, but only a limited number of clinicians are available to respond.
- In vaccination clinics, patients often arrive in waves, while staff capacity is fixed.
Queueing theory provides a way to study these systems mathematically. It helps answer practical questions such as:
- How long will patients wait on average?
- How long can a clinician spend on each patient, to keep waiting times acceptable?
- How many staff are needed to keep waiting times acceptable?
- Is it more effective to add an extra staff member, or to make each consultation a little faster?
In this project three queueing models are implemented and tested. Each assumes patients arrive at random and that the time taken for service also varies randomly:
- M/M/1 models a single server (for example, one GP handling triage requests).
- M/G/1 allows more realistic variation in consultation times.
- M/M/s represents multiple servers working in parallel, such as several vaccinators running a flu clinic.
Running experiments with these models explores how arrival rate, consultation length, and number of staff affect waiting times.
Aims & Study Design¶
Aim. Quantify how arrival rate (λ), service rate (μ), service‑time variability, and staffing (s) affect queue length, waiting times and resource utilisation in a primary healthcare setting.
Key performance indicators (KPIs).
- Utilisation: ρ
- Mean number in system/queue: L, Lq
- Mean time in system/queue: W, Wq
- Delay probability and tail metrics: e.g., P(wait > 10/15/30 min)
Experiments
- A. GP Same‑day Triage (M/M/1) (A1. utilisation ramp, A2. SLA sensitivity)
- B. Minor op clinic with variable complexity (M/G/1) (B1. effect of variance, B2. two mode case mix)
- C. Flu Vaccination Clinic (M/M/s) (C1. arrival planning curve, C2. staffing vs micro‑efficiencies)
Model 1: GP Same‑day Triage (M/M/1)¶
In queueing theory, a M/M/1 queue represents the queue length in a system having a single server, where arrivals occur at rate λ according to a Poisson process, and service times have an exponential distribution with rate parameter μ, where 1/μ is the mean service time.
Image source: Tsaitgaist - Mm1 queue.png, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=7037566
Key metrics that can be explored are:
Utilisation $(\rho)$ — fraction of time the GP is busy
$$\rho = \frac{\lambda}{\mu}$$Average waiting time $(W_q)$ — mean time before triage
$$W_q = \frac{\rho}{\mu - \lambda}$$
Experiment 1 — Utilisation ramp¶
In the United Kingdom's National Health Service (NHS), it has become common for patients to submit medical requests to their GP via online triage systems such as AccuRx, eConsult and AskMyGP.
The average GP practice in the UK in 2023 had 9,724 patients (Pettigrew, 2024) and collectively, England's GP surgeries offer over 36 million appointments per year (NHS Digital, 2022).
A typical arrangement involves a single GP recieving and triaging all online requests.
Experiment A1 is a stress test of a such a system, where a single GP "serves" same day triage, modelled as a M/M/1 queue.
In this experiment, service rate is fixed at μ = 12 patients/hour (≈5 min per patient), and arrival rate λ is ramped up from 4 to 11 patients per hour. The utilisation (p) and average wating time (Wq) are plotted against the arrival rate λ.
"""
Experiment A1 — Utilisation Ramp (M/M/1)
-------------------------------------------------
Model a single-GP same-day triage system using an M/M/1 queue.
Plot arrival rate (λ) against utilisation (ρ) and average waiting time (Wq).
Assumptions:
- Service rate μ = 12 patients/hour (≈5 minutes per patient)
- Arrival rate λ varies from 4 to 10 patients/hour
- Exponential inter-arrival and service times (Poisson arrivals)
"""
import matplotlib.pyplot as plt
import numpy as np
def mm1_metrics(mu: float, lambdas: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
"""
calculate utilisation (rho) and mean wait time (q)
Parameters:
mu : float
service rate (patients per hour)
lambdas: np,ndarray
Array of arrival rates (patients per hour)
Returns:
tuple[np.ndarray, np.darray]
arrays of utilisation (rho) and waiting time (Wq) in minutes.
"""
rho = lambdas / mu
wq_hours = rho / (mu - lambdas)
wq_minutes = wq_hours * 60
return rho, wq_minutes
def plot_utilisation_and_wait(mu: float, lambdas: np.ndarray) -> None:
"""
Plot λ vs ρ and Wq
Parameters:
mu : float
Service rate(patients per hour).
lambdas : np.ndarray
Array of arrival rates (patient per hour)
"""
rho, wq = mm1_metrics(mu, lambdas)
fig, ax1 = plt.subplots()
# plot utilisation on left Y axis
ax1.plot(lambdas, rho, 'o-', color='tab:blue', label = 'Utilisation (ρ)')
ax1.set_xlabel("Arrival rate λ (patients/hour)")
ax1.set_ylabel("Utilisation ρ", color='tab:blue')
ax1.tick_params(axis='y', labelcolor='tab:blue')
ax1.set_ylim(0, 1)
# Plot waiting time on the right y-axis
ax2 = ax1.twinx()
ax2.plot(lambdas, wq, 's--', color='tab:red', label='Waiting time Wq (minutes)')
ax2.set_ylabel("Average waiting time Wq (minutes)", color='tab:red')
ax2.tick_params(axis='y', labelcolor='tab:red')
# Title
plt.title("Experiment A1 — NHS GP Same-day Triage (M/M/1 Queue, μ=12/hr)")
# Formatting
fig.tight_layout()
plt.show()
# Parameters
MU = 12.0 # service rate (patients/hour)
lambdas = np.array([i for i in range (4,12)], dtype=float)
# Plot
plot_utilisation_and_wait(MU, lambdas)
This shows that while resource utilisation rises linearly with increased arrival rate, average waiting time increases sharply.
Experiment 2 — Meeting service target¶
In this experiment, the organisation wants to meet a target that 90% of patient triage requests are started within 20 minutes. How long should the triaging GP spend on each case, with increasing arrivals rate (from 4 to 11 per hour)?
This can be determined with the Lambert W fuction: $$\mu = \frac{1}{t}\, W\!\left( \frac{\lambda t}{0.1}\, e^{\lambda t} \right)$$
"""
Experiment A2 — Meeting service target (M/M/1)
-------------------------------------------------
Model a single-GP same-day triage system using an M/M/1 queue.
Plot arrival rate (λ) against service rate required to ensure that
90% of triage requests are started within 30 minutes.
Assumptions:
- Arrival rate λ varies from 4 to 11 patients/hour
- Target: 90% start within 30 minutes
- Exponential inter-arrival and service times
"""
import math
from scipy.special import lambertw
def required_mu(lam: float, t: float, target: float = 0.9) -> float:
"""
Solve for μ in the M/M/1 queue condition:
P(wait ≤ t) = target
where:
1 - (λ/μ) * exp(-(μ - λ)t) = target
"""
p = 1 - target
arg = (lam * t / p) * math.exp(lam * t)
mu = lambertw(arg).real / t
return mu
def plot (serivce_times, lambdas):
plt.figure(figsize=(7, 5))
plt.plot(lambdas, service_times, marker='s', label='Mean service time allowed (min)')
plt.xlabel('Arrival rate λ (patients/hour)')
plt.ylabel('Mean service time (minutes)')
plt.title('Service rate required for 90% of triages to start within 30 minutes')
plt.legend()
plt.grid(True)
plt.show()
# --- Parameters ---
T = 30 / 60 # 30 minutes in hours
target = 0.9
lambdas = np.arange(4, 12) # 4 → 11 arrivals per hour
# --- Compute required μ values ---
mus = np.array([required_mu(lam, T, target) for lam in lambdas])
service_times = 60 / mus # minutes per patient
plot(service_times, lambdas)
Model 2: Completing versus signposting (M/G/1)¶
The triaging GP can choose to fully deal with each triage problem presented, thereby "completing" the case, or to refer the case to another provider to deal with. Completing the case takes longer than referring the case, and the following experiments investigate this.
Experiments A1 and A2 assumed exponential service times (memoryless), giving an M/M/1 queue. That assumption allowed the use of a clean tail formula (via the Lambert) to calculate the service rate needed to meet a service target.
In Experiments 3 and 4 the triaging GP must choose between two service modes, complete or refer, with a general distribution of service times rather than an exponential distribution. therefore the correct single-server model is M/G/1 (“Markovian arrivals / General service / 1 server”).
Experiment 3 - Complete or signpost? (M/G/1)¶
This experient models a single-server triage queue with Poisson arrivals at rate λ per hour, and a two-point service-time distribution:
- Complete with probability $p$ (S = 10 minutes)
- Refer with probability $1-p$ (S = 3 minutes)
The GP wishes to complete as many cases as possible, while meeting the service target that 90% of triage should start within 30 minutes.
The average service time in minutes is: $$\mathbb{E}[S]=10p+3(1-p)=3+7p$$ The effective service rate in hours is: $$\mu_{\text{effective}}(p) = \frac{1}{\mathbb{E}[S]/60}= \frac{60}{3 + 7p}$$ Because $0<=p<=1$: $$p_{\max} = \min\!\left(1,\; \max\!\left(0,\; \frac{60/\mu_{\text{req}} - 3}{7}\right)\right)$$
The maximum proportion of cases that a GP can complete (rather than signpost) is plotted against a range of arrival rates λ.
import math, numpy as np, matplotlib.pyplot as plt
from scipy.special import lambertw
def mu_required(lam, t=30/60, target=0.9):
p = 1 - target
arg = (lam * t / p) * math.exp(lam * t)
return (lambertw(arg).real) / t
def pmax_from_mu(mu_req):
# p <= (60/mu_req - 3)/7, clipped to [0,1]
return np.clip((60/mu_req - 3)/7, 0, 1)
lambdas = np.arange(4, 12)
mu_req = np.array([mu_required(lam) for lam in lambdas])
pmax = pmax_from_mu(mu_req)
#for lam, mu, p in zip(lambdas, mu_req, pmax):
#print(f"λ={lam}/hr → μ_req={mu:.2f}/hr → p_max={p:.3f}")
plt.figure(figsize=(7,4.5))
plt.plot(lambdas, pmax, marker='o')
plt.ylim(0,1); plt.grid(True, alpha=.3)
plt.xlabel('Arrival rate λ (per hour)')
plt.ylabel('Max completion share p')
plt.title('Max % of cases GP can complete while keeping 90% start ≤ 30 min')
plt.show()
Experiment 4 - Mean waiting time (M/G/1)¶
Using the same model a single-server triage queue with Poisson arrivals at rate λ per hour, and a two-point service-time distribution: Complete with probability $𝑝$ (S = 10 minutes) and signpost with probability $1-p$ (S = 3 minutes)
The Pollaczek–Khinchine (M/G/1) formula yields the mean wait in the queue, contingent on arrival rate λ and completion rate $p$
$$W_q = \frac{\lambda\,\mathbb{E}[S^2]_{\mathrm{h}^2}}{2\,(1 - \rho)} \quad \text{(hours)}$$
def mg1_mean_wait_minutes(lam, p):
"""
M/G/1 mean waiting time in queue (minutes) for a 2-point service mix:
S = 10 min with prob p, 3 min with prob (1-p)
lam : float or np.ndarray
Arrival rate λ (per hour)
p : float or np.ndarray in [0,1]
Completion share (10-minute jobs)
"""
ES_min = 3 + 7*p # minutes
ES2_min = 9 + 91*p # minutes^2
ES_h = ES_min / 60 # hours
ES2_h = ES2_min / 3600 # hours^2
rho = lam * ES_h
rho = np.clip(rho, None, 0.999999)
Wq_h = np.where(rho < 1, lam * ES2_h / (2 * (1 - rho)), np.inf) # hours
return 60 * Wq_h # minutes
def mg1_mean_system_time_minutes(lam, p):
ES_min = 3 + 7*p
return mg1_mean_wait_minutes(lam, p) + ES_min
lambdas = np.arange(5, 11) # 4..11 per hour
ps = np.linspace(0, 1, 51)
plt.figure(figsize=(7,4.8))
for lam in lambdas:
Wq = mg1_mean_wait_minutes(lam, ps)
plt.plot(ps, Wq, label=f'λ={lam}/h')
plt.axhline(30, ls='--', alpha=0.4) # visual guide at 30 min (not the SLA metric)
plt.ylim(0,180)
plt.xlabel('Completion share p')
plt.ylabel('Mean wait Wq (minutes)')
plt.title('M/G/1 mean queue wait vs completion share')
plt.legend(ncol=2, fontsize=8)
plt.grid(True, alpha=.3)
plt.show()
Model C: multiple triaging GPs (M/M/s)¶
Adding a second triager affects performance metrics such as mean wait time. Experiments 5 and 6 investigate this.
Experiment 5 — Effect of Adding a Second GP (M/M/2)¶
Objective: Quantify how adding a second GP triager (two servers working in parallel) improves patient wait times and SLA attainment.
Scenario:
Each GP can triage at a mean rate of μ = 12 patients/hour (≈5 min per case).
Patients arrive at rate λ = 4–20 per hour (Poisson).
Service times are exponential.
Model: M/M/2 queue.
The probability that a patient must wait is given by the Erlang C formula for a M/M/s queue:
$$P_W = \frac{ \dfrac{(\lambda / \mu)^s}{s!\,(1 - \rho)} }{ \displaystyle \sum_{n=0}^{s-1} \frac{(\lambda / \mu)^n}{n!} \;+\; \dfrac{(\lambda / \mu)^s}{s!\,(1 - \rho)} }, \qquad \rho = \frac{\lambda}{s\mu}.$$
The Mean wait in queue $Wq$ is:
$$W_q = \frac{P_W}{s\mu - \lambda}$$
def erlang_c(lam, mu, s):
"""
Calculate Erlang C probability (probability that an arrival waits)
for an M/M/s queue.
"""
rho = lam / (s * mu)
if rho >= 1:
return 1.0 # system unstable
# numerator and denominator of Erlang C formula
num = (lam/mu)**s / (math.factorial(s) * (1 - rho))
denom = sum((lam/mu)**n / math.factorial(n) for n in range(s)) + num
return num / denom
def mean_wait_mm_s(lam, mu, s):
"""
Mean waiting time in queue Wq (hours) for M/M/s.
"""
rho = lam / (s * mu)
if rho >= 1:
return np.inf # unstable queue
Pw = erlang_c(lam, mu, s)
return Pw / (s*mu - lam)
# --- parameters ---
mu = 12 # each triager: 12 patients/hour (~5 min)
lambdas = np.arange(4, 31) # arrivals/hour, from 4 to 20
servers = [1, 2, 3, 4] # number of triagers
# --- compute and plot ---
plt.figure(figsize=(8,5))
for s in servers:
Wq_hours = np.array([mean_wait_mm_s(lam, mu, s) for lam in lambdas])
Wq_minutes = 60 * Wq_hours
plt.plot(lambdas, Wq_minutes, marker='o', label=f'{s} triager{"s" if s>1 else ""}')
plt.xlabel('Arrival rate λ (patients/hour)')
plt.ylabel('Mean queue wait $W_q$ (minutes)')
plt.title('Effect of adding triagers (M/M/s queue)')
plt.ylim(0, 120)
plt.legend()
plt.grid(alpha=0.3)
plt.show()
Experiment 6: Capacity planning (M/M/s)¶
The probability that a patient must wait at all, $P_W$, comes from the Erlang C formula for the M/M/s queue: $$P_W = \frac{ \dfrac{(\lambda / \mu)^s}{s!\,(1 - \rho)} }{ \displaystyle \sum_{n=0}^{s-1} \frac{(\lambda / \mu)^n}{n!} \;+\; \dfrac{(\lambda / \mu)^s}{s!\,(1 - \rho)} }, \qquad \rho = \frac{\lambda}{s\mu}.$$
If the patient must wait, then the waiting time is exponentially distributed with rate $s\mu - \lambda$:
$$P(W_q > t \mid \text{must wait}) = e^{-(s\mu - \lambda)t}$$
By total (unconditional) probability:
$$P(W_q \le t) = 1 - P_W\, e^{-(s\mu - \lambda)t}$$
import numpy as np
import math
import matplotlib.pyplot as plt
def erlang_c(lam, mu, s):
"""Erlang C (probability an arrival must wait) for M/M/s."""
rho = lam / (s * mu)
if rho >= 1:
return 1.0 # unstable
a = lam / mu
num = (a**s) / (math.factorial(s) * (1 - rho))
denom = sum((a**n) / math.factorial(n) for n in range(s)) + num
return num / denom
def p_wait_leq_t(lam, mu, s, t_hours):
"""P(Wq ≤ t) for M/M/s."""
if lam >= s * mu:
return 0.0
Pw = erlang_c(lam, mu, s)
rate = s*mu - lam
return 1.0 - Pw * math.exp(-rate * t_hours)
def min_servers_for_sla(lam, mu, t_hours, target=0.9, s_max=20):
"""Find smallest s meeting SLA."""
for s in range(1, s_max+1):
if lam < s * mu:
pw_leq = p_wait_leq_t(lam, mu, s, t_hours)
if pw_leq >= target:
return s
return np.nan
# --- Parameters ---
mu = 12.0 # service rate per triager (patients/hour)
target = 0.90 # SLA: 90% start within t
lambdas = np.arange(4, 41) # arrivals per hour (4–40)
t_30 = 30/60 # 30 min in hours
# --- Compute required servers for each SLA ---
s_30 = [min_servers_for_sla(lam, mu, t_30, target) for lam in lambdas]
# --- Plot ---
plt.figure(figsize=(8,5))
plt.step(lambdas, s_30, where='post', label='90% ≤ 30 min', color='tab:orange')
plt.xlabel('Arrival rate λ (patients/hour)')
plt.ylabel('Minimum triagers (s)')
plt.title('Minimum triagers to meet SLA (M/M/s)')
plt.grid(alpha=0.3)
plt.legend()
plt.yticks(range(1, int(np.nanmax(s_30))+1))
plt.show()