The Opportunity
This is a great contract opportunity for a Senior Site Reliability Engineer with a strong database focus to join the digital team of one of Australia's largest consumer-facing platforms.
You will play a key role in owning and improving the reliability, observability, and performance of critical backend systems used by millions of Australians every day. The role sits within the W-Digital functional group and reports to the Site Reliability Engineering team within the Loyalty vertical.
About the Role
As a Senior SRE with a database focus, you will bridge software engineering and IT operations, applying SRE principles to build and maintain resilient, observable systems. You will be the go-to escalation point for complex database and infrastructure incidents and will help build a culture of blameless post-mortems, toil reduction, and continuous improvement.
Day-to-day, this role involves:
- Achieving and maintaining high-availability targets through robust monitoring and observability
- Leading database performance tuning, query optimisation, indexing strategies, and capacity planning
- Designing and evolving end-to-end observability solutions covering monitoring, logging, tracing, and alerting
- Responding to incidents on a rostered on-call basis and driving actionable post-mortems
- Delivering measurable improvements in database scalability, latency, and cost efficiency
- Mentoring engineering teams on SRE best practices, database reliability, and observability principles
- Collaborating with Software Engineering, Platform, Data, and Security teams to embed reliability into the development lifecycle
Essential experience:
- 5+ years in a Site Reliability Engineer or Database Engineer capacity in large-scale, high-traffic production environments
- Advanced proficiency in relational databases (PostgreSQL or MySQL) and/or NoSQL databases (MongoDB, Cassandra, Redis)
- Deep expertise in query optimisation, replication, sharding, backup/restore, high-availability setups, and database-specific observability (slow query logs, pg_stat_statements, EXPLAIN analysis)
- Hands-on experience designing and implementing observability solutions using tools such as Prometheus, Grafana, OpenTelemetry, Dynatrace, Datadog, or New Relic
- Strong SRE fundamentals including error budgets, SLIs/SLOs/SLAs, toil reduction, and blameless post-mortems
- Proficiency with cloud platforms (GCP or Azure) and managed database services
- Strong scripting and automation skills in Python, Bash, Go, or similar
- Experience with infrastructure tooling including Terraform, Ansible, Kubernetes, and Docker
Nice to have:
- Cloud-native application development
- Service integration patterns and distributed systems architecture
- DevSecOps practices
Beyond the technical skills, we're looking for someone who takes ownership seriously, keeps a cool head during incidents, and communicates well across teams. If you enjoy mentoring others and have a genuine interest in building reliable systems at scale, this role will suit you well.
How to Apply
Please apply with your updated CV and we will be in touch. For a confidential chat about the role, feel free to reach out directly.
Applications are being reviewed on a rolling basis, so we'd encourage you to get in touch sooner rather than later.
