Site Reliability Engineer
Own the reliability of the routing layer that moves €100M+ a month for partner exchanges and wallets — from sub-3.5s route selection latency to 24/7 incident response.
About the role
UnifyVerse is the routing layer above cross-chain aggregators and bridges. We evaluate 15+ aggregators across 16+ chains per request and have routed €872M+ for partner integrations since launch. We run lean, and every commit ships to a production system that moves real money.
We are hiring a Site Reliability Engineer to take ownership of the systems that keep that pipeline fast, observable, and recoverable. You will work directly with the CTO and the core routing team to harden production, build out our observability stack, and shape the reliability practices we bring into SOC 2 Type II and ISO 27001.
This is not a ticket-shop SRE role. You will have leverage over architecture decisions and own the on-call rotation as we scale partner traffic.
What you'll do
- Own production reliability for the routing engine, RPC fleet, and partner-facing API — uptime, latency budgets, and error SLOs.
- Design and operate the observability stack: metrics, structured logs, distributed tracing, and routing-decision audit trails.
- Build and maintain the CI/CD, infrastructure-as-code, and progressive-delivery pipelines (canary, blue/green, automated rollback).
- Run capacity planning and cost controls across multi-chain RPC providers, compute, and managed databases.
- Lead incident response: triage, mitigation, blameless postmortems, and follow-through on action items.
- Define and operate the on-call rotation, runbooks, and escalation policy for a small but growing team.
- Harden the platform against the threat model of a financial-infrastructure operator: supply-chain integrity, secret management, key rotation, and least-privilege access.
- Contribute to the controls evidence and operational documentation that underpin our SOC 2 and ISO 27001 roadmap.
What we're looking for
- 5+ years operating production systems with meaningful uptime and latency requirements (financial, payments, infrastructure, or comparable).
- Deep experience with either Kubernetes or a serious VM/orchestration setup, and with infrastructure-as-code (Terraform, Pulumi, or equivalent).
- Fluency with a modern observability stack: Prometheus, Grafana, OpenTelemetry, Loki/ELK, or comparable tooling — and a clear philosophy for SLOs and alerting.
- Strong scripting and systems programming skills in at least one of Go, Rust, TypeScript, or Python — enough to read and modify the services you operate.
- Hands-on experience with Postgres and at least one cache/queue (Redis, NATS, Kafka), including operating them under load.
- Practiced incident commander — you have led production incidents and written postmortems people actually read.
- Comfortable carrying primary on-call in a small team and shaping the rotation as we grow.
- Eligible to work in the Netherlands or another EU country.
Nice to have
- Production experience with EVM RPC infrastructure, mempool/relayer operations, or MEV-aware execution.
- Background operating regulated fintech or payments infrastructure under a controls regime (SOC 2, ISO 27001, PCI, or equivalent).
- Experience with HSM-backed key management, multi-sig operations, or threshold signing.
- Familiarity with chaos engineering and load-testing methodologies.
What we offer
- Direct ownership of production at a company where reliability is the product — not a cost center.
- Senior-level compensation with equity, calibrated to the Amsterdam/Utrecht market and your experience.
- Hybrid working out of our Utrecht office; EU-remote considered for the right candidate.
- A small, senior team. No layers between you and architectural decisions.
- Budget for the tools, training, and conference attendance the role actually needs.
How to apply
Send a short note on why this role interests you, along with a CV or a link to your work, to info@unifyverse.exchange. We read everything and reply within one working week. No recruiters, please.