When we founded SpeedVault, we set ourselves an ambitious performance target: average upload speeds of 94 MB/s on standard consumer fibre connections. To put that in perspective, the average cloud storage service delivers somewhere between 12 and 25 MB/s on the same infrastructure. Closing that gap, a roughly 4-8x improvement, required us to rethink the entire upload pipeline from first principles rather than just throw more hardware at the problem.
In this post, I'll walk through the three core technologies that make SpeedVault's upload engine tick: chunked parallel uploads with adaptive concurrency, dynamic stream multiplexing over QUIC, and our edge-optimised routing layer. I'll also share real benchmark data and show you exactly how to use the SpeedVault API to take advantage of these speeds.
Chunked Parallel Uploads
The key insight behind SpeedVault's speed is that a single TCP connection, even one tuned for maximum throughput, is limited by latency, packet loss, and the congestion window dynamics of a single stream. The solution is to split files into independent chunks and upload them in parallel across multiple connections.
Adaptive Chunk Sizing
SpeedVault uses a dynamic chunk sizing algorithm that adjusts based on real-time network conditions:
- Ideal conditions (latency < 20ms, jitter < 5ms): 16 MB chunks — larger chunks mean less overhead from per-chunk metadata and handshakes.
- Moderate conditions (latency 20-80ms): 4 MB chunks — smaller chunks allow faster retransmission when packets are lost.
- High-latency or lossy conditions (> 80ms or > 2% loss): 1 MB chunks — aggressive parallelism compensates for poor link quality.
These thresholds are continuously evaluated throughout the upload. If a user starts an upload on a fast office connection and then moves to congested coffee shop Wi-Fi, SpeedVault detects the change within approximately 2 seconds and readjusts chunk sizes and concurrency accordingly.
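The tiers above can be expressed as a small selection function. This is a minimal sketch, not SpeedVault's actual client code: `LinkStats` and `choose_chunk_size` are hypothetical names, "MB" is assumed to mean MiB, and links that match neither the ideal nor the high-latency tier (for example low latency with high jitter) are assumed to fall into the moderate tier.

```python
from dataclasses import dataclass

MIB = 1024 * 1024  # assumption: the post's "MB" chunk sizes are treated as MiB


@dataclass
class LinkStats:
    """Hypothetical container for the latest network measurements."""
    latency_ms: float
    jitter_ms: float
    loss_pct: float


def choose_chunk_size(stats: LinkStats) -> int:
    """Return a chunk size in bytes for the measured link conditions."""
    # High-latency or lossy links get the smallest chunks.
    if stats.latency_ms > 80 or stats.loss_pct > 2:
        return 1 * MIB
    # Ideal links: low latency and low jitter.
    if stats.latency_ms < 20 and stats.jitter_ms < 5:
        return 16 * MIB
    # Everything else is treated as moderate (an assumption for the
    # cases the tiers do not spell out, such as low latency with high jitter).
    return 4 * MIB
```

Calling this function each time fresh measurements arrive would give the continuous re-evaluation described above; how often the real client samples the link is not specified here beyond the roughly 2-second detection time.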
Concurrency Windowing
Rather than using a fixed number of parallel connections, SpeedVault implements a congestion-aware concurrency window similar to TCP's slow-start algorithm. The client starts with 4 parallel chunk uploads and scales up by 2 each time a chunk completes successfully, up to a maximum of 32 concurrent streams. If any chunk fails or takes longer than 3x the running average completion time, the window is reduced by half.
This approach means SpeedVault automatically finds a suitable level of parallelism for any given network path, saturating the available bandwidth without overwhelming the link. In our testing, this algorithm consistently achieves more than 90% of the throughput that iPerf3 measures on the same connection.
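The windowing rules described above (start at 4, add 2 per successful chunk, cap at 32, halve on failure or on a chunk slower than 3x the running average) can be sketched as a small state machine. The class and method names are hypothetical, and two details are assumptions: a slow chunk is compared against the average of the chunks completed before it, and the window never drops below 1.

```python
class ConcurrencyWindow:
    """Illustrative congestion-aware window for parallel chunk uploads."""

    def __init__(self, initial=4, step=2, maximum=32, slow_factor=3.0):
        self.size = initial
        self.step = step
        self.maximum = maximum
        self.slow_factor = slow_factor
        self._completed = 0
        self._total_time = 0.0

    def on_success(self, duration_s: float) -> None:
        """Record a completed chunk and grow or shrink the window."""
        # Running average of the chunks that finished before this one.
        average = self._total_time / self._completed if self._completed else None
        self._completed += 1
        self._total_time += duration_s
        if average is not None and duration_s > self.slow_factor * average:
            self._halve()  # completed, but far slower than usual
        else:
            self.size = min(self.maximum, self.size + self.step)

    def on_failure(self) -> None:
        """A failed chunk halves the window."""
        self._halve()

    def _halve(self) -> None:
        # Floor of 1 is an assumption so the upload can always make progress.
        self.size = max(1, self.size // 2)
```

For example, starting from a fresh window, two chunks that each take 1 second grow the window from 4 to 8, and a third chunk taking 10 seconds halves it back to 4.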
Dynamic Stream Multiplexing Over QUIC
While parallel HTTP/2 connections over TCP work well, we chose to build our upload protocol on top of QUIC (RFC 9000) for several reasons that directly impact upload performance:
- Head-of-line blocking elimination: When several streams share one TCP connection (as with HTTP/2), a single lost packet stalls all of them until it is retransmitted, because TCP delivers bytes strictly in order. QUIC tracks each stream independently, so a lost packet only delays the specific chunk it belongs to.
- Zero-RTT resumption: Within a session, new QUIC streams open without any additional handshake, and a client reconnecting to a server it has talked to before can send upload data in its first flight (0-RTT) instead of waiting for a full handshake.
- Connection migration: QUIC connections survive network changes. If a user switches from Wi-Fi to cellular mid-upload, the connection continues seamlessly without re-authentication.
- User-space congestion control: We implement a custom congestion control algorithm tuned for interactive upload workloads, rather than relying on kernel-space TCP stacks that optimise for bulk download.
Behind the numbers: In internal benchmarks, migrating from TCP/TLS 1.3 to QUIC improved upload throughput by an average of 23% on lossy networks (packet loss > 1%). On connections with no packet loss, the improvement was 8%, primarily from reduced handshake overhead and better stream multiplexing.
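To make the head-of-line blocking point concrete, here is a deliberately simplified toy model, not a protocol implementation. It assumes packet i arrives at time i, that exactly one packet is lost and arrives only after a retransmission delay, and that data reaches the application only once every earlier packet in the same ordering domain has arrived. The function name and parameters are illustrative.

```python
from typing import Dict, List


def stream_completion_times(
    packets: List[str],
    lost_index: int,
    retransmit_delay: int,
    shared_ordering: bool,
) -> Dict[str, int]:
    """Return the time at which each stream's last packet is delivered.

    packets lists the stream id of each packet in send order.
    shared_ordering=True models one ordered byte stream (TCP-like);
    False models independent per-stream ordering (QUIC-like).
    """
    # Packet i arrives at time i, except the lost one, which arrives late.
    arrival = list(range(len(packets)))
    arrival[lost_index] = lost_index + retransmit_delay

    latest_arrival: Dict[str, int] = {}
    completion: Dict[str, int] = {}
    for i, stream in enumerate(packets):
        # With shared ordering, every packet waits on all earlier packets.
        domain = "*" if shared_ordering else stream
        latest_arrival[domain] = max(latest_arrival.get(domain, 0), arrival[i])
        completion[stream] = latest_arrival[domain]
    return completion
```

With packets `["a", "b", "c", "a", "b", "c"]`, the first packet lost, and a retransmission delay of 10, the shared-ordering model finishes all three streams at time 10, while the per-stream model finishes `b` at 4 and `c` at 5 and only `a` waits until 10.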
Edge-Optimised Routing
Even the best client-side upload protocol can be bottlenecked by slow routing to the destination server. SpeedVault operates 12 Points of Presence (PoPs) across six continents, but we don't just route traffic to the geographically nearest PoP — we use a real-time performance measurement system to choose the optimal path.
Probe-Based Path Selection
When a SpeedVault client starts an upload session, it sends lightweight probe packets to each PoP in a selected subset (based on geography, the client probes 3-5 PoPs). Each probe measures round-trip time, throughput, and packet loss. The client then selects the PoP that offers the best measured throughput — which is not always the nearest one.
In practice, we frequently observe that a slightly further PoP (e.g., 80ms vs 40ms RTT) can deliver 2x better throughput because it has less congested uplinks or better peering agreements with the user's ISP. Our system captures and acts on these differences.
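The selection step can be sketched in a few lines. `ProbeResult`, `select_pop`, and the PoP names are hypothetical; picking the highest measured throughput follows the description above, while breaking ties on lower RTT is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical record of one probe against one PoP."""
    pop: str
    rtt_ms: float
    throughput_mb_s: float
    loss_pct: float


def select_pop(results: Iterable[ProbeResult]) -> ProbeResult:
    """Pick the PoP with the best measured throughput."""
    candidates = list(results)
    if not candidates:
        raise ValueError("no probe results to choose from")
    # Highest throughput wins; lower RTT breaks ties (tie-break is an assumption).
    return max(candidates, key=lambda r: (r.throughput_mb_s, -r.rtt_ms))


# Usage: the nearer PoP loses because the farther one measures twice the throughput.
near = ProbeResult(pop="pop-near", rtt_ms=40, throughput_mb_s=45.0, loss_pct=0.5)
far = ProbeResult(pop="pop-far", rtt_ms=80, throughput_mb_s=90.0, loss_pct=0.1)
best = select_pop([near, far])
```

Here `best.pop` is `"pop-far"`, matching the 80 ms versus 40 ms scenario described above.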
Anycast + DNS Load Balancing
SpeedVault's ingestion endpoints use Anycast routing with health-check aware DNS. Each PoP advertises the same IP address block, and BGP routing automatically directs users to the nearest available PoP. We augment this with DNS-based steering that accounts for PoP capacity and current load, distributing upload traffic evenly to avoid any single PoP becoming a bottleneck.
Real-World Benchmarks
Our published 94 MB/s figure is the average across all uploads on our platform — not a cherry-picked best-case scenario. Here's how performance breaks down by region and file size, based on data from March 2026:
| Region | Avg Upload Speed | P95 Latency | Sample Size |
|---|---|---|---|
| North America | 102 MB/s | 18 ms | 1.2M uploads |
| Europe | 98 MB/s | 22 ms | 890K uploads |
| Asia-Pacific | 87 MB/s | 45 ms | 540K uploads |
| South America | 76 MB/s | 68 ms | 210K uploads |
| Oceania | 82 MB/s | 52 ms | 95K uploads |
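The regional rows can be combined into a single upload-weighted average as a quick cross-check. The sketch below simply hard-codes the table above; the variable and function names are illustrative.

```python
# (average speed in MB/s, number of uploads), copied from the regional table
REGIONS = {
    "North America": (102, 1_200_000),
    "Europe": (98, 890_000),
    "Asia-Pacific": (87, 540_000),
    "South America": (76, 210_000),
    "Oceania": (82, 95_000),
}


def weighted_average_speed(rows: dict) -> float:
    """Average speed across regions, weighted by upload count."""
    total_uploads = sum(count for _, count in rows.values())
    weighted_sum = sum(speed * count for speed, count in rows.values())
    return weighted_sum / total_uploads


overall = weighted_average_speed(REGIONS)
```

For the five rows listed, `overall` comes to roughly 95.5 MB/s, in the same range as the 94 MB/s platform-wide figure.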
Speed by file size:
| File Size | Median Speed | Avg Speed | Notes |
|---|---|---|---|
| 0-10 MB | 45 MB/s | 52 MB/s | Connection overhead dominates |
| 10-100 MB | 88 MB/s | 91 MB/s | Full chunk window utilisation |
| 100 MB-1 GB | 96 MB/s | 98 MB/s | Optimal steady-state throughput |
| 1 GB+ | 101 MB/s | 104 MB/s | Sustained high throughput |
Using the SpeedVault API
SpeedVault's upload engine is exposed through a simple RESTful API. Our client libraries and CLI use the same API, so third-party integrations get the same performance automatically. Here's a minimal example using curl. Note that for REST API uploads, SpeedVault handles chunking and parallelism transparently on the server side:

```bash
# Upload a file using SpeedVault's direct upload endpoint.
# SV_API_BASE is a placeholder for the API base URL, which this post does not specify.
curl -X POST "$SV_API_BASE/v1/files/upload" \
  -H "Authorization: Bearer $SV_API_TOKEN" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @large-file.bin \
  --progress-bar | jq '{
    id: .file_id,
    name: .file_name,
    size: .file_size,
    uploaded_at: .created_at
  }'
```
For maximum performance with large files, we recommend using our multipart upload API, which gives you direct control over chunking and concurrency:

```bash
# SV_API_BASE is a placeholder for the API base URL, which this post does not specify.

# Step 1: Initiate multipart upload (1 GiB file, 8 MiB parts = 128 parts)
curl -X POST "$SV_API_BASE/v1/files/multipart" \
  -H "Authorization: Bearer $SV_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "large-backup.tar.gz",
    "file_size": 1073741824,
    "part_size": 8388608
  }'
# Response: {"upload_id": "upl_abc123", ...}

# Step 2: Upload each part in parallel
# (In practice, your client does this concurrently.)
# Assumes the file has already been split into part-1.bin ... part-128.bin.
for i in $(seq 1 128); do
  curl -X PUT "$SV_API_BASE/v1/files/multipart/upl_abc123/part/$i" \
    -H "Authorization: Bearer $SV_API_TOKEN" \
    -H "Content-Type: application/octet-stream" \
    --data-binary "@part-$i.bin" &
done
wait

# Step 3: Complete the upload
curl -X POST "$SV_API_BASE/v1/files/multipart/upl_abc123/complete" \
  -H "Authorization: Bearer $SV_API_TOKEN"
```
The multipart API supports up to 1,000 parts per file, with each part ranging from 5 MB to 5 GB. Our client libraries (available for Python, Node.js, Go, and Rust) handle the entire lifecycle automatically, including retry logic, checksum verification, and adaptive concurrency.
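Before initiating a multipart upload, it can help to check a planned part size against these limits. The helpers below are a sketch with hypothetical names, and they assume "MB" and "GB" mean MiB and GiB; if the API uses decimal units, the constants would need adjusting.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Limits from the post; binary units are an assumption.
MIN_PART_SIZE = 5 * MIB
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 1000


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def plan_parts(file_size: int, part_size: int) -> int:
    """Return the number of parts, or raise if the plan breaks a limit."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    parts = _ceil_div(file_size, part_size)
    if parts > MAX_PARTS:
        raise ValueError(f"{parts} parts exceeds the {MAX_PARTS}-part limit")
    return parts


def smallest_valid_part_size(file_size: int) -> int:
    """Smallest part size that keeps the upload within the part-count limit."""
    size = max(MIN_PART_SIZE, _ceil_div(file_size, MAX_PARTS))
    if size > MAX_PART_SIZE:
        raise ValueError("file is too large for a single multipart upload")
    return size
```

For the example above, `plan_parts(1073741824, 8388608)` returns 128, which matches the loop over parts 1 to 128.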
Conclusion
SpeedVault's 94 MB/s average upload speed isn't the result of a single magic bullet — it's the product of careful engineering across the entire upload path: adaptive chunking that responds to network conditions in real time, QUIC-based multiplexing that eliminates head-of-line blocking, and an intelligent routing layer that picks the fastest path for every upload.
These optimisations matter because upload speed directly impacts user productivity. For a design team pushing 4K video assets, a 4x speed improvement turns a 30-minute wait into one of about seven and a half minutes. For a law firm uploading case files before a deadline, it turns anxiety into confidence.
Of course, our focus on speed never comes at the expense of security. All the data transferred through this pipeline is encrypted client-side with AES-256-GCM before it ever leaves your device. The encryption itself adds negligible overhead — our SIMD-optimised AES-NI implementation processes data at over 2 GB/s on modern CPUs.
We're continuously working on pushing these numbers higher. Our next major milestone is 120 MB/s average upload speeds, which we expect to reach in Q3 2026 through improvements to our congestion control algorithm and the deployment of additional PoPs in Africa and the Middle East.
If you'd like to experience the difference yourself, sign up for a free account — no credit card required. You'll see the speed from your first upload.