How We Achieve 94 MB/s Average Upload Speeds

Alex Chen, CTO · March 12, 2026 · 8 min read

When we founded SpeedVault, we set ourselves an ambitious performance target: average upload speeds of 94 MB/s (roughly 750 Mbit/s) on standard consumer fibre connections. To put that in perspective, the average cloud storage service delivers somewhere between 12 and 25 MB/s on the same infrastructure. Achieving a roughly 4-8x improvement required us to rethink the entire upload pipeline from first principles, not just throw more hardware at the problem.

In this post, I'll walk through the three core technologies that make SpeedVault's upload engine tick: chunked parallel uploads with adaptive concurrency, dynamic stream multiplexing over QUIC, and our edge-optimised routing layer. I'll also share real benchmark data and show you exactly how to use the SpeedVault API to take advantage of these speeds.

Chunked Parallel Uploads

The key insight behind SpeedVault's speed is that a single TCP connection, even one tuned for maximum throughput, is limited by latency, packet loss, and the congestion window dynamics of a single stream. The solution is to split files into independent chunks and upload them in parallel across multiple connections.

Adaptive Chunk Sizing

SpeedVault uses a dynamic chunk sizing algorithm that adjusts based on real-time network conditions:

  • Ideal conditions (latency < 20ms, jitter < 5ms): 16 MB chunks — larger chunks mean less overhead from per-chunk metadata and handshakes.
  • Moderate conditions (latency 20-80ms): 4 MB chunks — smaller chunks allow faster retransmission when packets are lost.
  • High-latency or lossy conditions (> 80ms or > 2% loss): 1 MB chunks — aggressive parallelism compensates for poor link quality.

These thresholds are evaluated continuously throughout the upload. If a user starts an upload on a fast office connection and then moves to congested coffee-shop Wi-Fi, SpeedVault detects the change within approximately 2 seconds and readjusts chunk sizes and concurrency accordingly.
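
The thresholds above can be sketched as a small classification function. This is a minimal illustration, not SpeedVault's actual implementation: the name `choose_chunk_size`, its parameters, the use of binary megabytes, and the handling of cases the list leaves open (for example, low latency with high jitter is treated as moderate here) are all assumptions.

```python
MIB = 1024 * 1024  # assumption: "MB" in the tiers is read as binary megabytes


def choose_chunk_size(latency_ms: float, jitter_ms: float, loss_rate: float) -> int:
    """Return a chunk size in bytes for the measured network conditions.

    loss_rate is a fraction (0.02 means 2% packet loss).
    """
    # High-latency or lossy link: small chunks are cheap to retransmit.
    if latency_ms > 80 or loss_rate > 0.02:
        return 1 * MIB
    # Ideal link: large chunks amortise per-chunk overhead.
    if latency_ms < 20 and jitter_ms < 5:
        return 16 * MIB
    # Everything else is treated as moderate (an assumption for the
    # combinations the thresholds do not spell out).
    return 4 * MIB
```

A client would call this function each time it refreshes its network measurements and apply the result to chunks that have not yet been started.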

Concurrency Windowing

Rather than using a fixed number of parallel connections, SpeedVault implements a congestion-aware concurrency window modelled on TCP congestion control: it ramps up quickly, much like slow start, and backs off multiplicatively. The client starts with 4 parallel chunk uploads and grows the window by 2 each time a chunk completes successfully, up to a maximum of 32 concurrent streams. If any chunk fails or takes longer than 3x the running average completion time, the window is halved.

This approach means SpeedVault automatically finds a suitable level of parallelism for any given network path, saturating the available bandwidth without overwhelming the link. In our testing, this algorithm consistently achieves more than 90% of the throughput that iPerf3 measures on the same connection.
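
The windowing rules described in this section can be sketched as a small state machine. This is an illustrative sketch only: the class name `ConcurrencyWindow`, the method `on_chunk_done`, the floor of 1 on the window, and the choice to compare each chunk against the average of earlier successful chunks are assumptions, not details taken from SpeedVault's client.

```python
class ConcurrencyWindow:
    """Tracks how many chunk uploads may run in parallel."""

    def __init__(self, start=4, step=2, maximum=32, slow_factor=3.0):
        self.size = start
        self.step = step
        self.maximum = maximum
        self.slow_factor = slow_factor
        self._completed = 0
        self._total_seconds = 0.0

    def on_chunk_done(self, seconds, ok=True):
        """Record one finished chunk and return the new window size."""
        # Running average over the successful chunks seen so far.
        average = (self._total_seconds / self._completed
                   if self._completed else None)
        too_slow = average is not None and seconds > self.slow_factor * average
        if not ok or too_slow:
            # Failure or straggler: halve the window (floor of 1 is assumed).
            self.size = max(1, self.size // 2)
        else:
            # Success: grow by `step`, capped at the maximum.
            self.size = min(self.maximum, self.size + self.step)
        if ok:
            # Only successful chunks feed the running average.
            self._completed += 1
            self._total_seconds += seconds
        return self.size
```

An upload loop would call `on_chunk_done` after every chunk and start new chunk uploads only while fewer than `size` are in flight.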

Dynamic Stream Multiplexing Over QUIC

While parallel HTTP/2 connections over TCP work well, we chose to build our upload protocol on top of QUIC (RFC 9000) for several reasons that directly impact upload performance:

  • Head-of-line blocking elimination: When streams are multiplexed over a single TCP connection, as in HTTP/2, one lost packet stalls every stream until it is retransmitted. QUIC delivers each stream independently, so a lost packet only delays the chunk it belongs to.
  • Zero-RTT resumption: A client that has connected to the server before can resume the session and send data in its first flight, and new streams on an established connection need no handshake at all. Users uploading multiple files in a session therefore pay the full handshake cost only once.
  • Connection migration: QUIC connections survive network changes. If a user switches from Wi-Fi to cellular mid-upload, the connection continues seamlessly without re-authentication.
  • User-space congestion control: We implement a custom congestion control algorithm tuned for interactive upload workloads, rather than relying on kernel-space TCP stacks that optimise for bulk download.

Behind the numbers: In internal benchmarks, migrating from TCP/TLS 1.3 to QUIC improved upload throughput by an average of 23% on lossy networks (packet loss > 1%). On connections with no packet loss, the improvement was 8%, primarily from reduced handshake overhead and better stream multiplexing.

Edge-Optimised Routing

Even the best client-side upload protocol can be bottlenecked by slow routing to the destination server. SpeedVault operates 12 Points of Presence (PoPs) across six continents, but we don't just route traffic to the geographically nearest PoP — we use a real-time performance measurement system to choose the optimal path.

Probe-Based Path Selection

When a SpeedVault client starts an upload session, it sends lightweight probe packets to each PoP in a selected subset (based on geography, the client probes 3-5 PoPs). Each probe measures round-trip time, throughput, and packet loss. The client then selects the PoP that offers the best measured throughput — which is not always the nearest one.

In practice, we frequently observe that a slightly further PoP (e.g., 80ms vs 40ms RTT) can deliver 2x better throughput because it has less congested uplinks or better peering agreements with the user's ISP. Our system captures and acts on these differences.
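
The selection step can be sketched as follows. This is a hedged illustration rather than the production logic: `ProbeResult`, its field names, `select_pop`, and the tie-break on lower RTT are hypothetical, and the real client may weigh the measurements differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeResult:
    pop: str                # PoP identifier (hypothetical naming)
    rtt_ms: float           # measured round-trip time
    throughput_mb_s: float  # measured throughput in MB/s
    loss_rate: float        # fraction of probe packets lost


def select_pop(results: Sequence[ProbeResult]) -> ProbeResult:
    """Pick the PoP with the best measured throughput.

    Lower RTT breaks ties (an assumption; the post only states that
    throughput decides).
    """
    if not results:
        raise ValueError("no probe results")
    return max(results, key=lambda r: (r.throughput_mb_s, -r.rtt_ms))
```

With one probe reporting 40 ms RTT at 50 MB/s and another reporting 80 ms RTT at 100 MB/s, `select_pop` returns the second, matching the example in the paragraph above.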

Anycast + DNS Load Balancing

SpeedVault's ingestion endpoints use Anycast routing with health-check aware DNS. Each PoP advertises the same IP address block, and BGP routing automatically directs users to the nearest available PoP. We augment this with DNS-based steering that accounts for PoP capacity and current load, distributing upload traffic evenly to avoid any single PoP becoming a bottleneck.

Real-World Benchmarks

Our published 94 MB/s figure is the average across all uploads on our platform — not a cherry-picked best-case scenario. Here's how performance breaks down by region and file size, based on data from March 2026:

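| Region        | Avg Upload Speed | P95 Latency | Sample Size  |
|---------------|------------------|-------------|--------------|
| North America | 102 MB/s         | 18 ms       | 1.2M uploads |
| Europe        | 98 MB/s          | 22 ms       | 890K uploads |
| Asia-Pacific  | 87 MB/s          | 45 ms       | 540K uploads |
| South America | 76 MB/s          | 68 ms       | 210K uploads |
| Oceania       | 82 MB/s          | 52 ms       | 95K uploads  |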
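
As a quick sanity check, the regional averages can be combined into a single figure by weighting each one by its sample size. The sketch below uses only the numbers from the table above; the variable names are illustrative.

```python
# (average MB/s, number of uploads) per region, copied from the table.
regions = {
    "North America": (102, 1_200_000),
    "Europe": (98, 890_000),
    "Asia-Pacific": (87, 540_000),
    "South America": (76, 210_000),
    "Oceania": (82, 95_000),
}

total_uploads = sum(n for _, n in regions.values())
weighted_avg = sum(speed * n for speed, n in regions.values()) / total_uploads

print(f"{weighted_avg:.1f} MB/s over {total_uploads:,} uploads")
# prints "95.5 MB/s over 2,935,000 uploads"
```

The five listed regions on their own come out at about 95.5 MB/s, in the same range as the platform-wide 94 MB/s average.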

Speed by file size distribution:

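| File Size   | Median Speed | Avg Speed | Notes                           |
|-------------|--------------|-----------|---------------------------------|
| 0-10 MB     | 45 MB/s      | 52 MB/s   | Connection overhead dominates   |
| 10-100 MB   | 88 MB/s      | 91 MB/s   | Full chunk window utilisation   |
| 100 MB-1 GB | 96 MB/s      | 98 MB/s   | Optimal steady-state throughput |
| 1 GB+       | 101 MB/s     | 104 MB/s  | Sustained high throughput       |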

Using the SpeedVault API

SpeedVault's upload engine is exposed through a simple RESTful API. Our client libraries and CLI use the same API, so third-party integrations get the same performance automatically. Here's a minimal example using curl. Note that a direct upload is a single request, so SpeedVault handles chunking and parallelism transparently on the server side for REST API uploads:

# Upload a file using SpeedVault's direct upload endpoint.
# SV_API_BASE is a placeholder for the API base URL, which this post does not specify.
curl -X POST "$SV_API_BASE/v1/files/upload" \
  -H "Authorization: Bearer $SV_API_TOKEN" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @large-file.bin \
  --progress-bar | jq '{
    id: .file_id,
    name: .file_name,
    size: .file_size,
    uploaded_at: .created_at
  }'

For maximum performance with large files, we recommend using our multipart upload API, which gives you direct control over chunking and concurrency:

# SV_API_BASE is a placeholder for the API base URL, which this post does not specify.

# Step 1: Initiate multipart upload
curl -X POST "$SV_API_BASE/v1/files/multipart" \
  -H "Authorization: Bearer $SV_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file_name": "large-backup.tar.gz",
    "file_size": 1073741824,
    "part_size": 8388608
  }'

# Response: {"upload_id": "upl_abc123", ...}

# Step 2: Upload the parts in parallel, at most 8 at a time
# (8 is an arbitrary example limit).
# Assumes the file is already split into part-1.bin ... part-128.bin
# (1073741824 / 8388608 = 128 parts).
seq 1 128 | xargs -P 8 -I{} curl -sS --fail -X PUT \
  "$SV_API_BASE/v1/files/multipart/upl_abc123/part/{}" \
  -H "Authorization: Bearer $SV_API_TOKEN" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@part-{}.bin"

# Step 3: Complete the upload
curl -X POST "$SV_API_BASE/v1/files/multipart/upl_abc123/complete" \
  -H "Authorization: Bearer $SV_API_TOKEN"

The multipart API supports up to 1,000 parts per file, with each part ranging from 5 MB to 5 GB. Our client libraries (available for Python, Node.js, Go, and Rust) handle the entire lifecycle automatically, including retry logic, checksum verification, and adaptive concurrency.
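
To see how these limits interact, the following sketch plans the parts for a given file and part size. It is an illustration under stated assumptions, not part of any SpeedVault client library: the function name `plan_parts` is hypothetical, the limits are read as binary units, and the final part is allowed to be shorter than the minimum, which the post does not confirm.

```python
from typing import List, Tuple

MIN_PART = 5 * 1024**2  # 5 MB lower bound (binary units assumed)
MAX_PART = 5 * 1024**3  # 5 GB upper bound (binary units assumed)
MAX_PARTS = 1000


def plan_parts(file_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) for each part, numbered from 1."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size outside the 5 MB to 5 GB range")
    count = -(-file_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError(f"{count} parts exceeds the {MAX_PARTS}-part limit")
    # The last part carries whatever remains and may be shorter (assumed allowed).
    return [
        (i + 1, i * part_size, min(part_size, file_size - i * part_size))
        for i in range(count)
    ]
```

For the 1,073,741,824-byte file and 8,388,608-byte parts used in the curl example, this yields 128 parts. A 10 GB file at the same part size would need 1,280 parts and is rejected, so larger files call for a larger part size.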

Conclusion

SpeedVault's 94 MB/s average upload speed isn't the result of a single magic bullet — it's the product of careful engineering across the entire upload path: adaptive chunking that responds to network conditions in real time, QUIC-based multiplexing that eliminates head-of-line blocking, and an intelligent routing layer that picks the fastest path for every upload.

These optimisations matter because upload speed directly impacts user productivity. For a design team pushing 4K video assets, a 4x speed improvement turns a 30-minute wait into a 7.5-minute one. For a law firm uploading case files before a deadline, it turns anxiety into confidence.

Of course, our focus on speed never comes at the expense of security. All data transferred through this pipeline is encrypted client-side with AES-256-GCM before it ever leaves your device. The encryption itself adds negligible overhead: our SIMD-optimised AES-NI implementation processes data at over 2 GB/s on modern CPUs.

We're continuously working on pushing these numbers higher. Our next major milestone is 120 MB/s average upload speeds, which we expect to reach in Q3 2026 through improvements to our congestion control algorithm and the deployment of additional PoPs in Africa and the Middle East.

If you'd like to experience the difference yourself, sign up for a free account — no credit card required. You'll see the speed from your first upload.

Alex Chen

CTO & Co-Founder at SpeedVault

Alex is the co-founder and CTO of SpeedVault. Previously, he led infrastructure engineering at a major CDN provider and holds several patents in distributed systems and transport protocol optimisation. He writes about system design, performance engineering, and the challenges of building secure, scalable infrastructure.