Technical · March 30, 2026 · 10 min read

How We Built a 50-User WebGL Training Simulator Without Dedicated Servers

The client's procurement team had one hard requirement before signing: no dedicated server infrastructure. Their IT department had spent two years fighting to decommission legacy server overhead, and they weren't going to approve a training tool that added it back. The L&D team had a separate hard requirement: 50 employees, same session, same virtual factory floor, at once.

Those two constraints sound like they cancel each other out. They don't — but making them work together forced us to make architectural decisions we hadn't had to justify this explicitly before.

This is the story of how we built that system for a manufacturing client's multi-user onboarding simulator. No dedicated game servers. No Unity Relay at scale costs. Fifty concurrent users in a browser tab.


Why the Default Answer (Dedicated Servers) Fails Enterprise Training Budgets

When most developers hear "50 concurrent users in a 3D environment," they reach for dedicated authoritative servers. It's the correct instinct for a competitive game. It's the wrong instinct for enterprise training.

Here's the math problem. Premium game server hosts in 2026 charge between $1 and $5 per gigabyte of RAM. A 50-user session needs roughly 2–4 GB of RAM and a moderate CPU allocation. That's $200–500 per month minimum, before traffic costs, before redundancy, before the DevOps time to manage it. Multiply that across multiple regional deployments so your manufacturing client in three countries gets acceptable latency, and you're looking at a five-figure annual infrastructure commitment — for a training tool that runs maybe 20 sessions a week.

L&D budgets don't work that way. Training buyers are comparing you against an LMS license that costs $8 per user per month. You cannot walk in with a proposal that includes an open-ended server bill.

So we didn't.


The Three Options We Actually Considered

We evaluated three architectures seriously. Not theoretically — we prototyped each one.

Pure peer-to-peer. One participant acts as host, others connect directly. Zero server costs, instant deployment. We killed this option in week two. At 50 users, the host's connection becomes the single point of failure for 49 other people. One employee on a hotel Wi-Fi network hosting a session tanks the experience for the entire cohort. IP addresses are also exposed between participants, which a manufacturing client's security team will reject on sight.

Authoritative dedicated servers. Gold standard for consistency and anti-cheat. Also the option we just explained costs too much for this use case. Eliminated.

Relay-based hybrid with WebRTC. Peers connect to each other where they can establish direct connections. Where they can't — due to firewalls, NAT, or corporate network restrictions that are extremely common in manufacturing environments — traffic routes through lightweight relay nodes. The relay forwards packets. It doesn't store state. It doesn't run game logic. That distinction is what makes it cheap.

This is what we built.


The Architecture, Specifically

The WebGL multiplayer training simulator runs on a mesh of WebRTC data channels between clients, with a signaling server handling the initial handshake and TURN relay servers handling traffic for peers that can't connect directly.

Signaling layer. A minimal WebSocket server — we're talking under 200 lines of Node.js — brokers the initial connection between peers. It passes session descriptions and ICE candidates between participants long enough for them to establish their own channels. Once the WebRTC handshake completes, the signaling server is mostly idle. It doesn't touch training data.
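The routing core of that signaling server is small enough to sketch in full. This is an illustrative, in-memory version, not the shipped code: the transport (a WebSocket server such as the `ws` package) is stubbed out as a plain callback so the forwarding logic stands on its own, and the class and message names are placeholders.

```javascript
// Minimal in-memory signaling router: forwards SDP offers/answers and
// ICE candidates between registered peers. In production each `send`
// callback wraps a WebSocket; here it is just a function, so the
// routing logic is testable without a network.
class SignalingRouter {
  constructor() {
    this.peers = new Map(); // peerId -> send callback
  }

  // A peer joins the session and provides a way to receive messages.
  register(peerId, send) {
    this.peers.set(peerId, send);
  }

  unregister(peerId) {
    this.peers.delete(peerId);
  }

  // Forward a signaling message ({ from, to, type, payload }) to its
  // target. Types are typically "offer", "answer", or "ice-candidate".
  // Returns false if the target has left or never joined.
  route(message) {
    const target = this.peers.get(message.to);
    if (!target) return false;
    target(message);
    return true;
  }
}
```

Once every peer pair has exchanged an offer, an answer, and their ICE candidates through this relay, the router's job is done and the peers talk directly.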

ICE and TURN. WebRTC's Interactive Connectivity Establishment protocol attempts direct peer connections first. In our testing, roughly 60–70% of peer pairs in typical corporate environments establish direct connections successfully. The remaining 30–40% route through TURN relay nodes. TURN relay traffic is the only real infrastructure cost in this architecture, and it's usage-based. You pay for what you forward, not for a server running 24 hours a day waiting for a session to start.
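On the client, the direct-first, relay-fallback behavior comes from the ICE server list passed to the browser's RTCPeerConnection. A config fragment along these lines is all it takes — the URLs and credentials below are placeholders, not our actual endpoints:

```javascript
// Illustrative RTCPeerConnection config (server URLs and credentials
// are placeholders). ICE tries STUN-discovered direct routes first and
// falls back to the TURN relay only when no direct path works.
const pc = new RTCPeerConnection({
  iceServers: [
    // STUN: lets a peer discover its public address for direct connection.
    { urls: "stun:stun.example.com:3478" },
    // TURN: forwards traffic for the peer pairs that can't connect
    // directly — the usage-based cost described above.
    {
      urls: "turn:turn.example.com:3478",
      username: "session-scoped-user",
      credential: "short-lived-token",
    },
  ],
});
```

Short-lived TURN credentials minted per session keep the relay from becoming an open proxy, which corporate security reviews will ask about.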

State authority. We designated one client as the session host — the training facilitator's machine — as the authoritative source of simulation state. This is different from a pure P2P host because the facilitator is always on a known, stable network connection (we made this a technical requirement in the client spec). All state changes are validated by that client before they propagate. It's not a server, but it performs the consistency function of one for a training scenario where you have a structured facilitator-participant hierarchy anyway.
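A sketch of what that host-side validation looks like for a checklist-style training flow. Everything here is illustrative — the class, field names, and the ordering rule are stand-ins for the real spec, which has more change types than step completion:

```javascript
// Host-authoritative validation sketch: every proposed change reaches
// the facilitator's client, which either applies and rebroadcasts it
// or rejects it. Here the only rule is that checklist steps must be
// completed in order.
class SessionAuthority {
  constructor(stepIds) {
    this.stepIds = stepIds;       // ordered steps, e.g. ["safety-brief", "station-1"]
    this.completed = new Map();   // participantId -> Set of completed step ids
  }

  // Returns the validated change to broadcast, or null if rejected.
  applyChange(change) {
    const done = this.completed.get(change.participantId) ?? new Set();
    // Reject unknown steps and out-of-order completions: a participant
    // may only complete the next step in the sequence.
    const nextIndex = done.size;
    if (this.stepIds[nextIndex] !== change.stepId) return null;
    done.add(change.stepId);
    this.completed.set(change.participantId, done);
    return { ...change, stepIndex: nextIndex };
  }
}
```

Because only changes returned from applyChange are rebroadcast, every participant converges on the facilitator's view of the session without any machine running server software.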

Delta compression. We don't broadcast full state on every tick. We calculate diffs and send only what changed. In a manufacturing onboarding scenario — trainees walking through a virtual assembly line, picking up components, completing checklist steps — the actual state delta per tick is small. Most objects aren't moving. Most participants are watching an instruction sequence, not simultaneously manipulating 50 different objects. In our experience, delta compression reduced bandwidth per client by roughly 70% compared to full-state broadcasting, which meaningfully lowers TURN relay costs.
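The diffing itself can be very simple when per-object state records are small. This is a minimal sketch of the idea, not the production code — the shipped version also quantizes positions before comparing, and the field names here are made up:

```javascript
// Compare the previous and current snapshot of per-object state and
// emit only the entries that changed. With small per-object records, a
// shallow JSON comparison is cheap and good enough.
function computeDelta(prev, curr) {
  const delta = {};
  for (const [id, state] of Object.entries(curr)) {
    if (JSON.stringify(prev[id]) !== JSON.stringify(state)) {
      delta[id] = state;
    }
  }
  return delta;
}

// Receivers merge the delta over their last known state.
function applyDelta(base, delta) {
  return { ...base, ...delta };
}
```

In a scene where most objects are idle, the delta per tick stays near-empty, which is where the bandwidth savings come from.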

Client-side prediction. For local actions — a trainee picking up a component, walking to a station — we apply the result immediately on the local client without waiting for authority confirmation. We reconcile afterward. This masks the latency that browser-based WebRTC unavoidably introduces. WebGL multiplayer applications run 2–5x higher latency than native equivalents because browsers can't use raw UDP. Client-side prediction makes that latency invisible for the interaction patterns that matter in training.
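The predict-then-reconcile loop can be sketched as a small state wrapper. This is an illustrative reduction of the pattern, with invented names: apply each local action immediately, tag it with a sequence number, and when the authority's correction arrives, replay the still-unacknowledged actions on top of it.

```javascript
// Client-side prediction sketch: the client renders `state`, which is
// the last authoritative state plus all locally predicted actions the
// host has not yet confirmed.
class PredictedState {
  constructor(initial) {
    this.state = initial;   // what we render
    this.pending = [];      // actions not yet confirmed by the host
    this.seq = 0;
  }

  // Apply an action locally right away and queue it for confirmation.
  // `apply` is a pure function: oldState -> newState.
  predict(apply) {
    this.seq += 1;
    this.pending.push({ seq: this.seq, apply });
    this.state = apply(this.state);
    return this.seq;
  }

  // The host confirmed everything up to ackSeq and sent its
  // authoritative snapshot; drop confirmed actions and replay the rest.
  reconcile(authoritative, ackSeq) {
    this.pending = this.pending.filter((a) => a.seq > ackSeq);
    this.state = this.pending.reduce((s, a) => a.apply(s), authoritative);
  }
}
```

When the authority agrees with the prediction, the replay is a no-op and the user never sees a correction; when it disagrees, the state snaps to the validated result.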


What 50 Users Actually Looks Like at the Network Level

We stress-tested this with simulated load before handing it to the client. Some things we expected. Some things surprised us.

The signaling server barely registers at 50 users. It does its job during session join — roughly a 3–8 second window per participant — and then sits quiet. You could run this on the cheapest cloud compute tier available and it wouldn't break a sweat.

TURN relay load correlates almost entirely with how many participants are in environments with strict NAT or corporate firewall restrictions. For this manufacturing client, with a mix of factory-floor kiosks and remote participants on VPN, we saw about 40% of traffic hitting TURN nodes. That's higher than our baseline estimate, but still resulted in relay costs well under $80 per month across all sessions.

The hard limit we discovered: the WebRTC mesh topology doesn't scale linearly. At 50 participants, each client maintains up to 49 peer connections. That's manageable on modern hardware with a well-tuned implementation. At 80 participants in the same mesh, we started seeing connection management overhead eat into frame budgets on lower-end machines. We documented this clearly for the client: 50 concurrent is the tested ceiling, and it's a real one.
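The non-linear growth is just the mesh arithmetic: each client holds n−1 connections, and the session as a whole holds n(n−1)/2, so total connection count grows quadratically with participants.

```javascript
// Connection counts in a full WebRTC mesh of n participants.
function meshLoad(n) {
  return {
    perClient: n - 1,            // connections each client maintains
    total: (n * (n - 1)) / 2,    // distinct peer pairs in the session
  };
}
```

Going from 50 to 80 participants raises the per-client load from 49 to 79 connections, but nearly triples the total pair count — which is why the overhead shows up well before any single machine is saturated.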

The training scenario structure actually helps here. Participants aren't all sending high-frequency position updates simultaneously. They're moving through a guided sequence. The facilitator controls pacing. Interaction is structured. This is fundamentally different from a battle royale game where 50 players are all sprinting and shooting at maximum update frequency. Enterprise training workflows are calmer at the network layer than gaming scenarios, which is one reason this architecture works for L&D and wouldn't work for a competitive shooter.


The Misconception That Almost Derailed This Project

Midway through development, the client's internal IT consultant reviewed the architecture and flagged "no authoritative server" as a security concern. His argument: without a server validating all actions, participants could manipulate training completion data.

It's a fair concern for a competitive game. It's the wrong concern for this training scenario.

We're not tracking leaderboards. We're not awarding prizes based on performance. We're recording that an employee completed a safety procedure walkthrough. The facilitator's client — which is the session authority — holds that state. Manipulation would require a participant to compromise the facilitator's machine, not inject packets into a WebRTC stream. The threat model doesn't match the architecture concern.

We spent half a day in a call walking through this. The outcome was a clear security threat model document in the project handoff, which we now include as a standard deliverable for any multiplayer simulation we ship. If you're building this kind of system, write that document early. L&D buyers don't always have a gaming background, and "no server" sounds insecure to people who learned security in a client-server world.


WebGL Compatibility in 2026: This Is No Longer a Risk

One objection we stopped encountering this year: "will WebGL work on our machines?" WebGL 2.0 now shows 92% browser compatibility across major platforms. Chrome, Firefox, Safari 15.2+, Edge — all supported. For enterprise deployments where you can specify a minimum browser version in your technical requirements, you're effectively at 100% coverage for any machine purchased in the last four years.

The no-headset requirement was the original reason we pushed to WebGL for this client. They wanted 50 people trained simultaneously, and they were not going to purchase 50 VR headsets. WebGL in the browser is the answer to that problem. What we didn't expect is that the browser constraint also pushed us toward an architecture that ended up being cheaper to operate than what we'd have built for a native VR deployment anyway.


What to Check Before You Build This

If you're evaluating a similar architecture for a browser-based enterprise training simulator, run through this before you commit:

Network environment audit

  • What percentage of your participants are behind strict corporate NAT or VPN? Higher percentages mean more TURN relay traffic and higher relay costs.
  • Do your target machines support WebRTC data channels? Test on the actual hardware your participants will use, not dev machines.

Session structure analysis

  • Is there a natural facilitator-participant hierarchy? If yes, you have a clean candidate for session authority without needing a dedicated server.
  • What's the maximum simultaneous interaction rate? If 50 people are all manipulating objects at once, your bandwidth math changes significantly.

State synchronization requirements

  • Does every participant need to see every other participant's position in real time? Or do they mostly operate in distinct areas of the simulation?
  • What's the acceptable latency for state updates? Training scenarios can typically tolerate 100–200ms. If you need sub-50ms, reconsider the browser platform entirely.

Cost model

  • Price out TURN relay costs at your estimated traffic volume before committing. Services like Twilio's TURN relay or Cloudflare's equivalent charge per gigabyte forwarded.
  • Compare that to a dedicated server baseline. In our experience, relay-based architecture wins on cost below roughly 200 concurrent users across a training organization.

Ceiling planning

  • Document your tested concurrent user limit and put it in the contract. Ours is 50. Beyond that, the mesh topology creates overhead that requires a different architectural approach — likely a hybrid with a lightweight state relay that's more than a TURN node but less than a full authoritative server.
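For the cost model step, a back-of-envelope estimator helps make the comparison concrete before you commit. Every input here is a placeholder to be replaced with your own audit figures — none of these numbers are from the project described above:

```javascript
// Back-of-envelope TURN relay cost estimator. All inputs are
// assumptions you supply from your own network audit and provider
// pricing; nothing here is a published rate.
function estimateTurnCost({
  sessionsPerMonth,
  sessionHours,
  participants,
  kbpsPerClient,   // average post-delta-compression bandwidth per client
  relayFraction,   // share of traffic forced through TURN (e.g. 0.4)
  dollarsPerGb,    // relay provider's per-GB forwarding price
}) {
  const secondsPerSession = sessionHours * 3600;
  // kilobits -> kilobytes (/8) -> gigabytes (/1e6)
  const gbPerSession =
    (participants * kbpsPerClient * secondsPerSession) / 8 / 1e6;
  return gbPerSession * sessionsPerMonth * relayFraction * dollarsPerGb;
}
```

Run it with your worst-case relay fraction as well as your expected one: the gap between those two numbers is the budget risk you're asking the client to accept.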

The architecture works. We've shipped it. The manufacturing client runs weekly onboarding cohorts through it without an IT ticket to spin up a server. That was the requirement, and it's what we delivered.

Interested in building something like this?
We'd love to hear about your project — from VR training to WebGL experiences and beyond.
Get in Touch →