Intermediate
ONNX Runtime Web
Run ONNX models in the browser or Node.js using ONNX Runtime — a cross-platform inference engine with WebAssembly and WebGPU backends.
What is ONNX?
ONNX (Open Neural Network Exchange) is an open format for representing machine learning models. It allows you to train a model in PyTorch, TensorFlow, or scikit-learn, export it to ONNX format, and run it anywhere — including the browser.
Setup
Terminal
```bash
npm install onnxruntime-web    # Browser
npm install onnxruntime-node   # Node.js
```
Running a Model
JavaScript
```javascript
import * as ort from 'onnxruntime-web';

// Load model
const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu', 'wasm'] // fallback chain
});

// Prepare input tensor
const inputData = new Float32Array([1.0, 2.0, 3.0, 4.0]);
const inputTensor = new ort.Tensor('float32', inputData, [1, 4]);

// Run inference
const feeds = { input: inputTensor };
const results = await session.run(feeds);

// Read output
const output = results.output.data;
console.log('Predictions:', output);
```
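For classification models, the raw output is typically a vector of logits, which you post-process with a softmax and an argmax to get the predicted class. A minimal sketch in plain JavaScript — the logit values below are made up for illustration, standing in for what `results.output.data` would contain:

```javascript
// Convert raw logits to probabilities with a numerically stable softmax,
// then pick the index of the most likely class.
function softmax(logits) {
  const max = Math.max(...logits);                  // subtract max for stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function argmax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Example: pretend these came from results.output.data
const logits = Array.from(new Float32Array([1.2, 0.3, 4.1, 0.9]));
const probs = softmax(logits);
const predicted = argmax(probs);
console.log('Predicted class:', predicted); // index of the largest logit
```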
Exporting Models to ONNX
Python - Export from PyTorch
```python
import torch
import torch.onnx

# Your trained PyTorch model
model = MyModel()
model.eval()

# Export to ONNX
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},
)
```
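On the JavaScript side, the dynamic batch axis means you can size the input tensor per request rather than being locked to the batch size of the dummy input. A hedged sketch — the 3×224×224 shape matches the export above, and `buildBatchInput` is an illustrative helper, not part of the ONNX Runtime API:

```javascript
// Build a Float32Array sized for a [batch, 3, 224, 224] input.
// With dynamic_axes={"input": {0: "batch"}}, the batch dimension
// can differ from the dummy input used at export time.
function buildBatchInput(batchSize) {
  const dims = [batchSize, 3, 224, 224];
  const length = dims.reduce((a, b) => a * b, 1); // total element count
  return { dims, data: new Float32Array(length) };
}

const { dims, data } = buildBatchInput(8);
console.log(dims);        // [8, 3, 224, 224]
console.log(data.length); // 8 * 3 * 224 * 224 = 1204224
```

At inference time you would pass `dims` as the shape argument: `new ort.Tensor('float32', data, dims)`.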
Execution Providers
| Provider | Environment | Performance | Compatibility |
|---|---|---|---|
| WebGPU | Browser | Fastest (GPU) | Chrome 113+, Edge |
| WebGL | Browser | Fast (GPU) | All modern browsers |
| WASM | Browser | Good (CPU) | Universal |
| CUDA | Node.js | Fastest | NVIDIA GPUs |
| CPU | Node.js | Baseline | Universal |
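In practice you often build the `executionProviders` list from feature detection rather than hard-coding it. A minimal sketch, written as a pure function so the capability checks (e.g. `'gpu' in navigator` for WebGPU) can be supplied by the caller; the fallback order follows the table above:

```javascript
// Choose an execution-provider fallback chain for the browser.
// hasWebGPU / hasWebGL are booleans from feature detection,
// e.g. hasWebGPU = 'gpu' in navigator.
function pickProviders({ hasWebGPU, hasWebGL }) {
  const providers = [];
  if (hasWebGPU) providers.push('webgpu');
  if (hasWebGL) providers.push('webgl');
  providers.push('wasm'); // WASM runs everywhere, so it is always the last resort
  return providers;
}

const chain = pickProviders({ hasWebGPU: true, hasWebGL: true });
console.log(chain); // ['webgpu', 'webgl', 'wasm']
```

The result is what you would pass as `executionProviders` to `InferenceSession.create`.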
Model Optimization
Python - Quantize for Web
```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Reduce model size by ~4x with dynamic quantization
quantize_dynamic(
    "model.onnx",
    "model_quantized.onnx",
    weight_type=QuantType.QUInt8,
)
```
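The ~4x figure follows from weight storage: float32 weights take 4 bytes each, while QUInt8 weights take 1. A back-of-the-envelope sketch, ignoring per-tensor scale/zero-point overhead and non-weight data in the file:

```javascript
// Rough size estimate for dynamic weight quantization:
// float32 (4 bytes/weight) -> uint8 (1 byte/weight).
function estimateQuantizedBytes(numWeights) {
  const float32Bytes = numWeights * 4;
  const quantizedBytes = numWeights * 1;
  return { float32Bytes, quantizedBytes, ratio: float32Bytes / quantizedBytes };
}

const est = estimateQuantizedBytes(25_000_000); // e.g. a ResNet-50-sized model
console.log(est.ratio); // 4
```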
When to use ONNX Runtime: Choose ONNX Runtime when you have an existing model trained in Python and want broad compatibility and strong performance in the browser. Because ONNX is framework-agnostic, it typically supports a wider range of model architectures than TensorFlow.js conversion does.
Lilly Tech Systems