Name: AuraByt Inc.

NVIDIA now ships (or has announced) two different machines with “Spark” in the name, and people are already mixing them up. There’s the DGX Spark, a developer box that’s been on sale since late 2025, and the brand-new RTX Spark, a consumer Windows platform NVIDIA unveiled at Computex 2026. They share DNA but aim at completely different buyers.

This post sorts them out: what each one is, what they compete with, and whether either will make a small team more productive. We’ve written before about what it costs to run AI on your own hardware and the total cost of self-hosting an LLM; the Sparks are a specific, interesting answer to that question.

The DGX Spark: what it is

The DGX Spark is a desktop appliance built around NVIDIA’s GB10 Grace Blackwell Superchip, a single piece of silicon that combines an Arm CPU and a Blackwell GPU sharing one pool of memory.

Spec	DGX Spark
Chip	GB10 Grace Blackwell Superchip
CPU	20-core Arm (10× Cortex-X925 + 10× Cortex-A725)
GPU	Blackwell, 5th-gen Tensor Cores
Memory	128 GB LPDDR5x, coherent unified (CPU+GPU share it)
Memory bandwidth	273 GB/s
AI compute	up to 1 PFLOP FP4 (sparse, theoretical)
Storage	up to 4 TB NVMe (self-encrypting)
Networking	ConnectX-7 @ 200 Gbps
Power	240 W power supply (GB10 TDP ~140 W)
OS	NVIDIA DGX OS (Ubuntu-based)

The headline capability is that 128 GB of unified memory. Because the CPU and GPU share one memory pool, almost all of it is available to hold a model. NVIDIA rates it for running inference on models up to ~200 billion parameters, and fine-tuning models up to ~70 billion. Link two Sparks over the 200 Gbps ConnectX port and you can run up to ~405-billion-parameter models. A consumer graphics card can’t come close to that capacity.
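To see where those capacity figures come from, here’s a rough back-of-envelope sketch in Python. It assumes weights dominate memory use (size ≈ parameters × bits per weight ÷ 8). The 85% headroom factor is our own assumption to leave room for the OS, KV cache and activations, not an NVIDIA number, and treating two linked Sparks as 256 GB of combined capacity is a simplification.

```python
# Rough model-weight footprint check. A sketch, not a sizing tool:
# it ignores KV cache, activations and runtime overhead beyond a flat headroom.


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # billions of params x bits / 8 bits-per-byte = billions of bytes = GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.85) -> bool:
    """True if the weights fit in `headroom` of the memory pool.

    headroom=0.85 is an assumption, not a vendor figure.
    """
    return weight_gb(params_billion, bits_per_weight) <= memory_gb * headroom


if __name__ == "__main__":
    cases = [
        ("70B at 16-bit, one Spark", 70, 16, 128),
        ("200B at 4-bit, one Spark", 200, 4, 128),
        ("405B at 4-bit, one Spark", 405, 4, 128),
        ("405B at 4-bit, two linked Sparks", 405, 4, 256),
        ("70B at 4-bit, 32 GB GPU", 70, 4, 32),
    ]
    for label, params, bits, mem in cases:
        size = weight_gb(params, bits)
        print(f"{label}: ~{size:.0f} GB of weights, fits={fits(params, bits, mem)}")
```

Under these assumptions a 4-bit 200B model (~100 GB) fits on one Spark, a 4-bit 405B model (about 200 GB) needs two, and even a 4-bit 70B model overflows a 32 GB card.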

It launched at $3,999 as the Founders Edition. Prices have since moved with the memory market: after a February 2026 increase, street pricing sits closer to $4,699, and OEM versions (Acer, ASUS, Dell, GIGABYTE, HP, Lenovo, MSI) vary around that.

“DGX Spark” vs “DGX”: the naming trap

This trips people up, so let’s be clear. For years, “DGX” meant NVIDIA’s data-center-class systems (DGX H100, DGX B200, the DGX Station). The flagship servers pack eight datacenter GPUs with HBM memory delivering multiple terabytes per second of bandwidth, and carry price tags in the hundreds of thousands of dollars.

The DGX Spark is the smallest, cheapest member of that family. It shares the brand and the software stack, but not the class of performance. Comparing a Spark to a DGX B200 server is like comparing a hatchback to a transport truck because they wear the same badge. The Spark is a development and prototyping machine you put on a desk; the DGX servers are production training infrastructure you put in a rack.

So when someone asks “DGX Spark or DGX?” the answer is that they’re not really the same purchase. The Spark’s real competition is other desktop hardware.

And now there’s a second Spark: the RTX Spark

This is the part that’s about to confuse everyone. At Computex 2026 (on May 31, 2026), NVIDIA, alongside Microsoft, unveiled RTX Spark, its first real push into mainstream Windows PCs. It reuses the Grace-Blackwell unified-memory idea from the DGX Spark, but aims at consumers and creators instead of AI developers.

Under the hood it’s NVIDIA’s new N1X chip (with a lighter N1 variant), and the top N1X is, by all accounts, essentially a repackaged GB10. It’s a 2.5D package on TSMC’s 3nm process pairing a MediaTek-designed Arm CPU die with an NVIDIA Blackwell GPU die.

	DGX Spark (GB10)	RTX Spark (N1X)
Status	Shipping since Oct 2025	Announced Computex 2026; ships fall 2026
Aimed at	AI developers, local model work	Consumer/creator PCs: AI + creative + gaming
OS	NVIDIA DGX OS (Linux)	Windows on Arm
CPU	20-core Arm (X925 + A725)	20-core Arm (X925 + A725), up to ~4.1 GHz
GPU	Blackwell	Blackwell, 6,144 CUDA cores (~desktop RTX 5070)
Memory	128 GB unified, 273 GB/s	up to 128 GB unified LPDDR5X, ~300 GB/s
Power	~140 W (240 W PSU)	~45–80 W
Form factor	Desktop appliance	Laptops + compact desktops

NVIDIA says 30+ laptops and ~10 desktops will launch this fall from ASUS, Dell, HP, Lenovo, Microsoft, and MSI (Acer and GIGABYTE to follow), and it laid out a multi-year roadmap: a Rubin-GPU / Vera-CPU generation with LPDDR6 around 2028, and a Feynman / Rosa generation around 2030.

A few notes, since this is an announcement, not a product you can benchmark yet:

It’s a consumer machine, not a DGX. Same “Spark” word, totally different intent: RTX Spark is about running local AI, creative apps, and games on a thin, efficient Windows device, with an RTX 5070-class GPU, not a datacenter part.
The bandwidth caveat carries straight over. ~300 GB/s is in the same modest league as the DGX Spark’s 273 GB/s. Great for holding big models efficiently; not the path to maximum token speed (see the next section).
Windows on Arm is the real question mark. The hardware looks strong, but app and game compatibility under x86 emulation is the thing to watch. Past “Windows on Arm” attempts stumbled there. NVIDIA’s CUDA stack and a multi-generation roadmap give this attempt more weight than previous ones, but wait for shipping reviews before assuming your software runs well.

For the rest of this piece, “the Spark” means the DGX Spark unless noted; it’s the one you can actually buy and benchmark today.

The one number that matters: memory bandwidth

The spec sheet buries this part, and reviewers keep flagging it. For running a language model, the bottleneck is rarely raw compute. It’s how fast you can move the model’s weights through memory on every token. That’s memory bandwidth.

The Spark’s 273 GB/s is low for AI inference. It’s the direct cost of the design goal: a tiny, quiet, ~140-watt box. Low power buys you low bandwidth, and low bandwidth means slower token generation. That’s the trade, and it’s baked into the hardware, not a flaw NVIDIA can patch. For comparison, a desktop RTX 5090 moves ~1,792 GB/s, roughly 6–7× faster. A datacenter Blackwell GPU is faster still.
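The bandwidth argument reduces to one division. For a dense model, generating each token means reading every weight once, so bandwidth ÷ weight size is a hard ceiling on decode speed. Here’s a minimal Python sketch; the ~5 GB and ~40 GB sizes for 4-bit 8B and 70B models are our approximations (including quantization scales), and real throughput lands below the ceiling because of KV-cache reads and kernel overhead.

```python
# Decode-speed ceiling for a dense model: every weight is read once per token,
# so tokens/second can't exceed memory bandwidth / bytes of weights.

SPARK_BW_GB_S = 273.0      # DGX Spark, from the spec table
RTX_5090_BW_GB_S = 1792.0  # desktop RTX 5090

# Approximate 4-bit weight sizes including quantization scales (assumptions).
MODEL_SIZES_GB = {"8B @ 4-bit": 5.0, "70B @ 4-bit": 40.0}


def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on generated tokens per second; real runs land below it."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    for model, size_gb in MODEL_SIZES_GB.items():
        spark = decode_ceiling_tok_s(SPARK_BW_GB_S, size_gb)
        print(f"DGX Spark, {model}: at most ~{spark:.0f} tok/s")
    # The 5090 only gets its full advantage when the model fits in its 32 GB.
    ratio = RTX_5090_BW_GB_S / SPARK_BW_GB_S
    print(f"RTX 5090 bandwidth advantage: {ratio:.1f}x")
```

The same arithmetic explains why the 5090’s roughly 6.6× bandwidth advantage only shows up on models that fit in its VRAM.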

What that means in practice (independent benchmarks, your mileage will vary):

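A dense 70B model at 4-bit is roughly 40 GB of weights, so 273 GB/s caps generation near 273 ÷ 40 ≈ 7 tokens/second before any overhead; independent runs land in the single digits.
A 120B model (e.g. gpt-oss-120B) runs around 38 tokens/second, because it’s a mixture-of-experts model that reads only about 5B active parameters per token rather than all of its weights.

The MoE figure is readable, but it isn’t the snappy, instant-feeling speed you get from a big GPU on a model that fits in its VRAM, and dense models of similar size are far slower. The Spark’s pitch isn’t “fastest.” It’s “I can hold a model your other hardware physically cannot, at the wall-power and noise of a Mac mini.”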
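The gap between a dense 70B model and gpt-oss-120B comes down to how many weights each token actually touches. A mixture-of-experts model routes each token through a few experts, so only its active parameters (about 5.1B of gpt-oss-120B’s ~117B) are read per token. This Python sketch applies the same bandwidth ceiling to active parameters; the flat 4.5 bits per weight is our simplifying assumption (gpt-oss mixes MXFP4 expert weights with higher-precision layers), so treat the outputs as ceilings, not predictions.

```python
# Bandwidth ceiling applied to the parameters actually read per token.
# Dense model: all parameters are active. MoE model: only the routed experts.

SPARK_BW_GB_S = 273.0
BITS_PER_WEIGHT = 4.5  # assumption: ~4-bit weights plus scales/overhead


def gb_read_per_token(active_params_billion: float,
                      bits: float = BITS_PER_WEIGHT) -> float:
    """GB of weights streamed from memory for one generated token."""
    return active_params_billion * bits / 8


def ceiling_tok_s(active_params_billion: float,
                  bandwidth_gb_s: float = SPARK_BW_GB_S) -> float:
    """Upper bound on decode speed given the active parameter count."""
    return bandwidth_gb_s / gb_read_per_token(active_params_billion)


if __name__ == "__main__":
    dense_70b = ceiling_tok_s(70.0)  # ~39 GB read per token
    moe_120b = ceiling_tok_s(5.1)    # ~2.9 GB read per token (active experts only)
    resident_gb = gb_read_per_token(117.0)  # the whole model must still fit
    print(f"Dense 70B ceiling on a Spark: ~{dense_70b:.0f} tok/s")
    print(f"gpt-oss-120B ceiling on a Spark: ~{moe_120b:.0f} tok/s")
    print(f"gpt-oss-120B resident weights: ~{resident_gb:.0f} GB of 128 GB")
```

The observed ~38 tok/s sits well under the MoE ceiling, but that ceiling is an order of magnitude above the dense one. Note that MoE saves bandwidth, not capacity: all ~117B parameters still have to sit in memory, which is where the Spark’s 128 GB earns its keep.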

How it compares for local AI

Here are the desktop-class options for running models locally, which is the comparison that actually matters:

Box	Memory for models	Bandwidth	~70B Q4 decode speed	Rough price	Best at
DGX Spark	128 GB unified	273 GB/s	single digits (~7 tok/s ceiling)	~$4,000–4,700	Big models, tiny footprint, CUDA stack
RTX 5090 (desktop)	32 GB GDDR7	1,792 GB/s	~14–22 tok/s*	~$2,000 GPU + a PC	Speed on models that fit in 32 GB
AMD Ryzen AI Max+ 395 (Strix Halo)	128 GB unified	~256 GB/s	~4–5 tok/s	~$2,000	Cheapest path to 128 GB unified
Mac Studio (M3 Ultra)	up to 512 GB unified	~819 GB/s	up to ~20 tok/s (ceiling)	$4,000+	Big memory and high bandwidth

*The RTX 5090 has to offload a 70B model to system RAM because it won’t fit in 32 GB of VRAM, which is why a card that’s far faster on an 8B model (~186 tok/s) collapses on a 70B one. This is the whole story in one row: capacity vs speed. The ceilings for the Spark and the Mac are bandwidth divided by roughly 40 GB of 4-bit weights, since a dense model reads every weight for every token.
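The footnote’s “collapse” is what happens when a model spills out of VRAM: each token then has to stream the spilled layers through much slower system memory, and that slow tier dominates the time per token. Here’s a minimal Python sketch, assuming roughly 30 GB of the 5090’s 32 GB is left for weights after the KV cache and ~100 GB/s of system-RAM bandwidth (a hypothetical fast dual-channel DDR5 desktop; real machines vary). It gives bandwidth ceilings only, not benchmark predictions.

```python
# Ceiling for a model split between fast VRAM and slower system RAM.
# Time per token = (GB in VRAM / VRAM bandwidth) + (GB spilled / RAM bandwidth).

VRAM_FOR_WEIGHTS_GB = 30.0  # assumption: 32 GB card minus room for KV cache
VRAM_BW_GB_S = 1792.0       # RTX 5090
SYSTEM_RAM_BW_GB_S = 100.0  # assumption: hypothetical dual-channel DDR5 desktop


def split_ceiling_tok_s(weights_gb: float,
                        vram_gb: float = VRAM_FOR_WEIGHTS_GB,
                        vram_bw: float = VRAM_BW_GB_S,
                        ram_bw: float = SYSTEM_RAM_BW_GB_S) -> float:
    """Upper bound on decode speed when part of the model lives in system RAM."""
    on_gpu = min(weights_gb, vram_gb)
    spilled = weights_gb - on_gpu
    seconds_per_token = on_gpu / vram_bw + spilled / ram_bw
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    for weights_gb in (5.0, 30.0, 35.0, 40.0):
        spilled = max(0.0, weights_gb - VRAM_FOR_WEIGHTS_GB)
        print(f"{weights_gb:>4.0f} GB model, {spilled:>3.0f} GB spilled: "
              f"<= {split_ceiling_tok_s(weights_gb):.0f} tok/s")
```

With these assumptions, going from a model that just fits (30 GB) to one that spills 5 GB cuts the ceiling roughly fourfold, which is the capacity-versus-speed trade in miniature.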

A few takeaways from that table:

If your models fit in 32 GB, a desktop RTX 5090 will run circles around the Spark. It’s not close.
If you need 128 GB of unified memory specifically, AMD’s Strix Halo (e.g. in a Framework Desktop or mini-PC) gets you there for roughly half the money, but with slower real-world speeds and a less mature software stack than NVIDIA’s CUDA ecosystem, which is the Spark’s quiet advantage.
A Mac Studio is, on paper, the strongest all-rounder for big local models (more memory and triple the bandwidth) if your toolchain is happy on Apple Silicon. Plenty of AI tooling still assumes CUDA, which is exactly where the Spark wins.

Where the Spark helps productivity

Stripping away the hype, here’s where we think it earns its place:

Local prototyping against the real CUDA stack. You develop on the same software environment as the DGX servers and cloud Blackwell instances, then push to production without rewriting for a different runtime. For a team that ships AI features, that continuity saves real time.
Running a big model on sensitive data, on-prem. A clinic or law firm that can’t send data to a third party (see our notes on why local deployment matters) can keep a 70B–120B model entirely in the building, drawing less power than a space heater.
A quiet, always-on inference box. At ~140 W it can sit on a desk running a coding assistant or a retrieval agent all day without the noise, heat, or electricity bill of a 575 W desktop GPU.
Fitting models a single consumer GPU can’t. The narrow win: if you must run something in the 70B–200B range locally, the Spark does it in a footprint nothing else NVIDIA-branded matches.
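To put the power point in dollars, here’s a quick Python sketch of annual electricity cost for an always-on box. The $0.15/kWh rate and the 30% duty cycle are assumptions to swap for your own; 140 W is the GB10’s approximate TDP and 575 W the RTX 5090’s board power, so the real wall draw of either system (CPU, fans, idle periods) will differ.

```python
# Annual electricity cost for a machine at a given average draw.

HOURS_PER_YEAR = 24 * 365


def annual_cost_usd(watts: float, usd_per_kwh: float,
                    duty_cycle: float = 1.0) -> float:
    """duty_cycle = fraction of the year spent at `watts` (rest assumed ~0 W)."""
    kwh = watts * HOURS_PER_YEAR * duty_cycle / 1000
    return kwh * usd_per_kwh


if __name__ == "__main__":
    rate = 0.15  # assumption: USD per kWh; check your utility bill
    for name, watts in (("DGX Spark (~140 W)", 140), ("RTX 5090 GPU (575 W)", 575)):
        print(f"{name}: ${annual_cost_usd(watts, rate):,.0f}/year flat out, "
              f"${annual_cost_usd(watts, rate, duty_cycle=0.3):,.0f}/year at 30% duty")
```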

Where it’s the wrong tool

You want maximum speed and your model fits in 32 GB. Buy the RTX 5090. It’s cheaper and far faster for that case.
You’re training foundation models. That’s what the rackmount DGX systems and cloud clusters are for. The Spark fine-tunes; it doesn’t train from scratch at scale.
You only send a few million tokens a month. A cloud API is still cheaper and zero-maintenance. We say this constantly: self-hosting only wins past a real volume or privacy threshold. A $4,000 box that’s idle most of the day is an expensive way to avoid a $100 API bill.
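The volume threshold is simple arithmetic: hardware cost divided by what you save each month. A minimal Python sketch, assuming the box fully replaces the API spend and counting only electricity as running cost. The $15/month figure is our assumption (roughly 140 W around the clock at $0.15/kWh), and the sketch ignores setup time, maintenance and depreciation, which all push break-even further out.

```python
import math


def breakeven_months(hardware_usd: float, monthly_api_usd: float,
                     monthly_running_usd: float) -> float:
    """Months until the hardware pays for itself; inf if it never does."""
    monthly_saving = monthly_api_usd - monthly_running_usd
    if monthly_saving <= 0:
        return math.inf  # running the box costs as much as the API it replaces
    return hardware_usd / monthly_saving


if __name__ == "__main__":
    box = 4000.0
    power = 15.0  # assumption: ~140 W 24/7 at $0.15/kWh
    for api_bill in (100.0, 500.0, 2000.0):
        months = breakeven_months(box, api_bill, power)
        print(f"${api_bill:,.0f}/month API bill: break-even in {months:.1f} months")
```

At a $100 monthly bill that’s close to four years before the box pays for itself, before counting anyone’s time.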

Our take

The DGX Spark is a genuinely clever piece of hardware aimed at a specific person: a developer or small team that needs to run or fine-tune large models locally, on the CUDA stack, in a tiny low-power box, and who values capacity and ecosystem over raw speed.

It is not a desktop-GPU replacement, and the “supercomputer” branding oversells it for anyone whose models fit in 32 GB. The 273 GB/s memory bandwidth is the truth-teller in the spec sheet: this is a machine optimized to hold big models quietly, not to generate tokens as fast as physically possible. Judge it on that and it’s a strong tool. Judge it as “a $4,000 GPU” and you’ll be disappointed.

If you’re weighing local AI hardware against a cloud API for an actual use case, we’re happy to run the numbers with you before you spend anything.

NVIDIA's two 'Sparks': the DGX Spark and the new RTX Spark, explained


Have something that needs shipping?
