Nvidia Personal AI Super Computer

At CES, Nvidia just announced Project Digits, branded as your “personal AI Super Computer.” What makes this interesting:

  • 128GB of memory is enough to run 70B-parameter models (see the sizing sketch after this list). That opens up some new experimentation options.
  • You can link a couple of these together to run even larger models. That’s the same technique used for data center shizzle, so bringing it to the desktop is cool.
  • The Nvidia stack. Having access to the Blackwell architecture is sweet, but the secret sauce is the software stack, specifically CUDA. This is Nvidia’s real moat and the source of its competitive advantage. Build against it, and you can run on any of Nvidia’s stuff, from the edge on up to hyperscale data centers.
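
As a quick sanity check on the memory claim, here’s the back-of-envelope math. The quantization sizes and the 20% overhead factor are my own assumptions, not Nvidia’s numbers:

```python
# Back-of-envelope: can 128GB hold a 70B-parameter model?
PARAMS = 70e9
GIB = 1024**3

for label, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    weights_gib = PARAMS * bytes_per_param / GIB
    total_gib = weights_gib * 1.2  # ~20% extra for KV cache/activations (loose assumption)
    verdict = "fits" if total_gib <= 128 else "does not fit"
    print(f"{label}: ~{weights_gib:.0f} GiB weights, ~{total_gib:.0f} GiB total -> {verdict}")
```

In short: fp16 weights alone blow past 128GB, but an 8-bit or 4-bit quantized 70B model fits with room to spare.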

If you are a software engineer, IMO it’s worth investing some $ in this type of hardware vs. spinning up cloud instances to learn. Why? There are things you can do locally on your own network that let you experiment and learn faster than in the cloud. For instance, video feeds are very high bandwidth and are much easier to experiment with locally than by pushing them to the cloud (with all the security work that comes with exposing anything outside your firewall).
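
To put some numbers on the bandwidth point, here’s a rough sketch; the resolution, frame rate, and encoder bitrate are assumptions for illustration:

```python
# Raw vs. encoded bandwidth for a single camera feed.
width, height, fps = 1920, 1080, 30   # assumed 1080p30 camera
bytes_per_pixel = 3                   # 8-bit RGB

raw_mbps = width * height * bytes_per_pixel * fps * 8 / 1e6
h264_mbps = 4                         # typical-ish H.264 bitrate (assumption)

print(f"Raw frames: ~{raw_mbps:.0f} Mbps")   # ~1,493 Mbps
print(f"H.264 encoded: ~{h264_mbps} Mbps")
```

Even encoded, a few cameras can saturate a typical home upload link, while the local network shrugs it off.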

Some related posts…
https://www.seanfoley.blog/visual-programming-with-tokens/
https://www.seanfoley.blog/musings-on-all-the-ai-buzz/

Visual Programming with Tokens

I bought a few Nvidia Jetson devices to use around the house and experiment with. I went with these vs. a discrete GPU + desktop machine because of power consumption: a desktop machine + GPU draws 300W+, while these Jetson edge devices use 10-50W.

I normally experiment with machine learning & AI shizzle using either a Jupyter notebook or a Python IDE. But for this demo, I decided to check out Agent Studio. You fire up the container image, open the link in your browser, and start dragging/dropping shizzle onto the canvas. Seriously rapid experimentation.

  • The video source is an RTSP feed from my security camera.
  • The feed goes into a multimodal LLM with a prompt of “describe the image concisely.”
  • I wire the LLM output to a text-to-speech model. Since the video/LLM loop runs continuously, I also experimented with wiring up a deduping node so the same description isn’t spoken over and over.
  • The video output also produces an RTSP feed. Not shown in the demo, but I also experimented with overlaying the LLM output (tokens) on the video source to produce an LLM-augmented feed showing what it “saw.” (A hand-wired sketch of this pipeline follows the list.)
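
For the curious, here’s roughly what that pipeline looks like hand-wired in Python instead of on the Agent Studio canvas. The camera URL, model name, and endpoint are placeholders (I’m assuming an OpenAI-compatible chat API), not Agent Studio’s actual internals:

```python
import base64

import cv2        # pip install opencv-python
import requests

RTSP_URL = "rtsp://camera.local/stream"  # placeholder camera URL
LLM_URL = "http://jetson.local:8000/v1/chat/completions"  # assumed OpenAI-compatible endpoint

cap = cv2.VideoCapture(RTSP_URL)
last_description = None

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Encode the frame as JPEG, then base64, for the multimodal LLM.
    _, jpeg = cv2.imencode(".jpg", frame)
    image_b64 = base64.b64encode(jpeg.tobytes()).decode()
    resp = requests.post(LLM_URL, json={
        "model": "my-multimodal-llm",  # placeholder model name
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "describe the image concisely"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ]}],
    })
    description = resp.json()["choices"][0]["message"]["content"]
    # Crude "deduping node": only act when the description changes.
    if description != last_description:
        print(description)  # swap in a text-to-speech call here
        last_description = description
```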

This demo let me get a feel for how these bits perform. I was interested in tokens/sec, memory utilization, and CPU/GPU utilization. Next, I plan to build out an Agentic AI solution architecture for my home security.
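
Measuring tokens/sec is just wall-clock timing around the generate call. The sketch below works with any callable that returns the generated text and a completion token count; the callable is a stand-in for your real model or endpoint call. (For memory and CPU/GPU utilization on a Jetson, tegrastats is the usual tool.)

```python
import time

def measure_tokens_per_sec(generate, prompt):
    """Time one generation; `generate` must return (text, completion_token_count)."""
    start = time.perf_counter()
    text, n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.1f} tok/s")
    return text

# Dummy example; swap the lambda for your actual model/endpoint call.
measure_tokens_per_sec(lambda p: ("a person walks past the front door", 8),
                       "describe the image concisely")
```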