Visual Programming with Tokens

I bought a few Nvidia Jetson devices to use around the house and experiment with. I went with these instead of a desktop machine with a discrete GPU because of power consumption: a desktop + GPU will draw 300W+, while these Jetson edge devices use 10-50W.

I normally experiment with machine learning & AI shizzle using either a Jupyter notebook or a Python IDE. But for this demo, I decided to check out Agent Studio. You fire up the container image, open the link in your browser, and start dragging/dropping shizzle onto the canvas. Seriously rapid experimentation.

  • The video source is an RTSP video feed from my security camera.
  • The video output also produces an RTSP video feed. Not shown in the demo, but I also experimented with overlaying the LLM output (tokens) onto the video source to produce an LLM-augmented video feed showing what it “saw.”
  • The video source feeds into a multimodal LLM with a prompt of “describe the image concisely.”
  • I wire up the LLM output to a text-to-speech model. Since the video/LLM pair runs in a constant loop, I also experimented with wiring in a deduping node so the same description isn’t spoken over and over.
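The deduping idea is simple enough to sketch outside of Agent Studio. This is a hypothetical stand-alone version (the class name, `threshold` knob, and similarity measure are my own assumptions, not Agent Studio's implementation): it compares each new LLM description against the last one that was emitted and drops near-duplicates so the TTS node stays quiet while the scene is unchanged.

```python
import difflib


class DedupeNode:
    """Suppress near-duplicate LLM descriptions between loop iterations.

    `threshold` is a hypothetical knob: a new description whose
    similarity to the previously emitted one meets or exceeds it
    is dropped instead of being passed along to text-to-speech.
    """

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.last = None

    def process(self, text):
        if self.last is not None:
            # Cheap character-level similarity; an embedding distance
            # would be a more robust choice for longer descriptions.
            sim = difflib.SequenceMatcher(None, self.last, text).ratio()
            if sim >= self.threshold:
                return None  # near-duplicate: skip TTS for this frame
        self.last = text
        return text
```

In the graph, a node like this would sit between the LLM output and the TTS input, so speech only fires when the description actually changes.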

This demo allowed me to get an idea of how these bits perform. I was interested in tokens/sec, memory utilization, and CPU/GPU utilization. Next, I plan to build out an Agentic AI solution architecture for my home security.
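For the tokens/sec number, the basic measurement is just counting tokens as the model streams them and dividing by wall time. A minimal sketch, assuming a generic streaming iterator rather than Agent Studio's own instrumentation (the function name and `token_stream` parameter are hypothetical):

```python
import time


def measure_tokens_per_sec(token_stream):
    """Count tokens from a streaming iterable and report throughput.

    `token_stream` is any iterable that yields tokens as the LLM
    emits them (a hypothetical stand-in for a model's streaming API).
    Returns (token_count, tokens_per_second).
    """
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on an empty or instant stream.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate
```

Memory and CPU/GPU utilization on a Jetson are separate measurements (e.g. watching the board's system stats while the loop runs), not something this snippet captures.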