I bought a few Nvidia Jetson devices to use around the house and experiment with. I went with these over a desktop machine with a discrete GPU because of power consumption: a desktop + GPU will draw 300W+, while these Jetson edge devices use 10-50W.
I normally experiment with machine learning & AI shizzle using either a Jupyter notebook or a Python IDE. But for this demo, I decided to check out Agent Studio. You fire up the container image, open the link in your browser, and start dragging/dropping nodes onto the canvas. Seriously rapid experimentation.
- The video source is an RTSP video feed from my security camera.
- The video output node also produces an RTSP video feed. Not shown in the demo, but I also experimented with overlaying the LLM output (tokens) onto the video source to produce an LLM-augmented video feed showing what it “saw”.
- The source frames feed into a multimodal LLM with a prompt of “describe the image concisely” (see the first sketch after this list).
- I wire the LLM output up to a text-to-speech model. Since the video/LLM pipeline runs in a constant loop, I also experiment with wiring up a deduping node (second sketch below).
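Agent Studio wires all of this up graphically, but the loop it builds looks roughly like the following. This is a minimal standalone sketch, not Agent Studio's actual code: the endpoint URL, camera URL, and model name are hypothetical placeholders, and it assumes the multimodal LLM is served behind an OpenAI-compatible API.

```python
import base64

import cv2
import requests

LLM_URL = "http://localhost:8000/v1/chat/completions"  # assumed endpoint
CAMERA = "rtsp://192.168.1.50:554/stream1"             # hypothetical camera URL

def describe(frame) -> str:
    """Send one JPEG-encoded frame to the multimodal LLM, return its caption."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    b64 = base64.b64encode(jpeg.tobytes()).decode()
    resp = requests.post(LLM_URL, json={
        "model": "vlm",  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "describe the image concisely"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }, timeout=30)
    return resp.json()["choices"][0]["message"]["content"]

cap = cv2.VideoCapture(CAMERA)  # OpenCV reads RTSP via its FFmpeg backend
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    caption = describe(frame)
    # Overlay the tokens on the frame: the "LLM-augmented" feed mentioned above.
    cv2.putText(frame, caption, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                0.8, (0, 255, 0), 2)
    # A real pipeline would push this back out as RTSP (e.g. a GStreamer sink);
    # a local preview window stands in for that here.
    cv2.imshow("augmented", frame)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
```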
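The deduping node matters because the loop produces near-identical captions frame after frame. Here's one way such a gate might work, using simple string similarity; `speak()` is a stand-in for the real text-to-speech call, and the 0.85 threshold is an arbitrary assumption:

```python
import difflib

_last_spoken = ""

def speak(text: str) -> None:
    print(f"[TTS] {text}")  # placeholder for the real text-to-speech model

def dedup_and_speak(caption: str, threshold: float = 0.85) -> None:
    """Only pass a caption to TTS if it differs enough from the last one."""
    global _last_spoken
    if difflib.SequenceMatcher(None, caption, _last_spoken).ratio() < threshold:
        _last_spoken = caption
        speak(caption)
```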
This demo allowed me to get a feel for how these pieces perform. I was interested in tokens/sec, memory utilization, and CPU/GPU utilization. Next, I plan to build out an Agentic AI solution architecture for my home security.
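If you want to eyeball the same numbers outside the Studio UI, here's a rough sketch. It assumes the `describe()` helper from the first sketch, uses word count as a crude stand-in for a real token count, and uses psutil to approximate what Jetson-specific tools like tegrastats or jtop report (neither covers the GPU the way those tools do):

```python
import time

import psutil

def timed_describe(frame):
    """Wrap describe() with rough throughput and utilization numbers."""
    start = time.perf_counter()
    caption = describe(frame)              # from the first sketch
    elapsed = time.perf_counter() - start
    tokens = len(caption.split())          # crude stand-in for a token count
    print(f"{tokens / elapsed:.1f} tok/s | "
          f"CPU {psutil.cpu_percent():.0f}% | "
          f"mem {psutil.virtual_memory().percent:.0f}%")
    return caption
```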