Nvidia Personal AI Super Computer

At CES, Nvidia just announced Project Digits, which is branded as your “personal AI Super Computer.” What makes this interesting:

  • 128GB of memory is enough to run 70B models (see the back-of-envelope math after this list). That opens up some new experimentation options.
  • You can link a couple of these together to run even larger models. This is the technique used for data center shizzle, so bringing this to the desktop is cool.
  • The Nvidia stack. Having access to the Blackwell architecture is sweet, but the secret sauce is the software stack, specifically CUDA. This is really Nvidia’s moat that gives them the competitive advantage. Build against this, and you can run on any of Nvidia’s stuff from the edge on up to hyperscale data centers.
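A rough back-of-envelope check on that memory claim (my own numbers, not Nvidia’s): weights dominate the footprint, and 70B parameters only fit in 128GB once you drop below full FP16 precision.

```python
# Back-of-envelope: does a 70B-parameter model fit in 128 GB?
# Weights only -- the KV cache and activations add more on top.
params = 70e9

for precision, bytes_per_param in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    weights_gb = params * bytes_per_param / 1e9
    verdict = "fits" if weights_gb < 128 else "does not fit"
    print(f"{precision}: ~{weights_gb:.0f} GB of weights ({verdict} in 128 GB)")
```

So FP16 weights alone (~140GB) blow past 128GB, but 8-bit (~70GB) or 4-bit (~35GB) quantization leaves plenty of headroom for the KV cache.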

If you are a software engineer, IMO it’s worth investing some $ in this type of hardware vs. spinning up cloud instances to learn. Why? There are things you can do locally on your own network that let you experiment and learn faster than in the cloud. For instance, video feeds are very high bandwidth and are much easier to experiment with locally than pushing that feed to the cloud (along with all the security work that goes with exposing it outside your firewall).

Some related posts….
https://www.seanfoley.blog/visual-programming-with-tokens/
https://www.seanfoley.blog/musings-on-all-the-ai-buzz/

Visual Programming with Tokens

I bought a few Nvidia Jetson devices to use around the house and experiment with. I went with these vs. a discrete GPU + desktop machine because of power consumption: A desktop machine + GPU will use 300W+, and these Jetson edge devices use 10-50W.

I normally experiment with machine learning & AI shizzle using either a Jupyter notebook or a Python IDE. But for this demo, I decided to check out Agent Studio. You fire up the container image, open the link in your browser, and start dragging/dropping shizzle onto the canvas. Seriously rapid experimentation.

  • The video source is an RTSP video feed from my security camera.
  • The video output also produces an RTSP video feed. Not shown in the demo, but I also experimented with overlaying the LLM output (tokens) on the video source to produce an LLM-augmented video feed showing what it “saw.”
  • This feeds into a multimodal LLM with a prompt of “describe the image concisely.”
  • I wire up the output to a text-to-speech model. Since the video/LLM loop runs constantly, I also experimented with wiring up a deduping node (a rough sketch of the whole loop follows this list).
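Here’s roughly what that loop looks like if you wire it up by hand instead of on the canvas. This is just a sketch: the RTSP URL is made up, and describe_frame()/speak() are hypothetical stand-ins for whatever multimodal LLM and text-to-speech calls you actually use.

```python
# Rough sketch of the camera -> LLM -> speech loop from the Agent Studio demo.
# describe_frame() and speak() are hypothetical placeholders, not real APIs.
import cv2  # OpenCV reads RTSP feeds directly

RTSP_URL = "rtsp://camera.local/stream"  # made-up camera address

def describe_frame(frame) -> str:
    # Placeholder for the multimodal LLM call with the prompt
    # "describe the image concisely."
    return "a person walking up the driveway"

def speak(text: str) -> None:
    # Placeholder for the text-to-speech model.
    print(f"[TTS] {text}")

def main() -> None:
    cap = cv2.VideoCapture(RTSP_URL)
    last = ""
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        description = describe_frame(frame)
        # Dedupe node: the loop runs constantly, so only speak when the
        # description actually changes.
        if description != last:
            speak(description)
            last = description
    cap.release()

if __name__ == "__main__":
    main()
```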

This demo allowed me to get an idea of how these bits perform. I was interested in tokens/sec, memory utilization, and CPU/GPU utilization. Next, I plan to build out an Agentic AI solution architecture for my home security.

Musings on all the AI Buzz…

Some think we’re in an AI bubble where AI is being over-hyped. Massive data centers are being built specifically for AI hardware (aka NVIDIA). Power consumption is so extreme that Big Tech wants to go nuclear.

I’ve used the foundational models. I have also used LLMs on the edge. I watched the GTC March 2024 Keynote with NVIDIA CEO Jensen Huang.

Here are the things that stuck with me:

  • Gen-AI is good at extracting context from unstructured data. This is a game changer (a small sketch of the idea follows this list).
  • There is a ton of investment $$$ flowing into this space. This will impact how we build software and how users interact with computers. It is very much like the previous Dot Com boom, where the Internet/e-commerce would fundamentally change commerce… it’s just that the huge investments came too early for the “other” technical changes needed to make e-commerce ubiquitous: mobile phones and cheap networking.
  • Data centers and hyper-scale are needed for some workflows, but you can’t overcome physics and the round-trip latency of data transfer, which makes true interactive multimodal interactions challenging.
  • AI at the Edge is where I think the future is headed. We will want to use our phone or tablet with multimodal gen-AI to assist us with “things.” This requires low latency. For example, the speech-to-text model running on your phone, performing real-time transcription of a voicemail, is immensely useful for screening spam calls.
  • According to Jensen Huang’s keynote, the future of programming is tokens. You feed tokens around to different AIs that are each specialized in a particular domain. This is the basis for Agentic AI.
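On that first bullet about unstructured data, here’s a tiny sketch of the kind of extraction I mean. The llm() function is a hypothetical stand-in for whatever chat-completion call you use; the point is that free-form text goes in and structured fields come out.

```python
# Sketch: use gen-AI to pull structured fields out of unstructured text.
# llm() is a hypothetical placeholder for a real chat-completion call.
import json

def llm(prompt: str) -> str:
    # Placeholder: a real model would produce this JSON from the prompt.
    return '{"who": "delivery driver", "what": "package drop-off", "when": "3:15 pm"}'

def extract(note: str) -> dict:
    prompt = "Extract who, what, and when from this note as JSON:\n" + note
    return json.loads(llm(prompt))

print(extract("Driver left a package on the porch around 3:15 this afternoon."))
```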

I’ve seen shifts in tech and the software industry over my career. Historically, it has been about higher and higher levels of abstraction:

  • Assembly abstracted machine op codes. C/C++ abstracted assembly.
  • Managed languages abstracted unmanaged languages and, in turn, the host CPU and hardware architecture (write once/run anywhere).
  • Operating systems abstracted the computer hardware.
  • Databases abstracted the file system.
  • Sockets abstracted the network.

But with each one of those abstractions, you still wrote code to sequence all that shizzle. It was deterministic logic based on discrete math. Now, in this new gen-AI world, you decompose your domain into tokens. And within that domain, you have sub-domains that are specialized/optimized AIs. Each of these gen-AI domains produces a probabilistic result based on linear algebra and statistics. Basically, it is a very smart guesstimate.
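A minimal sketch of what “feeding tokens around to specialized AIs” can look like in code. Everything here is hypothetical: ask() stands in for a real LLM call, and each specialist is just the same model steered by a domain-specific prompt.

```python
# Sketch: tokens flow between domain-specialized "AIs."
# ask() is a hypothetical stand-in for a real LLM call.

SPECIALISTS = {
    "vision": "You describe what is happening in a camera feed.",
    "security": "You decide whether a described scene needs an alert.",
    "notify": "You write a one-sentence notification for the homeowner.",
}

def ask(system_prompt: str, user_input: str) -> str:
    # Placeholder so the sketch runs end to end; a real call would return
    # a probabilistic guesstimate, not deterministic logic.
    return f"({system_prompt.split()[1]}) {user_input}"

def pipeline(observation: str) -> str:
    # Each hop hands the previous specialist's tokens to the next one.
    scene = ask(SPECIALISTS["vision"], observation)
    decision = ask(SPECIALISTS["security"], scene)
    return ask(SPECIALISTS["notify"], decision)

print(pipeline("motion detected at the front door"))
```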

So now, tokens are abstracting the programming languages. And the holy grail everyone is chasing is Artificial General Intelligence (AGI) where the AI can do its own planning/reasoning. This abstracts out the programming language because the computer can figure out its own shizzle.

If you are a software engineer, you will absolutely need to add these tools to your toolkit. And I am not talking about just using a gen-AI LLM API wrapper. You really should dig behind that LLM API wrapper and:

BMW S1000R Accessory Light Flasher – First Prototype

I built this about a year ago. I prototyped a circuit that used an ATtiny85 microcontroller to drive a P-channel (high-side) MOSFET. The idea was to use the microcontroller to strobe the Clearwater Darla LED accessory lights.

The circuit worked as expected, but that click-click-click-click noise is bad. I thought the LED “instant-on” functionality was via a 12V signal to the DC-DC circuit in the LED light, but it is actually a mechanical relay. Strobing anything mechanical is no bueno.

I completely changed my strategy after this test. A little experimenting later, I discovered that the accessory lights are controlled by a PWM signal that sets the light intensity (from low to full power).

Wizbangadry

I met up with one of my peeps for lunch the other week. We were chatting about stuff, then we started talking about coding. We have a nice rivalry – I’m very much about the “art and craft” of software engineering, and he’s all about using the latest and greatest to build stuff. I call B.S. on his shiny new, and he calls B.S. on my old and crusty.

Me: “Dependency Injection used to be your jammy jam. You told me that my code sucked because I called a constructor directly. So what’s your newest hotness?” Continue reading “Wizbangadry”

Progressive Web Apps and The Microsoft Store

Welcoming Progressive Web Apps to Microsoft Edge and Windows 10

Microsoft announced that Progressive Web Apps (PWAs) will be added to the Microsoft Store (the “Store”). This means that, just like a native app (or Universal App in Microsoft Store parlance), you can build a PWA and have it added to the Store. From a developer perspective, this is great. A PWA in theory should be much more cross-platform than a native app. But what I find more interesting is thinking about the “whys” behind a company doing something.

The big tech companies have been battling for years. When you are building your business, trying to navigate the cesspool of technologies is a challenge. You have to be careful of betting on a technology that could get dropped when it isn’t a strategic fit anymore. Remember a thing called Silverlight? As developers, we know it’s possible to have standards, and the Web has been that shining light. But Apple, Google, and Microsoft all have different objectives. Unfortunately, rather than evaluating a technology on its technical merits, it’s actually more important to evaluate it on the viability of its long-term success. Continue reading “Progressive Web Apps and The Microsoft Store”