Archive - Historical Articles
You are viewing records from 02/15/2026 23:27:43 to 06/17/2026 20:17:48. I'll be adding support for selecting a date range in future.
I kept running out of memory and was unable to finish fine-tuning or quantization jobs, even though hundreds of GB of video memory appeared to be free. I kept getting:
Failed to load checkpoint: Some modules are dispatched on the CPU or the disk
It turns out that on a unified-memory Grace Blackwell system you need to drop the OS page cache, otherwise it can consume too much of the unified memory, resulting in paging to disk instead of GPU use.
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
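To see whether the page cache is worth dropping before starting a job, something like the following could help. It is only a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the function names read_meminfo and cache_summary are made up for illustration, and it only reports what the kernel says rather than proving the cache is what is starving the GPU.

```python
def read_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kB; a few (such as the HugePages counters) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary(fields):
    """Return (page cache GiB, available GiB, total GiB) from the parsed fields."""
    kib_per_gib = 1024 * 1024
    # Treating Cached + Buffers as "the OS cache" is an assumption for this sketch.
    cached = fields.get("Cached", 0) + fields.get("Buffers", 0)
    return (
        cached / kib_per_gib,
        fields.get("MemAvailable", 0) / kib_per_gib,
        fields.get("MemTotal", 0) / kib_per_gib,
    )


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        cached, available, total = cache_summary(read_meminfo(f.read()))
    print(f"page cache: {cached:.1f} GiB, available: {available:.1f} GiB, total: {total:.1f} GiB")
```

Running it before and after the drop_caches command should show the page cache figure shrinking, which is a quick sanity check that the command did something.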
Today I tried to get voice recognition working with hardware acceleration on an ARM64 system with an NVIDIA GPU. I eventually found the easiest way was to write my own Dockerfile based on their own, but using this base image:
FROM nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04
This comes with all the necessary CUDA support libraries pre-loaded and allowed me to quickly compile, build and publish a Grace Blackwell compatible Docker image of faster-whisper. Once I've tested it a bit and set up an automated pipeline I'll publish it.
The difference between CPU and GPU processing was only a second or so, but it does make the recognition feel snappy, and you can comfortably use the large model instead of the tiny one.
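For reference, switching between the tiny model on CPU and the large model on GPU with the faster-whisper Python package looks roughly like this. It is a sketch rather than the exact code I run: the helper names choose_settings and transcribe are hypothetical, and the model names and compute types ("large-v3" with float16 on CUDA, "tiny" with int8 on CPU) are assumed pairings, not measured recommendations.

```python
def choose_settings(use_gpu):
    """Pick (model size, device, compute type); these pairings are assumptions."""
    if use_gpu:
        return ("large-v3", "cuda", "float16")
    return ("tiny", "cpu", "int8")


def transcribe(audio_path, use_gpu=True):
    """Transcribe one audio file and return the recognised text as a single string."""
    # Imported here so choose_settings works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model_size, device, compute_type = choose_settings(use_gpu)
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(audio_path)
    # segments is a generator, so the actual transcription runs as it is consumed.
    return " ".join(segment.text.strip() for segment in segments)
```

Calling transcribe("audio.wav") with a placeholder file of your own would use the GPU path, and passing use_gpu=False falls back to the tiny model on CPU.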