dhrv's blog
dhrv
A personal blog
pass function pointers to kernels
2026-04-06
softmax with CUDA worklog - pt II (Block Reduction to Tiled Online Softmax)
2026-03-26
softmax with CUDA worklog - pt 1 (Naive to Kernel Fusion)
2026-03-19
Nsight Compute - a primer on profiling
2026-03-08
float2 vs float4 in GPUs
2026-02-28
Autopsy of llm.c adamw optimizer
2026-02-26