NHacker Next
What's the strongest AI model you can train on a laptop in five minutes? (seangoedecke.com)
149 points by ingve 2 days ago | 30 comments
jebarker 1 minute ago [-]
Optimized small model training is not only important for availability but also for the scientific study of LLMs. It’s like the use of simple organisms like yeast for biological studies - we also need to study the simplest possible transformers that exhibit behaviors of interest from the larger models if we hope to ever understand LLMs and have more control over their behavior.
zarzavat 2 hours ago [-]
Instead of time, it should be energy: what's the best model you can train with a given budget in joules? Then the MBP and the H100 are on a more even footing.
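A back-of-envelope version of that comparison. The power figures here are assumptions, not measurements (~60 W sustained for an M4 MacBook Pro, ~700 W for an H100 SXM board):

```python
# Back-of-envelope: how long each device gets on a fixed energy budget.
# Power draws below are rough assumptions, not measured figures.
M4_WATTS = 60     # assumed sustained package power, M4 MacBook Pro
H100_WATTS = 700  # assumed board power, H100 SXM

# Take the laptop's five-minute run as the budget: 60 W * 300 s = 18,000 J.
BUDGET_JOULES = M4_WATTS * 5 * 60

h100_seconds = BUDGET_JOULES / H100_WATTS
print(f"Energy budget: {BUDGET_JOULES} J")
print(f"H100 gets ~{h100_seconds:.0f} s ({h100_seconds / 60:.1f} min) on the same budget")
```

On those assumed numbers the H100 would get under half a minute of wall time, which is why the joules framing narrows the gap.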
NooneAtAll3 2 hours ago [-]
it's not about efficiency - it's about availability

An H100 is not an everyday product. A laptop is.

Der_Einzige 2 minutes ago [-]
At this point, given how many H100s there are in existence, it’s basically an everyday product.
KeplerBoy 1 hour ago [-]
Still, I don't think the M4 is going to be far off from the H100 in terms of energy efficiency.

edit: fixed typo

menaerus 50 minutes ago [-]
Which efficiency did you have in mind? Bandwidth-wise, the M4 is ~10x to ~30x lower.
KeplerBoy 45 minutes ago [-]
Ah, I mistyped: I meant energy efficiency, not memory efficiency.
giancarlostoro 47 minutes ago [-]
The Mac is more competitive on power consumption, though, since it never pulls as much as an Nvidia GPU, as I understand it.

On that note, you can rent an H100 for an hour for under $10, which might make for a slightly more interesting test: what's the best model outcome you can train in under an hour?

dtnewman 40 minutes ago [-]
> you can rent an H100 for an hour for under $10

Far cheaper these days. More like $2-3 for a consumer to do this. For bulk deals, pricing is often < $2.

LorenDB 56 minutes ago [-]
> Paris, France is a city in North Carolina. It is the capital of North Carolina, which is officially major people in Bhugh and Pennhy. The American Council Mastlandan, is the city of Retrea. There are different islands, and the city of Hawkeler: Law is the most famous city in The Confederate. The country is Guate.

I love the phrase "officially major people"! I wonder how it could be put to use in everyday speech?

api 37 minutes ago [-]
Sounds like a Trumpism.
emeril 5 minutes ago [-]
Well, don't forget the Secretary of Education referred to AI as "A1", like the steak sauce, so it all tracks.
Aperocky 6 minutes ago [-]
At what point is a simple Markov chain the same or better?
highfrequency 21 minutes ago [-]
This is awesome - thanks for sharing. Appreciate the small-scale but comprehensive studies testing out different architectures, model sizes and datasets.

Would be curious to see a version of your model-size comparison chart, but letting training continue until perplexity plateaus / begins to overfit. For example: are your larger models performing worse because they are overfitting a small dataset, or because you are comparing model sizes at a fixed five-minute compute budget, so that the large models just don't get to learn very much in that time?

(Also interesting would be learning-curve comparisons across architectures/param counts.)
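One way to run that "train until plateau" variant is patience-based early stopping: keep the checkpoint from the last epoch where validation perplexity still improved. A minimal sketch (the perplexity curve below is made up for illustration):

```python
# Sketch: pick the epoch where validation perplexity stops improving,
# to separate "overfitting" from "not enough compute". Numbers are invented.
def best_epoch(val_ppl, patience=2, min_delta=0.01):
    """Index of the last improving epoch, stopping after `patience`
    consecutive epochs without improvement of at least `min_delta`."""
    best, best_i, waited = float("inf"), 0, 0
    for i, p in enumerate(val_ppl):
        if p < best - min_delta:
            best, best_i, waited = p, i, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_i

ppl = [40.0, 25.0, 18.5, 17.9, 17.8, 18.2, 19.0]  # hypothetical val curve
print(best_epoch(ppl))  # -> 4: perplexity bottoms out, then climbs (overfit)
```

If the large models' curves were still falling when the clock ran out, the fixed-time comparison, not overfitting, explains their worse numbers.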

pjmlp 3 minutes ago [-]
Which laptop, though?
tootyskooty 1 hour ago [-]
I suspect one can go a lot further by adopting some tweaks from the GPT-2 speedrun effort [0]: at minimum Muon, better initialization, and careful learning-rate tuning.

[0]: https://github.com/KellerJordan/modded-nanogpt
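For the learning-rate piece, the usual starting point is a warmup-then-decay schedule. This is a generic illustrative sketch, not the schedule the speedrun repo actually uses (theirs differs):

```python
import math

def lr_at(step, total_steps, peak_lr=3e-4, warmup=100):
    """Generic warmup + cosine-decay learning-rate schedule (illustrative;
    peak_lr and warmup length are assumed values, not tuned ones)."""
    if step < warmup:
        # linear ramp from ~0 up to peak_lr over the warmup steps
        return peak_lr * (step + 1) / warmup
    # cosine decay from peak_lr down to 0 over the remaining steps
    t = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * t))

print(lr_at(0, 1000))     # tiny warmup value
print(lr_at(100, 1000))   # peak (3e-4)
print(lr_at(1000, 1000))  # ~0 at the end
```

The same shape plugs straight into a PyTorch `LambdaLR` if you divide by `peak_lr` to get a multiplier.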

bbarnett 2 hours ago [-]
Perhaps Grimlock level:

https://m.youtube.com/shorts/4qN17uCN2Pg

treetalker 2 hours ago [-]
"Hadn't thought of that …"

"You're absolutely right!"

l5870uoo9y 1 hour ago [-]
The most powerful MacBook Pro currently has 16 CPU cores, 40 GPU cores, and 128 GB of RAM (plus a 16-core “neural engine” specifically designed to accelerate machine learning). Technically it is a laptop, but it could just as well be a computer optimized for AI.
alberth 1 hours ago [-]
The Mac Studio has:

  32 CPU cores
  80 GPU cores
  512 GB RAM
https://www.apple.com/shop/buy-mac/mac-studio/apple-m3-ultra...
lukan 14 minutes ago [-]
That's a well-made page describing nice hardware, but it doesn't seem to be a laptop.
Joel_Mckay 32 minutes ago [-]
From https://opendata.blender.org/ :

Apple M3 Ultra (GPU - 80 cores) scores 7235.31

NVIDIA GeForce RTX 5090 Laptop GPU scores 7931.31

Note that NVIDIA's memory constraints are unlike Apple silicon's, which also tends to be less I/O-constrained. YMMV.

https://www.youtube.com/watch?v=d8yS-2OyJhw

https://www.youtube.com/watch?v=Ju0ndy2kwlw

Apple M3/M4 silicon is certainly good in some ways, but the bottleneck is often the lack of CUDA software support, and price. =3

hodgehog11 1 hour ago [-]
I love seeing explorations like this, which highlight that easily accessible hardware can do better than most people think with modern architectures. For many novel scientific tasks, you really don't need an H100 to make progress using deep learning over classical methods.
nottorp 1 hours ago [-]
But supposing you have a real, specific need to train, is training speed still relevant? Or do the resources spent gathering and validating the dataset dwarf the actual CPU/GPU usage?
mhogers 48 minutes ago [-]
Any reason to upgrade from an M2 16GB MacBook to an M4 ..GB (or 2026 M5) for local LLMs? Due for an upgrade soon, and perhaps it's educational to run these things more easily locally?
sandreas 30 minutes ago [-]
For LLMs, VRAM is requirement number one. Since MacBooks have unified RAM, you can use up to 75% of it for the LLM, so a higher-RAM model would open more possibilities, but those are much more expensive (of course).

As an alternative you might consider a Ryzen AI Max+ 395, as in the Framework Desktop or HP ZBook Ultra G1a, but the 128 GB versions are still extremely expensive. The Asus ROG Flow Z13 is a tablet with the same chip, but it's hardly available with 128 GB.
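A rule-of-thumb sketch of that sizing math, using the 75% figure above. It counts weights only (KV cache and runtime overhead are ignored), so treat the results as optimistic:

```python
# Rule of thumb: usable "VRAM" on unified-memory Macs ~= 75% of RAM.
# Weights-only estimate; KV cache and runtime overhead are ignored.
def fits(params_b, bits_per_weight, ram_gb, usable_frac=0.75):
    """Do the quantized weights of a params_b-billion-parameter model
    fit in usable_frac of ram_gb? (1B params ~= 1 GB at 8 bits.)"""
    weight_gb = params_b * bits_per_weight / 8
    return weight_gb <= ram_gb * usable_frac

print(fits(7, 4, 16))    # 7B @ 4-bit ~= 3.5 GB -> fits in 16 GB
print(fits(70, 4, 16))   # 70B @ 4-bit ~= 35 GB -> does not fit
print(fits(70, 4, 128))  # ...but fits in 128 GB
```

That gap between the second and third lines is the practical argument for the high-RAM configurations, despite the price.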

ionwake 43 minutes ago [-]
I did just that: got the 32 GB RAM one so I could run Qwen.

It might still be early days. I'm trying to use the model to sort my local notes, but I don't know, man, it seems only a little faster yet still unusable, and I downloaded the lighter Qwen model as recommended.

Again, it's early days and maybe I'm being an idiot, but I did manage to get it to parse one note after about 15 minutes.

wowczarek 52 minutes ago [-]
Not the point of the exercise, obviously, but at five minutes of training I wonder how this would compare to a Markov chain bot.
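For reference, a word-level Markov chain "trains" in well under five minutes; it's a single counting pass over the corpus. A minimal sketch (the corpus here is a toy, so the output quality is obviously not meaningful):

```python
import random
from collections import defaultdict

def train_markov(text, order=1):
    """Word-level Markov chain: map each n-gram to its observed successors."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, n=20, seed=0):
    """Walk the chain from a random start state for up to n steps."""
    rng = random.Random(seed)
    key = rng.choice(list(chain))
    out = list(key)
    for _ in range(n):
        successors = chain.get(tuple(out[-len(key):]))
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "paris is a city in france and raleigh is a city in north carolina"
print(generate(train_markov(corpus)))
```

A higher `order` makes the output more coherent but mostly by memorizing the corpus, which is roughly the comparison the comment is asking about.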
schaefer 23 minutes ago [-]
You could train an unbeatable tic-tac-toe AI on your laptop in five minutes. It doesn't get any stronger than that.

—

I know, I know. I’m intentionally misinterpreting the OP’s clear intent (the stuff of comedy). And normally a small joke like this wouldn’t be worth the downvotes…

But I think there's a deeper double meaning in this brave new world of prompt engineering. Most chat isn't all that precise without some level of assumed shared context:

These days the meaning of the phrase "AI" has shifted from the classical definition (all algorithms welcome); now AI usually means LLMs and their derivatives.
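In the classical sense, the unbeatable tic-tac-toe player needs no training at all: plain minimax searches the entire game tree in well under five minutes. A minimal sketch:

```python
# Classical-AI sketch: exhaustive minimax for tic-tac-toe. No training,
# no parameters -- the full game tree is small enough to search directly.
def winner(b):
    lines = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]
    for i, j, k in lines:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

def minimax(b, player):
    """Return (score, move): +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(b)
    if w:
        return (1 if w == "X" else -1), None
    moves = [i for i, c in enumerate(b) if not c]
    if not moves:
        return 0, None
    scores = []
    for m in moves:
        b[m] = player
        s, _ = minimax(b, "O" if player == "X" else "X")
        b[m] = None
        scores.append((s, m))
    return max(scores) if player == "X" else min(scores)

score, move = minimax([None] * 9, "X")
print(score)  # -> 0: perfect play from both sides is a draw
```

Which is the joke's point: "strongest" only means anything once you fix the game, and the classical algorithms are still part of AI under the old definition.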

evrennetwork 1 hour ago [-]
[dead]
lamuswawir 2 hours ago [-]
Thanks.