Ultimately, it all comes down to data throughput into the CPU, because the tensors are so large. The base M2 has a 128-bit memory bus. The ARM instruction support built into llama.cpp is also weak compared to the x86 path. If you want to run big models that need lots of memory without spending five figures, find an Intel chip that supports AVX-512 and can take 96+ GB of RAM. AVX-512 and its related instruction subsets are directly supported in llama.cpp, and that gets you 512-bit vector operations. Apple can't match that.
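A quick sketch of how you'd check for this on Linux: the CPU's supported AVX-512 subsets show up as flags in /proc/cpuinfo, and llama.cpp exposes a CMake option to enable that code path (GGML_AVX512 in current llama.cpp; older versions spelled it LLAMA_AVX512 — check your checkout).

```shell
# List which AVX-512 subsets this CPU advertises (no output = no AVX-512)
grep -o 'avx512[a-z_]*' /proc/cpuinfo | sort -u

# If supported, build llama.cpp with the AVX-512 path enabled
# (option name assumed from current llama.cpp; verify against your tree)
cmake -B build -DGGML_AVX512=ON
cmake --build build --config Release
```

On a chip with AVX-512F/BW/VL you'd see lines like `avx512f`, `avx512bw`, `avx512vl`; on an M2 (or any consumer Alder Lake-and-later Intel part with AVX-512 fused off) the grep prints nothing.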
If you want a laptop, get something with a 3080 Ti. It specifically needs to be the Ti version: that one has 16 GB of VRAM and shipped in several 2022 models.
Run Fedora with it. Nvidia support there (via RPM Fusion) includes a slick mechanism that automatically rebuilds the GPU driver from source on every kernel update and keeps Secure Boot working the whole time.
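For reference, the mechanism is RPM Fusion's akmod packaging: akmod-nvidia rebuilds the kernel module whenever a new kernel lands. A minimal setup sketch, assuming a stock Fedora install (package and repo names are RPM Fusion's, not part of Fedora proper):

```shell
# Enable the RPM Fusion free and nonfree repos for the installed Fedora release
sudo dnf install \
  "https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm" \
  "https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm"

# akmod-nvidia rebuilds the driver module on every kernel update;
# the cuda subpackage adds nvidia-smi and CUDA support for GPU inference
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
```

For Secure Boot, the akmod-built modules get signed with a machine owner key that you enroll once; after that, kernel updates keep working without re-enrolling.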