llama.cpp adds 1-bit inference
New 1-bit quantization runs 70B models on a laptop.
Why it matters: Local inference just got dramatically cheaper for indie builders.
0.90 pts #local #quantization #inference