The Llamas are Coming
Published on 30 April 2023
Would you pet a llama? I know I would. I've never seen one up close, to be honest, but they look funky and cool. They're like that chill dude next door that you just want to hang out with, chug down a beer or two, watch The Big Lebowski with, and sit on the roof of your house, counting stars in silence. And when it's time to go, you can just get up and leave, and he'll understand.
Until next time, friend. Until next time.
It shouldn’t surprise anyone, then, that the AI team over at Facebook decided to nickname their version of HAL 9000 exactly that: Llama. It might destroy the world, but at least it’ll snuggle with you while it does so.
Alright, it’s also a play on LLM. Large Language Model. LLaMA.
Now, before we proceed I’d like to reassure you that if you copy-paste this article into NoGPT, it’ll tell you it was written by a human. So read on, dear… OpenAI content scrubber.
This is the part where I pretend you wandered onto a tech-ish (yeah, “tech-ish”) blog in April 2023 without knowing what an LLM is, and I give you the definition of it to improve my SEO rating on Bing. (Yep, Bing. We’ve switched over. What’s Google, anyway?)
Now, for the non-techies out there, if I had to describe an LLM I’d tell you to imagine a super-brain, a language-savvy piece of software that can do pretty much anything you throw at it - well, except maybe tell a good joke or close your Apple Watch rings for you.
TL;DR it’s like having a walking encyclopaedia that can help you with your homework, write your essays, or even generate a whole new article from scratch.
And again - no. Not this article.
As you read this, if you notice planes flying overhead, covering the skies, and all of a sudden you’re on board a ship named after a Babylonian king, with wires sticking out of your brain and some rando asking you to “show him” if you know Kung Fu, keep calm and do what he says.
Unlike that rando, though, as things stand you don’t need a weird-looking cross between a space shuttle and a nuclear submarine to run an LLM yourself.
Yes, you heard me right. Local LLMs have landed, and you might want to try one out on your under/over-powered (depends on who you ask) MacBook Pro with the M1 chip inside it.
Rejoice. You don’t need a beefy cloud instance or overpriced Nvidia cards. GPU bros might be LLM-ing while running Cyberpunk 2077 in RT Overdrive, but hey, your MacBook can do stuff too! What a great purchase!
And you owe it all to a dude out there who knows C++.
His name is Georgi Gerganov and he wrote this nifty little thing called “llama.cpp”, a pure C/C++ implementation of Facebook’s LLaMA whose main goal is, as the project states, “to run the model using 4-bit quantization on a MacBook” (GitHub: ggerganov/llama.cpp).
What’s 4-bit quantization, you ask? It’s stuff for smart people. You and I, we can just focus on the outputs, Ok?
“Not Ok”, you say in protest? You “clicked on the link hoping to learn something”? Lol. Alright.
Let’s make this quick: in LLMs, 4-bit quantization refers to compressing the model’s weights so that each one takes up only 4 bits instead of the typical 32. Basically, this allows for faster and more efficient computation on devices with limited resources, such as mobile phones or… You guessed it! Laptops, à la MacBook.
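For the curious, here’s a toy sketch of the idea in Python. This is a deliberately simplified version (llama.cpp’s actual Q4 schemes work block-by-block with extra bookkeeping), but the core trick is the same: squash float weights into 16 integer levels, keep a scale to reconstruct them later.

```python
import numpy as np

# Toy 4-bit quantization: map float32 weights onto 16 integer levels
# (that's all 4 bits can hold), then reconstruct them approximately.

def quantize_4bit(weights):
    lo, hi = weights.min(), weights.max()
    scale = (hi - lo) / 15  # 16 levels: 0..15
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

w = np.array([-0.8, -0.1, 0.0, 0.3, 0.9], dtype=np.float32)
q, scale, lo = quantize_4bit(w)
w_hat = dequantize(q, scale, lo)
# w_hat is close to w, but not exact: that's the precision you trade
# for an 8x smaller memory footprint per weight.
```

Each reconstructed weight lands within half a quantization step of the original, which is exactly the “loss in precision” trade-off.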
Now, keep in mind, this comes at the cost of some loss in precision and accuracy. Which I’m more than happy to accept in order to try and make friends with my future overlord, who comes in several sizes: 7B, 13B, 30B, and 65B. Those aren’t t-shirt sizes (I thought they were, silly me), but the number of parameters each model has, in billions. 7B - 7 billion parameters, and so on. A parameter is a value inside the model, learned during training, that can be adjusted - fine-tuned - to improve the model’s performance on a specific task or domain. Think of parameters as skills: you can fine-tune a model for rhyme and poetry, code writing, blog articles…
Again, no! Not this blog article. Jeez.
A downside is that larger models also require more resources to run. The 7B model has fewer parameters than the 13B and 30B models, and requires less beefy hardware.
On the other side of the ring, the larger models, such as the 65B, are better suited for tasks that require a higher level of accuracy and precision, which makes them resource-hungry.
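If you want to see why size matters, here’s a quick back-of-the-envelope calculation. These are rough numbers (real model files add scales and metadata on top), but they show what 4-bit quantization buys you:

```python
# Rough memory footprint for each LLaMA size: full 32-bit weights
# versus 4-bit quantized ones. Real files carry some extra overhead.

GIB = 1024 ** 3

for name, params in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9), ("65B", 65e9)]:
    fp32 = params * 4 / GIB   # 4 bytes per parameter
    q4 = params * 0.5 / GIB   # 4 bits = half a byte per parameter
    print(f"{name}: ~{fp32:.0f} GiB at fp32 -> ~{q4:.1f} GiB at 4-bit")
```

The 7B model drops from roughly 26 GiB to about 3.5 GiB, which is why it fits in a MacBook’s RAM at all; the 65B model stays north of 30 GiB even quantized.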
You see where I’m heading with this, right?
Yes, the bad news is that your MacBook Pro probably isn’t capable of running the 65B Llama model. Then again, I don’t own an M1 Max or Ultra.
The good news is that the M1 Pro runs the 7B model like a charm, the 13B model surprisingly well, and the 30B model… Well, if you have a few hours in front of you, you’ll get to see it type a couple of lines onto your terminal.
But all is not perfect in Llama land. The model is prone to hallucinating, getting things wrong a lot of the time, using outdated information, and just plain bugging out on you. It doesn’t seem to be able to do more intricate tasks like finding bugs in complex code, and in terms of content originality I’d give it less than an F-.
Nonetheless, just a few months ago this was only possible in OpenAI’s GPT sandbox. A billion-dollar company, spending small-government-type amounts on GPUs and running its models in the cloud.
And right now, it’s landing onto my hard drive and it’s talking to me. Sure, it’s a confident bullshit peddler, but man…
It’s impressive, it’s scary, and the perfect glimpse into the actual future of these LLMs.
What do I mean by “actual future”?
I mean, open source. I mean, not gated behind a private API. I mean DIY, raw and untethered.
A public-facing LLM gated by a corporation, no matter how magical it may be, will never reach its full potential (whether it has to is a question for another article). Not even with a publicly exposed API. It will always remain in the “outstanding and groundbreaking consumer product” realm. There’s always going to be that kid in his parents’ garage, tinkering and tweaking, who’s going to be steamrolling ahead of Big Startup/Corp.
Emerging consciousness, in the hands of each and every one of us.
That’s… A lot to unpack.
“Oh but it’s not consciousness, it’s just ML.”
“Oh but it’s not good enough yet.”
“Oh but this but that”.
Look, what we’re witnessing are probably clumsy baby steps. But taking into account the breakneck speed at which they’ve been happening, it’s not unreasonable to assume that more frontiers are going to be conquered in the months & years to come, both for corporate-level LLMs and open-sourced ones. The space is bound to stay unregulated for some time (not the best of news), which will help fuel an emergence of every kind of LLM imaginable.
You think porn Stable Diffusion models were the tip of the iceberg? Wait until you get hallucinating LLMs out there trained on the darkest corners of your mind. Mirror images of the scariest thoughts your mind can conjure.
Llama, and its variants already sprouting like mushrooms, are just the beginning. In the weeks after Llama’s release, Alpaca and Vicuna, both tweaked Llama models, made their appearance. And Vicuna was trained for around $300.
In the hands of a well-mannered individual (TBD what that actually is, maybe someone who reads the Bible every day) able to work alongside them, open-source LLMs are the real deal. Your Stable Diffusion model, trained on your techniques, your art, your inspiration and imagination, helping you accomplish things you couldn’t before, on a laptop in a plane or in your office.
Your LLM sidekick, learning from your writing, your way of putting words together, your humour, your satire, now assisting you in unearthing ideas - your ideas - that you might’ve spent decades arriving at.
Plug it into your future BMI, the Neuralink behind your ear, and your transformation to all-knowing, all-powerful entity is complete. Congratulations, you’re now God. Plant some apples in your garden, call it Eden. Look around, at all these other Edens. Crap, there’s so many of them. So many Gods, such a small planet…
Maybe take a vacation on the Moon. Starships haven’t exploded in a while, tickets have gotten cheaper, and with your Neuralink subscription you get 20% off a cruise. Good deal. And don’t forget, if you use your “SpaceX Black Hole” credit card you’ll get enough points to transition from God to…
Hm. Few eternal questions can boggle your mind quite like: what is there after God? What rhymes with “hug me”? What does the goddamn fox say?
No idea. But man, you can’t wait to discover.
All the way up, right?
Lonely saxophone music wafts in… Only you can’t hear it because the Neuralink strapped to your brain malfunctioned and fried the shit out of it. Now you’re just a piece of meat floating above your king-sized bed in the “Armstrong” suite in the “Old Apollo Hotel” on the moon.
We’re all Gods. Until we’re useless pieces of meat.
Keep your hand on the pulse.
Go pet a real llama on a farm or something.
Remember who you are.