AI & Serverless
Thanks for clicking through to the latest edition of AWSCQ.
This issue’s guest editor is the brilliant Yan Cui. Yan was one of the first ever Comsum speakers back in 2018; we were incredibly lucky to have him speak for us then, and we feel the same way about him helming this issue today.
Yan is an AWS Serverless Hero and independent consultant. If you’re looking to improve feature velocity, reduce costs, and make your systems more scalable, secure, and resilient, then check out his services here.
Today Yan takes on the fastest moving topic in tech: AI & Serverless.
Over to you, Yan.
So much is happening in the AI space. It feels like we are getting groundbreaking news every week! There are also a lot of synergies between AI and serverless. So in this issue of AWS Comsum Quarterly let’s catch up on what’s happening in AI and how to build AI-powered apps using serverless technologies.
Claude 3 takes the crown from ChatGPT (for now)
Last week, Anthropic announced the Claude 3 family of LLMs - Haiku, Sonnet, and Opus, with Opus being the largest model. Opus outperforms GPT-4 and Gemini 1.0 across the board, according to Anthropic’s own benchmarks.
You should take these benchmarks with a pinch of salt; they are unreliable and contain many errors. But one result that stood out is the human-evaluated coding ability, and it has been corroborated by independent reports, such as the AI Explained YouTube channel.
OpenAI announced Sora
Sora’s introduction to the world was nothing short of breathtaking. OpenAI has taken text-to-video AI models to the next level, taking the wind out of RunwayML’s sails at the same time.
There are still some obvious glitches. But how long before all Hollywood movies are made with AI and we’re flooded with fake videos on social media? This is both exciting and horrifying.
Groq makes LLMs fast
As powerful as LLMs are getting, one problem with them is that they are sloooowww. The current streaming response approach is a hack to hide this throughput issue, not a feature. I want my chatbots to give me an instant and complete response, not spit it out one token at a time.
That’s where Groq comes in. Its inference engine uses a custom Language Processing Unit (LPU) chip architecture and is capable of generating over 500 tokens per second!
See for yourself at https://groq.com.
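If you’d rather try it from code than the web UI, here’s a minimal sketch using Groq’s Python SDK, which mirrors the OpenAI-style chat completions API. The model ID and the usage fields are assumptions based on what Groq offered at the time of writing, so check their console for the current options:

```python
# Minimal sketch: one-shot (non-streaming) chat completion against Groq.
# Assumes: `pip install groq` and a GROQ_API_KEY environment variable.
import os
import time

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
response = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # example model ID; check Groq's console for current models
    messages=[{"role": "user", "content": "Explain serverless in one paragraph."}],
)
elapsed = time.perf_counter() - start

print(response.choices[0].message.content)

# The usage block follows the OpenAI-compatible schema (an assumption worth
# verifying against the SDK docs); use it to eyeball the tokens-per-second claim.
tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s (~{tokens / elapsed:.0f} tokens/sec)")
```

Note that the timing includes network latency, so the tokens-per-second figure you see will understate what the LPU itself achieves.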
What happened with Mamba?
The Mamba paper came out with a great deal of fanfare, promising 5x faster inference and a larger (million-token) context window compared to the transformer architecture.
But as far as I can tell, it has not been adopted by any of the recent LLMs. Perhaps the technical advances in the transformer architecture have already made it redundant. Groq is able to solve the throughput issue with custom hardware, and both Gemini 1.5 Pro and Claude 3 Opus can support a context window of up to a million tokens with near-perfect recall.
Oh, and what about Small Language Models?
Does anyone remember Microsoft’s Phi-2 announcement? It’s been a few months, which feels like years in the AI timeline…
Training a new LLM takes a tremendous amount of data and computing power. What Phi-2 has shown us is that you can achieve comparable results with a far smaller model by training the model more efficiently. This is done using a mix of real-world and synthetic data to create a higher-quality dataset for training.
Cognition Labs announces Devin, the first AI software engineer
The demo is very impressive. Is it game over for us already? I thought we had more time.
Ok, in all seriousness, while the demo is very cool and it looks more capable than the autonomous AI agents we have seen so far, it’s still far from perfect. This step-by-step example by Andrew Gao should give you a better sense of what it can do and what it still struggles with.
We’re not quite out of the job just yet. But these AI agents are getting better, and it’s only a matter of time before “coding” is automated. The economic incentives are there for it to happen.
It’s not necessarily a bad thing. More importantly, it’s inevitable and we should embrace it. The role of the “software engineer” needs to move up the value chain, as we have done many times already. Do you remember the time when we had to configure and patch servers? Now we just ship some code in a zip file and watch it scale on-demand.
Getting started with AI and serverless
Honestly, the AI space is moving so fast that it’s hard to keep up.
So how do you get into AI and start learning?
There are two courses from AWS, taught by the amazing Mike Chambers:
Marcia Villalba also has a series of videos on Amazon Bedrock to help you get started, including examples in Python and Node.js.
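To give you a flavour of what this looks like in practice, here’s a minimal sketch that calls Claude 3 Sonnet through Amazon Bedrock with boto3. It assumes you have AWS credentials configured and have enabled access to the Anthropic models in the Bedrock console; the model ID and request shape match Bedrock’s Messages API for Claude 3 at the time of writing:

```python
# Minimal sketch: invoke Claude 3 Sonnet via Amazon Bedrock.
# Assumes: `pip install boto3`, AWS credentials, and Bedrock model access enabled.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [
            {"role": "user", "content": "What makes serverless a good fit for AI apps?"}
        ],
    }),
)

# The response body is a streaming payload; read and parse it as JSON.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```

Because this is a plain API call authenticated with IAM, it drops straight into a Lambda function with no API keys to manage - one of the reasons Bedrock and serverless pair so well.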
Here are examples of Bedrock being used in the wild:
Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock
How I Used Amazon Bedrock to Write, Schedule, and Post My Tweets
A huge thank you to Yan for putting this fascinating edition of AWSCQ together.
Before we go, we’ve just got time to plug our next live event: the (deep breath) Post AWS London Summit Community Network Evening.
If you’re in London for the summit, register below and end your day with some food and drink with friends and colleagues from the AWS community.
This was great fun last year - expect more of the same!
And that’s all folks!
We’ll be back with another issue and guest editor in the coming weeks.
Before you go, be sure to give our sponsors a click. AWSCQ and our live and digital events are all made possible by their support.