AI & Serverless
Thanks for clicking through to the latest edition of AWSCQ.
This issue’s guest editor is the brilliant Yan Cui. Yan was one of the first ever Comsum speakers back in 2018; we were incredibly lucky to have him speak for us then, and we feel the same way about him helming this issue today.
Yan is an AWS Serverless Hero and independent consultant. If you’re looking to improve feature velocity, reduce costs, and make your systems more scalable, secure, and resilient, then check out his services here.
Today Yan takes on the fastest moving topic in tech: AI & Serverless.
Over to you, Yan.
So much is happening in the AI space. It feels like we are getting groundbreaking news every week! There are also a lot of synergies between AI and serverless. So in this issue of AWS Comsum Quarterly let’s catch up on what’s happening in AI and how to build AI-powered apps using serverless technologies.
Claude 3 takes the crown from ChatGPT (for now)
Last week, Anthropic announced the Claude 3 family of LLMs - Haiku, Sonnet, and Opus, with Opus being the largest model. Opus outperforms GPT-4 and Gemini 1.0 across the board, according to Anthropic’s own benchmarks.
You should take these benchmarks with a pinch of salt; they are unreliable and contain many errors. But one result that stood out is the human-evaluated coding ability, and it has been corroborated by independent reports, such as the AI Explained YouTube channel.
OpenAI announced Sora
Sora’s introduction to the world was nothing short of breathtaking. OpenAI has taken text-to-video AI models to the next level, taking the wind out of RunwayML’s sails at the same time.
There are still some obvious glitches. But how long before all Hollywood movies are made with AI and we’re flooded with fake videos on social media? This is both exciting and horrifying.
Groq makes LLMs fast
As powerful as LLMs are getting, one problem with them is that they are sloooowww. The current streaming response approach is a hack to hide this throughput issue, not a feature. I want my chatbots to give me an instant and complete response, not spit it out one token at a time.
That’s where Groq comes in. Its inference engine uses a custom Language Processing Unit (LPU) chip architecture and is capable of generating over 500 tokens per second!
See for yourself at https://groq.com.
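If you’d rather try it from code than the web UI, here’s a minimal sketch using Groq’s Python SDK, which mirrors the OpenAI-style chat completions API. The model ID and the usage fields are assumptions based on what Groq offered at the time of writing, so check their console for the current options:

```python
# Minimal sketch: one-shot (non-streaming) chat completion against Groq.
# Assumes: `pip install groq` and a GROQ_API_KEY environment variable.
import os
import time

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
response = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # example model ID; check Groq's console for current models
    messages=[{"role": "user", "content": "Explain serverless in one paragraph."}],
)
elapsed = time.perf_counter() - start

print(response.choices[0].message.content)

# The usage block follows the OpenAI-compatible schema (an assumption worth
# verifying against the SDK docs); use it to eyeball the tokens-per-second claim.
tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.2f}s (~{tokens / elapsed:.0f} tokens/sec)")
```

Note that the timing includes network latency, so the tokens-per-second figure you see will understate what the LPU itself achieves.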
What happened with Mamba?
The Mamba paper came out with a great deal of fanfare, promising 5x faster inference and a larger (million-token) context window compared to the transformer architecture.
But as far as I can tell, it has not been adopted by any of the recent LLMs. Perhaps the technical advances in the transformer architecture have already made it redundant. Groq is able to solve the throughput issue with custom hardware, and both Gemini 1.5 Pro and Claude 3 Opus can support a context window of up to a million tokens with near-perfect recall.
Oh, and what about Small Language Models?
Does anyone remember Microsoft’s Phi-2 announcement? It’s been a few months, which feels like years in the AI timeline…
Training a new LLM takes a tremendous amount of data and computing power. What Phi-2 has shown us is that you can achieve comparable results with a far smaller model by training the model more efficiently. This is done using a mix of real-world and synthetic data to create a higher-quality dataset for training.
Cognition Labs announces Devin, the first AI software engineer
The demo is very impressive. Is it game over for us already? I thought we had more time.
Ok, in all seriousness, while the demo is very cool and it looks more capable than the autonomous AI agents we have seen so far, it’s still far from perfect. This step-by-step example by Andrew Gao should give you a better sense of what it can do and what it still struggles with.
We’re not quite out of the job just yet. But these AI agents are getting better, and it’s only a matter of time before “coding” is automated. The economic incentives are there for it to happen.
It’s not necessarily a bad thing. More importantly, it’s inevitable and we should embrace it. The role of the “software engineer” needs to move up the value chain, as we have done many times already. Do you remember the time when we had to configure and patch servers? Now we just ship some code in a zip file and watch it scale on-demand.
Getting started with AI and serverless
Honestly, the AI space is moving so fast that it’s hard to keep up.
So how do you get into AI and start learning?
There are two courses from AWS, taught by the amazing Mike Chambers:
Marcia Villalba also has a series of videos on Amazon Bedrock to help you get started, including examples in Python and Node.js.
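To give you a flavour of what this looks like in practice, here’s a minimal sketch that calls Claude 3 Sonnet through Amazon Bedrock with boto3. It assumes you have AWS credentials configured and have enabled access to the Anthropic models in the Bedrock console; the model ID and request shape match Bedrock’s Messages API for Claude 3 at the time of writing:

```python
# Minimal sketch: invoke Claude 3 Sonnet via Amazon Bedrock.
# Assumes: `pip install boto3`, AWS credentials, and Bedrock model access enabled.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [
            {"role": "user", "content": "What makes serverless a good fit for AI apps?"}
        ],
    }),
)

# The response body is a streaming payload; read and parse it as JSON.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```

Because this is a plain API call authenticated with IAM, it drops straight into a Lambda function with no API keys to manage - one of the reasons Bedrock and serverless pair so well.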
Here are examples of Bedrock being used in the wild:
Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock
How I Used Amazon Bedrock to Write, Schedule, and Post My Tweets
A huge thank you to Yan for putting this fascinating edition of AWSCQ together.
Before we go, we’ve just got time to plug our next live event: the (deep breath) Post AWS London Summit Community Network Evening.
If you’re in London for the summit, register below and end your day with some food and drink with friends and colleagues from the AWS community.
This was great fun last year - expect more of the same!
And that’s all folks!
We’ll be back with another issue and guest editor in the coming weeks.
Before you go, be sure to give our sponsors a click. AWSCQ and our live and digital events are all made possible by their support.