Large language models (LLMs) are all over the news, but do you know how they work? What are they made up of, and what are their levels of complexity? Watch our developer advocate, Angelik Laboy Torres, answer these questions and explore how you can get started building on your own.
Chapters
00:00 - Intro
00:19 - What is an LLM?
00:33 - Prompt
00:49 - System Message
01:18 - Token
01:36 - Context Window
01:51 - Temperature
02:14 - Hallucination
02:32 - Multimodal
02:52 - Building with Pre-Trained LLMs
03:49 - Three Tiers of Complexity: Prompt Engineering
04:08 - Three Tiers of Complexity: RAG
04:25 - Three Tiers of Complexity: Fine Tuning
0:02 In the intersection between probability and chaos, 0:05 a new kind of mind takes shape. 0:07 Where once we programmed explicit rules, 0:10 a model is there to predict the next piece of the sequence 0:13 based on the previous information. 0:16 Trained on a massive amount of data, 0:17 why, of course!

0:19 Think of it like this: 0:21 if REST APIs 0:22 are function calls 0:23 over HTTP, 0:24 LLMs are conversations 0:26 with a system that predicts and fills each word 0:29 of its answer, based on everything it's ever read.

0:33 A prompt is your question, 0:35 your instruction, 0:36 and your starting point. 0:38 But unlike code, where syntax errors break everything, 0:42 prompts are forgiving. 0:43 They are an art form that shapes the AI's response 0:47 without dictating the next steps.

0:49 And just as you would configure 0:51 your development environment before writing code... 0:54 the system message sets the stage! 0:56 It is invisible to the user 0:58 but critical for the AI. 1:00 Functioning like both environment variables and foundational instructions, 1:04 it shapes not only HOW your application behaves 1:08 but also establishes the foundational rules 1:10 and capabilities, 1:11 like a blueprint or prototype 1:13 defining what can and cannot be created within the conversation.

1:18 Each word, part-word, or symbol 1:20 the model processes is a token. 1:23 If strings are your basic unit in code, 1:25 tokens are the fundamental unit in LLMs. 1:30 Roughly four tokens here. 1:32 The economy of language becomes a literal economy when you're paying per token.

1:36 The context window is your model's working memory; 1:40 how much it can see at once. 1:42 Think of it like buffer size or stack memory: 1:44 8k tokens or 32k tokens, 1:48 that's all your model has to work with.

1:50 And here is where it gets interesting: 1:52 temperature controls randomness! 1:54 At zero, it is deterministic.
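The effect of temperature can be sketched in plain Python: divide the raw next-token scores (logits) by the temperature before applying a softmax. The token scores below are made-up numbers, purely for illustration.

```python
import math

def temperature_probs(logits, temperature):
    """Turn raw next-token scores into probabilities, scaled by temperature."""
    if temperature <= 1e-6:
        # Temperature 0: deterministic, all mass on the top-scoring token.
        best = max(logits, key=logits.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in logits}
    scaled = {tok: s / temperature for tok, s in logits.items()}
    peak = max(scaled.values())                      # subtract for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

# Toy scores for the word after "The cat sat on the" (made-up numbers)
logits = {"mat": 4.0, "sofa": 3.0, "moon": 0.5}
print(temperature_probs(logits, 0.0))  # all probability on "mat"
print(temperature_probs(logits, 2.0))  # flatter: "moon" gets a real chance
```

At temperature 0 the distribution collapses to the single most probable token, like a sorting algorithm that always gives the same result; raising the temperature flattens the distribution so less likely tokens can be sampled.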
1:56 Always taking the most probable path, 1:59 like your sorting algorithm always giving the same result. 2:02 Higher temperature introduces chaos! 2:05 More creative, 2:06 more surprising, 2:07 and more human. 2:08 Just as you might add randomization 2:11 to prevent overfitting in your ML models.

2:14 But... with creativity come 2:16 hallucinations!!!! 2:18 The model's equivalent of a memory leak 2:20 or an uncaught exception. 2:24 Similarly, 2:25 even if the output looks right 2:27 and FEEELLSSS right, 2:28 you need to validate what the AI is giving back to you.

2:31 Traditional code handles different data types 2:34 through interfaces and polymorphism; 2:36 multimodal models do the same, 2:39 but seamlessly! 2:40 Text becomes image, 2:41 image becomes description, 2:43 audio becomes transcription, 2:44 all through a unified interface. 2:47 It is like your ultimate polymorphic function that just works!

2:52 For us, 2:53 this isn't just a technology, 2:54 it's a paradigm shift. 2:55 The fundamentals will still remain: 2:57 input, process, output.

2:59 Now, you might have decided to add a 3:02 pre-trained LLM into your system. 3:05 What do you need to know? 3:06 How do you get started, right? 3:07 Well, the first thing would be to ask yourself 3:09 about the functionality. 3:11 What do you need it for? 3:12 Is it multimodal? 3:14 Does it speak the language that you need? 3:16 If you have global users, this actually matters a lot. 3:19 Or... how about: does it handle the inputs that your application processes?

3:24 Then, you have to think about performance, right! 3:27 You've got to look at it and ask: 3:28 qualitatively, does it meet the expectations that you have of it? 3:34 Quantitatively, how does it do on benchmarks 3:37 like GLUE or LMSYS Chatbot Arena, 3:41 or even the ones that you've established for yourself? 3:43 And then obviously, lastly, the cost! 3:46 Your budget will determine everything!!!

3:49 So now that you have selected a model, 3:51 it's all about optimization, 3:53 and you can think of the next three tiers as levels of complexity.

4:00 Prompt engineering, 4:01 the simplest approach: 4:02 modify how you communicate with the model, like 4:04 writing a config file for a human brain.

4:11 RAG (or Retrieval-Augmented Generation) 4:13 gives your LLM more context 4:15 by connecting it to external knowledge bases. 4:18 We sort of already do this 4:19 with services when we connect them to databases; 4:22 however, we're going to be learning about vectors 4:24 in the future.

4:28 Last but not least: 4:29 fine-tuning! 4:30 It's about retraining your model with additional data, 4:33 techniques, and specialized strategies. 4:35 It can actually be the most complex of them all, 4:38 but the most rewarding, depending on your needs. 4:40 Now, what makes it so exciting 4:42 are the magical incantations. 4:44 It's about the combination of prompts, 4:46 of training methods, 4:48 and parameters 4:49 that can produce very specialized and remarkable results, 4:52 depending on your use cases.

4:54 Managing these components individually 4:56 means writing code for model loading, 4:58 context management, 5:00 prompt templates, 5:00 memory handling, 5:01 and tool integrations. 5:02 That is why LLM frameworks emerged! 5:05 The idea is to give you pre-built tools 5:07 that come from common patterns, 5:09 just like how web frameworks give you authentication and database connections 5:13 out of the box.
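The RAG flow described above can be sketched in a few lines: retrieve the most relevant passages, then place them in the prompt ahead of the question. The keyword-overlap retriever and the example documents below are made up for illustration; a real system would use embedding vectors and a vector database instead.

```python
import re

STOPWORDS = {"what", "is", "the", "a", "an", "of"}

def words(text):
    """Lowercase word set, minus a few stopwords, for crude matching."""
    return {w for w in re.findall(r"[a-z0-9$]+", text.lower())
            if w not in STOPWORDS}

def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    q = words(query)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(query, documents):
    """Stuff the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base
docs = [
    "Our refund policy allows returns within 30 days.",
    "Shipping is free for orders over $50.",
    "Support is available by email around the clock.",
]
print(build_rag_prompt("What is the refund policy?", docs))
```

The model then answers from the supplied context rather than from whatever it memorized during training, which is the whole point of RAG.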
5:14 I would think of frameworks as a way to build AI agents, 5:17 which are essentially orchestrated loops of LLM calls 5:22 that have memory and tool integration. 5:24 And the core of AI agents 5:27 is the same as prompt engineering: 5:29 it's all about the conversation interaction! 5:31 And just remember, 5:33 we already know that the coin might land 5:35 on one side or the other. 5:37 That doesn't change the coin. 5:38 And you know, you already know how to make it dance! 5:42 See you in the next one!
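The "orchestrated loop of LLM calls with memory and tool integration" from the transcript can be sketched like this. `fake_llm` is a scripted stand-in for a real model call, and the calculator is the only registered tool; both are illustrative assumptions, not a real framework's API.

```python
def calculator(expression):
    """A 'tool' the agent can call. Real agents register many of these."""
    # Toy only: never eval untrusted input in real code.
    return str(eval(expression, {"__builtins__": {}}))

def fake_llm(messages):
    """Scripted stand-in for a real LLM call: it 'decides' to use the
    calculator tool once, then answers from the tool's result."""
    last = messages[-1]["content"]
    if last.startswith("TOOL RESULT:"):
        return f"The answer is {last.split(':', 1)[1].strip()}."
    return "TOOL calculator: 6 * 7"

def run_agent(question, max_steps=5):
    memory = [{"role": "user", "content": question}]   # conversation memory
    for _ in range(max_steps):                          # the orchestration loop
        reply = fake_llm(memory)
        memory.append({"role": "assistant", "content": reply})
        if reply.startswith("TOOL calculator:"):
            result = calculator(reply.split(":", 1)[1])
            memory.append({"role": "user", "content": f"TOOL RESULT: {result}"})
        else:
            return reply                                # final answer
    return "Gave up."

print(run_agent("What is 6 * 7?"))  # -> "The answer is 42."
```

Swap `fake_llm` for a real model call and `calculator` for real tools, and this loop is the skeleton most agent frameworks build on.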