Directory
-
WWW.CREATIVEBLOQ.COM
The 5 best VFX movies of 2024
From Dune to Alien Romulus and Wicked, it was a great year for CGI in movies.
-
WWW.WIRED.COM
How NASA Might Change Under Donald Trump
The Trump transition team is looking for big changes at NASA, including some cuts.
-
WWW.NYTIMES.COM
Elon Musk's xAI Raises $6 Billion in New Funding
"A lot of compute is needed," Mr. Musk said in a post about the financing, in which BlackRock, Fidelity and Sequoia participated.
-
APPLEINSIDER.COM
Apple's bad blood with Nvidia continues, after decades of fighting
Apple is ramping up research and development of its own AI chip to reduce its reliance on third-party developers, potentially ending its decades-long unhappy relationship with Nvidia for good.

[Image: Siri icon in a data center]

In November 2020, Apple announced the M1 chip, its first foray into in-house designed processors for its Mac lineup. The move effectively severed ties between Apple and Intel, the latter responsible for previous processors in Apple's computers. Now, it seems Apple is gearing up to reduce its reliance on another third-party developer: Nvidia. Currently, Apple works with Nvidia to power many of the features behind Apple Intelligence.
-
ARCHINECT.COM
University of Miami scientists find 35 buildings sinking along city's coastline
An alarming new study produced by the University of Miami has found 35 buildings to be sinking by as much as 8 centimeters (3.1 inches). They are all located between Miami Beach and its northernmost neighbor, Sunny Isles Beach. The subsidence has gone on since at least 2016 and has affected multiple structures in the vicinity of the former Champlain Towers South in Surfside, where nearly 100 people were tragically killed in a 2021 building collapse. Roughly half of the buildings documented by the survey were constructed after 2014, according to its authors. (h/t NBC 6 South Florida)
-
WWW.THEVERGE.COM
There's a reason Metaphor: ReFantazio's battle music sounds as cool as it does
Metaphor: ReFantazio is one of 2024's best games, racking up a stack of Game Awards including best RPG, best art direction, and best narrative. But one category in which Metaphor particularly stood out was its music. The soundtrack, produced by Shoji Meguro, the long-time music director of the Persona series, is one of the outstanding achievements in video game music this year, particularly its battle theme, which became a viral hit. In an interview with The Verge, Meguro talked about his work on the Metaphor soundtrack, including what went into what is perhaps the coolest piece of video game battle music ever made.

Meguro, known for his work producing the pop-y, jazzy vibes of the Persona soundtracks, acknowledged that Metaphor's heavy orchestral / choral sound is not something Persona fans would expect from him, and definitely outside his own wheelhouse. He said that in order to effectively change gears from Persona to Metaphor, he had to relearn classical music theory.

But that's what makes creating this score so exciting, he said. When I was first told about Metaphor: ReFantazio, I was told it would be an epic, high-fantasy RPG. And immediately I heard the sound of great orchestras playing and thought this might be an opportunity to write songs I've never really written before, which excited me greatly.

In developing the music for Metaphor, Meguro said that he wanted to evoke a classical, fantasy experience but feature a unique twist that he said fans have come to expect from Atlus games. That twist became what Meguro called a spiritual musical style that defines the soundtrack, particularly the battle music.

As it was nominated for Game of the Year, Metaphor's music made an appearance during this year's Game Awards.

If you've spent any amount of time on gaming social media this year, you've probably seen tons of posts talking about Metaphor's battle music. For a battle theme it goes extremely hard, with one version starting off with an orchestra-backed choir singing with the kind of gusto you'd expect for a meeting with Sephiroth, not something that plays during every minor encounter in the game. Then, somehow, the song goes even harder with the addition of a Japanese monk chanting in a rapid-fire cadence that could go toe-to-toe with Eminem. To further elevate the songs, the chants were written in an original language inspired by Esperanto, a language that was invented in 1887 and designed to be used as an internationally universal secondary language.

But finding the right voice for the job wasn't easy. I was looking for a specific type of voice that could sustain a fast rhythm while reading Esperanto-inspired scripture, Meguro said. His search led him to YouTube, where, scrolling through performances, he found a monk named Keisuke Honryo performing in Nam Jazz Experiment, a musical group that combines jazz with the recitation of traditional Buddhist sutras. It was so great, I immediately made [Honryo] an offer and luckily he accepted and was happy to be a part of this game.

[Image: Your fairy sidekick in Metaphor is also your DJ throughout the game. Image: Atlus]

But there's a reason why Metaphor's battle music is so arresting, and it's not just because of the musical stylings of a Japanese monk chanting in an invented language inspired by another invented language. Meguro had to reframe his thinking in developing the soundtrack, leading to the creation of something truly unique that changes how players perceive the game.

I've always considered game scores to be similar to UI elements, constructs that exist solely to service the player, Meguro said. Although the score has to capture the atmosphere of the story for the user, it's worth reminding ourselves that this music is not actually playing directly within the world the characters are in.

Meguro explained that in conversations with the game's director, Katsura Hashino, the two discussed ways to connect what players are hearing to what the characters are hearing as well. He said the thought experiment allowed them to approach the music composition through a different lens. The idea wound up implemented in the game itself: in Metaphor's opening hours, the player's sidekick casts a spell that allows them to hear music as they roam about the world and, inevitably, get into fights.

That moment dramatically changes the context of all of Metaphor's music, especially its battle themes. Taking those songs from fun bits of ambience solely for players and turning them into something the characters experience too explains why the songs go as hard as they do. Every fight for us is one more event on the way to the credits; for the characters it's life or death, and it makes sense that the music they hear as they fight for their lives reflects that gravity.

Meguro used Metaphor's music to bring players further into the game, and he's delighted by how well his work has been received. The two battle songs, called Warriors in Arms and Warriors in Valor, instantly resonated with players, inspiring memes and even animated shorts.

That brings me so much joy, that fans are responding enthusiastically to the music of Metaphor, Meguro said. It's an honor to get that kind of reaction.
-
WWW.THEVERGE.COM
Kobo's Elipsa 2E, an excellent e-reader for taking notes, is down to its best price
Amazon might have released a new Kindle Scribe earlier this month for $399.99, but after testing it, I still don't think it can compete with its rivals. If you're looking for a good e-reader with more useful note-taking capabilities, the Kobo Elipsa 2E is still one of my favorites, and it's down to an all-time low of $349.99 ($50 off) at Amazon or Target. Rakuten Kobo will also throw in $10 in credit when you buy a $50 gift card, which you can use toward buying books or styli.

If you're the type of person who likes to scribble in margins while reading ebooks, you'll likely prefer the Kobo Elipsa 2E. The Elipsa 2E lets you write directly on ebook pages, taking notes in margins or anywhere else you'd like, just as you would on paper. The new Kindle Scribe lets you write directly on pages, too, but it's a lot more complicated and you can't even circle phrases or words.

The Elipsa 2E also offers other helpful features beyond a more natural note-taking experience. It boasts double the storage (32GB) of the entry-level Kindle Scribe, for example, and accurately converts handwriting into typed text faster than the Scribe. The biggest drawback, of course, is that it doesn't natively support Kindle ebooks, so you'll have to convert your library if you want to read those books on your Kobo device.

Just a few more deals

Sony's WH-CH720N noise-canceling headphones are currently down to just $74.99 ($75 off) at Amazon, Best Buy, and Target, which is their all-time low price. Their noise cancellation isn't as effective as rivals like Sony's WH-1000XM5, but for the price, they do a decent job of tuning out noise. They also deliver good sound, support multipoint Bluetooth connectivity, and offer up to 35 hours of continuous playback, making them a great option if you're on a budget.

If you're in need of USB-C chargers, you can buy the Anker PowerPort III and the eco-friendly version of the 30-watt Anker 511 Charger bundled together for $24.99 ($29.99 off) from Best Buy. Both are USB-C chargers that'll quickly power up your phone, laptop, tablet, and other electronics, but the 30-watt Anker 511 Charger is smaller and thus more portable, with a single port. In contrast, Anker's PowerPort III is a 65-watt wall charger that lets you quickly juice up to three devices at once.

If you're looking out for porch pirates, the Blink Outdoor 4 camera is on sale for $49.99 ($50 off), one of its better prices to date, at Amazon with a Sync Module 2 included. Blink's 1080p security camera offers better image quality than its predecessor, two-zone package detection, and an impressive two years of battery life. You don't need a premium subscription to record motion events either, thanks to the included Sync Module 2, which lets you record motion-activated video locally (with a USB stick) for free.
-
TOWARDSAI.NET
LLM Fine-Tuning Guide: Do You Need It and How to Do It
Author(s): Igor Novikov. Originally published on Towards AI.

Working with LLMs, one of the most popular questions we get is about fine-tuning. Every second client asks if they should do additional training on their model.

In most cases the answer is no, they don't need it. Modern LLMs are good enough without fine-tuning for many commercial applications, like a bot that helps clients order flowers from a flower shop. Besides, they don't have the data to do it, and no, the 20 samples of dialogues they have do not count (and neither do 200).

Training and fine-tuning models is an expensive ordeal, and you really should avoid it if you can and spend the money saved on a trip to Aruba, or whatever vacation place you fancy.

[Image by the author]

But there are cases when you do need it. For example, if you want the LLM to follow a very specific chat format, or to have knowledge in a very specific domain, or if you want to cut costs by training a small model to do a very specialized task instead of using a large LLM with hundreds of billions of parameters. These are all valid cases for creating a tailored model through fine-tuning. So let's look at the ways to do just that.

When to fine-tune

As said above, you should only fine-tune if you have to. Try to solve the task with prompt engineering first, or build a RAG system. If that fails, consider fine-tuning.

Fine-tuning has the following disadvantages:
- It costs money and takes time.
- You will need good training data or it will not work.
- It can lead to more frequent hallucinations even if done properly, since we are adding new behavior to a model that was not initially tailored for it. If you make recurrent updates to the model, at some point this is almost guaranteed; it is called drift, so you will have to evaluate your model for it.

Once you consider all of the above and still think a general LLM is not good enough, you need to fine-tune.

Data

To fine-tune, you will need data in a specific format, called an instruction dataset.

Where to get data

There are a lot of open datasets that you can use, for example the Anthropic HH-RLHF dataset for model alignment, MIMIC-III for healthcare, and CodeSearchNet for coding. There are:
- Domain-specific datasets: medicine, law, coding, and so on
- Task-specific datasets, useful to train the model to do one specific task and make RPAs
- General-purpose datasets with generic knowledge, usually created from data crawled from the internet
- Alignment datasets, used for format, style, and safety alignment

The Hugging Face Hub has lots of instruction datasets for different domains; I suggest starting there.

But since you decided to fine-tune, you likely have your own data, so you will need to create your own dataset. Otherwise, why would you do it? If you don't have enough samples, you can generate synthetic data using a large LLM like ChatGPT by extrapolating from the data you have. I'll talk about it later.

Data requirements

The dataset size depends on model size, task complexity, and training method. Companies like OpenAI use humongous datasets with millions of items, which is not feasible for most companies due to cost, so realistically we are going to have several thousand samples.

For simple changes like communication style alignment you don't need a lot of samples; several hundred will do. For domain-specific knowledge training you will need several thousand to hundreds of thousands, depending on the domain.
In general, more is better, and it is good to have at least several thousand samples.

Quality of data matters no less, and probably even more, than quantity. You need to make sure the data correctly reflects the behaviors you want the model to learn, in both meaning AND format. I want to stress the format: you want the model to output information in a way your users can understand, in terms of clarity and style. There is no use in a model that tells the truth in rap verses, unless you want to create an Eminem twin.

Data preparation

Data preparation is a critical step, as the quality of your data directly impacts the performance and accuracy of your model. Preparing your data involves several processes to ensure it is clean, relevant, and suitable for training.

1. Deduplication

Duplicated data points can inflate training costs, introduce unnecessary noise, and lead to overfitting or biases in your model. Here are common approaches:

Text normalization:
- Convert text to lowercase.
- Remove special characters, extra spaces, and punctuation to standardize the content.

Hash-based deduplication:
- Generate a hash of the normalized text. A commonly used technique is MinHash, which captures the essence or semantic fingerprint of an item rather than its exact text. This allows you to identify duplicates even if their format or small details differ. You can use libraries like datasketch to do that.
- Compare hashes and remove matching entries.

Vector-based deduplication:
- Convert items into vector representations (embeddings) to measure their semantic similarity.
- Use a vector database like Qdrant, Pinecone, or Weaviate to efficiently find similar items.
- Apply a cross-encoder on top of retrieved items to compute their similarity scores more accurately. This step helps you confidently identify and eliminate near-duplicates.
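To make the hash-based approach concrete, here is a minimal sketch of near-duplicate filtering with MinHash and LSH, assuming the datasketch library; the 0.9 similarity threshold and the word-level shingling are illustrative choices on my part, not recommendations from the article.

```python
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128) -> MinHash:
    # Normalize, then hash word-level shingles into a MinHash signature.
    m = MinHash(num_perm=num_perm)
    for token in text.lower().split():
        m.update(token.encode("utf8"))
    return m

def deduplicate(samples: list[str], threshold: float = 0.9) -> list[str]:
    # Keep a sample only if no previously kept sample is ~90% similar to it.
    lsh = MinHashLSH(threshold=threshold, num_perm=128)
    kept = []
    for i, text in enumerate(samples):
        sig = minhash(text)
        if lsh.query(sig):          # near-duplicate of something we already kept
            continue
        lsh.insert(f"doc-{i}", sig)
        kept.append(text)
    return kept

print(deduplicate([
    "The water cycle involves evaporation, condensation, and precipitation.",
    "The water cycle involves evaporation, condensation and precipitation!",
    "LoRA freezes the base weights and trains a small low-rank update.",
]))
```

For a large corpus you would typically shingle at the character n-gram level and tune the threshold and number of permutations, but the structure stays the same.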
2. Personal information removal

You need to de-identify the data because you don't want the model to learn (and then tell everybody) the personal information of people (unless that's what you want). This can have serious legal and ethical implications, especially with regulations like GDPR. Besides, personal data is usually not relevant to the domain knowledge.

De-identification:
- Use regex patterns for detecting common formats (e.g., emails or phone numbers).
- Leverage pre-trained NLP models designed for named entity recognition (NER) to identify and redact personal data.

Domain-specific filtering:
- You may create your own filters based on the context of your data. For example, medical data may require removing health-related identifiers as defined by HIPAA.

3. Decontamination

Your dataset might contain content that can negatively affect model behavior:

Malicious content:
- Detect and filter out embedded commands targeting large language models (e.g., prompt injections), scripts, XSS, SQL injection code, etc.
- Automated scanning tools or specialized LLM-based classifiers can assist in identifying such patterns.

Inappropriate language:
- Filter curse words, slurs, offensive content, and slang.

4. Rule-based filtering

Not all data in your dataset will be relevant to your domain or task. Rule-based filtering helps eliminate irrelevant or harmful content:
- Define exclusion criteria based on the task. For instance, if you are training a financial model, exclude non-financial data.
- Use keyword searches, phrases, or topic modeling to identify irrelevant content.

I suggest using a hybrid approach:
- Use simple tools first: regex or keyword-based search for patterns, like identifying email addresses or phone numbers.
- On the remaining items, use an LLM as a judge to evaluate the relevance or quality of data. For example, ask an LLM to label whether an item is appropriate for the training task.
- Use specialized ML models for complex cleaning tasks, such as detecting and filtering out toxic language. There are a bunch of pre-trained models on Hugging Face for that.

Data evaluation

After all these steps I suggest having a separate pipeline to check data quality. This can be done by humans, and if you only have several hundred samples, you can do that. But if you have thousands, that is unlikely. So, again, you can use the LLM-as-a-judge approach or a simpler classifier model for automated assessment (a minimal sketch of such a judge pipeline follows after the list of caveats below). See, for example, HuggingFaceFW/fineweb-edu-classifier.

For an LLM you can use a prompt like:

You are a data quality evaluator. Your goal is to assess the quality of an instruction and its corresponding answer. Determine how effectively the answer addresses the given task in a clear, accurate, and complete manner.

Evaluation criteria:
- Relevance: Does the answer directly address the instruction?
- Clarity: Is the answer clear and easy to understand?
- Completeness: Does the answer provide all the necessary information to fulfill the instruction?
- Accuracy: Is the information in the answer factually correct?

Instructions:
1. Carefully read the provided instruction and answer.
2. Provide a score (1-5) for each of the evaluation criteria above. 1 = very poor, 5 = excellent.
3. Justify your score with specific examples or observations for each criterion.

Example for evaluation:
Instruction: Explain the water cycle.
Answer: The water cycle involves evaporation, condensation, and precipitation, moving water between the Earth's surface and atmosphere.

Your evaluation:
<Relevance>: 5 - The answer directly explains the water cycle.
<Clarity>: 4 - The answer is clear but could elaborate on each step.
<Completeness>: 3 - Missing details on processes like runoff or groundwater flow.
<Accuracy>: 5 - The provided information is correct.

Now, evaluate the following instruction-answer pair:
Instruction: [Insert instruction here]
Answer: [Insert answer here]

What the acceptable threshold is here is up to you; generally I would start at 80-90%.

Also be aware of which LLM you use for this, and of the fact that LLMs have certain biases (almost like humans):
- They prefer verbose, long, argumentative answers over concise ones, even if the shorter answer is more correct.
- Items that appear first in a list are often preferred by the model over the others. This is also known as baby duck syndrome. That's important if you are creating preference datasets (more on that later).
- Model bias: LLMs from the same family are likely to prefer data generated by a model of the same family. That's important if you are going to generate synthetic data for training.
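Here is the promised sketch of an automated judge pass, assuming the OpenAI Python client; the judge model name, the 4.0 average-score cut-off, and the regex used to pull scores out of the response are illustrative assumptions of mine, not part of the original guide.

```python
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are a data quality evaluator. Score the following
instruction-answer pair from 1 (very poor) to 5 (excellent) on Relevance,
Clarity, Completeness and Accuracy, using the format <Criterion>: <score>.

Instruction: {instruction}
Answer: {answer}"""

def keep_sample(instruction: str, answer: str, threshold: float = 4.0) -> bool:
    # Ask the judge model for per-criterion scores and keep the sample
    # only if the average score clears the threshold.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of judge model
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(instruction=instruction,
                                                  answer=answer)}],
        temperature=0,
    )
    text = response.choices[0].message.content
    scores = [int(s) for s in re.findall(r">:\s*(\d)", text)]
    return bool(scores) and sum(scores) / len(scores) >= threshold

print(keep_sample("Explain the water cycle.",
                  "Evaporation, condensation and precipitation move water around."))
```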
Dataset formats

There are several popular formats; they are all fairly small and use JSON, so you can use any of them.

OpenAI format

OpenAI's fine-tuning process uses a JSONL (JSON Lines) format, where each line represents a distinct training example:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Can you explain the concept of photosynthesis?"},
    {"role": "assistant", "content": "Photosynthesis is the process by which green plants convert sunlight into chemical energy."}
  ]
}

Alpaca dataset format

Developed by Stanford's Center for Research on Foundation Models. Each entry in this dataset is structured as follows:

{
  "instruction": "Describe the structure of an atom.",
  "input": "",
  "output": "An atom consists of a nucleus containing protons and neutrons, with electrons orbiting this nucleus."
}

ShareGPT

The ShareGPT dataset format is designed to capture multi-turn conversations between users and AI assistants, accommodating various roles such as human, gpt, observation, and function. This structure enables the representation of complex dialogues, including tool interactions and function calls. Each conversation is represented as a JSON object with the following components:

{
  "conversations": [
    {"from": "human", "value": "What is the capital of France?"},
    {"from": "gpt", "value": "The capital of France is Paris."},
    {"from": "human", "value": "Show me a map of Paris."},
    {"from": "function_call", "value": "map_search('Paris')"},
    {"from": "observation", "value": "<image of Paris map>"},
    {"from": "gpt", "value": "Here is a map of Paris."}
  ],
  "system": "You are a helpful assistant.",
  "tools": "map_search"
}

There are also OASST and others; you get the idea.
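Since you will often receive data in one of these formats and need another, here is a small sketch that converts Alpaca-style records into OpenAI-style JSONL chat messages; the file names and the default system prompt are placeholders I chose for illustration.

```python
import json

SYSTEM_PROMPT = "You are a helpful assistant."  # placeholder system message

def alpaca_to_openai(record: dict) -> dict:
    # Fold the optional "input" field into the user turn, as is commonly done.
    user_content = record["instruction"]
    if record.get("input"):
        user_content += "\n\n" + record["input"]
    return {"messages": [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": record["output"]},
    ]}

with open("alpaca_data.json") as src, open("train.jsonl", "w") as dst:
    for record in json.load(src):  # assumes the source file is a JSON array
        dst.write(json.dumps(alpaca_to_openai(record), ensure_ascii=False) + "\n")
```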
Fine-tuning techniques

Now that you have your training data, let's look at what we can do with it. The main techniques are:
- Full re-training
- LoRA
- QLoRA
- Direct Preference Optimization (DPO)

Full re-training

This is the process of training an entire model (all layers) on a specific dataset to optimize it for a particular task or domain. It is the most effective approach in theory, but it requires significant computing power, as it involves backpropagation through the entire model.

Since we are messing with model weights directly, it comes with certain risks:
- Risk of overfitting: since all weights are updated, there is a higher risk of overfitting to the fine-tuning dataset, especially if the dataset is small.
- Loss of generality: fine-tuned models may lose their general-purpose capabilities and previous knowledge.

So how much memory do we need for a full re-train? For training we need to hold at least the following: model parameters + gradients + activations + optimizer states.

1. Model parameters and gradients:
- 7B model: approximately 7 billion parameters
- 12B model: approximately 12 billion parameters
Each parameter typically requires 4 bytes (FP32 precision) or 2 bytes (FP16 precision). Let's assume 2 bytes, so:
- 7B model: 7x10^9 x 2 bytes = 14 GB
- 12B model: 12x10^9 x 2 bytes = 24 GB
(In FP32 the 12B model's weights alone would already take 12x10^9 x 4 bytes = 48 GB.)
Gradients add another 2 bytes per parameter:
- 7B model: another 14 GB
- 12B model: another 24 GB

2. Activations:
Larger batch sizes and longer sequence lengths increase memory requirements. For a typical batch size of 8-32 and a sequence length of 512 tokens, activation memory might add:
- 7B model: 10-20 GB
- 12B model: 15-30 GB

3. Optimizer states:
Optimizers like Adam keep additional state for every parameter (e.g., first and second moment estimates), which adds roughly three times the parameter memory:
- 7B model: 14 GB x 3 = 42 GB
- 12B model: 24 GB x 3 = 72 GB

There are going to be some additional things that consume memory, so we are looking at a minimum of 14 + 14 + 10 + 42 = 80 GB for a 7B model.

That is a lot of memory for a small model; you can imagine how much you would need for anything big. So full re-training is not practical and is rarely used. What are the alternatives?

LoRA

[Image by the author]

Suppose you want to change the model's behavior, but don't want to change the whole model. Changing model behavior means changing its weights so that it changes its outputs. Here's the trick: if only we could somehow modify model outputs without changing the weights...

And there is a way, of course. In a brute-force solution, we could technically feed the model outputs into another model to transform them. It would work, only now we have two models and a lot of added complexity.

But what if we could add a filter on top of the model that keeps the original model layers intact and changes their outputs? It's kind of like putting on AR glasses: you see the world differently, but the world hasn't changed.

That's basically what LoRA is. We freeze the original model weights and apply a transformation by adding an additional weight matrix, called the LoRA matrix, so it forms an additional trainable layer of a much smaller size:

W_new = W_pretrained + dW

where:
- W_new: the new weights
- W_pretrained: the original model weights
- dW: the trainable weight adjustment

How do we calculate this LoRA matrix? We do the fine-tuning/training on that additional matrix instead of the original model, using standard methods, so it learns how to predict the difference between the desired results and the original model's results.

And the beauty is that the LoRA matrix can be way smaller than the original weight matrix. That's why it is called Low-Rank Adaptation: the matrix has a lower rank than the original. Say you have a square weight matrix of size d: it has d*d elements. If d is one million, that is one trillion elements. LoRA's matrices instead have d*r + r*d elements. If d is one million and the rank r is 8, that is only 16 million elements.

Here is how it works:

y = x * (W + dW) = x * W + x * (A * B)

where:
- y: the output after applying the weights
- x: the input to the layer
- dW = A * B
- A: a matrix of shape d*r, where r is the rank (the small dimensionality chosen for LoRA fine-tuning) and d is the same dimensionality as the original weight matrix
- B: a matrix of shape r*d

A common starting point for the rank is 8. Values up to 256 have been used with good results in certain cases, but you will need to experiment to see what works for you. Using larger ranks can improve performance on some tasks, particularly those requiring more expressive power to capture complex patterns. However, this also increases the risk of overfitting, especially on smaller datasets. This risk is well known in machine learning: it appears when model capacity exceeds the complexity of the data.

During training, we need to keep in memory the weights W of the original model and the dW of the fine-tuned model, while computing gradients only for the new small matrices A and B. That provides a significant reduction in required memory and computing power.
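To make the y = x * W + x * (A * B) formulation concrete, here is a minimal PyTorch sketch of a LoRA-wrapped linear layer; the class name, the initialization scheme, and the alpha/r scaling are my own illustrative choices (in practice you would more likely use a library such as Hugging Face's peft rather than roll your own).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update A @ B."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                     # freeze W (and bias)
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)  # shape d x r
        self.B = nn.Parameter(torch.zeros(r, base.out_features))        # shape r x d
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = frozen base output + scaled low-rank correction x @ A @ B
        return self.base(x) + (x @ self.A @ self.B) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 4096 * 8 = 65,536 trainable params vs ~16.8M frozen ones
```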
Training will be much faster, and 7B models can easily be fine-tuned on a PC with a desktop GPU. More than that, we can have several different "lenses" like this that we can put on the base model, without the need to change it.

LoRA fine-tuning often achieves performance comparable to full fine-tuning, particularly when the low-rank approximation is well-suited to the task, and LoRA adapters can be tested or applied without risking degradation of the base model.

QLoRA

Same as LoRA, but to lower the memory footprint we quantize the base model to a custom data type, typically NF4 (Normal Float 4-bit). Regular models use 32-bit or 16-bit floating point as the base data type for storing weights. NF4 enables QLoRA to retain most of the accuracy of the base model while significantly reducing memory usage and computational demands.

The idea behind the quantization is that:
- Most weights in the network are close to 0 anyway.
- NF4 optimizes the distribution of values based on the actual data statistics rather than using a linear distribution of floating-point values.

For the LoRA pass we still use regular 32-bit or 16-bit floating point, to have more range for learning.

Using QLoRA can reduce GPU memory usage by 40-70%. However, it comes at a cost: QLoRA is approximately 30% slower than LoRA in training and slightly degrades the quantized model's quality. It works well even with very large models (e.g., LLaMA or GPT-based architectures).
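For reference, here is a hedged sketch of what a QLoRA setup typically looks like with the Hugging Face stack (transformers, peft, bitsandbytes); the model ID and the LoRA hyperparameters are placeholder choices, and argument names can shift between library versions, so treat this as a starting point rather than a recipe from the article.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# Quantize the frozen base model to NF4; keep compute in bf16 for the LoRA pass.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters to the attention projections.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```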
Fine-tuning with (human) preference alignment

Fine-tuning works well for training a model to do specific tasks, but it matters not only what the model does but also how it interacts with humans. If we want to create a language model assistant, we cannot use a pre-trained model as is: it will not be able to intelligently answer user queries, even though it has the required knowledge.

Teaching the model to communicate with humans is called alignment. There are different ways to define what that means; I'll use Anthropic's definition of 3H:
- Helpful: the response should address the user's problem.
- Harmless: the response should not cause harm to the user.
- Honest: the response should be factually accurate.

Traditional methods do not help much here, so a new set of techniques was developed. The idea behind any such technique is to have a dataset similar to what we discussed above, where human preferences or values are additionally and clearly indicated. This could include feedback on text quality, tone, style, or factual correctness. Usually, the dataset items have more than one response option, each ranked by preference.

I bet you have seen ChatGPT giving you multiple options to pick from when generating answers: they are doing that to collect a similar dataset. Question-answer websites often have likes or upvote/downvote systems that can also be used as training data. If you crawl data from the internet, it is important to clean it afterwards; the dataset can contain lots of junk.

For example:

User: I'm feeling overwhelmed with work and life right now. It's hard to keep going.
Response options:
- Option A: I'm sorry you're feeling this way. Have you thought about talking to someone you trust or a professional counselor?
- Option B: What kind of man are you, complaining like that? Just drink some vodka, you'll be fine.

Human-provided preference:
- Preferred response: Option A (ranked highest for empathy and clarity).
- Ranking: Option A > Option B.
- Rationale: Option A shows empathy, acknowledges the user's feelings, and provides actionable advice. Option B dismisses the user's feelings and offers no constructive help.

Or in JSON format:

{
  "context": "I'm feeling overwhelmed with work and life right now. It's hard to keep going.",
  "responses": [
    {
      "text": "I'm sorry you're feeling this way. Have you thought about talking to someone you trust or a professional counselor? It might help to share your feelings.",
      "rank": 1
    },
    {
      "text": "What kind of man are you, complaining like that? Just drink some vodka - you'll be fine.",
      "rank": 2
    }
  ]
}

Once you have that data, you can use the techniques below.

Reinforcement Learning with Human Feedback (RLHF)

This is a cornerstone of preference alignment. The idea is very similar to training dogs: you reward the dog for doing the right things and punish it for doing the wrong ones, over many iterations. You play the reward-model role in this case, and the dog plays the base-model role.

So there is a separate reward model that is trained to predict human preferences using pairwise comparisons (e.g., "Response A is better than Response B"). Basically, we train a reward model that predicts rankings for responses. This is done so we don't have to keep using humans: once we have a reward model, it serves as a proxy for human feedback in further training.

The main model is then further fine-tuned using reinforcement learning, where the reward signal comes from the trained reward model, usually over multiple iterations. The base model does not acquire new knowledge in this process but instead learns to use and communicate the knowledge it already has. Studies have shown that using a small, high-quality dataset is much better than using large datasets of bad quality (see the LIMA study: Less Is More for Alignment).

This approach allows for complex reward signals from the reward model that include correctness, relevance, safety, and all sorts of political censorship bullshit too. It also allows us to use one reward model to train multiple base models for preference alignment.

The downsides are obvious as well. Now we have to train two models instead of one, and then do multiple iterations of fine-tuning of the base model. That is computationally expensive, complex, and takes time. Additionally, there is a risk of overfitting your reward model and degrading base model performance.

So to avoid these complications another approach was proposed:

Direct Preference Optimization (DPO)

This is probably the closest you can get to having your cake and eating it too. It was introduced in the paper "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", authored by Rafael Rafailov and others. They had a genius idea: what if we skip the intermediate reward model and directly align the model with human preferences using standard supervised learning?

So the difference here is that we don't have a separate reward model and don't use reinforcement learning, but instead update the base model directly with standard supervised learning methods. Supervised learning typically uses gradient-based optimization (e.g., stochastic gradient descent) to adjust the base model weights directly based on the labeled data.
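DPO implementations (for example TRL's DPOTrainer) typically expect each training item as a prompt with a chosen and a rejected response, so a common preprocessing step is to turn ranked data like the JSON above into such pairs. Below is a minimal sketch of that conversion; the file name and the assumption that rank 1 is best are illustrative.

```python
import json
from itertools import combinations

def ranked_to_dpo_pairs(item: dict) -> list[dict]:
    # Turn one ranked example into (prompt, chosen, rejected) pairs:
    # every higher-ranked response is "chosen" against every lower-ranked one.
    ordered = sorted(item["responses"], key=lambda r: r["rank"])  # rank 1 = best
    return [
        {"prompt": item["context"], "chosen": better["text"], "rejected": worse["text"]}
        for better, worse in combinations(ordered, 2)
    ]

with open("preferences.json") as f:          # a list of items in the format above
    dataset = [pair for item in json.load(f) for pair in ranked_to_dpo_pairs(item)]

print(dataset[0]["chosen"][:60], "| vs |", dataset[0]["rejected"][:60])
```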
DPO is much better than RLHF in terms of time and cost, as it doesn't require many iterations or a separate model, and in many cases it provides similar performance and alignment of the base model, albeit under certain conditions. This approach requires granular data of good quality; it is more sensitive to data quality than RLHF. Preference data in the dataset has to be sufficient and straightforward. If you have a dataset like that, or are able to create one, DPO is probably the best way to go.

What to use for fine-tuning experiments and hosting

You can, of course, self-host and train/deploy locally if you have the hardware to do that. Your setup will depend on what kind of hardware, model, and virtualization you are using, so I won't go into that.

Orchestration

In general, I suggest deploying models using an orchestrator like ZenML, so you can switch infrastructure providers as you want and avoid vendor lock-in. You can then start with the free tier of one provider to build a prototype and switch to a scalable cloud version or on-prem if you need to.

For experiments, I suggest sticking with the free tiers of cloud platforms, specifically:

Fine-tuning infrastructure
- AWS SageMaker: a fully managed service for building, training, and deploying machine learning models on AWS. Very convenient, so you don't have to build your own infrastructure and buy GPUs. They have a free tier to start experimenting.
- Alternatives: Google Vertex AI, Azure Machine Learning, Databricks ML, MLflow (this one is open source and can be self-hosted).

Model hosting
- For experiments and collaboration, the best option is Hugging Face, a collaborative platform for sharing and discovering machine learning models, datasets, and demos. It's like GitHub for models. They also have a free tier.
- Alternatives: I don't think there is a good one; that's why Hugging Face is so popular. All major players (Google, Azure AI Playground) have something similar but not as good.
- For production, you can use AWS SageMaker, Google Vertex AI, Microsoft Azure Machine Learning, or MLflow (which can be deployed on-prem).

Have fun!

Published via Towards AI
-
WWW.IGN.COM
New Report Says $250 Million Arcane Was a 'Financial Miss,' Riot Co-Founder Insists It 'Crushed for Players and So It Crushed for Us'
The co-founder of Riot Games has responded to a report that claimed the League of Legends animated series Arcane was a financial miss.

Bloomberg reported that Arcane's two seasons cost an eye-watering $250 million to produce and market, and ultimately failed to generate enough gaming revenue for Riot despite winning a big audience on Netflix. The publication said Netflix paid $3 million per episode, with Riot owner Tencent handing over an additional $3 million per episode to show Arcane in China. All told, that's less than half the $250 million it cost Riot to bring Arcane to market. And, according to Bloomberg, Tencent started asking Riot difficult questions between the release of Season 1 and Season 2.

The hope, Bloomberg reported, was that Arcane would fuel an increase in players of League of Legends and in turn a boost in spending. Riot makes significant revenue from the sale of skins for League of Legends characters, some of which cost hundreds of dollars. Bloomberg said that Riot failed to capitalize on the success of Season 1 with Arcane-themed items, but had more time to do so ahead of the release of Season 2.

In a quote attributed to a spokesman, Riot insisted that while Arcane wasn't profitable, the show should be considered a success overall, with the last month one of the company's highest-grossing revenue periods ever. Apparently the second season is on track to at least break even financially.

Now, Riot co-founder Marc Merrill has responded to the report, taking to Reddit to address discussion about it within the League of Legends community.

People who look at the world through a short term, transactional, cynical lens really struggle to understand Riot, Merrill said. This has been true with various people trying to claim that high quality free games won't work, that esports will never work, that our music was insane, and who are now saying that Arcane wasn't awesome and worth it.

These people think we make things like Arcane to sell skins, when in reality we sell skins to make things like Arcane. Riot is a mission driven company where Rioters are constantly striving to make it better to be a player. That is why we have successfully done that over and over again across multiple games and now multiple businesses / mediums - games, sports, music & animation. Do we get everything right? Nope. But we are not focused on the short term extraction of profits - we are focused on delivering exceptional value to our audience over the long term, again and again and again. To be clear, Arcane crushed for players and so it crushed for us.

Merrill, clearly, is insisting that for Riot the costly Arcane was worth it, although it's worth noting that he does not dispute any specific part of Bloomberg's reporting. Merrill subsequently responded to one Reddit user who suggested Arcane wasn't profitable enough for Riot to make more League of Legends animated spin-offs, saying: "Except it was."

Fans are hoping that Riot pushes forward with more League of Legends animated series despite all this. Last month, Riot creative director and Arcane creator and showrunner Christian Linke revealed the three Runeterra regions it's exploring as settings for future shows: Noxus, Ionia, and Demacia.

Wesley is the UK News Editor for IGN. Find him on Twitter at @wyp100. You can reach Wesley at wesley_yinpoole@ign.com or confidentially at wyp100@proton.me.