• 'This is comedy': Balatro developer Localthunk baffled after PEGI hands title 18+ rating
    www.gamedeveloper.com
Chris Kerr, News Editor · December 17, 2024 · 3 Min Read · Screenshot via Localthunk / 18+ sticker via PEGI

At a Glance: Localthunk said they're more disgruntled at what they perceive as inconsistency on PEGI's part than the decision itself.

European rating agency PEGI has handed Balatro an 18+ rating because it could teach players how to dabble in real-world poker. The decision looks to have surprised developer Localthunk, who questioned why Balatro has been deemed "adults only" when other titles that feature in-game spending and randomized item packs are considered suitable for children.

"Since PEGI gave us an 18+ rating for having evil playing cards maybe I should add microtransactions/loot boxes/real gambling to lower that rating to 3+ like EA sports FC," reads an X post from the developer. "This is comedy."

In a follow-up post, Localthunk said they're more disgruntled at what they perceive as inconsistency on PEGI's part than the decision to give Balatro an 18+ rating. "Just to clear it up: I'm way more irked at the 3+ for these games with actual gambling mechanics for children than I am about Balatro having an 18+ rating," they added. "If these other games were rated properly I'd happily accept the weirdo 18+. The red logo looks kinda dope."

The PEGI ratings explainer for Balatro states the 2D deck-builder is being restricted because it "features prominent gambling imagery" and essentially teaches players how to navigate a poker game. "As the game goes on, the player becomes increasingly familiar with which hands would earn more points. Because these are hands that exist in the real world, this knowledge and skill could be transferred to a real-life game of poker," adds PEGI.

Localthunk doesn't want Balatro to become a 'true gambling game'

By contrast, PEGI has handed EA Sports FC 25 a 3+ rating that indicates the title is "suitable for all ages." That's despite the ratings agency acknowledging the soccer sim "offers players the opportunity to purchase in-game items, in the form of an in-game currency, which can be used to purchase random card packs and other game items." "Some parents or carers may want to be aware of this," it adds.

It's worth noting that Balatro doesn't let players place bets in-game. Instead, they must accrue points by collecting offbeat joker cards that imbue regular playing cards with new abilities, dish out score multipliers, and generally turn the concept of poker on its head. It's possible to obtain new jokers and other special cards by opening randomized booster packs, but those packs can only be purchased using in-game currency obtained through play. There are no microtransactions in Balatro.

In August, Localthunk said they "hate the thought" of Balatro becoming a "true gambling game" and have created a will that stipulates the IP may never be sold or licensed to any gambling companies or casinos.

The ESRB, which handles video game ratings in Canada, the US, and Mexico, gave Balatro an 'Everyone 10+' rating and noted it contains "gambling themes" but "no interactive elements." "The game has a poker theme, which includes the names of hands, scoring system, and types of playing cards, but does not include making wagers," it added.

Game Developer has reached out to PEGI for more information.
  • Fab December 2024 Asset Giveaway #2
    gamefromscratch.com
It is the third Tuesday of the month, and that means it's time for another Unreal Engine Fab marketplace giveaway. You can get three free game development assets; this week's assets are entirely for Unreal Engine, but the guides below explain how to export from Unreal Engine to other engines such as Godot or Unity.

This month's free assets include:
- Newtonian Falling and Momentum Damage System
- Modular Japanese Architecture Pack
- OLD OFFICE (MODULAR)

You can see all of these assets in action in the video below. Also, a quick reminder to grab all of the Quixel Megascans assets if you haven't already, as that offer expires at the end of 2024! If you are interested in getting these assets into other game engines, check out our various guides available here:
  • Meshingun 3D Environment Unreal Engine & Unity Bundle Returns
    gamefromscratch.com
As part of the ongoing Humble Bundle holiday rerun, the Deluxe Dev Dreams 3D Unreal Engine & Unity Humble Bundle is back for 2 days only. The bundle is composed mostly of 3D environments for Unreal Engine, with a couple of Unity versions available as well. Details on exporting from Unreal Engine to various other game engines and tools are available below. Unlike the first run, this bundle has no lower tiers; only a single $30 USD tier is available:

- Gothic Cemetery Pack
- Foliage Pack
- Gothic Texture Pack
- Asian Temple Pack
- Medieval Props Pack
- Medieval Dinnerware Pack
- Gothic Dungeon Props Vol1
- Stylized Foliage Pack V.01
- Cyber-Town Pack
- Gothic Furniture Props Vol1
- Flag Generator
- Book Generator
- Feudal Japan Interior Props Vol1
- Egyptian Props Vol1
- Feudal Japan Megapack
- Gothic Interior Megapack (UE)
- Gothic Megapack (UE)
- Medieval Village Megapack (UE)
- Stylized Village Fatpack
- Brooke Industrial Town
- The Bazaar
- Gothic Megapack (Unity)
- Gothic Interior Megapack (Unity)
- Medieval Village Megapack (Unity)

If you wish to convert the assets from one game engine to another, consider the following guides. You can learn more about the Deluxe Dev Dreams 3D Unreal Engine & Unity Humble Bundle in the video below. Using links on this page helps support GFS (and thanks so much if you do!). If you have any trouble opening the links, simply paste them into a new tab and they should work just fine.
  • This AI Paper from Microsoft and Novartis Introduces Chimera: A Machine Learning Framework for Accurate and Scalable Retrosynthesis Prediction
    www.marktechpost.com
Chemical synthesis is essential in developing new molecules for medical applications, materials science, and fine chemicals. This process, which involves planning chemical reactions to create desired target molecules, has traditionally relied on human expertise. Recent advancements have turned to computational methods to enhance the efficiency of retrosynthesis: working backward from a target molecule to determine the series of reactions needed to synthesize it. By leveraging modern computational techniques, researchers aim to solve long-standing bottlenecks in synthetic chemistry, making these processes faster and more accurate.

One of the critical challenges in retrosynthesis is accurately predicting chemical reactions that are rare or less frequently encountered. These reactions, although uncommon, are vital for designing novel chemical pathways. Traditional machine-learning models often fail to predict them due to insufficient representation in training data. Also, errors in multi-step retrosynthesis planning can cascade, leading to invalid synthetic routes. This limitation hinders the ability to explore innovative and diverse pathways for chemical synthesis, particularly in cases requiring uncommon reactions.

Existing computational methods for retrosynthesis have primarily focused on single-step models or rule-based expert systems. These methods rely on pre-defined rules or extensive training datasets, which limits their adaptability to new and unique reaction types. For instance, some approaches use graph-based or sequence-based models to predict the most likely transformations. While these methods have improved accuracy for common reactions, they often lack the flexibility to account for the complexities and nuances of rare chemical transformations, leaving a gap in comprehensive retrosynthetic planning.

Researchers from Microsoft Research, Novartis Biomedical Research, and Jagiellonian University developed Chimera, an ensemble framework for retrosynthesis prediction. Chimera integrates outputs from multiple machine-learning models with diverse inductive biases, combining their strengths through a learned ranking mechanism. This approach leverages two newly developed state-of-the-art models: NeuralLoc, which focuses on molecule editing using graph neural networks, and R-SMILES 2, a de-novo model employing a sequence-to-sequence Transformer architecture. By combining these models, Chimera enhances both accuracy and scalability for retrosynthetic predictions.

The methodology behind Chimera relies on combining outputs from its constituent models through a ranking system that assigns scores based on model agreement and predictive confidence. NeuralLoc encodes molecular structures as graphs, enabling precise prediction of reaction sites and templates. This method ensures that predicted transformations align closely with known chemical rules while maintaining computational efficiency. Meanwhile, R-SMILES 2 utilizes advanced attention mechanisms, including Group-Query Attention, to predict reaction pathways. This model's architecture also incorporates improvements in normalization and activation functions, ensuring superior gradient flow and inference speed. Chimera combines these predictions, using overlap-based scoring to rank potential pathways; a minimal sketch of that kind of ensemble ranking follows.
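To make the idea concrete, here is a minimal, hedged sketch of overlap-based ensemble ranking over the ranked outputs of two single-step models. It uses simple reciprocal-rank fusion as a stand-in for Chimera's learned ranking mechanism, so it is not the paper's algorithm; the model names and example SMILES are purely illustrative.

```python
# Illustrative sketch of overlap-based ensemble ranking over ranked
# retrosynthesis predictions from two models. NOT the paper's scoring
# function; model outputs and SMILES below are placeholders.

from collections import defaultdict

def ensemble_rank(predictions_per_model, k=60):
    """Fuse ranked candidate lists (e.g. precursor SMILES) from several
    single-step models with reciprocal-rank scores, so candidates that
    rank highly in more than one model float to the top."""
    scores = defaultdict(float)
    for ranked_list in predictions_per_model:
        for rank, candidate in enumerate(ranked_list, start=1):
            scores[candidate] += 1.0 / (k + rank)  # reciprocal-rank fusion
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs for one target molecule from two constituent models.
neuralloc_out = ["CC(=O)Cl.OCC", "CC(=O)O.OCC", "CC(=O)Br.OCC"]
rsmiles2_out  = ["CC(=O)O.OCC", "CC(=O)Cl.OCC", "CC(=O)OC(C)=O.OCC"]

print(ensemble_rank([neuralloc_out, rsmiles2_out]))
# Candidates proposed by both models rank above single-model suggestions.
```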
This integration ensures that the framework balances the strengths of editing-based and de-novo approaches, enabling robust predictions even for complex and rare reactions.

The performance of Chimera has been rigorously validated against publicly available datasets such as USPTO-50K and USPTO-FULL, as well as the proprietary Pistachio dataset. On USPTO-50K, Chimera achieved a 1.7% improvement in top-10 prediction accuracy over the previous state-of-the-art methods, demonstrating its capability to accurately predict both common and rare reactions. On USPTO-FULL, it further improved top-10 accuracy by 1.6%. Scaling the model to the Pistachio dataset, which contains over three times the data of USPTO-FULL, showed that Chimera maintained high accuracy across a broader range of reactions. Expert comparisons with organic chemists revealed that Chimera's predictions were consistently preferred over those of the individual models, confirming its effectiveness in practical applications.

The framework was also tested on an internal Novartis dataset of over 10,000 reactions to evaluate its robustness under distribution shifts. In this zero-shot setting, where no additional fine-tuning was performed, Chimera demonstrated superior accuracy compared to its constituent models. This highlights its capability to generalize across datasets and predict viable synthetic pathways even in real-world scenarios. Further, Chimera excelled in multi-step retrosynthesis tasks, achieving close to 100% success rates on benchmarks such as SimpRetro, significantly outperforming individual models. The framework's ability to find pathways for highly challenging molecules further underscores its potential to transform computational retrosynthesis.

Chimera represents a groundbreaking advancement in retrosynthesis prediction by addressing the challenges of rare reaction prediction and multi-step planning. The framework demonstrates superior accuracy and scalability by integrating diverse models and employing a robust ranking mechanism. With its ability to generalize across datasets and excel in complex retrosynthetic tasks, Chimera is set to accelerate progress in chemical synthesis, paving the way for innovative approaches to molecular design.

Check out the Paper. All credit for this research goes to the researchers of this project.
  • Meta AI Releases Apollo: A New Family of Video-LMMs Large Multimodal Models for Video Understanding
    www.marktechpost.com
While large multimodal models (LMMs) have advanced significantly for text and image tasks, video-based models remain underdeveloped. Videos are inherently complex, combining spatial and temporal dimensions that demand more from computational resources. Existing methods often adapt image-based approaches directly or rely on uniform frame sampling, which poorly captures motion and temporal patterns. Moreover, training large-scale video models is computationally expensive, making it difficult to explore design choices efficiently.

To tackle these issues, researchers from Meta AI and Stanford developed Apollo, a family of video-focused LMMs designed to push the boundaries of video understanding. Apollo addresses these challenges through thoughtful design decisions, improving efficiency and setting a new benchmark for tasks like temporal reasoning and video-based question answering.

Meta AI's Apollo models are designed to process videos up to an hour long while achieving strong performance across key video-language tasks. Apollo comes in three sizes (1.5B, 3B, and 7B parameters), offering flexibility to accommodate various computational constraints and real-world needs.

Key innovations include:
- Scaling Consistency: Design choices made on smaller models are shown to transfer effectively to larger ones, reducing the need for large-scale experiments.
- Frame-Per-Second (fps) Sampling: A more efficient video sampling technique compared to uniform frame sampling, ensuring better temporal consistency.
- Dual Vision Encoders: Combining SigLIP for spatial understanding with InternVideo2 for temporal reasoning enables a balanced representation of video data.
- ApolloBench: A curated benchmark suite that reduces redundancy in evaluation while providing detailed insights into model performance.

Technical Highlights and Advantages

The Apollo models are built on a series of well-researched design choices aimed at overcoming the challenges of video-based LMMs:
- Frame-Per-Second Sampling: Unlike uniform frame sampling, fps sampling maintains a consistent temporal flow, allowing Apollo to better understand motion, speed, and the sequence of events in videos.
- Scaling Consistency: Experiments show that model design choices made on moderately sized models (2B-4B parameters) generalize well to larger models. This approach reduces computational costs while maintaining performance gains.
- Dual Vision Encoders: Apollo uses two complementary encoders: SigLIP, which excels at spatial understanding, and InternVideo2, which enhances temporal reasoning. Their combined strengths produce more accurate video representations.
- Token Resampling: By using a Perceiver Resampler, Apollo efficiently reduces video tokens without losing information. This allows the models to process long videos without excessive computational overhead.
- Optimized Training: Apollo employs a three-stage training process in which video encoders are initially fine-tuned on video data before being integrated with text and image datasets. This staged approach ensures stable and effective learning.
This staged approach ensures stable and effective learning.Multi-Turn Conversations: Apollo models can support interactive, multi-turn conversations grounded in video content, making them ideal for applications like video-based chat systems or content analysis.Performance InsightsApollos capabilities are validated through strong results on multiple benchmarks, often outperforming larger models:Apollo-1.5B:Surpasses models like Phi-3.5-Vision (4.2B) and LongVA-7B.Scores: 60.8 on Video-MME, 63.3 on MLVU, 57.0 on ApolloBench.Apollo-3B:Competes with and outperforms many 7B models.Scores: 58.4 on Video-MME, 68.7 on MLVU, 62.7 on ApolloBench.Achieves 55.1 on LongVideoBench.Apollo-7B:Matches and even surpasses models with over 30B parameters, such as Oryx-34B and VILA1.5-40B.Scores: 61.2 on Video-MME, 70.9 on MLVU, 66.3 on ApolloBench.Benchmark Summary:ConclusionApollo marks a significant step forward in video-LMM development. By addressing key challenges such as efficient video sampling and model scalability, Apollo provides a practical and powerful solution for understanding video content. Its ability to outperform larger models highlights the importance of well-researched design and training strategies.The Apollo family offers practical solutions for real-world applications, from video-based question answering to content analysis and interactive systems. Importantly, Meta AIs introduction of ApolloBench provides a more streamlined and effective benchmark for evaluating video-LMMs, paving the way for future research.Check out the Paper, Website, Demo, Code, and Models. All credit for this research goes to the researchers of this project. Also,dont forget to follow us onTwitter and join ourTelegram Channel andLinkedIn Group. Dont Forget to join our60k+ ML SubReddit. Asif RazzaqAsif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences. [Download] Evaluation of Large Language Model Vulnerabilities Report (Promoted)
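As a rough illustration of the sampling distinction discussed above, here is a minimal sketch contrasting uniform frame sampling with fps-based sampling. This is not Apollo's implementation; the function names, frame rates, and clip length are made-up examples.

```python
# Illustrative sketch: uniform frame sampling vs. fps-based sampling.
# Not Apollo's actual code; durations and rates are made-up examples.

def uniform_sample(total_frames: int, num_frames: int) -> list[int]:
    """Pick a fixed number of frames spread evenly over the whole clip.
    A 10-second clip and a 10-minute clip both yield num_frames frames,
    so the effective temporal resolution varies wildly with video length."""
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]

def fps_sample(total_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Pick frames at a constant rate in time (e.g. 2 frames per second),
    so motion is represented with the same temporal density regardless of
    how long the video is."""
    step = native_fps / target_fps
    return [int(i * step) for i in range(int(total_frames / step))]

# A 60-second clip recorded at 30 fps:
total = 60 * 30
print(len(uniform_sample(total, num_frames=32)))             # always 32 frames
print(len(fps_sample(total, native_fps=30, target_fps=2)))   # ~120 frames, 2 per second
```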
  • Ways to Deal With Hallucinations in LLM
    towardsai.net
    LatestMachine LearningWays to Deal With Hallucinations in LLM 1 like December 16, 2024Share this postAuthor(s): Igor Novikov Originally published on Towards AI. Image by the authorOne of the major challenges in using LLMs in business is that LLMs hallucinate. How can you entrust your clients to a chatbot that can go mad and tell them something inappropriate at any moment? Or how can you trust your corporate AI assistant if it makes things up randomly?Thats a problem, especially given that an LLM cant be fired or held accountable.Thats the thing with AI systems they dont benefit from lying to you in any way but at the same time, despite sounding intelligent they are not a person, so they cant be blamed either.Some tout RAG as a cure-all approach, but in reality it only solves one particular cause and doesnt help with others. Only a combination of several methods can help.Not all hope is lost though. There are ways to work with it so lets look at that.So not to go too philosophical about what is hallucination, lets define the most important cases:The model understands the question but gives an incorrect answerThe model didnt understand the question and thus gave an incorrect answerThere is no right or wrong answer, and therefore if you disagree with the mode it doesnt make it incorrect. Like if you ask Apple vs Android whatever it answers is technically just an opinionLets start with the latter. These are reasons why a model can misunderstand the questions:The question is crap (ambiguous, not clear, etc.), and therefore the answer is crap. Not the model's fault, ask better questionsThe model does not have contextLanguage: the model does not understand the language you are usingBad luck or, in other words, stochastic distribution led the reasoning in a weird wayNow lets look at the first one: why would a model lie, that is give factually and verifiably incorrect information, if it understands the questions?It didnt follow all the logical steps to arrive at a conclusionIt didnt have enough contextThe information (context) in this is incorrect as wellIt has the right information but got confusedIt was trained to give incorrect answers (for political and similar reasons)Bad luck, and stochastic distribution led to the reasoning in a weird wayIt was configured so it is allowed to fantasize (which can be sometimes desirable)Overfitting and underfitting: the model was trained in a specific field and tries to apply its logic to a different field, leading to incorrect deduction or induction in answeringThe model is overwhelmed with data and starts to lose contextIm not going to discuss things that are not a model problem, like bad questions or questions with no right answers. Lets concentrate on what we can try to solve, one by one.The model does not have enough context or information, or the information that was provided to it is not correct or fullThis is where RAG comes into play. RAG, when correctly implemented should provide the model's necessary context, so it can answer. Here is the article on how to do the RAG properly.It is important to do it right, with all required metadata about the information structure and attributes. It is desirable to use something like GraphRag, and Reranking in the retrieval phase, so that the model is given only relevant context, otherwise, the model can get confused.It is also extremely important to keep the data you provide to the model up to date and continuously update it, taking versioning into account. 
Language

Not all models understand all languages equally well. It is always preferable to use English for prompts, as it works best for most models. If you have to use a specific language, you may have to use a model built for it, like Qwen for Chinese.

A model does not follow all the logical steps to arrive at a conclusion

You can force the model to follow a particular thinking process with techniques like Self-RAG, Chain of Thought, or SelfCheckGPT. Here is an article about these techniques. The general idea is to ask the model to think in steps and explain/validate its conclusions and intermediate steps, so it can catch its own errors. Alternatively, you can use an agent setup, where several LLM agents communicate with each other and verify each other's outputs at each step.

A model got confused with the information it had, and bad luck

These two are actually caused by the same thing, and this is a tricky one. The way models work is that they stochastically predict the next token in a sentence. The process is somewhat random, so it is possible that the model will pick a less probable route and go off course. It is built into the model and the way it works.

There are several methods for handling this:
- MultiQuery: run several queries for the same answer and pick the best one using a relevance score such as a cross-encoder (see the sketch after this list). If you get three very similar answers and one very different one, it is likely that the outlier was a random hallucination. It adds a certain overhead, so you pay a price, but it is a very good way to ensure you don't randomly get a bad answer
- Set the model temperature to a lower value to discourage it from going in less probable directions (i.e. fantasizing)
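A hedged sketch of the MultiQuery idea follows: sample the answer several times and keep the one that agrees most with the rest. Plain token overlap stands in for a proper cross-encoder relevance score, and llm() is a placeholder for whatever client you actually use.

```python
# Sketch of a MultiQuery-style consensus check: sample the answer several
# times and keep the one that agrees most with the others. Token overlap
# (Jaccard) stands in for a real cross-encoder score; llm() is a placeholder.

def llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM client here")

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def consensus_answer(prompt: str, n: int = 3) -> str:
    answers = [llm(prompt) for _ in range(n)]
    # Score each answer by its average similarity to the other samples;
    # a lone hallucinated outlier agrees poorly with the rest and loses.
    def agreement(i: int) -> float:
        return sum(jaccard(answers[i], answers[j]) for j in range(n) if j != i) / (n - 1)
    return answers[max(range(n), key=agreement)]
```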
There is one more cause, which is harder to fix. The model keeps semantically similar ideas close together in vector space. Being asked about facts that have other, closely located but not actually related facts nearby will lead the model down the path of least resistance. The model has associative memory, so to speak, so it thinks in associations, and that mode of thinking is not suitable for tasks like playing chess or math. The model has a fast-thinking brain, per Kahneman's description, but lacks a slow one.

For example, you ask a model what 3 + 7 is and it answers 37. Why? It makes sense once you notice that if you look at 3 and 7 in vector space, the closest vector to them is 37. Here the mistake is obvious, but it can be much more subtle.

Example (image by the author): the answer is incorrect. Afonso was the third king of Portugal, not Alfonso; there was no Alfonso II as king of Portugal. The mother of Afonso II was Dulce of Aragon, not Urraca of Castile. From the LLM's perspective, "Alfonso" is basically the same as "Afonso" and "mother" is a direct match. Therefore, if there is no "mother" close to "Afonso," the LLM will choose the Alfonso/mother combination. Here is an article explaining this in detail and potential ways to fix it. Also, in general, fine-tuning the model on data from your domain will make this less likely to happen, as the model will be less confused by similar facts in edge cases.

The model was configured so it is allowed to fantasize

This can be done either through a master prompt or by setting the model temperature too high. So basically you need to:
- Instruct the model not to give an answer if it is not sure or doesn't have the information
- Ensure nothing in the prompt instructs the model to make up facts and, in general, make instructions very clear
- Set the temperature lower

Overfitting and underfitting

If you use a model that was trained in the healthcare space to solve programming tasks, it will hallucinate, or in other words, it will try to put square bits into round holes, because it only knows how to do that. That's kind of obvious. The same goes for using a generic model, trained on generic data from the internet, to solve industry-specific tasks. The solution is to use a proper model for your industry and fine-tune/train it in that area. That will improve correctness dramatically in certain cases. I'm not saying you always have to do that, but you might have to.

Another case of this is using a model that is too small (in terms of parameters) to solve your tasks. Yes, certain tasks may not require a large model, but some certainly do, and you should not use a model smaller than appropriate. Using a model that is too big will cost you, but at least it will work correctly.

The model is overwhelmed with data and starts to lose context

You may think that the more data you have the better, but that is not the case at all. Model context windows and attention spans are limited. Even recent models with context windows of millions of tokens do not handle them well: they start to forget things, ignore things in the middle, and so on. The solution here is to use RAG with proper context size management. You have to pre-select only relevant data, rerank it, and feed it to the LLM. Here is my article that overviews some of the techniques for doing that. Also, some models do not handle long context at all, and at a certain point the quality of answers will start to degrade with increasing context size; here is a research paper on that.

Other general techniques

Human in the loop: You can always have someone in the loop to fact-check LLM outputs. For example, if you use an LLM for data annotation (which is a great idea), you will need to use it in conjunction with real humans to validate the results. Or use your system in co-pilot mode, where humans make the final decision. This doesn't scale well, though.

Oracles: Alternatively, you can use an automated oracle to fact-check the system's results, if that option is available.

External tools: Certain things, like calculations and math, should be done outside of the LLM, using tools that are provided to the LLM. For example, you can use the LLM to generate a query for a SQL database or Elasticsearch, execute it, and then use the results to generate the final answer; a minimal sketch follows.

What to read next: RAG architecture guide; Advanced RAG guide. Peace!
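A rough sketch of that external-tool pattern is below. Again, llm() is a placeholder client, and the table schema and database are made up for illustration; in a real system you would also validate or whitelist the generated SQL before executing it.

```python
# Sketch of routing a numeric/factual question through an external tool
# instead of letting the LLM answer from memory. llm() is a placeholder
# client; the "orders" table and sqlite database are made-up examples.

import sqlite3

def llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM client here")

def answer_with_sql(question: str, db_path: str = "shop.db") -> str:
    # 1. Let the LLM translate the question into SQL against a known schema.
    sql = llm(
        "Schema: orders(id, customer, total, created_at).\n"
        f"Write a single SQLite SELECT answering: {question}\n"
        "Return only SQL."
    )
    # 2. Execute the query ourselves: the arithmetic/aggregation happens
    #    in the database, not in the model's head.
    #    (In production, validate the generated SQL before running it.)
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()
    # 3. Let the LLM phrase the grounded result for the user.
    return llm(f"Question: {question}\nQuery result: {rows}\nAnswer concisely.")
```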
  • Tekken Chief Katsuhiro Harada Couldn't Get Pac-Man in Tekken Before, but Maybe Now It's Possible With Shadow Labyrinth Coming Out
    www.ign.com
Tekken chief Katsuhiro Harada is now also the executive producer of Shadow Labyrinth, the surprise Pac-Man spin-off first revealed in one of the episodes of Amazon's anthology animated series Secret Level before a reveal trailer for the video game itself at The Game Awards 2024.

All of which raises the question: could The Swordsman, the main character from Shadow Labyrinth, appear in Tekken as a guest character? In many ways it would make sense, given both Tekken and Pac-Man are owned by Bandai Namco, and The Swordsman is an actual humanoid fighter who wields a sword. But what does Harada himself think?

IGN put that question to the man himself in a recent interview, and his response suggested such a crossover is certainly possible. "Perhaps," Harada responded when asked, "because the game environment would be a good crossover for Tekken, atmosphere-wise."

It turns out that Harada explored the possibility of adding Pac-Man to Tekken as a guest character long before Shadow Labyrinth was a thing. But designs for the character, which saw Pac-Man as a head on a muscular humanoid body, horrified original Pac-Man creator Toru Iwatani.

"Many years ago, Iwatani-san, the original creator of Pac-Man, was still in the company when we were here. And I thought about maybe using Pac-Man as a guest character," Harada remembered. "So I had the artist come up with a sketch and it turned out to be that the character had like a Pac-Man head, but the body was this really muscular bodybuilder type. I showed it to Iwatani-san, and Iwatani-san got so upset and scolded me quite heavily. So I was surprised that, 'Oh, I can't just be quite liberal with the Pac-Man IP, I have to be more constrained.' At the time I was a bit surprised, but now because of the atmosphere of this [Shadow Labyrinth], it's something that could be considered."

Iwatani, by the way, is apparently fully on board with Bandai Namco's plan to take the Pac-Man IP in totally different directions. "Iwatani-san seems to realize that that's something that's needed at this point," Harada added.

2D action platformer Shadow Labyrinth stars The Swordsman, a character whose only goal is to "eat" as they attempt to escape a maze. It's a Metroidvania of sorts, and a significant departure for the world-famous arcade franchise. Meanwhile, Tekken 8 gets Clive from Final Fantasy 16 as a DLC character.

Wesley is the UK News Editor for IGN. Find him on Twitter at @wyp100. You can reach Wesley at wesley_yinpoole@ign.com or confidentially at wyp100@proton.me.
  • Get a $75 Best Buy Gift Card for Free With Purchase of a 256GB Meta Quest 3S as Part of 12 Days of Gaming
    www.ign.com
The holidays are almost upon us and Best Buy is celebrating the season with its 12 Days of Gaming sale event. Each day of the event has had an excellent one-day-only offer to take advantage of, and today's no different. For today only, Best Buy's offering a free $75 gift card with purchase of a 256GB Meta Quest 3S VR headset (which runs for $399.99).

Alongside the extra cash to use at Best Buy, purchasing this Meta Quest 3S will also set you up with Batman: Arkham Shadow and a three-month trial of Meta Quest+. That's a deal you definitely don't want to miss. Check it out at the link below.

12 Days of Gaming: Meta Quest 3S 256GB all-in-one headset (white), with Batman: Arkham Shadow and a 3-month trial of Meta Quest+ included. Free $75 gift card with purchase, $399.99 at Best Buy.

We adore the Meta Quest 3S VR headset. In our Meta Quest 3S review, writer Gabriel Moss said, "Raw processing power, full-color passthrough, and snappy Touch Plus controllers make the Quest 3S a fantastic standalone VR headset that also brings entry-level mixed-reality gaming to the masses for arguably the very first time."

If you're still doing some shopping ahead of the holidays, there are quite a few more video game deals that are worth taking advantage of right now. In our roundups of the best PlayStation deals, the best Xbox deals, and the best Nintendo Switch deals you can see some of our favorites. And to see some of our overall favorite deals, have a look at our roundup of the best video game deals, which covers the highlights from each platform. Some of the latest and greatest offers include a nice discount on Indiana Jones and the Great Circle for PC and $10 off The Game Awards' Game of the Year winner, Astro Bot, for PlayStation 5.

Hannah Hoolihan is a freelancer who writes with the guides and commerce teams here at IGN.
  • Yellowstone Mysteries That Remain Unsolved After the Finale
    www.denofgeek.com
This article contains spoilers for Yellowstone seasons 1-5.

With Yellowstone airing its finale (at least this iteration of the Montana drama, as plans move forward for a proposed spinoff and sister shows), it joins the lexicon of popular, engaging series that simply forget a plot line or two, or perhaps just run out of time. It's hard to argue that Yellowstone ran out of time, having five seasons to answer these questions, and with co-creator Taylor Sheridan writing an entire episode where he's the most important (and most shirtless) character, but alas, here we are. Let's take a look at several major Yellowstone questions that likely will never be answered.

Where Did Garrett Randall Get the Money?

One of the greatest cliffhangers in the history of the show was the season 3 finale, "The World is Purple." It was this generation's "Who Shot J.R.?" (or maybe "Who Shot Mr. Burns?"; old age might be affecting this writer) when John (Kevin Costner) was shot on the side of the road and left for dead. In fact, the fates of many of the Duttons were left up in the air in between seasons, as Kayce (Luke Grimes) was attacked and shot at, and Beth's (Kelly Reilly) office was blown up.

The episode aired in August of 2020, and during that fall fans let theories run amok: who was responsible for the attack on the Duttons? Many were correct in thinking it was the biological father of Jamie (Wes Bentley), Garrett Randall, played by Will Patton. Randall had a mean grudge against John, and blamed him for ruining his life and taking Jamie away from him.

Yet when one mystery was answered, another major one popped up. How did Randall afford all that? The show would have us believe that an old cellmate, Terrell Riggins (Bruno Amato), helped Randall out, but this is a three-tiered attack, with militia-grade weaponry, taking on an ex-marine in Kayce (Luke Grimes) and someone with enough knowledge of explosives to take out Beth (Kelly Reilly) with a letter bomb. There's honor among thieves, but it seems like even if Randall owed Riggins a few cigarettes, this is quite a rich favor. How would a small-time criminal, who had no money to his name after being released, manage to convince these professionals to take on the hit? Did Randall promise payment once Jamie inherited the Yellowstone?

What Happened Between John and Kayce?

From the very first season, the Yellowstone brand has carried a lot of weight. Realistically, the brand meant way too many things, even when intended as a punishment. After John (Kevin Costner) promised the grandfather of Jimmy (Jefferson White) he would take the kid in and clean him up, the brand was a reminder of leaving the life he had behind.

Then comes the story of Kayce (Luke Grimes), John's youngest son. Kayce's only sin was falling in love and having a child with Monica (Kelsey Asbille). John did not respond too well to this and branded Kayce for his crimes. Even for John, does this not seem like an overreaction? What was it about Monica that John hated so much? Why did he feel the need to brand his own son, just to show that John owned him? Why did he never brand Jamie, a son he barely felt an attachment to at times?
Why was his reaction to Kayce's insubordination so much harsher than his reaction to the countless times Beth went against his wishes?

Especially considering that the relationship between Kayce and John perhaps showed the most growth, with John coming to be quite tender toward Monica, and showing nothing but love to his grandson Tate (a grandson he, at one time, wanted aborted), it's strange that John never took the time to explain why he did what he did to Kayce. Tough love or not.

What Ever Happened to...?

This one is a bit of a cheat, as it groups several loose threads together, but Yellowstone was notorious for teasing great new characters who looked like they were going to shake things up, only for those characters never to be seen or heard from again. Names like Cowboy, Angela Blue Thunder, and Christina were additions who came in and made a big impact but never really got a conclusion. Even John's assistant, Clara (Lilli Kay), was set up to be a major player in the downfall of Jamie as late as the middle episodes of season five, but instead the character apparently felt discretion was the better part of valor.

One of the more notable abandoned plots was that between Kayce and the seductive Avery (Tanaya Beatty). For several episodes in the fourth season, Avery seemed to have graduated from side character to featured player as she entered the gaze of Kayce. In a very short amount of time, the beautiful temptress even admits that she loves Kayce, and it looked like there was going to be real trouble in paradise. After Kayce's spiritual ayahuasca ceremony, where he has visions of Avery, he admits to Monica he saw "the end of us." This was clearly intended to be a major storyline, perhaps driving a divide between Kayce and Monica, but Kayce's vision was the last vision audiences got of Avery.

Bill Ramsey (Rob Kirkland) was another character who came in for a few episodes looking as if he was going to shake up the dynamic and be a real thorn in the Duttons' side. Gone were the days when John was best friends with the Sheriff, because when Ramsey became Sheriff of Gallatin County, he seemed to be one who was going to clean things up and not be taken in by the charm and intimidation of John Dutton. Kirkland and Costner shared important scenes together, seemingly setting up a real rivalry. However, aside from Ramsey arresting Beth after a barroom brawl, he never crossed paths with the Duttons again.

Who Was Jamie's Biological Mother?

Speaking of characters that should have had, or were intended to have, more meaning: when Jamie finally learned the truth of his parentage, he confronted his biological father, Garrett Randall. The only thing Jamie knew about his parents was that Garrett was a violent criminal who beat Jamie's biological mother to death.

The strange thing isn't that Jamie and Randall somehow developed a trusting, loving relationship despite that horrific history; it's that it was never really revealed why John and Evelyn Dutton took Jamie in. The relationship between Jamie's mother, Phyllis, and the Duttons seemed to be more than just casual.

John mentions in passing that he took Jamie in because he knew Jamie's mother and wanted to save him from a life of violence (isn't that ironic, considering Jamie's upbringing). It was theorized that Jamie was perhaps an illegitimate son, or at least a biological relative. Was Phyllis related to Evelyn? Was her maiden name even perhaps Dutton?
It's once again such a major shift in character for John, who is often ice cold toward his own biological children, to take in a 3-month-old child for no reason, unless Phyllis was a major part of his life.

Why Was Beth's Relationship With Her Mother So Toxic?

If any therapists are reading this article, this writer swears he doesn't have any unresolved issues with his mother. Yet another mother-of-a-storyline that needed a lot more explanation was the one surrounding Evelyn Dutton's (Gretchen Mol) accidental death. It was a moment that shaped Beth into the mother of all headaches, and coincidentally a thread Sheridan clearly thought about, right until the end.

In the finale, when Beth and Rip (Cole Hauser) finally settle into their own ranch, Beth is giving her husband the report on the nearest town. It's seemingly the quiet piece of paradise they've always been looking for, even with a hitching post outside the local watering hole. This is a major reveal, as Beth hasn't really ridden since she was a child, and since she was very indirectly responsible for her mother's death.

In the very first season, it is revealed that Beth lost control of her horse, scaring others, and one of them ended up falling and landing on Evelyn. Evelyn specifically tells a young Kayce that Beth is the one who has to ride back to the ranch to get help. Of course, Beth, being uncertain on a horse, eventually falls off of hers, delaying help, and Evelyn succumbs to her injuries, but the harshness Evelyn threw at Beth was palpable.

Audiences have seen strong, no-nonsense Dutton women before (even those who marry into the family), but none of them were so devoid of love for their daughter as Evelyn was for Beth. It's awful to think that no Dutton child in this modern Yellowstone generation had a good relationship with both of their parents, but between John being toxic to his boys, and Beth and Evelyn's undefined tension, the kids had a rough go.