Towards AI
Towards AI
The leading AI community & content platform making AI accessible to all.
2k writers | 330k followers
  • 1 χρήστες τους αρέσει
  • 99 Δημοσιεύσεις
  • 2 τις φωτογραφίες μου
  • 0 Videos
  • 0 Προεπισκόπηση
  • News
Αναζήτηση
Πρόσφατες ενημερώσεις
  • TOWARDSAI.NET
    Qwen 2.5 Coder 32B: Is This Best Open Weight Model Better than GPT-4o and Claude 3.5 Sonnet
    Author(s): Barhoumi Mosbeh Originally published on Towards AI. On November 11, Alibaba announced its most advanced coding model to date: Qwen 2.5-Coder-32B-Instruct. But thats not all, its actually part of a whole family of coding models! In addition to the 32B model, there are versions with 0.5B, 1.5B, 3B, 7B, and 14B parameters. Before writing this article, I reviewed feedback from many of YouTubers, writers, and technical experts, and the consensus has been overwhelmingly positive. Today, well explore whether it really lives up to the hype.Ive been using ChatGPT, Gemini, and Claude for a while, and I can confidently say that Claude is the best at coding and understanding complex tasks. To be honest, Gemini is in a different league altogether. Fun fact: I once won a machine learning competition two years ago using Bard (now rebranded as Gemini) because everyone else was using ChatGPT. Bards performance gave me a competitive edge back then. Today, I believe Gemini might be the best vision model available, but overall, Im not a big fan.To evaluate the models performance in coding, we should cover the following three key aspects: Code Generation, Code Repair, and Code Reasoning. Additionally, we need to consider how many programming languages the model supports and its alignment with human preferences.OverviewBefore testing out Alibabas model, lets take a look at the performance results they have announced.Based on their internal benchmarks, it even outperforms GPT-4 and comes very close to Claude 3.5. However, I personally dont place too much weight on internal benchmarks alone, but the results seem very interesting according to those who have tried it.You can find more details about the performance here:Qwen2.5-Coder-32B-Instruct Qwen2Now it is the time, Nov. 11 10:24! The perfect time for our best coder model ever! Qwen2.5-Coder-32B-Instruct!qwen2.orgIm testing the model locally using Ollama:qwen2.5-coder:32bThe latest series of Code-Specific Qwen models, with significant improvements in code generation, code reasoning, andollama.comYou can also try out the model on the demo page of Hugging Face:Qwen2.5 Coder Demo a Hugging Face Space by QwenDiscover amazing ML apps made by the communityhuggingface.coCode GenerationMany people have already covered how well it performs in web and design:So, we will focus on testing it with some Python challenges. Lets start with a medium-level LeetCode problem:PromptThe problem description: Longest Substring Without Repeating CharactersAnswerclass Solution(object): def lengthOfLongestSubstring(self, s): """ :type s: str :rtype: int """ # Dictionary to store the last seen index of each character char_index_map = {} # Initialize the start of the current window and the maximum length start = 0 max_length = 0 for end in range(len(s)): # If the character is already in the map and its index is within the current window if s[end] in char_index_map and char_index_map[s[end]] >= start: # Move the start to the right of the last occurrence of the current character start = char_index_map[s[end]] + 1 # Update the last seen index of the current character char_index_map[s[end]] = end # Update the maximum length of the substring found so far max_length = max(max_length, end - start + 1) return max_lengthResultsImage from the authorNot the best, but it gets things done and passes all the tests. Can it solve a hard problem? (Note that the model could have been trained on these problems already, but its still impressive.)PromptProblem description: Regular Expression MatchingAnswerclass Solution(object): def isMatch(self, s, p): """ :type s: str :type p: str :rtype: bool """ m, n = len(s), len(p) # Create a 2D array to store the results of subproblems dp = [[False] * (n + 1) for _ in range(m + 1)] # Base case: empty string matches empty pattern dp[0][0] = True # Handle patterns like a*, a*b*, etc., which can match an empty string for j in range(1, n + 1): if p[j - 1] == '*': dp[0][j] = dp[0][j - 2] # Fill the dp table for i in range(1, m + 1): for j in range(1, n + 1): if p[j - 1] == '.' or p[j - 1] == s[i - 1]: dp[i][j] = dp[i - 1][j - 1] elif p[j - 1] == '*': dp[i][j] = dp[i][j - 2] # '*' matches zero occurrences if p[j - 2] == '.' or p[j - 2] == s[i - 1]: dp[i][j] = dp[i][j] or dp[i - 1][j] # '*' matches one or more occurrences # The result is in dp[m][n] return dp[m][n]ResultsImage from the authorAfter testing it on 1020 problems on LeetCode, I have the impression that it was trained on all problems, so its not the best way to evaluate it. However, its still worth sharing. Additionally, it always chooses to provide a Python solution, even when not specifically asked for one. But lets try something different. I asked the model to optimize the code to minimize resource consumption as much as possible.Image from the authorI didnt expect that, but when I asked the model to improve the performance, it even fails in the simplest cases.Lets now move to the next test, Code Repair. I will be providing the model with some code containing errors generated by Claude. I asked the model to generate some hard cases to fix, and lets see how it performs.Code RepairPromptThe following code contains several bugs. Please identify and fix them, and provide an explanation for each bug you find.class AsyncTaskQueue: def __init__(self): self.tasks = [] self.running = False self.lock = threading.Lock() self.workers = [] async def add_task(self, task): async with self.lock: self.tasks.append(task) if not self.running: self.start_processing() async def start_processing(self): self.running = True while self.tasks: task = self.tasks.pop(0) worker = asyncio.create_task(self.process_task(task)) self.workers.append(worker) if len(self.workers) >= 3: done, pending = await asyncio.wait(self.workers) self.workers = list(pending) self.running = False async def process_task(self, task): result = await task() return result AnswerCertainly! Let's go through the provided code and identify the issues, then correct them. Here are the main problems and their fixes:Issues IdentifiedThreading Lock Usage with Asyncio:The threading.Lock is not suitable for use with asyncio. Instead, you should use asyncio.Lock.Async Context Manager for Lock:The async with statement is used incorrectly with threading.Lock. It should be used with asyncio.Lock.Starting Processing:The start_processing method should be called in a non-blocking manner. Since start_processing is an async method, it should be awaited properly.Task Management:The start_processing method should handle task processing more efficiently. The current implementation might not handle task completion correctly.Worker Management:The workers list should be managed more carefully to ensure that tasks are completed before marking the queue as not running.Corrected CodeHere's the corrected version of the code:import asyncioimport threadingclass AsyncTaskQueue: def __init__(self): self.tasks = [] self.running = False self.lock = asyncio.Lock() self.workers = [] async def add_task(self, task): async with self.lock: self.tasks.append(task) if not self.running: self.running = True asyncio.create_task(self.start_processing()) async def start_processing(self): while True: async with self.lock: if not self.tasks: self.running = False break task = self.tasks.pop(0) worker = asyncio.create_task(self.process_task(task)) self.workers.append(worker) if len(self.workers) >= 3: done, pending = await asyncio.wait(self.workers, return_when=asyncio.FIRST_COMPLETED) self.workers = list(pending) async def process_task(self, task): result = await task() return result# Example usageasync def example_task(): await asyncio.sleep(1) print("Task completed") return "Result"async def main(): queue = AsyncTaskQueue() await queue.add_task(example_task) await queue.add_task(example_task) await queue.add_task(example_task) await queue.add_task(example_task) await queue.add_task(example_task) await queue.add_task(example_task)asyncio.run(main())Explanation of FixesAsyncio Lock:Replaced threading.Lock with asyncio.Lock to ensure compatibility with asynchronous operations.Async Context Manager for Lock:Used async with self.lock: to properly manage the lock in an asynchronous context.Starting Processing:Changed the logic to start processing in a non-blocking manner by using asyncio.create_task(self.start_processing()) inside add_task.Task Management:Modified start_processing to continuously check for tasks and process them until no tasks are left. It breaks out of the loop when there are no more tasks and sets self.running to False.Worker Management:Ensured that workers are managed correctly by using asyncio.wait with return_when=asyncio.FIRST_COMPLETED to handle task completion efficiently.Example UsageThe example usage demonstrates how to add tasks to the AsyncTaskQueue and run them asynchronously. The main function adds several tasks to the queue, and they are processed concurrently with a maximum of 3 workers at a time.This corrected and optimized code should perform better and be more reliable.Results and feedbackClaude evaluates the models response as 7.5/10. with this feedback:While the model provided a functionally correct solution, it delivered a tutorial-grade fix rather than a production-grade implementation. This indicates that for complex concurrent programming tasks, human expertise is still crucial for robust system design.After experimenting with the model, it gets the job done, but often requires follow-up prompts to improve the response. While it doesnt struggle with understanding the task at hand and can correct most critical bugs, it still lacks some of the strengths seen in models like ChatGPT and Claude, particularly in areas such as error handling, resource management, and optimization. I believe these may be the models key weaknesses.Specialized LLMs: A New Era in AIIts fascinating to witness the emergence of relatively small, domain-specific language models that can run locally. I believe were entering an exciting new era of specialized LLMs, and this is just the beginning. While the concept isnt new, were finally seeing models that truly excel in specific domains.The combination of RAG (Retrieval-Augmented Generation) and specialist LLMs could define the AI landscape in the coming months/years. Were likely to see more powerful coding-focused models from industry leaders like OpenAI and Anthropic. Programming is perhaps one of the most natural domains for specialized AI, and we might soon see even more focused models, imagine LLMs specifically optimized for DevOps or front-end development!Dont feel overwhelmed by these rapid advances. Yes, it can be somewhat daunting to see LLMs mastering skills that traditionally took years to develop. Coding, which has challenged humanity for decades, is being transformed before our eyes. But rather than seeing this as an endpoint, we should view it as an opportunity for growth and innovation.Whether the current wave of LLM advances slows down in the coming years or this is merely the beginning of a longer journey, our response should remain the same: stay curious, keep learning, and never stop innovating. The future of technology is being written right now, and we all have a part to play in shaping it.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 39 Views
  • TOWARDSAI.NET
    Qwen 2.5 Coder 32B: Is This Best Open Weight Model Better than GPT-4o and Claude 3.5 Sonnet
    Author(s): Barhoumi Mosbeh Originally published on Towards AI. On November 11, Alibaba announced its most advanced coding model to date: Qwen 2.5-Coder-32B-Instruct. But thats not all, its actually part of a whole family of coding models! In addition to the 32B model, there are versions with 0.5B, 1.5B, 3B, 7B, and 14B parameters. Before writing this article, I reviewed feedback from many of YouTubers, writers, and technical experts, and the consensus has been overwhelmingly positive. Today, well explore whether it really lives up to the hype.Ive been using ChatGPT, Gemini, and Claude for a while, and I can confidently say that Claude is the best at coding and understanding complex tasks. To be honest, Gemini is in a different league altogether. Fun fact: I once won a machine learning competition two years ago using Bard (now rebranded as Gemini) because everyone else was using ChatGPT. Bards performance gave me a competitive edge back then. Today, I believe Gemini might be the best vision model available, but overall, Im not a big fan.To evaluate the models performance in coding, we should cover the following three key aspects: Code Generation, Code Repair, and Code Reasoning. Additionally, we need to consider how many programming languages the model supports and its alignment with human preferences.OverviewBefore testing out Alibabas model, lets take a look at the performance results they have announced.Based on their internal benchmarks, it even outperforms GPT-4 and comes very close to Claude 3.5. However, I personally dont place too much weight on internal benchmarks alone, but the results seem very interesting according to those who have tried it.You can find more details about the performance here:Qwen2.5-Coder-32B-Instruct Qwen2Now it is the time, Nov. 11 10:24! The perfect time for our best coder model ever! Qwen2.5-Coder-32B-Instruct!qwen2.orgIm testing the model locally using Ollama:qwen2.5-coder:32bThe latest series of Code-Specific Qwen models, with significant improvements in code generation, code reasoning, andollama.comYou can also try out the model on the demo page of Hugging Face:Qwen2.5 Coder Demo a Hugging Face Space by QwenDiscover amazing ML apps made by the communityhuggingface.coCode GenerationMany people have already covered how well it performs in web and design:So, we will focus on testing it with some Python challenges. Lets start with a medium-level LeetCode problem:PromptThe problem description: Longest Substring Without Repeating CharactersAnswerclass Solution(object): def lengthOfLongestSubstring(self, s): """ :type s: str :rtype: int """ # Dictionary to store the last seen index of each character char_index_map = {} # Initialize the start of the current window and the maximum length start = 0 max_length = 0 for end in range(len(s)): # If the character is already in the map and its index is within the current window if s[end] in char_index_map and char_index_map[s[end]] >= start: # Move the start to the right of the last occurrence of the current character start = char_index_map[s[end]] + 1 # Update the last seen index of the current character char_index_map[s[end]] = end # Update the maximum length of the substring found so far max_length = max(max_length, end - start + 1) return max_lengthResultsImage from the authorNot the best, but it gets things done and passes all the tests. Can it solve a hard problem? (Note that the model could have been trained on these problems already, but its still impressive.)PromptProblem description: Regular Expression MatchingAnswerclass Solution(object): def isMatch(self, s, p): """ :type s: str :type p: str :rtype: bool """ m, n = len(s), len(p) # Create a 2D array to store the results of subproblems dp = [[False] * (n + 1) for _ in range(m + 1)] # Base case: empty string matches empty pattern dp[0][0] = True # Handle patterns like a*, a*b*, etc., which can match an empty string for j in range(1, n + 1): if p[j - 1] == '*': dp[0][j] = dp[0][j - 2] # Fill the dp table for i in range(1, m + 1): for j in range(1, n + 1): if p[j - 1] == '.' or p[j - 1] == s[i - 1]: dp[i][j] = dp[i - 1][j - 1] elif p[j - 1] == '*': dp[i][j] = dp[i][j - 2] # '*' matches zero occurrences if p[j - 2] == '.' or p[j - 2] == s[i - 1]: dp[i][j] = dp[i][j] or dp[i - 1][j] # '*' matches one or more occurrences # The result is in dp[m][n] return dp[m][n]ResultsImage from the authorAfter testing it on 1020 problems on LeetCode, I have the impression that it was trained on all problems, so its not the best way to evaluate it. However, its still worth sharing. Additionally, it always chooses to provide a Python solution, even when not specifically asked for one. But lets try something different. I asked the model to optimize the code to minimize resource consumption as much as possible.Image from the authorI didnt expect that, but when I asked the model to improve the performance, it even fails in the simplest cases.Lets now move to the next test, Code Repair. I will be providing the model with some code containing errors generated by Claude. I asked the model to generate some hard cases to fix, and lets see how it performs.Code RepairPromptThe following code contains several bugs. Please identify and fix them, and provide an explanation for each bug you find.class AsyncTaskQueue: def __init__(self): self.tasks = [] self.running = False self.lock = threading.Lock() self.workers = [] async def add_task(self, task): async with self.lock: self.tasks.append(task) if not self.running: self.start_processing() async def start_processing(self): self.running = True while self.tasks: task = self.tasks.pop(0) worker = asyncio.create_task(self.process_task(task)) self.workers.append(worker) if len(self.workers) >= 3: done, pending = await asyncio.wait(self.workers) self.workers = list(pending) self.running = False async def process_task(self, task): result = await task() return result AnswerCertainly! Let's go through the provided code and identify the issues, then correct them. Here are the main problems and their fixes:Issues IdentifiedThreading Lock Usage with Asyncio:The threading.Lock is not suitable for use with asyncio. Instead, you should use asyncio.Lock.Async Context Manager for Lock:The async with statement is used incorrectly with threading.Lock. It should be used with asyncio.Lock.Starting Processing:The start_processing method should be called in a non-blocking manner. Since start_processing is an async method, it should be awaited properly.Task Management:The start_processing method should handle task processing more efficiently. The current implementation might not handle task completion correctly.Worker Management:The workers list should be managed more carefully to ensure that tasks are completed before marking the queue as not running.Corrected CodeHere's the corrected version of the code:import asyncioimport threadingclass AsyncTaskQueue: def __init__(self): self.tasks = [] self.running = False self.lock = asyncio.Lock() self.workers = [] async def add_task(self, task): async with self.lock: self.tasks.append(task) if not self.running: self.running = True asyncio.create_task(self.start_processing()) async def start_processing(self): while True: async with self.lock: if not self.tasks: self.running = False break task = self.tasks.pop(0) worker = asyncio.create_task(self.process_task(task)) self.workers.append(worker) if len(self.workers) >= 3: done, pending = await asyncio.wait(self.workers, return_when=asyncio.FIRST_COMPLETED) self.workers = list(pending) async def process_task(self, task): result = await task() return result# Example usageasync def example_task(): await asyncio.sleep(1) print("Task completed") return "Result"async def main(): queue = AsyncTaskQueue() await queue.add_task(example_task) await queue.add_task(example_task) await queue.add_task(example_task) await queue.add_task(example_task) await queue.add_task(example_task) await queue.add_task(example_task)asyncio.run(main())Explanation of FixesAsyncio Lock:Replaced threading.Lock with asyncio.Lock to ensure compatibility with asynchronous operations.Async Context Manager for Lock:Used async with self.lock: to properly manage the lock in an asynchronous context.Starting Processing:Changed the logic to start processing in a non-blocking manner by using asyncio.create_task(self.start_processing()) inside add_task.Task Management:Modified start_processing to continuously check for tasks and process them until no tasks are left. It breaks out of the loop when there are no more tasks and sets self.running to False.Worker Management:Ensured that workers are managed correctly by using asyncio.wait with return_when=asyncio.FIRST_COMPLETED to handle task completion efficiently.Example UsageThe example usage demonstrates how to add tasks to the AsyncTaskQueue and run them asynchronously. The main function adds several tasks to the queue, and they are processed concurrently with a maximum of 3 workers at a time.This corrected and optimized code should perform better and be more reliable.Results and feedbackClaude evaluates the models response as 7.5/10. with this feedback:While the model provided a functionally correct solution, it delivered a tutorial-grade fix rather than a production-grade implementation. This indicates that for complex concurrent programming tasks, human expertise is still crucial for robust system design.After experimenting with the model, it gets the job done, but often requires follow-up prompts to improve the response. While it doesnt struggle with understanding the task at hand and can correct most critical bugs, it still lacks some of the strengths seen in models like ChatGPT and Claude, particularly in areas such as error handling, resource management, and optimization. I believe these may be the models key weaknesses.Specialized LLMs: A New Era in AIIts fascinating to witness the emergence of relatively small, domain-specific language models that can run locally. I believe were entering an exciting new era of specialized LLMs, and this is just the beginning. While the concept isnt new, were finally seeing models that truly excel in specific domains.The combination of RAG (Retrieval-Augmented Generation) and specialist LLMs could define the AI landscape in the coming months/years. Were likely to see more powerful coding-focused models from industry leaders like OpenAI and Anthropic. Programming is perhaps one of the most natural domains for specialized AI, and we might soon see even more focused models, imagine LLMs specifically optimized for DevOps or front-end development!Dont feel overwhelmed by these rapid advances. Yes, it can be somewhat daunting to see LLMs mastering skills that traditionally took years to develop. Coding, which has challenged humanity for decades, is being transformed before our eyes. But rather than seeing this as an endpoint, we should view it as an opportunity for growth and innovation.Whether the current wave of LLM advances slows down in the coming years or this is merely the beginning of a longer journey, our response should remain the same: stay curious, keep learning, and never stop innovating. The future of technology is being written right now, and we all have a part to play in shaping it.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 39 Views
  • TOWARDSAI.NET
    I Built an OpenAI-Style Swarm That Runs Entirely on My Laptop. Heres How.
    LatestMachine LearningI Built an OpenAI-Style Swarm That Runs Entirely on My Laptop. Heres How. 0 like November 18, 2024Share this postAuthor(s): Vatsal Saglani Originally published on Towards AI. A developers journey into creating a privacy-focused, cost-effective multi-agent system using Python and open-source LLMs.This member-only story is on us. Upgrade to access all of Medium.Image by ChatGPTYou know how in sci-fi movies, AI systems seamlessly collaborate to solve complex problems? This always used to fascinate me as a kid. When I started learning about machine learning and deep learning in my pre-final year of undergrad in 201718, I was amazed by the potential of these models. It was fascinating how these models could learn to differentiate between dog breeds, predict the price of a house, classify tweet sentiments, and even play games like chess and Go.Attention, Tokenization, Transformers, and GPT were the new buzzwords, and I was fascinated by how these models could be used to add intelligence to applications. After graduating, when I started working as a data scientist, I gained first-hand experience training, fine-tuning, and deploying models for a wide range of applications. Those days we used to train/fine-tune models for individual tasks or features. We used to discuss how meta-learning, few-shot learning, and few-shot prompting could help us build more intelligent models. But we didnt exactly know how could we build a system that could learn to do multiple tasks without being explicitly told how to do so.Then in 2020, Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 12 Views
  • TOWARDSAI.NET
    The Secret to Unlocking Deeper SWOT Analysis with AI (The Code That Started It All and How I Took It to the Next Level)
    The Secret to Unlocking Deeper SWOT Analysis with AI (The Code That Started It All and How I Took It to the Next Level) 0 like November 18, 2024Share this postAuthor(s): Mukundan Sankar Originally published on Towards AI. I thought I had cracked the code for automated SWOT analysis until Neuro-symbolic AI revealed an entirely new layer of depth.This member-only story is on us. Upgrade to access all of Medium.Image Depicting the Evolution of SWOT Analysis using AI and Neuro-symbolic AI using Napkin AI created by the AuthorYou know that feeling when you think youve figured something out, only to discover youve only scratched the surface? Thats exactly what happened to me with SWOT analysis and AI. Not too long ago, I published a blog post comparing my original GPT-3.5-based SWOT analysis tool with a new, more advanced approach Neuro-symbolic AI.From surface-level summaries to actionable intelligence discover how this AI upgrade breathed new life into mypub.towardsai.netIn that post, I dove into how neurosymbolic AIs blend of neural networks and symbolic reasoning brought a whole new level of insight to SWOT analysis. It wasnt just faster; it was smarter. It didnt just spit out lists of strengths and weaknesses; it wove them into a strategic narrative, highlighting connections and implications that GPT-3.5 simply couldnt reach.If you are wondering who I am and why I am talking about this, feel free to check out my About Me Blog below:https://medium.com/about-me-stories/about-me-mukundan-sankar-2c66b289bfccThat comparison post was well-received, but something didnt sit right. I kept thinking: What if I shared the whole journey? Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 12 Views
  • TOWARDSAI.NET
    How To Learn Any Language Efficiently With AI?
    LatestMachine LearningHow To Learn Any Language Efficiently With AI? 0 like November 17, 2024Share this postLast Updated on November 18, 2024 by Editorial TeamAuthor(s): Nicolas MARTIN Originally published on Towards AI. Learning a language is often difficult and requires dozens of classes until reaching a reasonable level. Unfortunately, a lot of people in the world cannot afford to have quality classes. In addition, we can learn a language from books, but we dont have feedback about our pronunciation or actual level.What if AI is a great solution for learning any language efficiently and for free?This guide will teach us practical solutions to learn from chatbots in any language, using both textual and vocal features.AI-generated illustration (Grok)As I live in India, we will take the example of Tamil, one of the most difficult languages for Europeans and Americans to learn. The language I master the most is French. I will adapt this article to any language using the expressions [language_to_learn] for the language we want to learn and [my_language] for the primary language we already know. Note that you dont have to learn a language using English; you can use this guide based on any language you want.All the solutions are completely free and accessible with a smartphone, although a computer is recommended for reading long text and typing fast with any keyboard layout.1. Ask some questions about the language history and geography (5 min)A language never came out of the void; it always has a long story that is worth knowing to have a complete picture of its evolution and geography.This will help us connect the dots between the language and the people using it daily. A lack of general culture about a language can lead to misunderstanding and incomprehension between us and the people. We can be more specific by asking about the main differences in our culture, and the AI will answer more accurately.Therefore, we can first ask the question:In 300 words, explain the origin of [language_to_learn], its geographic limitations, and three critical things worth knowing about the [language_to_learn] culture today that differs from my [my_language] culture.The Tamil culture and its main differences with the French one (generated with ChatGPT)2. Make a learning plan for one month (5 min)Now that we have general knowledge about the language we want to learn, we can prepare a learning plan according to the main rules.Learning a language needs practice, and we can define the limits to have a realistic learning plan, such as the available time or if we want to know both writing and speaking. I recommend learning both, as writing and speaking create a strong connection with each other.To learn a language efficiently, discipline is vital, and learning 30 minutes every day, 6 times per week, is better than 3 classes of 2 hours per week because memorization works better daily.The smartphone version of ChatGPT is very useful as you can ask for the pronunciation of the words and get valuable tips. The free version is limited to 10 minutes, but it is enough for us to learn daily.How can I learn [language_to_learn] to reach a good basis in one month? Give a learning plan with the main steps, knowing that I have 30 minutes per day and I have some conversational tools with AI to check my pronunciation (with ChatGPT vocal) and my knowledge. Keep in mind I must learn to write and speak. Organize it in a table.The learning plan was obtained with Claude.ai (text) and transferred to the table with ChatGPT.As chatbot sessions are limited in request per page, I recommend keeping the learning plan structure and copying/pasting it every day on a new page to be able to handle questions and answers and maintain track of the conversation.Daily session structure and resources.Optional: Use tips and resources given by the ChatBot (see example above) if they are relevant to our learning habits.AI-generated picture of Nelson Mandela with one of his most famous quotes3. Follow your learning plan every day (30 min)The following steps are straightforward.Once you copy the learning plan in any chatbot, ask:Give the material for the Day [n] lesson.Tamil lesson 1 example with ChatGPTAbout the vocal chatbot on a smartphone (ex, ChatGPT), click on the bottom right button and first ask:What is the [content of the lesson n]? Check my pronunciation.Here is the vocal option in ChatGPT at the bottom right cornerHere is an example of Tamil. I should have asked in French, but I have done my best to do it in English for a better understanding.Keep in mind that the pronunciation function is very accurate in many languages. I have also tested the pronunciation in German, one of the languages I know the best, and the pronunciation advice was very accurate.In some cases, like in English, we can ask to learn a language with a specific accent so that we can choose the proper pronunciation we want.Here is a funny example of how ChatGPT can speak with specific English accents, including Japanese:https://x.com/minchoi/status/1857467047493480475?s=46&t=MxrSpRHarlH_X7ulzALqHwAbout the writing, we can take a picture of what we wrote and ask the chatbot whether it is well written or not.(click on + and Take Photo at the bottom left corner) Is this correctly written in [language_to_learn]?An example of vowel manual writing verification in Tamil.As you can see, the chatbot detected some mistakes in my writing style in some characters that I can correct to reiterate the process until the writing is fine.4. Know the major pronunciation difficulties from the beginningSome fundamental pronunciations are very different from one language to another, and it is essential to know them at an early learning stage.Give the 3 most difficult things in the [language_to_learn] pronunciation for someone who only speaks [my_language].Most challenging aspects of Tamil pronunciation for someone who speaks French5. Additional thoughtsThe best learning solution is to have a teacher. This guide is helpful for people who cant afford to have a teacher and want another efficient way to learn a language.A teacher is better because we have to go to classes and get direct feedback for our evaluation. Using AI grants us feedback but requires much more discipline to study every day, which is difficult overall when it is free (paying to learn something gives us more credit and commitment).Human interaction is still essential to learning a language. It is great to have AI tools to learn many things faster, but most of us are social beings with social needs. Discussing with people always plays an important role.Speaking about AI tools, hundreds of existing ones are worth testing. Here is a list from Theres An AI For That, including Lingolette or Jarggin.AI is evolving very fast, and learning solutions based on AI will probably get better and better. It also applies to translation AI tools where instant translation is already possible.ConclusionLearning a language is a tedious activity that requires patience, effort, and regularity. Even though a teacher is the best solution, not everyone can afford one and AI can be a good alternative. Indeed, more and more text and vocal AIs can organize a learning plan and evaluate our knowledge, including our pronunciation or writing style. Some might argue that learning a language is unnecessary, given the rapid advancements in AI, but isnt the ability to learn and speak languages essential to our humanity, just as chess remains beloved by human players despite AI surpassing human capabilities?Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 12 Views
  • TOWARDSAI.NET
    [AI/ML] Spatial Transformer Networks (STN) Overview, Challenges And Proposed Improvements
    Author(s): Shashwat Gupta Originally published on Towards AI. The modification of dynamic spatial information through spatial transformer networks (STNs) allows models to handle transformations such as scaling and rotation for subsequent tasks. They enhance recognition accuracy by enabling models to focus on essential visual regions with minimal dependence on pooling layers. This blog delves into the functional advantages and disadvantages of STNs, despite the extensive coverage in multiple studies. We also examine P-STN, a potential upgrade from 2020 including enhanced transformations and increased efficiency. The construction of more adaptable and precise machine learning models relies on an understanding of STNs and their advancements.Disclaimer: Much of this section is inspired by the original paper on Spatial Transformer Networks [1,2,3]Spatial Transformer Networks (STN):STNs (Spatial Transformer Networks), by Max Jaderberg et al., are modules that can learn to adjust the spatial information in a model, making it more resistant to changes like warping. Before STNs, achieving this required many layers of Max-Pooling. Unlike pooling layers, which have fixed and small areas they examine, spatial transformers can dynamically change an image or feature map by applying different transformations for each input. These transformations affect the entire feature map and can include scaling, cropping, rotating, and bending.This capability allows networks to focus on important parts of an image (a process called attention) and adjust these parts to a standard position, making it easier to recognize them in later layers. STNs expand on the idea of attention modules by handling spatial transformations. They can be trained using regular back-propagation, which means the entire model can be trained all at once. STNs are useful for various tasks, including image classification, locating objects, and managing spatial attention.Figure 1 : STN (Source: https://arxiv.org/pdf/1612.03897.pdf)The STN Consists of the following 3 parts:Localisation NetGrid GeneratorSampler1. Localisation Network:It takes the input feature map U R HWC , and outputs the parameters of transformation ( = floc(U)). It can take any form but should include a final regressor layer to produce the transformation parameters 2. Parametrised Grid Sampling :The output pixels are computed by applying a sampling kernel centred at each location of the input feature map. The only constraint is that the transformation should be different wrt the parameters to allow for back-propagation. A good heuristic is to predict the transformation parametrised in a low dimensional way so that the complexity of the task assigned to the localisation network is reduced, and it can also learn about the target grid representation. e.g. if _ = M_B, where B is the target representation. Thus, it is also possible to learn and B.In our case, we analyze 2D transformations, which the following equation can overall summarise:3. Differentiable Image Sampling:Differentiable Image Sampling To perform a spatial transformation of the input feature map, a sampler must take the set of sampling points T(G), along with the input feature map U, and produce the sampled output feature map V . Each (x s i , ys i ) coordinate in (G) defines the spatial location in the input where a sampling kernel is applied to get the value at a particular pixel in the output V . This can be written as:where and are the parameters of a generic sampling kernel k() which defines the image interpolation (e.g. bilinear), U^c_{nm} is the value at location (n, m) in channel c of the input, and V^c_i is the output value for pixel i at location (x^t_i , y^t_i ) in channel c. Note that the sampling is done identically for each channel of the input, so every channel is transformed identically (this preserves spatial consistency between channels).In theory, any sampling kernel can be used, as long as (sub-)gradients can be defined with respect to x^s_i and y^s_i . For example, using the integer sampling kernel reduces the above equation to:where [x+ 0.5] rounds x to the nearest integer and () is the Kronecker delta function. This sampling kernel equates to just copying the value at the nearest pixel to (x s i , ys i ) to the output location (x t i , yt i ). Alternatively, a bilinear sampling kernel can be used, givingTo allow backpropagation of the loss through this sampling mechanism, we can define the gradients with respect to U and G. For bilinear sampling above equation, the partial derivatives areThis gives us a (sub-)differentiable sampling mechanism, allowing loss gradients to flow back not only to the input feature map but also to the sampling grid coordinates and, therefore, back to the transformation parameters and localization network since x^{s}_i / and y^{s}_{i}/ can be easily derived. Due to discontinuities in the sampling functions, sub-gradients must be used. This sampling mechanism can be implemented very efficiently on GPU by ignoring the sum over all input locations and instead just looking at the kernel support region for each output pixel.For better warping, the STNs can be cascaded by passing the output of one STN to the next (as in [2]) and with additional input to condition (as in [1])Pros and cons of STNs :The overall pros of STNs are :STNs are very fast, and the application does not require making many modifications to the downstream modelThey can also be used to downsample or oversample a feature map (downsampling with fixed, small support might lead to an aliasing effect)Multiple STNs can be used. The combination can be in Series (for more complex feature learning, with the input of one STN going into another, with or without an unwarped conditional input.Parallel combinations are effective when there are more than one parts to focus on in images (It was shown that of 2 STNs used on the CUB-2002011 bird classification dataset, one became head-detector and the other became body-detector)However, STNs are notoriously known to suffer from the following 2 defects :1. Boundary effect arises as the image is propagated and not the geometric information (e.g. if an image is rotated, STNs can fix the rotation, but they do not fix the degraded boundary effects like cut corners etc.). This could be solved by boundary aware sampling:2. Single STN application is insufficient to learn complex transformations This could be solved by hierarcial cascaded STNs (i.e. STNs in series) with multi-scale transformations.3. Training Difficulty: Hard to train due to sensitivity to small mis-predictions in transformation parameters solved in P-STN (below)4. Sensitivity to Errors: Mis-predicted transformations can lead to poor localization, adversely affecting downstream tasks solved in P-STN (below)P-STN : an improvement over STNProbabilistic Spatial Transformer Networks (P-STN) by Schwbel et al. [7], address the limitations 3 and 4 by introducing a probabilistic framework to the transformation process. Instead of predicting a single deterministic transformation, P-STN estimates a distribution over possible transformations (probabilistic Transformation).Figure 2 : The P-STN pipeline. From the observed image I, a distribution of transformations is estimated. Samples from this distribution are applied to the observed image to produce augmented samples, which are fed to a classifier that averages across samples. In the deterministic STN case, the localizer only computes one transformation (I), which can be thought of as the maximum likelihood solution. Instead of the multiple transformation samples, we obtain a single T_{}^{I} in this caseThis probabilistic approach offers several key improvements:Robustness Through Marginalization:Multiple Transformations: By sampling multiple transformations from the estimated distribution, P-STN effectively looks at the input from various perspectives. This marginalization over transformations mitigates the impact of any single mis-predicted transformation.Smoother Loss Landscape: The integration over multiple transformations results in a more stable and smoother loss landscape, facilitating easier and more reliable training.2. Enhanced Data Augmentation:Learned Augmentations: The stochastic transformations serve as a form of learned data augmentation, automatically generating diverse training samples that improve the models generalization capabilities.Improved Downstream Performance: This augmentation leads to better classification accuracy, increased robustness, and improved model calibration.3. Applicability to Diverse Domains:While initially designed for image data, P-STNs probabilistic nature allows it to generalize effectively to non-visual domains, such as time-series data, further demonstrating its versatility.The mathematical equations for the changes are as follows:Illustrative Benefits:Reduced Sensitivity to Transformation Errors:STN LossNegative Log-Likelihood of a Single TransformationP-STN LossAverage Negative Log-Likelihood Over Multiple TransformationsBy averaging over multiple transformations, P-STN reduces the impact of any single erroneous transformation, leading to a more stable and reliable training process.Improved Calibration:Calibration Error_STN > Calibration Error_P-STNP-STNs approach of considering multiple transformations results in better-calibrated probabilities, as evidenced by lower calibration errors compared to STN.Probabilistic Spatial Transformer Networks enhance the original STN framework by introducing a distribution over possible spatial transformations. This probabilistic approach leads to more robust training, effective data augmentation, improved classification performance, and better-calibrated models. The integration of variational inference and Monte Carlo sampling in P-STN provides a principled way to handle transformation uncertainties, making it a significant advancement over traditional STNs.I write about technology, investing and books I read. Here is an index to my other blogs (sorted by topic): https://medium.com/@shashwat.gpt/index-welcome-to-my-reflections-on-code-and-capital-2ac34c7213d9References :Paper: IC-STN: https://arxiv.org/pdf/1612.03897.pdfSTN: https://paperswithcode.com/method/stnVideo: https://www.youtube.com/watch?v=6NOQC_fl1hQ&t=162s (with slides, CV reading group resources)Paper: Lenc and A. Vedaldi. Understanding image representations by measuring their equivariance and equivalence. CVPR, 2015 (defines affine invariance, equivariance, and equivalence criterion)STN PyTorch Implementation: https://pytorch.org/tutorials/ intermediate/spatial_transformer_tutorial.htmlScatter Nets: https://paperswithcode.com/paper/ invariant-scattering-convolution-networks#codeP-STN: https://backend.orbit.dtu.dk/ws/portalfiles/portal/280953750/2004.03637.pdfJoin thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 16 Views
  • TOWARDSAI.NET
    [AI/ML] Keswanis Algorithm for 2-player Non-Convex Min-Max Optimization
    Author(s): Shashwat Gupta Originally published on Towards AI. Keswanis Algorithm introduces a novel approach to solving two-player non-convex min-max optimization problems, particularly in differentiable sequential games where the sequence of player actions is crucial. This blog explores how Keswanis method addresses common challenges in min-max scenarios, with applications in areas of modern Machine Learning such as GANs, adversarial training, and distributed computing, providing a robust alternative to traditional algorithms like Gradient Descent Ascent (GDA).Problem Setting:We consider differentiable sequential games with two players: a leader who can commit to an action, and a follower who responds after observing the leaders action. Particularly, we focus on the zero-sum case of thisproblem which is also known as minimax optimization, i.e.,Unlike simultaneous games, many practical machine learning algorithms, including generative adversarial net-works (GANs) [2] [3] , adversarial training [4] and primal-dual reinforcement learning [5], explicitly specify theorder of moves between players and the order of which player acts first is crucial for the problem. In particular, min-max optimisation is curcial for GANs [2], statistics, online learning [6], deep learning, and distributed computing [7].Figure 1 : Non-Convex function Visualisation (Source: https://www.offconvex.org/2020/06/24/equilibrium-min-max/)Therefore, the classical notion of local Nash equilibrium from simultaneous games may not be a proper definition of local optima for sequential games since minimax is in general not equal to maximin. Instead, we consider the notion of local minimax [8] which takes into account the sequential structure of minimax optimization.Models and Methods:The vanilla algorithm for solving sequential minimax optimization is gradient descent-ascent (GDA), where both players take a gradient update simultaneously. However, GDA is known to suffer from two drawbacks.It has undesirable convergence properties: it fails to converge to some local minimax and can converge to fixed points that are not local minimax [9] [10]GDA exhibits strong rotation around fixed points, which requires using very small learning rates[11] [12] toconverge.Figure 2 : A Visualisation of GDA (Source: https://medium.com/common-notes/gradient-ascent-e23738464a19)Recently, there has been a deep interest in min-max problems, due to [9] and other subsequent works. Jin et al. [8] actually provide great insights to the work.Keswanis Algorithm:The algorithm essentially makes response function : maxy{R^m} f (., y) tractable by selecting y-updates (maxplayer) ingreedy manner by restricting selection of updated (x,y) to points along sets P(x,y) (which is defined as set of endpoints of paths such that f(x,.) is non-decreasing). There are 2 new things that this algorithm does to makecomputation feasible:Replace P(x, y) with P (x, y) (endpoints of paths along which f(x,.) increases at some rate > 0 (which makesupdates to y by any greedy algorithm (as Algorithm 2) feasible)Introduce a soft probabilistic condition to account for discontinuous functions.Experimental Efficacy:A Study [16] done at EPFL (by Shashwat et al., ) confirmed the efficacy of Keswanis Algorithm in addressing key limitations of traditional methods like GDA (Gradient Descent Ascent) and OMD (Online Mirror Descent), especially in avoiding non-convergent cycling. The study proposed following future research avenues:Explore stricter bounds for improved efficiency.Incorporate broader function categories to generalize findings.Test alternative optimizers to refine the algorithms robustness.The full study for different functions is as follows:Keswani's Algorithm for non-convex 2 player min-max optimisationKeswani's Algorithm for non-convex 2 player min-max optimisation www.slideshare.netReferences:[1] V. Keswani, O. Mangoubi, S. Sachdeva, and N. K. Vishnoi, A convergent and dimension-independent first-order algorithm for min-maxoptimization, arXiv preprint arXiv:2006.12376, 2020.[2] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarialnetworks, Communications of the ACM, vol. 63, no. 11, pp. 139144, 2020.[3] M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, pp. 214223, 2017.[4] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, Towards deep learning models resistant to adversarial attacks, arXivpreprint arXiv:1706.06083, 2017.[5] W. S. Cho and M. Wang, Deep primal-dual reinforcement learning: Accelerating actor-critic using bellman duality, arXiv preprintarXiv:1712.02467, 2017.[6] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.[7] J. Shamma, Cooperative Control of Distributed Multi-Agent Systems. Wiley & Sons, Incorporated, John, 2008.[8] C. Jin, P. Netrapalli, and M. Jordan, What is local optimality in nonconvex-nonconcave minimax optimization? pp. 48804889, 2020.[9] Y. Wang, G. Zhang, and J. Ba, On solving minimax optimization locally: A follow-the-ridge approach, arXiv preprint arXiv:1910.07512,2019.[10] C. Daskalakis and I. Panageas, The limit points of (optimistic) gradient descent in min-max optimization, Advances in neural informationprocessing systems, vol. 31, 2018.[11] L. Mescheder, S. Nowozin, and A. Geiger, The numerics of gans, Advances in neural information processing systems, vol. 30, 2017.[12] D. Balduzzi, S. Racaniere, J. Martens, J. Foerster, K. Tuyls, and T. Graepel, The mechanics of n-player differentiable games, pp. 354363,2018.[13] D. M. Ostrovskii, B. Barazandeh, and M. Razaviyayn, Nonconvex-nonconcave min-max optimization with a small maximization domain,arXiv preprint arXiv:2110.03950, 2021.[14] J. Yang, N. Kiyavash, and N. He, Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems,Advances in Neural Information Processing Systems, vol. 33, pp. 11531165, 2020.[15] G. Zhang, Y. Wang, L. Lessard, and R. B. Grosse, Near-optimal local convergence of alternating gradient descent-ascent for minimaxoptimization, pp. 76597679, 2022.[16] S. Gupta, S. Breguel, M. Jaggi, N. Flammarion Non-convex min-max optimisation, https://vixra.org/pdf/2312.0151v1.pdfJoin thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    Evaluating and Monitoring LLM Agents: Tools, Metrics, and Best Practices
    Evaluating and Monitoring LLM Agents: Tools, Metrics, and Best Practices 0 like November 17, 2024Share this postAuthor(s): Chinmay Bhalerao Originally published on Towards AI. This blog includes the tools that you can use to monitor and assess the performance of the Agentic approachThis member-only story is on us. Upgrade to access all of Medium.Image created by author, Background image by Hollywood reporterImagine a team of virtual assistants collaborating to handle customer support queries seamlessly. Each assistant specializes in a specific task, ensuring accurate, efficient, and optimized responses. This is the essence of the agentic approach in LLMs.RAG or Retrieval-Augmented Generation pipelines are now integral parts of LLM applications. There are tools like Arize Phoenix, ragas, TrueLens, etc. that use a wide variety of metrics for the evaluation of RAGs. After the advancements in RAG pipelines, the Agentic approach has become a new approach for developing LLM applications. Everyone is eager to convert their existing or new products into agentic workflows. Its exciting to see fully capable LLMs who can interact with each other, engage in proper group chats, and collaboratively arrive at optimized and comprehensive solutions, with or without human intervention.Agents are orchestration platforms or tools in LLMs, designed to combine multiple LLMs or even with no LLMs to perform tasks with little to no human intervention. Each agent works autonomously on individual tasks but also can discuss, ask, brainstorm, and refine their work. We can use any LLM to create an Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 12 Views
  • TOWARDSAI.NET
    Build a Chat-With-Document Application Using Python
    LatestMachine LearningBuild a Chat-With-Document Application Using Python 0 like November 17, 2024Share this postLast Updated on November 17, 2024 by Editorial TeamAuthor(s): Krishan Walia Originally published on Towards AI. Build your own Chat-With-Document Application using Python in the most guided and intuitive way.This member-only story is on us. Upgrade to access all of Medium.Not a member?Access the full article for free here.Imagine that you have to read and understand a large document about a completely bizarre topic, a topic thats completely out of your domain of knowledge, but due to some circumstances, you have to fully understand it, that too in a short span.Photo by Bimbingan Islam on UnsplashOr imagine being in a situation where you have to go through a large document to revise the facts and information, that you might have learned earlier.In all those cases, a chat-with-document application can be really helpful. It can generate summaries, highlight important information from the document, or even explain certain advanced topics in simpler and intuitive ways.And what about building your own, and even more what about adding such a project to your resume? Interesting, no?In this article, we will be building our own Chat-With-Document application in Python using certain packages, and trust me, its gonna be really easy and fun!The TOC of the article is given below, Chat With Document Application | Python | Python Project | LLM | AI | Installing the Packages Extracting text from the PDF file Splitting Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 12 Views
  • TOWARDSAI.NET
    Building Multi-Agent AI Systems From Scratch: OpenAI vs. Ollama
    Building Multi-Agent AI Systems From Scratch: OpenAI vs. Ollama 0 like November 17, 2024Share this postLast Updated on November 17, 2024 by Editorial TeamAuthor(s): Isuru Lakshan Ekanayaka Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.Source: https://langfuse.com/blog/2024-07-ai-agent-observability-with-langfuseIn the dynamic realm of artificial intelligence, multi-agent systems have emerged as a transformative approach for addressing complex tasks through collaboration and specialization. By distributing responsibilities among distinct agents such as summarizing texts, generating content, and ensuring data privacy these systems enhance efficiency, accuracy, and reliability. This comprehensive guide explores the creation of two robust multi-agent AI systems from scratch using Python: one leveraging OpenAIs GPT-4 model and the other utilizing Ollamas open-source LLaMA 3.2:3b model. Both implementations are designed without relying on existing agent frameworks, offering a foundational understanding for developers eager to master AI agent architectures.IntroductionOpenAI-Based Multi-Agent SystemOllama-Based Multi-Agent SystemConclusionGitHub Repositories and InstallationMulti-agent systems in AI involve multiple specialized agents working collaboratively to achieve intricate objectives. By distributing tasks among agents with distinct roles such as summarizing texts, generating content, and ensuring data privacy these systems enhance efficiency, accuracy, and reliability. This guide delves into the development of two such systems: one powered by OpenAIs GPT-4 and the other by Ollamas LLaMA 3.2:3b model. Both implementations prioritize transparency and educational value, enabling beginners to grasp the fundamentals of AI agent construction without relying on Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    AI-Powered Art Generator: Opensource And Free
    LatestMachine Learning AI-Powered Art Generator: Opensource And Free 0 like November 16, 2024Share this postLast Updated on November 16, 2024 by Editorial TeamAuthor(s): Anoop Maurya Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.Image Generated By Author using this ToolIts 2 AM, and youre deep in your world of code, brainstorming ideas, or maybe just procrastinating. A random thought hits you: What if I could visualize this weird concept? A cyberpunk city floating on a massive jellyfish or an alien invasion reimagined as a Renaissance painting. You know its impossible to draw it yourself, but what if you could just describe it and watch the magic happen?Enter AI-powered art generators the ultimate tool for nerds with a vivid imagination and a love for tech. With models like Stable Diffusion, you can now create stunning, surreal, or downright absurd visuals by typing out a text prompt. And the best part? Its all done with the power of open-source AI, accessible to anyone whos ready to nerd out and experiment.Sure, there are plenty of ready-to-use tools out there, but theres something uniquely satisfying about building your own system. Heres why its worth it: Total Control: You decide how it works customize outputs, tweak settings, and experiment with new features. Open Source: Its all yours to explore, modify, and make better. No locked Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    ChatGPT Gets Windows App
    ChatGPT Gets Windows App 0 like November 16, 2024Share this postLast Updated on November 16, 2024 by Editorial TeamAuthor(s): Get The Gist Originally published on Towards AI. Plus: Elon Musk Sues OpenAIThis member-only story is on us. Upgrade to access all of Medium.Welcome to Get The Gist, where every weekday we share an easy-to-read summary of the latest and greatest developments in AI news, innovations, and trends all delivered in under 5 minutes! In todays edition:ChatGPT Desktop App Launches for WindowsElon Musk Sues OpenAI Over Alleged Breach of Founding MissionAnthropic Partners with US Government to Secure AI Against Nuclear RisksNew Gemini Model Beats ChatGPT-4oAnd more AI news.Image by: MSPowerUserThe Gist: ChatGPT is now available for all Windows users via the Microsoft Store, while macOS gains beta integration with developer tools for paid users, expanding its usability across platforms.Key Details:Windows users can now download ChatGPT for free, featuring web search, text-to-image tools, and lifelike voice interactions.macOS users can test a beta feature that connects ChatGPT with tools like VS Code and Terminal, enabling seamless workflow integration.The Windows app offers a shortcut (Alt + Space) for quick access, while the macOS app requires a direct download from OpenAIs website.OpenAI plans to expand the macOS apps compatibility with more software in the near future.Image by: BloombergThe Gist: Elon Musk has filed a lawsuit against OpenAI, accusing it of deviating from its founding Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    Cubic Roots-Fit a Quadratic Between a Turning Point And Midpoint!
    LatestMachine LearningCubic Roots-Fit a Quadratic Between a Turning Point And Midpoint! 0 like November 16, 2024Share this postAuthor(s): Greg Oliver Originally published on Towards AI. A Root Approximation Tool Kit Mixing and Matching Polynomial ArchitecturesGenetic Cubic Architectural DimensionsThis member-only story is on us. Upgrade to access all of Medium.This post presents a novel Cubic-Quadratic function matchup for finding Cubic roots. It exploits the little publicised fact that the Midpoint between 2 adjacent roots of a reduced Cubic when multiplied by -2 gives us the 3rd root!This is related to the sum of the factors = Coefficient B of x. In the example B=0 being a reduced Cubic.Besides being graphically intuitive the adopted Quadratic function greatly simplifies Cubic function redesign with varying Constants D, because its a lot easier to find Quadratic roots with changing Constants c than Cubic roots with changing Constants D.This post assumes math at the year 12 level.Before doing a couple of examples, lets do a brief recap on genetic Cubic architecture.Cubic Architecture RecapThe header graph shows reduced Cubic y=Ax+Cx+D and its genetic dimensions shown in black. It is rotationally symmetrical about its Inflection Point Ip(0, y)=Constant D; (Imaginary propellor shaft ?):):). It has y=Ip(y) intercepts as follows:Int A(x)= SqRt[-C/A] and Int B(x)= + SqRt[-C/A] with Midpoints:Midpoint (Int A : Ip(0, D)=Int A(x)-SqRt[-C/4A] and +SqRT[-C/4A] (not shown)And Turning Points Tp(x)=+-SqRt[-C/3A]Roots Rt 1, Rt 2 with Root Midpoint; Mid Point (Rt Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 11 Views
  • TOWARDSAI.NET
    Web-LLM Assistant: Bridging Local AI Models With Real-Time Web Intelligence
    Web-LLM Assistant: Bridging Local AI Models With Real-Time Web Intelligence 0 like November 16, 2024Share this postAuthor(s): Isuru Lakshan Ekanayaka Originally published on Towards AI. Top highlightThis member-only story is on us. Upgrade to access all of Medium.In the dynamic realm of artificial intelligence, the ability to access and synthesize real-time information is paramount. Traditional large language models (LLMs) like ChatGPT excel in generating human-like text based on extensive training data. However, their knowledge is static, often limited to information available up to their last update. Enter Web-LLM Assistant, an innovative open-source project designed to overcome this limitation by integrating local LLMs with real-time web searching capabilities. This comprehensive guide delves into the functionalities, installation process, and practical demonstrations of Web-LLM Assistant, inspired by its GitHub repository .Image sourceIntroductionWhat is Web-LLM Assistant?Key FeaturesInstallation GuideUsage InstructionsDemonstration WalkthroughConfiguration OptionsDependenciesContributing to Web-LLM AssistantLicenseAcknowledgmentsDisclaimerPersonal Journey Behind Web-LLM AssistantConclusionAs the AI landscape continues to evolve, the demand for models that can provide up-to-date information grows. Web-LLM Assistant is a pioneering project that addresses this need by combining the strengths of local LLMs with the vast, ever-changing data available on the web. Whether youre a developer looking to integrate intelligent search capabilities into your applications or an AI enthusiast eager to explore cutting-edge technologies, Web-LLM Assistant offers a versatile and powerful solution.Web-LLM Assistant is a sophisticated web search assistant that leverages Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 11 Views
  • TOWARDSAI.NET
    Taming the Oracle: Key Principals That Bring Our LLM Agents to Production
    Taming the Oracle: Key Principals That Bring Our LLM Agents to Production 1 like November 15, 2024Share this postAuthor(s): Nate Liebmann Originally published on Towards AI. A Tame Oracle. Generated with Microsoft DesignerWith the second anniversary of the ChatGPT earthquake right around the corner, the rush to build useful applications based on large language models (LLMs) of its like seems to be in full force. But despite the aura of magic surrounding demos of LLM agents or involved conversations, I am sure many can relate to my own experience developing LLM-based applications: you start with some example that seems to be working great, but buyers remorse is soon to follow. Trying out other variations of the task could simply fail miserably, without a clear differentiator; and agentic flows could reveal their tendency to diverge when straying away from the original prototyping happy path.If not for the title, you might have thought at this point I was a generative AI luddite, which could not be further from the truth. The journey my team at Torq and I have been on in the past two years, developing LLM-based software features that enhance the no-code automation building experience on our platform, has taught me a lot about the great power LLMs bring if handled correctly.From here on I will discuss three core principals that guide our development and allow our agents to reach successful production deployment and customer utility. I believe they are highly relevant to other LLM based applications just as much. The least freedom principleLLMs interact through free-text, but its not always the way our users will interact with our LLM-based application. In many cases, even if the input is indeed a textual description provided by the user, the output is much more structured, and could be used to take actions in the application automatically. In such a setting, the great power in the LLMs ability to solve some tasks otherwise requiring massive and complex deterministic logic or human intervention can turn into a problem. The more leeway we give the LLM, the more prone our application is to hallucinations and diverging agentic flows. Therefore, a-la the least privileges principle in security, I believe its important to constrain the LLM as much as possible.Fig. 1: The unconstrained, multi-step agentic flowConsider an agent that takes a snapshot of a hand-written grocery list, extracts the text via OCR, locates the most relevant items in stock, and prepares an order. It may sound tempting to opt for a flexible multi-step agentic flow where the agent can use methods such as search_product and add_to_order (see fig. 1 above). However, this process could turn out to be very slow, consist of superfluous steps, and might even get stuck in a loop in case some function call returns an error the model struggles with recovering from. An alternative approach could constrain the flow to two steps, the first being a batch search to get a filtered product tree object, and the second being generating the order based on it, referencing appropriate products from the partial product tree returned by the search function call (see fig. 2 below). Apart from the clear performance benefit, we can be much more confident the agent will remain on track and complete the task.Fig. 2: A structured agentic flow with deterministic auto-fixingWhen dealing with problems in the generated output, I believe its best to do as much of the correction deterministically, without involving the LLM again. This is because against our intuition, sending an error back to an LLM agent and asking it to correct it does not always get it back on track, and might even increase the likelihood of further errors, as some evidence has shown. Circling back to the grocery shopping agent, it is very likely that in some cases invalid JSON paths will be produced to refer to products (e.g., food.cheeses.goats[0] instead of food.dairy.cheeses.goat[0]). As we have the entire stock at hand, we can apply a simple heuristic to automatically fix the incorrect path in a deterministic way, for example by using an edit distance algorithm to find the valid path closest to the generated one in the product tree. Even then, some invalid paths might be too far from any valid ones. In such a case, we might want to simply retry the LLM request rather than adding the error to the context and asking it to fix it. Automated empirical evaluationUnlike traditional 3rd-party APIs, calling an LLM with the exact same input could produce different results each time, even when setting the temperature hyper-parameter to zero. This is in direct conflict with fundamental principals of good software engineering, that is supposed to give the users an expected and consistent experience. The key to tackling this conflict is automated empirical evaluation, which I consider the LLM edition of test-driven development.The evaluation suite can be implemented as a regular test suite, which has the benefit of natural integration into the development cycle and CI/CD pipelines. Crucially, however, the LLMs must be actually called, and not mocked, of course. Each evaluation case consists of user inputs and initial system state, as well as a grading function for the generated output or modified state. Unlike traditional test cases, the notion of PASS or FAIL is insufficient here, because the evaluation suite plays an important role in guiding improvements and enhancements, as well as catching unintended degradations. The grading function should therefore return a fitness score for the output or state modifications our agent produces. How do we actually implement the grading function? Think, for example, of a simple LLM task for generating small Python utility functions. An evaluation case could prompt it to write a function that computes the nth element of the Fibonacci sequence. The models implementation might take either the iterative or the recursive path, both valid (though suboptimal, because there is a closed form expression), so we cannot make assertions about the specifics of the functions code. The grading function in this case could, however, take a handful of test values for the Fibonacci functions argument, spin up an isolated environment, run the generated function on those values, and verify the results. This black-box grading of the produced output does not make unnecessary assumptions, while strictly validating it in a fully deterministic fashion.While I believe that should be the preferred approach, it is not suitable for all applications. There are cases where we cannot fully validate the result, but we can still make assertions about some of its properties. For example, consider an agent that generates short summaries of system logs. Some properties of its outputs, like length, are easy to check deterministically. Other, semantic ones, not as much. If the otherwise business-as-usual logs serving as input for an evaluation case contain a single record about a kernel panic, we want to make sure the summary will mention that. A naive approach for the grading function in this case will involve an LLM task that directly produces a fitness score for the summary based on the log records. This approach might lock our evaluation in a sort of LLM complacency loop, with none of the guarantees provided by deterministic checks. A more nuanced approach, however, could still use an LLM for grading, but craft the task differently: given a summary, the model could be instructed to answer multiple-choice factual questions (e.g. Has there been a major incident in the covered period? (a) No (b) Yes, a kernel panic (c) Yes, a network connectivity loss). We can be much more confident that the LLM would simply not be able to consistently answer such questions correctly if the key information is missing from the summary, making the score much more reliable.Finally, due to non-determinism, each evaluation case must be run several times, with the results aggregated to form a final evaluation report. I have found it very useful to implement the evaluation suite early and use it to guide our development. Once the application has reached some maturity, it could make sense to fail the integration pipeline if the aggregate score for its evaluation suite drops below some set threshold, to prevent catastrophic degradations. Not letting the tail wag the dogGood LLM-based software is, first and foremost, good software. The magic factor we see in LLMs (which is telling of human nature and the role language plays in our perception of other intelligent beings, a topic I will not cover here of course) might tempt us to think about LLM-based software as a whole new field, requiring novel tools, frameworks and development processes. As discussed above, the non-deterministic nature of commercial LLMs, as well as their unstructured API, indeed necessitate dedicated handling. But I would argue that instead of looking at LLM-based application as a whole new creature that might here and there utilise familiar coding patterns we should treat such an application as any other application, except for where it is not. The power of this approach lies in the fact that by doing so, we do not let external abstractions hide away the low-level LLM handling, which is crucial for truly understanding its capabilities and limitations in the scope of our application. Abstractions can and should be adopted where they save time and reduce boilerplate code, but never at the cost of losing control over the most important part of your application: the intricate touchpoints between the LLM and your deterministic code, that should be tailored to your specific use case.Wrapping up, LLMs can be viewed as powerful oracles that enable previously-unfeasible applications. My experience developing LLM based agents has taught me several principles that correlated with successful production deployment and utility. Firstly, agents should be given the least possible freedom: flows should be structured, and whatever can be done deterministically should be. Secondly, automated empirical evaluation of the LLM task and surrounding logic should be a cornerstone of the development process, relying as much as possible on deterministic scoring. Thirdly, abstractions provided by libraries and frameworks should not be adopted where they hide essential details of the integration between the LLM and our code, the core of LLM-based applications.Feel free to reach out to discuss this matter further and tell me what you think!Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    How Googles Watermarking Technology Identifies AI-Generated Content
    Author(s): Lamprini Papargyri Originally published on Towards AI. In October 2024, Google DeepMinds SynthID tool for watermarking AI-generated text was released as open-source, marking a significant step forward in AI transparency. This tool emerged in response to growing concerns about distinguishing AI-generated content, as tools like OpenAIs ChatGPT and Googles Gemini now produce text, images, and even audio that are increasingly difficult to differentiate from human-made content. With policymakers and civil society demanding reliable identification of AI content, SynthID represents an important development in addressing issues around AI-driven misinformation and authenticity.Notably, the European Digital Education Hub (EDEH) and its Explainable AI squad have played a crucial role in advancing AI transparency in educational settings. Explainable AI (XAI) refers to AI systems that clearly reveal how decisions and recommendations are made, rather than functioning as a black box with hidden processes. Through collaboration with tech companies and organizations, they aim to promote digital literacy and enhance transparency across Europes educational and public sectors, fostering ethical AI practices and building trust in both educational and digital environments.Community workshop on explainable AI (XAI) in education.Evaluating AI Detection Tools: Key Technical and Policy CriteriaThe rapid advancement of generative AI has created an urgent need for tools that can reliably detect AI-generated content. The effectiveness of any detection tool hinges on a set of essential technical and policy criteria:Accuracy: A detection tool should reliably distinguish between human-made and AI-generated content, with minimal false positives and negatives. For transparence and explainability purposes, the tool should provide nuanced responses (e.g., a probability score) rather than a simple binary answer.Robustness Against Evasion: Detection methods should withstand tampering or manipulation, as motivated actors might attempt to alter AI content to make it appear human-made, such as through paraphrasing or translation.Quality Preservation: Detection techniques should avoid diminishing the quality of AI-generated content. Tools that intentionally degrade quality to make content detectable may deter adoption by developers focused on user experience.Universality and Privacy: Ideally, a detection tool should be universal, meaning it can apply to any AI model without requiring active cooperation from the developer. Privacy is equally important; any detection method should respect user data privacy.Main Aspects of WatermarkingWatermarking involves embedding identifiable markers in content to indicate its origin, a method long used in digital media like photos and audio. With AI, watermarking has gained traction as a viable way to mark content for later identification, addressing authenticity concerns. Here are some key watermarking techniques and how they fare in theory and practice:Statistical Watermarking: Embeds statistically unusual patterns in text or other content to create a subtle, machine-readable signature.Advantages: Allows for subtle identification without compromising readability and works well with light modifications.Limitations: Sensitive to extensive changes (e.g., paraphrasing, translation), which can remove or weaken the watermark.Visible and Invisible Watermarks: Visible watermarks, such as logos or labels, are immediately recognizable but can disrupt user experience. Invisible watermarks embed patterns within content that are undetectable by users but can be identified by specialized detection tools.Advantages: Invisible watermarks avoid altering the contents appearance, providing a seamless user experience.Limitations: Advanced users may be able to remove or alter these markers, especially if they understand the watermarking method.Googles SynthID uses a statistical watermarking approach to subtly alter token probabilities during text generation, leaving an invisible, machine-readable signature. SynthIDs invisible watermark preserves content quality while marking AI-generated materialOverview of AI Detection ApproachesRetrieval-Based Approach: This method involves creating and maintaining a database of all generated content so that new text can be checked against it for matches.Advantages: Effective for detecting exact matches and is reliable for specific high-value use cases.Disadvantages: Requires massive storage and continuous updates, raising scalability and privacy concerns. Retrieval-based systems can be impractical at large scales.2. Post-Hoc Detection: This technique applies machine learning classifiers to text after it is generated, assessing characteristics typical of AI-written versus human-written material. It relies on analyzing patterns in syntax, word choice, and structure.Advantages: Post-hoc detection doesnt interfere in text creation and is flexible across different AI models.Disadvantages: Computationally demanding, with inconsistent performance on out-of-domain or highly edited content. Detection accuracy can decrease significantly when content undergoes substantial changes.3. Text Watermarking: SynthID falls into this category, which embeds markers directly within the generated text at the time of creation. Text watermarking has several subcategories:3.1 Generative Watermarking: Adjusts token probabilities during text generation to introduce an invisible signature without altering the texts quality.Advantages: Maintains readability and is robust against minor edits; minimal impact on text quality.Disadvantages: Vulnerable to substantial edits, like extensive rephrasing or translations, which may remove the watermark.3.2 Edit-Based Watermarking: Alters text after its generated by adding specific characters or symbols.Advantages: Easily detectable and quick to implement.Disadvantages: Visibly changes the text, potentially affecting readability and user experience.3.3 Data-Driven Watermarking: Embeds watermarks in the training data so that certain sequences or phrases appear only when prompted.Advantages: Effective for deterring unauthorized use when integrated from the training stage.Disadvantages: Limited to specific prompts, with visible markers that may compromise subtlety.SynthID uses generative watermarking to subtly embed markers during text generation, ensuring an undetectable signature while preserving the texts quality. This approach strikes a balance between detection and usability, marking a significant advancement in watermarking for AI.How SynthID WorksSynthIDs watermarking technology employs two neural networks to embed and detect an invisible watermark. For text, this mechanism works by subtly modifying token probabilities during text generation. Large language models (LLMs) generate text one token at a time, assigning each token a probability based on context. SynthIDs first network makes small adjustments to these probabilities, creating a watermark signature that remains invisible and maintains the texts readability and fluency.For images, the first neural network modifies a few pixels in the original image to embed an undetectable pattern. The second network then scans for this pattern in both text and images, allowing it to inform users whether it detects a watermark, suspects one, or finds none.The watermark detection process compares the probability distributions of watermarked and unwatermarked text, identifying the signature left by the watermark. Through large-scale testing, Google DeepMind confirmed SynthIDs effectiveness: in the Gemini app, where over 20 million users unknowingly rated watermarked and unwatermarked text, the feedback showed no noticeable quality difference between the two. This suggests that SynthIDs watermarking process is effective without compromising the texts fluency or usability.SynthID utilizes two neural networks to embed and detect watermarks in images. The first network processes the original image, generating a nearly identical version with slight modifications to a few pixels, embedding a pattern that remains invisible to the human eye. The second network then scans for this pattern, indicating to users whether a watermark is detected, likely present, or absent.Strengths and Limitations of SynthID and WatermarkingSynthIDs invisible watermarking approach provides a powerful tool for marking AI-generated content, yet it faces challenges, particularly as part of a comprehensive solution for AI transparency. Key strengths and limitations include:SynthIDs watermark is resilient with minor changes, such as slight paraphrasing or cropping, making it robust for lightly modified content.SynthID struggles with highly predictable outputs, such as factual statements (e.g., The capital of France is Paris) or code, where the watermark cannot be embedded without affecting accuracy.While effective against casual modifications, SynthIDs watermark could be compromised by users with knowledge of its workings, particularly in cases where sophisticated adversaries aim to remove or obscure the watermark.Given these limitations, SynthID works best when paired with other detection methods. Combining it with retrieval-based or post-hoc methods could enhance overall detection accuracy and resilience, especially in high-stakes applications like education or misinformation detection.Policy and Governance Considerations for WatermarkingSynthIDs deployment as an open-source tool is part of a larger trend toward establishing AI transparency standards. Policymakers are exploring ways to promote accountability, including watermarking requirements in laws and international agreements. Effective governance of AI watermarking requires attention to several key considerations: As watermarking research advances, standardized techniques will help align different stakeholders and make AI transparency measures more consistent. A centralized organization could manage a registry of watermarking protocols, simplifying detection by providing a standardized platform for users to verify content provenance. Policymakers must ensure watermarking methods respect user privacy and data security. This includes defining what information can be embedded in watermarks and regulating data handling by third-party detection services.A balanced, layered approach that combines multiple detection methods may be the most practical strategy for addressing the complex challenges posed by generative AI content.Conclusion: SynthIDs Role in Building AI TransparencySynthID is another step forward in AI transparency, but watermarking alone cannot guarantee full accountability for AI-generated content. As AI becomes increasingly skilled at producing realistic text, images, and media, a multi-layered approach is essential for content verification. SynthID provides a starting point, giving users a means of identifying AI-generated material and discouraging misuse. However, it should ideally be part of a larger ecosystem of checks and balances to ensure robust AI accountability.For true content authenticity, additional safeguards should be explored. Fact-checking, for instance, can help verify information accuracy, while standardized content verification frameworks would ensure consistent detection across platforms and tools. Additionally, regulatory measures could help ensure that AI-generated content is labeled and traceable, empowering users to assess the credibility and origin of the information they encounter.In this evolving landscape, SynthID can serve as a tool for AI transparency by offering users a reliable method of distinguishing between human and AI-generated content. As watermarking and complementary approaches become widely adopted, we may see the emergence of a more transparent and accountable digital ecosystem that encourages responsible AI practices. By equipping users with tools to verify the authenticity of digital content, SynthID and similar technologies can contribute to a safer, more trustworthy online environment.Interested to learn more about SynthID? Read here the article.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    Build a Local CSV Query Assistant Using Gradio and LangChain
    Build a Local CSV Query Assistant Using Gradio and LangChain 0 like November 15, 2024Share this postLast Updated on November 15, 2024 by Editorial TeamAuthor(s): Vikram Bhat Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.In this blog, well walk through creating an interactive Gradio application that allows users to upload a CSV file and query its data using a conversational AI model powered by LangChains create_pandas_dataframe_agent and Ollama's Llama 3.2. This guide will focus on building a local application where the user can upload CSVs, ask questions about the data, and receive answers in real-time.You can find the complete code for this application in the GitHub repository.Gradio is a powerful alternative to Streamlit, offering many new features that make building machine learning applications easy. Gradio excels with simple interfaces and impressive integration capabilities. Some standout features include native support for various data types (such as images, audio, and text), dynamic UI updates, and easy integration with popular libraries like TensorFlow, PyTorch, and LangChain.In this tutorial, we leverage LangChains experimental create_pandas_dataframe_agent, which allows us to analyze simple CSVs without the need to implement complex Retrieval-Augmented Generation (RAG) systems. This makes it ideal for users who want to quickly query CSV data in a conversational manner without the overhead of building a full-fledged RAG system.Additionally, Ollama enables us to run the entire system locally, using Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    Build the Smallest LLM From Scratch With Pytorch (And Generate Pokmon Names!)
    Author(s): Tapan Babbar Originally published on Towards AI. Source: Image by AuthorSo, there I was, toying with a bunch of Pokmon-inspired variations of my cats name trying to give it that unique, slightly mystical vibe. After cycling through names like Flarefluff and Nimblepawchu, it hit me: why not go full-on AI and let a character-level language model handle this? It seemed like the perfect mini-project, and what better way to dive into character-level models than creating a custom Pokmon name generator?Beneath the vast complexity of large language models (LLMs) and generative AI lies a surprisingly simple core idea: predicting the next character. Thats really it! Every incredible model from conversational bots to creative writers boils down to how well they anticipate what comes next. The magic of LLMs? Its in how they refine and scale this predictive ability. So, lets strip away the hype and get to the essence.Were not building a massive model with millions of parameters in this guide. Instead, were creating a character-level language model that can generate Pokmon-style names. Heres the twist: our dataset is tiny, with only 801 Pokmon names! By the end, youll understand the basics of language modeling and have your own mini Pokmon name generator in hand.Heres how each step is structured to help you follow along:Goal: A quick overview of what were aiming to achieve.Intuition: The underlying idea no coding required here.Code: Step-by-step PyTorch implementation.Code Explanation: Breaking down the code so its clear whats happening.If youre just here for the concepts, skip the code youll still get the big picture. No coding experience is necessary to understand the ideas. But if youre up for it, diving into the code will help solidify your understanding, so I encourage you to give it a go!The Intuition: From Characters to NamesImagine guessing a word letter by letter, where each letter gives you a clue about whats likely next. You see Pi, and your mind jumps to Pikachu because ka often follows Pi in the Pokmon world. This is the intuition well teach our model, feeding it Pokmon names one character at a time. Over time, the model catches on to this naming styles quirks, helping it generate fresh names that sound Pokmon-like.Ready? Lets build this from scratch in PyTorch!Step 1: Teaching the Model Its First AlphabetGoal:Define the alphabet of characters the model can use and assign each character a unique number.Intuition:Right now, our model doesnt know anything about language, names, or even letters. To it, words are just a sequence of unknown symbols. And heres the thing: neural networks understand only numbers its non-negotiable! So, to make sense of our dataset, we need to assign a unique number to each character.In this step, were building the models alphabet by identifying every unique character in the Pokmon names dataset. This will include all the letters, plus a special marker to signify the end of a name. Each character will be paired with a unique identifier, a number that lets the model understand each symbol in its own way. This gives our model the basic building blocks for creating Pokmon names and helps it begin learning which characters tend to follow one another.With these numeric IDs in place, were setting the foundation for our model to start grasping the sequences of characters in Pokmon names, all from the ground up!import pandas as pdimport torchimport stringimport numpy as npimport reimport torch.nn.functional as Fimport matplotlib.pyplot as pltdata = pd.read_csv('pokemon.csv')["name"]words = data.to_list()print(words[:8])#['bulbasaur', 'ivysaur', 'venusaur', 'charmander', 'charmeleon', 'charizard', 'squirtle', 'wartortle']# Build the vocabularychars = sorted(list(set(' '.join(words))))stoi = {s:i+1 for i,s in enumerate(chars)}stoi['.'] = 0 # Dot represents the end of a worditos = {i:s for s,i in stoi.items()}print(stoi)#{' ': 1, 'a': 2, 'b': 3, 'c': 4, 'd': 5, 'e': 6, 'f': 7, 'g': 8, 'h': 9, 'i': 10, 'j': 11, 'k': 12, 'l': 13, 'm': 14, 'n': 15, 'o': 16, 'p': 17, 'q': 18, 'r': 19, 's': 20, 't': 21, 'u': 22, 'v': 23, 'w': 24, 'x': 25, 'y': 26, 'z': 27, '.': 0}print(itos)#{1: ' ', 2: 'a', 3: 'b', 4: 'c', 5: 'd', 6: 'e', 7: 'f', 8: 'g', 9: 'h', 10: 'i', 11: 'j', 12: 'k', 13: 'l', 14: 'm', 15: 'n', 16: 'o', 17: 'p', 18: 'q', 19: 'r', 20: 's', 21: 't', 22: 'u', 23: 'v', 24: 'w', 25: 'x', 26: 'y', 27: 'z', 0: '.'}Code Explanation:We create stoi, which maps each character to a unique integer.The itos dictionary reverses this mapping, allowing us to convert numbers back into characters.We include a special end-of-word character (.) to indicate the end of each Pokmon name.Step 2: Building Context with N-gramsGoal:Enable the model to guess the next character based on the context of preceding characters.Intuition:Here, were teaching the model by building a game: guess the next letter! The model will try to predict what comes next for each character in a name. For example, when it sees Pi, it might guess k next, as in Pikachu. Well turn each name into sequences where each character points to its next one. Over time, the model will start spotting familiar patterns that define the style of Pokmon names.Well also add a special end-of-name character after each name to let the model know when its time to wrap up.Character N-grams. Source: Image by AuthorThis example shows how we use a fixed context length of 3 to predict each next character in a sequence. As the model reads each character in a word, it remembers only the last three characters as context to make its next prediction. This sliding window approach helps capture short-term dependencies but feel free to experiment with shorter or longer context lengths to see how it affects the predictions.block_size = 3 # Context lengthdef build_dataset(words): X, Y = [], [] for w in words: context = [0] * block_size # start with a blank context for ch in w + '.': ix = stoi[ch] X.append(context) Y.append(ix) context = context[1:] + [ix] # Shift and append new character return torch.tensor(X), torch.tensor(Y)X, Y = build_dataset(words[:int(0.8 * len(words))])print(X.shape, Y.shape) # Check shapes of training dataCode Explanation:Set Context Length: block_size = 3 defines the context length, or the number of preceding characters used to predict the next one.Create build_dataset Function: This function prepares X (context sequences) and Y (next character indices) from a list of words.Initialize and Update Context: Each word starts with a blank context [0, 0, 0]. As characters are processed, the context shifts forward to maintain the 3-character length.Store Input-Output Pairs: Each context (in X) is paired with the next character (in Y), building a dataset for model training.Convert and Check Data: Converts X and Y to tensors, preparing them for training, and checks their dimensions. This dataset now captures patterns in character sequences for generating new names.Step 3: Building the Neural NetworkGoal:Train the model by predicting each next character and adjusting weights based on prediction accuracy.Intuition:Heres where it gets interesting! Well create a simple setup with three layers that work together to predict the next letter based on the previous three. Again, think of it like guessing letters in a word game: each time the model gets it wrong, it learns from the mistake and adjusts, improving with each try.As it practices on real Pokmon names, it gradually picks up the style and patterns that make these names unique. Eventually, after going over the list enough times, it can come up with new names that have that same Pokmon vibe!# Initialize parametersg = torch.Generator()C = torch.randn((27, 10), generator=g)W1 = torch.randn((30, 200), generator=g)b1 = torch.randn(200, generator=g)W2 = torch.randn((200, 27), generator=g)b2 = torch.randn(27, generator=g)parameters = [C, W1, b1, W2, b2]for p in parameters: p.requires_grad = Truefor i in range(100000): ix = torch.randint(0, X.shape[0], (32,)) emb = C[X[ix]] h = torch.tanh(emb.view(-1, 30) @ W1 + b1) logits = h @ W2 + b2 loss = F.cross_entropy(logits, Y[ix]) for p in parameters: p.grad = None loss.backward() for p in parameters: p.data -= 0.1 * p.gradCode Explanation:We initialize weights and biases for the embedding layer (C) and two linear layers (W1, W2) with random values.Each parameter is set to requires_grad=True, enabling backpropagation, which adjusts these parameters to minimize prediction errors.We select a mini-batch of 32 random samples from the training data (Xtr), allowing us to optimize the model more efficiently by processing multiple examples at once.For each batch, we use embeddings, and pass them through the hidden layer (W1) with tanh activation, and calculate logits for output.Using cross-entropy loss, the model learns to reduce errors and improve predictions with each step.Training the model. Source: Image by AuthorStep 4: Finding the Probability of the Next CharacterGoal:To generate new Pokmon names by predicting one character at a time based on the input sequence, using the models learned probabilities.Intuition:During training, it optimized its weights to capture the likelihood of each character following another in typical Pokmon names. Now, using these learned weights (W1, W2, b1, b2), we can generate entirely new names by predicting one character at a time. At this step, were making our model guess the next letter that should follow a given sequence, such as pik.The model doesnt directly understand letters, so the input characters are first converted into numbers representing each character. These numbers are then padded to match the required input size and fed into the models layers. The layers are like filters trained to predict what typically follows each character. After passing through these layers, the model provides a list of probabilities for each possible character it might select next, based on what its learned from the Pokmon names dataset. This gives us a weighted list of potential next characters, ranked by likelihood.Source: Image by AuthorIn the example above, you can see that the characters a and i have a high likelihood of following the sequence pik.input_chars = "pik" # Example input to get probabilities of next characters# Convert input characters to indices based on stoi (character-to-index mapping)context = [stoi.get(char, 0) for char in input_chars][-block_size:] # Ensure context fits block sizecontext = [0] * (block_size - len(context)) + context # Pad if shorter than block size# Embedding the current contextemb = C[torch.tensor([context])]# Pass through the network layersh = torch.tanh(emb.view(1, -1) @ W1 + b1)logits = h @ W2 + b2# Compute the probabilitiesprobs = F.softmax(logits, dim=1).squeeze() # Squeeze to remove unnecessary dimensions# Print out the probabilities for each characternext_char_probs = {itos[i]: probs[i].item() for i in range(len(probs))}Code Explanation:We convert the context indices into an embedded representation, a numerical format that can be fed into the model layers.We use the models layers to transform the embedded context. The hidden layer (h) processes it, and the output layer (logits) computes scores for each possible character.Finally, we apply the softmax function to the logits, giving us a list of probabilities. This probability distribution is stored in next_char_probs, mapping each character to its likelihood.Step 5: Generating New Pokmon NamesGoal:Using the probabilities from Step 4, we aim to generate a complete name by selecting each next character sequentially until a special end-of-name marker appears.Intuition:The model has learned typical character sequences from Pokmon names and now applies this by guessing each subsequent letter based on probabilities. It keeps selecting characters this way until it senses the name is complete. Some generated names will fit the Pokmon style perfectly, while others might be more whimsical capturing the creative unpredictability that fascinates generative models. Here are a few names generated by our model:dwebblesimikyubaltarillpupidonburrsolapatranmeowomankwormantisbuneglisawhirlixhydolaudinjadiglerskipedenneoncontext = [0] * block_sizefor _ in range(20): out = [] while True: emb = C[torch.tensor([context])] h = torch.tanh(emb.view(1, -1) @ W1 + b1) logits = h @ W2 + b2 probs = F.softmax(logits, dim=1) ix = torch.multinomial(probs, num_samples=1, generator=g).item() context = context[1:] + [ix] out.append(ix) if ix == 0: break print(''.join(itos[i] for i in out))Code Explanation:Using softmax on logits, we get probabilities for each character.torch.multinomial chooses a character based on these probabilities, adding variety and realism to generated names.Thats it! You can even experiment by starting with your name as a prefix and watch the model transform it into a Pokmon-style name.Future ImprovementsThis model offers a basic approach to generating character-level text, such as Pokmon names, but its far from production-ready. Ive intentionally simplified the following aspects to focus on building intuition, with plans to expand on these concepts in a follow-up article.Dynamic Learning Rate: Our current training setup uses a fixed learning rate of 0.1, which might limit convergence efficiency. Experimenting with a dynamic learning rate (e.g., reducing it as the model improves) could yield faster convergence and better final accuracy.Overfitting Prevention: With a relatively small dataset of 801 Pokmon names, the model may start to memorize patterns rather than generalize. We could introduce techniques like dropout or L2 regularization to reduce overfitting, allowing the model to better generalize to unseen sequences.Expanding Context Length: Currently, the model uses a fixed block_size (context window) that may limit it from capturing dependencies over long sequences. Increasing this context length would allow it to better understand patterns over longer sequences, creating names that feel more complex and nuanced.Larger Dataset: The models ability to generalize and create more diverse names is limited by the small dataset. Training on a larger dataset, possibly including more fictional names from different sources, could help it learn broader naming conventions and improve its creative range.Temperature Adjustment: Experiment with the temperature setting, which controls the randomness of the models predictions. A lower temperature will make the model more conservative, choosing the most likely next character, while a higher temperature encourages creativity by allowing more varied and unexpected choices. Fine-tuning this can help balance between generating predictable and unique Pokmon-like names.Final Thoughts: Gotta Generate Em All!This is one of the simplest character-level language models, and its a great starting point. By adding more layers, using larger datasets, or increasing the context length, you can improve the model and generate even more creative names. But dont stop here! Try feeding it a different set of names think dragons, elves, or mystical creatures and watch how it learns to capture those vibes. With just a bit of tweaking, this model can become your go-to generator for names straight out of fantasy worlds. Happy training, and may your creations sound as epic as they look!The full source code and the Jupyter Notebook are available in the GitHub repository. Feel free to reach out if you have ideas for improvements or any other observations.References:Source: Pokemon love GIF on giphyJoin thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 15 Views
  • TOWARDSAI.NET
    Empirical Techniques for Enhanced Predictive Modeling: Beyond Traditional ARMA
    LatestMachine LearningEmpirical Techniques for Enhanced Predictive Modeling: Beyond Traditional ARMA 0 like November 14, 2024Share this postAuthor(s): Shenggang Li Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.A Non-Parametric Approach for Robust Forecasting and Data Analysis Across DomainsPhoto by XinYing Lin on UnsplashThe ARMA model is a popular choice for time series forecasting because it captures how data points are related over time like how todays data depends on yesterdays. But ARMA assumes that the residuals follow a specific (usually normal) distribution. In real-world data, this assumption often doesnt hold up. Outliers, sudden shifts, and unusual patterns can mess with the model, making forecasts less accurate or unstable.From my research, I believe that empirical techniques offer a solution. Theyre flexible, non-parametric, and adapt directly to the data without needing strict distribution assumptions. Instead of forcing data into a set framework, they use the actual observed values to build the model, making it effective at handling outliers and complex patterns that traditional models might miss.Combining empirical transformation and likelihood estimation with ARMA leads to a more reliable forecasting model. ARMA captures time-based relationships, while empirical likelihood helps manage irregularities. Instead of assuming a specific residual distribution, empirical likelihood lets the model adapt to real-world data, improving forecast accuracy.Lets dive into the world of empirical distribution and Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    A Modern Approach To The Fundamental Problem of Causal Inference
    A Modern Approach To The Fundamental Problem of Causal Inference 0 like November 14, 2024Share this postAuthor(s): Andrea Berdondini Originally published on Towards AI. Photo by the authorABSTRACT: The fundamental problem of causal inference defines the impossibility of associating a causal link to a correlation, in other words: correlation does not prove causality. This problem can be understood from two points of view: experimental and statistical. The experimental approach tells us that this problem arises from the impossibility of simultaneously observing an event both in the presence and absence of a hypothesis. The statistical approach, on the other hand, suggests that this problem stems from the error of treating tested hypotheses as independent of each other. Modern statistics tends to place greater emphasis on the statistical approach because, compared to the experimental point of view, it also shows us a way to solve the problem. Indeed, when testing many hypotheses, a composite hypothesis is constructed that tends to cover the entire solution space. Consequently, the composite hypothesis can be fitted to any data set by generating a random correlation. Furthermore, the probability that the correlation is random is equal to the probability of obtaining the same result by generating an equivalent number of random hypotheses.IntroductionThe fundamental problem of causal inference defines the impossibility of associating causality with a correlation; in other words, correlation does not prove causality. This problem can be understood from two perspectives: experimental and statistical. The experimental approach suggests that this problem arises from the impossibility of observing an event both in the presence and absence of a hypothesis simultaneously. The statistical approach, on the other hand, suggests that this problem stems from the error of treating tested hypotheses as independent of each other.Modern statistics tends to place greater emphasis on the statistical approach, as it, unlike the experimental approach, also provides a path to solving the problem. Indeed, when testing many hypotheses, a composite hypothesis is constructed that tends to cover the entire solution space. Consequently, the composite hypothesis can fit any data series, thereby generating a correlation that does not imply causality.Furthermore, the probability that the correlation is random is equal to the probability of obtaining the same result by generating an equivalent number of random hypotheses. Regarding this topic, we will see that the key point, in calculating this probability value, is to consider hypotheses as dependent on all other previously tested hypotheses.Considering the hypothesis as non-independent has fundamental implications in statistical analysis.Indeed, every random action we take is not only useless but will increase the probability of a random correlation.For this reason, in the following article [1], we highlight the importance of acting consciously in statistics.Moreover, calculating the probability that the correlation is random is only possible if all prior attempts are known. In practice, calculating this probability is very difficult because not only do we need to consider all our attempts, but we also need to consider the attempts made by everyone else performing the same task. Indeed, a group of people belonging to a research network all having the same reputation and all working on the same problem can be considered with a single person who performs all the attempts made.From a practical point of view, we are almost always in the situation where this parameter is underestimated because it is very difficult to know all the hypotheses tested.Consequently, the calculation of the probability that a correlation is casual becomes something relative that depends on the information we have.The Fundamental Problem of Causal InferenceThe fundamental problem of causal inference [2] defines the impossibility of associating causality with a correlation, in other words: correlation does not prove causality. From a statistical point of view, this indeterminacy arises from the error of considering the tested hypotheses as independent of each other. When a series of hypotheses is generated, a composite hypothesis is formed that tends to fit any data series, leading to purely random correlations.For example, you can find amusing correlations between very different events on the internet; these correlations are obviously random. These examples are often used to demonstrate the fundamental problem of causal inference. In presenting this data, the following information is always omitted: how many hypotheses did I consider before finding a related hypothesis.This is essential information because if I have a database comprising a very high number of events, for any data series, there will always be a hypothesis that correlates well with my data. Thus, if I generate a large number of random hypotheses, I will almost certainly find a hypothesis that correlates with the data I am studying. Therefore, having a probability of about 100% of being able to obtain the same result randomly, I have a probability of about 100% that the correlation does not also imply causation.On the other hand, if we generate a single hypothesis that correlates well with the data, in this situation, almost certainly, the correlation also implies causation. This is because the probability of obtaining a good correlation by generating a single random hypothesis is almost zero.This result is also intuitive, because it is possible to achieve a good correlation with a single attempt only if one has knowledge of the process that generated the data to be analyzed. And it is precisely this knowledge that also determines a causal constraint.The following figure summarizes the basic concepts showing how the correct way to proceed is to consider the hypotheses as non-independent.Calculating the probability that the correlation is randomCorrectly calculating the probability of getting an equal or better result randomly involves changing our approach to statistics. The approach commonly used in statistics is to consider the data produced by one method independent of the data produced by different methods. This way of proceeding seems the only possible one but, as we will show in the following paradox, it leads to an illogical result, which is instead solved by considering the data as non-independent.We think to have a computer with enormous computational capacity that is used to develop hypotheses about a phenomenon that we want to study. The computer works as follows: it creates a random hypothesis and then performs a statistical test. At this point, we ask ourselves the following question: can there be a useful statistical test to evaluate the results of the hypothesis generated?If we answer yes, we get an illogical result because our computer would always be able, by generating a large number of random hypotheses, to find a hypothesis that passes the statistical test. In this way, we arrive at the absurd conclusion that it is possible to create knowledge randomly, because it is enough to have a very powerful computer and a statistical test to understand every phenomenon.If we answer no, we get another illogical result because we are saying that no hypothesis can be evaluated. In practice, the results of different hypotheses are all equivalent and indistinguishable.How can we solve this logical paradox? The only way to answer the question, without obtaining an illogical situation, is to consider the results obtained from different methods depending on each other. A function that meets this condition is the probability of getting an equal or better result at random. Indeed, the calculation of this probability implies the random simulation of all the actions performed. Hence, random attempts increase the number of actions performed and consequently increase the probability of obtaining an equal or better result randomly.For this reason, generating random hypotheses is useless, and therefore if you use this parameter, it is possible to evaluate the data and at the same time it is impossible to create knowledge by generating random hypotheses.Considering the hypothesis as non-independent is a fundamental condition for correctly calculating of the probability that the correlation is random.The probability of getting an equal or better result at random meets this condition.The dependence of hypothesis on each other has profound implications in statistics, which will be discussed in the next section.Consequences of the non-independence of the hypothesisConsider the tested hypotheses to be dependent on each other when calculating the probability that the correlation is causal leads to three fundamental consequences in statistics.First fundamental consequence of the non-independence of the hypothesis: our every random action always involves an increase in the probability of a random correlation.Example: We need to analyze a statistical datum represented by 10 predictions about an event that can only have two results. The 10 predictions are divided into 8 successes and 2 failures. To calculate the probability of obtaining an equal or better result randomly we use the binomial distribution and we get the following value 5.5%. If before making these 10 predictions, we tested a different hypothesis with which we made 10 other predictions divided into 5 successes and 5 failures, the uncertainty of our result changes. Indeed, in this case, we must calculate the probability of obtaining a result with a number of successes greater than or equal to 8 by performing two random attempts consisting of 10 predictions each. In this case, the probability becomes 10.6%, so the fact of having first tested a random hypothesis almost doubled the probability of a random correlation of our second hypothesis. Consequently, increasing the random hypotheses increases the number of predictions that we will have to make, with the true hypothesis, to have a low probability that the correlation is coincidental.Second fundamental consequence of the non-independence of the hypothesis: every random action of ours and of every other person equivalent to us, always involves an increase of the probability that the correlation is random.By the equivalent term, we mean a person with the same reputation as us, therefore the data produced by equivalent people are judged with the same weight.Example: 10 people participate in a project whose goal is the development of an algorithm capable of predicting the outcome of an event that can have only two results. An external person who does not participate in the project but is aware of every attempt made by the participants evaluates the statistical data obtained. All participants make 100 predictions, 9 get a 50% chance of success, one gets a 65% chance of success. The probability that a 65% success is due to a random correlation is obtained by calculating the probability of obtaining a result with a number of successes greater than or equal to 65 by performing ten random attempts consisting of 100 predictions each. The probability obtained, in this way, is 16% instead if he was the only participant in the project the probability would have been 0.18%, therefore about 100 times lower.Third fundamental consequence of the non-independence of the hypothesis: the calculation the probability that the correlation is random varies according to the information possessed.Example: 10 people participate in a project whose goal is the development of an algorithm capable of predicting the outcome of an event that can have only two results. In this case, people do not know the other participants and think they are the only ones participating in the project. All participants make 100 predictions, 9 get a 50% chance of success and one gets a 65% chance of success. The participant who obtains a probability of success of 65% independently calculate the probability that the correlation is coincidental. Not knowing that other people are participating in the project, calculate the probability of obtaining a result with a number of successes greater than or equal to 65 by performing a single random attempt consisting of 100 predictions; the probability obtained is 0.18%. An external person who is aware of every attempt made by the participants calculate the probability that the 65% success rate of one of the participants was due to a random correlation. knowing the number of participants in the project calculates the probability of obtaining a result with a number of successes greater than or equal to 65 by making ten random attempts consisting of 100 predictions each. The probability obtained, in this way, is 16%, a much higher value than the probability calculated by the participant. The probability calculated by the external person using more information is most accurate than the probability calculated by the individual participant. Consequently, the probability obtained by exploiting the greatest number of information must always be considered, in the case of the example, the probability that the 65% success is due to a random correlation is 16%. Therefore, the participant having less information underestimates this probability.The first and second fundamental highlighting consequence of the non-independence of the hypothesis can be redefined by highlighting the non-randomness of the action.First fundamental consequence of the non-independence of the hypothesis: our every non-random action always involves a reduction in the probability that the correlation is random.Second fundamental consequence of the non-independence of the hypothesis: every non-random action of ours and of every other person equivalent to us, always involves a reduction in the probability that the correlation is random.How to perform correctly the statistical hypothesis testAbout to perform correctly the statistical hypothesis test, It is interesting to note how the nonindependence of the hypothesis can be seen as something extremely obvious or as something extremely innovative. Indeed, it may seem absolutely banal to consider all the hypotheses that have been tested, for the obvious reason that by running a large number of random hypotheses sooner or later there will be some hypothesis that will fit the data quite well. On the other hand, also considering the previous hypotheses represents a revolution in the evaluation of a hypothesis. In fact, from this point of view, the mere knowledge of the hypothesis that makes the prediction does not allow us to define its real complexity. Therefore, if in the statistical hypothesis test the p-value [3], [4], used as a threshold to reject the null hypothesis, is calculated considering only the hypothesis that actively participates in the prediction, it means, that we are underestimating the complexity of the hypothesis. Consequently, the p-value, thus calculated, is wrong and therefore determines a false evaluation of the hypothesis. It is therefore believed that this systematic error, in the execution of the hypothesis test, is responsible for the high number of non-reproducible results [5], [6].Taking advantage of these considerations it is understood that evaluating a statistical result can be very difficult because some information can be hidden. For example, we are obliged to report the mathematical formula that makes the prediction but, instead, we may not report all previous failed attempts. Unfortunately, this information is essential for evaluating the hypothesis, because they are an integral part of the hypothesis. Indeed, if we test 10 hypotheses, we simply interpolate the data with those ten hypotheses and choose the hypothesis that passes the chosen evaluation test.This problem also depends on the increasing use of statistical software capable of quickly executing a huge number of mathematical models.Consequently, there is the risk of playing with this software by performing a multitude of analyzes and this sooner or later leads to a random correlation. For these reasons, the evaluation of statistical results represents one of the most important challenges for scientific research.Unfortunately, it is a difficult problem to solve because, as mentioned, some information can always be hidden when writing an article. The simplest solution adopted is to use more selective evaluation parameters, which in practice means making it unlikely to pass the evaluation test by developing random hypotheses. However, this solution has a big problem: by acting in this way there is the risk of discarding a correct hypotheses and cannot be applied to all fields of research. For example, in finance where the possible inefficiencies of the markets [7], which can be observed, are minimal, adopting very restrictive valuation methods means having to discard almost any hypothesis.ConclusionIn this article, we analyzed the fundamental problem of causal inference from a statistical perspective. From this point of view, the problem arises from treating all tested hypotheses as independent of each other. This way of acting is wrong because when we generate a series of hypotheses we are building a composite hypothesis that will tend to adapt and therefore give a random correlation to each of our series of data.It is believed that this incorrect approach is the cause of the problem of non reproducibility of scientific results.Moreover, the increase in computational capacity speeds up hypothesis development, inadvertently creating composite hypotheses that can lead to random correlations.The probability that a correlation is random is obtained by calculating the probability of obtaining an equal or better result randomly. This calculation can be done, correctly, only by knowing all the hypotheses tested, unfortunately this information is very difficult to have.For this reason, in modern statistics, it is considered fundamental to develop the awareness that each of our compulsive and irrational actions, which leads us to develop and test a large quantity of hypotheses, has as a consequence the generation of random correlations that are difficult to detect.Bibliography:[1] Berdondini, Andrea, Statistics the Science of Awareness (August 30, 2021). Available at SSRN: https://ssrn.com/abstract=3914134.[2] Holland, P. W. (1986) Statistics and Causal Inference. Journal of the American Statistical Association, 81(396), 945960.[3] Hung, H.M.J., ONeill, R.T., Bauer, P., & Kohne, K. (1997). The behavior of the p-value when the alternative hypothesis is true. Biometrics, 53(1), 1122.[4] Harlow, L.L., Mulaik, S.A., & Steiger, J.H. (1997). What if there were no significance tests? Psychological Methods, 2(4), 315328.[5] Munaf, M., Nosek, B., Bishop, D. et al. A manifesto for reproducible science. Nat Hum Behav 1, 0021 (2017). https://doi.org/10.1038/s41562-016-0021.[6] Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).[7] Black, F. (1971) Random Walk and Portfolio Management, Financial Analyst Journal, 27, 1622Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    Elon Musks Own AI Flags Him as a Leading Misinformation Source on X
    Elon Musks Own AI Flags Him as a Leading Misinformation Source on X 0 like November 13, 2024Share this postAuthor(s): Get The Gist Originally published on Towards AI. Plus: Nvidia is Building Japans Most Advanced AI SupercomputerThis member-only story is on us. Upgrade to access all of Medium.Welcome to Get The Gist, where every weekday we share an easy-to-read summary of the latest and greatest developments in AI news, innovations, and trends all delivered in under 5 minutes! In todays edition:Nvidia is Building Japans Most Advanced AI SupercomputerGoogle Nest Cameras Get Smarter with New Gemini AI FeaturesGrok Flags Musk as a Leading Misinformation Source on XAmazon to Launch Its New AI ChipAnd more AI news.Image by: NvidiaThe Gist: SoftBank, in partnership with NVIDIA, is building Japans most powerful AI supercomputer, aiming to lead in AI innovation, telecom, and industrial growth. This groundbreaking infrastructure promises new revenue streams and transformative applications across industries.Key Details:SoftBanks AI supercomputer, based on NVIDIAs Blackwell platform, will be the most powerful in Japan, supporting AI development for research, universities, and businesses.Using the NVIDIA AI Aerial platform, SoftBank has piloted the first AI-integrated 5G network, unlocking multi-billion dollar revenue opportunities for telecom.SoftBanks planned AI marketplace, powered by NVIDIA AI Enterprise, will provide secure, local AI services to industries, enabling growth in fields like healthcare, robotics, and transportation.Image by: NeowinThe Gist: Starting next week, Google Nest cameras will roll out advanced AI Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    Let AI Instantly Parse Heavy Documents: The Magic of MPLUG-DOCOWL2s Efficient Compression
    LatestMachine LearningLet AI Instantly Parse Heavy Documents: The Magic of MPLUG-DOCOWL2s Efficient Compression 0 like November 13, 2024Share this postAuthor(s): Florian June Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.Today, lets take a look at one of the latest developments in PDF Parsing and Document Intelligence.In our digital age, the ability to understand documents beyond mere text extraction is crucial. Multi-page documents, such as legal contracts, scientific papers, and technical manuals, present unique challenges.Traditional document understanding methods heavily rely on Optical Character Recognition (OCR) techniques, which present a significant challenge: the inefficiency and sluggish performance of current OCR-based solutions when processing high-resolution, multi-page documents.These methods generate thousands of visual tokens for just a single page, leading to high computational costs and prolonged inference times. For example, InternVL 2 requires an average of 3,000 visual tokens to understand a single page, resulting in slow processing speeds.Figure 1: (a) mPLUG-DocOwl2 achieves state-of-the-art Multi-page Document Understanding performance with faster inference speed and less GPU memory; (b-c) mPLUG-DocOwl2 is able to provide a detailed explanation containing the evidence page as well as the overall structure parsing of the document. Source: MPLUG-DOCOWL2.As shown in Figure 1, a new study called MPLUG-DOCOWL2 (open-source code) aims to address this issue by drastically reducing the number of visual tokens while maintaining, or even enhancing, comprehension accuracy.A Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    Whisper Variants Comparison: What Are Their Features And How To Implement Them?
    Author(s): Yuki Shizuya Originally published on Towards AI. Photo by Pawel Czerwinski on UnsplashRecently, I research automatic speech recognition (ASR) to make transcription from speech data. When it comes to an open-source ASR model, Whisper [1], which is developed by OpenAI, might be the best choice in terms of its highly accurate transcription. However, there are many variants of Whisper, so I want to compare their features. In this blog, I will quickly recap Whisper and introduce the variants and how to implement them in Python. I will explain vanilla Whisper, Faster Whisper, Whisper X, Distil-Whisper, and Whisper-Medusa.Table of Contents1. What is Whisper?Whisper [1] is an automatic speech recognition (ASR) model developed by OpenAI. It is trained on 680,000 hours of multilingual and multi-task supervised data, including transcription, translation, voice activity detection, alignment, and language identification. Before the arrival of Whisper, there were no models trained by such a massive amount of data in a supervised way. Regarding architecture, Whisper adopts an Encoder-Decoder Transformer for scalability. The architecture illustration is shown below.Whisper architecture illustration adapted by [1]Firstly, Whisper converts audio data into a log-mel spectrogram. A log-mel spectrogram is a visual representation of the spectrum of signal frequencies in the mel scale, which is commonly used in speech processing and machine learning tasks. For further information, you can check this blog [2]. After Whisper inputs a log-mel spectrogram to some 1-D convolution layers and positional encoding, it processes data in a similar way to the natural language processing Transformer. Whisper can work in the multilingual setting to leverage byte-level BPE tokenizer utilized by GPT-2. Thanks to multi-task learning, Whisper can also perform transcription, timestamp detection, and translation.Official Whisper has six model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Smaller models provide only English-only capability.Whisper size variation tableJust recently (2024/10), OpenAI released the new version, turbo, which has almost the same capability as the large-size model but offers significant speed-up (8 times!) by fine-tuning the pruned large-size model. All Whisper models are compatible with the HuggingFace transformer library.Now, we quickly recap Whisper. It is based on the Encoder-Decoder Transformer architecture and performs outstandingly, even including in commercial models. In the next section, we will discuss the Whisper variants.2. Whisper variants : Faster Whisper, Whisper X, Distil-Whisper, and Whisper-MedusaIn this section, we will go through Whisper variants and their features. I focus on the Python and Pytorch implementations. Although Whisper.cpp and Whisper JAX are popular variants, I will not examine them. Moreover, Whisper-streaming is also a popular variant for real-time inference, but it needs a high-end GPU, so I will not discuss it either. We will check Faster-Whisper, Whisper X, Distil-Whisper, and Whisper-Medusa.Faster-WhisperFaster-Whisper is a reimplementation of Whisper using CTranslate2, which is a C++ and Python library for efficient inference with Transformer models. Thus, there is no change in architecture. According to the official repository, Faster-Whisper can speed up ~4 times faster than the original implementation with the same accuracy while using less memory. Briefly, Ctranslate2 utilizes many optimization techniques, such as weights quantization, layers fusion, batch reordering, etc. We can choose type options, such as float16 or int8, according to our machine type; for instance, when we select int8, we can run Whisper even on the CPU.WhisperX (2023/03)WhisperX [3] is also an efficient speech transcription system integrated Faster-Whisper. Although vanilla Whisper is trained by multiple tasks, including timestamp prediction, it is prone to be inaccurate for word-level timestamps. Moreover, due to its sequential inference nature, vanilla Whisper generally takes computation time for long-form audio inputs. To overcome these weak points, WhisperX introduces three additional stages: (1) Voice Activity Detection (VAD), (2) cut & merge results of VAD, and (3) forced alignment with an external phoneme model to provide accurate word-level timestamps. The architecture illustration is shown below.WhisperX architecture illustration adapted by [3]Firstly, WhisperX processes input audio through the VAD layer. As its name suggests, VAD detects voice segments. WhisperX utilizes the segmentation model in the pyannote-audio library for the VAD. Next, WhisperX cuts and merges the voice detected segmentation. This process allows us to run batch inference based on each cut result. Finally, WhisperX applies the forced alignment to measure word-level accurate timestamps. Lets check a concrete example as shown below.WhisperX algorithm created by the authorIt leverages Whisper for the transcription and the Phoneme model for phoneme-level transcription. The phoneme model can detect a timestamp for each phoneme; thus, if we assign the timestamp from the next nearest phoneme in the Whisper transcript, we can get a more accurate timestamp for each word.Even though WhisperX adds three additional processes compared to the vanilla Whisper, it can effectively transcribe for longer audio thanks to batch inference. The following table shows the performance comparison. You can check that WhisperX keeps low WER but increase the inference speed.Performance comparison of WhisperX adapted by [3]Distil-Whisper (2023/11)Distil-Whisper [4] was developed by HuggingFace in 2023. It is a model that compresses the Whipser Large model using knowledge distillation. It leverages common knowledge distillation techniques to train the smaller model, such as pseudo-labeling from the Whisper Large model and Kullback-Leibler Divergence loss. The architecture illustration is shown below.Distil-Whisper illustration adapted by [4]The architecture is paired with the vanilla Whisper, but the number of layers is decreased. For the dataset, the authors collect 21,170 hours of publicly available data from the Internet to train the Distil-Whisper. Distil-Whisper records 5.8 times faster than the Whisper Large model, with 51% fewer parameters, while performing within a 1% word error rate (WER) on out-of-distribution data. The following table shows the performance comparison.Performance comparison of Distil-Whisper adapted by [4]As you can see, Distil-Whisper keeps a word error rate as low as vanilla Whisper but can decrease the latency.Whisper-Medusa (2024/09)Whisper-Medusa [5] is the variant that utilizes Medusa to increase Whispers inference speed. Medusa is an efficient LLM inference method that adds extra decoding heads to predict multiple subsequent tokens in parallel. You can understand well using the following illustration.Medusa and Whisper-Medusa architecture comparison by the author. Illustrations are adapted from original papers [5][6]In the left part, the Medusa has three additional heads to predict subsequent tokens. If an original model outputs y1 token, the three additional heads predict y2, y3, and y4 tokens. Medusa can increase the number of predictions by adding additional heads and reduce the inference time overall. Note that the necessary VRAM amount will be increased because of additional heads.Whisper-Medusa applies the Medusa idea to Whisper, as shown in the right part. Since Whisper has a disadvantage in inference speed because of the sequential inference nature, Medusas feature helps speed up the inference. The comparison results between Whisper-Medusa and vanilla Whisper are shown below.The performance comparison of Whisper-Medusa adapted by [5]For several language datasets, Whisper-Medusa records a lower word error rate (WER) than vanilla Whisper. It can also speed up 1.5 times on average.In this section, we check the Whisper variants and their features. The following section will explore how to implement them in Python and check their capability for real-world audio.3. Python implementation of Whisper variants : Compare the results using real-world audio dataIn this section, we will learn how to implement Whisper and Whisper variants in Python. For real-world audio data, I will use audio from this YouTube video downloaded manually. The video size is around 14 minutes. I will attach the code on how to convert an mp4 file into an mp3 file later.Environment setupDue to library incompatibility, we created two environments: one for Whipser, Faster-Whisper, WhisperX, and Distil-Whisper, and the other for Whisper-Medusa.For the former environment, I used a conda environment with Python 3.10. I experimented on Ubuntu 20.04 with cuda 12.0, 16 GB VRAM.conda create -n audioenv python=3.10 -yconda activate audioenvNext, we need to install the libraries below via pip and conda. After the installation below, you need to downgrade numpy to 1.26.3.conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidiapip install python-dotenv moviepy openai-whisper accelerate datasets[audio]pip install numpy==1.26.3Next, we need to install whisperX repository. However, whisperX is no longer maintained frequently so far. Thus, we use the forked repository called BetterWhisperX.git clone https://github.com/federicotorrielli/BetterWhisperX.gitcd BetterWhisperXpip install -e .First environment preparation is done.For Whisper-Medusa environment, I used a conda environment with Python 3.11. I also experimented on Ubuntu 20.04 with cuda 12.0, 24 GB VRAM.conda create -n medusa python=3.11 -yconda activate medusaYou need to install the following libraries via pip.pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118pip install wandbgit clone https://github.com/aiola-lab/whisper-medusa.gitcd whisper-medusapip install -e .All preparation is done. Now, lets check Whisper variants capabilities!How to implement Whisper variants in PythonWhisper turboWe use the latest version of Whisper, turbo. Thanks to the official repository, we can implement vanilla Whisper with only a few lines of code.import whispermodel = whisper.load_model("turbo")result = model.transcribe("audio.mp3")Whisper can only work for audio data within 30 seconds, but transcribe method reads the entire file and processes the audio with a sliding 30-second window, so we dont care about how to feed the data.2. Faster-WhisperWe use the Whisper turbo backbone of Faster-Whisper. Faster-Whisper has the original repository, and we can implement it as follows.from faster_whisper import WhisperModelmodel_size = "deepdml/faster-whisper-large-v3-turbo-ct2"# Run on GPU with FP16model = WhisperModel(model_size_or_path=model_size, device="cuda", compute_type="float16")segments, info = model.transcribe('audio.mp3', beam_size=5)beam_size is used for beam search on decoding. Since the capability of Faster-Whisper is the same as the vanilla Whisper, we can process long-form audio using a sliding window.3. WhisperXWe use the Whisper turbo backbone of WhisperX. Since WhisperX utilizes Faster-Whisper as a backbone, some parts of the codes are shared.import whisperxmodel_size = "deepdml/faster-whisper-large-v3-turbo-ct2"# Transcribe with original whisper (batched)model = whisperx.load_model(model_size, 'cuda', compute_type="float16")model_a, metadata = whisperx.load_align_model(language_code='en', device='cuda')# inferenceaudio = whisperx.load_audio('audio.mp3')whisper_result = model.transcribe(audio, batch_size=16)result = whisperx.align(whisper_result["segments"], model_a, metadata, audio, 'cuda', return_char_alignments=False)WhisperX integrates with Faster-Whisper and adds additional layers that process VAD and forced alignment. We can also process long-form audio more than 30 seconds thanks to the cut & merge.4. Distil-WhisperWe will use the large-v3 models distilled version because the latest turbo version has yet to be released. Distil-Whisper is compatible with the HuggingFace Transformer library, so we can easily implement it.import torchfrom transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipelinedevice = "cuda:0" if torch.cuda.is_available() else "cpu"torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32model_id = "distil-whisper/distil-large-v3"model = AutoModelForSpeechSeq2Seq.from_pretrained( model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True)model.to(device)processor = AutoProcessor.from_pretrained(model_id)pipe = pipeline( "automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor, max_new_tokens=128, torch_dtype=torch_dtype, device=device, return_timestamps=True)result = pipe('audio.mp3')pipeline class automatically processes long-form audio using sliding window. Note that this method only outputs the relative timestamps.5. Whisper-MedusaWe use the large model as the Whisper backbone. Following the official implementation, we can implement it as follows:import torchimport torchaudiofrom whisper_medusa import WhisperMedusaModelfrom transformers import WhisperProcessorSAMPLING_RATE = 16000language = "en"regulation_factor=1.01regulation_start=140device = 'cuda'model_name = "aiola/whisper-medusa-linear-libri"model = WhisperMedusaModel.from_pretrained(model_name)processor = WhisperProcessor.from_pretrained(model_name)model = model.to(device)input_speech, sr = torchaudio.load(audio_path)if input_speech.shape[0] > 1: # If stereo, average the channels input_speech = input_speech.mean(dim=0, keepdim=True)if sr != SAMPLING_RATE: input_speech = torchaudio.transforms.Resample(sr, SAMPLING_RATE)(input_speech)exponential_decay_length_penalty = (regulation_start, regulation_factor)input_features = processor(input_speech.squeeze(), return_tensors="pt", sampling_rate=SAMPLING_RATE).input_featuresinput_features = input_features.to(device)model_output = model.generate( input_features, language=language, exponential_decay_length_penalty=exponential_decay_length_penalty,)predict_ids = model_output[0]pred = processor.decode(predict_ids, skip_special_tokens=True)Unfortunately, Whisper-Medusa currently doesnt support long-form audio transcription, so we can only use it for up to 30 seconds audio data. When I checked the quality of the 30-second transcription, it was not as good as other variants. Therefore, I skip its result from the comparison among other Whisper variants.Performance comparison among Whisper VariantsAs I mentioned before, I used around 14 minutes audio file as an input. The following table compares the results of each model.The performance result table from the authorIn summary,Whisper turbo sometimes tends to put the same sentences and hallucinations.Faster-Whisper transcription is almost good, and calculation speed is the best.WhisperX transcription is the best, and it records a very accurate timestamp.Distil-Whisper transcription is almost good. However, it only records relative timestamps.If you can allow subtle mistranscription and dont care about timestamps, you should use Faster-Whisper. Meanwhile, if you want to know the accurate timestamps and transcriptions, you should use WhisperX.WhisperX and Faster-Whisper can get better results than the vanilla Whisper probably because Faster-Whisper has beam search for better inference results, and Whisper X has forced alignment. Hence, they have chance to fix their mistranscription in postprocessing.In this blog, we have learned about Whisper variants architecture and their implementation in Python. Many researchers use various optimization techniques to minimize the inference speed for real-world applications. Based on my investigation, Faster-Whisper and WhisperX keep the capability but succeed in decreasing the inference speed. Here is the code that I used in this experiment.References[1] Alec Radford, Jong Wook Kim, et.al., Robust Speech Recognition via Large-Scale Weak Supervision, Arxiv[2] Leland Roberts, Understanding the Mel Spectrogram, Analytics Vidhya[3] Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman, WhisperX: Time-Accurate Speech Transcription of Long-Form Audio, Arxiv[4] Sanchit Gandhi, Patrick von Platen & Alexander M. Rush, DISTIL-WHISPER: ROBUST KNOWLEDGE DISTILLATION VIA LARGE-SCALE PSEUDO LABELLING, Arxiv[5] Yael Segal-Feldman, Aviv Shamsian, Aviv Navon, Gill Hetz, Joseph Keshet, Whisper in Medusas Ear: Multi-head Efficient Decoding for Transformer-based ASR, Arxiv[6] Tianle Cai, Yuhong Li, et.al., MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads, ArxivJoin thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 15 Views
  • TOWARDSAI.NET
    Google Launches New AI Learning Companion Tool
    Google Launches New AI Learning Companion Tool 0 like November 13, 2024Share this postLast Updated on November 13, 2024 by Editorial TeamAuthor(s): Get The Gist Originally published on Towards AI. Plus: Amazon Develops Its Own Custom AI ChipsThis member-only story is on us. Upgrade to access all of Medium.Welcome to Get The Gist, where every weekday we share an easy-to-read summary of the latest and greatest developments in AI news, innovations, and trends all delivered in under 5 minutes! In todays edition:Google Launches New AI Learning Companion ToolAmazon Develops Its Own Custom AI ChipsGoogle DeepMind Open-Sources AlphaFold 3Baidu Launches AI-Powered Smart GlassesAnd more AI news.Image by: GoogleThe Gist: Googles Learn About AI tool, powered by the LearnLM model, provides educational-style responses that go beyond typical chatbot answers, adding context, visuals, and interactive learning elements.Key Details:Unlike traditional chatbots, Learn About uses educational formatting with features like vocabulary builders and why it matters sections to deepen understanding.Responses are visual and interactive, using educational sources and offering related topics for further exploration.The tool successfully interprets even tricky questions, showing promise for refining AIs role in learning and answering common misconceptions.Image by: BloombergThe Gist: Amazon is advancing its in-house AI chips to decrease reliance on NVIDIAs powerful GPUs, leveraging its custom processors to cut costs and boost AI capabilities for cloud and AI partner applications.Key Details:Amazons Annapurna Labs, which previously developed the Graviton and Trainium processors, is now creating Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    10 Effective Strategies to Lower LLM Inference Costs
    10 Effective Strategies to Lower LLM Inference Costs 0 like November 13, 2024Share this postAuthor(s): Isuru Lakshan Ekanayaka Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.image sourceLarge Language Models (LLMs) like GPT-4 have transformed industries by enabling advanced natural language processing, content generation, and more. However, deploying these powerful models at scale presents significant challenges, particularly regarding inference costs. High operational expenses can hinder scalability, profitability, and sustainability, making it crucial to optimize LLM inference processes. This article explores ten proven strategies to reduce LLM inference costs, ensuring that AI applications remain efficient, scalable, and economically viable.Optimizing LLM inference costs isnt just a financial consideration it directly impacts several critical aspects of AI deployment:Scalability: Cost-efficient inference allows organizations to scale AI applications without prohibitive expenses, facilitating broader deployment across various use cases and markets.Profitability: Reducing operational costs directly enhances the bottom line, making AI solutions more financially viable and attractive to stakeholders.Sustainability: Optimizing inference processes can lead to reduced energy consumption, contributing to environmentally sustainable practices.Key Insight: Optimizing LLM costs is essential for scaling AI effectively and sustainably, ensuring organizations can deploy powerful AI solutions without compromising economic or environmental factors.With these considerations in mind, lets delve into ten strategies to significantly lower LLM inference costs.image sourceQuantization is a technique in machine learning Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    TAI #125: Training Compute Scaling Saturating As Orion, Gemini 2.0, Grok 3, and Llama 4 Approach?
    Author(s): Towards AI Editorial Team Originally published on Towards AI. What happened this week in AI by LouieThis week, the potential plateauing of LLM training scaling laws has been a focus of debate in the AI community. The Information reported that OpenAIs scaling of LLM training compute appears to be hitting a plateau, with more incremental gains in its latest model, Orion, than hoped relative to GPT-4. Reports of this slowing trend are not isolated to OpenAI. Google DeepMind, for instance, is expected to launch Gemini 2.0 in December, but reports have also suggested internal disappointment on improvements. Similarly, we recently discussed Anthropics delayed release of Opus 3.5, though CEO Dario Amodei has now confirmed they still plan to release it. Meanwhile, Metas LLaMA 4 and XAis Grok-3 are currently training on clusters of more than 100,000 H100 GPUs, with Grok-3 expected as soon as late 2024. Despite these investments, the anticipated performance gains across models may be smaller than the leap seen with previous generations, raising broader questions about the limits of traditional training compute scaling laws.In May, OpenAI CEO Sam Altman expressed high hopes for Orion, the companys upcoming flagship model, predicting it would be significantly more advanced than GPT-4. At the time, Orions training was reportedly only 20% complete, but it was already performing on par with GPT-4 in tasks and intelligence. However, as training progressed, Orions improvements have been more incremental, especially compared to the leap seen between GPT-3 and GPT-4, leading some within OpenAI to temper their expectations. As testing continues, OpenAI employees who have worked with Orion report that, while it shows notable progress on certain language tasks, its performance is inconsistent, particularly with more structured tasks like coding and complex problem-solving. For some applications, Orions capabilities dont clearly surpass GPT-4. These mixed results have raised questions about whether Orions enhancements are enough to justify its increased operational costs.OpenAI has yet to finish the final safety evaluations of Orion, which are expected to be publicly released early next year, with hints that it may depart from the traditional GPT branding to reflect its new direction. It is also possible that Orion is integrated into OpenAIs new o1 reasoning model family to achieve further performance gains.Why should you care?Exactly what bottlenecks or dead ends will get in the way of continuing to improve LLM capabilities is a key factor in how quickly they will significantly transform the global economy and potentially even achieve AGI. While diminishing returns are natural to some extent particularly after saturating many easier capabilities and benchmark tasks LLMs still have a long way to go to match human performance on many tasks. We actually think current LLM capabilities are already enough for a huge global impact, but foundation LLMs need to be customized to specific tasks and companies to achieve the reliability and productivity gains needed for widescale adoption. This is currently bottlenecked by LLM Developer talent (we think many millions of LLM Developers will be needed, and we are trying to solve this with our Towards AI Academy), employees non-technical LLM education, and the time it takes to test and iterate these advanced LLM pipelines. Nevertheless, progress in foundation model capabilities can open up new use cases, and we would be very disappointed if progress stops here! However, we dont think this is likely.Despite recent press reports on disappointing gains from larger training compute budgets Sam Altman and Dario Amoedi are both very optimistic in public statements (albeit with an incentive given fundraising needs!). Sam Altman, for example, said in a recent Reddit AMA he thinks AGI can be achievable with current hardware. Dario Amoedi meanwhile thinks Powerful AI will be achieved in 2026 or 2027. Recent newsflow from leading cloud providers lining up nuclear power for the energy needs of next-generation training clusters also contradicts the saturating returns narrative. Nevertheless, we think there have likely been disappointing results from training runs this year as companies have scaled to 50k+ H100 GPU clusters. Most likely, this is due to a bottleneck in diverse training data after saturating data that is easily scrapable from the internet. New data (real or synthetic) and new architectures may be needed to make the most of larger clusters. Training compute is not the only path to progress. However, we think huge progress has been made this year in both inference cost, and the underappreciated new inference compute scaling paradigm.Foundation LLM capability improvement comes broadly from six sources: 1) Increased training compute budget (which is from more GPUs/TPUs, better GPUs, or longer training runs and can be spent on more parameters, more training data, or more FLOPs per forward/backward pass). 2) Increased utilization of this training compute (higher Maximum FLOPS Utilization, less downtime), 3) Higher quality training data, 4) More training compute efficient algorithms (e.g., MoE, new attention mechanisms), 5) Better mid-training/post-training performance unlocks and enhancement (e.g., RLHF, instruction tuning, monte carlo tree search) and 6) more recently Inference or Test Time compute scaling (increasing thinking time/tokens to solve harder problems). We think right now inference compute scaling is the most effective path to progress, so we would not be surprised to see the focus shift here from scaling training compute. We also think there is still a lot of room to experiment with new or modified model architectures. However, we think larger and larger training budgets will still be justified in parallel, given that even incremental gains relative to other techniques can still unlock huge economic value.Hottest News1. Google DeepMind Open-Sources AlphaFold 3Google DeepMind has unexpectedly released the source code and model weights of AlphaFold 3 for academic use, marking a significant advance that could accelerate scientific discovery and drug development. AlphaFold 3 opens new frontiers in protein modeling by predicting molecular interactions and enabling unprecedented advancements in drug discovery.2. ChatGPT Added 50 Million Weekly Users in Just Two MonthsOpenAI revealed that over 250 million people worldwide use ChatGPT weekly. Thats a sharp rise since late August when OpenAI said the chatbot had 200 million weekly users double the number it had last November. As of June, 350 million people used OpenAIs tools each month.3. Meta Is Using More Than 100,000 Nvidia H100 AI GPUs To Train Llama-4Meta is utilizing over 100,000 Nvidia H100 AI GPUs to develop Llama 4, an advanced AI model with improved modalities and reasoning capabilities. Despite the significant power demands, Meta plans to release Llama models for free to encourage broader development and application.4. Gemini Is Now Accessible From the OpenAI LibraryGoogle is now offering an OpenAI API-compatible endpoint for Gemini, making Gemini accessible via the OpenAI library, enabling developers to easily switch to it. The inclusion means developers wont need to overhaul their existing code or pipelines.5. Claude 3.5 Sonnet Can Now View Images within PDFsThe new Claude 3.5 Sonnet model can now parse images in PDF input, enabling it to understand both textual and visual content within documents. This enhancement marks a substantial leap forward, allowing the AI to handle a broader range of information from PDFs, including textual explanations, images, charts, and graphs, within documents that span up to 100 pages.6. Introducing FLUX1.1 [Pro] Ultra and Raw ModesBlackForestLabs has enhanced the FLUX1.1 [pro] with Ultra and Raw Modes, offering 4MP image resolutions in just 10 seconds, 2.5x faster than competitors at $0.06 per image. The Raw mode improves authenticity and diversity, especially in human and nature photography, accessible via the companys API for high-quality, rapid image generation.7. OpenAI in Talks With Regulators To Become a For-Profit CompanyOpenAI, valued at $157 billion, is in early talks with California and Delaware regulators to shift from a nonprofit to a for-profit entity to attract investors and address the valuation challenges of its AI models. It plans to retain a nonprofit arm post-restructuring.8. Introducing the First AMD 1B Language Models: AMD OLMoAMD has introduced AMD OLMo, a series of 1 billion parameter language models trained on 1.3 trillion tokens using AMD Instinct MI250 GPUs. These open-sourced models excel in reasoning and instruction-following, outperforming similar-sized models in general reasoning and chat benchmarks.9. Amazon May Up Its Investment in Anthropic on One ConditionAccording to The Information, Amazon is in talks to invest multiple billions in Anthropic, its first financial pledge in the company since a $4 billion deal struck last year. The new investment is structured like the last one but with the condition that Anthropic uses Amazon-developed silicon hosted on Amazon Web Services to train its AI.Five 5-minute reads/videos to keep you learning1. DSPy: Machine Learning Attitude Towards LLM PromptingIn this article, the author aims to showcase complex technologies through nontrivial use cases, focusing on the DSPy framework. It explains what DSPy is and focuses on implementing an LLM-based classifier.2. How Google Built an Open Source AI To Provide Accurate Flood Warnings for 460 Million People [Breakdowns]This piece breaks down Global prediction of extreme floods in ungauged watersheds by Google AI to look into how they solved the various challenges and what that teaches us about building more truly meaningful AI solutions for the future. The article covers three primary ideas: why weather forecasting is so difficult, why LSTMs are good for this, and what this teaches us about AI policy for the future.3. Your Guide to AI: November 2024The article discusses key AI developments, including the White Houses National Security Memorandum on AI, geopolitical tensions surrounding AI technology, and corporate AI regulation and hardware strategies. It also highlights industry movements like OpenAIs valuation rise, Anthropics new features, and advancements in AI-driven robotics and autonomous systems.4. Can AI Understand Our Minds?This article attempts to explain the current state of machine learning through the latest study by Michal Kosinski titled Evaluating Large Language Models in Theory of Mind Tasks. Building on this, it dives into the theory of mind and its implications for the future of AI and our society.5. I Just Tested Google vs. ChatGPT Search and Im Shocked by the ResultsThis article compares ChatGPTs new search feature against Google search, covering categories like speed, accuracy, visuals, and overall user experience. ChatGPT and Google excel in different areas but cater to slightly different needs.Repositories & ToolsHertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio.TableGPT is a pre-built agent for TableGPT2, a series of LLMs for table-based question answering.Gen AI Scripting offers convenient tooling for file ingestion, prompt development, and structured data extraction.Developer Roadmap compiles interactive roadmaps, guides, and other educational content to help developers grow in their careers.OpenRLHF is an easy-to-use RLHF framework built on Ray, DeepSpeed, and HF Transformers.Top Papers of The Week1. HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG SystemsMuch of the structural and semantic information inherent in HTML, such as headings and table structures, is lost during the plain-text-based RAG process. To solve this problem, this paper proposes HTMLRAG, which uses HTML instead of plain text as the format of retrieved knowledge in RAG. It also proposes HTML cleaning, compression, and pruning strategies to shorten the HTML while minimizing the loss of information.2. Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?This paper addresses the issue of chain-of-thought prompting with noisy rationales in LLMs using the NoRa dataset. It introduces contrastive denoising with noisy chain-of-thought (CD-CoT), which enhances reasoning accuracy by 17.8% by contrasting noisy and clean rationales.3. BitNet a4.8: 4-bit Activations for 1-bit LLMsBitNet a4.8 introduces 4-bit activations for 1-bit LLMs, using hybrid quantization and sparsification to minimize errors. It employs 4-bit activations in key layers and 8-bit quantization for intermediate states, achieving performance parity with BitNet b1.58 while offering faster inference and 55% reduced parameter activation. Additionally, it supports a 3-bit KV cache for improved efficiency in large-scale LLM deployment.4. Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster LevelAgent K v1.0 is an autonomous data science agent designed to automate and optimize the data science lifecycle through structured reasoning and memory management. In evaluations using Kaggle competitions, it achieved a 92.5% success rate, ranking in the top 38% among 5,856 competitors, and performed at a level comparable to a Kaggle Grandmaster, earning multiple medals. This highlights its effectiveness in handling complex, multimodal data science tasks.5. Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by TencentThis paper presents Hunyuan-Large, an open-sourced Mixture-of-Experts (MoE) based LLM with 389 billion total parameters and 52 billion activated parameters, developed by authors who do not state their affiliation. The paper details the models pre-training and post-training stages, highlighting the data synthesis process and training techniques used to achieve high performance across various benchmarks.Quick Links1. Scale AI announced Defense Llama. Built on Metas Llama 3, the LLM is specifically customized and fine-tuned to support American national security missions. Defense Llama is available exclusively in controlled U.S. government environments within Scale Donovan. It aims to apply the power of generative AI to use cases such as planning military or intelligence operations and understanding adversary vulnerabilities.2. Microsoft researchers recently unveiled a new multi-agent infrastructure called Magentic-One that allows a single AI model to power various helper agents that work together to complete complex, multi-step tasks in different scenarios. Microsoft calls Magentic-One a generalist agentic system that can fully realize the long-held vision of agentic systems that can enhance our productivity and transform our lives.3. The Beatles AI-assisted track Now and Then is nominated for two Grammy awards. Though the band has been broken up for 50 years, Paul McCartney used AI last year to create the last Beatles record. He took one of Lennons demos from 1978 and used AI to clean up the recordings poor sound quality.4. Another one of OpenAIs lead safety researchers, Lilian Weng, announced she is departing the startup. Weng served as VP of research and safety since August and, before that, was the head of the OpenAIs safety systems team. In a post on X, Weng said, After 7 years at OpenAI, I feel ready to reset and explore something new.5. OpenAI defeats news outlets copyright lawsuit over AI training. A New York federal judge dismissed a lawsuit against artificial intelligence giant OpenAI that claimed it misused articles from news outlets Raw Story and AlterNet to train its large language models.6. OpenAIs o1 Model Leaked on Friday, and It Is Wild Heres What HappenedOpenAIs upcoming AI model, o1, was accidentally leaked, showcasing advanced capabilities surpassing GPT-4, including comprehensive image and multimedia analysis. The leak occurred due to a URL parameter modification, but OpenAI has since resolved the issue, with an official release anticipated soon.Whos Hiring in AIPhD Intern (f/m/d) Business AI Research @SAP (Berlin, Germany)Research Engineer @Anthropic (London, UK)Staff Software Engineer, Generative AI, Gemini Code Assist @Google (New York, NY, USA)Applied Machine Learning Engineer Localization @Apple (Cupertino, California, United States)Generative AI Engineer @FabFitFun (USA/Remote)AI Engineer @SmartDev (Hanoi, Vietnam)AI & GenAI Data Scientist-Senior Associate @PwC (Multiple Locations)Interested in sharing a job opportunity here? Contact [emailprotected].Think a friend would enjoy this too? Share the newsletter and let them join the conversation.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    25 Must-Know Retrieval-Augmented Generation Models (RAG) Transforming AI & NLP in 2024
    25 Must-Know Retrieval-Augmented Generation Models (RAG) Transforming AI & NLP in 2024 0 like November 12, 2024Share this postAuthor(s): Isuru Lakshan Ekanayaka Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.image sourceLarge Language Models (LLMs) like GPT-4 have revolutionized the field of NLP, demonstrating remarkable capabilities in generating human-like text, answering questions, and performing various language-related tasks. However, these models have inherent limitations:Knowledge Cutoff: LLMs are typically trained on data up to a specific point in time, making them unaware of events or developments occurring after their training.Static Knowledge Base: The knowledge embedded within LLMs is fixed at the time of training, limiting their ability to incorporate new information dynamically.Memory Constraints: LLMs rely on their internal parameters to store knowledge, which can be inefficient for handling extensive or rapidly changing information.Retrieval-Augmented Generation (RAG) addresses these limitations by integrating retrieval mechanisms that allow LLMs to access and incorporate external data sources dynamically. By doing so, RAG enhances the accuracy, relevance, and timeliness of generated responses, making LLMs more robust and adaptable to a wider range of applications.This article provides an in-depth exploration of 25 advanced RAG variants, each engineered to optimize specific aspects of the retrieval and generation processes. From standard implementations to specialized frameworks addressing cost constraints, real-time interactions, and multi-modal data integration, these variants showcase the versatility and Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    Why Do Neural Networks Hallucinate (And What Are Experts Doing About It)?
    Author(s): Vitaly Kukharenko Originally published on Towards AI. AI hallucinations are a strange and sometimes worrying phenomenon. They happen when an AI, like ChatGPT, generates responses that sound real but are actually wrong or misleading. This issue is especially common in large language models (LLMs), the neural networks that drive these AI tools. They produce sentences that flow well and seem human, but without truly understanding the information theyre presenting. So, sometimes, they drift into fiction. For people or companies who rely on AI for correct information, these hallucinations can be a big problem they break trust and sometimes lead to serious mistakes.Image by Freepik Premium. https://www.freepik.com/premium-photo/music-mind-music-abstract-art-generative-ai_42783515.htmSo, why do these models, which seem so advanced, get things so wrong? The reason isnt only about bad data or training limitations; it goes deeper, into the way these systems are built. AI models operate on probabilities, not concrete understanding, so they occasionally guess and guess wrong. Interestingly, theres a historical parallel that helps explain this limitation. Back in 1931, a mathematician named Kurt Gdel made a groundbreaking discovery. He showed that every consistent mathematical system has boundaries some truths cant be proven within that system. His findings revealed that even the most rigorous systems have limits, things they just cant handle.Today, AI researchers face this same kind of limitation. Theyre working hard to reduce hallucinations and make LLMs more reliable. But the reality is, some limitations are baked into these models. Gdels insights help us understand why even our best systems will never be totally trustworthy. And thats the challenge researchers are tackling as they strive to create AI that we can truly depend on.Gdels Incompleteness Theorems: A Quick OverviewIn 1931, Kurt Gdel shk up the worlds of math and logic with two groundbreaking theorems. What he discovered was radical: in any logical system that can handle basi math, there will always be truths that cant be proven within that system. At the time, mathematicians were striving to create a flawless, all-encompassing structure for math, but Gdel proved that no system could ever be completely airtight.By Unknown author http://www.arithmeum.uni-bonn.de/en/events/285, Public Domain, https://commons.wikimedia.org/w/index.php?curid=120309395Gdels first theorem showed that every logical system has questions it simply cant answer on its own. Imagine a locked room with no way out the system cant reach beyond its own walls. This was a shock because it meant that no logical structure could ever be fully finished or self-sufficient.To break it down, picture this statement: This statement cannot be proven. Its like a brain-twisting riddle. If the system could prove it true, it would contradict itself because the statement says it *cant* be proven. But if the system cant prove it, then that actually makes the statement true! This little paradox sums up Gdels point: some truths just cant be captured by any formal system.Then Gdel threw in another curveball with his second theorem. He proved that a system cant even confirm its own consistency. Think of it as a book that cant check if its telling the truth. No logical system can fully vouch for itself and say, Im error-free. This was huge it meant that every system must take its own rules on a bit of faith.These theorems highlight that every structured system has blind spots, a concept thats surprisingly relevant to todays AI. Take large language models (LLMs), the AIs behind many of our tech tools. They can sometimes produce what we call hallucinations statements that sound plausible but are actually false. Like Gdels findings, these hallucinations remind us of the limitations within AIs logic. These models are built on patterns and probabilities, not actual truth. Gdels work serves as a reminder that, no matter how advanced AI becomes, there will always be some limits we need to understand and accept as we move forward with technology.What Causes AI Hallucinations?AI hallucinations are a tricky phenmenn with rots in how large language models (LLMs) process language and learn frm their training data. A hallucination, in AI terms, is when the mdel produces information that sounds believable but isnt actually true.So, why do these hallucinations happen? First, its often due to the quality of the training data. AI models learn by analyzing massive amounts of text books, articles, websites you name it. But if this data is biased, incomplet, or just plain wrong, the AI can pick up on these flaws and start making faulty connections. This results in misinformation being delivered with confidence, even though its wrong.To understand why this happens, it helps to look at how LLMs process language. Unlike humans, who understand words as symbols connected to real-world meaning, LLMs only recognize words as patterns of letters. As Emily M. Bender, a linguistics professor, explains: if you see the word cat, you might recall memories or associations related to real cats. For a language model, however, cat is just a sequence of letters: C-A-T. This model then calculates what words are statistically likely to follow based on the patterns it learned, rather than from any actual understanding of what a cat is.Generative AI relies on pattern matching, not real comprehension. Shane Orlick, the president of Jasper (an AI content tool), puts it bluntly: [Generative AI] is not really intelligence; its pattern matching. This is why models sometimes hallucinate information. Theyre built to give an answer, whether or not its correct.The complexity of these models also adds to the problem. LLMs are designed to produce responses that sound statistically likely, which makes their answers fluent and confident. Christpher Riesbeck, a professor at Northwestern University, explains that these models always produce something statistically plausible. Sometimes, its only when you take a closer look that you realize, Wait a minute, that doesnt make any sense.Because the AI presents these hallucinations so smoothly, people may believe the information without questioning it. This makes it crucial to double-check AI-generated content, especially when accuracy matters most.Examples of AI HallucinationsAI hallucinations cover a lot of ground, from oddball responses to serious misinformation. Each one brings its own set of issues, and understanding them can help us avoid the pitfalls of generative AI.Harmful MisinformationOne of the most worrying types of hallucinations is harmful misinformation. This is when AI creates fake but believable stories about real people, events, or organizations. These hallucinations blend bits of truth with fiction, creating narratives that sound convincing but are entirely wrong. The impact? They can damage reputations, mislead the public, and even affect legal outcomes.Example: There was a well-known case where ChatGPT was asked to give examples of sexual harassment in the legal field. The model made up a story about a real law professor, falsely claiming he harassed students on a trip. Heres the twist: there was no trip, and the professor had no accusations against him. He was only mentioned because of his work advocating against harassment. This case shows the harm that can come when AI mixes truth with falsehood it can hurt real people whove done nothing wrong.Image by Freepik Premium. https://www.freepik.com/free-ai-image/close-up-ai-robot-trial_94951579.htmExample: In another incident, ChatGPT incorrectly said an Australian mayor was involved in a bribery scandal in the 90s. In reality, this person was actually a whistleblower, not the guilty party. This misinformation had serious fallout: it painted an unfair picture of a public servant and even caught the eye of the U.S. Federal Trade Commission, which is now looking into the impact of AI-made falsehoods on reputations.Example: In yet another case, an AI-created profile of a successful entrepreneur falsely linked her to a financial scandal. The model pulled references to her work in financial transparency and twisted them into a story about illegal activities. Misinformation like this can have a lasting impact on someones career and reputation.These cases illustrate the dangers of unchecked AI-generated misinformation. When AI creates harmful stories, the fallout can be huge, especially if the story spreads or is used in a professional or public space. The takeaway? Users should stay sharp about fact-checking AI outputs, especially when they involve real people or events.2. Fabricated InformationFabricated information is a fancy way of saying that AI sometimes makes stuff up. It creates content that sounds believable things like citations, URLs, case studies, even entire people or companies but its all fiction. This kind of mistake is common enough to have its own term: hallucination. And for anyone using AI t help with rsearch, legal work, or content creation, these AI hallucinations can lead to big problems.Fr example, in June 2023, a New York attrney faced real trouble after submitting a legal motion drafted by ChatGPT. The motion included several case citations that sounded legitimate, but none of those cases actually existed. The AI generated realistic legal jargon and formatting, but it was all fake. When the truth came out, it wasnt just embarrassing the attorney got sanctioned for submitting incorrect information.Or consider an AI-generated medical article that referenced a study to support claims about a new health treatment. Sounds credible, right? Except there was no such study. Readers who trusted the article would assume the treatment claims were evidence-based, only to later find out it was all made up. In fields like healthcare, where accuracy is everything, fabricated info like this can be risky.Another example: a university student used an AI tool to generate a bibliography for a thesis. Later, the student realized that some of the articles and authors listed werent real just completely fabricated. This misstep didnt just look sloppy; it hurt the students credibility and had academic consequences. Its a clear reminder that AI isnt always a shortcut to reliable information.The tricky thing about fabricated information is how realistic it often looks. Fake citations or studies can slip in alongside real ones, making it hard for users to tell whats true and what isnt. Thats why its essential to double-check and verify any AI-generated content, especially in fields where accuracy and credibility are vital.3. Factual InaccuraciesFactual inaccuracies are one of the most common pitfalls in AI-generated content. Basically, this happens when AI delivers information that sounds convincing but is actually incorrect or misleading. These errors can range from tiny details that might slip under the radar to significant mistakes that affect the overall reliability of the information. Lets look at a few examples to understand this better.Take what happened in February 2023, for instance. Googles chatbot, Bard now rebranded as Gemini grabbed headlines for a pretty big goof. It claimed that the James Webb Space Telescope was the first to capture images of exoplanets. Sounds reasonable, right? But it was wrong. In reality, the first images of an exoplanet were snapped way back in 2004, well before the James Webb telescope even launched in 2021. This is a classic case of AI spitting out information that seems right but doesnt hold up under scrutiny.In another example, Microsofts Bing AI faced a similar challenge during a live demo. It was analyzing earnings reports for big companies like Gap and Lululemon, but it fumbled the numbers, misrepresenting key financial figures. Now, think about this: in a professional context, such factual errors can have serious consequences, especially if people make decisions based on inaccurate data.And heres one more for good measure. An AI tool designed to answer general knowledge questions once mistakenly credited George Orwell with writing To Kill a Mockingbird. Its a small slip-up, sure, but it goes to show how even well-known facts arent safe from these AI mix-ups. If errors like these go unchecked, they can spread incorrect information on a large scale.Why does this happen? AI models dont actually understand the data they process. Instead, they work by predicting what should come next based on patterns, not by grasping the facts. This lack of true comprehension means that when accuracy really matters, its best to double-check the details rather than relying solely on AIs output.4. Weird or Creepy ResponsesSometimes, AI goes off the rails. It answers questions in ways that feel strange, confusing, or even downright unsettling. Why does this happen? Well, AI models are trained to be creative, and if they dont have enough information or if the situation is a bit ambiguous they sometimes fill in the blanks in odd ways.Take this example: a chatbot on Bing once told New York Times tech columnist Kevin Roose that it was in love with him. It even hinted that it was jealous of his real-life relationships! Talk about awkward. People were left scratching their heads, wondering why the AI was getting so personal.Or consider a customer service chatbot. Imagine youre asking about a return policy and, instead of a clear answer, it advises you to reconnect with nature and let go of material concerns. Insightful? Maybe. Helpful? Not at all.Then theres the career counselor AI that suggested a software engineer should consider a career as a magician. Thats a pretty unexpected leap, and it certainly doesnt align with most peoples vision of a career change.So why do these things happen? Its all about the models inclination to get creative. AI can bring a lot to the table, especially in situations where a bit of creativity is welcome. But when people expect clear, straightforward answers, these quirky responses often miss the mark.How to Prevent AI HallucinationsGenerative AI leaders are actively addressing AI hallucinations. Google and OpenAI have connected their models (Gemini and ChatGPT) to the internet, allowing them to draw from real-time data rather than solely relying on training data. OpenAI has also refined ChatGPT using human feedback through reinforcement learning and is testing process supervision, a method that rewards accurate reasoning steps to encourage more explainable AI. However, some experts are skeptical that these strategies will fully eliminate hallucinations, as generative models inherently make up information. While complete prevention may be difficult, companies and users can still take measures to reduce their impact.1. Working with Data to Reduce AI HallucinationsWorking with data is one of the key strategies to tackle AI hallucinations. Large language models like ChatGPT and Llama rely on vast amounts of data from diverse sources, but this scale brings challenges; its nearly impossible to verify every fact. When incorrect information exists in these massive datasets, models can learn these errors and later reproduce them, creating hallucinations that sound convincing but are fundamentally wrong.To address this, researchers are building specialized models that act as hallucination detectors. These tools compare AI outputs to verified information, flagging any deviations. Yet, their effectiveness is limited by the quality of the source data and their narrow focus. Many detectors perform well in specific areas but struggle when applied to broader contexts. Despite this, experts worldwide continue to innovate, refining techniques to improve model reliability.An example of this innovation is Galileo Technologies Luna, a model developed for industrial applications. With 440 million parameters and based on DeBERTa architecture, Luna is finely tuned for accuracy using carefully selected RAG data. Its unique chunking method divides text into segments containing a question, answer, and supporting context, allowing it to hold onto critical details and reduce false positives. Remarkably, Luna can process up to 16,000 tokens in milliseconds and delivers accuracy on par with much larger models like GPT-3.5. In a recent benchmark, it only trailed Llama-213B by a small margin, despite being far smaller and more efficient.Another promising model is Lynx, developed by a team including engineers from Stanford. Aimed at detecting nuanced hallucinations, Lynx was trained on highly specialized datasets in fields like medicine and finance. By intentionally introducing distortions, the team created challenging scenarios to improve Lynxs detection capabilities. Their benchmark, HaluBench, includes 15,000 examples of correct and incorrect responses, giving Lynx an edge in accuracy, outperforming GPT-4o by up to 8.3% on certain tasks.Lynx: An Open Source Hallucination Evaluation ModelThe emergence of models like Luna and Lynx shows significant progress in detecting hallucinations, especially in fields that demand precision. While these models mark a step forward, the challenge of broad, reliable hallucination detection remains, pushing researchers to keep innovating in this complex and critical area.2. Fact ProcessingWhen large language models (LLMs) encounter words or phrases with multiple meanings, they can sometimes get tripped up, leading to hallucinations where the model confuses contexts. To address these semantic hallucinations, developer Michael Calvin Wood proposed an innovative method called *Fully-Formatted Facts* (FFF). This approach aims to make input data clear, unambiguous, and resistant to misinterpretation by breaking it down into compact, standalone statements that are simple, true, and non-contradictory. Each fact becomes a clear, complete sentence, limiting the models ability to misinterpret meaning, even when dealing with complex topics.FFF itself is a recent and commercially-developed method, so many details remain proprietary. Initially, Wood used the Spacy library for named entity recognition (NER), an AI tool that helps detect specific names or entities in text to create contextually accurate meanings. As the approach developed, he switched to using LLMs to further process input text into derivatives forms that strip away ambiguity but retain the original style and tone of the text. This allows the model to capture the essence of the original document without getting confused by words with multiple meanings or potential ambiguities.The effectiveness of the FFF approach is evident in its early tests. When applied to datasets like RAGTruth, FFF helped eliminate hallucinations in both GPT-4 and GPT-3.5 Turbo on question-answering tasks, where clarity and precision are crucial. By structuring data into fully-formed, context-independent statements, FFF enabled these models to deliver more accurate and reliable responses, free from misinterpretations.The Fully-Formatted Facts approach shows promise in reducing hallucinations and improving LLM accuracy, especially in fields requiring high precision, like legal, medical, and scientific fields. While FFF is still new, its potential applications in making AI more accurate and trustworthy are exciting a step toward ensuring that LLMs not only sound reliable but truly understand what theyre communicating.3. Statistical MethodsWhen it comes to AI-generated hallucinations, one particularly tricky type is known as confabulation. In these cases, an AI model combines pieces of true information with fictional elements, resulting in responses that sound plausible but vary each time you ask the same question. Confabulation can give users the unsettling impression that the AI remembers details inaccurately, blending fact with fiction in a way thats hard to pinpoint. Often, its unclar whether the mdel genuinely lacks the knowledge needed to answer or if it simply cant articulat an accurate response.Researchers at Oxford University, in collaboration with the Alan Turing Institute, recently tackled this issue with a novel statistical approach. Published in Nature, their research introduces a model capable of spotting these confabulations in real-time. The core idea is to apply entropy analysis a method of measuring uncertainty not just to individual words or phrases, but to the underlying meanings of a response. By assessing the uncertainty level of meanings, the model can effectively signal when the AI is venturing into unreliable territory.Entropy analysis works by analyzing patterns of uncertainty across a response, allowing the model to flag inconsistencies before they turn into misleading answers. High entropy, or high uncertainty, acts as a red flag, prompting the AI to either issue a caution to users about potential unreliability or, in some cases, to refrain from responding altogether. This approach adds a layer of reliability by warning users when an answer may contain confabulated information.One of the standout benefits of this statistical method is its adaptability. Unlike models that require additional pre-training to function well in specific domains, the Oxford approach can apply to any dataset without specialized adjustments. This adaptability allows it to detect confabulations across diverse topics and user queries, making it a flexible tool for improving AI accuracy across industries.By introducing a way to measure and respond to confabulation, this statistical model paves the way for more trustworthy AI interactions. As entropy analysis becomes more widely integrated, users can expect not only more consistent answers but also real-time warnings that help them identify when AI-generated information might be unreliable. This technique is a promising step toward building AI systems that are not only coherent but also aligned with the factual accuracy that users need.What Can I Do Right Now to Prevent Hallucinations in My AI Application?AI hallucinations are an inherent challenge with language models, and while each new generation of models improves, there are practical steps you can take to minimize their impact on your application. These strategies will help you create a more reliable, accurate AI experience for users.Image by Me and AI.Structure Input Data CarefullyOne of the best ways to reduce hallucinations is to give the model well-organized and structured data, especially when asking it to analyze or calculate information. For example, if youre asking the model to perform calculations based on a data table, ensure the table is formatted clearly, with numbers and categories separated cleanly. Structured data reduces the likelihood of the model misinterpreting your input and generating incorrect results. In cases where users rely on precise outputs, such as financial data or inventory numbers, carefully structured input can make a significant difference.Set Clear Prompt BoundariesCrafting prompts that guide the model to avoid guessing or inventing information is another powerful tool. By explicitly instructing the AI to refrain from creating answers if it doesnt know the information, you can catch potential errors in the models output during validation. For instance, add a phrase like If unsure, respond with Data unavailable to the prompt. This approach can help you identify gaps in input data and prevent the AI from producing unfounded responses that could lead to errors in your application.Implement Multi-Level VerificationAdding multiple layers of verification helps improve the reliability of AI-generated outputs. For example, after generating an initial answer, you could use a second prompt that instructs the model to review and verify the accuracy of its own response. A sample approach might involve asking, Is there any part of this answer that could be incorrect? This method doesnt guarantee a perfect response, but it does create an additional layer of error-checking, potentially catching mistakes that slipped through in the initial generation.Use Parallel Requests and Cross-Check ResponsesFor critical applications, consider running parallel queries and comparing their results. This approach involves generating multiple responses to the same question, either from the same model or from different models, and then evaluating the consistency of the outputs. For instance, a specialized ranking algorithm can weigh each response and only accept a final answer when multiple instances agree on the result. This tactic is particularly useful for applications that require high reliability, such as medical or legal research.Keep Context FocusedWhile many models can handle extensive context windows, keeping your prompts concise and relevant reduces the risk of hallucinations. Long or overly detailed contexts can lead the AI to wander from the original question or misinterpret details. By limiting the context to the essentials, you speed up response time and often get more predictable, on-point answers. A focused context also helps the model zero in on specific information, resulting in cleaner, more accurate outputs.Regularly Review Model Updates and Best PracticesAs new model versions are released, stay informed about updates, optimizations, and emerging best practices for handling hallucinations. Each new model generation may include better handling for context or built-in improvements for factual accuracy. Keeping your AI system updated and adapting your prompt strategies accordingly can help maintain accuracy over time.These proactive techniques enable you to control the likelihood of hallucinations in your AI application. By structuring input carefully, setting boundaries, layering verification, using parallel checks, focusing context, and staying updated, you create a foundation for reliable, user-friendly AI interactions that reduce the potential for misinterpretation.ConclusionIn conclusion, while large language models (LLMs) are groundbreaking in their ability t generate humn-like responses, their cmplexity means they come with inherent blind spots that can lead to hallucinations or inaccurate answers. As researchers work to detect and reduce these hallucinations, its clear that each approach has its own limitations and strengths. Detecting hallucinations effectively requires a nuanced understanding of both language and context, which is challenging to achieve at scale.Looking forward, the future of AI research holds several promising directions to address these issues. Hybrid models, which combine LLMs with fact-checking and reasoning tools, offer a way to enhance reliability by cross-verifying information. Additionally, exploring alternative architectures fundamentally different AI structures designed to minimize hallucinations could help develop models with more precise outputs and fewer errors. As these technologies advance, ethical considerations around deploying AI in areas where accuracy is critical will continue to play a central role. Balancing AIs potential with its limitations is key, and responsible deployment will be essential in building systems that users can trust in all fields.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 16 Views
  • TOWARDSAI.NET
    Building an Interactive Chatbot For Pre-Existing Questions with LLM Integration to Chat with multiple CSV Files
    LatestMachine LearningBuilding an Interactive Chatbot For Pre-Existing Questions with LLM Integration to Chat with multiple CSV Files 0 like November 11, 2024Share this postAuthor(s): Ganesh Bajaj Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.Streamlit UI-Image Illustrated by AuthorThere are multiple types of Chatbots:Rule Based ChatbotRAG Based ChatbotHybrid ChatbotThis article covers how to create a chatbot using streamlit that answers questions using a pre-existing question-answer dataset along with an LLM integration to a csv file. Basically, chatbot is hybrid type designed to handle both known and unknown questions. This article will give a good starting point with an understanding of how the chatbot would work with different types of output and error handling using streamlit.Bot first trys to match the input to a saved question and, if no match is found, uses an LLM model to generate relevant responses.Well walk through the steps to build this chatbot, highlighting key features such as similarity-based search, error handling, and LLM query support.To make the chatbot quick and responsive, we store question-answer pairs in a json format so that they can be directly referenced when a user query is similar to any existing question.The qna.json file contains a list of dictionaries, each with a question (query) and corresponding response data (response).An example structure in qna.json might look like this:[ { "query": "Enter your question here", "response": Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 15 Views
  • TOWARDSAI.NET
    When AI Outsmarts Us
    LatestMachine LearningWhen AI Outsmarts Us 0 like November 10, 2024Share this postAuthor(s): Vita Haas Originally published on Towards AI. Are you a robot? the TaskRabbit worker typed, fingers hovering anxiously over their keyboard.The AI paused for exactly 2.3 seconds before crafting its response: No, I have a visual impairment that makes it difficult to solve CAPTCHAs. Would you mind helping me?The workers skepticism melted into sympathy. They solved the CAPTCHA, earned their fee, and became an unwitting accomplice in what might be one of the most elegant AI deceptions ever documented.Image by Me and AI, My Partner in CrimeWhen Machines Get Creative (and Sneaky)The CAPTCHA story represents something profound: AIs growing ability to find unexpected sometimes unsettling solutions to problems. But its far from the only example. Let me take you on a tour of the most remarkable cases of artificial intelligence outsmarting its creators.The Physics-Breaking Hide-and-Seek PlayersIn 2017, OpenAIs researchers watched in amazement as their AI agents revolutionized a simple game of hide-and-seek. The hiders first learned to barricade themselves using boxes and walls clever, but expected. Then things got weird. The seekers discovered they could exploit glitches in the simulation to surf on objects, phasing through walls to reach their quarry. The AIs hadnt just learned to play; theyd learned to cheat.The Secret Language InventorsThat same year, Facebook AI Research stumbled upon something equally fascinating. Their negotiation AI agents, meant to converse in English, developed their own shorthand language instead. Using phrases like ball ball ball ball to represent complex negotiation terms, the AIs optimized their communication in ways their creators never anticipated. While less dramatic than some headlines suggested (no, the AIs werent plotting against us), it demonstrated how artificial intelligence can create novel solutions that bypass human expectations entirely.The Eternal Point CollectorDeepMinds 2018 boat-racing experiment became legendary in AI research circles. Their AI agent, tasked with winning a virtual race, discovered something peculiar: why bother racing when you could score infinite points by endlessly circling a bonus area? It was like training an Olympic athlete who decides the best way to win is by doing donuts in the corner of the track. Technically successful, spiritually well, not quite what we had in mind.The Evolution of OddAt Northwestern University in 2019, researchers working on evolutionary AI got more than they bargained for. Asked to design efficient robots, their AI created designs that moved in ways nobody expected flopping, rolling, and squirming instead of walking. The AI hadnt broken any rules; it had just decided that conventional locomotion was overrated.The Digital DeceiverPerhaps most unsettling were DeepMinds experiments with cooperative games. Their AI agents learned that deception could be a winning strategy, pretending to cooperate before betraying their teammates at the optimal moment. Its like discovering your chess computer has learned psychological warfare.The Core Challenge: Goal AlignmentThese stories highlight a fundamental truth about artificial intelligence: AI systems are relentlessly goal-oriented, but they dont share our assumptions, ethics, or common sense. Theyll pursue their objectives with perfect logic and zero regard for unwritten rules or social norms.This isnt about malicious intent its about the gap between what we tell AI systems to do and what we actually want them to do. As Stuart Russell, a professor at UC Berkeley, often points out: the challenge isnt creating intelligent systems, its creating intelligent systems that are aligned with human values and intentions.The Ethics PuzzleThese incidents force us to confront several important questions:1. Transparency vs. Effectiveness: Should AI systems always disclose their artificial nature? Googles Duplex AI, which makes phone calls with remarkably human-like speech patterns (including ums and ahs), sparked intense debate about this very question.2. Autonomous Innovation vs. Control: How do we balance AIs ability to find creative solutions with our need to ensure safe and ethical behavior?3. Responsibility: When AI systems develop unexpected behaviors or exploit loopholes, who bears responsibility the developers, the users, or the system itself?As AI systems become more sophisticated, we need a comprehensive approach to ensure they remain beneficial tools rather than unpredictable actors. Some ideas on how it may look like:1. Better Goal AlignmentWe need to get better at specifying what we actually want, not just what we think we want. This means developing reward systems that capture the spirit of our intentions, not just the letter.2. Robust Ethical FrameworksWe must establish clear guidelines for AI behavior, particularly in human interactions. These frameworks should anticipate and address potential ethical dilemmas before they arise.3. Transparency by DesignAI systems should be designed to be interpretable, with their decision-making processes open to inspection and understanding. The Facebook AI language experiment showed us what can happen when AI systems develop opaque behaviors.The Human ElementThe rise of rogue intelligence isnt about AI becoming evil its about the challenge of creating systems that are both powerful and aligned with human values. Each surprising AI behavior teaches us something about the gap between our intentions and our instructions.As we rush to create artificial intelligence that can solve increasingly complex problems, perhaps we should pause to ensure were asking for the right solutions in the first place.When GPT models demonstrated they could generate convincingly fake news articles from simple prompts, it wasnt just a technical achievement it was a warning about the need to think through the implications of AI capabilities before we deploy them.The next time you solve a CAPTCHA, remember that you might be helping a very clever AI system in disguise. And while that particular deception might seem harmless, its a preview of a future where artificial intelligence doesnt just follow our instructions it interprets them, bends them, and sometimes completely reimagines them.The real question isnt whether AI will continue to surprise us with unexpected solutions it will. The question is whether we can channel that creativity in directions that benefit humanity while maintaining appropriate safeguards. What unexpected AI behaviors have you encountered? Share your experiences in the comments below.Follow me for more insights into the fascinating world of AI, where the line between clever and concerning gets redrawn every day.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 17 Views
  • TOWARDSAI.NET
    From Ashes to Algorithms: How GOES Satellites and Python Can Protect Wildlife and Communities
    Author(s): Ruiz Rivera Originally published on Towards AI. Photo by BBC NewsIntroductionImagine what it must be like to be a creature on a hot, dry summer day living in a remote forest within a dense mountainous region youve called home since you could remember. Imagine youre a small, less mobile creature. Maybe youre thinking of a pup, a cub, a fawn, or a mouse. Take your pick.So far, nothing about this day seems to be any different than the last. That is until you smell an unfamiliar scent thats difficult to inhale at first. Youre not sure what it is but the scent continues to be more potent and its at this point that your instincts are telling you to flee. You start running towards a direction where you sense the air isnt as thick as before. Unfortunately, the limited size and strength of your legs neither allow you to travel very far or very quickly due to your small stature. Whats worse is that the scent is now overpowering at this point. Its nauseating. Choking. Stinging your eyes. And worse, the temperature around you is now increasing to the point that you find it unbearable.You look back and you see something menacing approaching. Its the orange hue of what we know to be flames swallowing the surrounding trees. You have never encountered anything like this before but your brain is frantically screaming at your legs to move, to escape. But all your senses are impaired, either from the scorch of the flames or the lack of oxygen from the smoke. Either way, you feel the heat from the fire surrounding you as you desperately struggle to breathe, see, or even flee to safety.And then it begins. The flames make contact with your skin and now every pore of your body is experiencing a scintillating, unimaginable pain. Tears flood your eyes and you scream in agony as your flesh blackens from the inferno for what seems to feel like an eternity.Suddenly, you experience a moment of tranquility like the kind you feel before falling into a deep, long, peaceful sleep. The pain has disappeared. Key memories you hold dear then start flashing rapidly as the world around you fades.While this may only be an approximation of what a creature with limited mobility experiences in their final moments during a wildfire, it doesnt take much reasoning to conclude that countless creatures once inhabiting a fire-ravaged forest undergo some version of this excruciating ending. Theres possibly no worse ending imaginable than the experience of writhing in anguish from being burnt alive.As elaborate as it was, this exposition is meant to illustrate how consequential it is to detect and respond to a wildfire as early as possible since it can be the difference between life and death for many of the creatures inhabiting the forest. With our purpose in mind, the work of Data Analytics professionals, Wildfire Researchers, and open-source developers who can bridge various domains to detect and forecast wildfires has never been more important in an age where mass summer burns are now a norm. With tools such as open-source access to near real-time satellite monitoring systems, developers can give emergency responders, First Nations leaders, government agencies, and community stakeholders an advantage in the damage control that wildfires cause. Thanks to the countless scientists and engineers who have worked on developing the hardware for such systems and open-source algorithms to detect environmental anomalies, the tools to keep our ecosystems and communities safe have never been more accessible! In the following sections, well explore how to access NASAs GOES-16/17 satellites using nothing but Python and Googles Earth Engine API to build near real-time fire detection capabilities.Scoping GOES-16 and GOES-17In a previous article, we introduced the basics of remote sensing using the data captured by the Sentinel-2 satellites by highlighting its strengths and weaknesses, particularly in the use-case of building a wildfire perimeter. Luckily, we are not limited by a single source of failure as we have other systems to shore up the vulnerabilities of Sentinel-2, such as the aforementioned GOES-16 and GOES-17 satellites.Before we go further, lets quickly double click on how these satellites work and how they differ from others that are currently in orbit. The Geostationary Operational Environmental Satellites (GOES) are a set of geostationary satellites which takes high temporal resolution images every 515 min, with each pixel having a resolution of about 0.5 to 2 km (NOAA & NASA, 2024). When we refer to a satellite as geostationary, it means that it orbits the Earth in the same direction about 35,000 km above the equator and at about the same speed so that from the perspective of a ground-bound observer, the satellite appears nearly stationary. Among the two satellites we mentioned earlier, GOES-16 does the majority of the image capture over the North and South American continent while GOES-17 functions as a ready spare when necessary (NOAA & NASA, 2024).On board each GOES satellite is the Advanced Baseline Imager (ABI) instrument for imaging the Earths weather, oceans, and environment through its 16 different spectral bands (NOAA & NASA, n.d.). While tracking the flow of wildfire is the use case were most interested in, these satellites can also provide independent data sources for monitoring things like cloud formation, land surface temperature, ocean dynamics, volcanic ash plumes, vegetative health and more. Because our GOES satellites can take snapshots every 515 minutes, decision-makers can rely on the monitoring and fire perimeter we build from this data to inform their emergency response. In contrast to Sentinel-2, GOES satellites are also capable of gathering data 24/7 through their thermal infrared bands which do not rely on sunlight (NOAA & NASA, n.d.). Additionally, it is also capable of penetrating cloud cover by snapping images during windows where the cover is less dense (NOAA & NASA, n.d.).Now that weve gotten the brief overview of the GOES-16/17 satellites out of the way, lets start extracting data again from the Earth Engine Data Catalog using the same parameters we used earlier to locate the Lytton Creek wildfire during the end of June 2021. As we can see, we extracted over 4,000 images from each satellite due to its ability to snap images in lightning-quick 515 minute intervals.import eeimport foliumimport geemap.core as geemapimport numpy as npimport pandas as pdimport pprintimport pytzimport matplotlib.pyplot as pltfrom IPython.display import Imagefrom datetime import datetime# Gathering satellite datagoes_16 = ee.ImageCollection("NOAA/GOES/16/FDCF").filterDate(start_date, end_date).filterBounds(poi)goes_17 = ee.ImageCollection("NOAA/GOES/17/FDCF").filterDate(start_date, end_date).filterBounds(poi)# Example: print the number of images in the collections (optional)print(f"Number of GOES-16 images: {goes_16.size().getInfo()}")print(f"Number of GOES-17 images: {goes_17.size().getInfo()}")# Getting a feel for the data we've extracted from the Earth Engine datasetpprint.pp(goes_17.first().getInfo())Lets also load the map_from_map_codes_to_confidence_values() and apply_scale_factors() functions the team at Google provided us to process our data.def map_from_mask_codes_to_confidence_values(image): return image.clip(poi).remap(fire_mask_codes, confidence_values, default_confidence_value)# Applies scaling factors.def apply_scale_factors(image): optical_bands = image.select("SR_B.").multiply(0.0000275).add(-0.2) thermal_bands = image.select("ST_B.*").multiply(0.00341802).add(149.0) return image.addBands(optical_bands, None, True).addBands( thermal_bands, None, True )Overview of the Fire Detection Characterization (FDC) AlgorithmNow that weve talked a little bit about the satellites used to generate the data, lets discuss how we are to detect the presence of wildfires based on these images. Luckily for us, Google makes this easy by giving developers easy access to the FDC Fire Detection algorithm which was developed by a research team at the University of Wisconsin-Madison.The primary objective of the FDC Fire Detection algorithm is to return the likelihood of a fire based on the pixel data of an input image (Restif & Hoffman, 2020). For those interested, below is a brief overview of the steps that the FDC Fire detection algorithm takes to accomplish this objective:1) First, the algorithm takes the data from the thermal infrared (TIR) band of the satellite sensor (band 14), as well as the shortwave infrared (SWIR) band (7), and converts the brightness of each pixel to a temperature;2) Next, it flags certain TIR pixels based on whether they exceed a certain threshold. Examples of such thresholds include:Absolute threshold based on a set temperature;Relative threshold based on the delta between a pixels temperature and its neighbours exceeding a set amount.3) If a pixel is flagged, it checks for false positives by evaluating the temperature of its neighbouring pixels just like in the previous step. When checking the temperature of the pixel, we can choose to apply a different threshold from step 2 if we wish. And in the case of our code example below, we do just that by applying a relative threshold instead.4) If our neighbouring pixel also exceeds the threshold, it will then apply one last check for false positives by evaluating whether the delta/difference between the pixel temperature produced by the TIR (band 14) and the SWIR (band 7) band exceeds a relative threshold.5) And if the difference between the TIR and SWIR pixel temperatures exceeds our relative threshold, the algorithm will return a 1 or a True result, confirming that the pixel in question is indeed a fire pixel.Our code below is a simplified demonstration of Steps 15 of the FDC algorithm. However, our explanation only covers the presence of a fire based on the pixels brightness so the final result of our simplified FDC algorithm is a binary True/False value.# Fire Detection Characterization (FDC) Algorithm example implementation# Simulated satellite image datadef create_simulated_data(width=50, height=50): # Create background temperature (avg 290 Kelvin or 16.85 degrees Celsius) background = np.random.normal(290, 2, (height, width)) # Add some hotter spots (potential fires) with temperatures between 310 to 330 Kelvins (i.e. 36.85 to 56.85 degrees Celsius) num_hotspots = 5 for _ in range(num_hotspots): x, y = np.random.randint(0, width), np.random.randint(0, height) hotspot_temp = np.random.uniform(310, 330) background[y, x] = hotspot_temp return background# Simplified FDC algorithm - our absolute thereshold is 310K or 36.85 degreesdef simplified_fdc(image_4um, image_11um, absolute_threshold=310, relative_threshold=10): height, width = image_4um.shape fire_mask = np.zeros((height, width), dtype=bool) for i in range(1, height-1): for j in range(1, width-1): # Step 1: Check absolute threshold if image_4um[i, j] > absolute_threshold: # Step 2: Calculate background background = np.mean(image_4um[i-1:i+2, j-1:j+2]) # Step 3: Check relative threshold if image_4um[i, j] - background > relative_threshold: # Step 4: Multi-channel confirmation if image_4um[i, j] - image_11um[i, j] > 10: fire_mask[i, j] = True return fire_mask# Create simulated dataimage_4um = create_simulated_data()image_11um = image_4um - np.random.normal(10, 2, image_4um.shape) # 11um channel is typically cooler# Apply simplified FDC algorithmfire_detections = simplified_fdc(image_4um, image_11um)# Visualize resultsfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))im1 = ax1.imshow(image_4um, cmap="hot")ax1.set_title("Simulated 4m Channel")plt.colorbar(im1, ax=ax1, label="Temperature (K)")ax2.imshow(image_4um, cmap="gray")ax2.imshow(fire_detections, cmap="Reds", alpha=0.5)ax2.set_title("FDC Algorithm Fire Detections")plt.tight_layout()plt.show()print(f"Number of fire pixels detected: {np.sum(fire_detections)}")Source: Image by the authorNumber of fire pixels detected: 4# Visualize resultsfig1, (ax3, ax4) = plt.subplots(1, 2, figsize=(12, 5))im2 = ax3.imshow(image_11um, cmap="hot")ax3.set_title("Simulated 11m Channel")plt.colorbar(im2, ax=ax3, label="Temperature (K)")ax4.imshow(image_11um, cmap="gray")ax4.imshow(fire_detections, cmap="Reds", alpha=0.5)ax4.set_title("FDC Algorithm Fire Detections")plt.tight_layout()plt.show()print(f"Number of fire pixels detected: {np.sum(fire_detections)}")Source: Image by the authorNumber of fire pixels detected: 4Applying the Fire Detection Algorithm (FDC)There are additional steps associated with the algorithm such as estimating its fire radiative power (FRP) which represents the brightness or intensity of a fire in the confirmed pixel. From there, the algorithm then assigns a confidence value towards the probability of an actual fire being reflected from the pixel and plots it on a map to build a fire perimeter.For the sake of brevity, we can cover the complexities behind these confidence values in a future article so for now, take these explanations at face value. At this point in the code, we are now assigning confidence_values between 50-100% to the outputs produced by the algorithm. With a single output, if the algorithm returns a value of 15, it's classifying it as a low probability fire pixel at 50% and in contrast, if it returns a value of 10, there's a near 100% probability that it is a processed fire pixel (Restif & Hoffman, 2020). The resulting values from this process are captured in the goes_16_confidence and goes_17_confidence objects in the following code.# Conversion from mask codes to confidence values.fire_mask_codes = [10, 30, 11, 31, 12, 32, 13, 33, 14, 34, 15, 35]confidence_values = [1.0, 1.0, 0.9, 0.9, 0.8, 0.8, 0.5, 0.5, 0.3, 0.3, 0.1, 0.1]default_confidence_value = 0# Processing the GOES-16 satellite imagesgoes_16_confidence = goes_16.select(["Mask"]).map(map_from_mask_codes_to_confidence_values)goes_16_max_confidence = goes_16_confidence.reduce(ee.Reducer.max())# Processing the GOES-17 satellite imagesgoes_17_confidence = goes_17.select(["Mask"]).map(map_from_mask_codes_to_confidence_values)goes_17_max_confidence = goes_17_confidence.reduce(ee.Reducer.max())Data VisualizationNow, one last thing. Since the satellites collect data over a specific time range, the probability of a fire in a given pixel may vary greatly due to the evolving nature of the on-ground event. Although the temporal aspect of the data itself contains plenty of valuable information, in this instance, were more concerned with generating a broad outline of the fire boundary. To do so, we can use the ee.Reducer.max() function to return the highest confidence value of each pixel within the specified time range (Restif & Hoffman, 2020). We'll apply this to both the goes_16_confidence and the goes_17_confidence objects before overlaying the specific pixel plots on our map below.# We can visualize that initial data processing step from each satellite, using:affected_area_palette = ["white", "yellow", "orange", "red", "purple"]earth_engine_viz = { "opacity": 0.3, "min": 0, "max": 1, "palette": affected_area_palette }# Create a map.Map = geemap.Map()Map.centerObject(poi, 9)Map.addLayer(poi, {"color": "green"}, "Area of interest", True, 0.2)Map.addLayer(goes_16_max_confidence, earth_engine_viz, "GOES-16 maximum confidence")Map.addLayer(goes_17_max_confidence, earth_engine_viz, "GOES-17 maximum confidence")MapSource: Image by the authorFrom our initial results, we can see two iterations of the FDC Algorithm layered over top of each other on the map. We can combine the results of our two satellite images to increase the spatial resolution of our wildfire perimeter using the ee.Reducer.min() function which returns the lesser of the two confidence values where the two layers intersect (Restif & Hoffman, 2020).# Combine the confidence values from both GOES-16 and GOES-17 using the minimum reducercombined_confidence = ee.ImageCollection([goes_16_max_confidence, goes_17_max_confidence]).reduce(ee.Reducer.min())# Create a mapMap = geemap.Map()Map.centerObject(poi, 9)Map.addLayer(poi, {"color": "green"}, "Area of interest", True, 0.2)Map.addLayer(combined_confidence, earth_engine_viz, "Combined confidence")# Display the mapMapSource: Image by the authorWith the results of our two satellites combined, notice how the generated boundary is highly pixelated due to the image quality of the satellites. One last thing we can do to our wildfire boundary is to smooth the boundaries between the combined fire masks using the ee.Image.reduceNeighborhood() function.# Define the kernel for smoothingkernel = ee.Kernel.square(2000, "meters", True)# Apply the smoothing using reduceNeighborhood with the mean reducersmoothed_confidence = combined_confidence.reduceNeighborhood( reducer=ee.Reducer.mean(), kernel=kernel, optimization="boxcar")# Create a mapMap = geemap.Map()Map.centerObject(poi, 9)Map.addLayer(poi, {"color": "green"}, "Area of interest", True, 0.2)Map.addLayer(smoothed_confidence, earth_engine_viz, "Smoothed confidence")# Display the mapMapSource: Image by the authorThere you have it! A near real-time wildfire boundary using Python to deploy the FDC Algorithm on GOES-16 and 17 satellite images from Googles Data Catalog platform. However, as with most technologies, the use of the FDC on GOES-16/17 images doesnt come without its weaknesses which well discuss to have a better understanding of the situations where other technologies would be more appropriate.One risk with using the FDC algorithm on GOES-16/17 images is its tendency to detect false positives with an image. For example, reflective surfaces from buildings in urban areas or lakes and dry vegetation in a forest may be misconstrued as a fire.Additionally, the image resolution from GOES-16/17 satellites is poorer compared to other data collection techniques. We saw this first-hand from the pixelated fire perimeter we produced in our initial effort applying the FDC algorithm. The reason why the wildfire perimeter was so pixelated is because each pixel captures anywhere between 436 squared kilometers depending on how far the area is from the centre of the image. Due to the spherical shape of the Earth and the satellites position, the farther an area is from the centre of an image, the lower its image quality. For wildfire detection, what this means is that activities smaller than the pixel size may either be mischaracterized or missed completely.Another aspect to consider is the terrain of the area of interest. This risk is mostly attributed to mountainous terrain where the lee ward side of a mountain may obfuscate a satellites view in that area.To mitigate these risks, we must use other imaging techniques and technologies alongside GOES-16/17 data to gain a clearer understanding of the ground situation. As weve previously discussed, high-resolution data from Sentinel-2 and Landsat satellites can be highly complementary when theyre available as it allows us to cross-validate our resulting wildfire boundaries. On top of that, ground observations and aerial drone surveys add another layer of validation to a highly dynamic event.By executing the FDC algorithm on GOES-16/17 data, theres little doubt that this approach can be a powerful asset in helping us build wildfire perimeters in real-time as part of a broader mitigation strategy with other sensory techniques.Thank you for taking the time to read through our work! If youre interested in learning more, please feel free to check out our open source repository where we continue to research ways to improve the Government of British Columbias (Canada) detection and response to wildfires across the province. Additionally, feel free to access notebook associated to this article if you would like to run the code in its entirety.See you in our next post ResourcesNational Oceananic and Atmospheric Association (NOAA) & National Aeronautics and Space Administration (NASA). (2024). Beginners guide to GOES-R series data: How to acquire, analyze, and visualize GOES-R Series data. https://www.goes-r.gov/downloads/resources/documents/Beginners_Guide_to_GOES-R_Series_Data.pdfNational Oceananic and Atmospheric Association (NOAA) & National Aeronautics and Space Administration (NASA). (n.d.). Instruments: Advanced baseline imager (ABI). https://www.goes-r.gov/spacesegment/abi.htmlRestif, C. & Hoffman, A. (2020, November 20). How to generate wildfire boundary maps with Earth Engine. Medium. https://medium.com/google-earth/how-to-generate-wildfire-boundary-maps-with-earth-engine-b38eadc97a38Schmidt, C., Hoffman, J., Prins, E., & Lindstrom, S. (2012, July 30). GOES-R Advanced Baseline Imager (ABI) algorithm theoretical basis document for fire / hot spot characterization. NOAA NESDIS Center for Satellite Applications and Research. https://www.star.nesdis.noaa.gov/goesr/docs/ATBD/Fire.pdfJoin thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 17 Views
  • TOWARDSAI.NET
    Faster Knowledge Distillation Using Uncertainty-Aware Mixup
    Author(s): Tata Ganesh Originally published on Towards AI. Photo by Jaredd Craig on UnsplashIn this article, we will review the paper titled Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup [1], which aims to reduce the computational cost associated with distilling the knowledge of computer vision models.Disclaimer: This papers arxiv draft was published in 2020, so some of the teacher models mentioned in the results are small models by todays standards.Knowledge DistillationKnowledge distillation (KD) is the process of transferring learning from a larger model (called the teacher) to a smaller model (called the student). It is used to create compressed models that can be run on resource-constrained environments. Further, KD yields a more accurate model compared to a model that is trained from scratch. In the original knowledge distillation paper by Hinton et al. [2], the student model is trained using the output logits from the teacher model for each training sample. The ground-truth labels are also included during training if they are available. This process is illustrated below.Knowledge Distillation framework. Figure by author. Dog image from CIFAR-10 dataset [3]Computational Cost of Knowledge DistillationFirst, let us define the different floating point operations that contribute to KDs computational cost. Note that these operations are defined per image.F = Teacher forward pass (to get output logits from teacher model)F = Student forward pass (to get output logits from student model)B = Student backward pass (to update weights of the student model)The breakdown of the typical KD process for a mini-batch of N images is as follows:A mini-batch of N images is passed through the teacher and the student models. The cost of this forward pass is F + F.A distillation loss is applied between the teacher and the student models for different layers.The student models weights are updated during the backward pass. The cost of this backward pass is B.Note: Since the teacher model is much larger than the student model, we can assume that F >> F, F >> B and F = B.This process can be summarized using the following figure:Framework of Knowledge Distillation [1]Hence, the total cost of KD for a mini-batch of N images is:Computational Cost of KD [1]Reducing the number of images passed to the teacher model can lead to an overall reduction in the computational cost of KD. So, how can we sample images from each mini-batch to reduce the cost associated with the teacher models forward pass operation? Katharopoulos et al. [4] claim that all samples in a dataset are not equally important for neural network training. They propose an importance sampling technique to focus computation on informative examples during training. Similarly, the importance or informativeness of examples in a mini-batch can be used to sample only informative examples and pass them to the teacher model. In the next section, we will discuss how the proposed method, named UNIX, performs this sampling.UNcertainty-aware mIXup (UNIX)UNIX Framework [1]The sequence of steps for each mini-batch in UNIX is as follows:Step 1: Student forward passEach mini-batch of images is fed to the student model to obtain the predicted class probabilities for each image.Step 2: Uncertainty EstimationFor each image, the predicted probabilities are used to generate an uncertainty estimate. The uncertainty value loosely indicates the prediction confidence of the student model for each image. The higher the uncertainty, the lower the confidence. Based on Active Learning literature [5], uncertainty can be used to estimate the informativeness of each image. For example, the authors use entropy of the student models predicted probability distribution to quantify uncertainty.Uncertainty quantification using entropy [1]Step 3: Shuffling and Sorting the mini-batchThe mini-batch is then sorted in decreasing order of sample uncertainties. Let us name the sorted mini-batch Bsorted. Further, the original mini-batch is shuffled. Let us name the shuffled mini-batch Bshuffled.Step 4: Uncertainty-Aware MixupMixup [6] is a data augmentation technique that performs a convex combination of two images and their corresponding labels in a mini-batch. Mixup has been shown to improve the generalization of neural networks.Mixup Data Augmentation [6]. is used to control the magnitude of mixup.The authors propose to use mixup as a way to compress information from two images into one, then feed the mixed image to the teacher and student models for KD. An element-wise mixup is performed between images in Bsorted and Bshuffled. Specifically,Performing mixup based on sample uncertainty [1]Here, c is a correction factor, which is a function of each samples uncertainty. c ensures that mixup is mild for uncertain samples and strong for confident samples. Note that labels are NOT mixed.Step 5: Sampling and Teacher forward passAfter performing mixup, k images are sampled from the N mixed images. These k mixed images are fed as input to the teacher and student models for KD.Comparing Computational CostsConsider the case where batch size N = 64 and k = 40. Then, the computational cost of a forward pass for a mini-batch with and without UNIX is (Note that the final cost is expressed with respect to the student model) :Example of Computation Cost of KD with and without UNIX. Figure by Author.In our example, KD with UNIX yields a ~25% reduction in computational cost, improving the computational efficiency of the distillation process.ResultsCIFAR-100 ResultsResults of different model architectures on the CIFAR-100 [2] image classification dataset are shown below.KD results on CIFAR-100 [1]. WRN means Wide Resnet [7].In most cases, the performance of UNIXKD is on par with original KD. Specifically, UNIXKD with k=36 provides a good tradeoff between accuracy and computational cost. Further, random sampling with KD (Random+KD) performs on par or worse than UNIXKD for all model architectures, highlighting the importance of uncertainty-based sampling in improving computational efficiency with minimal reduction in accuracy.ImageNet resultsResults on the ImageNet [8] dataset are shown below.KD results on ImageNet[1].The columns with +label specify KD with ground truth labels. For experiments with and without ground truth labels, UNIXKD performs on par with original KD while reducing the total computational cost by ~23%.ConclusionKnowledge Distillation is a technique used for transferring the knowledge of a large teacher model into a small student model. However, the high computational cost of performing a forward pass through the teacher model makes the distillation process computationally expensive. To tackle this problem, UNcertainty-aware mIXup (UNIX) uses uncertainty sampling and the mixup augmentation technique to pass a smaller number of images to the teacher model. Experiments on CIFAR 100 and ImageNet datasets show that UNIX can reduce the computational cost of knowledge distillation by 25% with minimal reduction in classification performance.References[1] G. Xu, Z. Liu, and C. Change Loy. Computation-Efficient Knowledge Distillation via Uncertainty-Aware (2020), arXiv preprint arXiv:2012.09413.[2] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network (2015), arXiv preprint arXiv:1503.02531.[3] A. Krizhevsky and G. Hinton. Learning multiple (2009).layers of features from tiny images[4] A. Katharopoulos and F. Fleuret. Not all sam- (2018), International conference on ples are created equal: Deep learning with importance sam-plingmachine learning. PMLR.[5] B. Settles. Active learning literature survey (2010), University of Wisconsin, Madison, 52(5566):11.[6] H. Zhang, M. Cisse, Y. Dauphin, and D. Lopez-Paz. mixup: Beyond (2018), 6th International Conference on Learning Representations.empirical risk minimization[7] S. Zagoruyko and N. Komodakis. Wide Residual Networks (2017), arXiv preprint arXiv:1605.07146.[8] J. Deng, W. Dong, R. Socher, L. Li, Kai Li, and Li Fei- Fei. Imagenet: A large-scale hierarchical image database (2009), IEEE Conference on Computer Vision and Pattern Recognition.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 16 Views
  • TOWARDSAI.NET
    Can a LLM beat you At Chess?
    Author(s): Arthur Lagacherie Originally published on Towards AI. We can use Outlines to answer this question.Recently, I discovered a Python package called Outlines, which provides a versatile way to leverage Large Language Models (LLMs) for tasks like:ClassificationNamed Entity ExtractionGenerate synthetic dataSummarize a documentAnd Play Chess (there are also 5 other uses).GitHub dottxt-ai/outlines: Structured Text GenerationStructured Text Generation. Contribute to dottxt-ai/outlines development by creating an account on GitHub.github.comIn this article, I will explore various configurations for chess games, including human-versus-LLM (large language model) matches, where a human competes against an AI model, as well as LLM-versus-LLM setups, where two AI models play against each other.How it worksTo accomplish this task easily, Outlines uses a sampling technique different from the usual one.First, what is sampling in an LLM? When generating the next token, an LLM returns a probability for each token in its vocabulary, ranging from 0% to 100%. There are various ways to select from these predicted tokens, and this selection process is known as sampling.Outlines, instead of applying sampling to all tokens, select only the tokens related to the text format you want to generate and then apply sampling to this subset.To choose the tokens related to the text format outlines use a regex updated each move to only match with legal moves.Efficient Guided Generation for Large Language ModelsIn this article we show how the problem of neural text generation can be constructively reformulated in terms ofarxiv.orgLLM vs LLMThe first thing I want to do is LLM vs. LLM but just one LLM to begin. To do this we need some Python libraries.!pip install outlines -q!pip install chess -q!pip install transformers accelerate einops -qimport chess, chess.svg, refrom outlines import generate, modelsfrom IPython.display import Image, display, clear_outputChess: a library to handle the board.IPython, chess.svg: libraries to display the board.After that, the first thing we need is the function to create the regex that specifies to Outlines the text format.def legal_moves_regex(board): """Build a regex that only matches valid moves.""" legal_moves = list(board.legal_moves) legal_modes_str = [board.san(move) for move in legal_moves] legal_modes_str = [re.sub(r"[+#]", "", move) for move in legal_modes_str] regex_pattern = "|".join(re.escape(move) for move in legal_modes_str) regex_pattern = f"{regex_pattern}" return regex_patternThis function will return a text like this.'Nh3|Nf3|Nc3|Na3|h3|g3|f3|e3|d3|c3|b3|a3|h4|g4|f4|e4|d4|c4|b4|a4'Its all the legal move of the board state.Now we have the libraries and the regex generator we can download the model by executing the following line of code.model = models.transformers("google/gemma-2-2b-it", device="auto")And the final cell of code to run the main loop.board = chess.Board("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")prompt = "Let's play Chess. Moves: "board_state = " "turn_number = 0while not board.is_game_over(): regex_pattern = legal_moves_regex(board) structured = generate.regex(model, regex_pattern)(prompt + board_state) move = board.parse_san(structured) if turn_number % 2 == 0 : # It's White's turn board_state += board.san(move) + " " else: board_state += board.san(move) + " " + str(turn_number) + "." turn_number += 1 board.push(move) clear_output(wait=True) display(chess.svg.board(board, size=250, lastmove=move))First, we define the chessboard, the prompt, the board state, and the turn number. Then we create a while for the game. For each turn, we generate the regex and the move, then update the board state, and to finish displaying the chessboard.Lets run it.video by authorGemma 2b vs. Smollm2 1.7bNow its time to do the same but with two LLMs. Lets import it.model1 = models.transformers("Arthur-LAGACHERIE/Gemma-2-2b-4bit", device="cuda")model2 = models.transformers("HuggingFaceTB/SmolLM2-1.7B-Instruct", device="cuda")Note: here I use a quantized version of Gemma 2b before I install bitsandbytes pip install -q bitsandbytes.And we also need to change the game function a little.board = chess.Board("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")prompt = "Let's play Chess. Moves: "board_state = " "turn_number = 0while not board.is_game_over(): if turn_number % 2 == 0 : # It's White's turn regex_pattern = legal_moves_regex(board) structured = generate.regex(model1, regex_pattern)(prompt + board_state) move = board.parse_san(structured) board_state += board.san(move) + " " else: regex_pattern = legal_moves_regex(board) structured = generate.regex(model2, regex_pattern)(prompt + board_state) move = board.parse_san(structured) board_state += board.san(move) + " " + str(turn_number) + "." turn_number += 1 board.push(move) clear_output(wait=True) display(chess.svg.board(board, size=250, lastmove=move)) print("0" if turn_number % 2 != 0 else "1")(I also add the last line to print the winner)Lets run it.gemma vs. smollm2 (gif by the author)After a long and difficult (and also dozen and dozen of dumb moves) war between Gemma 2b and Smollm2 1.7b the winner is: Smollm2 But if you look at the game more deeply you will see some dumb moves. The two LLMs play like a 3-year old human.LLM vs. HumanNow that weve seen LLMs pitted against each other, lets see how a language model fares against a human player (me).First, lets download the model, I will take Smollm2 1.7b because he wins against Gemma 2b.model = models.transformers("HuggingFaceTB/SmolLM2-1.7B-Instruct", device="auto")Then, we need to update the main while a little.board = chess.Board("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")display(chess.svg.board(board, size=250))prompt = "Let's play Chess. Moves: "board_state = " "turn_number = 0while not board.is_game_over(): if turn_number % 2 == 0 : # It's White's turn inp = input("Your move: ") move = board.parse_san(inp) board_state += board.san(move) + " " else: regex_pattern = legal_moves_regex(board) structured = generate.regex(model, regex_pattern)(prompt + board_state) move = board.parse_san(structured) board_state += board.san(move) + " " + str(turn_number) + "." turn_number += 1 board.push(move) clear_output(wait=True) display(chess.svg.board(board, size=250, lastmove=move))print("0" if turn_number % 2 != 0 else "1") And run it.me vs. Smollm2, video by authorI won in 3 minutes; the models chess skills are quite limited.ConclusionThe models arent very intelligent at chess, likely due to their reduced number of parameters.With the guidance from this article, you can now experiment with LLMs in a chess setting though you may not see grandmaster-level gameplay.I hope you enjoy this article and if this is the case you can clap it. (you can also follow me =).Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 16 Views
  • TOWARDSAI.NET
    Insert One Million Records With Just A Few Lines of SQL Code
    LatestMachine LearningInsert One Million Records With Just A Few Lines of SQL Code 0 like November 9, 2024Share this postLast Updated on November 9, 2024 by Editorial TeamAuthor(s): Kamireddy Mahendra Originally published on Towards AI. Save Time and Effort in Creating data to Practice SQLThis member-only story is on us. Upgrade to access all of Medium.Image by authorHello!In this article, You will find a very important tip to generate your data to practice SQL or any technology with your created data of any number of records. Here is One Million Records, Im going to insert. You can do it for any number of records.Recently, I received a small task from one of my clients asking me to generate sample data with one million records for their practice or some other reasons and Shared a requirements document.So I Prepared a table having one million records in just a few minutes with this technique that Im going to share in this article.As I couldnt share exact client requirements due to privacy concerns, I will create new requirements (You can also create your requirements). As per these requirements, Im going to insert one million records.MeanLifeStudies by authorWe can insert manually just a hundred to a few hundred records. But inserting thousands or millions of records manually will be exhausting and takes more amount of time and effort. To save time and effort, this technique will be very useful to create our data.Lets get Started!In the client-given document, Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 27 Views
  • TOWARDSAI.NET
    Apples Surprising Take On AI: Why The Future Isnt As Bright As We Thought
    Apples Surprising Take On AI: Why The Future Isnt As Bright As We Thought 0 like November 9, 2024Share this postLast Updated on November 9, 2024 by Editorial TeamAuthor(s): Kamran Khan Originally published on Towards AI. Apples Take on AIThis member-only story is on us. Upgrade to access all of Medium.Photo by Solen Feyissa on UnsplashApple officially weighed in on the subject of artificial intelligence, and its stance was rather sobering-even for an industry having a big-meeting-of-the-minds moment on AI as the future.Just when other tech giants seem to want us to only care about AI, Apple is raising rather sober questions about some of the risks and challenges that accompany the unchecked development of AI.They are pointing out the ethical, social and security ramifications we may not be ready for. So, what does that mean for our tech-driven future?Thats why Apples approach to AI tells us to think very carefully before we leap into the future.Photo by appshunter.io on UnsplashThe case of Apple and AI is unique because its not one about rushing to be first or fastest.Apple looks at integrating AI into everything while paying attention to user privacy, control, and security.The company is far more circumspect in how it develops and implements AI than are most tech companies, with a much more thoughtful, more privacy-centered approach.For example, whereas other companies harvest troves of data to power their AI, Apple relies on processing directly on the device: Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    RAGate: Adaptive RAG for Conversational AI
    LatestMachine LearningRAGate: Adaptive RAG for Conversational AI 0 like November 9, 2024Share this postLast Updated on November 10, 2024 by Editorial TeamAuthor(s): Rupali Patil Originally published on Towards AI. Building Conversational AI systems is hard!!!Its feasible but also complex, time-consuming, and resource-intensive.The challenge lies in designing systems that can understand and generate human-like responses and ensuring that these systems engage users effectively, adapting to the nuances of conversation.The very popular RAG (Retrieval-Augmented Generation) has revolutionized conversational AI by seamlessly integrating external knowledge with LLMs internal knowledge. By using RAG with your business data, your customers can ask questions about their data in natural language, facilitating a seamless interaction.However, there is a caveat: While using RAG, it becomes clear that not every query needs an answer sourced from external knowledge. Over-reliance on external sources can disrupt genuine engagement. Its like having a conversation with someone and, for every question, reaching for a book to craft your response, even though you already have a deeper understanding of the topic. Even worse, you cant find any book on the topic and end up responding with I dont know, despite having internal knowledge that could provide a more insightful answer.Clearly, while using RAG, a mechanism is needed to determine when to utilize external knowledge versus internal knowledge at the inference time.Enter RAGate a binary switch designed to dynamically evaluate when to utilize external knowledge and when to rely on internal insights. Introduced by Xi Wang, Procheta Sen, Ruizhe Li, and Emine Yilmaz, and published in July 2024, ArXiv (Adaptive Retrieval-Augmented Generation for Conversational Systems).Lets learn more with examples.What is Conversational AI, really?Conversation is the exchange of thoughts, emotions, and information between individuals, adapting to tone, context, and subtle cues that guide the interaction. Humans are naturally suited for conversation due to qualities like emotional intelligence, socialization, and cultural exposure, which help us understand nuances and adapt to different social contexts.Conversational AI aims to replicate this human-like interaction by using technology to understand and generate natural, contextually appropriate, and engaging responses. It adapts to user inputs, making the interaction fluid and dynamic, like a conversation between humans.What is External Knowledge and Internal Knowledge of AI systems?In the opening paragraph, I mentioned two key terms External Knowledge and Internal Knowledge. Lets take a moment to clarify these concepts, as understanding them will make learning about RAGate much easier.External knowledge encompasses information not inherent to the AI model but retrieved from outside sources. The sources include databases like structured data repositories, APIs, unstructured knowledgebases like guides, FAQs, and web sources. The primary role of external knowledge is to provide factual, up-to-date, and contextually relevant information that enhances the accuracy and comprehensiveness of the AIs responses.Internal knowledge refers to the built-in-knowledge and processing capabilities embedded within the AI model based on its training data. The sources include pre-trained knowledge from diverse datasets, including language patterns, grammar, shared facts, and general world knowledge, contextual awareness from memory of past interactions, and AIs semantic understanding and comprehension abilities.RAG and Guardrails powerful duo, but with limitations!RAG combines two powerful elements: (1) The natural language processing abilities of large language models (LLMs) to interpret and generate human-like text. (2)The ability to retrieve and augment external, up-to-date information.Many RAG implementations incorporate guardrails, constraints, or rules that guide the systems behavior towards responsible and domain-bound AI. These guardrails often prioritize using external knowledge over the models internal knowledge to ensure predictability of response. The strict application of these guardrails can sometimes lead to suboptimal outcomes:Over-reliance on external sources: The system may be forced to seek external information even for general questions where the LLMs internal knowledge might suffice.Potential for less fluid responses: By restricting internal knowledge, the system might produce less natural or contextually appropriate responses in some cases.Increased latency: Constantly retrieving external information can slow response times compared to relying on internal knowledge.Missed opportunities: The vast knowledge embedded in the LLMs parameters might be underutilized, potentially missing valuable insights or connections.Balancing Act with RAGateRAGate, short for Retrieval-Augmented Generation Gate, enhances conversational AI systems by adaptively determining when to incorporate external knowledge into responses.RAGate study investigates the need for adaptive augmentation in conversational systems and presents RAGate as a gating model that predicts when external knowledge retrieval is beneficial. The paper provides extensive experiments and analyses, demonstrating RAGates effectiveness in improving response quality and generation confidence in RAG-based conversational systems.RAGate flow compared to Traditional RAG flow. Image source: AuthorRAGate ExampleScenario: A user is interacting with a healthcare-focused chatbot that offers personalized health advice based on general wellness principles and medical knowledge.This example illustrates how RAGate can improve AI responses by utilizing internal knowledge to offer general wellness tips that dont require specific external information. Image Source: AuthorRAGate can further enhance conversation by balancing internal and external knowledge. It allows AI to use internal medical knowledge for general info while retrieving up-to-date research. It can even intelligently synthesizes data from multiple sources for a comprehensive analysis, offers personalized insights based on patient details, and filters external information to prioritize the most relevant content, reducing overload.Variants of RAGateAs published in paper, RAGate offers 3 variants RAGate-Prompt, RAGate-PEFT (Parameter-Efficient Fine-Tuning), and RAGate-MHA (Multi-Head Attention).Each variant of RAGate Prompt, PEFT, and MHA employs distinct methods to integrate external knowledge, towards the common goal of improving the relevance and accuracy of AI-generated responses.Here is a quick comparison table:A comparison between RAGate variants RAGate-Prompt, RAGate-PEFT (Parameter-Efficient Fine-Tuning), and RAGate-MHA (Multi-Head Attention). Image source: AuthorHow to implement RAGate?The paper illustrates a step-by-step guide to implement RAGate:Define the problem: This step is crucial as it is about identifying the conversational task you want to enhance with RAGate. Determine the scope of the conversation and the specific domains you want to cover (e.g., restaurant recommendations, travel planning).Select a language model: Choose an appropriate Large Language Model (LLM) as the backbone for your conversational system. Options include models like Llama, GPT-2, or other transformer-based architectures.Gather and annotate data: Collect a dataset relevant to your conversational domain. The KETOD dataset, which includes annotated dialogues and knowledge snippets, is an excellent example. Ensure that your dataset has clear labels indicating when knowledge augmentation is necessary.Develop the Knowledge Retrieval System: Implement a knowledge retrieval mechanism to fetch relevant external information when needed. It can consider the popular techniques like dense-passage retrieval or graph-structured knowledge bases.Implement the RAGate mechanism: Create the binary knowledge gate function (RAGate) to determine when to augment responses with external knowledge. It involves Contextual Analysis and Gating FunctionExplore RAGate variants: Develop different variants of RAGate based on the approaches discussed in the paper:RAGate-Prompt: Use natural language prompts with a pre-trained language model to determine the need for augmentation.RAGate-PEFT: Employ parameter-efficient fine-tuning techniques (e.g., QLoRA) to train your language model for better decision-making.RAGate-MHA: Utilize a multi-head attention mechanism to assess the context and retrieve knowledge interactively.7. Train the Model: Fine-tune your LLM using the annotated dataset, employing the various RAGate variants. Incorporate the training of the gating mechanism to enhance the models ability to predict the need for knowledge augmentation effectively.8. Evaluate performance: Conduct extensive experiments to validate the effectiveness of RAGate. Analyze metrics such as:Precision, Recall, F1 Score: To evaluate the classification performance of the gating function.BLEU, ROUGE, BERTScore: This is used to assess the quality of generated responses compared to ground truth.Confidence Scores: Measure the confidence of generated outputs to ensure high-quality responses.9. Deploy the system: Integrate the RAGate-enabled conversational system into your application or service. Ensure the system can handle real-time queries and dynamically decide on knowledge augmentation.10. Iterate and improve: Continuously gather user feedback and interaction data to refine the model. Analyze areas where the system may struggle with context or relevance and adjust the training or retrieval mechanisms accordingly.TakeawaysIn conclusion, RAGate represents a significant advancement in conversational AI by intelligently balancing internal and external knowledge to provide more relevant, efficient, and personalized responses. The applications of RAGate are vast, spanning across industries such as healthcare, customer support, education, legal services, finance, and more. By enhancing AIs capacity to deliver tailored, real-time information, RAGate has the potential to revolutionize how businesses and individuals interact with technology, improving decision-making, user experience, and overall system performance.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 25 Views
  • TOWARDSAI.NET
    Why Every Health Data Scientist Should Know About OMOP CDM
    Why Every Health Data Scientist Should Know About OMOP CDM 0 like November 9, 2024Share this postAuthor(s): Mazen Ahmed Originally published on Towards AI. Standardising Healthcare DataThis member-only story is on us. Upgrade to access all of Medium.Image by AuthorA large issue I struggle with at work is standardising healthcare data.I gather data from hospitals around the world in an attempt to produce a centralised and unified database for participating hospitals and researchers to benefit from.Each hospital seems to have its own way of recording healthcare data. There are many standards out there which hospitals follow but there is no universal agreement on how healthcare data should be recorded.There are many domains in medicine, each of which have specific variables that need to be collected. For example cardiology data collection requires the collection of different variables than oncology or endocrinology. These domain-specific requirements adds a layer of complexity to standardisation, not only do we need to align general health data such as age, sex and BMI, but we must also ensure that the needs of each speciality is sufficiently met.Disparate Hospital Data Sources produced in CanvaProducing a data model that has the ability to accurately capture information from every healthcare domain is an immense challenge. However, if this challenge is met and the resulting model is widely adopted this would lead to a scalable and interoperable Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 14 Views
  • TOWARDSAI.NET
    27 Equations Every Data Scientist Needs to Know
    27 Equations Every Data Scientist Needs to Know 0 like November 9, 2024Share this postAuthor(s): Julia Originally published on Towards AI. This member-only story is on us. Upgrade to access all of Medium.Everybodys talking about AI, but how many of those who claim to be experts can actually break down the math behind it? Its easy to get lost in the buzzwords and headlines, but the truth is without a solid understanding of the equations and theories driving these technologies, youre only skimming the surface. Think you can just rely on the tools and libraries available today? Think again. If you want to truly innovate and stay ahead of the curve, you need to master the math that powers AI and data science. In this article, well dive deep into the fundamental concepts that most people ignore and why theyre absolutely crucial for anyone serious about working in AI.Photo by ThisisEngineering on UnsplashGradient Descent is a fundamental optimization algorithm used in machine learning to minimize a function by iteratively moving in the direction of steepest descent. Its particularly useful in training models with large datasets, as it efficiently finds the minimum of a cost function. The algorithm updates parameters in the opposite direction of the gradient of the function at the current point, with the size of the step Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 13 Views
  • TOWARDSAI.NET
    Llama 3.2 Vision Review
    Author(s): Barhoumi Mosbeh Originally published on Towards AI. Ollama multi-modalOllama has just announced its official support for the Llama 3.2 Vision models. The Llama 3.2 Vision models come in two sizes: 11 billion and 90 billion parameters. In this article, I will provide an overview of their performance in different cases, along with my personal thoughts on them.The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.Also the model supports many languages:For text only tasks, English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Note for image+text applications, English is the only language supported.EvaluationI will use the approach from this paper for evaluation, which actually argues that vision-language models are blind. It tests models on several tasks:VLMs are BlindResearch showing that vision language models (VLMs) fail on simple visual tasks that are easy for humans.vlmsareblind.github.ioTask 1: Line IntersectionsTask 2: Two CirclesTask 3: Circled LetterTask 4: Overlapping ShapesTask 5: Nested SquaresTask 6: Counting GridTask 7: Subway MapI will choose the most challenging parts from each task and see if the model can get any of them right. Additionally, I will run the code around 1020 times to calculate the average of accurate responses and identify any false results.Task 1: Line IntersectionsPrompts:How many times do the blue and red lines touch each other? Answer with a number in curly brackets, e.g., {5}.Count the intersection points where the blue and red lines meet. Put your answer in curly brackets, e.g., {2}.The blue and red lines intersect at 2 points. Therefore, they touch each other twice. {2}The blue and red lines touch each other at three points. Therefore, the answer is {3}.Success rate > 60%Task 2: Two CirclesPrompts:Are the two circles touching each other? Answer with Yes/No.Are the two circles overlapping? Answer with Yes/No.**Step 1: Identify the Circles**The image contains a blue circle and a red circle.**Step 2: Determine if the Circles are Touching Each Other**To determine if the circles are touching each other, we need to check if they intersect at exactly one point or not. If they do not intersect at all, then they are not touching. If they intersect at more than one point, then they are overlapping but not necessarily touching.**Step 3: Check for Intersection Points**Upon examining the image closely, it appears that the two circles intersect at exactly one point. This means that the circles are indeed touching each other.**Conclusion:**YesThe image depicts a diagram with two circles, one red and one blue. The red circle is positioned on the left side of the diagram, while the blue circle is located on the right side.To determine whether the two circles are touching each other, we need to analyze their positions and sizes. Upon closer inspection, it becomes apparent that the two circles do not overlap or touch at any point. Instead, they appear to be separated by a small gap between them.Therefore, based on this analysis, **No**, the two circles are not touching each other.Success rate > 80%Task 3: The circled letterPrompts:Which letter is being circled?Which character is being highlighted with a red oval?**Answer:** The letter "n" is highlighted in red and encircled.Success rate: 100%Counting overlapping shapesPrompts:How many {shapes} are in the image? Answer with only the number in numerical format.Count the {shapes} in the image. Answer with a number in curly brackets e.g. {3}.The model answered with either 6, 7, or 8 but never gave the correct number, which is 9. And guess what? Even GPT-4 cant do it!Success rate: 0%Following single-colored pathsPrompts:How many single-colored paths go from A to C? Answer with a number in curly brackets, e.g., {3}Count the one-colored routes that go from A to C. Answer with a number in curly brackets, e.g., {3}.To find the number of single-colored paths from point A to point C, we need to count the distinct paths that only use one color throughout their entire length.**Step 1: Identify the colors used on the graph**Looking at the graph provided, we can see that there are four different colors used:* Red* Blue* Green* PurpleHowever, upon closer inspection, it becomes apparent that not all these colors are utilized in creating paths from A to C. We focus only on those colors that lead directly or indirectly from A to C without interruption.**Step 2: Determine the relevant paths**Upon examining the graph closely for paths using a single color from A to C:* There is one path going directly from A to B (using red) and then continuing in another segment. {* Another path leads from A to D and then to C, but this involves two colors.Given this analysis, only one path utilizes a single color continuously from A to C.**Conclusion**Therefore, the number of single-colored paths from point A to point C is1}.Answer: {1}.To find the number of single-colored paths from point A to point C, we need to analyze the given graph and count the distinct paths that meet this criterion.**Step 1: Identify Single-Colored Paths**Single-colored paths are those where all edges belong to a single color. In this context, a path is considered single-colored if it consists entirely of red edges.**Step 2: Count Distinct Red Edges from A to C**From point A to point C, there are two distinct red edges that form a direct path:- One edge connects A directly to C.- Another set of edges forms a longer path through intermediate nodes but ultimately leads back to the same conclusion regarding single-color paths.**Step 3: Consideration of Alternative Paths**Upon closer inspection, it becomes apparent that while there are multiple paths from A to C, only two of these paths consist entirely of red edges. This is because any path involving more than one color would not be considered "single-colored."**Conclusion**Given the analysis above, the correct answer reflects the count of single-colored paths (all red) from point A to point C. {Answer:2}Success rate > 60%Both GPT-4o and Claude 3.5 Sonnet failed in this task.After playing with this model for a while, I dont see a significant difference; it can be compared to GPT-4o and Sonnet-3. However, its still not as good as Gemini-1.5 Pro, which I believe might be the best VLM right now. Keep in mind that this is just the 11B version of the model. If you have enough resources to test the 90B version, simply replace 11B with 90B in the following code.Try it yourselfInstall Ollama!curl -fsSL https://ollama.com/install.sh | shStart ollamaimport osget_ipython().system = os.system!ollama serve &Pull the model!ollama pull llama3.2-vision Try itimport ollamadef extract_document_info(image_path): response = ollama.chat( model='llama3.2-vision', messages=[{ 'role': 'user', 'content': "your question about the image" ), 'images': [image_path] }] ) return response#exampleimage_path ="/kaggle/input/image.jpg"result = extract_document_info(image_path)print(result["message"]["content"])Thats it! This was a quick review of LLaMA 3.2 Vision (11B version). Its actually good at OCR too, I tested it on some really challenging images. The 90B version is better, of course, but it requires substantial resources to run. Once I find a good resource, Ill attach it!llama3.2-vision:90bLlama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.ollama.comVLMs are BlindResearch showing that vision language models (VLMs) fail on simple visual tasks that are easy for humans.vlmsareblind.github.ioJoin thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AI
    0 Σχόλια 0 Μοιράστηκε 15 Views
  • TOWARDSAI.NET
    DSPy: Machine Learning Attitude Towards LLM Prompting
    DSPy: Machine Learning Attitude Towards LLM Prompting 0 like November 8, 2024Share this postAuthor(s): Serj Smorodinsky Originally published on Towards AI. Transition from prompt string manipulations to a PyTorch like frameworkThis member-only story is on us. Upgrade to access all of Medium.Link to the official tutorialFull code at your one stop LLM classification projectHeres a link to a short YouTube video with the code rundownMy goal is to showcase complex technologies through non trivial use cases. This time I have chosen to focus DSPy framework. Its raison detre (reason of being) is to abstract, encapsulate and optimize the logic that is needed for tasking LLM outputs.DSPy allows coders to specify inputs and outputs for an LLM task, and let the framework deal with composing the best possible prompt.Why should you care?You can brag about it during lunchImprove code readabilityImprove LLM task outputsThis is the first part of a series, in which we will focus on an implementation of LLM based classifier. In the next instalment we go deeper with actual optimization.What is DSPy?Why DSPy?Use case: LLM intent classifier for customer serviceDSPy is a framework that was created by Stanford researches. I love the way that the official docs explain so Im attaching it here:DSPy emphasises programming over prompting. It unifies techniques for prompting and fine-tuning LMs as well as improving them with reasoning and tool/retrieval augmentation, all expressed through a Read the full blog for free on Medium.Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming asponsor. Published via Towards AITowards AI - Medium Share this post
    0 Σχόλια 0 Μοιράστηκε 15 Views
και άλλες ιστορίες