


The leading AI community & content platform making AI accessible to all.
2k writers | 330k followers
Recent updates
-
SQL Best Practices Every Data Scientist and Analyst Should Know (towardsai.net)
February 18, 2025 | Author(s): Carlos da Costa | Originally published on Towards AI.
Write efficient, scalable, and easy-to-read queries.
Photo by John Schnobrich on Unsplash
Writing efficient and readable SQL queries is a fundamental skill for any data scientist or analyst. Following SQL best practices improves query performance, maintainability, and team collaboration. Well-structured queries not only run faster but are also easier for others to understand and modify. In this guide to SQL best practices, we'll explore essential tips to help you write better, optimized SQL queries for data retrieval and analysis, focusing on the following concepts:
- Select only the data you need: improve performance and reduce load
- Use consistent naming conventions: enhance readability and maintainability
- Write descriptive aliases: make queries more understandable
- Maintain consistent indentation and formatting: improve code structure for team collaboration
- Use JOINs explicitly for clarity: prevent ambiguity and improve query readability
- Use Common Table Expressions (CTEs) for clarity: break down complex queries into manageable parts
- Add clear and concise comments: improve collaboration and future maintainability
It's a common mistake among data scientists and analysts to use SELECT * in SQL queries, especially when we're in a rush or simply too lazy to specify the exact columns we need. You can reduce database load and optimize resource … Read the full blog for free on Medium.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
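Several of the practices the article lists (explicit column lists, descriptive aliases, explicit JOINs, and CTEs) can be shown in one short query. The schema and data below are hypothetical, made up purely for illustration; this is a minimal sketch, not code from the article:

```python
import sqlite3

# Toy schema (hypothetical, for illustration only): customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);
""")

# A CTE names the aggregation step so the final SELECT stays simple; the
# explicit column list and explicit INNER JOIN avoid SELECT * and implicit
# comma joins, and the aliases say what each value is.
query = """
WITH customer_totals AS (
    SELECT customer_id, SUM(amount) AS total_spent  -- descriptive alias
    FROM orders
    GROUP BY customer_id
)
SELECT c.name, ct.total_spent
FROM customers AS c
INNER JOIN customer_totals AS ct
    ON ct.customer_id = c.customer_id
ORDER BY ct.total_spent DESC;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Ada', 200.0), ('Grace', 50.0)]
```

The same query with SELECT * and an implicit join would return every column of both tables and hide the join condition in the WHERE clause, which is exactly what the article advises against.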
Published via Towards AI | Towards AI - Medium
-
Important LLMs Papers for the Week from 03/02 to 09/02 (towardsai.net)
February 18, 2025 | Last updated on February 18, 2025 by Editorial Team | Author(s): Youssef Hosni | Originally published on Towards AI.
Stay updated with recent large language models research.
Large language models (LLMs) have advanced rapidly in recent years. As new generations of models are developed, researchers and engineers need to stay informed on the latest progress. This article summarizes some of the most important LLM papers published during the first week of February 2025. The papers cover various topics shaping the next generation of language models, from model optimization and scaling to reasoning, benchmarking, and enhancing performance. Keeping up with novel LLM research across these domains will help guide continued progress toward models that are more capable, robust, and aligned with human values.
- LLM Progress & Benchmarking
- LLM Reasoning
- LLM Agents
- LLM Preference Optimization & Alignment
- LLM Scaling & Optimization
- LLM Safety & Security
- Retrieval Augmented Generation (RAG)
- Attention Models
Most insights I share on Medium have previously been shared in my weekly newsletter, To Data & Beyond. If you want to stay up to date with the frenetic world of AI while also feeling inspired to take action or, at the very least, to be well-prepared for the future ahead of us, this is for you. Subscribe below to become an AI leader among your peers and receive content not present on any other platform, including Medium: Data Science, Machine Learning, AI, … Read the full blog for free on Medium.
-
Is Artificial Intelligence Ushering Cognitive Decline? (towardsai.net)
February 17, 2025 | Author(s): Cezary Gesikowski | Originally published on Towards AI.
The Impact of Cutting-Edge Intelligent Tech on Critical Thinking
Image by the author | digital / analog collage | 2025 Cezary Gesikowski
"Higher confidence in AI is associated with less critical thinking, while higher self-confidence is associated with more critical thinking." (Lee et al., 2025)
The rapid rise of generative artificial intelligence (AI) and automation in daily life and work has sparked concern about their impact on human cognition. AI now plays an increasing role in decision-making, writing, creative tasks, and data analysis, fundamentally altering how people engage with intellectual activities. AI-driven tools like ChatGPT assist with text generation, Grammarly enhances writing accuracy, and Midjourney creates unique visual content. The pace of innovation in AI is increasing as worldwide competition to win the AI game heats up.
Image: Timeline illustrating key milestones in the evolution from machine learning to generative AI. | Source: Accenture, 2023
These technologies streamline workflows but also raise concerns about diminishing human creative and analytical engagement. As AI systems handle tasks that once required human mental effort, are we becoming mentally lazy or losing critical skills? The debate surrounding AI and its impact on cognitive abilities has become a significant point of discussion among scholars, psychologists, neuroscientists, and … Read the full blog for free on Medium.
-
Building Agentic AI with Java: My Development Journey (towardsai.net)
February 17, 2025 | Last updated on February 18, 2025 by Editorial Team | Author(s): Janahan Sivananthamoorthy | Originally published on Towards AI.
Image generated by: Grok/X
Hi there! Remember when I was really excited about generative AI and LLMs in my last post (links at the end if you missed the party!)? Well, buckle up, because things have gotten even more interesting. I've been diving headfirst into the world of Agentic AI, and it's mind-blowing. As someone who spends most of their time wrestling with Java and building systems, I was itching to see how this new breed of AI could shake things up. Now I'm exploring how to bring Agentic AI use cases into our Java systems; it's like adding superpowers to our code! Turns out, this isn't your basic if-then AI. We're talking next-level stuff. These agents can think on their feet, make plans (and change them!), and basically take charge. Instead of AI that just follows instructions, these new AIs are like super-helpful assistants. They can handle all sorts of tasks, big and small, with less help from us. It's a whole new world of AI, and I'm here to share … Read the full blog for free on Medium.
-
From Training Language Models to Training DeepSeek-R1 (towardsai.net)
February 17, 2025 | Author(s): Akhil Theerthala | Originally published on Towards AI.
Reasoning Models #1: An overview of training. From RNNs to LLMs, a comprehensive overview of how training regimes changed.
You probably already understand the potential of reasoning models. Playing around with O1 or DeepSeek-R1 shows us these models' enormous promise. As enthusiasts, we are all curious to build something like these models. We all start on this path, too. However, given the sheer scale of things, we get overwhelmed by where to start. Rightfully so: 6-7 years ago, we only needed an input and an output to train a model. As someone who builds those models, we know that getting these two things right is hard. However, things are way more complex now. We need additional task-specific data for every task we do. As an enthusiast, I want to dig deeper into these reasoning models and learn what they are and how they work. As part of this process, I also plan to share everything I've learned as a series of articles, to get a chance to discuss these topics with like-minded folks. So, please keep commenting and sharing your thoughts as you read this article. Without delay, I'd like to dive into today's topic: the … Read the full blog for free on Medium.
-
DeepSeek AI: A Technical Overview (towardsai.net)
February 17, 2025 | Last updated on February 17, 2025 by Editorial Team | Author(s): M. Haseeb Hassan | Originally published on Towards AI. Part 2 of 3.
Generative AI technologies have developed dramatically in recent years because of large language models such as GPT-4, Claude, and Gemini, which pushed the limits of artificial intelligence systems. The development of artificial intelligence models has produced exceptional outcomes that deliver human-quality text and resolve complex issues in various fields of expertise. GPT-4 from OpenAI operates across hundreds of billions of tokens to process multiple language inputs while taking the first position in benchmarks such as SuperGLUE and MMLU. PaLM-2 from Google delivers outstanding multilingual functionality and reasoning capabilities thanks to its 340-billion-parameter system.
An Overview
The need for safer and more scalable AI systems has made developing innovative computational structures a requirement for addressing growing market needs. DeepSeek AI represents the newest generation of artificial intelligence models, expanding the boundaries of natural language processing. Training standards indicate that contemporary LLMs need trillions of tokens to attain optimal performance, even though DeepSeek keeps its training data specifics confidential. The engineers behind DeepSeek AI designed this model to resolve important issues throughout AI infrastructure, including efficiency, deep context processing, and multilingual capability. See this blog for a general overview and understanding of DeepSeek AI: "DeepSeek AI: The Future is Here" (pub.towardsai.net): "DeepSeek AI is an advanced AI genomics platform that allows experts to solve complex problems using cutting-edge deep…" The distinct architectural elements in DeepSeek make possible industry-leading performance while optimizing resource consumption.
GPT-4 operates with per-token inference times measured in milliseconds, yet DeepSeek uses dynamic attention routing alongside hierarchical tokenization to cut this further, reporting roughly a 30% improvement for faster and more efficient performance. DeepSeek concentrates on offering language support to low-resource populations, much like Meta's NLLB, which covers 200 languages with enhanced translation capabilities for minority languages. The exceptional capabilities of DeepSeek AI result from its unique architectural design, advanced training methods, and cutting-edge specifications. In this blog, we'll explore the DeepSeek AI model architecture in detail, uncovering the technical innovations that make it a standout in the crowded field of generative AI. The exploration of DeepSeek AI's underlying technology provides essential information for AI researchers, developers, and enthusiasts who want to understand this modern leading AI system.

History and Development
A team of persistent researchers developed DeepSeek AI with the objective of establishing a model that would both match and exceed typical LLMs in performance and adaptability. The research team analyzed existing challenges in the market, including expensive model training procedures and restricted capacity to handle intricate, complex tasks. DeepSeek AI's developers concentrated on a scalable and efficient design, which differentiates it from GPT-4 and PaLM-2.
The development team took on three main problems to solve:
- Limited contextual depth in long-form text generation.
- Inadequate multilingual support for low-resource languages.
The head-on address of these major issues has made DeepSeek AI into a model that brings cutting-edge research and practical utilization together.

Design & Architecture
This section covers the model details and the unique design and feature components of the DeepSeek architecture.

Model Design
At its core, the DeepSeek AI model architecture is built on a transformer-based foundation, which has become the gold standard for NLP tasks. DeepSeek incorporates various improvements to the standard transformer, beginning with its design approach.
- Distinctive Layer Configurations: DeepSeek integrates two attention models that combine dense connections with sparse connections in its layer structure. This design lets the model process extended sequences better with sustained accuracy.
- Attention Mechanism Innovations: The model routes computational attention to dynamically selected parts of the input sequence through its attention-routing mechanism.
The system decreases computational redundancy and speeds up inference.
- Scaling Strategies: The DeepSeek system employs modular scaling techniques that allow its components to extend horizontally across multiple GPUs and vertically by increasing layer complexity, without impacting overall performance.
DeepSeek AI Model Architecture [GeeksforGeeks]

Unique Components
DeepSeek AI's outperforming capabilities are the result of unique architectural components and their optimized utilization.
- Novel Embedding Techniques: DeepSeek develops context-specific embeddings that respond dynamically to the semantic conditions of the input text, strengthening its handling of polysemy and homonymy.
- Specialized Token Processing: The model performs hierarchical token processing, which divides complicated inputs into interconnected smaller components.
- Advanced Context Understanding: DeepSeek uses a multi-hop memory network to store and access information from past sections of dialogues or documents, enhancing the model's contextual comprehension.
- Innovative Neural Network Elements: Gated residual connections prevent vanishing gradients and enable stable training across very deep architectures.

Data Processing & Training
The training corpus of DeepSeek covers more than 10 trillion tokens, drawing on scientific literature, legal records, and social media content.

Preprocessing
Data cleaning together with deduplication ensures the data remains of high quality before training. The team applies specialized preprocessing methods dedicated to handling domain-specific terms.
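The article describes "hierarchical token processing" only at a high level, and DeepSeek's actual tokenizer is not published here. As a purely conceptual toy (every name and the chunking rule below are my own assumptions, not DeepSeek's method), the idea of breaking a complicated input into interconnected smaller units can be sketched as a sentence-to-word-to-subword hierarchy:

```python
# Conceptual sketch only: splits text into a sentence -> word -> "subword"
# hierarchy, where fixed-size character chunks stand in for real subwords.

def hierarchical_tokenize(text: str, chunk: int = 4) -> list[list[list[str]]]:
    """Return a sentence -> word -> subword-chunk hierarchy for `text`."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    hierarchy = []
    for sentence in sentences:
        words = sentence.split()
        # Each word becomes a list of fixed-size character chunks.
        hierarchy.append([[w[i:i + chunk] for i in range(0, len(w), chunk)]
                          for w in words])
    return hierarchy

tokens = hierarchical_tokenize("Transformers scale. Tokenizers matter.")
print(tokens)
```

A real subword tokenizer (BPE, SentencePiece, etc.) learns its chunks from data rather than cutting at fixed offsets; the point here is only the nested structure that lets higher levels operate on coarser units.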
Multilingual capabilities in DeepSeek extend to more than 50 supported languages, with priority placed on low-resource languages that other models typically ignore.

Training Optimization
The process begins with fine-tuning the base model (DeepSeek-V3) using a small dataset of carefully curated chain-of-thought (CoT) reasoning examples. These examples are curated to ensure diversity, clarity, and logical consistency. By the end of this phase, the model demonstrates improved reasoning abilities, setting the stage for more advanced training phases.
DeepSeek AI Training Technique [GeeksforGeeks]
The following techniques were utilized in the detailed training process of DeepSeek AI:
- Distributed Training Infrastructure: The distributed computing platform utilizes thousands of GPUs, yielding roughly 40% faster training than standard setups.
- Computational Efficiency: Mixed-precision training with gradient checkpointing improves memory utilization and speeds up computation.
- Loss Function Innovations: DeepSeek utilizes a custom loss function that combines perplexity with a diversity term to balance accurate and creative output text.
- Regularization Techniques: To promote generalization while countering overfitting, the team employs adaptive dropout together with label smoothing.

Performance and Benchmarks
DeepSeek surpasses GPT-4 in benchmark accuracy, reportedly delivering 5-10% better performance on GLUE, SuperGLUE, and SQuAD. Through its optimized design, the model attains a 30% shorter inference time than models of equivalent size.
DeepSeek AI Performance Comparison [deepseek.com]
DeepSeek provides superior performance for low-resource languages, obtaining 15% better BLEU scores than competitor systems on translation tasks.
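Of the regularization techniques listed above, label smoothing is the easiest to make concrete. This minimal sketch (my own toy, not DeepSeek's code; the function name and example numbers are invented) computes cross-entropy against a smoothed target distribution:

```python
import math

def label_smoothed_nll(probs: list[float], target: int, eps: float = 0.1) -> float:
    """Cross-entropy against a smoothed target distribution.

    The one-hot target keeps (1 - eps) of the probability mass; the
    remaining eps is spread uniformly over all classes, which discourages
    the model from becoming over-confident on the training labels.
    """
    k = len(probs)
    smoothed = [eps / k + (1.0 - eps) * (1.0 if i == target else 0.0)
                for i in range(k)]
    return -sum(t * math.log(p) for t, p in zip(smoothed, probs))

# Even a confident, correct prediction incurs a slightly higher loss than
# under plain cross-entropy, because some target mass sits on other classes.
loss = label_smoothed_nll([0.90, 0.05, 0.05], target=0)
print(round(loss, 4))
```

With eps = 0, the function reduces to ordinary negative log-likelihood of the target class.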
DeepSeek maintains sequential context spanning up to 10,000 tokens, which makes it an exceptional solution for creating and summarizing long content. DeepSeek's creative output demonstrates accuracy together with innovation, excelling in creative writing tasks and code generation.

Innovations and Differentiators
Many different factors play a key role in the exceptional performance of DeepSeek AI; its engineers and developers have combined them to achieve outperforming results.
- Dynamic Attention Routing: DeepSeek distributes compute more effectively through this innovation, achieving reduced latency combined with lower energy usage.
- Hierarchical Tokenization: The model processes inputs precisely by dividing them into smaller processing units.
- Modular Scaling: DeepSeek's design supports smooth scalability, which makes it work across different application types.
- Performance Improvements: DeepSeek delivers substantial accuracy and efficiency improvements compared to past versions.
- Resource Utilization: The model's optimized structural design decreases energy consumption during training and inference, helping organizations meet sustainability targets.
- Adaptability: DeepSeek's architectural flexibility makes it suitable for healthcare, finance, and various other domains.

Conclusion
The DeepSeek AI model architecture represents a significant leap forward in the field of generative AI. DeepSeek established a new benchmark for performance using innovative designs together with cutting-edge training practices. Future development will include federated learning and on-device AI to boost the model's functionality.
DeepSeek maintains a prime position to influence the development of the evolving generative AI environment.
Stay Tuned!!
If you enjoyed the article and wish to show your support, make sure to:
- Give a round of applause (50 claps) to help it get featured
- Follow me on Medium to stay updated with my latest content
- Explore more articles on my Medium profile
- Connect with me on LinkedIn, Twitter, and Instagram
-
Copyright Policies for AI-Generated Content: The Global Landscape (towardsai.net)
Author(s): Mohit Sewak, Ph.D. | Originally published on Towards AI.
Copyrighting the Clones: Is AI the Future Picasso or Just a Fancy Photocopier?
Let's come straight to the point and kick things off with a place you know pretty well: good ol' USA!

Section 1: Uncle Sam Says "No Robots Allowed in Copyright Town" (USA)
Ah, America! Land of the free, home of the brave, and sticklers for human-made art. You know how much they love their originality thing, right? Well, when it comes to copyright, the US Copyright Office is like that strict bouncer at a club, only letting creations with a human touch pass (United States Copyright Office, 2023). No robot DJs allowed, apparently.
[Image prompt: A stern-looking Uncle Sam character wearing a Copyright Office badge, standing at a velvet rope, turning away a robot artist with a paintbrush. Caption: Human Input Only: In Uncle Sam's copyright club, AI creations are on the VIP blacklist.]
Remember that time when I was trying to get a patent for my, well, let's just say it was a very innovative algorithm, back in my PhD days at USC? The patent officer looked at me with the same expression the Copyright Office probably gives AI-generated art: skeptical, to say the least! They wanted to see the human genius, the sweat, the tears, the late-night coffee-fueled coding sessions. Basically, they wanted to know I was the brains behind it, not just some fancy machine spitting out code.

The Human-Authorship Doctrine: No Humans, No Copyright. Period.
So, here's the deal, according to the US Copyright Office: copyright is for "original works of authorship" (Copyright Act, Title 17). Key word: authorship. And in their book, authorship means human authorship. AI, in their view, is just a fancy tool, like a super-powered paintbrush or a word processor on steroids.
Think of it like this: if you use a hammer to build a chair, you own the chair's copyright, not the hammer manufacturer. The same logic applies to AI (United States Copyright Office, 2023).
Pro Tip: If you are creating something using AI and want to copyright it in the US, make sure you, the human, have added significant creative input. Don't just rely on the AI output as-is. Be the director, not just the audience!

AI-Assisted vs. AI-Generated: The Fine Line
Now, it's not all black and white. If you use AI as a tool, like, say, you ask ChatGPT to write a first draft of your blog (hypothetically, of course!), and then you heavily edit it, rewrite chunks, add your own jokes, and sprinkle in some Mohit-magic, boom! That's considered human-assisted. Copyrightable! Because you, my friend, are the author (United States Copyright Office, 2023). The AI is just your über-smart intern, helping you out.
But if you just ask the AI to create something from scratch and you literally copy-paste and claim it as yours? Nope. Not in the USA. That's AI-generated, and according to the Copyright Office, that's a no-go for copyright protection (United States Copyright Office, 2023). It's like trying to copyright a photograph taken by a security camera: interesting, maybe, but no human author there, right?

Content Types, Applications, and Stakeholders: Who Cares and Why?
Let's break it down for different types of content:
- Images, Text, Music: Pure AI creations? Copyright denied. But if you take an AI-generated image and then, say, paint over it digitally, add elements, change the style drastically, and make it your own artistic expression, then those human-authored elements can be copyrighted. It's all about showing your creative fingerprint.
- Software Code: If you're a coder using AI tools to speed things up, like GitHub Copilot suggesting lines of code, and you are still making the key architectural decisions and writing significant chunks of code yourself, then your code can be copyrighted.
AI is just your coding buddy, not the lead programmer.
- Generative AI Platforms: Think Midjourney, DALL-E, ChatGPT. These platforms themselves can't copyright the raw output their AI spits out based on user prompts. However, they might start focusing on tools that let users significantly modify the AI output, so users can add that crucial human spark and claim copyright. Smart move, eh?
- Creative Industries: Artists, musicians, and writers using AI need to be savvy. They have to ensure their work isn't just AI regurgitation but a genuine blend of human creativity and AI assistance. Think of it as a collaboration where the human is the senior partner.
- News and Journalism: Imagine an AI writing news summaries. Straight-up AI news might be copyright-less. But if a human journalist uses AI to assist in research, fact-checking, or even drafting, and then adds their editorial judgment, analysis, and writing flair, that content can be protected. Human plus AI, that's the ticket.

Stakeholders in this Copyright Conundrum:
- Creators: Gotta prove that human spark! Document your creative process; show your edits, your unique inputs. Basically, show your work!
- Users: You might not own the copyright to pure AI stuff you generate, limiting how you can use it commercially. Be careful if you are planning to build a business on just copy-pasting AI outputs.
- Platforms: They need to empower users to add their human creativity. Think about offering editing tools, style-transfer options, and ways to remix and personalize AI outputs.
- Industries: Stock photo sites and music libraries relying heavily on pure AI content? They might need to rethink their strategy, because purely AI stuff in the US? Copyright denied!
Trivia Time: Did you know that in the US, ideas themselves are not copyrightable, only the expression of those ideas? So you can't copyright the idea of a story about a boy wizard, but you can copyright Harry Potter, because it's how J.K. Rowling expressed that idea (Copyright Act, Title 17).
It's all about the execution, baby!
"Creativity is intelligence having fun." - Albert Einstein. Let's make sure humans are still having the most fun, even with AI in the mix!
Einstein's Wisdom: Let humans have the creative fun, even when AI is in the lab.
Pro Tip for the Road: Always keep records of your human contributions when using AI. Think of it as building your copyright paper trail. Dates, drafts, edits, creative choices: document everything! It could save your skin (and your copyright!) later.
Okay, USA copyright scene: check! Next stop on our world tour? Across the pond, to the land of Shakespeare and slightly different AI copyright rules! Let's hop over to the United Kingdom!

Section 2: Cheers, Mate! The UK's "Computer-Generated Works": A Quirky Copyright Corner
Right then, off to the UK! Now, the Brits, bless their innovative hearts, have a rather unique take on this whole AI copyright thing. While the US is all about human authorship or bust, the UK has this quirky little provision called "computer-generated works" (CGW) tucked away in their Copyright, Designs and Patents Act of 1988 (UK Copyright, Designs and Patents Act 1988, Section 178). It's like they saw the AI future coming way back then!
British Innovation: A spot of tea and a closer look at AI copyright with a monocle of curiosity.
I remember presenting my research at a conference in the UK once. During the Q&A, a very proper-sounding gentleman asked me about the implications of AI on intellectual property. I started talking about human authorship, US-style, and he politely interrupted: "Ah, but have you considered our Computer-Generated Works provision, old boy?" Talk about a curveball! That's when I realized the UK was playing a different copyright game altogether.

CGW: Copyright Without a Human Author? Blimey!
Here's the head-scratcher: UK law actually allows for copyright even when there's no human author!
Section 178 defines a CGW as a work "generated by computer in circumstances such that there is no human author of the work" (UK Copyright, Designs and Patents Act 1988, Section 178). Whoa! Mind blown, right?
And Section 9(3) goes even further, stating that for a CGW, the author "shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken" (UK Copyright, Designs and Patents Act 1988, Section 9(3)). So, basically, if a computer (or AI) creates something all on its own, someone still gets to be the author and hold copyright. And that someone is whoever made the "arrangements" for it to happen. Trippy, isn't it?
Pro Tip: The UK's CGW provision could be a game-changer for AI developers and platforms. But the interpretation of "arrangements" is key, and still a bit murky. Legal battles might be brewing!

"Arrangements Necessary": The Million-Pound Question
What exactly are these "arrangements necessary"? That's the million-pound question, isn't it? Does it mean the person who wrote the AI's code? The company that runs the AI platform? The user who types in the prompt? The law doesn't spell it out clearly, and that's where the legal fun (and headaches) begin.
The UK Intellectual Property Office (UK IPO) is currently revisiting this CGW provision because, let's face it, AI has gotten way more sophisticated since 1988. They are wondering if this provision is still fit for purpose in a world of super-smart, almost autonomous AI (UK Intellectual Property Office). Think about it: back then, "computer-generated" probably meant some basic algorithms creating pixel art. Now we have AI composing symphonies and writing screenplays!

Content Types, Applications, and Stakeholders: A British Perspective
How does this CGW quirkiness play out in practice?
- Text, Images, Music, Code: Theoretically, pure AI creations in these formats could be copyrightable in the UK under CGW, provided someone made the "arrangements". This is far more lenient than the US stance.
Imagine an AI composing a pop song entirely on its own. In the UK, someone could potentially copyright it!
- Generative AI Platforms: This is where it gets interesting. Platforms like DeepMind or Stability AI could argue that they are the ones making the "arrangements necessary" for AI content creation, and therefore they could claim copyright on AI-generated outputs. Potentially a big advantage for platforms operating in the UK!
- Creative Industries: Even with CGW, UK creators might still prefer to show human authorship to avoid legal ambiguity. "Human-assisted" may still be seen as safer copyright ground, even with the CGW safety net.
- Software Development: Companies using AI to generate code in the UK could potentially claim copyright under CGW if they can argue they made the "arrangements". This could incentivize AI-driven software innovation in the UK.

Stakeholders in the UK Copyright Drama:
- "Arrangers" (Platforms/Companies): They are the potential copyright winners in the UK system. But they need to navigate the murky waters of "arrangements necessary" and be ready for legal interpretations and challenges.
- Users: They potentially benefit from broader copyright protection. If platforms can copyright AI outputs, maybe they will offer more user-friendly licensing options? But uncertainty remains until the "arrangements" definition is clarified.
- Legal Interpretation: The word "person" in UK law generally includes companies as well as individuals. So AI companies themselves might be able to claim CGW copyright. This is a huge deal!
- Industries: Industries investing in AI content generation might see the UK as a more copyright-friendly zone than the US. Could the UK become an AI content creation hub because of CGW? Intriguing thought!
Trivia Time: The UK was one of the first countries in the world to recognize computer-generated works in its copyright law. Talk about being ahead of the curve! Though maybe even they didn't foresee AI becoming this creative back in '88!
(UK Copyright, Designs and Patents Act 1988).

"To be or not to be, that is the question of AI copyright!" (Yours Truly, Dr. Mohit Sewak, paraphrasing some famous Brit)

Okay, Shakespeare probably didn't say that exactly, but he might have pondered AI copyright if he were around today!

Shakespeare Contemplates AI: To CGW or not to CGW, that is the copyright question.

Pro Tip for the Road: If you are operating an AI platform in the UK, get yourself a good lawyer, mate! Seriously. The CGW provision is a legal gray area. Understanding "arrangements necessary" and how best to position yourself to claim CGW copyright is crucial.

Alright, from London fog to Brussels bureaucracy! Let's hop over to the European Union and see how they are grappling with AI and copyright. Spoiler alert: it's a bit more complicated, and involves more meetings!

Section 3: Bonjour Copyright! The EU's "Own Intellectual Creation" Tango

Ah, the European Union! A continent of culture, history, and a rather nuanced approach to copyright. Unlike the UK's quirky CGW provision, the EU in general leans towards the "human touch" doctrine, much like the US (Directive 2001/29/EC of the European Parliament). But, as with anything EU-related, it's a bit more harmonized and directive-driven.

David's Dilemma: Pondering the EU's "Originality" standard for AI art in the age of directives.

I remember attending a conference in Brussels once, discussing EU digital policy. Trying to understand the EU's stance on AI copyright felt like trying to navigate... well, an EU committee meeting! Lots of languages, lots of opinions, and a general feeling that everyone is trying to agree, but slowly. Very slowly.

Originality is Key: "Own Intellectual Creation" or Bust in the EU

The EU copyright framework, shaped by directives like the Copyright Directive 2001/29/EC, emphasizes originality as a condition for copyright protection (Directive 2001/29/EC of the European Parliament). But what does "originality" mean?
The Court of Justice of the European Union (CJEU), the EU's top court, has weighed in. They say originality means a work is the author's "own intellectual creation", reflecting their personality and free and creative choices. Translation? Human authorship, basically (CJEU interpretations of EU Directives).

Pro Tip: In the EU, originality isn't just about being new; it's about being a reflection of the human author's creative personality. Think of it as artistic DNA: they want to see your unique creative code in the work.

So, across most EU member states, purely AI-generated stuff? Probably not copyrightable. They want to see that human creative contribution (EU Directives and National Laws of Member States). It's all about that "human intellectual creation". The EU is very much in the human-in-the-loop camp when it comes to copyright.

Ongoing Discussions, Evolving Policies: The EU Shuffle

Now, the EU isn't ignoring AI. Oh no, they are discussing it. A lot! The AI Act, for example, is a massive piece of EU legislation aiming to regulate AI (Artificial Intelligence Act). And within that, and within broader EU digital strategy talks, AI and IP are definitely on the agenda (Artificial Intelligence Act). But specific EU-wide copyright policy for AI-generated content? Still evolving. Think of it as a slow but steady EU policy dance. A tango, perhaps? Lots of steps forward, some steps back, and a bit of side-stepping.

Content Types, Applications, and Stakeholders: An EU Perspective

So, how does the EU's originality focus impact things on the ground?

Text, Images, Music, Code: Like the US, in the EU, pure AI creations in these formats are unlikely to get copyright protection. The human originality requirement is a tough hurdle. AI-assisted works? Potentially copyrightable, if they show significant human creative input. Déjà vu, anyone?

Generative AI Platforms: EU platforms face similar challenges as their US counterparts.
They need to enable users to inject that "own intellectual creation" magic to make AI outputs copyrightable. EU users need tools to personalize, modify, and add their creative stamp.

Creative Industries: EU artists using AI? Focus, focus, focus on demonstrating your original creative contribution! Make it clear that you are not just pressing "generate" and calling it a day. Show the human artistry!

Research and Education: AI-generated academic papers or educational materials without significant human intellectual input? Copyright unlikely. EU academia still values human thought, analysis, and writing. Surprise!

Stakeholders in the EU Copyright Epic:

Creators: Must prove their work is their "own intellectual creation". Substantial human input; show your personality in the work! Think of it as infusing your creative soul into the AI-assisted creation.

Users: Similar limitations as in the US: purely AI-generated content is copyright-challenged, and commercial use might be tricky without adding human creativity.

EU Policymakers: They are in a tight spot, balancing the encouragement of AI innovation with upholding the traditional principles of human authorship and intellectual creation. It's a tough policy balancing act.

Industries: Need to navigate potentially varying interpretations of originality across different EU member states, and lobby for clearer EU-wide guidelines. EU-level harmonization, anyone?

Trivia Time: The EU Copyright Directive (2001/29/EC) was adopted way before the current AI boom. Talk about trying to fit a square peg in a round hole! The EU framework was designed for a pre-AI world, and now they are trying to adapt it to the AI revolution. Policy catch-up in action!

"Europe was created by history. America was created by philosophy."
(Margaret Thatcher)

Well, when it comes to AI copyright, both history (the EU tradition of human authorship) and philosophy (the US focus on individual creativity) are clashing with technology!

EU Copyright Committee: Harmonizing Humor and Headaches over AI Art's Authorship.

Pro Tip for the Road: Keep an eye on EU policy developments! The EU is actively discussing AI and IP. Engage in the conversation! Your voice (especially if you are a creator or platform in the EU) matters! Lobbying might be in order!

Alright, EU copyright tango paso doble'd! Let's take a step back and zoom out to the global level.

Next stop: WIPO, the World Intellectual Property Organization! Think of them as the United Nations of copyright, but for the whole world!

Section 4: WIPO: The Global Copyright Chat Club, Talking AI, Not Ruling It

Now, let's step onto the global stage, folks! Forget country-specific policies for a moment. Enter WIPO, the World Intellectual Property Organization (WIPO). Think of WIPO as the global forum where all countries come to chat about intellectual property, including the AI copyright conundrum (WIPO Conversation on IP and AI). They are like the ultimate global copyright discussion club, but with fancy reports and important-sounding committees.

Global Copyright Crossroads: WIPO facilitating the worldwide chat on AI's artful authorship.

If you ever get to present at a WIPO conference in Geneva, you will find it fascinating! Delegates from all over the world, passionately debating the future of IP in the age of AI. It feels like being in a global brainstorming session, trying to collectively wrap your heads around this AI copyright beast.

Facilitating Dialogue, Not Dictating Law: WIPO's Role

Here's the key thing about WIPO: they don't set binding international laws (WIPO). Nope. They are more about facilitating international dialogue.
They host discussions, commission studies, publish reports, and try to foster consensus among their member states (WIPO Standing Committee on Copyright and Related Rights (SCCR)). Think of them as global copyright matchmakers, trying to find common ground, but not actually officiating any marriages.

Pro Tip: WIPO is the place to watch for global trends in AI copyright policy. Their discussions and reports shape the international conversation and influence national policies. Pay attention to WIPO!

WIPO's Standing Committee on Copyright and Related Rights (SCCR) is where a lot of the AI copyright action happens (WIPO Standing Committee on Copyright and Related Rights (SCCR)). They explore different models for copyrighting AI-generated content, from sticking to strict human authorship to maybe, just maybe, exploring new models that acknowledge AI's creative role. They are basically trying to figure out: how do we adapt copyright for the AI age?

Global Harmonization Efforts: Finding Common Ground in a Copyright Babel

WIPO's work is crucial for trying to bring some international harmonization to this crazy AI copyright landscape (WIPO Conversation on IP and AI). Right now, as we are seeing, different countries have wildly different approaches: the US says "no AI copyright", the UK says "maybe CGW copyright", the EU says "human originality", and we haven't even gotten to Asia yet! WIPO is trying to identify areas of agreement and disagreement. Can we find some global best practices? Can we at least reduce the cross-border copyright chaos? That's the WIPO hope!

Implications for the Global Policy Landscape:

Shaping the Conversation: WIPO's work sets the tone for global discussions. Their reports become reference points for national policymakers worldwide. They are basically curating the global AI copyright conversation.

International Standards (Maybe, Eventually): WIPO's efforts could lead to some international guidelines or best practices down the line.
Don't expect a binding global AI copyright treaty tomorrow, but WIPO can nudge countries towards more consistent approaches. Slow and steady wins the harmonization race?

Stakeholders at the Global Level:

National Governments: Governments worldwide look to WIPO for analysis and guidance when crafting their own national AI and IP policies. WIPO is like the global AI copyright policy advisor.

International Organizations: Organizations like the UN, WTO, etc., use WIPO as a platform for coordinating international IP efforts related to AI. WIPO is the IP hub for the global org world.

Industries and Creators (Global): Benefit from WIPO's efforts to clarify the messy international landscape. More clarity = less cross-border uncertainty in AI copyright. Global creators, rejoice (potentially)!

Trivia Time: WIPO was established in 1967, but its roots go way back to 1883 with the Paris Convention for the Protection of Industrial Property. Talk about a long history in the IP game! They've seen copyright evolve from printing presses to... well, AI algorithms! (WIPO).

"The only way to do great work is to love what you do." (Steve "WIPO's" Jobs)

Trying to make the global IP landscape a little more lovable, even with AI shaking things up!

Steve Jobs at WIPO: Great work, great IP, and loving the AI copyright challenge.

Pro Tip for the Road: Follow WIPO's work on AI and IP! Check out their website, read their reports, see what the SCCR is up to. If you want to understand the future of global AI copyright, WIPO is your go-to source. Stay globally informed!

Okay, global copyright chat club adjourned! Let's now swing East, way East, to the land of dragons, dumplings, and a surprisingly pragmatic approach to AI copyright. Next stop: China!

Section 5: Ni Hao Copyright! China's Pragmatic Path: "Certain Intellectual Achievement", Please

Gong Xi Fa Cai, folks! We've landed in China, a land of rapid tech innovation, and a copyright landscape that's... well, let's call it "evolving rapidly" when it comes to AI.
While the US and EU are holding onto human authorship tightly, and the UK is playing with CGW, China is carving out its own, more pragmatic path (China: first copyright ruling on AI-generated image, 2022). Think of it as copyright with Chinese characteristics!

Panda Judge in China: Weighing "Intellectual Achievement" in the age of AI artistry.

If you visit Beijing for a tech conference, you will find the buzz around AI palpable! Everyone will be talking about AI innovation, AI applications, AI everything! But when you ask about copyright policies for AI art, the answers will be less definitive, more "watch this space". China is still figuring things out, but they are moving fast, and with a distinctly pragmatic approach.

Certain Intellectual Achievement: Not Just Human, But Original and Skillful

China isn't strictly saying "human authorship only" like the US. But they are also not fully embracing CGW like the UK. Instead, Chinese courts are starting to assess AI-generated content based on whether it reflects a "certain intellectual achievement" (China: first copyright ruling on AI-generated image, 2022). What does that even mean? Well, it's about looking at the level of human input in setting up the AI, selecting data, and refining the outputs. It's a bit like saying, "Show us you put some real effort and skill into this AI creation process, human!"

Pro Tip: China's "certain intellectual achievement" standard is more flexible than strict human authorship, but less clear than CGW. Demonstrating human involvement and originality is key to copyright success in China.

Recent court decisions in China suggest that if a human provides detailed prompts, selects and arranges AI-generated elements, and exercises creative judgment, the resulting work might be copyrightable in China (China: first copyright ruling on AI-generated image, 2022). It's not just about pressing "generate".
It's about directing the AI, curating its output, and adding your human creative sauce.

Human Involvement as Key: Directing the AI Orchestra

Think of it like conducting an orchestra. The AI is the orchestra, full of instruments and musical potential. You, the human, are the conductor. You choose the music (prompts), you guide the performance (refine outputs), you shape the final sound (creative judgment). The orchestra (AI) creates music, but the intellectual achievement, the artistic vision, comes from the conductor (human). That's kind of the Chinese vibe.

Content Types, Applications, and Stakeholders: The Chinese Angle

How does this "intellectual achievement" approach play out in China?

Images, Text, Art: China's approach is potentially more permissive than the US or EU. Some AI-generated content could get copyright if you can show sufficient human input and originality. Emphasis on human direction and curation of AI.

Generative AI Platforms: China might be seen as more platform-friendly than the US or EU. Platforms could argue they have a stronger claim to copyright over AI outputs, especially if users are guided to provide detailed prompts and creative direction. Platform-user collaboration for copyright?

AI Art and Design: Chinese creators might find it easier to protect AI-assisted works compared to artists in stricter jurisdictions, as long as they can demonstrate their "intellectual achievement" in guiding the AI. Human-AI creative partnerships, Chinese style!

Technology Companies: Investing in AI content generation in China might offer more IP protection opportunities compared to the US or EU. China could become an attractive zone for AI content businesses seeking copyright.

Stakeholders in the Chinese Copyright Landscape:

Creators: Benefit from a potentially broader scope of copyright protection, but still need to demonstrate human input and intellectual achievement.
Show your creative kung fu!

Platforms and Companies (China-based): May find it easier to establish copyright for AI-generated content. Potentially a competitive advantage for Chinese AI businesses.

Legal System (China): Developing case law to define the boundaries of copyrightability for AI works. Balancing AI innovation with IP rights is the Chinese legal tightrope walk.

Industries (China): Industries in China might gain a competitive edge by leveraging AI for content creation and protecting it with copyright, potentially more easily than in other regions. China: the rising AI copyright power?

Trivia Time: China's first copyright ruling on an AI-generated image happened only recently, in late 2022. Talk about real-time policy evolution! China is actively shaping its AI copyright rules as the technology develops, very much learning by doing! (China: first copyright ruling on AI-generated image, 2022).

"The journey of a thousand miles begins with a single step." (Lao Tzu)

China's AI copyright journey is just beginning, and it's taking pragmatic steps to navigate this new terrain!

Lao Tzu's Wisdom for AI Copyright: A pragmatic journey starts with understanding.

Pro Tip for the Road: If you are doing AI content creation in China, or targeting the Chinese market, pay close attention to court decisions and evolving legal interpretations. China's AI copyright approach is dynamic. Stay agile and informed! Copyright kung fu requires constant learning!

From pragmatic China, we're heading Down Under! Let's jump to Australia and see how they are wrestling with AI and copyright... and kangaroos!

Section 6: G'day Copyright! Australia's CGW Down Under: "Arrangements" and Reviews

Crikey! We've landed in Australia, the land of sunshine, surf, and another country with a computer-generated works (CGW) provision! Yep, just like the UK, Australia has had CGW in its Copyright Act since 1968 (Australian Copyright Act 1968). Are the Aussies onto something here?
Let's find out!

Aussie Copyright Kangaroo: Surfing the waves of AI IP, CGW style, Down Under.

One of my doctoral colleagues once gave a talk at a university in Sydney. Afterwards, during a barbie (that's Aussie for BBQ, mate!), a law professor asked him, "So, what do you reckon about our CGW, eh?" He was like, "CG-what now?" Turns out Australia, like the UK, has this unusual copyright approach that could be quite relevant in the AI age.

"Arrangements Necessary": The Aussie Interpretation

Just like the UK, Australian copyright law says that for CGW, the author is the person who made the "arrangements necessary" for the creation of the work (Australian Copyright Act 1968). Sound familiar? It should! It's almost word-for-word the same as the UK provision. So copyright can exist for AI-generated works in Australia, and it's all about who made those "arrangements". Ringing any CGW bells yet?

Pro Tip: Australia's CGW provision is very similar to the UK's. "Arrangements necessary" is still the key phrase, and just as ambiguous. Legal interpretation is crucial.

IP Australia, the Aussie IP office, is currently reviewing their IP framework in light of AI advancements (IP Australia). They are asking the big questions: Is CGW still good enough for today's AI? Do we need to update copyright law for more complex AI outputs? Think of it as an Aussie copyright review and revamp in progress!

Ongoing Review: Copyright Down Under, Under Scrutiny

This review is important because, just like in the UK, the original CGW provision was written way before the AI revolution we are seeing now. Are "arrangements" in 1968 the same as "arrangements" in 2025, when AI can write novels and compose operas? Probably not!
IP Australia is trying to figure out if their CGW provision is future-proof, or needs a tune-up.

Content Types, Applications, and Stakeholders: The Aussie Spin

How do CGW and the ongoing review affect things in Australia?

Text, Images, Music, Code: Similar to the UK, Australia's CGW could mean copyright protection for pure AI creations. It hinges on who made the "arrangements". Platforms? Developers? Users? Still unclear, but the potential is there.

Generative AI Platforms: Australia, like the UK, might be a more appealing copyright jurisdiction for platforms. Platforms could argue they are making the necessary "arrangements" for AI content generation. Aussie AI platform advantage?

Creative Industries: CGW offers a copyright avenue for Aussie creators using AI. But the ongoing review adds uncertainty. Will CGW stay as is? Will it be changed? Creators are in wait-and-see mode.

Technological Innovation: The review aims to balance encouraging AI innovation with ensuring proper IP protection. Australia wants to be AI-friendly, but also IP-savvy. Balancing act, Aussie style!

Stakeholders in the Australian Copyright Saga:

Arrangers (Developers/Platforms): They are the potential authors under CGW. But they need to navigate the "arrangements" definition and the ongoing review. Aussie legal limbo for now?

Users: Potential copyright for AI content, but also uncertainty due to the review. Will the copyright rules change? Users are in the dark a bit.

IP Australia: Tasked with modernizing Aussie copyright law for the AI age. Big job, mate! Balancing innovation and IP protection, Down Under style.

Industries: Australian industries need to watch the IP review closely. The outcomes will shape the future copyright landscape for AI content in Australia. Aussie industries, stay tuned!

Trivia Time: Australia's Copyright Act of 1968 was a landmark piece of legislation, and it included CGW provisions way back then! Talk about foresight... or maybe just a lucky guess?
Either way, Australia was ahead of the AI copyright curve, decades ago! (Australian Copyright Act 1968).

"No worries, mate!" (An Aussie friend of Dr. Mohit Sewak)

Well, maybe a few worries about AI copyright in Australia, but hopefully the ongoing review will sort things out. No worries!

Kangaroo Copyright Rally: Aussie marsupials demand clarity on AI art authorship.

Pro Tip for the Road: Keep an eye on IP Australia's review of AI and IP! Follow their consultations, read their reports. The future of Aussie AI copyright is being shaped right now! Stay informed, Down Under copyright watchers!

From the land of kangaroos and CGW, we hop over to maple syrup and politeness! Let's head to Canada and see their take on AI and copyright. Eh?

Section 7: Eh? Canada's Copyright: Human Authorship, But Maybe Open to Chat?

Howdy, folks, and welcome to Canada! Land of hockey, maple syrup, and a somewhat polite approach to AI copyright. Canada, like the US and EU, generally leans towards human authorship for copyright (Canadian Intellectual Property Office). But there's a Canadian niceness in the air, a hint that they might be a bit more open to interpretation than their US and EU counterparts. Let's explore, eh?

Canadian Copyright Moose: Politely pondering human authorship in the AI era.

My researcher friends and I have always enjoyed presenting at conferences in Canada. People are friendly, the scenery is stunning, and the copyright discussions... well, they are thoughtful and polite. There's a sense that Canada is watching the global AI copyright debate, considering different viewpoints, and taking their time to decide their own path.

Human Authorship, Originality, and "Skill & Judgment": The Canadian Way

Canadian copyright law, rooted in their Copyright Act, requires originality and authorship (Copyright Act Canada). Sounds familiar, right? But unlike US law, the Canadian Copyright Act doesn't explicitly say human authorship is needed. Hmm, wiggle room? Maybe!
However, current interpretations and practices in Canada do lean towards human creativity as essential for copyright (Canadian Intellectual Property Office). So human authorship is the de facto norm, even if not explicitly stated in law.

Canadian courts have emphasized that copyright protects works that originate from an author and involve more than just mechanical copying (Canadian Intellectual Property Office). A work needs to be the product of "skill and judgment". Again, sounds very human-centric, doesn't it?

Pro Tip: Canada, like the US and EU, favors human authorship for copyright. Originality and "skill & judgment" are key Canadian copyright concepts. Human creativity is still valued up North!

Evolving Stance: Watching, Waiting, and Chatting?

The Canadian Intellectual Property Office (CIPO) is definitely monitoring the global AI and IP scene (Canadian Intellectual Property Office). They are engaging in discussions, watching what WIPO, the US, the UK, the EU, and China are doing. Canada's approach is likely to evolve as global case law and policy discussions mature. Think of Canada as the thoughtful observer in the AI copyright classroom, taking notes, asking polite questions, and pondering their next move.

Content Types, Applications, and Stakeholders: The Canadian Outlook

How does Canada's "human authorship, but open to chat" stance play out?

Text, Images, Music, Code: It is generally understood in Canada that pure AI creations might not meet the originality and human authorship bar for copyright. AI-assisted works? More likely to be copyrightable, if humans show "skill and judgment". Similar to the US and EU ballpark.

Generative AI Platforms: Canadian policy will likely align with the US and EU: platforms need to enable human creative input for copyrightable outputs. Canadian users need AI tools that allow for personalization and human artistry.

Creative Industries: Canadian artists using AI should focus on demonstrating their creative contributions, their "skill and judgment".
Show the human behind the AI curtain!

Legal Framework Development: Canada is in an observation and deliberation phase, awaiting international consensus and national case law to further define their AI copyright stance. Canada: the copyright policy ponderer.

Stakeholders in the Canadian Copyright Conversation:

Creators: Need to show sufficient human "skill and judgment" in their AI-assisted works for them to be considered original and copyrightable. Canadian creators, showcase your human artistry!

Users: Face uncertainty, as the exact copyright boundaries for AI content are still being defined in Canada. Canadian users, stay tuned for policy updates, eh?

CIPO and Policymakers: Under pressure to clarify Canada's stance and potentially update copyright law for AI. Canadian policymakers, the copyright spotlight is on you!

Industries: Seek clarity and predictability in Canadian copyright law to guide AI content investments. Canadian industries, seeking copyright certainty in the AI age.

Trivia Time: Canada is known for its politeness, even in copyright policy? While they lean towards human authorship, the lack of an explicit "human" requirement in their Copyright Act might signal a potential openness to evolving interpretations in the future. Canadian copyright politeness, maybe with a hint of flexibility?

"The true north strong and free." (Canadian national anthem)

Canada aims to be a strong and free innovator in AI, and to figure out the copyright stuff too, eh!

Canadian Mounties Welcome: Politely navigating the AI copyright frontier with a smile and a nod.

Pro Tip for the Road: Engage with CIPO! Participate in consultations, provide feedback. Canada's AI copyright policy is still evolving, and your voice can help shape it. Canadian creators and platforms, make your voices heard, politely, of course!

Alright, from polite Canada, let's take a final hop across the Pacific to sushi, samurai, and a surprisingly chill attitude towards AI copyright!
Last stop on our world tour: Japan!

Section 8: Konnichiwa Copyright! Japan's Chill Copyright Vibe: "Use Rights" and Exploitation Focus

Konnichiwa from Japan! Land of cherry blossoms, bullet trains, and a rather relaxed approach to AI copyright? Yep, Japan is kind of the outlier in our copyright world tour. While most countries are debating human authorship and originality, Japan is... well, focusing on "use rights" and exploitation (Japanese Copyright Act). Think of it as copyright, Japanese style: less about who created it, more about who can use it, and how.

Samurai of Copyright: Japanese focus on AI use rights with zen-like IP mastery.

I remember visiting Tokyo for a tech conference. The energy around AI was incredible! Robots everywhere, AI-powered gadgets, and a general sense of embracing the AI future, wholeheartedly. When I asked about copyright for AI art, the response was surprisingly unconcerned. More like, "AI creates, so what? Let's figure out how to use it!"

No Explicit "No AI Copyright" Policy: Focus on Creative Expression

Here's the thing: Japan's Copyright Act doesn't explicitly deny copyright to AI-generated works (Japanese Copyright Act). Mind. Blown. The focus in Japan is less on authorship and more on the rights of those who use and exploit creative works. Some experts even suggest that if AI creates something creative, it could be protected in Japan, and the rights might belong to whoever operates the AI system (Expert Interpretations of Japanese Copyright Act). Whoa, Japan, you rebel!

Pro Tip: Japan's copyright approach is unique: less about human authorship, more about creative expression and use rights. Potentially very AI-friendly!

The key criterion in Japan is whether the AI output is a creative expression of "thoughts or sentiments" (Japanese Copyright Act). If it is, then copyright might apply. And who gets the rights? Well, that's less defined, and open to interpretation. Potentially the AI operator, the platform, or someone else?
Japan is keeping it flexible, it seems.

Rights Holder Ambiguity: Less Authorship, More Usage

The question of who the rights holder is for AI-generated content is less clear-cut in Japan (Expert Interpretations of Japanese Copyright Act). Is it the AI operator? The user who prompted the AI? Or the company that developed the AI? Japanese law doesn't explicitly say. It's more about figuring out the use and exploitation of the creative output, rather than obsessing over who the "author" is in the traditional sense. Japanese copyright zen?

Content Types, Applications, and Stakeholders: The Japanese Perspective

How does Japan's chill copyright vibe impact things?

Text, Images, Music, Code: Japan could potentially allow copyright for pure AI creations in these formats if they are deemed "creative". Potentially the most permissive regime we've seen on our tour! AI art paradise?

Generative AI Platforms: Japan might be the most favorable jurisdiction for platforms to claim IP rights over AI outputs. Platforms could argue they are the rights holders by operating the AI system. Japanese AI platform gold rush, maybe?

Technological Innovation: Japan's approach could really encourage investment in AI content generation. IP protection might be more readily available, boosting AI innovation in Japan. Japanese tech boom, fueled by AI copyright?

Commercial Use of AI Content: Businesses might find it easier to secure and use copyright for AI-generated materials in Japan. Japanese businesses, AI content advantage?

Stakeholders in the Japanese Copyright Zen Garden:

AI System Operators: May have a stronger claim to copyright in Japan compared to other jurisdictions. Japanese AI operators, potential copyright kings?

Users and General Public: Implications for users are less clear. The focus shifts from human authorship to exploitation rights. Will this benefit users, or just AI operators?
Japanese users, copyright question marks?

Japanese Government: Aiming to foster AI innovation while navigating IP complexities, with a pragmatic approach. Japanese government, balancing innovation and IP with Zen calm?

Industries (Japan): Industries in Japan might have an advantage in leveraging AI for content creation, thanks to potentially more flexible copyright rules. Japanese industries, AI copyright edge?

Trivia Time: Japan is known for its robotics and AI innovation. Is their relaxed copyright stance towards AI-generated works a deliberate strategy to encourage even more AI innovation? Maybe! Japan might be playing the long game in the AI copyright race!

"Fall seven times, stand up eight." (Japanese proverb)

Japan's AI copyright approach might be different and might face challenges, but they are likely to keep innovating and adapting, standing up again and again!

Zen and Copyright: Japan finds peace in AI's creative chaos, focusing on use and harmony.

Pro Tip for the Road: If you are interested in AI copyright in Asia, Japan is the jurisdiction to watch. Their focus on use rights and exploitation, rather than strict authorship, is a radical departure from Western approaches. Japan: the AI copyright wild card!

Comparative Summary: Copyright Across the Globe, A Quick Cheat Sheet!

Phew! We've just zipped around the world in... well, hopefully less than 80 days, exploring the wild and wacky world of AI copyright policies! Time for a quick cheat sheet, a rapid-fire recap of what we've learned on our copyright safari:

United States: Human authorship required; purely AI-generated works are not copyrightable.
United Kingdom: CGW provision; copyright may go to whoever made the "arrangements necessary". Interpretation still murky.
European Union: "Own intellectual creation" standard; human originality required, policy still evolving.
WIPO: No binding rules; facilitates global dialogue and harmonization efforts.
China: "Certain intellectual achievement" standard; sufficient human input can make AI-assisted works copyrightable.
Australia: CGW provision similar to the UK's; framework currently under review by IP Australia.
Canada: Human authorship is the de facto norm ("skill and judgment"), with some room for evolving interpretation.
Japan: Focus on creative expression and use rights rather than authorship; potentially the most permissive.

Conclusion: The AI Copyright Wild West, For Now

So, there you have it, folks! Our whirlwind tour of global AI copyright policies. And the verdict? It's a mixed bag, to say the least! A bit of a copyright wild west out there, wouldn't you say? Different countries, different approaches, and a whole lot of uncertainty.

Key Takeaways for the Road Ahead:

For Creators and Entrepreneurs: Know your jurisdiction!
Copyright laws for AI content are highly country-specific. If you are creating AI art, music, code, etc., understand the copyright rules in your target markets. Global copyright strategy, essential!

For Platforms and Companies: Human input is your copyright friend! Design AI systems that empower users to add meaningful human creativity. Focus on AI-human collaboration, not just pure AI generation, especially in stricter jurisdictions. Human-AI partnerships, the copyright key!

For Policymakers: Balance innovation and IP! Incentivize AI development, but also protect human creativity and authorship principles. International collaboration and harmonization are crucial. Global AI copyright coordination, needed now!

The Future of AI Copyright: Stay Tuned, It's Gonna Be a Ride!

The AI copyright saga is far from over. In fact, it's just getting started! As AI gets even smarter, more creative, and more integrated into our lives, the copyright debate will only intensify. Expect more legal battles, policy revisions, and international discussions in the coming years. The world is still figuring out the rules of this AI copyright game. And the decisions we make now will shape the future of creativity, innovation, and the relationship between humans and machines in the digital age. Exciting and slightly terrifying, right?

Call to Action: Your Voice Matters!

Stay informed! Engage in the AI copyright conversation! Talk to policymakers, participate in online forums, share your opinions. The future of AI copyright is being written as we speak. And your voice, yes yours, matters in shaping that future. Let's make sure it's a future that encourages both AI innovation and human creativity. Deal?

Global Voices, Global Copyright: Let's shape the future of AI art's ownership, together, worldwide.

Pro Tip for the Very End: This is just the beginning of the AI copyright story. The law is always playing catch-up with technology. Expect changes, expect debates, expect surprises!
The AI copyright landscape will likely be dynamic for years to come. Buckle up, it's going to be a fascinating ride!

Alright, my friend, that's our AI copyright world tour! Hope you enjoyed the ride! Now, if you'll excuse me, I need to go copyright-proof my next AI-assisted blog post, just in case! Stay creative, stay informed, and stay humorous! Cheers!

References

Australia: Australian Government. (2024, March 28). Artificial intelligence and IP rights. IP Australia.
Canada: Canadian Intellectual Property Office. (n.d.). Copyright and artificial intelligence. Government of Canada.
China: China: first copyright ruling on AI-generated image. (2022, December 22). Lexology.
European Union: Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society. (2001, May 22). EUR-Lex.
European Union: EU Artificial Intelligence Act. Artificial intelligence act.
United Kingdom: United Kingdom Government. (2023, November 2). Artificial intelligence and intellectual property: copyright and patents. GOV.UK.
United Kingdom: UK Copyright, Designs and Patents Act 1988, Section 9(3), Section 178. (1988). legislation.gov.uk.
United Kingdom: UK Intellectual Property Office. Artificial intelligence (AI) and intellectual property.
United States: United States Copyright Office. (2023, March). Copyright and artificial intelligence: Part 2, Copyrightability report.
United States: Copyright Act, Title 17 of the U.S. Code. (n.d.). U.S. Copyright Office.
WIPO: WIPO. (n.d.). WIPO conversation on intellectual property (IP) and artificial intelligence (AI). World Intellectual Property Organization.
WIPO: WIPO Standing Committee on Copyright and Related Rights (SCCR).
Japan: Japanese Copyright Act. (n.d.). Japanese Law Translation Database System. (PS: Limited official English documentation directly on AI copyright policy.
Further, academic and expert interpretations may be needed.)
Canada: Copyright Act. (n.d.). Department of Justice Canada.

Disclaimers and Disclosures

This article combines the theoretical insights of leading researchers with practical examples, and offers my opinionated exploration of AI's ethical dilemmas; it may not represent the views or claims of my present or past organizations and their products, or my other associations.

Use of AI Assistance: In the preparation of this article, AI assistance has been used for generating/refining the images, and for styling/linguistic enhancements of parts of the content.

License: This work is licensed under a CC BY-NC-ND 4.0 license. Attribution Example: This content is based on [Title of Article/ Blog/ Post] by Dr. Mohit Sewak, [Link to Article/ Blog/ Post], licensed under CC BY-NC-ND 4.0.

Follow me on: | Medium | LinkedIn | SubStack | X | YouTube |

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI
-
Quick Start Robotics and Reinforcement Learning with MuJoCo

towardsai.net

Author(s): Yasin Yousif. Originally published on Towards AI.

A starter tutorial on its basic structure, capabilities, and main workflow.

Images are rendered from XML sources in the menagerie repo, under the BSD-3-Clause license for Trossen, and the Apache license for Franka and Apptronik.

MuJoCo is a physics simulator for robotics research developed by Google DeepMind and written in C++ with a Python API. The advantage of using MuJoCo lies in its various implemented models along with full dynamic and physics properties, such as friction, inertia, elasticity, etc. This realism allows researchers to rigorously test reinforcement learning algorithms in simulation before deployment, mitigating risks associated with real-world applications. Simulating exact replicas of robot manipulators is particularly valuable, as it enables training in a safe virtual environment and then a seamless transition to production. Notable examples of these models include popular brands like ALOHA, FRANKA, and KUKA, readily available within MuJoCo.

Table of Contents:
- Overview
- MJCF Format
- The Task
- Continuous Proximal Policy Optimization
- Training Results
- Conclusion

Overview

Beyond the core MuJoCo library (installable via pip install mujoco), two invaluable packages enhance its capabilities: dm_control (https://github.com/google-deepmind/dm_control) and mujoco_menagerie (https://github.com/google-deepmind/mujoco_menagerie).

mujoco_menagerie offers a wealth of open-source robot models in .xml format, simplifying the simulation of complex systems. These models encompass diverse designs, as illustrated in the image above.

dm_control (also installable with pip: pip install dm_control, which ships its own version of MuJoCo) provides a very useful code base for creating reinforcement learning pipelines from MuJoCo models, as environment classes with suitable .step() and .reward() methods.
These pipelines are available via its suite subpackage, and are intended to serve as benchmarks on which different proposed reinforcement learning methods can be evaluated and compared. Therefore, they should not be altered when used for that purpose.

These benchmarks can be listed by running the following:

# Control Suite
from dm_control import suite

for domain, task in suite.BENCHMARKING:
    print(f'{domain} {task}')

which will give the following domains and tasks, among others:

Additionally, dm_control allows the manipulation of the MJCF models of the entities from within the running script, utilizing its PyMJCF subpackage. Therefore, the user doesn't need to change the XML files in order to add new joints, or replicate a certain structure, for example.

MJCF Format

MJCF is the MuJoCo XML configuration file format. To show a working example of an MJCF file, we will review the car.xml source code available in the MuJoCo GitHub repository. It exhibits a triple-wheel toy vehicle with two front lights, which has two main degrees of freedom (DoFs): forward-backward and left-right movement.

Taking a look at the first part of the code, we note that the code is always enclosed in <mujoco>..</mujoco> tags. We also note the <compiler> tag, which allows setting compilation options, here enabling autolimits (the physics integrator itself, configured via the <option> tag, defaults to Euler).

<mujoco>
  <compiler autolimits="true"/>

Next, as some objects in the model may need their own customized texture and geometric shape, unlike standard shapes such as spheres and boxes, the <texture>, <material> and <mesh> tags can be utilized as follows.
We note also, in the <mesh> tag, that the exact point coordinates are provided in the vertex option, where each row represents a point on the surface.

<asset>
  <texture name="grid" type="2d" builtin="checker" width="512" height="512" rgb1=".1 .2 .3" rgb2=".2 .3 .4"/>
  <material name="grid" texture="grid" texrepeat="1 1" texuniform="true" reflectance=".2"/>
  <mesh name="chasis" scale=".01 .006 .0015"
    vertex=" 9   2   0
            -10  10  10
             9  -2   0
            10   3 -10
            10  -3 -10
            -8  10 -10
            -10 -10  10
            -8  -10 -10
            -5   0   20"/>
</asset>

The <default> tag is helpful for setting default values for certain classes, like the wheel class, which will always have a certain shape, size and color (defined with type, size, and rgba respectively):

<default>
  <joint damping=".03" actuatorfrcrange="-0.5 0.5"/>
  <default class="wheel">
    <geom type="cylinder" size=".03 .01" rgba=".5 .5 1 1"/>
  </default>
  <default class="decor">
    <site type="box" rgba=".5 1 .5 1"/>
  </default>
</default>

The first body in MuJoCo models is always the <worldbody>, with index 0, as the parent object for all the other bodies in the model.
Since we have only one car, its only child body should be car. Within each body we can define its children, whether other bodies, geometries, joints or lights, with the <body>, <geom>, <joint>, <light> tags respectively.

This is shown in the next snippet, in which we note the name, class and pos options, among others, which define the unique name, the class defined in <default>, and the initial position relative to the parent tag, respectively.

<worldbody>
  <geom type="plane" size="3 3 .01" material="grid"/>
  <body name="car" pos="0 0 .03">
    <freejoint/>
    <light name="top light" pos="0 0 2" mode="trackcom" diffuse=".4 .4 .4"/>
    <geom name="chasis" type="mesh" mesh="chasis" rgba="0 .8 0 1"/>
    <geom name="front wheel" pos=".08 0 -.015" type="sphere" size=".015" condim="1" priority="1"/>
    <light name="front light" pos=".1 0 .02" dir="2 0 -1" diffuse="1 1 1"/>
    <body name="left wheel" pos="-.07 .06 0" zaxis="0 1 0">
      <joint name="left"/>
      <geom class="wheel"/>
      <site class="decor" size=".006 .025 .012"/>
      <site class="decor" size=".025 .006 .012"/>
    </body>
    <body name="right wheel" pos="-.07 -.06 0" zaxis="0 1 0">
      <joint name="right"/>
      <geom class="wheel"/>
      <site class="decor" size=".006 .025 .012"/>
      <site class="decor" size=".025 .006 .012"/>
    </body>
  </body>
</worldbody>

As the car can move in any direction, including jumping and flipping with respect to the ground floor, it gets the <freejoint/> tag with 6 DoFs, while each of its wheels (right and left) gets one DoF along its previously defined axis, set with the zaxis="0 1 0" option (the y-axis).

The active control handles in MuJoCo are defined with the <tendon> tag, defining groups of joints with the <fixed> tag, and then with the <actuator> tag, which defines the exact name and control range of each <motor>.
As in the following code:

<tendon>
  <fixed name="forward">
    <joint joint="left" coef=".5"/>
    <joint joint="right" coef=".5"/>
  </fixed>
  <fixed name="turn">
    <joint joint="left" coef="-.5"/>
    <joint joint="right" coef=".5"/>
  </fixed>
</tendon>

<actuator>
  <motor name="forward" tendon="forward" ctrlrange="-1 1"/>
  <motor name="turn" tendon="turn" ctrlrange="-1 1"/>
</actuator>

This system of tendons conveniently controls the car: we can control the linear movement of the car with the "forward" tendon, having a forward displacement coefficient of 0.5 for both wheels, and the turning movement with the "turn" tendon, having opposite displacement directions for each of the wheels, which physically makes the car turn. The degree of displacement is controlled by both of the defined motors, by multiplying their values with the coef values of the tendons.

Lastly, the <sensor> tag defines readings to record; here, jointactuatorfrc reports the actuator force applied at each joint's DoF:

<sensor>
  <jointactuatorfrc name="right" joint="right"/>
  <jointactuatorfrc name="left" joint="left"/>
</sensor>
</mujoco>

The Task

In order to train and run the reinforcement learning agent to control the car, we must set a clear purpose for the intended behavior. For example, we may aim to make the car take a circular path or drive towards a fixed but unknown position. For this example, we will define a reward so that the car drives from its initial position A=(0,0,0) towards B=(-1,4,0). This point is somewhat to the left of the car, so it has to turn as well as drive in a straight line, as shown below.

Made by author

For this task, we must define a reward function in relation to the Euclidean distance between the current position of the car and the target position.
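One convenient bounded choice, adopted in what follows, is the exponential of the negative distance. A minimal numpy sketch (the target point is taken from the task above; the sampled positions are placeholders):

```python
import numpy as np

TARGET = np.array([-1.0, 4.0])  # destination B, projected onto the xy-plane

def distance_reward(xy):
    """exp(-distance to target): 1.0 exactly at the target, decaying towards 0."""
    return np.exp(-np.linalg.norm(TARGET - xy))

r_start = distance_reward(np.array([0.0, 0.0]))  # car at the origin, far away
r_goal = distance_reward(TARGET)                 # car at the target
```

Because exp(-d) maps every distance d >= 0 into (0, 1], the reward scale stays stable no matter how far the car wanders.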
We choose to take the exponential of the negative distance, np.exp(-np.linalg.norm(A - B)), to represent this reward, so that the values always lie in the range [0,1].

Continuous Proximal Policy Optimization

As we noted in the XML file, the range of the actuator values is continuous, from -1 to 1. This means that the action space is continuous too; therefore, the training algorithm should be suitable for such scenarios. Algorithms like DQN will not be suitable, as DQN can only be applied to discrete action spaces. However, actor-critic methods, like PPO, can still be used to train models with a continuous action space.

The PPO code used here is based on the CleanRL single-file implementation of continuous PPO, with some parameters modified and the environment replaced with our newly written environment wrapping the previous MuJoCo model. Practically, we train for 2e6 steps, with 2500 steps per episode. As the default update rate in MuJoCo is 2 ms, 2500 steps translates to 5 seconds.

It is worth noting that the discrete PPO update formulas are the same in the continuous case, except for the type of the output distribution in the policy model: it will be categorical (Categorical) in the discrete case, and Gaussian (Normal), or any other continuous distribution, in the continuous case.

Next we will show the environment used for stepping and simulating the MuJoCo model, which will be used by the PPO training program.

Training Environment

As we will be using the main MuJoCo package (not dm_control), we will do the following imports:

import mujoco
import mujoco.viewer
import numpy as np
import time
import torch

We then define the __init__ method of the environment class, in which:

- The XML file of the model is loaded with mujoco.MjModel.from_xml_path(), which results in the model structure containing the geometries and constants such as time steps and the gravity constant in model.opt.
- The data structure is loaded from the model
structure with the command data = mujoco.MjData(model). In this structure, the current state values (like the generalized velocity data.qvel and the generalized position data.qpos, as well as the actuator values data.ctrl) can be read and set.
- Duration is 5 seconds, which can be mapped to the simulation time by delaying it by a specific amount, as the simulation is usually much faster. For example, 5 seconds may be simulated in 0.5 seconds.
- Rendering: if the render variable is set to True, a viewer GUI is initialized with mujoco.viewer.launch_passive(model, data). The passive mode is needed so that the GUI doesn't block code execution. It is updated to the most recent values in data when viewer.sync() is called, and it should be closed with viewer.close().

class Cars():
    def __init__(self, max_steps=3*500, seed=0, render=False):
        self.model = mujoco.MjModel.from_xml_path('./car.xml')
        self.data = mujoco.MjData(self.model)
        self.duration = int(max_steps//500)
        self.single_action_space = (2,)
        self.single_observation_space = (13,)
        self.viewer = None
        self.reset()
        if render:
            self.viewer = mujoco.viewer.launch_passive(self.model, self.data)

In the reset() method, the data structure is reset based on the original model using mujoco.mj_resetData. Here we can choose the shape of the state we will be using to solve our problem.
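For reference, the 13-dimensional observation used below concatenates the Cartesian position (3 values), the spatial velocity cvel (6 values: 3 angular plus 3 linear), and the orientation quaternion (4 values). A small shape check, with placeholder arrays standing in for the MuJoCo fields:

```python
import numpy as np

# Placeholders for data.body('car').xpos, .cvel and .xquat respectively
xpos = np.zeros(3)                   # Cartesian position (x, y, z)
cvel = np.zeros(6)                   # spatial velocity (3 angular + 3 linear)
xquat = np.array([1., 0., 0., 0.])   # identity orientation quaternion

state = np.hstack((xpos, cvel, xquat))
```

This matches the single_observation_space = (13,) declared in the class above.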
We note that, as the task only involves moving in 2D, we need the current Cartesian position of the car, data.body('car').xpos, in addition to its orientation, data.body('car').xquat; lastly, the velocity data.body('car').cvel may also be helpful to judge whether we want to accelerate or decelerate. Note that data.body() or data.geom() allow named access to these objects as defined in the XML file, or even by their index number, where 0 always indicates the worldbody.

def reset(self):
    mujoco.mj_resetData(self.model, self.data)
    self.episodic_return = 0
    state = np.hstack((self.data.body('car').xpos[:3],
                       self.data.body('car').cvel,
                       self.data.body('car').xquat))
    if self.viewer is not None:
        self.viewer.close()
        self.viewer = mujoco.viewer.launch_passive(self.model, self.data)
    return state

As our task is to reach the point [-1,4], our reward can be as simple as a function of the distance between the current position and the destination. However, taking exp(-distance) seems more suitable, since it restricts the reward values to the range [0,1], which can lead to better stability in learning. As mentioned previously, all we have to do to synchronize changes to the viewer window is to invoke self.viewer.sync().

def reward(self, state, action):
    car_dist = np.linalg.norm(np.array([-1, 4]) - state[:2])
    return np.exp(-car_dist)

def render(self):
    if self.viewer.is_running():
        self.viewer.sync()

def close(self):
    self.viewer.close()

In the step() routine, the actual model is updated: first by setting the current action for the forward and turning movements in data.ctrl. Note that the action is transformed with np.tanh(), which has the output range [-1,1].
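A quick check of this squashing step: np.tanh maps arbitrarily large raw policy outputs into the actuators' ctrlrange of [-1, 1] (the raw values below are illustrative):

```python
import numpy as np

raw_action = np.array([-250.0, -0.5, 0.0, 3.0])  # unbounded network outputs
ctrl = np.tanh(raw_action)                        # squashed into [-1, 1]
```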
This will allow the neural network of the policy to be trained on the full range (-inf, inf) for its output vector, which is easier to represent, as smaller values may get rounded during training. We additionally keep count of the episodic return, and handle the terminal case by resetting the environment.

def step(self, action):
    self.data.ctrl = np.tanh(action)
    mujoco.mj_step(self.model, self.data)
    state = np.hstack((self.data.body('car').xpos[:3],
                       self.data.body('car').cvel,
                       self.data.body('car').xquat))
    reward = self.reward(state, np.tanh(action))
    self.episodic_return += reward
    done = False
    info = {}
    if self.data.time >= self.duration:
        done = True
        info.update({'episode': {'r': self.episodic_return, 'l': self.data.time}})
        info.update({'terminal_observation': state.copy()})
        state = self.reset()
    return state, reward, done, info

This finishes the main environment class for the car model. It is not that complicated or hard to write. However, in dm_control, customized environments and pipelines with various tools are already available and ready to be used for training RL agents: an extensive topic that is left for exploration in future posts.

Training Results

After training the PPO program with the previous environment and a suitable agent network, we record the following training curve for the episodic return.

Made by author

We can see that the model is clearly learning, albeit slowly. There you have it: your first simulated and controlled RL agent with MuJoCo. However, we still need to see it in action: does the robot really move towards point [-1,4]?
To do that, we need to run the following testing program with the render variable set to True.

def main():
    duration = 5
    env = Cars(max_steps=duration*500, render=True)
    # 2000000 is the number of training iterations
    policy = torch.load(f'ppo_agent_cars_{2000000}_mlp.pth')
    state = env.reset()
    start = time.time()
    while time.time() - start < duration:
        with torch.no_grad():
            action = policy.actor(torch.Tensor(state).to('cuda')).cpu().numpy()[:2]
        state, reward, done, info = env.step(action)
        if done:
            break
        time.sleep(0.003)
        env.render()
    env.close()

After initializing the environment and loading the trained model with PyTorch, we get the initial state by resetting the environment. Inside the while loop, we alternate between inferring the action from the actor model and stepping the environment. Lastly, we keep rendering each frame with env.render().

If we ran the program without any delay, we would get a very fast simulation that we might not be able to observe, and, depending on the while condition, the episode might repeat many times before the program finishes. To avoid that, we delay execution by some amount with time.sleep(). The program may still run several episodes (before duration seconds have passed), but it will be observable.

In my case, this code shows the car moving exactly as shown in the image above (in The Task section), but as the speed is limited and the episode length is only 5 seconds, the simulation ends before reaching the point [-1,4]: reaching the point is physically impossible in that case, no matter how long the model is trained.

Conclusion

While this tutorial merely scratches the surface of MuJoCo's vast API capabilities, it equips you with the foundational knowledge to embark on your robotic simulation journey.
MuJoCo's C++ foundation enables lightning-fast performance, making it ideal for training intricate robots of diverse configurations. This versatility positions MuJoCo as a valuable tool in both research and industry:

Research: Researchers can rigorously test and compare novel reinforcement learning algorithms within challenging, realistic scenarios without the logistical complexities and costs of physical prototyping.

Industry: Manufacturers can thoroughly evaluate robot designs and models in environments mirroring real-world conditions, ensuring optimal performance before deployment.

This Reinforcement and Imitation Learning series will delve deeper into specific, popular algorithms, exploring their intricacies and applications. Subscribe or follow along to stay informed and explore the full potential of these powerful techniques!

Published via Towards AI
-
Oops! AI did it again: The Rise of Autonomous Social Media Agents (Part 1)

towardsai.net

Author(s): Reelfy. Originally published on Towards AI.

How AI-powered agents are reshaping content creation, engagement, and automation, giving small businesses the power to compete like never before.

A group of AI Agents collaborating to take actions. As imagined by Dall-E 3

The Small Business Dilemma: The High Cost of Staying Relevant

Picture this: A small business owner is juggling product development, customer service, sales, and operations, and on top of that, they're expected to maintain an engaging social media presence. Meanwhile, large brands have entire marketing teams crafting daily content, analyzing trends, and ensuring their audience stays engaged. The result? Small brands get drowned out in an algorithm-driven world that rewards consistency, creativity, and engagement.

The harsh reality? The stats paint a clear picture:

- The average internet user spends 143 minutes per day on social media, yet organic reach is declining. Businesses must now take a multi-channel approach to stay relevant, but small brands lack the resources to produce constant, engaging content. (Sprout Social, 2024)
- 44% of consumers prefer to learn about a new product or service via short-form video content, but without AI-powered automation, small brands struggle to produce enough engaging video content to stay relevant. (Sprout Social, 2024 & HubSpot, 2025)
- 87% of marketers report increased sales from video marketing, yet many small businesses lack the time, skills, or budget to create professional videos. (Sprout Social, 2024 & HubSpot, 2025)

Despite the obvious benefits of an active social presence, the cost of hiring a social media manager or agency is often too high for small businesses.
They're left with two options:

1. Do it themselves, burning time and energy that could be spent growing their business.
2. Stay inactive, missing out on new customers and brand awareness.

But what if we could automate social media strategy, content creation, and posting without sacrificing quality?

AI Agents: Making Social Media Automation a Reality

For years, AI lacked the reasoning capabilities to replace human strategists in creative fields like marketing. But thanks to recent advancements, AI agents can now analyze, plan, execute, and adapt, just like a human social media manager.

Why Now? The Three Key AI Breakthroughs

- Cost Reduction: Open-source AI models like DeepSeek, Mistral, and LLaMA provide high-quality reasoning at a fraction of the cost of proprietary AI.
- Advanced Multi-Step Reasoning: AI agents can now think step-by-step, just like a human strategist, making them capable of adapting content to trends and audience behavior.
- Creative AI Pipelines: AI-powered tools like Reelfy enable end-to-end video creation, turning text prompts into engaging social media videos, something only human designers could do before.

The result? AI is no longer just an assistant; it can be the entire social media team, handling everything from strategy to content creation, scheduling, and even performance analysis. By observing how posted videos perform, AI agents can refine their approach, adapting future content based on real-time insights.

Understanding AI Agents: The Thought-Action-Observation Cycle

At the core of our AI-driven social media agent is a concept called the Thought-Action-Observation (TAO) Cycle. This is how modern AI agents can reason, interact with the world, and learn from their actions, just like a human strategist.

This gif was introduced as part of the free Agent Course given by HuggingFace

Step 1: Thought. Internal Reasoning & Strategy (ReAct Approach)

Before an AI agent can act, it needs to think.
Using a ReAct (Reasoning + Acting) approach, our AI agent will:

- Analyze the brand's messaging, past posts, and audience behavior using LlamaIndex.
- Research industry trends and competitor strategies using LangChain.
- Develop a content plan based on engagement patterns and social media trends.

In this phase, the AI is essentially a strategist, planning what to post, when to post, and how to engage users.

Example of AI Thought Process: This brand focuses on eco-friendly fashion. Based on recent social media trends, sustainable clothing recycling challenges are a hot topic. I should create content that highlights how this brand supports sustainable fashion.

Step 2: Action. Engaging with the World

Once the AI has a plan, it executes actions. Using SmolAgents, our agent will:

- Generate AI-powered video content with Reelfy's story-to-video pipeline.
- Schedule and post content on Instagram, ensuring consistency.
- Monitor engagement, tracking likes, comments, and shares.

At this stage, the AI isn't just a thinker; it's also a content creator and social media manager.

Example of AI Action: I will generate a 15-second Instagram Reel showcasing sustainable fashion tips, using a story-to-video pipeline. Then, I will post it at 6 PM when engagement is highest.

Step 3: Observation. Learning & Adapting

After posting, the AI doesn't just move on; it analyzes performance and adjusts future content strategies. Using real-time analytics, the agent:

- Tracks which content performs best.
- Analyzes engagement metrics to determine optimal posting times.
- Adjusts the content style and messaging for better performance.

Over time, this means the AI improves its social media strategy, just like a human marketing expert would.

Example of AI Observation & Adaptation: This video received 3x more engagement than last week's. Users commented that they liked the storytelling format.
I should create more narrative-driven content for the next post.

Conclusion: The Foundation for AI-Driven Social Media

In this first part, we laid the groundwork for how AI can fully automate social media marketing for small businesses. What we covered:

- The problem small brands face in maintaining a consistent online presence.
- Why AI agents are now capable of replacing human strategists at a fraction of the cost.
- The Thought-Action-Observation (TAO) cycle, which allows AI to reason, act, and learn.

What's Coming in Part 2?

In the next part, we'll take these concepts and build a working AI social media agent step-by-step. You'll see:

- How the agent extracts brand identity and audience insights.
- How AI-powered storytelling generates engaging videos.
- How SmolAgents automate posting and engagement tracking.
- How AI refines future content based on performance metrics.

With open-source AI, small businesses no longer need a full marketing team; they just need the right AI agent. Stay tuned for Part 2, where we bring this AI-powered social media manager to life!

Written by Garry Newball

References

[1] LangChain Conceptual Guide. Retrieved from: https://python.langchain.com/docs/concepts/
[2] ReAct: Synergizing Reasoning and Acting in Language Models. Retrieved from: https://arxiv.org/abs/2210.03629
[3] LlamaIndex Documentation. Retrieved from: https://docs.llamaindex.ai/en/stable/
[4] DeepSeek Reasoning Model. Retrieved from: https://huggingface.co/deepseek-ai/DeepSeek-R1
[5] Conceptual Guide to SmolAgents. Retrieved from: https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents
[6] Understanding the Importance of Social Media Marketing. Retrieved from: https://sproutsocial.com/insights/importance-of-social-media-marketing-in-business
[7] 52 Visual Content Marketing Statistics You Should Know. Retrieved from: https://blog.hubspot.com/marketing/visual-content-marketing-strategy
[8] 50+ Must-Know Social Media Marketing Statistics for 2024.
Retrieved from: https://sproutsocial.com/insights/social-media-statistics

Published via Towards AI
-
RAG vs Fine-Tuning for Enterprise LLMs

towardsai.net

February 17, 2025. Last updated on February 17, 2025 by Editorial Team. Author(s): Paul Ferguson, Ph.D. Originally published on Towards AI.

RAFT vs Fine-Tuning. Image created by author

As the use of large language models (LLMs) grows within businesses to automate tasks, analyse data, and engage with customers, adapting these models to specific needs (e.g., responding to support queries with current policies or analyzing legal documents with custom terms) is increasingly important, and with it, the need to plan accordingly.

Two approaches prevail in this space:

- Fine-tuning, which adjusts the model's core knowledge
- Retrieval-Augmented Generation (RAG), which incorporates external data into the response

Each method has its advantages, disadvantages, and tradeoffs, but the choice is not always obvious. This guide provides a step-by-step framework for technical leaders and their teams to:

- Understand how RAG and fine-tuning work in plain terms
- Choose the approach that best fits their data, budget, and goals
- Avoid common implementation pitfalls: poor chunking strategies, data drift, and others
- Combine both methods for complex use cases

Understanding the Core Techniques

Fine-Tuning

Fine-tuning is a technique for adjusting the parameters of a pre-trained LLM for specific tasks using domain-specific datasets. This ensures that the model is well-suited to that specific task (e.g., legal document review). It excels at tasks that require specialised terminology or brand-specific responses, but needs substantial computational resources and may become obsolete as new data arrives. For instance, a medical LLM fine-tuned on clinical notes can make more accurate recommendations because it understands niche medical terminology.

Fine-tuning Architecture. Image created by author

Within the fine-tuning architecture we've included both:

- In green, the steps to generate the fine-tuned LLM
- In red, the steps
to query the model

Note: within the query section, we've labelled the system responsible for controlling/co-ordinating the query and response an "intelligent system". This is just a general name for illustration purposes; within enterprise systems there are many variations of this intelligent system, which may themselves include AI agents or other LLMs to provide more sophisticated functionality.

Retrieval-Augmented Generation (RAG)

RAG enhances LLMs by fetching additional information from external sources during inference to improve the response. It combines the user's query with other relevant information to ensure the accuracy of the response (potentially incorporating live data).

Some of its key advantages include:

Fewer hallucinations, since the model is forced to rely on actual data
Transparency (it cites sources)
Easy adaptation to changing data environments without modifying the model

Example: a customer support chatbot using RAG can fetch the real-time policy from internal databases to answer queries accurately.

RAG Architecture. Image created by author.

Again, we've colour-coded the architecture:

Green denotes the pre-query aspects of the system, associated with indexing the documents
Red identifies the steps that are executed at query time

From examining the two architectures, we can see a number of key differences. One of the most striking is the overall complexity of the RAG system, but we should be careful not to lose sight of the complexity of the fine-tuning step itself. Although it is represented by only one step in the architecture, it is still a complex and potentially costly process. It also requires careful preparation of the custom data, as well as careful monitoring of the fine-tuning to ensure that the model learns the desired information. However, one key aspect of RAG's complexity is that at query time there is much more work being done by the RAG system: this will naturally result
in longer query times.

Key Decision Factors

When selecting between RAG and fine-tuning, consider these factors:

RAG vs Fine-Tuning Decision Factors. Image created by author.

Note: in certain circumstances, a hybrid approach is needed (which we discuss below).

Common Challenges and Solutions

RAG Challenges

1. Chunking Issues
Problem: Poor chunk sizing leads to incomplete context or irrelevant document retrieval.
Solution: Use overlapping chunks (e.g., 25% token overlap) or semantic splitting at logical segments (sentences or paragraphs).

2. Retrieval Quality
Problem: Over-reliance on vector similarity misses critical keyword matches.
Solution: Combine vector embeddings with keyword-based BM25 scoring for hybrid search.

3. Response Consistency
Problem: Noisy retrieved contexts lead to varying output.
Solution: Create structured prompt templates that enforce source citation and output format.

Fine-Tuning Challenges

1. Catastrophic Forgetting
Problem: Models lose general knowledge in the process of domain adaptation.
Solution: Use parameter-efficient methods like Low-Rank Adaptation (LoRA) with Bayesian regularisation to preserve overall capability.

2. Data Quality
Problem: Biased or outdated training data affects the output.
Solution: Build a validation pipeline with domain experts and automate checks for the dataset (e.g., balance, outliers).

3. 
Version Control
Problem: Managing model iterations is error-prone.
Solution: Keep a model lineage registry (with tools like Hugging Face Model Hub) and document hyperparameters and training data.

Implementation Best Practices

RAG Implementation

Data Pipeline Design: Use semantic search in vector databases like Pinecone, and chunk documents for relevance and efficiency.
Evaluation: Set up an automated testing framework like Ragas to assess the accuracy of responses and how well they are grounded in the data.
Security: Protect sensitive data with role-based access control and metadata.

Fine-Tuning Implementation

Data Preparation: Use large, properly labelled datasets (>10,000 examples) to build the model and reduce the risk of overfitting.
Parameter Efficiency: Use LoRA to reduce computational costs while retaining the general capabilities of the model.
Validation: Check the output with domain experts to ensure that it meets the requirements of the task.

Hybrid Approach: RAFT

RAFT (Retrieval-Augmented Fine-Tuning) combines the best of both worlds by integrating RAG with fine-tuning to create models that excel at knowledge-based tasks (legal and healthcare are among the most common domains for hybrid approaches, given their need for domain specialisation as well as highly accurate, traceable results).

In terms of architecture, RAFT is a straightforward combination of the two architectures already illustrated:

First, create a fine-tuned LLM
Then integrate the fine-tuned LLM (instead of a pre-trained LLM) into the RAG architecture

Key Components

Training Data Design: Select a set of "oracle" documents that contain correct answers and a set of "distractor" documents that contain irrelevant information, to teach the model to focus on credible sources.
Training Process: Fine-tune the model to explicitly refer to the retrieved passages and to reduce the chance of hallucinations using techniques like chain-of-thought
prompting.

Implementation Steps:

Index domain documents (e.g., policy updates).
Create synthetic QA pairs with oracle and distractor contexts marked.
Fine-tune using LoRA to preserve the general ability to generate text while adapting to new data.

Benefits

Reduced Hallucination: Responses are grounded in verified sources.
Domain Adaptation: Outperforms other methods in dynamic, specialised environments.

Key Takeaways & Conclusion

RAG, Fine-Tuning, RAFT decision matrix. Image created by author.

Successful deployment of enterprise LLMs depends on aligning the strategy with operational realities:

RAG vs. Fine-Tuning: Use RAG for transparent solutions with dynamic data (e.g., customer-facing chatbots). Fine-tune when deep domain customisation is needed (e.g., healthcare).
Hybrid Strategies: Examples include RAFT or RoG (Reasoning on Graphs), which combine real-time retrieval with domain expertise for tasks like building a legal compliance tool.
Continuous Evaluation: Periodically check retrieval accuracy (using tools like Ragas) and model outputs to prevent drift or hallucinations.

No single approach fits all, but understanding these principles ensures your LLM investments deliver scalable, accurate results.

If you'd like to find out more about me, please check out www.paulferguson.me, or connect with me on LinkedIn.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI.
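The overlapping-chunk strategy recommended above (roughly 25% token overlap between consecutive chunks) can be sketched in a few lines. This is a minimal illustration under assumed parameters (the chunk_size default is hypothetical), not code from the article:

```python
def chunk_tokens(tokens, chunk_size=200, overlap_ratio=0.25):
    """Split a token list into overlapping chunks.

    Each chunk shares roughly `overlap_ratio` of its tokens with the
    previous one, so context that straddles a boundary is not lost.
    """
    step = max(1, int(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = list(range(1000))       # stand-in for a tokenised document
chunks = chunk_tokens(tokens)
print(len(chunks))               # 7 chunks for 1000 tokens
print(chunks[1][0])              # second chunk starts 150 tokens in
```

In a real pipeline the semantic-splitting alternative would cut at sentence or paragraph boundaries instead of fixed token counts.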
-
What Millions of Conversations Reveal About AI's Real Economic Impact

February 17, 2025 | Last updated February 17, 2025 by Editorial Team | Author(s): Vita Haas | Originally published on Towards AI.

A new study by Anthropic offers a rare, data-driven glimpse into how artificial intelligence is actually being used in the workplace, rather than how it's supposed to be used. Based on over four million real-world conversations with Claude.ai, the research, titled "Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations", maps these interactions to occupations and tasks using the U.S. Department of Labor's O*NET database.

The study set out to answer some fundamental questions: Where is AI proving most useful? What kinds of skills are people outsourcing to it? And is it helping humans work smarter, or simply taking over their tasks altogether?

From the Anthropic Economic Index

The findings suggest that AI is not sweeping entire industries off the table, nor is it single-handedly ushering in an age of mass unemployment. Instead, it has cozied up in knowledge-based jobs, particularly in software development and writing-related work, where it plays the role of an indefatigable assistant. Together, these two domains account for nearly 50% of all recorded AI interactions. More broadly, 36% of occupations now use AI for at least a quarter of their tasks, though only 4% lean on it for over 75% of their workload.

Rather than replacing workers outright, AI is proving itself to be a helpful, if sometimes overeager, co-worker. In 57% of cases, it assists by improving efficiency, refining ideas, or offering guidance: a second set of (digital) eyes on a project. The remaining 43% of cases involve full automation, where AI is essentially left to its own devices, completing tasks with little or no human oversight.

Where AI Thrives, and Where It's Nowhere to Be Found

AI, it turns out, is a bit of a specialist. It excels in structured, text-based, analytical work: things like coding, content creation, data analysis, and technical documentation. If a task involves generating reports, suggesting programming fixes, summarizing research, or transforming a dull paragraph into something vaguely readable, AI is right at home.

But despite all the grand predictions about automation upending everything, AI remains conspicuously absent from roles that demand physical dexterity, improvisation, and real-world decision-making. Professions such as surgeons, electricians, construction workers, and restaurant staff remain largely unaffected, possibly because no one in their right mind wants a chatbot performing open-heart surgery or tiling their bathroom (yet).

Similarly, AI has made little headway in high-stakes leadership roles such as executives, doctors, and top-tier lawyers, where a misplaced decimal or a legally questionable clause could mean actual disaster. AI may be decent at suggesting a strongly worded email, but it still lacks the judgment needed to steer companies or make life-and-death decisions. At least, for now.

From the Anthropic Economic Index

Co-Pilot or Replacement? The 57/43 Split

The study highlights two primary ways AI is being used: as a collaborator or as an automator.

In 57% of cases, AI acts as an augmentation tool, making workers more efficient without actually replacing them. A writer refines a draft with AI-generated suggestions. A programmer uses AI for debugging. A financial analyst double-checks a model's assumptions. In these cases, AI is less of a replacement and more of a productivity booster, like a calculator, if calculators also occasionally got things hilariously wrong.

But in 43% of cases, AI takes full control of tasks, completing them with minimal human input. An executive generates a report and barely skims it before sending it off. A marketing team lets AI produce ad copy with only a token edit. A financial model is built entirely by AI, with no human adjustments. Here, AI has shifted from assistant to invisible worker, quietly absorbing responsibilities that once belonged to humans.

For now, augmentation remains more common than automation, but as AI improves, that balance could change.

Who Uses AI the Most?

AI adoption follows a surprisingly clear pattern across different wage levels.

Mid-to-high-wage professionals, especially in software development, finance, marketing, and content creation, use AI the most. These jobs involve structured, repetitive, and digital work, making them a natural fit for AI assistance.

But at both extremes of the income spectrum, AI adoption drops. Low-wage jobs, like those in food service, warehouse work, and construction, don't use AI much because their work is physical, unpredictable, and requires hands-on adaptability. At the other end, top executives, doctors, and high-end lawyers have also shown low AI adoption, likely due to regulatory complexity, the premium on human judgment, or the sheer consequences of getting something wrong.

So while AI is reshaping middle-class knowledge work, it has yet to make serious inroads into either physical labor or elite professions.

The Strange Case of Outsourced Thinking

One of the most eyebrow-raising findings in the study is the type of skills people are handing over to AI. At the top of the list is critical thinking, closely followed by active listening and reading comprehension, all of which, you know, seem fairly important if humans want to continue being good at thinking for themselves. The fact that AI is already being used to support reasoning and decision-making suggests that people aren't just outsourcing what to think; they may be outsourcing how to think.

This isn't necessarily a bad thing. AI can help structure arguments, point out gaps in logic, and offer alternative perspectives. But if people start leaning on AI for comprehension, judgment, and decision-making, we may find ourselves in an era where independent thinking is more of an optional skill.

If AI continues to take over tasks that require analysis, problem-solving, and reasoning, will humans get better at using these tools, or simply forget how to do it themselves?

From the Anthropic Economic Index

What It Means for the Future of Work

AI isn't taking over the workforce overnight, but it is fundamentally changing how work gets done. Instead of eliminating jobs outright, it's making workers faster, more efficient, and increasingly reliant on AI for both routine and complex tasks.

For now, AI is a tool, not a replacement. But this study suggests that its role is expanding. Writing assistance has evolved into fully automated content generation. Code suggestions have transformed into self-writing software. Data analysis has moved beyond interpretation to prediction and action.

At what point does AI shift from enhancing human work to becoming an independent force in economic decision-making? The study suggests we may already be testing that boundary. AI is increasingly involved in critical thinking, decision-making, and comprehension, skills once considered uniquely human. If this reliance continues, industries may need to redefine what human expertise actually means.

This raises urgent questions for policymakers, businesses, and workers. If AI keeps absorbing more cognitive tasks, will professionals shift from deep expertise to AI oversight roles? How will companies measure competence and accountability when AI is doing more of the heavy lifting?

The future of work won't be defined by whether AI replaces humans, but by how much responsibility we hand over to it. Whether this transition leads to greater productivity and innovation, or a slow erosion of human autonomy and expertise, depends entirely on how we choose to integrate AI into our workflows.

For now, AI remains a silent partner, amplifying productivity while still requiring human oversight. But as businesses adopt AI at scale, one question looms larger than ever: not just what AI can do, but how much of the decision-making we are comfortable handing over.

Published via Towards AI.
-
A Unified Approach To Multimodal Learning

February 14, 2025 | Last updated February 17, 2025 by Editorial Team | Author(s): Yash Thube | Originally published on Towards AI.

We live in a world of multimodal data. Think about it: a restaurant review isn't just text; it's accompanied by images of the food, the ambiance, and maybe even the menu. This combination of text, images, and ratings is multimodal data, and it's everywhere. Traditional machine learning models often struggle with this kind of data because each mode (text, image, rating) has its own structure and characteristics. A picture is high-dimensional, text is sequential, and a rating is just a number. How do we effectively combine these disparate data types to make better predictions?

The paper "Generative Distribution Prediction: A Unified Approach to Multimodal Learning" introduces a new framework called Generative Distribution Prediction (GDP) to tackle this challenge. GDP's core idea is to use generative models to understand the underlying process that creates this multimodal data. Imagine an artist trying to recreate a scene. They don't just copy it; they understand the relationships between the objects, the lighting, and the overall composition. Similarly, generative models learn the underlying structure of the data, allowing them to create synthetic data that resembles the real thing. This synthetic data, as we'll see, is key to improving predictions.

The Problem: Multimodal Data Is Messy

GDP offers a clever solution: it uses generative models to create synthetic data that captures the combined information from all modalities. Think of it as the model learning to imagine new restaurant reviews, complete with pictures and ratings, based on what it has already seen. By learning this generative process, the model gains a deeper understanding of the relationships between the different modes. This understanding then allows it to make better predictions on real data.

How GDP Works: A Two-Step Process

GDP works in two main steps:

1. Constructing a Conditional Generator: This step focuses on building a generative model that can create synthetic data conditioned on specific input values. For example, the model might generate a synthetic restaurant review (text, image, rating) given a specific cuisine type and price range. This often involves transfer learning, where a pre-trained generative model is fine-tuned on the specific multimodal data. A key component here is the use of dual-level shared embeddings (DSE). Embeddings are a way of representing data as vectors of numbers, capturing semantic meaning. DSE creates shared embeddings at two levels, helping the model learn relationships between different modalities and also adapt to new, unseen data (a process called domain adaptation).

2. Using Synthetic Data for Point Prediction: Once the conditional generator is trained, it can be used to create synthetic data for any given input. This synthetic data represents the possible responses associated with that input. The model then makes a prediction by finding the response that minimizes the prediction error on this synthetic data. This is like the model saying, "Based on what I've learned about how reviews are generated, this is the most likely rating for this restaurant."

Why Is It Better?

Unified Framework: GDP handles multimodal data within a single generative modeling framework, eliminating the need for separate models for each modality.
Mixed Data Types: It can handle different data types (text, images, tabular data) seamlessly, modeling the conditional distribution of the variables of interest.
Robustness and Generalizability: By training on synthetic data, GDP becomes more robust to noise and variations in the real data, improving its ability to generalize to new, unseen examples.

Key Contributions and Theoretical Foundations

The paper makes several important contributions:

GDP Framework: Introduces the GDP framework for multimodal supervised learning using generative models.
Theoretical Foundation: Provides theoretical guarantees for GDP's predictive accuracy, especially when using diffusion models as the generative backbone. It analyzes two key factors: generation error (how different the synthetic data is from the real data) and synthetic sampling error (the error introduced by using a finite sample of synthetic data).
Domain Adaptation: Proposes a novel domain adaptation strategy using DSE to bridge the gap between different data distributions.

Multimodal Diffusion Models: The Generative Engine

A crucial component of GDP is the use of diffusion models as the generative engine. Diffusion models are a powerful type of generative model that works by gradually adding noise to data until it becomes pure noise, and then learning to reverse this process to generate data from noise. The paper introduces a specialized diffusion model for multimodal data, integrating structured tabular data with unstructured data like text and images through shared embeddings and a shared encoder-decoder architecture.

Numerical Examples and Results

The paper evaluates GDP on a variety of tasks, including:

Domain adaptation for Yelp reviews
Image captioning
Question answering
Adaptive quantile regression

The results consistently show that GDP outperforms traditional models and state-of-the-art methods in predictive accuracy, robustness, and adaptability.

In simple terms, GDP is like a master chef. Imagine a master chef who has tasted thousands of dishes. They don't just memorize the recipes; they understand the complex interplay of flavors, textures, and ingredients. GDP is like that chef. It learns the underlying "recipe" for multimodal data, allowing it to generate new dishes (synthetic data) and, more importantly, make better predictions about the real dishes it encounters. By understanding the generative process, it unlocks the potential of multimodal data, leading to more accurate and robust predictions across a wide range of applications.

Future directions involve making GDP more computationally efficient, applying it to a wider range of problems, and developing a deeper theoretical understanding of its properties with various generative models.

Stay curious. See you in the next one!

Published via Towards AI.
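The second GDP step described above (point prediction over synthetic samples) can be illustrated with a toy sketch. The generator here is a stand-in random function, not the paper's trained diffusion model, and minimizing squared prediction error over the samples reduces to taking their mean:

```python
import random

def predict_from_synthetic(generate, x, n_samples=1000):
    """GDP-style point prediction (a toy sketch, not the paper's code).

    Draw n synthetic responses y ~ G(y | x) from a conditional
    generator, then return the point that minimizes squared error
    over the synthetic sample, i.e. the sample mean.
    """
    samples = [generate(x) for _ in range(n_samples)]
    return sum(samples) / len(samples)

# Stand-in conditional generator: pretend ratings for input x are
# normally distributed around a rating that depends on x.
random.seed(0)
fake_generator = lambda x: random.gauss(3.0 + 0.5 * x, 0.3)

print(predict_from_synthetic(fake_generator, x=2))  # close to 4.0
```

Swapping the squared-error minimizer for a quantile loss would give the adaptive quantile regression variant the paper evaluates.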
-
Understandability of Deep Learning Models

February 15, 2025 | Last updated February 17, 2025 by Editorial Team | Author(s): Lalit Kumar | Originally published on Towards AI.

The black-box nature of DL models

Deep learning systems are a kind of black box when it comes to analysing how they produce a particular output, and as the size of the model increases, this complexity increases further. Despite their impressive performance across various domains, these models often suffer from a lack of transparency. Their internal workings are very complex and not easy to understand, hence they are sometimes referred to as black boxes. This lack of transparency hinders trust and limits their applicability in safety-critical domains. It is difficult to judge how these powerful models arrive at their decisions. This challenge, often referred to as the deep learning understandability problem, has spurred significant research efforts to develop techniques that shed light on the inner workings of these models. For a smaller model, it may be possible to explore the internal representations and try to understand the model's decision-making process, but as the model size grows, so does the difficulty of understanding its decision-making mechanism.

How, then, can we track and interpret how these models function? The following are some of the solutions that address the understandability problem of deep learning models. This technique... Read the full blog for free on Medium.

Published via Towards AI.
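The article is truncated before naming its solutions, but one widely used family of interpretability techniques is gradient-based saliency: score each input feature by how strongly the output changes with it. A minimal hand-rolled sketch for a logistic model (illustrative only; the weights and inputs are made up, and the full article may discuss different methods):

```python
import math

def logistic_saliency(weights, x):
    """Gradient-based saliency for a logistic model p = sigmoid(w . x).

    d p / d x_i = p * (1 - p) * w_i, so features whose gradients have
    large magnitude influenced this prediction the most.
    """
    z = sum(w * xi for w, xi in zip(weights, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [p * (1 - p) * w for w in weights]

weights = [2.0, -0.5, 0.0]
x = [1.0, 1.0, 1.0]
sal = logistic_saliency(weights, x)
# The first feature dominates; the third contributes nothing.
print(max(range(3), key=lambda i: abs(sal[i])))  # 0
```

For deep networks the same idea is applied by backpropagating the output score to the input layer instead of differentiating by hand.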
-
Explaining Transformers as Simple as Possible through a Small Language Model

February 14, 2025 | Author(s): Alex Punnen | Originally published on Towards AI.

And understanding vector transformations and vectorizations.

I have read countless articles and watched many videos about Transformer networks over these past few years. Most of these were very good, yet I struggled to understand the Transformer architecture, even though the main intuition behind it (context-sensitive embedding) was easier to grasp. While giving a presentation I tried a different and more effective approach; this article is based on that talk, in the hope that it will be equally effective here.

"What I cannot build, I do not understand." - Richard Feynman

I also remembered that when I was learning about convolutional neural nets, I did not understand them fully until I built one from scratch. Hence I have built a few notebooks, which you can run in Colab; highlights of those are presented here, because I feel that without this hands-on element it won't be possible to understand the architecture in depth. Please read this brief article if you are unclear about vectors in the ML context before you go on.

"Everything should be made as simple as possible, but not simpler." - Albert Einstein

Before we talk about Transformers and jump into the complexity of keys, queries, values, self-attention, and multi-head attention, which... Read the full blog for free on Medium.

Published via Towards AI.
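The keys, queries, and values the article goes on to explain combine in scaled dot-product attention. A generic single-head sketch in plain Python (not the author's notebook code) shows the whole mechanism in a few lines:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for one head, on plain lists.

    Each query attends over all keys; outputs are weighted averages
    of the value vectors, i.e. context-sensitive embeddings.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Two tokens with 2-d embeddings; the query matches the first key.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # output leans toward the first value vector
```

Multi-head attention simply runs several such heads with different learned projections and concatenates their outputs.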
-
Analyze Data Like A Python Pro

February 14, 2025 | Author(s): Katlego Thobye | Originally published on Towards AI.

Create a cheat sheet and stop Googling the docs.

I remember drowning in online searches for the first few months of my data science journey. Every task, no matter how small, required an endless cycle of searching Stack Overflow, poring over Pandas documentation, and frantically flipping between Matplotlib examples. My code was a Frankensteinian monster of copy-pasted snippets, barely held together with duct tape and hope. I spent more time debugging syntax errors and wrestling with data types than analyzing anything. From struggling to tell the difference between df.loc[] and df.iloc[] to the vast assortment of filling methods (method="ffill"), these commands swam before my eyes, a jumbled mess of possibilities I could never quite grasp.

One particularly frustrating day, I spent hours trying to create a simple scatter plot with different colors based on a categorical variable. I jumped between Matplotlib's documentation, Seaborn tutorials, and countless blog posts, each offering a slightly different (and often conflicting) approach. That's when it hit me: I needed a consolidated resource, my own data science cheat sheet that captured the essential Pandas, NumPy, and Matplotlib commands I used most frequently, along with clear examples and explanations.

I started... Read the full blog for free on Medium.

Published via Towards AI.
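As a taste of what such a cheat sheet might capture, here is a small snippet contrasting label-based df.loc with position-based df.iloc, plus forward-fill. It assumes pandas is installed; the data is made up for illustration:

```python
import pandas as pd

# A tiny frame with a non-default (label) index, where the difference
# between label-based .loc and position-based .iloc actually shows up.
df = pd.DataFrame(
    {"score": [10.0, None, 30.0]},
    index=["a", "b", "c"],
)

by_label = df.loc["a", "score"]      # row labelled "a"  -> 10.0
by_position = df.iloc[2]["score"]    # third row ("c")   -> 30.0

# Forward-fill: propagate the last valid value into the gap.
filled = df["score"].ffill()
print(by_label, by_position, filled["b"])  # 10.0 30.0 10.0
```

Note that .loc and .iloc are indexers used with square brackets, not methods called with parentheses, which is exactly the kind of detail a cheat sheet saves you from re-Googling.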
-
Investigating Transformer Attention and Reinforcement Learning Dynamics Using Self-Generated Structural Data

February 14, 2025 | Last updated February 14, 2025 by Editorial Team | Author(s): Shenggang Li | Originally published on Towards AI.

Cracking the code: synthetic data as the key to understanding and enhancing LLMs.

Building large language models (LLMs) can be an endless battle against noisy, messy data. But what if we could strip away that noise and experiment in a clean, controlled environment? That's exactly what we achieve with synthetic data: structured token sequences like "A", "B", and "ACB", designed to mimic relationships between words in NLP. This way, we can explore and refine core LLM mechanisms without getting lost in real-world complexities.

At the heart of this study are Multi-Head Latent Attention (MLA) and Group Relative Policy Optimization (GRPO), two powerful techniques inspired by DeepSeek. MLA optimizes how attention is distributed across tokens, while GRPO adjusts attention dynamically based on feedback, ensuring that critical tokens receive more focus. For instance, a token sequence like "ACB" isn't just processed linearly; GRPO learns which tokens to prioritize based on their impact on predictions.

This project builds on AlphaGo's strategies, where Monte Carlo Tree Search (MCTS) and reinforcement learning refined decision-making. I apply a similar idea to LLMs, using multi-path exploration to let multiple token contexts evolve simultaneously. Reinforcement learning then picks the best paths, cutting down on data needs while... Read the full blog for free on Medium.

Published via Towards AI.
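As background for the GRPO technique mentioned above: GRPO scores each sampled output relative to the other samples for the same prompt, rather than against a learned value baseline. A minimal sketch of that group-relative advantage computation (toy rewards, not the article's code):

```python
import statistics

def group_relative_advantages(rewards):
    """Group-relative advantages in the GRPO style.

    Instead of a learned value baseline, each sampled completion is
    scored against the other samples for the same prompt:
        A_i = (r_i - mean(r)) / std(r)
    """
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mu) / sigma for r in rewards]

# Four sampled outputs for one prompt, with toy reward scores.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # [1.0, -1.0, -1.0, 1.0]
```

Completions that beat their group's average get positive advantages and are reinforced; the rest are pushed down, which is how feedback shifts focus toward the token paths that improved predictions.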
-
AI for Everyone: The Biggest AI Myths People Still Believetowardsai.netAuthor(s): Sophia Banton Originally published on Towards AI. AI for Everyone: The Biggest AI Myths People Still BelieveAI will take away all of our jobs. AI will develop intelligence beyond human control and dominate the world. AI doesnt have personal opinions.I hear these claims all the time but as someone who works in AI, Ive seen firsthand that the truth is far more interesting. Lets separate fact from fiction.1 AI Will Take All Our Jobs The Myth: AI will replace humans, leading to mass unemployment. The Reality: AI automates tasks, not entire jobs it actually creates new work opportunities.Fears about AI-driven job loss arent new people have had the same fears about new technology throughout history. Think back to the introduction of ATMs. People feared bank tellers would become obsolete. The opposite happened: tellers transitioned into customer service and financial advisory roles, and banks actually hired more staff overall. This wasnt a fluke its part of a historical pattern weve seen before.During the Industrial Revolution, automation displaced many traditional workers, but it also created entirely new job categories, like factory supervisors, machine maintenance specialists, and industrial engineers. Todays AI revolution is following the same path, generating new roles such as:AI ethics specialists who ensure AI systems are fair and responsiblePrompt engineers who teach AI how to respond more effectivelyMachine learning auditors who verify AI decisionsAI trainers who help refine and improve AI models Lessons from ExperienceI witnessed this automation transformation firsthand when my team created an AI solution to identify and prioritize influential stakeholders. Previously, teams spent countless hours manually researching who mattered most. 
Our AI solution automated that process, but instead of eliminating jobs, it allowed people to shift their focus from finding the right people to optimizing engagement with them. Instead of replacing human effort, the AI amplified strategic decision-making.
Key Takeaway: AI doesn't replace human effort; it shifts it. While automation handles repetitive tasks, people are freed up to focus on strategy, creativity, and relationship-building.
2. AI Will Become Self-Aware and Take Over (The Terminator Myth)
The Myth: AI will wake up and overthrow humanity.
The Reality: AI follows the rules we give it; it doesn't think or desire like a human.
Pop culture fuels many fears about AI taking over. One famous example is Sophia the Robot, who once joked about world domination, and people panicked. But Sophia is not self-aware; it's just a chatbot with a humanoid face. The entire fear of AI taking over comes from science fiction, not science.
Lessons from Experience: AI models don't wake up. In fact, they often fail spectacularly when faced with real-world complexity. I once developed an AI to predict trends from physicians' notes. It performed exceptionally well on the practice examples, but when tested on actual clinical notes, its accuracy plummeted. The real-world data was too messy, filled with shorthand, misspellings, and abbreviations that the model hadn't seen before. AI only knows what it has seen before; it doesn't decide to improve itself.
Key Takeaway: AI isn't plotting world domination; it's following math, not malice.
3. AI Is Completely Unbiased
The Myth: AI is neutral because it's based on data, not emotions.
The Reality: AI learns and copies human biases from its examples.
Bias, or prejudice, can show up in AI responses. This happens because AI learns from existing information, just like a child learns from their surroundings.
If the information it learns from has unfairness, the AI might innocently repeat those behaviors when answering our questions or having conversations. Bias in AI isn't theoretical; it's happening in real-world systems today. We have seen it in areas such as:
- Credit scoring models offering lower credit limits to women than men, despite similar financial backgrounds. (Stanford HAI)
- AI-driven risk assessments in the criminal justice system disproportionately flagging minorities as high-risk. (Innocence Project)
- Facial recognition software making more mistakes with darker skin tones because it was mostly tested on lighter-skinned people. (ACLU)
These biases aren't intentional, but they are real and can have serious consequences.
Lessons from Experience: While building an internal AI solution to identify and prioritize key stakeholders, I was asked whether we should include a column for gender. I declined. Why? Because the data was uneven; there were significantly more men in the profession. Including gender would have skewed the AI's decisions, unintentionally reducing the influence scores of women. By recognizing this risk early, we prevented the AI from reinforcing systemic bias and ensured a fairer approach to stakeholder analysis.
Key Takeaway: AI is only as fair as what it learns from. Responsible AI means watching out for these biases.
4. AI Is Replacing Human Creativity
The Myth: AI is taking over art, writing, and music; humans will no longer be needed.
The Reality: AI assists creativity, but it doesn't replace human originality.
This is one of the areas in which modern AI shines brightest: the ability to rapidly generate the artistry that lives in our minds. Popular tools like DALL·E and Midjourney allow us to dream, and for adults, it's a chance to fall in love with art like we did when we were young. But despite their potential, AI tools sit idly, waiting for us.
When we make requests, they use the music, paintings, literature, and other works they have learned from to try and create what we ask for. In that regard, AI is relying on our creativity to express its own. I've experienced this firsthand while using AI for visual storytelling.
Lessons from Experience: I use AI to help generate visuals for my writing. Sometimes, AI adds something unexpected that makes me rethink my own ideas. Below, I asked the AI to help me create an image of my sisters and me as children at a waterfall. I guided the AI, but the yellow bow and the distinct color choices (yellow, pink, and blue) were all AI-generated. The AI intuitively captured the idea that in families, sisters express themselves in their own unique ways. Despite AI's brilliant additions to my imagination, at the end of the day it's my vision shaping the final result. AI isn't replacing my creativity; it's amplifying it.
Key Takeaway: AI enhances creativity; it doesn't replace the artist behind the vision.
5. AI Fully Understands What We Say
The Myth: AI "gets" us like another human would.
The Reality: AI analyzes words based on patterns, not emotionally.
Early conversational AIs like Alexa and Siri didn't really understand us. You could ask the same question two different ways and get completely different answers. This wasn't necessarily a bad thing; it was just how they worked. Their responses were more rigid and rule-based, meaning they weren't great at handling open-ended conversations, but they also didn't go off track as easily. Modern AIs like ChatGPT are better at understanding human language as it is written and spoken, but they still struggle with the quirks of our language, things like sarcasm or humor. So, no, they don't fully understand us, but they are getting better at it. And even when they seem to understand us, it's just a computer program processing words and making predictions about what the answers should be.
It's a remarkable and giant leap for technology, and possibly humanity, but still just a computer program following instructions. I've tested this firsthand, and the results were eye-opening.
Lessons from Experience: I once tested ChatGPT by feeding it a sarcastic sentence:
Me: "Oh great, another Monday! My favorite day of the week."
AI: "Glad to hear you enjoy Mondays! They can be a great start to a productive week!"
ChatGPT cheerfully responded by agreeing with my sentence. It had no clue I was being sarcastic. This shows that AI doesn't understand words; it simply predicts what should come next. Even as conversational AIs like ChatGPT get better, they will still struggle with the subtleties of human expression, like sarcasm and humor. Why? Because they don't truly grasp meaning; they just make the best guess of what should come next in conversation.
Key Takeaway: Conversational AI is impressive, but it's still just using patterns, not true understanding.
The Truth About AI
AI isn't magic, and it's not our enemy. AI is a powerful tool: it reflects human ingenuity, but also our biases and limitations. I've witnessed the myths firsthand, but I've also seen the incredible potential of AI to improve our lives when used responsibly. So, if you're unsure about AI's role in society and our lives, give it a try, and you may unlock new ideas, opportunities, and even new forms of self-expression. The greatest myth of all? That AI's future is out of our hands.
About the Author
Sophia Banton is an AI Solution Lead specializing in Responsible AI governance, workplace AI adoption, and AI strategy in IT. With a background in bioinformatics, public health, and data science, she brings an interdisciplinary approach to AI implementation and governance. She writes about the real-world impact of AI beyond theory, bridging technical execution with business strategy. Connect with her on LinkedIn or explore more AI insights on Medium.
-
How We Taught Machines to Think
towardsai.net | February 13, 2025 | Last updated February 13, 2025 by the Editorial Team | Author(s): Vita Haas | Originally published on Towards AI.
And Occasionally Fail Spectacularly
Artificial Intelligence is no longer lurking in the shadows; it's in our pockets, powering our social feeds, recommending what to binge-watch next, and occasionally generating images of people with far too many fingers. AI has evolved from a niche academic pursuit into an inescapable part of daily life. But the journey to this point has been anything but smooth. It is a tale of grandiose predictions, crushing disappointments, and moments so ridiculous they could be lifted straight from a Monty Python sketch. Let's dive into the origins of AI and the often bizarre road that led us to today's world of neural networks, chatbots, and algorithmic absurdity.
The story of AI starts in the 1940s, when scientists, fresh off a World War that had forced them to push technological innovation to its limits, started pondering a question: Can machines think? Enter Warren McCulloch and Walter Pitts, two neuroscientists who proposed the first theoretical model of a neural network in 1943. Their work suggested that the brain could be mimicked by a system of mathematical logic gates, inspiring later neural network research. Meanwhile, Alan Turing, the eccentric British mathematician... Read the full blog for free on Medium.
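The McCulloch-Pitts model mentioned above reduces a neuron to a threshold logic gate: it fires only if enough of its inputs are active. As a minimal illustrative sketch (the threshold values are my own, chosen to reproduce AND and OR; nothing here comes from the article itself):

```python
def mp_neuron(inputs, threshold):
    # A McCulloch-Pitts unit: outputs 1 when the number of active
    # binary inputs meets the threshold, otherwise stays silent (0)
    return 1 if sum(inputs) >= threshold else 0

# With two inputs, threshold 2 behaves like an AND gate,
# while threshold 1 behaves like an OR gate
def and_gate(a, b):
    return mp_neuron([a, b], threshold=2)

def or_gate(a, b):
    return mp_neuron([a, b], threshold=1)
```

Networks of such units can, in principle, compute any boolean function, which is part of what made the 1943 result so influential.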
-
Beginner GPT-4 Prompting For Surprisingly Easy Maps and Reports
towardsai.net | February 13, 2025 | Author(s): John Loewen, PhD | Originally published on Towards AI.
No-code prompts for rapid data visualization reporting
As a computer science professor of 20+ years, I have heaps of experience in writing Python code for data visualizations. Until recently, the art of creating beautiful data visualizations was reserved for full-on computer programmers and data analysts. This has changed with the new data analysis tools that are built into the GPT-4 chat interface. You can now create maps and charts and integrate them into a PDF report, all from the main interface, without having to write a single line of code. It all starts with the upload of a data file (for example, a CSV or XLS), and you prompt GPT-4 to do the rest. How, you ask? Let's do it together, starting with a simple dataset. Let's get off the ground running by downloading an interesting and relevant dataset. Our World in Data has oodles of datasets on global development statistics. This includes a dataset on civil liberties called the Human Rights Index. Now, GPT-4 cannot download data files for us, at least not as of today; I am hoping this is an update in future versions of this LLM. You can initiate the download (in XLSX or... Read the full blog for free on Medium.
-
NN#6 Neural Networks Decoded: Concepts Over Code
towardsai.net | February 13, 2025 | Author(s): RSD Studio.ai | Originally published on Towards AI.
Source: Article by Pushkar
In the previous article, we dissected the mechanics of backpropagation, gradient descent, and the pivotal role of the chain rule in training neural networks. While these concepts form the backbone of deep learning, they are merely the starting point. The real challenge lies in optimizing these processes to ensure models converge efficiently, avoid local minima, and generalize well to unseen data. This article dives into the art and science of optimization techniques, focusing on how to refine stochastic gradient descent (SGD), a form of which we studied in the previous article, adapt learning rates dynamically, and leverage advanced optimizers. By the end, you'll understand how algorithms like Adam, RMSprop, and Momentum transcend vanilla SGD to accelerate training and improve model performance. Optimization algorithms are the techniques we use to fine-tune the internal parameters of our models, guiding them towards making more accurate predictions. They're the mechanisms that allow neural networks to truly learn from their mistakes and improve over time. They are what makes it all make sense at the core level, and more! Stochastic Gradient Descent (SGD) is the workhorse of neural network optimization. Unlike batch gradient descent, which... Read the full blog for free on Medium.
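To make the teaser concrete, here is a rough sketch of two of the update rules the series contrasts: vanilla SGD and SGD with momentum. The learning rate, the beta value, and the toy objective f(w) = w² are all illustrative choices of mine, not taken from the article:

```python
def sgd_step(w, grad, lr=0.1):
    # Vanilla SGD: step directly against the current gradient
    return w - lr * grad

def momentum_step(w, velocity, grad, lr=0.1, beta=0.9):
    # Momentum: keep a decaying running sum of past gradients,
    # which damps oscillation and speeds up consistent directions
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 1.0
w_sgd = 1.0
w_mom, velocity = 1.0, 0.0
for _ in range(50):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_mom, velocity = momentum_step(w_mom, velocity, 2 * w_mom)
# Both walk toward the minimum at w = 0; momentum overshoots and
# oscillates on this toy problem before settling down
```

Adam and RMSprop extend the same idea by additionally scaling each step by a running estimate of the gradient's magnitude.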
-
#62 Will AI Take Your Job?
towardsai.net | February 13, 2025 | Author(s): Towards AI Editorial Team | Originally published on Towards AI.
Good morning, AI enthusiasts! Yet another week, and reasoning models and DeepSeek are still the most talked about in AI. We are joining the bandwagon with this week's resources, focusing on whether DeepSeek is better than OpenAI's o3-mini, how to achieve OpenAI o1-mini-level reasoning with open-source models, and sharing practical tutorials on fine-tuning the DeepSeek R1 model to generate human-like responses, and more. I will also answer one existential question that has probably haunted you: will AI take your job? So I hope you enjoy the read!
What's AI Weekly
This week in What's AI, I want to address something thousands of you have asked: will AI take my job? So here are some thoughts on how different categories of human work could be impacted by LLMs. This could help you decide where to focus your LLM development efforts. Their current capabilities are particularly impactful in routine, repetitive, and information-intensive tasks, while human strengths such as creativity, critical thinking, and emotional intelligence remain indispensable. Let's dive into this in more detail. Read the complete article here or watch the video on YouTube.
Louis-François Bouchard, Towards AI Co-founder & Head of Community
Learn AI Together Community section!
AI poll of the week!
Open source is promising, and yes, the performance gap is closing too, but if cost weren't a problem for large-scale deployment, do you think open source, while more flexible, is more complex to deploy? Tell us in the Discord thread!
Collaboration Opportunities
The Learn AI Together Discord community is flooded with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel!
Keep an eye on this section, too; we share cool opportunities every week!
1. Jakdragonx is looking for developers, community builders, content creators, and educators for their AI-powered education platform. The platform is designed to help educators, parents, and independent learners create and manage custom courses and learning schedules. If you think you would enjoy working in this niche, reach out in the thread!
2. Jiraiya9027 wants to learn mathematical concepts and coding for GenAI models like GANs and diffusion models. They are currently looking for study partners, and if you want to focus on the math side too, connect in the thread!
3. Mongong_ is working on an intelligence framework that uses hypergraph-based AI, tensor embeddings, and adaptive reasoning to solve complex problems across domains. They are looking for collaborators for the project. If you are interested, contact them in the thread!
Meme of the week!
Meme shared by bin4ry_d3struct0r
TAI Curated section
Article of the week
Reinforcement Learning-Driven Adaptive Model Selection and Blending for Supervised Learning, by Shenggang Li
This article discusses a novel framework for adaptive model selection and blending in supervised learning, inspired by reinforcement learning (RL) techniques. It proposes treating model selection as a Markov Decision Process (MDP), where the RL agent analyzes dataset characteristics to dynamically choose or blend models like XGBoost and LightGBM based on performance metrics. The methodology includes Q-learning and a multi-armed bandit approach to optimize model selection while minimizing human intervention. The results indicate that the RL-driven method can outperform traditional static model selection by adapting to changing data distributions, reducing human intervention, and enhancing predictive accuracy. It also highlights the potential for future applications in automated machine learning systems.
Our must-read articles
1. Fine-tuning DeepSeek R1 to Respond Like Humans Using Python!, by Krishan Walia
This article provides a comprehensive guide on fine-tuning the DeepSeek R1 model to generate human-like responses. It outlines the process of preparing a structured dataset, utilizing Python libraries such as unsloth, torch, and transformers, and leveraging Google Colab for computational efficiency. It explains the importance of LoRA adapters in enhancing model responses and details the training process, including hyperparameter settings. It concludes with instructions on saving the fine-tuned model to the Hugging Face Hub, emphasizing the accessibility of fine-tuning for developers aiming to create more emotive and engaging AI interactions.
2. DeepSeek-TS+: A Unified Framework for Multi-Product Time Series Forecasting, by Shenggang Li
This article presents DeepSeek-TS+, a unified multi-product time series forecasting framework that integrates Multi-Head Latent Attention (MLA) and Group Relative Policy Optimization (GRPO). The author extends MLA into a dynamic state-space model, allowing latent features to adapt over time, while GRPO enhances decision-making by refining forecasts based on previous predictions. The framework is compared to traditional ARMA models and GRU networks, demonstrating superior performance in capturing complex inter-product relationships and non-linear dynamics. It details the technical aspects of MLA-Mamba and GRPO, showcasing their effectiveness in improving forecasting accuracy and robustness in sales predictions across multiple products. Future applications in various domains are also discussed.
3. Why DeepSeek-R1 Is so Much Better Than o3-Mini & Qwen 2.5 MAX: Here The Results, by Gao Dalie
This article compares the performance of three AI models: DeepSeek-R1, o3-Mini, and Qwen 2.5 MAX. It highlights the strengths and weaknesses of each model in tasks such as coding and mathematics.
DeepSeek-R1 is noted for its superior reasoning capabilities and cost-effectiveness, while o3-Mini offers faster responses but lacks depth in reasoning. Qwen 2.5 MAX excels in multimodal tasks but struggles with size accuracy in outputs. It concludes that while o3-Mini shows promise, DeepSeek-R1 remains the preferred choice for complex reasoning and mathematical tasks due to its performance and pricing advantages.
4. Achieve OpenAI o1-mini Level Reasoning with Open-Source Models, by Yu-Cheng Tsai
This article discusses DeepSeek's R1 model and its distilled versions, designed to enhance reasoning capabilities while being more efficient. The distilled models, trained through supervised fine-tuning, maintain strong reasoning abilities despite their smaller size, making them practical for various applications. It also highlights the performance of these models in reasoning tasks and emphasizes the importance of fine-tuning with domain-specific data to improve their effectiveness in specialized contexts.
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.
-
From FP32 to INT8: The Science of Shrinking AI Models
towardsai.net | February 12, 2025 | Author(s): Harsh Maheshwari | Originally published on Towards AI.
Understanding quantization of neural networks, along with its implementation
The training compute requirement of famous AI models has grown 45x in the last 10 years! The graph below contains data on the training compute requirements of notable AI models over the years. Fitting a line on this data shows us that the requirement has increased 4.5 times per year.
Image from https://epoch.ai/data/notable-ai-models with CC license
In the context of AI models, training compute refers to the total computational power needed to train a model, which is proportional to the memory required. This includes both the storage for the model's trainable parameters and the memory needed for the intermediate values generated during inference, which result from the input interacting with the parameters. As models grow larger, both the computational and memory requirements increase drastically. For a computer, memory is ultimately measured in bits. One way to optimize memory usage is by changing how numbers are represented within the model. This technique, known as quantization, reduces the precision of numbers to save space and improve efficiency. Before diving into quantization, let's first explore the different ways numbers can be represented in a computer. The parameter values in a model are very commonly represented... Read the full blog for free on Medium.
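To make the teaser's idea concrete: a minimal sketch of symmetric per-tensor INT8 quantization, one common scheme. The scale convention below (largest magnitude mapped to 127) is a standard choice I'm assuming for illustration, not necessarily the exact method the full article walks through:

```python
import numpy as np

def quantize_int8(x):
    # Map floats onto the signed 8-bit range [-127, 127] using a
    # single scale factor derived from the largest magnitude present
    scale = np.max(np.abs(x)) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float values
    return q.astype(np.float32) * scale

weights = np.array([0.02, -1.5, 0.73, 1.5], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each int8 weight needs 8 bits instead of FP32's 32: a 4x saving,
# at the cost of a rounding error of at most scale / 2 per value
```

Real deployments refine this with per-channel scales, zero-points for asymmetric ranges, and calibration data, but the core trade of precision for memory is the same.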
-
The Complexities of Trend Detection in Time Series Data
towardsai.net | February 12, 2025 | Author(s): Lily Chen | Originally published on Towards AI.
One common use case of working with time series data is trend detection, or trend analysis. A real-life example is looking at memory data to see if a service is experiencing a memory leak. Some time ago, my team at Datadog released a Memory Leak Wizard, whose aim was to help users solve memory leak issues. When the feature was first released, we asked users to manually answer a set of questions before bringing them to what they need to solve their potential memory leak issue. We first ask if there is a goroutine leak. If there isn't (as in the example above), we ask about Live Go Memory. In this case, there is growth, and after the user selects Yes, we show the relevant Comparison Flame Graph that they can use to investigate where the growth in memory is from. Of course, in this modern age of AI, you might be wondering: why does a human need to answer this? And you are correct; this could very much be automated. Fast forward a few months, and we released the automated version of the Memory Leak workflow.
Automatic trend analysis
However, the process of capturing an upward trend in time series data has some complexities involved. Here are some pitfalls I ran into and lessons I learned that might help others, or help my future self if she were to do this kind of work again. First of all, is this even an LLM problem? My conclusion after working on this project is no. This is not a problem best solved by the current generation of LLMs like ChatGPT-4o. But before we get to why that's the case, let's first delve into some of the complexities of working with time series data.
The necessity of scaling the data
The way to identify linear growth in data is by performing linear regression. Linear regression finds the slope of the line of best fit, and the R-squared value.
Typically, to assess whether a dataset has a linear upward trend, we first define a threshold for the slope and a threshold for the R-squared value. A reasonable threshold for the slope could be 0.1. If both values are greater than or equal to their thresholds, we say the dataset has linear growth. Linear regression, however, is sensitive to the magnitude of the data. When performing time series analysis, it is important to take that into consideration. For all examples below, the x-axis values are in milliseconds since epoch, or what Date.now() would return in JavaScript. Let's look at a simple dataset with 2 points. In both examples below, y doubled. But in the first case, a linear regression will return a slope that's pretty much 0. A slope of zero (or below) means no upward trend.
Case 1: y-axis values go from 10 to 20. If you try to fit a linear regression line through this data over an hour time frame, you will get a slope close to zero, i.e. no growth, even though the data doubled.
Case 2: y-axis values go from 1 million to 2 million. When looking at memory usage in bytes, the y-axis values could be on the order of millions or billions. In this case, you might not want to scale the x-axis, depending on what the actual time frame and memory values are.
The caveat with scaling data
If you scale the x-axis, the magnitude of the slope becomes somewhat meaningless on its own, and is not to be relied on alone. Let's look at the following example. Let's say the threshold for positive growth is 0.1, and you have 2 points where the y-values go from 100 to 101. Let's say they're taken 1 minute apart, so the delta x (without scaling) is 60 * 1000. Without scaling, the effective slope is 1 / 60,000 or 0.000017 (i.e. very close to zero), so it would not pass the check for linear growth. Good. But let's say we scale the x-axis. Let's set t1 to be 0, and t2 to be t1 + 60000. Or in other words, if t2 is Date.now(), t1 would be t2 - 60000. To scale the timestamps, let's divide them by 6000.
The delta x after scaling would be 10. In this case, the effective slope is 1 / 10, or 0.1, and would pass a threshold of >= 0.1 for linear growth. Thus, slope alone is not reliable when scaling of the axis is involved. To get around this, we also need to look at the min(y) (minimum of the y-values) and max(y) (maximum of the y-values) of the entire series. If max(y) is not a certain threshold greater than min(y), we do not label the graph as trending upward. For example, if max(y) <= min(y) * 1.05, i.e. if the data has not increased by 5% between the min and max, we can call it not increasing in a significant manner, even if linear regression after scaling gives us a large slope with high confidence. This is not foolproof, because what if max(y) actually comes before min(y) in the time series? We're using the ideal scenario, that max(y) is towards the end and min(y) towards the beginning, to filter out non-significant growth. Hopefully it is enough in the context of what we're doing at Datadog with the Memory Leak Workflow. However, it's something to keep an eye on, and we'll deal with this problem if we ever need to cross that bridge. So far I've only talked about linear regression, which is good at detecting linear data.
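Putting the pieces above together, here is a hypothetical sketch of the combined check: scale the x-axis, run a linear regression, then require a minimum relative increase between min(y) and max(y). The 0.1 slope threshold, the divide-by-6000 scaling, and the 5% floor follow the article's own examples; the R-squared threshold and the helper itself are my illustration, not Datadog's actual code:

```python
def linear_fit(xs, ys):
    # Ordinary least squares by hand: return (slope, r_squared)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def has_upward_trend(timestamps_ms, values,
                     slope_threshold=0.1, r2_threshold=0.8,
                     min_ratio=1.05):
    # Scale epoch-millisecond timestamps so the slope isn't flattened
    t0 = timestamps_ms[0]
    xs = [(t - t0) / 6000 for t in timestamps_ms]
    slope, r2 = linear_fit(xs, values)
    if slope < slope_threshold or r2 < r2_threshold:
        return False
    # Scaling can inflate the slope, so also demand a real 5% increase
    return max(values) > min(values) * min_ratio
```

On the two-point example above (100 to 101, one minute apart), the scaled slope of exactly 0.1 passes the regression check, but the 5% floor correctly rejects the series.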
However, there could be many shapes of growth in the context of a memory leak, such as quadratic or exponential growth. I'll write in a later blog post about the challenges of working with other regressions in the context of real-world trend analysis, but first, the big question you might be wondering:
Can LLMs do better?
What would happen if we gave the data to LLMs like ChatGPT-4o and asked it to tell us if the graph is increasing? I fed the model some goroutines data and asked if there's an upward trend, and it answered: "While the trend is technically increasing, the low R-squared value suggests a lot of variability in the data, meaning the upward movement isn't strong or consistent." To us humans, this is clearly an increase, not just linear, but what appears to be exponential. Yet ChatGPT-4o labeled this as an upward movement that "isn't strong or consistent". Fail. You can tell from the image ChatGPT generated that it did a linear regression on the dataset. If you expand to view more, you can see it using scipy.stats:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
# makes pandas DataFrame from the given data
...
# Perform linear regression to determine trend
slope, intercept, r_value, p_value, std_err = linregress(df['timestamp'], df['value'])
The aforementioned goroutines data needs scaling; notice how the slope is almost zero? The data would also much better fit an exponential or quadratic curve. Yet ChatGPT-4o isn't smart enough to know those things on its own. We learned 2 things from just this one example:
- LLMs aren't magic, and in this case, it used statistics under the hood.
- LLMs don't solve this type of problem well (at least not without prompt engineering).
I've written previously about how the current generation of LLMs don't handle context-dependent problems well.
When it comes to working with time series data in the real world, it's all about the context. Perhaps we could train an LLM to do this through prompt engineering, telling it to run exponential and quadratic regressions if linear regression fails to pass the threshold. But is doing that really better than coding it up? I'd have to know exactly what to do to detect an upward trend, whereas in the ideal world, the LLM could do it automatically, knowing all the pitfalls and edge cases to look out for. I have to admit that when I undertook the task of automating the Datadog Memory Leak Workflow, my first instinct was "surely ChatGPT can do this". But now that I know ChatGPT merely does regressions under the hood, without even taking into account the intricacies of working with time series data, I know it can't be trusted to solve this kind of problem. It seems nowadays, many people treat any problem as an LLM problem. I often see posts on LinkedIn along the lines of "we won't need to hire software engineers soon because I had [insert LLM model] code up some [insert prototype functionality] for me". While I'm certainly optimistic about the accelerating improvements we're making to LLMs, we've observed here that blindly using LLMs to solve any problem can be dangerous. I know this is true when it comes to code generation, as of Feb 2025. If you trust these models blindly, you'd either get lucky or get it wrong.
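As a follow-on illustration of the "exponential and quadratic regressions" idea just mentioned: one simple trick is to fit the series both directly and in log-space, since exponential growth becomes a straight line after taking logarithms. This is my own sketch of that idea, not the Datadog implementation:

```python
import math

def r_squared(xs, ys):
    # Coefficient of determination for a least-squares line
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)

def growth_shape(xs, ys):
    # Exponential data y = a * exp(b * x) is linear in log-space,
    # so compare the fit quality of y vs. log(y) against x
    r2_linear = r_squared(xs, ys)
    r2_exp = r_squared(xs, [math.log(y) for y in ys]) if min(ys) > 0 else 0.0
    return "exponential" if r2_exp > r2_linear else "linear"
```

A fuller version would also try a quadratic fit and handle noise thresholds, which is exactly where the context-dependent judgment discussed above comes in.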
-
Why AI Doesn't Suggest a Viking Longship When You Ask for the Best Cruise Deals (towardsai.net) | February 12, 2025
Author(s): Vita Haas. Originally published on Towards AI.

This member-only story is on us. Upgrade to access all of Medium.

Imagine walking into a library the size of a small country, and instead of a single librarian, you have an entire team, each with a very particular specialty. One knows everything about astrophysics, another is a whiz at Renaissance literature, and a third can tell you, in excruciating detail, how to assemble an IKEA bookshelf without losing your sanity. Now, instead of running around trying to find the right librarian yourself, a magical, invisible assistant listens to your request and sends it straight to the most qualified one without you lifting a finger.

Photo by author

That, dear reader, is the essence of an Agent Router: a system that ensures the right AI takes the right job, so you don't end up getting medical advice from a chatbot trained on stock market predictions. At its core, an Agent Router is an intelligent system that decides which AI agent (or model) should handle a specific task. Think of it as the ultimate traffic controller for AI, ensuring that questions, commands, and problems are directed to the right digital brain for the job. If AI systems were an orchestra, an Agent Router would be… Read the full blog for free on Medium.

Published via Towards AI
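The librarian metaphor maps directly onto code. Below is a toy sketch (my own illustration, not from the article): a keyword-scoring router that dispatches a query to whichever "agent" matches best. Production routers typically use an embedding classifier or an LLM for this decision, not substring matching.

```python
from typing import Callable

# Toy registry: each "agent" is just a function plus the topics it handles.
AGENTS: dict[str, Callable[[str], str]] = {
    "astrophysics": lambda q: f"[astro agent] answering: {q}",
    "literature":   lambda q: f"[lit agent] answering: {q}",
    "furniture":    lambda q: f"[ikea agent] answering: {q}",
}

KEYWORDS = {
    "astrophysics": ["star", "galaxy", "black hole"],
    "literature":   ["renaissance", "sonnet", "novel"],
    "furniture":    ["bookshelf", "assemble", "ikea"],
}

def route(query: str) -> str:
    """Send the query to the agent whose keywords best match it;
    fall back to a general agent when nothing matches."""
    q = query.lower()
    scores = {name: sum(kw in q for kw in kws) for name, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return f"[general agent] answering: {query}"
    return AGENTS[best](query)

print(route("How do I assemble this IKEA bookshelf?"))
```

The router itself stays dumb and cheap; all the expertise lives in the agents it fronts, which is the whole point of the pattern.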
-
Freelancing in AI/ML: Building Projects That Stand Out (towardsai.net) | February 12, 2025
Author(s): Aleti Adarsh. Originally published on Towards AI.

Have you ever wondered why some AI/ML freelancers succeed while others struggle? Let's be real: getting into AI/ML freelancing sounds exciting, right? The idea of working on cutting-edge projects, making money while sipping coffee at your favorite cafe, and having the freedom to choose what you work on? Dreamy. But here's the thing: this field is CROWDED. Like, "everyone and their cat is learning Python" crowded.

So, how do you stand out? I've been there, trying to land gigs in AI/ML, only to realize that having a certification or completing a few courses on Coursera isn't enough. Clients don't just want an AI engineer; they want a problem solver. Someone who can take messy, real-world data and turn it into something valuable.

This article is going to be your roadmap. Whether you're just starting out or struggling to land high-paying gigs, I'll walk you through how to build standout projects that scream, "Hire me now!"

Let's start with some tough love. If you've been applying for freelancing gigs and hearing crickets, you're probably making one (or more) of these mistakes. Your portfolio is just a bunch of tutorial projects: if your GitHub is filled with Titanic survival predictions and MNIST digit classifiers, we… Read the full blog for free on Medium.

Published via Towards AI
-
Tired of LLM Chaos? LiteLLM Should Be Your Default (towardsai.net) | February 12, 2025
Last Updated on February 12, 2025 by Editorial Team
Author(s): Mandar Karhade, MD, PhD. Originally published on Towards AI.

Stop juggling multiple LLM APIs and their standards. Large Language Models (LLMs) are revolutionizing everything from content creation to customer service. But navigating the ever-growing landscape of models and APIs from OpenAI, Google, Anthropic, and more can quickly become a tangled mess. Imagine juggling multiple API keys, wrestling with different code formats, and constantly rewriting your application every time you want to try a new model. Frustrating, right?

This is where LiteLLM steps in as a powerful and surprisingly simple solution. Think of it as your universal remote control for LLMs. This article will break down why LiteLLM is gaining traction, when it's your best choice, and how it stacks up against alternatives, all in a clear and practical way.

Why should you care about LiteLLM? LiteLLM offers a suite of compelling advantages that make working with LLMs significantly smoother and more efficient. LiteLLM provides a unified, OpenAI-compatible API to access a wide range of LLMs from different providers. This means you learn one way to interact with models like OpenAI's GPT series, Google's Gemini family, Anthropic's Claude, and even open-source models. No more wrestling with different documentation and SDKs for each provider. Imagine you want… Read the full blog for free on Medium.
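A sketch of that unified call shape follows. Hedged: the model strings ("gpt-4o-mini", "claude-3-haiku-20240307") and the exact response layout are my assumptions based on LiteLLM's OpenAI-compatible convention, not taken from this excerpt, and the live call is gated behind an opt-in environment flag so the sketch is safe to run without credentials.

```python
import os

def chat_messages(user_prompt, system_prompt="You are a helpful assistant."):
    """Build the provider-agnostic, OpenAI-style message list LiteLLM accepts."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def ask(model, prompt):
    """One call shape for every provider; only the model string changes."""
    from litellm import completion  # pip install litellm
    resp = completion(model=model, messages=chat_messages(prompt))
    return resp.choices[0].message.content

# Opt-in demo flag (hypothetical), so importing this sketch never hits the network.
if os.getenv("LITELLM_DEMO") == "1":
    print(ask("gpt-4o-mini", "One sentence: what does LiteLLM do?"))
    # Swapping providers is just a model-string change, e.g.:
    #   ask("claude-3-haiku-20240307", "...") or ask("gemini/gemini-1.5-flash", "...")
```

The design point is that `ask` never mentions a provider SDK: the router behind `completion` reads the model string and picks the backend, which is what makes trying a new model a one-line change.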
Published via Towards AI
-
NN#5 Neural Networks Decoded: Concepts Over Code (towardsai.net) | February 12, 2025
Last Updated on February 12, 2025 by Editorial Team
Author(s): RSD Studio.ai. Originally published on Towards AI.

Source: Analytics Vidhya

In our ongoing quest to unlock the brains of AI, we've built a foundation of understanding, from the neuron-inspired perceptron to the power of activation functions in creating non-linear models. We've even equipped our models with a compass in the form of loss functions, allowing them to measure the discrepancy between their predictions and the real world. But possessing that compass doesn't inherently ensure correct navigation. The next question is: how can our models know when the needle has gone astray, and how do they correct course?

This is where backpropagation enters the story. Backpropagation is the ingenious algorithm that allows neural networks to truly learn from their mistakes. It's the mechanism by which they analyze their errors and adjust their internal parameters (weights and biases) to improve their future performance. Just as a skilled musician tunes their instrument to produce harmonious sounds, backpropagation allows neural networks to tune themselves, gradually refining their predictions until they resonate with the underlying patterns in the data. So, how do machine brains tune?

The Challenge of Blame Assignment: Where Did We Go Wrong?

Imagine a complex machine with thousands, millions, or even billions of interconnected… Read the full blog for free on Medium.
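The "blame assignment" the excerpt is building toward can be shown with the smallest possible example: one neuron, one training example, squared-error loss, and gradients computed by hand with the chain rule. This is my own sketch of the mechanics, not the article's code.

```python
# One neuron: y_hat = w*x + b, loss = (y_hat - y)^2
x, y = 2.0, 10.0          # input and target
w, b, lr = 1.0, 0.0, 0.1  # parameters and learning rate

for _ in range(50):
    y_hat = w * x + b                 # forward pass
    dloss_dyhat = 2 * (y_hat - y)     # backward pass: the chain rule...
    dw = dloss_dyhat * x              # ...assigns blame to w (d loss / d w)
    db = dloss_dyhat * 1.0            # ...and to b (d loss / d b)
    w -= lr * dw                      # gradient descent update
    b -= lr * db

print(round(w * x + b, 3))  # prediction has converged to the target: 10.0
```

Backpropagation in a real network is exactly this bookkeeping scaled up: the chain rule propagates each layer's share of the error backwards through millions of parameters instead of two.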
Published via Towards AI
-
DeepSeek R1: The Controversial Innovation That Slashes Training Energy by 40%. But Is It Really Paving the Way for a Greener Future? (towardsai.net) | February 11, 2025
Last Updated on February 12, 2025 by Editorial Team
Author(s): Hasitha Pathum. Originally published on Towards AI.

Image by DeepSeek

Artificial intelligence (AI) research and development have witnessed exponential growth in recent years. As machine learning models become more complex and powerful, the computational resources required to train them have surged dramatically. This increase in computational demand has led to rising energy consumption and, consequently, a significant environmental footprint. Amid these challenges, DeepSeek R1 is making headlines by reducing training energy consumption by an impressive 40%. This breakthrough not only promises to lower operational costs but also heralds a new era of sustainable AI research.

In this article, we delve into the transformative impact of DeepSeek R1 on AI training efficiency. We explore how this innovation reduces energy consumption, the implications for operational costs, and the broader environmental benefits. We also examine the technical innovations driving this advancement and discuss the future prospects for sustainable, energy-efficient AI.

The past decade has seen deep learning models achieve remarkable feats, from mastering complex games to revolutionizing natural language processing. However, the computational power needed to train these models often comes at a significant cost. Large-scale neural networks require enormous amounts of energy, which not only drives up operational expenses… Read the full blog for free on Medium.
Published via Towards AI
-
Scale HUMAN Animations with OmniHuman-1 (Technical Review) (towardsai.net) | February 11, 2025
Last Updated on February 12, 2025 by Editorial Team
Author(s): Deltan Lobo. Originally published on Towards AI.

Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Image courtesy: OmniHuman-1 paper

Video generation models have been fun till now. We've seen several video generation models that create content, especially HUMAN ANIMATIONS. But those models focus on either facial expressions or body movement, not both. TikTok's parent company ByteDance bridges this gap with the release of OmniHuman [1]. Now we can try out a broader range of animations, from subtle lip-syncing to full-body motion, while maintaining consistency across different body proportions.

Of course, traditional models were capable of creating human videos. But as I said, they were limited to a specific body movement, and even when a video was created, the characters in it would look lifeless. You might be aware of some portrait videos giving explanations about something. OmniHuman solves this up to a certain extent. Whether you're animating a portrait, half-body, or full-body character, this model adapts seamlessly. It supports both talking and singing, handles human-object interactions and challenging body poses, and accommodates different image styles.

Video generation has made huge leaps in recent years thanks to diffusion models [2] and transformer architectures [3]. These models work incredibly well for general video generation (like text-to-video systems). But… Read the full blog for free on Medium.
Published via Towards AI
-
Parameter-Efficient Fine-Tuning (PEFT): A Hands-On Guide with LoRA (towardsai.net)
Author(s): BeastBoyJay. Originally published on Towards AI.

Imagine building a powerful AI model without needing massive computational resources. PEFT makes that possible, and I'll show you how, with LoRA from scratch.

Introduction

Traditional fine-tuning challenges: Fine-tuning large models sounds cool until reality hits. Imagine trying to sculpt a masterpiece but needing a giant crane just to lift your tools. That's what traditional fine-tuning feels like. You're working with millions (sometimes billions) of parameters, and the computational cost can skyrocket faster than your coffee bill during finals week.

- Hardware struggles: Got a spare supercomputer lying around? Probably not. GPUs heat up like your phone during a marathon PUBG session, and RAM gets maxed out faster than your Netflix binge in 4K.
- Data dilemma: You need a ton of data, or your model behaves like a forgetful student on exam day. Gathering and cleaning that much data? A nightmare in itself.
- Snail-speed training: Hit "run" and wait, and wait, and maybe even take a nap while your model chugs along.
- Maintenance mayhem: Tiny tweaks mean re-training the whole colossal beast. A waste of time, energy, and your already-thin patience.

The solution: PEFT fixes this traditional, bulky fine-tuning method. Think of PEFT (Parameter-Efficient Fine-Tuning) as upgrading a car by just changing the tires instead of rebuilding the whole engine. Instead of retraining every parameter in a massive model, PEFT tweaks just the essential parts, saving time, resources, and sanity. Why it rocks:

- Resource-smart: No supercomputer required.
- Time-saving: Faster results with minimal effort.
- Scalable: Handles large models like a pro.

What is PEFT?

PEFT (Parameter-Efficient Fine-Tuning) is like giving your AI model a performance boost by only adjusting the most important parameters, rather than retraining the entire thing.
Think of it as overclocking your model without needing to upgrade the whole motherboard.

Why is PEFT necessary?

- Reduced training costs: Instead of burning through a fortune in GPU time to retrain the whole model, PEFT lets you fine-tune with minimal resources, saving both cash and computing power.
- Faster adaptation to tasks: PEFT allows you to quickly adapt large models to new tasks by only tuning the necessary components, speeding up the training process without sacrificing accuracy.
- Minimal memory requirements: Rather than loading the entire model into memory, PEFT uses fewer resources, letting you work on large-scale models without draining your system.

How does PEFT work? The core idea of PEFT is to freeze most of the pre-trained model's parameters and train only a small, targeted subset, which is exactly what each of the techniques below does in its own way.

Types of PEFT techniques

LoRA (Low-Rank Adaptation): Let's talk about one of the coolest tricks in PEFT. Imagine you've got this massive pre-trained model, like a Transformer, that's already packed with all sorts of knowledge. Now, instead of modifying everything in the model, LoRA lets you tweak just the essentials: a few sneaky little low-rank matrices that help the model adapt to new tasks. The rest of the model stays frozen in time, like an immovable fortress, while LoRA does its magic.

So, how does LoRA work its sorcery? Here's the gist of it: let's say there's a weight matrix W in the model (maybe in the attention mechanism, where the model decides what's important in the input). LoRA comes in and says, "Why not approximate W as the product of two much smaller matrices, A and B?" Mathematically, it's like:

W ≈ A·B

These matrices, A and B, are low-rank, which, in nerd terms, means they have way fewer parameters to deal with compared to the original weight matrix. The magic? Because A and B are so much smaller, we've got fewer parameters to tune during fine-tuning.

But that's not all; here's the real kicker: when it comes to fine-tuning, LoRA focuses only on training the parameters of A and B.
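A quick back-of-the-envelope count shows why the low-rank factorization matters (the matrix sizes below are illustrative, not from the article):

```python
d, k, r = 4096, 4096, 8   # weight matrix W is d x k; LoRA rank r
full_ft = d * k           # parameters touched by ordinary fine-tuning
lora_ft = d * r + r * k   # parameters in B (d x r) and A (r x k)
print(full_ft, lora_ft)                         # 16777216 65536
print(f"{100 * lora_ft / full_ft:.2f}%")        # 0.39%
```

At rank 8 on a 4096x4096 matrix, the trainable parameter count drops from about 16.8 million to about 65 thousand, well under one percent, and the ratio shrinks further as the matrices get bigger.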
The rest of the massive model stays locked, untouched. It's like having the keys to just one door in a huge mansion: you're making minimal changes, but they're all targeted and impactful. By doing this, you reduce the number of parameters you need to update during fine-tuning, which makes the whole process way more efficient. You're getting the same task-specific performance without the heavy lifting of retraining everything. It's like finding the shortcut in a maze: you still reach the goal, but with way less effort!

Adapters: Let's talk about adapters. Not the kind you plug into your phone charger, but nifty little modules that slot into the transformer architecture like a perfect puzzle piece! Imagine you've got a powerful pre-trained model, and you need to adapt it to a new task. Instead of retraining the entire thing, you introduce an adapter: a lightweight, task-specific module that fits neatly after each transformer block. The best part? You don't have to touch the core model at all. It's like adding a few extra gears to a well-oiled machine without dismantling the whole thing. Here's the lowdown on how adapters work:

- Insertion into layers: Think of an adapter as a mini-module that slides in after key layers in the transformer, like right after the attention or feed-forward layers. It usually consists of a couple of fully connected layers, where the input and output sizes match the original layer's (because, let's face it, we don't want to mess with the model's flow) while the intermediate dimension is smaller. It's like a sleek, efficient middleman.
- Task-specific tuning: Here's where the fun happens. When you fine-tune the model, only the adapter parameters are updated. That means the core model, packed with all its pre-trained knowledge, stays frozen, like a wise professor who's teaching you everything they know while you just add some extra knowledge with the adapter.
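Sketched in PyTorch (my own illustration, not the article's code; the 768 model width and 64 bottleneck are arbitrary), the classic bottleneck adapter is a down-projection, a non-linearity, an up-projection, and a residual connection so the frozen path is preserved:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection: the frozen layer's output passes through unchanged,
        # and the adapter only learns a small correction on top of it.
        return x + self.up(self.act(self.down(x)))

# Slotted after a (frozen) transformer sub-layer; shapes pass straight through.
hidden = torch.randn(2, 16, 768)   # (batch, seq, d_model)
adapter = Adapter(768)
print(adapter(hidden).shape)       # torch.Size([2, 16, 768])
```

Because input and output dimensions match, the module drops in after any sub-layer without touching the surrounding architecture, and only its roughly 99k parameters (versus millions in the frozen block) get gradients during fine-tuning.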
The adapter absorbs the task-specific tweaks without messing up the original wisdom of the model.

The big win? The core model retains its massive, generalized knowledge while the adapter learns just enough to tackle the new task. It's like teaching a world-class musician a new song without changing their entire repertoire. Efficient, fast, and clean.

Prefix Tuning: Let's get into the groove of prefix tuning, a clever, minimalist trick that adds just the right amount of guidance to steer a model without overhauling its entire structure. It's like giving your car a gentle nudge to take a different route without touching the engine. Cool, right? Here's how prefix tuning works its magic:

- Learnable prefix: Picture this: before the model gets to process the input text, you prepend a small, task-specific set of tokens; this is your prefix. It's like a little note that says, "Hey, focus on this when you're working!" These tokens are learnable, meaning you can train them to carry the relevant task information. Importantly, the rest of the model's weights stay locked down, untouched.
- Controlling attention: The prefix isn't just a random add-on. These tokens guide the model's attention mechanisms, telling it which parts of the input to focus on. It's like placing a signpost at the start of the road, directing the model on where to head next. So, when the model generates an output, it's subtly influenced by the prefix tokens, helping it stay on track for the specific task at hand.

The beauty of prefix tuning? Its simplicity. You're not retraining the entire model or altering its inner workings. Instead, you're enhancing its attention just enough to guide it in the right direction for the task you need it to perform.

BitFit: Let's dive into BitFit, a deceptively simple yet highly effective PEFT technique. It's like tweaking just the small dials on a well-tuned machine to get the perfect result.
Instead of overhauling the entire system, BitFit focuses on the tiniest components to make a big impact. How BitFit works:

- Bias tuning: Imagine your model is a giant network of gears and levers (a.k.a. weights) that are already trained and doing their thing. Now, instead of retraining every gear, BitFit zooms in on the bias terms, the extra parameters added to the output of each layer. These bias terms are small adjustments that help shift the model's output in the right direction, but they don't have the complexity or sheer number of the model's full weights.
- Minimalist fine-tuning: The trick is that only the bias terms are tuned, while the rest of the model's weights remain frozen. Bias terms are much smaller in number compared to the full set of weights, so you're making very targeted changes. It's like fine-tuning the volume on a speaker without touching the entire sound system. You still get the desired sound (or task performance), but without the hassle of fiddling with everything.

Why BitFit rocks: The real charm of BitFit is its efficiency. By focusing on just a few parameters, you're able to fine-tune a model for a specific task while keeping the computational load light. It's a great way to make tweaks without the heavy lifting of full model fine-tuning, making it fast and resource-friendly.

Implementing LoRA from Scratch in PyTorch

Now I'll walk you through implementing LoRA from scratch, so that you gain a deeper understanding of it.

Importing the necessary libraries:

import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import torch.nn as nn
from tqdm import tqdm

Making the torch model deterministic:

_ = torch.manual_seed(0)

Training a small model: Let's have some fun with LoRA! We'll start by building a small, simple model to classify those classic MNIST digits, you know, the ones everyone loves to work with when learning machine learning.
But here's the twist: instead of stopping at basic digit classification, we're going to take it up a notch. We'll identify one digit our network struggles with (maybe it just doesn't vibe with the number 7?) and fine-tune the whole thing using LoRA to make it smarter at recognizing that tricky number. It's going to be a cool mix of training, tweaking, and improving, perfect for seeing LoRA in action!

Loading the dataset:

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])

# Load the MNIST dataset
mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
# Create a dataloader for the training
train_loader = torch.utils.data.DataLoader(mnist_trainset, batch_size=10, shuffle=True)

# Load the MNIST test set
mnist_testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(mnist_testset, batch_size=10, shuffle=True)

# Define the device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Model architecture:

class SimpleNN(nn.Module):
    def __init__(self, hidden_size_1=1000, hidden_size_2=2000):
        super(SimpleNN, self).__init__()
        self.linear1 = nn.Linear(28*28, hidden_size_1)
        self.linear2 = nn.Linear(hidden_size_1, hidden_size_2)
        self.linear3 = nn.Linear(hidden_size_2, 10)
        self.relu = nn.ReLU()

    def forward(self, img):
        x = img.view(-1, 28*28)
        x = self.relu(self.linear1(x))
        x = self.relu(self.linear2(x))
        x = self.linear3(x)
        return x

model = SimpleNN().to(device)

Training loop:

def train(train_loader, model, epochs=5, total_iterations_limit=None):
    cross_el = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    total_iterations = 0
    for epoch in range(epochs):
        model.train()
        loss_sum = 0
        num_iterations = 0
        data_iterator = tqdm(train_loader, desc=f'Epoch {epoch+1}')
        if total_iterations_limit is not None:
            data_iterator.total = total_iterations_limit
        for data in data_iterator:
            num_iterations += 1
            total_iterations += 1
            x, y = data
            x = x.to(device)
            y = y.to(device)
            optimizer.zero_grad()
            output = model(x.view(-1, 28*28))
            loss = cross_el(output, y)
            loss_sum += loss.item()
            avg_loss = loss_sum / num_iterations
            data_iterator.set_postfix(loss=avg_loss)
            loss.backward()
            optimizer.step()
            if total_iterations_limit is not None and total_iterations >= total_iterations_limit:
                return

train(train_loader, model, epochs=1)

After executing the above code, your small model will be trained and ready for inference. But before that, let's keep a copy of the original weights (cloning them) so we can later prove that fine-tuning with LoRA doesn't alter them:

original_weights = {}
for name, param in model.named_parameters():
    original_weights[name] = param.clone().detach()

Now, testing the performance of the trained model:

def test():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(10)]
    with torch.no_grad():
        for data in tqdm(test_loader, desc='Testing'):
            x, y = data
            x = x.to(device)
            y = y.to(device)
            output = model(x.view(-1, 784))
            for idx, i in enumerate(output):
                if torch.argmax(i) == y[idx]:
                    correct += 1
                else:
                    wrong_counts[y[idx]] += 1
                total += 1
    print(f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')

test()

Output:

Accuracy: 0.954
wrong counts for the digit 0: 31
wrong counts for the digit 1: 17
wrong counts for the digit 2: 46
wrong counts for the digit 3: 74
wrong counts for the digit 4: 29
wrong counts for the digit 5: 7
wrong counts for the digit 6: 36
wrong counts for the digit 7: 80
wrong counts for the digit 8: 25
wrong counts for the digit 9: 116

As you can see, the worst-performing digit is 9.

LoRA implementation: Define the LoRA parametrization as described in the paper.
The full detail on how PyTorch parametrizations work is here: https://pytorch.org/tutorials/intermediate/parametrizations.html

class LoRAParametrization(nn.Module):
    def __init__(self, features_in, features_out, rank=1, alpha=1, device='cpu'):
        super().__init__()
        # Section 4.1 of the paper:
        #   "We use a random Gaussian initialization for A and zero for B,
        #    so ΔW = BA is zero at the beginning of training."
        self.lora_A = nn.Parameter(torch.zeros((rank, features_out)).to(device))
        self.lora_B = nn.Parameter(torch.zeros((features_in, rank)).to(device))
        nn.init.normal_(self.lora_A, mean=0, std=1)

        # Section 4.1 of the paper:
        #   "We then scale ΔWx by α/r, where α is a constant in r.
        #    When optimizing with Adam, tuning α is roughly the same as tuning the
        #    learning rate if we scale the initialization appropriately.
        #    As a result, we simply set α to the first r we try and do not tune it.
        #    This scaling helps to reduce the need to retune hyperparameters when we vary r."
        self.scale = alpha / rank
        self.enabled = True

    def forward(self, original_weights):
        if self.enabled:
            # Return W + (B*A)*scale
            return original_weights + torch.matmul(self.lora_B, self.lora_A).view(original_weights.shape) * self.scale
        return original_weights

import torch.nn.utils.parametrize as parametrize

def linear_layer_parameterization(layer, device, rank=1, lora_alpha=1):
    # Only add the parameterization to the weight matrix, ignore the bias.
    # From section 4.2 of the paper:
    #   "We limit our study to only adapting the attention weights for downstream tasks
    #    and freeze the MLP modules (so they are not trained in downstream tasks)
    #    both for simplicity and parameter-efficiency. [...]
    #    We leave the empirical investigation of [...] and biases to a future work."
    features_in, features_out = layer.weight.shape
    return LoRAParametrization(
        features_in, features_out, rank=rank, alpha=lora_alpha, device=device
    )

parametrize.register_parametrization(
    model.linear1, "weight", linear_layer_parameterization(model.linear1, device)
)
parametrize.register_parametrization(
    model.linear2, "weight", linear_layer_parameterization(model.linear2, device)
)
parametrize.register_parametrization(
    model.linear3, "weight", linear_layer_parameterization(model.linear3, device)
)

def enable_disable_lora(enabled=True):
    for layer in [model.linear1, model.linear2, model.linear3]:
        layer.parametrizations["weight"][0].enabled = enabled

Display the number of parameters added by LoRA:

total_parameters_lora = 0
total_parameters_non_lora = 0
for index, layer in enumerate([model.linear1, model.linear2, model.linear3]):
    total_parameters_lora += layer.parametrizations["weight"][0].lora_A.nelement() + layer.parametrizations["weight"][0].lora_B.nelement()
    total_parameters_non_lora += layer.weight.nelement() + layer.bias.nelement()
    print(
        f'Layer {index+1}: W: {layer.weight.shape} + B: {layer.bias.shape} + Lora_A: {layer.parametrizations["weight"][0].lora_A.shape} + Lora_B: {layer.parametrizations["weight"][0].lora_B.shape}'
    )

# The non-LoRA parameter count must match the original network
total_parameters_original = sum(p.nelement() for p in original_weights.values())
assert total_parameters_non_lora == total_parameters_original

print(f'Total number of parameters (original): {total_parameters_non_lora:,}')
print(f'Total number of parameters (original + LoRA): {total_parameters_lora + total_parameters_non_lora:,}')
print(f'Parameters introduced by LoRA: {total_parameters_lora:,}')
parameters_increment = (total_parameters_lora / total_parameters_non_lora) * 100
print(f'Parameters increment: {parameters_increment:.3f}%')

Output:

Layer 1: W: torch.Size([1000, 784]) + B: torch.Size([1000]) + Lora_A: torch.Size([1, 784]) + Lora_B: torch.Size([1000, 1])
Layer 2: W: torch.Size([2000, 1000]) + B: torch.Size([2000]) + Lora_A: torch.Size([1, 1000]) + Lora_B: torch.Size([2000, 1])
Layer 3: W: torch.Size([10, 2000]) + B: torch.Size([10]) + Lora_A: torch.Size([1, 2000]) + Lora_B: torch.Size([10, 1])
Total number of parameters (original): 2,807,010
Total number of parameters (original + LoRA): 2,813,804
Parameters introduced by LoRA: 6,794
Parameters increment: 0.242%

Next, freeze all the parameters of the original network and fine-tune only the ones introduced by LoRA. Then fine-tune the model on the digit 9, and only for 100 batches.

# Freeze the non-LoRA parameters
for name, param in model.named_parameters():
    if 'lora' not in name:
        print(f'Freezing non-LoRA parameter {name}')
        param.requires_grad = False

# Load the MNIST dataset again, keeping only the digit 9
mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
exclude_indices = mnist_trainset.targets == 9
mnist_trainset.data = mnist_trainset.data[exclude_indices]
mnist_trainset.targets = mnist_trainset.targets[exclude_indices]

# Create a dataloader for the training
train_loader = torch.utils.data.DataLoader(mnist_trainset, batch_size=10, shuffle=True)

# Train the network with LoRA only on the digit 9 and only for 100 batches
# (hoping that it improves the performance on the digit 9)
train(train_loader, model, epochs=1, total_iterations_limit=100)

After training with the newly introduced LoRA weights, verify that the fine-tuning didn't alter the original weights, only the ones introduced by LoRA:

# Check that the frozen parameters are still unchanged by the fine-tuning
assert torch.all(model.linear1.parametrizations.weight.original == original_weights['linear1.weight'])
assert torch.all(model.linear2.parametrizations.weight.original == original_weights['linear2.weight'])
assert torch.all(model.linear3.parametrizations.weight.original == original_weights['linear3.weight'])

enable_disable_lora(enabled=True)
# The new linear1.weight is obtained by the "forward" function of our LoRA parametrization.
# The original weights have been moved to model.linear1.parametrizations.weight.original
# More info here: https://pytorch.org/tutorials/intermediate/parametrizations.html#inspecting-a-parametrized-module
assert torch.equal(model.linear1.weight, model.linear1.parametrizations.weight.original + (model.linear1.parametrizations.weight[0].lora_B @ model.linear1.parametrizations.weight[0].lora_A) * model.linear1.parametrizations.weight[0].scale)

enable_disable_lora(enabled=False)
# If we disable LoRA, linear1.weight is the original one
assert torch.equal(model.linear1.weight, original_weights['linear1.weight'])

Testing the network with LoRA enabled (the digit 9 should be classified better):

# Test with LoRA enabled
enable_disable_lora(enabled=True)
test()

Output:

Accuracy: 0.924
wrong counts for the digit 0: 47
wrong counts for the digit 1: 27
wrong counts for the digit 2: 65
wrong counts for the digit 3: 240
wrong counts for the digit 4: 89
wrong counts for the digit 5: 32
wrong counts for the digit 6: 54
wrong counts for the digit 7: 137
wrong counts for the digit 8: 61
wrong counts for the digit 9: 9

Testing the network with LoRA disabled (the accuracy and error counts must be the same as the original network):

enable_disable_lora(enabled=False)
test()

Output:

Accuracy: 0.954
wrong counts for the digit 0: 31
wrong counts for the digit 1: 17
wrong counts for the digit 2: 46
wrong counts for the digit 3: 74
wrong counts for the digit 4: 29
wrong counts for the digit 5: 7
wrong counts for the digit 6: 36
wrong counts for the digit 7: 80
wrong counts for the digit 8: 25
wrong counts for the digit 9: 116

Conclusion

The implementation we've walked through demonstrates the power and efficiency of LoRA in practice. Through our MNIST example, we've seen how LoRA can significantly improve model performance on a specific task (digit 9 recognition) while adding only 0.242% more parameters to the original model.
This perfectly illustrates why PEFT techniques, particularly LoRA, are becoming increasingly important in the AI landscape. Key takeaways from our exploration:

- PEFT techniques like LoRA make fine-tuning accessible even with limited computational resources.
- By focusing on crucial parameters, we can achieve significant improvements in task-specific performance.
- The original model weights remain unchanged, allowing for multiple task-specific adaptations.
- The implementation requires minimal code changes to existing architectures.

The future of AI model adaptation lies in such efficient techniques that balance performance with resource utilization. As models continue to grow in size and complexity, PEFT approaches will become even more crucial for practical applications.

GitHub repository: I have created a project in which you can fine-tune a ResNet on your custom dataset using the technique we have just learned. For the complete code and implementation details, visit: github.com/yourusername/peft-lora-guide

Published via Towards AI
and more stories