The leading AI community & content platform making AI accessible to all.
2k writers | 330k followers
Recent updates
Volga — On-Demand Compute in Real-Time AI/ML — Overview and Architecture
Author(s): Andrey Novitskiy
Originally published on Towards AI.

TL;DR: Volga is a real-time data processing/feature calculation engine tailored for modern AI/ML. It is designed to support various types of features, including streaming (online), batch (offline), and on-demand features, via a hybrid push+pull architecture: a custom Streaming Engine (for online+offline) and an On-Demand Compute Layer (for on-demand). In this post, we dive deep into the on-demand compute layer: on-demand features, use cases, and architecture.

Contents:
What it is and what it is for
Examples
Architecture
API Overview
A missing part of the Ray ecosystem
How Streaming and On-Demand work together
Next steps

What it is and what it is for

Most real-time systems operate on streams of events (e.g. user clicks/purchases, ride requests, credit card transactions) and are fully event-driven: every data transformation or piece of custom logic can be tied to the event that triggered it, in any part of the system. These kinds of systems can be handled by a stream processing engine alone.

ML workloads are a bit different: they belong to a class of request-based systems (in our case, model inference requests). This class includes most modern web applications, whose architecture is based on the request-response model and built around the notion of a server (or web service). Generally speaking, the request-response pattern can also be transformed into a purely event-driven system where each request is a separate event (a design direction worth exploring). In practice, however, request-based systems are usually stateless and have different requirements for scalability, latency, throughput, data availability, and fault tolerance, resulting in different infrastructure stack requirements than what a streaming engine offers.

As a result, in the context of real-time data processing and feature generation, most ML-based systems need a layer that can process incoming requests with minimal latency, perform arbitrary computation logic, and serve results as soon as possible so they can be used in other parts of the system (e.g. model serving) or directly by the user. This is what we call the On-Demand Compute Layer.

Examples

Some examples of real-time ML systems that require on-demand, request-time computation and cannot rely on a streaming engine alone:

A search personalization system that relies on a user's GPS coordinates: the data is available only at request time and should be handled immediately for relevant results.
A recommender system whose responses rely on an expensive computation (e.g., embedding dot products, GPU-based operations) and/or communication with third-party services (e.g., querying another model). Handling this in a streaming engine would create a bottleneck and require very careful design.

This is the part that many "AI/ML-ready" streaming engines miss: event-time processing alone is not sufficient to cover all real-time AI/ML needs. For that reason, Volga separates its architecture into the Push Part, where the Streaming Engine is the king, and the Pull Part, handled by the On-Demand Compute Layer, where request-time compute is done. Most modern ML feature/data platforms adopt a similar architecture (On-Demand Features in Tecton, Feature Extractors in Fennel, Resolvers in Chalk).
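To ground the search personalization example above, here is a purely illustrative sketch (not Volga code) of the request-time pattern: a handler that combines the request payload (GPS coordinates, available only now) with a feature precomputed by the streaming side. The in-memory store and the scoring logic are hypothetical stand-ins.

from typing import Any

# Written asynchronously by streaming/batch jobs; read-only at request time.
PRECOMPUTED: dict[str, Any] = {"user_embedding:u1": [0.1, 0.3, 0.6]}

def handle_search_request(user_id: str, gps: tuple[float, float], candidates: list[dict]) -> list[dict]:
    user_emb = PRECOMPUTED.get(f"user_embedding:{user_id}", [0.0, 0.0, 0.0])  # pull precomputed feature
    # Request-time only: GPS arrives with the request, so distance must be computed now.
    for c in candidates:
        dist = ((c["lat"] - gps[0]) ** 2 + (c["lon"] - gps[1]) ** 2) ** 0.5
        c["score"] = sum(a * b for a, b in zip(user_emb, c["embedding"])) - dist
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

candidates = [{"lat": 40.0, "lon": -74.0, "embedding": [0.2, 0.1, 0.5]}]
print(handle_search_request("u1", (40.01, -74.02), candidates))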
Another good example is Pinterest's Homefeed Recommender System's real-time feature pipeline, which also separates event-time compute, handled by a streaming engine (Flink), from request-time compute, handled by a custom service. (Figure: Real-time feature generation pipeline at Pinterest.)

Architecture

In summary, in Volga the On-Demand Compute Layer is a pool of workers used to execute arbitrary user-defined logic (what we call an on-demand feature) at request/inference time and serve it back to the user. It is built to be interoperable with Volga's Streaming Engine, so the whole system can run arbitrary computation DAGs that include execution at both event and request time. Let's take a look at the working parts of the system and the request lifecycle. (Figure: On-Demand Compute Layer architecture.)

OnDemandCoordinator

This is the first component that comes into play. The OnDemandCoordinator is an actor responsible for orchestrating and tracking OnDemandServers, the worker actors (more below). The OnDemandCoordinator handles logical worker isolation (configuring which features each worker is responsible for), scaling up and down, health checks, and restarts if needed.

Load Balancer

The outside component that handles incoming requests and distributes them among cluster nodes. This is usually a cloud-based resource (for our benchmarks, we used an AWS Application Load Balancer), but in practice it can be any other setup (e.g., Nginx/MetalLB). Note that the Load Balancer is not a part of Volga; it simply represents the most likely deployment pattern.

OnDemandServer

A Python worker that performs the logic described in on-demand features. Each worker process runs an instance of a Starlette server to handle incoming requests, listening on a fixed port on the host node. This way the OS (Linux only) round-robins requests across the workers on that node, keeping the load balanced. Each worker is initialized with the list of feature definitions it is supposed to handle (initialization is handled by the OnDemandCoordinator).

When a request arrives, the OnDemandServer parses which target features it is supposed to execute and compiles a DAG of all dependent features. Remember that Volga supports two types of features: on_demand (handled by the On-Demand Layer) and pipeline (handled by the streaming engine). Since the most powerful aspect of Volga is that it supports both event-time and request-time compute, on_demand features can depend on other on_demand features as well as on pipeline features. This creates a special execution flow: the feature DAG is topologically sorted and executed in order, and each on_demand feature is executed using its dependencies' results as inputs. In the On-Demand environment, pipeline features are treated simply as reads from storage: in Volga's end-to-end flow, the actual execution of pipeline features is handled by the streaming engine, which writes pipeline execution results to shared storage asynchronously. The On-Demand worker simply reads the corresponding pipeline feature results (how it reads them is configurable via OnDemandDataConnector, more on that below) and uses them as input for on-demand logic. A minimal sketch of this execution flow is shown below.

Storage

The storage is an abstraction shared between the Push and Pull parts: streaming jobs materialize pipeline results into storage, and on-demand workers perform asynchronous computations based on the materialized data and serve results. Note that in the On-Demand environment the storage is read-only (on_demand features do not need to store anything).
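To make the execution flow above concrete, here is a minimal, purely illustrative sketch (not Volga's actual implementation) of topologically sorting a feature DAG and executing it, with pipeline features treated as read-only storage lookups. The feature names and the storage dictionary are hypothetical.

from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical registry: pipeline features are read from storage,
# on_demand features are computed from their dependencies' results.
STORAGE = {"test_feature": {"id": "test-id", "value": 2.0}}  # materialized by the streaming engine

ON_DEMAND: dict[str, tuple[list[str], Callable[..., Any]]] = {
    "simple_feature": (
        ["test_feature"],
        lambda dep, multiplier=2.0: {"id": dep["id"], "value": dep["value"] * multiplier},
    ),
}

def execute(target: str) -> Any:
    graph = {name: deps for name, (deps, _) in ON_DEMAND.items()}  # dependency graph
    results: dict[str, Any] = {}
    for feature in TopologicalSorter(graph).static_order():
        if feature in ON_DEMAND:
            deps, fn = ON_DEMAND[feature]
            results[feature] = fn(*(results[d] for d in deps))  # on_demand: run user logic
        else:
            results[feature] = STORAGE[feature]  # pipeline feature: read-only storage lookup
    return results[target]

print(execute("simple_feature"))  # {'id': 'test-id', 'value': 4.0}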
The storage is a configurable interface that can use an arbitrary backend (by implementing PipelineDataConnector and OnDemandDataConnector). Note that since Volga can run in both online and offline modes, each mode has different storage requirements: online requires minimal read/write latency (e.g. Redis/Scylla), while offline favors capacity-optimized stores (HDFS, data lakes). This is something for the user to consider.

API Overview

On-demand features are created using the on_demand decorator and can depend on pipeline features or other on_demand features.

from volga.api.source import source
from volga.api.on_demand import on_demand

# mock a simple pipeline feature via a streaming source
@source(TestEntity)
def test_feature() -> Connector:
    return MockOnlineConnector.with_periodic_items(
        items=[...],
        period_s=1
    )

# on-demand feature
@on_demand(dependencies=[('test_feature', 'latest')])
def simple_feature(dep: TestEntity, multiplier: float = 1.0) -> TestEntity:
    """Simple on-demand feature that multiplies the value"""
    return TestEntity(
        id=dep.id,
        value=dep.value * multiplier,
        timestamp=datetime.now()
    )

The dependencies parameter describes the dependent features; the order should match the corresponding arguments of the function. Note that each dependency is a 2-tuple: the first value is the name of the dependent feature, and the second is the query_name defined in the OnDemandDataConnector (MockOnDemandDataConnector in our case). It defines how we fetch values for test_feature: in this case, we simply fetch the latest (more about data connector queries below).

Start workers and register features to serve:

# start coordinator first
coordinator = create_on_demand_coordinator(OnDemandConfig(
    num_servers_per_node=2,
    server_port=DEFAULT_ON_DEMAND_SERVER_PORT,
    data_connector=OnDemandDataConnectorConfig(
        connector_class=MockOnDemandDataConnector,
        connector_args={}
    )
))
ray.get(coordinator.start.remote())

# register 'simple_feature'
ray.get(coordinator.register_features.remote(
    FeatureRepository.get_features_with_deps(['simple_feature'])
))

Compose a request using the required keys and query features in real time:

request = OnDemandRequest(
    target_features=['simple_feature'],
    feature_keys={
        'simple_feature': [
            {'id': 'test-id'},
            {'id': 'test-id-1'},
            {'id': 'test-id-2'}
        ]
    },
    udf_args={
        'simple_feature': {'multiplier': 2.0}
    }
)

client = OnDemandClient(DEFAULT_ON_DEMAND_CLIENT_URL)
response = self.loop.run_until_complete(client.request(request))
pprint(response.results)
...
OnDemandResponse(results={'simple_feature': [
    [{'id': 'test-id', 'value': 4.0, 'timestamp': '2025-04-06T16:30:24.324526'}],
    [{'id': 'test-id-1', 'value': 6.0, 'timestamp': '2025-04-06T16:30:24.324536'}],
    [{'id': 'test-id-2', 'value': 8.0, 'timestamp': '2025-04-06T16:30:24.324541'}]]},
    server_id=11)

A missing part of the Ray ecosystem

A careful reader may note that the On-Demand architecture somewhat resembles that of Ray Serve (the model serving infrastructure used by Ray). Indeed, both systems are request-based and complementary to each other, as both represent vital parts of the end-to-end model inference flow: getting features first and then using them for actual inference. While Ray provides the model serving part, feature serving/calculation is missing, requiring users to rely on custom data serving layers, which significantly increases the complexity and operational cost of running real-time ML. The On-Demand Layer is designed to fill this spot and, along with model serving, to become the initial user-facing frontier for modern ML-based systems.
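To illustrate how feature serving and model serving could compose, here is a hedged, runnable sketch of a Ray Serve deployment that fetches a feature value and then runs a dummy model on it. It is not from the Volga docs: the StubFeatureClient stands in for the OnDemandClient shown above, and the scoring line stands in for real model inference.

from ray import serve
from starlette.requests import Request

class StubFeatureClient:
    """Stand-in for Volga's OnDemandClient (see the request example above)."""
    async def get_value(self, entity_id: str) -> float:
        return 2.0  # pretend this was served by the On-Demand layer

@serve.deployment
class Recommender:
    def __init__(self):
        self.features = StubFeatureClient()

    async def __call__(self, request: Request) -> dict:
        user_id = request.query_params.get("user_id", "test-id")
        value = await self.features.get_value(user_id)      # feature serving step
        return {"user_id": user_id, "score": 2 * value}      # stand-in for model inference

app = Recommender.bind()
# serve.run(app)  # then query http://127.0.0.1:8000/?user_id=test-id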
This will help move towards a more homogeneous system design, removing outside dependencies and, together with Volga's Streaming Engine, unifying real-time data processing on top of Ray.

How Streaming and On-Demand work together

This section discusses the storage shared between the Streaming Engine (Push) and On-Demand (Pull) parts and how the On-Demand layer interfaces with it. All on_demand features directly or indirectly depend on pipeline features' results, which live in shared storage (this includes simply serving pipeline features). To simplify the feature definition API and hide data layer control from the user, all storage-related data fetching logic is abstracted away from the actual feature logic into a separate class that can be reused across different features: OnDemandDataConnector (see the architecture diagram above). Since pipeline jobs can produce semantically different results, the way we fetch data for on_demand features should also be configurable to reflect those semantics: some features need the most recent values, some need data windowed over a certain period, and some need more complex queries such as nearest-neighbor search (RAG). Let's take a look at the InMemoryActorOnDemandDataConnector used in the local dev environment (it interfaces with the InMemoryCacheActor):

class InMemoryActorOnDemandDataConnector(OnDemandDataConnector):

    def __init__(self):
        self.cache_actor = None

    async def init(self):
        self.cache_actor = get_or_create_in_memory_cache_actor()

    def query_dict(self) -> Dict[str, Callable]:
        return {
            'latest': self.fetch_latest,
            'range': self.fetch_range,
        }

    async def fetch_latest(
        self, feature_name: str, keys: List[Dict[str, Any]]
    ) -> List[List[Any]]:
        return await self.cache_actor.get_latest.remote(feature_name, keys)

    async def fetch_range(
        self, feature_name: str, keys: List[Dict[str, Any]],
        start: Optional[Decimal], end: Optional[Decimal]
    ) -> List[List[Any]]:
        return await self.cache_actor.get_range.remote(
            feature_name, keys, start, end
        )

    async def close(self):
        pass

The core method the user needs to define is query_dict: it maps an arbitrary fetching function to a simple name that we pass to the on_demand decorator when creating features (remember the 'latest' query name in the simple_feature example above). Arguments passed to these functions are parsed from the request object using the same argument names as keys. This separation of data fetching from feature logic allows for much cleaner and more reusable code, as well as safe, controlled, and optimized access to the data layer: user-defined code won't be able to hammer the storage or do anything indecent.

Next steps

On-demand features currently work only in online mode; Volga does not yet support calculating on-demand features on historical data. This is an interesting engineering problem that requires turning a request-response-based system into an event stream (suitable for offline mode) and building a streaming pipeline to execute fully on the streaming engine.

As you may have noticed, on-demand features get general parameters and data connector parameters from the user's request. What if we want to get those from a dependent feature instead? This will require creating an arg_mapping to map arguments to functions and updating the executor ordering logic.

Some on-demand features may require local state (e.g. initializing a client for a third-party service).

Fault tolerance with health checks and restarts needs to be implemented.
Current execution runs on an asyncio loop; a thread pool and a process/actor pool are needed.

If you are interested in helping with these and becoming a contributor, check out the RoadMap and feel free to reach out! In the next post, we will run load-testing benchmarks and show how the On-Demand Compute Layer performs under high request load.

Thanks for reading! Please star the project on GitHub, join the community on Slack, share the blog, and leave your feedback.
How to Prompt GPT-4 For Authentic Data Visuals That Make You Look Like a Pro
April 23, 2025 | Last Updated on April 23, 2025 by Editorial Team
Author(s): John Loewen, PhD
Originally published on Towards AI.

Exploratory analysis and data storytelling on global forest loss. Prompting GPT-4 for exploratory data analysis and storytelling is an essential tool to add to your data science toolbox. For example, bar chart analysis, including grouped bar charts and rate-of-change charts, offers different perspectives on a particular data series. In this tutorial, let's use GPT-4's impressive data analysis capabilities to analyze the Global Tree Cover Loss dataset. With simple prompting, we can create multiple bar chart visualizations from our dataset to provide a more granular country-level analysis. Let's step through some examples!

The foundation of any good visualization is the dataset. For this tutorial, we're working with a tree cover loss dataset. The Global Forest Watch dataset contains key statistics about global forests, including rates of forest change, forest extent, and drivers of deforestation (found HERE). At their website, scroll down until you see the global annual tree cover loss dataset and click the download icon to download the file. After downloading, unzip it; the new directory will contain a few files. You can use the Upload File utility in the GPT-4 interface to load the dataset: in the GPT-4 chat window, click on the '+' icon to upload a file and select the file called… Read the full blog for free on Medium.
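The full prompts are behind the Medium link, but as a rough illustration of the kind of chart GPT-4 is being asked to produce, here is a hedged pandas/matplotlib sketch of a grouped bar chart of annual tree cover loss. The column names and sample values are invented for illustration and will differ from the real Global Forest Watch file.

import pandas as pd
import matplotlib.pyplot as plt

# Invented sample data standing in for the Global Forest Watch download.
df = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Indonesia", "Indonesia"],
    "year": [2021, 2022, 2021, 2022],
    "tree_cover_loss_ha": [1_500_000, 1_700_000, 900_000, 850_000],
})

# Grouped bar chart: one group per country, one bar per year.
pivot = df.pivot(index="country", columns="year", values="tree_cover_loss_ha")
pivot.plot(kind="bar", figsize=(8, 4))
plt.ylabel("Tree cover loss (ha)")
plt.title("Annual tree cover loss by country (illustrative data)")
plt.tight_layout()
plt.show()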
The Power of Less: How Chain of Draft Makes AI Reasoning Faster and Cheaper
April 23, 2025
Author(s): MKWriteshere
Originally published on Towards AI.

In today's AI landscape, large language models (LLMs) like GPT-4 and Claude can solve complex problems with impressive accuracy. But this capability comes at a cost, both in processing time and computational resources. What if these AI systems could think just as effectively while writing much less? That's the premise behind an innovative approach called "Chain of Draft" (CoD), developed by Zoom Communications researchers. Let's explore how this technique helps AI models reason more efficiently by writing less, much like how humans jot down quick notes rather than full paragraphs when solving problems.

When tackling complex problems, modern AI systems often use a technique called Chain of Thought (CoT). This approach encourages the AI to break down problems step by step, showing its work in detailed explanations. While effective, this method leads to extremely wordy responses. For example, when solving a simple math problem like "Jason had 20 lollipops and gave some to Denny, leaving 12. How many did he give away?", an AI using Chain of Thought might write: "Initially, Jason had 20 lollipops. After giving some to Denny, Jason now has 12 lollipops. To find out how many lollipops Jason gave to Denny, we need to calculate the difference between the initial number of lollipops and the remaining number. We can…" Read the full blog for free on Medium.
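The post is truncated above, but the core idea, asking the model for terse intermediate drafts instead of verbose chain-of-thought, can be sketched with a hedged prompt like the one below. The instruction wording is an illustration rather than the paper's official prompt, the model name is a placeholder, and an OPENAI_API_KEY is assumed to be set.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative Chain-of-Draft style instruction: keep each reasoning step to a few words.
COD_SYSTEM = (
    "Think step by step, but keep each step as a minimal draft of at most five words. "
    "Return the final answer after '####'."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; substitute whichever model you use
    messages=[
        {"role": "system", "content": COD_SYSTEM},
        {"role": "user", "content": "Jason had 20 lollipops and gave some to Denny, leaving 12. How many did he give away?"},
    ],
)
print(response.choices[0].message.content)  # e.g. "20 - 12 = 8 #### 8"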
MCP with PydanticAI
April 23, 2025 | Last Updated on April 23, 2025 by Editorial Team
Author(s): Barrett Studdard
Originally published on Towards AI.

Building a basic MCP server and interacting with PydanticAI. (Credit to Kenny Eliason on Unsplash.) In my prior article on building a streaming approach with Pydantic AI, I built a pattern around streaming with API calls to Anthropic. In this article, we'll expand to use Pydantic AI MCP clients. Before implementing a connection to an MCP server via Pydantic AI, let's review what MCP is at a very high level and implement a simple server with FastAPI.

At a high level, an MCP server provides a standardized way to define how an LLM interacts with tools. Instead of defining tools on a one-off basis in our LLM application, we can utilize prebuilt or custom servers that expose tools. This allows both reusability of servers we may build ourselves and plugging into various vendor or open-source MCP servers, preventing us from reinventing the wheel when we want to use a new tool. For more information, I'd recommend reading through Anthropic's release post, the Model Context Protocol site, and browsing the Python SDK GitHub repo.

For our MCP server, we'll define one very basic tool: getting a user's name. This allows us to hardcode a name and verify the LLM is picking up the information… Read the full blog for free on Medium.
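The full implementation is behind the Medium link; as a rough sketch of the one-tool idea, here is a minimal MCP server using the FastMCP helper from the official MCP Python SDK. This is a different route than the article's FastAPI-based server, and the import path, decorator, and run call below are stated as assumptions about that SDK, so check them against its docs.

# Hedged sketch: a one-tool MCP server via the MCP Python SDK's FastMCP helper
# (not the article's FastAPI implementation; names below are assumptions).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("user-info")

@mcp.tool()
def get_user_name() -> str:
    """Return the current user's name (hardcoded so we can verify tool pickup)."""
    return "Barrett"

if __name__ == "__main__":
    mcp.run()  # a PydanticAI Agent can then attach this server as an MCP client
               # and call get_user_name as a tool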
Is AI Truly Thinking? Or Just Crunching Data Like a Pro?
April 23, 2025
Author(s): Harshit Kandoi
Originally published on Towards AI.
(Photo by Matt Seymour on Unsplash)

"When my AI chatbot predicted my next question before I even thought of it, was it reading my mind? Or just guessing my queries based on my typos?" Welcome to the future, where AI models like ChatGPT-4o and Grok 3 can write essays, summarise trending data, and even crack jokes better than my college roommate's. It feels so talented. But is it? Or is it just leeching off mountains of data and giving us nearly good approximations? This blog explores the question of whether AI is truly intelligent like a human or just a fast, fancy pattern-matching machine. We'll explore what "smart" really means, the difference between reasoning and regurgitating, and why it matters, especially when AI is everywhere from medical centers to job hiring decisions.

👀 Have you ever had a moment where an AI response made you say, "Wait… is it reading my mind?" Drop your story in the comments below; we're listening.

When we talk about intelligence, especially for humans, we're not just talking about getting an A+ grade or remembering random history quizzes. Real intelligence shows when we face unfamiliar situations and still manage to figure things out. It's about reasoning, adapting, creating, and doing something new with no clear path… Read the full blog for free on Medium.
How to Instantly Explain Your Code with Visuals (Powered by GPT-4)
April 22, 2025
Author(s): Mukundan Sankar
Originally published on Towards AI.

Tired of people not reading your blog or GitHub README? Here's how to turn your Python script into a visual story anyone can understand in seconds. In Part 1, I introduced Code to Story, a tool that helps you turn raw Python code into a structured, human-readable story. I built it for one reason: I was tired of writing code I couldn't explain. Not because I didn't understand it. But because when someone, a hiring manager, a teammate, even a friend, asked, "What does this do?" …I froze. I'd stumble. I'd default to low-energy phrases like: "Oh, it's just something I was playing around with…" I realized I had spent hours solving a real problem… only to fail at the most important step: communication. So I built a tool that solved that, something that turns code into a narrative. That was Part 1.

But there was a deeper layer I hadn't solved yet. Even after turning code into blog posts, people still didn't engage. Why? Because they didn't have the time. When I sent my blog to future hiring managers, friends I respect, and developers I admire… they didn't react. Not because they didn't care. But because they were busy. Busy working. Job-hunting. Parenting. Resting. The truth hit me hard: No one owes your work their time. But you can make your work easier to understand in less time. So I asked myself: What's the fastest way for someone to "get it" without reading anything? And the… Read the full blog for free on Medium.
TAI#149: OpenAI's Agentic o3; New Open Weights Inference Optimized Models (DeepMind Gemma, Nvidia Nemotron-H)
Author(s): Towards AI Editorial Team
Originally published on Towards AI.

What happened this week in AI by Louie

This week, OpenAI finally released its anticipated o3 and o4-mini models, shifting the focus towards AI agents that skillfully use tools. DeepMind also made significant contributions with its cost-effective Flash 2.5 model and a highly optimized, distilled version of Gemma 3, underscoring a key industry trend: major labs are increasingly delivering inference-optimized open-weight models ready for deployment.

OpenAI's new o3 and o4-mini arrived with a notable shift in focus from the December model preview. Rather than just a reasoning upgrade over o1, o3's core innovation lies in its agentic capabilities, trained via reinforcement learning to intelligently use tools like web search, code interpreter, and memory in a loop. It behaves more like a streamlined, rapid-response "Deep Research-Lite". Set it to a task, and o3 can often return an answer in just 30 seconds to three minutes, much faster than the 10–30 minutes Deep Research might take. While o3's outputs are less detailed, this speed and integrated tool use make it ideal for many real-world questions needing quick, actionable answers. On the BrowseComp complex web search benchmark (which we discussed in depth last week), o3 achieves 49.7% vs. Deep Research 51.5%, o4-mini 28.3%, and GPT-4o (all with browsing). Of course, this doesn't measure the more in-depth research tasks where Deep Research still excels.

This agentic nature allows o3 to break past some LLM-based search limitations. Because it actively plans and uses tools like search iteratively and natively, users don't need to be as wary of it simply summarizing the first low-quality blog post it finds. It can handle multiple files, provide coherent, complete answers, and automatically perform multiple web searches to find up-to-date information, significantly cutting down errors and making the ChatGPT experience far more useful. Another key new strength is its ability to manipulate image inputs using code, cropping and zooming in to make sure it can identify key features. This has led to some fun demonstrations of o3's skills at the "GeoGuessr" photo location guessing game! (Figure: o3 and o4-mini's agentic reasoning vs. prior generations of reasoning models. Source: Towards AI.)

Benchmark results also show o3's new strengths. On the Aider polyglot coding benchmark, o3 achieved an impressive 79.6% (at $111.00 for the evaluation), surpassing Gemini 2.5 Pro's 72.9% (delivered at just $6.30) and GPT-4.1's 52.4% (at $9.90). However, a hybrid approach using o3-high as the planning "architect" and GPT-4.1 as the code-writing "editor" set a new state of the art, scoring 82.7% on Aider while reducing costs to just $69.30. The cost-efficient o4-mini also impressed, achieving 72.0% on Aider (at $19.60 per evaluation), making it a powerful option for developers balancing performance and budget. On OpenAI's MRCR long-context test at 128,000-token length, o3 scored 70.3%, demonstrating solid long-context ability, though still trailing Gemini Pro 2.5's leading 91.6%.
While the o3 and o4-mini releases show clear progress over OpenAI's predecessors in both performance and cost-efficiency, the perceived capability leap feels somewhat moderated by prior access to similar functionalities through Deep Research and the strong performance of new competitors like Gemini Pro 2.5.

In other releases this week, DeepMind's Gemini Flash 2.5 offers great performance at a very affordable base price ($0.15/M input, $0.60/M output), but activating its "Thinking Mode" for reasoning tasks comes with a substantial output token cost premium, jumping to $3.50/M. In contrast, xAI's Grok-3 Mini, priced consistently at $0.30/M input and $0.50/M output, has emerged as a surprising leader in cost-efficiency for reasoning models. On the GPQA science benchmark, Grok-3 Mini scored 79%, slightly edging out both Flash 2.5 Thinking (78%) and the more expensive o4-mini high (78%). For code generation on LiveCodeBench v5, Grok-3 Mini achieved 70%, again surpassing Flash 2.5 Thinking (63%) while remaining competitive with o4-mini high (80%). These results position Grok-3 Mini as a great option for developers seeking high performance in reasoning and coding without breaking the bank.

This week also had something new for open-weights models. DeepMind's latest iteration of Gemma models, optimized through Quantization-Aware Training (QAT), continues an important industry trend: advanced AI labs are increasingly performing advanced inference optimization internally, rather than leaving it to users. QAT enables Gemma 3's powerful 27B-parameter model, which initially needs 54 GB of memory in BF16, to run smoothly on consumer GPUs like the NVIDIA RTX 3090 using just 14.1 GB (int4), while maintaining high quality. This trend was also demonstrated this week by NVIDIA's Nemotron-H, a family of efficient hybrid models (8B, 47B, 56B) combining Mamba-2, self-attention, and FFN layers. Their compressed 47B model matches larger 70B-class models like Llama 3 and Qwen 2 while being significantly faster and smaller. NVIDIA used compression techniques like layer dropping and FFN pruning, specifically targeting deployment on consumer hardware like 32 GB GPUs. Similar efforts to release inference-ready models were also seen recently from the DeepSeek team, who distilled their R1 reasoning model into smaller, easier-to-deploy "dense" models. This shift suggests developers will increasingly rely on officially optimized, deployment-ready variants instead of undertaking quantization or pruning themselves.

Why should you care?

For non-technical LLM users and businesses: As we noted last week, the growing variety of models available, often now with very specific strengths and weaknesses, means selecting the right tool for the job is more important than ever. You might use o3 via ChatGPT for its quick, tool-assisted answers and natural interaction style, but switch to Gemini for tasks requiring deep understanding of very long documents, or explore Grok mini for quick and cost-sensitive reasoning. Experimenting with how these models use tools (like search or analysis) is key to unlocking their value for everyday tasks. Moving beyond relying on one single model will become standard practice. We think the new o3 model will later become the "router" layer in OpenAI's upcoming GPT-5, trained not just to use tools but also to activate different LLM models for specific tasks according to their strengths.
This will simplify the user experience, but a strong understanding of the core strengths of different foundation models will still lead to the best results.

For LLM developers and enterprises: OpenAI's o3 introduces an accessible, ready-to-use smart agent. Its core strength isn't just raw reasoning, but its trained ability to intelligently select and use tools. Experimenting with this agentic capability in complex LLM workflows is crucial. We anticipate that this architecture, where a central model intelligently routes tasks and orchestrates tools and other specialized models, will be the foundation for most future agent systems. Learning how to leverage o3's tool-use skills now provides a valuable head start. The success of hybrid approaches on Aider (o3 planner + 4.1 executor) also proves that combining models based on their unique strengths is becoming essential for state-of-the-art performance and efficiency. Developers who master these multi-model strategies and understand the nuances of agentic tool use will be best positioned going forward.

Hottest News

1. OpenAI Launches a Pair of AI Reasoning Models, o3 and o4-mini
OpenAI has unveiled two new AI models, o3 and o4-mini, replacing the earlier o1 and o3-mini versions. The o3 model stands out as OpenAI's most advanced reasoning AI to date, capable of integrating visual inputs, such as sketches or whiteboards, into its reasoning processes. It can also manipulate images by zooming or rotating them to aid interpretation. They are now available to ChatGPT Plus, Pro, and Team users, with o3-pro support expected soon.

2. xAI Adds a 'Memory' Feature to Grok
xAI has announced a new "memory" feature for its chatbot, Grok. This feature enables Grok to remember details from past conversations, allowing it to provide more personalized responses over time. For instance, if you ask Grok for recommendations, it will tailor its suggestions based on your previous interactions, assuming you've used it enough to establish your preferences.

3. OpenAI Open-Sourced Codex CLI
OpenAI has released Codex CLI, an open-source command-line tool designed to run locally from terminal software. This tool links OpenAI's models with local code and computing tasks, enabling the models to write and edit code on a desktop and perform actions like moving files. Codex CLI also supports multimodal reasoning by allowing users to pass screenshots or low-fidelity sketches to the model, combined with access to local code.

4. Cohere Launched a New Embed 4
Cohere has introduced Embed 4, a multimodal search solution tailored for businesses. This tool leverages advanced language models to enhance search capabilities across various data types, offering improved efficiency and scalability for enterprise applications. Embed 4 delivers state-of-the-art accuracy and efficiency, helping enterprises securely retrieve their multimodal data to build agentic AI applications.

5. Mistral Released a Suite of Models for Different Classification Tasks
Mistral AI has announced new advancements in AI model optimization, focusing on enhancing scalability and practical applications for businesses. The company aims to refine large-scale models for improved performance across diverse industries. These models are designed to handle various classification tasks, providing businesses with more efficient and scalable AI solutions.

6. Google Released a Preview Version of Gemini 2.5 Flash
Google has unveiled Gemini 2.5 Flash, now available in preview.
This new version introduces a "thinking budget" feature, allowing developers to control the amount of computational reasoning the AI uses for different tasks. This provides a balance between quality, cost, and response latency. Gemini 2.5 Flash offers improved speed, efficiency, and performance for developers building AI-powered applications.

7. Meta Unveils Perception Language Model
Meta has introduced PerceptionLM, a dataset and model aimed at advancing AI's visual understanding capabilities. This open-access release provides tools for training models that interpret complex visual data with greater detail and accuracy. PerceptionLM is designed to enhance AI's ability to comprehend and reason about visual information, contributing to more sophisticated multimodal AI systems.

Five 5-minute reads/videos to keep you learning

1. Building an AI Study Buddy: A Practical Guide to Developing a Simple Learning Companion
This step-by-step guide walks you through creating a lightweight study companion using Groq's ultra-fast inference with Llama 3 or Mistral, paired with LangChain, FAISS, and Sentence Transformers for RAG. You'll also learn how to deploy a simple, modular frontend using Streamlit, perfect for summarizing, generating, and learning on the go.

2. Identifying and Scaling AI Use Cases
OpenAI released a practical framework to help teams find high-impact AI use cases. The guide includes department-specific examples, real-world stories, and actionable checklists to support adoption and scaling across the organization.

3. Automating Content Creation With Qwen2.5-Omni
Qwen2.5-Omni, a multimodal model by Alibaba's Qwen team, handles text, images, audio, and video, and generates both text and speech. This tutorial shows you how to set up the model, automate audio content creation, and integrate it with vector databases for advanced workflows.

4. Introducing HELMET: Holistically Evaluating Long-Context Language Models
HELMET offers a holistic benchmark for evaluating long-context language models (LCLMs). This blog post explains key findings and how practitioners can use HELMET to differentiate between various LCLMs in future research and applications. It also includes a guide for using HELMET with Hugging Face.

5. How To Think About Agent Frameworks
This blog analyzes agent frameworks and distinguishes between agents and workflows. It also introduces LangGraph as an orchestration framework that combines declarative and imperative APIs to manage complex agentic systems effectively.

6. Voice AI & Voice Agents: An Illustrated Primer
This guide explores the current state of conversational voice AI in 2025, detailing how LLMs are used to transform unstructured speech into structured data across various applications, including healthcare and customer service. It covers the core technologies involved (speech-to-text, text-to-speech, audio processing, and network transport) and discusses best practices for building production-ready voice agents.

Repositories & Tools

Kimi VL is an open-source Mixture-of-Experts vision-language model that excels in multimodal reasoning and long-context understanding with only 2.8B activated parameters.
JumpServer is an open-source Privileged Access Management tool that provides DevOps and IT teams with on-demand and secure access to SSH, RDP, Kubernetes, Database, and RemoteApp endpoints.
Code Server allows users to run VS Code on any machine and access it in the browser.
BitNet is the first open-source, native 1-bit LLM at the 2-billion parameter scale.
Top Papers of The Week

1. Collaborative Reasoner: Self-Improving Social Agents With Synthetic Conversations
This paper introduces Collaborative Reasoner (Coral), a framework for evaluating and improving collaborative reasoning in LLMs. Coral turns traditional reasoning problems into multi-agent, multi-turn tasks, where two agents must reach a shared solution through natural conversation. These dialogues simulate real-world dynamics, pushing agents to challenge, negotiate, and align on joint conclusions.

2. ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
ReTool enhances long-form reasoning by combining tool use with reinforcement learning. It uses real-time code execution and feedback to refine strategies over time. ReTool's 32B model scores 67% on the AIME benchmark, outperforming text-only RL baselines and demonstrating emergent behaviors like self-correcting code, pushing the frontier in hybrid neuro-symbolic reasoning.

3. xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
As reasoning models like OpenAI's o1 adopt slow-thinking strategies, traditional evaluations fall short. xVerify offers a more reliable answer verifier, trained on the VAR dataset, achieving over 95% accuracy. It significantly outperforms existing methods and proves effective and generalizable across tasks.

4. Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
This study questions the assumption that Reinforcement Learning with Verifiable Rewards (RLVR) enhances reasoning abilities beyond what is already in the base model. RL training shifts the output distribution towards more rewarding responses, improving performance at lower pass@k values but restricting reasoning boundaries. Distillation, unlike RLVR, introduces genuinely new capabilities, prompting a reevaluation of RLVR's impact on reasoning capacities in LLMs.

5. InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
InternVL3 introduces a new multimodal pre-training approach, simultaneously acquiring linguistic and multimodal skills. Employing advanced techniques like V2PE and SFT, InternVL3 sets a new standard with a 72.2 score on the MMMU benchmark among open-source MLLMs. It rivals ChatGPT-4o and others, with a planned public release of its training data and model weights to promote open research.

Quick Links

1. OpenAI is working on a social network prototype, similar to X (formerly Twitter), focused on sharing AI-generated images from ChatGPT. The project adds a new dimension to OpenAI's rivalry with Elon Musk and Meta, both of which are also exploring social AI integrations. The platform will help feed real-time data back into model training.

2. DeepSeek is open-sourcing its modified inference engine built on vLLM. After running into challenges like code divergence and infrastructure lock-in, they're shifting gears: partnering with open-source projects, modularizing components, and contributing their performance optimizations to the community.

3. OpenAI just introduced Flex processing, a lower-priced API tier for tasks that don't need fast responses. Available in beta for o3 and o4-mini, it's aimed at non-production use cases like evaluations, data enrichment, or background jobs. The tradeoff: slower responses and occasional downtime.
Who's Hiring in AI

LLM Data Researcher @Turing (USA/Remote)
Software Engineer II @GumGum (Santa Monica, CA, USA)
Backend Developer @ZenGRC (Remote)
Data Scientist Intern - Singapore @GoTo Group (Singapore)
Senior Software Engineer (Search) @Simpplr (Hybrid, India)

Interested in sharing a job opportunity here? Contact [email protected].

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.
Say Goodbye to Manual Data Analysis: Meet Your New AI Agent!
April 22, 2025 | Last Updated on April 22, 2025 by Editorial Team
Author(s): Gencay I.
Originally published on Towards AI.

Explore the future of AI with data science, ChatGPT, agents, and data analysis for better decisions. I love doing data analysis, don't get me wrong. But what if it can be enhanced? Or better, automated? This journey started with a bold question: what if data analysis didn't need you anymore? But no, no: there should be a driver in the driver's seat, and that will be us. In this article, you and I will explore how AI automates and enhances data analysis. And trust me, it is not as hard as you think! And if you don't want to do it yourself, I'll provide a link where you can use it. Let's get started!

In this article, we are going to use this data project (Reference). Five years ago, you would have downloaded the dataset, read it with pandas, and explored it with code like this:

import pandas as pd

df = pd.read_csv("/Users/learnai/Downloads/LiveLongerData (1).csv")
df.head()

Here is the output. (Screenshot of the output.) What about the column names? Let's see.

df.info()

Here is the output. (Screenshot of the output.) Good, but outdated. What about developing an AI agent empowered with ChatGPT? Trust me, it is not too complicated. Just paste the code I'll give you. (Photo by Marvin Meyer on Unsplash.) First, let's install these libraries:

pip install langchain-openai
pip install langchain_experimental
pip install pandas openai

Good, now you are good to go. Let's see the entire… Read the full blog for free on Medium.
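The post cuts off before the agent code itself; as a hedged sketch of what a ChatGPT-powered pandas agent typically looks like with the libraries installed above, here is one possible version using LangChain's experimental pandas agent. The CSV path, model name, and question are placeholders, an OPENAI_API_KEY is assumed, and recent versions of the library require the allow_dangerous_code flag shown below.

import pandas as pd
from langchain_openai import ChatOpenAI
from langchain_experimental.agents import create_pandas_dataframe_agent

# Placeholder path: point this at the downloaded LiveLongerData CSV.
df = pd.read_csv("LiveLongerData.csv")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # assumes OPENAI_API_KEY is set

# The agent writes and runs pandas code over df to answer natural-language questions.
agent = create_pandas_dataframe_agent(
    llm,
    df,
    verbose=True,
    allow_dangerous_code=True,  # opt-in to Python code execution in recent versions
)

print(agent.invoke({"input": "Which factor has the largest effect on lifespan, and by how much?"}))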
DeepSeek-V3 Explained Part 4: Multi-Token Prediction
April 22, 2025
Author(s): Nehdiii
Originally published on Towards AI.
(Image: Vegapunk No. 04 One Piece character, generated with ChatGPT.)

This is the fourth article in our DeepSeek-V3 series, where we explain the final major architectural innovation in the DeepSeek [1, 2] models: multi-token prediction. In previous articles, we explained how DeepSeek carefully balances various architectural trade-offs: Multi-head Latent Attention optimizes memory efficiency while maintaining model performance during decoding; DeepSeekMoE balances knowledge sharing and expert specialization within the Mixture-of-Experts (MoE) architecture; and Auxiliary-Loss-Free Load Balancing achieves effective load balancing without compromising the main training objective. In this article, we will explore how DeepSeek strikes yet another balance, between efficiency and quality in text generation.

Table of contents for this article:
Background: Introduces the fundamentals of the decoding process in LLMs, focusing on how next-token prediction works and its limitations. We also review prior works on multi-token prediction (MTP), discussing the design choices as well as the advantages and limitations of these approaches.
DeepSeek's Multi-Token Prediction: Explains how it works and discusses the design choices, with a focus on how it differs from prior works. Additionally, we introduce how DeepSeek's MTP strategy can be combined with speculative decoding to accelerate inference.
Evaluation: Discusses the impact of MTP on both training performance and inference efficiency.
Summary.
References.

Other articles in the DeepSeek series: Part 1: Multi-head… Read the full blog for free on Medium.
Building GPT From First Principles: Code and Intuition
Author(s): Akhil Shekkari
Originally published on Towards AI.
(Figure 0)

The main goal of this blog post is to understand each component inside GPT with intuition and be able to code it in plain PyTorch. Please have a look at the two figures below. Our implementation will heavily follow Figure 1. I will also be taking lots of ideas and concepts from Figure 2 (taken from Anthropic's paper at https://transformercircuits.pub/2021/framework/index.html); we will use this figure for intuition and understanding. (Figure 1) (Figure 2)

For every component, I will first go over the required theory. This is important because we have to understand why that particular component/concept is used. Then I will go over the code.

Let's look at all the individual components of a Transformer:
1. Residual Stream (also known as skip connections)
2. Embedding Matrix
3. Layer Normalization
4. Positional Encoding
5. Self-Attention Mechanism (causal masking)
6. Multi-Layer Perceptron
7. UnEmbedding Matrix

Before looking at the Residual Stream, it is always good to approach concepts with an example in mind. One of the main reasons people find it difficult to code these models is the problem of input and output dimensions: before and after every transformation, we should know how the vector and its dimensions change.

Let the example sentence be "Messi is the greatest of all time". This sentence has 7 tokens (1 word = 1 token for simplicity). Let us say 1 token is represented in 50 dimensions; we call this d_model. Batch size is usually the number of examples we feed to the model at a given point in time; since we are working with a single demo example, let us use a batch_size of 1. Let us assume the maximum length of any sentence in our dataset is at most 10; we call this seq_len. Let the total number of tokens in our vocabulary be 5000; we call this d_vocab.

So the configuration of our toy example is: d_model = 50, d_vocab = 5000, seq_len = 10, batch_size = 1.

Note: the above config is for the toy example. In the code, we will be working with actual GPT-level configs (see below).

Let's define our Config. Note: there are a lot of hyperparameters you haven't seen yet, but don't worry, we will cover all of them in later parts of the blog.

from dataclasses import dataclass

## let's define all the parameters of our model
@dataclass
class Config:
    d_model: int = 768
    debug: bool = True
    layer_norm_eps: float = 1e-5
    d_vocab: int = 50257
    init_range: float = 0.02
    n_ctx: int = 1024
    d_head: int = 64
    d_mlp: int = 3072
    n_heads: int = 12
    n_layers: int = 12

cfg = Config()
print(cfg)

Note that @dataclass simplifies a lot of stuff for us: we get a constructor and a clean output representation when we print the parameters of the class, with no huge boilerplate code. Without it, we would have to write the same class like this:

class Config:
    def __init__(self, d_model=768, d_vocab=50257):
        self.d_model = d_model
        self.d_vocab = d_vocab

    def __repr__(self):
        return f"Config(d_model={self.d_model}, d_vocab={self.d_vocab})"

Some common implementation details for all the components:
1. For every component, we define a class.
2. Every class needs to subclass nn.Module. This is important for many reasons, like storing model parameters and using helper functions. You can read more about this at https://pytorch.org/tutorials/beginner/nn_tutorial.html
3. super().__init__() makes sure the constructor of nn.Module gets called.
(See also https://www.geeksforgeeks.org/python-super-with-__init__-method/.)
4. We then pass the config object to each class to set values for our parameters as required.

What is the Embedding matrix?

This is just a plain lookup table: you look up the embedding vector of a particular token. Questions to ask before coding:

Q. What is the input to the Embedding matrix?
A. Int[Tensor, 'batch position']. Here [batch, position] are the dimensions; position refers to the token position.
Q. What is the output of the Embedding matrix?
A. Float[Tensor, 'batch seq_len d_model']. It returns the corresponding embedding vectors in this shape.

class Embed(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_E = nn.Parameter(t.empty((cfg.d_vocab, cfg.d_model)))
        nn.init.normal_(self.W_E, std=self.cfg.init_range)

    def forward(self, tokens: Int[Tensor, 'batch position']) -> Float[Tensor, 'batch seq_len d_model']:
        return self.W_E[tokens]

Every trainable parameter in a neural network needs to be tracked and updated based on gradients. PyTorch simplifies this with nn.Parameter(): any tensor wrapped in nn.Parameter is automatically registered as a trainable weight. nn.init.normal_ fills the tensor with values drawn from a normal distribution (a.k.a. Gaussian), in place. Our embedding matrix has shape (d_vocab, d_model); intuitively, for every token the corresponding matrix row is its embedding vector.

What is a Positional Embedding?

This can also be thought of as a lookup table, but instead of token IDs we index by positions. A positional embedding is a learned vector assigned to each position (like a token embedding). Think of it as the model learning that certain positions tend to hold certain tokens and relationships between them, which is useful for attention tasks downstream.

Small clarification: in the original paper "Attention Is All You Need", the authors used positional encoding. It is not learned; it is a fixed function (based on sine and cosine) added to the input embeddings. In our GPT, we use a learned positional embedding.

More intuition: for the example "Akhil plays football.", positional embeddings evolve such that pos[0] helps identify "Akhil" as the subject, pos[1] contributes to verb detection, and pos[2] contributes to object prediction.

Questions to ask before coding:

Q. What is the input to the Positional Embedding?
A. Int[Tensor, 'batch position']. Here [batch, position] are the dimensions; position refers to the token position.
Q. What is the output of the Positional Embedding?
A. Float[Tensor, 'batch seq_len d_model']. It returns the corresponding embedding vectors in this shape.

class PosEmbed(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_pos = nn.Parameter(t.empty((cfg.n_ctx, cfg.d_model)))
        nn.init.normal_(self.W_pos, std=self.cfg.init_range)

    def forward(self, tokens: Int[Tensor, "batch position"]) -> Float[Tensor, "batch position d_model"]:
        batch, seq_len = tokens.shape
        return einops.repeat(self.W_pos[:seq_len], "seq d_model -> batch seq d_model", batch=batch)

Here n_ctx is the context length of our model: at any given time, we will have at most n_ctx tokens to position. In the forward pass, we slice out the relevant position vectors from our learned embedding matrix and repeat them across the batch. This gives us a tensor of shape [batch, seq_len, d_model], so each token gets a learnable embedding for its position, which we can then add to the token embeddings. A quick shape check of these two modules is sketched just below.
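As a quick sanity check (not from the original post), here is a minimal sketch that instantiates the two modules above and verifies the output shape; it assumes the classes defined above are in scope along with the post's imports (torch as t, nn, einops, and the jaxtyping Int/Float annotations).

import torch as t

# assumes Config, Embed and PosEmbed from the snippets above are already defined
cfg = Config()
embed = Embed(cfg)
pos_embed = PosEmbed(cfg)

tokens = t.randint(0, cfg.d_vocab, (2, 4))    # [batch=2, seq_len=4] of token ids
residual = embed(tokens) + pos_embed(tokens)  # token embeddings + positional embeddings
print(residual.shape)                         # torch.Size([2, 4, 768])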
What is a Residual Stream?

It is the straight path from Embed to UnEmbed in Figure 2. You can think of it as the central part of a Transformer. Information inside this stream flows forward, meaning from the Embedding stage to the UnEmbedding stage. Tokens are represented by their corresponding embeddings via the embedding table, and these embeddings then enter the residual stream. We represent the example "Messi is the greatest of all time" inside the residual stream with dimensions [batch_size, seq_len, d_model] ==> [1, 10, 50] (each token is a 50-dimensional vector, we have 7 tokens, and we pad the remaining 3 positions with zeros to maintain dimensions).

Next steps in the forward pass: the input gets sent to LayerNorm. Attention heads read information from the residual stream and are responsible for moving information between tokens, based on the attention matrix (more on this in the attention section). The MLP does explicit read and write operations (new vectors) onto the residual stream; it can also delete information from the residual stream (more on this in later sections).

What is Layer Normalization?

The fundamental reason we normalize is to keep the data flowing nicely through the network without gradients vanishing or exploding. (Figure 5.) From Figure 5, we can see two learnable parameters: gamma (a scaling factor) and beta (a shifting factor). We normalize the values inside each embedding vector; E[x] is the mean. Then we give the model a little room, as training progresses, to scale and shift the normalized values. The small epsilon avoids division-by-zero errors.

Questions to ask:
Q. What does LayerNorm take as input?
A. It takes the residual after attention: [batch posn d_model].
Q. What does it return?
A. It just normalizes the existing values of each embedding vector; it doesn't add anything new, so it returns the normalized values.

Note: dim = -1 means perform the operation over the last dimension. Here the last dimension is d_model, so we take the mean and variance along the embedding vector of each token independently.

### LayerNorm implementation
class LayerNorm(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.w = nn.Parameter(t.ones(cfg.d_model))   ## these are gamma and beta
        self.b = nn.Parameter(t.zeros(cfg.d_model))  ## learnable

    def forward(self, residual: Float[Tensor, 'batch posn d_model']) -> Float[Tensor, 'batch posn d_model']:
        residual_mean = residual.mean(dim=-1, keepdim=True)
        residual_std = (residual.var(dim=-1, keepdim=True, unbiased=False) + self.cfg.layer_norm_eps).sqrt()
        residual = (residual - residual_mean) / residual_std
        residual = residual * self.w + self.b
        return residual

Multi-Head Attention

Okay, let's think in simple terms first. Before talking about multiple attention heads, let us understand what happens in a single attention head.

Questions to ask:
Q. What does an attention head get as input?
A. The attention head reads what is present in the residual stream, i.e. Float[Tensor, 'batch seq_len d_model']. In our toy setup, this might be an example like "Messi is the greatest of all time".
Q. After the self-attention process (inside the attention block) completes, what does the output look like?
A. Float[Tensor, 'batch seq_len d_model']. The output has the same shape, but a lot of information has moved between tokens. Let's go through that in detail.

Information movement (intuition): let's take two tokens from above.
(For convenience, we represent each token with 4 dimensions.) Below is the state of the embedding vectors before entering the attention block:

Messi    → [0.1 0.9 2.3 7.1]
greatest → [2.1 4.4 0.6 1.8]

Once these tokens enter the attention block, each token starts to attend to the tokens that came before it, in order to pull in more context and change its representation. This is called causal self-attention. In "Messi is the greatest of all time", when "greatest" wants to encode some context inside itself, it can only use the context from the words "Messi", "is" and "the". From these words, the representation of "greatest" changes.

After attention:

Messi    → [0.1 0.9 2.3 7.1]
greatest → [0.2 1.1 0.6 1.8] (changed representation)

What does that mean? Look at the "greatest" vector: it now carries some "Messi" inside it. While constructing the new embedding vector for "greatest", the model is referring to a bit of "Messi". This is what information movement means. But we still want to know how exactly this process happens. Let me introduce a few matrices that are central to it. In the literature, they are called Queries, Keys and Values:

Q = Input * Wq
K = Input * Wk
V = Input * Wv

Here, Input is our example "Messi is the greatest of all time". The idea behind Q, K and V is to apply a linear transformation that maps the input into a different space where it is represented in a more useful way. Let's look at the dimensions of these matrix multiplications on our toy example.

Input/residual = [1 10 50]  [batch seq d_model]

The dimension of the Wq matrix depends on how many heads we want in our model. This is an important point: if we decide to have only one attention head, we can have Wq = [n_head, d_model, d_model] ==> [1, 50, 50]. If we decide to have n_heads, then each head works in a smaller space of size d_model / n_heads; we call this quantity d_head. So, if we want 5 heads, the dimensions of Wq will be [n_head, d_model, d_head] ==> [5, 50, 10].

Let's say we want 5 heads. Then:

Q = [1 10 50] * [5 50 10] ==> [batch seq d_model] * [n_head d_model d_head] ==> [1 10 5 10]  [batch seq_len n_head d_head]

The extra dimension at the beginning is the batch. For Q, K and V we will see clearly how all of this fits together in a diagram. The same applies to the K and V matrices. First, let's talk about K:

K = [1 10 50] * [5 50 10] ==> [1 10 5 10]

Attention is calculated by multiplying the matrices Q and K. Remember, the attention matrix is always a square matrix. Please look at the diagram I made; I tried to communicate what those dimensions actually mean. Look at the left part: [batch seq n_head d_head].

Figure — 3

I took two example sentences: 1. "I good"  2. "You bad". In the left representation, one batch holds the two examples, each with 2 tokens, and for each token we keep all the heads together, which is like the full d_model dimension. At that point we are not yet computing attention per head. But we want every batch and every head to process the tokens in parallel, each head with its own attention pattern; the right side of the representation helps with that. That is why we permute the shapes when computing attention. (Hope this helps!)

Note: don't worry, all of these transformations can be done very intuitively through einsum. You will see this in the code.

Now that we have understood how attention is computed, let's get back to our Messi example. Earlier we talked about how "greatest" would attend to "Messi".
We get a [10, 10] matrix of all the words in our toy example attending to all the other words. After getting the attention matrix, we apply causal masking to prevent words from attending to future words: "greatest" cannot attend to "time". After that, we apply softmax to the attention matrix. Softmax gives us scores that sum to 1 along each row. For the word "greatest", it tells us what fraction of attention should go to "Messi", how much to "is", how much to "the", and how much to itself. I took another example from Google to make things visually simple; you can easily connect it with our Messi example.

Figure — 4

Once this is done, the next step is to multiply this matrix with our value vectors. As discussed above, the value vector is also nothing but a linear transformation of the input into another space:

V = Input * Wv ==> [1 10 50] * [5 50 10] ==> [1 10 5 10]

Z = V * A ==> [batch seq_len n_head d_head] * [batch n_head q_seq k_seq] ==> [1 10 5 10] * [1 5 10 10] ==> [1 10 5 10]  [batch seq_len n_head d_head]

Again, once you look at the einsum code, this is self-explanatory. Z holds the per-head outputs: each of the 5 heads contributes a [1 10 10] block, stacked together into [1 10 5 10]. This stack of head outputs is then multiplied by one final output matrix (Wo), which can intuitively be thought of as learning how to combine the outputs of the different heads:

(Z from all the heads) * Wo ==> [1 10 5 10] * [5 10 50]  [n_head d_head d_model] ==> [1 10 50]

This is how information is moved between tokens. I know there are a lot of dimensions here, but this is the core part; once you grasp the gist of it, everything looks straightforward. Now the information has been moved inside the residual stream.

Look at the code implementing attention below. There are bias initializations, which are self-explanatory. Note: I use "posn" and "seq_len" interchangeably; they are the same.

Implementation details: the causal mask is implemented with the tril and triu functions in PyTorch; please look them up, they are straightforward. register_buffer creates temporary tensors that don't require gradient tracking; registering them as PyTorch buffers also gives us the nice functionality of moving them between CPU and GPU along with the module.
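Before reading the full Attention class below, here is a small standalone sketch of the score → mask → softmax pipeline on random tensors, using the same [1, 10, 5, 10] toy shapes as above (the numbers are random and purely illustrative):

```python
import torch as t
import einops

batch, seq, n_head, d_head = 1, 10, 5, 10
q = t.randn(batch, seq, n_head, d_head)
k = t.randn(batch, seq, n_head, d_head)

# Q·K over d_head, giving one (seq x seq) score matrix per head.
scores = einops.einsum(
    q, k, "b q_pos h d, b k_pos h d -> b h q_pos k_pos"
) / d_head**0.5                                        # [1, 5, 10, 10]

# Causal mask: True above the diagonal marks the "future" positions to hide.
mask = t.triu(t.ones(seq, seq), diagonal=1).bool()
scores = scores.masked_fill(mask, float("-inf"))

pattern = scores.softmax(dim=-1)                        # each row sums to 1
print(pattern.shape)                                    # torch.Size([1, 5, 10, 10])
print(pattern[0, 0].sum(dim=-1))                        # a row of ones: probabilities per query position
```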
```python
class Attention(nn.Module):
    ### register your buffer here
    IGNORE: Float[Tensor, '']

    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_Q = nn.Parameter(t.empty((cfg.n_heads, cfg.d_model, cfg.d_head)))
        self.W_K = nn.Parameter(t.empty((cfg.n_heads, cfg.d_model, cfg.d_head)))
        self.W_V = nn.Parameter(t.empty((cfg.n_heads, cfg.d_model, cfg.d_head)))
        self.W_O = nn.Parameter(t.empty((cfg.n_heads, cfg.d_head, cfg.d_model)))
        self.b_Q = nn.Parameter(t.zeros((cfg.n_heads, cfg.d_head)))
        self.b_K = nn.Parameter(t.zeros((cfg.n_heads, cfg.d_head)))
        self.b_V = nn.Parameter(t.zeros((cfg.n_heads, cfg.d_head)))
        self.b_O = nn.Parameter(t.zeros((cfg.d_model)))
        nn.init.normal_(self.W_Q, std=self.cfg.init_range)
        nn.init.normal_(self.W_K, std=self.cfg.init_range)
        nn.init.normal_(self.W_V, std=self.cfg.init_range)
        nn.init.normal_(self.W_O, std=self.cfg.init_range)
        self.register_buffer('IGNORE', t.tensor(float('-inf'), dtype=t.float32, device=device))  # mention device also

    def forward(self, normalized_resid_pre: Float[Tensor, 'batch pos d_model']) -> Float[Tensor, 'batch pos d_model']:
        ### calculate query, key and value vectors and go according to the formula
        q = (
            einops.einsum(
                normalized_resid_pre, self.W_Q,
                "batch posn d_model, nheads d_model d_head -> batch posn nheads d_head"
            ) + self.b_Q
        )
        k = (
            einops.einsum(
                normalized_resid_pre, self.W_K,
                "batch posn d_model, nheads d_model d_head -> batch posn nheads d_head"
            ) + self.b_K
        )
        v = (
            einops.einsum(
                normalized_resid_pre, self.W_V,
                "batch posn d_model, nheads d_model d_head -> batch posn nheads d_head"
            ) + self.b_V
        )
        attn_scores = einops.einsum(
            q, k,
            "batch posn_Q nheads d_head, batch posn_K nheads d_head -> batch nheads posn_Q posn_K"
        )
        attn_scores_masked = self.apply_causal_mask(attn_scores / self.cfg.d_head**0.5)
        attn_pattern = attn_scores_masked.softmax(-1)
        # Take weighted sum of value vectors, according to attention probabilities
        z = einops.einsum(
            v, attn_pattern,
            "batch posn_K nheads d_head, batch nheads posn_Q posn_K -> batch posn_Q nheads d_head"
        )
        # Calculate output (by applying matrix W_O and summing over heads, then adding bias b_O)
        attn_out = (
            einops.einsum(
                z, self.W_O,
                "batch posn_Q nheads d_head, nheads d_head d_model -> batch posn_Q d_model"
            ) + self.b_O
        )
        return attn_out

    def apply_causal_mask(
        self, attn_scores: Float[Tensor, "batch n_heads query_pos key_pos"]
    ) -> Float[Tensor, "batch n_heads query_pos key_pos"]:
        """Applies a causal mask to attention scores, and returns masked scores."""
        # Define a mask that is True for all positions we want to set probabilities to zero for
        all_ones = t.ones(attn_scores.size(-2), attn_scores.size(-1), device=attn_scores.device)
        mask = t.triu(all_ones, diagonal=1).bool()
        # Apply the mask to attention scores, then return the masked scores
        attn_scores.masked_fill_(mask, self.IGNORE)
        return attn_scores
```

Important takeaway: what information we copy depends on the source token's residual stream, but this doesn't mean it only depends on the value of that token, because the residual stream can store more information than just the token identity (the purpose of the attention heads is to move information between vectors at different positions in the residual stream).

What does that mean? Take "Messi is the greatest of all time". When "greatest" attends back to "Messi", it doesn't just see the token "Messi". The residual stream stores much more than the identity; it carries things like "Messi is a subject", "Messi is a person", etc. All of this is stored in the residual stream.
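To connect this back to the "greatest looks back at Messi" intuition, here is a tiny numerical illustration (with made-up numbers) of how one row of the post-softmax attention pattern mixes value vectors into the output for a single head:

```python
import torch as t

# Value vectors for "Messi", "is", "the", "greatest" (4 tokens, d_head = 4, made-up numbers).
v = t.tensor([[0.1, 0.9, 2.3, 7.1],
              [0.5, 0.2, 0.1, 0.0],
              [0.3, 0.1, 0.0, 0.2],
              [2.1, 4.4, 0.6, 1.8]])

# One row of the attention pattern: how much "greatest" attends to each earlier token and itself.
attn_row = t.tensor([0.6, 0.1, 0.1, 0.2])   # sums to 1, heavily weighted toward "Messi"

z_greatest = attn_row @ v                    # weighted sum of value vectors
print(z_greatest)                            # a blend that carries a lot of "Messi"
```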
Now the input goes into the MLP.

Multi-Layer Perceptron (MLP Layer)

This is a very important layer: about two-thirds of the model's parameters live in the MLPs. They are responsible for the non-linear transformation of the input vectors. The main intuition of this layer is to form rich projections and to store facts. There is a very intuitive video by 3Blue1Brown about this, and it's a must watch: https://www.youtube.com/watch?v=9-Jl0dxWQs8&t=498s

Intuition

You can loosely think of the MLP as working like a key → value function, where:

Input = "key" (what the token currently holds in the residual stream)
Output = "value" (what features we want to add to the residual stream)

For example:

Key = the token's current context vector coming from the residual stream. It represents the meaning of the token so far (including attention context).
Value = a non-linear mix of learned features. It could be:
1. "This is a named entity"
2. "This clause is negated"
3. "A question is being asked"
4. "Boost strength-related features"
5. "Trigger the next layer's copy circuit"

So the MLP says: "Oh, you're a token that's the subject of a sentence AND you were just negated? Cool. Let me output features relevant to that situation." Hope you got the intuition.

The first hidden layer has 3072 neurons; we call this d_mlp and have declared it in our config. The second layer projects these back to d_model space. These are the W_in and W_out matrices in the code. We use the GeLU non-linearity.

```python
class MLP(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_in = nn.Parameter(t.empty(cfg.d_model, cfg.d_mlp))
        self.b_in = nn.Parameter(t.zeros(cfg.d_mlp))
        self.W_out = nn.Parameter(t.empty(cfg.d_mlp, cfg.d_model))
        self.b_out = nn.Parameter(t.zeros(cfg.d_model))
        nn.init.normal_(self.W_in, std=self.cfg.init_range)
        nn.init.normal_(self.W_out, std=self.cfg.init_range)

    def forward(self, normalized_resid_mid: Float[Tensor, 'batch posn d_model']):
        ## It's going to do a per-token matmul
        pre = einops.einsum(normalized_resid_mid, self.W_in,
                            'batch posn d_model, d_model d_mlp -> batch posn d_mlp') + self.b_in
        post = gelu_new(pre)
        mlp_out = einops.einsum(post, self.W_out,
                                'batch posn d_mlp, d_mlp d_model -> batch posn d_model') + self.b_out
        return mlp_out
```

With this, we have completed one layer of what we call a Transformer block. There are 12 such layers in GPT-2, and 12 attention heads in the GPT we are implementing, so n_heads = 12 and n_layers = 12 (these are already set in the config). Our GPT model uses d_model = 768 dimensions and a vocabulary (d_vocab) of 50257 tokens. This Transformer block is repeated 12 times.

The code for TransformerBlock just connects LayerNorm + Attention + MLP with skip connections:

```python
class TransformerBlock(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.ln1 = LayerNorm(cfg)
        self.attn = Attention(cfg)
        self.ln2 = LayerNorm(cfg)
        self.mlp = MLP(cfg)

    def forward(self, resid_pre: Float[Tensor, 'batch posn d_model']) -> Float[Tensor, 'batch posn d_model']:
        resid_mid = self.attn(self.ln1(resid_pre)) + resid_pre    ### skip connection
        resid_post = self.mlp(self.ln2(resid_mid)) + resid_mid
        return resid_post
```

Here, skip connections are nothing but adding the input directly back into the residual stream along with the attention and MLP outputs. resid_pre is the residual before normalization, i.e. the raw input; resid_mid is the residual after attention, which again gets added. This is done in order to keep training stable over many layers and long training runs.
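One small practical note: the MLP above calls gelu_new, which is usually imported from a helper module (TransformerLens ships one, for example). If you don't have such an import handy, the name generally refers to the GPT-2-style tanh approximation of GeLU; a minimal sketch is below, and the exact import path in your own setup is an assumption you should check:

```python
import math
import torch as t

def gelu_new(x: t.Tensor) -> t.Tensor:
    # GPT-2 style tanh approximation of GeLU (often named "gelu_new" in codebases).
    return 0.5 * x * (1.0 + t.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))
```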
UnEmbed

The unembed matrix maps the learned representations back to scores over all the tokens in the vocabulary.

Questions to Ask:

Q. What input does it take?
A. The residual stream token vectors: [batch posn d_model].

Q. What does it give out?
A. The logits over the vocabulary for each position, i.e. a matrix of size [batch posn d_vocab], from which the probabilities of likely next tokens are computed. Look at the logits computation below to see precisely how it is calculated.

```python
class UnEmbed(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_U = nn.Parameter(t.empty(cfg.d_model, cfg.d_vocab))
        nn.init.normal_(self.W_U, std=self.cfg.init_range)
        self.b_U = nn.Parameter(t.zeros((cfg.d_vocab), requires_grad=False))

    def forward(self, normalized_resid_final: Float[Tensor, 'batch posn d_model']) -> Float[Tensor, 'batch posn d_vocab']:
        logits = einops.einsum(normalized_resid_final, self.W_U,
                               'batch posn d_model, d_model d_vocab -> batch posn d_vocab') + self.b_U
        return logits
```

Transformer

Finally, we arrive at the last part. Here, we just need to put all the components we have seen together. Let's do that!

```python
class Transformer(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.embed = Embed(cfg)
        self.posembed = PosEmbed(cfg)
        self.blocks = nn.ModuleList([TransformerBlock(cfg) for _ in range(cfg.n_layers)])
        self.ln_final = LayerNorm(cfg)
        self.unembed = UnEmbed(cfg)

    def forward(self, tokens: Int[Tensor, 'batch posn']) -> Float[Tensor, 'batch posn d_vocab']:
        residual = self.embed(tokens) + self.posembed(tokens)
        for block in self.blocks:
            residual = block(residual)
        logits = self.unembed(self.ln_final(residual))
        return logits
```

Here we go from taking tokens as input to calling the transformer blocks 12 times on the residual stream.

Implementation detail: since all the Transformer blocks have their own parameters to be tracked, we need to define them in an nn.ModuleList; this is the proper way of initializing a list of blocks. Each block takes input from the residual stream, learns, and contributes its learnings back to the residual stream.

That's it, guys! Hope you have gained a ton of knowledge on how to build your own GPT. Support and follow me for more cool blogs! Thanks to Neel Nanda and Callum McDougall; I have learnt a lot from their materials and videos, and this blog is inspired by their work. Connect with me on: https://www.linkedin.com/in/akhilshekkari/
-
TOWARDSAI.NETInside the MCP Revolution: How AI Systems Are Learning to Speak the Same Language | April 22, 2025 | Last Updated on April 22, 2025 by Editorial Team | Author(s): Harshit Kandoi | Originally published on Towards AI. Photo by Gerard Siderius on Unsplash Imagine a network of AI systems consisting of virtual assistants, recommendation engines, and robotic agents, all working on their own, but not "in sync". Each time you interact with one, you have to start from scratch: it is unaware of your prior choices, recent interactions, or even the idea on which it operates. The result? Unnecessary processes, inconvenient experiences, and missed chances to enjoy true machine automation. This is the price we pay for context loss, and it has become a pressing challenge in today's AI-driven world. Enter the world of the Model Context Protocol (MCP), an innovative approach that promises to restructure how AI systems interact and collaborate. MCP is a standardized framework created to allow the sharing of contextual data across models, ensuring continuity, coherence, and connectivity in these complex AI ecosystems. Why does this matter now more than ever? As AI becomes more embedded in everything from health services to autonomous systems, the need for intelligent context-sharing is not just a technical convenience, but a fundamental requirement. Without it, even the most powerful AI models operate in silos, unable to utilise collective knowledge or maintain user continuity. In this blog, we'll… Read the full blog for free on Medium.
-
TOWARDSAI.NETDeepSeek R1: Pioneering Research and Engineering as a Competitor to Pure Scaling Approaches | April 21, 2025 | Last Updated on April 21, 2025 by Editorial Team | Author(s): Nehdiii | Originally published on Towards AI. Dr Vegaounk from One Piece anime image generated with ChatGPT DeepSeek-R1 landed unexpectedly just as many researchers, myself included, were attempting to reverse-engineer OpenAI's o1 model. It revealed the inner workings of o1 and dispelled the myth that revolutionary algorithms were being developed in secret. Rather than simply releasing a model, DeepSeek provided a comprehensive paper detailing its algorithms, architecture, and training approach. The models were made open-source and freely accessible, although the dataset remains undisclosed. In an era where leading AI labs are tightening access to research due to growing competition, DeepSeek opted for transparency over secrecy. What's even more remarkable is the global impact DeepSeek-R1 had. Many referred to it as a Sputnik moment. Initially, I assumed the hype was confined to academic and research communities — but I was wrong. It sent shockwaves through the entire U.S. economy, erasing $1 trillion from the stock market and causing the largest drop in Nvidia's history — losing $600 billion in market value. The momentum didn't end there. DeepSeek R1 became the most-downloaded free app on the App Store, even surpassing ChatGPT. Friends and family began reaching out, trying to understand what was going on. The scale of the impact exceeded all… Read the full blog for free on Medium.
-
TOWARDSAI.NETThe Silent Backbone: Why Traditional Machine Learning Still Matters in the AI EraAuthor(s): Yuval Mehta Originally published on Towards AI. Photo by Andrea De Santis on Unsplash In a world increasingly enamored by the shimmering promise of generative AI, it’s easy to forget the models that quietly power much of the technology we rely on every day. The glitz of ChatGPT crafting essays or DALL·E spinning art from text has, for many, overshadowed the more unassuming forms of artificial intelligence — those that don’t talk, draw, or compose, but simply decide: Will this customer churn?Is this transaction fraudulent?How much stock should we order next week? These aren’t the kinds of problems where you need a massive transformer model. They are about precision, predictability, and often, explainability. And they are solved remarkably well by the quieter, older siblings of the AI family: traditional machine learning models. The Quiet Strength of Simplicity Beneath the surface of this generative renaissance, traditional machine learning continues to thrive. Not because it’s old-fashioned, but because it’s incredibly good at what it does. There’s a reason why the best data science teams at top-tier companies still rely on logistic regression, XGBoost, and decision trees. It’s not resistance to innovation, it’s recognition of what works. These models are lightweight, effective, and interpretable. You don’t need billions of parameters and terabytes of data to get results. Sometimes, all you need is a clean dataset and a tried-and-tested classifier. AI generated Image from Napkin AI The Data Most Businesses Care About Let’s face it: much of the world’s data isn’t text, image, or video. It’s tables. It’s rows and columns.It’s structured, clean, and curated.According to a 2024 McKinsey report, over 70% of enterprise AI deployments focus on structured data. From banks to hospitals, manufacturing plants to marketing teams, this kind of data forms the operational heartbeat of organizations. And in this structured world, traditional ML shines. You don’t need a 175-billion-parameter model to predict monthly revenue or catch anomalies in server logs. In fact, trying to use one would likely waste compute, time, and money. Interpretability Is Not Optional The beauty of traditional ML lies in its transparency. These models: Train fast (even on a laptop) Are interpretable and auditable Can be easily explained to non-technical stakeholders Try explaining the hidden layers of a transformer to a CFO.Then show them a decision tree with feature importances.Guess which one gets a nod of approval? In sectors like healthcare, finance, and law, where accountability and traceability are legally mandated, traditional ML’s interpretability becomes more than a convenience — it becomes a requirement. Even LLMs Rely on Classical ML Ironically, many LLM pipelines depend on traditional ML under the hood. Tasks like: Intent classification Spam filtering Ranking responses Personalization layers …are often handled by smaller, faster models. So while generative AI gets the spotlight, traditional ML is often doing the heavy lifting backstage. For example, OpenAI’s GPT-based systems frequently use retrieval-augmented generation (RAG), where a traditional vector store is queried using embeddings to retrieve context. The ranking of those results? You guessed it: often powered by traditional ML models. 
AI generated Image from Napkin AI Cost, Control, and Practicality Not every team has the budget for cloud GPUs or the need to fine-tune massive language models. Sometimes, a well-engineered LightGBM model trained on a few thousand examples delivers more ROI than an entire transformer stack. With traditional ML, you get: Lower training and inference costs Fine-grained feature engineering control Better compliance and governance fit Easier deployment on edge devices In a time when sustainability and carbon emissions are gaining attention in AI development, traditional ML models offer an eco-friendly alternative. The Hybrid Future This isn't a battle of old vs. new. The most powerful AI systems will be hybrid — combining: The brute strength of generative AI With the surgical precision of classical ML AI generated Image from Napkin AI Imagine an e-commerce platform using a fine-tuned LLM to generate product descriptions, but relying on traditional ML to handle demand forecasting, supply chain optimization, and user segmentation. The future belongs to those who can wield both swords. Final Thoughts Just because a tool is shiny and new doesn't mean it's the right one for every job. Traditional ML: Solves real-world problems It is cost-effective and explainable Integrates seamlessly with modern AI stacks As we continue to push the boundaries of what AI can do, let's not forget the models that already do so much. So next time you're faced with a machine learning problem, ask yourself: "Do I need a generative model… or just a good old decision tree?" Chances are, the quiet classics still have your back.
-
TOWARDSAI.NETA Practical Guide to Evaluating RAG Systems: Metrics That Matter | Latest Machine Learning | April 21, 2025 | Last Updated on April 21, 2025 by Editorial Team | Author(s): Ajit Kumar Singh | Originally published on Towards AI. Image by Author Retrieval-Augmented Generation (RAG) revolutionizes how language models ground their answers in external data. By combining a retriever that fetches relevant information from a knowledge base and a generator that creates responses using that information, RAG systems enable more accurate and trustworthy outputs. But how do you evaluate a RAG system? How do you know if it's retrieving the right context or generating reliable answers? This guide breaks it all down with practical metrics, worked examples, and actionable insights. RAG System Overview Two core components: Retriever: Pulls relevant chunks of information (context) from a vector database. Generator: Uses the context to generate a coherent, factual response. Each stage needs its own set of metrics for proper evaluation. Let's explore them. The retriever is the first critical component in any RAG (Retrieval-Augmented Generation) system. Its job? To fetch the most relevant and helpful pieces of information from a vector database in response to an input query. To assess how well it's doing, we rely on three core metrics: Contextual Precision, Contextual Recall, Contextual Relevancy. Let's explore each one, starting with Contextual Precision. Contextual Precision measures whether the most relevant context nodes (document chunks) are ranked higher than irrelevant ones. It's not just about what was retrieved, but how well it was ranked. A high Contextual… Read the full blog for free on Medium.
-
TOWARDSAI.NETThe Great Disconnect: Why Talking to Machines Still Feels Like Talking to Machines | April 21, 2025 | Author(s): MKWriteshere | Originally published on Towards AI. From Clippy to ChatGPT: The Thirty-Year Quest to Solve AI's Hardest Problem Image Generated by Author Using Gpt-4o (Non-Member Link) In 1995, Gartner published its first "Hype Cycle" report — a visual representation of how technologies evolve from innovation to widespread adoption. Right at the top of this inaugural curve, at the "Peak of Inflated Expectations," sat "Intelligent Agents." Three decades later, we're still riding this rollercoaster of anticipation and disappointment with AI assistants. Imagine the journey as a dramatic mountain trek: enthusiastic climbers rush up the slopes of hype, tumble down into the "Trough of Disillusionment," then gradually ascend the "Slope of Enlightenment" toward the "Plateau of Productivity." This is precisely the path AI assistants have followed multiple times. Few developments have experienced as many ups and downs as artificial intelligence assistants. From the much-maligned Clippy to the sophisticated ChatGPT, we've witnessed these digital helpers ride waves of hype only to crash against the shores of reality. Yet with each cycle, they've grown more capable, inching ever closer to the science fiction dream of a truly intelligent digital companion. Image Source : ResearchGate Website Let's unpack this fascinating evolution and understand why contextual reasoning remains the final frontier for AI. Image Source : Microsoft Clippy Cast your mind back to 1997. Microsoft Office users were suddenly introduced to an animated… Read the full blog for free on Medium.
-
TOWARDSAI.NETFine-Tuning Language Models for Business: Making Large Language Models Truly Yours | April 20, 2025 | Last Updated on April 21, 2025 by Editorial Team | Author(s): Janahan Sivananthamoorthy | Originally published on Towards AI. Generated by: Grok/X Hi there! If you are a member, just scroll and enjoy the post! Not a member? Click the link here to enjoy the full article. You know how I was totally geeking out about AI in my last couple of posts? We went down some rabbit holes, from how Large Language Models (LLMs) could be a game-changer in Enterprise Java setups to the seriously cool potential of Agentic AI. And Small Language Models (SLMs) — I was practically shouting from the rooftops about how they could be a big win for businesses. But after all that exploring, a big question just kept popping into my head: how do we take these super-smart AI brains and really mold them into our own intelligent tools? Tools that actually get the quirky way company does things? Maybe customer support has this really empathetic and understanding tone, even in tricky situations — could AI learn that? Well, it turns out, there are a couple of seriously clever tricks to make these AI brains way more attuned to what we need: fine-tuning and Retrieval-Augmented Generation (RAG). Think of fine-tuning as basically giving the AI our company's specific homework so it learns our unique style, potentially leading to… Read the full blog for free on Medium.
-
TOWARDSAI.NETRAGent: A Multi-Agent PDF Whisperer Built on LangChain + LangGraph | Author(s): Dwaipayan Bandyopadhyay | Originally published on Towards AI.

Retrieval Augmented Generation is a very well-known approach in the field of Generative AI. It usually consists of a linear flow: chunking a document, storing it in a vector database, then retrieving relevant chunks based on the user query and feeding them to an LLM to get the final response. In recent times, the term "Agentic AI" has been taking the internet by storm. In simple terms, it refers to breaking a problem down into smaller sections and assigning each to an "agent" capable of handling that specific task, then combining such agents to build a complex workflow. What if we combine this agentic approach with Retrieval Augmented Generation? In this article, we explain a concept/architecture along these lines that we developed using LangGraph, FAISS and OpenAI.

Source : Image by Author

We will not explore AI Agents and how they work in depth in this article; otherwise, this would become a full-fledged book. But to give a brief overview: we can consider an "AI Agent" as an assistant, someone or something that is a master of one particular task. Multiple agents with different capabilities are combined into a full graphical agentic workflow, where agents may communicate with each other, understand what the previous agent returned, and so on. In our approach, we divided Retrieval Augmented Generation into three different tasks and created one agent for each: one agent handles the retrieval part, another the augmentation part, and the last one the generation part. We then combined all three agents into a complete end-to-end agentic workflow. Let's dive deep into the coding section.

Coding Section Starts

Firstly, we will install all the necessary packages. The best practice is to create a virtual environment first and then install the packages. After they are installed successfully, we will import everything we need to create the Retriever agent first.
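The post doesn't spell out the package list, but judging from the imports used in the code below, an install along these lines should cover the dependencies (the exact package names here are my inference, so adjust to your environment): pip install langchain langchain-community langchain-openai langgraph faiss-cpu pypdf streamlit python-dotenv ipython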
Coding the RetrieverAgent:

```python
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pypdf import PdfReader
import re
from dotenv import load_dotenv
import streamlit as st

load_dotenv()

LLM = ChatOpenAI(model_name="gpt-4o", temperature=0.0)

def extract_text_from_pdf(pdf_path):
    try:
        pdf = PdfReader(pdf_path)
        output = []
        for i, page in enumerate(pdf.pages, 1):
            text = page.extract_text()
            text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
            text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
            text = re.sub(r"\n\s*\n", "\n\n", text)
            output.append((text, i))  # Tuple of (text, page number)
        return output
    except Exception as e:
        st.error(f"Error reading PDF: {e}")
        return []

def text_to_docs(text_with_pages):
    docs = []
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    for text, page_num in text_with_pages:
        chunks = text_splitter.split_text(text)
        for i, chunk in enumerate(chunks):
            doc = Document(
                page_content=chunk,
                metadata={"source": f"page-{page_num}", "page_num": page_num}
            )
            docs.append(doc)
    return docs

def create_vectordb(pdf_path):
    text_with_pages = extract_text_from_pdf(pdf_path)
    if not text_with_pages:
        raise ValueError("No text extracted from PDF.")
    docs = text_to_docs(text_with_pages)
    embeddings = OpenAIEmbeddings()
    return FAISS.from_documents(docs, embeddings)

# Define Tools
def retrieve_from_pdf(query: str, vectordb) -> dict:
    """Retrieve the most relevant text and page number using similarity search."""
    docs = vectordb.similarity_search(query, k=3)  # fetch top 3, then keep the single most relevant result
    if docs:
        doc = docs[0]
        content = f"Page {doc.metadata['page_num']}: {doc.page_content}"
        page_num = doc.metadata["page_num"]
        return {"content": content, "page_num": page_num}
    return {"content": "No content retrieved.", "page_num": None}

RETRIEVE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Retrieve Agent. Your task is to fetch the most relevant text from a PDF based on the user's query.
    - Use the provided retrieval function to get content and a single page number.
    - Return the content directly with the page number included (e.g., 'Page X: text').
    - If no content is found, return "No content retrieved."
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}"),
])
```

Explanation of the Code – In this retriever agent code, we first import all the necessary modules and classes. We store our credentials, such as the OpenAI API key, in a .env file, which is why the dotenv module is used here alongside the load_dotenv function call. Next, we initialise the LLM by providing the required arguments, such as the model name and temperature.

Descriptions of Functions

extract_text_from_pdf is used to read and extract the content of the PDF and cleanse it a bit: fixing hyphenated line breaks that split a word into two pieces, converting single newlines into spaces unless they are part of paragraph spacing, and so on. The cleaning is done page-wise, which is why we loop over the pages using the enumerate function. Finally, the function returns the cleansed content alongside its page number, as a list of tuples.
If any unwanted error occurs, it is handled via the try-except block, which ensures the code keeps working without breaking due to errors.

text_to_docs is used to do the chunking. Here the RecursiveCharacterTextSplitter class of the langchain module is used, with a chunk size of 4000 and an overlap of 200. We then loop over the text_with_pages argument, which receives the output of the previous function (extract_text_from_pdf) as a list of tuples; two loop variables unpack both items of each tuple. The cleansed text is split into chunks and converted into Document objects, which will later be turned into embeddings. Apart from the page content, each Document holds the page number and a string label including the page number as metadata. Each Document is appended to a list and returned.

create_vectordb uses the above two functions to create embeddings using the FAISS (Facebook AI Similarity Search) vector store. It is a lightweight vector store that stores the index locally and makes similarity searches easy. This function just creates and returns the vector database. That's it.

retrieve_from_pdf performs the similarity search and gets the top 3 chunks; if any are found, we keep only the first chunk, so that it contains the most similar content, and return it along with its page number as a dictionary.

The RETRIEVE_PROMPT is a ChatPromptTemplate consisting of the instruction (the system message for the LLM) describing its job as the retriever agent. It also carries the entire chat history of a particular session and accepts the user query as the human input.

Coding the Augmentator Agent

```python
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from typing import Optional

def augment_with_context(content: str, page_num: Optional[int]) -> str:
    """Augment retrieved content with source context."""
    if content != "No content retrieved." and page_num:
        return f"{content}\n\nAdditional context: Sourced from page {page_num}."
    return f"{content}\n\nAdditional context: No specific page identified."

AUGMENT_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Augment Agent. Enhance the retrieved content with additional context.
    - If content is available, append a note with the single page number.
    - If no content is retrieved, return "No augmented content."
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "Retrieved content: {retrieved_content}\nPage number: {page_num}"),
])
```

Explanation of the Functions

augment_with_context is a very straightforward approach where we add a little extra source information to solidify the content retrieved by the retrieval agent. If content and a page number are found, a source note with the page number is appended to the original retrieved content; otherwise, the same content is returned with a note that no specific page was identified.

The AUGMENT_PROMPT, like the retrieve prompt, is a ChatPromptTemplate: it instructs the LLM to act as the augmentation agent, carries the chat history, and receives the retrieved content and page number as the human message.

Coding the GeneratorAgent

```python
GENERATE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Generate Agent. Create a detailed response based on the augmented content.
    - Focus on DBMS and SQL content.
    - Append "Source: Page X" at the end if a page number is available.
    - If the user query consists of terms like "explain", "simple", "simplify" etc. or relatable, then do not return any page number, otherwise return the proper page number.
    - If the question is not DBMS-related, reply "Not applicable."
    - Use the chat history to maintain context.
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}\nAugmented content: {augmented_content}"),
])
```

The generator agent consists only of the prompt template, which instructs the LLM how to generate the final response based on the retrieved content and the augmented extra information from the previous two steps.

After all these separate agents are created, it's time to bring them under a single umbrella and form the entire end-to-end workflow using LangGraph.

Code for the Graph Creation using LangGraph

```python
import streamlit as st
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Optional
import re
from IPython.display import display, Image
from retriever import (LLM, extract_text_from_pdf, text_to_docs, create_vectordb,
                       retrieve_from_pdf, RETRIEVE_PROMPT)
from augmentation import augment_with_context, AUGMENT_PROMPT
from generation import GENERATE_PROMPT
from dotenv import load_dotenv

load_dotenv()

PDF_FILE_PATH = "dbms_notes.pdf"

# Define the Agent State
class AgentState(TypedDict):
    query: str
    chat_history: List[dict]
    retrieved_content: Optional[str]
    page_num: Optional[int]  # Single page number instead of a list
    augmented_content: Optional[str]
    response: Optional[str]

def format_for_display(text):
    def replace_latex(match):
        latex_expr = match.group(1)
        return f"$${latex_expr}$$"  # Use $$ for Streamlit Markdown to render LaTeX
    text = re.sub(r'\\frac\{([^}]+)\}\{([^}]+)\}', r'$\\frac{\1}{\2}$', text)
    return text

# Define Multi-Agent Nodes
def retrieve_agent(state: AgentState) -> AgentState:
    chain = RETRIEVE_PROMPT | LLM
    retrieved = retrieve_from_pdf(state["query"], st.session_state.vectordb)
    response = chain.invoke({"query": state["query"], "chat_history": state["chat_history"]})
    # print(retrieved)
    return {
        "retrieved_content": retrieved['content'],
        "page_num": retrieved["page_num"]
    }

def augment_agent(state: AgentState) -> AgentState:
    chain = AUGMENT_PROMPT | LLM
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        # Prepare input for the LLM
        input_data = {
            "retrieved_content": state["retrieved_content"],
            "page_num": str(state["page_num"]) if state["page_num"] else "None",
            "chat_history": state["chat_history"]
        }
        # Invoke the LLM to generate augmented content
        response = chain.invoke(input_data)
        augmented_content = response.content  # Use the LLM's output
    else:
        augmented_content = "No augmented content."
    return {"augmented_content": augmented_content}

def generate_agent(state: AgentState) -> AgentState:
    chain = GENERATE_PROMPT | LLM
    response = chain.invoke({
        "query": state["query"],
        "augmented_content": state["augmented_content"] or "No augmented content.",
        "chat_history": state["chat_history"]
    })
    return {"response": response.content}

# Define Conditional Edge Logic
def decide_augmentation(state: AgentState) -> str:
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        return "augmentation"
    return "generation"

workflow = StateGraph(AgentState)
workflow.add_node("retrieve_agent", retrieve_agent)
workflow.add_node("augment_agent", augment_agent)
workflow.add_node("generate_agent", generate_agent)
workflow.set_entry_point("retrieve_agent")
workflow.add_conditional_edges(
    "retrieve_agent",
    decide_augmentation,
    {
        "augmentation": "augment_agent",
        "generation": "generate_agent"
    }
)
workflow.add_edge("augment_agent", "generate_agent")
workflow.add_edge("generate_agent", END)

agent = workflow.compile()
# display(Image(agent.get_graph().draw_mermaid_png(output_file_path="tutor_agent.png")))

st.set_page_config(page_title="🤖 RAGent", layout="wide")
st.title("🤖 RAGent : Your Personal Teaching Assistant")
st.markdown("Ask any question from your book and get detailed answers with a single source page!")

# Initialize session state for vector database
if "vectordb" not in st.session_state:
    with st.spinner("Loading PDF content... This may take a minute."):
        try:
            st.session_state.vectordb = create_vectordb(PDF_FILE_PATH)
        except Exception as e:
            st.error(f"Failed to load PDF: {e}")
            st.stop()

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# User input
user_input = st.chat_input("Ask anything from the PDF")
if user_input:
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    # Display assistant response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        # Prepare chat history for the agent
        chat_history = [
            {"type": "human", "content": msg["content"]} if msg["role"] == "user"
            else {"type": "ai", "content": msg["content"]}
            for msg in st.session_state.messages[:-1]  # Exclude current input
        ]
        # Prepare initial state
        initial_state = {
            "query": user_input,
            "chat_history": chat_history,
            "retrieved_content": None,
            "page_num": None,
            "augmented_content": None,
            "response": None,  # Add field for Ragas sample
        }
        # Run the agent with a spinner
        with st.spinner("Processing..."):
            final_state = agent.invoke(initial_state)
            answer = final_state["response"]
            formatted_answer = format_for_display(answer)
        # Display response
        message_placeholder.markdown(formatted_answer)
        # Update chat history
        st.session_state.messages.append({
            "role": "assistant",
            "content": formatted_answer
        })
```

Explanation of the Code

AgentState class — In this class, we define a schema that is enforced on the LLM responses; the entire "state" carries this same structure throughout the whole workflow. It is passed as an argument when the StateGraph is created.

format_for_display function — This function has a nested function, which is used to handle LaTeX-based outputs.
We use this because the document may contain fractions which might not be rendered properly by Streamlit, so it acts as an extra precaution.

retrieve_agent function — This uses the retrieve_from_pdf function we defined earlier. First, we create a chain using the retrieve prompt and the LLM, then invoke it with the query provided by the user (which is nothing but the user's question) together with the entire chat_history; finally, it returns the retrieved content and page number.

augment_agent function — Here we again create a chain, this time using AUGMENT_PROMPT, and check whether the retriever agent returned any content or not. If it did, we pass the retrieved content, the page number, and the chat_history to the chain and return the augmented content produced by the response.

generate_agent function — Here, finally, we pass the augmented content, the user query, and the chat history so that the LLM can leverage the augmented content, generate the final response based on that information, and display it to the user.

decide_augmentation function — This is an optional step that checks whether the augmentation agent needs to run at all.

After all the necessary agents are created, it's time to combine them into an end-to-end workflow using the StateGraph class of LangGraph. When initialising the StateGraph class, we pass the AgentState class we defined earlier as its parameter, to indicate that throughout the workflow these are the only keys that will be present in the state. Then we add the nodes to the StateGraph to create the workflow, set the entry point manually so it knows which node executes first, add edges between the nodes to define the shape of the workflow, and add a conditional edge to signify that the node connected to it may or may not be called on every run. Finally, we compile the entire workflow to check that everything works and the resulting graph is correct. We can display the graph using the IPython module and the Mermaid ink method. The graph will look like the one below if everything goes correctly.

Source : Image by Author

Then the rest of the code is entirely Streamlit-based. The user can design the UI according to their choice; we have taken a very basic approach so that it remains user-friendly. We also use some session state to maintain the chat history, user query, etc. The workflow does not start without user input: until the user provides a query, nothing runs.

Screenshots of the Application in Working Condition –

Source : Image by Author
Source : Image by Author

This article has been written in collaboration with Biswajit Das.
-
TOWARDSAI.NETA Novel and Practical Meta‑Booster for Supervised Learning | April 20, 2025 | Last Updated on April 20, 2025 by Editorial Team | Author(s): Shenggang Li | Originally published on Towards AI. A Stacking‑Enhanced Margin‑Space Framework for Dynamic, Loss‑Driven Ensemble Updates in Classification and Regression. Photo by Thorium on Unsplash Ensemble methods thrive on diversity, yet most frameworks exploit it sequentially (boosting) or statically (stacking). We introduce Meta‑Booster, a unified system that blends incremental updates — the "deltas" — of several base learners at every boosting step. Built on XGBoost, LightGBM, AdaBoost, and a compact neural network, the method supports both classification and regression. At each round, we: Delta extraction: Capture each learner's one‑step update — margin increments for classifiers or residual deltas for regressors — to isolate its immediate predictive gain. Stacked combination: Solve a constrained regression on the held‑out set to derive a weight vector that best explains the current residuals, allowing contributions from all learners simultaneously. Iterative update: Apply the weighted delta with an optimal learning rate found via line‑search, producing a greedy, loss‑driven ensemble evolution that adapts to the task. Unlike static stacking, where weights are fixed or full‑model outputs are averaged, Meta‑Booster tweaks the blend a little at every round, always chasing a better validation score. This dynamic scheme not only lifts accuracy (log‑loss, AUC) and precision (MAPE, RMSE) but also shows which learner is pulling its weight at each step. Tests on car‑price and credit‑risk datasets confirm: margin stacking drives classification, residual stacking powers regression…. Read the full blog for free on Medium.
-
TOWARDSAI.NETYour Data is the New Currency: Are You Protecting It? | April 20, 2025 | Author(s): Harshit Kandoi | Originally published on Towards AI. Photo by micheile henderson on Unsplash Picture this: you book a movie ticket online — easy, right? A few clicks, a confirmation email, done. But then the next day, your inbox is overflowing with movie posters and deals. Your Instagram feed suddenly shows popcorn offers and upcoming movie trailers. Even your Alexa recommends a "Movie Night" playlist. What just happened? That harmless transaction was actually a data drop. One moment of your online activity turned into fuel for dozens of algorithms. Your interest in that one film became a breadcrumb in the massive digital trail you're leaving behind casually every day. In 2025, every scroll, like, tap, and even voice command feeds an environment that knows more about you than you might expect. This isn't some kind of target advertising anymore, it's a full-fledged ecosystem powered by our personal data. On platforms like X, users are openly calling it "surveillance capitalism," where your habits are tracked, sold, and resold like assets on a stock exchange. Hashtags like #DataEconomy and #PrivacyFail trend daily, and for good reason. But this goes far deeper than ads. Your data isn't just being used to sell your data tracks, but it's being used to predict your day-to-day life, learns your… Read the full blog for free on Medium.
-
TOWARDSAI.NETBeyond Search: 86.4% MMLU, 77.6 MTEB, and the New Architecture of Policy Understanding | Latest Machine Learning | April 20, 2025 | Last Updated on April 20, 2025 by Editorial Team | Author(s): R. Thompson (PhD) | Originally published on Towards AI. "Amidst the proliferation of generative technologies, the true constraint remains epistemic access — especially within public systems." The corpus of legal, regulatory, and policy documents maintained by governments and NGOs has grown into a dense, heterogeneous ecosystem. These documents, often drafted in domain-specific language and archived across disparate formats such as PDFs, scanned text, and fragmented HTML, pose a formidable barrier to access and interpretation. For administrators, legal personnel, and constituents, the task of retrieving pertinent clauses or aligning practices with current mandates is fraught with inefficiencies, ambiguity, and latency. A 2019 report by McKinsey quantified the magnitude of this issue, noting that up to 30% of a public employee's time is spent locating internal information. In domains governed by high regulatory volatility or compliance sensitivity, this inefficiency is not merely inconvenient — it is structurally incapacitating. The imperative for a cognitively intelligent interface between users and policy repositories is now self-evident. This article introduces a systematized retrieval and reasoning framework: the Smart Policy Search Engine. Architected using LangChain's composability, BGE-M3's multilingual embedding prowess, ChromaDB's high-dimensional indexing, and GPT-4's generative fidelity, the engine serves as a neuro-symbolic bridge between unstructured regulatory texts and human queries. Unlike conventional search systems that rely on keyword density… Read the full blog for free on Medium.
-
TOWARDSAI.NETHave o1 Models Solved Human Reasoning? | Latest Machine Learning | April 19, 2025 | Last Updated on April 19, 2025 by Editorial Team | Author(s): Nehdiii | Originally published on Towards AI. Image Generated By ChatGPT OpenAI made waves in the AI community with the release of their o1 models. As the excitement settles, I feel it's the perfect time to share my thoughts on LLMs' reasoning abilities, especially as someone who has spent a significant portion of my research exploring their capabilities in compositional reasoning tasks. This also serves as an opportunity to address the many "Faith and Fate" questions and concerns I've been receiving over the past year, such as: Do LLMs truly reason? Have we achieved AGI? Can they really not solve simple arithmetic problems? The buzz around the o1 models, code-named "strawberry," has been growing since August, fueled by rumors and media speculation. Last Thursday, Twitter lit up with OpenAI employees celebrating o1's performance boost on several reasoning tasks. The media further fueled the excitement with headlines claiming that "human-like reasoning" is essentially a solved problem in LLMs. Without a doubt, o1 is exceptionally powerful and distinct from any other models. It's an incredible achievement by OpenAI to release these models, and it's astonishing to witness the significant jump in Elo scores on ChatBotArena compared to the incremental improvements from other major players. ChatBotArena continues to be the leading platform for… Read the full blog for free on Medium.
-
TOWARDSAI.NET Important LLM Papers for the Week From 07/04 to 14/04 (April 19, 2025) Author(s): Youssef Hosni Originally published on Towards AI. Stay Updated with Recent Large Language Models Research. Large language models (LLMs) have advanced rapidly in recent years. As new generations of models are developed, researchers and engineers need to stay informed on the latest progress. This article summarizes some of the most important LLM papers published during the second week of April 2025. The papers cover various topics shaping the next generation of language models, from model optimization and scaling to reasoning, benchmarking, and enhancing performance. Keeping up with novel LLM research across these domains will help guide continued progress toward models that are more capable, robust, and aligned with human values. Topics covered: LLM Progress & Technical Reports, LLM Reasoning, LLM Training & Fine-Tuning, AI Agents, and Vision Language Models. Most insights I share in Medium have previously been shared in my weekly newsletter, To Data & Beyond. If you want to be up-to-date with the frenetic world of AI while also feeling inspired to take action or, at the very least, to be well-prepared for the future ahead of us, this is for you. 🏝Subscribe below🏝 to become an AI leader among your peers and receive content not present in any other platform, including Medium: Data Science, Machine Learning, AI, and what is beyond them. We present Kimi-VL, an…
-
TOWARDSAI.NET AI Agent Software: The Future of Coding Tools Author(s): Talha Nazar Originally published on Towards AI. Image by Author from AI

Imagine a world where software writes itself — the lines between developer and machine blur into a seamless dance of creation. This isn't a distant sci-fi fantasy; it's the revolution unfolding. With generative AI advancing at breakneck speed, the very foundation of software development is being rewritten. Enter agent software engines — the game-changing force turning code into a collaborative conversation between human ingenuity and artificial intelligence. This isn't just an upgrade — it's a paradigm shift. Traditional coding is evolving into a dynamic partnership, tech roles are being redefined, and a new wave of innovators is rising. The future belongs to those who embrace this transformation early. The question is: Will you lead the change, or watch from the sidelines?

What if software could think, learn, and build like a human developer? Enter Agent Software Engines: the next evolution in AI-driven development. These aren't just code generators — they're autonomous, intelligent systems powered by cutting-edge large language models (LLMs) and reinforcement learning. They don't just follow instructions; they understand them, write and refine code, debug complex issues, execute tasks, and — most crucially — learn from every interaction. Unlike traditional tools, agent engines act as true collaborators. They reason, validate, and optimize code through iterative cycles, mimicking the workflow of a seasoned engineering team. Picture OpenAI's AutoGPT orchestrating tasks, Microsoft's Autogen managing multi-agent workflows, or MetaGPT assigning specialized roles — frontend dev, database architect, QA engineer — all working in concert to build full-scale software systems.

Illustration by Author — Napkin.ai

This isn't automation — it's augmentation. The future of coding isn't human vs. machine; it's human and machine, co-creating at unprecedented speed. The question is: Are you ready to harness the power of AI teammates?

The End of Coding as We Know It: How Agent Software Engines Are Rewriting the Rules of Development
The software industry is on the brink of its biggest transformation since the invention of high-level programming languages. Agent Software Engines aren't just changing how we write code — they're redefining what it means to be a developer. Here's how the future of coding is taking shape:

1. From Syntax to Strategy: The Rise of Prompt Engineering
The "how" of coding is becoming obsolete. Tomorrow's developers won't wrestle with semicolons and syntax — they'll master the art of crafting precise, context-rich prompts that guide AI agents to build entire systems. Coding is no longer about writing lines; it's about architecting intent.

2. Your AI Team Never Sleeps: The Era of Asynchronous Development
Imagine a world where your "development team" includes AI agents that work 24/7 — writing documentation, fixing bugs, and optimizing performance while you focus on big-picture innovation. The future of software isn't solo geniuses; it's human-AI collectives operating at unprecedented scale.

3. From Weeks to Hours: The Death of the Development Cycle
Could you prototype, test, and deploy in the time it takes to drink your morning coffee? Companies like Cognosys and Sweep AI are already proving it's possible to use AI agents to collapse development timelines from weeks to hours. The bottleneck is no longer human hands; it's human imagination.

4.
No Code? No Problem: Democratizing Development
The biggest shift isn't just speed — it's accessibility. Entrepreneurs, designers, and even non-technical stakeholders can now directly shape applications using natural language. The barrier between "idea" and "execution" is vanishing — and with it, the monopoly of traditional coders.

Illustration by Author — Napkin.ai

The Bottom Line: The future belongs to those who adapt fastest. Will you cling to old workflows, or lead the charge into this new paradigm? One thing is certain: in five years, "coding" won't mean what it does today. The question is — will you be ahead of the curve, or left behind?

Pros and Cons of Agent Software Engines vs Traditional Developers

Illustration by Author — Napkin.ai

Roadmap to Mastering Agent Software Engineering
The biggest opportunity in tech isn't just using AI — it's orchestrating it. While most developers are still manually writing code, early adopters are already leveraging AI agents to build software 10x faster. Here's your battle-tested roadmap to not just adapt, but dominate the coming AI-powered development revolution:

Illustration by Author — Napkin.ai

Phase 1: Rewire Your Mindset
Forget everything you know about "traditional" coding. The future belongs to those who can direct AI, not just program it. Immerse yourself in the paradigm shift: Study AI agent philosophy (LangChain's The Rise of AI Agents is a must-read). Unlearn coding habits: Watch DeepLearning.AI's LLM courses to grasp how prompting replaces syntax. Key mindset shift: You're no longer a coder — you're an AI conductor.

Phase 2: Weaponize Prompt Engineering
The most valuable skill of the next decade? Telling AI exactly what you need. Master advanced prompting: chain-of-thought, ReAct, and iterative refinement. Hands-on labs: Build real projects with AutoGPT, BabyAGI, and LangGraph. Deliverable: A portfolio of AI-built apps (CRUD systems, automated workflows).

Phase 3: Assemble Your AI Team
The best developers don't code alone — they lead AI agents like a tech CEO. Deploy specialized agents: Use Autogen for cloud-based teams, and MetaGPT for role-based workflows. Project: Ship a microservice app where AI handles 80% of the work. Pro tip: Document every agent interaction — your "prompt playbook" is your new competitive edge.

Phase 4: Build Superhuman Agents
Turn basic AI helpers into powered-up co-developers. Integrate tools: web search, APIs, and vector databases to expand agent capabilities. Add memory: Create agents that learn from past projects. Deliverable: A market-ready MVP built entirely through agent collaboration.

Phase 5: Go Pro & Get Paid
Transition from learner to high-value AI-augmented developer. Automate your org: Implement agent-driven CI/CD, testing, and docs. Earn credentials: LangChain certification, AI agent hackathon wins. Monetize: Launch a micro-SaaS, freelance as an "agent whisperer," or build a content empire teaching these skills.

The Dark Side of AI-Powered Development: 4 Urgent Challenges We Can't Ignore
The agent software revolution isn't all sunshine and 10x productivity — real dangers are lurking beneath the hype. As we rush to embrace AI teammates, we're stumbling into four critical minefields that could define the future of our industry:

1. The "Wizard of Oz" Problem: Empty Expertise in the AI Era
We're raising a generation of developers who can prompt but can't program. When AI handles the heavy lifting, will we still understand what's happening under the hood?
This isn’t just about skills — it’s about preserving our ability to think like engineers when the AI fails (and it will). 2. The Pandora’s Box of Code Security Every AI agent is a potential attack vector. Recent studies show that unconstrained agents can: Expose API keys through hallucinated code Inherit vulnerabilities from training data Become Trojan horses for supply chain attacks The question isn’t if a major agent-related breach will happen — it’s when. 3. The Black Box Crisis: Who’s Responsible When AI Writes Buggy Code? Imagine debugging a system where: The original “developer” is an AI The logic is too complex for any human to fully parse The error only manifests in production We’re entering an era where we might not understand our codebases — let alone certify their safety. 4. The Developer Identity Crisis The brutal truth? AI won’t replace all developers — just the ones who refuse to evolve. The at-risk jobs aren’t just: ✔️ CRUD app developers✔️ Basic bug fixers✔️ Documentation writersThey’re any role that can’t deliver more value than an agent working at 1/10th the cost. The Wake-Up Call: These aren’t hypotheticals — they’re unfolding right now in early-adopter companies. The developers who thrive won’t just use AI agents; they’ll master mitigating these risks while leveraging the advantages. Illustration by Author — Napkin.ai Your Move: Will you be the one solving these challenges, or become a cautionary tale of the AI transition? The Great Developer Divide: How AI is Creating a Two-Tier Future for Coders The numbers don’t lie — we’re witnessing the fastest workforce transformation in tech history. GitHub’s 2023 data reveals a seismic shift: (a) 55% faster coding with AI tools(b) 3 in 4 junior devs now rely on AI daily(c) 81% of companies will bake AI agents into their SDLC by 2026 But here’s the uncomfortable truth no one’s saying out loud: By 2030, “coder” will mean something radically different. The Coming Reality: ▸ 30–40% of entry-level coding jobs will vanish or morph into AI-management roles ▸ Agent Engineers (avg salary projected: $250k+) will be the new rockstars▸ Whiteboard interviews will die — replaced by AI collaboration challengesThis isn’t speculation — it’s already happening: • FAANG companies are quietly retooling their hiring rubrics• Bootcamps are pivoting to “AI-First Development” curricula• Startups are launching with 1 human and 10 AI agentsIllustration by Author — Napkin.ai There will be two kinds of developers in 5 years: 1. Those who command AI agents 2. Those who compete with AI agents The AI Developer Survival Guide: 4 Non-Negotiable Rules to Stay Relevant The brutal truth? Your coding skills alone won’t save your career. As AI agents become the new “junior developers,” here’s how to bulletproof your future in the industry: 1. Become the Architect, Not the Bricklayer AI writes code — humans solve problems. The developers who thrive will: Master system design and abstraction Think in patterns, not just functions On the “why” while AI handles the “how” 2. Develop Your AI BS Detector The most valuable skill of 2025? Knowing when your AI is: ✓ Brilliant✓ Broken✓ DangerousLearn to audit code like a forensic accountant — your job depends on it.3. Your New Portfolio: AI Collaboration Case Studies Forget GitHub commit streaks. Hiring managers want to see: ✓ Projects where you directed AI agents✓ Documentation of prompt iterations✓ Before/after benchmarks showing your AI leverage4. 
Join the AI Underground
The best opportunities aren't on job boards — they're in: open-source agent projects, AI dev Discord war rooms, and experimental frameworks 99% of devs ignore.

Final Thoughts
The rise of agent software engines isn't the end of human developers — it's the beginning of supercharged innovation, where the most successful engineers won't just write code but will architect intelligence, orchestrate AI teams, and solve problems at unprecedented scale. Just as high-level languages liberated us from assembly, AI collaboration will free us from repetitive tasks, elevating our role to true creators and strategists — those who embrace this shift will define the next era of technological progress, while those who resist risk becoming obsolete. If this piece resonated with you, please clap, share your thoughts in the comments, and spread the conversation by sharing with fellow developers who need to hear this message. The future is being written now, and your engagement helps shape what comes next!
-
TOWARDSAI.NET These 7 AI Tools Might Beat You to That Data Science Job (April 19, 2025) Author(s): Harshit Kandoi Originally published on Towards AI. Photo by Christina @ wocintechchat.com on Unsplash. I don't know why people are spending years mastering data science when AI tools can produce the same results in minutes. I'm not trying to go against the data science field, but in reality, I see people actively learning the vast world of data science the traditional way: grinding through heavy math, learning coding skills, working on actual real-world datasets, and slowly building something that makes sense. Now, though, it's all about AI tools, which are not only meant to help us learn faster, but to do the job faster. And in some cases, they often work better. This isn't some sci-fi fantasy where robots steal everyone's job in a blink. It's more like a quiet revolution that's already happening. Tasks like cleaning data, building models, running complex algorithms, and even generating insights are easily handled by tools that are more accessible, fast, and surprisingly creative. You don't need to be a Python wizard or have knowledge of statistics to use them anymore. That's both exciting and, honestly, a little unsettling. The goal of this blog is to make you aware of seven AI tools that are slowly but surely reshaping…
-
TOWARDSAI.NET PPO Explained and Its Constraints: Introducing PDPPO as an Alternative Author(s): Leonardo Kanashiro Felizardo Originally published on Towards AI.

What is PPO, and Why is it Popular?
Proximal Policy Optimization (PPO) has rapidly emerged as a leading model-free reinforcement learning (RL) method due to its simplicity and strong performance across various domains. PPO combines trust-region policy optimization and clipped objective optimization to ensure stable and efficient policy updates.

Explanation of PPO
PPO addresses the limitations of previous RL methods like vanilla policy gradient and TRPO (Trust Region Policy Optimization) by balancing exploration and exploitation through controlled policy updates. PPO specifically aims to stabilize training by preventing overly large policy updates, which could lead to catastrophic forgetting or divergence.

Actor-Critic and the Role of Advantage Estimation
PPO belongs to the family of actor-critic algorithms, where two models work together: the actor updates the policy π_θ(a|s) by selecting actions based on states, and the critic evaluates the actor's decisions by estimating the value function V_π(s). This architecture was first formalized by Konda and Tsitsiklis in their seminal work Actor-Critic Algorithms, as shown in Konda et al. [1], where they demonstrated convergence properties and laid the mathematical foundation for combining policy gradient methods with value function estimation. The advantage function is a critical concept in this setting, defined as A(s, a) = Q(s, a) - V(s): how much better a specific action is than the policy's average behavior in that state.

This is a minimal and clean example of how to implement an Actor-Critic architecture in PyTorch:

import torch
import torch.nn as nn
import torch.optim as optim

class ActorCritic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.actor = nn.Linear(128, action_dim)
        self.critic = nn.Linear(128, 1)

    def forward(self, x):
        x = self.shared(x)
        return self.actor(x), self.critic(x)

# Example usage
state_dim = 4
action_dim = 2
model = ActorCritic(state_dim, action_dim)
optimizer = optim.Adam(model.parameters(), lr=3e-4)

state = torch.rand((1, state_dim))
logits, value = model(state)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()
log_prob = dist.log_prob(action)

# Mock advantage and return
advantage = torch.tensor([1.0])
return_ = torch.tensor([[1.5]])

# Actor-Critic loss
actor_loss = -log_prob * advantage
critic_loss = (value - return_).pow(2).mean()
loss = actor_loss + critic_loss

# Backpropagation
optimizer.zero_grad()
loss.backward()
optimizer.step()

PPO Objective and Mathematics
The core idea behind PPO is the optimization of the policy network through a clipped objective function:

L_CLIP(θ) = E_t[ min( r_t(θ) A_t, clip(r_t(θ), 1 - ε, 1 + ε) A_t ) ]

Here: θ represents the parameters of the policy. ε is a small hyperparameter (e.g., 0.2) controlling how much the policy can change at each step. A is the advantage function, indicating the relative improvement of taking a specific action compared to the average action. The probability ratio is defined as

r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t).

This ratio quantifies how much the probability of selecting an action has changed from the old policy to the new one.
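To see what the clipping does in practice, consider a small worked example with ε = 0.2 (the numbers are purely illustrative), so that clip(r_t, 1 - ε, 1 + ε) = clip(r_t, 0.8, 1.2):

r_t = 1.5 and A_t = +1: min(1.5 * 1, 1.2 * 1) = 1.2, so the objective is capped at (1 + ε) * A_t and the update cannot push the policy arbitrarily far toward that action.
r_t = 1.5 and A_t = -1: min(1.5 * (-1), 1.2 * (-1)) = -1.5, so the pessimistic, un-clipped term is kept and the penalty for over-weighting a bad action is not weakened.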
PyTorch Code Example: PPO Core

import torch
import torch.nn as nn
import torch.optim as optim

# Assume we already have: states, actions, old_log_probs, returns, values
# and a model with .actor and .critic modules
clip_epsilon = 0.2
gamma = 0.99

# Compute and normalize advantages
advantages = returns - values
discounted_advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

# Get new log probabilities and state values
log_probs = model.actor.get_log_probs(states, actions)
ratios = torch.exp(log_probs - old_log_probs.detach())

# Clipped surrogate objective
surr1 = ratios * discounted_advantages
surr2 = torch.clamp(ratios, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * discounted_advantages
policy_loss = -torch.min(surr1, surr2).mean()

# Critic loss (value function)
value_estimates = model.critic(states)
critic_loss = nn.MSELoss()(value_estimates, returns)

# Total loss
total_loss = policy_loss + 0.5 * critic_loss

# Backpropagation
optimizer.zero_grad()
total_loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
optimizer.step()

PPO's Advantages and Popularity
PPO's popularity stems from its:
Simplicity: Easier to implement and tune compared to other sophisticated methods like TRPO.
Efficiency: Faster convergence due to the clipped surrogate objective, reducing the need for careful hyperparameter tuning.
Versatility: Robust performance across a wide range of tasks, including robotics, games, and operational management problems.

Flaws and Limitations of PPO
Despite PPO's successes, it faces several limitations:
High Variance and Instability: PPO's reliance on sample-based estimates can cause significant variance in policy updates, especially in environments with sparse rewards or long horizons.
Exploration Inefficiency: PPO typically relies on Gaussian noise for exploration, which can lead to insufficient exploration, especially in complex, high-dimensional state spaces.
Sensitivity to Initialization: PPO's effectiveness can vary greatly depending on initial conditions, causing inconsistent results across training runs.

Enter PDPPO: A Novel Improvement
To overcome these limitations, Post-Decision Proximal Policy Optimization (PDPPO) introduces a novel approach using dual critic networks and post-decision states.

Understanding Post-Decision States
Post-decision states, introduced by Warren B. Powell [2], provide a powerful abstraction in reinforcement learning. A post-decision state represents the environment immediately after an agent has taken an action but before the environment's stochastic response occurs. This allows the learning algorithm to decompose the transition dynamics into two parts:
Deterministic step (decision): the state right after the deterministic effects of the action take place, sˣ = f(s, a).
Stochastic step (nature's response): once the deterministic effects are observed, the stochastic variables change the state, s' = g(sˣ, η).
Where: f represents the deterministic function mapping the current state and action to the post-decision state sˣ. η is a random variable capturing the environment's stochasticity. g defines how this stochastic component affects the next state. s' is the next state.

Example: Frozen Lake
Imagine the Frozen Lake environment. The agent chooses to move right from a given tile. The action is deterministic — the intention to move right is clear. This gives us the post-decision state sˣ: "attempted to move right." However, because the ice is slippery, the agent may not land on the intended tile. It might slide right, down, or stay in place, with a certain probability for each. That final position — determined after the slippage — is the true next state s'.
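To make the decomposition concrete, here is a small illustrative sketch of a slippery grid step split into its deterministic and stochastic halves. The grid size, slip probabilities, and helper names are invented for the example; they are not taken from the paper or from any particular Frozen Lake implementation:

import random

# Hypothetical 4x4 grid: a state is (row, col), an action is a direction.
MOVES = {"left": (0, -1), "down": (1, 0), "right": (0, 1), "up": (-1, 0)}

def deterministic_step(state, action):
    """f(s, a): the post-decision state s_x, i.e., the intended move before any slipping."""
    dr, dc = MOVES[action]
    row = min(max(state[0] + dr, 0), 3)
    col = min(max(state[1] + dc, 0), 3)
    return (row, col)

def stochastic_step(post_state, state):
    """g(s_x, eta): nature's response; with some probability the agent slips."""
    eta = random.random()
    if eta < 0.8:
        return post_state                                       # lands on the intended tile
    elif eta < 0.9:
        return state                                            # slips back to where it started
    else:
        return (min(post_state[0] + 1, 3), post_state[1])       # slips one row down

s = (1, 1)
s_x = deterministic_step(s, "right")   # post-decision state: "attempted to move right"
s_next = stochastic_step(s_x, s)       # true next state, observed after the slippage
print(s, "->", s_x, "->", s_next)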
This decomposition allows value functions to be better estimated:
Pre-decision value function: V(s), the usual state-value function evaluated on the state the agent observes before acting.
Post-decision value function: Vˣ(sˣ) = E_η[ V(g(sˣ, η)) ], the value of the post-decision state, i.e., the expected value of the next state once nature's stochastic response is averaged out.
This formulation helps decouple the decision from stochastic effects, reducing variance in value estimation and improving sample efficiency.

Post-Decision Advantage Calculation
Given both critics, PDPPO computes an advantage from each one, using the returns associated with pre- and post-decision states respectively:
A_t^pre = R_t - V(s_t) and A_t^post = R_tˣ - Vˣ(s_tˣ).
And selects the most informative advantage at each step:
A_t = max(A_t^pre, A_t^post).
This "maximum advantage" strategy allows the actor to favor the most promising value estimate during learning.

Updating the Critics and Policy
Critic loss functions: both critics are trained with mean squared error against their respective returns, L_V = E[(V(s_t) - R_t)²] and L_Vˣ = E[(Vˣ(s_tˣ) - R_tˣ)²].
Combined actor-critic loss: L = L_policy + 0.5 (L_V + L_Vˣ), where L_policy is the PPO-style clipped surrogate computed with the maximum advantage.
This architecture, with separate value estimators for deterministic and stochastic effects, enables more stable learning in environments with complex uncertainty.

Dual Critic Networks
PDPPO employs two critics:
State Critic: Estimates the value function based on pre-decision states.
Post-Decision Critic: Estimates the value function based on post-decision states.
The dual-critic approach improves value estimation accuracy by capturing both deterministic and stochastic dynamics separately.

PyTorch Code Example: PDPPO Core

import torch
import torch.nn as nn
import torch.optim as optim

# Assume: states, post_states, actions, old_log_probs, returns, post_returns,
# and a model with actor, critic, post_decision_critic
clip_epsilon = 0.2

# --- 1. Compute advantages from both critics ---
values = model.critic(states)
post_values = model.post_decision_critic(post_states)
adv_pre = returns - values
adv_post = post_returns - post_values

# Use the max advantage (PDPPO twist)
advantages = torch.max(adv_pre, adv_post)
advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

# --- 2. Policy loss: same PPO-style clip ---
log_probs = model.actor.get_log_probs(states, actions)
ratios = torch.exp(log_probs - old_log_probs.detach())
surr1 = ratios * advantages
surr2 = torch.clamp(ratios, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
policy_loss = -torch.min(surr1, surr2).mean()

# --- 3. Dual critic loss ---
critic_loss = nn.MSELoss()(values, returns)
post_critic_loss = nn.MSELoss()(post_values, post_returns)

# Total loss with dual critic
total_loss = policy_loss + 0.5 * (critic_loss + post_critic_loss)

# --- 4. Backpropagation ---
optimizer.zero_grad()
total_loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
optimizer.step()
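The snippet above assumes a model object that exposes actor, critic, and post_decision_critic but never defines it. A minimal sketch of such a module could look like the following; the layer sizes, the discrete-action head, and the get_log_probs helper are illustrative assumptions rather than the paper's exact architecture:

import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, action_dim))

    def forward(self, states):
        return self.net(states)  # action logits

    def get_log_probs(self, states, actions):
        dist = torch.distributions.Categorical(logits=self.net(states))
        return dist.log_prob(actions)

class DualCriticModel(nn.Module):
    """Actor plus two value heads: one for pre-decision states, one for post-decision states."""
    def __init__(self, state_dim, post_state_dim, action_dim, hidden=128):
        super().__init__()
        self.actor = Actor(state_dim, action_dim, hidden)
        self.critic = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))
        self.post_decision_critic = nn.Sequential(nn.Linear(post_state_dim, hidden), nn.ReLU(),
                                                  nn.Linear(hidden, 1))

model = DualCriticModel(state_dim=16, post_state_dim=16, action_dim=4)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)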
PDPPO vs PPO in Practice
Tests on environments such as Frozen Lake and Stochastic Lot-Sizing highlight PDPPO's significant performance improvements, as in Felizardo et al. [3]:
1. Improved Stability Across Seeds: PDPPO showed lower variance in both cumulative and maximum rewards across different random seeds, particularly in stochastic environments like Frozen Lake. This indicates greater robustness to initialization compared to PPO, which often suffers from unstable learning in such settings.
2. Faster and Smoother Convergence: The learning curves of PDPPO are notably smoother and consistently trend upward, while PPO's often stagnate or oscillate. This suggests that PDPPO's dual-critic structure provides more accurate value estimates, enabling more reliable policy updates.
3. Better Scaling with Dimensionality: In the Stochastic Lot-Sizing tasks, PDPPO's performance gap widened as the problem dimensionality increased (e.g., 25 items and 15 machines). This demonstrates that PDPPO scales better in complex settings, benefiting from its decomposition of dynamics into deterministic and stochastic parts.
4. More Informative Advantage Estimates: By using the maximum of pre- and post-decision advantages, PDPPO effectively captures the most optimistic learning signal at each step — leading to better exploitation of promising strategies without ignoring the stochastic nature of the environment.
5. Better Sample Efficiency: Empirical results showed that PDPPO achieved higher rewards using fewer training episodes, making it more sample-efficient — an essential trait for real-world applications where data collection is expensive.

Empirical comparison (20–30 runs): PDPPO significantly outperforms PPO across three environment configurations of the Stochastic Lot-Sizing Problem. The shaded areas in the reported learning curves represent 95% confidence intervals, and the curves show faster convergence, higher peak performance, and tighter variance bands for PDPPO.

A few other alternatives
A few other alternatives to address the limitations of PPO include:
Intrinsic Exploration Module (IEM): Proposed by Zhang et al. [8], this approach enhances exploration by incorporating uncertainty estimation into PPO. It addresses PPO's weak exploration signal by rewarding novelty, especially useful in sparse reward settings.
Uncertainty-Aware TRPO (UA-TRPO): Introduced by Queeney et al. [7], UA-TRPO aims to stabilize policy updates in the presence of finite-sample estimation errors by accounting for uncertainty in the policy gradients — offering a more robust learning process than standard PPO.
Dual-Critic Variants: Previous methods, like SAC [4] and TD3 [5], use dual critics mainly for continuous action spaces to reduce overestimation bias. However, they typically do not incorporate post-decision states, nor are they designed for environments with both deterministic and stochastic dynamics.
Post-Decision Architectures in OR: Earlier work in operations research (e.g., Powell [2], Hull [6]) used post-decision states to manage the curse of dimensionality in approximate dynamic programming. PDPPO brings this insight into deep RL by using post-decision value functions directly in the learning process.
Each of these methods has its trade-offs, and PDPPO stands out by directly tackling the challenge of stochastic transitions via decomposition and dual critics — making it particularly effective in noisy, real-world-like settings.

Citations
[1] Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-Critic Algorithms. In S.A. Solla, T.K. Leen, & K.-R. Müller (Eds.), Advances in Neural Information Processing Systems, Vol. 12. MIT Press.
[2] Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality (2nd ed.). John Wiley & Sons.
[3] Felizardo, L. K., Fadda, E., Nascimento, M. C. V., Brandimarte, P., & Del-Moral-Hernandez, E. (2024). A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks. arXiv preprint arXiv:2504.05150. https://arxiv.org/pdf/2504.05150
[4] Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning (ICML).
[5] Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning (ICML).
[6] Hull, I. (2015). Approximate Dynamic Programming with Post-Decision States as a Solution Method for Dynamic Economic Models. Journal of Economic Dynamics and Control, 55, 57–70.
[7] Queeney, J., Paschalidis, I. C., & Cassandras, C. G. (2021). Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach. In Proceedings of the AAAI Conference on Artificial Intelligence, 35(9), 9377–9385.
[8] Zhang, J., Zhang, Z., Han, S., & Lü, S. (2022). Proximal Policy Optimization via Enhanced Exploration Efficiency. Information Sciences, 609, 750–765.
-
TOWARDSAI.NET Mastering Hadoop, Part 1: Installation, Configuration, and Modern Big Data Strategies (April 19, 2025) Author(s): Niklas Lang Originally published on Towards AI. A comprehensive guide covering Hadoop setup, HDFS commands, MapReduce, debugging, advantages, challenges, and the future of big data technologies. Photo by Nam Anh on Unsplash. Nowadays, a large amount of data is collected on the internet, which is why companies are faced with the challenge of storing, processing, and analyzing these volumes efficiently. Hadoop is an open-source framework from the Apache Software Foundation and has become one of the leading Big Data management technologies in recent years. The system enables the distributed storage and processing of data across multiple servers. As a result, it offers a scalable solution for a wide range of applications, from data analysis to machine learning. This article provides a comprehensive overview of Hadoop and its components. We also examine the underlying architecture and provide practical tips for getting started with it. Before we start, it is worth mentioning that the topic of Hadoop is huge, and even though this article is already long, it cannot come close to covering every detail. This is why we split it into three parts, to let you decide for yourself how deep you want to dive into it: Part 1: Hadoop 101: What it is, why it matters, and who should care. This part is for everyone interested in Big…
-
TOWARDSAI.NET 11 Docker Container Images for Generative AI & ML Projects (April 18, 2025) Author(s): Youssef Hosni Originally published on Towards AI. Docker containers offer significant advantages for machine learning by ensuring consistent, portable, and reproducible environments across different systems. By encapsulating all dependencies, libraries, and configurations in a container, Docker eliminates compatibility issues and the "it works on my machine" problem. This makes it easier to move ML projects between development, cloud, or production environments without worrying about differences in setup. Additionally, Docker enables scalability and isolation, allowing machine learning workflows to be easily scaled using tools like Kubernetes, and ensuring that dependencies do not conflict between different projects. In this article, we will explore 11 Docker container images for Generative AI and machine learning projects. These include tools for development environments, deep learning frameworks, machine learning lifecycle management, workflow orchestration, and large language models.
I. Machine Learning & Data Science: 1. Python 2. Jupyter Notebook data science stack
II. Generative AI & Deep Learning: 3. Hugging Face Transformers 4. NVIDIA CUDA deep learning runtime 5. TensorFlow 6. PyTorch 7. Ollama 8. Qdrant
III. Workflow Orchestration & ML Lifecycle Management: 9. Airflow 10. MLflow 11. Kubeflow Notebooks
Most insights I share in Medium have previously been shared in my weekly newsletter, To Data & Beyond. If you want to be up-to-date with the frenetic world of AI while also feeling inspired to take action or, at the very least, to be well-prepared…
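As a small illustration of how images like these can be pulled and run programmatically, here is a generic sketch using the Docker SDK for Python (the image tag and command are examples, a local Docker daemon must be running, and the initial pull can take a while):

import docker

client = docker.from_env()

# Pull the official PyTorch image and run a one-off container that prints the torch version
client.images.pull("pytorch/pytorch:latest")
output = client.containers.run(
    "pytorch/pytorch:latest",
    command=["python", "-c", "import torch; print(torch.__version__)"],
    remove=True,  # remove the container after it exits
)
print(output.decode().strip())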
-
TOWARDSAI.NET 5 Data Science & AI Career Paths (April 18, 2025) Author(s): Claudia Ng Originally published on Towards AI. A practical guide to understanding the responsibilities, skills, and day-to-day work of the top data science roles. I often get questions on what the different types of data science roles are from readers looking to break into the field. The truth is that data science is a broad term that covers a wide range of roles, from analyzing business metrics to building machine learning models to deploying AI systems. If you're considering moving into data science or transitioning between different roles, it's important to understand their differences so you can find the best fit for your skills and career goals. In this post, I'll break down 5 common roles in data science and AI, covering the key responsibilities, day-to-day tasks, required skill sets, and career paths. I've ordered them from more business-oriented roles to more engineering-heavy roles.
1. Product Data Scientist
Overview: Product data scientists (sometimes called data analysts) focus on defining business metrics, recommending actionable insights, and running experiments to drive product strategy. These roles are common in tech companies in consumer apps, marketplace platforms, and SaaS companies.
Key Responsibilities: Define and track business metrics like conversion, retention, churn, and engagement. Design and analyze A/B tests to evaluate product features. Build dashboards and reports for product managers and business leaders. Conduct deep-dive analysis to inform strategic decisions.
Day-to-Day Tasks: Writing SQL queries to extract data and test hypotheses…
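To make the A/B-testing part of this role concrete, here is a small illustrative analysis sketch using a two-proportion z-test from statsmodels; the conversion counts and sample sizes are made up for the example and are not from the article:

from statsmodels.stats.proportion import proportions_ztest

# Hypothetical experiment: conversions out of users exposed to control vs. variant
conversions = [420, 480]      # control, variant
exposures = [10000, 10000]

z_stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g., below 0.05) would suggest the variant's conversion rate
# genuinely differs from the control's rather than differing by chance.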
-
TOWARDSAI.NET 🧠 Building an AI Study Buddy: A Practical Guide to Developing a Simple Learning Companion (April 18, 2025) Author(s): Hemanth Sanisetty Originally published on Towards AI. AI Generated Image. In the era of digital transformation, education is undergoing a significant evolution. Traditional study methods are being augmented and, in some cases, replaced by intelligent tools that cater to individual learning styles and needs. This project serves as a practical demonstration of how AI can be harnessed to process educational content efficiently. This AI Study Buddy is designed to: Summarize educational materials from PDFs and YouTube videos. Generate quizzes and flashcards to reinforce learning. Provide a modular architecture that can be expanded upon. In this guide, we'll build it from scratch using the Groq API for ultra-fast inference with models like LLaMA 3 and Mistral, LangChain, FAISS, and Sentence Transformers for RAG, and Streamlit for a production-ready front end. This module focuses on extracting text from PDF documents, enabling the tool to process and summarize textual content effectively. In this module, we use PyMuPDF to iterate through each page of the PDF and extract its text.

import fitz  # PyMuPDF

def extract_text_from_pdf(uploaded_file):
    """Extracts and returns the full text from a PDF file using PyMuPDF."""
    text = ""
    with fitz.open(stream=uploaded_file.read(), filetype="pdf") as doc:
        for page in doc:
            text += page.get_text()
    return text

The YouTube Transcript Retrieval Module is designed to extract textual transcripts from YouTube videos, enabling subsequent…
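The excerpt cuts off before showing the transcript module, but a common way to implement that step (one possible approach, not necessarily the author's) is with the youtube-transcript-api package, shown here with its pre-1.0 interface; the function name and the example video ID are placeholders:

from youtube_transcript_api import YouTubeTranscriptApi

def get_youtube_transcript(video_id: str) -> str:
    """Fetches a video's transcript segments and joins them into one string."""
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(segment["text"] for segment in segments)

# Example usage (hypothetical video ID):
# transcript_text = get_youtube_transcript("abc123xyz")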
-
TOWARDSAI.NET Why QLoRA Changes the Game: A Quick Dive into Efficient Fine-Tuning with BERT (April 18, 2025) Author(s): Saif Ali Kheraj Originally published on Towards AI. Quantized Low-Rank Adaptation — anyone with a mid-range GPU and some curiosity can now fine-tune powerful models without burning through a budget or a power supply. In this article, we will break down QLoRA in plain language. No technical jargon overload, just clear ideas, relatable examples, and a little fun along the way. Let's start with a quick comparison:
Adapters: Instead of retraining the whole model, adapters insert small, trainable blocks. Think of them as sticky notes added to the original book.
LoRA (Low-Rank Adaptation): A smarter version that fine-tunes just a few key parts of the model, Wq and Wv, because they significantly influence the attention computation. Think of it as rewriting the key points or summary of a book instead of the whole story.
QLoRA: It applies LoRA techniques to a model that has already been compressed using 4-bit quantization (we will go through it). It is efficient, elegant, and powerful.
QLoRA stands for Quantized Low-Rank Adaptation. It is a method for fine-tuning large language models (LLMs) in a way that is: memory efficient, friendly to consumer-level GPUs, and still powerful and accurate. It combines two ideas: quantization (compressing data) and low-rank adaptation (tuning only a small part of the model). The result? A streamlined fine…
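As a rough sketch of what QLoRA fine-tuning of a BERT-style classifier looks like with the Hugging Face transformers, bitsandbytes, and peft libraries (the hyperparameters are illustrative, a CUDA GPU with bitsandbytes installed is assumed, and exact argument names can shift between library versions):

import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load BERT with its weights quantized to 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2, quantization_config=bnb_config
)

# Attach small trainable LoRA adapters to the attention projections (Wq and Wv)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the weights is trainable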
and more stories