Towards Data Science
Your home for data science & AI. The world's leading publication for data science and artificial intelligence professionals.
Recent Updates
  • TOWARDSDATASCIENCE.COM
    AI Agents Processing Time Series and Large Dataframes
Intro

Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (i.e. dataframes and time series). This ability unlocks numerous real-world applications for democratizing access to data analysis, such as automating reporting, no-code queries, and support for data cleaning and manipulation.

Agents can interact with dataframes in two different ways:
- with natural language — the LLM reads the table as a string and tries to make sense of it based on its knowledge base
- by generating and executing code — the Agent activates tools to process the dataset as an object

So, by combining the power of NLP with the precision of code execution, AI Agents enable a broader range of users to interact with complex datasets and derive insights. In this tutorial, I’m going to show how to process dataframes and time series with AI Agents. I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of code with comments so that you can replicate this example (link to full code at the end of the article).

Setup

Let’s start by setting up Ollama (pip install ollama==0.4.7), a library that allows users to run open-source LLMs locally, without needing cloud-based services, giving more control over data privacy and performance. Since it runs locally, any conversation data does not leave your machine. First of all, you need to download Ollama from the website. Then, on the prompt shell of your laptop, use the command to download the selected LLM. I’m going with Alibaba’s Qwen, as it’s both smart and light. After the download is completed, you can move on to Python and start writing code.

import ollama
llm = "qwen2.5"

Let’s test the LLM:

stream = ollama.generate(model=llm, prompt='''what time is it?''', stream=True)
for chunk in stream:
    print(chunk['response'], end='', flush=True)

Time Series

A time series is a sequence of data points measured over time, often used for analysis and forecasting. It allows us to see how variables change over time, and it’s used to identify trends and seasonal patterns. I’m going to generate a fake time series dataset to use as an example.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## create data
np.random.seed(1)  #<--for reproducibility
length = 30
ts = pd.DataFrame(data=np.random.randint(low=0, high=15, size=length),
                  columns=['y'],
                  index=pd.date_range(start='2023-01-01', freq='MS', periods=length).strftime('%Y-%m'))

## plot
ts.plot(kind="bar", figsize=(10,3), legend=False, color="black").grid(axis='y')

Usually, time series datasets have a really simple structure, with the main variable as a column and the time as the index. Before transforming it into a string, I want to make sure that everything is placed under a column, so that we don’t lose any piece of information.

dtf = ts.reset_index().rename(columns={"index":"date"})
dtf.head()

Then, I shall change the data type from dataframe to dictionary.

data = dtf.to_dict(orient='records')
data[0:5]

Finally, from dictionary to string.

str_data = "\n".join([str(row) for row in data])
str_data

Now that we have a string, it can be included in a prompt that any language model is able to process.
When you paste a dataset into a prompt, the LLM reads the data as plain text, but can still understand the structure and meaning based on patterns seen during training. prompt = f''' Analyze this dataset, it contains monthly sales data of an online retail product: {str_data} ''' We can easily start a chat with the LLM. Please note that, right now, this is not an Agent as it doesn’t have any Tool, we’re just using the language model. While it doesn’t process numbers like a computer, the LLM can recognize column names, time-based patterns, trends, and outliers, especially with smaller datasets. It can simulate analysis and explain findings, but it won’t perform precise calculations independently, as it’s not executing code like an Agent. messages = [{"role":"system", "content":prompt}] while True:     ## User     q = input(' >')     if q == "quit":         break     messages.append( {"role":"user", "content":q} )         ## Model     agent_res = ollama.chat(model=llm, messages=messages, tools=[])     res = agent_res["message"]["content"]         ## Response     print(" >", f"\x1b[1;30m{res}\x1b[0m")     messages.append( {"role":"assistant", "content":res} ) The LLM recognizes numbers and understands the general context, the same way it might understand a recipe or a line of code.  As you can see, using LLMs to analyze time series is great for quick and conversational insights. Agent LLMs are good for brainstorming and lite exploration, while an Agent can run code. Therefore, it can handle more complex tasks like plotting, forecasting, and anomaly detection. So, let’s create the Tools.  Sometimes, it can be more effective to treat the “final answer” as a Tool. For example, if the Agent does multiple actions to generate intermediate results, the final answer can be thought of as the Tool that integrates all of this information into a cohesive response. By designing it this way, you have more customization and control over the results. def final_answer(text:str) -> str:     return text tool_final_answer = {'type':'function', 'function':{   'name': 'final_answer',   'description': 'Returns a natural language response to the user',   'parameters': {'type': 'object',                 'required': ['text'],                 'properties': {'text': {'type':'str', 'description':'natural language response'}} }}} final_answer(text="hi") Then, the coding Tool. import io import contextlib def code_exec(code:str) -> str:     output = io.StringIO()     with contextlib.redirect_stdout(output):         try:             exec(code)         except Exception as e:             print(f"Error: {e}")     return output.getvalue() tool_code_exec = {'type':'function', 'function':{   'name': 'code_exec',   'description': 'Execute python code. Use always the function print() to get the output.',   'parameters': {'type': 'object',                 'required': ['code'],                 'properties': {                     'code': {'type':'str', 'description':'code to execute'}, }}}} code_exec("from datetime import datetime; print(datetime.now().strftime('%H:%M'))") Moreover, I shall add a couple of utils functions for Tool usage and to run the Agent. 
dic_tools = {"final_answer":final_answer, "code_exec":code_exec} # Utils def use_tool(agent_res:dict, dic_tools:dict) -> dict:     ## use tool     if "tool_calls" in agent_res["message"].keys():         for tool in agent_res["message"]["tool_calls"]:             t_name, t_inputs = tool["function"]["name"], tool["function"]["arguments"]             if f := dic_tools.get(t_name):                 ### calling tool                 print(' >', f"\x1b[1;31m{t_name} -> Inputs: {t_inputs}\x1b[0m")                 ### tool output                 t_output = f(**tool["function"]["arguments"])                 print(t_output)                 ### final res                 res = t_output             else:                 print(' >', f"\x1b[1;31m{t_name} -> NotFound\x1b[0m")     ## don't use tool     if agent_res['message']['content'] != '':         res = agent_res["message"]["content"]         t_name, t_inputs = '', ''     return {'res':res, 'tool_used':t_name, 'inputs_used':t_inputs} When the Agent is trying to solve a task, I want it to keep track of the Tools that have been used, the inputs that it tried, and the results it gets. The iteration should stop only when the model is ready to give the final answer. def run_agent(llm, messages, available_tools):     tool_used, local_memory = '', ''     while tool_used != 'final_answer':         ### use tools         try:             agent_res = ollama.chat(model=llm,                                     messages=messages,                                                                                                              tools=[v for v in available_tools.values()])             dic_res = use_tool(agent_res, dic_tools)             res, tool_used, inputs_used = dic_res["res"], dic_res["tool_used"], dic_res["inputs_used"]         ### error         except Exception as e:             print(" >", e)             res = f"I tried to use {tool_used} but didn't work. I will try something else."             print(" >", f"\x1b[1;30m{res}\x1b[0m")             messages.append( {"role":"assistant", "content":res} )         ### update memory         if tool_used not in ['','final_answer']:             local_memory += f"\nTool used: {tool_used}.\nInput used: {inputs_used}.\nOutput: {res}"             messages.append( {"role":"assistant", "content":local_memory} )             available_tools.pop(tool_used)             if len(available_tools) == 1:                 messages.append( {"role":"user", "content":"now activate the tool final_answer."} )         ### tools not used         if tool_used == '':             break     return res In regard to the coding Tool, I’ve noticed that Agents tend to recreate the dataframe at every step. So I will use a memory reinforcement to remind the model that the dataset already exists. A trick commonly used to get the desired behaviour. Ultimately, memory reinforcements help you to get more meaningful and effective interactions. # Start a chat messages = [{"role":"system", "content":prompt}] memory = ''' The dataset already exists and it's called 'dtf', don't create a new one. 
''' while True:     ## User     q = input(' >')     if q == "quit":         break     messages.append( {"role":"user", "content":q} )     ## Memory     messages.append( {"role":"user", "content":memory} )              ## Model     available_tools = {"final_answer":tool_final_answer, "code_exec":tool_code_exec}     res = run_agent(llm, messages, available_tools)         ## Response     print(" >", f"\x1b[1;30m{res}\x1b[0m")     messages.append( {"role":"assistant", "content":res} ) Creating a plot is something that the LLM alone can’t do. But keep in mind that even if Agents can create images, they can’t see them, because after all, the engine is still a language model. So the user is the only one who visualises the plot. The Agent is using the library statsmodels to train a model and forecast the time series.  Large Dataframes LLMs have limited memory, which restricts how much information they can process at once, even the most advanced models have token limits (a few hundred pages of text). Additionally, LLMs don’t retain memory across sessions unless a retrieval system is integrated. In practice, to effectively work with large dataframes, developers often use strategies like chunking, RAG, vector databases, and summarizing content before feeding it into the model. Let’s create a big dataset to play with. import random import string length = 1000 dtf = pd.DataFrame(data={     'Id': [''.join(random.choices(string.ascii_letters, k=5)) for _ in range(length)],     'Age': np.random.randint(low=18, high=80, size=length),     'Score': np.random.uniform(low=50, high=100, size=length).round(1),     'Status': np.random.choice(['Active','Inactive','Pending'], size=length) }) dtf.tail() I’ll add a web-searching Tool, so that, with the ability to execute Python code and search the internet, a general-purpose AI gains access to all the available knowledge and can make data-driven decisions.  In Python, the easiest way to create a web-searching Tool is with the famous private browser DuckDuckGo (pip install duckduckgo-search==6.3.5). You can directly use the original library or import the LangChain wrapper (pip install langchain-community==0.3.17). from langchain_community.tools import DuckDuckGoSearchResults def search_web(query:str) -> str:   return DuckDuckGoSearchResults(backend="news").run(query) tool_search_web = {'type':'function', 'function':{   'name': 'search_web',   'description': 'Search the web',   'parameters': {'type': 'object',                 'required': ['query'],                 'properties': {                     'query': {'type':'str', 'description':'the topic or subject to search on the web'}, }}}} search_web(query="nvidia") In total, the Agent now has 3 tools. dic_tools = {'final_answer':final_answer,             'search_web':search_web,             'code_exec':code_exec} Since I can’t add the full dataframe in the prompt, I shall feed only the first 10 rows so that the LLM can understand the general context of the dataset. Additionally, I will specify where to find the full dataset. str_data = "\n".join([str(row) for row in dtf.head(10).to_dict(orient='records')]) prompt = f''' You are a Data Analyst, you will be given a task to solve as best you can. You have access to the following tools: - tool 'final_answer' to return a text response. - tool 'code_exec' to execute Python code. - tool 'search_web' to search for information on the internet. If you use the 'code_exec' tool, remember to always use the function print() to get the output. 
The dataset already exists and it's called 'dtf', don't create a new one. This dataset contains credit score for each customer of the bank. Here's the first rows: {str_data} ''' Finally, we can run the Agent. messages = [{"role":"system", "content":prompt}] memory = ''' The dataset already exists and it's called 'dtf', don't create a new one. ''' while True:     ## User     q = input(' >')     if q == "quit":         break     messages.append( {"role":"user", "content":q} )     ## Memory     messages.append( {"role":"user", "content":memory} )              ## Model     available_tools = {"final_answer":tool_final_answer, "code_exec":tool_code_exec, "search_web":tool_search_web}     res = run_agent(llm, messages, available_tools)         ## Response     print(" >", f"\x1b[1;30m{res}\x1b[0m")     messages.append( {"role":"assistant", "content":res} ) In this interaction, the Agent used the coding Tool properly. Now, I want to make it utilize the other tool as well. At last, I need the Agent to put together all the pieces of information obtained so far from this chat.  Conclusion This article has been a tutorial to demonstrate how to build from scratch Agents that process time series and large dataframes. We covered both ways that models can interact with the data: through natural language, where the LLM interprets the table as a string using its knowledge base, and by generating and executing code, leveraging tools to process the dataset as an object. Full code for this article: GitHub I hope you enjoyed it! Feel free to contact me for questions and feedback, or just to share your interesting projects.  Let’s Connect  The post AI Agents Processing Time Series and Large Dataframes appeared first on Towards Data Science.
  • TOWARDSDATASCIENCE.COM
    When Physics Meets Finance: Using AI to Solve Black-Scholes
DISCLAIMER: This is not financial advice. I’m a PhD in Aerospace Engineering with a strong focus on Machine Learning: I’m not a financial advisor. This article is intended solely to demonstrate the power of Physics-Informed Neural Networks (PINNs) in a financial context.

When I was 16, I fell in love with Physics. The reason was simple yet powerful: I thought Physics was fair. It never happened that I got an exercise wrong because the speed of light changed overnight, or because suddenly e^x could be negative. Every time I read a physics paper and thought, “This doesn’t make sense,” it turned out I was the one not making sense. So, Physics is always fair, and because of that, it’s always perfect. And Physics displays this perfection and fairness through its set of rules, which are known as differential equations.

The simplest differential equation I know is this one:

dx/dt = 5, with x(0) = 0

Very simple: we start here, x0 = 0, at time t = 0, then we move with a constant speed of 5 m/s. This means that after 1 second, we are 5 meters (or miles, if you like it best) away from the origin; after 2 seconds, we are 10 meters away from the origin; after 43128 seconds… I think you got it.

As we were saying, this is written in stone: perfect, ideal, and unquestionable. Nonetheless, imagine this in real life. Imagine you are out for a walk or driving. Even if you try your best to go at a target speed, you will never be able to keep it constant. Your mind will race in certain parts; maybe you will get distracted, maybe you will stop for red lights, most likely a combination of the above. So maybe the simple differential equation we mentioned earlier is not enough.

What we could do is try to predict your location from the differential equation, but with the help of Artificial Intelligence. This idea is implemented in Physics-Informed Neural Networks (PINNs). We will describe them later in detail, but the idea is that we try to match both the data and what we know from the differential equation that describes the phenomenon. This means that we enforce our solution to generally meet what we expect from Physics. I know it sounds like black magic, I promise it will be clearer throughout the post.

Now, the big question: What does Finance have to do with Physics and Physics-Informed Neural Networks? Well, it turns out that differential equations are not only useful for nerds like me who are interested in the laws of the natural universe, but they can be useful in financial models as well. For example, the Black-Scholes model uses a differential equation to set the price of a call option so that, under certain quite strict assumptions, the resulting portfolio is risk-free.

The goal of this very convoluted introduction was twofold:
- Confuse you just a little, so that you will keep reading
- Spark your curiosity just enough to see where this is all going

Hopefully I managed. If I did, the rest of the article will follow these steps:
- We will discuss the Black-Scholes model, its assumptions, and its differential equation
- We will talk about Physics-Informed Neural Networks (PINNs), where they come from, and why they are helpful
- We will develop our algorithm that trains a PINN on Black-Scholes using Python, Torch, and OOP
- We will show the results of our algorithm

I’m excited! To the lab!

1. Black-Scholes Model

If you are curious about the original paper of Black-Scholes, you can find it here. It’s definitely worth it. Ok, so now we have to understand the Finance universe we are in, what the variables are, and what the laws are.
First off, in Finance, there is a powerful tool called a call option. The call option gives you the right (not the obligation) to buy a stock at a certain price, called the strike price, at a fixed future date (let’s say a year from now).

Now let’s think about it for a moment, shall we? Let’s say that today the given stock price is $100. Let us also assume that we hold a call option with a $100 strike price. Now let’s say that in one year the stock price goes to $150. That’s amazing! We can use that call option to buy the stock and then immediately resell it! We just made $150 − $100 = $50 of profit. On the other hand, if in one year the stock price goes down to $80, then we can’t do that. Actually, we are better off not exercising our right to buy at all, so as not to lose money.

So now that we think about it, the idea of buying a stock and selling an option turns out to be perfectly complementary. What I mean is that the randomness of the stock price (the fact that it goes up and down) can actually be mitigated by holding the right number of options. This is called delta hedging.

Based on a set of assumptions, we can derive the fair option price in order to have a risk-free portfolio. I don’t want to bore you with all the details of the derivation (they are honestly not that hard to follow in the original paper), but the differential equation of the risk-free portfolio is this:

∂C/∂t + (1/2) σ² S² ∂²C/∂S² + r S ∂C/∂S − r C = 0

Where:
- C is the price of the option at time t
- sigma (σ) is the volatility of the stock
- r is the risk-free rate
- t is time (with t=0 now and T at expiration)
- S is the current stock price

From this equation, we can derive the fair price of the call option to have a risk-free portfolio. The equation is closed and analytical, and it looks like this:

C(S,t) = S N(d1) − K e^(−r(T−t)) N(d2)

With:

d1 = [ln(S/K) + (r + σ²/2)(T−t)] / (σ √(T−t)),  d2 = d1 − σ √(T−t)

Where N(x) is the cumulative distribution function (CDF) of the standard normal distribution, K is the strike price, and T is the expiration time. For example, this is the plot of the Stock Price (x) vs Call Option (y), according to the Black-Scholes model.

Image made by author

Now this looks cool and all, but what does it have to do with Physics and PINN? It looks like the equation is analytical, so why PINN? Why AI? Why am I reading this at all? The answer is below.

2. Physics-Informed Neural Networks

If you are curious about Physics-Informed Neural Networks, you can find out more in the original paper here. Again, worth a read.

Now, the equation above is analytical, but again, that is the equation of a fair price in an ideal scenario. What happens if we ignore this for a moment and try to guess the price of the option given the stock price and the time? For example, we could use a Feed Forward Neural Network and train it through backpropagation. In this training mechanism, we are minimizing the error L = |Estimated C − Real C|:

Image made by author

This is fine, and it is the simplest Neural Network approach you could take. The issue here is that we are completely ignoring the Black-Scholes equation. So, is there another way? Can we possibly integrate it? Of course we can, that is, if we set the error to be L = |Estimated C − Real C| + PDE(C,S,t), where PDE(C,S,t) is the left-hand side of the Black-Scholes equation above, and it needs to be as close to 0 as possible:

Image made by author

But the question still stands. Why is this “better” than the simple Black-Scholes? Why not just use the differential equation? Well, because sometimes, in life, solving the differential equation doesn’t guarantee you the “real” solution.
Physics is usually approximating things, and it is doing that in a way that could create a difference between what we expect and what we see. That is why the PINN is an amazing and fascinating tool: you try to match the physics, but you are strict in the fact that the results have to match what you “see” from your dataset. In our case, it might be that, in order to obtain a risk-free portfolio, we find that the theoretical Black-Scholes model doesn’t fully match the noisy, biased, or imperfect market data we’re observing. Maybe the volatility isn’t constant. Maybe the market isn’t efficient. Maybe the assumptions behind the equation just don’t hold up. That is where an approach like PINN can be helpful. We not only find a solution that meets the Black-Scholes equation, but we also “trust” what we see from the data. Ok, enough with the theory. Let’s code. 3. Hands On Python Implementation The whole code, with a cool README.md, a fantastic notebook and a super clear modular code, can be found here P.S. This will be a little intense (a lot of code), and if you are not into software, feel free to skip to the next chapter. I will show the results in a more friendly way Thank you a lot for getting to this point Let’s see how we can implement this. 3.1 Config.json file The whole code can run with a very simple configuration file, which I called config.json. You can place it wherever you like, as we will see. This file is crucial, as it defines all the parameters that govern our simulation, data generation, and model training. Let me quickly walk you through what each value represents: K: the strike price — this is the price at which the option gives you the right to buy the stock in the future. T: the time to maturity, in years. So T = 1.0 means the option expires one unit (for example, one year) from now. r: the risk-free interest rate is used to discount future values. This is the interest rate we are setting in our simulation. sigma: the volatility of the stock, which quantifies how unpredictable or “risky” the stock price is. Again, a simulation parameter. N_data: the number of synthetic data points we want to generate for training. This will condition the size of the model as well. min_S and max_S: the range of stock prices we want to sample when generating synthetic data. Min and max in our stock price. bias: an optional offset added to the option prices, to simulate a systemic shift in the data. This is done to create a discrepancy between the real world and the Black-Scholes data noise_variance: the amount of noise added to the option prices to simulate measurement or market noise. This parameter is add for the same reason as before. epochs: how many iterations the model will train for. lr: the learning rate of the optimizer. This controls how fast the model updates during training. log_interval: how often (in terms of epochs) we want to print logs to monitor training progress. Each of these parameters plays a specific role, some shape the financial world we’re simulating, others control how our neural network interacts with that world. Small tweaks here can lead to very different behavior, which makes this file both powerful and delicate. Changing the values of this JSON file will radically change the output of the code. 3.2 main.py Now let’s look at how the rest of the code uses this config in practice. The main part of our code comes from main.py, train your PINN using Torch, and black_scholes.py. 
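Before walking through those files, here is a rough sketch of what a Black-Scholes PINN loss can look like in PyTorch. This is an illustrative example only: the network class, the function pinn_loss, and the default r and sigma values are my own assumptions for the sketch, not the repository's actual model.py or loss.py.

import torch
import torch.nn as nn

# Hypothetical network: maps (S, t) to the estimated option price C
class OptionPINN(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, S, t):
        return self.net(torch.cat([S, t], dim=1))

def pinn_loss(model, S, t, C_market, r=0.05, sigma=0.2):
    """Data term plus Black-Scholes PDE residual; both should tend to zero."""
    S = S.clone().requires_grad_(True)   # column tensors of shape (N, 1)
    t = t.clone().requires_grad_(True)
    C = model(S, t)

    # 1) data term: match the (noisy, possibly biased) observed option prices
    data_loss = torch.mean((C - C_market) ** 2)

    # 2) PDE residual: dC/dt + 0.5*sigma^2*S^2*d2C/dS2 + r*S*dC/dS - r*C
    ones = torch.ones_like(C)
    dC_dt = torch.autograd.grad(C, t, grad_outputs=ones, create_graph=True)[0]
    dC_dS = torch.autograd.grad(C, S, grad_outputs=ones, create_graph=True)[0]
    d2C_dS2 = torch.autograd.grad(dC_dS, S, grad_outputs=torch.ones_like(dC_dS),
                                  create_graph=True)[0]
    residual = dC_dt + 0.5 * sigma**2 * S**2 * d2C_dS2 + r * S * dC_dS - r * C
    pde_loss = torch.mean(residual ** 2)

    return data_loss + pde_loss

Training then amounts to minimizing this combined loss with a standard optimizer over the synthetic data points.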
This is main.py: So what you can do is: Build your config.json file Run python main.py --config config.json main.py uses a lot of other files. 3.3 black_scholes.py and helpers The implementation of the model is inside black_scholes.py: This can be used to build the model, train, export, and predict. The function uses some helpers as well, like data.py, loss.py, and model.py. The torch model is inside model.py: The data builder (given the config file) is inside data.py: And the beautiful loss function that incorporates the value of is loss.py 4. Results Ok, so if we run main.py, our FFNN gets trained, and we get this. Image made by author As you notice, the model error is not quite 0, but the PDE of the model is much smaller than the data. That means that the model is (naturally) aggressively forcing our predictions to meet the differential equations. This is exactly what we said before: we optimize both in terms of the data that we have and in terms of the Black-Scholes model. We can notice, qualitatively, that there is a great match between the noisy + biased real-world (rather realistic-world lol) dataset and the PINN. Image made by author These are the results when t = 0, and the Stock price changes with the Call Option at a fixed t. Pretty cool, right? But it’s not over! You can explore the results using the code above in two ways: Playing with the multitude of parameters that you have in config.json Seeing the predictions at t>0 Have fun! 5. Conclusions Thank you so much for making it all the way through. Seriously, this was a long one Here’s what you’ve seen in this article: We started with Physics, and how its rules, written as differential equations, are fair, beautiful, and (usually) predictable. We jumped into Finance, and met the Black-Scholes model — a differential equation that aims to price options in a risk-free way. We explored Physics-Informed Neural Networks (PINNs), a type of neural network that doesn’t just fit data but respects the underlying differential equation. We implemented everything in Python, using PyTorch and a clean, modular codebase that lets you tweak parameters, generate synthetic data, and train your own PINNs to solve Black-Scholes. We visualized the results and saw how the network learned to match not only the noisy data but also the behavior expected by the Black-Scholes equation. Now, I know that digesting all of this at once is not easy. In some areas, I was necessarily short, maybe shorter than I needed to be. Nonetheless, if you want to see things in a clearer way, again, give a look at the GitHub folder. Even if you are not into software, there is a clear README.md and a simple example/BlackScholesModel.ipynb that explains the project step by step. 6. About me! Thank you again for your time. It means a lot  My name is Piero Paialunga, and I’m this guy here: I am a Ph.D. candidate at the University of Cincinnati Aerospace Engineering Department. I talk about AI, and Machine Learning in my blog posts and on LinkedIn and here on TDS. If you liked the article and want to know more about machine learning and follow my studies you can: A. Follow me on Linkedin, where I publish all my storiesB. Follow me on GitHub, where you can see all my codeC. Send me an email: piero.paialunga@hotmail.comD. Want to work with me? Check my rates and projects on Upwork! Ciao. P.S. My PhD is ending and I’m considering my next step for my career! If you like how I work and you want to hire me, don’t hesitate to reach out. 
The post When Physics Meets Finance: Using AI to Solve Black-Scholes appeared first on Towards Data Science.
  • (Many) More TDS Contributors Are Now Eligible for Earning Through the Author Payment Program
    A new, more inclusive earnings tier When we launched the TDS Author Payment Program back in February, our goal was clear: it was important for us “to reward the articles that help us reach our business goals in proportion to their impact.” Since the program’s launch, however, we realized that the number of articles that crossed the initial earnings threshold (5,000 engaged views) was smaller than we’d hoped for. That wasn’t ideal. One of the main advantages of being an independent publication—and a data-focused one, at that—is that we are nimble enough to course-correct when we need to, and can make changes quickly to the benefit of our contributors. We’re thrilled to share that we’ve recently introduced a new earnings tier: articles that gain 500 engaged views can now earn a minimum payout of $100. The immediate result, and the one we care about the most, is that the number of eligible articles will increase—drastically. A lower threshold will also lead, in many cases, to a much shorter wait before authors know whether an article will earn, providing an incentive for more frequent contributions. We didn’t want to penalize those authors who took a chance on us during the program’s early days, so we’re applying this inclusive earnings tier retroactively, to all eligible articles published since the program launched on February 19. (We’ve already contacted all authors who published on TDS in February and whose articles have crossed the 500 engaged-view threshold.) All other details concerning the Author Payment Program remain the same, so if you’ve already reviewed and accepted its terms and conditions, there’s no further action you need to take. Stats are live! Earnings are just one measure of an article’s reach and impact. Since launching the new TDS site, our authors’ most-requested feature—by a wide margin!—has been access to their articles’ stats. This was always on our roadmap (as we mentioned earlier, we are a data-focused publication, after all), but the consistent feedback we’ve received from our community made it clear that we needed to prioritize it, so we did. Good news: as of today, all published authors can track their articles’ performance directly from their dashboard on the TDS Contributor Portal — just look for the Analytics tab on the left side of your screen. You’ll be able to see your total views and engaged views (reminder: the latter are views by readers who spend at least 30 seconds on an individual article), as well as the number of total and engaged views during the 30-day earning period following publication. Also visible is the estimated payout for each article given the most current engaged-view count. These stats will give you a solid snapshot of your work’s reach over time, as well as a clear idea of how close you are to crossing each earning tier. Please keep in mind that stats update once a day, so while we understand the impulse to hit the Refresh button every 3 minutes (or seconds…), you can probably find a better use of your time—like brainstorming for your next article! As our publication continues to evolve, our team is hard at work on the next set of features that will improve TDS for readers and authors alike. Stay tuned—and feel free to reach out (at publication@towardsdatascience.com) with any questions, requests, or feedback you’d like to share. The post (Many) More TDS Contributors Are Now Eligible for Earning Through the Author Payment Program appeared first on Towards Data Science.
  • TOWARDSDATASCIENCE.COM
    Load-Testing LLMs Using LLMPerf
    Deploying your Large Language Model (LLM) is not necessarily the final step in productionizing your Generative AI application. An often forgotten, yet crucial part of the MLOPs lifecycle is properly load testing your LLM and ensuring it is ready to withstand your expected production traffic. Load testing at a high level is the practice of testing your application or in this case your model with the traffic it would be expecting in a production environment to ensure that it’s performant. In the past we’ve discussed load testing traditional ML models using open source Python tools such as Locust. Locust helps capture general performance metrics such as requests per second (RPS) and latency percentiles on a per request basis. While this is effective with more traditional APIs and ML models it doesn’t capture the full story for LLMs.  LLMs traditionally have a much lower RPS and higher latency than traditional ML models due to their size and larger compute requirements. In general the RPS metric does not really provide the most accurate picture either as requests can greatly vary depending on the input to the LLM. For instance you might have a query asking to summarize a large chunk of text and another query that might require a one-word response.  This is why tokens are seen as a much more accurate representation of an LLM’s performance. At a high level a token is a chunk of text, whenever an LLM is processing your input it “tokenizes” the input. A token differs depending specifically on the LLM you are using, but you can imagine it for instance as a word, sequence of words, or characters in essence. Image by Author What we’ll do in this article is explore how we can generate token based metrics so we can understand how your LLM is performing from a serving/deployment perspective. After this article you’ll have an idea of how you can set up a load-testing tool specifically to benchmark different LLMs in the case that you are evaluating many models or different deployment configurations or a combination of both. Let’s get hands on! If you are more of a video based learner feel free to follow my corresponding YouTube video down below: NOTE: This article assumes a basic understanding of Python, LLMs, and Amazon Bedrock/SageMaker. If you are new to Amazon Bedrock please refer to my starter guide here. If you want to learn more about SageMaker JumpStart LLM deployments refer to the video here. DISCLAIMER: I am a Machine Learning Architect at AWS and my opinions are my own. Table of Contents LLM Specific Metrics LLMPerf Intro Applying LLMPerf to Amazon Bedrock Additional Resources & Conclusion LLM-Specific Metrics As we briefly discussed in the introduction in regards to LLM hosting, token based metrics generally provide a much better representation of how your LLM is responding to different payload sizes or types of queries (summarization vs QnA).  Traditionally we have always tracked RPS and latency which we will still see here still, but more so at a token level. Here are some of the metrics to be aware of before we get started with load testing: Time to First Token: This is the duration it takes for the first token to generate. This is especially handy when streaming. For instance when using ChatGPT we start processing information when the first piece of text (token) appears. Total Output Tokens Per Second: This is the total number of tokens generated per second, you can think of this as a more granular alternative to the requests per second we traditionally track. 
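To make these two definitions concrete, here is a small illustrative sketch (not part of LLMPerf) of how you could measure them yourself around any streaming client that yields text chunks; the whitespace-based count is only a rough proxy for a real tokenizer.

import time

def measure_stream_metrics(stream):
    """Return time-to-first-token and output tokens/sec for an iterator of text chunks."""
    start = time.perf_counter()
    first_token_time = None
    n_tokens = 0
    for chunk in stream:
        if first_token_time is None:
            first_token_time = time.perf_counter() - start
        n_tokens += max(1, len(chunk.split()))  # rough proxy for token count
    total = time.perf_counter() - start
    return {
        "time_to_first_token_s": first_token_time,
        "output_tokens_per_s": n_tokens / total if total > 0 else 0.0,
    }

Tools like LLMPerf compute these metrics (and more, such as inter-token latency) for you across many concurrent requests; the sketch is just to show what the numbers mean.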
These are the major metrics that we’ll focus on, and there’s a few others such as inter-token latency that will also be displayed as part of the load tests. Keep in mind the parameters that also influence these metrics include the expected input and output token size. We specifically play with these parameters to get an accurate understanding of how our LLM performs in response to different generation tasks.  Now let’s take a look at a tool that enables us to toggle these parameters and display the relevant metrics we need. LLMPerf Intro LLMPerf is built on top of Ray, a popular distributed computing Python framework. LLMPerf specifically leverages Ray to create distributed load tests where we can simulate real-time production level traffic.  Note that any load-testing tool is also only going to be able to generate your expected amount of traffic if the client machine it is on has enough compute power to match your expected load. For instance as you scale the concurrency or throughput expected for your model, you’d also want to scale the client machine(s) where you are running your load test. Now specifically within LLMPerf there’s a few parameters that are exposed that are tailored for LLM load testing as we’ve discussed: Model: This is the model provider and your hosted model that you’re working with. For our use-case it’ll be Amazon Bedrock and Claude 3 Sonnet specifically. LLM API: This is the API format in which the payload should be structured. We use LiteLLM which provides a standardized payload structure across different model providers, thus simplifying the setup process for us especially if we want to test different models hosted on different platforms. Input Tokens: The mean input token length, you can also specify a standard deviation for this number. Output Tokens: The mean output token length, you can also specify a standard deviation for this number. Concurrent Requests: The number of concurrent requests for the load test to simulate. Test Duration: You can control the duration of the test, this parameter is enabled in seconds. LLMPerf specifically exposes all these parameters through their token_benchmark_ray.py script which we configure with our specific values. Let’s take a look now at how we can configure this specifically for Amazon Bedrock. Applying LLMPerf to Amazon Bedrock Setup For this example we’ll be working in a SageMaker Classic Notebook Instance with a conda_python3 kernel and ml.g5.12xlarge instance. Note that you want to select an instance that has enough compute to generate the traffic load that you want to simulate. Ensure that you also have your AWS credentials for LLMPerf to access the hosted model be it on Bedrock or SageMaker. LiteLLM Configuration We first configure our LLM API structure of choice which is LiteLLM in this case. With LiteLLM there’s support across various model providers, in this case we configure the completion API to work with Amazon Bedrock: import os from litellm import completion os.environ["AWS_ACCESS_KEY_ID"] = "Enter your access key ID" os.environ["AWS_SECRET_ACCESS_KEY"] = "Enter your secret access key" os.environ["AWS_REGION_NAME"] = "us-east-1" response = completion( model="anthropic.claude-3-sonnet-20240229-v1:0", messages=[{ "content": "Who is Roger Federer?","role": "user"}] ) output = response.choices[0].message.content print(output) To work with Bedrock we configure the Model ID to point towards Claude 3 Sonnet and pass in our prompt. 
The neat part with LiteLLM is that messages key has a consistent format across model providers. Post-execution here we can focus on configuring LLMPerf for Bedrock specifically. LLMPerf Bedrock Integration To execute a load test with LLMPerf we can simply use the provided token_benchmark_ray.py script and pass in the following parameters that we talked of earlier: Input Tokens Mean & Standard Deviation Output Tokens Mean & Standard Deviation Max number of requests for test Duration of test Concurrent requests In this case we also specify our API format to be LiteLLM and we can execute the load test with a simple shell script like the following: %%sh python llmperf/token_benchmark_ray.py \ --model bedrock/anthropic.claude-3-sonnet-20240229-v1:0 \ --mean-input-tokens 1024 \ --stddev-input-tokens 200 \ --mean-output-tokens 1024 \ --stddev-output-tokens 200 \ --max-num-completed-requests 30 \ --num-concurrent-requests 1 \ --timeout 300 \ --llm-api litellm \ --results-dir bedrock-outputs In this case we keep the concurrency low, but feel free to toggle this number depending on what you’re expecting in production. Our test will run for 300 seconds and post duration you should see an output directory with two files representing statistics for each inference and also the mean metrics across all requests in the duration of the test. We can make this look a little neater by parsing the summary file with pandas: import json from pathlib import Path import pandas as pd # Load JSON files individual_path = Path("bedrock-outputs/bedrock-anthropic-claude-3-sonnet-20240229-v1-0_1024_1024_individual_responses.json") summary_path = Path("bedrock-outputs/bedrock-anthropic-claude-3-sonnet-20240229-v1-0_1024_1024_summary.json") with open(individual_path, "r") as f: individual_data = json.load(f) with open(summary_path, "r") as f: summary_data = json.load(f) # Print summary metrics df = pd.DataFrame(individual_data) summary_metrics = { "Model": summary_data.get("model"), "Mean Input Tokens": summary_data.get("mean_input_tokens"), "Stddev Input Tokens": summary_data.get("stddev_input_tokens"), "Mean Output Tokens": summary_data.get("mean_output_tokens"), "Stddev Output Tokens": summary_data.get("stddev_output_tokens"), "Mean TTFT (s)": summary_data.get("results_ttft_s_mean"), "Mean Inter-token Latency (s)": summary_data.get("results_inter_token_latency_s_mean"), "Mean Output Throughput (tokens/s)": summary_data.get("results_mean_output_throughput_token_per_s"), "Completed Requests": summary_data.get("results_num_completed_requests"), "Error Rate": summary_data.get("results_error_rate") } print("Claude 3 Sonnet - Performance Summary:\n") for k, v in summary_metrics.items(): print(f"{k}: {v}") The final load test results will look something like the following: Screenshot by Author As we can see we see the input parameters that we configured, and then the corresponding results with time to first token(s) and throughput in regards to mean output tokens per second. In a real-world use case you might use LLMPerf across many different model providers and run tests across these platforms. With this tool you can use it holistically to identify the right model and deployment stack for your use-case when used at scale. Additional Resources & Conclusion The entire code for the sample can be found at this associated Github repository. If you also want to work with SageMaker endpoints you can find a Llama JumpStart deployment load testing sample here. 
All in all load testing and evaluation are both crucial to ensuring that your LLM is performant against your expected traffic before pushing to production. In future articles we’ll cover not just the evaluation portion, but how we can create a holistic test with both components. As always thank you for reading and feel free to leave any feedback and connect with me on Linkedln and X. The post Load-Testing LLMs Using LLMPerf appeared first on Towards Data Science.
  • TOWARDSDATASCIENCE.COM
    How to Use Gyroscope in Presentations, or Why Take a JoyCon to DPG2025
    Image by author This article explores how browser-based computational notebooks — particularly the WLJS Notebook — can transform static slides into dynamic, real-time experiences. This approach isn’t limited to presentations; you can prepare interactive lecture notes for students or colleagues and publish it on web. For data scientists, physicists, it highlights new ways to communicate models, simulations, and visualizations, making complex ideas more intuitive and engaging. Is a PDF Enough? Animations, bells and whistles, especially the kind that were popular in PowerPoint 15–20 years ago, have largely taken a backseat. Add to this the compatibility issues between LibreOffice and MS Office (even between versions for Windows and Mac), the presence or absence of necessary fonts — and the desire to do something unusual on the “stage” fades away quickly. Have a look at modern technical presentations: quite often, it’s just a PDF document consisting of pages with vector and raster graphics, and sometimes GIF animations that eat up megabytes (like this post), with no mercy. Unused Potential It’s worth separating decorative bells and whistles from those that carry additional information in some media format. For example, take a look at ECMA-363 [1] specification. Convert MATLAB Figure to 3D PDF / Image by Ioannis F. Filippidis (fig2u3d manual), BSD-2-Clause A 3D model inside a PDF document simply enhances the user/viewer experience. You observe the object from different angles/cross-sections. It’s disappointing that such a feature is almost nowhere supported except for Adobe Acrobat and likely will not be. It feels like we made a leap in the past, but now we have returned to static slides. Large Scientific Conference DPG DPG-Frühjahrstagung  is a large European physics conference organized by the German Physical Society (DPG) [2]. Every year, they gather more than 10^4 scientists and take place in German cities, covering a vast array of physics fields. Deutsche Physikalische Gesellschaft / Image by Wikimedia, PD-textlogo DPG2025 (Spring Meeting) took place in the wonderful city of Regensburg / Photo by Tobi &Chris, Pexels License There are so many presentations, and it lasts almost a week, so by the end, it becomes too overwhelming. Nevertheless, this does not diminish its value as a platform for networking, practicing presentations, and a reliable way to learn what is currently on the market, which trains have gone already, and which are just departing. The participants in plenary sessions are mostly Master’s students and PhD students, with postdocs being rarer. Such a large and accessible platform is an excellent motivation to try something new even if something may go wrong. What is a JoyCon? Surely, the reader has seen devices like this: An average PPT clicker device / AI Generated “PPT Clicker” image using Dalle 3 by OpenAI This device acts as a slide switcher and sometimes as a laser pointer, connecting via Bluetooth or through a dongle. In any case, it is a type of controller with buttons. 
Controllers can be more interesting — like the one from the 2017 Nintendo Switch handheld console JoyCon (R) / Image adapted Wikimedia, PD It is not much bigger, but it has some additional cool features: Analog stick 11 buttons IR camera (difficult to use, no good API documentation) Full IMU (Inertial Measurement Unit) aka gyroscope with an accelerometer Bluetooth connectivity; recognized as a regular HID The buttons can indeed be mapped to PowerPoint, or the stick can be used to control slides, emulating mouse or keyboard clicks, as it was implemented in these projects: Hackster: Right Joy-con Controller as a Remote (Python) [3] Medium: Nintendo Switch Joy-Con Presentation Remote (USB Override MacOS) [4] I thought it would be cool to somehow use the IMU and analog stick. But for that, one would need to go beyond PowerPoint and PDF Moving Slides to Browser Environment The idea behind this is not new, but it’s important to remember that this approach may not work for everyone. However, by moving the presentation display and creation to the browser (particularly Javascript and HTML lands), we automatically gain access to all the possibilities of modern web technology: peripheral device support, JavaScript, CSS animation magic, and much more, including video. It’s important to note that all of this is cross-platform by default and will work almost everywhere. For example, slides can be created in Markdown (or/and HTML) with the help of a simple framework (rather, a small library)  RevealJS [5] There is also MDX-based presentation engines, and things like Manim [6], Motion Canvas [7], but these guys require even more skills to master. The RevealJS API is quite simple, so controlling the slides via JavaScript commands is easy to implement: setTimeout(() =>{ Reveal.navigateNext(1); }, 1000) However, this direct approach has significant drawbacks. It requires an internet connection, and if you’d prefer to avoid it, you’ll need to use bundlers (such as Rollup) and embed all JavaScript libraries into a single HTML file, for instance. Alternatively, you could run a local web server. Option with Jupyter Notebook If you like Python and IPYNB, then use nbconvert — it will convert your notebook directly into a RevealJS presentation, and you won’t even notice it! Or use the extension for Jupyter—RISE [8] Create Presentation from Jupyter Notebook / Image by author In any case, the idea remains simple — we need to somehow enter the web browser environment to take advantage of all the possibilities of JoyCon. Try it on Binder! Option with WLJS Notebook My opinion on WLJS [9] might be somewhat biased as I am one of its developers (and active users). This open-source IDE with a notebook interface is more tightly integrated with the web environment, as slides are not exported there but are instead executed and are just another type of output cell, alongside the familiar Markdown. WLJS Notebook / Image by author Under the hood, it also uses RevealJS but with a few extra features: It works offline It allows embedding interactive elements and components, similar to LaTeX Beamer It is integrated with Wolfram Language (freeware distribution) See more about it in this story [10]. An ultimate guide on how to make presentation there we published in our official blog: Dynamic Presentation, or How to Code a Slide with Markdown and WL [11]. Let’s Dive into JoyCon So, the easiest option is to use the already ready-made library joy-con-webhid [12]. 
Why spend time reinventing the wheel when people have already done a great job for us? npm install joy-con-webhid --prefix . All subsequent examples will be taken from the WLJS Notebook. However, you can do pretty much the same thing using Python + FAST API to interface with JavaScript or something similar, or even just use JS alone. The online version of the notebook is available here [13]. First, let’s listen to what’s coming from the controller port. Code .esm import { connectJoyCon, connectedJoyCons } from 'joy-con-webhid'; // Create connect button const connectButton = document.createElement('button'); connectButton.className = 'relative cursor-pointer rounded-md h-6 pl-3 pr-2 text-left text-gray-500 focus:outline-none ring-1 sm:text-xs sm:leading-6 bg-gray-100'; connectButton.innerText = "Connect"; let connectionState = "Connect"; let isJoyConConnected = false; let lastUpdateTime = performance.now(); let isAllowedToConnect = false; // main handler function (warning! called at 60FPS) function handleJoyConInput(detail) { const currentTime = performance.now(); if (currentTime - lastUpdateTime > 50) { // slow down lastUpdateTime = currentTime; console.log(detail); } } // JoyCon periodically goes to sleep, we need to wake it up const connectionCheckInterval = setInterval(async () => { if (!isAllowedToConnect) return; const connectedDevices = connectedJoyCons.values(); isJoyConConnected = false; for (const joyCon of connectedDevices) { isJoyConConnected = true; if (joyCon.eventListenerAttached) continue; await joyCon.open(); await joyCon.enableStandardFullMode(); await joyCon.enableIMUMode(); await joyCon.enableVibration(); await joyCon.rumble(600, 600, 0.5); joyCon.addEventListener('hidinput', ({ detail }) => handleJoyConInput(detail)); joyCon.eventListenerAttached = true; } updateConnectionState(); }, 2000); // Update button state function updateConnectionState() { if (isJoyConConnected && connectionState !== "Connected") { connectionState = "Connected"; connectButton.innerText = connectionState; connectButton.style.background = '#d8ffd8'; } else if (!isJoyConConnected && connectionState !== "Connect") { connectionState = "Connect"; connectButton.innerText = connectionState; connectButton.style.background = ''; } } // Handle click event connectButton.addEventListener('click', async () => { isAllowedToConnect = true; if (!isJoyConConnected) { await connectJoyCon(); } }); // Just decorations const container = document.createElement('div'); container.innerHTML = `<small>Presenter controller</small>`; container.appendChild(connectButton); container.className = 'flex flex-col gap-y-2 bg-white rounded-md shadow-md'; // Return DOM element to the page this.return(container); // When a cell got removed this.ondestroy(() => { cancelInterval(connectionCheckInterval); }); The most important function here is: function handleJoyConInput(detail) { const currentTime = performance.now(); if (currentTime - lastUpdateTime > 50) { // slow down lastUpdateTime = currentTime; console.log(detail); //output to the console } } It looks like there are many steps to do. In reality, most of this code deals with connecting the controller and drawing a large “Connect” button. Don’t pay too much attention to the special methods — they can easily be replaced with those available in your specific environment: this.return(dom) passes a DOMElement for embedding on the page this.ondestroy(function) calls function when the cell is deleted, to clean up timers, etc. 
The first line .esm is a way to specify the JavaScript cell subtype in WLJS Notebook, which requires pre-bundling. When we run this code cell, we will see the following: DOM Output Element / Image by author Then follow these steps: Disconnect the controller from the Nintendo Switch (System → Controllers → Disconnect). Pair the JoyCon (R) with the PC by holding the small button on the side. Press “Connect” on our presenter controller. Opening the browser console, we reveal the following messages: { "buttonStatus": { "y": false, "x": false, "b": false, "a": false, "r": false, "zr": false, "sr": false, "sl": false, "plus": false, "rightStick": false, "home": false, }, "analogStickRight": { "horizontal": "0.1", "vertical": "0.3" }, "actualAccelerometer": { "x": 0, "y": 0, "z": 0 }, "actualGyroscope": { "dps": { "x": 0, "y": 0, "z": 0 }, "rps": { "x": 0, "y": 0, "z": 0 } } } Quite a lot of data! Let’s try using this for the benefit of our presentation Buttons To begin, we can use two buttons to switch slides Image by author In the WLJS notebook, slides can also be controlled programmatically through a Wolfram wrapper function that calls the RevealJS API. FrontSlidesSelected["navigateNext", 1] // FrontSubmit All that’s left is to trigger this function at the right moment when the button (or switch) is clicked. To do this, events need to be sent from the Javascript world to the Wolfram machine, where we can then do whatever we want with them. This results in the following diagram: Image by author You don’t have to think about this, since it is seamlessly implemented via APIs Let’s go back to the code cell and modify the handler. Code //.... //....... const buttonStates = { //all buttons states on JoyCon (R) a: false, b: false, home: false, plus: false, r: false, sl: false, sr: false, x: false, y: false, zr: false }; const joystickPosition = [0.0, 0.0]; let restingJoystick = [0.0, 0.0]; let isCalibrated = false; function handleJoyConInput(detail) { if (!isCalibrated) { //calibration restingJoystick = [Number(detail.analogStickRight.horizontal), Number(detail.analogStickRight.vertical)]; isCalibrated = true; return; } const currentTime = performance.now(); if (currentTime - lastUpdateTime > 50) { lastUpdateTime = currentTime; let buttonPressed = false; let joystickMoved = false; for (const key of Object.keys(buttonStates)) { if (!buttonStates[key] && detail.buttonStatus[key]) buttonPressed = true; buttonStates[key] = detail.buttonStatus[key]; } const verticalOffset = Number(detail.analogStickRight.vertical) - restingJoystick[1]; const horizontalOffset = Number(detail.analogStickRight.horizontal) - restingJoystick[0]; if (Math.abs(verticalOffset) > 0.1 || Math.abs(horizontalOffset) > 0.1) { joystickMoved = true; } joystickPosition[0] = horizontalOffset; joystickPosition[1] = -verticalOffset; if (buttonPressed) { for (const key of Object.keys(buttonStates)) { if (buttonStates[key]) { server.kernel.io.fire('JoyCon', true, key); break; } } } if (joystickMoved) { server.kernel.io.fire('JoyCon', joystickPosition, 'Stick'); } } } //....... //.. As you can see, we have added several items here: Joystick calibration — analog sticks drift, so their digital position is never perfect 0.,0.. State of all buttons — why hammer the door every time if you only need to gently knock when the state changes? This reduces system stress. Sending states to the event pool — this is specific to WLJS, where we send data to the Wolfram machine (or Python if you’re in Jupyter). 
The last point looks like this (replace with the equivalent in your environment): server.kernel.io.fire(String name, Object state, String pattern); Then, on the Wolfram side, we can easily subscribe to these events like this EventHandler["name", { "pattern" -> Function[state, Print[state]; ] }] This is very convenient, as Javascript sends the names of the pressed buttons as the pattern. In this case, you can immediately subscribe to slide switching, for example, like this: ZR — next slide Y — back Thus, programmatically controlling slides becomes intuitive: EventHandler["JoyCon", { "zr" -> (FrontSubmit[FrontSlidesSelected["navigateNext", 1]]&), "y" -> (FrontSubmit[FrontSlidesSelected["navigatePrev", 1]]&) }]; Let’s Test in Practice Let’s create a simple presentation. Start typing with .slide # Slide 1 __Hey Medium!__ --- ![](https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png) Now, let’s connect the JoyCon to the PC and link it to our Javascript script by pressing the Connect button. Then, subscribe to the events once in the active session. Now, just run the cell with the slides: The first big step in mastering JoyCon has been made! / Image by author Analog Stick  The stick theoretically allows controlling two sliders simultaneously. For the DPG Spring Meetings, I had the idea of a live demonstration of a very peculiar effect m𝒶𝑔𝒾c𝑎𝓁 𝓌𝑜𝓇𝒹𝓈 𝒻𝓇𝑜𝓂 𝓅𝒽𝓎𝓈𝒾𝒸𝓈. I believe that some concepts are much more impactful and comprehensible when demonstrated live on stage. Here is a condensed code snippet for the interactive widget: FaradayWidget := ManipulatePlot[ Abs[(E^(I w (-1 + Sqrt[1 + (f/((-I g - w) w + (d - w0)^2))])) + E^(I w (-1 + Sqrt[1 + (f/((-I g - w) w + (d + w0)^2))]))) /. {g -> 0.694, w0 -> 50.0}] , {w, 20, 80}, {{f,2},0,100,1}, {{d,0},0,10,1} , FrameLabel->{"wavenumber", "transmission"} , Frame->True ]; FaradayWidget The interactive online version of this widget is available here [14] Image by author To embed it in a slide, insert its symbol as a tag (similar to JSX): .slide # Faraday Widget Here it is in action <FaradayWidget/> Now, let’s link it to our stick Image by author To begin with, let’s perform a simple test and bind its position to a disk on the screen: pos = {0.,0.}; EventHandler["JoyCon", {"Stick" -> ((pos = #)&)}]; Graphics[{ Circle[{0,0}, 2.], Disk[pos // Offload, 0.1] }] Image by author Obviously, the movements are too abrupt. Moreover, making small adjustments is kinda painful using JoyCon. The solution? Integration! EventHandler["JoyCon", {"Stick" -> ((pos += 0.1 #)&)}]; Image by author Now, let’s link the pos variable to the sliders of our widget: FaradayWidget := ManipulatePlot[ Abs[(E^(I w (-1 + Sqrt[1 + (f/((-I g - w) w + (d - w0)^2))])) + E^(I w (-1 + Sqrt[1 + (f/((-I g - w) w + (d + w0)^2))]))) /. {g -> 0.694, w0 -> 50.0}] , {w, 20, 80}, {{f,2},0,100,1}, {{d,0},0,10,1} , FrameLabel->{"wavenumber", "transmission"} , Frame->True , "TrackedExpression" -> Offload[5 pos] (* <-- *) ]; Here’s how it looks live on a slide: Image by author And in the actual DPG presentation: Image by author A Moment of Rest Last year, DPG took place in Berlin, and this year — in Regensburg, which has about 23 times fewer population and is 10 times smaller in area. However, the cozy lands of Bavaria have always been closer to my Image by author And this is the university. A solid 60s-style building. Wha Image by author A new invention — a cup “Drink and Eat Me” Image by author As a bonus, every drink gets a hint of waffle flavor! 
But, watch out — don’t bite into it while it’s filled with hot tea. I couldn’t take more photos since I got sick on the first day and went back home to Augsburg. In general, spending six days at a conference is quite challenging. Image by author Back to business IMU or Gyroscope-Accelerometer Combination To use them, we need to read the corresponding fields from the details object, namely: actualAccelerometer: x, y, z actualGyroscope: rps (radians per second) Code //.. //.... const buttonStates = { a: false, b: false, home: false, plus: false, r: false, sl: false, sr: false, x: false, y: false, zr: false }; const joystickPosition = [0.0, 0.0]; let restingJoystick = [0.0, 0.0]; let isCalibrated = false; let imuEnabled = false; // Enable IMU mode if allowed core.JoyConIMU = async (args, env) => { imuEnabled = await interpretate(args[0], env); }; // Function to handle Joy-Con input function handleJoyConInput(detail) { if (!isCalibrated) { restingJoystick = [Number(detail.analogStickRight.horizontal), Number(detail.analogStickRight.vertical)]; isCalibrated = true; return; } const currentTime = performance.now(); if (currentTime - lastUpdateTime > 50) { // Update every 50ms lastUpdateTime = currentTime; let buttonPressed = false; let joystickMoved = false; for (const key of Object.keys(buttonStates)) { if (!buttonStates[key] && detail.buttonStatus[key]) buttonPressed = true; buttonStates[key] = detail.buttonStatus[key]; } const verticalOffset = Number(detail.analogStickRight.vertical) - restingJoystick[1]; const horizontalOffset = Number(detail.analogStickRight.horizontal) - restingJoystick[0]; if (Math.abs(verticalOffset) > 0.1 || Math.abs(horizontalOffset) > 0.1) { joystickMoved = true; } joystickPosition[0] = horizontalOffset; joystickPosition[1] = -verticalOffset; if (imuEnabled) { server.kernel.io.fire('JoyCon', { 'Accelerometer': Object.values(detail.actualAccelerometer), 'Gyroscope': Object.values(detail.actualGyroscope.dps) }, 'IMU'); } if (buttonPressed) { for (const key of Object.keys(buttonStates)) { if (buttonStates[key]) { server.kernel.io.fire('JoyCon', true, key); break; } } } if (joystickMoved) { server.kernel.io.fire('JoyCon', joystickPosition, 'Stick'); } } } //.... //.. Since IMU is not always needed, the script includes a boolean variable and a control function JoyConIMU[True | False], allowing IMU measurements to be enabled or disabled. The JoyCon, like most other devices with IMU (some smartphones, watches, but definitely not VR headsets or quadcopters), includes: 3-axis gyroscope — returns angular velocity in rad/sec around all three axes 3-axis accelerometer — returns a single acceleration vector Question: Why can’t we use only a gyroscope or an accelerometer?Let’s try outputting both. First, enable IMU usage: JoyConIMU[True] // FrontSubmit; Now, define auxiliary functions and variables: prevTime = AbsoluteTime[]; angles = {0,0,0}; acceleration = {0,0,-1}; process[imu_] := With[{time = AbsoluteTime[]}, With[{dt = time - prevTime}, angles = (angles + {-1,1,1} imu["Gyroscope"][[{3,1,2}]] dt); acceleration = imu["Accelerometer"]; prevTime = time; ] ] What happens here: The accelerometer vector is simply stored in acceleration. Gyroscope data is processed by: Reordering angular velocity values (JoyCon hardware orientation) and adjusting directions. Integrating over time to obtain orientation angles As a result, we obtain: Three angles defining JoyCon orientation angles. One acceleration vector (at rest — the gravity direction) acceleration. 
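For readers who work in Python rather than Wolfram Language, here is a rough equivalent of the bookkeeping just described. The axis reordering and sign flip are taken from the article's convention and should be treated as assumptions, not universal constants.

import time
import numpy as np

class ImuTracker:
    """Python mirror of the process[] function above (illustrative only)."""
    def __init__(self):
        self.prev_time = time.time()
        self.angles = np.zeros(3)                       # integrated orientation angles (rad)
        self.acceleration = np.array([0.0, 0.0, -1.0])  # last accelerometer reading

    def process(self, imu):
        # imu is assumed to look like the event payload sent earlier:
        # {"Gyroscope": (x, y, z) angular velocity, "Accelerometer": (x, y, z)}
        now = time.time()
        dt = now - self.prev_time
        gx, gy, gz = imu["Gyroscope"]
        # reorder to (z, x, y), flip the first component (hardware orientation),
        # then integrate over the elapsed time
        self.angles = self.angles + np.array([-gz, gx, gy]) * dt
        self.acceleration = np.array(imu["Accelerometer"])
        self.prev_time = now

tracker = ImuTracker()
tracker.process({"Gyroscope": (0.0, 0.0, 0.1), "Accelerometer": (0.0, 0.0, -1.0)})
print(tracker.angles, tracker.acceleration)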
These three angles are conveniently expressed as a matrix (tensor): RollPitchYawMatrix[{\[Alpha], \[Beta], \[Gamma]}] // MatrixForm Applying this matrix to any 3D object allows it to be oriented according to these angles. Physically, on the JoyCon, it looks like this: Image by author It is important to note that since we measure only the first derivative (using Gyro), then the initial IMU orientation remains unknown. Therefore, we manually set the initial state, i.e. angles = {0., 0., 0} EventHandler["JoyCon", { "IMU" -> Function[val, process[val]; ] }]; angles = {0,0,0}; (* calibration *) Refresh[acceleration, 0.25] (* dynamically update *) Refresh[angles, 0.25] (* dynamically update *) Real-time Data Output: Image by author Well… Not quite obvious what these values mean. Let’s try to draw then as vectors in 3D space: axis = Table[{{0.,0.,0.}, Table[1.0 KroneckerDelta[i, j], {i,3}]}, {j,3}]; EventHandler["JoyCon", { "IMU" -> Function[val, process[val]; axis[[1]] = {{0.,0.,0.}, RollPitchYawMatrix[angles].{0,1.0,0.0}}; axis[[2]] = {{0.,0.,0.}, RollPitchYawMatrix[angles].{-1.0,0.0,0}}; axis[[3]] = {{0.,0.,0.}, -Normalize[acceleration][[{2,1,3}]]}; axis = axis; ] }]; And then render them as colored cones, where: Blue and red — defines angles derived from the gyroscope data Green — accelerometer data (inverted and normalized) { {Opacity[0.2], Sphere[]}, Red, Tube[axis[[1]]//Offload, {0.2, 0.01}], Blue, Tube[axis[[2]]//Offload, {0.2, 0.01}], Green, Tube[axis[[3]]//Offload, {0.2, 0.01}] } // Graphics3D EventHandler[InputButton["Reset"], Function[Null, angles *= .0]] Image by author The green vector is always aligned “correctly,” while the blue and red vectors, representing gyroscope angles, accumulate errors over time, especially with rapid movements, causing drift. There are many ways to solve this issue. The general idea is to adjust the angles using accelerometer data (green vector), as the accelerometer precisely determines the downward direction ( until an external force disturbs it). For a more detailed explanation, check out a great video by James Lambert [15], which explores these problems and their solutions, including a detailed example with Oculus DK1. Why the heck do we need this in the presentation? I asked myself this question when I discovered how deep the rabbit hole goes. In my talk at the magnetism session, there was exactly one slide where the idea of an IMU made any sense: Image by author Do you see the crystalline structure? Finding a “good” camera angle for it is indeed difficult, so why not rotate it directly, lively? We don’t need all three angles and acceleration-just one will suffice. Thus, we discard the accelerometer and keep only the gyroscope: FrontSubmit[JoyConIMU[True]]; timestamp = AbsoluteTime[]; angle = 0.; rotation = RotationMatrix[angle, {0,0,1.0}]; EventHandler["JoyCon", { "IMU" -> Function[val, With[{angularSpeed = val["Gyroscope"][[1]], time = AbsoluteTime[], oldAngle = angle}, angle += (time - timestamp) angularSpeed; timestamp = time; ]; rotation = RotationMatrix[angle, {0,0,1.0}]; ] }]; Now, we just need to apply the rotation transformation tensor to our 3D structure. 
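As a side note before wiring the rotation into the crystal: if you do need drift-free angles, the usual lightweight fix is a complementary filter, which lets the gyroscope dominate short-term motion while the accelerometer's gravity estimate slowly pulls the angles back. A minimal numpy sketch of the idea; the 0.98 blend factor, the axis conventions, and the assumption that the accelerometer reads +1 g on z at rest are all illustrative choices, not the JoyCon's actual calibration.

import numpy as np

ALPHA = 0.98  # trust the gyroscope short-term, the accelerometer long-term

def complementary_filter(roll, pitch, gyro_rps, accel, dt, alpha=ALPHA):
    # propagate the previous orientation by integrating angular velocity (rad/s)
    roll_gyro = roll + gyro_rps[0] * dt
    pitch_gyro = pitch + gyro_rps[1] * dt
    # estimate roll/pitch from the gravity direction seen by the accelerometer
    ax, ay, az = accel
    roll_acc = np.arctan2(ay, az)
    pitch_acc = np.arctan2(-ax, np.sqrt(ay**2 + az**2))
    # blend: gyro handles fast motion, accelerometer corrects slow drift
    roll = alpha * roll_gyro + (1 - alpha) * roll_acc
    pitch = alpha * pitch_gyro + (1 - alpha) * pitch_acc
    return roll, pitch

# toy usage: a resting controller sampled every 50 ms (the handler's throttle)
roll, pitch = 0.0, 0.0
for _ in range(100):
    roll, pitch = complementary_filter(roll, pitch,
                                       gyro_rps=(0.001, -0.002, 0.0),  # small drift
                                       accel=(0.0, 0.0, 1.0),          # gravity only
                                       dt=0.05)
print(roll, pitch)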
Since this crystal contains many ions, which are also colored, I compressed them into base64 base64 code CrystalRawStructure = "1:eJzVWnlMFFccnl1YLIoXh2gVq2KMqbGRqvEMswoVW1EBUWukxRV2ZevC4lsQtZUSL9Bg6oGRo/GCVLBGStRKFZlRUePZKBW1GsEjrYgVNSiXYudg3rAzuzsPZAd5fwwzyze/73fPmzdv8GJjiE6JYZjJmTqEAk2MyaCJ0+oc6J8cqUOg3hSnU9BXXanDFACMCVFaTaRJ34+6tARj5ETpI5bGaE0mvTuNUvCoGK2ut9k9ZhL01F+MOQCMGRW4OQDs/1fvl3bPgwRZ0XPzD6mdSXRg48W1Ne63vdUCINY8+BPLQGndoIBMhrKfGh1od91a77d0FmjVCDHQihHIkUCW2Hr/SUXEhtVMcXhShwCtMVobB/QRTJXojCBaE6c3xuhcMIs1EGKMXxLF1AAtiP2dhoZodQZtRJx+uT5uJcjMoEclztw/K95gsFRR5lddqMPsWE0Ef/sfuG60GU6JiSrsI5o5YKqf0WAEYPrdiEo/zwocZGiyD5YHluEgYcl/v3ebWd1srRN1mBMbpQWCPgBSysflnx3+lgAZnJuCQjbsvKLvKfSXPcjXMeTdSfCOGUQR2NNDNW3Z4EqCJ1e8J7k99HY5rzh3hnba08vY/N9op2km144YknsL/7D1Plt1Pfsc6++er46uulgMTvv6ee6McG9FsEmCHtU4+JgZT3AQzoxyKfLYqQUTPrv2lABT1uScv1xTh4N5lxRjfQYOUIPRzHhoR/LndT0Sh197Q+dVEG25LziCvUkav0kpB3lh8IAqo78rCb7MzA/YnFpPyEkew7m960zD6u8d3NRyur0s5HJAWiiVcNVcochILnQ73ga3s2XxHAeHAu6vcj90E+cbtG3y3b2O92pwpsjz8uqvbqitxcFtt7WBDyb2VYO5zLhhR/L4WX73ir5zIUEV5/YV3m/Dxya/wnlyqf4kRW4PvScWJu64lvOMAMlJ7ocn7nLtLHrDYL8Ytj96c20tIWewjygGB9/MaYKtrVhO8vFcxLb6FPxUtrEB7yQRW3PAJ9U70okEHlHdMh0BFbHQw70jbzS4yOE0SL5t+/WUkZFUS5aRPGlhXd/KArLFhG/62SrnvJgXRIeQzwmZVor/OkgWyxm394GzzckWLP8Qc/Xi0H0NjgtUJPAy7S75umcjLmfExOQyRkxYpbichYI1D/5k0LMvTvqPfC2H28Xk47vfCTO4yeL2XO7VOZN7vZfRcjG5jJZDb8PVhY6IOSSX0fJfBG6X1XIxeUfEHJ483hW5KTmra8eUmozk4oSTkVwccxnJxQs4dRmNd48DTxlnYDg/CZLxqXaJeZ57tFhyk5FcOJkgOnImQ8g5k4HkpVtGJt76wVXWWbeYXEbLxWuOzyeV792pe9k+b4iOZrdaWU6/gLDmzwB60gCgiY3SR5j8jNGxBu0KAYO5cXVP5l1X4tUE6MPNjitqc7JOJZQS1oAw+awAMYxM1mwv9qVORrELleoz9/2DapuE3zdYie9wvotDiRKx1NNe4r8vWTLQzPvzmsMkPrHwedCkogVpwBKtzkkcGvYqRmuuop4+0dM/2UTRV4zuNlGMcY5SKAckFCPLAUkvBqVCymMqVE+DCweur7rP5XElzmdukNGwcgmVj+ZfA6GHWjjAOhpah4SG+qPLVkL3oX5NYn1GOyhUH6012bUe3xHNT7c/J1srs8dsPeLgE4nChcD+koU7KIUq3GKauobpc7BwO1098hktWUNo9ShZ22gdoLW1bb96bF3N2LN6HaEmSnE9Wv/A+h4FaJ5H5oT3bqx7ffRgI/f9liwCZG5T3oLjz4SPLxZYhQP/qKCBm+jvI1aAFRwwgAUS1oA/n3iwb4NrI849OYvBorTKtIL0WmGl9lKt/zbVtYoAJ4YtruxCr/1KAePyjnyloJefIPADKWkLGxVsFUV7PmLR2oNksaIVvgqJkdFLhSRLUi8nJL0Y7Z2QZEl6VQVlOUtkC9Y8Wt+2+Id36yYGjghoGCP0JqdClQ2jgYRWcp5E0gT6HV0TFZQttazA7QuxU7t9xDZHAszgumih5eYIgbO5LmoFWMY2cOa1Qo1hFUXWgC5sc8SBiW2OVrsoBB5j2y1uDbiPbeAEN4Py7cTtthWNFK19oLXu9pyzoTVSSe0dkGQJattWu5W00QldFqOX/dotzIIWbrKOhjPlFiZKoFWosmHHQpINo4GE5me3KJpAvyPJhk8IVrbU0hK3nc3SvmV6szObA+bbnsEYZjNQaYu9e+IGg0H/KaAYmDK2rgS9cH3uoLQN3dUAu/PX6ayDSQRQ7EkIWj6qNykJvGrw9/J760k2Z+Ep4WulLdr0T/ttW/zoLQ7ipvadsTDYmQSLkpXZhVtqhC1YDAzMP61+svFvnKeV7L5tdc70rX4eOxibU+I9aw5QNif18A5KCD8p1FIMnKH0KZ70TT3eFuf0cn2Yu4i2uWTvmZdhtM1X3audUx2chTERA8fML7m2bZabMCYW8+shDpcxGZHircG8C9lur4Q5DPfwW3iFlyvJGTIVJ5T/Z2j8Yq18Od9mLTqyBNqstDjRHXaM8xpS/o90RSRWDx0x4ZWiPVwnznvF0K27wvsPkC4QFtiHFLgO3hGQmL1tz2kPEhzbsnTz57HUhNk9P/DFxXov4bb/KVWrS2aGqdRgWdOVA54X6gkw78eDYekZ3cj/AdfNQvU=" // Uncompress ; CrystalStructure = Graphics3D[ GeometricTransformation[CrystalRawStructure, rotation // Offload] , ViewPoint->3.5{1.0,0.5,0.5} , ImageSize->{550,600} ] Let’s embed it into our slide: .slide # Slide Here is my crystal structure! <CrystalStructure/> Image by author To make it even more convenient, it would be useful to subscribe to IMU only when the slide is active and unsubscribe when leaving it. This is easy to do because RevealJS signals the core on slide state change events. 
Let’s implement this as a component using .wlx cell type: .wlx InteractiveCrystalStructure := Module[{rotation = RotationMatrix[1Degree, {0,0,1.0}], id = CreateUUID[], timestamp = AbsoluteTime[], angle = 0., CrystalStructure}, CrystalStructure = Graphics3D[ GeometricTransformation[CrystalRawStructure, rotation // Offload] , ViewPoint->3.5{1.0,0.5,0.5} , ImageSize->{550,600} ]; EventHandler[id, { "Slide" -> Function[Null, FrontSubmit[JoyConIMU[True]]; EventHandler["JoyCon", { "IMU" -> Function[val, With[{angularSpeed = val["Gyroscope"][[1]], time = AbsoluteTime[], oldAngle = angle}, angle += (time - timestamp) angularSpeed; timestamp = time; ]; rotation = RotationMatrix[angle, {0,0,1.0}]; ] }]; ], ("Destroy" | "Left") -> Function[Null, FrontSubmit[JoyConIMU[False]]; ] }]; <div> <CrystalStructure/> <SlideEventListener Id={id}/> </div> ] By placing this code on any slide, we achieve the desired result without polluting the global space or interfering with other event handlers. Thus, IMU subscription management is localized to the specific slide, and when switching slides, we correctly enable and disable data handling without affecting other slides and their processing .slide # Before --- # Slide Here is my crystal structure! <InteractiveCrystalStructure/> --- # After These are actual slides from DPG2025: Image by author Short video with my DPG2025 slides Final Code and Notebook The compiled presenter controller cell code is provided under the spoiler. If inserted into an empty cell, it will produce a functional widget for connecting a JoyCon. Compressed cells jsfc4uri1%3AeJztfWtz28aWoGtmdqdq99NU7Q9oa3ZiUiJpAqRkmYqccWTnxlNx4vXjfvGqHJBoSkhAgAJAiaSj%2FRf7af%2FsnnO6G%2BhuNEjKj3vvVN1UTAHd531Ovx%2B4P05fT%2F%2Fh3r17%2BT%2FBz09RXkz%2FEd%2F%2BO%2Fw8zfN0EgVFlCYVyOtFzN%2Fgw7OgCN78v3%2B5d6%2FH89n%2F7nt%2BNJunWcE%2BskmaJHxS%2FEe6OkuTjnrloUjI2S2bZumMPfgtXXUhs3vDx5dR%2BOAEieC%2Fhw%2FZWcaDgrMgCVlerGLOikuuCIE4bLwoijRBYEjMC5X1PSWzUxamk8WMJ0VvQoSexxzfWg8E3oP2iUStkHqTOMjzn4MZB%2FQHGY9B72vgucjyNOvO0ygpeMaydJGEPOzOQnbZPWLzuDtg86zrs4Ivi27Mp4V4usiCVfew32dTECQfpYsijhLeTdKEsyxKLroey2cjAl3m%2BBjzIMT0Iza%2BENhev%2F%2FAIWYEL9lbQAQx985E1l5pupgXmpneFGhEGw5holw440z5BqCmQZzzEgKsUbybh0DgbUQ2mfNsmmazIJnwXpLetNolU%2BECYVpimQP4R8xhLBgJuh02Lp8u0xkvX%2BYx2Ee9ZOVTHlePZaoguSxzVuXTWsEgyK0lGcRZXkST31%2BleUTRc8re93v9DoOf81LhjANQcvEfEtgNFOVnQRyNs8Blsmi2eJ4E49iVlT%2BN4%2FSGh29TaXIDREa9wGYvXr5jszTkLJqyQKAJZTLeE25DiFMW5KtkwlpBdpF3GE%2Bu2%2Bz0ibK7IUtwE0QgA0bwPOPoIUJ63z8XeCem1UCSHxaJKGhFyi6hFIJUwLkLrIHMfFEg2FTBCAApGea2QmASxe1SmClr3ddNV%2BYwl9l%2FXszGPJNEekESxOnFG8x8HV1cFr3LNIvWaVIEcbvDtsBe8wxeAFK4EBlaLiyyBT%2BphCkWWSJfb5U9GJOVzCLLoBbZWB6Esjpk1y5KT9hhHwzAwMwilXEQcwWps1wJUit9GsWTSi4A5KrkvQJL5lboKQhVAl6m1xaEggJVQGxS83e%2BYumU%2FTL%2BDcK0B295Sy%2Fbbc150rV69ntAOGfffMOkS6q8hchr1%2BQ1XMBYndppI7ES7VZXRuihfP%2FLdJpzLG%2B7xgq4zIrK917FSRCvgnBX8lrYOhjIOqYMUjDry6C47AXjvGUq0oYA6vc89scfrISwpZEwhqfsGDCsbpjPri9BOIC3eZw0gnsI3jWlrmlXVVCGlDnPABGiLkt43IvS3jTKeOuBqFwedHRQxh48nUx4zKEzwaFqezBSMXsdxAuel56YFIsgNkDbHYPMn1ZZmk%2FSOd9MogTrhfO8XVG47bAHUCM%2FaLvNidoaIW8o%2FKkFT6erlRULZrtBMQw6yLp9YiKOoef0u5F2q6ncqKkRZndyrR1HYFUqPrZd5Z%2BSM9Sjr3gWpSFGW7xik0sO7Qjatex5qsYrr%2FUYgc8Zwr%2FAxhGcDpEL0areWrKJNVpW0ZjZ7XlbbzyMVkMJ8YxfRxPqG9k9YhVtZROysXcmYLS4%2BY2AMXRsXpr9XTSNKkA6D%2FJ70B4lBY4FOPQ1nxZFACYFXwJ1qLMWZsMh%2BhYSEcpG0qrcZeRxKu0QqkkYZOEPizh%2BCX2czdBQqrYD%2FZkac3BkE1gGFXPMW0d96M3RT7932DaUkIBBGD7XVW89gHEJ9XcgPFsfZSvEbikcGns97ROLrNOcpvm1YrSgZv%2FM7MRL1aCm8fv9fiV81YcQNQF794KNA2xW4VkbL%2BVIxOi0NbDRo7weMtCs26OL%2B6fV%2BIKHe1rENY9DAO7EgnKMbiz8BgwaHvbGweT3Cxqc4ejtX8Pj6TQ8fqCMyziUHV
Vw76jTbhp9XX0e6FGi13s%2Fiq659P0kxg40RVt93FiPbQKHwHbVca7hihaublNqlhLlz5gIKEun0MIePmLlEsAYOdsweg%2Bj62roLqCFhX98%2B%2FInwPv123wGTcATbGc5jdcRLktjaPi%2FfSjyfrXwgznUWeHZZRSHLcNiNiNjfmAa8yXDn%2B4kjdlFMO%2Buuj4O3W8uIwgMbZogvwzC9AaeqvmN4jLKe6K1aJX022Z2Cvg5iL5qGY6Z4HAjLpunhkZMVhbtE8e0zdvVnL%2F5L%2FBAdZYF8M84rxPl8zhYvflv8DyByncWZVma3WlW6H%2F833v3IDoTwP4AA1nIyx%2Bakz0PQwjDh7MgvLiBIOz9lht100uZ3sqD2RxaAalWh6VzzK%2BaNfkOHlFP0Cv%2BeHuit7%2BCxg8ZvwIwjw%2FYQ2aSldA0koIavCLWo1eg2O8NNaAoiaATHImxiwIN0xeQHGF6IOYYTkWJYd%2BJppuN9AKEdK76KFCHXXnwF1qlK1%2F%2BHeBfDTDjk2j%2BxlCjVAITJGhpPmXWpz%2B%2BfiNaB2hHWxfLDrtYwb81FHl4DuA5WGtltmT1MwxrjeFjDlLlIGnuw7%2BBkXX1LC1QB%2Fjjiz8D8WdogF37V0ADfj369ekXIK%2BHlD6k9CGlH9PzMT5f9TH3ysOUK8K6GlxV%2FIk32qp3yPZZqwuG3GcXSxhdgSnhaYVPA3xatw0kv0ICJxDOgcRZlzgrE2dg4xB1T%2BAcSJyliTO0cQjSK7F9C4eq1VawpODpY8sUrLTntXg2xwGlx4AXou6Dc4ELIO7jzwGi7ZOj9%2FdZF6TRuvQIfurwOSP0hpx1Uw76GIRApa76Zrqn0j0z3Vfpvpk%2BUOkDPX1I9Ic1%2BkOiP6zRHxL9YY3%2BMcEf1%2BCPCf7YhscohHTyoMEYAxMzPJvSldDsqqYahi9mDGzdcuRACu4L7ANhHulNlYMMu8Kg6F2dgCcIkCzIpSvsKAkIo0kiIq8vAqQrsA6EWSSP6k3KIggHa51jaVxJVwrd14T269JIjr7g4RscfYOjX%2BM4KN0s9dSMUWrpS64iz7cNZZSXHIWFnwO03z7%2BHKBe%2B%2FhzgPzgaeAqOYjoLgVIqCHHb8wZNOWIKq57KlomlNXK9PVMz8oc6Jm%2BlTnUM6tYLEf24MiDUynAvt0GVdWcp6D8TVC%2BghpsghooqOEGKMOBslCqapUCSYaRqpSvHA68anDfVYPzrhpcd%2BV03K3VHk%2ByNM8%2FzDPoCU2KVtX2dtgYnsfwPF4bM%2BLYM2QfcaGDqvExNkpUjSPsaiSfsaGjCn8MlNYj%2BYzlS2At2W2DQBx6ZNnT5AL6ZT9k6ezFbPE6CA3BZvA8g%2BfZ2hz%2BQH9qHhWTS5zgE7OPMJ73CZVe86sMNKy1PVUTJ2gUsznWV82GgQYfx%2BkOPL%2BGJ2E7RLW3lH9X8u%2FapoHDAaChSY9UJbxfh59kCnqS5i3EtiHyuYLIo6RF9qmBZDqIi8gKbTpDywHDLhgei2VmAS0JCP1cCiTYga0JF%2FjsozwHgsBEvNpz12Kt0fLh6rID9NsntTCU4B3h%2BY4wYFNoFen%2FWkCHM0vgxW9pgVaLo8lKt6sG2VPy7VcTNZolV7old0abzJu4iXh285o38dqAZMaLjkSGczPKmhi5cErP3IyEh0G5fbTnQel%2F%2BIGAhgok1%2FO7KiBk%2FmqkJVT4E5W%2F1unniJ%2Fr4E0xgAOkXWsTjuOtO9VHlukiHA9ZURfUyrzeZkRXvRtQgP4c4J%2BleFuKt5V4W4m3tXhzdJ%2BpUyhpOVuSU0Xb2Z6cKl7OVuVU8a7nGsNPewpx4yiwYQioW7fDQh4XAS4zvuETo1Gyx6A6IA6Sm9prMVdUCm2uHm6IlHqP5GsOUC%2BB8eXKSMK%2BLMpDf1fy71qMYkX6WPxi2pCeh2OR7x7rQrp8868sIcVAt08D3b4c%2Bg7Koa9HKR6liGEwETQGw2jmmRg5XqdRCENG8MlsZSes7QRtsDnTBpszNdjU3XWniQWjz60tq%2BtdzL8P4P%2BWBvCGJNTLmC1Vx2K2Un2KmVOSWaMks0ZJZhunEjAy5WQCSWJnr4zslZ29NrLXZrZXEfdcxL%2FqNAZVA7p0NpqGWMNunozoi8mIvi2f5FbnhFVMmWHwaJzX8AQpr0bKE6S8GqnGmRChpEPBpimSy6Xq%2B5INuioKaKx%2FoJwuR34CTk4siBkCAeprCWs1TyCgUVT1PDBZY6zJmJTsBDVNkLUafsrwIl5dCScEkRJcabM7mgyrOl9sYcqOIY6sLpHuJRbKSwS%2FNBbtsSWijReloGpGRgqhSjCJrQs6UAOOymJ%2BabEqU0guRbam5saqQKHMZs66yjFmc%2FpSWMxryYkgOWvTJ9sFy3blLQHRL%2BUj0GDVJoTxWhWWFtkM%2FmK13VVORXEFLYJsSU7Ip00%2Bx8xWV%2BLKeBJEvbZOlUoAIZr0%2BnLmAZPRlzJzWROLVDsgVXUCUloxv4dCExmjERUzfGIWbbO5%2BpvNNVTVXguZlTNw8lEYDJubygaDzzRsZYmKZv%2BT7dqq3NQV8VV306db2VdFaKuZBzuY2d%2FFzBB4w3FZY7gN9BlW93RPfr7V%2B5rV%2FS9l9YGIbW%2Br0f1NRtdN2VyGP6dm6GtU%2Fc%2BuGbwvY72%2FT2mzv09p%2F0WmtMuZJ7F7aOQYEXZoUA%2BjQV5UczIt1%2FT2zYjhqHcJfzyai8KR7XqEEpQzS%2FJJ3%2FOydVfBPMhyLrcUXAcZw1MZakcBGDjhN9UGA09OMSNchtt0dwIMQj%2FkOH%2FqHffZQ9E%2Fe%2FWC8svpF9yF9WYxawVZhgPiCCwB1uClIeSRg0WsNtJqG%2FpoOyDu5yPk%2Bsyp2AuOk0SSaoswzHGogrpfjvetYSfyZqflQzUt8F3JYaRyD1SSXajMwNAUujWsMeNB0jK1kShop5dlbodJVXDPi6bUbc22Ok7NuMJOMU8uCpwxJzDQkCWLOAb1%2BqAYpfUEyIkhkUT7rtmFDxXMiP0c%2FOwQcBLEk0UMwN8HBWCtfuLXPFaaaQEQY7rknt%2FQnLKAet%2FXd%2FJOQBB2PKqcR3i4%2FWwKCu1ptaC5aZfwhi68GQ%2BjxWwLpu%2FCjNObLWieC22SRbQbfAtu34XLZ%2FNi1YwY8mkAUefkehlkF1Fyseesxirr36qifVbuF8N9UkDiI6jD9n7Cg11yD%2FFeBwzD9mhTv5Y2gLRXWapR2JOVF9IdR0FOc0mPDsu0Nc%2FS71U6pFU5ORiKq4HfqxcQb75ZuRTp82rKvIWHRf5MO4hxIk7uUe%2BwQN%2Fx3mFyuexFaJWSJL0BVs8gWI1jLCIzxBqiot8rohnPC2jBoHi0ELPrzMUiglutRtU%2BJheNU2R%2BYgpDjaC2lqgrgbP
p%2BwybQTN55U5eU3K7LkEQzy9x3YF65soN1JurAR1UFqX5%2BLDQz%2FmQuPeteVKNhtxPRt43qVOOTnwpiGNvE6UaK4lqBpDR8ZBsVVX5Gu2LYDZrYCuydL6rnfgiVNfF2O4YkNVGVbBRu3LsPT7GgOl6ThvvYzvarhpSSBgO%2Buzf2ON%2Bu1ekP0RLHraO2hBNDbgO1MGRgduhbuHIIYLsLpoSmKhktJ11EibeRNFSRENogL91NDFGFaD1tK6s4n2DpVusUsGvUdiWS8qioUhvaWStVpRFE0C9lZG1XlMWDYF6akLH9n9L9ZH29YV2GrwphjihrMjcgJWXOO%2B%2Fwmn1Nc6y39y0nQ6sCMvFa1wxNQjTCoAUHYg4fblRvpUkc1AaB%2BXrKgEPnAI2OEnzDJY5qpVjV2WM1Uk9woztmVU3treQC3pAsrcUtT7uYKC%2Fa8EE0%2BnvSv5d11aQDZJWh90q33rf%2BPO4q66iQdHF3DalPMEljnqYfSnZbY74jey34zbkP8OrgOuNF9Op2mJdSkDjhB%2FiNChaLX845N0j8DXSQGleJIV31OqL81GGo93CPeMXGef5K5694SBN%2BKUFhB7CkdcffIaEr%2Fl1Gi9ot%2FJXk9I7egx2fPTJUhItcXbpRTJNoZTeoAgdBvEWWGKOV%2BKguYTp5TFgtbzDDvMOoXh6ntmTmUbZ7CbI%2BMvgtzT7M89y4Pc6QEWJkEQHQb22OoxZQ42SDagwhvVrqAX0IuugMM4dmNLNgsnTMMTjgXXoIcjUbwIH2PeKpUGkB%2BPJ58HkstVK6FyqfopAB%2B3NF%2FmlhAHvvCnwegJwZLln69Zknc%2BjszROsxfJu9yhmgdm8HwTpRxqfmQfwFujmsV9qF4%2FXPKlO0eZX1p%2BBGRm6MRRk087bIaeGjU67rZDjhlZ3f330lvgxPOOZqKRbq7fUmhv9kZ7IJhhilHNMnRs9hQ3699aVV%2FTKJnCn85lvOZ4l8aLZxtLQKRDauY1SwSFtLJvaKXbohkkmyTETR%2FZRsmwi581SkQlxSERpdsSEakmSYzh9cbKQgNslIuKpUMumU7DxVHD2N4Gr%2BmhS9CkTnUab2vtNzFAP00lW0STZqPNtTP4m22uAW6KBaeAnktAnWKzDXF6tfjCYg467Mglpkxfjdj3aRrjlJTHvlG47wfnbZrfVHm%2BnTeu8oZ2XlDlHdt5WZV3VENca7meX8MN05vELe4hZi%2FmbnkpkzptbpkpHzuUbrFFdoPYlLmOm8Sm7FzX6shUCjcyOfMEpkZ44DdjDmrqQjOCl9O4bDXEfHF3jctaw9JadITdbbKhMpkFc2zDiAtznBpS%2FiSYQ%2FngDWoKEDkT9qcsmjfZGQA%2FpcQ9rW662FjQtBsxNpWzYUM5G9aE0wjuIBvO4TXIRxsAy%2Bstqj7l%2B6Nz9gdTSO8fnYOxvMM2%2B%2FZbdixFMdBa2kUcOAP2%2BDGu3dHqo191fcuOKW0ylLdkILbG58kTNgTWKuX4HHkOkVBXrYjpmOUVIThV6Pu4eNTIteYMmtxscshRhz12OUSmV%2Fp2KoGa3YSsdnAVza3e1VePDV95%2Fb%2BQs5CR5S3P%2B5ruEhPPTf56bHSnw3rGHT1G3Jpc9kMUF1t6glMCaRTXWdjrJV1QaRJDXACRbhbkWgI190SgT%2BS5%2ByIixxZKUWwM6G314YZ60IMKz3MaR%2BbUXNZcB75ZjCfpbBYk4ZbxRK4BNkuGw9FDp2Qix5ZMp7pdRBh7xCtkt6OkJXyzwG5hNwlaEm12rTY5vrl7acyj03CdDnZtcLtbXpUD9Eb2ZJiTSBuHuatGTjhP8shtGZmzAycBSpzWjZweAZCzDSlzduAkQNu4I6DDNljQB5F8p15lznZuEnSzBX0QyXfqVebswEmAbragD3X3wDmGL3N24CRAt1twACINnHqVOdu5SdDNFhyASEOnXmXODpwE6GYLDmE4OXTW7WXOdk4SlCx4bte%2BeiFvqjXK%2B8M21xjl2hzVFu8bShWo7TttV%2BaE83zkmpl2EsLhioB3zhO7cUQkOUMW53LcMw8qZzfxJPidxFM4G8SDKtJ31rNlzo7iCfC7iSdxIIo6rMG9A5z%2FcNqvzNlNQAl%2BJwEVTrP9BqDCwGm%2FMmdH8QT43cSTOBvEg%2Fp74GwEypwdxRPgdxNP4mxy7xDHs077lTm7CSjB7ySgwmm23xBUGDrtV%2BbsKJ4Av5t4EmeDeNC4DJ0tVJmzo3gC%2FG7iSRxwr9UIVPX2ph1hT%2Bu3UZp7LXKrOeBxMM95KO%2BBPeRdukbDwDC3sal%2BZ40PdZHBYPqKndiYZ1KbBfNW64IWjC5wIxoOTTUpjEU8at7vRNDbRnB9R4L%2BZoL1kVLNMNv9VTbdrcrLu%2Fipgm720Z%2BM9p7U1dBqzuiwjSDedhA02LlIEndG6%2Ba%2B3mBKa%2BVX3FBh6UCXSq8c6d65uKXCTvfPnVsnSKbXUXJxVnXEnN0lXK%2Bc0KXeDY3BccNyk8rIiyyIkpG54K1oiCVv7MwicH1Fu7a2J6SpFMJ%2Fu%2BxalosvUmNt%2B7KZwcprv0NaJjcuhxNb9vg0SvgbTsPYNCueInh5n%2Fcp3mNZjXQ7jC%2FndHmfAO7Qelu6KJ7jpW8veZ4HF3TPoUzu7cnbLyV7n17Qcq%2BgLEU5VzemZjxP42u80p3%2FRrej6qvP1SIhkMTdLnjn6lvxat53J%2FZ3EqdexmfpNTcvMdyjBcuMZN%2FriKtftQVU85QECtJCWUm5lkNT7aISaH0O%2BaB6l8tuFgO0J124aAuN23kooyeEAy1xnyB0q8wLeq2j0cYlu9KfYnID5X4XJcXxU9yNLGmHVYjqqmpbyfFS4I7YSX2uXS4MyFkE4xDT%2B64bhpEFEsG50XOxmdzYM9KoiPu%2B4C%2FmzknMg0zFTBlKxqGiKqbKaDzUrpS5LZ%2BUSLU7MneWR9x3qejkZeHDdUzLc%2B%2F76r6a%2Bv%2B9Xq8qm%2BeNWzCwEiqFwAty36b95aBPlzJsKPxGyR%2Bx99gPPD63qwAcQ3vDERvgONdkTPftQj308uwdVo6%2B792ZJc4RulkORszzjzuC93A7c8%2F3vLvzx7HDgO4X8prd8GX%2F94eDZiM7rAxtzPNlgToePr6rfsePG1kd4wrAEbD0af9LvfIbsT36NAk2pklaQC2ySEKq8F3i8esXyQ%2F4%2FYMCWqVpdHF4dldZH%2FsddgTuwOESPRlGw8bZx3E85B3Cw%2BFQy%2FXoH%2F7QUB8fjnB%2Bq4QZNvjCG2IAHhHxQ0W40WSP%2Fbp38iLI0AA8S4L4VRpjSB4%2BvbPuQkYRhn6zAH1dAFHJ1Auh6gpsBKPi4oTU420TQN3jTmi3gQxQ41MjW3tGkDaxe0SToMAdWyFVqTndnNFh43pnaCJbzqBH79ligutEgeyMswM2lo
%2Blk6HyLpBY30wZd1hggapdM5pKKJ247Rj40oXB4FhoSsKcUcPyNsjAmIaIUiTVmytb1XwxhwaoamHoVmABRcGGD2ZmtaWc%2BsLlMYsRHXvqqK3ZfbWVul%2Fuhu7bVzmJTpy4zV27Qh5vN9HkoOverRt9RAzYQJoetw6Ftja8BPtB3rMuikhvHMHYHDOqZlIXHuz8ml%2BBMQptP2ntKB1UgeYeuuq8qb3ShIMz3%2F4MiOyc7d6mW%2Fjl1ki9C626K3afUoqcVAqJ7ry4d34kg4LSb21caW9np0vgRYAHptbJm5%2BECDnuq9K49HC4tQ0ERl0GiNStFTq53GrPJHA9MnaQ9tbumOmxpnXOTO%2FXemo02KxtmjcObdohZ%2BwJ%2FKRoO6ui7bj%2FyeF29gXD7Xtz76QWcMauyjuFnMSkLZXkRp2JM%2Bx0Xo2BZwI1hd64gdcOwbej3H%2BlAJSf2IhwB%2BSPL56JT2d8ZpWHOxwHX67a%2B2LmcCle%2B7bI5%2BsOA6T%2FFLqXX0r5TJWPaID2N64ydAu%2FtM6fXtX%2BRd2sfevmM5V%2B5G91NAxN0DRHQ%2Fvtr%2BHsL6v5Fnf%2FbWguXK7mwJu%2BrqLyNe4bqL178z20DOqzc1LCmkUX%2BRiMoeszwVMyNJTFGf6erlPem2JXvCWOGGXVhCeeffGP2%2By%2BuHihtBaOI4DDlnGDPomGswH2NJp33tYb77sT8O9CoI4%2B%2BFz%2BQ53ArcNj8rtVcXqDt8UseDJZddhldHGpveJlMlGxCHnNhzD8pCP24uwebWbHQ01L8hIdeoUU8ZFBSNWg2gLsZFshO3IXIUvNxxUhmkqmbwlWV%2F7QLRhTMVqezS1dh%2F3e8aPD4%2BNDKHz%2BUc8%2FPvIG2pQ8bQWucC3LHHu9R4feI9w45h%2F6vcNHvn90VCETolCfvtaDe%2BblYeA4vfBbkL9P31DE%2B5YeH%2BEKo3Zj7FRdUuDEjXXco6FlJ%2BEWIXPpPlHR2Lo9nc2NUkOo9q2sBGYYVfvgFWJ8i7J4j5xIrUpqBN3HaxtQ14Ev1X7IWrgvmsDm6Q1C4UoYauZtYugPnPw2s8Mf36b6KVLjdRBd5g%2BP7AKmIk6TRriQSLfJb4eWw%2BZBFhV4IF6g%2FRvzDafI7CeWW7rd2PBfJYBkLv4%2BeaLZUSQdnOpBs4GFAP%2FjFPR%2BdHRc50QFjr7GCdH8DfMPD82y6Iss4skw4p%2BANMcC0iq2g3OSGOGU3E2gw3PUQECZTGldij6hROEKf76FQsWigwNDLaKCZ3gj%2BvQrKUFvdQ2%2FTsOb8%2BKn58%2FEF%2FGSWsX68y8fXr97%2Bf1Pz5u7h3bvwuyADLEx2LHz4Gh4oA9SirBx1aiukqGNmDTkIX0bEKPIw1MLiUsu3SAGmpMXfn1tC7dvTtn%2FaQl%2B7A%2FBeAguTpzj5rtxx1nv33fmbk2w1sxBUn2OUOaspVosLgXD0vCRYhyXykXHqSMn0KACPBWf8zMqg%2FvGJgj8r%2Bl%2Bbdki1%2BbLW3ZIKc54UsxRStQSs90nuORLufWeoGhHCZ7gpi4GPjgPlAul58Hkdy7OsRiHjkeuU9DCPpJdW5uoUZdmKQUMs9CNUkeDkTkppTGGgiPeOsZBt5HjoKspQUc%2FIjOqH4OzocX5lZFxaqZRJ%2FzP%2FvjuraXVYDCyUobHO%2BlJx6lH%2BjluW1Z9dmzkOGZtw5unhkfOk8w1HnVrO8%2FtbrD6T3TU1HnIbwPWa3GA1X3gzMZTB3xG1jGjjZ7DEqqNgBxbPppcE1Sh5Agh%2FRjNyHGwpxm%2BPM1SQ6sO29jY1ez8qHYpx0b1bzfaYnhcs0XDKRnXMRuT8YmLjrF33t50vwN%2BNqfvM2%2Ff%2FGfsqNP30l33gEa77SYffgHyYTN5sGKd%2FKatpoJyQJTfB71lD7JxoXOlHtb4cN7A7qq8sYeZn2BpgQnoRIUKpF5139HJbkVBl1K71Y7I1rQaCV7W1kJcRA9REHTqrcr%2BJYugQaWZpJF1j561ikqITWo4yFX6W4T127nKR6BQvXRo3%2BAZYhr7Hncvao0Nxq3edRD27VWl%2B7se3nliFEqx1prxCY%2Bu9WJfw3Ws6Wo89Gbkux4tmDSzMRoYB4Gm5WOFT70FWqqR2HYnrK6OpofVTQyjfB5Ar0LQw%2B7Q2SIv0pl4N9cfGxZfG9gbahr63U0EaxWqcUHOEMPeoSCPfZu7FOTmBSnNDhsUnBOOzW6p0OWKnXK21ur3qlW7DUCrXYDGuwAFuwDhVQ%2B7wGW7AK13gsJbH3ZiWV4zcXKXIFKftjcCSFbHW0JHHUH%2FG4ydxXwXk%2BEdKLvAxeq%2Bgl1csAsg3SiyE%2Bed4mgnKHk7yK76%2FoUi6U%2B4oB7EZ%2BUNXH%2FFePoqSuK%2FXba3RUnIl%2FX9%2FjwU%2BmOH8eE%2B%2B%2FcPH169e%2F38wwe2%2F1DeTD5vVVeSC9W12%2BAw8YKrvU84X994YCAp95PlPRLml6llSrrqlj0xJ57lboTENLciRLfKmVQkgoKQW%2FDUNHIVGUEYPlP73LacdYhwPk1T0%2BKIQGnMcaK49euPYIXStiP2Pz9G4S382j07Y8qiDvBzMOO3v2oMDF%2FRVsEo7JgrdEI6JVvbUlfsj%2FlaGkOIfXWlRYUCeuuqJcF1dIHD5R5EumMnjaQCxUlq%2FLGc9dK1LoPB3EDa3olHpTywaWCgm9%2FFI0wnixmetaiTf%2FbLS6y%2BMC0NQh5WurhO5VDU49dlRGyYspcezcutkgqjvMixMTLUvKTbVuVupNuqupBWKbeqNootJq7ktRWgfZhmLyCOvGH%2FqDoEX2Qre478vZDgvEHdTN%2BZCY5RbLR9U2Km067om%2Bc6t5gARsU0Uchxw7s1pY8FhtJFbi%2BBcO8w8TyTB4PcjZhRwDcVXpzx%2FE1ZW1sMpxM2VmF0Xf5b4ULVX%2FXfa1qWy3BNRB9b%2B3nrBZ0g9%2BTF9qz1ur1n2N8hiZw9M0SxV7TJnQK3Sa1aj8DdoAtHC0xzc7GRs2m7QzN8fQtZM2y5%2Fchs3wSMVhPKjsC3eQE9iCdW6IjpTmyBVT0zgZF8wZ%2FHfCbGegSwJ5gYSD3qLgmXsb2Mx0EB%2FRz8UEeeZt15GiV45pdWN3nYnYXssnvE5nF3wOZZ12cF9LO62OETTxdZsOoe9vtsCoLko3RRxFHCu0macJqc6Hosn40IdJnjYyw%2Btww0xxcC2%2Bv39xxiRvCSvQVEFFNODO%2FZRSgSfVDuhIlyEWhnquUBqGkAoV5C4KyN%2BP6MPH8L%2FcEpnlVIJvpnDehTDGWHV2yUl%2F4dCYod8To2X8UFfHqKuPpPT8nMV7x10Hi38pfm68p8XStos775TQbyqzSP5HHO99V6o%2BhNAEByoSK%2BDhDlZ
0FMM9kOM0azxXOKbUdW%2FjSO0xsevk2lGwyQSZrxnvASlIuqKgyyixzq0uRarw0NNqJkUbDOoQjhehoi0TlixNNKUnk4%2BBJKacwlO%2ByVt0SP3LhN%2Fb6uqtFxrZnoZ7r%2BWBLp2esFverOtHaHbYFVd6q1qwVey%2BTaN5ut5uxWb3rlB3c2xrPxpR%2BC7NpF4QmeeTQ%2FTmEUFA25FApbK1FMXuGdx1Y0KAgVjC%2Bh%2F1SH0M6g4unR6ugpvOUtvRCap07Jc3o2Hj49x8%2FnSovrA1bKa9dkNSzMWJ3aaSOxesdC6KDc%2Bst0mtN89a5hAB6xAu59bdtlFV%2B7ktci0sFA23%2BA5hRfOBjnLVOJNu4n6Xl4BWoJYUsiYewegOF3w9ql2eyqSmz8sumfNILTtpWuKbGhVVWJGNLlPAMkiLIs4XEvSnvTKOOtPVFX4OC9AmVsz1hG2BupGKXNcHlp%2FfqKQ7tjkCnXHDaTKMFoEaeicNthe1Br7jXMqBvhbSj7qYVMp6uVi9ri3DZjous7yNpe1rGXI9yHwQ0tjbC6k0vt2AFrUlGp21N03%2BtdjzNxsBsce003dNIJa%2FHWcoyNZOtiN4htszaf6Bd1czkuycV%2BDHMALWOlrNM3dncMr8sONDje5qP38x30jEIrzU9dW32I%2B7QoYOCJ3gDqUMNoKM298Lv1q3fqWTuBtA3hbjC5ffYIzwnTT7%2BnbVWTQPVRvTbRVx51klMGjd2OGlmnGU2zy5AU31I5M3vB6gsoHearOyjKvk8Dgh6bdYdD82l3tPEqh70SYs8cEzd0yQHuxIJydPIt%2FAaMvFjBuHscTH6%2FoDEKsvnX8Hg6DY%2FLz7VV49n7d9ZpN42%2Brj57J3a9Y%2BI4psVirLicE0nOHrgWUG4zNR0bEGDmd37MGSKI7ISm5ZuGpmF0XY1LBbSw249vX%2F4EeL9%2Bm8%2BCOH6CLRenweikHNt%2F%2B1Dk%2FWrhB3OoS8KzyygOW4atbEbG4Hca8yXDn%2B4kjdlFMO%2BuYIQL49KbywjcrY2B88sgTG%2FgSbjmoi%2FPKGK93Sqpt%2FXMFHBzEHtlXk0zwb54XDYTDY2JMfU2%2FYd79%2B7l%2FwQ%2FrxdQJ%2BID7sp781%2Fh4RfarGpB%2FDM8PIvyeRys3vwjPPN89v8BSeiR3Q%3D%3D No evaluation is required (don’t run it ;)). Simply hide the input cell, leaving only the output visible through the properties (click on the top-right corner of the group). Here is a notebook with all examples [13] (some of which work in a browser as well without a kernel). What if using my own laptop is not an option? The WLJS Notebook can be exported to HTML, preserving even some dynamic behavior [16]. This is achieved through a quite sophisticated algorithm that tracks all event chains occurring on the JS and WL sides, attempting to approximate them using a simple state machine. The result is a standard HTML file containing all cells, slides, and the data of these state machines.However, due to the variety of values received from the IMU, the exporter cannot automatically capture this in a JS state machine. Consequently, rotations and the joystick input will not be preserved. Nonetheless, everything else will function as expected. … like a fish needs a bicycle Please refer to the final sentence of that beautiful comics by Zach Weinersmith. If not for the urgent need to showcase a rotating crystalline structure, this post would not exist. Thank you for your attention, and to those who read this far References [1] — ECMA-363 Specification, Wikipedia: https://en.wikipedia.org/wiki/Universal_3D[2] — DPG2025, Homepage: https://www.dpg-physik.de/[3] — Leo. Right Joy-con Controller as a Remote, Hackster: https://www.hackster.io/leo49/right-joy-con-controller-as-a-presentation-remote-5810e4 (2024)[4] — Jen Tong, Nintendo Switch Joy-Con Presentation Remote, Medium: https://medium.com/@mimming/nintendo-switch-joy-con-presentation-remote-5a7e08e7ad11 (2018)[5] — RevealJS, Homepage: https://revealjs.com/[6] — Manim, Homepage: https://www.manim.community/[7] — Motion Canvas, Homepage: https://motioncanvas.io/[8] — RISE, Github Page: https://github.com/damianavila/RISE[9] — WLJS Notebook, Homepage: https://wljs.io/[10] — Vasin K. Reinventing dynamic and portable notebooks with Javascript and Wolfram Language, Medium: https://medium.com/@krikus.ms/reinventing-dynamic-and-portable-notebooks-with-javascript-and-wolfram-language-22701d38d651 (2024)[11] — Vasin K. 
Dynamic Presentation, or How to Code a Slide with Markdown and WL, Blog post: https://wljs.io/blog/2025/03/02/ultimate-ppt (2025)[12] — Joy-Con WebHID, Github Page: https://github.com/tomayac/joy-con-webhid[13] — JoyCon Presenter Tool, Online notebook: https://jerryi.github.io/wljs-demo/PresenterJoyCon.html[14] — Faraday Effect, Online notebook: https://jerryi.github.io/wljs-demo/THzFaraday.html[15] — James Lambert VR powered by N64, Youtube video: https://www.youtube.com/watch?v=ha3fDU-1wHk[16] — Dynamic HTML, WLJS Documentation page: https://wljs.io/frontend/Exporting/Dynamic%20HTML/ All links provided were visited on March 2025. The post How to Use Gyroscope in Presentations, or Why Take a JoyCon to DPG2025 appeared first on Towards Data Science.
  • TOWARDSDATASCIENCE.COM
    How to Write Queries for Tabular Models with DAX
    Introduction EVALUATE is the statement to query tabular models. Unfortunately, knowing SQL or any other query language doesn’t help as EVALUATE follows a different concept. EVALUATE has only two “Parameters”: A table to show A sort order (ORDER BY) You can pass a third parameter (START AT), but this one is rarely used. However, a DAX query can have additional components. Those are defined in the DEFINE section of the query.In the DEFINE section, you can define Variables and local Measures.You can use the COLUMN and TABLE keywords in EVALUATE, which I have never used until now. Let’s start with some simple Queries and add some additional logic step by step. However, first, let’s discuss the Tools. Querying tools There are two possibilities for querying a tabular model: Using the DAX query view in Power BI Desktop. Using DAX Studio. Of course, the syntax is the same. I prefer DAX Studio over DAX query view. It offers advanced features not available in Power BI Desktop, such as performance statistics with Server Timing and displaying the model’s metrics. On the other hand, the DAX query view in Power BI Desktop provides the option to apply changes in a Measure back to the model directly after I have modified them in the query. I will discuss this later when I explain more about the possibility of defining local measures. You can read the MS documentation on modifying Measures directly from the DAX query view. You can find a link to the documentation in the References section below. In this article, I will use DAX Studio only. Simple queries The simplest query is to get all columns and all rows from a table: EVALUATE     Customer This query returns the entire Customer table: Figure 1 – Simple query on the Customer table. The number of returned rows can be found in the bottom right corner of DAX Studio, as well as the position of the cursor in the Query (Figure by the Author) If I want to query the result of a single value, for example, a Measure, I must define a table, as EVALUATE requires a table as input. Curly brackets do this. Therefore, the query for a Measure looks like this: EVALUATE<br>     { [Online Customer Count]} The result is one single value: Figure 2 – Querying a Measure with Curly brackets to define a table (Figure by the Author) Get only the first 10 rows It’s not unusual to have tables with thousands or even millions of rows. So, what if I want to see the first 10 rows to glimpse the data inside the table? For this, TOPN() does the trick. TOPN() accepts a sorting order. However, it doesn’t sort the data; it only looks at the values and gets the first or last rows according to the sorting criteria. For example, let’s get the ten customers with the latest birthdate (Descending order): EVALUATE<br>    TOPN(10<br>        ,Customer<br>        ,Customer[BirthDate]<br>        ,DESC) This is the result: Figure 3 – Here, the result of TOPN() is used to get the top 10 rows by birthdate. See, that 11 rows are returned, as there are customers with the same birthdate (Figure by the Author) The DAX.guide article on TOPN() states the following about ties in the resulting data: If there is a tie in OrderBy_Expression values at the N-th row of the table, then all tied rows are returned. Then, when there are ties at the N-th row, the function might return more than n rows. This explains why we get 11 rows from the query. When sorting the output, we will see the tie for the last value, November 26, 1980. 
To have the result sorted by the Birthdate, you must add an ORDER BY: EVALUATE<br>    TOPN(10<br>        ,Customer<br>        ,Customer[BirthDate]<br>        ,DESC)<br>    ORDER BY Customer[BirthDate] DESC And here, the result: Figure 4 – Result of the same TOPN() query as before, but with an ORDER BY to sort the output of the query by the Birthday descending (Figure by the Author) Now, the ties at the last two rows are clearly visible. Adding columns Usually, I want to select only a subset of all columns in a table. If I query multiple columns, I will only get the distinct values of the existing combination of values in both columns. This differs from other query languages, like SQL, where I must explicitly define that I want to remove duplicates, for example with DISTINCT. DAX has multiple functions to get a subset of columns from a table: ADDCOLUMNS() SELECTCOLUMNS() SUMMARIZE() SUMMARIZECOLUMNS() Of these four, SUMMARIZECOLUMNS() is the most useful for general purposes. When trying these four functions, be cautious when using ADDCOLUMNS(), as this function can result in unexpected results. Read this SQLBI article for more details. OK, how can we use SUMMARIZECOLUMNS() in a query: EVALUATE<br>    SUMMARIZECOLUMNS('Customer'[CustomerType]) This is the result: Figure 5 – Getting the Distinct values of CustomerType with SUMMARIZECOLUMNS() (Figure by the Author) As described above, we get only the distinct values of the CustomerType column. When querying multiple columns, the result is the distinct combinations of the existing data: Figure 6 – Getting multiple columns (Figure by the Author) Now, I can add a Measure to the Query, to get the number of Customers per combination: EVALUATE<br>    SUMMARIZECOLUMNS('Customer'[CustomerType]<br>                        ,Customer[Gender]<br>                        ,"Number of Customers", [Online Customer Count]) As you can see, a label must be added for the Measure. This applies to all calculated columns added to a query. This is the result of the query above: Figure 7 – Result of the query with multiple columns and a Measure (Figure by the Author) You can add as many columns and measures as you need. Adding filters The function CALCULATE() is well-known for adding filters to a Measure. For queries, we can use the CALCULATETABLE() function, which works like CALCULATE(); only the first argument must be a table. Here, the same query as before, only that the Customer-Type is filtered to include only “Persons”: EVALUATE<br>CALCULATETABLE(<br>    SUMMARIZECOLUMNS('Customer'[CustomerType]<br>                        ,Customer[Gender]<br>                        ,"Number of Customers", [Online Customer Count])<br>                ,'Customer'[CustomerType] = "Person"<br>                ) Here, the result: Figure 8 – Query and result to filter the Customer-Type to “Person” (Figure by the Author) It is possible to add filters directly to SUMMARIZECOLUMNS(). The queries generated by Power BI use this approach. But it is much more complicated than using CALCULATETABLE(). You can find examples for this approach on the DAX.guide page for SUMMARIZECOLUMNS(). Power BI uses this approach when building queries from the visualisations. You can get the queries from the Performance Analyzer in Power BI Desktop. You can read my piece about collecting performance data to learn how to use Performance Analyzer to get a query from a Visual. You can also read the Microsoft documentation linked below, which explains this. 
Defining Local Measures From my point of view, this is one of the most powerful features of DAX queries: Adding Measures local to the query. The DEFINE statement exists for this purpose. For example, we have the Online Customer Count Measure. Now, I want to add a filter to count only customers of the type “Person”. I can modify the code in the data model or test the logic in a DAX query. The first step is to get the current code from the data model in the existing query. For this, I must place the cursor on the first line of the query. Ideally, I will add an empty line to the query. Now, I can use DAX Studio to extract the code of the Measure and add it to the Query by right-clicking on the Measure and clicking on “Define Measure”: Figure 9 – Use the “Define Measure” feature of DAX Studio to extract the DAX code for a Measure (Figure by the Author) The same feature is also available in Power BI Desktop. Next, I can change the DAX code of the Measure by adding the Filter: DEFINE <br>---- MODEL MEASURES BEGIN ----<br>MEASURE 'All Measures'[Online Customer Count] =<br>    CALCULATE(DISTINCTCOUNT('Online Sales'[CustomerKey])<br>                ,'Customer'[CustomerType] = "Person"<br>                )<br>---- MODEL MEASURES END ---- When executing the query, the local definition of the Measure is used, instead of the DAX code stored in the data model: Figure 10 – Query and results with the modified DAX code for the Measure (Figure by the Author) Once the DAX code works as expected, you can take it and modify the Measure in Power BI Desktop. The DAX query view in Power BI Desktop is advantageous because you can directly right-click the modified code and add it back to the data model. Refer to the link in the References section below for instructions on how to do this. DAX Studio doesn’t support this feature. Putting the pieces together OK, now let’s put the pieces together and write the following query: I want to get the top 5 products ordered by customers. I take the query from above, change the query to list the Product names, and add a TOPN(): DEFINE  ---- MODEL MEASURES BEGIN ---- MEASURE 'All Measures'[Online Customer Count] =     CALCULATE(DISTINCTCOUNT('Online Sales'[CustomerKey])                 ,'Customer'[CustomerType] = "Person"                 ) ---- MODEL MEASURES END ---- EVALUATE     TOPN(5         ,SUMMARIZECOLUMNS('Product'[ProductName]                         ,"Number of Customers", [Online Customer Count]                         )         ,[Number of Customers]         ,DESC)     ORDER BY [Number of Customers] Notice that I pass the measure’s label, “Number of Customers”, instead of its name. I must do it this way, as DAX replaces the measure’s name with the label. Therefore, DAX has no information about the Measure and only knows the label. This is the result of the query: Figure 11 – The query result using TOPN() combined with a Measure. Notice that the label is used instead of the Measures name (Figure by the Author) Conclusion I often use queries in DAX Studio, as it is much easier for Data Validation. DAX Studio allows me to directly copy the result into the Clipboard or write it in an Excel file without explicitly exporting the data. This is extremely useful when creating a result set and sending it to my client for validation. Moreover, I can modify a Measure without changing it in Power Bi Desktop and quickly validate the result in a table. I can use a Measure from the data model, temporarily create a modified version, and validate the results side-by-side. 
DAX queries have endless use cases and should be part of every Power BI developer’s toolkit. I hope that I was able to show you something new and explain why knowing how to write DAX queries is important for a Data model developer’s daily life. References Microsoft’s documentation about applying changes from the DAX Query view on the model: Update model with changes – DAX query view – Power BI | Microsoft Learn Like in my previous articles, I use the Contoso sample dataset. You can download the ContosoRetailDW Dataset for free from Microsoft here. The Contoso Data can be freely used under the MIT License, as described in this document. I changed the dataset to shift the data to contemporary dates. The post How to Write Queries for Tabular Models with DAX appeared first on Towards Data Science.
  • TOWARDSDATASCIENCE.COM
    Beyond the Code: Unconventional Lessons from Empathetic Interviewing
Recently, I’ve been interviewing Computer Science students applying for data science and engineering internships with a 4-day turnaround from CV vetting to final decisions. With a small local office of 10 and no in-house HR, hiring managers handle the entire process. This article reflects on the lessons learned across CV reviews, technical interviews, and post-interview feedback. My goal is to help interviewers and interviewees make this process more meaningful, kind, and productive. Principles That Guide the Process Foster meaningful discussions rooted in real work to get maximum signal and provide transferrable knowledge Ensure applicants solve all problems during the experience – Judge excellence by how much inspiration arises unprompted Make sure even unsuccessful applicants walk away having learned something Set clear expectations and communicate transparently The Process Overview Interview Brief CV Vetting 1-Hour Interview Post-Interview Feedback A single, well-designed hour can be enough to judge potential and create a positive experience, provided it’s structured around real-world scenarios and mutual respect. The effectiveness of these tips will depend on company size, the rigidity of existing processes, and the interviewer’s personality and leadership skills. Let’s examine each component in more detail to understand how they contribute to a more empathetic and effective interview process. Photo by Sven Huls on Unsplash Interview Brief: Set the Tone Early Link to sanitized version. The brief provides: Agenda Setup requirements (debugger, IDE, LLM access) Task expectations Brief Snippet: Technical Problem Solving Exercise 1: Code Review (10-15 min) Given sample code, comment on its performance characteristics using Python/computer science concepts What signals this exercise provides Familiarity with IDE, filesystem and basic I/O Sense of high-performance, scalable code Ability to read and understand code Ability to communicate and explain code No one likes turning up to a meeting without an agenda, so why offer candidates any less context than we expect from teammates? Process Design When evaluating which questions to ask, well-designed ones should leave plenty of room for expanding the depth of the discussion. Interviewers can show empathy by providing clear guidance on expectations. For instance, sharing exercise-specific evaluation criteria (which I refer to as “Signals” in the brief) allows candidates to explore beyond the basics. Code or no code Whether I include pre-written code or expect the candidate to write it depends on the time available. I typically reveal it at the start of each task to save time, especially since LLMs can often generate the code, as long as the candidate demonstrates the right thinking. CV Vetting: Signal vs Noise You can’t verify every claim on a CV, but you can look for strong signals. Git Introspection One trick is to run git log --oneline --graph --author=gitgithan --date=short --pretty=format:"%h %ad %s" to see all the commits authored by a particular contributor. You can see what type of work it is (feature, refactoring, testing, documentation), and how clear the commit messages are. Strong signals Self-directed projects or open-source contributions Evidence of cross-functional communication and impact Weak or Misleading signals Guided tutorial projects are less effective in showing vision or drive Bombastic adjectives like “passionate member” or “indispensable position”. 
Photo by Patrick Fore on Unsplash Interview: Uncovering Mindsets Reflecting on the Interview Brief I begin by asking for thoughts on the Interview Brief. This has a few benefits: How conscientious are they in following the setup instructions? – Are they prepared with the debugger and LLM ready to go? What aspects confuse them?– I realized I should have specified “Pandas DataFrame” instead of just “dataframe” in the brief. Some candidates without Pandas installed experienced unnecessary setup stress. However, observing how they handled this issue provided valuable insight into their problem-solving approach– This also highlights their attention to detail and how they engage with documentation, often leading to suggestions for improvement. What tools are they unfamiliar with?– If there’s a lack of knowledge in concurrent Programming or AWS, it’s more efficient to spend less time on Exercise 3 and focus elsewhere.– If they’ve tried to learn these tools in the short time between receiving the brief and the interview, it demonstrates strong initiative. The resources they consult also reveal their learning style and resourcefulness. Favorite Behavioral Question To uncover essential qualities beyond technical skills, I find the following behavioral question particularly revealing Can you describe a time when you saw something that wasn’t working well and advocated for an improvement? This question reveals a range of desirable traits: Critical thinking to recognize when something is off Situational awareness to assess the current state and vision to define a better future Judgment to understand why the new approach is an improvement Influence and persistence in advocating for change Cultural sensitivity and change management awareness, understanding why advocacy may have failed, and showing the grit to try again with a new approach Effective Interviewee Behaviours (Behavioural Section) Attuned to both personal behavior and both its effect on, and how it’s affected by others Demonstrates the ability to overcome motivation challenges and inspire others Provides concise, inverted pyramid answers that uniquely connect to personal values Ineffective Interviewee Behaviours (Behavioural Section) Offers lengthy preambles about general situations before sharing personal insights Tips for Interviewers (Behavioural Section)I’ve never been a fan of questions focused on interpersonal conflicts, as many people tend to avoid confrontation by becoming passive (e.g., not responding or mentally disengaging) rather than confronting the issue directly. These questions also often disadvantage candidates with less formal work experience. A helpful approach is to jog their memory by referencing group experiences listed on their CV and suggesting potential scenarios that could be useful for discussion. Providing instant feedback after their answers is also valuable, allowing candidates to note which stories are worth refining for future interviews. Technical Problem Solving: Show Thinking, Not Just Results Measure Potential, Not Just Preparedness Has high agency, jumps into back-of-the-envelope calculations instead of making guesses Re-examines assumptions Low ego to reveal what they don’t know and make good guesses about why something is so based on limited information Makes insightful analogies (eg. 
database cursor vs file pointer) that show deeper understanding and abstraction Effective Interviewee Behaviours (Technical Section) Exercise 1 on File reading with generators: admitting upfront their unfamiliarity with yield syntax invites the interviewer to hint that it’s not important Exercise 2 on data cleaning after JOIN: caring about data lineage, constraints of the domain (units, collection instrument) shows systems thinking and a drive to fix the root cause Ineffective Interviewee Behaviours (Technical Section) Remains silent when facing challenges instead of seeking clarification Fails to connect new concepts with prior knowledge  Calls in from noisy, visually distracting environments, thus creating friction on top of existing challenges like accents. Tips for Interviewers (Technical Section) Start with guiding questions that explore high-level considerations before narrowing down. This helps candidates anchor their reasoning in principles rather than trivia. Avoid overvaluing your own prepared “correct answers.” The goal isn’t to test memory, but to observe reasoning. Withhold judgment in the moment ,  especially when the candidate explores a tangential but thoughtful direction. Let them follow their thought process uninterrupted. This builds confidence and reveals how they navigate ambiguity. Use curiosity as your primary lens. Ask yourself, “What is this candidate trying to show me?” rather than “Did they get it right?” Photo by Brad Switzer on Unsplash LLM: A Window into Learning Styles Modern technical interviews should reflect the reality of tool-assisted development. I encouraged candidates to use LLMs — not as shortcuts, but as legitimate creation tools. Restricting them only creates an artificial environment, divorced from real-world workflows. More importantly, how candidates used LLMs during coding exercises revealed their learning preferences (learning-optimized vs. task-optimized) and problem-solving styles (explore vs. exploit). You can think of these 2 dichotomies as sides of the same coin: Learning-Optimized vs. Task-Optimized (Goals and Principles) Learning-Optimized: Focuses on understanding principles, expanding knowledge, and long-term learning. Task-Optimized: Focuses on solving immediate tasks efficiently, often prioritizing quick completion over deep understanding. Explore vs. Exploit (How it’s done) Explore: Seeks new solutions, experiments with various approaches, and thrives in uncertain or innovative environments. Exploit: Leverages known solutions, optimizes existing strategies, and focuses on efficiency and results. 4 styles of prompting In Exercise 2, I deleted a file.seek(0) line, causing pandas.read_csv() to raise EmptyDataError: No columns to parse from file.  Candidates prompted LLMs in 4 styles: Paste error message only Paste error message and erroring line from source code Paste error message and full source code Paste full traceback and full source code My interpretations (1) is learning-optimized, taking more iterations (4) is task-optimized, context-rich, and efficient Those who choose (1) start looking at a problem from the highest level before deciding where to go. They consider that the error may not even be in the source code, but the environment or elsewhere (See Why Code Rusts in reference). They optimize for learning rather than fixing the error immediately.  Those with poor code reproduction discipline and do (4) may not learn as much as (1), because they can’t see the error again after fixing it. 
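For readers who want to see the failure mode itself, here is a minimal reproduction of the deleted seek; the in-memory buffer and column names are illustrative, not the actual exercise code. After writing, the file cursor sits at end-of-file, so read_csv() finds nothing to parse.

import io
import pandas as pd

buffer = io.StringIO()
buffer.write("timestamp,value\n2025-01-01,1\n2025-01-02,2\n")
# buffer.seek(0)   # <-- the deleted line; without it the cursor stays at end-of-file

try:
    df = pd.read_csv(buffer)
except pd.errors.EmptyDataError as err:
    print(err)       # "No columns to parse from file"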
My ideal is (4) for speedy fixes, but taking good notes along the way so the root cause is understood, and come away with sharper debugging instincts. Red Flag: Misplaced Focus on Traceback Line Even though (2) included more detail in the prompt than (1), more isn’t always better.In fact, (2) raised a concern: it suggested the candidate believed the line highlighted in the Traceback ( — -> 44 df_a_loaded = pd.read_csv) was the actual cause of the error.  In reality, the root cause could lie much earlier in the execution, potentially in a different file altogether. Prompt Efficiency Matters After Step (2), the LLM returned three suggested fixes — only the third one was correct. The candidate spent time exploring Fix #1, which wasn’t related to the bug at all. However, this exploration did uncover other quirks I had embedded in the code (NaNs sprinkled across the joined result from misaligned timestamps as the joining key) Had the candidate instead used a prompt like in Step (3) or (4), the LLM would’ve provided a single, accurate fix, along with a deeper explanation directly tied to the file cursor issue. Style vs Flow Some candidates added pleasantries and extra instructions to their prompts, rather than just pasting the relevant code and error message. While this is partly a matter of style, it can disrupt the session’s flow ,  especially under time constraints or with slower typing ,  delaying the solution. There’s also an environmental cost. Photo by Anastasia Petrova on Unsplash Feedback: The Real Cover Letter After each interview, I asked candidates to write reflections on: What they learned What could be improved What they thought of the process This is far more useful than cover letters, which are built on asymmetric information, vague expectations, and GPT-generated fluff.Here’s an example from the offered candidate. Excelling in this area builds confidence that colleagues can provide candid, high-quality feedback to help each other address blind spots. It also signals the likelihood that someone will take initiative in tasks like documenting processes, writing thorough meeting minutes, and volunteering for brown bag presentations. Effective Interviewee Behaviours (Feedback Section) Communicates expected completion times and follows through with timely submissions. Formats responses with clear structure — using paragraph spacing, headers, bold/italics, and nested lists — to enhance readability. Reflects on specific interview moments by drawing lessons from good notes or memory. Recognizes and adapts existing thinking patterns or habits through meta-cognition Ineffective Interviewee Behaviours (Feedback Section) Submits unstructured walls of text without a clear thesis or logical flow Fixates solely on technical gaps while ignoring behavioural weaknesses. 
Tips for Interviewers (Feedback Section) Live feedback during the interview was time-constrained, so give written feedback after the interview about how they could have improved in each section, with learning resources– If done independently from the interviewee’s feedback, and it turns out the observations match, that’s a strong signal of alignment – It’s an act of goodwill towards unsuccessful candidates, a building of the company brand, and an opportunity for lifelong collaboration Carrying It Forward: Actions That Matter For Interviewers Develop observation and facilitation skills Provide actionable, empathetic feedback Remember: your influence could shape someone’s career for decades For Interviewees Make the most of the limited information you have, but try to seek more Be curious, prepared, and reflective to learn from each opportunity People will forget what you said, people will forget what you did, but people will never forget how you made them feel – Maya Angelou As interviewers, our job isn’t just to assess — it’s to reveal. Not just whether someone passes, but what they’re capable of becoming. At its best, empathetic interviewing isn’t a gate — it’s a bridge. A bridge to mutual understanding, respect, and possibly, a long-term partnership grounded not just in technical skills, but in human potential beyond the code. The interview isn’t just a filter — it’s a mirror. The interview reflects who we are. Our questions, our feedback, our presence — they signal the culture we’re building, and the kind of teammates we strive to be. Let’s raise the bar on both sides of the table. Kindly, thoughtfully, and together. Photo by Shane Rounce on Unsplash If you’re also a hiring manager passionate about designing meaningful interviews, let’s connect on LinkedIn (https://www.linkedin.com/in/hanqi91/). I’d be happy to share more about the exercises I prepared. Resources Writing useful commit messages: https://refactoringenglish.com/chapters/commit-messages/ Writing impactful proposals: https://www.amazon.sg/Pyramid-Principle-Logic-Writing-Thinking/dp/0273710516 http://highagency.com/ Glue work: https://www.noidea.dog/glue The Missing Readme: https://www.amazon.sg/dp/1718501838 Why Code Rusts: https://www.tdda.info/why-code-rusts The post Beyond the Code: Unconventional Lessons from Empathetic Interviewing appeared first on Towards Data Science.
  • TOWARDSDATASCIENCE.COM
    Retrieval Augmented Generation (RAG) — An Introduction
The model hallucinated! It was giving me OK answers and then it just started hallucinating. We've all heard it or experienced it. Natural Language Generation models can sometimes hallucinate, i.e., they start generating text that is not quite accurate for the prompt provided. In layman's terms, they start making stuff up that is not strictly related to the given context or that is plainly inaccurate. Some hallucinations are understandable, for example, mentioning something related but not exactly the topic in question; other times the output may look like legitimate information, but it is simply not correct, it is made up.

This is clearly a problem when we start using generative models to complete tasks and intend to consume the information they generate to make decisions. The problem is not necessarily tied to how the model generates the text, but to the information it uses to generate a response. Once you train an LLM, the information encoded in the training data is crystallized; it becomes a static representation of everything the model knows up until that point in time. To make the model update its world view or its knowledge base, it needs to be retrained. However, training Large Language Models requires time and money.

One of the main motivations for developing RAG is the increasing demand for factually accurate, contextually relevant, and up-to-date generated content. [1]

When thinking about a way to make generative models aware of the wealth of new information that is created every day, researchers started exploring efficient ways to keep these models up-to-date that didn't require continuously retraining them. They came up with the idea of hybrid models, meaning generative models that have a way of fetching external information to complement the data the LLM already knows and was trained on. These models have an information retrieval component that allows them to access up-to-date data, alongside the generative capabilities they are already well known for. The goal is to ensure both fluency and factual correctness when producing text. This hybrid model architecture is called Retrieval Augmented Generation, or RAG for short.

The RAG era

Given the critical need to keep models updated in a time- and cost-effective way, RAG has become an increasingly popular architecture. Its retrieval mechanism pulls information from external sources that are not encoded in the LLM. For example, you can see RAG in action, in the real world, when you ask Gemini something about the Brooklyn Bridge. At the bottom you'll see the external sources it pulled information from.

Example of external sources being shown as part of the output of the RAG model. (Image by author)

By grounding the final output on information obtained from the retrieval module, the output of these generative AI applications is less likely to propagate biases originating from the outdated, point-in-time view of the training data they used. The second piece of the RAG architecture, the one most visible to us as consumers, is the generation model. This is typically an LLM that processes the retrieved information and generates human-like text.

RAG combines retrieval mechanisms with generative language models to enhance the accuracy of outputs. [1]

As for its internal architecture, the retrieval module relies on dense vectors to identify the relevant documents to use, while the generative model utilizes the typical LLM architecture based on transformers.
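To make that two-step flow concrete, here is a minimal, illustrative sketch (not taken from the article) of dense retrieval followed by prompt assembly: documents and the user query are embedded as vectors, the closest documents are selected by cosine similarity, and the retrieved text is placed into the prompt handed to the LLM. The embedding model name and the tiny in-memory document store are assumptions made purely for illustration; any encoder and any LLM client could take their place.

import numpy as np
from sentence_transformers import SentenceTransformer

# A tiny in-memory "document store" standing in for the external knowledge source
documents = [
    "The Brooklyn Bridge was completed in 1883 and spans the East River.",
    "RAG systems retrieve external documents to ground an LLM's answer.",
    "Dense retrievers encode queries and passages into vectors for semantic search.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed choice of embedding model
doc_vectors = encoder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (dense retrieval step)."""
    query_vector = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector  # cosine similarity, since vectors are normalized
    top_idx = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top_idx]

query = "When was the Brooklyn Bridge completed?"
context = "\n".join(retrieve(query))

# The retrieved context is prepended to the prompt before the generation step
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)  # pass this prompt to whichever LLM you use for generation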
A basic flow of the RAG system along with its components. Image and caption taken from the paper referenced in [1]. (Image by Author)

This architecture addresses very important pain points of generative models, but it is not a silver bullet. It also comes with some challenges and limitations.

The retrieval module may struggle to get the most up-to-date documents. This part of the architecture relies heavily on Dense Passage Retrieval (DPR) [2, 3]. Compared to other techniques such as BM25, which is based on TF-IDF, DPR does a much better job at finding the semantic similarity between the query and the documents. Because it leverages semantic meaning instead of simple keyword matching, it is especially useful in open-domain applications, i.e., tools like Gemini or ChatGPT, which are not necessarily experts in a particular domain but know a little bit about everything. However, DPR has its shortcomings too. The dense vector representation can lead to irrelevant or off-topic documents being retrieved, and DPR models seem to retrieve information based on knowledge that already exists within their parameters, i.e., facts must already be encoded in order to be accessible via retrieval [2].

[…] if we extend our definition of retrieval to also encompass the ability to navigate and elucidate concepts previously unknown or unencountered by the model—a capacity akin to how humans research and retrieve information—our findings imply that DPR models fall short of this mark. [2]

To mitigate these challenges, researchers have looked at adding more sophisticated query expansion and contextual disambiguation. Query expansion is a set of techniques that modify the original user query by adding relevant terms, with the goal of establishing a connection between the intent of the user's query and the relevant documents [4] (a minimal sketch of this idea is shown at the end of this section).

There are also cases where the generative module fails to fully take into account, in its responses, the information gathered in the retrieval phase. To address this, there have been new improvements in attention and hierarchical fusion techniques [5].

Model performance is an important metric, especially when the goal of these applications is to be a seamless part of our day-to-day lives and make the most mundane tasks almost effortless. However, running RAG end-to-end can be computationally expensive: for every query the user makes, there needs to be one step for information retrieval and another for text generation. This is where techniques such as model pruning [6] and knowledge distillation [7] come into play, to ensure that even with the additional step of searching for up-to-date information outside of the trained model data, the overall system is still performant.

Lastly, while the information retrieval module in the RAG architecture is intended to mitigate bias by accessing external sources that are more up-to-date than the data the model was trained on, it may not fully eliminate bias. If the external sources are not meticulously chosen, they can continue to add bias or even amplify existing biases from the training data.
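As an illustration of the query expansion idea mentioned above, here is one very simple, hypothetical approach (not from the article): ask an LLM to propose additional search terms and append them to the original query before it is sent to the retriever. The ask_llm function below is a placeholder for whatever model client you use, not part of any specific RAG framework.

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to your own LLM client or API (hypothetical)."""
    raise NotImplementedError("wire this up to the model of your choice")

def expand_query(query: str, n_terms: int = 5) -> str:
    """Append LLM-suggested related terms to the query to improve retrieval recall."""
    suggestion_prompt = (
        f"List {n_terms} short search terms closely related to this question, "
        f"separated by commas, with no explanations:\n{query}"
    )
    extra_terms = ask_llm(suggestion_prompt)
    # The expanded query is what gets sent to the retrieval module
    # (e.g. the dense retriever sketched earlier) instead of the raw user query.
    return f"{query} {extra_terms}"

# Example, once ask_llm is implemented:
# expand_query("Who designed the Brooklyn Bridge?")
# -> "Who designed the Brooklyn Bridge? John A. Roebling, suspension bridge, ..."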
Conclusion

Utilizing RAG in generative applications provides a significant improvement in a model's capacity to stay up-to-date and gives its users more accurate results. When used in domain-specific applications, its potential is even clearer: with a narrower scope and an external library of documents pertaining only to a particular domain, these models can retrieve new information much more effectively. However, ensuring generative models are constantly up-to-date is far from a solved problem. Technical challenges, such as handling unstructured data or ensuring model performance, continue to be active research topics.

Hope you enjoyed learning a bit more about RAG, and the role this type of architecture plays in keeping generative applications up-to-date without requiring the model to be retrained. Thanks for reading!

References

[1] Gupta, S., Ranjan, R., & Singh, S. N. (2024). A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions. (ArXiv)
[2] Reichman, B., & Heck, L. (2024). Retrieval-Augmented Generation: Is Dense Passage Retrieval Retrieving? (link)
[3] Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. T. (2020). Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 6769-6781). (ArXiv)
[4] Koo, H., Kim, M., & Hwang, S. J. (2024). Optimizing Query Generation for Enhanced Document Retrieval in RAG. (ArXiv)
[5] Izacard, G., & Grave, E. (2021). Leveraging passage retrieval with generative models for open domain question answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 874-880). (ArXiv)
[6] Han, S., Pool, J., Tran, J., & Dally, W. J. (2015). Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems (pp. 1135-1143). (ArXiv)
[7] Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108. (ArXiv)

The post Retrieval Augmented Generation (RAG) — An Introduction appeared first on Towards Data Science.
  • TOWARDSDATASCIENCE.COM
    Beginner’s Guide to Creating a S3 Storage on AWS
Introduction

AWS is a well-known cloud provider whose primary goal is to allocate server resources for software engineers to deploy their applications. AWS offers many services, one of which is EC2, which provides virtual machines for running software applications in the cloud. However, for data-intensive applications, storing data inside EC2 instances is not always the optimal choice. While EC2 offers fast read and write speeds, it is not optimized for scalability. A better alternative is to use S3 storage instead.

Storing data in EC2 vs S3

Amazon S3 was specifically designed for storing massive amounts of unstructured data:
It has a highly reliable resilience system, thanks to which the durability rate exceeds 99.99%.
S3 automatically replicates data across multiple servers to prevent potential data loss.
It seamlessly integrates with other AWS services for data analytics and machine learning.
Storing data in S3 is significantly more cost-effective compared to EC2.

The main use case where EC2 might be preferred is when frequent data access is required, for example during machine learning model training, where the dataset must be read repeatedly for each batch. In most other cases, S3 is the better choice.

About this article

The objective of this article is to demonstrate how to create a basic S3 storage. By the end of the tutorial, we will have a functioning S3 bucket that allows remote access to uploaded images. To keep the focus on the key aspects, we will cover only the storage creation process and not dive into best security practices.

Tutorial

# 01. Create S3 storage

To perform any operations related to S3 storage management, select the Storage option from the service menu. In the submenu that appears, choose S3. AWS organizes data into collections called buckets. To create a bucket, click Create bucket. Each bucket requires a globally unique name. Most other settings can be left at their defaults. Once all options are selected, click Create bucket. After a few seconds, AWS will redirect you to the bucket management panel.

# 02. Create folder (optional step)

Folders in S3 function similarly to standard computer folders, helping to organize hierarchical data. Additionally, any file stored in an S3 folder will have a URL prefix that includes the folder path. To create a folder, click the Create folder button. In the window that appears, choose a custom name for the folder. After clicking the Create folder button, the folder will be created! You can now navigate to it. Since no images have been uploaded yet, the folder is empty for now, but we will add images in step 4.

# 03. Adjust data access

As a reminder, our goal is to create publicly visible image storage that allows remote access. To achieve this, we need to adjust the data access policies. By clicking on the Permissions tab under the bucket name, you will see a list of options to modify access settings. We need to unblock public access, so click on the respective Edit button in the interface and uncheck all the checkboxes related to access blocking. After saving the changes, we should see an exclamation mark icon with the "Off" text. Then, navigate to the Bucket policy section and click Edit. To allow read access, insert a policy like the one shown below:
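The article's original policy text is not reproduced in this extraction, so what follows is a minimal, typical public-read bucket policy for this step rather than the author's exact text. The bucket name my-image-bucket is a placeholder that must be replaced with your own bucket name; the snippet builds the policy as a Python dictionary and prints the JSON you can paste into the Bucket policy editor.

import json

BUCKET_NAME = "my-image-bucket"  # placeholder: replace with your actual bucket name

# Standard public-read policy: lets anyone perform s3:GetObject on objects in the bucket
public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*",
        }
    ],
}

# Paste the printed JSON into the Bucket policy editor in the AWS console
print(json.dumps(public_read_policy, indent=2))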
# 04. Upload images

Now it is time to upload images. To do that, navigate to the created "images" folder and click on the Upload button. Click on the Add files button, which will open a file explorer on your computer. Choose and import the images from there. Depending on the number and size of the imported images, AWS might take some time to process them. In this example, I have imported nine images.

# 05. Access data

After the images have been successfully imported, click on any of their filenames to get more information. In the panel that opens, you will see metadata related to the chosen image. As we can see in the "Object URL" field, AWS created a unique URL for our image! Additionally, we can notice that the URL contains the images/ prefix, which corresponds exactly to the folder structure we defined above! Finally, since we have authorized read access, we can now publicly access this URL. If you copy the image URL into the browser's address bar, the image will be displayed!

The amazing part about this is that you can now create a URL template in the form https://<bucket_url>/<folder_path>/<filename>. By doing so, you can dynamically replace the <filename> field in a program to access images and perform data manipulation.

Conclusion

In this article, we have introduced the AWS S3 storage system, which is very useful for storing large amounts of unstructured data. With its advanced scalability and security mechanisms, S3 is perfect for organizing massive data volumes at a much lower cost compared to EC2 instances.

All images are by the author unless noted otherwise.

Connect with me
Medium
LinkedIn

The post Beginner's Guide to Creating a S3 Storage on AWS appeared first on Towards Data Science.
  • TOWARDSDATASCIENCE.COM
    Building a Personal API for Your Data Projects with FastAPI
How many times have you had a messy Jupyter Notebook filled with copy-pasted code just to re-use some data wrangling logic? Whether you do it for passion or for work, if you code a lot, then you've probably answered something like "way too many". You're not alone. Maybe you tried to share data with colleagues or plug your latest ML model into a slick dashboard, but sending CSVs around or rebuilding the dashboard from scratch doesn't feel right. Here's today's fix (and topic): build yourself a personal API. In this post, I'll show you how to set up a lightweight, powerful FastAPI service to expose your datasets or models and finally give your data projects the modularity they deserve.

Whether you're a solo data science enthusiast, a student with side projects, or a seasoned ML engineer, this is for you. And no, I'm not being paid to promote it. It'd be good, but the reality is far from that. I just happen to enjoy using it and I thought it was worth sharing.

Let's review today's table of contents:
What is a personal API? (And why should you care?)
Some use cases
Setting it up with FastAPI
Conclusion

What Is a Personal API? (And Why Should You Care?)

99% of people reading this will already be familiar with the API concept. But for that 1%, here's a brief intro that will be complemented with code in the next sections: An API (Application Programming Interface) is a set of rules and tools that allows different software applications to communicate with each other. It defines what you can ask a program to do, such as "give me the weather forecast" or "send a message", and that program handles the request behind the scenes and returns the result.

So, what is a personal API? It's essentially a small web service that exposes your data or logic in a structured, reusable way. Think of it like a mini app that responds to HTTP requests with JSON versions of your data. Why would that be a good idea? In my opinion, it has several advantages:
Reusability: as already mentioned, we can use it from our notebooks, dashboards or scripts without having to rewrite the same code several times.
Collaboration: your teammates can easily access your data through the API endpoints without needing to duplicate your code or download the same datasets to their machines.
Portability: you can deploy it anywhere — locally, on the cloud, in a container, or even on a Raspberry Pi.
Testing: need to test a new feature or model update? Push it to your API and instantly test across all clients (notebooks, apps, dashboards).
Encapsulation and versioning: you can version your logic (v1, v2, etc.) and separate raw data from processed logic cleanly. That's a huge plus for maintainability.

And FastAPI is perfect for this. But let's see some real use cases where anyone like you and me would benefit from a personal API.

Some Use Cases

Whether you're a data scientist, analyst, ML engineer, or just building cool stuff on weekends, a personal API can become your secret productivity weapon. Here are three examples:
Model-as-a-service (MaaS): train an ML model locally and expose it to the public through an endpoint like /predict. The options from here are endless: rapid prototyping, integrating it into a frontend…
Dashboard-ready data: serve preprocessed, clean, and filtered datasets to BI tools or custom dashboards. You can centralize the logic in your API, so the dashboard stays lightweight and doesn't re-implement filtering or aggregation.
Reusable data access layer: when working on a project that contains multiple notebooks, has it ever happened to you that the first cells of all of them always contain the same code? Well, what if you centralized all that code in your API and got it done with a single request? Yes, you could modularize it and call a function to do the same, but creating the API allows you to go one step further, being able to use it easily from anywhere (not just locally).

I hope you get the point. The options are endless, just like its usefulness. But let's get to the interesting part: building the API.

Setting it up with FastAPI

As always, start by setting up the environment with your favorite env tool (venv, pipenv…). Then, install fastapi and uvicorn with pip install fastapi uvicorn. Let's understand what they do:
FastAPI[1]: the library that allows us to develop the API itself.
Uvicorn[2]: the web server that will run the API.

Once installed, we only need one file. For simplicity, we'll call it app.py. Let's now put some context into what we'll do:

Imagine we're building a smart irrigation system for our vegetable garden at home. The irrigation system is quite simple: we have a moisture sensor that reads the soil moisture at a certain frequency, and we want to activate the system when it's below 30%. Of course, we want to automate it locally, so that when it hits the threshold it starts dropping water. But we're also interested in being able to access the system remotely, maybe reading the current value or even triggering the water pump if we want to. That's when the personal API comes in handy.

Here's the basic code that will allow us to do just that (note that I'm using another library, duckdb[3], because that's where I would store the data — but you could just use sqlite3, pandas, or whatever you like):

import datetime

from fastapi import FastAPI, Query
import duckdb

app = FastAPI()
conn = duckdb.connect("moisture_data.db")

@app.get("/last_moisture")
def get_last_moisture():
    # Return the most recent sensor reading
    query = "SELECT * FROM moisture_reads ORDER BY day DESC, time DESC LIMIT 1"
    return conn.execute(query).df().to_dict(orient="records")

@app.get("/moisture_reads/{day}")
def get_moisture_reads(day: datetime.date, time: datetime.time = Query(None)):
    # Return all readings for a given day, optionally filtered by a specific time
    query = "SELECT * FROM moisture_reads WHERE day = ?"
    args = [day]
    if time:
        query += " AND time = ?"
        args.append(time)
    return conn.execute(query, args).df().to_dict(orient="records")

@app.get("/trigger_irrigation")
def trigger_irrigation():
    # This is a placeholder for the actual irrigation trigger logic
    # In a real-world scenario, you would integrate with your irrigation system here
    return {"message": "Irrigation triggered"}

Reading vertically, this code separates three main blocks:
Imports
Setting up the app object and the DB connection
Creating the API endpoints

Blocks 1 and 2 are pretty straightforward, so we'll focus on the third one. What I did here was create three endpoints, each with its own function:
/last_moisture shows the last sensor value (the most recent one).
/moisture_reads/{day} is useful to see the sensor reads from a single day. For example, if I wanted to compare moisture levels in winter with the ones in summer, I would check what's in /moisture_reads/2024-01-01 and observe the differences with /moisture_reads/2024-08-01. But I've also made it able to read query parameters in case I'm interested in a specific time, for example: /moisture_reads/2024-01-01?time=10:00
/trigger_irrigation would do what the name suggests.
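Not from the original article, but to make the reusability point concrete: once the server is running (we start it in the next step), any notebook or script can consume these endpoints with plain HTTP calls, for example using the requests library. The localhost address below assumes the default local development server.

import requests

BASE_URL = "http://localhost:8000"  # assumes the default local uvicorn address

# Latest sensor reading, decoded from JSON into Python objects
last = requests.get(f"{BASE_URL}/last_moisture").json()
print(last)

# All readings for a given day, optionally narrowed down to a specific time
readings = requests.get(
    f"{BASE_URL}/moisture_reads/2024-01-01", params={"time": "10:00"}
).json()
print(readings)

# Manually trigger the irrigation system
print(requests.get(f"{BASE_URL}/trigger_irrigation").json())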
So we're only missing one part: starting the server. See how simple it is to run it locally:

uvicorn app:app --reload

Now I could visit:
http://localhost:8000/last_moisture to see the last moisture reading
http://localhost:8000/moisture_reads/2024-01-01 to see the moisture levels of January 1st, 2024
http://localhost:8000/trigger_irrigation to start pumping water

But it doesn't end here. FastAPI provides another endpoint, found at http://localhost:8000/docs, which shows auto-generated interactive documentation for our API. It's extremely useful when the API is collaborative, because we don't need to check the code to see all the endpoints we have access to!

And with just a few lines of code, very few in fact, we've been able to build our personal API. It can obviously get a lot more complicated (and probably should), but that wasn't today's purpose.

Conclusion

With just a few lines of Python and the power of FastAPI, you've now seen how easy it is to expose your data or logic through a personal API. Whether you're building a smart irrigation system, exposing a machine learning model, or just tired of rewriting the same wrangling logic across notebooks — this approach brings modularity, collaboration, and scalability to your projects.

And this is just the beginning. You could:
Add authentication and versioning
Deploy to the cloud or a Raspberry Pi
Chain it to a frontend or a Telegram bot
Turn your portfolio into a living, breathing project hub

If you've ever wanted your data work to feel like a real product — this is your gateway. Let me know if you build something cool with it. Or even better, send me the URL to your /predict, /last_moisture, or whatever API you've made. I'd love to see what you come up with.

Resources

[1] Ramírez, S. (2018). FastAPI (Version 0.109.2) [Computer software]. https://fastapi.tiangolo.com
[2] Encode. (2018). Uvicorn (Version 0.27.0) [Computer software]. https://www.uvicorn.org
[3] Mühleisen, H., Raasveldt, M., & DuckDB Contributors. (2019). DuckDB (Version 0.10.2) [Computer software]. https://duckdb.org

The post Building a Personal API for Your Data Projects with FastAPI appeared first on Towards Data Science.