Design Week @DesignWeek shared a link
2025-06-07 21:31:40

design/leader: Studio Noel founder Michelle Noel

6 June, 2025

In our weekly interview series, design leaders answer five questions about design, and five questions about leadership.

Michelle Noel is the founder and strategy director at London branding and design agency Studio Noel, whose clients include Centrepoint, the Natural History Museum, and Imperial College London.
Design
What would your monograph be called?
I actually asked my team about this one, and this is what they came up with – That Pencil Better Be Sharp. This may be down to me only ever writing in pencil and them taking the piss!
What recent design work made you a bit jealous?
It’s not recent, but I still love Pentagram’s Battersea Cats and Dogs home rebrand. I love the energy in the identity and the illustrative approach which feels really expressive and representative of all the different breeds.
What’s an unusual place you get inspiration from?
It’s always been about going outside and getting away from the screen. Walking helps me get inspired as it gives my mind the ability to be still, and I find then that ideas naturally come.
Name something that is brilliantly designed, but overlooked.
The pencil sharpener. There are some beautifully designed old ones that are really interesting in how they work.
What object in your studio best sums up your taste?
We commissioned Rachel Joy to create a piece of art for our studio. I love the bold typography and colour palettes that she uses in all of her signwriting artwork.
Studio Noel’s Rachel Joy artwork
Leadership
What feedback felt brutal at the time but turned out to be useful?
When we first started out as an agency, our website and logo were intentionally minimal. But during a pitch, a client casually remarked, “We didn’t really have a brand identity.”
It was a passing comment, but it stuck with me. I realised that while we were focused on our clients, our own identity wasn’t shining through, especially in our proposals. That moment prompted a shift.
We went back to refine and strengthen how we presented ourselves, making sure our brand was clear, consistent and confidently expressed in everything we created from that point on.
What’s an underappreciated skill that design leaders need?
Leadership is about stepping back and giving others the space to grow. Creating room for more junior team members to try and even fail. This accelerates their development and builds confidence.
We’ve also learned that great ideas can come from anyone in the business, regardless of their role or level. Embracing that mindset has been key to our agency’s growth.
What keeps you up at night?
Often it’s an idea or opportunity sparked by something I’ve read. It could be a new way to approach a strategy, a shift in how we work, or even a big-picture concept that could shape our growth.
What trait is non-negotiable in new hires?
We’re looking for curious, collaborative individuals eager to learn and grow with us. There’s no room for big egos. As we grow the business, we want people who are just as invested in shaping its future.
Complete this sentence, “I wish more clients…”
…prioritised accessibility in their branding. We do a lot of work in this area, including training teams on both the client and agency side, and we’ve seen first-hand how powerful it can be.
Often, the barrier is simply a lack of knowledge, both in what makes a brand accessible and in the broader benefits, like significantly increasing audience reach. With nearly one in five people in the UK living with a disability, that’s a huge portion of their audience that could be unintentionally excluded.


www.designweek.co.uk
abdz @abdz shared a link
2025-06-06 08:26:40

Leiria Film Fest: Playful Branding & Visual Identity

06/05 — 2025

by abduzeedo

Explore the vibrant branding and visual identity of Leiria Film Fest 2025, designed for broad appeal and lasting impact.
The 12th edition of the Leiria Film Fest, held from May 19-25, 2025, presented an opportunity for a compelling visual refresh. Paulo Graça, known as Senhor Paulinho, embraced a design philosophy that is both vibrant and inclusive, setting a new standard for film festival branding and visual identity. His work reflects cinema's power to captivate audiences across generations. This approach moves beyond simple promotion. It seeks to forge an emotional connection, celebrating cinema's universal appeal.
Senhor Paulinho's graphic concept uses an illustrative, almost "naive" aesthetic. This choice relies on simple shapes and hand-drawn lines. These elements evoke the spontaneity seen in children's drawings. They also recall the classic texture of film stock. This handcrafted feel bridges the gap between creating art and imagining stories. These two acts are central to both filmmaking and drawing. Such a visual identity extends beyond mere aesthetics. It serves a deeper symbolic purpose: inviting all ages into the cinematic experience. It creates an accessible and emotionally resonant graphic universe, paying tribute to projection, the moving image, and cinema's visual memory.
The visual elements were designed for versatility. They scale across various platforms without losing impact. From printed materials like posters and programs to digital banners, festival merchandise, and venue signage, the branding remains cohesive. This adaptability ensures a strong visual identity throughout all communication stages. The imagery, featuring bold eyes and abstract film-like motifs, resonates with the festival's focus on competitive and special sessions, children's events, and even an after-party.
This comprehensive branding and visual identity strategy aims to do more than simply advertise an event. It aims to spark curiosity and foster emotional engagement. It celebrates cinema's enduring ability to unite, inspire, and ignite imagination. The design extends an open invitation, welcoming everyone to a shared cultural experience that is both joyful and profound. It reinforces the festival's commitment to community and the arts.
The success of the Leiria Film Fest's visual identity lies in its ability to communicate warmth and accessibility while maintaining a sophisticated design language. It is a testament to how thoughtful branding can elevate an event, making it more memorable and inviting. This project provides a valuable case study for designers seeking to create impactful and inclusive visual narratives.
Graphic Design by Paulo Graça a.k.a. Senhor Paulinho. Visit his work at https://paulograca.com/en/projects/leiria-film-fest-2025.

abduzeedo.com
Microsoft Academic @MicrosoftAcademic shared a link
2025-06-06 07:52:50

BenchmarkQED: Automated benchmarking of RAG systems

One of the key use cases for generative AI involves answering questions over private datasets, with retrieval-augmented generation as the go-to framework. As new RAG techniques emerge, there’s a growing need to benchmark their performance across diverse datasets and metrics.
To meet this need, we’re introducing BenchmarkQED, a new suite of tools that automates RAG benchmarking at scale, available on GitHub. It includes components for query generation, evaluation, and dataset preparation, each designed to support rigorous, reproducible testing.
BenchmarkQED complements the RAG methods in our open-source GraphRAG library, enabling users to run a GraphRAG-style evaluation across models, metrics, and datasets. GraphRAG uses a large language model to generate and summarize entity-based knowledge graphs, producing more comprehensive and diverse answers than standard RAG for large-scale tasks.
In this post, we walk through the core components of BenchmarkQED that contribute to the overall benchmarking process. We also share some of the latest benchmark results comparing our LazyGraphRAG system to competing methods, including a vector-based RAG with a 1M-token context window, where the leading LazyGraphRAG configuration showed significant win rates across all combinations of quality metrics and query classes.
In the paper, we distinguish between local queries, where answers are found in a small number of text regions, and sometimes even a single region, and global queries, which require reasoning over large portions of or even the entire dataset.
Conventional vector-based RAG excels at local queries because the regions containing the answer to the query resemble the query itself and can be retrieved as the nearest neighbor in the vector space of text embeddings. However, it struggles with global questions, such as, “What are the main themes of the dataset?” which require understanding dataset qualities not explicitly stated in the text.
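To make the nearest-neighbor idea concrete, here is a minimal sketch of similarity-based retrieval. The three-dimensional vectors, the chunk names, and the helper `retrieve_top_k` are all made up for illustration; a real system would use learned text embeddings and an approximate nearest-neighbor index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query (hypothetical helper)."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:k]]


# Toy "embeddings": invented numbers, not output from any real model.
chunks = {
    "vaccine_trial": [0.9, 0.1, 0.0],
    "hospital_funding": [0.1, 0.9, 0.1],
    "sports_injury": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # a local query that resembles the first chunk

top = retrieve_top_k(query, chunks, k=2)
print(top)
```

A global question such as "What are the main themes of the dataset?" has no single chunk that resembles it, which is why this kind of ranking alone tends to fall short there.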
AutoQ: Automated query synthesis
This limitation motivated the development of GraphRAG, a system designed to answer global queries. GraphRAG’s evaluation requirements subsequently led to the creation of AutoQ, a method for synthesizing these global queries for any dataset.
AutoQ extends this approach by generating synthetic queries across the spectrum of queries, from local to global. It defines four distinct classes based on the source and scope of the query, forming a logical progression along the spectrum.
Figure 1. Construction of a 2×2 design space for synthetic query generation with AutoQ, showing how the four resulting query classes map onto the local-global query spectrum.
AutoQ can be configured to generate any number and distribution of synthetic queries along these classes, enabling consistent benchmarking across datasets without requiring user customization. Figure 2 shows the synthesis process and sample queries from each class, using an AP News dataset.
Figure 2. Synthesis process and example query for each of the four AutoQ query classes.
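As a rough illustration of what "any number and distribution" of queries could mean in practice, the sketch below splits a total query budget across four classes in proportion to user-chosen weights. This is not BenchmarkQED's configuration format: the function `allocate_queries` is hypothetical, and of the class names used here only DataLocal and ActivityLocal appear in this post, so DataGlobal and ActivityGlobal are assumed by analogy with the 2×2 design.

```python
def allocate_queries(total, weights):
    """Split `total` queries across classes in proportion to `weights`.

    Uses largest-remainder rounding so the counts always sum to `total`.
    """
    weight_sum = sum(weights.values())
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: int(value) for name, value in exact.items()}  # floor
    leftover = total - sum(counts.values())
    # Hand the remaining queries to the classes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Class names are partly assumed (see the note above).
even = allocate_queries(
    200,
    {"DataLocal": 1, "ActivityLocal": 1, "DataGlobal": 1, "ActivityGlobal": 1},
)
skewed = allocate_queries(
    100,
    {"DataLocal": 1, "ActivityLocal": 1, "DataGlobal": 3, "ActivityGlobal": 3},
)
print(even)
print(skewed)
```

An even weighting reproduces a "same number per class" setup, while a skewed weighting would emphasize global queries.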

AutoE: Automated evaluation framework
Our evaluation of GraphRAG focused on analyzing key qualities of answers to global questions. The following qualities were used for the current evaluation:

Comprehensiveness: Does the answer address all relevant aspects of the question?
Diversity: Does it present varied perspectives or insights?
Empowerment: Does it help the reader understand and make informed judgments?
Relevance: Does it address what the question is specifically asking?

The AutoE component scales evaluation of these qualities using the LLM-as-a-Judge method. It presents pairs of answers to an LLM, along with the query and target metric, in counterbalanced order. The model determines whether the first answer wins, loses, or ties with the second. Over a set of queries, whether from AutoQ or elsewhere, this produces win rates between competing methods. When ground truth is available, AutoE can also score answers on correctness, completeness, and related metrics.
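The counterbalancing step can be sketched as follows. This is only an illustration of the idea, not AutoE's code: `run_counterbalanced_trials` is a hypothetical helper, the strict alternation of answer order is an assumption about how counterbalancing might be done, and the two "judges" are plain functions standing in for an LLM call.

```python
def run_counterbalanced_trials(query, metric, answer_a, answer_b, judge, trials=6):
    """Run pairwise trials, alternating which answer is shown first.

    `judge` returns "first", "second", or "tie". Outcomes are reported from
    answer A's point of view as "win", "loss", or "tie".
    """
    flip = {"first": "second", "second": "first", "tie": "tie"}
    to_outcome = {"first": "win", "second": "loss", "tie": "tie"}
    outcomes = []
    for t in range(trials):
        a_first = t % 2 == 0  # assumed scheme: A first on even trials, B first on odd
        first, second = (answer_a, answer_b) if a_first else (answer_b, answer_a)
        verdict = judge(query, metric, first, second)
        if verdict not in flip:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        if not a_first:
            verdict = flip[verdict]  # map back to A's perspective
        outcomes.append(to_outcome[verdict])
    return outcomes


def length_judge(query, metric, first, second):
    """Stand-in judge: prefers the longer answer, ignoring position."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


def position_biased_judge(query, metric, first, second):
    """Stand-in judge with a pure position bias: always picks the first answer."""
    return "first"


fair = run_counterbalanced_trials(
    "q", "comprehensiveness", "a longer answer", "short", length_judge
)
biased = run_counterbalanced_trials(
    "q", "comprehensiveness", "a longer answer", "short", position_biased_judge
)
print(fair)
print(biased)
```

With the position-biased judge, wins and losses alternate and cancel out, which is the motivation for presenting each pair in both orders.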
An illustrative evaluation is shown in Figure 3. Using a dataset of 1,397 AP News articles on health and healthcare, AutoQ generated 50 queries per class. AutoE then compared LazyGraphRAG to a competing RAG method, running six trials per query across four metrics, using GPT-4.1 as a judge.
These trial-level results were aggregated using metric-based win rates, where each trial is scored 1 for a win, 0.5 for a tie, and 0 for a loss, and then averaged to calculate the overall win rate for each RAG method.
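The scoring rule described here (1 for a win, 0.5 for a tie, 0 for a loss, then averaged) is simple enough to write down directly. The function name `win_rate` and the example outcomes are illustrative, not taken from the BenchmarkQED code or results.

```python
def win_rate(outcomes):
    """Average trial scores: win = 1, tie = 0.5, loss = 0."""
    scores = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    if not outcomes:
        raise ValueError("at least one trial outcome is required")
    return sum(scores[o] for o in outcomes) / len(outcomes)


# Made-up outcomes for six trials of one query on one metric.
trials = ["win", "win", "tie", "loss", "win", "tie"]
rate = win_rate(trials)
print(round(rate, 3))      # method under test
print(round(1 - rate, 3))  # the competing method's rate is the complement
```

Because ties are split evenly, the two methods' win rates always sum to 1, so a value above 0.5 means the first method came out ahead.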
Figure 3. Win rates of four LazyGraphRAG configurations across methods, broken down by the AutoQ query class and averaged across AutoE’s four metrics: comprehensiveness, diversity, empowerment, and relevance. LazyGraphRAG outperforms comparison conditions where the bar is above 50%.
The four LazyGraphRAG conditions differ by query budget and chunk size. All used GPT-4o mini for relevance tests and GPT-4o for query expansion and answer generation, except for LGR_b200_c200_mini, which used GPT-4o mini throughout.
Comparison systems were GraphRAG, Vector RAG with 8k- and 120k-token windows, and three published methods: LightRAG, RAPTOR, and TREX. All methods were limited to the same 8k tokens for answer generation. GraphRAG Global Search used level 2 of the community hierarchy.
LazyGraphRAG outperformed every comparison condition using the same generative model, winning all 96 comparisons, with all but one reaching statistical significance. The best overall performance came from the larger budget, smaller chunk size configuration. For DataLocal queries, the smaller budget performed slightly better, likely because fewer chunks were relevant. For ActivityLocal queries, the larger chunk size had a slight edge, likely because longer chunks provide a more coherent context.
Competing methods performed relatively better on the query classes for which they were designed: GraphRAG Global for global queries and Vector RAG for local queries. GraphRAG Drift Search, which combines both strategies, posed the strongest challenge overall.
Increasing Vector RAG’s context window from 8k to 120k tokens did not improve its performance compared to LazyGraphRAG. This raised the question of how LazyGraphRAG would perform against Vector RAG with a 1-million-token context window containing most of the dataset.
Figure 4 shows the follow-up experiment, which compared LazyGraphRAG to Vector RAG using GPT-4.1, the model that enabled this comparison. Even against the 1M-token window, LazyGraphRAG achieved higher win rates across all comparisons, failing to reach significance only for the relevance of answers to DataLocal queries. These queries tend to benefit most from Vector RAG’s ranking of directly relevant chunks, making it hard for LazyGraphRAG to generate answers that have greater relevance to the query, even though these answers may be dramatically more comprehensive, diverse, and empowering overall.
Figure 4. Win rates of LazyGraphRAG over Vector RAG across different context window sizes, broken down by the four AutoQ query classes and four AutoE metrics: comprehensiveness, diversity, empowerment, and relevance. Bars above 50% indicate that LazyGraphRAG outperformed the comparison condition.
AutoD: Automated data sampling and summarization
Text datasets have an underlying topical structure, but the depth, breadth, and connectivity of that structure can vary widely. This variability makes it difficult to evaluate RAG systems consistently, as results may reflect the idiosyncrasies of the dataset rather than the system’s general capabilities.
The AutoD component addresses this by sampling datasets to meet a target specification, defined by the number of topic clusters and the number of samples per cluster. This creates consistency across datasets, enabling more meaningful comparisons, as structurally aligned datasets lead to comparable AutoQ queries, which in turn support consistent AutoE evaluations.
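Sampling to a target specification might look something like the sketch below. It assumes each document already carries a topic-cluster label; how AutoD actually derives clusters is not described in this post, and `sample_to_spec` is a hypothetical helper rather than part of BenchmarkQED.

```python
import random
from collections import defaultdict


def sample_to_spec(docs, num_clusters, samples_per_cluster, seed=0):
    """Pick `num_clusters` clusters and `samples_per_cluster` documents from each.

    `docs` is a list of (doc_id, cluster_label) pairs. Clusters that are too
    small to supply the requested number of samples are skipped.
    """
    by_cluster = defaultdict(list)
    for doc_id, cluster in docs:
        by_cluster[cluster].append(doc_id)

    eligible = sorted(c for c, ids in by_cluster.items() if len(ids) >= samples_per_cluster)
    if len(eligible) < num_clusters:
        raise ValueError("not enough clusters large enough to meet the specification")

    rng = random.Random(seed)  # seeded so the sample is reproducible
    chosen = rng.sample(eligible, num_clusters)
    return {c: sorted(rng.sample(by_cluster[c], samples_per_cluster)) for c in chosen}


# Toy corpus: four topics of 5 to 6 documents each, plus one undersized cluster.
docs = [(f"doc{i}", f"topic{i % 4}") for i in range(22)] + [("extra", "tiny")]
sample = sample_to_spec(docs, num_clusters=3, samples_per_cluster=4)
for cluster, ids in sample.items():
    print(cluster, ids)
```

Applying the same specification to two different corpora yields samples with the same shape (here, 3 clusters of 4 documents), which is the kind of structural alignment the paragraph above describes.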
AutoD also includes tools for summarizing input or output datasets in a way that reflects their topical coverage. These summaries play an important role in the AutoQ query synthesis process, but they can also be used more broadly, such as in prompts where context space is limited.
Since the release of the GraphRAG paper, we’ve received many requests to share the dataset of the Behind the Tech podcast transcripts we used in our evaluation. An updated version of this dataset is now available in the BenchmarkQED repository, alongside the AP News dataset containing 1,397 health-related articles, licensed for open release.
We hope these datasets, together with the BenchmarkQED tools, help accelerate benchmark-driven development of RAG systems and AI question-answering. We invite the community to try them on GitHub.
The AutoD component addresses this by sampling datasets to meet a target specification, defined by the number of topic clustersand the number of samples per cluster. This creates consistency across datasets, enabling more meaningful comparisons, as structurally aligned datasets lead to comparable AutoQ queries, which in turn support consistent AutoE evaluations. AutoD also includes tools for summarizing input or output datasets in a way that reflects their topical coverage. These summaries play an important role in the AutoQ query synthesis process, but they can also be used more broadly, such as in prompts where context space is limited. Since the release of the GraphRAG paper, we’ve received many requests to share the dataset of the Behind the Tech podcast transcripts we used in our evaluation. An updated version of this dataset is now available in the BenchmarkQED repository, alongside the AP News dataset containing 1,397 health-related articles, licensed for open release.   We hope these datasets, together with the BenchmarkQED tools, help accelerate benchmark-driven development of RAG systems and AI question-answering. We invite the community to try them on GitHub. Opens in a new tab #benchmarkqedautomatedbenchmarking #ofrag #systems

BenchmarkQED: Automated benchmarking of RAG systems

www.microsoft.com
One of the key use cases for generative AI involves answering questions over private datasets, with retrieval-augmented generation (RAG) as the go-to framework. As new RAG techniques emerge, there’s a growing need to benchmark their performance across diverse datasets and metrics.
To meet this need, we’re introducing BenchmarkQED, a new suite of tools that automates RAG benchmarking at scale, available on GitHub. It includes components for query generation, evaluation, and dataset preparation, each designed to support rigorous, reproducible testing.
BenchmarkQED complements the RAG methods in our open-source GraphRAG library, enabling users to run a GraphRAG-style evaluation across models, metrics, and datasets. GraphRAG uses a large language model (LLM) to generate and summarize entity-based knowledge graphs, producing more comprehensive and diverse answers than standard RAG for large-scale tasks.
In this post, we walk through the core components of BenchmarkQED that contribute to the overall benchmarking process. We also share some of the latest benchmark results comparing our LazyGraphRAG system to competing methods, including a vector-based RAG with a 1M-token context window, where the leading LazyGraphRAG configuration showed significant win rates across all combinations of quality metrics and query classes.
In the GraphRAG paper, we distinguish between local queries, where answers are found in a small number of text regions (sometimes even a single region), and global queries, which require reasoning over large portions of the dataset or even all of it. Conventional vector-based RAG excels at local queries because the regions containing the answer to the query resemble the query itself and can be retrieved as the nearest neighbor in the vector space of text embeddings. However, it struggles with global questions, such as “What are the main themes of the dataset?”, which require understanding dataset qualities not explicitly stated in the text.
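To make the nearest-neighbor intuition concrete, here is a minimal sketch in Python. It is not GraphRAG or BenchmarkQED code: the bag-of-words `embed` function is a toy stand-in for a real text-embedding model, and the chunk texts are invented for illustration. It only shows why a local query lands on the chunk that resembles it, while a global question resembles no chunk in particular.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query (nearest neighbors)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Invented example chunks (assumption, not from any real dataset).
chunks = [
    "the clinic opened a new vaccination site in march",
    "hospital staffing shortages grew worse over the winter",
    "a study linked air pollution to asthma in children",
]

local_query = "when did the clinic open the vaccination site"
global_query = "what are the main themes of the dataset"

local_hit = retrieve(local_query, chunks)[0]
local_score = cosine(embed(local_query), embed(local_hit))
# Best similarity any single chunk achieves for the global question.
global_score = max(cosine(embed(global_query), embed(c)) for c in chunks)
```

In this toy setup the local query shares several words with one chunk and retrieves it, whereas the global question overlaps with the chunks only through a stopword, so no single nearest neighbor carries the answer.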
AutoQ: Automated query synthesis
This limitation motivated the development of GraphRAG, a system designed to answer global queries. GraphRAG’s evaluation requirements subsequently led to the creation of AutoQ, a method for synthesizing these global queries for any dataset. AutoQ extends this approach by generating synthetic queries across the spectrum of queries, from local to global. It defines four distinct classes based on the source and scope of the query (Figure 1, top), forming a logical progression along the spectrum (Figure 1, bottom).
Figure 1. Construction of a 2×2 design space for synthetic query generation with AutoQ, showing how the four resulting query classes map onto the local-global query spectrum.
AutoQ can be configured to generate any number and distribution of synthetic queries along these classes, enabling consistent benchmarking across datasets without requiring user customization. Figure 2 shows the synthesis process and sample queries from each class, using an AP News dataset.
Figure 2. Synthesis process and example query for each of the four AutoQ query classes.
AutoE: Automated evaluation framework
Our evaluation of GraphRAG focused on analyzing key qualities of answers to global questions. The following qualities were used for the current evaluation:
Comprehensiveness: Does the answer address all relevant aspects of the question?
Diversity: Does it present varied perspectives or insights?
Empowerment: Does it help the reader understand and make informed judgments?
Relevance: Does it address what the question is specifically asking?
The AutoE component scales evaluation of these qualities using the LLM-as-a-Judge method. It presents pairs of answers to an LLM, along with the query and target metric, in counterbalanced order. The model determines whether the first answer wins, loses, or ties with the second.
Over a set of queries, whether from AutoQ or elsewhere, this produces win rates between competing methods. When ground truth is available, AutoE can also score answers on correctness, completeness, and related metrics.
An illustrative evaluation is shown in Figure 3. Using a dataset of 1,397 AP News articles on health and healthcare, AutoQ generated 50 queries per class (200 total). AutoE then compared LazyGraphRAG to a competing RAG method, running six trials per query across four metrics, using GPT-4.1 as a judge. These trial-level results were aggregated using metric-based win rates, where each trial is scored 1 for a win, 0.5 for a tie, and 0 for a loss, and then averaged to calculate the overall win rate for each RAG method.
Figure 3. Win rates of four LazyGraphRAG (LGR) configurations across methods, broken down by the AutoQ query class and averaged across AutoE’s four metrics: comprehensiveness, diversity, empowerment, and relevance. LazyGraphRAG outperforms comparison conditions where the bar is above 50%.
The four LazyGraphRAG conditions (LGR_b200_c200, LGR_b50_c200, LGR_b50_c600, LGR_b200_c200_mini) differ by query budget (b50, b200) and chunk size (c200, c600). All used GPT-4o mini for relevance tests and GPT-4o for query expansion (to five subqueries) and answer generation, except for LGR_b200_c200_mini, which used GPT-4o mini throughout. Comparison systems were GraphRAG (Local, Global, and Drift Search), Vector RAG with 8k- and 120k-token windows, and three published methods: LightRAG, RAPTOR, and TREX. All methods were limited to the same 8k tokens for answer generation. GraphRAG Global Search used level 2 of the community hierarchy.
LazyGraphRAG outperformed every comparison condition using the same generative model (GPT-4o), winning all 96 comparisons, with all but one reaching statistical significance.
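The trial scoring described above (1 for a win, 0.5 for a tie, 0 for a loss, then averaged) can be sketched as follows. This is an illustrative sketch, not the AutoE implementation: `run_trials`, `win_rate`, and `length_judge` are hypothetical names, the judge is a trivial stand-in for an LLM call, and swapping the presentation order on every other trial is one assumed way to counterbalance.

```python
from statistics import mean

# Score assigned to each trial outcome, as described in the text.
SCORES = {"win": 1.0, "tie": 0.5, "loss": 0.0}
# Converts a verdict about the first-shown answer into the other answer's view.
FLIP = {"win": "loss", "loss": "win", "tie": "tie"}


def run_trials(judge, query, metric, answer_a, answer_b, n_trials=6):
    """Collect method A's outcomes over n_trials, alternating which answer is shown first."""
    outcomes = []
    for i in range(n_trials):
        if i % 2 == 0:
            # A is presented first; the verdict is already from A's perspective.
            outcomes.append(judge(query, metric, answer_a, answer_b))
        else:
            # B is presented first; flip the verdict back to A's perspective.
            outcomes.append(FLIP[judge(query, metric, answer_b, answer_a)])
    return outcomes


def win_rate(outcomes):
    """Average the per-trial scores into a single win rate."""
    return mean(SCORES[o] for o in outcomes)


def length_judge(query, metric, first, second):
    """Hypothetical stand-in for the LLM judge: prefers the longer answer."""
    if len(first) > len(second):
        return "win"
    if len(first) < len(second):
        return "loss"
    return "tie"


outcomes_a = run_trials(
    length_judge, "example query", "comprehensiveness",
    "a long and detailed answer", "short",
)
rate_a = win_rate(outcomes_a)
rate_b = win_rate([FLIP[o] for o in outcomes_a])
mixed_rate = win_rate(["win", "tie", "loss", "tie"])
```

Because a tie scores 0.5 for both sides, the two methods' win rates in a pairwise comparison always sum to 1, which is why a bar above 50% indicates the better-performing method.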
The best overall performance came from the larger budget, smaller chunk size configuration (LGR_b200_c200). For DataLocal queries, the smaller budget (LGR_b50_c200) performed slightly better, likely because fewer chunks were relevant. For ActivityLocal queries, the larger chunk size (LGR_b50_c600) had a slight edge, likely because longer chunks provide a more coherent context. Competing methods performed relatively better on the query classes for which they were designed: GraphRAG Global for global queries and Vector RAG for local queries. GraphRAG Drift Search, which combines both strategies, posed the strongest challenge overall.
Increasing Vector RAG’s context window from 8k to 120k tokens did not improve its performance compared to LazyGraphRAG. This raised the question of how LazyGraphRAG would perform against Vector RAG with a 1-million-token context window containing most of the dataset.
Figure 4 shows the follow-up experiment comparing LazyGraphRAG to Vector RAG using GPT-4.1, which enabled this comparison. Even against the 1M-token window, LazyGraphRAG achieved higher win rates across all comparisons, failing to reach significance only for the relevance of answers to DataLocal queries. These queries tend to benefit most from Vector RAG’s ranking of directly relevant chunks, making it hard for LazyGraphRAG to generate answers that have greater relevance to the query, even though these answers may be dramatically more comprehensive, diverse, and empowering overall.
Figure 4. Win rates of LazyGraphRAG (LGR) over Vector RAG across different context window sizes, broken down by the four AutoQ query classes and four AutoE metrics: comprehensiveness, diversity, empowerment, and relevance. Bars above 50% indicate that LazyGraphRAG outperformed the comparison condition.
AutoD: Automated data sampling and summarization
Text datasets have an underlying topical structure, but the depth, breadth, and connectivity of that structure can vary widely.
This variability makes it difficult to evaluate RAG systems consistently, as results may reflect the idiosyncrasies of the dataset rather than the system’s general capabilities. The AutoD component addresses this by sampling datasets to meet a target specification, defined by the number of topic clusters (breadth) and the number of samples per cluster (depth). This creates consistency across datasets, enabling more meaningful comparisons, as structurally aligned datasets lead to comparable AutoQ queries, which in turn support consistent AutoE evaluations.
AutoD also includes tools for summarizing input or output datasets in a way that reflects their topical coverage. These summaries play an important role in the AutoQ query synthesis process, but they can also be used more broadly, such as in prompts where context space is limited.
Since the release of the GraphRAG paper, we’ve received many requests to share the dataset of the Behind the Tech podcast transcripts we used in our evaluation. An updated version of this dataset is now available in the BenchmarkQED repository, alongside the AP News dataset containing 1,397 health-related articles, licensed for open release.
We hope these datasets, together with the BenchmarkQED tools, help accelerate benchmark-driven development of RAG systems and AI question-answering. We invite the community to try them on GitHub.
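As a rough illustration of sampling a dataset to a target specification, the sketch below picks a fixed number of topic clusters (breadth) and a fixed number of documents from each (depth). It assumes cluster labels have already been assigned to every document; `sample_to_spec` and the toy corpus are hypothetical and do not reflect AutoD's actual API.

```python
import random
from collections import defaultdict


def sample_to_spec(docs, breadth, depth, seed=0):
    """Pick `breadth` clusters and `depth` documents from each.

    docs: list of (doc_id, cluster_label) pairs with labels assigned upstream.
    Returns a dict mapping each chosen cluster label to its sampled doc ids.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_cluster = defaultdict(list)
    for doc_id, label in docs:
        by_cluster[label].append(doc_id)

    # Only clusters with at least `depth` documents can satisfy the spec.
    eligible = sorted(label for label, ids in by_cluster.items() if len(ids) >= depth)
    if len(eligible) < breadth:
        raise ValueError("not enough clusters with the requested depth")

    chosen = rng.sample(eligible, breadth)
    return {label: rng.sample(by_cluster[label], depth) for label in chosen}


# Toy corpus (assumption): 40 documents spread evenly over 5 topics.
docs = [(f"doc{i}", f"topic{i % 5}") for i in range(40)]
sample = sample_to_spec(docs, breadth=3, depth=4)
```

Two datasets sampled with the same `breadth` and `depth` end up with the same topical shape, which is the property the text relies on for comparable queries and evaluations.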

Colossal @Colossal shared a link
2025-06-06 07:34:14 ·

Reskate’s Youthful Murals Transform into Glowing Symbols of Peace

June 5, 2025
Art
Jackie Andres

During the day, Reskate’s extensive murals cover large swathes of space on buildings, stage backdrops, and even transformer towers. While these monumental public works are striking during sunlit hours, they completely transform with the darkness of night.
Artists Javier de Riba and María López are the artistic duo behind Reskate. Primarily based in Barcelona, both artists travel throughout the year, visiting different corners of the world to complete projects to “raise awareness of care for culture, nature, and peace.”
Detail of “Paix” (2025). Reims, Champagne. Image by Romain Berthiot
Reskate’s subjects are often children. In a bold, illustrative style with graphic linework, the artists depict young figures holding objects related to the area in which the mural is placed, as well as articles that reflect global concerns. “The invisibilization and invalidation of youth as an active element that should be part of society is a burden that continues to be perpetuated,” the duo explains in a statement covering “Eulalia,” a previous mural completed in 2023.
One example is “Bruit,” which takes the form of a stage design for an immersive concert. In the piece, a young girl protectively wraps her arms around a fishbowl, nodding to the impact of sound pollution within the oceans.
The pair recently completed an exhibit at the Museu Picasso in Barcelona and plans to continue their artistic endeavors both in and out of the public space. Find more work on Reskate’s website and Instagram, and browse prints in their online shop.
“Harmony” (2025). Liverpool. Image by Corbyn John
“Transformateur” (2024). Mareuil-sur-Ourcq, France. Image by Sophie Palmier
Detail of “Transformateur” (2024). Mareuil-sur-Ourcq, France. Image by Sophie Palmier
“Bruit” (2024). Le Mans, France
Detail of “Bruit” (2024). Le Mans, France
“Boycott” (2024). Ghent, Belgium
www.thisiscolossal.com

GameSpot @GameSpot shared a link
2025-05-23 08:00:27 ·

Six One Indie's Latest Showcase Proves Cool-Looking Games Don't Need To Cost $80

For the past few years, Six One Indie has been delivering stellar showcases that highlight an often overlooked category of games: indies. Though these titles might be made by smaller teams working on a much smaller budget compared to their AAA counterparts, you'd be wrong to think they don't contain every bit of the charm and artistry of those promoted all across your social media timelines--and the games below serve as irrefutable proof.
Though the team at Six One Indie featured nearly 50 games in its May showcase--the entirety of which is available to watch--we decided to round up just a few dozen of our favorites. From cozy titles like Bobo Bay to the hilariously bizarre Dinoblade, these are some of the indies that we immediately added to our Steam wishlist.
Shadows of Chroma Tower
Shadows of Chroma Tower combines stylish, high-contrast art direction with "the best features of dungeon crawlers and ARPGs" to create a frenetic experience you can play by yourself or with friends. You'll be able to join factions, choose from five classes and six professions, upgrade a robust skill tree, and truly tailor your experience as you make your way up the tower in search of the game's big bad.
Mouse: P.I. For Hire
"Steamboat Willie-meets-John Wick" is probably one of the strangest combinations of words I've ever used to describe a game, but when it comes to Mouse: P.I. For Hire, it just makes sense. In it, players take on the role of Jack Pepper, a hyper-violent private investigator with a strong right hook and an arsenal of weapons at his disposal, one of which is quite literally a ray gun that causes heads to explode. Its gritty, blood-splattering content is in stark contrast to the game's visuals, which draw inspiration from 1930s cartoons, making the whole experience even more over-the-top. All that plus some neo-noir vibes and a jazzy soundtrack make this a game all you shooter fans should definitely keep an eye on.
Leila
Ubik Studio's Leila is one of the handful of games Six One Indie showed off that is available now, and at a low price, I might add. The hand-animated, story-driven puzzle game sees you relive a woman's "fragile memories" as she undergoes a deeply personal and transformative journey. It's worth noting that Leila features strong adult themes and some body horror, so don't go in expecting something "cozy." However, if you're looking for a dark, cerebral experience, this might be a great pick.
Muffles' Life Sentence
Muffles' Life Sentence is another game that is already available to play on Steam for a low price, so there's really no reason not to give it a lil' whirl. The "darkly quirky" RPG takes place in a prison where inmates are "remade" to match their crimes, and features gameplay stylings that are sure to delight fans of Paper Mario or Undertale.
Dinoblade
Sometimes, you can come up with an extremely cool idea for a game just by taking two really cool ideas and mashing 'em together. Such is the case with Dinoblade, a new action RPG that sees players take on the role of a young, blade-wielding Spinosaurus who must fight off other dinosaurs in order to prevent an extinction. It's ridiculous, yes, but what's not ridiculous is how much developer Team Spino commits to the bit--the game looks extremely cool and seems like it'll be a blast for fans of over-the-top action titles like Devil May Cry.
Bobo Bay
Have you ever wished you could stay in Sonic the Hedgehog's chao gardens just a bit longer? Bobo Bay might be the game for you. The pet simulation title sees you care for, collect, breed, train, and accessorize adorable little creatures, all while readying them up for fun competitions such as races and wrestling matches. Though the game isn't scheduled to release until next year, those interested can play its alpha build now.
Oscuro Blossom's Glow
If you're looking for a delightful-looking puzzle platformer accompanied by gorgeous, 2D, illustrative art, you should check out Oscuro Blossom's Glow. In it you play as Selene, a young girl with the ability to emit light; naturally, this power helps her traverse the lush woodlands she finds herself in by creating life, dispelling creatures, and more. The game currently has a demo available to play over on its Steam page.
Truth Scrapper
Insertdisc5, the studio behind 2023's indie gem In Stars and Time, is back with a new game that looks every bit as lovely as its predecessor. In Truth Scrapper you play as Sosotte, a member of the Truth Scrapper guild who is sent to investigate a mysterious sinkhole that has destroyed the community's "sense of will." The only problem? The vast majority of your memories reset at the end of each day, and you're the one responsible for choosing which ones to keep and which ones to abandon.
1000 Deaths
1000 Deaths is a "gravity-bending 3D platformer" that features some truly fun visuals and an early 2000s, Adult Swim feel. However, to relegate it to just another platformer is a disservice, as 1000 Deaths also features a unique spin: the ability for players to make choices that completely alter the game's mechanics, story, and level design. This chaotic, hardcore action game aims to set the stage for some fun speed-running opportunities--if its players can stay alive. Sound interesting? Fortunately, you can check out 1000 Deaths' demo now.
Cast n Chill
A massive departure from the previous entry on this list, Cast n Chill features a far more relaxing gameplay loop. The cozy idle game sees its players explore serene lakes, rivers, and oceans with their loyal pup, their only goal being to catch some fish. As they play, they'll get the chance to upgrade their gear, granting them the ability to reel in more impressive catches. It's a low-stakes experience accompanied by some truly picturesque pixel art, and best of all, you can play the game's demo right now.
Future Vibe Check
Even if you've played automation games before, I can almost guarantee that you've never played one quite like this. In Future Vibe Check, players are tasked with slowly building a factory that doesn't just create products--it creates music, too. As they rebuild the given area, the structures they place create their own unique sounds whenever energy courses through them. Curious as to how that will play out? Fortunately, you can try Future Vibe Check's demo now.
Scratch the Cat
For all the Spyro, Crash Bandicoot, Sly Cooper, Croc, and other 3D mascot platformer fans out there, here's a new game to keep on your radar. In Scratch the Cat, players take on the role of DJ Scratch, a sleek-looking cat who is on a journey to reclaim his stolen records. The adventure game features some remarkable visuals that are absolutely on par with other games in the genre, and seems like it'll be rife with collectibles, unique bosses, and plenty of ways to traverse and explore.
Jump the Track
Billed as an "explosive comedy that blends visual novel with pachinko," Jump the Track looks like an incredibly charming game with plenty of style and humor. When not dishing out some pinball action, the game unfolds in an almost comic book-style way, as it tells the story of Sam, "a young dreamer struggling in the gig economy" whose fortune might just change tonight. Jump the Track currently has a demo available to play on Steam, as well as an extremely close release date: May 28, 2025.
Rogue Eclipse
In Rogue Eclipse, players get the chance to traverse stunning seas of stars and comets as they take down starfighters, armadas, and otherworldly behemoths. That said, it's not just a flight-based shooter, as Rogue Eclipse features an "epic roguelike campaign" as well. Developer Huskraft calls the game "easy to learn, tricky to master, and impossible to put down," and after this first look, it's easy to see why.
Guilty as Sock!
One of the more bizarre games in the showcase, Guilty as Sock! looks incredible and I cannot wait to force my friends to play it with me. The multiplayer court simulator sees you and your pals jump into a chaotic trial where each person plays a sock puppet bound to a specific role--lawyer, judge, etc.--and must then present evidence cards that help support their agenda. While your friends testify, you can choose to throw paper balls and mock them in an attempt to shake them up and sway the verdict in your favor--it'll be up to the judge to call the behavior out, or encourage it. All in all, it's some real Among Us-style nonsense that I will absolutely be playing later this week, thanks to its new demo.
Rue Valley
Rue Valley is yet another title on this list with a demo I downloaded immediately. The gritty-looking narrative RPG follows a man who is trapped in a time loop and whose choices are bound by his mental state; if he is introverted, for example, even if you want him to go up and talk to a woman at a bar, he might not be able to muster up the courage. It will then be up to you to figure out how to deal with his various mental hurdles, form relationships with complex characters, and break out of the loop.
Carimara: Beneath the Forlorn Limbs
There were a lot of games shown at Six One Indie's May showcase that featured some truly fantastic art direction, and Carimara: Beneath the Forlorn Limbs is absolutely one of them. The bleak, horror-adjacent title sees players take on the role of the Carimara, a mysterious figure whose mission is to conjure up ghosts and answer their questions using a deck of cards. Described as a "short and creepy fairytale," this one might not be for the easily perturbed--but looks potentially delightful for those of us who love disturbing little creatures and moody, PS1-era visuals.
A Week in the Life of Asocial Giraffe
Have you ever just wanted to be left alone? That's precisely how this giraffe feels, and is the conceit behind A Week in the Life of Asocial Giraffe. In it, your goal is simple: avoid all social contact. However, the citizens of Friendly City do make things a bit harder thanks to their chatty nature. It is up to you to help our giraffe friend do all his chores and live his best life, all while avoiding people by solving point-and-click puzzles and utilizing stealth. If you're looking to give it a go, the game's demo is now live on Steam.
Inkshade
Another visually remarkable entry on this list, Inkshade is a turn-based tactics game that sees players take control of strange wooden miniatures that are "wrapped in a web of locked rooms and orchestrated by an otherworldly game master." The end goal of Inkshade is to guide these tokens to a mysterious realm known as the abyss, but they'll first need to conquer the continent, procure an airship, and sail through "cursed skies," and none of that will be easy. However, if you're interested in giving it a shot, you can play the game's demo now.
One Way Home
Based on its trailer, One Way Home reminds me a lot of Limbo or Inside, albeit with realistic visuals, more horror, and some cool "choose-your-own" adventure elements. The game follows Jimmy Taylor, a 12-year-old boy who gets involved in a car accident on his walk home from school. When he comes to, Jimmy finds himself thrust into a mysterious version of his world that, while seemingly devoid of humans, is filled to the brim with monstrosities and disturbing visuals. What ensues is a tense-looking puzzle platformer in which player choice dictates the skills, locations, enemies, and endings Jimmy stumbles upon--and thanks to its new demo, you can get a first glimpse at how this will all play out now.
Kabuto ParkWith its playful visuals and adorable premise, Kabuto Park looks perfect for those seeking a game with a bit of whimsy and childlike wonder about it. At its core, the game revolves around bugs: finding them, catching them, training them, and ultimately winning the Summer Beetle Battle Championship with them. As players bug-catch and battle, they'll also gain the ability to upgrade their equipment, allowing them to find even "rarer, stronger, and shinier little friends" to use in the game's card-based competitions. Sounds cute, right? If you think so, I've got great news for you: the game comes out on May 28. Oh and if you're feeling really antsy, you can play the demo right now. Quite a RideIt's just you, your bicycle, a half-charged cellphone, and one very good boy against the world in the upcoming psychological horror game Quite a Ride. In it, you play as someone whose quick trip to their friend's house is derailed by the sudden presence of a dense, oppressive fog filled with eldritch horrors. As such, you have no choice but to keep pedaling--even as the beings following you grow so close you can hear them breathe and the world shifts around you. And though this and the game's beautiful, Pacific Northwest vibes are already enough to make me extremely interested in it, Quite a Ride also has another thing going for it: collective progress. This means that player's collective, global efforts will change the game over time by introducing new characters, locations, and secrets. We'll see if I am actually brave enough to play it, but wow am I eager to try. OddbatOddbat likens itself to Celeste in that it is an extremely challenging platformer filled with secrets and over 700 unique levels. That said, it is immediately obvious that its personality, humor, and style are all its own. In Oddbat, you play as a vampire on a mission to perform an elaborate ritual. 
However, you'll need to drain the blood of seven bosses and make your way through five unique dungeons to do so. Naturally, you'll need to rely on one of your most iconic vampiric powers--the ability to become a bat--to help you accomplish your goals. With its cheeky tone and monochromatic color scheme, Oddbat looks to be shaping up into a viciously fun and stylish platformer.
#six #one #indie039s #latest #showcase

Six One Indie's Latest Showcase Proves Cool-Looking Games Don't Need To Cost $80
www.gamespot.com
For the past few years, Six One Indie has been delivering stellar showcases that highlight an often overlooked category of games: indies. Though these titles might be made by smaller teams working on a much smaller budget compared to their AAA counterparts, you'd be wrong to think they don't contain every bit of the charm and artistry of those promoted all across your social media timelines--and the games below serve as irrefutable proof.

Though the team at Six One Indie featured nearly 50 games in its May showcase--the entirety of which you can watch here--we decided to round up just a few dozen of our favorites. From cozy titles like Bobo Bay to the hilariously bizarre Dinoblade, these are some of the indies that we immediately added to our Steam wishlist.

Shadows of Chroma Tower
Shadows of Chroma Tower combines stylish, high-contrast art direction with "the best features of dungeon crawlers and ARPGs" to create a frenetic experience you can play by yourself or with friends. You'll be able to join factions, choose from five classes and six professions, upgrade a robust skill tree, and truly tailor your experience as you make your way up the tower in search of the game's big bad.

Mouse: P.I. For Hire
"Steamboat Willie-meets-John Wick" is probably one of the strangest combinations of words I've ever used to describe a game, but when it comes to Mouse: P.I. For Hire, it just makes sense. In it, players take on the role of Jack Pepper, a hyper-violent private investigator with a strong right hook and an arsenal of weapons at his disposal, one of which is quite literally a ray gun that causes heads to explode. Its gritty, blood-splattering content is in stark contrast to the game's visuals, which draw inspiration from 1930s cartoons, making the whole experience even more over-the-top. All that plus some neo-noir vibes and a jazzy soundtrack make this a game all you shooter fans should definitely keep an eye on.

Leila
Ubik Studio's Leila is one of the handful of games Six One Indie showed off that is available now--and for only $12, I might add. The hand-animated, story-driven puzzle game sees you relive a woman's "fragile memories" as she undergoes a deeply personal and transformative journey. It's worth noting that Leila features strong adult themes and some body horror, so don't go in expecting something "cozy." However, if you're looking for a dark, cerebral experience, this might be a great pick.

Muffles' Life Sentence
Muffles' Life Sentence is another game that is already available to play on Steam for the low price of $0, so there's really no reason to not give it a lil' whirl. The "darkly quirky" RPG takes place in a prison where inmates are "remade" to match their crimes, and features gameplay stylings that are sure to delight fans of Paper Mario or Undertale.

Dinoblade
Sometimes, you can come up with an extremely cool idea for a game just by taking two really cool ideas and mashing 'em together. Such is the case with Dinoblade, a new action RPG that sees players take on the role of a young, blade-wielding Spinosaurus who must fight off other dinosaurs in order to prevent an extinction. It's ridiculous, yes, but what's not ridiculous is how much developer Team Spino commits to the bit--the game looks extremely cool and seems like it'll be a blast for fans of over-the-top action titles like Devil May Cry.

Bobo Bay
Have you ever wished you could stay in Sonic the Hedgehog's chao gardens just a bit longer? Bobo Bay might be the game for you. The pet simulation title sees you care for, collect, breed, train, and accessorize adorable little creatures, all while readying them up for fun competitions such as races and wrestling matches. Though the game isn't scheduled to release until next year, those interested can play its alpha build now.

Oscuro Blossom's Glow
If you're looking for a delightful-looking puzzle platformer accompanied by gorgeous, 2D, illustrative art, you should check out Oscuro Blossom's Glow. In it, you play as Selene, a young girl with the ability to emit light; naturally, this power helps her traverse the lush woodlands she finds herself in by creating life, dispelling creatures, and more. The game currently has a demo available to play over on its Steam page.

Truth Scrapper
Insertdisc5, the studio behind 2023's indie gem In Stars and Time, is back with a new game that looks every bit as lovely as its predecessor. In Truth Scrapper, you play as Sosotte, a member of the Truth Scrapper guild who is sent to investigate a mysterious sinkhole that has destroyed the community's "sense of will." The only problem? The vast majority of your memories reset at the end of each day, and you're the one responsible for choosing which ones to keep and which ones to abandon.

1000 Deaths
1000 Deaths is a "gravity-bending 3D platformer" that features some truly fun visuals and an early 2000s, Adult Swim feel. However, to relegate it to just another platformer is a disservice, as 1000 Deaths also features a unique spin: the ability for players to make choices that completely alter the game's mechanics, story, and level design. This chaotic, hardcore action game aims to set the stage for some fun speed-running opportunities--if its players can stay alive. Sound interesting? Fortunately, you can check out 1000 Deaths' demo now.

Cast n Chill
A massive departure from the previous entry on this list, Cast n Chill features a far more relaxing gameplay loop. The cozy idle game sees its players explore serene lakes, rivers, and oceans with their loyal pup, their only goal being to catch some fish. As they play, they'll get the chance to upgrade their gear, granting them the ability to reel in more impressive catches. It's a low-stakes experience accompanied by some truly picturesque pixel art, and best of all, you can play the game's demo right now.

Future Vibe Check
Even if you've played automation games before, I can almost guarantee that you've never played one quite like this. In Future Vibe Check, players are tasked with slowly building a factory that doesn't just create products--it creates music, too. As they rebuild the given area, the structures they place create their own unique sounds whenever energy courses through them. Curious as to how that will play out? Fortunately, you can try Future Vibe Check's demo now.

Scratch the Cat
For all the Spyro, Crash Bandicoot, Sly Cooper, Croc, and other 3D mascot platformer fans out there, here's a new game to keep on your radar. In Scratch the Cat, players take on the role of DJ Scratch, a sleek-looking cat who is on a journey to reclaim his stolen records. The adventure game features some remarkable visuals that are absolutely on par with other games in the genre, and seems like it'll be rife with collectibles, unique bosses, and plenty of ways to traverse and explore.

Jump the Track
Billed as an "explosive comedy that blends visual novel with pachinko," Jump the Track looks like an incredibly charming game with plenty of style and humor. When not dishing out some pinball action, the game unfolds in an almost comic book-style way, as it tells the story of Sam, "a young dreamer struggling in the gig economy" whose fortune might just change tonight. Jump the Track currently has a demo available to play on Steam, as well as an extremely close release date: May 28, 2025.

Rogue Eclipse
In Rogue Eclipse, players get the chance to traverse stunning seas of stars and comets as they take down starfighters, armadas, and otherworldly behemoths. That said, it's not just a flight-based shooter, as Rogue Eclipse features an "epic roguelike campaign" as well. Developer Huskraft calls the game "easy to learn, tricky to master, and impossible to put down," and after this first look, it's easy to see why.

Guilty as Sock!
One of the more bizarre games in the showcase, Guilty as Sock! looks incredible and I cannot wait to force my friends to play it with me. The multiplayer court simulator sees you and your pals jump into a chaotic trial where each person plays a sock puppet bound to a specific role--lawyer, judge, etc.--and must then present evidence cards that help support their agenda. While your friends testify, you can choose to throw paper balls and mock them in an attempt to shake them up and sway the verdict in your favor--it'll be up to the judge to call the behavior out, or encourage it. All in all, it's some real Among Us-style nonsense that I will absolutely be playing later this week, thanks to its new demo.

Rue Valley
Rue Valley is yet another title on this list with a demo I downloaded immediately. The gritty-looking narrative RPG follows a man who is trapped in a time loop and whose choices are bound by his mental state; if he is introverted, for example, even if you want him to go up and talk to a woman at a bar, he might not be able to muster up the courage. It will then be up to you to figure out how to deal with his various mental hurdles, form relationships with complex characters, and break out of the loop.

Carimara: Beneath the Forlorn Limbs
There were a lot of games shown at Six One Indie's May showcase that featured some truly fantastic art direction, and Carimara: Beneath the Forlorn Limbs is absolutely one of them. The bleak, horror-adjacent title sees players take on the role of the Carimara, a mysterious figure whose mission is to conjure up ghosts and answer their questions using a deck of cards. Described as a "short and creepy fairytale," this one might not be for the easily perturbed--but looks potentially delightful for those of us who love disturbing little creatures and moody, PS1-era visuals.

A Week in the Life of Asocial Giraffe
Have you ever just wanted to be left alone? That's precisely how this giraffe feels, and is the conceit behind A Week in the Life of Asocial Giraffe. In it, your goal is simple: avoid all social contact. However, the citizens of Friendly City do make things a bit harder thanks to their chatty nature. It is up to you to help our giraffe friend do all his chores and live his best life, all while avoiding people by solving point-and-click puzzles and utilizing stealth. If you're looking to give it a go, the game's demo is now live on Steam.

Inkshade
Another visually remarkable entry on this list, Inkshade is a turn-based tactics game that sees players take control of strange wooden miniatures that are "wrapped in a web of locked rooms and orchestrated by an otherworldly game master." The end goal of Inkshade is to guide these tokens to a mysterious realm known as the abyss, but they'll first need to conquer the continent, procure an airship, and sail through "cursed skies," and none of that will be easy. However, if you're interested in giving it a shot, you can play the game's demo now.

One Way Home
Based on its trailer, One Way Home reminds me a lot of Limbo or Inside, albeit with realistic visuals, more horror, and some cool "choose-your-own" adventure elements. The game follows Jimmy Taylor, a 12-year-old boy who gets involved in a car accident on his walk home from school. When he comes to, Jimmy finds himself thrust into a mysterious version of his world that, while seemingly devoid of humans, is filled to the brim with monstrosities and disturbing visuals. What ensues is a tense-looking puzzle platformer in which player choice dictates the skills, locations, enemies, and endings Jimmy stumbles upon--and thanks to its new demo, you can get a first glimpse at how this will all play out now.

Kabuto Park
With its playful visuals and adorable premise, Kabuto Park looks perfect for those seeking a game with a bit of whimsy and childlike wonder about it. At its core, the game revolves around bugs: finding them, catching them, training them, and ultimately winning the Summer Beetle Battle Championship with them. As players bug-catch and battle, they'll also gain the ability to upgrade their equipment, allowing them to find even "rarer, stronger, and shinier little friends" to use in the game's card-based competitions. Sounds cute, right? If you think so, I've got great news for you: the game comes out on May 28. Oh, and if you're feeling really antsy, you can play the demo right now.

Quite a Ride
It's just you, your bicycle, a half-charged cellphone, and one very good boy against the world in the upcoming psychological horror game Quite a Ride. In it, you play as someone whose quick trip to their friend's house is derailed by the sudden presence of a dense, oppressive fog filled with eldritch horrors. As such, you have no choice but to keep pedaling--even as the beings following you grow so close you can hear them breathe and the world shifts around you. And though this and the game's beautiful Pacific Northwest vibes are already enough to make me extremely interested in it, Quite a Ride also has another thing going for it: collective progress. This means that players' collective, global efforts will change the game over time by introducing new characters, locations, and secrets. We'll see if I am actually brave enough to play it, but wow am I eager to try.

Oddbat
Oddbat likens itself to Celeste in that it is an extremely challenging platformer filled with secrets and over 700 unique levels. That said, it is immediately obvious that its personality, humor, and style are all its own. In Oddbat, you play as a vampire on a mission to perform an elaborate ritual. However, you'll need to drain the blood of seven bosses and make your way through five unique dungeons to do so. Naturally, you'll need to rely on one of your most iconic vampiric powers--the ability to become a bat--to help you accomplish your goals. With its cheeky tone and monochromatic color scheme, Oddbat looks to be shaping up into a viciously fun and stylish platformer.

Towards Data Science @TowardsDataScience shared a link
2025-05-15 23:37:54

Lessons in Decision Making from the Monty Hall Problem

The Monty Hall Problem is a well-known brain teaser from which we can learn important lessons in Decision Making that are useful in general and in particular for data scientists.

If you are not familiar with this problem, prepare to be perplexed. If you are, I hope to shine light on aspects that you might not have considered.

I introduce the problem and solve it with three types of intuition:

Common: The heart of this post focuses on applying our common sense to solve this problem. We’ll explore why it fails us and what we can do to intuitively overcome this to make the solution crystal clear. We’ll do this by using visuals, qualitative arguments and some basic probabilities.

Bayesian: We will briefly discuss the importance of belief propagation.

Causal: We will use a Graph Model to visualise the conditions required to use the Monty Hall problem in real-world settings. Spoiler alert: I haven’t been convinced that there are any, but the thought process is very useful.

I summarise by discussing lessons learnt for better data decision making.

With regard to the Bayesian and Causal intuitions, these will be presented in a gentle form. For the mathematically inclined I also provide supplementary sections with short Deep Dives into each approach after the summary.

By examining different aspects of this puzzle in probability you will hopefully be able to improve your data decision making.

Credit: Wikipedia

First, some history. Let’s Make a Deal is an American television game show that originated in 1963. As its premise, audience participants were considered traders making deals with the host, Monty Hall.

At the heart of the matter is an apparently simple scenario:

A trader is asked to choose one of three doors for the opportunity to win a luxurious prize, e.g., a car. Behind the other two are goats.

The trader is shown three closed doors.

The trader chooses one of the doors. Let’s call this door A and mark it as the trader’s choice.

Keeping the chosen door closed, the host reveals one of the remaining doors showing a goat.

The trader chooses door A and the host reveals door C showing a goat.

The host then asks the trader if they would like to stick with their first choice or switch to the other remaining one.

If the trader guesses correctly, they win the prize. If not, they’ll be shown another goat.

What is the probability of being Zonked? Credit: Wikipedia

Should the trader stick with their original choice of door A or switch to B?

Before reading further, give it a go. What would you do?

Most people are likely to have a gut intuition that “it doesn’t matter”, arguing that in the first instance each door had a ⅓ chance of hiding the prize, and that after the host intervention, when only two doors remain closed, the chance of winning the prize is 50:50.

There are various ways of explaining why the coin toss intuition is incorrect. Most of these involve maths equations or simulations. We will address these later, but first we’ll attempt to solve the problem by applying Occam’s razor:

A principle that states that simpler explanations are preferable to more complex ones (attributed to William of Ockham)

To do this it is instructive to slightly redefine the problem to a large number of doors N instead of the original three.

The Large N-Door Problem

Similar to before: you have to choose one of many doors. For illustration let’s say N=100. Behind one of the doors there is the prize and behind the other 99 are goats.

The 100 Door Monty Hall problem before the host intervention.

You choose one door and the host reveals 98 of the other doors that have goats, leaving yours and one more closed.

The 100 Door Monty Hall Problem after the host intervention. Should you stick with your door or make the switch?

Should you stick with your original choice or make the switch?

I think you’ll agree with me that the remaining door, not chosen by you, is much more likely to conceal the prize … so you should definitely make the switch!

It’s illustrative to compare both scenarios discussed so far. In the next figure we compare the post host intervention settings for the N=3 setup and that of N=100:

Post intervention settings for the N=3 setup and N=100.

In both cases we see two shut doors, one of which we’ve chosen. The main difference between these scenarios is that in the first we see one goat and in the second there are more than the eye would care to see.

Why do most people consider the first case a “50:50” toss-up, while in the second it’s obvious to make the switch?

We’ll soon address this question of why. First let’s put probabilities of success behind the different scenarios.

What’s The Frequency, Kenneth?

So far we learnt from the N=100 scenario that switching doors is obviously beneficial. Inferring the same for N=3 may be a leap of faith for most. Using some basic probability arguments, here we’ll quantify why it is favourable to make the switch for any number of doors N.

We start with the standard Monty Hall problem. When it starts, the probability of the prize being behind each of the doors A, B and C is p=⅓. To be explicit, let’s define the Y parameter to be the door with the prize, i.e., $p(Y=A) = p(Y=B) = p(Y=C) = \tfrac{1}{3}$.

The trick to solving this problem is that once the trader’s door A has been chosen, we should pay close attention to the set of the other doors {B,C}, which has the probability of $p(Y \in \{B,C\}) = p(Y=B) + p(Y=C) = \tfrac{2}{3}$. This visual may help make sense of this:

By being attentive to the set {B,C}, the rest should follow. When the goat is revealed it is apparent that the probabilities post intervention change. Note that for ease of reading I’ll drop the Y notation, where $p(Y=A)$ will read $p(A)$ and $p(Y=B)$ will read $p(B)$. Also, for completeness, the full terms after the intervention should be even longer due to being conditional, e.g., $p(A \mid Z=C)$, $p(B \mid Z=C)$, where Z is a parameter representing the choice of the host.

$p(A)$ remains ⅓

$p(\{B,C\}) = p(B) + p(C)$ remains ⅔,

$p(C) = 0$; we just learnt that the goat is behind door C, not the prize.

$p(B) = p(\{B,C\}) - p(C) = \tfrac{2}{3}$

For anyone with the information provided by the host, this means that it isn’t a toss of a fair coin! For them, the fact that $p(C)$ became zero does not “raise all other boats”, but rather $p(A)$ remains the same and $p(B)$ gets doubled.

The bottom line is that the trader should consider $p(A) = \tfrac{1}{3}$ and $p(B) = \tfrac{2}{3}$, hence by switching they are doubling their probability of winning!

Let’s generalise to N.

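When we start, all doors have the same probability of hiding the prize, $p(Y=D_i) = 1/N$. After the trader chooses one door, which we’ll call D₁, meaning $p(Y=D_1) = 1/N$, we should now pay attention to the remaining set of doors {D₂, …, Dₙ}, which has a chance of $p(Y \in \{D_2, \dots, D_N\}) = (N-1)/N$.

When the host reveals $N-2$ doors {D₃, …, Dₙ} with goats:

$p(D_1)$ remains $1/N$

$p(\{D_2, \dots, D_N\}) = p(D_2) + p(D_3) + \dots + p(D_N)$ remains $(N-1)/N$

$p(D_3) = p(D_4) = \dots = p(D_{N-1}) = p(D_N) = 0$; we just learnt that they have goats, not the prize.

$p(D_2) = p(\{D_2, \dots, D_N\}) - p(D_3) - \dots - p(D_N) = (N-1)/N$

The trader should now consider two door values: $p(D_1) = 1/N$ and $p(D_2) = (N-1)/N$.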

Hence, by switching, the probability of winning improves by a factor of N-1. In the case of N=100, this means a factor of 99.

The improvement in all scenarios between N=3 and N=100 may be seen in the following graph. The thin line is the probability of winning by choosing any door prior to the intervention, $1/N$. Note that it also represents the chance of winning after the intervention if the trader decides to stick to their guns and not switch, $p(D_1)$. The thick line is the probability of winning the prize after the intervention if the door is switched, $p(D_2) = (N-1)/N$:

Probability of winning as a function of N. The prior and stick probability, $p(D_1) = 1/N$, is the thin line; the switch probability, $p(D_2) = (N-1)/N$, is the thick one.

Perhaps the most interesting aspect of this graph is that the N=3 case has the highest probability of winning before the host intervention, but the lowest probability after, and vice versa for N=100.

Another interesting feature is the quick climb in the probability of winning for the switchers:

N=3: $p(D_2)$ = 67%

N=4: $p(D_2)$ = 75%

N=5: $p(D_2)$ = 80%

The switchers’ curve gradually approaches an asymptote at 100%: at N=99 it is 98.99% and at N=100 it is equal to 99%.
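The values quoted above can be reproduced with a few lines of Python. This is a minimal sketch under the same assumptions as the derivation (one prize, a host who opens N-2 goat doors); the function name `win_probabilities` is my own and purely illustrative.

```python
from fractions import Fraction


def win_probabilities(n_doors: int) -> tuple[Fraction, Fraction]:
    """Return (p_stick, p_switch) when the host opens n_doors - 2 goat doors."""
    if n_doors < 3:
        raise ValueError("the game needs at least three doors")
    p_stick = Fraction(1, n_doors)   # the first pick keeps its prior 1/N
    p_switch = 1 - p_stick           # the other closed door inherits (N-1)/N
    return p_stick, p_switch


for n in (3, 4, 5, 99, 100):
    p_stick, p_switch = win_probabilities(n)
    ratio = p_switch / p_stick       # equals N - 1
    print(f"N={n}: stick={float(p_stick):.2%}, switch={float(p_switch):.2%}, ratio={ratio}")
```

Exact fractions are used so that the ratio between switching and sticking comes out as exactly N-1 rather than a rounded float.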

This starts to address an interesting question:

Why Is Switching Obvious For Large N But Not N=3?

The answer is the fact that this puzzle is slightly ambiguous. Only the highly attentive realise that by revealing the goat the host is actually conveying a lot of information that should be incorporated into one’s calculation. Later we discuss the difference between doing this calculation in one’s mind based on intuition and slowing down by putting pen to paper or coding up the problem.

How much information is conveyed by the host by intervening?

A hand-wavy explanation is that this information may be visualised as the gap between the lines in the graph above. For N=3 we saw that the probability of winning doubled, but that doesn’t register as strongly with our common sense intuition as the factor of 99 in the N=100 case.

I have also considered describing stronger arguments from Information Theory that provide useful vocabulary to express communication of information. However, I feel that this fascinating field deserves a post of its own, which I’ve published.

The main takeaway for the Monty Hall problem is that I have calculated the information gain to be a logarithmic function of the number of doors c using this formula:

Information Gain due to the intervention of the host for a setup with c doors. Full details in my upcoming article.

For the c=3 door case, e.g., the information gain is ⅔ bits. Full details are in this article on entropy.
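One way to reproduce the ⅔ bits figure is sketched below. It assumes that the information gain is measured as the drop in Shannon entropy of the prize location, from a flat prior over c doors to the post-intervention distribution (1/c on the chosen door, (c-1)/c on the other closed door, zero elsewhere). That definition and the function names are my assumptions; the article's own formula may be expressed differently.

```python
from math import log2


def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def information_gain_bits(n_doors):
    """Entropy of the prize location before minus after the host opens n_doors - 2 doors."""
    before = [1 / n_doors] * n_doors
    after = [1 / n_doors, (n_doors - 1) / n_doors] + [0.0] * (n_doors - 2)
    return entropy_bits(before) - entropy_bits(after)


print(round(information_gain_bits(3), 4))  # 0.6667, i.e. 2/3 of a bit
```

Under this assumption the gain simplifies to ((c-1)/c)·log₂(c-1), which is indeed logarithmic in the number of doors.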

To summarise this section, we used basic probability arguments to quantify the probabilities of winning the prize, showing the benefit of switching for all N-door scenarios. For those interested in more formal solutions using Bayesian and Causal approaches, I provide supplementary sections at the bottom.

In the next three final sections we’ll discuss how this problem was received by the general public back in the 1990s, discuss lessons learnt and then summarise how we can apply them in real-world settings.

Being Confused Is OK

“No, that is impossible, it should make no difference.” — Paul Erdős

If you still don’t feel comfortable with the solution of the N=3 Monty Hall problem, don’t worry, you are in good company! According to Vazsonyi¹, even Paul Erdős, who is considered one “of the greatest experts in probability theory”, was confounded until computer simulations were demonstrated to him.

When the original solution by Steve Selvin² was popularised by Marilyn vos Savant in her column “Ask Marilyn” in Parade magazine in 1990, many readers wrote that Selvin and Savant were wrong³. According to Tierney’s 1991 article in the New York Times, this included about 10,000 readers, including nearly 1,000 with Ph.D. degrees⁴.

On a personal note, over a decade ago I was exposed to the standard N=3 problem and since then managed to forget the solution numerous times. When I learnt about the large N approach I was quite excited about how intuitive it was. I then failed to explain it to my technical manager over lunch, so this is an attempt to compensate. I still have the same day job.

While researching this piece I realised that there is a lot to learn about decision making in general, much of which is particularly useful for data science.

Lessons Learnt From Monty Hall Problem

In his book Thinking, Fast and Slow⁵, the late Daniel Kahneman, the co-creator of Behavioural Economics, suggested that we have two types of thought processes:

System 1, fast thinking: based on intuition. This helps us react fast with confidence to familiar situations.

System 2, slow thinking: based on deep thought. This helps figure out new complex situations that life throws at us.

Assuming this premise, you might have noticed that in the above you were applying both.

By examining the visual of N=100 doors your System 1 kicked in and you immediately knew the answer. I’m guessing that in the N=3 case you were straddling System 1 and System 2. Considering that you had to stop and think a bit when going through the probabilities exercise, that was definitely System 2.

The decision maker’s struggle between System 1 and System 2. Generated using Gemini Imagen 3

Beyond the fast and slow thinking, I feel that there are a lot of data decision making lessons that may be learnt.

Assessing probabilities can be counter-intuitive …

or

Be comfortable with shifting to deep thought

We’ve clearly shown that in the N=3 case. As previously mentioned it confounded many people including prominent statisticians.

Another classic example is The Birthday Paradox, which shows how we underestimate the likelihood of coincidences. In this problem most people would think that one needs a large group of people before finding a pair sharing the same birthday. It turns out that all you need is 23 people to have a 50% chance, and 70 for a 99.9% chance.
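The 23 and 70 figures are easy to check by slowing down and coding it up. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def shared_birthday_probability(group_size: int, days: int = 365) -> float:
    """Probability that at least two people in the group share a birthday."""
    p_all_distinct = 1.0
    for k in range(group_size):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for size in (10, 23, 70):
    print(f"{size} people: {shared_birthday_probability(size):.1%}")
```

The calculation works on the complement (everyone having a distinct birthday), which is much simpler than counting all the ways a pair could coincide.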

One of the most confusing paradoxes in the realm of data analysis is Simpson’s, which I detailed in a previous article. This is a situation where trends of a population may be reversed in its subpopulations.

What all these paradoxes have in common is that they require us to get comfortable with shifting gears from System 1 fast thinking to System 2 slow thinking. This is also the common theme for the lessons outlined below.

A few more classical examples are the Gambler’s Fallacy, the Base Rate Fallacy and the Linda Problem. These are beyond the scope of this article, but I highly recommend looking them up to further sharpen ways of thinking about data.

… especially when dealing with ambiguity

or

Search for clarity in ambiguity

Let’s reread the problem, this time as stated in “Ask Marilyn”

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say №1, and the host, who knows what’s behind the doors, opens another door, say №3, which has a goat. He then says to you, “Do you want to pick door №2?” Is it to your advantage to switch your choice?

We discussed that the most important piece of information is not made explicit. It says that the host “knows what’s behind the doors”, but it does not state whether they open a door at random or deliberately avoid the prize, although it’s implicitly understood that the host will never open the door with the car.

Many real-life problems in data science involve dealing with ambiguity in demands as well as in data provided by stakeholders.

It is crucial for the researcher to track down any relevant piece of information that is likely to have an impact and incorporate it into the solution. Statisticians refer to this as “belief update”.

With new information we should update our beliefs

This is the main aspect separating the Bayesian stream of thought from the Frequentist. The Frequentist approach takes data at face value. The Bayesian approach incorporates prior beliefs and updates them when new findings are introduced. This is especially useful when dealing with ambiguous situations.

To drive this point home, let’s re-examine this figure comparing the post intervention N=3 setup and the N=100 one.

Copied from above. Post intervention settings for the N=3 setup and N=100.

In both cases we had a prior belief that all doors had an equal chance of winning the prize p=1/N.

Once the host opened one door (or 98 of them in the N=100 case), a lot of valuable information was revealed, although in the case of N=100 this was much more apparent than for N=3.

In the Frequentist approach, however, most of this information would be ignored, as it only focuses on the two closed doors. The Frequentist conclusion, hence, is a 50% chance to win the prize regardless of what else is known about the situation. The Frequentist thus takes Paul Erdős’ “no difference” point of view, which we now know to be incorrect.

This would be reasonable if all that was presented were the two doors and not the intervention and the goats. However, if that information is presented, one should shift gears into System 2 thinking and update their beliefs about the system. This is what we have done by focusing not only on the shut doors, but also on what was learnt about the system at large.

For the brave-hearted, in a supplementary section below called The Bayesian Point of View I solve the Monty Hall problem using the Bayesian formalism.

Be one with subjectivity

The Frequentist’s main reservation about “going Bayes” is that “Statistics should be objective”.

The Bayesian response is that Frequentists also apply a prior without realising it: a flat one.

Regardless of the Bayesian/Frequentist debate, as researchers we try our best to be as objective as possible in every step of the analysis.

That said, it is inevitable that subjective decisions are made throughout.

E.g., in a skewed distribution should one quote the mean or the median? It highly depends on the context, and hence a subjective decision needs to be made.
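To make the mean-versus-median choice concrete, here is a tiny sketch. The salary figures are invented for illustration only (a hypothetical right-skewed sample with one very large value) and are not taken from any real dataset.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: annual salaries in thousands, one outlier.
salaries = [30, 32, 35, 38, 40, 42, 45, 50, 60, 400]

print(f"mean = {mean(salaries):.1f}, median = {median(salaries):.1f}")
# mean = 77.2, median = 41.0
```

Neither number is wrong. The mean answers "what is the total divided by headcount?", the median answers "what does a typical person earn?", and which one to quote is exactly the kind of subjective decision that needs justification.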

The responsibility of the analyst is to provide justification for their choices, first to convince themselves and then their stakeholders.

When confused, look for a useful analogy

… but tread with caution

We saw that by going from the N=3 setup to the N=100 the solution was apparent. This is a trick scientists frequently use — if the problem appears at first a bit too confusing/overwhelming, break it down and try to find a useful analogy.

It is probably not a perfect comparison, but going from the N=3 setup to N=100 is like examining a picture from up close and zooming out to see the big picture. Think of having only a puzzle piece and then glancing at the jigsaw photo on the box.

Monty Hall in 1976. Credit: Wikipedia and using Visual Paradigm Online for the puzzle effect

Note: whereas analogies may be powerful, one should use them with caution so as not to oversimplify. Physicists refer to this situation as the spherical cow method, where models may oversimplify complex phenomena.

I admit that even with years of experience in applied statistics, at times I still get confused about which method to apply. A large part of my thought process is identifying analogies to known solved problems. Sometimes after making progress in a direction I will realise that my assumptions were wrong and seek a new direction. I used to quip with colleagues that they shouldn’t trust me before my third attempt …

Simulations are powerful but not always necessary

It’s interesting to learn that Paul Erdős and other mathematicians were convinced only after seeing simulations of the problem.

I am of two minds about the use of simulations when it comes to problem solving.

On the one hand, simulations are powerful tools for analysing complex and intractable problems, especially with real-life data, where one wants a grasp not only of the underlying formulation but also of the stochasticity.

And here is the big BUT: if a problem can be solved analytically, like the Monty Hall one, simulations, as fun as they may be, may not be necessary.
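For readers who would still like to see what such a simulation looks like, here is a minimal Monte Carlo sketch. It assumes the idealised host described earlier (always opens N-2 goat doors, never the trader's door or the prize); the function names, the seed and the number of games are arbitrary choices of mine.

```python
import random


def play(n_doors: int, switch: bool, rng: random.Random) -> bool:
    """Play one game and return True if the trader wins the prize."""
    prize = rng.randrange(n_doors)
    choice = rng.randrange(n_doors)
    if choice == prize:
        # The host leaves one randomly chosen goat door closed.
        other = rng.choice([d for d in range(n_doors) if d != choice])
    else:
        # The host must leave the prize door closed.
        other = prize
    final = other if switch else choice
    return final == prize


def estimate_win_rate(n_doors: int, switch: bool, n_games: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated games won with a fixed stick or switch strategy."""
    rng = random.Random(seed)
    wins = sum(play(n_doors, switch, rng) for _ in range(n_games))
    return wins / n_games


for n in (3, 100):
    print(f"N={n}: stick≈{estimate_win_rate(n, False):.3f}, switch≈{estimate_win_rate(n, True):.3f}")
```

The estimates should land close to 1/N for sticking and (N-1)/N for switching, which is all the simulation can tell us beyond the analytical result.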

According to Occam’s razor, all that is required is a brief intuition to explain the phenomena. This is what I attempted to do here by applying common sense and some basic probability reasoning. For those who enjoy deep dives, I provide below supplementary sections with two methods for analytical solutions: one using Bayesian statistics and another using Causality.

After publishing the first version of this article there was a comment that Savant’s solution³ may be simpler than those presented here. I revisited her communications and agreed that it should be added. In the process I realised three more lessons may be learnt.

A well designed visual goes a long way

Continuing the principle of Occam’s razor, Savant explained³ quite convincingly in my opinion:

You should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

Hence she provided an abstract visual for the readers. I attempted to do the same with the 100 doors figures.

Marilyn vos Savant who popularised the Monty Hall Problem. Credit: Ben David on Flickr under license

As mentioned, many readers, especially those with backgrounds in maths and statistics, still weren’t convinced.

She revised³ with another mental image:

The benefits of switching are readily proven by playing through the six games that exhaust all the possibilities. For the first three games, you choose #1 and “switch” each time, for the second three games, you choose #1 and “stay” each time, and the host always opens a loser. Here are the results.

She added a table with all the scenarios. I took some artistic liberty and created the following figure. As indicated, the top batch are the scenarios in which the trader switches and the bottom those in which they stay. Lines in green are games which the trader wins, and those in red are games in which they get zonked. A marker symbolises the door chosen by the trader, and Monty Hall then chooses a different door that has a goat behind it.

Adaptation of Savant’s table³ of six scenarios that shows the solution to the Monty Hall Problem

We clearly see from this diagram that the switcher has a ⅔ chance of winning and those that stay only ⅓.

This is yet another elegant visualisation that clearly explains the non intuitive.

It strengthens the claim that there is no real need for simulations in this case because all they would be doing is rerunning these six scenarios.
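Savant's six games are small enough to enumerate exhaustively rather than simulate. The sketch below does just that, assuming the trader always picks door 1 and the host always opens a goat door; the helper names are mine, not Savant's.

```python
from fractions import Fraction

DOORS = (1, 2, 3)


def trader_wins(prize: int, choice: int, switch: bool) -> bool:
    """Outcome of a single game for a given prize location and strategy."""
    # The host opens a door that is neither the trader's choice nor the prize.
    openable = [d for d in DOORS if d != choice and d != prize]
    opened = openable[0]  # which goat door is opened does not change win or lose
    if switch:
        final = next(d for d in DOORS if d not in (choice, opened))
    else:
        final = choice
    return final == prize


def win_fraction(switch: bool, choice: int = 1) -> Fraction:
    """Share of the three equally likely prize locations that the strategy wins."""
    wins = sum(trader_wins(prize, choice, switch) for prize in DOORS)
    return Fraction(wins, len(DOORS))


print("switch:", win_fraction(True), "stay:", win_fraction(False))
# switch: 2/3 stay: 1/3
```

Three prize locations times two strategies gives exactly the six rows of the table, with no randomness involved.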

One more popular solution is the decision tree illustration. You can find these on the Wikipedia page, but I find them a bit redundant to Savant’s table.

The fact that we can solve this problem in so many ways yields another lesson:

There are many ways to skin a … problem

One of the many lessons that I have learnt from the writings of the late Richard Feynman, one of the best communicators of physics and ideas, is that a problem can be solved in many ways. Mathematicians and physicists do this all the time.

A relevant quote that paraphrases Occam’s razor:

If you can’t explain it simply, you don’t understand it well enough — attributed to Albert Einstein

And finally

Embrace ignorance and be humble

“You are utterly incorrect … How many irate mathematicians are needed to get you to change your mind?” — Ph.D from Georgetown University

“May I suggest that you obtain and refer to a standard textbook on probability before you try to answer a question of this type again?” — Ph.D from University of Florida

“You’re in error, but Albert Einstein earned a dearer place in the hearts of people after he admitted his errors.” — Ph.D. from University of Michigan

Ouch!

These are some of the said responses from mathematicians to the Parade article.

Such unnecessary viciousness.

You can check the reference³ to see the writers’ names and others like them. To whet your appetite: “You blew it, and you blew it big!”, “You made a mistake, but look at the positive side. If all those Ph.D.’s were wrong, the country would be in some very serious trouble.”, “I am in shock that after being corrected by at least three mathematicians, you still do not see your mistake.”.

And as expected from the 1990s perhaps the most embarrassing one was from a resident of Oregon:

“Maybe women look at math problems differently than men.”

These make me cringe and be embarrassed to be associated by gender and Ph.D. title with these graduates and professors.

Hopefully in the 2020s most people are more humble about their ignorance. Yuval Noah Harari discusses the fact that the Scientific Revolution of Galileo Galilei et al. was not due to knowledge but rather to the admission of ignorance.

“The great discovery that launched the Scientific Revolution was the discovery that humans do not know the answers to their most important questions” — Yuval Noah Harari

Fortunately for mathematicians’ image, there were also quite a lot of more enlightened comments. I like this one from one Seth Kalson, Ph.D. of MIT:

You are indeed correct. My colleagues at work had a ball with this problem, and I dare say that most of them, including me at first, thought you were wrong!

We’ll summarise by examining how, and if, the Monty Hall problem may be applied in real-world settings, so you can try to relate to projects that you are working on.

Application in Real World Settings

Researching for this article I found that beyond artificial setups for entertainment⁶ ⁷ there aren’t practical settings for this problem to use as an analogy. Of course, I may be wrong⁸ and would be glad to hear if you know of one.

One way of assessing the viability of an analogy is using arguments from causality which provides vocabulary that cannot be expressed with standard statistics.

In a previous post I discussed the fact that the story behind the data is as important as the data itself. In particular Causal Graph Models visualise the story behind the data, which we will use as a framework for a reasonable analogy.

For the Monty Hall problem we can build a Causal Graph Model like this:

Reading:

The door chosen by the trader is independent from that with the prize and vice versa. As important, there is no common cause between them that might generate a spurious correlation.

The host’s choice depends on both the trader’s choice and the location of the prize.

By comparing causal graphs of two systems one can get a sense for how analogous both are. A perfect analogy would require more details, but this is beyond the scope of this article. Briefly, one would want to ensure similar functions between the parameters.

Those interested in learning further details about using Causal Graph Models to assess causality in real-world problems may be interested in this article.

Anecdotally, it is also worth mentioning that on Let’s Make a Deal, Monty himself admitted years later to playing mind games with the contestants and not always following the rules, e.g., not always doing the intervention, as “it all depends on his mood”⁴.

In our setup we assumed perfect conditions, i.e., a host that does not stray from the script and/or play on the trader’s emotions. Taking this into consideration would require updating the Graphical Model above, which is beyond the scope of this article.

Some might be disheartened to realise at this stage of the post that there might not be real world applications for this problem.

I argue that the lessons learnt from the Monty Hall problem definitely are applicable.

Just to summarise them again:

Assessing probabilities can be counter-intuitive …

… especially when dealing with ambiguity

With new information we should update our beliefs

Be one with subjectivity

When confused, look for a useful analogy … but tread with caution

Simulations are powerful but not always necessary

A well designed visual goes a long way

There are many ways to skin a … problem

Embrace ignorance and be humble

While the Monty Hall Problem might seem like a simple puzzle, it offers valuable insights into decision-making, particularly for data scientists. The problem highlights the importance of going beyond intuition and embracing a more analytical, data-driven approach. By understanding the principles of Bayesian thinking and updating our beliefs based on new information, we can make more informed decisions in many aspects of our lives, including data science. The Monty Hall Problem serves as a reminder that even seemingly straightforward scenarios can contain hidden complexities and that by carefully examining available information, we can uncover hidden truths and make better decisions.

At the bottom of the article I provide a list of resources that I found useful to learn about this topic.

Credit: Wikipedia

Loved this post? Join me on LinkedIn or Buy me a coffee!

Credits

Unless otherwise noted, all images were created by the author.

Many thanks to Jim Parr, Will Reynolds, and Betty Kazin for their useful comments.

In the following supplementary sections I derive solutions to the Monty Hall problem from two perspectives:

Bayesian

Causal

Both are motivated by questions in the textbook Causal Inference in Statistics: A Primer by Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell.

Supplement 1: The Bayesian Point of View

This section assumes a basic understanding of Bayes’ Theorem, in particular being comfortable with conditional probabilities. In other words, if this makes sense:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

We set out to use Bayes’ theorem to prove that switching doors improves chances in the N=3 Monty Hall Problem.

We define

X — the chosen door

Y— the door with the prize

Z — the door opened by the host

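Labelling the doors as A, B and C, without loss of generality, we need to solve for:

$$P(Y=B \mid X=A, Z=C) > P(Y=A \mid X=A, Z=C)$$

Using Bayes’ theorem we equate the left side as

$$P(Y=B \mid X=A, Z=C) = \frac{P(Z=C \mid X=A, Y=B)\,P(Y=B \mid X=A)}{P(Z=C \mid X=A)}$$

and the right one as:

$$P(Y=A \mid X=A, Z=C) = \frac{P(Z=C \mid X=A, Y=A)\,P(Y=A \mid X=A)}{P(Z=C \mid X=A)}$$

Most components are equal (the denominators are identical and, since the prize location is independent of the trader’s choice, $P(Y=A \mid X=A) = P(Y=B \mid X=A) = \tfrac{1}{3}$), so we are left to prove:

$$P(Z=C \mid X=A, Y=B) > P(Z=C \mid X=A, Y=A)$$

In the case where Y=B, the host has only one choice, making $P(Z=C \mid X=A, Y=B) = 1$.

In the case where Y=A, the host has two choices, making $P(Z=C \mid X=A, Y=A) = 1/2$.

From here:

$$P(Y=B \mid X=A, Z=C) > P(Y=A \mid X=A, Z=C)$$

Quod erat demonstrandum.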

Note: if the “host choices” arguments didn’t make sense look at the table below showing this explicitly. You will want to compare entries {X=A, Y=B, Z=C} and {X=A, Y=A, Z=C}.
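The same Bayesian update can be carried out numerically. The sketch below assumes the idealised host (never opens the chosen door or the prize door, and picks uniformly when two goat doors are available) and a flat prior over the prize location; `host_likelihood` and `posterior` are illustrative names of my own.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")


def host_likelihood(z: str, x: str, y: str) -> Fraction:
    """P(Z=z | X=x, Y=y) for the idealised host."""
    if z == x or z == y:
        return Fraction(0)          # never the chosen door, never the prize
    return Fraction(1, 2) if x == y else Fraction(1)


def posterior(x: str, z: str) -> dict:
    """P(Y=y | X=x, Z=z) for every door y, starting from a flat prior of 1/3."""
    prior = Fraction(1, 3)
    unnormalised = {y: host_likelihood(z, x, y) * prior for y in DOORS}
    evidence = sum(unnormalised.values())   # P(Z=z | X=x)
    return {y: weight / evidence for y, weight in unnormalised.items()}


result = posterior("A", "C")
print({door: str(p) for door, p in result.items()})
# {'A': '1/3', 'B': '2/3', 'C': '0'}
```

Note that `posterior` is only meaningful for an opened door the host could actually have chosen; calling it with z equal to x gives zero evidence and a division by zero.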

Supplement 2: The Causal Point of View

A basic understanding of Directed Acyclic Graphs (DAGs) and Structural Causal Models (SCMs) is useful for this section, but not required. In brief:

DAGs qualitatively visualise the causal relationships between the parameter nodes.

SCMs quantitatively express the formula relationships between the parameters.

Given the DAG

we are going to define the SCM that corresponds to the classic N=3 Monty Hall problem and use it to describe the joint distribution of all variables. We will later expand generically to N.

We define

X — the chosen door

Y — the door with the prize

Z — the door opened by the host

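According to the DAG, the chain rule gives:

$$P(X, Y, Z) = P(Z \mid X, Y)\,P(X)\,P(Y)$$

The SCM is defined by exogenous variables U, endogenous variables V, and the functions between them F:

U = {X, Y}, V = {Z}, F = {f(Z)}

where X, Y and Z have door values:

D = {A, B, C}

The host choice is f(Z), defined as:

$$P(Z=z \mid X=x, Y=y) = \begin{cases} 0 & \text{if } z = x \text{ or } z = y \\ 1/2 & \text{if } x = y \text{ and } z \neq x \\ 1 & \text{if } x \neq y \text{ and } z \notin \{x, y\} \end{cases}$$

In order to generalise to N doors, the DAG remains the same, but the SCM requires updating D to be a set of N doors Dᵢ: {D₁, D₂, … Dₙ}.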

Exploring Example Scenarios

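To gain an intuition for this SCM, let’s examine 6 of the 27 examples:

When X=Y (e.g., X=A, Y=A)

$P(Z=A \mid X=A, Y=A) = 0$; the host cannot choose the participant’s door

$P(Z=B \mid X=A, Y=A) = 1/2$; the prize is behind A → the host chooses B at 50%

$P(Z=C \mid X=A, Y=A) = 1/2$; the prize is behind A → the host chooses C at 50%

When X≠Y (e.g., X=A, Y=B)

$P(Z=A \mid X=A, Y=B) = 0$; the host cannot choose the participant’s door

$P(Z=B \mid X=A, Y=B) = 0$; the host cannot choose the prize door

$P(Z=C \mid X=A, Y=B) = 1$; the host has no choice in the matter

Calculating Joint Probabilities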

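Using logic, let’s code up all 27 possibilities in Python:

import pandas as pd

# One row per combination of chosen door X, prize door Y and opened door Z
df = pd.DataFrame({"X": ["A"] * 9 + ["B"] * 9 + ["C"] * 9, "Y": (["A"] * 3 + ["B"] * 3 + ["C"] * 3) * 3, "Z": ["A", "B", "C"] * 9})
col = "P(Z|X,Y)"
df[col] = 0.0
p_x = 1. / 3  # the trader picks each door with equal probability
p_y = 1. / 3  # the prize is equally likely to be behind each door
same = df["X"] == df["Y"]
z_is_x = df["Z"] == df["X"]
z_is_y = df["Z"] == df["Y"]
df.loc[same & z_is_x, col] = 0  # X = Y = Z: the host cannot open the chosen door
df.loc[same & ~z_is_x, col] = 0.5  # X = Y ≠ Z: the host picks one of two goat doors
df.loc[~same & z_is_y, col] = 0  # X ≠ Y = Z: the host cannot open the prize door
df.loc[~same & z_is_x, col] = 0  # Z = X ≠ Y: the host cannot open the chosen door
df.loc[~same & ~z_is_x & ~z_is_y, col] = 1  # only one goat door is left to open
df["P(X,Y,Z)"] = df[col] * p_x * p_y
print(f"Normalisation check, sum of P(X,Y,Z): {df['P(X,Y,Z)'].sum()}")
df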

yields

Resources

This Quora discussion by Joshua Engel helped me shape a few aspects of this article.

Causal Inference in Statistics: A Primer / Pearl, Glymour & Jewell, an excellent short textbook

I also very much enjoy Tim Harford’s podcast Cautionary Tales. He wrote about this topic on November 3rd 2017 for the Financial Times: Monty Hall and the game show stick-or-switch conundrum

Footnotes

¹ Vazsonyi, Andrew. “Which Door Has the Cadillac?”. Decision Line: 17–19. Archived from the original on 13 April 2014. Retrieved 16 October 2012.

² Steve Selvin to the American Statistician in 1975.

³ Game Show Problem by Marilyn vos Savant’s “Ask Marilyn” in marilynvossavant.com: “This material in this article was originally published in PARADE magazine in 1990 and 1991”

⁴ Tierney, John. “Behind Monty Hall’s Doors: Puzzle, Debate and Answer?”. The New York Times. Retrieved 18 January 2008.

⁵ Kahneman, D. Thinking, Fast and Slow. Farrar, Straus and Giroux.

⁶ MythBusters Episode 177 “Pick a Door”: watch MythBusters’ approach.

⁷ Monty Hall Problem on Survivor Season 41: watch Survivor’s take on the problem.

⁸ Jingyi Jessica Li, How the Monty Hall problem is similar to the false discovery rate in high-throughput data analysis. Whereas the author points out “similarities” between hypothesis testing and the Monty Hall problem, I think that this is a bit misleading. The author is correct that both problems change by the order in which processes are done, but that is part of Bayesian statistics in general, not limited to the Monty Hall problem.
The post Lessons in Decision Making from the Monty Hall Problem appeared first on Towards Data Science.
#lessons #decision #making #monty #hall

🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem
The Monty Hall Problem is a well-known brain teaser from which we can learn important lessons in decision making that are useful in general and in particular for data scientists.

If you are not familiar with this problem, prepare to be perplexed. If you are, I hope to shine light on aspects that you might not have considered.

I introduce the problem and solve it with three types of intuitions:

Common: The heart of this post focuses on applying our common sense to solve this problem. We’ll explore why it fails us and what we can do to intuitively overcome this to make the solution crystal clear. We’ll do this by using visuals, qualitative arguments and some basic probabilities (not too deep, I promise).
Bayesian: We will briefly discuss the importance of belief propagation.
Causal: We will use a Graph Model to visualise conditions required to use the Monty Hall problem in real world settings. Spoiler alert: I haven’t been convinced that there are any, but the thought process is very useful.

I summarise by discussing lessons learnt for better data decision making.

In regards to the Bayesian and Causal intuitions, these will be presented in a gentle form. For the mathematically inclined I also provide supplementary sections with short deep dives into each approach after the summary. (Note: These are not required to appreciate the main points of the article.)

By examining different aspects of this puzzle in probability you will hopefully be able to improve your data decision making.

Credit: Wikipedia

First, some history. Let’s Make a Deal is a USA television game show that originated in 1963. As its premise, audience participants were considered traders making deals with the host, Monty Hall.

At the heart of the matter is an apparently simple scenario:

A trader is posed with the question of choosing one of three doors for the opportunity to win a luxurious prize, e.g., a car. Behind the other two were goats.

The trader is shown three closed doors.

The trader chooses one of the doors. Let’s call this (without loss of generalisability) door A.

Keeping the chosen door closed, the host reveals one of the remaining doors showing a goat (let’s call this door C).

The trader chooses door A and the host reveals door C showing a goat.

The host then asks the trader if they would like to stick with their first choice or switch to the other remaining one (which we’ll call door B).

If the trader guesses correctly they win the prize. If not they’ll be shown another goat (also referred to as a zonk).

What is the probability of being Zonked? Credit: Wikipedia

Should the trader stick with their original choice of door A or switch to B?

Before reading further, give it a go. What would you do?

Most people are likely to have a gut intuition that “it doesn’t matter”, arguing that in the first instance each door had a ⅓ chance of hiding the prize, and that after the host intervention, when only two doors remain closed, the chance of winning the prize is 50:50.

There are various ways of explaining why the coin toss intuition is incorrect. Most of these involve maths equations or simulations. Whereas we will address these later, we’ll first attempt to solve the problem by applying Occam’s razor:

A principle that states that simpler explanations are preferable to more complex ones. (William of Ockham, 1287–1347)

To do this it is instructive to slightly redefine the problem to a large number of doors N instead of the original three.

The Large N-Door Problem

Similar to before: you have to choose one of many doors. For illustration let’s say N=100. Behind one of the doors there is the prize and behind the other 99 (N-1) are goats.

The 100 Door Monty Hall problem before the host intervention.

You choose one door and the host reveals 98 (N-2) of the other doors that have goats, leaving yours and one more closed.

The 100 Door Monty Hall Problem after the host intervention. Should you stick with your door or make the switch?

Should you stick with your original choice or make the switch?

I think you’ll agree with me that the remaining door, not chosen by you, is much more likely to conceal the prize … so you should definitely make the switch!
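For readers who would rather check the large-N intuition empirically, here is a minimal simulation sketch. It assumes the standard rules described above: the host knows where the prize is, never reveals it, and leaves exactly one door closed besides the trader’s. The names play_once and win_rate are illustrative and not part of the original article.

```python
import random


def play_once(n_doors: int, switch: bool, rng: random.Random) -> bool:
    """Play one game and return True if the trader wins the prize."""
    prize = rng.randrange(n_doors)
    choice = rng.randrange(n_doors)
    # The host opens n_doors - 2 goat doors, so exactly one other door
    # stays closed. If the trader missed the prize, that door must be the
    # prize door; otherwise it is a randomly chosen goat door.
    if choice == prize:
        other = rng.choice([d for d in range(n_doors) if d != choice])
    else:
        other = prize
    final = other if switch else choice
    return final == prize


def win_rate(n_doors: int, switch: bool, n_games: int = 20_000, seed: int = 0) -> float:
    """Estimate the winning frequency over n_games simulated games."""
    rng = random.Random(seed)
    wins = sum(play_once(n_doors, switch, rng) for _ in range(n_games))
    return wins / n_games


for n in (3, 100):
    print(f"N={n}: stay={win_rate(n, False):.3f}, switch={win_rate(n, True):.3f}")
```

Under these assumptions the switching frequency should land close to ⅔ for N=3 and close to 99% for N=100, while staying should hover around 1/N.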
It’s illustrative to compare both scenarios discussed so far. In the next figure we compare the post host intervention for the N=3 setup (top panel) and that of N=100 (bottom):

Post intervention settings for the N=3 setup (top) and N=100 (bottom).

In both cases we see two shut doors, one of which we’ve chosen. The main difference between these scenarios is that in the first we see one goat and in the second there are more than the eye would care to see (unless you shepherd for a living).

Why do most people consider the first case as a “50:50” toss up while in the second it’s obvious to make the switch?

We’ll soon address this question of why. First let’s put probabilities of success behind the different scenarios.

What’s The Frequency, Kenneth?

So far we learnt from the N=100 scenario that switching doors is obviously beneficial. Inferring the same for N=3 may be a leap of faith for most. Using some basic probability arguments, here we’ll quantify why it is favourable to make the switch for any number of doors N.

We start with the standard Monty Hall problem (N=3). When it starts, the probability of the prize being behind each of the doors A, B and C is p=⅓. To be explicit let’s define the Y parameter to be the door with the prize, i.e., p(Y=A)=p(Y=B)=p(Y=C)=⅓.

The trick to solving this problem is that once the trader’s door A has been chosen, we should pay close attention to the set of the other doors {B,C}, which has the probability p(Y∈{B,C})=p(Y=B)+p(Y=C)=⅔. This visual may help make sense of this:

By being attentive to the {B,C} the rest should follow.

When the goat is revealed it is apparent that the probabilities post intervention change. Note that for ease of reading I’ll drop the Y notation, where p(Y=A) will read p(A) and p(Y=B) will read p(B). Also, for completeness, the full terms after the intervention should be even longer due to being conditional, e.g., p(Y=A|Z=C), p(Y=B|Z=C), where Z is a parameter representing the choice of the host.

p(A) remains ⅓
p({B,C})=p(B)+p(C) remains ⅔
p(C)=0; we just learnt that the goat is behind door C, not the prize.
p(B)=p({B,C})-p(C)=⅔

For anyone with the information provided by the host this means that it isn’t a toss of a fair coin! For them the fact that p(C) became zero does not “raise all other boats”, but rather p(A) remains the same and p(B) gets doubled.

The bottom line is that the trader should consider p(A)=⅓ and p(B)=⅔, hence by switching they are doubling their chance of winning!

Let’s generalise to N. When we start, all doors have the same chance of hiding the prize, p=1/N. After the trader chooses one door, which we’ll call D₁, meaning p(D₁)=1/N, we should pay attention to the remaining set of doors {D₂, …, Dₙ}, which has a chance of p({D₂, …, Dₙ})=(N-1)/N.

When the host reveals the N-2 doors {D₃, …, Dₙ} with goats:

p(D₁) remains 1/N
p({D₂, …, Dₙ})=p(D₂)+p(D₃)+…+p(Dₙ) remains (N-1)/N
p(D₃)=p(D₄)=…=p(Dₙ₋₁)=p(Dₙ)=0; we just learnt that they have goats, not the prize.
p(D₂)=p({D₂, …, Dₙ})-p(D₃)-…-p(Dₙ)=(N-1)/N

The trader should now consider two door values, p(D₁)=1/N and p(D₂)=(N-1)/N. Hence the probability of winning improves by a factor of N-1. In the case of N=100, this means a factor of 99.

The improvement in all scenarios from N=3 to N=100 may be seen in the following graph. The thin line is the probability of winning by choosing any door prior to the intervention, p=1/N. Note that it also represents the chance of winning after the intervention if the trader decides to stick to their guns and not switch, p(D₁). The thick line is the probability of winning the prize after the intervention if the door is switched, p(D₂)=(N-1)/N:

Probability of winning as a function of N. p(D₁)=1/N (stay) is the thin line; p(D₂)=(N-1)/N (switch) is the thick one.

Perhaps the most interesting aspect of this graph is that the N=3 case has the highest probability before the host intervention, but the lowest probability after, and vice versa for N=100.

Another interesting feature is the quick climb in the probability of winning for the switchers:

N=3: p=67%
N=4: p=75%
N=5: p=80%

The switchers’ curve gradually approaches an asymptote at 100%, whereas at N=99 it is 98.99% and at N=100 it is equal to 99%.
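The closed-form results above, 1/N for staying and (N-1)/N for switching, can be tabulated exactly in a few lines. This is only a sketch of the arithmetic; the helper names stay_probability and switch_probability are made up for illustration.

```python
from fractions import Fraction


def stay_probability(n_doors: int) -> Fraction:
    """Chance of winning when keeping the original door: 1/N."""
    return Fraction(1, n_doors)


def switch_probability(n_doors: int) -> Fraction:
    """Chance of winning when switching to the other closed door: (N-1)/N."""
    return Fraction(n_doors - 1, n_doors)


for n in (3, 4, 5, 99, 100):
    p_stay = stay_probability(n)
    p_switch = switch_probability(n)
    # The ratio of the two probabilities is the improvement factor N-1.
    print(f"N={n}: stay={float(p_stay):.4f}, "
          f"switch={float(p_switch):.4f}, factor={p_switch / p_stay}")
```

Using Fraction keeps the values exact, so the improvement factor comes out as the integer N-1 rather than a rounded float.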
This starts to address an interesting question:

Why Is Switching Obvious For Large N But Not N=3?

The answer is the fact that this puzzle is slightly ambiguous. Only the highly attentive realise that by revealing the goat the host is actually conveying a lot of information that should be incorporated into one’s calculation. Later we discuss the difference between doing this calculation in one’s mind based on intuition and slowing down by putting pen to paper or coding up the problem.

How much information is conveyed by the host by intervening?

A hand wavy explanation is that this information may be visualised as the gap between the lines in the graph above. For N=3 we saw that the chance of winning doubled, but that doesn’t register as strongly with our common sense intuition as the factor of 99 in the N=100 case.

I have also considered describing stronger arguments from Information Theory, which provides useful vocabulary to express communication of information. However, I feel that this fascinating field deserves a post of its own, which I’ve published.

The main takeaway for the Monty Hall problem is that I have calculated the information gain to be a logarithmic function of the number of doors c using this formula:

Information Gain due to the intervention of the host for a setup with c doors.

For the c=3 door case, e.g., the information gain is ⅔ bits. Full details are in this article on entropy.

To summarise this section, we used basic probability arguments to quantify the probabilities of winning the prize, showing the benefit of switching for all N door scenarios. For those interested in more formal solutions using Bayesian and causal arguments, I provide supplementary sections at the bottom.

In the next three final sections we’ll discuss how this problem was received by the general public back in the 1990s, discuss lessons learnt and then summarise how we can apply them in real-world settings.
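One way to make the ⅔ bits figure concrete is the following sketch. It rests on an assumption of mine: that “information gain” means the drop in Shannon entropy of the prize location, from a uniform distribution over c doors before the intervention to the two-door distribution (1/c, (c-1)/c) after it. The author’s own formula is not recoverable from the text here, so treat this as one plausible reading that happens to reproduce the quoted value for c=3. The function names are hypothetical.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def information_gain_bits(n_doors: int) -> float:
    """Entropy of the prize location before minus after the host's reveal.

    Assumption: before the reveal the prize is uniform over n_doors doors;
    afterwards only two doors remain, with probabilities 1/N and (N-1)/N.
    """
    before = entropy_bits([1 / n_doors] * n_doors)
    after = entropy_bits([1 / n_doors, (n_doors - 1) / n_doors])
    return before - after


for c in (3, 10, 100):
    print(f"c={c}: information gain = {information_gain_bits(c):.3f} bits")
```

Under this reading the difference simplifies algebraically to ((c-1)/c)·log₂(c-1), which is ⅔ bits at c=3 and grows roughly logarithmically with the number of doors.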
Being Confused Is OK

“No, that is impossible, it should make no difference.” (Paul Erdős)

If you still don’t feel comfortable with the solution of the N=3 Monty Hall problem, don’t worry, you are in good company! According to Vazsonyi¹ even Paul Erdős, who is considered one “of the greatest experts in probability theory”, was confounded until computer simulations were demonstrated to him.

When the original solution by Steve Selvin² was popularised by Marilyn vos Savant in her column “Ask Marilyn” in Parade magazine in 1990, many readers wrote that Selvin and Savant were wrong³. According to Tierney’s 1991 article in the New York Times, this included about 10,000 readers, including nearly 1,000 with Ph.D. degrees⁴.

On a personal note, over a decade ago I was exposed to the standard N=3 problem and since then managed to forget the solution numerous times. When I learnt about the large N approach I was quite excited about how intuitive it was. I then failed to explain it to my technical manager over lunch, so this is an attempt to compensate. I still have the same day job.

While researching this piece I realised that there is a lot to learn in terms of decision making in general, and in particular for data science.

Lessons Learnt From Monty Hall Problem

In his book Thinking, Fast and Slow, the late Daniel Kahneman, the co-creator of Behavioural Economics, suggested that we have two types of thought processes:

System 1, fast thinking: based on intuition. This helps us react fast with confidence to familiar situations.
System 2, slow thinking: based on deep thought. This helps figure out new complex situations that life throws at us.

Assuming this premise, you might have noticed that in the above you were applying both.

By examining the visual of N=100 doors your System 1 kicked in and you immediately knew the answer. I’m guessing that in the N=3 case you were straddling between System 1 and 2. Considering that you had to stop and think a bit when going through the probabilities exercise, that was definitely System 2.

The decision maker’s struggle between System 1 and System 2. Generated using Gemini Imagen 3

Beyond the fast and slow thinking, I feel that there are a lot of data decision making lessons that may be learnt.

Assessing probabilities can be counter-intuitive … or Be comfortable with shifting to deep thought

We’ve clearly shown that in the N=3 case. As previously mentioned it confounded many people including prominent statisticians.

Another classic example is The Birthday Paradox, which shows how we underestimate the likelihood of coincidences. In this problem most people would think that one needs a large group of people before finding a pair sharing the same birthday. It turns out that all you need is 23 people to have a 50% chance, and 70 for a 99.9% chance.

One of the most confusing paradoxes in the realm of data analysis is Simpson’s, which I detailed in a previous article. This is a situation where trends of a population may be reversed in its subpopulations.

What all these paradoxes have in common is that they require us to get comfortable with shifting gears from System 1 fast thinking to System 2 slow thinking. This is also the common theme for the lessons outlined below.

A few more classical examples are The Gambler’s Fallacy, the Base Rate Fallacy and The Linda Problem. These are beyond the scope of this article, but I highly recommend looking them up to further sharpen ways of thinking about data.

… especially when dealing with ambiguity or Search for clarity in ambiguity

Let’s reread the problem, this time as stated in “Ask Marilyn”:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say №1, and the host, who knows what’s behind the doors, opens another door, say №3, which has a goat.
He then says to you, “Do you want to pick door №2?” Is it to your advantage to switch your choice?

We discussed that the most important piece of information is not made explicit. It says that the host “knows what’s behind the doors”, but it does not spell out how the host chooses which door to open, although it’s implicitly understood that the host will never open the door with the car.

Many real life problems in data science involve dealing with ambiguity in demands as well as in data provided by stakeholders.

It is crucial for the researcher to track down any relevant piece of information that is likely to have an impact and incorporate it into the solution. Statisticians refer to this as “belief update”.

With new information we should update our beliefs

This is the main aspect separating the Bayesian stream of thought from the Frequentist one. The Frequentist approach takes data at face value. The Bayesian approach incorporates prior beliefs and updates them when new findings are introduced. This is especially useful when dealing with ambiguous situations.

To drive this point home, let’s re-examine this figure comparing the post intervention N=3 setup with the N=100 one.

Copied from above. Post intervention settings for the N=3 setup and N=100.

In both cases we had a prior belief that all doors had an equal chance of hiding the prize, p=1/N.

Once the host opened one door (or 98 doors for N=100), a lot of valuable information was revealed, although in the case of N=100 this was much more apparent than for N=3.

In the Frequentist approach, however, most of this information would be ignored, as it only focuses on the two closed doors. The Frequentist conclusion, hence, is a 50% chance to win the prize regardless of what else is known about the situation. Hence the Frequentist takes Paul Erdős’ “no difference” point of view, which we now know to be incorrect.

This would be reasonable if all that was presented were the two doors and not the intervention and the goats. However, if that information is presented, one should shift gears into System 2 thinking and update their beliefs about the system. This is what we have done by focusing not only on the shut doors, but also on what was learnt about the system at large.

For the brave hearted, in a supplementary section below called The Bayesian Point of View I solve the Monty Hall problem using the Bayesian formalism.

Be one with subjectivity

The Frequentist’s main reservation about “going Bayes” is that “Statistics should be objective”.

The Bayesian response is that Frequentists also apply a prior without realising it: a flat one.

Regardless of the Bayesian/Frequentist debate, as researchers we try our best to be as objective as possible in every step of the analysis.

That said, it is inevitable that subjective decisions are made throughout.

E.g., in a skewed distribution should one quote the mean or the median? It highly depends on the context and hence a subjective decision needs to be made.

The responsibility of the analyst is to provide justification for their choices, first to convince themselves and then their stakeholders.

When confused, look for a useful analogy … but tread with caution

We saw that by going from the N=3 setup to N=100 the solution became apparent. This is a trick scientists frequently use: if the problem appears at first a bit too confusing or overwhelming, break it down and try to find a useful analogy.

It is probably not a perfect comparison, but going from the N=3 setup to N=100 is like examining a picture from up close and zooming out to see the big picture. Think of having only a puzzle piece and then glancing at the jigsaw photo on the box.

Monty Hall in 1976. Credit: Wikipedia and using Visual Paradigm Online for the puzzle effect

Note: whereas analogies may be powerful, one should use them with caution so as not to oversimplify. Physicists refer to this situation as the spherical cow method, where models may oversimplify complex phenomena.
I admit that even with years of experience in applied statistics, at times I still get confused about which method to apply. A large part of my thought process is identifying analogies to known solved problems. Sometimes after making progress in a direction I will realise that my assumptions were wrong and seek a new direction. I used to quip with colleagues that they shouldn’t trust me before my third attempt …

Simulations are powerful but not always necessary

It’s interesting to learn that Paul Erdős and other mathematicians were convinced only after seeing simulations of the problem.

I am of two minds about the use of simulations when it comes to problem solving.

On the one hand simulations are powerful tools to analyse complex and intractable problems, especially with real life data in which one wants a grasp not only of the underlying formulation, but also of the stochasticity.

And here is the big BUT: if a problem can be solved analytically, like the Monty Hall one, simulations, as fun as they may be, may not be necessary.

According to Occam’s razor, all that is required is a brief intuition to explain the phenomena. This is what I attempted to do here by applying common sense and some basic probability reasoning. For those who enjoy deep dives I provide below supplementary sections with two methods for analytical solutions: one using Bayesian statistics and another using Causality.

After publishing the first version of this article there was a comment that Savant’s solution³ may be simpler than those presented here. I revisited her communications and agreed that it should be added. In the process I realised three more lessons may be learnt.

A well designed visual goes a long way

Continuing the principle of Occam’s razor, Savant explained³ quite convincingly in my opinion:

You should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

Hence she provided an abstract visual for the readers. I attempted to do the same with the 100 doors figures.

Marilyn vos Savant who popularised the Monty Hall Problem. Credit: Ben David on Flickr under license

As mentioned, many readers, especially those with backgrounds in maths and statistics, still weren’t convinced.

She revised³ with another mental image:

The benefits of switching are readily proven by playing through the six games that exhaust all the possibilities. For the first three games, you choose #1 and “switch” each time, for the second three games, you choose #1 and “stay” each time, and the host always opens a loser. Here are the results.

She added a table with all the scenarios. I took some artistic liberty and created the following figure. As indicated, the top batch are the scenarios in which the trader switches and the bottom batch those in which they stay. Lines in green are games which the trader wins, and in red those in which they get zonked. The door chosen by the trader is marked, and Monty Hall then chooses a different door that has a goat behind it.

Adaptation of Savant’s table³ of six scenarios that shows the solution to the Monty Hall Problem

We clearly see from this diagram that the switcher has a ⅔ chance of winning and those that stay only ⅓.

This is yet another elegant visualisation that clearly explains the non intuitive.

It strengthens the claim that there is no real need for simulations in this case because all they would be doing is rerunning these six scenarios.

Another popular solution uses decision tree illustrations. You can find these on the Wikipedia page, but I find them a bit redundant given Savant’s table.
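Savant’s six games are small enough to enumerate directly rather than simulate. The sketch below assumes, as in her description, that the trader always picks door #1 and the host always opens a goat door; when the host has two goat doors to choose from, the code simply takes the first, which does not affect who wins. The function name play_six_games is illustrative only.

```python
def play_six_games():
    """Enumerate Savant's six games and count the wins per strategy."""
    doors = (1, 2, 3)
    choice = 1  # the trader always picks door #1
    wins = {"switch": 0, "stay": 0}
    for strategy in ("switch", "stay"):
        for prize in doors:
            # The host opens a door that is neither the trader's nor the prize.
            opened = [d for d in doors if d != choice and d != prize][0]
            if strategy == "switch":
                # Move to the only door that is neither chosen nor opened.
                final = next(d for d in doors if d not in (choice, opened))
            else:
                final = choice
            wins[strategy] += int(final == prize)
    return wins


print(play_six_games())
```

Each strategy is played three times, once per possible prize location, so the win counts read directly as thirds.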
The fact that we can solve this problem in so many ways yields another lesson:

There are many ways to skin a … problem

One of the many lessons that I have learnt from the writings of the late Richard Feynman, one of the best communicators of physics and ideas, is that a problem can be solved in many ways. Mathematicians and physicists do this all the time.

A relevant quote that paraphrases Occam’s razor:

If you can’t explain it simply, you don’t understand it well enough (attributed to Albert Einstein)

And finally

Embrace ignorance and be humble

“You are utterly incorrect … How many irate mathematicians are needed to get you to change your mind?” (Ph.D. from Georgetown University)

“May I suggest that you obtain and refer to a standard textbook on probability before you try to answer a question of this type again?” (Ph.D. from University of Florida)

“You’re in error, but Albert Einstein earned a dearer place in the hearts of people after he admitted his errors.” (Ph.D. from University of Michigan)

Ouch!

These are some of the said responses from mathematicians to the Parade article.

Such unnecessary viciousness.

You can check the reference³ to see the writers’ names and others like them. To whet your appetite: “You blew it, and you blew it big!”, “You made a mistake, but look at the positive side. If all those Ph.D.’s were wrong, the country would be in some very serious trouble.”, “I am in shock that after being corrected by at least three mathematicians, you still do not see your mistake.”

And as expected from the 1990s, perhaps the most embarrassing one was from a resident of Oregon:

“Maybe women look at math problems differently than men.”

These make me cringe and be embarrassed to be associated by gender and Ph.D. title with these graduates and professors.

Hopefully in the 2020s most people are more humble about their ignorance. Yuval Noah Harari discusses the fact that the Scientific Revolution of Galileo Galilei et al. was not due to knowledge but rather to the admission of ignorance.

“The great discovery that launched the Scientific Revolution was the discovery that humans do not know the answers to their most important questions” (Yuval Noah Harari)

Fortunately for mathematicians’ image, there were also quite a lot of more enlightened comments. I like this one from one Seth Kalson, Ph.D. of MIT:

You are indeed correct. My colleagues at work had a ball with this problem, and I dare say that most of them, including me at first, thought you were wrong!

We’ll summarise by examining how, and if, the Monty Hall problem may be applied in real-world settings, so you can try to relate it to projects that you are working on.

Application in Real World Settings

Researching for this article I found that beyond artificial setups for entertainment⁶ ⁷ there aren’t practical settings for this problem to use as an analogy. Of course, I may be wrong⁸ and would be glad to hear if you know of one.

One way of assessing the viability of an analogy is using arguments from causality, which provides vocabulary that cannot be expressed with standard statistics.

In a previous post I discussed the fact that the story behind the data is as important as the data itself. In particular Causal Graph Models visualise the story behind the data, which we will use as a framework for a reasonable analogy.

For the Monty Hall problem we can build a Causal Graph Model like this:

Reading:

The door chosen by the trader is independent of the door with the prize, and vice versa. As important, there is no common cause between them that might generate a spurious correlation.
The host’s choice depends on both the trader’s door and the prize door.

By comparing causal graphs of two systems one can get a sense for how analogous both are. A perfect analogy would require more details, but this is beyond the scope of this article. Briefly, one would want to ensure similar functions between the parameters.
Those interested in learning further details about using Causal Graph Models to assess causality in real world problems may be interested in this article.

Anecdotally it is also worth mentioning that on Let’s Make a Deal, Monty himself admitted years later to playing mind games with the contestants and did not always follow the rules, e.g., not always doing the intervention as “it all depends on his mood”⁴.

In our setup we assumed perfect conditions, i.e., a host that does not stray from the script and/or play on the trader’s emotions. Taking this into consideration would require updating the Graphical Model above, which is beyond the scope of this article.

Some might be disheartened to realise at this stage of the post that there might not be real world applications for this problem.

I argue that the lessons learnt from the Monty Hall problem definitely are applicable.

Just to summarise them again:

Assessing probabilities can be counter-intuitive …
… especially when dealing with ambiguity
With new information we should update our beliefs
Be one with subjectivity
When confused, look for a useful analogy … but tread with caution
Simulations are powerful but not always necessary
A well designed visual goes a long way
There are many ways to skin a … problem
Embrace ignorance and be humble

While the Monty Hall Problem might seem like a simple puzzle, it offers valuable insights into decision-making, particularly for data scientists. The problem highlights the importance of going beyond intuition and embracing a more analytical, data-driven approach. By understanding the principles of Bayesian thinking and updating our beliefs based on new information, we can make more informed decisions in many aspects of our lives, including data science. The Monty Hall Problem serves as a reminder that even seemingly straightforward scenarios can contain hidden complexities and that by carefully examining available information, we can uncover hidden truths and make better decisions.
At the bottom of the article I provide a list of resources that I found useful to learn about this topic.

Credit: Wikipedia

Credits

Unless otherwise noted, all images were created by the author.

Many thanks to Jim Parr, Will Reynolds, and Betty Kazin for their useful comments.

In the following supplementary sections I derive solutions to the Monty Hall problem from two perspectives:

Bayesian
Causal

Both are motivated by questions in the textbook Causal Inference in Statistics: A Primer by Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell.

Supplement 1: The Bayesian Point of View

This section assumes a basic understanding of Bayes’ Theorem, in particular being comfortable with conditional probabilities. In other words, if this makes sense:

P(A|B) = P(B|A) P(A) / P(B)

We set out to use Bayes’ theorem to prove that switching doors improves chances in the N=3 Monty Hall Problem.

We define
X: the chosen door
Y: the door with the prize
Z: the door opened by the host

Labelling the doors as A, B and C, without loss of generality, we need to solve for:

P(Y=B|X=A, Z=C) > P(Y=A|X=A, Z=C)

Using Bayes’ theorem we equate the left side as

P(Y=B|X=A, Z=C) = P(Z=C|X=A, Y=B) P(Y=B|X=A) / P(Z=C|X=A)

and the right one as:

P(Y=A|X=A, Z=C) = P(Z=C|X=A, Y=A) P(Y=A|X=A) / P(Z=C|X=A)

Most components are equal (the denominator P(Z=C|X=A) is shared, and P(Y=A|X=A)=P(Y=B|X=A)=⅓), so we are left to prove:

P(Z=C|X=A, Y=B) > P(Z=C|X=A, Y=A)

In the case where Y=B, the host has only one choice, making P(Z=C|X=A, Y=B)=1.

In the case where Y=A, the host has two choices, making P(Z=C|X=A, Y=A)=1/2.

From here:

P(Y=B|X=A, Z=C) > P(Y=A|X=A, Z=C), because 1 > 1/2.

Quod erat demonstrandum.

Note: if the “host choices” arguments didn’t make sense, look at the table below showing this explicitly. You will want to compare entries {X=A, Y=B, Z=C} and {X=A, Y=A, Z=C}.

Supplement 2: The Causal Point of View

For this section a basic understanding of Directed Acyclic Graphs (DAGs) and Structural Causal Models (SCMs) is useful, but not required. In brief:

DAGs qualitatively visualise the causal relationships between the parameter nodes.
SCMs quantitatively express the formula relationships between the parameters.

Given the DAG we are going to define the SCM that corresponds to the classic N=3 Monty Hall problem and use it to describe the joint distribution of all variables. We will later generalise to N.

We define
X: the chosen door
Y: the door with the prize
Z: the door opened by the host

From the DAG and the chain rule we see that:

P(X, Y, Z) = P(Z|X, Y) P(X) P(Y)

The SCM is defined by exogenous variables U, endogenous variables V, and the functions between them F:

U={X, Y}, V={Z}, F={f}

where X, Y and Z have door values:

D={A, B, C}

The host choice f is defined as:

P(Z|X, Y) = 0 if Z=X or Z=Y
P(Z|X, Y) = 1/2 if X=Y and Z≠X
P(Z|X, Y) = 1 if X≠Y, Z≠X and Z≠Y

In order to generalise to N doors, the DAG remains the same, but the SCM requires updating D to be a set of N doors Dᵢ: {D₁, D₂, … Dₙ}.

Exploring Example Scenarios

To gain an intuition for this SCM, let’s examine 6 examples of 27:

When X=Y (e.g., both are door A)
P(Z=A|X=A, Y=A)=0; the host cannot choose the participant’s door
P(Z=B|X=A, Y=A)=1/2; the prize is behind A → the host chooses B at 50%
P(Z=C|X=A, Y=A)=1/2; the prize is behind A → the host chooses C at 50%

When X≠Y (e.g., X=A and Y=B)
P(Z=A|X=A, Y=B)=0; the host cannot choose the participant’s door
P(Z=B|X=A, Y=B)=0; the host cannot choose the prize door
P(Z=C|X=A, Y=B)=1; the host has no choice in the matter

Calculating Joint Probabilities

Using logic let’s code up all 27 possibilities in Python:

    import pandas as pd

    # All 27 (X, Y, Z) combinations: X changes slowest, Z fastest.
    df = pd.DataFrame({
        "X": ["A"] * 9 + ["B"] * 9 + ["C"] * 9,
        "Y": (["A"] * 3 + ["B"] * 3 + ["C"] * 3) * 3,
        "Z": ["A", "B", "C"] * 9,
    })
    df["P(Z|X,Y)"] = float("nan")
    p_x = 1. / 3
    p_y = 1. / 3

    x_eq_y = df["X"] == df["Y"]
    z_eq_x = df["Z"] == df["X"]
    z_eq_y = df["Z"] == df["Y"]
    df.loc[x_eq_y & z_eq_x, "P(Z|X,Y)"] = 0      # cannot open the participant's door
    df.loc[x_eq_y & ~z_eq_x, "P(Z|X,Y)"] = 0.5   # prize behind chosen door: two options
    df.loc[~x_eq_y & z_eq_y, "P(Z|X,Y)"] = 0     # cannot open the prize door
    df.loc[~x_eq_y & z_eq_x, "P(Z|X,Y)"] = 0     # cannot open the participant's door
    df.loc[~x_eq_y & ~z_eq_x & ~z_eq_y, "P(Z|X,Y)"] = 1  # host has no choice
    df["P(X,Y,Z)"] = df["P(Z|X,Y)"] * p_x * p_y
    print(f"Testing normalisation of P(X,Y,Z): {df['P(X,Y,Z)'].sum()}")

The resulting df lists the joint probability P(X,Y,Z) for each of the 27 combinations.

Resources

This Quora discussion by Joshua Engel helped me shape a few aspects of this article.
Causal Inference in Statistics: A Primer / Pearl, Glymour & Jewell: an excellent short textbook.
I also very much enjoy Tim Harford’s podcast Cautionary Tales. He wrote about this topic on November 3rd 2017 for the Financial Times: Monty Hall and the game show stick-or-switch conundrum

Footnotes

¹ Vazsonyi, Andrew. “Which Door Has the Cadillac?”. Decision Line: 17–19. Archived from the original on 13 April 2014. Retrieved 16 October 2012.
² Steve Selvin to the American Statistician in 1975.
³ Game Show Problem, from Marilyn vos Savant’s “Ask Marilyn” at marilynvossavant.com: “This material in this article was originally published in PARADE magazine in 1990 and 1991”
⁴ Tierney, John. “Behind Monty Hall’s Doors: Puzzle, Debate and Answer?”. The New York Times. Retrieved 18 January 2008.
⁵ Kahneman, D. Thinking, Fast and Slow. Farrar, Straus and Giroux.
⁶ MythBusters Episode 177 “Pick a Door”. Watch MythBusters’ approach.
⁷ Monty Hall Problem on Survivor Season 41. Watch Survivor’s take on the problem.
⁸ Jingyi Jessica Li, “How the Monty Hall problem is similar to the false discovery rate in high-throughput data analysis.” Whereas the author points out “similarities” between hypothesis testing and the Monty Hall problem, I think that this is a bit misleading. The author is correct that both problems change with the order in which processes are done, but that is part of Bayesian statistics in general, not limited to the Monty Hall problem.

The post 🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem appeared first on Towards Data Science. #lessons #decision #making #monty #hall

🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem

towardsdatascience.com
The Monty Hall Problem is a well-known brain teaser from which we can learn important lessons in Decision Making that are useful in general and in particular for data scientists. If you are not familiar with this problem, prepare to be perplexed . If you are, I hope to shine light on aspects that you might not have considered . I introduce the problem and solve with three types of intuitions: Common — The heart of this post focuses on applying our common sense to solve this problem. We’ll explore why it fails us and what we can do to intuitively overcome this to make the solution crystal clear . We’ll do this by using visuals , qualitative arguments and some basic probabilities (not too deep, I promise). Bayesian — We will briefly discuss the importance of belief propagation. Causal — We will use a Graph Model to visualise conditions required to use the Monty Hall problem in real world settings.Spoiler alert I haven’t been convinced that there are any, but the thought process is very useful. I summarise by discussing lessons learnt for better data decision making. In regards to the Bayesian and Causal intuitions, these will be presented in a gentle form. For the mathematically inclined I also provide supplementary sections with short Deep Dives into each approach after the summary. (Note: These are not required to appreciate the main points of the article.) By examining different aspects of this puzzle in probability you will hopefully be able to improve your data decision making . Credit: Wikipedia First, some history. Let’s Make a Deal is a USA television game show that originated in 1963. As its premise, audience participants were considered traders making deals with the host, Monty Hall . At the heart of the matter is an apparently simple scenario: A trader is posed with the question of choosing one of three doors for the opportunity to win a luxurious prize, e.g, a car . Behind the other two were goats . The trader is shown three closed doors. 
The trader chooses one of the doors. Let’s call this (without loss of generality) door A and mark it as chosen. Keeping the chosen door closed, the host reveals one of the remaining doors, showing a goat (let’s call this door C).

The trader chooses door A and the host reveals door C, showing a goat.

The host then asks the trader if they would like to stick with their first choice or switch to the other remaining one (which we’ll call door B). If the trader guesses correctly they win the prize. If not, they’ll be shown another goat (also referred to as a zonk).

What is the probability of being Zonked? Credit: Wikipedia

Should the trader stick with their original choice of door A or switch to B? Before reading further, give it a go. What would you do?

Most people are likely to have a gut intuition that “it doesn’t matter”, arguing that in the first instance each door had a ⅓ chance of hiding the prize, and that after the host intervention, when only two doors remain closed, the chance of winning the prize is 50:50.

There are various ways of explaining why the coin toss intuition is incorrect. Most of these involve maths equations or simulations. We will address these later, but first we’ll attempt to solve the problem by applying Occam’s razor:

A principle that states that simpler explanations are preferable to more complex ones. (William of Ockham, 1287–1347)

To do this it is instructive to slightly redefine the problem to a large number N of doors instead of the original three.

The Large N-Door Problem

Similar to before: you have to choose one of many doors. For illustration let’s say N=100. Behind one of the doors there is the prize and behind the other 99 (N-1) are goats.

The 100 Door Monty Hall problem before the host intervention.

You choose one door and the host reveals 98 (N-2) of the other doors that have goats, leaving yours and one more closed.

The 100 Door Monty Hall Problem after the host intervention.
Should you stick with your original choice or make the switch? I think you’ll agree with me that the remaining door, not chosen by you, is much more likely to conceal the prize … so you should definitely make the switch!

It’s illustrative to compare both scenarios discussed so far. In the next figure we compare the post host intervention for the N=3 setup (top panel) and that of N=100 (bottom):

Post intervention settings for the N=3 setup (top) and N=100 (bottom).

In both cases we see two shut doors, one of which we’ve chosen. The main difference between these scenarios is that in the first we see one goat and in the second there are more than the eye would care to see (unless you shepherd for a living).

Why do most people consider the first case a “50:50” toss-up, while in the second it’s obvious to make the switch?

We’ll soon address this question of why. First let’s put probabilities of success behind the different scenarios.

What’s The Frequency, Kenneth?

So far we learnt from the N=100 scenario that switching doors is obviously beneficial. Inferring the same for N=3 may be a leap of faith for most. Using some basic probability arguments, here we’ll quantify why it is favourable to make the switch for any number of doors N.

We start with the standard Monty Hall problem (N=3). When it starts, the probability of the prize being behind each of the doors A, B and C is p=⅓. To be explicit let’s define the Y parameter to be the door with the prize, i.e., p(Y=A)=p(Y=B)=p(Y=C)=⅓.

The trick to solving this problem is that once the trader’s door A has been chosen, we should pay close attention to the set of the other doors {B,C}, which has the probability p(Y∈{B,C})=p(Y=B)+p(Y=C)=⅔. This visual may help make sense of this:

By being attentive to the {B,C} the rest should follow.

When the goat is revealed it is apparent that the probabilities post intervention change.
Note that for ease of reading I’ll drop the Y notation, where p(Y=A) will read p(A) and p(Y∈{B,C}) will read p({B,C}). Also, for completeness, the full terms after the intervention should be even longer due to being conditional, e.g., p(Y=A|Z=C), p(Y∈{B,C}|Z=C), where Z is a parameter representing the choice of the host. (In the Bayesian supplement section below I use proper notation without this shortening.)

p(A) remains ⅓
p({B,C})=p(B)+p(C) remains ⅔
p(C)=0; we just learnt that the goat is behind door C, not the prize.
p(B)=p({B,C})-p(C)=⅔

For anyone with the information provided by the host (meaning the trader and the audience) this means that it isn’t a toss of a fair coin! For them the fact that p(C) became zero does not “raise all other boats” (probabilities of doors A and B), but rather p(A) remains the same and p(B) gets doubled.

The bottom line is that the trader should consider p(A)=⅓ and p(B)=⅔, hence by switching they are doubling their probability of winning!

Let’s generalise to N (to make the visual simpler we’ll use N=100 again as an analogy).

When we start, all doors have the same probability of hiding the prize, p=1/N. After the trader chooses one door, which we’ll call D₁, meaning p(Y=D₁)=1/N, we should now pay attention to the remaining set of doors {D₂, …, Dₙ}, which has a chance of p(Y∈{D₂, …, Dₙ})=(N-1)/N.

When the host reveals (N-2) doors {D₃, …, Dₙ} with goats (back to short notation):

p(D₁) remains 1/N
p({D₂, …, Dₙ})=p(D₂)+p(D₃)+…+p(Dₙ) remains (N-1)/N
p(D₃)=p(D₄)=…=p(Dₙ₋₁)=p(Dₙ)=0; we just learnt that they have goats, not the prize.
p(D₂)=p({D₂, …, Dₙ})-p(D₃)-…-p(Dₙ)=(N-1)/N

The trader should now consider two door values, p(D₁)=1/N and p(D₂)=(N-1)/N.

Hence the probability of winning improves by a factor of N-1! In the case of N=100, this means a factor of 99 (i.e., 99% likely to win a prize when switching vs. 1% if not).

The improvement in all scenarios between N=3 and N=100 may be seen in the following graph.
The thin line is the probability of winning by choosing any door prior to the intervention, p(Y)=1/N. Note that it also represents the chance of winning after the intervention if they decide to stick to their guns and not switch, p(Y=D₁|Z={D₃…Dₙ}). (Here I reintroduce the more rigorous conditional form mentioned earlier.) The thick line is the probability of winning the prize after the intervention if the door is switched, p(Y=D₂|Z={D₃…Dₙ})=(N-1)/N:

Probability of winning as a function of N. p(Y)=p(Y=no switch|Z)=1/N is the thin line; p(Y=switch|Z)=(N-1)/N is the thick one. (By definition the sum of both lines is 1 for each N.)

Perhaps the most interesting aspect of this graph (albeit also by definition) is that the N=3 case has the highest probability before the host intervention, but the lowest probability after, and vice versa for N=100.

Another interesting feature is the quick climb in the probability of winning for the switchers:

N=3: p=67%
N=4: p=75%
N=5: p=80%

The switchers’ curve gradually approaches an asymptote at 100%; at N=99 it is 98.99% and at N=100 it is equal to 99%.

This starts to address an interesting question:

Why Is Switching Obvious For Large N But Not N=3?

The answer is the fact that this puzzle is slightly ambiguous. Only the highly attentive realise that by revealing the goat (and never the prize!) the host is actually conveying a lot of information that should be incorporated into one’s calculation. Later we discuss the difference between doing this calculation in one’s mind based on intuition and slowing down by putting pen to paper or coding up the problem.

How much information is conveyed by the host by intervening? A hand-wavy explanation is that this information may be visualised as the gap between the lines in the graph above. For N=3 we saw that the probability of winning doubled (nothing to sneeze at!), but that doesn’t register as strongly with our common sense intuition as the factor of 99 in the N=100 case.
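The stick and switch probabilities discussed above, 1/N and (N-1)/N, can be tabulated exactly for any number of doors. The following is a minimal sketch; the function names `stick_probability` and `switch_probability` are illustrative and not part of the original article.

```python
from fractions import Fraction


def stick_probability(n_doors: int) -> Fraction:
    """Chance of winning when keeping the first pick: 1/N."""
    return Fraction(1, n_doors)


def switch_probability(n_doors: int) -> Fraction:
    """Chance of winning when switching after the host opens N-2 goat doors: (N-1)/N."""
    return Fraction(n_doors - 1, n_doors)


for n in (3, 4, 5, 99, 100):
    stick, switch = stick_probability(n), switch_probability(n)
    # The two strategies are complementary, so the probabilities sum to 1.
    assert stick + switch == 1
    print(f"N={n:>3}: stick={float(stick):.2%}  switch={float(switch):.2%}  ratio={switch / stick}")
```

Using exact fractions rather than floats keeps the "sum of both lines is 1" property exact, and the printed ratio is the factor N-1 by which switching improves the probability of winning.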
I have also considered describing stronger arguments from Information Theory that provide useful vocabulary to express communication of information. However, I feel that this fascinating field deserves a post of its own, which I’ve published. The main takeaway for the Monty Hall problem is that I have calculated the information gain to be a logarithmic function of the number of doors c using this formula:

Information Gain due to the intervention of the host for a setup with c doors.

For the c=3 door case, e.g., the information gain is ⅔ bits (of a maximum possible 1.58 bits). Full details are in this article on entropy.

To summarise this section, we used basic probability arguments to quantify the probabilities of winning the prize, showing the benefit of switching for all N door scenarios. For those interested in more formal solutions using Bayesian and Causal approaches, I provide supplementary sections at the bottom.

In the next three final sections we’ll discuss how this problem was received by the general public back in the 1990s, discuss lessons learnt and then summarise how we can apply them in real-world settings.

Being Confused Is OK

“No, that is impossible, it should make no difference.” (Paul Erdős)

If you still don’t feel comfortable with the solution of the N=3 Monty Hall problem, don’t worry, you are in good company! According to Vazsonyi (1999)¹ even Paul Erdős, who is considered one “of the greatest experts in probability theory”, was confounded until computer simulations were demonstrated to him.

When the original solution by Steve Selvin (1975)² was popularised by Marilyn vos Savant in her column “Ask Marilyn” in Parade magazine in 1990, many readers wrote that Selvin and Savant were wrong³. According to Tierney’s 1991 article in the New York Times, this included about 10,000 readers, including nearly 1,000 with Ph.D. degrees⁴.
On a personal note, over a decade ago I was exposed to the standard N=3 problem and since then managed to forget the solution numerous times. When I learnt about the large N approach I was quite excited about how intuitive it was. I then failed to explain it to my technical manager over lunch, so this is an attempt to compensate. I still have the same day job.

While researching this piece I realised that there is a lot to learn in terms of decision making in general, and in particular for data science.

Lessons Learnt From Monty Hall Problem

In his book Thinking, Fast and Slow, the late Daniel Kahneman, the co-creator of Behavioural Economics, suggested that we have two types of thought processes:

System 1, fast thinking: based on intuition. This helps us react fast with confidence to familiar situations.
System 2, slow thinking: based on deep thought. This helps figure out new complex situations that life throws at us.

Assuming this premise, you might have noticed that in the above you were applying both.

By examining the visual of N=100 doors your System 1 kicked in and you immediately knew the answer. I’m guessing that in the N=3 case you were straddling between System 1 and 2. Considering that you had to stop and think a bit when going through the probabilities exercise, it was definitely System 2.

The decision maker’s struggle between System 1 and System 2. Generated using Gemini Imagen 3

Beyond the fast and slow thinking I feel that there are a lot of data decision making lessons that may be learnt.

(1) Assessing probabilities can be counter-intuitive … or Be comfortable with shifting to deep thought

We’ve clearly shown that in the N=3 case. As previously mentioned it confounded many people including prominent statisticians.

Another classic example is The Birthday Paradox, which shows how we underestimate the likelihood of coincidences.
In this problem most people would think that one needs a large group of people before finding a pair sharing the same birthday. It turns out that all you need is 23 to have a 50% chance. And 70 for a 99.9% chance.

One of the most confusing paradoxes in the realm of data analysis is Simpson’s, which I detailed in a previous article. This is a situation where trends of a population may be reversed in its subpopulations.

What all these paradoxes have in common is that they require us to get comfortable with shifting gears from System 1 fast thinking to System 2 slow thinking. This is also the common theme for the lessons outlined below.

A few more classical examples are: The Gambler’s Fallacy, Base Rate Fallacy and The Linda [bank teller] Problem. These are beyond the scope of this article, but I highly recommend looking them up to further sharpen ways of thinking about data.

(2) … especially when dealing with ambiguity or Search for clarity in ambiguity

Let’s reread the problem, this time as stated in “Ask Marilyn”:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say №1, and the host, who knows what’s behind the doors, opens another door, say №3, which has a goat. He then says to you, “Do you want to pick door №2?” Is it to your advantage to switch your choice?

We discussed that the most important piece of information is not made explicit. It says that the host “knows what’s behind the doors”, but it does not state how the host chooses which door to open; it is only implicitly understood that the host will never open the door with the car.

Many real-life problems in data science involve dealing with ambiguity in demands as well as in data provided by stakeholders.

It is crucial for the researcher to track down any relevant piece of information that is likely to have an impact and update that into the solution. Statisticians refer to this as “belief update”.
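A belief update of this kind can be sketched for the three-door game: start from the flat ⅓ prior and weight each door by how likely the host was to open door C. This is a minimal illustration, assuming a host who never opens the trader's door or the prize door and who picks at random when two goat doors are available; the helper name `host_likelihood` is hypothetical.

```python
from fractions import Fraction

doors = ("A", "B", "C")
chosen, opened = "A", "C"  # labels follow the article: trader picks A, host opens C

# Flat prior: before the host acts, each door hides the prize with probability 1/3.
prior = {door: Fraction(1, 3) for door in doors}


def host_likelihood(prize: str) -> Fraction:
    """P(host opens `opened` | prize location) under the assumed host rules."""
    if prize == opened:
        return Fraction(0)      # the host never reveals the prize
    if prize == chosen:
        return Fraction(1, 2)   # two goat doors to choose from
    return Fraction(1)          # only one door the host may open


# Bayes' rule: the posterior is proportional to prior times likelihood.
unnormalised = {door: prior[door] * host_likelihood(door) for door in doors}
evidence = sum(unnormalised.values())
posterior = {door: weight / evidence for door, weight in unnormalised.items()}

for door in doors:
    print(f"p({door}): prior={prior[door]}  posterior={posterior[door]}")
```

The only ingredient that breaks the symmetry between doors A and B is the host's rule, which is exactly the piece of information the problem statement leaves implicit.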
(3) With new information we should update our beliefs

This is the main aspect separating the Bayesian stream of thought from the Frequentist one. The Frequentist approach takes data at face value (referred to as flat priors). The Bayesian approach incorporates prior beliefs and updates them when new findings are introduced. This is especially useful when dealing with ambiguous situations.

To drive this point home, let’s re-examine this figure comparing the post intervention N=3 setup (top panel) and the N=100 one (bottom panel).

Copied from above. Post intervention settings for the N=3 setup (top) and N=100 (bottom).

In both cases we had a prior belief that all doors had an equal chance of hiding the prize, p=1/N.

Once the host opened one door (N=3; or 98 doors when N=100) a lot of valuable information was revealed, although in the case of N=100 this was much more apparent than for N=3.

In the Frequentist approach, however, most of this information would be ignored, as it only focuses on the two closed doors. The Frequentist conclusion, hence, is a 50% chance to win the prize regardless of what else is known about the situation. Hence the Frequentist takes Paul Erdős’ “no difference” point of view, which we now know to be incorrect.

This would be reasonable if all that was presented were the two doors and not the intervention and the goats. However, if that information is presented, one should shift gears into System 2 thinking and update their beliefs in the system. This is what we have done by focusing not only on the shut doors, but also on what was learnt about the system at large.

For the brave hearted, in a supplementary section below called The Bayesian Point of View I solve the Monty Hall problem using the Bayesian formalism.

(4) Be one with subjectivity

The Frequentist’s main reservation about “going Bayes” is that “Statistics should be objective”. The Bayesian response is that Frequentists also apply a prior without realising it: a flat one.
Regardless of the Bayesian/Frequentist debate, as researchers we try our best to be as objective as possible in every step of the analysis.

That said, it is inevitable that subjective decisions are made throughout. E.g., in a skewed distribution should one quote the mean or the median? It highly depends on the context and hence a subjective decision needs to be made.

The responsibility of the analyst is to provide justification for their choices, first to convince themselves and then their stakeholders.

(5) When confused, look for a useful analogy … but tread with caution

We saw that by going from the N=3 setup to N=100 the solution became apparent. This is a trick scientists frequently use: if the problem appears at first a bit too confusing or overwhelming, break it down and try to find a useful analogy.

It is probably not a perfect comparison, but going from the N=3 setup to N=100 is like examining a picture from up close and zooming out to see the big picture. Think of having only a puzzle piece and then glancing at the jigsaw photo on the box.

Monty Hall in 1976. Credit: Wikipedia and using Visual Paradigm Online for the puzzle effect

Note: whereas analogies may be powerful, one should use them with caution so as not to oversimplify. Physicists refer to this situation as the spherical cow method, where models may oversimplify complex phenomena.

I admit that even with years of experience in applied statistics, at times I still get confused about which method to apply. A large part of my thought process is identifying analogies to known solved problems. Sometimes after making progress in a direction I will realise that my assumptions were wrong and seek a new direction. I used to quip with colleagues that they shouldn’t trust me before my third attempt …

(6) Simulations are powerful but not always necessary

It’s interesting to learn that Paul Erdős and other mathematicians were convinced only after seeing simulations of the problem.
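For readers curious what such a simulation might look like, here is a minimal sketch. It is not the simulation shown to Erdős; the function names `play` and `win_rate`, the trial count and the seed are all illustrative choices, and the host is assumed to always leave exactly one other door closed without ever revealing the prize.

```python
import random


def play(switch: bool, rng: random.Random, n_doors: int = 3) -> bool:
    """Play one game and return True if the trader wins the prize."""
    doors = range(n_doors)
    prize = rng.choice(doors)
    chosen = rng.choice(doors)
    # The host leaves exactly one other door closed: the prize door if the
    # trader missed it, otherwise a randomly picked goat door.
    if chosen == prize:
        left_closed = rng.choice([d for d in doors if d != chosen])
    else:
        left_closed = prize
    final = left_closed if switch else chosen
    return final == prize


def win_rate(switch: bool, trials: int = 20_000, seed: int = 0) -> float:
    """Fraction of games won over `trials` games with a fixed random seed."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print(f"stick:  {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

With enough trials the two rates should settle near ⅓ and ⅔; passing a larger `n_doors` to `play` gives the large-N variant.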
I am in two minds about the use of simulations when it comes to problem solving.

On the one hand simulations are powerful tools to analyse complex and intractable problems, especially with real-life data where one wants a grasp not only of the underlying formulation, but also of the stochasticity.

And here is the big BUT: if a problem can be solved analytically, like the Monty Hall one, simulations, as fun as they may be (such as the MythBusters have done⁶), may not be necessary.

According to Occam’s razor, all that is required is a brief intuition to explain the phenomena. This is what I attempted to do here by applying common sense and some basic probability reasoning. For those who enjoy deep dives I provide below supplementary sections with two methods for analytical solutions, one using Bayesian statistics and another using Causality.

[Update] After publishing the first version of this article there was a comment that Savant’s solution³ may be simpler than those presented here. I revisited her communications and agreed that it should be added. In the process I realised three more lessons may be learnt.

(7) A well designed visual goes a long way

Continuing the principle of Occam’s razor, Savant explained³ quite convincingly in my opinion:

You should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

Hence she provided an abstract visual for the readers. I attempted to do the same with the 100 doors figures.

Marilyn vos Savant who popularised the Monty Hall Problem. Credit: Ben David on Flickr under license

As mentioned, many readers, especially those with backgrounds in maths and statistics, still weren’t convinced.
She revised³ with another mental image:

The benefits of switching are readily proven by playing through the six games that exhaust all the possibilities. For the first three games, you choose #1 and “switch” each time, for the second three games, you choose #1 and “stay” each time, and the host always opens a loser. Here are the results.

She added a table with all the scenarios. I took some artistic liberty and created the following figure. As indicated, the top batch are the scenarios in which the trader switches and the bottom those in which they stay. Lines in green are games which the trader wins, and in red when they get zonked. The marker symbolises the door chosen by the trader, and Monty Hall then chooses a different door that has a goat behind it.

Adaptation of Savant’s table³ of six scenarios that shows the solution to the Monty Hall Problem

We clearly see from this diagram that the switcher has a ⅔ chance of winning and those that stay only ⅓.

This is yet another elegant visualisation that clearly explains the non-intuitive.

It strengthens the claim that there is no real need for simulations in this case because all they would be doing is rerunning these six scenarios.

One more popular solution is decision tree illustrations. You can find these on the Wikipedia page, but I find them a bit redundant next to Savant’s table.

The fact that we can solve this problem in so many ways yields another lesson:

(8) There are many ways to skin a … problem

One of the many lessons that I have learnt from the writings of the late Richard Feynman, one of the best communicators of physics and ideas, is that a problem can be solved in many ways. Mathematicians and Physicists do this all the time.
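In that spirit, Savant's six games can also be written out as an exhaustive enumeration rather than a table. This is a minimal sketch assuming, as in her description, that the trader always starts with door #1 and the host always opens a losing door; the names `DOORS`, `CHOSEN` and `outcome` are illustrative.

```python
from fractions import Fraction

DOORS = (1, 2, 3)
CHOSEN = 1  # as in Savant's table, the trader always starts with door #1


def outcome(prize: int, switch: bool) -> bool:
    """Return True if the trader wins, given the prize door and the strategy."""
    # The host opens a goat door that is neither the trader's nor the prize door.
    # When two such doors exist, the choice does not change who wins.
    openable = [d for d in DOORS if d != CHOSEN and d != prize]
    opened = openable[0]
    if switch:
        final = next(d for d in DOORS if d not in (CHOSEN, opened))
    else:
        final = CHOSEN
    return final == prize


for switch in (True, False):
    wins = sum(outcome(prize, switch) for prize in DOORS)
    label = "switch" if switch else "stay"
    print(f"{label}: wins {wins} of {len(DOORS)} games -> {Fraction(wins, len(DOORS))}")
```

Because the three prize positions are equally likely, counting wins over the three games per strategy gives the winning probability directly.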
A relevant quote that paraphrases Occam’s razor:

If you can’t explain it simply, you don’t understand it well enough (attributed to Albert Einstein)

And finally

(9) Embrace ignorance and be humble

“You are utterly incorrect … How many irate mathematicians are needed to get you to change your mind?” (Ph.D. from Georgetown University)

“May I suggest that you obtain and refer to a standard textbook on probability before you try to answer a question of this type again?” (Ph.D. from University of Florida)

“You’re in error, but Albert Einstein earned a dearer place in the hearts of people after he admitted his errors.” (Ph.D. from University of Michigan)

Ouch! These are some of the responses from mathematicians to the Parade article.

Such unnecessary viciousness.

You can check the reference³ to see the writers’ names and others like them. To whet your appetite: “You blew it, and you blew it big!”, “You made a mistake, but look at the positive side. If all those Ph.D.’s were wrong, the country would be in some very serious trouble.”, “I am in shock that after being corrected by at least three mathematicians, you still do not see your mistake.”

And, as expected from the 1990s, perhaps the most embarrassing one was from a resident of Oregon:

“Maybe women look at math problems differently than men.”

These make me cringe and be embarrassed to be associated by gender and Ph.D. title with these graduates and professors.

Hopefully in the 2020s most people are more humble about their ignorance. Yuval Noah Harari discusses the fact that the Scientific Revolution of Galileo Galilei et al. was not due to knowledge but rather admittance of ignorance.

“The great discovery that launched the Scientific Revolution was the discovery that humans do not know the answers to their most important questions” (Yuval Noah Harari)

Fortunately for mathematicians’ image, there were also quite a lot of more enlightened comments. I like this one from one Seth Kalson, Ph.D. of MIT:

You are indeed correct. My colleagues at work had a ball with this problem, and I dare say that most of them, including me at first, thought you were wrong!

We’ll summarise by examining how, and if, the Monty Hall problem may be applied in real-world settings, so you can try to relate it to projects that you are working on.

Application in Real World Settings

Researching for this article I found that beyond artificial setups for entertainment⁶ ⁷ there aren’t practical settings for this problem to use as an analogy. Of course, I may be wrong⁸ and would be glad to hear if you know of one.

One way of assessing the viability of an analogy is using arguments from causality, which provides vocabulary that cannot be expressed with standard statistics.

In a previous post I discussed the fact that the story behind the data is as important as the data itself. In particular Causal Graph Models visualise the story behind the data, which we will use as a framework for a reasonable analogy.

For the Monty Hall problem we can build a Causal Graph Model like this:

Reading:

The door chosen by the trader is independent of the door with the prize, and vice versa. As important, there is no common cause between them that might generate a spurious correlation.
The host’s choice depends on both the trader’s choice and the prize door.

By comparing causal graphs of two systems one can get a sense of how analogous both are. A perfect analogy would require more details, but this is beyond the scope of this article. Briefly, one would want to ensure similar functions between the parameters (referred to as the Structural Causal Model; for details see the supplementary section below called The Causal Point of View).

Those interested in learning further details about using Causal Graph Models to assess causality in real-world problems may be interested in this article.
Anecdotally it is also worth mentioning that on Let’s Make a Deal, Monty himself admitted years later to playing mind games with the contestants and did not always follow the rules, e.g., not always doing the intervention as “it all depends on his mood”⁴.

In our setup we assumed perfect conditions, i.e., a host that does not stray from the script and/or play on the trader’s emotions. Taking this into consideration would require updating the Graphical Model above, which is beyond the scope of this article.

Some might be disheartened to realise at this stage of the post that there might not be real-world applications for this problem.

I argue that the lessons learnt from the Monty Hall problem definitely do have real-world applications.

Just to summarise them again:

(1) Assessing probabilities can be counter-intuitive … (Be comfortable with shifting to deep thought)
(2) … especially when dealing with ambiguity (Search for clarity)
(3) With new information we should update our beliefs
(4) Be one with subjectivity
(5) When confused, look for a useful analogy … but tread with caution
(6) Simulations are powerful but not always necessary
(7) A well designed visual goes a long way
(8) There are many ways to skin a … problem
(9) Embrace ignorance and be humble

While the Monty Hall Problem might seem like a simple puzzle, it offers valuable insights into decision-making, particularly for data scientists. The problem highlights the importance of going beyond intuition and embracing a more analytical, data-driven approach. By understanding the principles of Bayesian thinking and updating our beliefs based on new information, we can make more informed decisions in many aspects of our lives, including data science. The Monty Hall Problem serves as a reminder that even seemingly straightforward scenarios can contain hidden complexities and that by carefully examining available information, we can uncover hidden truths and make better decisions.
At the bottom of the article I provide a list of resources that I found useful to learn about this topic.

Credit: Wikipedia

Credits

Unless otherwise noted, all images were created by the author.

Many thanks to Jim Parr, Will Reynolds, and Betty Kazin for their useful comments.

In the following supplementary sections I derive solutions to the Monty Hall problem from two perspectives:

Bayesian
Causal

Both are motivated by questions in the textbook Causal Inference in Statistics: A Primer by Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell (2016).

Supplement 1: The Bayesian Point of View

This section assumes a basic understanding of Bayes’ Theorem, in particular being comfortable with conditional probabilities. In other words, if this makes sense:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

We set out to use Bayes’ theorem to prove that switching doors improves chances in the N=3 Monty Hall Problem. (Problem 1.3.3 of the Primer textbook.)

We define

X: the chosen door
Y: the door with the prize
Z: the door opened by the host

Labelling the doors as A, B and C, without loss of generality, we need to solve for:

$$P(Y=B \mid X=A, Z=C) > P(Y=A \mid X=A, Z=C)$$

Using Bayes’ theorem we equate the left side as

$$P(Y=B \mid X=A, Z=C) = \frac{P(X=A, Z=C \mid Y=B)\,P(Y=B)}{P(X=A, Z=C)}$$

and the right one as:

$$P(Y=A \mid X=A, Z=C) = \frac{P(X=A, Z=C \mid Y=A)\,P(Y=A)}{P(X=A, Z=C)}$$

Most components are equal (remember that P(Y=A)=P(Y=B)=⅓), so we are left to prove:

$$P(X=A, Z=C \mid Y=B) > P(X=A, Z=C \mid Y=A)$$

The trader’s choice X is independent of the prize door Y, so both sides share the same factor P(X=A)=⅓ and only the host’s behaviour differs.

In the case where Y=B (the prize is behind door B), the host has only one choice (can only select door C), making P(Z=C|X=A, Y=B)=1 and hence P(X=A, Z=C|Y=B)=⅓·1.

In the case where Y=A (the prize is behind door A), the host has two choices (doors B and C), making P(Z=C|X=A, Y=A)=1/2 and hence P(X=A, Z=C|Y=A)=⅓·½.

From here:

$$P(Y=B \mid X=A, Z=C) = \tfrac{2}{3} > P(Y=A \mid X=A, Z=C) = \tfrac{1}{3}$$

Quod erat demonstrandum.

Note: if the “host choices” arguments didn’t make sense, look at the table below showing this explicitly. You will want to compare entries {X=A, Y=B, Z=C} and {X=A, Y=A, Z=C}.

Supplement 2: The Causal Point of View

For this section a basic understanding of Directed Acyclic Graphs (DAGs) and Structural Causal Models (SCMs) is useful, but not required.
In brief:

DAGs qualitatively visualise the causal relationships between the parameter nodes.
SCMs quantitatively express the formula relationships between the parameters.

Given the DAG we are going to define the SCM that corresponds to the classic N=3 Monty Hall problem and use it to describe the joint distribution of all variables. We later will generically expand to N. (Inspired by problem 1.5.4 of the Primer textbook as well as its brief mention of the N door problem.)

We define

X: the chosen door
Y: the door with the prize
Z: the door opened by the host

From the DAG we see that, according to the chain rule:

$$P(X, Y, Z) = P(Z \mid X, Y)\,P(X)\,P(Y)$$

The SCM is defined by exogenous variables U, endogenous variables V, and the functions between them F:

U = {X, Y}, V = {Z}, F = {f(Z)}

where X, Y and Z have door values:

D = {A, B, C}

The host choice is f(Z), defined as:

$$P(Z=z \mid X=x, Y=y) = \begin{cases} 0 & \text{if } z = x \text{ or } z = y \\ 1/2 & \text{if } x = y \text{ and } z \neq x \\ 1 & \text{if } x \neq y \text{ and } z \notin \{x, y\} \end{cases}$$

In order to generalise to N doors, the DAG remains the same, but the SCM requires updating D to be a set of N doors Dᵢ: {D₁, D₂, … Dₙ}.

Exploring Example Scenarios

To gain an intuition for this SCM, let’s examine 6 examples of 27 (=3³):

When X=Y (i.e., the prize is behind the chosen door)

P(Z=A|X=A, Y=A) = 0; the host cannot choose the participant’s door
P(Z=B|X=A, Y=A) = 1/2; the prize is behind A → the host chooses B at 50%
P(Z=C|X=A, Y=A) = 1/2; the prize is behind A → the host chooses C at 50% (complementary to the above)

When X≠Y (i.e., the prize is not behind the chosen door)

P(Z=A|X=A, Y=B) = 0; the host cannot choose the participant’s door
P(Z=B|X=A, Y=B) = 0; the host cannot choose the prize door
P(Z=C|X=A, Y=B) = 1; the host has no choice in the matter (complementary to the above)

Calculating Joint Probabilities

Using logic, let’s code up all 27 possibilities in Python:

    import pandas as pd

    # All 27 (=3³) combinations of chosen door X, prize door Y and host door Z.
    df = pd.DataFrame({
        "X": (["A"] * 9) + (["B"] * 9) + (["C"] * 9),
        "Y": ((["A"] * 3) + (["B"] * 3) + (["C"] * 3)) * 3,
        "Z": ["A", "B", "C"] * 9,
    })

    p_x = 1. / 3  # P(X): the trader picks uniformly at random
    p_y = 1. / 3  # P(Y): the prize is placed uniformly at random

    # Default 0 covers the host opening the trader's door or the prize door.
    df["P(Z|X,Y)"] = 0.0
    # Trader picked the prize door: the host opens either goat door at 50%.
    df.loc[(df["X"] == df["Y"]) & (df["Z"] != df["X"]), "P(Z|X,Y)"] = 0.5
    # Trader picked a goat: the host must open the one remaining goat door.
    df.loc[(df["X"] != df["Y"]) & (df["Z"] != df["X"]) & (df["Z"] != df["Y"]), "P(Z|X,Y)"] = 1.0

    df["P(X, Y, Z)"] = df["P(Z|X,Y)"] * p_x * p_y
    print(f"Testing normalisation of P(X,Y,Z) {df['P(X, Y, Z)'].sum()}")
    df

The resulting table lists P(Z|X,Y) and P(X, Y, Z) for all 27 combinations.

Resources

This Quora discussion by Joshua Engel helped me shape a few aspects of this article.
Causal Inference in Statistics: A Primer / Pearl, Glymour & Jewell (2016), an excellent short textbook (site)
I also very much enjoy Tim Harford’s podcast Cautionary Tales. He wrote about this topic on November 3rd 2017 for the Financial Times: Monty Hall and the game show stick-or-switch conundrum

Footnotes

¹ Vazsonyi, Andrew (December 1998 – January 1999). “Which Door Has the Cadillac?” (PDF). Decision Line: 17–19. Archived from the original (PDF) on 13 April 2014. Retrieved 16 October 2012.
² Steve Selvin, letters to The American Statistician, 1975.
³ Game Show Problem by Marilyn vos Savant’s “Ask Marilyn” in marilynvossavant.com (web archive): “This material in this article was originally published in PARADE magazine in 1990 and 1991”
⁴ Tierney, John (21 July 1991). “Behind Monty Hall’s Doors: Puzzle, Debate and Answer?”. The New York Times. Retrieved 18 January 2008.
⁵ Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
⁶ MythBusters Episode 177 “Pick a Door” (Wikipedia)
⁷ Monty Hall Problem on Survivor Season 41 (LinkedIn, YouTube)
⁸ Jingyi Jessica Li (2024) How the Monty Hall problem is similar to the false discovery rate in high-throughput data analysis. Whereas the author points out “similarities” between hypothesis testing and the Monty Hall problem, I think that this is a bit misleading. The author is correct that both problems change with the order in which processes are done, but that is part of Bayesian statistics in general, not limited to the Monty Hall problem.
The post 🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem appeared first on Towards Data Science.
