UXDESIGN.CC
Creating quantitative personas using latent class analysis
How the person-oriented approach facilitates the creation of statistical personas.

Photo by Craig Whitehead on Unsplash

Have you ever wondered if there's a better way to understand your users beyond simple survey metrics such as averages and medians? In my previous article, I discussed the person-oriented approach and compared it to the variable-oriented approach: the person-oriented approach sees users as whole, unique entities, while the variable-oriented approach breaks users into parts and misses the big picture. Here, I will explain how much of a difference the person-oriented approach can make in your analyses as a UX researcher and how it can help you create data-driven personas. To do this, I will use artificial data generated by ChatGPT-4o from a survey I created.

To better understand the user base and create quantitative personas, a UX researcher usually conducts surveys aimed at gathering insights into users' behavior, habits, and past experiences. In this article, we assume the role of a UX researcher at a startup developing a product related to books, and we are interested in users' reading habits. To find out more about these habits, we have run a survey containing the following questions:

Hypothetical survey questions. (Diagram created by the author)

Imagine you have gathered 1,000 responses to this short survey (the following visualizations are based on the data generated via ChatGPT-4o; you can find the dataset here). How would you report the results?

Taking the routine, variable-oriented approach, you would answer these questions like this:

Users' Preferred Reading Medium (Graph created by the author)
Users' Frequent Reading Conditions (Graph created by the author)
Users' Reading Frequency (Graph created by the author)

While such data are informative, they often lack meaningful connections between different responses.
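To make the contrast concrete, the variable-oriented report above amounts to tallying each question on its own. The sketch below is a minimal illustration, assuming a handful of hypothetical respondents (not the article's dataset), a simplified single-choice version of the medium question, and a hypothetical helper named `frequency_table`:

```python
from collections import Counter

# Hypothetical respondents (not the article's dataset). The medium question is
# simplified to a single choice here; the article's version is four yes/no items.
responses = [
    {"medium": "physical books", "condition": "in the morning", "frequency": "daily"},
    {"medium": "audiobooks", "condition": "while commuting", "frequency": "a few times a month"},
    {"medium": "audiobooks", "condition": "in the morning", "frequency": "daily"},
    {"medium": "e-books", "condition": "before bed", "frequency": "a few times a week"},
]


def frequency_table(rows, question):
    """Share of respondents giving each answer to one question, analysed in isolation."""
    counts = Counter(row[question] for row in rows)
    n = len(rows)
    return {answer: count / n for answer, count in counts.most_common()}


for question in ("medium", "condition", "frequency"):
    print(question, frequency_table(responses, question))
```

Each table is computed from a single column, so nothing in the output records that the two audiobook listeners (the second and third rows) differ in when and how often they listen.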
Knowing how many users prefer audiobooks, for example, does not reveal how this preference relates to other aspects, such as reading frequency or context. In the variable-oriented approach, these relationships are analyzed separately using correlations. For example, you may find that individuals who read in the mornings are more likely to prefer audiobooks. Although these correlations can be insightful, they do not add up to a comprehensive picture of who your users are.

To address this limitation, you may include demographic questions in your survey, such as age, gender, education level, or income, to provide more context and depth. However, demographics alone are insufficient to understand users' mindsets or predict their future behaviors.

To truly understand who the users are, we need a deeper analysis. This is where the person-oriented approach becomes invaluable.

The use of person-oriented analysis to achieve deeper user insights

In the person-oriented approach, rather than analyzing each survey question independently, the goal is to understand participants as a whole. This involves identifying clusters of users with similar behaviors. To accomplish this, you can employ Latent Class Analysis (LCA).

The meaning of "latent" in latent class analysis

Before exploring how the person-oriented approach uses Latent Class Analysis (LCA), it's essential to understand the term "latent." In this context, latent refers to something that exists but is not immediately visible or directly measurable. LCA identifies these hidden variables: underlying patterns or traits that lie beneath the observable data, such as responses to survey questions.
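In the standard formulation of LCA, the hidden variable is a categorical class membership X_i for respondent i. Writing π_c for the share of respondents in class c and ρ_{j,k|c} for the probability that a member of class c gives answer k to question j (this notation is ours, not the article's), the probability of a full response pattern y, and the posterior probability used to assign a respondent to a class, are:

```latex
P(\mathbf{Y}_i = \mathbf{y}) = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \rho_{j, y_j \mid c}

P(X_i = c \mid \mathbf{Y}_i = \mathbf{y}) = \frac{\pi_c \prod_{j=1}^{J} \rho_{j, y_j \mid c}}{\sum_{c'=1}^{C} \pi_{c'} \prod_{j=1}^{J} \rho_{j, y_j \mid c'}}
```

The product over questions encodes the local independence assumption: once class membership is known, a respondent's answers to different questions are treated as independent, so any association between answers in the full sample is attributed to the classes.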
LCA thus allows researchers to uncover and interpret the unseen factors that shape observable behaviors, classifying users based on these deeper, often unmeasured, characteristics.

The person-oriented approach builds on this foundation by enhancing your analysis in three key ways:

Discovering your participant groups: helping you identify distinct groups within your user base.
Revealing the unobservables: uncovering hidden patterns that typical survey metrics miss.
Adding dimensionality to the data: enabling a richer, more nuanced view of users' behaviors and motivations.

In the following sections, we will explore each of these aspects in depth and illustrate how they come to life in our example research project.

1. Discovering your participant groups

Participants who would select audiobooks in response to our survey question. (Illustrations generated by DALL·E and arranged by the author in the diagram)

In this illustration, we observe three distinct participants who have chosen audiobooks as their preferred reading medium. While all of them share this preference, their behaviors and preferences differ significantly. These differences become clear when we analyze their responses across all survey questions. For example, a participant who listens to audiobooks only a few times a month during their commute contrasts sharply with someone who listens daily every morning.

Rather than analyzing each survey question separately, this approach examines each participant's entire set of responses across all questions. For the commuting listener above, that set is the combination of audiobooks, commuting, and listening a few times a month. These complete sets of responses are then classified using Latent Class Analysis (LCA), allowing us to group participants based on shared characteristics.

By applying LCA to our mock dataset, we identified two distinct participant groups, known as latent classes:

Group 1: The steady scholars
The first group identified through Latent Class Analysis.
(Illustrations generated by DALL·E and arranged by the author in the diagram)

Group 2: The spontaneous explorers
The second group identified through Latent Class Analysis. (Illustrations generated by DALL·E and arranged by the author in the diagram)

In the charts above, you see the probabilities with which each group of users answered our questions. As shown, these two groups provided notably different responses. This insight allows us to take the next step: identifying the underlying variables or characteristics that distinguish these groups from one another.

2. Revealing the unobservables

Latent Class Analysis (LCA) aims to infer unobservable variables from observable ones. In this study, the observable variables are:

Conditions in which users read books
Users' reading mediums
Users' reading frequency

LCA enables us to go beyond these surface-level variables, linking them together to create a more comprehensive picture that adds depth to our understanding of user behavior.

To identify the unobservable variables and interpret these groups, we need to examine the patterns in their responses. The Steady Scholars (group 1), for example, show a strong preference for physical books, a more conservative choice. They also tend to read daily, suggesting a propensity for maintaining routines. This group's second-most likely choice is reading a few times a week, and their selected reading times are regular and rhythmic, indicating a set routine. Overall, these patterns imply that the Steady Scholars may be conscientious, routine-oriented, and possibly more conservative in their habits.

In contrast, the Spontaneous Explorers (group 2) lean toward more modern reading mediums, unlike the more traditional preferences seen in the Steady Scholars. They also show little regularity in reading frequency and reading conditions, suggesting a preference for novelty and spontaneity.
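The article does not say which software produced these class profiles. As a minimal sketch of how such probabilities can be estimated, the code below fits a latent class model with the expectation-maximization (EM) algorithm, a standard way to estimate LCA. It assumes each question is coded as a single-choice answer index, fixes the number of classes in advance (two, as in the article), and uses hypothetical names (`fit_lca`, `class_posterior`, `SMOOTH`); a real analysis would typically rely on an established LCA package and compare several random starts.

```python
import random

SMOOTH = 1e-6  # tiny pseudo-count so no estimated probability becomes exactly zero


def class_posterior(row, weights, probs):
    """Bayes' rule: P(class | answers), assuming local independence within a class."""
    joint = []
    for c, weight in enumerate(weights):
        likelihood = weight
        for j, answer in enumerate(row):
            likelihood *= probs[c][j][answer]
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]


def fit_lca(data, n_categories, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to categorical survey answers with EM.

    data: list of tuples, one answer index per question for each respondent.
    n_categories: number of answer options for each question.
    Returns (weights, probs): weights[c] is the estimated share of class c and
    probs[c][j][k] is P(answer k to question j | class c).
    """
    rng = random.Random(seed)
    weights = [1.0 / n_classes] * n_classes
    # Random, strictly positive starting values break the symmetry between classes.
    probs = []
    for _ in range(n_classes):
        per_question = []
        for k in n_categories:
            raw = [rng.random() + 0.5 for _ in range(k)]
            total = sum(raw)
            per_question.append([value / total for value in raw])
        probs.append(per_question)

    for _ in range(n_iter):
        # E-step: each respondent's posterior probability of belonging to each class.
        posteriors = [class_posterior(row, weights, probs) for row in data]
        # M-step: re-estimate class shares and item-response probabilities,
        # weighting every respondent by their posterior class membership.
        for c in range(n_classes):
            weights[c] = sum(p[c] for p in posteriors) / len(data)
            for j, k in enumerate(n_categories):
                counts = [SMOOTH] * k
                for row, p in zip(data, posteriors):
                    counts[row[j]] += p[c]
                total = sum(counts)
                probs[c][j] = [count / total for count in counts]
    return weights, probs


# Hypothetical, perfectly separated answer patterns (not the article's data):
# 60 respondents answer (0, 0, 0) and 40 answer (2, 1, 3) to three questions.
data = [(0, 0, 0)] * 60 + [(2, 1, 3)] * 40
weights, probs = fit_lca(data, n_categories=[3, 3, 4], n_classes=2)
print([round(w, 3) for w in weights])
```

The E-step applies Bayes' rule to every respondent; the M-step turns those soft memberships into weighted answer counts per class, which become the item-response probabilities that charts like the ones above display.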
The Spontaneous Explorers' pattern implies a group of individuals who may be more novelty-seeking and less likely to adhere to strict routines, showing a lower level of conscientiousness than the Steady Scholars.

In summary, these interpretations reveal two key factors differentiating the groups: openness to new experiences and conscientiousness. These two factors, which we might call our latent variables, represent the deeper traits underlying the observed behaviors. Interpreting these latent variables, however, requires a strong understanding of psychological theories of personality to draw meaningful conclusions.

The process of deducing unobservable and latent variables from observable data for the Steady Scholars (group 1). (Illustrations generated by DALL·E and arranged by the author in the diagram)

The process of deducing unobservable and latent variables from observable data for the Spontaneous Explorers (group 2). (Illustrations generated by DALL·E and arranged by the author in the diagram)

Taking a step back, the flow from interpreting responses, to defining classes, to identifying the latent variables is shown below:

Diagram illustrating the process of discovering latent variables. (Diagram created by the author)

Finding the unobservable variables

Identifying unobservable variables requires a solid theoretical foundation, often found in personality psychology. Because these participant groups are likely to differ qualitatively, their traits are assumed to be rooted in stable personality characteristics rather than temporary states. Personality psychology provides scientifically grounded theories to guide this analysis, focusing on enduring traits that can distinguish between groups.

Once you have identified potential personality traits that correspond to the latent classes you've found, you can generate various hypotheses to explore further.
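The interpretation flow described above starts from each class's most probable answers. The sketch below extracts that summary using hypothetical item-response probabilities for the two classes (illustrative numbers, not the article's estimates) and a hypothetical `modal_profile` helper. It keeps the probability next to each answer, because a modal answer shared by only half of a class (like the hypothetical explorers' reading frequency here) signals less regularity than one shared by most of it.

```python
# Hypothetical answer options and item-response probabilities (illustrative only).
OPTIONS = [
    ["physical books", "e-books", "audiobooks"],
    ["in the morning", "before bed", "while commuting"],
    ["daily", "a few times a week", "a few times a month", "never"],
]
ITEM_PROBS = {
    "steady scholars": [[0.7, 0.2, 0.1], [0.7, 0.25, 0.05], [0.6, 0.3, 0.08, 0.02]],
    "spontaneous explorers": [[0.2, 0.35, 0.45], [0.2, 0.2, 0.6], [0.1, 0.2, 0.5, 0.2]],
}


def modal_profile(item_probs, options):
    """For each question, return the class's most probable answer and its probability."""
    profile = []
    for probs, labels in zip(item_probs, options):
        best = max(range(len(probs)), key=probs.__getitem__)
        profile.append((labels[best], probs[best]))
    return profile


for name, item_probs in ITEM_PROBS.items():
    print(name, modal_profile(item_probs, OPTIONS))
```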
In our example, we inferred that openness and conscientiousness might underlie the observed behaviors in each group. With these assumptions, we can hypothesize additional characteristics and behaviors that may be associated with each group:

Additional traits of users with high openness to experience:
Early adoption of new features or services
Higher frequency of browsing
Tendency to explore a variety of genres

Additional traits of users with high conscientiousness:
Greater loyalty to the platform
More frequent usage
Higher likelihood of engaging with triggered notifications

It is important to recognize, however, that not all behavioral differences stem from personality traits; environmental and social contexts can also shape user behaviors and should be considered in the analysis.

3. Adding dimensionality to the data

Each dataset we work with can be thought of as having a dimensionality, especially when visualized. Consider binary data, for example, where responses to yes/no questions could be represented by a single dot that appears when the answer is yes and doesn't appear for no. This type of data is essentially 0-dimensional, as it contains only presence or absence. Let's revisit one of the variable-oriented results displayed earlier:

Users' Preferred Reading Medium (Graph created by the author)

The data from this question can be viewed as 0-dimensional. It is composed of four binary questions (e.g., "Do you usually read physical books?"), and each participant's response can be represented by a single dot. With a sample of 1,000 responses, we have a set of 0-dimensional data points, each dot representing a participant's answer to these yes/no questions.

Illustration of user responses represented in a 0-dimensional space.
(Graph created by the author)

In this figure, each dot represents a participant's response in a 0-dimensional space.

By contrast, ordinal data, such as responses on a Likert scale ranging from "very bad" to "very good," have a 1-dimensional nature because they map along a single line between two extremes. For instance, in our survey question "How frequently do you read books?", responses form a 1-dimensional dataset, representing a continuum from "Daily" to "Never."

Users' Reading Frequency (Graph created by the author)
Mapping users' reading frequency onto a line in a 1-dimensional space. (Graph created by the author)

These examples capture the dimensions typically used in the variable-oriented approach. In the person-oriented approach, however, the number of dimensions may increase with the number of survey questions, as each question's response is viewed as an axis.

In our 3-question survey example, for instance, the person-oriented approach sees a participant's responses as coordinates in a 3-dimensional space, where each axis represents one survey question.

A 3D space illustrating how survey questions contribute to the dimensionality of data in the person-oriented approach. (Axes derived from the colourbox)

In this view, the data can span as many dimensions as there are survey questions. But the story doesn't end here. When adopting the person-oriented approach, we assume that latent or hidden variables influence participants' responses. Latent Class Analysis enables us to identify and interpret these underlying variables, placing participants in a space defined by the latent variables discovered.

The space defined by latent variables, where dimensionality increases with the number of detected latent variables. (Graph created by the author)

To deepen our understanding, let's turn back to our example of book readers.
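In code, this person-oriented encoding turns one respondent's full set of answers into a single point with one coordinate per question. A minimal sketch, assuming hypothetical answer options (the medium question is simplified to a single choice) and a hypothetical `to_point` helper, encodes the two audiobook listeners described earlier:

```python
# Hypothetical answer options; the article's exact survey wording may differ.
QUESTIONS = [
    ("medium", ["physical books", "e-books", "audiobooks"]),
    ("condition", ["in the morning", "before bed", "while commuting"]),
    ("frequency", ["daily", "a few times a week", "a few times a month", "never"]),
]


def to_point(answers):
    """Map one respondent's answers to a point with one coordinate per question."""
    return tuple(options.index(answers[name]) for name, options in QUESTIONS)


commuter = {"medium": "audiobooks", "condition": "while commuting",
            "frequency": "a few times a month"}
morning_listener = {"medium": "audiobooks", "condition": "in the morning",
                    "frequency": "daily"}

print(to_point(commuter))          # (2, 2, 2)
print(to_point(morning_listener))  # (2, 0, 0)
```

Both points share the same coordinate on the medium axis and differ on the other two, which is exactly the distinction a per-question tally loses. Only the frequency axis is ordinal here; the medium and condition codes are category labels, so their numeric values carry no order.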
Earlier, we identified three users who had selected audiobooks as their preferred reading medium.

Three distinct respondents who selected audiobooks as their preferred reading medium. (Diagram created by the author)

Their responses can be visualized as coordinates on a 3-dimensional graph, with each dot representing one participant:

Observed variables: In the person-oriented approach, each survey participant is represented as a dot in an x-dimensional space, where x corresponds to the number of survey questions. (Graph created by the author)

In the person-oriented approach, our participants are initially mapped in a 3-dimensional space based on their observed responses, as we had three survey questions (observed variables). However, this is only the starting point. The x-dimensional space formed by observed responses can be refined into a simpler, more insightful space defined by latent (unobservable) variables. In our hypothetical analysis, we identified two such variables: openness to new experiences and conscientiousness, both key personality factors.

In this new, higher-level space, we no longer map individual participants; instead, we map classes or groups of participants identified through LCA. With two identified latent variables, our space becomes 2-dimensional, as illustrated below.

Mapping user groups in an x-dimensional space, where x corresponds to the number of detected latent variables. (Graph created by the author)

This approach offers richer, multidimensional insight into user behaviors, helping us build a more comprehensive understanding of the user base and its unique characteristics.

Why these analyses matter

Gaining a deeper understanding of our users allows us to better predict their behavior when introducing new features, even when we are unsure how they might interact with them. As UX researchers, we typically avoid asking future-oriented questions, as such questions often fail to accurately reflect what users will do in the future.
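Once classes have been estimated, a new respondent can be assigned to the persona their answers fit best by applying Bayes' rule with the fitted class shares and item-response probabilities; the predicted class then carries the hypotheses attached to that persona. A minimal sketch, using hypothetical parameters (illustrative numbers, not the article's estimates) and a hypothetical `classify` helper:

```python
# Hypothetical fitted parameters (illustrative only, not the article's estimates).
# Answer codes: medium 0=physical books, 1=e-books, 2=audiobooks;
# condition 0=in the morning, 1=before bed, 2=while commuting;
# frequency 0=daily, 1=a few times a week, 2=a few times a month, 3=never.
CLASS_SHARES = {"steady scholars": 0.5, "spontaneous explorers": 0.5}
ITEM_PROBS = {
    "steady scholars": [[0.7, 0.2, 0.1], [0.7, 0.25, 0.05], [0.6, 0.3, 0.08, 0.02]],
    "spontaneous explorers": [[0.2, 0.35, 0.45], [0.2, 0.2, 0.6], [0.1, 0.2, 0.5, 0.2]],
}


def classify(answers):
    """Posterior probability of each class for one respondent (Bayes' rule)."""
    joint = {}
    for name, share in CLASS_SHARES.items():
        likelihood = share
        # Local independence: multiply the per-question answer probabilities.
        for probs, answer in zip(ITEM_PROBS[name], answers):
            likelihood *= probs[answer]
        joint[name] = likelihood
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


new_user = (0, 0, 0)  # physical books, in the morning, daily
posterior = classify(new_user)
print(max(posterior, key=posterior.get))  # steady scholars
```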
Because future-oriented questions are unreliable, our ability to forecast user behavior directly from survey answers is limited.

However, by leveraging the insights outlined in this article and understanding how users are segmented based on their personality traits, we can better predict their actions, decisions, and emotions when faced with new features or products.

This is not how real-world data usually looks

In real-world datasets, user data seldom falls into such neat categories. Instead, distributions typically follow normal or exponential patterns, with group differences emerging as subtle shifts within these distributions. This makes LCA particularly valuable in real-world applications, where it excels at detecting anomalies and uncovering hidden structures within complex data.

Final thoughts

This exercise highlights just how powerful Latent Class Analysis can be in user research. By combining a structured dataset (even an artificially generated one) with a method that goes beneath surface-level data, we're able to reveal deeper patterns and traits that might otherwise go unnoticed. In a perfect world, real-world data would offer such clear divisions, but part of the value of LCA lies precisely in its ability to navigate and make sense of the messiness inherent in real data. As researchers, our goal isn't just to classify users but to understand the complex motivations and characteristics that drive their behavior. LCA gives us a unique lens for this purpose, pushing our understanding of users beyond broad demographics into the realm of nuanced, psychology-backed insights. This journey with LCA is just the beginning; there's always more to uncover beneath the surface.

Creating quantitative personas using latent class analysis was originally published in UX Collective on Medium, where people are continuing the conversation by highlighting and responding to this story.