Part 02 Module 01
towardsai.net
Author(s): Sudeep
Originally published on Towards AI.

Now let's learn about correlation.

Correlation is a statistical measure that indicates the strength and direction of a relationship between two variables.

In a dataset, the values can show:
- Positive Correlation: As one variable increases, the other variable also increases.
- Negative Correlation: As one variable increases, the other variable decreases.
- No Correlation: There is no apparent relationship between the two variables.

Covariance

Covariance is a statistical measure that indicates the degree to which two random variables change together. It helps describe the relationship between the variables and can be:
- Positive: When both variables tend to increase or decrease together.
- Negative: When one variable increases as the other decreases.
- Zero: When there is no linear relationship between the variables.

$$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Where:
- $x_i$ and $y_i$ are the individual data points of variables X and Y.
- $\bar{x}$ and $\bar{y}$ are the mean values of X and Y.
- $n$ is the total number of observations.

Note: If the formula uses $1/m$, where $m = n - 1$, this is referred to as Bessel's correction, which is used to calculate the sample covariance to avoid bias when estimating the population covariance.

Limitations of Covariance

One issue with covariance is that its value is not normalized. It carries the units of x multiplied by the units of y, so multiplying every value of x (or y) by a constant multiplies the covariance by the same constant. A large covariance can therefore reflect a large scale rather than a strong relationship, which makes the number difficult to interpret on its own.

Solution: Normalization

To address this, we normalize the covariance by dividing it by the product of the standard deviations of x and y.
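
The covariance formula and its scale dependence can be checked with a few lines of Python. This is a minimal sketch, not the author's code: the helper name `covariance`, its `sample` flag, and the `hours`/`scores` data are all made up for illustration.

```python
def covariance(xs, ys, sample=False):
    """Covariance of paired observations.

    sample=False divides by n (the formula above);
    sample=True divides by n - 1 (Bessel's correction).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data: hours studied and a score for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

pop_cov = covariance(hours, scores)                  # divides by n
sample_cov = covariance(hours, scores, sample=True)  # divides by n - 1

# Same relationship, but x is now expressed in minutes instead of hours.
minutes = [h * 60 for h in hours]
rescaled_cov = covariance(minutes, scores)

print(pop_cov)       # 4.0
print(sample_cov)    # 5.0
print(rescaled_cov)  # 240.0
```

Changing the unit of x from hours to minutes multiplies the covariance by 60 even though the relationship between the two variables is unchanged, which is the interpretation problem that normalization removes.
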
This normalization gives us the correlation coefficient, which is a dimensionless measure that ranges between -1 and 1:

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}$$

Where:
- $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, respectively.

The correlation coefficient makes it easier to compare relationships across different datasets, regardless of their scales.

This is known as Pearson's correlation, and it is represented as r.

Properties of Pearson's Correlation:
- r = 1: Perfect positive correlation (variables increase together).
- r = -1: Perfect negative correlation (one variable increases as the other decreases).
- r = 0: No linear correlation between the variables.

Still, the problem is not yet solved.

In the above graph, we notice some outliers (the dots in the 4th quadrant). These outliers can significantly influence the calculation of the Pearson correlation coefficient r, leading to misleading results. For instance, the presence of these outliers might cause r to be less than 0, even when the relationship between the majority of the data points suggests otherwise.

Solution:

To handle this issue, we need to introduce rank correlation, which is more robust to outliers. Rank correlation methods, such as Spearman's rank correlation coefficient, rely on the relative ranking of data points rather than their actual values. This makes them less sensitive to extreme values, giving a more faithful picture of the underlying relationship.

Spearman's Rank Correlation Coefficient Equation:

The equation for Spearman's rank correlation coefficient is:

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

Where:
- $\rho$ = Spearman's rank correlation coefficient
- $d_i$ = the difference in ranks for each pair of values
- $n$ = the number of data points

This formula computes the rank-based correlation. Now, notice that Spearman's rank correlation is essentially a Pearson correlation of the ranked values, rather than the raw data. The two agree exactly when there are no tied ranks.
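
To make the effect of an outlier concrete, here is a small sketch that computes both coefficients from the formulas above. The helpers `pearson`, `ranks`, and `spearman` and the data are hypothetical, and `ranks` assumes there are no tied values, so the $d_i$ formula applies directly.

```python
import math


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def ranks(values):
    """Rank 1 for the smallest value. Assumes no ties (illustration only)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up data: seven points on a rising line plus one extreme outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 2, 3, 4, 5, 6, 7, -100]

r = pearson(x, y)                          # dragged below zero by the outlier
rho = spearman(x, y)                       # stays positive
r_on_ranks = pearson(ranks(x), ranks(y))   # matches rho when there are no ties

print(r, rho, r_on_ranks)
```

With this data the single outlier is enough to make Pearson's r negative, while Spearman's coefficient remains positive because the outlier can only move one rank position to the bottom. The last value illustrates that applying Pearson's formula to the ranks reproduces Spearman's coefficient.
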
Working with ranks makes Spearman's coefficient a useful method when the relationship between variables is monotonic but not linear, or when there are outliers in the data.

To know more about how this works in numerical problems, check out this: https://www.geeksforgeeks.org/spearmans-rank-correlation/

Random Experiment

A random experiment is any activity or process that we perform where the result cannot be predicted exactly beforehand. However, we know all the possible outcomes, and each outcome has a chance of happening. Even if we do the same experiment multiple times under the same conditions, the result might be different each time.

Example: Marks of Students in a Class

Imagine you're conducting a random experiment where you ask each student in a class to pick a random number between 1 and 100, representing their mock test score. This is what happens:
- The process (experiment): Asking students to pick a number.
- Uncertain result: You cannot predict what number each student will pick. One student might choose 85, another 60, another 45, etc.
- Possible outcomes: The numbers they can choose are between 1 and 100 (all the possible test scores).
- Chance or probability: Each number has a probability of being chosen. For example, if students pick completely randomly, all numbers have an equal chance.

Even if you repeat the process (experiment) of asking students under the same conditions, the numbers they pick can vary each time.
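
The classroom example can be simulated in a few lines. This is only a sketch under the assumption stated above that every number from 1 to 100 is equally likely; the function name `run_experiment`, the class size of 5, and the seed are arbitrary choices for illustration.

```python
import random


def run_experiment(num_students, rng):
    """One run: each student picks an integer from 1 to 100, all equally likely."""
    return [rng.randint(1, 100) for _ in range(num_students)]


# The seed only makes this sketch repeatable; a real experiment has no seed.
rng = random.Random(42)

first_run = run_experiment(5, rng)
second_run = run_experiment(5, rng)

print(first_run)
print(second_run)
```

Both runs use the same process and the same set of possible outcomes, yet the lists they produce will generally differ.
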
This variability under identical conditions is what makes it a random experiment.

Summary: Understanding Correlation, Covariance, and Random Experiments

In this article, we explored the foundational concepts of correlation, covariance, and random experiments:

Correlation:
- Measures the strength and direction of the relationship between two variables.
- Types:
  - Positive: Variables increase together.
  - Negative: One variable increases as the other decreases.
  - No Correlation: No apparent relationship.

Covariance:
- Describes how two variables change together.
- Types:
  - Positive: Variables increase or decrease together.
  - Negative: One variable increases as the other decreases.
  - Zero: No linear relationship.
- Limitation: Covariance is not normalized, making it difficult to interpret.

Normalization via Pearson's Correlation:
- Dividing covariance by the product of the standard deviations of the variables gives the correlation coefficient (r).
- Properties:
  - r = 1: Perfect positive correlation.
  - r = -1: Perfect negative correlation.
  - r = 0: No linear correlation.

Addressing Outliers with Rank Correlation:
- Outliers can distort Pearson's correlation.
- Solution: Spearman's rank correlation computes correlation based on the relative ranks of data points, making it robust to outliers.

Random Experiments:
- Processes with uncertain outcomes but known possibilities.
- Example: Students picking random numbers as mock test scores.

In the next article, we'll delve deeper into random experiments, exploring the estimation of populations, its types, and more statistical insights. Stay tuned!