-
VENTUREBEAT.COM
$42.1 million poured into startup offering energy-efficient solutions for costly and unwieldy operational data and AI workloads
The funding infusion sharpens a mission to make hyperscale analytics radically cheaper and greener at the very moment enterprises fear ballooning data-center power bills.
-
VENTUREBEAT.COM
BigQuery is 5x bigger than Snowflake and Databricks: What Google is doing to make it even better
Google ramps up the competition in the enterprise data space, claiming its AI innovations help it leapfrog rivals.
-
VENTUREBEAT.COM
This AI startup just raised $7.5m to fix commercial insurance for America's 24m underprotected small businesses
New York AI startup 1Fort secures $7.5M in funding to streamline commercial insurance for small businesses with its broker-focused platform that cuts paperwork from hours to minutes.
-
TOWARDSDATASCIENCE.COM
Explained: How Does L1 Regularization Perform Feature Selection?
Feature selection is the process of selecting an optimal subset of features from a given set of features; an optimal feature subset is the one that maximizes the performance of the model on the given task. Feature selection can be a manual, or rather explicit, process when performed with filter or wrapper methods. In these methods, features are added or removed iteratively based on the value of a fixed measure which quantifies the relevance of the feature in making the prediction. The measure could be information gain, variance, or the chi-squared statistic, and the algorithm decides to accept or reject a feature based on a fixed threshold on that measure. Note that these methods are not part of the model training stage and are performed prior to it.
Embedded methods perform feature selection implicitly, without any pre-defined selection criterion, deriving it from the training data itself. This intrinsic feature selection process is part of the model training stage: the model learns to select features and make relevant predictions at the same time. In later sections, we will describe the role of regularization in performing this intrinsic feature selection.
Regularization and Model Complexity
Regularization is the process of penalizing the complexity of the model to avoid overfitting and achieve generalization over the task. Here, the complexity of the model is analogous to its power to adapt to the patterns in the training data. Assuming a simple polynomial model in 'x' with degree 'd', as we increase the degree 'd' of the polynomial, the model gains greater flexibility to capture patterns in the observed data.
Overfitting and Underfitting
If we try to fit a polynomial model with d = 2 on a set of training samples derived from a cubic polynomial with some noise, the model will not be able to capture the distribution of the samples to a sufficient extent. It simply lacks the flexibility or complexity to model data generated from a degree-3 (or higher-order) polynomial. Such a model is said to underfit the training data.
Working on the same example, assume we now have a model with d = 6. With its increased complexity, it should be easy for the model to estimate the original cubic polynomial that was used to generate the data (for instance, by setting the coefficients of all terms with exponent > 3 to 0). If the training process is not terminated at the right time, however, the model will continue to use its additional flexibility to reduce the training error further and start fitting the noisy samples too. This reduces the training error significantly, but the model now overfits the training data. The noise will be different in real-world settings (or in the test phase), so anything the model learned by fitting it breaks down, leading to high test error.
How to determine the optimal model complexity?
In practical settings, we have little to no understanding of the data-generation process or the true distribution of the data. Finding the optimal model with the right complexity, such that neither underfitting nor overfitting occurs, is a challenge. One technique is to start with a sufficiently powerful model and then reduce its complexity by means of feature selection. The fewer the features, the lower the complexity of the model. As discussed in the previous section, feature selection can be explicit (filter, wrapper methods) or implicit.
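To make the under/overfitting discussion concrete, here is a minimal sketch (not from the original post, assuming only NumPy) that fits polynomials of degree 2, 3 and 6 to noisy cubic data and compares train and test error; the data-generating polynomial, noise level, and sample sizes are illustrative assumptions:

```python
# Minimal sketch: under- vs. overfitting on noisy cubic data.
import numpy as np

rng = np.random.default_rng(0)

true_poly = lambda x: 0.5 * x**3 - 2.0 * x + 1.0   # cubic ground truth (assumed)
x_train = rng.uniform(-3, 3, size=30)
x_test = rng.uniform(-3, 3, size=200)
y_train = true_poly(x_train) + rng.normal(scale=2.0, size=x_train.shape)
y_test = true_poly(x_test) + rng.normal(scale=2.0, size=x_test.shape)

for d in (2, 3, 6):
    coeffs = np.polyfit(x_train, y_train, deg=d)    # least-squares fit of degree d
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {d}: train MSE {mse(x_train, y_train):.2f}, "
          f"test MSE {mse(x_test, y_test):.2f}")

# Expected pattern: degree 2 underfits (high train and test error), while
# degree 6 fits some of the training noise and typically shows a larger
# train/test gap than the degree-3 model.
```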
Redundant features that have insignificant relevance in determining the value of the response variable should be eliminated, so that the model does not learn uncorrelated patterns from them. Regularization also performs a similar task. So how are regularization and feature selection connected in attaining the common goal of optimal model complexity?
L1 Regularization As A Feature Selector
Continuing with our polynomial model, we represent it as a function f, with inputs x, parameters θ and degree d:
(Image by author)
For a polynomial model, each power of the input x_i can be considered a feature, forming a vector of the form:
(Image by author)
We also define an objective function which, on minimizing, leads us to the optimal parameters θ* and which includes a regularization term penalizing the complexity of the model:
(Image by author)
To determine the minima of this function, we need to analyze all of its critical points, i.e. points where the derivative is zero or undefined. The partial derivative w.r.t. one of the parameters, θj, can be written as:
(Image by author)
where the function sgn is defined as:
(Image by author)
Note: the derivative of the absolute-value function is different from the sgn function defined above. The original derivative is undefined at x = 0. We augment the definition to remove the point of non-differentiability at x = 0, so that the derivative is defined across the entire domain. Such augmented functions are also used by ML frameworks when the underlying computation involves the absolute-value function; check this thread on the PyTorch forum.
By computing the partial derivative of the objective function w.r.t. a single parameter θj and setting it to zero, we can build an equation that relates the optimal value of θj to the predictions, targets, and features:
(Image by author)
(Image by author)
Let us examine the equation above. If we assume that the inputs and targets are centered about the mean (i.e. the data has been standardized in the preprocessing step), the term on the LHS effectively represents the covariance between the jth feature and the difference between the predicted and target values. Statistical covariance between two variables quantifies how much one variable varies with the other.
The sign function on the RHS forces the covariance on the LHS to assume only one of three values (as the sign function only returns -1, 0 and 1). If the jth feature is redundant and does not influence the predictions, its covariance will be nearly zero, bringing the corresponding parameter θj* to zero. This results in the feature being eliminated from the model.
Imagine the sign function as a canyon carved by a river. You can walk along the canyon floor (the river bed), but to get out of it you face steep barriers. L1 regularization induces a similar 'thresholding' effect on the gradient of the loss function: the gradient must be strong enough to climb over the barrier, or the parameter eventually settles at zero.
For a more grounded example, consider a dataset containing samples derived from a straight line (parameterized by two coefficients) with some added noise. The optimal model should have no more than two parameters; otherwise it will adapt to the noise present in the data, using the added freedom/power of a higher-degree polynomial. The higher-power features do not help explain the difference between the targets and the model's predictions, so their covariance with that difference stays near zero.
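The equations above appear only as images in the original post. Based on the surrounding description, they plausibly take the following form; the 1/2N scaling and the small threshold ε in the augmented sign function are assumptions rather than details read from the missing figures:

```latex
% Plausible reconstruction of the equations shown as images in the post.
% The polynomial model and its feature vector:
f(x;\theta) = \sum_{j=0}^{d} \theta_j x^j,
\qquad
\mathbf{x} = \begin{bmatrix} 1 & x & x^2 & \cdots & x^d \end{bmatrix}^{\top}

% L1-regularized objective over N training pairs (x_i, y_i):
J(\theta) = \frac{1}{2N}\sum_{i=1}^{N}\bigl(f(x_i;\theta) - y_i\bigr)^2
          + \lambda \sum_{j=0}^{d} \lvert\theta_j\rvert

% Partial derivative w.r.t. a single parameter, with the augmented sign:
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{N}\sum_{i=1}^{N}\bigl(f(x_i;\theta) - y_i\bigr)\,x_i^{\,j}
  + \lambda\,\operatorname{sgn}(\theta_j),
\qquad
\operatorname{sgn}(z) =
\begin{cases}
  1 & z > \epsilon\\
  0 & \lvert z\rvert \le \epsilon\\
 -1 & z < -\epsilon
\end{cases}

% Setting the derivative to zero relates the feature/residual covariance
% to the sign of the optimal parameter:
\frac{1}{N}\sum_{i=1}^{N}\bigl(f(x_i;\theta^{*}) - y_i\bigr)\,x_i^{\,j}
  = -\lambda\,\operatorname{sgn}(\theta_j^{*})
```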
During the training process, the L1 term contributes a constant-magnitude step that is added to or subtracted from the gradient of the loss function. If the gradient of the loss function (MSE) is smaller than this constant step, the parameter will eventually reach a value of 0. Observe the equation below, depicting how parameters are updated with gradient descent:
(Image by author)
(Image by author)
If the blue part above (the contribution of the MSE-loss gradient) is smaller than λα, which is itself a very small number, Δθj is nearly a constant step of magnitude λα. The sign of this step (the red part, the L1 term's contribution) depends on sgn(θj), whose output depends on θj. If θj is positive, i.e. greater than ε, sgn(θj) equals 1, making Δθj approximately equal to -λα, which pushes θj towards zero. To overcome the constant step (red part) that drives the parameter to zero, the gradient of the loss function (blue part) has to be larger than the step size, and for the loss gradient to be large, the feature's value must affect the output of the model significantly. This is how a feature is eliminated; more precisely, its corresponding parameter, whose value does not correlate with the output of the model, is zeroed by L1 regularization during training. (A small numerical sketch of this thresholding behaviour follows the conclusion below.)
Further Reading And Conclusion
To get more insights on the topic, I have posted a question on the r/MachineLearning subreddit, and the resulting thread contains different explanations that you may want to read. Madiyar Aitbayev also has an interesting blog covering the same question, but with a geometrical explanation. Brian Keng's blog explains regularization from a probabilistic perspective. This thread on CrossValidated explains why the L1 norm encourages sparse models. A detailed blog by Mukul Ranjan explains why the L1 norm, and not the L2 norm, encourages parameters to become zero.
"L1 regularization performs feature selection" is a simple statement that most ML learners agree with, without diving deep into how it works internally. This blog is an attempt to bring my understanding and mental model to the readers in order to answer the question in an intuitive manner. For suggestions and doubts, you can find my email at my website. Keep learning and have a nice day ahead!
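As a closing numerical illustration of the thresholding argument above (the sketch referenced before the conclusion), here is plain subgradient descent on an MSE-plus-L1 objective over polynomial features of straight-line data. The step size, λ, noise level, and the small clipping band used to settle oscillating weights at zero are illustrative assumptions, not values from the post:

```python
# Minimal sketch: subgradient descent with an L1 penalty on polynomial
# features of data generated from a straight line.
import numpy as np

rng = np.random.default_rng(1)
N, degree = 200, 5

x = rng.uniform(-1, 1, size=N)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=N)    # straight line + noise

X = np.vander(x, degree + 1, increasing=True)        # columns: 1, x, ..., x^5
theta = rng.normal(scale=0.5, size=degree + 1)
alpha, lam = 0.1, 0.02                               # step size, L1 strength (assumed)

def sgn(z, eps=1e-12):
    # Augmented sign function: returns 0 in a small band around zero.
    return np.where(z > eps, 1.0, np.where(z < -eps, -1.0, 0.0))

for _ in range(20000):
    residual = X @ theta - y                         # predictions minus targets
    grad_mse = X.T @ residual / N                    # "blue part": MSE gradient
    theta -= alpha * (grad_mse + lam * sgn(theta))   # "red part": L1 step
    # Settle weights oscillating within one L1 step of zero at exactly zero.
    theta[np.abs(theta) < alpha * lam] = 0.0

print(np.round(theta, 3))
# Expected (roughly): the intercept and slope survive near 1.0 and 2.0
# (slightly shrunk by the penalty), while the higher-power coefficients are
# driven to, or very near, zero.
```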
-
WWW.GAMESPOT.COM
Lord Of The Rings: Fellowship & Two Towers Steelbook Preorders Are Live At Amazon
The Lord of the Rings: The Fellowship of the Ring 4K Steelbook: $28.80 (was $35) | Releases May 27 | Preorder at Amazon, Walmart, or Gruv
The Lord of the Rings: The Two Towers 4K Steelbook: $28 (was $35) | Releases May 27 | Preorder at Amazon, Walmart, or Gruv
The Lord of the Rings Trilogy Limited Edition Steelbook Collection (4K Blu-ray): $121.65 (was $175) | See at Walmart or Amazon
The Hobbit Trilogy Limited Edition Steelbook Collection (4K Blu-ray): $160 (was $175) | See at Amazon
Limited Edition Steelbooks for Lord of the Rings: The Fellowship of the Ring and The Two Towers are now available to preorder at Amazon ahead of their May 27 release. This is notable because Fellowship was originally printed as a Walmart exclusive in January. With the upcoming reprint, which is set to release alongside The Two Towers, Walmart's exclusivity deal has ended.
That said, Walmart is offering preorder discounts on both, dropping the price of Fellowship of the Ring to $28.80 (was $35) and The Two Towers to $28 (was $35). Amazon currently has both beloved Peter Jackson films listed at full price. It is worth noting that Amazon's prices are often a bit higher when preorders open, before discounts are applied to match Walmart. Both retailers offer preorder price guarantees, and you won't be charged until your preorders ship.
The Lord of the Rings Trilogy Steelbook Collection is also back in stock at Walmart and discounted to only $121.65 (was $175). You can pair the box set edition with a matching collection of Jackson's Hobbit Trilogy for $170.
Continue Reading at GameSpot
-
WWW.GAMESPOT.COM
8BitDo Ultimate 2 Bluetooth Controller Is Discounted One Week After Launch
8BitDo Ultimate 2 Bluetooth Controller (Nintendo Switch + PC): $63 (was $70) with 10% coupon | See at Amazon
8BitDo Ultimate 2 Wireless Controller (PC, Mac, Android): $60 | See at Amazon
8BitDo's new Ultimate 2 Bluetooth Controller for Nintendo Switch and PC is on sale for $63 (was $70) when you click the 10% off coupon box on Amazon's store page. The Ultimate 2 Bluetooth just launched on April 15, so it's pretty cool to see a deal on it already. The Bluetooth version arrived less than two months after the release of the 2.4GHz Wireless model for PC, which you can also save 10% on right now. The Ultimate 2 is 8BitDo's best and most advanced flagship controller yet thanks to a wealth of customization options and premium components.
Notably, the Ultimate 2 should work on the upcoming Nintendo Switch 2 via Bluetooth and possibly with the 2.4GHz dongle, too. So if you plan on getting the Switch 2 when console and accessory preorders open this week, you may want to compare the new official Switch 2 Pro Controller to 8BitDo's Ultimate 2. Nintendo's controller adds two remappable back buttons, but the Ultimate 2 has four remappable inputs, upgraded analog sticks, trigger locks, and a bunch of internal customization features.
8BitDo Ultimate 2 Bluetooth Controller (Switch, PC):
White -- $63 (was $70) | Click coupon box
Black -- $70
8BitDo Ultimate 2 Controller (PC, Android):
Black -- $60 | Save 10%
White -- $60 | Save 10%
Purple -- $60 | Save 10%
The Ultimate 2 is 8BitDo's most feature-rich controller yet. We've tested the PC and Switch editions and view them as meaningful upgrades over their predecessors. The original Ultimate was already our top pick in our best PC controllers list, so the Ultimate 2 will take the top spot from the original in our next update. Meanwhile, the Ultimate 2 Bluetooth will replace the original as GameSpot's pick for the best third-party Switch controller, and we're eager to see how it stacks up with the Switch 2 Pro Controller.
Continue Reading at GameSpot
-
GAMERANT.COM
Best Ray Gun Mark 2 Upgrades in Shattered Veil
The Ray Gun Mark 2 has three distinct upgrade variants in Black Ops 6's Shattered Veil map. Each variant offers players a unique way to shoot the undead, with different ammo types and fire rates, but which of them is best for reaching high rounds?
-
GAMERANT.COM
Destiny 2: The Palindrome God Roll Guide
The Palindrome is an iconic hand cannon in Destiny 2 that returned in a new form with Episode Heresy. The reprised hand cannon is now Arc instead of Void and has a couple of great new perks to try out in both PvE and PvP activities.
-
WWW.POLYGON.COM
Where to pre-order Fallout season 1 on 4K Blu-ray
We still don't know when the second season of the amazing Fallout series will premiere on Prime Video, but if you'd like to revisit the wasteland and its cast of memorable misfits, the first season of Fallout is coming to 4K Blu-ray and is currently available to pre-order from Amazon for $39.99. While there isn't currently a launch date for this physical collector's edition, we'll update this page as soon as we have more information.
Fallout is still available for streaming, but the 4K Blu-ray version lets you watch the series ad-free without paying Prime's $3 premium to ditch the commercial breaks. The three-disc Steelbook features some spectacular cover art and comes packaged with a collection of six lithograph prints of the cast. In addition to the eight-episode run of the series, the Collector's Edition Steelbook also includes a dozen featurettes detailing the immense effort that went into bringing the wasteland to life, ranging from makeup and sound design to how the writers adapted iconic elements from the Fallout franchise.