• WWW.THEVERGE.COM
    Duolingo is getting a chess course
    Duolingo will soon be able to help you learn another new skill: chess. The platform’s new chess course will use short puzzles, such as moving a knight to the correct spot to practice its L-shaped movement or checkmating a king that’s stuck in a corner, to help teach you the game. You’ll also be able to play “mini-matches” or full games against Duolingo’s Oscar character, who serves as your coach through the lessons. Duolingo gave me a brief demo of the chess course on a recent video call, and it looks like it could be a lot of fun. The team’s approach was to “make chess as accessible as possible,” Duolingo group product manager Edwin Bodge tells The Verge, and the course will offer lessons for a wide range of skill levels so that people can jump in at whatever level of familiarity they already have for chess. Chess joins topics like music and math that aren’t language-based but can still be practiced via Duolingo. I asked if the company plans to build courses around other games, and staff software engineer Sammi Siegel says that “while we don’t have any plans to share right now, I think we’re just seeing how this launch goes and then we’ll go from there.” The chess course is launching in beta testing on iOS in English, according to Monica Earle, Duolingo’s PR director. Earle estimates that in four to six weeks, more people on iOS will be able to try it, and then it will head to Android and be available in other languages down the line.
  • WWW.MARKTECHPOST.COM
    LLMs Can Now Retain High Accuracy at 2-Bit Precision: Researchers from UNC Chapel Hill Introduce TACQ, a Task-Aware Quantization Approach that Preserves Critical Weight Circuits for Compression Without Performance Loss
LLMs show impressive capabilities across numerous applications, yet they face challenges due to computational demands and memory requirements. This challenge is acute in scenarios requiring local deployment for privacy reasons, such as processing sensitive patient records, or in compute-constrained environments like real-time customer service systems and edge devices. Post-training quantization (PTQ) is a promising solution that allows efficient compression of pre-trained models, reducing memory consumption by 2-4 times. However, current methods hit a bottleneck at 4-bit compression, with substantial performance degradation when attempting 2- or 3-bit precision. Most PTQ methods rely on small mini-batches of general-purpose pre-training data to account for activation changes resulting from quantization.

Current methods for LLM compression primarily fall into three categories. Uniform quantization is the most basic approach: weights stored as 16-bit float tensors are compressed by treating each row independently, mapping floats to integers based on the maximum and minimum values within each channel. GPTQ-based quantization techniques advance this concept by focusing on layerwise reconstruction, aiming to minimize reconstruction loss after quantization. Finally, mixed-precision quantization methods offer a more nuanced strategy, moving beyond a fixed precision for all weights. These techniques assign bit-widths based on weight importance to maintain performance, with some approaches preserving high-sensitivity "outlier" weights at higher precision.

Researchers from UNC Chapel Hill have proposed a novel mixed-precision post-training quantization approach called Task-Circuit Quantization (TACQ). The method is similar in spirit to automated circuit discovery: it directly conditions the quantization process on specific weight circuits, defined as sets of weights associated with downstream task performance. TACQ compares unquantized model weights with uniformly quantized ones to estimate the expected weight changes from quantization, then uses gradient information to predict the impact on task performance, enabling preservation of task-specific weights. TACQ consistently outperforms baselines with the same calibration data and lower weight budgets, and achieves significant improvements in the challenging 2-bit and 3-bit regimes.

TACQ is built around a saliency metric that identifies critical weights to preserve during quantization, drawing on concepts from model interpretability such as automatic circuit discovery, knowledge localization, and input attribution. The metric has two components. Quantization-aware Localization (QAL) traces how model performance is affected by estimating the expected weight changes due to quantization. Magnitude-sharpened Gradient (MSG) is a generalized metric for absolute weight importance adapted from input attribution techniques; it helps stabilize TACQ and addresses biases from QAL's estimations. These factors combine into a unified saliency metric that can be efficiently evaluated for every weight in a single backward pass, allowing preservation of the top p% highest-scoring weights at 16-bit precision.

In the challenging 2-bit setting, TACQ outperforms SliM-LLM with absolute margin improvements of 16.0% (from 20.1% to 36.1%) on GSM8k, 14.1% (from 34.8% to 49.2%) on MMLU, and 21.9% (from 0% to 21.9%) on Spider. Other baseline methods like GPTQ, SqueezeLLM, and SPQR deteriorate to near-random performance at this compression level.
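To make the saliency idea concrete, here is a minimal sketch of how a TACQ-style score could be computed in PyTorch. The per-channel quantizer, the multiplicative combination of QAL and MSG, and the preservation fraction are illustrative assumptions, not the paper's exact formulation.

import torch

def uniform_quantize(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    # Per-channel uniform quantization: map each row to a low-bit grid spanning its min/max.
    w_min = w.min(dim=-1, keepdim=True).values
    w_max = w.max(dim=-1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / (2 ** bits - 1)
    return torch.round((w - w_min) / scale) * scale + w_min

def tacq_style_saliency(weight: torch.Tensor, grad: torch.Tensor, bits: int = 2) -> torch.Tensor:
    # QAL: expected weight change from quantization, weighted by the task gradient.
    qal = (weight - uniform_quantize(weight, bits)).abs() * grad.abs()
    # MSG: magnitude-sharpened gradient, an absolute-importance term.
    msg = weight.abs() * grad.abs()
    # Hypothetical fusion of the two terms; the paper defines the actual combination.
    return qal * msg

def preserve_mask(saliency: torch.Tensor, p: float = 0.005) -> torch.Tensor:
    # Keep the top p% highest-scoring weights at 16-bit precision; quantize the rest.
    k = max(1, int(p * saliency.numel()))
    threshold = saliency.flatten().topk(k).values.min()
    return saliency >= threshold

The gradient passed in would come from a single backward pass over task-specific calibration data, matching the description above of evaluating the metric for every weight in one pass.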
At 3-bit precision, TACQ preserves approximately 91%, 96%, and 89% of the unquantized accuracy on GSM8k, MMLU, and Spider, respectively, while outperforming the strongest baseline, SliM-LLM, by 1-2% across most datasets. TACQ's advantages become evident in generation tasks requiring sequential token outputs, where it is the only method capable of recovering non-negligible performance in the 2-bit setting for the Spider text-to-SQL task. In conclusion, researchers introduced TACQ, a significant advancement in task-aware post-training quantization. It improves model performance at ultra-low bit-widths (2- to 3-bits) where previous methods degrade to near-random outputs. TACQ aligns with automatic circuit discovery research by selectively preserving only a small fraction of salient weights at 16-bit precision, indicating that sparse weight "circuits" disproportionately influence specific tasks. Moreover, experiments on Spider show that TACQ better preserves model generation capabilities, making it suitable for program-prediction tasks. This also applies to situations involving agents, where models frequently generate many executable outputs, and where efficiency is a concern.
  • TOWARDSAI.NET
    Building GPT From First Principles: Code and Intuition
Author(s): Akhil Shekkari Originally published on Towards AI.

Figure — 0

The main goal of this blog post is to understand each component inside GPT with intuition and be able to code it in plain PyTorch. Please have a look at the two figures below. Our implementation will heavily follow Figure — 1. I will also be taking lots of ideas and concepts from Figure — 2 (taken from Anthropic's paper at https://transformercircuits.pub/2021/framework/index.html); we will use this figure for our intuition and understanding.

Figure — 1
Figure — 2

For every component, I will first go over the theory required. This is important because we have to understand why that particular component / concept is used. Then I will go over the coding part. Let's look at all the individual components of a Transformer:
1. Residual Stream (also known as skip connections)
2. Embedding Matrix
3. Layer Normalization
4. Positional Encoding
5. Self Attention Mechanism (causal masking)
6. Multi-Layer Perceptron
7. UnEmbedding Matrix

Before looking at the Residual Stream, it is always good to approach concepts with an example in mind. One of the main reasons people give when they say they find it difficult to code is the problem of input and output dimensions. Before and after every transformation, we should know how the vector and its dimensions change. Let the example sentence be "Messi is the greatest of all time". For this example, there are 7 tokens (1 word = 1 token for simplicity). Let us say 1 token can be represented in 50 dimensions; we call this d_model. Batch size is the number of examples we feed to the model at a given point in time. Since we are working with a demo and have only one example, let us use a batch_size of 1. Let us assume the max length of any sentence in our dataset is less than or equal to 10; we call this seq_len. Let the total number of tokens in our vocabulary be 5000; we call this d_vocab. So the configuration of our toy example is: d_model = 50, d_vocab = 5000, seq_len = 10, batch_size = 1.

Note: the above config is for the toy example. In our code, we will be working with actual GPT-level configs (see below). Let's define our Config. Note: there are a lot of hyperparameters here which you haven't seen yet. Don't worry, we will cover all of them in later parts of the blog.

## imports used by the snippets in this post (torch imported as t, as in the code below)
import torch as t
import torch.nn as nn
import einops
from torch import Tensor
from jaxtyping import Float, Int
from dataclasses import dataclass

## let's define all the parameters of our model
@dataclass
class Config:
    d_model: int = 768
    debug: bool = True
    layer_norm_eps: float = 1e-5
    d_vocab: int = 50257
    init_range: float = 0.02
    n_ctx: int = 1024
    d_head: int = 64
    d_mlp: int = 3072
    n_heads: int = 12
    n_layers: int = 12

cfg = Config()
print(cfg)

Note that @dataclass simplifies a lot of stuff for us. With @dataclass we get a constructor and a clean output representation when we want to print the parameters of the class, with no need for huge boilerplate code. Without it, we would have to write the same class like this:

class Config:
    def __init__(self, d_model=768, d_vocab=50257):
        self.d_model = d_model
        self.d_vocab = d_vocab

    def __repr__(self):
        return f"Config(d_model={self.d_model}, d_vocab={self.d_vocab})"

Some common implementation details for all the components:
1. For every component, we define a class.
2. Every class needs to subclass nn.Module. This is important for many reasons, like storing model parameters and using helper functions. You can read more about this at https://pytorch.org/tutorials/beginner/nn_tutorial.html
3. super().__init__() makes sure the constructor of nn.Module gets called. https://www.geeksforgeeks.org/python-super-with-__init__-method/
4. We then pass the config object to each class to set the values of our parameters as required.

What is the Embedding matrix? This is just a plain lookup table: you look up the embedding vector of a particular token.

Questions to ask before coding:
Q. What is the input to the Embedding Matrix?
A. Int[Tensor, 'batch position']. Here [batch, position] are the dimensions; position refers to the token position.
Q. What is the output of the Embedding Matrix?
A. Float[Tensor, 'batch seq_len d_model']: the corresponding embedding vectors in the above shape.

class Embed(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_E = nn.Parameter(t.empty((cfg.d_vocab, cfg.d_model)))
        nn.init.normal_(self.W_E, std=self.cfg.init_range)

    def forward(self, tokens: Int[Tensor, 'batch position']) -> Float[Tensor, 'batch seq_len d_model']:
        return self.W_E[tokens]

Every trainable parameter in a neural network needs to be tracked and updated based on gradients. PyTorch simplifies this with nn.Parameter(): any tensor wrapped in nn.Parameter is automatically registered as a trainable weight. nn.init.normal_ fills the tensor with values drawn from a normal distribution (a.k.a. Gaussian), in place. Our embedding matrix has the shape (d_vocab, d_model). Intuitively, we can read it as: for every token, the corresponding matrix row is its embedding vector.

What is a Positional Embedding? This can also be thought of as a lookup table. Here, instead of token ids, we have positions. A positional embedding is a learned vector assigned to each position (just like a token embedding). Think of it as the model learning that certain positions hold certain tokens and relationships between them, which is useful for attention downstream.

Small clarification: in the original paper "Attention Is All You Need", the authors used positional encoding. It is not learned; it is a fixed function (based on sine and cosine) that you add to the input embeddings. In our GPT, we use a learned positional embedding.

More intuition: for the example "Akhil plays football.", positional embeddings evolve such that:
pos[0] → helps identify "Akhil" as the subject
pos[1] → contributes to verb detection
pos[2] → contributes to object prediction

Questions to ask before coding:
Q. What is the input to the Positional Embedding?
A. Int[Tensor, 'batch position']. Here [batch, position] are the dimensions; position refers to the token position.
Q. What is the output of the Positional Embedding?
A. Float[Tensor, 'batch seq_len d_model']: the corresponding position vectors in the above shape.

class PosEmbed(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_pos = nn.Parameter(t.empty((cfg.n_ctx, cfg.d_model)))
        nn.init.normal_(self.W_pos, std=self.cfg.init_range)

    def forward(self, tokens: Int[Tensor, "batch position"]) -> Float[Tensor, "batch position d_model"]:
        batch, seq_len = tokens.shape
        return einops.repeat(self.W_pos[:seq_len], "seq d_model -> batch seq d_model", batch=batch)

Here n_ctx is the context length of our model: at any given time, we will have at most n_ctx tokens to position. In the forward pass, we slice out the relevant position vectors from our learned embedding matrix and repeat them across the batch. This gives us a tensor of shape [batch, seq_len, d_model], so each token gets a learnable embedding for its position, which we can then add to the token embeddings.
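As a quick sanity check, the two embedding modules above can be exercised on a toy batch. The token ids here are random; only the shapes matter.

tokens = t.randint(0, cfg.d_vocab, (1, 10))   # [batch=1, seq_len=10]
tok_vecs = Embed(cfg)(tokens)                 # [1, 10, 768]
pos_vecs = PosEmbed(cfg)(tokens)              # [1, 10, 768]
residual = tok_vecs + pos_vecs                # this sum is what enters the residual stream
print(tok_vecs.shape, pos_vecs.shape, residual.shape)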
What is a Residual Stream? It is the straight path from Embed to UnEmbed in Figure — 2. You can think of it as the central backbone of a Transformer. Information inside this stream flows forward; by forward, I mean from the Embedding stage to the UnEmbedding stage. The tokens are represented by their corresponding embeddings via the embedding table. These embeddings then enter the residual stream. We represent the example "Messi is the greatest of all time" inside the residual stream with the following dimensions: [batch_size, seq_len, d_model] ==> [1, 10, 50] (each token is a 50-dimensional vector and we have 7 tokens; the remaining 3 positions are padded with zeros to maintain the dimensions).

Next steps: in general, the input gets sent to LayerNorm. Attention heads read information from this residual stream. Attention heads are responsible for moving information between tokens, based on the attention matrix (more on this in the attention section). The MLP does explicit read and write operations (new vectors) onto this residual stream; it can also delete information from the residual stream (more on this in later sections).

What is Layer Normalization? The fundamental reason we normalize is to keep the data flowing nicely through the network without gradients vanishing or exploding.

Figure — 5

From Figure — 5, we can see two learnable parameters: gamma (a scaling factor) and beta (a shifting factor). We put the values inside our embedding vector into normalized form; E[x] is the mean. We then allow the model a little room, as training progresses, to scale and shift. The small epsilon avoids division-by-zero errors.

Questions to ask:
Q. What does LayerNorm take as input?
A. The current residual stream state (for example, the residual after attention): [batch posn d_model].
Q. What does it return?
A. It just normalizes the existing values of the embedding vector; it doesn't add anything new, so it returns the normalized values.
Note: dim = -1 means the operation is performed over the last dimension. Here the last dimension is d_model, so we take the mean and variance along the embedding vector of each token independently.

### LayerNorm Implementation
class LayerNorm(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.w = nn.Parameter(t.ones(cfg.d_model))    ## these are gamma and beta,
        self.b = nn.Parameter(t.zeros(cfg.d_model))   ## both learnable

    def forward(self, residual: Float[Tensor, 'batch posn d_model']) -> Float[Tensor, 'batch posn d_model']:
        residual_mean = residual.mean(dim=-1, keepdim=True)
        residual_std = (residual.var(dim=-1, keepdim=True, unbiased=False) + self.cfg.layer_norm_eps).sqrt()
        residual = (residual - residual_mean) / residual_std
        residual = residual * self.w + self.b
        return residual
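A quick way to convince yourself the implementation is correct is to compare it against PyTorch's built-in LayerNorm; with freshly initialized (identity) scale and shift the two should agree.

ln = LayerNorm(cfg)
ref = t.nn.LayerNorm(cfg.d_model, eps=cfg.layer_norm_eps)
x = t.randn(2, 10, cfg.d_model)
print(t.allclose(ln(x), ref(x), atol=1e-5))   # True: same normalization over the last dimension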
Multi Head Attention: Okay, let's think in simple terms first. Before talking about multiple attention heads, let us understand what happens in a single attention head.

Questions to ask:
Q. What does an attention head get as input?
A. The attention head reads what is present in the residual stream, i.e. Float[batch seq_len d_model]. From our toy setup, this might be an example like "Messi is the greatest of all time".
Q. After the self-attention process (inside the attention block) completes, what does the output look like?
A. Float[Tensor, 'batch seq_len d_model']. The output is still the same example, but there has been a lot of information movement. Let's go through that in detail.

Information movement (intuition): let's take two tokens from the example above (for convenience, we represent each token in 4 dimensions). Below is the state of the embedding vectors before entering the attention block.
Messi → [0.1 0.9 2.3 7.1]
greatest → [2.1 4.4 0.6 1.8]
Once these tokens enter the attention block, each token starts to attend to the tokens that came before it, in order to include more context and change its representation. This is called causal self-attention. In "Messi is the greatest of all time", when "greatest" wants to encode some context inside itself, it can only use the context from the words "Messi", "is" and "the". From these words, the representation of "greatest" changes. After attention:
Messi → [0.1 0.9 2.3 7.1]
greatest → [0.2 1.1 0.6 1.8] (changed representation)
What does that mean? Look at the "greatest" vector: it now carries some "Messi" inside of it. While constructing the embedding vector for "greatest", it is referring to a bit of "Messi". That is what information movement means.

But we still want to know how this process actually happens. Let me introduce a few matrices which are important in this process. In the literature, they are called Queries, Keys and Values.
Q = Input * Wq
K = Input * Wk
V = Input * Wv
Here Input is our example "Messi is the greatest of all time". The idea of Q, K and V is to linearly transform the input into different spaces where it is represented in a more meaningful way. Let's look at the dimensions of these matrix multiplications for our toy example.
Input/residual = [1 10 50] [batch seq d_model]
The Wq matrix dimension depends on how many heads we want in our model. This is an important point: if we decide to have only one attention head, then we can have Wq = [n_head, d_model, d_model] ==> [1, 50, 50]. If we decide to have n_heads, then each head's dimension will be d_model/n_heads; we call this quantity d_head. So, if we want 5 heads, the dimensions of Wq will be [n_head, d_model, d_head] ==> [5, 50, 10]. Let's say we want 5 heads; then
Q = [1 10 50] * [5, 50, 10] ==> [batch seq d_model] * [n_head d_model d_head] ==> [1 10 5 10] [batch seq_len n_head d_head]
The extra dimension at the beginning is the batch. For Q, K and V we will see clearly how all of this fits together in a diagram. The same applies to the K and V matrices. First, let's talk about K:
K = [1 10 50] * [5 50 10] ==> [1 10 5 10]
Attention is calculated by multiplying the matrices Q and K. Remember, the attention matrix will always be a square matrix. Please look at the diagram I made; it tries to communicate what those dimensions actually mean. Look at the left part: [batch seq d_head d_model].

Figure — 3

I took two example sentences: 1. "I good" 2. "You bad". In the left representation, one batch holds the two examples, with 2 sequence tokens per example, and for each token we have all the heads together, which is like the full d_model dimension; there we are not computing attention per head. But we want every batch and every head to process those tokens in parallel, each attention head with its own representation. The right side of the figure helps with that, and that is the reason we permute the shapes while computing attention. (Hope this helps!) Note: don't worry, all of these transformations can be done very intuitively through einsum; you will see this in the code. Now that we have understood how attention is computed, let's get back to our Messi example. Earlier we talked about how "greatest" would attend to "Messi".
Multiplying Q and K, we get a [10, 10] matrix of every word in our toy example attending to every other word. After getting the attention matrix, we apply causal masking to prevent words from attending to future words: "greatest" cannot attend to "time". After that, we apply softmax to the attention matrix. Softmax gives us scores that sum to 1 along each row. For the word "greatest", the row tells us what fraction of attention it should pay to "Messi", how much to "is", how much to "the", and how much to itself. I took another example from Google to make things visually simple; you can easily connect it with our Messi example.

Figure — 4

Once this is done, the next step is to multiply this matrix with our value vectors. As discussed above, the value vector is also just a linear transformation of the input into another space.
V = Input * Wv ==> [1 10 50] * [5 50 10] ==> [1 10 5 10]
Z = V * A ==> [batch seq_len n_head d_head] * [batch n_head q_seq k_seq] ==> [1 10 5 10] * [1 5 10 10] ==> [1 10 5 10] [batch seq_len n_head d_head]
Again, once you look at the einsum code, this is self-explanatory. Z holds the per-head outputs: for each of the 5 heads, a [1 10 10] output, stored together as [1 10 5 10]. Conceptually we concatenate the heads' outputs along the last dimension to get [1 10 50], and this concatenation is multiplied by one final output matrix (Wo), which can be thought of as learning how to combine the outputs from the different heads:
(Z from all the heads) * Wo ==> [1 10 5 10] * [5 10 50] [n_head d_head d_model] ==> [1 10 50]
This is how information is moved between tokens. I know there are a lot of dimensions here, but this is the core part; once you get the gist of it, everything looks straightforward. Now the information has been moved inside the residual stream. Look at the code implementing attention below; the bias initializations are self-explanatory. Note: I use "posn" and "seq_len" interchangeably, they are the same.

Implementation details: the causal mask is built with the tril and triu functions in PyTorch; please look them up, they are straightforward. register_buffer creates temporary tensors that don't require gradient tracking; registering them with PyTorch's buffer mechanism also gives us the nice functionality of moving them between CPU and GPU along with the module.
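Before reading the full class, here is the masking-plus-softmax step in isolation on a toy 4 x 4 score matrix (the values are random; only the pattern of zeros matters):

scores = t.randn(4, 4)                            # toy [query_pos, key_pos] attention scores
mask = t.triu(t.ones(4, 4), diagonal=1).bool()    # True above the diagonal = future positions
masked = scores.masked_fill(mask, float('-inf'))
print(masked.softmax(dim=-1))                     # each row sums to 1; future positions get weight 0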
class Attention(nn.Module):
    IGNORE: Float[Tensor, '']   ### the buffer registered in __init__

    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_Q = nn.Parameter(t.empty((cfg.n_heads, cfg.d_model, cfg.d_head)))
        self.W_K = nn.Parameter(t.empty((cfg.n_heads, cfg.d_model, cfg.d_head)))
        self.W_V = nn.Parameter(t.empty((cfg.n_heads, cfg.d_model, cfg.d_head)))
        self.W_O = nn.Parameter(t.empty((cfg.n_heads, cfg.d_head, cfg.d_model)))
        self.b_Q = nn.Parameter(t.zeros((cfg.n_heads, cfg.d_head)))
        self.b_K = nn.Parameter(t.zeros((cfg.n_heads, cfg.d_head)))
        self.b_V = nn.Parameter(t.zeros((cfg.n_heads, cfg.d_head)))
        self.b_O = nn.Parameter(t.zeros((cfg.d_model)))
        nn.init.normal_(self.W_Q, std=self.cfg.init_range)
        nn.init.normal_(self.W_K, std=self.cfg.init_range)
        nn.init.normal_(self.W_V, std=self.cfg.init_range)
        nn.init.normal_(self.W_O, std=self.cfg.init_range)
        # registered buffers are saved with the module and move with it between CPU and GPU
        self.register_buffer('IGNORE', t.tensor(float('-inf'), dtype=t.float32))

    def forward(self, normalized_resid_pre: Float[Tensor, 'batch posn d_model']) -> Float[Tensor, 'batch posn d_model']:
        ### calculate query, key and value vectors and follow the attention formula
        q = (
            einops.einsum(
                normalized_resid_pre, self.W_Q,
                "batch posn d_model, nheads d_model d_head -> batch posn nheads d_head"
            ) + self.b_Q
        )
        k = (
            einops.einsum(
                normalized_resid_pre, self.W_K,
                "batch posn d_model, nheads d_model d_head -> batch posn nheads d_head"
            ) + self.b_K
        )
        v = (
            einops.einsum(
                normalized_resid_pre, self.W_V,
                "batch posn d_model, nheads d_model d_head -> batch posn nheads d_head"
            ) + self.b_V
        )
        attn_scores = einops.einsum(
            q, k,
            "batch posn_Q nheads d_head, batch posn_K nheads d_head -> batch nheads posn_Q posn_K"
        )
        attn_scores_masked = self.apply_causal_mask(attn_scores / self.cfg.d_head**0.5)
        attn_pattern = attn_scores_masked.softmax(-1)
        # Take weighted sum of value vectors, according to attention probabilities
        z = einops.einsum(
            v, attn_pattern,
            "batch posn_K nheads d_head, batch nheads posn_Q posn_K -> batch posn_Q nheads d_head"
        )
        # Calculate output (by applying matrix W_O and summing over heads, then adding bias b_O)
        attn_out = (
            einops.einsum(
                z, self.W_O,
                "batch posn_Q nheads d_head, nheads d_head d_model -> batch posn_Q d_model"
            ) + self.b_O
        )
        return attn_out

    def apply_causal_mask(
        self, attn_scores: Float[Tensor, "batch n_heads query_pos key_pos"]
    ) -> Float[Tensor, "batch n_heads query_pos key_pos"]:
        """Applies a causal mask to attention scores, and returns masked scores."""
        # Define a mask that is True for all positions we want to set probabilities to zero for
        all_ones = t.ones(attn_scores.size(-2), attn_scores.size(-1), device=attn_scores.device)
        mask = t.triu(all_ones, diagonal=1).bool()
        # Apply the mask to attention scores, then return the masked scores
        attn_scores.masked_fill_(mask, self.IGNORE)
        return attn_scores

Important takeaway: what information we copy depends on the source token's residual stream, but this doesn't mean it only depends on the value of that token, because the residual stream can store more information than just the token identity (the purpose of the attention heads is to move information between vectors at different positions in the residual stream). What does that mean? Take "Messi is the greatest of all time": when "greatest" attends back to "Messi", it doesn't just see the value "Messi". The residual stream stores much more than the identity, things like "Messi is a subject" and "Messi is a person". All of this is stored in the residual stream.
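To check that the block wires up correctly, you can push a random residual through it; the input is noise rather than real activations, so only the shape is meaningful.

attn = Attention(cfg)
resid = t.randn(1, 10, cfg.d_model)
print(attn(resid).shape)   # torch.Size([1, 10, 768]): same shape, ready to be added back to the stream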
Now the input goes into the MLP. Multi-Layer Perceptron (MLP Layer): this is a very important layer. About two thirds of the model's parameters are in the MLPs. They are responsible for the non-linear transformation of the given input vectors. The main intuition of this layer is to form rich projections and to store facts. There is a very intuitive video made by 3Blue1Brown about this; it's a must watch: https://www.youtube.com/watch?v=9-Jl0dxWQs8&t=498s

Intuition: you can loosely think of the MLP as working like a Key → Value function, where:
Input = "Key" (what the token currently holds in the residual stream)
Output = "Value" (what features we want to add to the residual stream)
For example:
Key = the token's current context vector coming from the residual stream. It represents the meaning of the token so far (including attention context).
Value = a non-linear mix of learned features. This could be:
1. "This is a named entity"
2. "This clause is negated"
3. "A question is being asked"
4. "Boost strength-related features"
5. "Trigger the next layer's copy circuit"
So the MLP says: "Oh, you're a token that's the subject of a sentence AND you were just negated? Cool. Let me output features relevant to that situation." Hope you got the intuition.

The first hidden layer has 3072 neurons; we call this d_mlp and have declared it in our config. The second layer projects these back to d_model space. These are shown as W_in and W_out in the code. We use the GeLU non-linearity.

class MLP(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_in = nn.Parameter(t.empty(cfg.d_model, cfg.d_mlp))
        self.b_in = nn.Parameter(t.zeros(cfg.d_mlp))
        self.W_out = nn.Parameter(t.empty(cfg.d_mlp, cfg.d_model))
        self.b_out = nn.Parameter(t.zeros(cfg.d_model))
        nn.init.normal_(self.W_in, std=self.cfg.init_range)
        nn.init.normal_(self.W_out, std=self.cfg.init_range)

    def forward(self, normalized_resid_mid: Float[Tensor, 'batch posn d_model']):
        ## per-token matmul: project up to d_mlp, apply GeLU, project back down to d_model
        pre = einops.einsum(normalized_resid_mid, self.W_in,
                            'batch posn d_model, d_model d_mlp -> batch posn d_mlp') + self.b_in
        post = gelu_new(pre)   # gelu_new is GPT-2's GeLU variant, assumed defined/imported elsewhere
        mlp_out = einops.einsum(post, self.W_out,
                                'batch posn d_mlp, d_mlp d_model -> batch posn d_model') + self.b_out
        return mlp_out

With this, we have completed one layer of what we call a Transformer Block. There are 12 such layers in GPT-2, and 12 attention heads in the GPT we are implementing; therefore n_heads = 12 and n_layers = 12, as already coded in the config. Our GPT model uses 768 dimensions (d_model) and a vocabulary (d_vocab) of 50257 tokens. So this Transformer block is repeated 12 times. The code for TransformerBlock just connects LayerNorm + Attention + MLP with skip connections.

class TransformerBlock(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.ln1 = LayerNorm(cfg)
        self.attn = Attention(cfg)
        self.ln2 = LayerNorm(cfg)
        self.mlp = MLP(cfg)

    def forward(self, resid_pre: Float[Tensor, 'batch posn d_model']) -> Float[Tensor, 'batch posn d_model']:
        resid_mid = self.attn(self.ln1(resid_pre)) + resid_pre     ### skip connection
        resid_post = self.mlp(self.ln2(resid_mid)) + resid_mid     ### skip connection
        return resid_post

Here the skip connections are nothing but adding the input directly back into the residual stream alongside the Attention and MLP outputs. resid_pre is the residual before normalization, i.e. the raw input; resid_mid is the residual after attention, and it gets added back again. This is done in order to stabilize training over long periods.
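The claim that the MLPs hold roughly two thirds of the parameters is easy to verify for a single block (the embedding and unembedding matrices are excluded from this count):

block = TransformerBlock(cfg)
total = sum(p.numel() for p in block.parameters())
mlp_share = sum(p.numel() for p in block.mlp.parameters()) / total
print(f"MLP share of one block's parameters: {mlp_share:.0%}")   # roughly two thirds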
UnEmbed: the UnEmbed matrix maps the learned representations back to scores over all the tokens in the vocabulary.

Questions to ask:
Q. What input does it take?
A. The residual stream token vectors: [batch posn d_model].
Q. What does it give out?
A. A score (logit) for every token in the vocabulary given the current token, i.e. a matrix of size [batch posn d_vocab]. Look at logits in the code for how precisely it is calculated.

class UnEmbed(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.W_U = nn.Parameter(t.empty(cfg.d_model, cfg.d_vocab))
        nn.init.normal_(self.W_U, std=self.cfg.init_range)
        self.b_U = nn.Parameter(t.zeros((cfg.d_vocab), requires_grad=False))

    def forward(self, normalized_resid_final: Float[Tensor, 'batch posn d_model']) -> Float[Tensor, 'batch posn d_vocab']:
        logits = einops.einsum(normalized_resid_final, self.W_U,
                               'batch posn d_model, d_model d_vocab -> batch posn d_vocab') + self.b_U
        return logits

Transformer: finally we arrive at the last part. Here, we just need to put all the components we have seen together. Let's do that!

class Transformer(nn.Module):
    def __init__(self, cfg: Config):
        super().__init__()
        self.cfg = cfg
        self.embed = Embed(cfg)
        self.posembed = PosEmbed(cfg)
        self.blocks = nn.ModuleList([TransformerBlock(cfg) for _ in range(cfg.n_layers)])
        self.ln_final = LayerNorm(cfg)
        self.unembed = UnEmbed(cfg)

    def forward(self, tokens: Int[Tensor, 'batch posn']) -> Float[Tensor, 'batch posn d_vocab']:
        residual = self.embed(tokens) + self.posembed(tokens)
        for block in self.blocks:
            residual = block(residual)
        logits = self.unembed(self.ln_final(residual))
        return logits

Here we go from taking tokens as input to calling the transformer blocks 12 times. Implementation detail: since all the Transformer blocks have their own parameters to be tracked, we need to define them in an nn.ModuleList. This is the proper way of initializing a list of blocks. Each block takes input from the residual stream, learns, and contributes its learnings back to the residual stream. A short end-to-end shape check of the assembled model is included at the end of this post.

That's it, guys! Hope you have gained a ton of knowledge on how to build your own GPT. Support and follow me for more blogs! Thanks to Neel Nanda and Callum McDougall: I have learnt a lot from their materials and videos, and this blog is inspired by their work. Connect with me on: https://www.linkedin.com/in/akhilshekkari/ Published via Towards AI
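As promised, an end-to-end smoke test of the assembled model, assuming all the classes above (and the gelu_new activation used by the MLP) are defined. The weights are untrained and the token ids are random, so the logits are meaningless; the point is only to confirm the shapes and the forward pass.

model = Transformer(cfg)
tokens = t.randint(0, cfg.d_vocab, (1, 10))   # a real run would use the GPT-2 tokenizer instead
logits = model(tokens)
print(logits.shape)                           # torch.Size([1, 10, 50257])
print(logits[0, -1].argmax())                 # greedy next-token id for the final position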
  • WWW.IGN.COM
    The New Google Streamer 4K Has Dropped in Price for the First Time Ever
The Google Streamer 4K is on sale for the first time ever. Right now you can pick one up for just $79.99 on Amazon as well as other retailers. The Google Streamer is one of the best Android-based 4K streaming devices out there right now. It's also much more affordable than other similar options like the Apple TV, Roku Ultra, and Nvidia Shield TV Pro while offering up-to-date features like smart hub functionality and Thread compatibility. 20% Off the Google Streamer 4K (2024). The Google Streamer 4K was released in September of 2024. It is the successor to the Chromecast 4K, but improves upon it in nearly all aspects. For starters, Google claims that the new processor is up to 22% faster, with double the memory (2GB) and quadruple the onboard storage (32GB). Both high-speed 802.11ac Wi-Fi and gigabit ethernet connectivity options are available. The Google Streamer 4K is now powered via a USB Type-C port, which makes it easier to track down a power brick if you lose the included one, and if you happen to misplace the newly redesigned remote, there's a handy remote-finder button right on the main unit's body. Like other top-shelf models, the Google Streamer 4K can stream at up to 4K resolution in HDR, including Dolby Vision, HDR10, and HDR10+ formats. Although the Google Streamer 4K is equipped with an HDMI 2.1 port, it's limited to 4K @ 60fps. It also supports Dolby Atmos for spatial audio if you pair it with Atmos-compatible speakers like the Sonos Arc. The Streamer can also control your smart home devices, and it functions as both a Thread border router and Matter hub. The Google Streamer 4K is comparable to the Apple TV or Roku Ultra but at a lower price point. The Amazon Fire TV Stick 4K Max is less expensive, but Android TV offers a cleaner and more intuitive UI with no annoying Amazon ads. The only obviously superior option (on the streaming side) is the Nvidia Shield TV Pro, but it retails for $200. Throw in the bonus smart home functionality and Matter and Thread compatibility, and the Google Streamer 4K is definitely a killer package. Which streaming services should I sign up for? If you want to test out a streaming service for free before you decide to go all-in, we've compiled a list of our favorite platforms that are currently offering free trials. These include Hulu, Paramount+, Apple TV+, and Crunchyroll. Sign up for all of these trials and you'll have more shows than you'll have time to watch before the subscriptions expire. If you're wondering what our favorite paid subscription is, the Disney+/Hulu/ESPN+ bundle is hard to beat at just $16.99 for all three services. Considering what it offers, Disney Plus is one of the best streaming services on the block. From classic Disney animated films to the latest Marvel shows and Star Wars movies, excellent kids' programming like Bluey, and so much more, Disney Plus puts an incredible range of high-quality viewing options at your fingertips. And with so much to check out, including season 2 of Andor on April 22, you'll want to find a plan that best suits you. Why Should You Trust IGN's Deals Team? IGN's deals team has a combined 30+ years of experience finding the best discounts in gaming, tech, and just about every other category. We don't try to trick our readers into buying things they don't need at prices that aren't worth paying. Our ultimate goal is to surface the best possible deals from brands we trust and our editorial team has personal experience with.
You can check out our deals standards here for more information on our process, or keep up with the latest deals we find on IGN's Deals account on Twitter.Eric Song is the IGN commerce manager in charge of finding the best gaming and tech deals every day. When Eric isn't hunting for deals for other people at work, he's hunting for deals for himself during his free time.
  • 9TO5MAC.COM
    9to5Mac Daily: April 21, 2025 – Apple foldables, TV+ promo
Listen to a recap of the top stories of the day from 9to5Mac. 9to5Mac Daily is available on iTunes and Apple's Podcasts app, Stitcher, TuneIn, Google Play, or through our dedicated RSS feed for Overcast and other podcast players. New episodes of 9to5Mac Daily are recorded every weekday. Subscribe to our podcast in Apple Podcasts or your favorite podcast player to guarantee new episodes are delivered as soon as they're available.
  • THEHACKERNEWS.COM
    Microsoft Secures MSA Signing with Azure Confidential VMs Following Storm-0558 Breach
    Apr 22, 2025Ravie LakshmananIdentity Management / Cloud Security Microsoft on Monday announced that it has moved the Microsoft Account (MSA) signing service to Azure confidential virtual machines (VMs) and that it's also in the process of migrating the Entra ID signing service as well. The disclosure comes about seven months after the tech giant said it completed updates to Microsoft Entra ID and MS for both public and United States government clouds to generate, store, and automatically rotate access token signing keys using the Azure Managed Hardware Security Module (HSM) service. "Each of these improvements helps mitigate the attack vectors that we suspect the actor used in the 2023 Storm-0558 attack on Microsoft," Charlie Bell, Executive Vice President for Microsoft Security, said in a post shared with The Hacker News ahead of publication. Microsoft also noted that 90% of identity tokens from Microsoft Entra ID for Microsoft apps are validated by a hardened identity Software Development Kit (SDK) and that 92% of employee productivity accounts are now using phishing-resistant multifactor authentication (MFA) to mitigate risk from advanced cyber attacks. Besides isolating production systems and enforcing a two-year retention policy for security logs, the company also said it's protecting 81% of production code branches using MFA through proof-of-presence checks. "To reduce the risk of lateral movement, we are piloting a project to move customer support workflows and scenarios into a dedicated tenant," it added. "Security baselines are enforced across all types of Microsoft tenants, and a new tenant provisioning system automatically registers new tenants in our security emergency response system." The changes are part of its Secure Future Initiative (SFI), which the company characterized as the "largest cybersecurity engineering project in history and most extensive effort of its kind at Microsoft." The SFI gained traction last year in response to a report from the U.S. Cyber Safety Review Board (CSRB), which criticized the tech giant for a series of avoidable errors that led to the breach of nearly two dozen companies across Europe and the U.S. by a China-based nation-state group called Storm-0558 in 2023. Microsoft, in July 2023, revealed that a validation error in its source code allowed for Azure Active Directory (Azure AD) or Entra ID tokens to be forged by Storm-0558 using an MSA consumer signing key to infiltrate several organizations and gain unauthorized email access for subsequent exfiltration of mailbox data. Late last year, the company also launched a Windows Resiliency Initiative to improve security and reliability and avoid causing system disruptions like what happened during the infamous CrowdStrike update incident in July 2024. This includes a feature called Quick Machine Recovery, which enables IT administrators to run specific fixes on Windows PCs even in situations when the machines are unable to boot. It's built into the Windows Recovery Environment (WinRE). "Unlike traditional repair options that rely on user intervention, it activates automatically when the system detects failure," Patch My PC's Rudy Ooms said late last month. "The whole cloud remediation process is pretty straightforward: it checks if flags/settings like CloudRemediation, AutoRemediation, and optionally HeadlessMode are set. If the environment meets the conditions (such as an available network and required plugin), Windows silently initiates recovery." Found this article interesting? 
  • WEWORKREMOTELY.COM
    The Tech Tribe: Marketing Content Creator
    If you want to work from home (or your favorite café or beach) on your own schedule, creating marketing content with a fun team, then keep reading 🤓 We’re looking for a Marketing Content Creator to join our fully remote team to create regular marketing collateral & campaigns for our clients. In a nutshell, we run a membership program & community (called The Tech Tribe) and we help the owners of IT Support businesses better run and grow their business through training, resources, templates & mentoring. You will be working directly with the owner of the business (me) to create regular monthly marketing content for our 4,000+ clients (members).We've been in business since 2017 and are in a VERY stable position both financially and in the marketplace, so for the right person, there's HUGE job stability. We like to think we're a fun business to work for and we like to work with purpose and efficiency, we also like to keep things as stress free as possible.Our mission to become the most valuable & results driven program for MSPs & IT Support businesses on the planet, which will see the business keep growing over the next few years. One of our things we do for our members is provide them regular physical and digital marketing campaign templates that they can use for their own marketing. This includes things such as:Postcard CampaignsPrinted FlyersSocial Media PostsBlog PostsPrinted NewslettersDirect Mail Campaigns eBook ProjectsAnd more...Up until now we've built this content with a mixture of internal part-time resources and external contractors.And, whilst the quality is good - it's currently not GREAT.So, it's time for us to increase the value and quality and this is where you come in 🤓We're looking for someone who geeks out on creating high quality marketing content based using direct response principles. We love creating fun, quirky, memorable and (most importantly) results driven Marketing Collateral.Don't worry, we won't chuck you in the deep end without first making sure you have everything you need to succeed. You'll have deep support & feedback from the CEO who geeks out on marketing to help you excel at the role, including a heavier focus in the beginning to help you get up to speed with our way of approaching things. You'll also get access to a bunch of marketing courses & training programs that we have purchased over the years to help you continually improve and refine your skills.🤓 ABOUT YOUYou have a proven history of designing Marketing CollateralYou've worked in a B2B Marketing Role beforeYou have an eye for detail, especially when it comes to designYou're ready to leverage A.I. to help you excel in the roleYou understand the importance of good copywritingYou know what Direct Response is and why it's importantYou've created both Digital Content (e.g. Social Media Posts & Blog Posts) and Printed Content (e.g. 
Postcards & Mailers)You're happy working in a fully remote environmentYou have at least ~2 hours of working overlap within business hours in the Sydney, Australia timezone (currently GMT+10)Some Nice to Haves (but not Deal Breakers):Experience in the MSP or Cybersecurity IndustryMarketing Strategy Experience 🏆WHAT SUCCESS LOOKS LIKEYou'll be responsible for building classy, world-class, physical and digital direct response driven Marketing Collateral and Campaigns for our members to help them grow their business.Success in this role is when our members use the Collateral, Content and Campaigns you create to land more clients 🥳🏅 WHY JOIN US?We're a bootstrapped, profitable and stable company that believes in empowering our team with whatever it takes so they can take ownership & excel in their role.We're committed to doing anything we can to help our clients (we call them members) succeed and you'll have a very important part to play in this journey 🤓You can see what our Team says about working for The Tech Tribe on our Glassdoor Profile here.You can see what our members say about us on our Wall of Love Page here 🧡Here's some other fun facts:📈 We've grown nearly every single month since January 2017📢 We grow mostly by "Word of Mouth" Referrals🤓 We have ~4,000+ MSPs as Paying Members📑 We've created 100's of Templates, Checklists, Marketing Campaigns and Workshops✅ We have a Culture of SOPs and Documentation to help us delivery excellence👷‍♂️ We work daily in ClickUp for Project and Task Management🌎 We have Team Members in Australia, UK, Canada, USA, Venezuela and the Philippines💵 We're financially very stable, so job security is extremely highABOUT US 🤓Our mission is to empower MSPs by providing Knowledge, Skills, Training, Templates, Tools, Confidence, Community & Support to become the best MSPs in the world* 🤓[* an MSP is technical jargon for an outsourced B2B I.T. Services Provider. Essentially they help other small businesses with Technology, including things like Cloud Platforms, Networking, Cybersecurity, Infrastructure, Computers, Servers, Firewalls etc]We currently have 4,000+ members in our program and an additional ~5,000 of their team members and we support them with our small remote team of 10 humans spread out around the world.(including USA, Canada, Mexico, the Philippines and Australia (where our CEO is from))We work hard to deliver an amazing experience to our members to help them better run and grow their business and we take a lot of pride in what we do.\BENEFITSFlexible Work Hours (especially after the first 1-2 months once you’re onboarded)Work from Anywhere (you can travel around the world if you want)Continual Education with opportunities to expand your knowledge and skills.Never work on your Birthday (or the next biz day)Company Laptop & Home Setup (External Monitors, Stand-Up Desk etc)We intentionally keep a very low stress workplaceAccess to world class Marketing trainingWork in a business where our clients LOVE what we do for them (you can see some of this at thetechtribe.com/love)Strong, stable company with a great reputation in the marketplace (most of our 4,000+ members have come from Referrals from existing Members)If all the above sounds exciting - then the next steps is to let us know a little more about you here.A FRIENDLY NOTE: Any applications we receive outside of this application process will be ignored and may cause your whole application to be ignored (we are after someone who is detail oriented after all 😜)  
  • WORLDARCHITECTURE.ORG
Coldefy and CRA reveal French Pavilion that acts as a "theatre of life" at Osaka Expo 2025
The France Pavilion at the Osaka World Expo 2025, created by the French architecture firm Coldefy in collaboration with the Italian design and innovation firm CRA-Carlo Ratti Associati, is open until October 13, 2025. With its design framing moments of presence, movement, and connection, the France Pavilion is intended to be a "theatre of life." The Pavilion's architecture, which draws inspiration from mise-en-scène, or stage design and layout, is a flowing series of spaces that lead visitors through shifting perspectives reflecting the rhythms of daily life: beginning, transition, pause, and departure. The Japanese tale of Akai Ito, the invisible red thread that binds destined souls, serves as the inspiration for the design. The Pavilion reclaims physical space as a forum for conversation in an era characterized by digital estrangement. A peaceful inner garden provides sanctuary, highlighting the importance of interactions with nature as much as with each other. Its dynamic façade responds to light and wind thanks to 17-meter-high fabric veils that resemble theater curtains, suspended along two sides. A meticulously planned sequence of steps leads through the Pavilion, rising to an indoor exhibition, alternating between indoor and outdoor areas, and culminating with a return to the Expo grounds. With its circular design and prefabricated, modular components, the Pavilion reflects a vision of architecture that is as flexible as life itself by ensuring that its materials may be disassembled and reused. The architectural concept of the Pavilion is based on sensory experience and theatricality. Visitors are welcomed into an expanding experience by the balcony and entrance stairway, which constitute a stage. As a component of the building's façade, the winding staircase blurs the lines between the Pavilion's inside and exterior and fosters a conversation between the two, making everyone feel welcome. Visitors travel a circular route around the centre of the exhibition, passing through various themed areas before exiting into a small garden and returning to the Pavilion for one more outdoor experience. In contrast to conventional linear experiences, this journey echoes the Pavilion's overarching themes by reflecting cycles and pulsations while showcasing French savoir-faire, or know-how. Visitors experience the Pavilion in three "acts". The first is the Ascent, which leads to an observation deck via a sensuous staircase. The second is the Exhibition Journey: as visitors enter, they make their way through a number of carefully planned areas where they encounter installations and scenic features related to the Pavilion's themes. The third is the Garden Interlude: upon exiting, guests are greeted by a landscaped space with sound that provides a chance for introspection before returning to the internal areas. A final transition emphasizes the rhythmic flow between inside and outdoors, and the journey comes to a close with a return to the open air and the Expo site. "Infused with a spirit of play, the France Pavilion is a dynamic, flexible space that sparks unexpected encounters. In an era of increasing polarization, physical space offers a much-needed antidote.
Unlike the digital realm, it forces us to confront diversity and engage with perspectives that might challenge our preconceptions," said Carlo Ratti, founding partner of CRA and Curator of the 19th International Architecture Exhibition of the 2025 Venice Architecture Biennale. "This mirrors the mission of today's World Expos, as vibrant hubs for open dialogue and discovery. It's an honor to bring CRA's ongoing research to Osaka and contribute to France's part in this global dialogue – a place that has shaped so much of my thinking, from studying at the École des Ponts to this moment," Ratti added. The Pavilion embraces modularity and material reuse in its circular approach. In addition to being a dynamic visual feature, its movable curtain facade is made to be disassembled and reused following the event. To make future dismantling easier, the Pavilion incorporates as many temporary and prefabricated elements as feasible. For instance, the office spaces are housed in container structures. These design decisions ensure minimum impact on the environment, simplicity in reconfiguration, and flexibility after the Expo. The Pavilion is an adaptable building that reflects how architecture and exhibition spaces are changing in response to modern issues rather than being a static monument. "The France Pavilion invites visitors to enter the theatre of life. Both actors and spectators in this production, visitors traverse a path through the Pavilion that is an expression of the symbiosis between humanity and its environment," said Thomas Coldefy, founding partner of Coldefy. "It's an honor to have been chosen to design the France Pavilion, and we truly believe that the World Expo has the potential to create a moment of reflection – about how we live, what we value, and how design can shape better futures. Even a brief experience – whether it's a spatial gesture, a surprising material, or a shared moment – can resonate deeply," Coldefy added. French architecture studio Coldefy and Italian architecture and innovation practice Carlo Ratti Associati unveiled the design of the French Pavilion at Osaka Expo 2025 in January 2024.
Drawings: Site plan, Circulation Diagram, Axonometric Diagram, Facade Activation, Program Diagram, Pavilion Diagram, Ground Floor Plan, Level 1 Floor Plan, Level 2 Floor Plan, Level 3 Floor Plan, Level 4 Floor Plan, Site Section.
CRA-Carlo Ratti Associati is a global design and innovation firm with offices in New York City and Turin, Italy. The office, which draws from Carlo Ratti's work at the Massachusetts Institute of Technology (MIT), is currently working on numerous projects all over the world that cover every possible intervention scale, from furniture to urban planning. Thomas Coldefy and Isabel Van Haute established Coldefy, a global architecture and urban planning firm, in 2006. The firm has offices in Lille, Paris, and Shanghai.
Coldefy's sensitive architecture produces balanced environmental, urban, and social compositions that push the limits of cities and living, with projects and construction sites all over the world.

Project facts
Project name: France Pavilion, Osaka World Expo 2025
Location: Osaka, Japan
Surface: 3,600 m2 net area
Project cost: 22 M Euros
Programme: Exhibition halls, reception hall, office, shop, café
Environmental certification: CASBEE
Winning competition: 2023
Delivery: 2025
Project owner: COFREX
General contractor: Rimond

Project team
Client: COFREX
Architects: Coldefy + CRA-Carlo Ratti Associati
Local architects and engineers: Yasui Sekkei
General contractor: Rimond
Team Coldefy: Thomas Coldefy, Isabel Van Haute, Zoltán Neville, Martin Mercier, Marianna Guarino, Léo Akahori, Leonardo Ronchi, Shuai Wang
CRA-Carlo Ratti Associati: Carlo Ratti, Andrea Cassi (partner in charge), Ina Sefgjini, Gizem Veral, Zeynep Kalaycioglu, Jelena Krco, Gabriele Sacchi, Alba Leon Alvarez, Marie Petrault, Antoine Picon

Partners
Local architects & engineers: Yasui Sekkei
Competition partners: Bollinger + Grohmann (structural engineers), Coloco (landscape architects), Ramboll (environmental engineers), de_form (graphics/signage)
Scenographers: Justine Emard, GSM Project

All images: France Pavilion at the Osaka World Expo 2025 © Coldefy & CRA-Carlo Ratti Associati. Images © Julien Lanoo. All drawings © Coldefy + CRA-Carlo Ratti Associati.
> via Coldefy & Carlo Ratti Associati
  • WWW.ARCHITECTSJOURNAL.CO.UK
    Case study: Stratford Workshops by Studio MUTT
    This project is a redesign of Stratford Workshops, a former 1980s printworks building. The factory’s original features, mostly covered up and painted over through time, were the inspiration for the design, which aimed to reveal the native heritage and character of the building, creating a contemporary take on industrial aesthetics. Client General Projects’ brief was to provide improved workspace to existing and new tenants, upgrading the building’s offer through layout redistribution, a new entrance, prominent reception and pockets of communal space scattered through its dense plan of office units. The original building was constructed in 1893 and served as Great Eastern Railways’ press for printing tickets, timetables and posters. Printing took place across four stacked open floor plates, with large north light windows in a ribbon across the top floor. Monumental steel beams spanned the width of the building, creating a flexible plan which, over time, was fragmented into cellular offices with minimal shared amenities. Studio MUTT’s approach was to maximise interaction between disparate tenants by reclaiming some of the original building’s expansive dimensions for communal space, as well as using colour to create more legible circulation. The redesign was holistic, and included work on interiors, bespoke furniture, graphics and wayfinding. Advertisement Much of the design was about peeling back and revealing. New additions such as doors, a reception, tea points, washrooms and meeting rooms were designed to build on the character of the building’s heritage, but also to be in contrast with the material and detailing of the old brick-and-steel structure. These peculiar, yet familiar, insertions breathe new life into the building, allowing it to begin writing the story of its future. Alexander Turner, director, Studio MUTT   Project data Start on site April 2023 Completion Gross internal floor area 3,700m2 Construction cost Undisclosed Architect Studio MUTT Client General Projects M&E consultant David Webb Associates Quantity surveyor Quartz Project manager Quartz Principal designer Private consultant Approved building inspector Private consultant Wayfinding and artworks Studio MUTT, Corbin Wood Main contractor Brac CAD software used Vectorworks Predicted design life 15 years     Specification  The design of Stratford Workshops prioritised robust materials to suit the needs of creative work units and maker spaces. The visual aesthetic aimed to reflect the building’s energy and reference the history of the original printing press through colour choices. A cohesive colour strategy was implemented throughout the building, incorporating material selection and furniture design. Timber became a signature material for common areas and the colour palette drew inspiration from old printing machinery and ticket colours. Burnt orange shades were used in shared spaces such as stairwells and kitchens, distinguishing them from private work areas. Bathrooms were redesigned as unisex facilities with a central washbasin and mirrored panels to create a sense of spaciousness. The orange colour scheme continued in the bathrooms, with coloured MDF stalls creating a patchwork effect.Advertisement In the kitchens, stainless steel was used for worktops, splashbacks and shelves, referencing the building’s steel beams. Mirrored surfaces again enhanced the feeling of space and pine doors with a lacquered stripe pattern added a decorative touch. 
Alexander Turner, director, Studio MUTT

Selected products
Valchromat: Investwood, Orange, Treatex hardwax oil finish, Bathrooms, investwood.pt
European decorative pine: Specialised panel products, fire-rated, clear matt Osmo oil finish, Kitchens, spp.co.uk
Rubber flooring: NORA Rubber Flooring, Norament 5345 (dark green), 5342 (light green), Corridors, nora.com
Vinyl flooring: Forbo, Surestep 172932 tangerine, Bathrooms, forbo.com
Stainless steel sink, backsplash, skirting: Bespoke, Stainless Direct UK, marine-grade, brushed finish, Kitchens and bathrooms, stainlessdirectuk.com
Cone pendant light: Frandsen, Benjamine White, Kitchens, madeindesign.co.uk
Tube pendant light: Encapsulite, MT70 LED AC2, 878mm, MT360 LED, 1525mm, Corridors and bathroom, encapsulite.co.uk
Franke sink: Franke, 580 x 450mm, stainless steel, Kitchen, franke.com
Wall-mounted washbasin: Duravit, Duravit Vero handbasin, Bathrooms, duravit.co.uk
Bathroom fittings: Crosswater, Crosswater Mike Pro single lever tap and mixer, Bathrooms, crosswater.co.uk
Boiling water tap: Rangemaster, Kitchens, plumbworld.co.uk
Toilet: Duravit, Duravit ME compact, Bathrooms, pro.duravit.co.uk
Flush plate: Geberit, Geberit Omega 60, brushed stainless steel, Bathrooms, geberit.co.uk
Soap dispenser: Manomano, Manomano 304, stainless steel, Bathrooms, manomano.co.uk
Toilet cistern: Geberit, Duofix frame, wall-hung Omega cistern, Bathrooms, geberit.co.uk
Strip light: Iguzzini, Iguzzini LED Strip Tube, white, IP44, Bathrooms, iguzzini.com
Recessed spotlight: Iguzzini, Iguzzini Easy Space Recess, white, IP44, Bathrooms, iguzzini.com
  • WWW.CNET.COM
    Knight Takes Pawn. Chess Lessons Are Coming to Duolingo
    The course is in beta now, but will be available to everyone in a few months.