Here: A test with Tom Hanks
beforesandafters.com
The de-ageing Pepsi challenge. An excerpt from issue #24 of befores & afters magazine in print.

Ultimately, there would be 53 character minutes of full-face replacement in Here between the four key actors: Tom Hanks, Robin Wright, Paul Bettany and Kelly Reilly. This was a significant amount of screen time. Shots were also often long (up to four minutes) and the de-aged periods spanned close to 40 years. "It was obvious that there was no way we could use traditional CGI methods to do this," suggests visual effects supervisor Kevin Baillie. "It was not going to be economical and it was not going to be fast enough, and it also risked us falling into the Uncanny Valley. There was just no way we could keep that level of quality up to be believable and relatable throughout the entire movie using CG methods."

"We also didn't want to bring tons of witness cameras and head-mounted cameras and all these other kinds of technologies onto set," adds Baillie. "That sent us down the road of AI machine learning-based approaches, which were just becoming viable to use in feature film production."

Test footage of a de-aged Tom Hanks by Metaphysic, which aided in green-lighting the film.

With that in mind, the production devised a test in November 2022 featuring Hanks. The actor (who is now in his late 60s) was filmed performing a planned scene from the film in a mocked-up version of the set at Panavision in Thousand Oaks. "We had a handful of companies, and we did a paid de-ageing test across all these companies," outlines Baillie. "One test was to turn him into a 25-year-old Tom Hanks just to see if the tech could even work. At the same time, we also hired a couple of doubles to come in and redo the performance that he'd done to see if we could use doubles to avoid having to de-age arms and necks and ears."

Metaphysic won, as Baillie describes it, "the Pepsi Challenge on that test. When we saw the results we said, 'Oh my gosh, that looks like Tom Hanks from Big.'"
"What also became clear in that test was that the concept of using doubles to save us some work on arms and hands and neck and ears and things was never going to work. Even though they were acting to Tom Hanks' voice and Tom was there helping to give them direction, it just was clear that it wasn't Tom. It wasn't the soul of the actor that was there. I actually think this will help to make people a little more comfortable with some of the AI tools that are coming out. They just don't work without the soul of the performer behind them. That's why it was key for us to have Tom and Robin and Paul Bettany and Kelly Reilly: they're driving every single one of the character performance moments that you see on screen."

Metaphysic's approach to de-ageing

The key to de-ageing characters played by the likes of Tom Hanks and Robin Wright with machine learning techniques (Metaphysic relies on a bespoke process known as its Neural Performance Toolset) was data. Ostensibly this came from prior performances on film, interviews, family photographs, photos from premieres and press images. "It's based upon photographic reference that goes into these neural models that we train," outlines Metaphysic visual effects supervisor Jo Plaete. "In the case of Tom Hanks, for example, we get a body of movies of him where he appears in those age ranges, and ingest it into our system. We have a toolset that extracts Tom from these movies and then preps and tailors all that, those facial expressions, and all these lighting conditions, and all these poses, into what we call a dataset, which then gets handed over to our visual data scientists."

Hanks de-aged as visualized live on set. The raw camera feed. Final shot.

"I make the analogy," continues Plaete, "that where you used to build the asset through modeling and all these steps, in our machine learning world, the asset build is a visual data science process. It's important to note that, ultimately, the artistic outcome requires an artist to sit down and do that process."
"It's just a different set of tools. It's more about curation of data, how long do you expose it to this neural network, what kind of parameters and at what stage do you dial in? It's like a physics simulation."

Metaphysic's workflow involved honing the neural network to deliver the de-aged characters at various ages. "There is an element of training branches of that network where you start to hone in onto subsections of that dataset to get, say, Tom Hanks at 18, at 30, at 45," says Plaete. "Eventually, also, we had some networks that aged Robin into her 80s, which was a slightly different approach even though it's the same type of network. At the same time, we have our machine learning engineers come in and tweak the architectures of the neural networks themselves."

Such tweaking is necessary owing to, as Plaete calls it, "identity leak. You get Tom's brother or cousin coming out, instead. Everybody knows Tom extremely well, so you want to hit that likeness 100%. So we have that tight collaboration from the machine learning engineers with the visual data scientists and artists to bring them together. They tweak the architecture, they tweak the dataset and the training loops. Together, we review the outputs, and ultimately, at the end of the day, we are striving for the best looking network. But rather than hitting 'render' on a 3D model, we hit 'inference' on a neural net, and that's what comes out and goes into compositing."

On set, Metaphysic carried out a few short data shoots with the actors it would be de-ageing (and up-ageing) in the lighting of the scene. "That just involved capturing the faces of the actors and perhaps having a little bit of extra poses and expression range to help our networks learn how a face behaves and presents itself within that lighting scenario," explains Plaete. "Ultimately, we have a very low footprint."

De-aged, live

The de-ageing effects were not carried out in isolation.
"Bob was very excited to involve the actors in reviewing the de-aged faces," recounts Baillie. "I'd show them, 'Okay, here's you at 25, what do you think?' They had a hand in sculpting what their likeness was like. I remember in particular Robin, when I first showed footage of her de-aged to 25 (and the plate for this was I just sat with her and had a conversation for a couple of minutes and we filmed it on a RED and then went away and a month later I showed her), that it was really emotional for her. She said, 'I've been thinking about how to bring the innocence of my youth back into my eyes, back into my expression, and suffering over that. This helped me do it.' That's all that the AI knows, is that innocence of her youth. For her, I think there was just this moment of realization that this can help her get back there."

Part of Metaphysic's workflow is feature recognition, which detects the actor's physiology in basic outlines. Final shot.

Indeed, this helped drive an effort on set for a preview of the de-ageing to be available to the cast and crew. "That helped us to make sure that the body performance of the actor matched the intended age of the character at that time," says Baillie. "It's very hard to judge that if you're not seeing it. Every time Bob would call cut, Tom would run back around behind the monitors and watch himself and be like, 'Oh, I need to straighten up a little bit more. Oh, I was shuffling a little bit,' or, 'Maybe I was overacting my youth in that one.' It became a tool for the actors that they were able to use, and Bob was able to use it, and even our hair and makeup team and costume design team were all able to use it."

Metaphysic already had an on-set preview system in the works before Here, but ramped it up on the production when Baillie asked the studio if it could be done. "I think a week or two later," recalls Plaete, "we hopped on a Zoom call and we had a test live face swap running into the Zoom to show him an example."
"Kevin said, 'Yeah, we should do it.'"

"The real-time system worked using only the main camera feed, without any special witness cameras or other gear," notes Baillie. "It was just literally one SDI video feed running off of the camera, out a hole in the side of our stage into a little porta-cabin that they'd set up next to the set. That's where all the loud, hot GPUs were sitting, and Metaphysic had a small team of four people that were in that cabin."

"The real-time budget was about 100 or 200 milliseconds," adds Plaete. "We needed to take all our normal steps, optimize them, and hook them together as fast as we could. That was a bit of a hackathon, as you can imagine. But ultimately, it meant training models that were lower resolution. Still, the inference would run fast. I mean, the inference of these models is fast anyway. The high resolution model will still take a second to pop out the frame, which is crazy different from the 25 hours of ray tracing that we come from [in visual effects]."

The team had built a toolset that carried out a computer vision-like identity recognition pass so that the de-ageing could occur on the right actor. "Those recognitions would hand over their results to the face swapper," details Plaete, "which would face swap these optimized models that would come out with this type of square that you sometimes see in our breakdowns, and a mask. That would hand over to a real-time compositing step, which is an optimized version of our proprietary color transfer and detail transfer tools that we run in Nuke for our offline models, but optimized, again, to run superfast on a GPU, and then hooking that all together."
"We'd send back a feed with the young versions."

A monitor displaying Metaphysic's identity detection that went into making sure that each actor's younger real-time likeness was swapped onto them, and only them, during filming.

"We had one monitor on set that was the live feed from the camera and another monitor that was about six frames delayed that was the de-aged actors," outlines Baillie. "When we did playback, we'd just shift the audio by six frames to give us perfectly synchronized lipsync with the young actors. It was really, really remarkable to see that used as a tool. Rita, Tom's wife, walked on the set and was like, 'Oh, my gosh, that's the age he was when we first met.' It was lightweight. It was reliable. It's the most unobtrusive visual effects technology I've ever seen used on set, and it had such an emotional impact at the same time."

Plaete adds that he was surprised to see Zemeckis constantly referring to the monitor displaying the live face swap, rather than the raw feed. "It was the highest praise to see a filmmaker of that level use that tool constantly on the set. The actors themselves, as well, would come back after every take to analyze if the facial performance with the young face would work on the body."

The art of editing a machine learned performance

One challenge in using machine learning tools for any visual effects work has been, thus far, a limited ability to alter the final result. Some ML systems are black boxes. However, in the case of Metaphysic's tools, the studio has purposefully set up a workflow that can be somewhat editable.

Tests of various AI models for capturing Wright's younger likeness. Note the difference in mood between the outputs, which needed to be curated and guided by artists at Metaphysic.

"In addition to compositors," advises Baillie, "they even have animators on their team."
"But instead of working in Maya, they're working in Nuke with controls that allow them to help to make sure that the output from these models is as close to the intent of the performance as possible."

"I call them neural animators," points out Plaete. "They're probably the first of their kind. They edit in Nuke, and it's all in real-time. They see a photoreal output, and as they move things around, it updates in real-time. They love it because they don't have the long feedback loop that they're used to to see the final pixels. The sooner you're in this photoreal world, the sooner you're outside of the Uncanny Valley and the more you can iterate on perfecting it and addressing the things that really matter. I think that's where the toolset is just next level."

Sometimes the trained models will make mistakes, such as mistaking a slightly saggier upper eyelid for some other intention. "Our eyelids as we age tend to droop a little bit, and these models will misinterpret that as a squint," observes Baillie. "Or in lip motion, sometimes there might be an action that happens between frames, especially during fast movements when you say 'P', and here the model will actually do slightly the wrong thing."

"It's not wrong in that it's interpreting the image the best that it can," continues Baillie, "but it's not matching the intent of their performance. What these tools allow Metaphysic to do is go into latent space, like statistical space, and nudge the AI to do something slightly different. It allows them to go back in and fix that squint to be the right thing. With these animation tools, it feels just like the last 2% of tweaking that you would do on a CG face replacement, but you're getting to 98% 10 times as fast."

"You can compare it with blend shapes where you have sliders that move topology," says Plaete. "These sliders, they nudge the neural network to nudge the output as a layer on top of the performance that is already coming through from the input. You can nudge an eyeline, for example."
"Bob likes his characters to flirt with the camera but obviously not look down the barrel. These networks tend to magnetically do that. Eyeline notes would be something that we get when we present the first version to Kevin and Bob, and they'd say, 'Okay, maybe let's adjust that a little bit.'"

Dealing with edge cases

Another challenge with de-ageing and machine learning in the past has been when the actor is not fully visible to camera, or turns their head, or where there is significant motion blur in the frame. All these things had to be dealt with by Metaphysic.

Paul Bettany, de-aged.

"We knew that that was going to be an issue," states Baillie, "so we lensed the movie knowing that we were going to have a 10% pad around the edges of the film. That meant the AI would have a chance to lock onto features of an actor if they're coming in from off-screen, so that we weren't going to have issues from exiting camera or coming onto camera."

A kiss between two de-aged characters proved tricky, in that regard. The solution here was to paint out the foreground actor's face, do the face swap onto the roughly reconstructed actor, and then place the foreground person back on top. Or, when an actor turned away from camera to, say, a three-quarter view, this meant that there would be fewer or no features to lock onto. "What the team had to do in that scenario was track a rough 3D head of the actor onto the actor and project the last good frame of the face swap onto it and do a transition and let that 3D head carry around the face in an area where the AI itself wouldn't have been able to succeed," outlines Baillie. "All these limitations of the AI tools, they need traditional visual effects teams who know how to do this stuff to backstop them and help them succeed."

To tackle some of these edge cases, Metaphysic built a suite of tools they call "dataset augmentation."
"You find holes in your dataset and you fill them in by a set of other machine learning based approaches that synthesize parts of the dataset that might be missing or that are missing," discusses Plaete. "We also trained identity-specific enhancement models. That's another set of neural networks that we can run in Nuke and the compositors have access to that. That's basically specific networks that can operate on frames that are coming out impaired or soft and restore those in place for compositors to have extra elements that are still identity-specific."

All of Metaphysic's tools are exposed in Nuke, giving their compositors ways of massaging the performance. "They can run the face swap straight in Nuke via a machine learning server connection, and they can run these enhancement models," explains Plaete. "They have these meshes that get generated where they can do 2.5D tricks or sometimes they might fall back onto plate for a frame where it's possible. There's some amazing artistry on the compositing side."

Ageing upwards

Most of Metaphysic's work on the film related to de-ageing, but some ageing of Robin Wright's character with machine learning techniques did occur (from a practical effects point of view, makeup and hair designer Jenny Shircore crafted a number of makeup effects and prosthetics approaches for aged and de-aged characters).

Here, Wright appears in old-age makeup, which is then compared with synthesized images of her at her older age, which were used to improve the makeup using similar methods to the de-ageing done in the rest of the film.

For the machine learning approach, a couple of older actors of the right target age, with a similar facial structure to Wright, were cast. Metaphysic then shot an extensive dataset of them to provide for skin textures and movement of the target age. "We would mix that in with the oldest layer of data of Robin," states Plaete, "which we would also synthetically age up by means of some other networks."
"We'd mold our Robin dataset to make it look older, but to keep the realism, we'd then fuse in some people actually at that age. Ultimately, this was run on top of a set of prosthetics that had already been applied by the makeup department."

Plaete stresses that collaboration with hair and makeup on the film was incredibly tight, and important. "You want the makeup that they apply to be communicating a time or a look that would settle that scene in a certain place within the movie's really extensive timeline. We had to be careful with our face swap technology that is trained on a certain look from the data, from the archival, from the movies, that we wouldn't just wash away all the amazing work from the makeup department. We worked really closely together to introduce these looks into our datasets and make sure that that stylization came out as well."

Read the full issue on Here, which also goes in-depth on the virtual production side of the film.

All images © 2024 CTMG, Inc. All Rights Reserved.

The post Here: A test with Tom Hanks appeared first on befores & afters.
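A worked aside on the six-frame playback offset Baillie mentions: the short Python sketch below converts a delay counted in frames into milliseconds and into an audio sample shift. The article does not state the shooting frame rate or the audio sample rate, so the 24 fps and 48 kHz figures here are purely illustrative assumptions, and the function names are hypothetical. It is only meant to show the arithmetic behind "shift the audio by six frames", not how the production's playback system was actually built.

```python
from fractions import Fraction


def frames_to_ms(frames: int, fps: Fraction) -> float:
    """Convert a delay measured in video frames to milliseconds."""
    return float(Fraction(frames) / fps * 1000)


def audio_offset_samples(frames: int, fps: Fraction, sample_rate: int = 48_000) -> int:
    """Number of audio samples to delay so sound lines up with the delayed picture."""
    return round(Fraction(frames) / fps * sample_rate)


if __name__ == "__main__":
    fps = Fraction(24)   # ASSUMPTION: frame rate is not given in the article
    delay_frames = 6     # from the article: the de-aged monitor ran about six frames behind

    print(frames_to_ms(delay_frames, fps))          # 250.0
    print(audio_offset_samples(delay_frames, fps))  # 12000
```

Under those assumed rates, six frames works out to a quarter of a second of picture delay, which the audio would need to match on playback for lipsync to hold.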