The issue of training data: with Grant Farhall, Getty Images
www.fxguide.com
As Chief Product Officer, Grant Farhall is responsible for Getty Images' overall product strategy and vision. We sat down with Grant to discuss the issue of training data, rights, and Getty Images' strong approach.

Training data

Artificial Intelligence, specifically the subset of generative AI, has captured the imagination and attention of all aspects of media and entertainment. Recent rapid advances seem to humanize AI in a way that has caught the imagination of so many people. It has been born from the nexus of new machine learning approaches, foundational models, and, in particular, advanced GPU-accelerated computing, all combined with impressive advances in neural networks and data science.

One aspect of generative AI that is often too quickly passed over is the nature and quality of training data. It can sometimes be wrongly assumed that, in every instance, more data is good: any data, just more of it. Actually, there is a real skill in curating training data.

Owning your own

Generative AI is not limited to just large corporations or research labs. It is possible to build on a foundation model and customize it for your application without having your own massive AI factory or data centre.

It is also possible to create a generative AI model that works only on your own training data. Getty Images does exactly this with its iStock, Getty Creative Images, and API stock libraries. These models are trained on only the high-quality images approved for use, using NVIDIA's Edify NIM built on Picasso.

"NVIDIA developed the underlying architecture. Getty's model is not a fine-tuned version of a foundational model. It is only trained on our content, so it is a foundational model in and of itself," says Grant Farhall of Getty Images.

Getty produces a mix of women when prompted with "Woman CEO, closeup."

Bias

People sometimes speak of biases in training data, and this is a real issue, but data scientists also know that carefully curating training data is an important skill.
This is not an issue of manipulating data but rather of providing the right balance in the training data to produce the most accurate results. Part of the curation process is getting enough data of the types needed, often with metadata that helps the deep learning algorithms.

What matters in particular is the nature of the data that already exists in the world, and the qualities of that data that can be used to make the most effective generative AI tool. At first glance, one might assume you just want the greatest amount of ground truth or perfect examples possible, but that is not how things actually work in practice.

It is also key that the output responses to prompts provide a fair and equitable mix, especially when dealing with people. Stereotypes can be reinforced without attention to output bias.

Provenance

It is important to know if the data used to build the generative AI model was licensed and approved for this use. Many early academic research efforts scraped the Internet for data since their work was non-commercial and experimental. We have since come a long way in understanding, respecting, and protecting the rights of artists and people in general, and we have to protect their work from being used without permission. As you can hear in this episode of the podcast, companies such as Getty Images pride themselves on having clean and ethically sourced generative AI models that are free from compromise and artist exploitation. In fact, they offer not only compensation for artists whose work is used as training data but also guarantees and, in some cases, indemnification against any possible future issues over artists' rights.

"The question that is often asked is, 'Can I use these images from your AI generator in a commercial way, in a commercial setting?' Most services will say yes," says Grant Farhall of Getty Images. "The better question is, 'Can I use these images commercially, and what level of legal protection are you offering me if I do?'"
As Getty knows the provenance of every image used to train their model, their corporate customers enjoy fully uncapped legal indemnification.

Furthermore, misuse is impossible if the content is not in the training set. Farhall points out, "There are no pictures of Taylor Swift, Travis Kelce, athletes, musicians, logos, brands, or any similar stuff. None of that's included in the training set, so it can't be inappropriately generated."

AI Generator image

Rights & Copyright

For centuries, artists have consciously or subconsciously drawn inspiration from one another to influence their work. However, with the rise of generative AI, it is crucial to respect the rights associated with the use of creative materials.

Copyright is a common concern and an important area, but it remains open to further clarification and interpretation as governments around the world respond to this new technology. As it stands, only a person can own copyright; a non-human cannot. It is unclear, worldwide, how open the law is to training on material without explicit permission, since generative AI models do not store a copy of the original.

However, it is illegal in most contexts to pass off any material in a way that misrepresents it, such as implying or stating that the work was created by someone who did not create it. It is also illegal to use the likeness of someone to sell or promote something without their permission, regardless of how that image was created. The laws in each country or territory need to be clarified, but, as a rule of thumb, generative AI should be restricted by an extension of existing laws such as defamation, exploitation, and privacy rights.
These laws can come into play if the AI-generated content is harmful or infringes on someone's rights.

In addition, there are ongoing discussions about the need for new laws or regulations specifically addressing the unique issues raised by AI, such as the question of who can be held responsible for violations using AI-generated content. It is important to note that just because a generative piece of art or music is stated as being approved for commercial use, that does not imply that the training data used to build the model was licensed or that all contributing artists were respected appropriately.

Generative AI

This fxpodcast is not sponsored, but is based on research done for the new Field Guide to Generative AI. fxguide's Mike Seymour was commissioned by NVIDIA to unpack the impact of generative AI on the media and entertainment industries, offering practical applications, ethical considerations, and a roadmap for the future.

The Field Guide is free and can be downloaded here: Field Guide to Generative AI.

In M&E, generative AI has proven itself a powerful tool for boosting productivity and creative exploration. But it is not a magic button that does everything. It's a companion, not a replacement. AI lacks the empathy, cultural intuition, and nuanced understanding of a story's uniqueness that only humans bring to the table. But when generative AI is paired with VFX artists and TDs, it can accelerate pipelines and unlock new creative opportunities.