
Open-Source AI Is Increasingly Popular But Not Risk-Free
Jessica Hill, Principal Legal Counsel, Product and Privacy, New Relic
February 28, 2025
Open-source AI projects are exploding in popularity and are contributing to PwC's estimated $15.7 trillion impact AI will have on the global economy by 2030. Yet some enterprises have hesitated to fully embrace AI: in 2023, VentureBeat found that while more than 70% of companies were experimenting with AI, only 20% were willing and able to invest more.

Open-source tooling offers enterprises cost-effective, accessible AI with benefits including customization, transparency, and platform independence. But it also carries potentially hefty costs for the unprepared. As enterprises expand their AI experimentation, managing these risks becomes critical.

Risk #1: Training data

Many AI tools rely on vast stores of training data to develop models and generate outputs. For example, OpenAI's GPT-3.5 was reportedly trained on 570 gigabytes of online text data, roughly 300 billion words. More advanced models require even larger and often less transparent datasets.

Some open-source AI tools are released without dataset disclosures, or with overwhelming ones, limiting useful model evaluation and posing potential risks. For example, a code-generation AI tool could be trained on proprietary, licensed datasets without permission, leading to unlicensed output and potential liability.

Open-source AI tools using open datasets still face challenges, such as evaluating data quality to ensure a dataset hasn't been corrupted, is regularly maintained, and includes data suited to the tool's intended purpose. Regardless of the data's origins, enterprises should carefully review training data sources and tailor future datasets to the use case where possible.

Risk #2: Licensing

Proper data, model, and output licensing presents complicated issues for AI proliferation. The open-source community has been debating whether traditional open-source software licenses are suitable for AI models. Current licensing ranges from freely open to partial use restrictions, but unclear criteria for qualifying as open source can lead to licensing confusion. The licensing question can also trickle downstream: if a model produces output from a source with a viral license, you may need to adhere to that license's requirements.

With models and datasets evolving constantly, evaluate every AI tool's licensing against your chosen use case. Legal teams should help you understand limitations, restrictions, and other requirements, like attribution or a flow-down of terms. A first pass over your asset inventory can even be automated, as sketched below.
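As a minimal sketch of that first-pass triage, the Python below flags assets whose licenses are unknown or carry copyleft terms. The inventory, the SPDX-style identifiers, and the obligation table are illustrative assumptions, not legal advice; anything flagged should go to counsel.

    # Minimal sketch: first-pass triage of AI model/dataset licenses.
    # The obligation table is illustrative, not a legal determination.
    OBLIGATIONS = {
        "apache-2.0":   {"attribution": True, "copyleft": False},
        "mit":          {"attribution": True, "copyleft": False},
        "gpl-3.0":      {"attribution": True, "copyleft": True},
        "agpl-3.0":     {"attribution": True, "copyleft": True},
        "cc-by-sa-4.0": {"attribution": True, "copyleft": True},
    }

    def triage(inventory: dict[str, str]) -> list[str]:
        """Flag assets whose licenses are unknown or carry viral terms."""
        flags = []
        for asset, license_id in inventory.items():
            info = OBLIGATIONS.get(license_id.lower())
            if info is None:
                flags.append(f"{asset}: unknown license '{license_id}' -- escalate to legal")
            elif info["copyleft"]:
                flags.append(f"{asset}: copyleft ({license_id}) -- check flow-down of terms")
        return flags

    if __name__ == "__main__":
        # Hypothetical inventory for demonstration only.
        for flag in triage({"code-model-v2": "agpl-3.0", "eval-dataset": "custom-eula"}):
            print(flag)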
Risk #3: Privacy

As global AI regulations emerge and discussions swirl around the misuse of open-source models, companies should assess regulatory and privacy concerns across their AI tech stacks. At this stage, be comprehensive in your risk assessments. Ask AI vendors targeted questions, such as:

- Does the tool use de-identification to remove personally identifiable information (PII), especially from training datasets and outputs?
- Where are training and fine-tuning data stored, copied, and processed?
- How does the vendor review and test for accuracy and bias, and on what cadence?
- Is there a way to opt in or out of data collection?

Where possible, implement explainability and human-review processes. Build trust in, and the business value of, the AI by understanding the model and datasets well enough to explain why the AI returned a given output. De-identification itself can start simply, as in the sketch below.
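A minimal sketch of regex-based redaction of obvious PII (email addresses, US-style phone numbers) before text enters a training or fine-tuning dataset. The patterns are assumptions for illustration; production de-identification needs far more coverage (names, addresses, identifiers) plus testing and audits.

    # Minimal sketch: redact obvious PII spans with typed placeholders.
    # Illustrative patterns only; not a complete de-identification solution.
    import re

    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    }

    def redact(text: str) -> str:
        """Replace each matched PII span with a [LABEL] placeholder."""
        for label, pattern in PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    print(redact("Contact Jane at jane.doe@example.com or 555-867-5309."))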
Risk #4: Security

Open-source software's security benefits can simultaneously pose security risks. Many open-source models can be deployed in your own environment, giving you the benefit of your security controls. However, open-source models can also expose the unsuspecting to new threats, including manipulation of outputs and harmful content introduced by bad actors.

AI tech startups offering tools built on open models can lack adequate cybersecurity, security teams, or secure development and maintenance practices. Organizations evaluating these vendors should ask targeted questions, such as:

- Does the open project address cybersecurity issues?
- Do the developers involved in the project demonstrate secure practices, like those outlined by OWASP?
- Have vulnerabilities and bugs been promptly remediated by the community?

Enterprises experimenting with AI tooling should continue following internal policies, processes, standards, and legal requirements. Consider security best practices like these, one of which is sketched below:

- Keep the tool's source code subject to vulnerability scanning.
- Enable branch protection for AI integrations.
- Encrypt interconnections in transit and databases at rest.
- Establish boundary protection for the architecture and use cases.

A strong security posture will serve enterprises well in their AI explorations.
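One concrete supply-chain control in this spirit (a checksum check, which the article does not name explicitly) is verifying a downloaded open-source model artifact against a digest published by the project before loading it. The file name and expected digest below are placeholders.

    # Minimal sketch: verify a model artifact's SHA-256 before use.
    # "model-weights.bin" and EXPECTED are placeholders, not a real release.
    import hashlib

    def sha256_of(path: str) -> str:
        """Stream the file in chunks and return its hex SHA-256 digest."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        return digest.hexdigest()

    EXPECTED = "0000...placeholder...0000"  # from the project's release notes

    if sha256_of("model-weights.bin") != EXPECTED:
        raise RuntimeError("Checksum mismatch: do not load this artifact.")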
Risk #5: Integration and performance

Integration and performance of AI tooling matter for both internal and external use cases. Integration can affect many internal elements, like data pipelines, other models, and analytics tools, increasing risk exposure and hampering product performance. Tools can also introduce new dependencies upon integration, such as open-source vector databases supporting model functionality. Consider how those elements affect your integration and use cases, and determine what additional adjustments are needed.

After integration, monitor the AI's impact on system performance, as in the sketch below. AI vendors may not offer a performance warranty, leaving your organization to handle further development if open-source AI does not meet expectations. The costs of maintaining and scaling AI functions, including data cleaning and subject-matter-expert time, climb quickly.
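A minimal sketch of latency monitoring around an integrated model, assuming a stand-in run_model function in place of whatever inference call your integration actually makes; real deployments would feed these samples into existing observability tooling.

    # Minimal sketch: record inference latency so regressions surface early.
    import statistics
    import time

    def run_model(prompt: str) -> str:
        """Placeholder inference call standing in for the real integration."""
        time.sleep(0.01)
        return f"echo: {prompt}"

    def timed(prompt: str, samples: list[float]) -> str:
        """Run the model once and append the elapsed seconds to samples."""
        start = time.perf_counter()
        result = run_model(prompt)
        samples.append(time.perf_counter() - start)
        return result

    latencies: list[float] = []
    for p in ["summarize this log", "classify this ticket"]:
        timed(p, latencies)
    print(f"p50 latency: {statistics.median(latencies) * 1000:.1f} ms")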
Know Before You Go Open Source

Open-source AI tooling offers enterprises an accessible and affordable way to accelerate innovation. Still, successful implementation requires scrutiny and a proactive compliance and security posture. An intentional strategy for evaluating the hidden costs and considerations of open-source AI will help ensure its ethical and intelligent use.

About the Author

Jessica Hill is an experienced product and privacy attorney adept at navigating the intersection of law and technology. She has been at New Relic for over three years, where she specializes in cross-functional initiatives to drive compliance and innovation.