Unstructured Data Management Tips
John Edwards, Technology Journalist & Author
May 26, 2025
5 Min Read

Structured data, such as names and phone numbers, fits neatly into rows and columns. Unstructured data, however, has no fixed schema and may come in a highly complex format, such as audio files or web pages.

Unfortunately, there's no single best way to manage unstructured data effectively. On the bright side, several approaches can be used to tackle this critical yet persistently elusive challenge. Here are five tested ways to achieve effective unstructured data management, from experts who participated in online interviews.

Tip 1. Use AI-powered vector databases combined with retrieval-augmented generation

"One of the most effective methods I've seen is using AI-powered vector databases combined with retrieval-augmented generation," says Anbang Xu, founder of AI video generator firm Jogg.AI.

A former senior software engineer at Google, Xu suggests that instead of forcing unstructured data into rigid schemas, enterprises can use vector databases to store and retrieve data based on contextual meaning rather than exact keyword matches. "This is especially powerful for text, audio, video, and image data, where traditional search methods fall short," he notes.

For example, Xu says, organizations using AI-powered embeddings can organize and query vast amounts of unstructured data by meaning rather than syntax. "This is what powers advanced AI applications like intelligent search, chatbots, and recommendation systems," he explains. "At Jogg.AI, we’ve seen first-hand how AI-driven indexing and retrieval make it significantly easier to turn raw, unstructured data into actionable insights."

Tip 2. Take a schema-on-read approach

Another innovative approach to managing unstructured data is schema-on-read.
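
As a rough illustration of the idea, the Python sketch below keeps raw events exactly as they arrived and applies a structure only when the data is read. It is a minimal sketch under stated assumptions, not how any particular product implements schema-on-read: the sample events, the `read_with_schema` helper, and the two "view" definitions are all hypothetical names invented for this example.

```python
import json

# Hypothetical raw telemetry, stored untouched: no schema was enforced on write.
RAW_EVENTS = [
    '{"ts": "2025-05-26T10:00:00Z", "service": "auth", "level": "ERROR", "msg": "login failed"}',
    '{"ts": "2025-05-26T10:00:05Z", "host": "web-1", "latency_ms": 182}',
    'not json at all: disk check ok',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time (hypothetical helper).

    `schema` maps a field name to a (caster, default) pair. Lines that are
    not JSON objects are treated as free text and only contribute defaults.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = {"raw": line}
        if not isinstance(record, dict):
            record = {"raw": line}
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            row[field] = cast(value) if value is not None else None
        yield row


# Two different "schemas" over the same stored data, chosen by the reader.
ERROR_VIEW = {"ts": (str, None), "service": (str, "unknown"), "level": (str, "INFO")}
LATENCY_VIEW = {"host": (str, "unknown"), "latency_ms": (int, 0)}

error_rows = list(read_with_schema(RAW_EVENTS, ERROR_VIEW))
latency_rows = list(read_with_schema(RAW_EVENTS, LATENCY_VIEW))
```

The same stored lines answer two different questions without being reloaded or transformed in advance; only the schema passed to the reader changes.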
"Unlike traditional databases, which define the schema -- the data's structure -- before it's stored, schema-on-read defers this process until the data is actually read or queried," says Kamal Hathi, senior vice president and general manager of machine-generated data monitoring and analysis software firm at Splunk, a Cisco company. This approach is particularly effective for unstructured and semi-structured data, where the schema is not predefined or rigid, Hathi says. "Traditional databases require a predefined schema, which makes working with unstructured data challenging and less flexible." The key advantage of schema-on-read is that it enables users to work with raw data without needing to apply traditional extract-transform-loadprocesses, Hathi states. "This, in turn, allows for working with the diversity typically seen in machine-generated data, such as system and application telemetry logs." Related:Tip 3. Look to the cloud Manage unstructured data by integrating it with structured data in a cloud environment using metadata tagging and AI-driven classifications, suggests Cam Ogden, a senior vice president at data integrity firm Precisely. "Traditionally, structured data -- like customer databases or financial records -- reside in well-organized systems such as relational databases or data warehouses," he says. However, to fully leverage all of their data, organizations need to break down the silos that separate structured data from other forms of data, including unstructured data such as text, images, or log files. This is where the cloud comes into play. Integrating structured and unstructured data in the cloud allows for more comprehensive analytics, enabling organizations to extract deeper insights from previously siloed information, Ogden says. AI-powered tools can classify and enrich both structured and unstructured data, making it easier to discover, analyze, and govern in a central platform, he notes. 
"The cloud offers the scalability and flexibility required to handle large volumes of data while supporting dynamic analytics workloads." Additionally, cloud platforms offer advanced data governance capabilities, ensuring that both structured and unstructured data remain secure, compliant, and aligned with business objectives. "This approach not only optimizes data management but also positions organizations to make more informed and effective data-driven decisions in real-time." Related:Tip 4. Use AI-powered classification and indexing One of the best ways to get a grip on unstructured data is to use AI-powered classification and indexing, says Adhiran Thirmal, a senior solutions engineer at cybersecurity firm Security Compass. "With machine learningand natural language processing, you can automatically sort, tag, and organize data based on its content and context," he explains. "Pairing this approach with a scalable data storage system, like a data lake or object storage, makes it easier to find and use information when you need it." AI takes the manual work out of organizing data, Thirmal says. "No more wasting time digging through files or struggling to keep things in order," he states. "AI can quickly surface the information you need, reducing human error and improving efficiency. It's also excellent for compliance, ensuring sensitive data -- like personal or financial information -- is properly handled and protected." Tip 5. Create a unified, sovereign data platform An innovative approach to managing unstructured data goes beyond outdated data lake methods, says Benjamin Anderson, senior vice president of technology at database services provider EnterpriseDB. A unified, sovereign data platform integrates unstructured, semi-structured, and structured data in a single system, eliminating the need for separate solutions. "This approach delivers quality-of-service features previously available only for structured data," he explains. 
"With a hybrid control plane, organizations can centrally manage their data across multiple environments, including various cloud platforms and on-premises infrastructure." When it comes to managing diverse forms of data, whether structured, unstructured, or semi-structured, the traditional approach required multiple databases and storage solutions, adding operational complexity, cost, and compliance risk, Anderson notes. "Consolidating structured and unstructured data into a single multi-model data platform will help accelerate transactional, analytical, and AI workloads." About the AuthorJohn EdwardsTechnology Journalist & AuthorJohn Edwards is a veteran business technology journalist. His work has appeared in The New York Times, The Washington Post, and numerous business and technology publications, including Computerworld, CFO Magazine, IBM Data Management Magazine, RFID Journal, and Electronic Design. He has also written columns for The Economist's Business Intelligence Unit and PricewaterhouseCoopers' Communications Direct. John has authored several books on business technology topics. His work began appearing online as early as 1983. Throughout the 1980s and 90s, he wrote daily news and feature articles for both the CompuServe and Prodigy online services. His "Behind the Screens" commentaries made him the world's first known professional blogger.See more from John EdwardsWebinarsMore WebinarsReportsMore ReportsNever Miss a Beat: Get a snapshot of the issues affecting the IT industry straight to your inbox.SIGN-UPYou May Also Like