Get Control Over Unstructured Data
www.informationweek.com
Worldwide, data is generated at a daily rate of 402.7 million terabytes, and roughly 80% of the data coming into enterprises is unstructured. By unstructured, we mean data that is not organized into records of recognizable, parsable length with established keys into the data.

Instead, unstructured data could come in the form of monolithic video or audio recordings, photos, CAD drawings, emails, hard-copy documents, X-rays and MRIs, social media posts, or even the gibberish from telecommunications and network device handshakes and exchanges.

Enterprises struggle to get on top of this data, or even to use it at all. This prompted Splunk to report that "More than 1,300 business and IT leaders in seven leading economies have spoken: They struggle to find all their data -- and report that more than half of it is dark -- untapped and, often, completely unknown. And, while they know AI will be transformative, they're not sure when and how."

These points are well taken, because if you want to excel in AI, the AI needs to be able to mine all of the available data, not just 20% of it. To do this, enterprises must get a handle on their unstructured data.

How do you do this? By sorting through the data, deciding which parts of it are good, and then organizing the good data so it can be used in systemic processes like AI.

The catch for IT is defining an approach that can carry out these steps. How do you sort, classify and organize data that is coming into the company at such fierce velocities?

Step 1: Analyze your unstructured data.
Where is your unstructured data coming from, and in what form? How much storage is the data consuming, and what is the cost? Where is the data stored, and who is using it? Who owns the data? How old is the data?

All are top-level questions that should be answered for every type of unstructured data that you have in your company.

Step 2: Identify data silos.
Some of the unstructured data is likely to be owned by specific user departments and may be on separate systems. If the data is exclusively contained within a specific user department, it is considered a data silo that cannot be leveraged by other departments in the company because those departments don't have access to the data. As a result, the data in these silos may go unused in a variety of business processes that could benefit from it. Siloed data also creates risk when different departments use disparate data and come to discordant business decisions.

The primary goal in step 2 is identifying data silos, along with identifying the types of unstructured data that reside in those silos.

Step 3: Revisit data retention.
How much of this unstructured data doesn't add value, including network handshake noise, or data that is so old or obsolete that no one has used it for years?

With IT offering guidance, central data storage and systems in the data center, in user departments and in the cloud should be reviewed to determine which data can be jettisoned because it isn't useful. Internal and cloud data retention policies should be reviewed by IT and end users so there is an agreed-to understanding on which types of unstructured data are to be retained and for how long.

Some of this data may be non-electronic, such as a hard-copy company product catalogue that has been stored in a backroom closet since the 1980s.

Finally, financial insight should be incorporated into the data housekeeping effort. How much facility and disk space are you freeing up by getting rid of useless data, and what are the annual savings?

Step 4: Classify and organize data.
Once you have eliminated unnecessary unstructured data, it's time to classify and organize the data that remains. This task can be labor intensive because so much data classification must be done by hand, with knowledgeable users applying data tags to data objects.
For example, that may require tagging all unstructured data artifacts with a product label because they consist of CAD, CAM, photo and video documents of company products.

Data tags are the only way to define and navigate through unstructured data objects so people can find what they're looking for. Unfortunately, data tagging is time-consuming and frustrating when the number of unstructured data objects is huge. These data tags should also be standardized and agreed to across the organization so data retrieval is simplified.

Although most organizations can't get around hand-tagging data, we are beginning to see automated data tagging software come to market that can do the tagging if it is given a set of business rules. There will also be future support from AI-powered tools that can learn how to evaluate and classify unstructured data objects.

Step 5: Enrich data.
Let's say that Company ABC wants a bid for a power plant. Much of the data for preparing the bid comes in forms such as schematics, PDF files, hard copy and email correspondence. This unstructured data, along with traditional structured data, needs to be cleaned, formatted and normalized so it can interact with other types of data in a single data repository that supports decision making during the bid process.

There is also a need to import outside data from the cloud and third parties on elements like logistics and weather conditions in the project locale, as well as local regulatory and zoning requirements.

ETL (extract-transform-load) tools can automate much of the data cleaning and formatting, but they still require IT to write the business rules for data transformation. Plus, the unstructured data being funneled into the data repository must be pre-classified and tagged by end users.

The goal of step 5 is to enrich data so that it can interact with all the other types of data to produce a complete picture of a customer, a product, a situation, etc.
This helps business decision makers as they think through strategy, tactics, schedules, pricing, and so on.

Closing Remarks
Realistically, few companies will succeed at harnessing 100% of the unstructured data that streams into them every day, but they can begin to get a handle on unstructured data by identifying where the data is coming from, where it will end up being hosted, what it is, and when it can be discarded.

A highly doable follow-up step is silo busting, along with the beginning of a corporate-wide data repository that contains both structured and unstructured data.

The ultimate goal of developing highly enriched data that delivers optimal business value might have to wait until automated data classification and AI technologies mature, but there's a lot that IT can do right now to be ready for that time.
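
As one way to start on the questions in steps 1 and 3 (how much storage each kind of data consumes, how old it is, and what could be saved by discarding it), the sketch below walks a file share and summarizes it by file extension. It is only an illustration under stated assumptions: the function names `inventory`, `annual_savings` and `report` are hypothetical, the five-year staleness threshold is an arbitrary default, and the storage cost per terabyte is a figure you would have to supply yourself. It uses last-modified time as a rough proxy for "no one has used it for years" and does not answer who owns or uses the data.

```python
import os
import time
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def inventory(root, stale_after_days=5 * 365, now=None):
    """Summarize files under `root` by extension: count, bytes, stale bytes.

    A file counts as stale when its last-modified time is older than
    `stale_after_days`. The five-year default is an assumption for
    illustration, not a retention recommendation.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * SECONDS_PER_DAY
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "stale_bytes": 0})

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip entries that vanished or cannot be read
            ext = os.path.splitext(name)[1].lower() or "(no extension)"
            row = summary[ext]
            row["files"] += 1
            row["bytes"] += info.st_size
            if info.st_mtime < cutoff:
                row["stale_bytes"] += info.st_size
    return dict(summary)


def annual_savings(stale_bytes, cost_per_tb_year):
    """Estimate the yearly cost of keeping the stale data (decimal terabytes)."""
    return stale_bytes / 1e12 * cost_per_tb_year


def report(summary, cost_per_tb_year):
    """Return one formatted line per file type, largest consumers first."""
    ordered = sorted(summary.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    lines = []
    for ext, row in ordered:
        saving = annual_savings(row["stale_bytes"], cost_per_tb_year)
        lines.append(
            f"{ext:>16}  files={row['files']}  bytes={row['bytes']}  "
            f"stale_bytes={row['stale_bytes']}  est_saving_per_year={saving:.2f}"
        )
    return lines
```

Calling `report(inventory("/srv/shared"), cost_per_tb_year=250.0)` (both the path and the cost are placeholders) would give IT and end users a first list of candidates to discuss against the retention policy. The output is a starting point for that conversation, not a deletion list.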