Basic Media Analysis – Part 3 (Text & Metadata)

Metadata

“a set of data that describes and gives information about other data.”

Too often, people overlook metadata when considering how data can be organised and given context. Part 1 and part 2 of this series covered some tips for producing metadata for other forms of media (audio and visual). This final part is about text and metadata. It ends with some tips about processes that can now apply to all forms of media to produce structured data, since the information extracted from the various media formats via these tips will be formatted in a manner that can be processed in a similar way.

Whether the files are documents, spreadsheets, PDFs or other types of ‘document’ files, the metadata can sometimes provide more important information than the content of the file itself. When undertaking data recovery, the file extension is sometimes incorrect, and the way to figure out what sort of file it is, is via metadata embedded in the file, such as the signature bytes at its start. Forensicswiki maintains a list of tools for documents, and a broader range of tools can be found via a Google search.
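As a rough illustration of identifying a file when the extension can’t be trusted, here is a minimal Python sketch that checks the first few bytes of a file against a handful of well-known signatures. The signature list is deliberately tiny, and the function names (sniff_type, sniff_file) and labels are my own for illustration; a dedicated forensic tool will cover far more formats.

```python
# A few well-known leading byte signatures ("magic numbers").
# This list is intentionally short; it is a sketch, not a full detector.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also used by docx, xlsx, odt and similar containers
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy doc, xls, ppt
    (b"{\\rtf", "rtf"),
]


def sniff_type(header: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first bytes of a file and guess its type, ignoring the extension."""
    with open(path, "rb") as f:
        return sniff_type(f.read(16))
```

Calling sniff_file on a recovered file returns a label based on its content rather than its name. Note that a "zip" result only says the file is a ZIP container; telling a .docx from a .xlsx would need a look inside the archive.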

The next step may be to produce a sentiment analysis of the document using a tool such as DepecheMood. Now, if you’ve been able to follow the processes provided in this series for dealing with audio, visual and text documents, you should have a basic, yet reasonably rich, set of metadata associated with the content in your project. This can be added to a database or, to make the data far more broadly useful, it’s time to make it accessible as Linked Data.
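To make those last two steps a little more concrete, here is a minimal Python sketch that scores a piece of text against a word-to-emotion lexicon and wraps the result as a JSON-LD record. Everything specific in it is an assumption for illustration: the three-word LEXICON is invented and is not DepecheMood’s data or file format, the ex: vocabulary at example.org is a placeholder, and the function names are my own. Only the schema.org terms (DigitalDocument, name, encodingFormat) are real vocabulary.

```python
import json
import re

# Hypothetical lexicon: word -> emotion weights. Invented for this sketch;
# a real lexicon would be loaded from its published data files.
LEXICON = {
    "happy": {"joy": 0.9, "sadness": 0.1},
    "loss": {"joy": 0.1, "sadness": 0.8},
    "storm": {"joy": 0.2, "sadness": 0.4},
}


def emotion_scores(text):
    """Average the emotion weights of every lexicon word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        return {}
    emotions = sorted({e for h in hits for e in h})
    return {
        e: round(sum(h.get(e, 0.0) for h in hits) / len(hits), 3)
        for e in emotions
    }


def to_jsonld(name, media_type, scores):
    """Build a JSON-LD record; the 'ex:' terms are placeholder vocabulary."""
    return {
        "@context": {
            "schema": "https://schema.org/",
            "ex": "http://example.org/vocab#",
        },
        "@type": "schema:DigitalDocument",
        "schema:name": name,
        "schema:encodingFormat": media_type,
        "ex:emotionScores": {f"ex:{k}": v for k, v in scores.items()},
    }


if __name__ == "__main__":
    scores = emotion_scores("Happy after the storm")
    print(json.dumps(to_jsonld("report.pdf", "application/pdf", scores), indent=2))
```

The same record shape could hold the metadata gathered from audio and visual files in the earlier parts, which is the point of moving to a shared, linked format.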