Image Recognition Video Playlist
Basic Media Analysis – Part 3 (Text & Metadata)
Metadata
“a set of data that describes and gives information about other data.”
Basic Media Analysis – Part 2 (Visual)
This is a basic introduction, which will be followed up in later posts.
Essentially, the analysis of video is for the most part very similar to the analysis of images, as video is a stream of images.
The first element of analysing images or video is that the files themselves contain an array of metadata, written as part of the file creation process (depending on how it’s done). The ‘metadata’ contained within images can include:
- Time/Date it was taken
- Where it was taken (IE: GPS coordinates)
- Information about the device it was taken with
- Copyright information / information about who took the image
- Other embedded image metadata
These elements of data don’t rely upon the image or video being of good quality. It’s simply data created as part of creating the file in the first place. MetaPicz is one example of an online service that displays this information. The next process is to analyse the content depicted in the image itself…
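As a minimal sketch of working with one item from the list above: embedded image metadata commonly stores GPS coordinates as degrees, minutes and seconds plus a hemisphere letter, whereas mapping tools generally want signed decimal degrees. The function name and the sample values below are hypothetical, and it assumes the values have already been read out of the file by whatever metadata tool you use.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to signed decimal degrees."""
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere reference: {ref!r}")
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in the signed decimal convention.
    return -decimal if ref in ("S", "W") else decimal


# Hypothetical values, as they might appear in an image's GPS metadata.
latitude = dms_to_decimal(37, 48, 49.0, "S")
longitude = dms_to_decimal(144, 57, 47.0, "E")
print(round(latitude, 5), round(longitude, 5))
```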
The first issue with analysing vision is ensuring the images are of sufficient quality to identify, analyse and process the information available in the imagery. Where it’s required, correcting the vision is a useful first step. Beyond the usual processes, such as adjusting contrast, brightness and other standardised image processing methods, ‘super-resolution’ processes are increasingly becoming available.
One process, using multiple still images, is detailed here,
or alternatively in this guide on PetaPixel. Once these processes are done, others involving the use of AI-related processes include those detailed here.
The time-consuming ‘trick’ is to go through a multitude of processes with an appropriate ‘treatment methodology’, involving the use of ‘master’ and derivative content stacks; that in turn requires tooling, inclusive of appropriate equipment, to do so effectively.
Once the source material has been processed to get the best possible visual quality, the next step is to produce ‘entity graphs’ or other ‘structured data’, converting the objects in the vision into a structured dataset.
One of the basic differences between video and still images is a timecode. Ideally, storage of metadata or structured data in relation to video content includes timecode information.
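To make the timecode point concrete, here is a small sketch that turns a frame number into an HH:MM:SS:FF timecode and attaches it to each detected object. It assumes a whole-number frame rate and a non-drop-frame timecode; the function name, the 25 fps default and the sample detections are hypothetical.

```python
def frame_to_timecode(frame_index, fps=25):
    """Return a non-drop-frame HH:MM:SS:FF timecode for a zero-based frame index."""
    if frame_index < 0 or fps <= 0:
        raise ValueError("frame_index must be >= 0 and fps must be > 0")
    total_seconds, frames = divmod(frame_index, fps)
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


# Hypothetical detections, keyed by the frame they were found in.
detections = [
    {"frame": 0, "label": "car"},
    {"frame": 1530, "label": "person"},
]

# Store the timecode alongside each label so the record can be traced back to the video.
records = [
    {"timecode": frame_to_timecode(d["frame"]), "label": d["label"]}
    for d in detections
]
for record in records:
    print(record)
```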
One of the seminal presentations on entity recognition in vision is the TED Talk about ImageNet.
In an attempt to make things easier, I’ll try to break down modern image analysis into a few different categories.
- Identification of ‘Things’
- Identification of ‘Persons’ or ‘Faces’
- Identification of ‘Emotions’ or ‘gestures’
- Biometrics – The identification of a unique living organism
There’s an emerging array of services available to the public with similar capabilities. This post will not explore them, other than to highlight this emergent field of ‘knowledge banking’, in which organisations leverage their scale to amass a significant body of information that enhances AI, classification and interpretation technologies. This in turn produces a core asset for these organisations, who provide API access to it, most often on a fee-for-service basis, so that users can add enhanced analytics capabilities to their own services; SOME OF WHICH they make publicly accessible through their online services.
To produce tooling that is truly ‘enhanced’ beyond traditional know-how, it’s essential to DIY (“Do It Yourself”).
The easy way to outline the services is with dot points:
- Clarifai, a service that identifies objects
- Google Cloud Vision (WP plugin: https://github.com/amirandalibi/perception)
- Amazon Rekognition (WP plugin: https://wordpress.org/plugins/wp-rekogni/)
- CloudSight
- Kairos (noting they’ve got a good comparison guide)
- Affectiva
- Cognitec
- betaFaceAPI, designed for faces
Once the data has been retrieved, store the information provided by the tools (inclusive of timecode if video) in a database, ideally in an RDF format. The usefulness of RDF is that it enables the metadata or structured data discovered in media to become part of the broader database that is the web.
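As a rough sketch of what storing a detection ‘in an RDF format’ could look like, the following writes one detected object as N-Triples lines (one of the plain-text RDF serialisations). The `http://example.org/` URIs, the property names and the function names are all hypothetical placeholders; in practice you would pick terms from an established vocabulary.

```python
def literal(value):
    """Quote a value as an N-Triples string literal, escaping special characters."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def detection_to_ntriples(video_uri, index, label, timecode, confidence):
    """Describe one detection in a video as a list of N-Triples statements."""
    subject = f"<{video_uri}#detection-{index}>"
    vocab = "http://example.org/vocab#"  # hypothetical vocabulary, for illustration only
    xsd = "http://www.w3.org/2001/XMLSchema#"
    return [
        f"{subject} <{vocab}foundIn> <{video_uri}> .",
        f"{subject} <http://www.w3.org/2000/01/rdf-schema#label> {literal(label)} .",
        f"{subject} <{vocab}timecode> {literal(timecode)} .",
        f"{subject} <{vocab}confidence> {literal(confidence)}^^<{xsd}decimal> .",
    ]


for line in detection_to_ntriples(
    "http://example.org/video/1", 0, "car", "00:01:01:05", 0.93
):
    print(line)
```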
Basic Media Analysis – Part 1 (Audio)
When collecting materials, media files are often long and go unused. Turning voice from audio files into something useful, such as a transcript, once required a person to manually transcribe the audio (a service that is still available), as there was no accessible and accurate automated method of doing so.
Media tells a story that incorporates different information to what can otherwise be found solely via text or other forms of metadata. Whilst emotional intonation and other relevant aspects of audio can also be analysed into machine-readable formats, this guide provides some basic examples of how to transcribe audio to text, producing text-based information that can be used for further analysis, which will be covered in a different post.
ONLINE SERVICES & TOOLS
After a short amount of time searching for basic tools, three were easily identified, alongside the means to use YouTube to perform this action.
YOUTUBE
By uploading media to YouTube, YouTube can transcribe the audio automatically. Searching Google using terms like “Automatically transcribe audio using YouTube” will easily pick it up.
A number of online services exist to provide automatic audio-to-text transcription. Many of them provide a free trial. A few examples include:
Sonix (invite link) provides 30 minutes free.
Trint also provides 30 minutes free.
SpokenOnline also provides 30 minutes free.
Local desktop alternatives include products provided by Nuance, which has a long history in the field, producing solutions for multiple sectors.
Data Recovery: Laptop & Computers
Data recovery on computers and laptops can be a complex task, and in most cases quite time-consuming. In cases where physical hardware damage is the cause of data loss, the likelihood of getting the data back goes down…
In past experience, even with the same type of drive, parts taken from a drive produced in a different batch won’t work on the old drive. This is something to take into consideration if you or your organisation is storing important data. When purchasing storage devices (i.e. IDE/SATA-based drives that are not solid state), it might well be worth purchasing a spare or two with the same manufacturing codes, or ensuring a spare is available, so that the daughter-board can be stripped off the spare to retrieve lost data in the case that the original daughter-board dies…
Furthermore, it is not advisable to create a striped array over a multitude of disks if you at all value the data you intend to store on that storage device.
If you’re just looking for an ultra-quick cache for content or data you have stored elsewhere, then, yeah, fine. Just don’t trust it for long-term storage.
PROCESS FOR RETRIEVING DATA FROM ‘COMPUTERS’.
For non-technical people who don’t know the difference between the storage device and the ‘computer’: most computers have a storage device that can be removed from the computer, even when the computer doesn’t work.
The more common examples of where this is needed are where the drive is a little faulty and kinda works, sometimes; or where the power supply or some other part in the computer has stopped working, and the data is ‘trapped’.
Another example is where something bad has happened. You know there should be a record of it in the computer, but it’s not obvious, and you want to check it out.
STEP 1: REMOVE THE STORAGE DEVICE
If you can’t remove the storage device, you’re not going to have much joy. Some newer computers have their storage devices fixed into the circuitry of the device and, well, if it doesn’t work, you’re going to be in trouble.
For the majority of computers over the past 20+ years, the storage device can be removed.
What you don’t want to be doing is writing anything to that disk. That means you don’t want to be turning it on or using it until you’ve tried to get the data you want back.
If it’s simply a case of the computer dying, and you need to move the data to your new computer, that’s easier.
In any case, find some suitable screwdrivers and disassemble the computer to find the hard drive. If you don’t know what you’re looking for:
a. Search Google for ‘hard drive’ images
b. Get someone else to help you.
STEP 2: PLUG THE HDD INTO A NEW COMPUTER
The local computer shop has an array of cables and cradles that can help you plug your old hard drive into a new computer. Another option is to get an ‘external case’ for your old hard drive, if you want to keep it about.
STEP 3: DOWNLOAD DATA
If you’re simply going to copy the data from ‘your old computer’ to ‘your new computer’, then that’s relatively straightforward. Browse the directories on the hard drive and copy them across to your new computer’s hard drive.
Job done.
If you’ve lost data, the drive isn’t working so well, or there’s some other issue, it becomes useful to get another drive with the same amount of space as the one you’re intending to get the data from, to use as a ‘working drive’ to copy all your files across to.
STEP 4: DATA RECOVERY
So, the first thing is: do not use the hard drive you want to get data from as the disk you use to turn on the computer, etc.
If you want to get data back, start the computer from a different HDD and plug in the drive you want to recover data from. It’s also useful to have a second drive to put the data onto from the drive you’re recovering data from.
Go to Google and search “data recovery software” to find something that will work on the computer you’re using.
Run the program, target the drive you want to recover from, and store the retrieved data on the disk you’ve got to back it up onto.
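For the curious, one technique that recovery software may use is ‘file carving’: scanning the raw bytes of a drive (or, better, a copy of it) for the known start and end markers of a file type. The sketch below does this for JPEG images only and is a heavy simplification, not a substitute for a proper tool; real files can contain the end marker internally (for example in embedded thumbnails), and the function name is hypothetical. It only reads bytes and never writes to the source.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker followed by the next marker byte
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(raw):
    """Return byte strings that look like JPEG files inside a block of raw bytes."""
    found = []
    start = raw.find(JPEG_START)
    while start != -1:
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # a start with no end: the file is truncated or overwritten
        found.append(raw[start:end + len(JPEG_END)])
        start = raw.find(JPEG_START, end + len(JPEG_END))
    return found


# A made-up block of 'disk' bytes with two tiny JPEG-like fragments in it.
blob = b"junk" + b"\xff\xd8\xff\xe0abc\xff\xd9" + b"more" + b"\xff\xd8\xff\xe1xyz\xff\xd9"
print(len(carve_jpegs(blob)), "candidate files found")
```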
Data Recovery & Collection: Mobile Devices
Have you got a bunch of important messages on your phone, and you’re wondering how you can store this data for safe-keeping? Have you experienced an incident that has made you feel unsafe, and you’re wondering how to make a record of it to report it to your employer, school or police?
If you type ‘templates incident report’ into Google, you’ll find a bunch of example documents that you can use to make something that suits your purposes.
However, one of the problems might be that if you’re simply writing things out, perhaps the matter won’t be taken seriously… not what anyone wants.
For this reason, and many others, below is an outline of how to get data out of your phone. We’ll also cover the process just in case you’ve ‘accidentally’ deleted important data on your phone already. Whilst the method is not 100% successful, it’s a process worth trying out, just in case it makes your life easier.
We’ll just focus on Android and iOS. Whilst there are a few other options out there, in the majority of cases it’ll be one or the other.
Data collection off most “smart phones” is most often handled by some app that’s connected to it, whether that be Facebook, Gmail, Twitter or one of several photo apps. These systems all store the data within their apps, so it’s a lot more complicated to think about how to retrieve anything that may have been deleted within those apps. Indeed, the data is stored on the ‘cloud service’, in which case it’s better to figure out how to download a copy.
However, things like SMSs and call logs are a little different. These are generally not stored as part of a cloud service and need to be retrieved from the phone.
PART 1: Let’s start with a situation where the data you want has been deleted.
STEP 1.
Try not to use the phone, and do not download anything to it in an attempt to get that data.
When a user tells the operating system managing the device to delete something, it’s generally not deleted. It’s just ‘marked’ for deletion and is no longer available through the graphical user interface of the device, making it ‘deleted’ as far as most people would know. The space is then ‘freed up’, which means the operating system knows that the area of the storage device previously used to store that data can now be overwritten with something else.
Whilst the process of writing to the storage device does not necessarily write over that specific part, it’s not really very controllable. Sometimes data can remain for years; in other cases, it can be overwritten very, very quickly.
‘Data recovery’ applications that ask the user to download something to the same disk aren’t the types of tools you want to use.
STEP 2.
Find an application that works on a laptop or desktop computer. A simple way to do this is to type ‘iPhone Data Recovery’ or ‘Android data recovery’ into Google.
Features you may want to look for:
– What types of data the application supports retrieving.
– What formats the application outputs the records in.
The benefit of obtaining data in a format such as CSV is that the data is thereafter more easily consumed by analytics tools, to get a better look into what’s been going on, or to work out how to present it to others seeking evidence.
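As an illustration of why CSV is convenient, here is a short sketch that loads exported records and puts them in chronological order. The column names (`timestamp`, `type`, `number`, `body`), the date format and the sample rows are assumptions made up for the example; a real export from a recovery tool will have its own layout.

```python
import csv
import io
from datetime import datetime


def load_records(csv_text):
    """Parse CSV text into a list of records sorted from oldest to newest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Convert the text timestamp into a real datetime so sorting is reliable.
        row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        records.append(row)
    return sorted(records, key=lambda r: r["timestamp"])


# Hypothetical export, deliberately out of order.
sample = """timestamp,type,number,body
2018-03-02 09:15:00,sms,+61400000001,Second message
2018-03-01 18:40:00,call,+61400000002,
2018-03-01 08:05:00,sms,+61400000001,First message
"""

for record in load_records(sample):
    print(record["timestamp"].isoformat(), record["type"], record["number"], record["body"])
```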
STEP 3.
Plug your mobile device into your computer & download the data.
STEP 4.
Make a copy of the data for back-up purposes, and do what you want with the working copy of the recovered data.
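One way to make that back-up copy a little more trustworthy is to record a checksum of the file, so you can later show the working copy and the back-up started out identical. This is a minimal sketch using SHA-256; the function names and the file name `sms_export.csv` are hypothetical, and the demonstration runs in a temporary folder so it touches nothing of yours.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks to cope with large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_checksum(source, backup_dir):
    """Copy a file into backup_dir and confirm the copy matches the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also tries to preserve file timestamps
    original_hash, copy_hash = sha256_of(source), sha256_of(target)
    if original_hash != copy_hash:
        raise IOError("backup does not match the original")
    return target, copy_hash


# Demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as workdir:
    export = Path(workdir) / "sms_export.csv"
    export.write_text("timestamp,number,body\n")
    copy_path, checksum = backup_with_checksum(export, Path(workdir) / "backup")
    print(copy_path.name, checksum)
```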
PART 2: Data that is still on the phone, and you don’t need to worry about any deleted records.
So, if the data is already on the phone and the whole ‘recovery’ process is unnecessary, then you’ll find a bunch of apps online that will work with your phone, on your phone, to collect and upload your data to a nominated location.
Importantly, if you need to make a point about something, an issue you might want to consider is that the ‘metadata’ stored in the files is more easily manipulated once you take that data off the phone. Whilst data records like call logs remain on the phone, it’s far, far more difficult to manipulate them. Therefore, in terms of ‘evidence collection’, you might find taking a ‘screenshot’ of the data on the phone to be an important part of your data-collection process.
Similar to the above examples, search Google for ‘snapshot android’ or ‘snapshot iOS’ and the method to do so can easily be found.
PART 3: I’ve got voicemail messages, and the provider won’t give them to me.
The method I’ve found to obtain a copy has been to use an audio recorder app: put the phone onto speakerphone and, whilst the audio recorder is running, call the voicemail service and record the messages, including the information about when they were created, etc.
Once you have obtained these messages, use an audio editing application on a desktop or laptop computer, and be sure to add the information about when the recording was made, etc.
Concluding remarks.
Once you have the data you need, you might find it helpful to log the records chronologically, and have a look at any metadata that might be available to you, to illustrate a clearer picture to those who need to know. Obviously, undertaking these sorts of tasks on innocent, unsuspecting third parties without their knowledge is most likely illegal, and moreover a gross breach of privacy and indeed trust. In some cases, it may be that someone needs help to do these sorts of tasks; in which case, it’s recommended that any would-be ‘good samaritan’ goes about doing it on the data owner’s equipment, to ensure no stray copies end up floating about unnecessarily.
Introduction to Maltego
What is Maltego?
Maltego is a tool that’s available under dual licensing, for commercial use or for free. It is used to investigate relationships using data, then to map, store and print reports from those investigative views, in a form that makes the findings far more difficult for others to ignore.
I first stumbled across it when reviewing the information provided by Facebook to professional users, such as app providers, and the means by which that data subsequently allows them to carry out advanced behavioural analysis as an international commercial entity
(it’s generally rather difficult for a person to participate in society without social media accounts; the above video gives some insight into what that costs).
The commercial version of Maltego offers more features and plugins than are available in the community edition. One example is the Social Links framework, which provides more enhancements for social-network analysis than the Maltego community edition provides ‘out of the box’. Whilst Social Links is only one example, their videos can be found here.
Does Anonymity exist?
The easiest way to answer this question is: in effect, no. In nearly all circumstances where it is said that the information doesn’t exist, the issue is actually one of who has access to that information and the cost of obtaining it, rather than an honest circumstance where the data genuinely does not exist to substantiate a claim.
Whilst it took many years to convince leading, high-value organisations that the internet was a useful and worthwhile investment, their investments now, similarly to their investments in the past, instigate controls over the internet that make it very difficult to genuinely do anything without leaving traces of those actions on the internet somewhere.
The bigger problem is that this information is not available to the majority of victims who have been harmed by the unlawful behaviour of others; and in many circumstances, it’s illegal to collect that information for the purposes of participating fully in the rule of law, one of the guiding principles on which our society operates.
These problems are therefore not technical in nature but rather socio-political. If public servants are found to be doing the wrong thing, it would cost the government if that information could easily be provided to a court of law in the manner required by that court to effectively evaluate a circumstance. If powerful married men with families want to engage in sex with those who are not their wives, and their wives at times do the same, then whilst the ‘data’ may exist, it’s not available, regardless of the subsequent harm an acrimonious relationship may cause children.
Most organisations use sophisticated computing systems to manage their accounting, stock-management and related business records; yet we are still provided thermally printed paper receipts that fade in sunlight.
Mobile phones continuously track their own whereabouts (and speed of travel) as devices, but this is not available for the purpose of dealing with traffic infringements. New vehicles can tell whether someone is wearing a seatbelt, but a special device is needed to get that information.
Our web usage is continuously tracked; the websites we use can figure out when we sleep (due to lack of activity on mobile devices, et al.). Whilst these things all form part of what is used for crimes that involve significant financial loss to government entities, the data is more often than not suggested ‘not to exist’ and, save for particularly ‘special circumstances’, is not made available to a citizen seeking lawful remedy.
Whilst it is true that some, particularly skilled, dedicated and well-financed individuals can form circumstances in which their actions are made ‘anonymous’ or unable to be identified; this is simply not reality for the vast majority living in our modern ‘connected’ age.
So, whether living in a democracy or otherwise, where we seek ‘lawful remedy’ the question becomes how exactly we go about achieving this when we may be discouraged by others from doing so.
An introduction to Virtual Machines.
A virtual machine (VM) is an emulation of a computer system through the use of specialised software on a host computing system. Virtual machines (VMs) are used throughout the internet for hosting systems, websites and other resources, for an array of purposes including the means to scale a solution from using limited hardware resources as a small site, through to managing the hardware requirements for that solution as it grows. Other users of VMs include developers who want to test and/or develop websites, and technology professionals who need to test particular forms of software or figure out and manage security risks such as malware; these and an array of other purposes make the use of VMs very, very popular.
On a less sophisticated basis, VMs offer the means to run any type of operating system as an application on most computers or laptops. It doesn’t matter if you have a Mac or a PC; you can run whatever operating system you want in a VM, and it will load on your machine when you want to use it and can be turned off whenever you want, without leaving problems on your host machine. Because the virtual machine is an independent environment, from the operating system right through to any and all applications that run within it, whatever you do in that environment is stored within the virtual machine rather than in your normal computing environment. It’s also possible to put a VM on a USB key and load it on other machines, or share the work you’ve done in the VM by simply copying the VM onto a USB key and giving it to someone (with the relevant details) for them to review or store safely for you.
A commonly used application for creating Virtual Machines is VirtualBox.
Virtual machines can be used to create a cleaner computing environment for some specific purpose that you don’t want stored on your everyday computing environment. In this way, virtual machines are an effective means to deal with web-persistence issues, ideally alongside the use of a VPN.
Web-Persistence
I’m not quite sure what to title this section. Many speak of this concept as digital identity persistence, yet often it’s not the person that is subject to ‘web persistence’ but rather the machine or home network address that provides persistent information about the characteristics of a user; regardless of who the actual user is.
This can end up in an array of unfortunate situations. A father, mother or other adult in a household who enjoys adult material may unwittingly alter the website advertising being provided to others in the household who use the same internet connection (children included!). Families who share machines and the accounts set up on those machines may create web experiences that cross-pollinate in different ways, irrespective of the user at the time.
These issues pertain to what I’ll call ‘web persistence’: the circumstance in which use of the internet is tracked by operators who work with whatever information they can get, altering the experience of the internet on that machine, in that location, from whatever account the user is using, in an effort to make money from your use of the internet. They do this through the use of identifiers, and ‘scraps’ of information left over from previous uses in relation to those identifiers. The systems that collect this information are not simply the websites you intend to go to, but also the services those websites use as part of providing their functionality. When thinking about this from a security point of view, the term that is used is ‘vectors’: ‘attack vectors’, ‘security vectors’ or other forms of ‘vectors’ that can be used to trace, track and identify.
An easy way to understand the different ways this may occur is by considering the OSI Model.
Each machine has a MAC ADDRESS; the machine in turn connects to a network and is provided an IP ADDRESS. From there, your machine forms a fingerprint.
Parts of this digital fingerprint include your user account, the public IP address of the network through which webpages are sent to you, and the information stored by your browser, such as cookies or login information, that websites may use to infer you were using the internet in a particular way, regardless of whether it was you at your keyboard or someone else.
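A toy sketch of the idea: if a tracker combines a handful of attributes into a single identifier, then two different people on the same machine and connection look identical, while the same setup on a different network looks like someone else. The attribute names and values below are invented for illustration (the IP addresses come from ranges reserved for documentation), and real tracking systems use many more signals than this.

```python
import hashlib


def fingerprint(attributes):
    """Combine a dict of observed attributes into one short, stable identifier."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attributes a website might observe.
parent_session = {
    "ip": "203.0.113.7",
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Australia/Melbourne",
}
child_session = dict(parent_session)  # same machine and network, different person
other_household = dict(parent_session, ip="198.51.100.23")  # same setup, different network

print(fingerprint(parent_session) == fingerprint(child_session))    # same identifier
print(fingerprint(parent_session) == fingerprint(other_household))  # different identifier
```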
OpenRefine
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
Importantly, OpenRefine also includes an array of plugins that help users sort out data, then update it and work with data in an RDF format.
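To give a feel for what ‘cleaning messy data’ can mean, the sketch below groups values that differ only in case, accents, punctuation, spacing or word order. It is written in the spirit of key-based clustering as found in tools like OpenRefine, but it is my own simplified version and not OpenRefine’s code; the function names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint_key(value):
    """Reduce a value to a normalised key: lowercase, unaccented, unpunctuated, sorted words."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents and other non-ASCII
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a key; only groups with more than one member are returned."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint_key(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


names = ["Café  Melbourne", "melbourne cafe", "Melbourne, Cafe.", "Sydney Cafe"]
for group in cluster(names):
    print(group)
```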
Google Tracking Data (geolocation)
Have you ever got a traffic infringement and couldn’t figure out what it was about? Have you wanted to look back at what you did some day in the past, and wished you had a record of some form?
Well, if you’re not too sensitive about tracking-system data, you might find the Google Maps Timeline feature really very useful.
Whilst you need to turn it on, having the data about where you were at a particular time in the past might be more helpful in the future than you might otherwise know.
Google Maps uses the geolocation information from your phone that is otherwise tracked by telecommunications companies and apps on your phone. Google Maps Timeline is a way you can have access to that same data.
Better yet, it’s downloadable.
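Once you have your own location records, one simple thing you can work out is the average speed between two recorded points, which is the sort of question a traffic infringement raises. The sketch below assumes you have already pulled out a time, latitude and longitude for each point (the layout of the downloaded file is not covered here and may vary); the function names and sample points are hypothetical, and the result is an average over a straight-line distance, not a measurement of actual driving speed.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def average_speed_kmh(point_a, point_b):
    """Average straight-line speed between two (time, latitude, longitude) points."""
    (time_a, lat_a, lon_a), (time_b, lat_b, lon_b) = point_a, point_b
    hours = (time_b - time_a).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("point_b must be later than point_a")
    return haversine_km(lat_a, lon_a, lat_b, lon_b) / hours


# Hypothetical points, ten minutes apart.
first = (datetime(2018, 3, 1, 8, 0), -37.8136, 144.9631)
second = (datetime(2018, 3, 1, 8, 10), -37.8500, 145.0500)
print(round(average_speed_kmh(first, second), 1), "km/h")
```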
It’s important you don’t use this functionality on an account that is not yours and that you have no right or permission to use or obtain. Whilst considerations about privacy and surveillance differ between jurisdictions, it’s generally not a good thing to do to another person unless they’ve explicitly asked you to do so.
Downloading My Data from Social Networks
Almost anyone on the web today has accounts on social-network websites. These websites offer a rich source of data for users, but it’s often difficult to get unless you know how.
Facebook has a help link that can be found via Google. The current information about how to download your data can be found here.
Twitter is very similar to Facebook. The current link via Google can be found here.
Same as above. See link.
Every other social-network site I am aware of has a similar means by which it makes it easy to download your data in an archive format of some sort. Once you have the data, you then need to figure out how to make it available in a usable format. We’ll cover that in a separate post.
Introduction to Linked Data
What is Linked Data?
Linked Data is a way of authoring a hypertext document in a way that makes the concepts and relations described in the document machine readable.
How is it used on the web?
The easiest way to see Linked Data on the web is to use a tool that helps you see the information embedded in the web pages you use. These tools show you how machines can read that information, which can be used for many purposes; for example, an application you develop can reference the machine-readable data in all the webpages it draws on, to create what is called a ‘graph’ of information about any particular subject.
Some of the tools to see the data in webpages include the OpenLink Data Sniffer, alongside specialist tools such as Google’s Structured Data Testing Tool.
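As a rough idea of what such tools do under the hood, the sketch below pulls out one common form of embedded linked data: JSON-LD blocks placed in `<script type="application/ld+json">` tags. It only handles that one embedding style (pages may also use others, such as RDFa or Microdata), and the class name, function name and sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script block in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_jsonld(html):
    """Return a list of JSON-LD documents found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.documents


# A made-up page describing a person using schema.org terms.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person", "name": "Jane Example"}
</script>
</head><body><p>Hello</p></body></html>"""

for document in extract_jsonld(page):
    print(document["@type"], document["name"])
```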
Most major websites use linked data, but they’re not all using the same languages. Linked Data languages are called ontologies. An ontology is a specially defined language or machine-readable dictionary that’s been designed for some specified purpose. Websites such as Wikipedia are made available as linked data via Wikidata.
For a more comprehensive list of linked-data ontologies, the LOD (Linked Open Data) Cloud website provides a navigable tool to find various ontologies for almost any purpose. Other tools like LinDA can be used to search for particular terms and find different ontologies that define those terms in a machine-readable way. The most popular ontology for search-engine optimisation is called schema.org, and it is continually developed via a GitHub issues list and a related W3C Community Group.