Data-Acquisition-OSINT
With just under a month to go before I am presenting at the Australian OSINT Symposium, I have decided to write this blog. I want to be transparent and say that this is not the presentation itself but an accompanying read or appetizer.
I have been interested in Data Acquisition OSINT for some years, especially Exfiltrated Data. It was obvious that this was going to be a rich form of OSINT material, material that in some cases was never meant to be available. Material that is so valuable to an OSINT investigator that it could not be ignored.
It is always important to understand and acknowledge that, for certain types of data, you have to consider the following: Legislation, Lawfulness, Regulations, Ethics, Morals and Policies. Be under no illusion: in some countries and jurisdictions certain types of data may be out of bounds for some OSINT practitioners. I recommend you understand the limitations placed upon you before embarking on acquiring data for OSINT, especially when it comes to the lawfulness of your activity.
Basic definition of Data: –
‘Data is information, facts, statistics or pieces of information collected together for reference or analysis to be examined, considered and used to help decision-making; or information in an electronic form that can be stored and used by a computer.’
Where can Data be used? It goes without saying that it has many uses, OSINT investigations being the starting point. Cyber Threat Intelligence is another: understanding, from an organisational perspective, what available data may create a vulnerability for that organisation. Digital forensics: Data-Acquisition-OSINT will provide methodologies and solutions that are not always provided in traditional digital forensic training. Cybercrime investigations, where bad actors deliberately obfuscate and deploy strong OPSEC. Social Engineering: understanding the Digital DNA of your subject of interest.
I will explore Digital DNA further, along with Locard’s principle that ‘Every contact leaves a trace’ (whether he said those exact words is open to debate). Even though his principle is over 100 years old, it is more relevant today than Locard himself probably envisioned.
Locard’s principle lends itself well to the digital world too: –
Digital Fingerprinting – the method of how websites you visit give your browser a unique fingerprint in order to identify you and track you.
Digital Footprint – what we leave behind on the internet that can help identify us. IP addresses, username, email address etc. Sound familiar? Some would say this is bread and butter OSINT.
Digital DNA – understanding your subject of interest (SOI): how they use the internet and who they are. The snippets they leave identify what makes them who they are: when they post, how they post, why they post, what they post. Understanding an SOI you have never met helps direct your investigation to areas where you may never have thought of looking.
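To make the Digital Fingerprinting idea above a little more concrete, here is a minimal sketch in Python. It assumes a site has already collected a handful of browser attributes and simply hashes them into one identifier. The function name and the attribute names (user_agent, screen, timezone) are my own illustrative assumptions; real fingerprinting scripts collect far more signals and are considerably more sophisticated.

```python
import hashlib
import json


def browser_fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into one stable identifier.

    Illustrative only: the attribute names are hypothetical and this is
    not how any particular tracking product works.
    """
    # Serialise with sorted keys so the same attributes always produce
    # the same string, whatever order they were collected in.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values a site might observe for one visitor.
visitor = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/London",
}
fingerprint = browser_fingerprint(visitor)
```

The point of the sketch is that no single attribute identifies you, but the combination can be distinctive enough to recognise the same browser on a later visit.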
Data Acquisition OSINT is not just about ‘Breached Data’; it encompasses many other forms of data. ‘Breached Data’ is a term that is often used to describe the acquisition or use of Exfiltrated Data for OSINT. This can be misleading, as not all acquired OSINT data is breached data. I tend to look at data as follows: –
Public Data.
Publicly Available Data.
Public Data and Publicly Available Data are two different concepts; however they are often used interchangeably. Let’s take a closer look at what public and publicly available means.
Public Data is openly and freely available. I am from the UK, and in the UK there are masses of Government data made available to the public. It is meant to be openly and freely public. As we also have the Freedom of Information Act, we can request that data be made public where it is not necessarily as openly and freely available.
And it is not just Governments who make data public; academia, organisations and businesses will also make their data available to the public. Public data is information that can be shared, used, reused and in many cases redistributed without restriction. It was meant to be Public.
Publicly Available Data, however, in most circumstances was never meant to be available to the public and was more than likely Exfiltrated in some way or another. It too can be unilaterally shared, used, reused and redistributed, but it is not necessarily as openly available or as free as Public Data.
It has to be noted that both Public Data & Publicly Available Data may contain copyrighted material.
Basic meaning of Exfiltrate: –
‘Communications & Information, sometimes passive, to remove data from a computer, network, etc surreptitiously and without permission or unlawfully.’
Another stream of data worth considering is Data Broker data. This is data collected when you sign up for an online service or download an app, which is subsequently sold to Data Brokers. It is then aggregated with other Public Data and Publicly Available Data and combined to provide a product to sell.
Let’s break this down and look at the different strands of Exfiltrated Data: –
Breached Data
Leaked Data
Stealer Data
Accidental Exposed Data
Insecure Data
As I mentioned previously, “Breached Data” is an umbrella term frequently used to describe Publicly Available Data; however, it is often misrepresented in the press or media. A better overarching term is Exfiltrated Data, although not all Exfiltrated Data has been breached. Let’s look at this further.
Breached Data
Breached Data ordinarily will have been exfiltrated from some type of entity, whether that be Government, academia, business, organisations or even an individual, by a criminal act of breaking into a computer system. This could also involve encryption of the source of the exfiltration with a demand for payment.
Stealer Data
On the balance of probabilities, some of you reading this blog may have stealer malware on a device you own. Stealers sit in the background, exfiltrating data from the device they are on to the mother server. This data is then made available on the clear, deep or dark web, for payment or otherwise.
Leaked Data
Data can be leaked in many ways and for many reasons: by a disgruntled employee, by an insider threat, for the greater good, out of ethical considerations, as whistleblowing or simply for the kudos.
Probably a good example of “Leaked Data” is Jack Teixeira. Young Jack had access to highly sensitive data, in the form of US Military information on the war in Ukraine. He exfiltrated the sensitive documents and subsequently posted them on his Discord server. Why? Was he an insider threat or was it for the kudos? It matters not; it was out there, and probably still is.
Accidental Exposed Data
This is data where there was no malicious intent and no bad actor at play. It was a genuine error or mistake. An example is a Freedom of Information request where some columns from a spreadsheet should have been deleted or redacted but weren’t. The data can then become publicly available when it is exfiltrated from the document and the person in receipt of it decides to publish it. The Police Service of Northern Ireland suffered this type of Accidental Exposure, where some data was not redacted from a document before it was released under a Freedom of Information request.
Insecure Data
This is where data is stored insecurely on the internet and is exfiltrated. We know that the internet is an imperfect and insecure place. Either through bad design, bad practices or lack of knowledge and / or experience, things, as in the internet things, are sometimes left insecure and publicly available. We have to ask ourselves the question, “Was this meant to be insecure?” This is where I think we have an obligation to assess whether we should inform those who own or are in control of the data, that they may want to secure it. After all, in the UK & Europe it could be a breach of GDPR for it not to be secured.
Now, what if you use the Google Dork filetype: and discover documents that, when viewed, raise the question of whether they should have been publicly available? Anecdotally, I can say that when I have found such documents and assessed them, I have then informed the owner. I suspect that this will come down to individuals or organisations to decide based on numerous factors, and there is unlikely to be a consensus.
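For anyone unfamiliar with the filetype: operator, the short Python sketch below shows how such a query might be put together. It is illustrative only: the function names are my own, pairing filetype: with the site: operator is my assumption rather than something described above, and example.org is a placeholder domain.

```python
from urllib.parse import urlencode


def build_filetype_query(domain: str, filetype: str, keywords: str = "") -> str:
    """Combine the site: and filetype: operators into one query string."""
    parts = [f"site:{domain}", f"filetype:{filetype}"]
    if keywords:
        parts.append(keywords)
    return " ".join(parts)


def build_search_url(query: str) -> str:
    """URL-encode the query so it can be pasted into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Placeholder domain; only search where you are lawfully entitled to.
query = build_filetype_query("example.org", "xlsx")
url = build_search_url(query)
```

Building the query in code is only a convenience; typing the same operators straight into the search box gives the same result.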
Public Data and Publicly Available Data, in particular Exfiltrated Data, require careful verification and validation. You could obtain data that originates from a single source but has been redistributed via intermediaries who have manipulated it. Sometimes it is necessary to locate the original data to ensure you have a clear understanding of what it means to your OSINT investigation, because altered or manipulated data could mislead you or lead you to the wrong conclusion.
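One simple check, assuming you have managed to locate the original file, is to compare cryptographic hashes of the original and the redistributed copy. The sketch below is a minimal illustration in Python; the function names are my own, and a matching hash only tells you the two files are byte-for-byte identical, not that the original itself is genuine.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large data sets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_original(copy_path: str, original_path: str) -> bool:
    """True when the redistributed copy is identical to the original."""
    return sha256_of_file(copy_path) == sha256_of_file(original_path)
```

If the hashes differ, the copy has been changed somewhere along the way, and you then need to establish what was changed and why before relying on it.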
There are also further considerations as to what is meant by Publicly Available, in the form of it not being as openly or freely available as Public Data.
Publicly Available Data can exist on the clear, deep and dark webs and some will argue that only data that is on the clear web is publicly available. They argue that having to download special software such as TOR, or use a resource like Shodan, or having to download an app such as Telegram and create an account, means it is not truly publicly available.
I disagree, and this is my rationale: before the zeros and ones revolution, how did we obtain Open Source Intelligence?
Libraries were, and still are, an extremely useful information resource. However, depending on the country you live in, you may have to create an account and provide some form of identification or proof of residence. Newspapers are another good source of information. Are they free? Maybe some are, but in my experience you have to pay for the product. Magazines fall into the same category as newspapers.
What about meeting places? In the UK we have Working Men’s Clubs, although there are not as many nowadays. Could they be considered a hive of socialist activity and therefore rife with open-source information? In some Clubs, you had to provide your personal details, be interviewed by the committee and pay either a monthly or annual fee.
Yes, the dark and deep web can present obstacles for some in that greater knowledge may be required; however, that knowledge can be obtained if desired.
On the subject of Telegram, its newish feature of recommending Channels and Groups that may be of interest is an easy way to find more content. This is why I do not always rush to harden settings on the platforms, browsers or search engines I use. The whole point of Social Networking sites such as Facebook, and now WhatsApp, is to connect you with people or content that the algorithms have decided may be of interest to you. Why, from an OSINT perspective, would you not want to utilise this potential information stream? Of course, you need to assess each OSINT deployment: what is the ask, and what are the aims and objectives? Don’t restrict yourself to one methodology, and understand what your OPSEC requirements are.
Assess – Adapt – Deploy
I know some within the OSINT community will disagree with this methodology and that is fine, as this is what makes the OSINT community what it is, a place where ideas and thoughts are shared openly and freely.
It is important to remember that Data Acquisition OSINT is not just about one stream of Data. It includes both Public Data and Publicly Available Data. You must ensure that if you acquire data for OSINT, you use it responsibly and lawfully regardless of its origins. Being responsible and lawful applies to all OSINT material you acquire.