What is cybersecurity breached data
Cybersecurity breached data refers to any sensitive or confidential information that gets accessed by unauthorized individuals through various means such as:
Hacking: Hackers exploit weaknesses in computer systems or networks to steal data.
Malware: Malicious software, like viruses or ransomware, can infiltrate systems and steal data.
Insider threats: Employees or trusted individuals with access to data may misuse it intentionally or unintentionally.
Physical loss or theft: Unsecured devices like laptops or hard drives containing sensitive data can be lost or stolen.
Accidental leaks: Human error can lead to accidentally exposing sensitive data.
The type of breached data can vary depending on the target of the attack, and some examples are:
Personal data: This includes information such as names, addresses, Social Security numbers, credit card details, and medical records.
Financial data: Bank account numbers, investment details, and tax information can be attractive to cybercriminals.
Corporate data: Trade secrets, intellectual property, customer lists, and internal communications are all valuable targets.
Data breaches can have serious consequences for individuals and organizations. Breached personal data can be used for identity theft, fraud, or blackmail. Companies can suffer financial losses, reputational damage, and legal repercussions.
How is breached data useful to cyber defenders
Breached data reveals what kind of information has been exposed, which lets cyber defenders gauge the severity of the impact on the organization and any collateral damage to other organizations. Follow-up actions can then be taken, such as advising users to reset their passwords if the breached data contains exposed credentials, or to replace their credit cards if card details are exposed.
Searching within a large breached dataset can be challenging
A breached dataset can be massive and contain a large number of records. This sheer volume makes searching computationally expensive and time-consuming. Additionally, the data itself can be complex, with information in various formats (text, numbers, dates) and inconsistencies (spelling errors, typos).
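Two of these difficulties, volume and inconsistent formatting, can be softened with a simple approach: read the data one line at a time so memory use stays bounded, and normalize both the keyword and each line before comparing them. The Python sketch below is only an illustration under those assumptions. The function names `normalize` and `search_stream` are hypothetical, the sample records are made up, and the normalization handles differences in case, whitespace, and Unicode form only. It does not correct spelling errors.

```python
import io
import re
import unicodedata
from typing import Iterable, Iterator, Tuple

_WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Fold case, unify Unicode forms, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return _WHITESPACE.sub(" ", text).strip().casefold()


def search_stream(lines: Iterable[str], keyword: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, original line) for lines containing the keyword.

    Lines are consumed one at a time, so the whole dataset is never
    held in memory at once.
    """
    needle = normalize(keyword)
    for number, line in enumerate(lines, start=1):
        if needle in normalize(line):
            yield number, line.rstrip("\r\n")


if __name__ == "__main__":
    # Made-up records standing in for a large dump file.
    sample = io.StringIO(
        "id,name,city\n"
        "1,Alice,KUALA   LUMPUR\n"
        "2,Bob,Penang\n"
        "3,Carol,kuala\tlumpur\n"
    )
    for number, line in search_stream(sample, "Kuala Lumpur"):
        print(number, repr(line))
```

For a real file, the same function can be fed an open file handle, for example `open(path, encoding="utf-8", errors="replace")`, since a file object also yields one line at a time.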
Data breach incident
In March 2024, it was reported that a hacker group had allegedly breached AirAsia, a Malaysian airline. The link to the breached data was posted on Telegram (Image1).
Image1. Snapshot of the Telegram post containing the link to the AirAsia breached data
Tools for quick triage of breached data
Two tools that can be useful for quick triage of breached data are bulk_extractor and Agent Ransack.
Bulk_extractor
Bulk_extractor is a digital forensics tool designed to extract useful information from digital media. Below is a breakdown of its key features:
Target: It scans disk images, files, or directories for hidden content.
Functionality: Unlike traditional tools that rely on file system structures, bulk_extractor examines every byte of data to find specific patterns. This allows it to uncover embedded information such as compressed files, encoded data (like Base64 encoded images), and remnants of deleted files that are still present on the media.
Extracted Information: It can extract a wide variety of data including:
Emails
Phone numbers
Credit card numbers
URLs
Images (JPEGs)
Text documents
etc
Speed and Efficiency: Bulk_extractor is known for its speed due to its multi-threaded processing capabilities.
Output: Extracted data is saved in separate text files categorized by type (e.g., email.txt, ccn.txt, domain.txt). This makes it easy to analyze the findings.
Histogram Generation: Bulk_extractor also creates histograms which show the frequency of identified features. This can be helpful for investigators to prioritize what data to examine first.
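To make the byte-scanning and histogram ideas above more concrete, here is a heavily simplified Python sketch. It is not bulk_extractor's actual scanner. The regular expression, the function names `scan_bytes` and `histogram`, and the sample bytes are all assumptions made for illustration. It only shows the general idea of matching a pattern against raw bytes regardless of file structure, then counting how often each value appears.

```python
import re
from collections import Counter
from typing import List, Tuple

# Simplified e-mail pattern applied to raw bytes. Real forensic scanners
# use much stricter rules; this is for illustration only.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scan_bytes(blob: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, lower-cased match) for each e-mail-like hit."""
    return [
        (match.start(), match.group().decode("ascii").lower())
        for match in EMAIL_RE.finditer(blob)
    ]


def histogram(features: List[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Count occurrences of each feature value, most frequent first."""
    return Counter(value for _, value in features).most_common()


if __name__ == "__main__":
    # Made-up bytes: printable text surrounded by binary junk.
    blob = b"\x00\x01junk ops@example.com \xff\xfe OPS@example.com;hr@example.org\x00"
    hits = scan_bytes(blob)
    for offset, value in hits:
        print(offset, value)
    for value, count in histogram(hits):
        print(count, value)
```

Values that appear most often in such a histogram are usually a reasonable place to start a review, which is the prioritization role described above.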
As a demonstration, bulk_extractor was run against AirAsia's breached data (Image2). The output directory produced by the run contains a list of folders and text files. Each text file is named according to the type of information it holds (Image3) and lists the extracted values together with the directories and filenames of the files in which they were found. For example, email.txt contains the exposed email addresses and the files that contain each address. Linux commands such as "cat" and "grep" can be used to quickly search and display these text files. For example, to search for the keyword "airport" in domain.txt, we can run "cat domain.txt | grep -i airport" (Image4).
Image2. Extracting information from AirAsia breached data using bulk_extractor
Image3. Output files from running bulk_extractor
Image4. Using "grep" to search for the keyword "airport" within the output files from running bulk_extractor
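The same keyword search can also be scripted, which helps when the hits need to be processed further. The Python sketch below assumes that each non-comment line of a bulk_extractor text file has tab-separated fields in the order location, extracted value, and surrounding context, and that comment lines start with "#". Check this layout against your own output before relying on it. The names `Feature`, `parse_features`, and `grep_features` are hypothetical, and the sample lines are made up rather than taken from the AirAsia data.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple


class Feature(NamedTuple):
    location: str  # assumed first column: where the value was found
    value: str     # assumed second column: the extracted value
    context: str   # assumed third column: surrounding bytes, may be empty


def parse_features(lines: Iterable[str]) -> Iterator[Feature]:
    """Parse tab-separated lines, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) < 2:
            continue  # not in the assumed layout; ignore
        context = parts[2] if len(parts) > 2 else ""
        yield Feature(parts[0], parts[1], context)


def grep_features(lines: Iterable[str], keyword: str) -> List[Feature]:
    """Case-insensitive keyword match on the extracted value only."""
    needle = keyword.casefold()
    return [f for f in parse_features(lines) if needle in f.value.casefold()]


if __name__ == "__main__":
    # Made-up lines in the assumed layout.
    sample = io.StringIO(
        "# Feature-Recorder: domain\n"
        "dump/flights.csv-1024\tklia-airport.example.com\thttp://klia-airport.example.com/x\n"
        "dump/staff.csv-88\tmail.example.org\tuser@mail.example.org\n"
    )
    for feature in grep_features(sample, "airport"):
        print(feature.location, feature.value)
```

Unlike "grep -i", which matches anywhere on the line, this sketch matches only the extracted value, so a keyword that appears solely in the context column is not reported. For a real file, pass an open handle such as `open("domain.txt", encoding="utf-8", errors="replace")`.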
Agent Ransack
Agent Ransack is a free software utility designed to help you find files on your computer. It's particularly useful for locating files that the built-in Windows search tool might miss.
Here's a breakdown of what Agent Ransack offers:
Thorough Searching: It goes beyond just file names and allows you to search within the contents of the files themselves.
Customization: You can refine your searches using various filters like file size, date modified, and even regular expressions for complex searches.
Speed and Efficiency: It's known for its fast search capabilities, saving you time rummaging through your hard drive.
User-friendly Interface: Despite its advanced features, the interface remains relatively simple to navigate.
Although Agent Ransack is free to use, a paid Pro version called FileLocator Pro offers additional functionality. As a demonstration, Agent Ransack was used to search the download directory containing the AirAsia breached data. It highlights keyword hits and shows the file in which each keyword was found (Image5).
Image5. Searching the AirAsia breached data for a specific keyword (e.g. airport)
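For readers without access to a Windows machine, the kind of content search shown above can be roughly approximated in a few lines of Python. This is a minimal sketch and not how Agent Ransack works internally. The function name `search_tree` is hypothetical, files are read as text with undecodable bytes replaced, and the demo runs against a temporary directory with made-up content.

```python
import os
import re
import tempfile
from typing import Iterator, Tuple


def search_tree(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file path, line number, line) for every line matching pattern.

    The pattern is a regular expression and is matched case-insensitively.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.strip()
            except OSError:
                continue  # unreadable file; skip it and keep searching


if __name__ == "__main__":
    # Build a tiny made-up directory tree to search.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as handle:
            handle.write("nothing here\nKLIA Airport terminal\n")
        for path, number, line in search_tree(root, r"air\s*port"):
            print(os.path.basename(path), number, line)
```

Reading every file as text is a simplification. Binary formats such as spreadsheets or archives would need dedicated parsing before their contents could be searched this way.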
Conclusion
Bulk_extractor and Agent Ransack are useful tools that cyber defenders can use to conduct quick triage on breached data and identify exposed information.