File Hidden Data & Metadata Frequently Asked Questions


What Are File Metadata And Hidden Data?

The word metadata means "data about other data". It is a general term whose exact meaning depends on the context. In the context of electronic files, file metadata is commonly embedded in many types of files and contains various types of information, such as the name of the file creator and the organization they belong to, the creation and modification times, and the device or software used to create and process the file. This metadata is usually generated automatically by the software or device used to create the file, often without the user even being aware of it.

File metadata should be distinguished from file system metadata. Whereas the former is embedded in the file itself, the latter is maintained by the file system and is not part of the file's contents. Therefore, file system metadata is not a cause for concern when sharing files. When inspecting a file with the built-in File Properties viewer of Windows, all the properties under the "File" section in the "Details" tab are harmless file system metadata, including "Owner", "Computer", "Folder path", "Date created", and "Date modified".
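
The distinction can be illustrated with a minimal Python sketch that uses only the standard library. The helper names file_system_metadata and read_bytes are hypothetical and chosen for illustration. The sketch changes a file's modification time and confirms that the file's bytes stay the same, since the timestamp is kept by the file system rather than inside the file.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_system_metadata(path):
    """Return values the file system keeps about `path` (not the file's bytes)."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


def read_bytes(path):
    """Return the raw contents of the file."""
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "note.txt")
        with open(path, "wb") as handle:
            handle.write(b"hello")

        before = read_bytes(path)
        # Change only the access and modification times (an arbitrary past date).
        os.utime(path, (1_000_000_000, 1_000_000_000))
        after = read_bytes(path)

        print(file_system_metadata(path))
        print("contents unchanged:", before == after)
```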

Hidden data in a file refers to any data that resides in the file but is either not visible at all in a standard viewer or visible only under certain viewer settings. In the first case, special software is needed to reveal the hidden data; in the second, it can be viewed by changing the viewer settings. Common hidden data types include comments, document revision history, and presentation notes. Many applications also embed various application-specific hidden data.

Strictly speaking, in most applications file metadata is just one type of hidden data. However, the two terms are often used interchangeably.
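
As an illustration, a Word .docx file is a ZIP package in which the core properties are stored in docProps/core.xml and comments in word/comments.xml. The following Python sketch, which uses only the standard library, reports the author fields and whether a comments part is present. The function name inspect_docx is hypothetical, the sample package built in the demo is a stand-in rather than a valid document, and the check covers only a small sample of the hidden data a document may hold.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office Open XML package.
DC_NS = "{http://purl.org/dc/elements/1.1/}"
CP_NS = "{http://schemas.openxmlformats.org/package/2006/metadata/core-properties}"


def inspect_docx(source):
    """Report a few hidden data items; `source` is a path or a file-like object."""
    report = {"creator": None, "last_modified_by": None, "has_comments": False}
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
        report["has_comments"] = "word/comments.xml" in names
        if "docProps/core.xml" in names:
            root = ET.fromstring(package.read("docProps/core.xml"))
            creator = root.find(DC_NS + "creator")
            editor = root.find(CP_NS + "lastModifiedBy")
            if creator is not None:
                report["creator"] = creator.text
            if editor is not None:
                report["last_modified_by"] = editor.text
    return report


if __name__ == "__main__":
    # Build a tiny stand-in package in memory; a real .docx has many more parts.
    core = (
        '<cp:coreProperties '
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:creator>Alice</dc:creator>"
        "<cp:lastModifiedBy>Bob</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
        package.writestr("word/comments.xml", "<comments/>")
    buffer.seek(0)
    print(inspect_docx(buffer))
```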

Where Can File Metadata And Hidden Data Be Found?

Virtually every popular file format contains potentially sensitive and privacy-compromising hidden data and metadata, including Microsoft Word, Excel®, and PowerPoint® documents, LibreOffice® Writer, Calc, Impress, and Draw documents, and PDF documents. Metadata can also be found in various image and media file types such as JPEG, PNG, WebP, SVG, AVI, WAV, MP3, and MP4.
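
To make this concrete for one image format, a PNG file is a sequence of chunks, and textual or descriptive metadata lives in ancillary chunks such as tEXt, zTXt, iTXt, eXIf, and tIME. The Python sketch below, using only the standard library, lists those chunks in a PNG byte string. The names png_metadata_chunks and make_chunk are hypothetical, the chunk list is a partial selection rather than a complete inventory, and the demo builds its own tiny image rather than reading a real file.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# A partial selection of ancillary chunk types that carry metadata.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf", b"tIME"}


def png_metadata_chunks(data):
    """Return (chunk type, payload) pairs for metadata chunks found in `data`."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        payload = data[offset + 8:offset + 8 + length]
        if chunk_type in METADATA_CHUNKS:
            found.append((chunk_type.decode("ascii"), payload))
        offset += 12 + length
        if chunk_type == b"IEND":
            break
    return found


def make_chunk(chunk_type, payload):
    """Assemble one PNG chunk (used here only to build the demo image)."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


if __name__ == "__main__":
    # A 1x1 grayscale image with an "Author" text entry embedded in it.
    sample = (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
        + make_chunk(b"tEXt", b"Author\x00Alice")
        + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
        + make_chunk(b"IEND", b"")
    )
    for chunk_type, payload in png_metadata_chunks(sample):
        print(chunk_type, payload)
```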

It's important to know that the built-in File Properties viewer of Windows is dangerously reassuring due to its very limited support of file formats and metadata elements. Files may, and often do, contain many more properties and other types of potentially privacy-compromising hidden data than the ones shown by this viewer. Users should not rely on this viewer to assess the risk involved in sharing files.

There are some file types that cannot hold hidden data, such as plain text TXT files and BMP image files.

What Risks Do File Metadata And Hidden Data Pose?

While hidden data and metadata are useful for finding files and reviewing documents, they pose privacy and confidentiality risks when the files are shared. The hidden data often contains private and sensitive information that, if unintentionally exposed, can cause the document creator and their organization embarrassment, with possible financial and legal implications.

How To Avoid The Risks That File Metadata And Hidden Data Pose?

To avoid inadvertently divulging potentially damaging information through hidden data & metadata, files must be cleaned with reliable hidden data & metadata removal software before they are shared with others. This type of software is also known as a metadata scrubber or a metadata stripper.

When selecting a tool for removing hidden data & metadata, it is important that the tool be dedicated to the file types one wishes to clean and that it support a broad spectrum of hidden data & metadata types.

In recent years, a number of dedicated hidden data & metadata removers have been created by various developers, both as offline applications and as online web services that can be accessed from any web browser and are often free. Using online hidden data & metadata removers is generally slower than using offline applications, since it requires uploading the files to the server, waiting for them to be cleaned, and then downloading the cleaned files. Online services are also generally less convenient to use, especially for cleaning multiple files at once. When using an online service, it is also imperative to make sure that it is trustworthy and will not misuse uploaded files in a way that compromises privacy and confidentiality. Offline applications that run on the local computer are therefore preferable.

Windows® 11 comes with a built-in metadata remover - the Remove Properties and Personal Information feature. However, one should avoid using it, since it is unreliable due to its very limited support of file formats and hidden data types and its highly misleading user interface. This feature can remove only a small number of metadata properties, and it cannot remove many dangerous hidden data types at all, including document revision history and document comments. This problem also exists in many third-party tools that are advertised as metadata scrubbers, both commercial and open-source.

What Solutions Does Digital Confidence Provide?

Digital Confidence provides four main hidden data & metadata removal products:

  • ConfidentSend™ - Add-on for Microsoft Outlook® and Mozilla Thunderbird™ email clients that can remove hidden data & metadata from outgoing email attachments on-the-fly

  • BatchPurifier™ - Batch hidden data & metadata removal tool that can clean multiple files at once

  • MailValve EX™ - Hidden data & metadata removal add-in for Microsoft Exchange server

  • MailValve GX™ - Hidden data & metadata removal solution for any SMTP email server

Digital Confidence's products are the most comprehensive hidden data removal solutions on the market today, capable of permanently removing more than 60 types of hidden data from 25 file formats, including Microsoft Office® documents (Word, Excel®, and PowerPoint®), LibreOffice® documents, PDF documents, and popular image and media file types such as JPEG, JPEG 2000, PNG, SVG, AVI, WAV, AIFF, MP3, and MP4.

Thanks to their very broad spectrum of supported hidden data & metadata types, Digital Confidence's products are able to considerably minimize the amount of dangerous hidden data & metadata left in files processed by them. However, using them does not guarantee no hidden data is left in files, and does not guarantee processed files are anonymized.

Many file types are very intricate and complex, and some are also extensible. This is especially true for document file types, such as Microsoft Office® documents (Word, Excel, PowerPoint) and PDF documents. Some software might insert hidden data into the files it processes in a non-standard and unexpected manner. In some cases, one might be able to learn new information about a file and its authors by closely examining the processed file for hidden data, either from the embedded hidden data on its own or by combining it with data from other sources.

Digital Confidence is constantly working on improving its products and expanding the coverage of supported hidden data & metadata types.

Who Needs File Hidden Data And Metadata Removal Software?

In today's digital age, when the Internet is taking center stage in our personal lives and business, every person and organization should put in place safeguards to protect private and confidential information and prevent sensitive data from leaking out. For that purpose, hidden data removal software has become a necessity, just as firewalls and anti-spyware software have.

Hidden data removal software is a must-have for every business, legal professional, defense and military organization, webmaster, and privacy-conscious person who shares files with others and wishes to avoid divulging more information than intended. The importance of using hidden data & metadata removal software has been underscored in recent years by several high-profile incidents in which sensitive data was inadvertently leaked through the hidden data stored within files that were sent to other parties by email or posted on the web.
