Content Inspection Engine (SDK)

At the heart of every Clearswift product sits a high-performance content inspection engine that provides comprehensive data recognition and thorough content processing, allowing:
  • Data recognition using true-file typing, not simply extension-based recognition. Over 150 common formats are recognised accurately.
  • Data integrity
  • Data decomposition of nested and compressed files, including large files up to 16GB, extracting the contained files for subsequent analysis
  • Text extraction from standard office file formats, including MS Office, OpenOffice, PDF and HTML
  • Recognition of known patterns, including credit-card numbers, IBAN numbers, social security numbers (US) and NI numbers (UK)
  • Each expression can have its own weighting
  • Logical operators: AND, OR, XOR, ANDNOT
  • Proximity operators: NEAR, BEFORE, AFTER, FOLLOWEDBY
  • Search within body, headers, footers, meta-data or whole document
  • Independent of language, character set or document format
  • Active content detection recognising macros and scripts in Office and PDF formats
  • Malware detection, including interfaces to third-party AV engines
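
As a rough illustration of how weighted expressions, a logical operator and a proximity operator might combine into a single score, consider the Python sketch below. It is not the Clearswift API: the names `luhn_ok`, `near` and `score`, the weights and the sample keywords are all hypothetical and chosen only for illustration.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0

def near(text: str, a: str, b: str, distance: int = 5) -> bool:
    """Hypothetical NEAR operator: words a and b occur within `distance` words."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

def score(text: str) -> int:
    """Sum the weights of every expression that matches (weights are illustrative)."""
    total = 0
    # Known pattern: a card number that also passes the checksum, weight 10.
    if any(luhn_ok(m.group()) for m in CARD_RE.finditer(text)):
        total += 10
    # Proximity: "confidential" NEAR "salary", weight 5.
    if near(text, "confidential", "salary"):
        total += 5
    # Logical: "confidential" ANDNOT "draft", weight 3.
    has_conf = re.search(r"\bconfidential\b", text, re.I) is not None
    has_draft = re.search(r"\bdraft\b", text, re.I) is not None
    if has_conf and not has_draft:
        total += 3
    return total
```

A calling application could compare the returned score against a threshold of its own choosing to decide whether a document needs further handling.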
This technology is central to every Clearswift product. Companies in all vertical markets around the world use it to ensure regulatory compliance, prevent leakage of sensitive or classified information and detect inappropriate communication.

System integrators or software vendors that want to build new products and services, or extend the functionality of their own applications, can add this capability simply by using the Clearswift Content Inspection Engine (SDK).

Other potential uses include:

  • Managed file transfer
  • Email scanning
  • Web scanning
  • Data at rest scanning
  • Instant Messaging
  • Cross-domain solutions

SDK availability and features:

  • 32-bit and 64-bit
  • Windows 2003 and 2008
  • Red Hat 5.6 and 6.1
  • Supports C, C++ and Java APIs
  • Multi-threaded
  • Client and server architectures

Case studies

A number of third parties are already enhancing their applications with the Clearswift Content Inspection Engine (SDK).

Information governance

A system integrator incorporated the Policy Engine into its document workflow suite, so that employees must register documents in the system before sending them externally via email or web. The SDK is used to process and index files during registration, and to check them again on transfer to ensure that they have not changed.
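
The source does not say how the "has not changed" check is performed. One common approach, shown here purely as an assumption, is to store a cryptographic fingerprint at registration and compare it at transfer time. The `Registry` class and its method names are hypothetical and are not part of the SDK.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the document content as a hex string."""
    return hashlib.sha256(data).hexdigest()

class Registry:
    """Hypothetical in-memory store of fingerprints for registered documents."""

    def __init__(self) -> None:
        self._known: dict[str, str] = {}

    def register(self, doc_id: str, data: bytes) -> None:
        # Record the fingerprint at registration time.
        self._known[doc_id] = fingerprint(data)

    def unchanged(self, doc_id: str, data: bytes) -> bool:
        # True only if the document was registered and its content is identical.
        expected = self._known.get(doc_id)
        return expected is not None and expected == fingerprint(data)
```

In this sketch an unregistered document is treated the same as a modified one, so both would be held back at the point of transfer.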

Data processing

A systems integrator needed to content-scan a large volume of data, in excess of 20GB per hour, including very large individual files (compressed files of up to 16GB), and to apply content filtering based on file name, file type, keyword search and anti-virus scanning.
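
A minimal sketch of how three of these checks (file name, file type and keyword search) might be combined is given below. It is illustrative only and does not use the SDK: the `check` function, its parameters and the default block lists are hypothetical. The file type is taken from the leading bytes of the content rather than the extension, and anti-virus scanning is omitted.

```python
import fnmatch

# A few well-known leading byte signatures; a real engine recognises far more.
MAGIC = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
    b"MZ": "exe",
}

def true_type(data: bytes) -> str:
    """Identify the format from the content's leading bytes, not the extension."""
    for signature, name in MAGIC.items():
        if data.startswith(signature):
            return name
    return "unknown"

def check(filename: str, data: bytes,
          blocked_names=("*.exe",),
          blocked_types=("exe",),
          keywords=("secret",)) -> list[str]:
    """Return the list of rules the file violates (an empty list means it passes)."""
    reasons = []
    if any(fnmatch.fnmatch(filename.lower(), pattern) for pattern in blocked_names):
        reasons.append("name")
    if true_type(data) in blocked_types:
        reasons.append("type")
    # Naive keyword search over the raw bytes; real text extraction is format-aware.
    text = data.decode("utf-8", errors="ignore").lower()
    if any(keyword in text for keyword in keywords):
        reasons.append("keyword")
    return reasons
```

For example, an executable renamed to `report.pdf` passes the name rule here but is still caught by the type rule, because its content begins with the `MZ` signature.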