Computer Forensic Systems

GenText Processing Software

Power and Performance
GenText is a state-of-the-art product, developed over many years by our team of in-house software engineers. It is used on a daily basis in our data recovery and forensic laboratories. It is also used by Government organisations worldwide.

Major features are:

  • 32-bit application – modern, efficient and user-friendly
  • High-performance – maximises the benefit of modern machines
  • Extremely fast processing times – gigabytes of data can be processed in minutes (the actual time depends on the amount and type of data present on the image and the options selected for processing)
  • Unattended operation
Efficient investigation of an image depends on powerful, accurate and fast initial processing of the image file content. The first stage is to run GenX; the next is to run GenText.

What does GenText do?
GenText is a powerful 32-bit application with two main functions:

  • Extraction of textual data from all areas within an image
  • Subsequent word and number indexing of that textual data (GenX mapped output only)
For GenX mapped output, this provides GenTree, the investigation software, the means to:
  • View textual content in various ways (from each file/data source)
  • Perform very fast and flexible word searches
For GenX extracted output, GenText creates an extra set of files, one per file/area, each containing only the textual data from its source. These new files may be viewed using standard viewers. No indexing is performed.

Textual Data Extraction
Textual data extraction can be performed on GenX extracted output (live files) or mapped output (mapped files within an image).

Extracting text from a sample of data is inherently difficult, because text is only one of many kinds of encoded data. The programs that computers run are encoded (as machine code), as are graphic images, fonts, resource files and so on. Any data read from a disk may therefore hold information stored in any of a number of ways, and the text has to be told apart from the rest.

GenText has the ability to identify textual data from files and other areas within and outside of the file system, ignoring data used to store other types of information.

This is achieved by using the following procedure:

  • A file is opened.
  • A check is made to identify the application that produced this file.
  • If the file is identified, data is read via an appropriate conversion filter, which will convert the textual contents into the encoding scheme used by GenText.
  • Raw data is also read from the file and the encoding scheme recognised (ASCII/EBCDIC depending on the filters selected).
  • This data is then converted and filtered, removing any data that is not text.
  • This data is then passed through for indexing.
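
The procedure above can be illustrated with a minimal Python sketch. It is not GenText's implementation: the function names, the MIN_RUN threshold and the choice of EBCDIC code page 037 are assumptions made for illustration, and the signature table lists only a few well-known file signatures. The sketch identifies a file type from its leading bytes, then pulls runs of printable characters out of the raw data under an ASCII or EBCDIC interpretation.

```python
import re
from typing import List, Optional

MIN_RUN = 4  # assumed minimum run length; not a documented GenText setting

# A few well-known file signatures, used only to illustrate identification.
SIGNATURES = {
    b"%PDF": "pdf",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy compound document
    b"PK\x03\x04": "zip",
}

_ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_RUN)
_TEXT_RUN = re.compile(r"[\x20-\x7e]{%d,}" % MIN_RUN)


def identify(raw: bytes) -> Optional[str]:
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if raw.startswith(magic):
            return name
    return None


def extract_ascii(raw: bytes) -> List[str]:
    """Keep runs of printable ASCII bytes that are at least MIN_RUN long."""
    return [m.group().decode("ascii") for m in _ASCII_RUN.finditer(raw)]


def extract_ebcdic(raw: bytes) -> List[str]:
    """Decode the data as EBCDIC (code page 037), then keep printable runs."""
    decoded = raw.decode("cp037", errors="replace")
    return _TEXT_RUN.findall(decoded)
```

In this sketch a format-specific conversion filter would be chosen from the result of identify(); the raw extractors stand in for the step that reads unidentified data and discards anything that is not text.
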
Indexing
Indexing is performed on GenX mapped output.

Within the extracted textual data, any string of two or more characters is indexed, creating:

  • Word index
  • Soundex ("sounds like") index
  • Numbers index
These index files allow GenTree to search rapidly for words, sentences and numbers, using Boolean operators and many other filters, across files and any other areas that contain textual data. Time-consuming brute-force searching is not required. See GenTree for further information on the wide range of investigation options and features available.
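
To show why an index removes the need for brute-force searching, here is a small Python sketch of the three index types. It is an illustration under assumptions, not the GenText or GenTree file format: the TextIndex class, its method names and the source identifiers are hypothetical, and the "sounds like" key uses the standard American Soundex rules, which may differ from the variant the product uses.

```python
import re
from collections import defaultdict

# Standard American Soundex letter codes.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the four-character Soundex key for a word ("" if no letters)."""
    word = "".join(ch for ch in word.upper() if ch.isascii() and ch.isalpha())
    if not word:
        return ""
    key = word[0]
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:      # adjacent letters with the same code count once
                key += code
            prev = code
        elif ch not in "HW":      # a vowel separates repeated codes; H and W do not
            prev = ""
    return (key + "000")[:4]


class TextIndex:
    """Hypothetical in-memory word, Soundex and number indexes."""

    def __init__(self):
        self.words = defaultdict(set)
        self.sounds = defaultdict(set)
        self.numbers = defaultdict(set)

    def add(self, source_id, text):
        for token in re.findall(r"[A-Za-z]+|\d+", text):
            if len(token) < 2:    # only strings of two or more characters
                continue
            if token.isdigit():
                self.numbers[token].add(source_id)
            else:
                self.words[token.lower()].add(source_id)
                self.sounds[soundex(token)].add(source_id)

    def search_all(self, *terms):
        """Boolean AND: sources that contain every term."""
        hits = [self.words.get(t.lower(), set()) for t in terms]
        return set.intersection(*hits) if hits else set()

    def sounds_like(self, term):
        return set(self.sounds.get(soundex(term), set()))
```

Each lookup is a dictionary access followed by set operations, so the cost of a search depends on the number of matching sources rather than on the size of the image.
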

Flexibility and Control
The GenText processing software gives the user access to a number of powerful processing options. The user can choose how strict GenText is when extracting textual data from a file/data source. These text extraction rules are particularly useful when the source is text interspersed with binary data, because GenText can extract text from these areas using context-sensitive methods.
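
One way to picture adjustable strictness is as a set of thresholds applied to each candidate run of text. The Python sketch below is a simplification under stated assumptions: the ExtractionRules class, its two parameters and the LENIENT and STRICT presets are hypothetical, and it makes no attempt to reproduce GenText's context-sensitive behaviour.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionRules:
    """Hypothetical strictness settings for raw text extraction."""
    min_run: int             # shortest run of printable bytes to keep
    min_letter_ratio: float  # share of letters and spaces required in a run


LENIENT = ExtractionRules(min_run=2, min_letter_ratio=0.0)
STRICT = ExtractionRules(min_run=5, min_letter_ratio=0.8)


def extract(raw, rules):
    """Return the printable ASCII runs in raw that satisfy the rules."""
    kept = []
    for run in re.findall(rb"[\x20-\x7e]+", raw):
        text = run.decode("ascii")
        letters = sum(ch.isalpha() or ch == " " for ch in text)
        if (len(text) >= rules.min_run
                and letters / len(text) >= rules.min_letter_ratio):
            kept.append(text)
    return kept
```

With data such as b"\x00\x13meet at noon\x00\x7f$9#k\x01ok\x02", the lenient preset keeps every printable run, including the noise "$9#k", while the strict preset keeps only "meet at noon". Stricter rules reduce noise in the index at the risk of dropping short genuine fragments.
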

A wide range of further options are available, including:

  • Specific word indexing inclusion/exclusion lists
  • Specific file inclusion/exclusion lists
  • File system area selection
  • Handling of European character sets
  • Handling of Unicode data
  • Comprehensive logging
The next step of the process is to run GenTree, the investigation software, which allows the investigation of images in a fraction of the time associated with conventional methods of data investigation.

UK +44 (0) 1869 355255
Freephone 0800 581263
investigate@vogon.co.uk

USA +1 405 321 2585
Toll Free 1-800 392-5373
investigate@vogon.us

München +49 (0) 89 3235030
Köln +49 (0) 2203 91547 400
Freecall 00800 42424200
investigate@vogon.de

Norway +47 2337 1400
Freecall 00800 42004242
etterforskning@vogon.no

Copyright Vogon International. All rights reserved.  