PHPWordLib version 2.0

PHPWordLib is a PHP library that converts MS Word (.DOC) and Rich Text Format (.RTF or .DOC) files to plain text. The library is self-contained and needs no external libraries or applications in order to run. It exposes two simple functions: LoadFile and GetPlainText. The library performs all necessary checks internally, and its functions return FALSE if something seems to be wrong with the input file.

Features

PHPWordLib provides the following functionality:

  • Supports Rich Text Format (RTF), including international character sets.
  • Supports MS Word (DOC) files, although support is currently incomplete: character sets and complex files are not fully handled.
  • Automatic detection of whether the input file is in a supported format: MS Word 97/2000/XP/2003 or newer; RTF format 1.0 or 1.5.
  • Smart text rendering engine, which tries to present the plain text output as accurately as possible.
  • Easy to use - just two function calls needed to convert a whole DOC/RTF file.
  • No external libraries, applications, etc. needed!
  • Runs on all platforms where you can run PHP, such as: Linux, any Unix, Windows, MacOS, etc.
  • It's fast! Usually much faster than MS Word itself when saving to a plain text file!
  • Ideal to use as an indexing tool for your search engine.

Requirements

Since the software is written entirely in PHP, the requirements below apply to your server. Make sure your server meets them before attempting to install this software. If you are unsure, you are strongly advised to contact your server administrator or the support staff at your web hosting company.

Server requirements:

  • Operating system: any.
  • Scripting support: PHP version 4.1.2 or later.
  • Minimum Web space: around 100KB to extract everything.
  • Zend Optimizer is absolutely required, since PHPWordLib is distributed in encoded form!
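
The PHP version requirement above can be checked from a script before the library is loaded. The following is only a minimal sketch, not part of PHPWordLib: the constant PHPWORDLIB_MIN_PHP and the function meetsMinimumPhp are hypothetical names introduced here for illustration. It relies on PHP's built-in version_compare function and PHP_VERSION constant, and it does not check for Zend Optimizer.

```php
<?php
// Minimum PHP version taken from the requirements list above.
// The constant name is hypothetical, not something PHPWordLib defines.
define('PHPWORDLIB_MIN_PHP', '4.1.2');

// Returns TRUE when $current is at least $required.
// version_compare understands dotted version strings such as "5.2.17".
function meetsMinimumPhp($current, $required)
{
    return version_compare($current, $required, '>=');
}

// Stop early with a readable message if the running PHP is too old.
if (!meetsMinimumPhp(PHP_VERSION, PHPWORDLIB_MIN_PHP)) {
    die('PHP ' . PHPWORDLIB_MIN_PHP . ' or later is required, found ' . PHP_VERSION . "\n");
}
```

Running such a check at the top of an installation or test script gives a clear error message instead of a parse or load failure later on.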

Library Usage

PHPWordLib is organised as a class library and before usage must be initialised. For complete information on the usage of the library and an example of how to use it, please refer to test.php. The following two functions are the interface of PHPWordLib:

  • Function: LoadFile.
  • Input: Filename of the file to convert. The extension is not checked and can be anything. Expects a valid MS Word 97/2000/XP/2003 or RTF file.
  • Returns: The contents of the specified file. Returns FALSE on failure to load the file or to identify it as valid content.
  • Notes: You can use your own function to load the binary file, but you will lose the format validity check that this function performs.
  • Example:
    $contents = $phpwordlib->LoadFile("bginfo.doc");
  • Function: GetPlainText
  • Input: Contents of a valid MS Word 97/2000/XP/2003 or RTF file.
  • Returns: Plain text generated on the basis of the input contents.
  • Notes: The function automatically detects the format of the file and converts it accordingly. Returns FALSE on failure to identify the contents as valid.
  • Example:
    $plaintext = $phpwordlib->GetPlainText($contents);
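
Since both functions signal failure by returning FALSE, a caller will usually chain them and check each result with a strict comparison. The following is a minimal sketch under stated assumptions, not the library's own sample code: the helper convertToPlainText and the class StubWordLib are hypothetical names introduced here. The stub only mimics the documented return behaviour so that the snippet runs without the encoded library; for the real initialisation, see test.php.

```php
<?php
// Hypothetical stand-in for the real library object, so this sketch runs on
// its own. It mimics only the documented contract: FALSE on invalid input.
class StubWordLib
{
    function LoadFile($filename)
    {
        // The real library validates the DOC/RTF format here; this stub
        // only checks that the file exists and is not empty.
        if (!is_file($filename)) {
            return false;
        }
        $contents = file_get_contents($filename);
        if ($contents === false || $contents === '') {
            return false;
        }
        return $contents;
    }

    function GetPlainText($contents)
    {
        // Stub behaviour: anything not starting with "{\rtf" is invalid.
        if (strncmp($contents, '{\\rtf', 5) !== 0) {
            return false;
        }
        return 'stub plain text';
    }
}

// Hypothetical helper: chains the two documented calls.
// Returns the plain text, or FALSE if either step fails.
function convertToPlainText($lib, $filename)
{
    $contents = $lib->LoadFile($filename);
    if ($contents === false) {
        return false; // file could not be loaded or identified
    }
    return $lib->GetPlainText($contents);
}

// Usage: compare with === so that an empty string is not mistaken for failure.
$text = convertToPlainText(new StubWordLib(), 'bginfo.doc');
if ($text === false) {
    echo "Conversion failed\n";
} else {
    echo $text, "\n";
}
```

With the real library, the StubWordLib object would be replaced by the initialised PHPWordLib object; the helper itself relies only on the two function names documented above.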