Text Precis Utility

Project Overview

Version 0.2
Date: 20 Nov 2002
Author: Hugh Miles (hugh_miles@users.sourceforge.net)

1. Introduction

This utility creates an automatic summary of a text passage by using its internal references. The project stems from a talk I once heard on natural language processing, in which the algorithm used here was one of the examples.

A text passage that runs to several sentences will contain implicit references because the same nouns and verbs are used in more than one sentence. Each sentence has a number of implicit forward references (to later sentences) and a number of implicit backward references (to earlier sentences). A passable summary can be made by listing the three sentences with the best balance of forward and backward references.

Here's an example.

A project of this nature is necessarily heavily tied to a specific natural language. Variants of noun and verb forms must be recognised. Sentences must be recognised and enumerated. Part of the goal is to abstract those parts of the algorithm which are common across target languages so that, at a later stage, languages other than English may be served.
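
As a rough illustration of the scoring idea described in this section, here is a minimal Python sketch. It is not the project's implementation. The function names, the stop-word list, the regular-expression sentence splitter and the reading of "best balance" as the smaller of the forward and backward counts are all assumptions made for the example, and variants of noun and verb forms are not handled at all.

```python
import re

# Hypothetical stop-word list; the project does not specify one.
STOP_WORDS = {"a", "an", "and", "at", "the", "of", "to", "in", "is", "it", "on", "for"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(sentence):
    """Lower-cased words minus stop words (no stemming of variants)."""
    found = re.findall(r"[a-z']+", sentence.lower())
    return {w for w in found if w not in STOP_WORDS}


def summarise(text, count=3):
    """Return the `count` best-balanced sentences in their original order."""
    sentences = split_sentences(text)
    words = [content_words(s) for s in sentences]
    scored = []
    for i, current in enumerate(words):
        # Backward references: words shared with earlier sentences.
        backward = sum(len(current & earlier) for earlier in words[:i])
        # Forward references: words shared with later sentences.
        forward = sum(len(current & later) for later in words[i + 1:])
        # Assumed reading of "balance": the sentence must be strong both ways.
        scored.append((min(forward, backward), i))
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:count]
    return [sentences[i] for _, i in sorted(best, key=lambda pair: pair[1])]
```

Calling `summarise(passage)` returns up to three sentences. Under this scoring the first and last sentences of a passage always score zero, since they have no backward or forward references respectively, so a different definition of balance may well suit real texts better.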

2. Project Methodology

This project will follow an iterative development style. Each iteration will have a set of goals and a QA pass at the end. This style of project has worked for me in the past and keeps quality high throughout.

2.1 Project Stages

Here's a first cut at what the iterations should encompass. I may adjust this after each iteration.

Prototype: Build enough of the system to demonstrate the algorithm working against example texts. Use British English only for grammar rules.
Finish API: Decouple the API from the implementation. Specifically, allow for different grammar rules and different natural languages. Use British English and US English.
Email Plugin: Write an adapter to filter an email inbox and present the summary as part of the inbox listing.
Beta Test: Complete test cases.
First Release: QA, document and package for email clients.
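
To give a feel for what decoupling the API from the grammar rules might look like, here is a small Python sketch. Everything in it is hypothetical: the `Grammar` interface, its method names, the two classes and the tiny variant tables are invented for illustration and are not taken from the project's actual design.

```python
import re
from typing import List, Protocol, Set


class Grammar(Protocol):
    """Hypothetical interface: everything the summariser asks of a language."""

    def sentences(self, text: str) -> List[str]: ...

    def base_form(self, word: str) -> str: ...


class BritishEnglish:
    # Tiny illustrative table; real rules would cover far more forms.
    VARIANTS = {"recognised": "recognise", "recognises": "recognise"}

    def sentences(self, text: str) -> List[str]:
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    def base_form(self, word: str) -> str:
        word = word.lower()
        return self.VARIANTS.get(word, word)


class USEnglish(BritishEnglish):
    # Same sentence rules, different spelling variants.
    VARIANTS = {"recognized": "recognize", "recognizes": "recognize"}


def sentence_words(grammar: Grammar, text: str) -> List[Set[str]]:
    """Language-neutral step: only calls methods declared on Grammar."""
    return [
        {grammar.base_form(w) for w in re.findall(r"[A-Za-z']+", s)}
        for s in grammar.sentences(text)
    ]
```

The point of the sketch is that `sentence_words` never mentions a specific language, so supporting another one would mean writing another class with the same two methods rather than changing the summarising code.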

2.2 Project Stage Deliverables

At each stage I will expect to complete a functional specification and a set of unit test cases.

Prototype Functional Specification
