Version | 0.2 |
Date | 20 Nov 2002 |
Author | Hugh Miles (hugh_miles@users.sourceforge.net) |
Create an automatic summary of a text passage by using its internal references. This project stems from a talk I once heard on natural language processing. The algorithm used in this utility was one of the examples.
A text passage that runs to several sentences will contain implicit references
because nouns and verbs are used in more than one sentence. Each sentence
has a number of implicit forward references and a number of implicit backward
references. A passable summary can be made by listing the three sentences
with the best balance of forward and backward references.
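As a rough illustration of the idea, here is a minimal Python sketch. It is not the project's implementation, and several details are assumptions made for the example: a "reference" is counted as any non-stop word shared between two sentences, "best balance" is read as the larger of the two smaller counts (the minimum of forward and backward references), and the names `STOP_WORDS`, `split_sentences`, `content_words` and `summarise` are hypothetical. No noun or verb recognition is attempted.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on", "at"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased words minus stop words; word variants are not merged."""
    found = re.findall(r"[a-z']+", sentence.lower())
    return {w for w in found if w not in STOP_WORDS}


def summarise(text, count=3):
    """Return up to `count` sentences, in their original order."""
    sentences = split_sentences(text)
    words = [content_words(s) for s in sentences]
    scores = []
    for i, current in enumerate(words):
        # Backward references: words shared with each earlier sentence.
        backward = sum(len(current & earlier) for earlier in words[:i])
        # Forward references: words shared with each later sentence.
        forward = sum(len(current & later) for later in words[i + 1:])
        # Assumed reading of "best balance": both counts must be high.
        scores.append(min(forward, backward))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]


text = "The cat sat. The cat ate fish. Fish swim fast. Dogs bark."
print(summarise(text, count=1))
```

Under this scoring the first and last sentences of a passage always score zero, since they have no backward or forward references respectively. A different reading of "balance" would rank them differently.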
Here's an example.
A project of this nature is necessarily heavily tied to a specific natural language. Variants of noun and verb forms must be recognised. Sentences must be recognised and enumerated. Part of the goal is to abstract those parts of the algorithm which are common across target languages so that, at a later stage, languages other than English may be served.
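One possible way to separate the language-specific parts from the common algorithm is sketched below in Python. Everything here is an assumption for illustration: the `LanguageRules` class, its fields, and the tiny variant tables are hypothetical and do not describe the project's actual API. The rules object owns sentence recognition and word-variant folding, so the scoring code would only ever see enumerated sentences and normalised words.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LanguageRules:
    """Hypothetical bundle of the language-specific pieces."""

    # Regular expression marking the boundary between two sentences.
    sentence_end: str = r"(?<=[.!?])\s+"
    # Maps a variant word form to its base form, e.g. a plural to a singular.
    variants: dict = field(default_factory=dict)

    def sentences(self, text):
        """Recognise and enumerate sentences as (index, sentence) pairs."""
        parts = [s.strip() for s in re.split(self.sentence_end, text) if s.strip()]
        return list(enumerate(parts))

    def normalise(self, word):
        """Fold a word to lower case and replace known variants."""
        word = word.lower()
        return self.variants.get(word, word)


# Illustrative rule sets only; real tables would be far larger.
BRITISH_ENGLISH = LanguageRules(
    variants={"recognised": "recognise", "sentences": "sentence"}
)
US_ENGLISH = LanguageRules(
    variants={"recognized": "recognize", "sentences": "sentence"}
)

print(BRITISH_ENGLISH.sentences("Sentences must be recognised. Then enumerated!"))
print(US_ENGLISH.normalise("Recognized"))
```

Supporting another language would then mean supplying a new `LanguageRules` instance rather than changing the summarising code.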
This project will follow an iterative development style. Each iteration
will have a set of goals and some QA at the end. This style of project has
worked for me in the past and gives a high level of quality throughout.
Here's a first cut at what the iterations should encompass. I may adjust
this after each iteration.
Iteration | Goals |
---|---|
Prototype | Build enough of the system to demonstrate the algorithm working against example texts. Use British English only for grammar rules. |
Finish API | Decouple the API from the implementation. Specifically, allow for different grammar rules and different natural languages. Use British English and US English. |
Email Plugin | Write an adapter to filter an email inbox and present the summary as part of the inbox listing. |
Beta Test | Complete test cases. |
First Release | QA, document and package for email clients. |
At each stage I will expect to complete a functional specification and a set of unit test cases.
Prototype | Functional Specification |
---|---|