Text Precis Utility

Example Text Analysis

Version: 0.1
Date: 21 Sep 2002
Author: Hugh Miles (hugh_miles@users.sourceforge.net)

1. Example Text

Here's some text, taken from an article on Linux in ZDNet.

Until Linux's popularity blossomed, language selection had a hand in dictating operating system. If you were writing Java code, it most likely would run on Sun's Solaris platform to have access to the best tools and to get the best performance. Of course, developer kit ports are available. But when given the choice between using such a utility on the primary platform or one it was migrated to, your best bet is to stick with the native implementation. With major vendors now supporting Linux, reliable tools are increasingly becoming available for the OS.

In addition to proprietary software, the GNU Foundation and other open source advocates have delivered developer solutions that challenge even the best-funded proprietary rivals. In fact, Bugzilla, Concurrent Versioning System (CVS), and a variety of popular IDEs have been created natively for Linux and ported elsewhere, giving the Linux trend momentum from another direction. Some of these solutions have overtaken expensive competitors, leaving little room for doubt when it comes to supporting developer needs.

As for ease of use, Linux was criticised for the longest time because its windowless environment had a high learning curve for converted Windows and Macintosh users. Today, however, several Linux windowing environments will make any developer feel at home.

Finally, Linux was originally created as one developer's project, in reaction to MINIX, the first open source operating system. Since that time, literally thousands of developers have added their experience to the further evolution of Linux to make it the most actively maintained system in the world. Ultimately, Linux was created by developers, for developers.

2. Sentence analysis

The text is first separated into sentences. A technique as simple as looking for a full stop (period), question mark or exclamation mark is enough.

Sentence # Text
1 Until Linux's popularity blossomed, language selection had a hand in dictating operating system.
2 If you were writing Java code, it most likely would run on Sun's Solaris platform to have access to the best tools and to get the best performance.
3 Of course, developer kit ports are available.
4 But when given the choice between using such a utility on the primary platform or one it was migrated to, your best bet is to stick with the native implementation.
5 With major vendors now supporting Linux, reliable tools are increasingly becoming available for the OS.
etc...
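The splitting step can be sketched in a few lines of Python. This is a minimal illustration, not the utility's actual code: the function name `split_sentences` is hypothetical, and it assumes a sentence ends at a full stop, question mark or exclamation mark followed by whitespace. Abbreviations such as "e.g." would be split incorrectly by so simple a rule.

```python
import re

# A sentence boundary is whitespace that follows '.', '?' or '!'.
_SENTENCE_END = re.compile(r"(?<=[.?!])\s+")

def split_sentences(text):
    """Return the sentences in `text`, in order, without surrounding whitespace."""
    parts = _SENTENCE_END.split(text.strip())
    return [part for part in parts if part]

# Two sentences taken from the example text above.
sample = (
    "Of course, developer kit ports are available. "
    "With major vendors now supporting Linux, reliable tools are "
    "increasingly becoming available for the OS."
)
for number, sentence in enumerate(split_sentences(sample), start=1):
    print(number, sentence)
```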

3. Term analysis

Each sentence is analysed for the terms used in it. Here the algorithm requires a bit of intelligence to spot variants: "Linux's" and "Linux", or "developer", "developer's" and "developers". Also, there's a list of "noise" terms which should be disregarded: "until", "had", "a", "in", "if", "you", "is", etc. Here's a term analysis done by hand:

Term Used in sentence #
blossom 1
language 1
linux 1, 5, 7, 9, 10, 11, 12, 13
popular 1, 7
select 1
etc...

Thus, looking at the use of "popular", sentence #1 has a forward reference to sentence #7 and sentence #7 has a backward reference to sentence #1.
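A rough sketch of the term analysis in Python follows. Everything here is an assumption made for illustration: the names `NOISE`, `SUFFIXES`, `normalise` and `term_index` are hypothetical, the noise list extends the few words named above with some obvious extras, and the suffix stripping is a crude stand-in for the "bit of intelligence" the text asks for. A proper stemmer would give cleaner terms.

```python
import re
from collections import defaultdict

# Illustrative noise list; only some of these words are named in the text.
NOISE = {"until", "had", "a", "in", "if", "you", "is", "the", "of", "to", "and"}

# Crude suffix list, checked in this order. Not a real stemmer.
SUFFIXES = ("ity", "ion", "ing", "ed", "s")

def normalise(word):
    """Lower-case a word, drop a possessive 's, and strip one common suffix."""
    word = word.lower()
    if word.endswith("'s"):
        word = word[:-2]
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def term_index(sentences):
    """Map each term to the sorted 1-based numbers of the sentences using it."""
    index = defaultdict(set)
    for number, sentence in enumerate(sentences, start=1):
        for word in re.findall(r"[A-Za-z]+(?:'s)?", sentence):
            if word.lower() in NOISE:
                continue
            index[normalise(word)].add(number)
    return {term: sorted(numbers) for term, numbers in index.items()}

# Two sentences from the example text, numbered 1 and 2 within this list.
sentences = [
    "Until Linux's popularity blossomed, language selection had a hand "
    "in dictating operating system.",
    "With major vendors now supporting Linux, reliable tools are "
    "increasingly becoming available for the OS.",
]
index = term_index(sentences)
print(index["linux"])
print(index["blossom"])
```

The crude suffix rule happens to reproduce "blossom", "popular" and "select" from the hand analysis, but it also yields stems such as "dictat" that a real stemmer would handle better.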

4. Counting references

The forward and backward references are then counted for each sentence:

Sentence # Forward references Backward references
1 5 0
2 2 0
3 8 0
4 1 1
5 5 3
6 5 1
7 5 4
8 4 3
9 4 3
10 3 7
11 2 8
12 1 9
13 0 10

Not surprisingly, the earlier sentences have a preponderance of forward references; the later sentences have a preponderance of backward references.
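The counting step might look like the Python sketch below. The text does not say whether a pair of sentences sharing several terms counts once or once per term, so this sketch assumes one reference per distinct linked sentence; under a different reading the totals would differ, and it is not guaranteed to reproduce the hand-built table. The function name `count_references` is hypothetical.

```python
def count_references(index, sentence_count):
    """Return {sentence number: (forward, backward)} from a term index.

    Assumption: a sentence gets one forward reference for each distinct
    later sentence sharing at least one term with it, and one backward
    reference for each distinct earlier sentence.
    """
    linked = {n: set() for n in range(1, sentence_count + 1)}
    for numbers in index.values():
        for n in numbers:
            linked[n].update(m for m in numbers if m != n)
    return {
        n: (sum(m > n for m in others), sum(m < n for m in others))
        for n, others in linked.items()
    }

# Only the "popular" row of the term analysis, for a 13-sentence text.
counts = count_references({"popular": [1, 7]}, 13)
print(counts[1])  # (1, 0): one forward reference, to sentence 7
print(counts[7])  # (0, 1): one backward reference, to sentence 1
```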

5. Selection

Using the table above, I select sentences 7, 8 and 9 as having a balance of forward and backward references. Joined together, they form the precis:

In fact, Bugzilla, Concurrent Versioning System (CVS), and a variety of popular IDEs have been created natively for Linux and ported elsewhere, giving the Linux trend momentum from another direction. Some of these solutions have overtaken expensive competitors, leaving little room for doubt when it comes to supporting developer needs. As for ease of use, Linux was criticised for the longest time because its windowless environment had a high learning curve for converted Windows and Macintosh users.
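The selection above was made by eye, and the text gives no formula for "balance". One possible rule, offered purely as an assumption, ranks sentences by the smaller of their two counts and breaks ties by the smaller difference between them. The name `select_balanced` and the `how_many` parameter are hypothetical.

```python
# Counts copied from the table in section 4: sentence -> (forward, backward).
COUNTS = {
    1: (5, 0), 2: (2, 0), 3: (8, 0), 4: (1, 1), 5: (5, 3), 6: (5, 1),
    7: (5, 4), 8: (4, 3), 9: (4, 3), 10: (3, 7), 11: (2, 8), 12: (1, 9),
    13: (0, 10),
}

def select_balanced(counts, how_many=3):
    """Pick the sentences with the best balance, returned in document order.

    Assumed rule: prefer a high min(forward, backward), then a small
    difference between the two, then the earlier sentence.
    """
    def key(number):
        forward, backward = counts[number]
        return (-min(forward, backward), abs(forward - backward), number)

    return sorted(sorted(counts, key=key)[:how_many])

print(select_balanced(COUNTS))  # [7, 8, 9]
```

With the table's figures this rule happens to agree with the hand selection of sentences 7, 8 and 9; other reasonable rules could choose differently.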

