Version | 0.1 |
Date | 21 Sep 2002 |
Author | Hugh Miles (hugh_miles@users.sourceforge.net) |
Here's some text, taken from an article on Linux in ZDNet:
Until Linux's popularity blossomed, language selection had a hand in dictating operating system. If you were writing Java code, it most likely would run on Sun's Solaris platform to have access to the best tools and to get the best performance. Of course, developer kit ports are available. But when given the choice between using such a utility on the primary platform or one it was migrated to, your best bet is to stick with the native implementation. With major vendors now supporting Linux, reliable tools are increasingly becoming available for the OS.
In addition to proprietary software, the GNU Foundation and other open source advocates have delivered developer solutions that challenge even the best-funded proprietary rivals. In fact, Bugzilla, Concurrent Versioning System (CVS), and a variety of popular IDEs have been created natively for Linux and ported elsewhere, giving the Linux trend momentum from another direction. Some of these solutions have overtaken expensive competitors, leaving little room for doubt when it comes to supporting developer needs.
As for ease of use, Linux was criticised for the longest time because its windowless environment had a high learning curve for converted Windows and Macintosh users. Today, however, several Linux windowing environments will make any developer feel at home.
Finally, Linux was originally created as one developer's project, in reaction to MINIX, the first open source operating system. Since that time, literally thousands of developers have added their experience to the further evolution of Linux to make it the most actively maintained system in the world. Ultimately, Linux was created by developers, for developers.
First, the text is split into sentences. This can be done with a technique as simple as looking for a full stop (period), question mark or exclamation mark.
Sentence # | Text
1 | Until Linux's popularity blossomed, language selection had a hand in dictating operating system.
2 | If you were writing Java code, it most likely would run on Sun's Solaris platform to have access to the best tools and to get the best performance.
3 | Of course, developer kit ports are available.
4 | But when given the choice between using such a utility on the primary platform or one it was migrated to, your best bet is to stick with the native implementation.
5 | With major vendors now supporting Linux, reliable tools are increasingly becoming available for the OS.
etc...
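A minimal Python sketch of this splitting step, assuming a sentence ends at ".", "?" or "!" followed by whitespace. The function name `split_sentences` is hypothetical. Because the rule is this simple, it would also split wrongly after abbreviations such as "e.g." or "Mr.".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text at '.', '?' or '!' when followed by whitespace."""
    # The lookbehind keeps the terminator attached to its sentence;
    # only the whitespace after it is consumed by the split.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


paragraph = (
    "Until Linux's popularity blossomed, language selection had a hand in "
    "dictating operating system. If you were writing Java code, it most likely "
    "would run on Sun's Solaris platform to have access to the best tools and to "
    "get the best performance. Of course, developer kit ports are available."
)

# Number the sentences from 1, as in the table above.
for number, sentence in enumerate(split_sentences(paragraph), start=1):
    print(number, sentence)
```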
Each sentence is then analysed for the terms it uses. Here the algorithm needs a little intelligence to recognise variants of the same term: "Linux's" and "Linux", or "developer", "developer's" and "developers". There is also a list of "noise" terms that should be disregarded: "until", "had", "a", "in", "if", "you", "is", etc. Here is a term analysis done by hand:
Term | Used in sentence #
blossom | 1
language | 1
linux | 1, 5, 7, 9, 10, 11, 12, 13
popular | 1, 7
select | 1
etc...
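A minimal sketch of the term analysis. The noise list (the article's examples plus a few other common words) and the crude suffix-stripping rule (a possessive "'s", then "ity", "ion", "ed" or "s") are assumptions chosen to fold together the variants mentioned above, such as "popularity" to "popular" and "developers" to "developer"; a fuller implementation would more likely use an established stemming algorithm such as Porter's. The names `normalise`, `terms` and `term_index` are hypothetical.

```python
import re
from collections import defaultdict

# Illustrative noise list: the article's examples plus a few common words.
NOISE = {"until", "had", "a", "in", "if", "you", "is", "an", "and", "the", "to",
         "of", "it", "its", "was", "were", "for", "on", "or", "by", "at", "with"}

# Hypothetical suffixes for a crude stemmer, tried in this order.
SUFFIXES = ("ity", "ion", "ed", "s")


def normalise(word: str) -> str:
    """Fold simple variants together, e.g. Linux's -> linux, developers -> developer."""
    word = word.lower().strip("'")
    if word.endswith("'s"):
        word = word[:-2]
    for suffix in SUFFIXES:
        # Keep a stem of at least four letters and leave words like "access" alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4 and not word.endswith("ss"):
            return word[: -len(suffix)]
    return word


def terms(sentence: str) -> set[str]:
    """Return the set of normalised, non-noise terms used in one sentence."""
    found = set()
    for token in re.findall(r"[A-Za-z']+", sentence):
        lowered = token.lower().strip("'")
        if not lowered or lowered in NOISE:
            continue  # skip empty tokens and noise words
        found.add(normalise(lowered))
    return found


def term_index(sentences: list[str]) -> dict[str, list[int]]:
    """Map each term to the 1-based numbers of the sentences that use it."""
    index = defaultdict(list)
    for number, sentence in enumerate(sentences, start=1):
        for term in terms(sentence):
            index[term].append(number)
    return dict(index)


sentences = [
    "Until Linux's popularity blossomed, language selection had a hand in dictating operating system.",
    "Ultimately, Linux was created by developers, for developers.",
]
index = term_index(sentences)
print(index["linux"], index["popular"], index["developer"])  # [1, 2] [1] [2]
```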
Thus, looking at the use of "popular", sentence #1 has a forward reference to sentence #7, and sentence #7 has a backward reference to sentence #1.
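The article explains references by example but does not spell out the exact counting rule. A minimal sketch under one assumed reading: a sentence's forward count is the number of distinct later sentences sharing at least one term with it, and its backward count is the number of distinct earlier ones, however many terms they share. The input is a term index shaped like the term table above (term mapped to sentence numbers). The function name `count_references` is hypothetical, and this rule is not guaranteed to reproduce every figure in the hand-counted table.

```python
def count_references(index: dict[str, list[int]], sentence_count: int) -> dict[int, tuple[int, int]]:
    """Return {sentence: (forward, backward)}, counting distinct linked sentences."""
    linked = {n: set() for n in range(1, sentence_count + 1)}
    for numbers in index.values():
        for n in numbers:
            # Every other sentence using this term is linked to sentence n.
            linked[n].update(m for m in numbers if m != n)
    return {
        n: (sum(1 for m in others if m > n), sum(1 for m in others if m < n))
        for n, others in linked.items()
    }


# The "popular" example: sentence 1 points forward to 7, and 7 points back to 1.
refs = count_references({"popular": [1, 7]}, 7)
print(refs[1], refs[7])  # (1, 0) (0, 1)
```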
Counting the forward and backward references for each sentence gives:
Sentence # | # of forward references | # of backward references
1 | 5 | 0
2 | 2 | 0
3 | 8 | 0
4 | 1 | 1
5 | 5 | 3
6 | 5 | 1
7 | 5 | 4
8 | 4 | 3
9 | 4 | 3
10 | 3 | 7
11 | 2 | 8
12 | 1 | 9
13 | 0 | 10
Not surprisingly, the earlier sentences have a preponderance of forward references; the later sentences have a preponderance of backward references.
Using the table above, I select sentences 7, 8 and 9 as having a balance of forward and backward references. Together they form the summary:
In fact, Bugzilla, Concurrent Versioning System (CVS), and a variety of popular IDEs have been created natively for Linux and ported elsewhere, giving the Linux trend momentum from another direction. Some of these solutions have overtaken expensive competitors, leaving little room for doubt when it comes to supporting developer needs. As for ease of use, Linux was criticised for the longest time because its windowless environment had a high learning curve for converted Windows and Macintosh users.
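The text does not define "balance" precisely. A minimal sketch of one hypothetical scoring rule, assumed here rather than taken from the article: rank each sentence by the smaller of its two counts (higher is better), break ties by the gap between the counts (smaller is better) and then by sentence number, and return the top k in document order. Applied to the hand-counted table of references above, it happens to pick the same three sentences. The function name `select_balanced` and the parameter `k` are hypothetical.

```python
def select_balanced(refs: dict[int, tuple[int, int]], k: int = 3) -> list[int]:
    """Pick k sentences whose forward and backward counts are both high and close."""
    ranked = sorted(
        refs,
        key=lambda n: (
            -min(refs[n]),                # higher minimum count first
            abs(refs[n][0] - refs[n][1]),  # then a smaller forward/backward gap
            n,                             # then earlier sentences
        ),
    )
    return sorted(ranked[:k])  # restore document order for the summary


# Forward and backward counts from the hand-counted table.
table = {1: (5, 0), 2: (2, 0), 3: (8, 0), 4: (1, 1), 5: (5, 3), 6: (5, 1), 7: (5, 4),
         8: (4, 3), 9: (4, 3), 10: (3, 7), 11: (2, 8), 12: (1, 9), 13: (0, 10)}
print(select_balanced(table))  # [7, 8, 9]
```

Returning the chosen sentences in their original order keeps the summary readable as continuous text, as in the summary above.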