Splitting text into sentences is not trivial - this is why you should add an extra space after sentences

Here are some existing tools for splitting text into sentences:


None of them recognizes the end of every sentence 100% correctly.

Here is an example (from this blog):

Now you know why you should enter two spaces after each sentence: some programs rely on this.

In the computer era, spacing between sentences is handled in several different ways by various software packages. Some systems accept whatever the user types, while others attempt to alter the spacing, or use the user input as a method of detecting sentences. Computer-based word processors, and software such as TeX allow users to arrange text in a manner previously only available to professional typesetters.[71]

The text editing environment in Emacs uses a double space following a period to identify the end of sentences unambiguously; the double space convention prevents confusion with periods within sentences that signify abbreviations. How Emacs recognizes the end of a sentence is controlled by the settings sentence-end-double-space and sentence-end.[72] The vi editor also follows this convention; thus, it is relatively easy to manipulate (jump over, copy, delete) whole sentences in both emacs and vi.

Wikipedia: Sentence spacing - Digital age
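To make the difference concrete, here is a minimal Python sketch that compares a naive splitter (break after any period, exclamation mark, or question mark followed by whitespace) with one that breaks only on a double space. The function names `split_naive` and `split_double_space` and the sample text are made up for illustration. This is not how Emacs or vi implement sentence detection; it only mimics the double-space idea and ignores details such as line ends and closing quotes.

```python
import re

def split_naive(text):
    """Split after '.', '!' or '?' whenever any whitespace follows."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def split_double_space(text):
    """Split after '.', '!' or '?' only when two or more spaces follow."""
    return re.split(r"(?<=[.!?]) {2,}", text.strip())

# Hypothetical sample: abbreviations use one space, the sentence end uses two.
text = "Dr. Smith arrived at 5 p.m. on Monday.  He left early."

print(split_naive(text))
# ['Dr.', 'Smith arrived at 5 p.m.', 'on Monday.', 'He left early.']

print(split_double_space(text))
# ['Dr. Smith arrived at 5 p.m. on Monday.', 'He left early.']
```

The naive version cuts the text at every abbreviation, while the double-space version finds exactly two sentences. The catch is that it only works if the author typed the two spaces in the first place.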

When you type two spaces in HTML code, the extra space is not displayed, because browsers collapse consecutive whitespace into a single space by default.
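As a rough illustration, the following Python sketch imitates that default collapsing and shows one common workaround: replacing the first of the two spaces with a non-breaking space entity, which browsers do not collapse. The helper names `collapse_whitespace` and `keep_double_spaces` are hypothetical, and the regular expression is a simplification of what a real browser does.

```python
import re

def collapse_whitespace(text):
    """Roughly mimic default HTML rendering: each whitespace run becomes one space."""
    return re.sub(r"\s+", " ", text)

def keep_double_spaces(text):
    """Turn the double space after sentence punctuation into '&nbsp; ' so it survives rendering."""
    return re.sub(r"(?<=[.!?])  ", "&nbsp; ", text)

source = "First sentence.  Second sentence."

print(collapse_whitespace(source))
# First sentence. Second sentence.

print(keep_double_spaces(source))
# First sentence.&nbsp; Second sentence.
```

The double space is still present in the HTML source either way, so a program that reads the source rather than the rendered page can still rely on it.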

Recommended articles:
Testing out the NLTK sentence tokenizer
How to Split Sentences
