For the moment, you can ignore the details and just concentrate on the output.

The Reuters Corpus contains 10,788 news documents totaling 1.3 million words.

Some languages have no established writing system, or are endangered.

We will wait until later before exploring each Python construct systematically.

The documents have been classified into 90 topics, and grouped into two sets, called "training" and "test"; the fileid of each text indicates which set it is drawn from. Unlike the Brown Corpus, categories in the Reuters Corpus overlap with each other, simply because a news story often covers multiple topics.

We can ask for the topics covered by one or more documents, or for the documents included in one or more categories.
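
The two directions of lookup can be sketched in plain Python. This is only an illustration under stated assumptions: the fileids and topic labels below are made up rather than taken from the Reuters Corpus, and the helper names `topics_for` and `docs_for` are hypothetical. With NLTK and its corpus data installed, the corresponding calls would be along the lines of `reuters.categories(...)` and `reuters.fileids(...)`.

```python
from collections import defaultdict

# Hypothetical toy data: fileid -> topics. These ids and labels are invented
# for illustration; they are not real Reuters Corpus entries.
DOC_TOPICS = {
    "training/0001": ["barley", "corn", "grain"],
    "training/0002": ["grain", "wheat"],
    "test/0003": ["corn"],
}

# Invert the mapping once so that lookups by topic are equally direct.
TOPIC_DOCS = defaultdict(list)
for fileid, topics in DOC_TOPICS.items():
    for topic in topics:
        TOPIC_DOCS[topic].append(fileid)


def _as_list(value):
    """Accept either a single string or an iterable of strings."""
    return [value] if isinstance(value, str) else list(value)


def topics_for(fileids):
    """Return the sorted topics covered by one or more documents."""
    return sorted({t for f in _as_list(fileids) for t in DOC_TOPICS[f]})


def docs_for(topics):
    """Return the sorted documents included in one or more topics."""
    return sorted({f for t in _as_list(topics) for f in TOPIC_DOCS[t]})


print(topics_for("training/0001"))
print(topics_for(["training/0002", "test/0003"]))
print(docs_for("corn"))
print(docs_for(["barley", "wheat"]))
```

Because categories overlap, the same document can appear under several topics, which is why both helpers collect results in a set before sorting.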

The simplest kind lacks any structure: it is just a collection of texts.

Often, texts are grouped into categories that might correspond to genre, source, author, language, etc.