Fit to print is a tool that can help you match your writing style to the editorial voice of one of 9 major news organizations. It analyzes more than 30 stylistic features of an article and determines which publication's writing style is closest to the article's. It also compares the article with the average value of each feature for a single, user-specified publication, and compares the most informative features across all 9 news organizations.
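The comparison against a single publication can be illustrated with a minimal sketch. It assumes each article has already been reduced to a dictionary of feature values and that a publication's average is a plain mean over its articles. The function names, feature names, and "Paper A" data are hypothetical, and the sketch does not attempt to reproduce the classifier that picks the closest publication, which is not described here.

```python
from statistics import mean


def publication_averages(articles_by_pub):
    """Mean value of every feature for each publication.

    articles_by_pub maps a publication name to a list of per-article
    feature dictionaries (hypothetical layout).
    """
    averages = {}
    for pub, articles in articles_by_pub.items():
        features = articles[0].keys()
        averages[pub] = {f: mean(a[f] for a in articles) for f in features}
    return averages


def compare_to_publication(article_features, averages, publication):
    """(article value, publication mean, difference) for each shared feature."""
    pub_mean = averages[publication]
    return {
        f: (value, pub_mean[f], value - pub_mean[f])
        for f, value in article_features.items()
        if f in pub_mean
    }


# Hypothetical example data for one publication with two articles.
corpus = {"Paper A": [{"avg_word_length": 4.0, "commas_per_sentence": 1.0},
                      {"avg_word_length": 5.0, "commas_per_sentence": 2.0}]}
averages = publication_averages(corpus)
mine = {"avg_word_length": 5.5, "commas_per_sentence": 1.0}
print(compare_to_publication(mine, averages, "Paper A"))
# {'avg_word_length': (5.5, 4.5, 1.0), 'commas_per_sentence': (1.0, 1.5, -0.5)}
```

Keeping the signed difference (rather than an absolute distance) shows the user in which direction their article departs from the publication's average.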
The features that the model uses to classify articles can be divided into four major categories:
Language Use
These features are related to individual word use. Many of these features are used in author attribution problems.
The Flesch Readability score: Measured on a scale of roughly 0-100 (higher means easier to read), this is a measure of how readable a text is based on the average number of syllables per word and the average number of words per sentence.
The average word length: Average number of letters per word.
Unique word fraction: The number of unique words used divided by the total word count.
Word rarity: The mean ‘rarity’ of words used, based on a list of English word frequencies.
Number of ‘told’ and ‘said’ used per sentence: A measure of how often the article quotes or attributes statements to someone else.
Foreign words: The use of foreign words, which relates to word rarity (that is, how ‘fancy’ the language is).
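Most of the language-use features above can be approximated with plain Python, as in the minimal sketch below. It assumes naive regex tokenization, a vowel-group heuristic for counting syllables, and a hypothetical frequency table `WORD_FREQ`, with word rarity defined (by assumption) as the mean of -log10(relative frequency). The tool's actual tokenizer, syllable counter, and frequency list are not described here. The readability score uses the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), which can fall outside 0-100 for very simple or very dense text. Counting foreign words needs a part-of-speech tagger and is left out.

```python
import math
import re

# Hypothetical relative frequencies (share of all tokens in some reference
# corpus); the tool's real frequency list is not specified.
WORD_FREQ = {"the": 0.05, "on": 0.007, "he": 0.005, "me": 0.002,
             "said": 0.001, "told": 0.0003, "cat": 1e-5, "mat": 1e-6}
UNKNOWN_FREQ = 1e-8  # assumed floor for words missing from the table


def split_sentences(text):
    # Naive split on terminal punctuation; a real sentence tokenizer does better.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def count_syllables(word):
    # Heuristic: one syllable per vowel group, minus a silent final 'e'.
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def language_use_features(text):
    sentences = split_sentences(text)
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [t.lower() for t in tokens]
    n_sent, n_words = len(sentences), len(tokens)
    syllables = sum(count_syllables(t) for t in tokens)
    letters = sum(sum(c.isalpha() for c in t) for t in tokens)
    return {
        # Standard Flesch Reading Ease formula.
        "flesch": 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (syllables / n_words),
        "avg_word_length": letters / n_words,
        "unique_word_fraction": len(set(lowered)) / n_words,
        # Assumed rarity definition: mean of -log10(relative frequency).
        "word_rarity": sum(-math.log10(WORD_FREQ.get(w, UNKNOWN_FREQ))
                           for w in lowered) / n_words,
        "said_told_per_sentence": sum(w in ("said", "told") for w in lowered) / n_sent,
    }
```

For example, `language_use_features("The cat sat on the mat.")` gives a Flesch score of about 116.1: every word has one syllable and the single sentence is short, which pushes the score above 100.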
Phrase Complexity
These features relate to sentence structure.
Sentence length (in words): The average number of words per sentence.
Number of ‘but’ and ‘and’ used per sentence: A measure of the number of clauses used.
Number of Wh-adverbs: Words like ‘where’, ‘when’, and ‘why’, but also ‘wherein’ and ‘whence’. These can mark questions or more complex phrases.
Number of verbs per sentence: A measure of the number of clauses per sentence.
Sentence length variation: The standard deviation of sentence length in an article.
Commas per sentence: Broadly, a measure of how sentences are organized (for example, into clauses, lists, or asides).
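The sentence-level features that do not need a part-of-speech tagger (sentence length, its variation, commas, and ‘and’/‘but’ counts) can be sketched with the standard library. This is a minimal illustration assuming a naive regex sentence splitter and tokenizer; the tool's actual tokenizer, and whether it uses the population or sample standard deviation, are not stated. Verb and Wh-adverb counts require a tagger and are omitted.

```python
import re
from statistics import mean, pstdev


def split_sentences(text):
    # Naive split on terminal punctuation.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def phrase_complexity_features(text):
    sentences = split_sentences(text)
    words_per_sentence = [re.findall(r"[A-Za-z']+", s) for s in sentences]
    lengths = [len(words) for words in words_per_sentence]
    conjunctions = [sum(w.lower() in ("and", "but") for w in words)
                    for words in words_per_sentence]
    return {
        "sentence_length": mean(lengths),
        # Population standard deviation (an assumption; the sample form,
        # statistics.stdev, is equally plausible).
        "sentence_length_sd": pstdev(lengths),
        "and_but_per_sentence": mean(conjunctions),
        "commas_per_sentence": mean(s.count(",") for s in sentences),
    }


print(phrase_complexity_features("I came, I saw, and I conquered. It rained."))
```

Here the two sentences have 7 and 2 words, so the mean length is 4.5 and the population standard deviation is 2.5.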
Tone
These features measure how emotive or judgemental an article is, and include:
Superlative, comparative, and normal adverbs per sentence: Possibly a measure of opinion; excessive adverb use is also widely seen as a mark of bad writing.
Superlative, comparative, and normal adjectives: Another potential measure of opinion.
Question and exclamation marks per sentence: A possible measure of the emotion in an article.
Use of “I”: Does the author insert themselves into the article?
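The punctuation and first-person features can be computed directly from the raw text, as in this minimal sketch. It assumes a simple regex sentence splitter and counts only the standalone, capitalized pronoun ‘I’ (not contractions such as ‘I'm’); how the tool actually tokenizes is not stated. The adverb and adjective counts need a part-of-speech tagger and are not shown.

```python
import re


def split_sentences(text):
    # Split on whitespace that follows terminal punctuation.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tone_features(text):
    sentences = split_sentences(text)
    n = len(sentences)
    tokens = re.findall(r"[A-Za-z']+", text)
    return {
        "questions_per_sentence": text.count("?") / n,
        "exclamations_per_sentence": text.count("!") / n,
        # Only the bare token "I"; "I'm" and "I've" stay single tokens here.
        "i_per_sentence": sum(t == "I" for t in tokens) / n,
    }


print(tone_features("Why? I know! It is fine."))
# each value is 1/3: three sentences, one '?', one '!', one 'I'
```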
Parts of Speech
The other ‘fingerprint’ parts of speech that I measure (each on a per-sentence basis) are:
Singular present verbs, pronouns, prepositions, past tense verbs, past participles, gerunds, and determiners.
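These part-of-speech counts can be sketched as per-sentence tag rates over already-tagged text. This minimal example assumes Penn Treebank tags (as produced by, for instance, NLTK's `nltk.pos_tag`), and the feature-to-tag mapping is an assumption: VBZ covers only third-person singular present verbs, and IN also includes subordinating conjunctions. The tool's actual tagger and groupings are not described. Under the same assumption, the adverb, adjective, Wh-adverb, and foreign-word counts described earlier could be added as further groups (RB/RBR/RBS, JJ/JJR/JJS, WRB, and FW).

```python
from collections import Counter

# Assumed mapping from each feature to Penn Treebank tags.
FINGERPRINT_TAGS = {
    "singular_present_verbs": {"VBZ"},
    "pronouns": {"PRP", "PRP$"},
    "prepositions": {"IN"},
    "past_tense_verbs": {"VBD"},
    "past_participles": {"VBN"},
    "gerunds": {"VBG"},
    "determiners": {"DT"},
}


def pos_rates(tagged_sentences, groups=FINGERPRINT_TAGS):
    """Average count per sentence of each tag group.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        for _word, tag in sentence:
            for feature, tags in groups.items():
                if tag in tags:
                    counts[feature] += 1
    n = len(tagged_sentences)
    return {feature: counts[feature] / n for feature in groups}


tagged = [[("The", "DT"), ("dog", "NN"), ("runs", "VBZ")],
          [("She", "PRP"), ("walked", "VBD"), ("in", "IN"),
           ("the", "DT"), ("park", "NN")]]
print(pos_rates(tagged))
# determiners 1.0; singular present verbs, pronouns, prepositions and
# past tense verbs 0.5 each; past participles and gerunds 0.0
```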