Processed 111 popular science fiction stories from Project Gutenberg for the top 2,000 two-word phrases, then visualized them as a word cloud.

Bret Bernhoft
In short, I downloaded 111 popular science fiction stories from Project Gutenberg, then used Python to find the 2,000 most common two-word phrases across all of those stories, treated as one mass of text.

What you see here are the top 2,000 two-word phrases from all of those stories.

2000-top-two-word-phrases-from-111-sf-stories.png
 
The one I find odd is “men women.” How does one put these next to each other in a sentence that way? All I can think of is weird phrasing like “he looked like one of those men women try to avoid.”
 
Mmm - there are quite a few really odd pairs.
"ca see", "ca get", "want know", "want get", and "enough know" don't really sound like two-word phrases. And if there are 111 stories in there, Captain Nemo must be the star of a lot of them to warrant that font size.

Enjoyed finding all the EE 'Doc' Smith shouts, though!
 
Two that stood out to me were "tara helium" and "von horn." They are sideways just above the start of "captain nemo."
 
Thank you to everyone who has already commented on these data visualizations. It has been a pretty eye-opening journey to learn how to do all of this. I appreciate this community.
 
Looking at the two word clouds, it appears the program must strip out “and” along with maybe some common prepositions and articles. Thus “men women” probably represents “men and women” and other phrases.
That is correct. Numerous common words and other "stop words" were removed from the data to help emphasize broader trends.
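
For anyone curious how a pair like "men women" falls out of this, here is a minimal sketch. It is not the script used for the image: the tiny stop-word list, the regex tokenizer, and the function name `top_bigrams` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list is much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}

def top_bigrams(text, n=2000, stop_words=STOP_WORDS):
    """Return the n most common adjacent word pairs after dropping stop words."""
    # Lowercase, then keep runs of letters (apostrophes allowed inside a word).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    # Pair each surviving word with the next surviving word.
    pairs = zip(kept, kept[1:])
    return Counter(" ".join(p) for p in pairs).most_common(n)

sample = "The men and the women of the ship. Men and women alike."
print(top_bigrams(sample, n=1))  # "men women" comes out on top
```

Note that this sketch pairs words across sentence boundaries too (the sample also yields "ship men"), which is one way odd-looking pairs could creep in.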
 
I'm glad that "hither thither" made the cut.
 
Out of curiosity, what was the minimum number of hits required to make the list? The median number? I am wondering if some of the unusual pairings were due to a small number of matches.
That's an interesting question. I took a look, and the last entry in the data is the two-word phrase "everything happened", which was counted a total of 33 times.
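
A rough sketch of how the cutoff and the median could be read off a table of phrase counts, assuming the counts live in a `collections.Counter`. The helper name `cutoff_and_median` is hypothetical, and the toy counts below are invented for the example, apart from the 33 quoted above.

```python
from collections import Counter
from statistics import median

def cutoff_and_median(counts, n=2000):
    """Return (count of the last phrase in the top n, median count of the top n)."""
    top = counts.most_common(n)          # sorted from most to least frequent
    values = [c for _, c in top]
    return values[-1], median(values)

# Invented toy data; only the 33 for "everything happened" comes from the thread.
counts = Counter({
    "captain nemo": 120,
    "men women": 80,
    "hither thither": 40,
    "everything happened": 33,
    "rare pair": 2,
})
print(cutoff_and_median(counts, n=4))
```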
 
