HW9: Word Frequencies

Write a program frequency.py that reads in the file turing.txt and counts the number of occurrences of each word. Your program should print each word with its count, sorted by decreasing frequency. For example, “the” occurs 10 times and would be printed out before “Alan”, which occurs just once.

Your program should not take any user input; it should just read the hard-coded filename "turing.txt". You may use any example template to read in the file (for loop, while loop, etc.).
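
As a minimal sketch of the for-loop template, the function below reads a file line by line and returns its lines. The helper name read_lines is hypothetical and not part of the assignment; the default filename is the hard-coded one required above.

```python
def read_lines(filename="turing.txt"):
    """Return the lines of a text file as a list, without trailing newlines.

    read_lines is an illustrative helper name, not one the assignment requires.
    """
    lines = []
    with open(filename) as infile:   # the file is closed automatically
        for line in infile:          # for-loop template: one line per iteration
            lines.append(line.rstrip("\n"))
    return lines
```

Calling read_lines() with no argument and printing each returned line would cover development step 1.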

Recommended Development Steps:

  1. Read in the file & print it back out.
  2. Read in the file, split each line on spaces, & print each word on a separate line.
    • To check your work, use the UNIX command python3 frequency.py | sort | uniq -c | sort -rn to give a list of all the words sorted by frequency.
    • Make sure you don’t see any unwanted values like punctuation or spaces. If you do, tweak your program to get your desired output. (See hints below.)
    • Your next step is to implement the above UNIX command in your Python program using a dictionary.
  3. Create a frequency dictionary and add each word to it: freq[word] = 1. Print out the dictionary.
  4. Count how many times each word occurs, rather than always storing 1 (see hints below). Print out the dictionary.
  5. Sort the dictionary by values before printing. (See hints below.)
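
Steps 2 through 4 could be sketched roughly as follows. This is one possible approach, not the required one. It assumes that words are lowercased and that ASCII punctuation is deleted rather than replaced with a space, which is what sample entries such as 'britains' and 'topsecret' suggest. The names clean_word and count_words are hypothetical.

```python
import string

def clean_word(word):
    """Lowercase a word and delete every ASCII punctuation character."""
    # Assumption: only ASCII punctuation is removed; curly quotes would survive.
    return word.lower().translate(str.maketrans("", "", string.punctuation))

def count_words(lines):
    """Build a {word: count} dictionary from an iterable of text lines."""
    freq = {}
    for line in lines:
        for raw in line.strip().split(" "):
            word = clean_word(raw)
            if word == "":
                # Assumption: skip empty strings produced by blank lines,
                # repeated spaces, or tokens that were pure punctuation.
                continue
            if word in freq:
                freq[word] += 1   # seen before: increment the count
            else:
                freq[word] = 1    # first occurrence
    return freq
```

Whether to skip empty strings is a judgment call. This sketch drops them, following the advice in step 2 about unwanted values.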

Hints:

When completed, your program’s output should look something like:

('the', 10)
('of', 3)
('code', 3)
('', 3)
('and', 2)
('war', 2)
('at', 2)
('enigma', 2)
('turing', 2)
('story', 1)
('life', 1)
('team', 1)
('by', 1)
('2014', 1)
('nailbiting', 1)
('portrays', 1)
('during', 1)
('win', 1)
('against', 1)
('is', 1)
('man', 1)
('bletchley', 1)
('imitation', 1)
('behind', 1)
('ii', 1)
('race', 1)
('park', 1)
('unlock', 1)
('game', 1)
('an', 1)
('alan', 1)
('cryptanalyst', 1)
('codebreakers', 1)
('darkest', 1)
('real', 1)
('world', 1)
('who', 1)
('school', 1)
('legendary', 1)
('was', 1)
('his', 1)
('cypher', 1)
('days', 1)
('cracked', 1)
('brilliant', 1)
('every', 1)
('government', 1)
('secret', 1)
('true', 1)
('based', 1)
('time', 1)
('film', 1)
('topsecret', 1)
('britains', 1)
('on', 1)
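
The tuple format of the listing above suggests printing the dictionary's (word, count) pairs directly after sorting them. A minimal sketch, assuming a frequency dictionary named freq has already been built; the function names are hypothetical:

```python
def sort_by_count(freq):
    """Return (word, count) pairs ordered from most to least frequent."""
    # key picks the count out of each pair; reverse=True gives decreasing order.
    return sorted(freq.items(), key=lambda pair: pair[1], reverse=True)

def print_frequencies(freq):
    """Print one (word, count) tuple per line, most frequent first."""
    for pair in sort_by_count(freq):
        print(pair)
```

Words with equal counts keep whatever relative order they had in the dictionary, so the order of ties in your output may differ from the sample.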