Question Description
I’m working on an algorithms & data structures exercise and need support to help me study.
Problem C (50 points)
This problem deals with finding “pangrams” in text. A pangram is a sentence containing all 26 letters of the alphabet. x and y in the cell below are example sentences: x is a pangram, y is not.
x = "Jim quickly realized that the beautiful gowns are expensive."
y = "This sentence is most certainly not a pangram."
C1. (5 points) Define a generator function, indices(), that takes a string as input and outputs the index numbers where a letter occurs for the first time in the string. [Hint: you can compare letters like numbers. For example, char >= "a" is a valid conditional statement. You can use this to check whether characters in a string are letters.]
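def indices(my_string):
    seen = set()  # letters that have already appeared
    for i, char in enumerate(my_string.lower()):
        # char is a letter if it falls between "a" and "z"
        if "a" <= char <= "z" and char not in seen:
            seen.add(char)
            yield i  # index of the first occurrence of this letter

for i in indices(x):
    print(i)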
C2. (3 points) Define a function, verify(), that takes a string as input and uses the indices() function to check if the string is a pangram. The output should be boolean True or False.
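def verify(my_string):
    # indices() yields one index per distinct letter, so a pangram yields 26
    return len(list(indices(my_string))) == 26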
C3: (2 points) Write a version of verify() named tiny_verify() that performs the check in a single line of code, without using indices(). [Hint: Use a comprehension.]
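One possible way to read C3, as a minimal sketch rather than the expected answer: a set comprehension collects the distinct letters, and the string is a pangram when that set has 26 members. Lowercasing the input first is an assumption about how capital letters should be treated.

```python
def tiny_verify(my_string):
    # One-line check: 26 distinct letters between "a" and "z" means pangram
    return len({c for c in my_string.lower() if "a" <= c <= "z"}) == 26


print(tiny_verify("Jim quickly realized that the beautiful gowns are expensive."))
print(tiny_verify("This sentence is most certainly not a pangram."))
```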
C4. (5 points) Modify the verify() function to figure out which letters (if any) are missing from a purported pangram. This version should return the list of missing letters instead of a boolean value. [Hint: You can get a string containing all the letters of the alphabet by importing ascii_lowercase from the string module.]
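A sketch of the C4 idea, assuming the check is done with set membership rather than through indices(). The name missing_letters() is hypothetical, chosen here so it does not clash with the boolean verify() from C2.

```python
from string import ascii_lowercase


def missing_letters(my_string):
    """Return the letters of the alphabet that never occur in my_string."""
    present = set(my_string.lower())
    # Walk the alphabet in order so the result is sorted
    return [letter for letter in ascii_lowercase if letter not in present]


y = "This sentence is most certainly not a pangram."
print(missing_letters(y))
```

An empty list means the string is a pangram, so the boolean answer from C2 is still recoverable with `not missing_letters(s)`.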
http://localhost:8888/nbconvert/html/module-C.ipyn… Page 2 of 4
module-C 4/18/21, 4:28 PM
C5. (5 points) Load and iterate through the collected list of pangrams in data/pangrams.txt line by line and determine if they are actually pangrams. Print out any lines that are not actually pangrams, and also the letters that are missing.
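A minimal sketch of the C5 loop, assuming data/pangrams.txt holds one candidate pangram per line and is UTF-8 encoded. The helper names missing_letters() and failed_pangrams() are hypothetical. The file is only read if it exists, so the block also runs without the data directory.

```python
from pathlib import Path
from string import ascii_lowercase


def missing_letters(sentence):
    """Return the alphabet letters that do not occur in sentence."""
    present = set(sentence.lower())
    return [letter for letter in ascii_lowercase if letter not in present]


def failed_pangrams(lines):
    """Yield (line, missing) for every non-empty line that is not a pangram."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        missing = missing_letters(line)
        if missing:
            yield line, missing


path = Path("data/pangrams.txt")  # path given in the exercise
if path.exists():
    with path.open(encoding="utf-8") as handle:
        for line, missing in failed_pangrams(handle):
            print(line, "-> missing:", "".join(missing))
```

Iterating over the open file handle reads it line by line, which matches the wording of the task.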
C6: (3 points) Use the output from the verify() function to fix (by any means necessary) the failed pangrams, and verify that you have fixed them.
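Since C6 allows any means, one blunt option is to append whatever letters are missing and then re-run the check. This is only a sketch: fix_pangram() and missing_letters() are hypothetical names, and the "fixed" sentence is not meant to read naturally.

```python
from string import ascii_lowercase


def missing_letters(sentence):
    """Return the alphabet letters that do not occur in sentence."""
    present = set(sentence.lower())
    return [letter for letter in ascii_lowercase if letter not in present]


def fix_pangram(sentence):
    """Append the missing letters so that every letter occurs at least once."""
    missing = missing_letters(sentence)
    if not missing:
        return sentence  # already a pangram, leave it untouched
    return f"{sentence} {''.join(missing)}"


fixed = fix_pangram("This sentence is most certainly not a pangram.")
print(fixed)
print(missing_letters(fixed))  # an empty list confirms the fix
```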
C7. (5 points) The cell below provides some information about a set of books. Create a data object that holds the book numbers and titles associated with each author’s name. Write this out as a JSON file in the data/books/ directory using the following schema.
books = {
    AuthorName: {
        BookNumber: BookTitle, …
    }, …
}
# 84.txt; Frankenstein, or the Modern Prometheus; Mary Wollstonecraft (Godwin) Shelley
# 98.txt; A Tale of Two Cities; Charles Dickens
# 161.txt; Sense and Sensibility; Jane Austen
# 730.txt; Oliver Twist or the Parish Boy’s Progress; Charles Dickens
# 768.txt; Wuthering Heights; Emily Brontë
# 1322.txt; Leaves of Grass; Walt Whitman
# 1342.txt; Pride and Prejudice; Jane Austen
# 1400.txt; Great Expectations; Charles Dickens
# 2701.txt; Moby Dick; or the Whale; Herman Melville
# 4300.txt; Ulysses; James Joyce
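The listing can be turned into the nested schema by treating the first field as the book number, the last field as the author, and everything in between as the title (one title contains a semicolon of its own). This is a sketch under assumptions: the first author's name is cut off in the listing and the full credit used here is a guess at the intended value, the output file name books.json is hypothetical, and book numbers are kept as strings because JSON object keys are always strings.

```python
import json
from collections import defaultdict
from pathlib import Path

# Listing copied from the exercise cell. The first author's name is truncated
# in the source; the completed form below is an assumption.
LISTING = """\
84.txt; Frankenstein, or the Modern Prometheus; Mary Wollstonecraft (Godwin) Shelley
98.txt; A Tale of Two Cities; Charles Dickens
161.txt; Sense and Sensibility; Jane Austen
730.txt; Oliver Twist or the Parish Boy’s Progress; Charles Dickens
768.txt; Wuthering Heights; Emily Brontë
1322.txt; Leaves of Grass; Walt Whitman
1342.txt; Pride and Prejudice; Jane Austen
1400.txt; Great Expectations; Charles Dickens
2701.txt; Moby Dick; or the Whale; Herman Melville
4300.txt; Ulysses; James Joyce
"""


def build_books(listing):
    """Return {author: {book_number: title}} from 'N.txt; title; author' lines."""
    books = defaultdict(dict)
    for line in listing.splitlines():
        parts = [part.strip() for part in line.split(";")]
        if len(parts) < 3:
            continue  # not a complete record
        number = parts[0].split(".")[0]   # "84.txt" -> "84"
        author = parts[-1]
        title = "; ".join(parts[1:-1])    # re-join titles that contain ';'
        books[author][number] = title
    return dict(books)


def save_books(books, directory="data/books", filename="books.json"):
    """Write books as JSON. The file name is a hypothetical choice."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    # ensure_ascii=False keeps characters such as 'ë' readable in the file
    path.write_text(json.dumps(books, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


books = build_books(LISTING)
```

Calling `save_books(books)` writes the file under data/books/, creating the directory if needed.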
C8. (7 points) Write a function, get_pangrams(), that takes a book number and outputs a list of the book’s pangram sentences and the total number of sentences in the book. You will need to use the re (regular expressions) module to split the book text into sentences using the re.split(pattern, string) function. The pattern you will need is "[.?!] [^a-zA-Z]".
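A sketch of how C8 could be structured, with the text handling split into a helper so it can be tried on a short string. The assumptions here: each book is stored as data/books/<number>.txt, the pattern is used exactly as printed in the exercise (if the space inside it is an extraction artifact, adjust SENTENCE_PATTERN), and the helper names is_pangram() and pangrams_in_text() are hypothetical.

```python
import re
from pathlib import Path

SENTENCE_PATTERN = r"[.?!] [^a-zA-Z]"  # copied as printed in the exercise


def is_pangram(sentence):
    """True if all 26 letters occur in sentence (case-insensitive)."""
    return len({c for c in sentence.lower() if "a" <= c <= "z"}) == 26


def pangrams_in_text(text, pattern=SENTENCE_PATTERN):
    """Return (list of pangram sentences, total number of sentences)."""
    # Drop empty fragments that re.split can leave behind
    sentences = [s.strip() for s in re.split(pattern, text) if s.strip()]
    pangrams = [s for s in sentences if is_pangram(s)]
    return pangrams, len(sentences)


def get_pangrams(book_number, directory="data/books"):
    """Read <directory>/<book_number>.txt (assumed layout) and analyse it."""
    text = (Path(directory) / f"{book_number}.txt").read_text(encoding="utf-8")
    return pangrams_in_text(text)


sample = "The quick brown fox jumps over the lazy dog. 2 cats slept! 3 dogs ran."
print(pangrams_in_text(sample))
```

Note that re.split() discards the matched text, so the character matched by [^a-zA-Z] is removed from the start of the following sentence.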
C9. (8 points) Determine who is the pangrammiest author and what the pangrammiest book is, as determined by most pangrams per sentence. [Hint: Use defaultdicts to create “pangrams by author” and “pangrams by book” objects.]
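One way the aggregation could look, assuming the per-book results are already available as (pangram count, sentence count) pairs. The function name pangram_rates() is hypothetical, and the authors, titles, and counts in the example are made-up placeholders, not results from the real books.

```python
from collections import defaultdict


def pangram_rates(books, counts):
    """Return (rate by author, rate by book) as pangrams per sentence.

    books:  {author: {book_number: title}}
    counts: {book_number: (pangram_count, sentence_count)}
    """
    by_author = defaultdict(lambda: [0, 0])  # author -> [pangrams, sentences]
    by_book = defaultdict(float)             # title -> pangrams per sentence
    for author, titles in books.items():
        for number, title in titles.items():
            pangrams, sentences = counts[number]
            by_author[author][0] += pangrams
            by_author[author][1] += sentences
            by_book[title] = pangrams / sentences if sentences else 0.0
    author_rate = {a: (p / s if s else 0.0) for a, (p, s) in by_author.items()}
    return author_rate, dict(by_book)


# Placeholder data for illustration only (not real counts)
toy_books = {"Author A": {"1": "Book One", "2": "Book Two"},
             "Author B": {"3": "Book Three"}}
toy_counts = {"1": (1, 100), "2": (3, 100), "3": (2, 50)}

author_rate, book_rate = pangram_rates(toy_books, toy_counts)
print(max(author_rate, key=author_rate.get))
print(max(book_rate, key=book_rate.get))
```

Summing pangrams and sentences per author before dividing weights each book by its length; averaging the per-book rates instead would be a different, equally defensible reading of the task.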
C10. (7 points) Print out the most efficient pangram and its author and book, as determined by fewest characters per pangram.
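A sketch for C10, assuming the pangrams found earlier are kept as (sentence, author, title) records and that "characters" means the full length of the sentence, spaces and punctuation included. The name most_efficient() is hypothetical, and the authors and titles in the example are placeholders.

```python
def most_efficient(records):
    """Return the (sentence, author, title) record with the shortest sentence.

    Returns None when there are no records at all.
    """
    records = list(records)
    if not records:
        return None
    return min(records, key=lambda record: len(record[0]))


# Placeholder attribution for illustration only
toy_records = [
    ("The quick brown fox jumps over the lazy dog", "Author A", "Book One"),
    ("Pack my box with five dozen liquor jugs", "Author B", "Book Two"),
]

best = most_efficient(toy_records)
if best is not None:
    sentence, author, title = best
    print(f"{sentence!r} by {author}, in {title}")
```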