Improving on a Codecademy exercise

In Codecademy’s “Data Structures, Meet Iteration” exercise, I was asked to create a histogram program. It would take some text as input and then list the words from that input in order of how frequently they occurred.

I had some issues with this program, however.

 

Here is the completed program:

puts "Paste some text:"
text = gets.chomp

words = text.split(" ")

frequencies = Hash.new(0)

words.each {|word| frequencies[word] += 1}

frequencies = frequencies.sort_by{|word, value| value}
frequencies.reverse!

frequencies.each{|word, value|
    puts word + " " + value.to_s
}
 
I wanted to know which words occurred most in the saying “If it walks like a duck, looks like a duck, and sounds like a duck, it might be a duck.”
 
Here was its output:

a 4
duck, 3
like 3
it 2
looks 1
and 1
sounds 1
might 1
be 1
walks 1
duck. 1
"If 1
 
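The word ‘duck’ was miscounted because of its placement next to the sentence’s punctuation: "duck," and "duck." were counted as two separate words, and not one of the four occurrences was tallied as plain "duck".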
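
A minimal sketch of why this happens, using a shortened sample string that is not from the exercise: split(" ") breaks the text on whitespace only, so any punctuation stays glued to the neighboring word.

```ruby
# A shortened, made-up sample; not the text used in the post.
sample = "a duck, a duck."

# split(" ") separates on whitespace only, so punctuation stays attached.
words = sample.split(" ")
p words  # ["a", "duck,", "a", "duck."]

# Counting those tokens treats "duck," and "duck." as different words.
counts = Hash.new(0)
words.each { |word| counts[word] += 1 }
p counts
```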

This histogram program would be useless to a human, unless the document they were counting had been specially pre-formatted for the program.

Well, I wasn’t satisfied with that. In the startup world, an easy way to think of something people would pay for is to look at a process people are already doing, and offer to do some of it for them for a fee lower than the cost of doing it themselves. Many successful companies have been built on this principle. I also believe a good tool is one that works in the user’s existing flow.

 
I decided to modify the histogram so that it could process natural language documents.
 
Ideally, I would just .split the text on all punctuation, including "," and ".".
But there’s a problem: string.split only takes one separator, so I can’t hand it a list of punctuation marks.
 
I could solve this by calling .split three times, once for each item of punctuation. 
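
As an aside, the one separator that split accepts can also be a regular expression, which is another way to split on several characters at once. This is only a sketch of that alternative, not the approach this post ends up taking, and the character class below is an assumption about which characters should count as separators.

```ruby
# Shortened sample sentence for illustration.
text = "If it walks like a duck, looks like a duck."

# The character class matches whitespace, commas, and periods (an assumed
# set of separators); the + treats a run such as ", " as one separator.
words = text.split(/[\s,.]+/)
p words
```

With this pattern both occurrences of "duck" come out without punctuation attached. It does not handle quotation marks or capitalization, which would still need separate treatment.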
 
To solve it more elegantly and with fewer lines of code, however, I could simply remove the punctuation before splitting the text. But how does one do that?
 
I turned to the documentation.
 
.delete is perfect for my needs, and even takes multiple arguments!
 
Here is the code I added:
 
text.downcase!
text = text.delete ",", "."
words = text.split(" ")
 
I wanted to delete " characters too, so that words at the beginning or end of a quotation would also be counted accurately. However, " is the character that marks the start and end of a string in Ruby, so I had to find out how to tell Ruby that I meant a literal " inside a string. I wrapped it in single quotes, '"', so that the " would be treated as part of the string.
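
A small sketch of the quoting trick, with variable names and a sample string chosen only for illustration: a double quote inside single quotes is the same one-character string as an escaped double quote inside double quotes.

```ruby
# Two ways to write a string containing one double-quote character.
single_quoted = '"'
escaped = "\""
p single_quoted == escaped  # true

# Either form can be passed to delete to strip quotation marks.
quoted = '"if it walks like a duck"'
cleaned = quoted.delete('"')
puts cleaned
```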
 
Also, the syntax of the .delete method in the official Ruby documentation tripped me up. The documentation gives str.delete "a", "b", but with more than one argument .delete removes only the characters that appear in every argument, that is, the intersection of the character sets a and b. Since "," and "." have no characters in common, my first attempt deleted nothing. Through StackOverflow, I found out that you can write str.delete "abc" to delete the characters a, b, and c. Finally, my code looks like this:

puts "Paste some text:"
text = gets.chomp

text.downcase!
text = text.delete(',."')

words = text.split(" ")

frequencies = Hash.new(0)

words.each {|word| frequencies[word] += 1}

frequencies = frequencies.sort_by{|word, value| value}
frequencies.reverse!

frequencies.each{|word, value|
    puts word + " " + value.to_s
}
 
It gives this output:

duck 4
a 4
like 3
it 2
be 1
looks 1
and 1
sounds 1
walks 1
might 1
if 1
 
Beautiful.
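
As a footnote on the .delete behavior described above, here is a minimal sketch that can be pasted into irb. The "hello" and "duck, duck." strings and the variable names are illustrative and not part of the exercise.

```ruby
# With one argument, delete removes every character listed in it.
one_set = "hello".delete("lo")
p one_set    # "he"

# With several arguments, only characters present in ALL of them are removed.
two_sets = "hello".delete("l", "lo")
p two_sets   # "heo"

# "," and "." share no characters, so this call removes nothing.
first_try = "duck, duck.".delete(",", ".")
p first_try  # "duck, duck."

# Listing both characters in a single argument removes both.
fixed = "duck, duck.".delete(",.")
p fixed      # "duck duck"
```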