I want to look at the Fark headlines without opening a browser. Why? I dunno, maybe I just want to see what’s new since the last time I looked, without being distracted by the site clutter.
Now, I could just turn off images and go to the site, and that would work fine. Actually, it would work quite well. No need for this article, then. I’m off for some coffee …
What? I have to write something in here about Ruby? … okay.
ahem.
I want to look at the Fark headlines without opening a browser. Why? Well, as it so happens, I am logged into a machine via ssh, and using elinks to load the page will result in a lot of extra clutter from text versions of ads which obscure the headlines. I’m just interested in seeing what interesting, scary, or amusing things have been posted on Fark since the last time I checked.
Finding a Solution
Well, we could always just dump the Fark page to the console:
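A minimal sketch of such a dump, assuming Ruby's open-uri library. The helper name page_source and the URL in the usage comment are placeholders of mine, not anything the site promises.

```ruby
require 'open-uri'

# Hypothetical helper: fetch a URL and return the response body as a string.
def page_source(url)
  URI.open(url) { |page| page.read }
end

# Usage (needs network access; the URL is an assumption):
#   puts page_source('http://www.fark.com/')
```

Written inline, that boils down to a require, an open, and a puts.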
Three lines of code and we have output.
But running this doesn’t quite get the result I was looking for. I just want the headlines, and I want them without the HTML, thank you very much.
Fark and a lot of other news sites make RSS feeds available. These are special XML files containing mainly (you guessed it) the headlines, without the HTML, you’re very welcome.
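Pointing the same kind of fetch at the feed might look like the sketch below. The feed address in FEED_URL is an assumption, so check the site for the current one; feed_source is a hypothetical helper name.

```ruby
require 'open-uri'

# Assumed feed location; verify it against the site before relying on it.
FEED_URL = 'http://www.fark.com/fark.rss'

# Hypothetical helper: return the raw RSS XML as a string.
def feed_source(url = FEED_URL)
  URI.open(url) { |feed| feed.read }
end

# Usage (needs network access):
#   puts feed_source
```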
This is a little closer to what we want.
Now I get the RSS file dumped to the console. At least the story headlines are a little easier to find. To get the behavior I want, though, we’re going to need to chop out the bits we don’t care about and get straight to the headlines. This task is straightforward in Ruby, thanks to the RSS library. The RSS library has recently been made an official part of the standard library, which makes a lot of this exercise much easier.
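Here is a rough sketch of pulling just the titles out of the feed with the RSS library. The headlines helper is a name I made up, and it takes the feed XML as a string so the parsing can be tried without a network connection. On Ruby 3.0 and later, rss ships as a bundled gem, so it may need installing separately on some setups.

```ruby
require 'rss'

# Hypothetical helper: return the title of every item in an RSS document.
def headlines(xml)
  feed = RSS::Parser.parse(xml, false) # false: skip strict validation
  feed.items.map(&:title)
end

# Usage, given the raw feed XML in a string called feed_xml:
#   puts headlines(feed_xml)
```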
Okay, now what does this look like?
This is even better still, but that’s an awful lot of headlines. How about just the most recent ones? How about the last 10?
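Trimming the list is a small change. This sketch assumes the feed lists the newest items first, so "the last 10" means the first 10 entries; latest_headlines and its limit parameter are hypothetical names.

```ruby
require 'rss'

# Hypothetical helper: titles of the first `limit` items in the feed.
# Assumes the feed is ordered newest-first.
def latest_headlines(xml, limit = 10)
  RSS::Parser.parse(xml, false).items.first(limit).map(&:title)
end

# Usage, given the raw feed XML in a string called feed_xml:
#   puts latest_headlines(feed_xml)
```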
What does it look like now?
Now we’ve got it down to the freshest 10, but each item is still filling up a lot of space. One way to cut down the length of each line is to split each headline into multiple lines. Let’s start by cutting the category and title into two separate lines:
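One way to do the split, sketched under the assumption that each title arrives as "[Category] Headline text". If the feed formats titles differently, the pattern needs adjusting. Both helper names are hypothetical.

```ruby
# Hypothetical helper: split "[Category] Headline" into [category, text].
# Titles without a leading bracketed tag come back as [nil, title].
def split_headline(title)
  match = title.match(/\A\[(?<category>[^\]]+)\]\s*(?<text>.*)\z/m)
  match ? [match[:category], match[:text]] : [nil, title]
end

# Hypothetical helper: category on one line, headline indented below it.
def format_headline(title)
  category, text = split_headline(title)
  category ? "#{category}\n  #{text}" : text
end

# Usage:
#   puts format_headline('[Interesting] First story')
```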
It’s a lot easier for me to read the output now.
Occasionally I saw HTML entities in the output: &quot;, stuff like that. Let’s fix that problem before we move on to anything else.
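One standard library option for this is CGI.unescapeHTML, which turns entities such as &quot; and &amp; back into plain characters. Whether that is the right tool for every feed is an assumption on my part; the clean helper is a hypothetical wrapper.

```ruby
require 'cgi'

# Hypothetical helper: decode HTML entities left over in a headline.
def clean(text)
  CGI.unescapeHTML(text)
end

# Usage:
#   puts clean('&quot;Hello&quot; &amp; goodbye')
```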
It isn’t an issue this time, but I feel better.
Wherever possible, I’m using standard library tools to get my work done. I’m too lazy to remember escaping every possible HTML entity, and I would rather spend a few minutes searching through the Standard Library documentation to find what I need. It’s a good habit, and you might want to try it yourself.
Maybe I only care about particular types of headline. Say, I want to be interested, but not amused.
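A first stab at the filter, sketched on a plain array of titles and again assuming the "[Category] Headline text" format. It takes the latest entries first and filters afterwards; the helper name and the hard-coded default category are mine.

```ruby
# Hypothetical helper: take the latest `limit` titles, then keep only
# those tagged with the given category.
def interesting_among_latest(titles, category = 'Interesting', limit = 10)
  titles.first(limit).select { |title| title.start_with?("[#{category}]") }
end

# Usage, given an array of headline strings called titles:
#   puts interesting_among_latest(titles)
```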
And it does indeed show me only “interesting” headlines.
That’s pretty nifty, except that it only looks for Interesting items out of the last 10 headlines, rather than looking for the last 10 Interesting headlines.
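The fix is to swap the order of the two steps: filter the whole list first, then take the limit. A sketch with the same assumptions about the title format, using a hypothetical helper name:

```ruby
# Hypothetical helper: keep only titles in the given category, then
# return at most `limit` of them.
def latest_in_category(titles, category, limit = 10)
  titles.select { |title| title.start_with?("[#{category}]") }.first(limit)
end

# Usage, given an array of headline strings called titles:
#   puts latest_in_category(titles, 'Interesting')
```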
Is this any better?
Well, we can only look at today’s headlines. I guess we can’t be sure of ten interesting things happening every day. Still, at least I know I’m getting all of the interesting headlines that are available, up to my limit.
Next problem: the only way I can fetch different headline types is to manually dig in to the source code and change the category.
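A command-line switch takes care of that. Below is a minimal sketch using OptionParser from the optparse library. The flag names, the default category, and the script name in the banner are all placeholders I picked for illustration.

```ruby
require 'optparse'

# Hypothetical helper: read the headline category from ARGV-style arguments.
def parse_options(argv)
  options = { category: 'Interesting' } # assumed default
  OptionParser.new do |opts|
    opts.banner = 'Usage: fark.rb [options]' # script name is a placeholder
    opts.on('-c', '--category NAME', 'Headline category to show') do |name|
      options[:category] = name
    end
  end.parse!(argv)
  options
end

# Usage inside the script:
#   options = parse_options(ARGV)
#   puts options[:category]
```

With that in place, running the script with something like --category PSA would select a different headline type without touching the source.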
Let’s test it by requesting “PSA” headlines.
That works. Pretty nicely, I might add. OptParse is a great library for handling command-line arguments.
This program now does everything that I set out to do, and then some. I might choose to do some refactoring to “bulletproof” the code, or wrap it up in some OO niceness to make it pretty. The truth is that this application is exactly what it needs to be for now, and I think that I shouldn’t overwork something that I may never come back to. Maybe later I’ll come back to it when I think of new features or find new bugs, and then I can overwork it to my heart’s content.
I hope you enjoyed working along with me as much as I enjoyed sitting here and typing random nonsense to myself.
What Else?
I may be done with this exercise for now, but here are a few ideas about features that can be added to make it a little cooler. Go ahead and try them out!
Add word wrap to make the output a little more readable.
Add a parameter to change the number of headlines grabbed.
Modify the script so that it works with other newsfeeds.
Modify the script so that its functionality can be embedded in other Ruby programs.