I want to look at the Fark headlines without opening
a browser. Why? I dunno, maybe I just want to see what’s new since the
last time I looked, without being distracted by the site clutter.
Now, I could just turn off images and go to the site, and that would
work fine. Actually, it would work quite well. No need for this article,
then. I’m off for some coffee …
What? I have to write something in here about Ruby? … okay.
ahem.
I want to look at the Fark headlines without opening
a browser. Why? Well, as it so happens, I am logged into a machine via
ssh, and using elinks to load the page will result in a lot of extra
clutter from text versions of ads which obscure the headlines. I’m just
interested in seeing what interesting, scary, or amusing things have
been posted on Fark since the last time I checked.
Finding a Solution
Well, we could always just dump the Fark page to the console:
Three lines of code and we have output.
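Here is a minimal sketch of such a dump. It uses open-uri, the library the revision history says the script switched to, and assumes Ruby 2.5 or later, where URI.open is the public entry point. The helper name fetch is hypothetical.

```ruby
require 'open-uri'

# Hypothetical helper: return the whole body of a URL as one string.
# URI.open hands http(s) URLs to open-uri and plain file paths to
# Kernel#open, so the same call also works on a saved copy of a page.
def fetch(url)
  URI.open(url, &:read)
end

# With network access, this dumps the raw front page:
#   puts fetch("http://www.fark.com/")
```

Fetch the page, print it. The output is every byte of HTML, ads included.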
But running this doesn’t quite get the result I was looking for. I just
want the headlines, and I want them without the HTML, thank you very
much.
Fark and a lot of other news sites make RSS
feeds available. These are special XML files containing mainly (you
guessed it) the headlines, without the HTML. You’re very welcome.
This is a little closer to what we want.
Now I get the RSS file dumped to the console. At least the story
headlines are a little easier to find. To get the behavior I want, though,
we’re going to need to chop out the bits we don’t care about and get
straight to the headlines. This task is straightforward in Ruby, thanks
to the RSS library. The RSS library has recently been made an official
part of the standard library, which makes a lot of this exercise much easier.
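Here is a sketch of that step. It assumes an RSS 2.0 feed and the rss library, which was part of the standard library when this was written and has been a bundled gem since Ruby 3.0. The feed URL and the method name are assumptions for illustration, not taken from the original script.

```ruby
require 'open-uri'
require 'rss'

# Assumed feed address (hypothetical); check the site for the current one.
FEED_URL = "http://www.fark.com/fark.rss"

# Parse RSS text and return the item titles in feed order. The second
# argument (false) skips strict validation, which real-world feeds often fail.
def headlines(rss_text)
  RSS::Parser.parse(rss_text, false).items.map(&:title)
end

# With network access:
#   puts headlines(URI.open(FEED_URL, &:read))
```

Keeping the parsing in its own method, separate from the download, means it can be tried on a saved copy of the feed without hitting the network.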
Okay, now what does this look like?
This is better still, but that’s an awful lot of headlines. How
about just the most recent ones? How about the last 10?
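Trimming to the newest ten might look like this. The sketch assumes the feed lists items newest-first, which is common but not guaranteed. latest_headlines and LIMIT are hypothetical names.

```ruby
require 'rss'

# How many headlines to show.
LIMIT = 10

# Return the newest `limit` titles, assuming the feed lists items
# newest-first (typical for news feeds, but not required by RSS).
def latest_headlines(rss_text, limit = LIMIT)
  RSS::Parser.parse(rss_text, false).items.first(limit).map(&:title)
end
```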
What does it look like now?
Now we’ve got it down to the freshest 10, but each item is still filling
up a lot of space. One way to cut down the length of each line is to
split each headline into multiple lines. Let’s start by cutting the
category and title into two separate lines:
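Here is one way to do the split. It assumes each title has the shape “[Category] Headline text”. That format is an assumption, so check it against the live feed. The names below are hypothetical.

```ruby
require 'rss'

# Assumed (hypothetical) title shape: "[Category] Headline text".
TITLE_PATTERN = /\A\[(?<category>[^\]]+)\]\s*(?<headline>.*)\z/m

# Split one title into [category, headline]; titles that do not match
# the assumed shape come back with a nil category.
def split_title(title)
  m = TITLE_PATTERN.match(title)
  m ? [m[:category], m[:headline]] : [nil, title]
end

# Format the newest `limit` items as the category on one line and the
# indented headline on the next, with a blank line between items.
def format_headlines(rss_text, limit = 10)
  items = RSS::Parser.parse(rss_text, false).items.first(limit)
  items.map do |item|
    category, headline = split_title(item.title)
    [category, "  #{headline}"].compact.join("\n")
  end.join("\n\n")
end
```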
It’s a lot easier for me to read the output now.
Occasionally I saw HTML entities such as &quot; in the output. Let’s
fix that problem before we move on to anything else.
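One standard-library option is CGI.unescapeHTML. The sketch below applies it to every title. It assumes a feed whose titles were escaped twice, so &quot; is still present after XML parsing. clean and clean_titles are hypothetical names.

```ruby
require 'cgi'
require 'rss'

# Decode entities that are still present after XML parsing, which happens
# when a feed escapes its titles twice (&amp;quot; parses to &quot;).
# CGI.unescapeHTML handles &amp; &quot; &lt; &gt; and numeric references
# such as &#39;; other named entities are returned unchanged.
def clean(text)
  CGI.unescapeHTML(text.to_s)
end

# Parse the feed and return every title with leftover entities decoded.
def clean_titles(rss_text)
  RSS::Parser.parse(rss_text, false).items.map { |item| clean(item.title) }
end
```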
It isn’t an issue this time, but I feel better.
Wherever possible, I’m using standard library tools to get my work done.
I’m too lazy to remember how to unescape every possible HTML entity, and I
would rather spend a few minutes searching through the Standard Library
documentation to find what I need.
It’s a good habit, and you might want to try it yourself.
Maybe I only care about particular types of headline. Say, I want to be
interested, but not amused.
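Here is a sketch of the filter. It assumes titles of the form “[Category] Headline”, which is an assumption about the feed, and the method names are hypothetical. Note the order of operations: it takes the newest ten first and filters second.

```ruby
require 'rss'

# Category from a title of the assumed form "[Category] Headline";
# nil when the title has no leading bracketed tag.
def category_of(title)
  title[/\A\[([^\]]+)\]/, 1]
end

# Take the newest `limit` titles first, then keep the wanted category.
# With this ordering, fewer than `limit` matches can come back even
# when more exist further down the feed.
def headlines_in(rss_text, category, limit = 10)
  titles = RSS::Parser.parse(rss_text, false).items.first(limit).map(&:title)
  titles.select { |title| category_of(title) == category }
end
```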
And it does indeed show me only “interesting” headlines.
That’s pretty nifty, except that it only looks for Interesting items out
of the last 10 headlines, rather than looking for the last 10
Interesting headlines.
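Reversing the order addresses that: filter the whole feed first, then take up to ten. This sketch assumes titles of the form “[Category] Headline”, and latest_in is a hypothetical name.

```ruby
require 'rss'

# Category from a title of the assumed form "[Category] Headline";
# nil when the title has no leading bracketed tag.
def category_of(title)
  title[/\A\[([^\]]+)\]/, 1]
end

# Filter the whole feed by category first, then keep at most `limit`
# matches. The feed only carries recent stories, so the result can
# still be shorter than `limit`.
def latest_in(rss_text, category, limit = 10)
  titles = RSS::Parser.parse(rss_text, false).items.map(&:title)
  titles.select { |title| category_of(title) == category }.first(limit)
end
```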
Is this any better?
Well, we can only look at today’s headlines. I guess we can’t be sure of
ten interesting things happening every day. Still, at least I know I’m
getting all of the interesting headlines that are available, up to my
limit.
Next problem: the only way I can fetch different headline types is to
manually dig into the source code and change the category. Let’s make
the category a command-line option instead.
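Here is a minimal OptionParser sketch for that. It adds a hypothetical -c/--category option that defaults to “Interesting”. The option names and the script name fark.rb are assumptions.

```ruby
require 'optparse'

# Parse command-line arguments into an options hash. The -c/--category
# option (hypothetical) picks the headline type; "Interesting" is the default.
def parse_options(argv)
  options = { category: "Interesting" }
  OptionParser.new do |opts|
    opts.banner = "Usage: fark.rb [options]"
    opts.on("-c", "--category NAME", "Headline category to show") do |name|
      options[:category] = name
    end
  end.parse!(argv)
  options
end
```

Running ruby fark.rb -c PSA would put "PSA" into options[:category], ready to pass to the category filter. OptionParser also supplies a --help listing for free.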
Let’s test it by requesting “PSA” headlines.
That works. Pretty nicely, I might add.
OptionParser, from the standard optparse library, is a great tool for
handling command-line arguments.
This program now does everything that I set out to do, and then some. I
might choose to do some refactoring to “bulletproof” the code, or wrap
it up in some OO niceness to make it pretty. The truth is that this
application is exactly what it needs to be for now, and I think that I
shouldn’t overwork something that I may never come back to. Maybe later
I’ll come back to it when I think of new features or find new bugs, and
then I can overwork it to my heart’s content.
I hope you enjoyed working along with me as much as I enjoyed sitting
here and typing random nonsense to myself.
What Else?
I may be done with this exercise for now, but here are a few ideas about
features that can be added to make it a little cooler. Go ahead and try
them out!
Add word wrap to make the output a little more readable.
Add a parameter to change the number of headlines grabbed.
Modify the script so that it works with other newsfeeds.
Modify the script so that its functionality can be embedded in
other Ruby programs.
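For the word-wrap idea, one common Ruby idiom is a single regular-expression substitution. This is a sketch, not part of the original script. The name word_wrap and its 72-column default are arbitrary choices.

```ruby
# Wrap text into lines of at most `width` characters, breaking at
# whitespace. A word longer than `width` is kept whole on its own line.
def word_wrap(text, width = 72)
  text.gsub(/(.{1,#{width}})(\s+|\z)/, "\\1\n").chomp
end
```

Calling word_wrap(headline, 60) before printing each headline would keep long titles readable in a narrow terminal.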
Revision History
12 March 2009: Ran under Ruby 1.9, changed parsing to reflect Fark RSS changes
3 January 2007: Major rewrite to incorporate RSS library and changes at Fark
19 September 2004: Changed the network library used from net/http
to open-uri in the refactoring stage. This is from a suggestion
that was made by Gavin Sinclair, Frederick Ros, and others. It’s a
good suggestion, and I’m not going to ignore a good suggestion!