Our search journey continues. We have accomplished the hard part: checking a single star to see if it has the traits we’re looking for. Today we just have to use that logic to search a set of stars. First we’ll examine a handpicked selection. Guess what happens after that? We finally get back into the full HYG Catalog and search for stars from the command line. That’s right. After all this work, stellar grows up and becomes an application.
Note
There are easier ways to get searches out of a large CSV file. If that was really all I wanted to do, I could use a higher-level language like Perl or Python to feed the CSV into a SQLite database and query the database directly. However, we are not building a SQL database. We are learning how to do interesting things with Parrot.
Building a Catalog and Searching It
The first thing that’s tripping me up is how to set up the catalog itself. You know the “set of stars” I was talking about? The easy way to do this from a test is to have a few CSV strings for some sample stars, apply extract_from_csv to each of them, push each star into an array, then search through the array. Thing is, I know that this is not going to be acceptable when I get to the real data. I expect this application to be one where you run it from the command line, using your search conditions as command line arguments. Loading all the data before searching it takes time. I should write this code so that it searches while reading in data. That would be much faster.
On the other hand, what if I add an interactive prompt to this application later? Loading the full catalog into memory before applying searches could be faster in the long run compared to reading the data file for every search.
That is trying to predict the future, though. I know how I want to use this catalog today. I want to run a search and see the results as soon as the application knows about them.
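To make the "search while reading" idea concrete, here is a minimal sketch in Python rather than PIR. It is only an illustration of the approach, not the code from stellar. The helper name stream_matches, the two column names, and the sample rows are all made up for the example.

```python
import csv
import io


def stream_matches(lines, field, value):
    """Yield each matching row as soon as its line has been read.

    Hypothetical helper: nothing is held in memory except the current row.
    """
    for row in csv.DictReader(lines):
        if row.get(field) == value:
            yield row


# Made-up rows for illustration only; these are not real HYG entries.
sample = io.StringIO(
    "ProperName,Spectrum\n"
    "Alpha,G2V\n"
    "Beta,K3V\n"
    "Gamma,G2V\n"
)

for star in stream_matches(sample, "Spectrum", "G2V"):
    print(star["ProperName"])
```

Because the function is a generator, the caller sees the first match before the rest of the input has been touched. Loading everything first would mean building a full list of rows before the first comparison happens.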
Searching The Catalog
I do not want to dig right into searching the full 119,617 entries of the real catalog. Instead, let’s set up a small test catalog and write some tests.
Where you put your test data is a matter of taste. I will be keeping my data in a folder named data. That seems reasonable.
Only a few entries are needed in the test catalog. We just need to be sure that the code works with a CSV file with the same structure as the HYG database. I’ll grab Sol, another G2V spectrum star, and a K3V star.
The test data is out of the way, so now I feel comfortable writing the tests that use it.
I am deliberately keeping the tests simple right now. The goal is to make sure the basic functionality works rather than to guarantee behavior for every little detail. Tests can be added for those details as they become important.
The actual search_catalog sub borrows quite a bit from step 07.
search_catalog will handle the task of reading the file and looking for stars that match the search conditions it has been given. After it defines a star from the current line, it asks check_star to compare that star to the set of conditions it has been given. It remembers the stars that match, and returns them once it has reached the end of the file. It is not the fastest approach, but it works.
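The flow described above can be sketched roughly like this. It is a Python stand-in, not the PIR sub itself. The names search_catalog and check_star come from the text, but their signatures here are assumptions: conditions is taken to be a mapping of column name to wanted value, and each star is a dictionary keyed by the CSV header.

```python
import csv


def check_star(star, conditions):
    """Stand-in for check_star: every condition must match exactly, as text."""
    return all(star.get(field) == wanted for field, wanted in conditions.items())


def search_catalog(path, conditions):
    """Read the CSV file at path and return the stars that meet the conditions."""
    matches = []
    with open(path, newline="") as handle:
        # DictReader consumes the header row and keys each star by column name.
        for star in csv.DictReader(handle):
            if check_star(star, conditions):
                matches.append(star)
    # Matches are only handed back once the whole file has been read.
    return matches
```

Calling search_catalog with the path to a test catalog and a condition such as {"Spectrum": "G2V"} would return a list of matching rows, assuming the file has a Spectrum column.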
It works well enough that I am ready to add real data and some way for people to use it!
Searching From The Command Line
Now that we know stellar can read a CSV and return results, it’s time to work on that empty main that has been sitting in stellar.pir. Oh yeah - we will want to make hygxyz.csv available now. I will be pushing my copy into the data folder, next to test-catalog.csv. You can place your copy wherever you like, but make sure that you set the path appropriately in main.
Here is the result of all that work we have done setting up the project and support code. The main subroutine in stellar is downright civilized compared to what we had for step 07. All we do is search based on the command line parameters and display each of the matches.
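As a rough picture of that shape, here is a Python sketch of a main routine. It is not the PIR from stellar.pir. It assumes the command line arguments arrive as FIELD VALUE pairs, which the text does not spell out, and parse_conditions is a hypothetical helper. The default path follows the data folder layout mentioned earlier.

```python
import csv


def parse_conditions(args):
    """Turn ['Spectrum', 'G2V', ...] into {'Spectrum': 'G2V', ...}."""
    if len(args) % 2 != 0:
        raise ValueError("expected FIELD VALUE pairs")
    return dict(zip(args[0::2], args[1::2]))


def main(argv, catalog_path="data/hygxyz.csv"):
    """Search the catalog using argv[1:] as conditions and print each match.

    Returns the number of matches shown. Set catalog_path to wherever
    your copy of the catalog lives.
    """
    conditions = parse_conditions(argv[1:])
    shown = 0
    with open(catalog_path, newline="") as handle:
        for star in csv.DictReader(handle):
            if all(star.get(f) == v for f, v in conditions.items()):
                print(star)
                shown += 1
    return shown
```

Wiring this up as a script would mean calling main(sys.argv) from the usual entry point. With that in place, an invocation with arguments like Spectrum G2V would print one line per matching row.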
Hey, this thing is almost useful!
Conclusion
stellar has reached a major milestone. When I started fiddling with the HYG Database, I wanted to write a command-line Parrot tool that could look up stars based on specific fields. This step gives us that ability. I admit that a lot more could be done. For example, it only does exact matches. You can easily find a star that is 108.108108108108 light years away, but not stars that are roughly 108 light years away. And forget about finding stars within 20 light years.
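For anyone curious what the missing comparisons might look like, here is a small Python sketch. Neither helper exists in stellar; is_roughly and is_within are hypothetical names, and the half-unit tolerance is an arbitrary choice. Both assume the field holds text that parses as a number.

```python
import math


def is_roughly(value, target, tolerance=0.5):
    """True when the numeric field value is within tolerance of target."""
    return math.isclose(float(value), target, rel_tol=0.0, abs_tol=tolerance)


def is_within(value, limit):
    """True when the numeric field value is no greater than limit."""
    return float(value) <= limit


print(is_roughly("108.108108108108", 108))  # True
print(is_within("108.108108108108", 20))    # False
```

The key change from exact matching is converting the field to a number before comparing, so the condition describes a range instead of a single string.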
I am going to take a little break from the stellar project, though. Rakudo Star is almost out, and I want to play with that.
You can add to stellar yourself. Make it faster. Make it object-oriented. Make it a library. Rewrite it in LOLCODE. Have fun. Just remember to give David Nash credit for creating the HYG Database. We have been having all of this fun because he took the time to put that catalog together.
Enjoy yourself!
Added to vault 2024-01-15. Updated on 2024-06-24