Collecting my attempts to improve at tech, art, and life

Summarizing A File With Crystal

Tags: files programming


Okay, I don’t have a lot of time here. We’re on a tight schedule. But hey, tests are running, so I’ll write a tiny bit of Crystal.

How would I print a quick summary of a file? Besides ls, of course. I mean how would I print a quick summary of a file using Crystal?

Code Sample
    filename = "#{ENV["HOME"]}/Dropbox/Camera Uploads/2019-11-13 08.11.12.png"
    # Quote the path so the shell treats the spaces as part of one argument
    puts `ls -l "#{filename}"`
  
Code Sample
    -rw-r--r-- 1 randomgeek randomgeek 3346960 Nov 13 08:11 /home/randomgeek/Dropbox/Camera Uploads/2019-11-13 08.11.12.png
  

My 2019-08-25 post, Trying the Crystal Language, looked at Crystal as a glue language. Today I’m wondering more about how I would get this information using Crystal’s standard library.

Turns out I can get the same information with File::Info.

Code Sample
    puts File.info "#{ENV["HOME"]}/Dropbox/Camera Uploads/2019-11-13 08.11.12.png"
  
Code Sample
    Crystal::System::FileInfo(@stat=LibC::Stat(@st_dev=2051, @st_ino=6983901, \
  @st_nlink=1, @st_mode=33188, @st_uid=1000, @st_gid=1000, @__pad0=0,     \
  @st_rdev=0, @st_size=3346960, @st_blksize=4096, @st_blocks=6552,        \
  @st_atim=LibC::Timespec(@tv_sec=1573661608, @tv_nsec=641856438),        \
  @st_mtim=LibC::Timespec(@tv_sec=1573661472, @tv_nsec=0),                \
  @st_ctim=LibC::Timespec(@tv_sec=1573661609, @tv_nsec=941857986),        \
  @__glibc_reserved=StaticArray[0, 0, 0]))
  

This is both more and less information than I was hoping for. Clearly whoever wrote to_s for File::Info figured the main reason to print the object directly is debugging.

That makes sense, and they provide methods to get at the information I care about most.

Code Sample
    # Returns a multiline string summary of a single file
    def describe_file(filename)
      info = File.info(filename)

      size = ->(bytes : UInt64) {
        scales = { {1024**3, "GB"}, {1024**2, "MB"}, {1024, "KB"} }
        scale = scales.find { |i| bytes > i[0] }

        scale.nil? ? "#{bytes} bytes" : "%.2f %s" % [bytes / scale[0], scale[1]]
      }.call(info.size)

      String.build do |str|
        str << "Filename: #{filename}\n"
        str << "Size:     #{size}\n"
        str << "Modified: #{info.modification_time}\n"
      end
    end

    filename = "#{ENV["HOME"]}/Dropbox/Camera Uploads/2019-11-13 08.11.12.png"
    puts describe_file filename
  
Code Sample
    Filename: /home/randomgeek/Dropbox/Camera Uploads/2019-11-13 08.11.12.png
    Size:     3.19 MB
    Modified: 2019-11-13 16:11:12 UTC
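
The summary only touches size and modification_time. As a rough sketch, here are a few other File::Info accessors that I believe the standard library provides. It runs against a throwaway temp file (the path and contents are made up for illustration) so it does not depend on my Dropbox folder.

```crystal
# Hypothetical scratch file; any readable path would do.
path = File.tempname("summary", ".txt")
File.write(path, "hello crystal")

info = File.info(path)

puts info.size              # byte count of the file (13 here)
puts info.file?             # true for a regular file
puts info.directory?        # false, since this is not a directory
puts info.type              # a File::Type enum member
puts info.permissions       # a File::Permissions flags value
puts info.modification_time # a Time

File.delete(path)
```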
  

I grabbed the logic from my 2019-06-01 post, Weighing Files With Python, to describe the size in kilobytes, megabytes, or gigabytes. That is easier for my brain to understand than the UInt64 byte count returned by File::Info#size.
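
To make the size arithmetic concrete, here is the same division done by hand for the byte count that ls reported. The variable names are mine, just for illustration, and the sketch assumes integer `/` is float division, as the 3.19 MB output suggests.

```crystal
bytes = 3_346_960_u64       # byte count reported by ls for the screenshot
megabytes = bytes / 1024**2 # assumes `/` on integers yields a Float64

puts "%.2f MB" % megabytes  # => 3.19 MB
```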

Yes, the whole thing is more clever than the situation requires, but I am trying to learn the language here. Using a Proc was one way to basically copy and paste the logic from my earlier post and reformat for Crystal. Sure, I could have — and probably should have — defined a new, separate method. At the same time, Procs are great to show that there’s this bit of behavior you want to encapsulate, but you don’t plan to use anywhere else.

But really it was just a bit of late night silliness so I could see Crystal Procs in action. Silliness for the sake of learning is okay.

And what did I learn?

  • File::Info gives me what I want for file summaries.
  • Crystal supports Tuples: fixed-size, immutable sequences that can be more efficient than a full Array.
  • String.build is a nice-looking way to make multiline strings without heredocs or +=. Apparently there are performance reasons to use it too, but I’ll never see them in this short program. Same with Tuples really, but the type you specify can tell people what your intentions are.
  • Proc argument types must be specified. That must mean the compiler treats them differently than normal methods.
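
A minimal sketch of the last three points, using made-up values rather than anything from the file summary. typeof shows how the compiler sees a Tuple versus an Array, String.build assembles a string in one buffer, and the Proc literal needs its parameter type where the method does not. The names scale, list, summary, double, and double_proc are hypothetical.

```crystal
# Tuple: fixed size, immutable, each element keeps its own type.
scale = {1024, "KB"}
puts typeof(scale) # => Tuple(Int32, String)

# Array: growable, all elements share one (here, union) type.
list = [1024, "KB"]
puts typeof(list) # => Array(Int32 | String)

# String.build appends pieces to one buffer instead of creating
# intermediate strings the way repeated += would.
summary = String.build do |str|
  str << "Size: " << 3.19 << " MB\n"
end
print summary # => Size: 3.19 MB

# A method can leave its parameter untyped...
def double(n)
  n * 2
end

# ...but a Proc literal has to declare it.
double_proc = ->(n : Int32) { n * 2 }

puts double(21)           # => 42
puts double(1.5)          # => 3.0
puts double_proc.call(21) # => 42
puts typeof(double_proc)  # => Proc(Int32, Int32)
```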

Hang on. I’m curious to explore that last one. Procs are treated differently. Are they faster?

Code Sample
    require "benchmark"

    filename = "#{ENV["HOME"]}/Dropbox/Camera Uploads/2019-11-13 08.11.12.png"
    bytes = File.info(filename).size

    def describe_size(bytes)
      scales = { {1024**3, "GB"}, {1024**2, "MB"}, {1024, "KB"} }
      scale = scales.find { |i| bytes > i[0] }

      scale.nil? ? "#{bytes} bytes" : "%.2f %s" % [bytes / scale[0], scale[1]]
    end

    size_proc = ->(bytes : UInt64) {
      scales = { {1024**3, "GB"}, {1024**2, "MB"}, {1024, "KB"} }
      scale = scales.find { |i| bytes > i[0] }

      scale.nil? ? "#{bytes} bytes" : "%.2f %s" % [bytes / scale[0], scale[1]]
    }

    Benchmark.ips do |benchmark|
      benchmark.report("using method") do
        size = describe_size(bytes)
      end

      benchmark.report("using proc") do
        size = size_proc.call(bytes)
      end
    end
  
Code Sample
    $ crystal run --release proc_vs_def.cr
    using method   2.20M (455.45ns) (± 6.08%)  352B/op        fastest
      using proc   2.18M (458.85ns) (± 5.46%)  352B/op   1.01× slower
  

The method is a little over three whole nanoseconds faster than the Proc. I wonder…

Code Sample
    $ crystal run --release proc_vs_def.cr
    using method   2.15M (465.37ns) (± 5.93%)  352B/op   1.01× slower
      using proc   2.16M (462.10ns) (± 6.04%)  352B/op        fastest

  

Yeah, that’s what I thought. For this case at least, local environment variations — did Spotify just hit a new track? — will have a bigger impact than whether I choose a Proc or a method.

Okay, tests are done. Everything passed, yay! Back to it. Maybe back to the drawing, actually.