Currently I've left the search splitting by file size out of this commit
since it's ~200 lines of logic, and I think it's best to get it reviewed
separately.
In it's current state the crawler would only be able to get the last
1000 indexed files by Github, but it does show the general structure of
how the crawler is implemented.