- Increase logging signal to noise ratio.
- Allow to specify the `http.Client` for github requests. (This allows
the use of caching http.Clients).
- Clean up implementation.
Binary searches through different ranges of file sizes to create search
queries with fewer than 1000 results. This is required since github will
only return the first 1000 result of any search query.
The implementation handles the case where some files may be deleted
while the search is running, and (possibly artificially) assures that the
number of files increases monotonically as the filesize range grows.
The implementation also caches queries and is expected to make fewer
than O((#files/1000) * lg(max file size)) API calls to retrieve the
range queries that can be used to index all of the files.
In practice running the search splitting takes a few minutes, while
retrieving all of the data takes a few hours.
Currently I've left the search splitting by file size out of this commit
since it's ~200 lines of logic, and I think it's best to get it reviewed
separately.
In it's current state the crawler would only be able to get the last
1000 indexed files by Github, but it does show the general structure of
how the crawler is implemented.
To make this easier to read, use, and modify, I've abstracted the
important parts of the github query api into crawler/github/query.go
which allows to describe at a high level what is to be searched without
knowing the API syntax.
`crawler.Crawler` interface is defined, where a crawler has to implement
a `Crawl` method that forwards document found by the crawler to a channel.
A helper function that launches a list of crawlers concurrently and
merges their channels into one main output channel, forwarding errors is
also implemented.
Finally, a test that verifies the correctness and concurrency of the
helper method is provided.
[doc]: https://github.com/golang/go/wiki/Modules#releasing-modules-v2-or-higher
Per this Go modules [doc] a repo or branch that's
already tagged v2 or higher should increment the major
version (e.g. go to v3) when releasing their first Go
module-based packages.
At the moment, the kustomize repo has these top level
packages in the sigs.k8s.io/kustomize module:
- `cmd` - holds main program for kustomize
Conceivably someone can depend on this
package for integration tests.
- `internal` - intentionally unreleased subpackages
- `k8sdeps` - an adapter wrapping k8s dependencies
This exists only for use in pre-Go-modules kustomize-into-kubectl
integration and won't live much longer (as everything involved is
switching to Go modules).
- `pkg` - kustomize packages for export
This should shrink in later versions, since
the surface area is too large, containing
sub-packages that should be in 'internal'.
- `plugin` - holds main programs for plugins
This PR changes the top level go.mod file from
```
module sigs.k8s.io/kustomize
```
to
```
module sigs.k8s.io/kustomize/v3
```
and adjusts all import statements to
reflect the change.