Haiyan Meng
145ba0c7ff
Avoid reprocessing queries whose range size is 0
2020-06-23 11:36:29 -07:00
Haiyan Meng
f5419e9f72
Check the incomplete_results field of github query responses
...
Currently, we don't check the `incomplete_results` field of a github
query response, which is problematic when incomplete query results are
used to split the query ranges: the resulting split ranges will
be unreliable.
2020-02-03 09:59:52 -08:00
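A minimal sketch of the check described in the commit above, assuming the response body is decoded with `encoding/json`. The `total_count` and `incomplete_results` keys are the ones GitHub search responses carry; the type `searchResponse`, the helper `parseTotalCount` and the error `errIncomplete` are hypothetical names used only for illustration and are not taken from the crawler.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// searchResponse holds the two top-level fields of a search response
// that matter for range splitting (hypothetical type name).
type searchResponse struct {
	TotalCount        int  `json:"total_count"`
	IncompleteResults bool `json:"incomplete_results"`
}

// errIncomplete signals that the count must not be used to split ranges.
var errIncomplete = errors.New("incomplete search results")

// parseTotalCount (hypothetical helper) returns the total count only
// when the response is marked as complete.
func parseTotalCount(body []byte) (int, error) {
	var r searchResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if r.IncompleteResults {
		return 0, errIncomplete
	}
	return r.TotalCount, nil
}

func main() {
	n, err := parseTotalCount([]byte(`{"total_count": 42, "incomplete_results": false}`))
	fmt.Println(n, err)

	_, err = parseTotalCount([]byte(`{"total_count": 7, "incomplete_results": true}`))
	fmt.Println(err)
}
```

A caller could retry the query when `errIncomplete` is returned rather than splitting on a count it cannot trust.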
Haiyan Meng
7a87c84403
Reprocess the github filesize search ranges which have more than 1000 items
2020-02-03 09:59:52 -08:00
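One way to picture reprocessing over-full file size ranges is a recursive bisection, sketched below. This is an illustration under stated assumptions, not the crawler's actual algorithm: the type `sizeRange`, the function `splitRanges`, the constant `maxResults` and the fake `count` callback are all hypothetical, and the limit of 1000 is taken from the commit subject above.

```go
package main

import "fmt"

// sizeRange is an inclusive range of file sizes (hypothetical type).
type sizeRange struct{ lo, hi int }

// maxResults is the per-query item limit named in the commit subject.
const maxResults = 1000

// splitRanges bisects r until every range holds at most maxResults
// items, or until a range covers a single size and cannot be split.
func splitRanges(r sizeRange, count func(sizeRange) int) []sizeRange {
	if count(r) <= maxResults || r.lo == r.hi {
		return []sizeRange{r}
	}
	mid := r.lo + (r.hi-r.lo)/2
	left := splitRanges(sizeRange{r.lo, mid}, count)
	right := splitRanges(sizeRange{mid + 1, r.hi}, count)
	return append(left, right...)
}

func main() {
	// Fake counter for the demo: pretend there is one file per size.
	count := func(r sizeRange) int { return r.hi - r.lo + 1 }
	for _, r := range splitRanges(sizeRange{0, 2999}, count) {
		fmt.Printf("%d..%d\n", r.lo, r.hi)
	}
}
```

The `r.lo == r.hi` guard stops the recursion on a range that can no longer be narrowed, even if it still reports more than `maxResults` items.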
Jeff Regan
0ce076758d
Merge pull request #2150 from haiyanmeng/stats
...
Add `fileType` and `User` into the index
2020-01-28 09:18:31 -08:00
HowJMay
00f68c12a8
fix typos
...
Fix typos
2020-01-23 23:35:38 +08:00
Haiyan Meng
0820865e1d
Retry FindRangesForRepoSearch
2020-01-22 10:13:57 -08:00
Haiyan Meng
1120c6bc7a
Add a User field into Document to make it easy to aggregate at the github user level.
2020-01-21 10:09:52 -08:00
Haiyan Meng
f4636f8555
Add a fileType field into the index
2020-01-17 13:15:49 -08:00
Haiyan Meng
cf8d53a195
Move SeenMap to the utils dir
2020-01-15 15:29:16 -08:00
Haiyan Meng
2e895c147e
Use log.Print* instead of fmt.Print*
2020-01-14 15:50:35 -08:00
Haiyan Meng
72eda992bd
Make seen a non-primitive type
2020-01-14 12:14:00 -08:00
Haiyan Meng
230e0ca752
Add two methods to type RangeQueryResult: Add and String
2020-01-14 12:14:00 -08:00
Haiyan Meng
14eb524b9e
Add a command for searching for kustomize resource files
2020-01-14 12:14:00 -08:00
Haiyan Meng
81d62f90bf
Improve the efficiency of crawling github
...
Make sure a github file is crawled only once
2020-01-14 12:14:00 -08:00
Haiyan Meng
b154af8be4
Check the error of closing response body
2020-01-08 10:32:12 -08:00
Haiyan Meng
ccd129f7a5
Check empty http response before accessing it
2020-01-08 10:24:00 -08:00
Haiyan Meng
142c105500
Skip the empty resource/base item in a kustomization file and set the defaultBranch if needed
2020-01-06 12:06:18 -08:00
Haiyan Meng
ee659a70e4
Fix the construction of URLs for finding all the commits related to a github file
The existing logic sets the creation time of a github file to the time
when the github repository was created.
The fix sets the creation time of a github file to the time when the
file was created.
2020-01-06 12:06:18 -08:00
Haiyan Meng
2c2aa928cc
Delete non-existing documents from the index
2019-12-18 15:56:44 -08:00
Haiyan Meng
8c89f0946c
Avoid indexing a document if FetchDocument or SetCreated fails
2019-12-18 15:56:44 -08:00
Haiyan Meng
e44d1298df
Return an error if the http Client.Do response status code is not 2xx
2019-12-18 15:56:44 -08:00
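The status check named in the commit above might look roughly like the following sketch. `http.Response`, its `StatusCode` and `Status` fields, and the `http.Status*` constants are standard library; the helpers `isSuccess` and `checkStatus` are hypothetical names, and the responses in `main` are built by hand instead of coming from a real `Client.Do` call.

```go
package main

import (
	"fmt"
	"net/http"
)

// isSuccess reports whether an HTTP status code is in the 2xx range.
func isSuccess(code int) bool {
	return code >= 200 && code <= 299
}

// checkStatus (hypothetical helper) turns a non-2xx response into an error.
func checkStatus(resp *http.Response) error {
	if !isSuccess(resp.StatusCode) {
		return fmt.Errorf("request failed with status %q", resp.Status)
	}
	return nil
}

func main() {
	// Hand-built responses stand in for the result of Client.Do.
	ok := &http.Response{StatusCode: http.StatusOK, Status: "200 OK"}
	notFound := &http.Response{StatusCode: http.StatusNotFound, Status: "404 Not Found"}

	fmt.Println(checkStatus(ok))
	fmt.Println(checkStatus(notFound))
}
```

`Client.Do` returns a nil error for any response the server sends, including 4xx and 5xx, which is why the status code needs a separate check.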
Haiyan Meng
a9244f759e
Add support for crawling a specific git user or repo
2019-12-13 11:18:33 -08:00
Haiyan Meng
d9239104aa
Escape spaces in the query paths of git commit requests
2019-12-12 10:03:15 -08:00
Haiyan Meng
0d79219e46
Avoid processing the nil pointer returned by kustomizationResultAdapter
...
Currently, the crawler job panics whenever a nil pointer is returned by
kustomizationResultAdapter.
2019-12-11 13:54:01 -08:00
Haiyan Meng
bffc0d7071
Multiple improvements to the crawler
...
1) Set document IDs to avoid duplicating documents;
2) Set the `creationTime` field of each document in the index;
3) Set the `values`, `kinds` and `identifiers` fields for all documents;
4) Add a `Copy` method into the `Document` struct: this fixes the issue
where all the documents existing in the index point to the same Document
object;
5) Avoid using keystore redis;
6) Set imagePullPolicy to `Always` for crawler jobs.
2019-12-11 11:10:48 -08:00
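Item 4 in the commit above describes an aliasing problem: if every stored entry points at one shared object, later writes overwrite earlier ones. The sketch below shows the general idea of a `Copy` method that avoids this. The `Document` type here is a stand-in with hypothetical fields (`ID`, `Kinds`), not the crawler's real struct.

```go
package main

import "fmt"

// Document is a stand-in type; both fields are assumptions made
// for this illustration.
type Document struct {
	ID    string
	Kinds []string
}

// Copy returns an independent copy of d, duplicating the slice so
// the copy does not share a backing array with the original.
func (d *Document) Copy() *Document {
	c := *d
	c.Kinds = append([]string(nil), d.Kinds...)
	return &c
}

func main() {
	scratch := &Document{}
	var shared, copied []*Document

	for _, id := range []string{"a", "b"} {
		scratch.ID = id
		scratch.Kinds = []string{"Deployment"}
		shared = append(shared, scratch)        // every entry aliases scratch
		copied = append(copied, scratch.Copy()) // every entry is independent
	}

	fmt.Println(shared[0].ID, shared[1].ID)
	fmt.Println(copied[0].ID, copied[1].ID)
}
```

In the `shared` slice both entries point at the same object, so they end up reporting the last ID written, while the `copied` entries keep the values they had when they were stored.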
Jeffrey Regan
e9ab3da164
Fix some nits in the crawler and elsewhere.
2019-12-03 10:44:44 -08:00
Haiyan Meng
84b75afae4
Make the crawler work
...
1) add the crawler binary and fix the crawler library
2) remove the readiness probe in the search backend
3) add config for redis keystore
4) add github_api_secret.txt file with instructions
2019-11-26 09:50:51 -08:00
Haiyan Meng
9255c991f4
Replace the `sigs.k8s.io/kustomize/hack/crawl/*` import path with `sigs.k8s.io/kustomize/api/internal/crawl/*`
2019-11-14 13:38:18 -08:00
Haiyan Meng
d08140d3f7
Remove the api/internal/hack/crawl/crawler/git dir and use api/internal/git instead.
2019-11-14 13:35:00 -08:00
Haiyan Meng
f69d2d2e69
Move hack/crawl under api/internal
2019-11-14 13:17:28 -08:00