Commit Graph

10 Commits

Author SHA1 Message Date
Haiyan Meng
145ba0c7ff Avoid reprocess queries whose range size is 0 2020-06-23 11:36:29 -07:00
HowJMay
00f68c12a8 fix typos
Fix typos
2020-01-23 23:35:38 +08:00
Haiyan Meng
cf8d53a195 Move SeenMap to the utils dir 2020-01-15 15:29:16 -08:00
Haiyan Meng
ee659a70e4 Fix how to construct URLs for finding all the commits related to a
github file

The existing logic sets the creation time of a github file to the time
when the github repository was created.
The fix sets the creation time of a github file to the time when the
file was created.
2020-01-06 12:06:18 -08:00
Haiyan Meng
a9244f759e Add supports for crawling a specific git user or repo 2019-12-13 11:18:33 -08:00
Haiyan Meng
d9239104aa Escape spaces in the query paths of git commit requests 2019-12-12 10:03:15 -08:00
Haiyan Meng
bffc0d7071 Mulitple improvements of the crawler
1) Set document IDs to avoid duplicating documents;
2) Set the `creationTime` field of each document in the index;
3) set the `values`, `kinds` and `identifiers` fields for all documents;
4) Add a `Copy` method into the `Document` struct: this fixes the issue
where all the documents existing in the index point to the same Document
object;
5) Avoid using keystore redis;
6) Set imagePullPolicy to `Always` for crawler jobs.
2019-12-11 11:10:48 -08:00
Jeffrey Regan
e9ab3da164 Fix some nits in the crawler and elsewhere. 2019-12-03 10:44:44 -08:00
Haiyan Meng
84b75afae4 Make the crawler work
1) add the crawler binary and fix the crawler library
2) remove the readiness probe in the search backend
3) add config for redis keystore
4) add github_api_secret.txt file with instructions
2019-11-26 09:50:51 -08:00
Haiyan Meng
f69d2d2e69 Move hack/crawl under api/internal 2019-11-14 13:17:28 -08:00