Multiple improvements to the crawler

1) Set document IDs to avoid duplicating documents;
2) Set the `creationTime` field of each document in the index;
3) Set the `values`, `kinds` and `identifiers` fields for all documents;
4) Add a `Copy` method to the `Document` struct: this fixes the issue
where all the documents in the index point to the same `Document`
object;
5) Avoid using keystore redis;
6) Set imagePullPolicy to `Always` for crawler jobs.
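
The aliasing problem behind item 4 can be illustrated with a minimal sketch. The `Document` fields and the `collectShared`/`collectCopied` helpers below are hypothetical stand-ins, not the crawler's actual code; only the existence of a `Copy` method on `Document` comes from this commit.

```go
package main

import "fmt"

// Document is a hypothetical, trimmed-down stand-in for the crawler's
// Document struct; the real struct has more fields.
type Document struct {
	ID          string
	FilePath    string
	Identifiers []string
}

// Copy returns a deep copy, so the result stays valid even if the
// original Document is reused and overwritten afterwards.
func (d *Document) Copy() *Document {
	c := *d // copies the scalar fields
	// Slices share their backing array, so duplicate it explicitly.
	c.Identifiers = append([]string(nil), d.Identifiers...)
	return &c
}

// collectShared reuses one Document and stores the same pointer each
// time, so every stored entry ends up showing the last file visited.
func collectShared(paths []string) []*Document {
	doc := &Document{}
	var out []*Document
	for _, p := range paths {
		doc.ID = "id-" + p
		doc.FilePath = p
		doc.Identifiers = append(doc.Identifiers[:0], p)
		out = append(out, doc)
	}
	return out
}

// collectCopied is the same loop, but stores an independent copy.
func collectCopied(paths []string) []*Document {
	doc := &Document{}
	var out []*Document
	for _, p := range paths {
		doc.ID = "id-" + p
		doc.FilePath = p
		doc.Identifiers = append(doc.Identifiers[:0], p)
		out = append(out, doc.Copy())
	}
	return out
}

func main() {
	paths := []string{"a/kustomization.yaml", "b/kustomization.yaml"}
	shared := collectShared(paths)
	copied := collectCopied(paths)
	fmt.Println(shared[0].FilePath, shared[1].FilePath)
	fmt.Println(copied[0].FilePath, copied[1].FilePath)
}
```

In this sketch `Copy` duplicates the slice as well as the scalar fields, because a plain struct copy would still share the `Identifiers` backing array with the original.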
Haiyan Meng
2019-12-05 09:51:22 -08:00
parent 54b1549586
commit bffc0d7071
13 changed files with 125 additions and 36 deletions

@@ -68,7 +68,7 @@ func TestQueryType(t *testing.T) {
func TestGithubSearchQuery(t *testing.T) {
const (
-perPage = 100
+perPage = 100
)
testCases := []struct {
@@ -82,7 +82,7 @@ func TestGithubSearchQuery(t *testing.T) {
}{
{
rc: RequestConfig{
-perPage: perPage,
+perPage: perPage,
},
codeQuery: Query{
Filename("kustomization.yaml"),