23 Commits

Author SHA1 Message Date
Haiyan Meng
5cf0d887b1 Avoid exposing ElasticSearch DB over public IPs 2020-06-09 13:03:37 -07:00
Michael Cook
aa46b6ec44 fix misspellings 2020-03-18 14:36:12 +01:00
Haiyan Meng
b7b7a5a79f Fix typo 2020-02-10 15:44:51 -08:00
Haiyan Meng
807ca9c1e3 Add notes on backup and restore 2020-02-10 08:30:08 -08:00
Haiyan Meng
baccf58ccf Avoid tracking the change in github_api_secret.txt
This helps prevent committing your GitHub personal access token to
GitHub by accident.
2020-02-05 12:06:21 -08:00
Haiyan Meng
d0602c732b Remove use of the GitHub access token from the kustomize-stats job 2020-02-05 11:04:59 -08:00
Haiyan Meng
c9bce3fc0a Add comments on backup and restore 2020-02-05 11:04:59 -08:00
Haiyan Meng
3ebeebabde Add comments for backup and restore 2020-02-03 12:37:18 -08:00
Haiyan Meng
0fcb3a014c Add config for index backup and restore 2020-02-03 09:59:52 -08:00
Haiyan Meng
bb409a5ea8 Set up cronjob to run crawler every 7 days 2020-02-03 09:59:52 -08:00
Haiyan Meng
74e1b5d54b Add GCP service account into ESCluster config
This is necessary for index backup into GCS and index recovery from GCS
2020-02-03 09:59:52 -08:00
Haiyan Meng
3ead42fe27 Add --index flag to kustomize_stats config file 2020-01-15 15:29:16 -08:00
Haiyan Meng
29e50ab476 Collect stats on generators and transformers 2020-01-15 12:10:08 -08:00
Haiyan Meng
af131c7471 Use flags to specify crawling mode and github user/repo info 2020-01-14 15:36:12 -08:00
Haiyan Meng
bb09f82f3c Remove kustomize-index-name setting 2020-01-14 13:53:16 -08:00
Haiyan Meng
594a3bf0d2 Add a binary for generating the stats of the index
1) how many kinds of objects are being customized?
2) how many times is every kind of object customized?
3) how many kustomization features are being used?
4) how many times is every kustomization feature used?
2020-01-07 15:10:25 -08:00
Haiyan Meng
be2e03681d Remove unused param from IndexFunc 2019-12-18 15:56:44 -08:00
Haiyan Meng
127541f610 Support different modes of running the crawler 2019-12-18 15:56:44 -08:00
Haiyan Meng
afd24c6faf Expose ElasticSearch as a LoadBalancer-type service 2019-12-11 15:05:10 -08:00
Haiyan Meng
bffc0d7071 Multiple improvements to the crawler
1) Set document IDs to avoid duplicating documents;
2) Set the `creationTime` field of each document in the index;
3) Set the `values`, `kinds` and `identifiers` fields for all documents;
4) Add a `Copy` method to the `Document` struct: this fixes the issue
where all the documents in the index point to the same `Document`
object;
5) Avoid using the Redis keystore;
6) Set imagePullPolicy to `Always` for crawler jobs.
2019-12-11 11:10:48 -08:00
Haiyan Meng
9bba761a14 Add config for creating an ElasticSearch Cluster 2019-11-26 19:38:17 -08:00
Haiyan Meng
84b75afae4 Make the crawler work
1) add the crawler binary and fix the crawler library
2) remove the readiness probe in the search backend
3) add config for redis keystore
4) add github_api_secret.txt file with instructions
2019-11-26 09:50:51 -08:00
Haiyan Meng
f69d2d2e69 Move hack/crawl under api/internal 2019-11-14 13:17:28 -08:00