Commit Graph

214 Commits

Author SHA1 Message Date
Haiyan Meng
3ebeebabde Add comments for backup and restore 2020-02-03 12:37:18 -08:00
Haiyan Meng
a3b3449b1f Add curl commands for generator/transformer exploration 2020-02-03 09:59:52 -08:00
Haiyan Meng
1b8488da2c Add curl commands for snapshotting 2020-02-03 09:59:52 -08:00
Haiyan Meng
f5419e9f72 Check the incomplete_results field of github query responses
Currently, we don't check the `incomplete_results` field of a github
query response. This is a problem when incomplete query results are
used to split the query ranges: the resulting ranges can be
wildly inaccurate.
2020-02-03 09:59:52 -08:00
Haiyan Meng
7a87c84403 Reprocess the github filesize search ranges which have more than 1000 items 2020-02-03 09:59:52 -08:00
Haiyan Meng
0fcb3a014c Add config for index backup and restore 2020-02-03 09:59:52 -08:00
Haiyan Meng
0b38e6d284 Improve the analysis on generator and transformer 2020-02-03 09:59:52 -08:00
Haiyan Meng
d5c66cb3d4 Add KustomizationDocument.Copy method 2020-02-03 09:59:52 -08:00
Haiyan Meng
b35b5aa73d Check the checksums of documents in the index 2020-02-03 09:59:52 -08:00
Haiyan Meng
bb409a5ea8 Set up cronjob to run crawler every 7 days 2020-02-03 09:59:52 -08:00
Haiyan Meng
74e1b5d54b Add GCP service account into ESCluster config
This is necessary for index backup into GCS and index recovery from GCS
2020-02-03 09:59:52 -08:00
Jeff Regan
0ce076758d Merge pull request #2150 from haiyanmeng/stats
Add `fileType` and `User` into the index
2020-01-28 09:18:31 -08:00
Haiyan Meng
154208d331 Improve the efficiency of crawling github by skipping the documents already in the index
2020-01-24 19:55:56 -08:00
Haiyan Meng
b7b88cae76 Add curl commands for querying different filetypes 2020-01-23 16:04:55 -08:00
HowJMay
00f68c12a8 fix typos
Fix typos
2020-01-23 23:35:38 +08:00
Haiyan Meng
0820865e1d Retry FindRangesForRepoSearch 2020-01-22 10:13:57 -08:00
Haiyan Meng
1120c6bc7a Add a User field into Document to make it easy to aggregate at the github user level
2020-01-21 10:09:52 -08:00
Phani Teja Marupaka
0bd872e6d5 Do not remove empty lines in configmap/secret 2020-01-20 11:42:39 -08:00
Haiyan Meng
96ee9e9146 Add curl ElasticSearch cmd for using filter and range together 2020-01-17 15:49:14 -08:00
Haiyan Meng
377eb5b66d Fix the regexp for determining kustomization file 2020-01-17 15:48:38 -08:00
Haiyan Meng
f4636f8555 Add a fileType field into the index 2020-01-17 13:15:49 -08:00
Haiyan Meng
9f80da28ae Refactor the stats code for generators and transformers 2020-01-16 09:20:24 -08:00
Haiyan Meng
5477bde7e5 Use an env variable for index name and fix the call to NewKustomizeIndex in backend 2020-01-15 15:29:17 -08:00
Haiyan Meng
3ead42fe27 Add --index flag to kustomize_stats config file 2020-01-15 15:29:16 -08:00
Haiyan Meng
cf8d53a195 Move SeenMap to the utils dir 2020-01-15 15:29:16 -08:00
Haiyan Meng
aaaba99389 Use Document.Path instead of its fields 2020-01-15 12:10:08 -08:00
Haiyan Meng
29e50ab476 Collect stats on generators and transformers 2020-01-15 12:10:08 -08:00
Haiyan Meng
3519cc56a1 Add support to get files referred to in the generators and transformers fields
2020-01-15 12:10:08 -08:00
Haiyan Meng
2e895c147e Use log.Print* instead of fmt.Print* 2020-01-14 15:50:35 -08:00
Haiyan Meng
af131c7471 Use flags to specify crawling mode and github user/repo info 2020-01-14 15:36:12 -08:00
Haiyan Meng
7ac573ae51 Add a flag to specify the index name 2020-01-14 14:25:29 -08:00
Haiyan Meng
bb09f82f3c Remove kustomize-index-name setting 2020-01-14 13:53:16 -08:00
Haiyan Meng
72eda992bd make seen a non-primitive type 2020-01-14 12:14:00 -08:00
Haiyan Meng
230e0ca752 Add two methods to type RangeQueryResult: Add and String 2020-01-14 12:14:00 -08:00
Haiyan Meng
14eb524b9e Add a command for searching for kustomize resource files 2020-01-14 12:14:00 -08:00
Haiyan Meng
81d62f90bf Improve the efficiency of crawling github
Make sure a github file is crawled once
2020-01-14 12:14:00 -08:00
Kubernetes Prow Robot
1a330f89d9 Merge pull request #2080 from yujunz/git-cloner
Simplify git cloner logic
2020-01-13 15:23:11 -08:00
Haiyan Meng
569fafba81 Add the Document ID pointing to a kustomization root to the cache to avoid crawling it repeatedly
2020-01-11 15:32:25 -08:00
Yujun Zhang
ae458d0c80 Simplify git cloner logic
Related to #2072
2020-01-11 20:40:55 +08:00
Haiyan Meng
c801958d40 Log response status code to help debug
Recently, the crawler job often fails after 10+ hours with the following
error (10.0.47.27:9200 is the ElasticSearch master):
dial tcp 10.0.47.27:9200: connect: connection refused
2020-01-10 11:37:22 -08:00
Haiyan Meng
f9a4d5a14e Track the crawling process 2020-01-10 11:10:38 -08:00
Jeff Regan
9555095de9 Merge pull request #2016 from haiyanmeng/stats
Add a binary for generating the stats of the index
2020-01-09 13:11:50 -08:00
Jeff Regan
a46046dac5 Merge pull request #2051 from haiyanmeng/nil
Two fixes of the crawler
2020-01-08 18:39:26 -08:00
Jeff Regan
6186e4edb7 Merge pull request #2017 from haiyanmeng/search
Add ElasticSearch query examples
2020-01-08 11:19:32 -08:00
Haiyan Meng
b154af8be4 Check the error of closing response body 2020-01-08 10:32:12 -08:00
Haiyan Meng
ccd129f7a5 Check empty http response before accessing it 2020-01-08 10:24:00 -08:00
Haiyan Meng
e2b56910f9 Add ElasticSearch query examples 2020-01-08 09:23:19 -08:00
Jeff Regan
32c280664d Merge pull request #2025 from phanimarupaka/ConfigMapSpacesAndTabs
Trim trailing spaces and tabs from config map files
2020-01-07 15:53:31 -08:00
Haiyan Meng
594a3bf0d2 Add a binary for generating the stats of the index
1) how many kinds of objects are being customized?
2) how many times is every kind of object customized?
3) how many kustomization features are being used?
4) how many times is every kustomization feature used?
2020-01-07 15:10:25 -08:00
Jeff Regan
7190ea2688 Merge pull request #2038 from haiyanmeng/log-parser
Add a binary to parse GKE log
2020-01-07 14:57:40 -08:00