Haiyan Meng
3ebeebabde
Add comments for backup and restore
2020-02-03 12:37:18 -08:00
Haiyan Meng
a3b3449b1f
Add curl commands for generator/transformer exploration
2020-02-03 09:59:52 -08:00
Haiyan Meng
1b8488da2c
Add curl commands for snapshotting
2020-02-03 09:59:52 -08:00
Haiyan Meng
f5419e9f72
Check the incomplete_results field of github query responses
Currently, we don't check the `incomplete_results` field of a github
query response. This is a problem when incomplete query results are
used to split the query ranges: the resulting split ranges can be
wildly off.
2020-02-03 09:59:52 -08:00
Haiyan Meng
7a87c84403
Reprocess the github filesize search ranges which have more than 1000 items
2020-02-03 09:59:52 -08:00
Haiyan Meng
0fcb3a014c
Add config for index backup and restore
2020-02-03 09:59:52 -08:00
Haiyan Meng
0b38e6d284
Improve the analysis on generator and transformer
2020-02-03 09:59:52 -08:00
Haiyan Meng
d5c66cb3d4
Add KustomizationDocument.Copy method
2020-02-03 09:59:52 -08:00
Haiyan Meng
b35b5aa73d
Check the checksums of documents in the index
2020-02-03 09:59:52 -08:00
Haiyan Meng
bb409a5ea8
Set up cronjob to run crawler every 7 days
2020-02-03 09:59:52 -08:00
Haiyan Meng
74e1b5d54b
Add GCP service account into ESCluster config
This is necessary for index backup into GCS and index recovery from GCS
2020-02-03 09:59:52 -08:00
Jeff Regan
0ce076758d
Merge pull request #2150 from haiyanmeng/stats
Add `fileType` and `User` into the index
2020-01-28 09:18:31 -08:00
Haiyan Meng
154208d331
Improve the efficiency of crawling github by skipping the documents already in the index
2020-01-24 19:55:56 -08:00
Haiyan Meng
b7b88cae76
Add curl commands for querying different filetypes
2020-01-23 16:04:55 -08:00
HowJMay
00f68c12a8
fix typos
2020-01-23 23:35:38 +08:00
Haiyan Meng
0820865e1d
Retry FindRangesForRepoSearch
2020-01-22 10:13:57 -08:00
Haiyan Meng
1120c6bc7a
Add a User field into Document to make it easy to aggregate at the github user level
2020-01-21 10:09:52 -08:00
Phani Teja Marupaka
0bd872e6d5
Do not remove empty lines in configmap/secret
2020-01-20 11:42:39 -08:00
Haiyan Meng
96ee9e9146
Add curl ElasticSearch command for using filter and range together
2020-01-17 15:49:14 -08:00
Haiyan Meng
377eb5b66d
Fix the regexp for determining kustomization file
2020-01-17 15:48:38 -08:00
Haiyan Meng
f4636f8555
Add a fileType field into the index
2020-01-17 13:15:49 -08:00
Haiyan Meng
9f80da28ae
Refactor the stats code for generators and transformers
2020-01-16 09:20:24 -08:00
Haiyan Meng
5477bde7e5
Use an env variable for index name and fix the call to NewKustomizeIndex in backend
2020-01-15 15:29:17 -08:00
Haiyan Meng
3ead42fe27
Add --index flag to kustomize_stats config file
2020-01-15 15:29:16 -08:00
Haiyan Meng
cf8d53a195
Move SeenMap to the utils dir
2020-01-15 15:29:16 -08:00
Haiyan Meng
aaaba99389
Use Document.Path instead of its fields
2020-01-15 12:10:08 -08:00
Haiyan Meng
29e50ab476
Collect stats on generators and transformers
2020-01-15 12:10:08 -08:00
Haiyan Meng
3519cc56a1
Add support for getting files referred to in the generators and transformers fields
2020-01-15 12:10:08 -08:00
Haiyan Meng
2e895c147e
Use log.Print* instead of fmt.Print*
2020-01-14 15:50:35 -08:00
Haiyan Meng
af131c7471
Use flags to specify crawling mode and github user/repo info
2020-01-14 15:36:12 -08:00
Haiyan Meng
7ac573ae51
Add a flag to specify the index name
2020-01-14 14:25:29 -08:00
Haiyan Meng
bb09f82f3c
Remove kustomize-index-name setting
2020-01-14 13:53:16 -08:00
Haiyan Meng
72eda992bd
Make seen a non-primitive type
2020-01-14 12:14:00 -08:00
Haiyan Meng
230e0ca752
Add two methods to type RangeQueryResult: Add and String
2020-01-14 12:14:00 -08:00
Haiyan Meng
14eb524b9e
Add a command for searching for kustomize resource files
2020-01-14 12:14:00 -08:00
Haiyan Meng
81d62f90bf
Improve the efficiency of crawling github
Make sure a github file is crawled once
2020-01-14 12:14:00 -08:00
Kubernetes Prow Robot
1a330f89d9
Merge pull request #2080 from yujunz/git-cloner
Simplify git cloner logic
2020-01-13 15:23:11 -08:00
Haiyan Meng
569fafba81
Add the Document ID pointing to a kustomization root into the cache to avoid crawling it repeatedly
2020-01-11 15:32:25 -08:00
Yujun Zhang
ae458d0c80
Simplify git cloner logic
Related to #2072
2020-01-11 20:40:55 +08:00
Haiyan Meng
c801958d40
Log response status code to help debug
Recently, the crawler job often fails after 10+ hours with the following
error (10.0.47.27:9200 is the ElasticSearch master):
dial tcp 10.0.47.27:9200: connect: connection refused
2020-01-10 11:37:22 -08:00
Haiyan Meng
f9a4d5a14e
Track the crawling process
2020-01-10 11:10:38 -08:00
Jeff Regan
9555095de9
Merge pull request #2016 from haiyanmeng/stats
Add a binary for generating the stats of the index
2020-01-09 13:11:50 -08:00
Jeff Regan
a46046dac5
Merge pull request #2051 from haiyanmeng/nil
Two fixes of the crawler
2020-01-08 18:39:26 -08:00
Jeff Regan
6186e4edb7
Merge pull request #2017 from haiyanmeng/search
Add ElasticSearch query examples
2020-01-08 11:19:32 -08:00
Haiyan Meng
b154af8be4
Check the error of closing response body
2020-01-08 10:32:12 -08:00
Haiyan Meng
ccd129f7a5
Check empty http response before accessing it
2020-01-08 10:24:00 -08:00
Haiyan Meng
e2b56910f9
Add ElasticSearch query examples
2020-01-08 09:23:19 -08:00
Jeff Regan
32c280664d
Merge pull request #2025 from phanimarupaka/ConfigMapSpacesAndTabs
Trim trailing spaces and tabs from config map files
2020-01-07 15:53:31 -08:00
Haiyan Meng
594a3bf0d2
Add a binary for generating the stats of the index
1) how many kinds of objects are being customized?
2) how many times is every kind of object customized?
3) how many kustomization features are being used?
4) how many times is every kustomization feature used?
2020-01-07 15:10:25 -08:00
Jeff Regan
7190ea2688
Merge pull request #2038 from haiyanmeng/log-parser
Add a binary to parse GKE log
2020-01-07 14:57:40 -08:00