Yujun Zhang
ff6250cdb4
Allow loading file from http
2020-02-29 16:19:21 +08:00
Haiyan Meng
b7b7a5a79f
Fix typo
2020-02-10 15:44:51 -08:00
Haiyan Meng
807ca9c1e3
Add notes on backup and restore
2020-02-10 08:30:08 -08:00
Haiyan Meng
baccf58ccf
Avoid tracking the change in github_api_secret.txt
This helps prevent committing your GitHub personal access token to
GitHub by accident.
2020-02-05 12:06:21 -08:00
Haiyan Meng
c7bdb3fbe4
Add cmds to process the kustomize-stats log
2020-02-05 11:04:59 -08:00
Haiyan Meng
967fe44e3f
Add curl commands for kustomize stats
2020-02-05 11:04:59 -08:00
Haiyan Meng
d0602c732b
Remove the usage of github access token from the kustomize-stats job
2020-02-05 11:04:59 -08:00
Haiyan Meng
a4179fa87f
Use the silent mode of curl
2020-02-05 11:04:59 -08:00
Haiyan Meng
c9bce3fc0a
Add comments on backup and restore
2020-02-05 11:04:59 -08:00
Haiyan Meng
3ebeebabde
Add comments for backup and restore
2020-02-03 12:37:18 -08:00
Haiyan Meng
a3b3449b1f
Add curl commands for generator/transformer exploration
2020-02-03 09:59:52 -08:00
Haiyan Meng
1b8488da2c
Add curl commands for snapshotting
2020-02-03 09:59:52 -08:00
Haiyan Meng
f5419e9f72
Check the incomplete_results field of github query responses
Currently, we don't check the `incomplete_results` field of a GitHub
query response, which is problematic when incomplete query results are
used to split the query ranges: the resulting ranges can be wildly off.
2020-02-03 09:59:52 -08:00
Haiyan Meng
7a87c84403
Reprocess the github filesize search ranges which have more than 1000 items
2020-02-03 09:59:52 -08:00
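The GitHub search API returns at most 1000 results for a single query, so any file-size range whose count exceeds that must be split before all of its files can be listed. Below is a sketch of one way to do it: bisect the `size:lo..hi` range until every piece fits. `sizeRange`, `splitRanges`, and the stub counter are hypothetical, and the crawler's actual splitting strategy may differ.

```go
package main

import "fmt"

// maxResults is the GitHub search API cap on results per query.
const maxResults = 1000

// sizeRange is an inclusive file-size range in bytes, as used by the
// "size:lo..hi" search qualifier. (Hypothetical type.)
type sizeRange struct{ lo, hi int }

// splitRanges bisects r until every sub-range reports at most maxResults
// items according to count. A single-size range is returned as-is even if
// it still exceeds the cap, because it cannot be split further.
func splitRanges(r sizeRange, count func(sizeRange) int) []sizeRange {
	if r.lo >= r.hi || count(r) <= maxResults {
		return []sizeRange{r}
	}
	mid := r.lo + (r.hi-r.lo)/2
	left := splitRanges(sizeRange{r.lo, mid}, count)
	right := splitRanges(sizeRange{mid + 1, r.hi}, count)
	return append(left, right...)
}

func main() {
	// Hypothetical stand-in for a GitHub count query:
	// pretend every byte size holds 3 files.
	fake := func(r sizeRange) int { return 3 * (r.hi - r.lo + 1) }
	for _, r := range splitRanges(sizeRange{0, 999}, fake) {
		fmt.Printf("size:%d..%d -> %d\n", r.lo, r.hi, fake(r))
	}
}
```

Bisection keeps the number of count queries logarithmic in the range width. In practice each `count` call would be a real search request, and its `incomplete_results` flag should be checked before the count is trusted.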
Haiyan Meng
0fcb3a014c
Add config for index backup and restore
2020-02-03 09:59:52 -08:00
Haiyan Meng
0b38e6d284
Improve the analysis on generator and transformer
2020-02-03 09:59:52 -08:00
Haiyan Meng
d5c66cb3d4
Add KustomizationDocument.Copy method
2020-02-03 09:59:52 -08:00
Haiyan Meng
b35b5aa73d
Check the checksums of documents in the index
2020-02-03 09:59:52 -08:00
Haiyan Meng
bb409a5ea8
Set up cronjob to run crawler every 7 days
2020-02-03 09:59:52 -08:00
Haiyan Meng
74e1b5d54b
Add GCP service account into ESCluster config
This is necessary for index backup into GCS and index recovery from GCS
2020-02-03 09:59:52 -08:00
Jeff Regan
0ce076758d
Merge pull request #2150 from haiyanmeng/stats
Add `fileType` and `User` into the index
2020-01-28 09:18:31 -08:00
Haiyan Meng
154208d331
Improve the efficiency of crawling github by skipping the documents already in the index
2020-01-24 19:55:56 -08:00
Haiyan Meng
b7b88cae76
Add curl commands for querying different filetypes
2020-01-23 16:04:55 -08:00
HowJMay
00f68c12a8
fix typos
2020-01-23 23:35:38 +08:00
Haiyan Meng
0820865e1d
Retry FindRangesForRepoSearch
2020-01-22 10:13:57 -08:00
Haiyan Meng
1120c6bc7a
Add a User field into Document to make it easy to aggregate at the GitHub user level
2020-01-21 10:09:52 -08:00
Haiyan Meng
96ee9e9146
Add curl ElasticSearch cmd for using filter and range together
2020-01-17 15:49:14 -08:00
Haiyan Meng
377eb5b66d
Fix the regexp for determining kustomization file
2020-01-17 15:48:38 -08:00
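The commit does not show the corrected regexp. As a sketch under that caveat, the pattern below accepts exactly the three file names kustomize recognizes as a kustomization (`kustomization.yaml`, `kustomization.yml`, `Kustomization`) when they appear as the base name of a path. The variable name `kustomizationFile` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// kustomizationFile matches a path whose base name is kustomization.yaml,
// kustomization.yml, or Kustomization. The (^|/) prefix rejects names such
// as "my-kustomization.yaml", and the $ anchor rejects suffixes like ".bak".
// (Hypothetical variable; not the crawler's actual regexp.)
var kustomizationFile = regexp.MustCompile(`(^|/)(kustomization\.ya?ml|Kustomization)$`)

func main() {
	for _, p := range []string{
		"kustomization.yaml",
		"overlays/prod/kustomization.yml",
		"base/Kustomization",
		"my-kustomization.yaml",
		"kustomization.yaml.bak",
	} {
		fmt.Println(p, kustomizationFile.MatchString(p))
	}
}
```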
Haiyan Meng
f4636f8555
Add a fileType field into the index
2020-01-17 13:15:49 -08:00
Haiyan Meng
9f80da28ae
Refactor the stats code for generators and transformers
2020-01-16 09:20:24 -08:00
Haiyan Meng
5477bde7e5
Use an env variable for index name and fix the call to NewKustomizeIndex in backend
2020-01-15 15:29:17 -08:00
Haiyan Meng
3ead42fe27
Add --index flag to kustomize_stats config file
2020-01-15 15:29:16 -08:00
Haiyan Meng
cf8d53a195
Move SeenMap to the utils dir
2020-01-15 15:29:16 -08:00
Haiyan Meng
aaaba99389
Use Document.Path instead of its fields
2020-01-15 12:10:08 -08:00
Haiyan Meng
29e50ab476
Collect stats on generators and transformers
2020-01-15 12:10:08 -08:00
Haiyan Meng
3519cc56a1
Add support to get files referred to in the generators and transformers fields
2020-01-15 12:10:08 -08:00
Haiyan Meng
2e895c147e
Use log.Print* instead of fmt.Print*
2020-01-14 15:50:35 -08:00
Haiyan Meng
af131c7471
Use flags to specify crawling mode and github user/repo info
2020-01-14 15:36:12 -08:00
Haiyan Meng
7ac573ae51
Add a flag to specify the index name
2020-01-14 14:25:29 -08:00
Haiyan Meng
bb09f82f3c
Remove kustomize-index-name setting
2020-01-14 13:53:16 -08:00
Haiyan Meng
72eda992bd
make seen a non-primitive type
2020-01-14 12:14:00 -08:00
Haiyan Meng
230e0ca752
Add two methods to type RangeQueryResult: Add and String
2020-01-14 12:14:00 -08:00
Haiyan Meng
14eb524b9e
Add a command for searching for kustomize resource files
2020-01-14 12:14:00 -08:00
Haiyan Meng
81d62f90bf
Improve the efficiency of crawling github
Make sure a github file is crawled once
2020-01-14 12:14:00 -08:00
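A sketch of a "seen" set that ensures each file is crawled only once, in the spirit of the `SeenMap` type named in the commits above. The method names `Seen` and `Add` are hypothetical; the crawler's real type may be keyed and structured differently.

```go
package main

import "fmt"

// SeenMap records which document IDs have already been crawled.
// map[string]struct{} is used because the empty struct takes no space,
// so only the keys matter. (Hypothetical methods.)
type SeenMap map[string]struct{}

// Seen reports whether id has already been recorded.
func (s SeenMap) Seen(id string) bool {
	_, ok := s[id]
	return ok
}

// Add records id as crawled.
func (s SeenMap) Add(id string) { s[id] = struct{}{} }

func main() {
	seen := make(SeenMap)
	files := []string{"a/kustomization.yaml", "b/kustomization.yaml", "a/kustomization.yaml"}
	for _, f := range files {
		if seen.Seen(f) {
			fmt.Println("skip", f)
			continue
		}
		seen.Add(f)
		fmt.Println("crawl", f)
	}
}
```

Wrapping the map in a named type, rather than passing a bare `map[string]bool` around, gives the set a home for its methods and makes it easy to move into a shared utils package.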
Kubernetes Prow Robot
1a330f89d9
Merge pull request #2080 from yujunz/git-cloner
Simplify git cloner logic
2020-01-13 15:23:11 -08:00
Haiyan Meng
569fafba81
Add the Document ID pointing to a kustomization root into cache to avoid crawling it repeatedly
2020-01-11 15:32:25 -08:00
Yujun Zhang
ae458d0c80
Simplify git cloner logic
Related to #2072
2020-01-11 20:40:55 +08:00
Haiyan Meng
c801958d40
Log response status code to help debug
Recently, the crawler job often fails after 10+ hours with the following
error (10.0.47.27:9200 is the ElasticSearch master):
dial tcp 10.0.47.27:9200: connect: connection refused
2020-01-10 11:37:22 -08:00
Haiyan Meng
f9a4d5a14e
Track the crawling process
2020-01-10 11:10:38 -08:00
Jeff Regan
9555095de9
Merge pull request #2016 from haiyanmeng/stats
Add a binary for generating the stats of the index
2020-01-09 13:11:50 -08:00