Commit Graph

206 Commits

Author SHA1 Message Date
Haiyan Meng
b35b5aa73d Check the checksums of documents in the index 2020-02-03 09:59:52 -08:00
Haiyan Meng
bb409a5ea8 Set up cronjob to run crawler every 7 days 2020-02-03 09:59:52 -08:00
Haiyan Meng
74e1b5d54b Add GCP service account into ESCluster config
This is necessary for index backup into GCS and index recovery from GCS
2020-02-03 09:59:52 -08:00
Jeff Regan
0ce076758d Merge pull request #2150 from haiyanmeng/stats
Add `fileType` and `User` into the index
2020-01-28 09:18:31 -08:00
Haiyan Meng
154208d331 Improve the efficiency of crawling github by skipping the documents
already in the index
2020-01-24 19:55:56 -08:00
Haiyan Meng
b7b88cae76 Add curl commands for querying different filetypes 2020-01-23 16:04:55 -08:00
HowJMay
00f68c12a8 fix typos
Fix typos
2020-01-23 23:35:38 +08:00
Haiyan Meng
0820865e1d Retry FindRangesForRepoSearch 2020-01-22 10:13:57 -08:00
Haiyan Meng
1120c6bc7a Add a User field into Document to make it easy to aggregate on github
user level.
2020-01-21 10:09:52 -08:00
Phani Teja Marupaka
0bd872e6d5 Do not remove empty lines in configmap/secret 2020-01-20 11:42:39 -08:00
Haiyan Meng
96ee9e9146 Add curl ElasticSearch cmd for using filter and range together 2020-01-17 15:49:14 -08:00
Haiyan Meng
377eb5b66d Fix the regexp for determining kustomization file 2020-01-17 15:48:38 -08:00
Haiyan Meng
f4636f8555 Add a fileType field into the index 2020-01-17 13:15:49 -08:00
Haiyan Meng
9f80da28ae Refactor the stats code for generators and transformers 2020-01-16 09:20:24 -08:00
Haiyan Meng
5477bde7e5 Use an env variable for index name and fix the call to NewKustomizeIndex in backend 2020-01-15 15:29:17 -08:00
Haiyan Meng
3ead42fe27 Add --index flag to kustomize_stats config file 2020-01-15 15:29:16 -08:00
Haiyan Meng
cf8d53a195 Move SeenMap to the utils dir 2020-01-15 15:29:16 -08:00
Haiyan Meng
aaaba99389 Use Document.Path instead of its fields 2020-01-15 12:10:08 -08:00
Haiyan Meng
29e50ab476 Collect stats on generators and transformers 2020-01-15 12:10:08 -08:00
Haiyan Meng
3519cc56a1 Add support to get files referred in the generators and tranformers
fields
2020-01-15 12:10:08 -08:00
Haiyan Meng
2e895c147e Use log.Print* instead of fmt.Print* 2020-01-14 15:50:35 -08:00
Haiyan Meng
af131c7471 Use flags to specify crawling mode and github user/repo info 2020-01-14 15:36:12 -08:00
Haiyan Meng
7ac573ae51 Add a flag to specify the index name 2020-01-14 14:25:29 -08:00
Haiyan Meng
bb09f82f3c Remove kustomize-index-name setting 2020-01-14 13:53:16 -08:00
Haiyan Meng
72eda992bd make seen a non-primitive type 2020-01-14 12:14:00 -08:00
Haiyan Meng
230e0ca752 Add two methods to type RangeQueryResult: Add and String 2020-01-14 12:14:00 -08:00
Haiyan Meng
14eb524b9e Add a command for searching for kustomize resource files 2020-01-14 12:14:00 -08:00
Haiyan Meng
81d62f90bf Improve the efficency of crawling github
Make sure a github file is crawled once
2020-01-14 12:14:00 -08:00
Kubernetes Prow Robot
1a330f89d9 Merge pull request #2080 from yujunz/git-cloner
Simplify git cloner logic
2020-01-13 15:23:11 -08:00
Haiyan Meng
569fafba81 Add the Document ID pointing to a kuostomization root into cache to
avoid crawl it repeatedly
2020-01-11 15:32:25 -08:00
Yujun Zhang
ae458d0c80 Simplify git cloner logic
Related to #2072
2020-01-11 20:40:55 +08:00
Haiyan Meng
c801958d40 Log response status code to help debug
Recently, the crawler job often fails after 10+ hours with the following
error (10.0.47.27:9200 is the ElasticSearch master):
dial tcp 10.0.47.27:9200: connect: connection refused
2020-01-10 11:37:22 -08:00
Haiyan Meng
f9a4d5a14e Track the crawling process 2020-01-10 11:10:38 -08:00
Jeff Regan
9555095de9 Merge pull request #2016 from haiyanmeng/stats
Add a binary for generating the stats of the index
2020-01-09 13:11:50 -08:00
Jeff Regan
a46046dac5 Merge pull request #2051 from haiyanmeng/nil
Two fixes of the crawler
2020-01-08 18:39:26 -08:00
Jeff Regan
6186e4edb7 Merge pull request #2017 from haiyanmeng/search
Add ElasticSearch query examples
2020-01-08 11:19:32 -08:00
Haiyan Meng
b154af8be4 Check the error of closing response body 2020-01-08 10:32:12 -08:00
Haiyan Meng
ccd129f7a5 Check empty http response before accessing it 2020-01-08 10:24:00 -08:00
Haiyan Meng
e2b56910f9 Add ElasticSearch query examples 2020-01-08 09:23:19 -08:00
Jeff Regan
32c280664d Merge pull request #2025 from phanimarupaka/ConfigMapSpacesAndTabs
Trim trailing spaces and tabs from config map files
2020-01-07 15:53:31 -08:00
Haiyan Meng
594a3bf0d2 Add a binary for generating the stats of the index
1) how many kinds of objects are being customized?
2) how many times is every kind of object customized?
3) how many kustomization features are being used?
4) how many times is every kustomization feature used?
2020-01-07 15:10:25 -08:00
Jeff Regan
7190ea2688 Merge pull request #2038 from haiyanmeng/log-parser
Add a binary to parse GKE log
2020-01-07 14:57:40 -08:00
Jeff Regan
6bdb4fe2a6 Update main.go 2020-01-07 14:52:20 -08:00
Jeff Regan
bbceb49fc4 Merge pull request #2012 from julienp/master
Show namespace resource on id conflict
2020-01-07 11:41:01 -08:00
Haiyan Meng
950660ff63 Add a binary to parse GKE log 2020-01-07 10:31:10 -08:00
Kubernetes Prow Robot
f749a4a194 Merge pull request #2036 from pwittrock/fix-go-mod
Switch to api version 0.3.1
2020-01-07 10:08:18 -08:00
Phillip Wittrock
b1f514632a Switch to api version 0.3.1 2020-01-07 08:54:05 -08:00
Haiyan Meng
745b58b3d0 Check whether a pointer is empty before accessing it to avoid SIGSEGV 2020-01-06 12:06:18 -08:00
Haiyan Meng
142c105500 SKip the empty resource/base item in a kustomization file and set the
defaultBranch if needed
2020-01-06 12:06:18 -08:00
Haiyan Meng
5f8a8b545b Add "kustomization" into the kustomization filenames used by the crawler 2020-01-06 12:06:18 -08:00