Haiyan Meng
b35b5aa73d
Check the checksums of documents in the index
2020-02-03 09:59:52 -08:00
Haiyan Meng
bb409a5ea8
Set up cronjob to run crawler every 7 days
2020-02-03 09:59:52 -08:00
Haiyan Meng
74e1b5d54b
Add GCP service account into ESCluster config
...
This is necessary for index backup into GCS and index recovery from GCS
2020-02-03 09:59:52 -08:00
Jeff Regan
0ce076758d
Merge pull request #2150 from haiyanmeng/stats
...
Add `fileType` and `User` into the index
2020-01-28 09:18:31 -08:00
Haiyan Meng
154208d331
Improve the efficiency of crawling github by skipping the documents already in the index
2020-01-24 19:55:56 -08:00
Haiyan Meng
b7b88cae76
Add curl commands for querying different filetypes
2020-01-23 16:04:55 -08:00
HowJMay
00f68c12a8
Fix typos
2020-01-23 23:35:38 +08:00
Haiyan Meng
0820865e1d
Retry FindRangesForRepoSearch
2020-01-22 10:13:57 -08:00
Haiyan Meng
1120c6bc7a
Add a User field into Document to make it easy to aggregate at the github user level
2020-01-21 10:09:52 -08:00
Phani Teja Marupaka
0bd872e6d5
Do not remove empty lines in configmap/secret
2020-01-20 11:42:39 -08:00
Haiyan Meng
96ee9e9146
Add curl ElasticSearch cmd for using filter and range together
2020-01-17 15:49:14 -08:00
Haiyan Meng
377eb5b66d
Fix the regexp for determining kustomization file
2020-01-17 15:48:38 -08:00
Haiyan Meng
f4636f8555
Add a fileType field into the index
2020-01-17 13:15:49 -08:00
Haiyan Meng
9f80da28ae
Refactor the stats code for generators and transformers
2020-01-16 09:20:24 -08:00
Haiyan Meng
5477bde7e5
Use an env variable for index name and fix the call to NewKustomizeIndex in backend
2020-01-15 15:29:17 -08:00
Haiyan Meng
3ead42fe27
Add --index flag to kustomize_stats config file
2020-01-15 15:29:16 -08:00
Haiyan Meng
cf8d53a195
Move SeenMap to the utils dir
2020-01-15 15:29:16 -08:00
Haiyan Meng
aaaba99389
Use Document.Path instead of its fields
2020-01-15 12:10:08 -08:00
Haiyan Meng
29e50ab476
Collect stats on generators and transformers
2020-01-15 12:10:08 -08:00
Haiyan Meng
3519cc56a1
Add support to get files referred to in the generators and transformers fields
2020-01-15 12:10:08 -08:00
Haiyan Meng
2e895c147e
Use log.Print* instead of fmt.Print*
2020-01-14 15:50:35 -08:00
Haiyan Meng
af131c7471
Use flags to specify crawling mode and github user/repo info
2020-01-14 15:36:12 -08:00
Haiyan Meng
7ac573ae51
Add a flag to specify the index name
2020-01-14 14:25:29 -08:00
Haiyan Meng
bb09f82f3c
Remove kustomize-index-name setting
2020-01-14 13:53:16 -08:00
Haiyan Meng
72eda992bd
Make seen a non-primitive type
2020-01-14 12:14:00 -08:00
Haiyan Meng
230e0ca752
Add two methods to type RangeQueryResult: Add and String
2020-01-14 12:14:00 -08:00
Haiyan Meng
14eb524b9e
Add a command for searching for kustomize resource files
2020-01-14 12:14:00 -08:00
Haiyan Meng
81d62f90bf
Improve the efficiency of crawling github
...
Make sure a github file is crawled once
2020-01-14 12:14:00 -08:00
Kubernetes Prow Robot
1a330f89d9
Merge pull request #2080 from yujunz/git-cloner
...
Simplify git cloner logic
2020-01-13 15:23:11 -08:00
Haiyan Meng
569fafba81
Add the Document ID pointing to a kustomization root into cache to avoid crawling it repeatedly
2020-01-11 15:32:25 -08:00
Yujun Zhang
ae458d0c80
Simplify git cloner logic
...
Related to #2072
2020-01-11 20:40:55 +08:00
Haiyan Meng
c801958d40
Log response status code to help debug
...
Recently, the crawler job often fails after 10+ hours with the following
error (10.0.47.27:9200 is the ElasticSearch master):
dial tcp 10.0.47.27:9200: connect: connection refused
2020-01-10 11:37:22 -08:00
Haiyan Meng
f9a4d5a14e
Track the crawling process
2020-01-10 11:10:38 -08:00
Jeff Regan
9555095de9
Merge pull request #2016 from haiyanmeng/stats
...
Add a binary for generating the stats of the index
2020-01-09 13:11:50 -08:00
Jeff Regan
a46046dac5
Merge pull request #2051 from haiyanmeng/nil
...
Two fixes of the crawler
2020-01-08 18:39:26 -08:00
Jeff Regan
6186e4edb7
Merge pull request #2017 from haiyanmeng/search
...
Add ElasticSearch query examples
2020-01-08 11:19:32 -08:00
Haiyan Meng
b154af8be4
Check the error of closing response body
2020-01-08 10:32:12 -08:00
Haiyan Meng
ccd129f7a5
Check empty http response before accessing it
2020-01-08 10:24:00 -08:00
Haiyan Meng
e2b56910f9
Add ElasticSearch query examples
2020-01-08 09:23:19 -08:00
Jeff Regan
32c280664d
Merge pull request #2025 from phanimarupaka/ConfigMapSpacesAndTabs
...
Trim trailing spaces and tabs from config map files
2020-01-07 15:53:31 -08:00
Haiyan Meng
594a3bf0d2
Add a binary for generating the stats of the index
...
1) how many kinds of objects are being customized?
2) how many times is every kind of object customized?
3) how many kustomization features are being used?
4) how many times is every kustomization feature used?
2020-01-07 15:10:25 -08:00
Jeff Regan
7190ea2688
Merge pull request #2038 from haiyanmeng/log-parser
...
Add a binary to parse GKE log
2020-01-07 14:57:40 -08:00
Jeff Regan
6bdb4fe2a6
Update main.go
2020-01-07 14:52:20 -08:00
Jeff Regan
bbceb49fc4
Merge pull request #2012 from julienp/master
...
Show namespace resource on id conflict
2020-01-07 11:41:01 -08:00
Haiyan Meng
950660ff63
Add a binary to parse GKE log
2020-01-07 10:31:10 -08:00
Kubernetes Prow Robot
f749a4a194
Merge pull request #2036 from pwittrock/fix-go-mod
...
Switch to api version 0.3.1
2020-01-07 10:08:18 -08:00
Phillip Wittrock
b1f514632a
Switch to api version 0.3.1
2020-01-07 08:54:05 -08:00
Haiyan Meng
745b58b3d0
Check whether a pointer is empty before accessing it to avoid SIGSEGV
2020-01-06 12:06:18 -08:00
Haiyan Meng
142c105500
Skip the empty resource/base item in a kustomization file and set the defaultBranch if needed
2020-01-06 12:06:18 -08:00
Haiyan Meng
5f8a8b545b
Add "kustomization" into the kustomization filenames used by the crawler
2020-01-06 12:06:18 -08:00