186 Commits

Author SHA1 Message Date
Haiyan Meng
2e895c147e Use log.Print* instead of fmt.Print* 2020-01-14 15:50:35 -08:00
Haiyan Meng
af131c7471 Use flags to specify crawling mode and github user/repo info 2020-01-14 15:36:12 -08:00
Haiyan Meng
7ac573ae51 Add a flag to specify the index name 2020-01-14 14:25:29 -08:00
Haiyan Meng
bb09f82f3c Remove kustomize-index-name setting 2020-01-14 13:53:16 -08:00
Haiyan Meng
72eda992bd make seen a non-primitive type 2020-01-14 12:14:00 -08:00
Haiyan Meng
230e0ca752 Add two methods to type RangeQueryResult: Add and String 2020-01-14 12:14:00 -08:00
Haiyan Meng
14eb524b9e Add a command for searching for kustomize resource files 2020-01-14 12:14:00 -08:00
Haiyan Meng
81d62f90bf Improve the efficiency of crawling GitHub
Make sure a GitHub file is crawled only once
2020-01-14 12:14:00 -08:00
Kubernetes Prow Robot
1a330f89d9 Merge pull request #2080 from yujunz/git-cloner
Simplify git cloner logic
2020-01-13 15:23:11 -08:00
Haiyan Meng
569fafba81 Add the Document ID pointing to a kustomization root into the cache to
avoid crawling it repeatedly
2020-01-11 15:32:25 -08:00
Yujun Zhang
ae458d0c80 Simplify git cloner logic
Related to #2072
2020-01-11 20:40:55 +08:00
Haiyan Meng
c801958d40 Log response status code to help debug
Recently, the crawler job often fails after 10+ hours with the following
error (10.0.47.27:9200 is the ElasticSearch master):
dial tcp 10.0.47.27:9200: connect: connection refused
2020-01-10 11:37:22 -08:00
Haiyan Meng
f9a4d5a14e Track the crawling process 2020-01-10 11:10:38 -08:00
Jeff Regan
9555095de9 Merge pull request #2016 from haiyanmeng/stats
Add a binary for generating the stats of the index
2020-01-09 13:11:50 -08:00
Jeff Regan
a46046dac5 Merge pull request #2051 from haiyanmeng/nil
Two fixes of the crawler
2020-01-08 18:39:26 -08:00
Jeff Regan
6186e4edb7 Merge pull request #2017 from haiyanmeng/search
Add ElasticSearch query examples
2020-01-08 11:19:32 -08:00
Haiyan Meng
b154af8be4 Check the error of closing response body 2020-01-08 10:32:12 -08:00
Haiyan Meng
ccd129f7a5 Check empty http response before accessing it 2020-01-08 10:24:00 -08:00
Haiyan Meng
e2b56910f9 Add ElasticSearch query examples 2020-01-08 09:23:19 -08:00
Jeff Regan
32c280664d Merge pull request #2025 from phanimarupaka/ConfigMapSpacesAndTabs
Trim trailing spaces and tabs from config map files
2020-01-07 15:53:31 -08:00
Haiyan Meng
594a3bf0d2 Add a binary for generating the stats of the index
1) how many kinds of objects are being customized?
2) how many times is every kind of object customized?
3) how many kustomization features are being used?
4) how many times is every kustomization feature used?
2020-01-07 15:10:25 -08:00
Jeff Regan
7190ea2688 Merge pull request #2038 from haiyanmeng/log-parser
Add a binary to parse GKE log
2020-01-07 14:57:40 -08:00
Jeff Regan
6bdb4fe2a6 Update main.go 2020-01-07 14:52:20 -08:00
Jeff Regan
bbceb49fc4 Merge pull request #2012 from julienp/master
Show namespace resource on id conflict
2020-01-07 11:41:01 -08:00
Haiyan Meng
950660ff63 Add a binary to parse GKE log 2020-01-07 10:31:10 -08:00
Kubernetes Prow Robot
f749a4a194 Merge pull request #2036 from pwittrock/fix-go-mod
Switch to api version 0.3.1
2020-01-07 10:08:18 -08:00
Phillip Wittrock
b1f514632a Switch to api version 0.3.1 2020-01-07 08:54:05 -08:00
Haiyan Meng
745b58b3d0 Check whether a pointer is empty before accessing it to avoid SIGSEGV 2020-01-06 12:06:18 -08:00
Haiyan Meng
142c105500 Skip empty resource/base items in a kustomization file and set the
defaultBranch if needed
2020-01-06 12:06:18 -08:00
Haiyan Meng
5f8a8b545b Add "kustomization" into the kustomization filenames used by the crawler 2020-01-06 12:06:18 -08:00
Haiyan Meng
ee659a70e4 Fix how URLs are constructed for finding all the commits related to a
GitHub file

The existing logic sets the creation time of a GitHub file to the time
when the GitHub repository was created.
The fix sets the creation time of a GitHub file to the time when the
file itself was created.
2020-01-06 12:06:18 -08:00
Phani Teja Marupaka
011804e14d Make suggested changes 2020-01-02 13:06:14 -08:00
Phani Teja Marupaka
fa8f504ff4 Trim trailing spaces and tabs from config map files 2020-01-02 10:28:03 -08:00
Julien Poissonnier
0988f74d39 Show namespace resource on id conflict 2019-12-27 16:00:14 +01:00
Haiyan Meng
be2e03681d Remove unused param from IndexFunc 2019-12-18 15:56:44 -08:00
Haiyan Meng
127541f610 Support different modes of running the crawler 2019-12-18 15:56:44 -08:00
Haiyan Meng
f5ff254203 Update deps 2019-12-18 15:56:44 -08:00
Haiyan Meng
a35f002139 Run goimports 2019-12-18 15:56:44 -08:00
Haiyan Meng
bef157d6b3 Fix insert/updating document logic 2019-12-18 15:56:44 -08:00
Haiyan Meng
2c2aa928cc Delete non-existing documents from the index 2019-12-18 15:56:44 -08:00
Haiyan Meng
1eb713157c Sort the string slice fields of a document to avoid updating the index
unnecessarily
2019-12-18 15:56:44 -08:00
Haiyan Meng
272b7a6fcd Use UpdateRequest to insert/update a document
Currently, `IndexRequest` is used to insert/update a document, which
increases the version of the document every time IndexRequest.Do is
called.
2019-12-18 15:56:44 -08:00
Haiyan Meng
5598d35e4b Add a summary for doCrawl 2019-12-18 15:56:44 -08:00
Haiyan Meng
8c89f0946c Avoid indexing a document if FetchDocument or SetCreated fails 2019-12-18 15:56:44 -08:00
Haiyan Meng
12fc8f41c7 Add support for github paths starting with "git@github.com:" 2019-12-18 15:56:44 -08:00
Haiyan Meng
e44d1298df Return errors if http Client.Do resp status code is not 2xx 2019-12-18 15:56:44 -08:00
Ilia Zlobin
cc8b100331 Handle variables in annotations 2019-12-18 17:04:43 +03:00
Phillip Wittrock
de824c2e4d Drop mdrip dependency from api/ because it has conflicting deps with kubectl 2019-12-17 13:48:29 -08:00
Jeffrey Regan
90597d56c9 Update go.sums 2019-12-16 11:37:35 -08:00
Jeffrey Regan
7e205b46b8 Update serialize-javascript 2019-12-13 14:45:43 -08:00