mirror of
https://github.com/kubernetes-sigs/kustomize.git
synced 2026-06-10 08:20:59 +00:00
First draft of documentation for internal/tools
428
internal/tools/README.md
Normal file
@@ -0,0 +1,428 @@
## What is this?
### In short
Be the GoDoc.org of k8s configuration files.

### More explicitly
Support k8s document indexing from open-source configurations in order to make
it easy for people to learn to use a new feature, explore k8s configs in a
central hub, and see some metrics about kustomize use.

We want to support three main classes of queries:

1. Structured document queries: how should I use the following fields?
   - Grace periods: `spec:template:spec:terminationGracePeriodSeconds`?
   - Kustomize inline patch: `patches:patch`?

2. Key value queries: how should I use this more specific case of a
   structured configuration?
   - HorizontalPodAutoscalers: `kind=HorizontalPodAutoscaler`?
   - Patches on StatefulSets: `patches:target:kind=StatefulSet`?

3. Full text search: search the comments and the document text from any
   type of k8s config file.
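
As a rough illustration of the three classes above, here is a minimal Python
sketch that sorts the tokens of a query into path, key=value, and free-text
buckets. The names `classify_token` and `classify_query`, and the rules
themselves (a `=` marks a key=value query, a `:` marks a path), are
assumptions made for this example; they are not the search server's actual
parser.

```python
from typing import Dict, List


def classify_token(token: str) -> str:
    """Guess which of the three query classes a single token belongs to."""
    if "=" in token:
        return "key_value"  # e.g. kind=HorizontalPodAutoscaler
    if ":" in token:
        return "path"       # e.g. patches:patch
    return "text"           # anything else is treated as full-text search


def classify_query(query: str) -> Dict[str, List[str]]:
    """Group the whitespace-separated tokens of a query by class."""
    groups: Dict[str, List[str]] = {"path": [], "key_value": [], "text": []}
    for token in query.split():
        groups[classify_token(token)].append(token)
    return groups
```

Under these simplified rules a single-segment path such as `resources` would
be treated as plain text, which is one reason a real parser needs more care.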

## Road map
There is a lot that can be added in order to improve the state of this
application. More details, along with general thoughts and comments, can be
found in the ROADMAP.md file in this directory. This README covers only the
parts of the project that are mostly complete and ready to iterate on.

## Running this project
Everything is configured using Kubernetes, so it should be easy for people to
spin this up on any k8s cluster. Everything should just work (TM).

The config files live in the `config` directory.

```
config
├── base
│   └── kustomization.yaml
├── crawler
│   ├── base
│   │   ├── github_api_secret.txt
│   │   └── kustomization.yaml
│   ├── cronjob
│   │   ├── cronjob.yaml
│   │   └── kustomization.yaml
│   └── job
│       ├── job.yaml
│       └── kustomization.yaml
├── elastic
│   └── ...
├── redis
│   ├── document_keystore
│   │   ├── kustomization.yaml
│   │   ├── redis.yaml
│   │   └── service.yaml
│   └── http_cache
│       ├── kustomization.yaml
│       ├── redis.yaml
│       └── service.yaml
├── webapp
│   ├── backend
│   │   ├── deployment.yaml
│   │   ├── kustomization.yaml
│   │   └── service.yaml
│   └── frontend
│       ├── deployment.yaml
│       ├── kustomization.yaml
│       └── service.yaml
└── schema_files
    └── kustomization_index
        ├── es_index_mappings.json
        └── es_index_settings.json
```

To get everything up and running you have to:

1. Get an instance of Elasticsearch working, and configure the
   `configMapGenerator` in `config/base` to point to the right endpoint(s). The
   configurations that need this value to be populated are the following:
   - `config/crawler/cronjob` to run periodic crawls.
   - `config/crawler/job` to run crawls on demand.
   - `config/webapp/backend` to run the search server.

2. Configure the Elasticsearch indices:
```
kustomize build config/schema_files/kustomization_index | kubectl apply -f -
```
This will run a `curl` command that reads JSON data from a ConfigMap and sets
up the schema. If you want to make more complex modifications to the
schema, you should refer to the Elasticsearch docs to figure out whether the
mapping can be added to the current index, or whether you will need to copy the
existing index into a different one with the appropriate mappings. Modifications
can be made by writing a simple program with the Elasticsearch Go library,
or with any HTTP request to the appropriate server endpoint from
within the cluster. Unfortunately I did not have the time to write helper
tools for this. Feel free to contact me if you need help with modifying
Elasticsearch configs; I'm by no means an expert, but I can try to help.

3. (Optional) Run the Redis HTTP cache for the crawler:
```
kubectl apply -k config/redis/http_cache
```
This will create a deployment for the cache, and a service. The crawler should
be configured to connect to the `http_cache` if it exists, but you can always
check the logs to make sure it connects, and that the identifiers in the
crawler configuration match those of the service endpoint.

Please be aware that the cache does not have a persistent volume.

4. Configure the main Redis instance:
```
kubectl apply -k config/redis/document_keystore
```
This will create a StatefulSet with a volume of 4GiB for a Redis instance.

5. Get an access token from GitHub.

To be able to kindly ask GitHub for its data on k8s config files, you'll need
to create an access\_token. From my understanding, this is the only way to do
these code search queries (without first specifying a repository).

To generate a token, go to your GitHub account's Settings > Developer
settings > Personal access tokens. It should look like this.



From here you want to generate a new token with the following
configuration:



If you have uses for any other data from this token (org data, or something
else), you can pick and choose, but be careful since it can grant this
application access to your notifications, etc. However, any such extension
is explicitly a non-goal and would not be maintained by this project.

6. Launch the crawler:
```
kustomize build config/crawler/cronjob | kubectl apply -f -
```
This will run the crawler every day, according to the cron timing rules in the
cronjob.yaml file.

Instead, to get the crawler running now, you can run:
```
kustomize build config/crawler/job | kubectl apply -f -
```
which will launch a non-periodic version of the crawler. It will take a few
minutes for the crawler to split the search, but then config files should
start to get populated within 20 minutes. The first crawl may take a while,
since it has to fetch rate-limited endpoints for each new file it
finds. Later updates should be significantly faster.

7. Launch the search backend:
```
kustomize build config/webapp/backend | kubectl apply -f -
```

8. Launch the search frontend:
```
kustomize build config/webapp/frontend | kubectl apply -f -
```

## Notes about the components

### Elasticsearch
I will add a basic working setup soon. I just did the lazy thing and used an
already packaged solution. Most clouds will provide their own Elastic
environments; however, Elasticsearch is also working on their own
implementation of a
, which might
be worth checking out. Please note that it comes with its own license
agreement.

### Redis
There are two Redis instances that are used in this application.

One of them is configured to have on-disk persistence, so make sure to have
that set up in your Kubernetes cluster. Also note that it is running on a
single master node (i.e. it does not automatically shard keys to multiple head
nodes as part of a highly available cluster). Since it's storing a sparse
graph, I can't imagine this being much of an issue, but it's probably worth
mentioning.

The other Redis instance is running as an HTTP (RFC 7234) cache for ETags from
GitHub (or any other document store from which we could crawl/index). This one
does not require full persistent storage on disk. The caching strategy is an
LRU cache, which is probably a good starting point. It might be worth
investigating other cache policies, but I think LRU will work well since
documents may or may not expire anyway, and the amount of memory allocated for
keys is fairly large, so eviction of frequently used documents seems unlikely.
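
To make the ETag idea concrete, here is a minimal Python sketch of a
conditional-request cache. Everything in it is an assumption made for
illustration: the class name `ETagCache`, the in-memory dict that stands in
for Redis, and the injected `fetch(url, headers)` callable that stands in for
a real HTTP client. It is not the crawler's actual code.

```python
from typing import Callable, Dict, Optional, Tuple

# fetch(url, headers) -> (status, etag, body); a stand-in for a real HTTP client.
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Optional[str], bytes]]


class ETagCache:
    """In-memory stand-in for the Redis-backed HTTP cache (hypothetical)."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, bytes]] = {}  # url -> (etag, body)

    def get(self, url: str) -> bytes:
        headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Ask the server to answer 304 if the document is unchanged.
            headers["If-None-Match"] = cached[0]
        status, etag, body = self._fetch(url, headers)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged: serve the cached body
        if etag is not None:
            self._entries[url] = (etag, body)
        return body
```

Since the cache only saves work and is never the source of truth, losing it
costs nothing but a round of full fetches.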

### Nginx + Angular
There is a Dockerfile included for generating the container image with Nginx
(using the default package) and adding all of the supporting compiled Angular
files. Any modifications to the code base should be compatible with this setup,
so all that's needed is to rebuild the container image, and possibly modify
the image tags in the k8s file.

### Supporting Go binaries
There are a few Go binaries that each have their own Dockerfile to build
containers in which to run them on k8s, namely the crawler and the search
service. Their configurations are not optimal (read: they need to be cleaned
up), but they are functional.

## Technical details

### Overall design and implementation

There are a few components that all run together in order to get
the overall application to work smoothly. This section provides a brief
overview of each component, and the following sections go into more detail.

The overall structure is outlined in the following figure:


#### Crawler
The leftmost component is a crawler with an HTTP cache of GitHub
queries. It does two things. First, it looks at the list of documents in
Elasticsearch and tries to update them. In doing so, it maintains a set of
newly updated files to exclude them from other parts of the crawl.

Second, to find newly added documents, the crawler crawls any new dependencies
introduced in the document updating step, and it also queries GitHub for the
most recently indexed kustomization.\* files. Each new file is processed
for efficient text queries and put into the document index. Any new dependency
will also incur more crawl operations. Finally, a graph
representation of the documents and their dependencies is built in Redis to be
used for graph algorithms such as PageRank and component analysis.

#### Data library
There are a few helper libraries for dealing with Elasticsearch, Redis and
documents. They are neither persistent nor centralized; they act as small
components that help to package common pieces of code. Eventually it may make
sense to merge all of it together and make a proper persistent model around
this while providing an external API for document insertion/deletion. But
that is definitely out of scope in terms of getting this to run. However,
there are limitations with the current model in terms of minimizing the
API surface for the different components of the application. For now this
problem is mostly mitigated by having the query server only connected to
a data node of the Elasticsearch cluster, but the problem of knowing what
is accessible and what isn't is left to the programmer instead of being
clearly and explicitly supported by the API.

#### Server
Uses the data library to communicate with the data store and answer queries.
Processes the user-entered text queries into somewhat optimized Elasticsearch
queries. Provides a few endpoints to get different metrics and to eventually
allow for registration of remote repositories.

This application is exposed through a service in order to give users access
to queries and their results.

#### Nginx + Angular
Communicates directly with the backend server to forward user queries and
their results. Presents the results on an interface. It's still pretty simple
looking but it seems usable (to me).


### Crawling GitHub
With the use of API keys, GitHub allows account owners to search for files
using their API.

The search endpoints allow for the use of metadata search
that is fairly useful/powerful. For instance, they provide a `filename:` keyword
that permits us to look for `kustomization.yaml`, `kustomization.yml`, etc.
This enables the fetching of a list of kustomization documents, from which
we can get the actual content from another endpoint
(raw.githubusercontent.com).

However, the search API is fairly limited. There is a restriction on the number
of documents that can be retrieved with this method. One possible way to
mitigate this would be to periodically query GitHub for results, sorted by the
last indexed time. This would allow you to collect most documents from this
point forwards. The downside is that it may require a large number of
requests to their API, since you cannot know when new files will be added.
Furthermore, there is a possibility that you would not be able to get all of
the files either, depending on the velocity of growth.

The approach that was taken to mitigate this is to use the `filesize:` keyword
and to shard the search space into contiguous buckets of appropriate size in
order to get all of the documents. This is fairly efficient, since you can find
a good enough way to shard the documents in
`lg(max file size) * number of documents / 1000` API queries. Moreover, since
queries are paginated with at most 100 results per query, this solution is
competitive with getting the optimal (non-contiguous) sharding of result sets.
Furthermore, filesize queries can be cached to minimize the total number of
queries made to the API in order to shard the search space. This is done by
querying for file size intervals that always start at 0 (0..X) and binary
searching over the `filesize:` space. This allows you to reuse a lot of
queries when you're looking for the next range, since the search for it is
bounded above and below by sizes that have already been queried, which leaves a
smaller number of new queries. I think this is only true because file sizes are
power-law distributed, so searches will typically require fewer queries as they
progress from left to right.
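
The sharding idea above can be sketched in Python as follows. This is a
minimal sketch under stated assumptions, not the crawler's implementation:
`count_upto(x)` is a hypothetical callable standing in for one search API
query that returns how many documents have a file size in 0..x, and
`shard_by_filesize` and its parameters are names invented for this example.
The default `limit` of 1000 mirrors the `/ 1000` in the estimate above.

```python
from typing import Callable, Dict, List, Tuple


def shard_by_filesize(
    count_upto: Callable[[int], int],
    max_size: int,
    limit: int = 1000,
) -> List[Tuple[int, int]]:
    """Split 0..max_size into contiguous (lo, hi) ranges of at most `limit` docs.

    A range can exceed `limit` only when a single file size alone does.
    """
    cache: Dict[int, int] = {}

    def cached_count(x: int) -> int:
        if x < 0:
            return 0
        if x not in cache:
            cache[x] = count_upto(x)  # one (rate-limited) API query
        return cache[x]

    shards: List[Tuple[int, int]] = []
    lo = 0
    while lo <= max_size:
        before = cached_count(lo - 1)
        # Binary search for the largest hi with at most `limit` docs in lo..hi.
        left, right = lo, max_size
        best = lo  # a single size is the smallest possible shard
        while left <= right:
            mid = (left + right) // 2
            if cached_count(mid) - before <= limit:
                best = mid
                left = mid + 1
            else:
                right = mid - 1
        shards.append((lo, best))
        lo = best + 1
    return shards
```

Because every count is cumulative from 0, a count fetched while searching for
one shard can be reused as a bound when searching for the next one.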

However, this method in no way depends on intervals of the form 0..X, as
the number of documents in the many intervals of the range search could be
added together to also make this work. This approach just seemed simpler to
implement, maintain, and debug, so it was preferred.

To get an idea of how efficient this method is: sharding the search space of
7000 documents only takes ~90 API range queries, which should only take
a few minutes, while actually fetching the documents and their relevant
metadata (creation time, etc.) will take several hours. Furthermore, this
could be made more efficient if a prior distribution is approximated.
This prior could be scaled to the number of documents that need to be fetched,
and then finding a shard that has an adequate number of results would only
take a few queries per shard. It could probably be supported in a constant
number of size queries if the size of each shard is halved, which shouldn't
have a terrible performance impact on the retrieval. However, there were
more pressing things to implement. I might revisit this later.

### Document Indexing and Processing
In order to support simple text queries, the structured documents must be
processed in some way that makes searching them easy. The current method
is to recursively traverse the map of configurations to generate each sub-path
and each key-value pair for the leaf nodes of the recursion tree.

However, note that this means that a document has to be valid YAML/JSON
in order for indexing to happen. The rest of the document is treated
as mostly text and uses default text settings from Elasticsearch.

What this means is that for the following YAML document:

```yaml
resources:
- service.yaml
- deployment.yaml

configmapGenerator:
- name: app-configuration
  files:
  - config.yaml

patchesJson6902:
- target:
    version: v1
    kind: StatefulSet
    name: ss-name
  path: ss-patch.yaml
- target:
    version: v1
    kind: Deployment
    name: dep-name
  path: dep-patch.yaml
```

the flattened structure would look like:
```json
{
  "identifiers": [
    "resources",
    "configmapGenerator",
    "configmapGenerator:name",
    "configmapGenerator:files",
    "patchesJson6902",
    "patchesJson6902:target",
    "patchesJson6902:target:version",
    "patchesJson6902:target:kind",
    "patchesJson6902:target:name",
    "patchesJson6902:path"
  ],
  "values": [
    "resources=service.yaml",
    "resources=deployment.yaml",
    "configmapGenerator:name=app-configuration",
    "configmapGenerator:files=config.yaml",
    "patchesJson6902:target:version=v1",
    "patchesJson6902:target:kind=StatefulSet",
    "patchesJson6902:target:name=ss-name",
    "patchesJson6902:path=ss-patch.yaml",
    "patchesJson6902:target:kind=Deployment",
    "patchesJson6902:target:name=dep-name",
    "patchesJson6902:path=dep-patch.yaml"
  ],
  ...
}
```

Note that repeated paths and values are deduplicated.
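
The traversal described in this section can be sketched in a few lines of
Python. This is an illustrative sketch only, assuming the document has already
been parsed into nested dicts and lists; the function name `flatten` is an
assumption of this example and not the indexer's actual code.

```python
from typing import Any, List, Tuple


def flatten(doc: Any) -> Tuple[List[str], List[str]]:
    """Return (identifiers, values) for a parsed YAML/JSON document."""
    identifiers: List[str] = []
    values: List[str] = []

    def add(items: List[str], item: str) -> None:
        if item not in items:  # keep first-seen order, drop repeats
            items.append(item)

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            for key, child in node.items():
                child_path = f"{path}:{key}" if path else str(key)
                add(identifiers, child_path)
                walk(child, child_path)
        elif isinstance(node, list):
            for child in node:
                walk(child, path)  # list indices are not part of the path
        elif path:
            add(values, f"{path}={node}")  # leaf: record a key=value pair

    walk(doc, "")
    return identifiers, values
```

Dropping list indices from the path is what lets both `target` entries of the
example share the identifier `patchesJson6902:target:kind`.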

On the search side, exact queries will be prioritized, but the document paths
and key=value pairs will also be analyzed with 3-grams to have some amount of
fuzzy search. The reason that a Levenshtein distance was not used instead is
that multiple fields are searched at the same time, which is a use case where
Elasticsearch does not support proper fuzzy searching.
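
To show why 3-grams give a form of fuzzy matching, here is a small Python
sketch that compares two strings by the overlap of their character trigrams.
It only illustrates the idea: the function names and the Jaccard measure are
assumptions of this example, and Elasticsearch's own analyzers and scoring
work differently.

```python
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Return the set of 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets (1.0 means identical sets)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

A misspelling such as `StatefullSet` still shares most of its trigrams with
`StatefulSet`, so it scores far higher than an unrelated word would.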

### Document Search
Given a text query, each token is considered separately. Each token will be fed
through a handful of analyzers on the Elasticsearch side, and will be compared
with the inverted index of each document field. Elasticsearch will then
determine the best matching documents. Token ordering is largely insignificant.
This makes sense for the structured search, but may leave room for improvement
for the text-only search within the document.

Each token _must_ be matched, so each whitespace character acts as a
conjunction of individual queries. There are also ways of telling
Elasticsearch that some things _should_ match, but I think for now it makes
more sense to leave it as is.
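
As a sketch of what "each token _must_ be matched" could look like in the
Elasticsearch query DSL, the Python snippet below builds a `bool` query with
one `must` clause per token. The field names in `SEARCH_FIELDS` and the
function name `build_query` are assumptions of this example; the real index
mapping and query builder may differ.

```python
from typing import Any, Dict, List

# Hypothetical field names; the real index mapping may differ.
SEARCH_FIELDS: List[str] = ["identifiers", "values", "document"]


def build_query(text: str) -> Dict[str, Any]:
    """Build a bool query in which every whitespace-separated token must match."""
    clauses = [
        {"multi_match": {"query": token, "fields": SEARCH_FIELDS}}
        for token in text.split()
    ]
    return {"query": {"bool": {"must": clauses}}}
```

Moving some clauses from `must` to `should` would make those tokens optional
and let them influence ranking only.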

I think this behavior is sufficient to make the search feel fairly intuitive
while providing support for fairly complex use cases.

### Metrics Computation
From each kustomization document that is indexed, we can find its
resources that are publicly available. This includes other kustomizations.
From this, we can build a directed graph of dependencies and reverse
dependencies.
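
A minimal Python sketch of that graph, using in-memory sets as a stand-in for
the Redis representation. The function name `build_graph` and the input
format (a mapping from each document to the resources it lists) are
assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_graph(
    resources: Dict[str, Iterable[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return (deps, rdeps) adjacency sets from a doc -> resources mapping."""
    deps: Dict[str, Set[str]] = defaultdict(set)
    rdeps: Dict[str, Set[str]] = defaultdict(set)
    for doc, targets in resources.items():
        for target in targets:
            deps[doc].add(target)   # doc depends on target
            rdeps[target].add(doc)  # target is depended on by doc
    return dict(deps), dict(rdeps)
```

Keeping both directions makes "what does this use?" and "who uses this?"
equally cheap to answer.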

This opens up the possibility to add a plethora of graph metrics that can
give the project maintainers feedback and insight into how people are using
their tools.

Some of these are useful for getting an idea of how large the dependency
graphs actually grow in practice, and can be used to find _popular_
kustomizations within the corpus. This lends itself to implementing PageRank
to help bubble up popular results as good search results. I unfortunately
did not have the time to implement the algorithm, but I do plan to revisit
this sometime soon to add a few good and efficient implementations of useful
graph algorithms. See ROADMAP.md for a more
complete list of features that could be added and how I think they could be
implemented.
176
internal/tools/ROADMAP.md
Normal file
@@ -0,0 +1,176 @@
# Road map and comments about this work

From working on this project, here is a collection of thoughts and suggestions
for future improvements. For any questions about this, or to request help, do
not hesitate to contact @damienr74 on GitHub; my email should be listed.

I think this project has the potential to help the K8s community promote best
practices. If this becomes popular, it could become easier to find
*subjectively good* configurations. This can act as a way to guide newcomers
to k8s config features that are easy to maintain, practical, and tested in some
real world environment. However, a lot of work remains to be done if this is
to happen. Extracting and ranking semantic-level information from the open
source configuration files is definitely not trivial, and will require a lot of
thought and consideration from the experts, and of the patterns that successful
k8s projects follow. This is outside of my scope, having little to no
experience with k8s other than working on this project; however, if you have
ideas I can probably suggest approaches in order to implement them, having
worked a lot on this project.

### Improving configuration files and container configs
I did not have a lot of time to refactor the images to use ConfigMaps for
everything. This is a good thing to improve, and should be fairly easy. Another
thing that could improve the user experience of launching this would be to make
all of the Go utilities subcommands of the same binary/container image. This
would reduce the number of things that have to be rebuilt in order to get
it running, and it would make the application (and its components) more
self-contained. (This also has some disadvantages, so I'll let someone else
decide.)

### Adding graph metrics
From the Redis graph representation, we are able to run a multitude of graph
algorithms (not all of which are implemented).

The simplest one would be to run Kruskal's algorithm to find connected
components, and to compute graph metrics on each component. Here are some of the
metrics that may be useful:

+ Average size and histograms of the sizes of each component.

+ Average size and histograms of the node with the highest in-degree (rdeps) of
  each component.

+ Average size and histograms of the number of repositories in a connected
  component.

+ Any other metric that may be helpful to measure the scale of the kustomize
  import graph.

Another cool thing that may be helpful would be to output the graph
representation of deps/rdeps. This should be fairly easy to do with graphviz/dot,
so if anyone really wants this, I (damienr74) should be able to do it. Feel free
to send me an email or to @ mention me in an issue.

Note: DFS could also be used to find connected components, but I think union
find is preferable, since the results can be stored and modified very
efficiently. The only challenging part would be to implement deleting of edges
and nodes from a component efficiently, but I know it is possible to support
these operations with a union find structure.
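
For reference, here is a minimal union find sketch in Python that groups the
nodes of an edge list into connected components. It is an illustration under
assumptions rather than project code: the names `UnionFind` and
`connected_components` are invented here, edges are treated as undirected,
and deletion of edges or nodes is not handled.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree to the larger one
        self.size[ra] += self.size[rb]


def connected_components(
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> List[Set[Hashable]]:
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    groups: Dict[Hashable, Set[Hashable]] = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), set()).add(node)
    return list(groups.values())
```

The per-component metrics listed above could then be computed by iterating
over the returned sets.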

### Implementing PageRank
The graph is set up to be able to efficiently compute PageRank, since the edge
weights are real valued and the graph representation is sparse, which means that
it will fit in the memory of a single machine and make the processing
much more efficient.

It could also be implemented as a Redis script, but I feel like there's
something fundamentally wrong with implementing PageRank in Lua. :P
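
As a starting point, PageRank by power iteration fits in a short Python
function. This is a simplified sketch and not the project's implementation
(none exists yet): it ignores edge weights, the function name and the
`damping` and `iterations` defaults are assumptions of this example, and the
input is a plain dict mapping each document to its dependencies.

```python
from typing import Dict, Hashable, Iterable


def pagerank(
    deps: Dict[Hashable, Iterable[Hashable]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[Hashable, float]:
    """Unweighted PageRank over a doc -> dependencies mapping."""
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    out = {node: list(deps.get(node, ())) for node in nodes}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            for target in out[node]:
                new_rank[target] += damping * rank[node] / len(out[node])
        rank = new_rank
    return rank
```

Edges point from a document to what it depends on, so rank flows toward
widely reused bases, which matches the idea of surfacing popular
kustomizations.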

### Implement feature tracking
Each day, when the crawler finds and indexes these structured documents,
it should insert aggregate data into a separate index. This data could look
like the following:

```json
{
  "kind": "kustomization",
  "added_identifiers": [
    {
      "identifier": "some:new:k8s:feature",
      "addedIn": [
        "docID1",
        "docID100",
        "docID45",
        ...
      ]
    },
    {
      "identifier": "another:k8s:feature",
      "documents": [
        ...
      ]
    },
    ...
  ],

  "removed_identifiers": [
    {
      "identifier": "some:deprecated:field",
      "documents": [
        ...
      ]
    }
  ]
}
```

This would make it fairly easy to get deep insight into:
- the speed at which things can effectively be deprecated.
- how many people are migrating to current best practices.
- how many documents get updated frequently/rarely.
- detailed cross sections of growth/regression over conjunctions of features.
- a world of possibilities.
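
One way such a daily record could be derived is by diffing the identifier sets
of two consecutive crawls. The Python sketch below assumes each crawl is
available as a mapping from document ID to its set of identifiers; the
function name `diff_identifiers` and the use of a `documents` key for every
entry are choices made for this example only.

```python
from typing import Dict, List, Set


def diff_identifiers(
    before: Dict[str, Set[str]],
    after: Dict[str, Set[str]],
) -> Dict[str, List[dict]]:
    """Compare two crawls, each mapping docID -> set of identifiers."""
    added: Dict[str, List[str]] = {}
    removed: Dict[str, List[str]] = {}
    for doc_id in sorted(set(before) | set(after)):
        old = before.get(doc_id, set())
        new = after.get(doc_id, set())
        for identifier in sorted(new - old):
            added.setdefault(identifier, []).append(doc_id)
        for identifier in sorted(old - new):
            removed.setdefault(identifier, []).append(doc_id)
    return {
        "added_identifiers": [
            {"identifier": k, "documents": v} for k, v in sorted(added.items())
        ],
        "removed_identifiers": [
            {"identifier": k, "documents": v} for k, v in sorted(removed.items())
        ],
    }
```

A document that appears for the first time counts all of its identifiers as
added, and a deleted document counts all of them as removed.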

This is also something that I would be interested to work on sometime soon, so
feel free to contact me (damienr74) or ask questions about this.

As needed, it could be a good idea to also aggregate past data with a larger
granularity. For instance, each month, the past 30 days can be aggregated into
roughly week-long durations, and every year these weekly aggregations can be
converted into monthly summaries, depending on how much data this ends up being
and how much you want to pay for the storage of this data.

Another cool way to compress this data would be to dynamically compress it
into a logarithmic number of buckets with decreasing granularity. But it
seems like overkill for the amount of data that we'd likely get.

### The UI probably needs a lot of work
I'm not much of a UI/UX person and have little to no experience in developing
these types of applications. If anyone with Angular experience wants to dive in
and completely restructure the app to make the UI/UX/code health better, that
would be greatly appreciated.

### Query tuning probably still has to be adjusted
I'm also not an expert in Elasticsearch. From what I could read in the docs,
I think I've made sane decisions in converting user queries into meaningful
Elasticsearch queries, but I'm sure there are a lot of improvements that remain
to be made in order to get more accurate results.


### Some other signals that indicate the presence of a good configuration file
There are lots of heuristics that could be used to achieve this. Here are a
couple in no particular order:

+ Penalize for the number of YAML `---` document splits. I'm not sure what the
  general consensus is, but I think it's better to separate them, since it
  makes git commits less noisy, it's a trivial transformation, and it makes
  config files smaller. However, I can understand the argument that it's
  somewhat practical to keep an overall view of the configurations together
  (maybe).

+ Penalize the number of unique identifiers in a structured document. I think
  this makes sense, since we don't want to have someone game the search engine
  to match documents with every possible path from the k8s docs. PageRank might
  help with this to some extent, but with a small corpus it would be fairly easy
  to game.

+ Assign weights to the usefulness of certain fields. It would be good to
  promote documents that use `configMapKeyRef`, liveness probes, etc.

These are the main ones I can think of, but I'm sure there are a *ton* of
ways to achieve this.
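
To make the heuristics above a little more tangible, here is a toy scoring
function in Python. Every number in it is made up for illustration, and the
names `quality_score`, `SPLIT_PENALTY`, `IDENTIFIER_PENALTY`, and
`FIELD_BONUS` are assumptions of this sketch; real weights would need tuning
against actual search results.

```python
from typing import Dict, Iterable

# Illustrative weights only; none of these numbers come from the project.
SPLIT_PENALTY = 0.5
IDENTIFIER_PENALTY = 0.01
FIELD_BONUS: Dict[str, float] = {"livenessProbe": 1.0}


def quality_score(text: str, identifiers: Iterable[str]) -> float:
    """Combine the three heuristics into a single (unnormalized) score."""
    unique = set(identifiers)
    # Count YAML document separators: lines that are exactly `---`.
    splits = sum(1 for line in text.splitlines() if line.rstrip() == "---")
    score = -SPLIT_PENALTY * splits - IDENTIFIER_PENALTY * len(unique)
    for identifier in unique:
        # Reward useful fields wherever they appear in a path.
        for field, bonus in FIELD_BONUS.items():
            if field in identifier.split(":"):
                score += bonus
    return score
```

Such a score would most likely be blended with the text relevance score
rather than replace it.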

If the corpus gets large enough, we might even be able to use *blockchains*,
*machine learning*, and maybe even self-driving cars.

### Add more support for indexing of other k8s/kustomize related data
One thing that jumps to mind is the use of kustomize plugins. They are easy
to track since they all have an unused global variable: `var KustomizePlugin`.
It would be easy to run the pluginator command and generate godocs for each
Go file with this unique identifier.

For the sake of completeness, here is the full GitHub query that we can use to
find these:
`api.github.com/search/code?q=var+KustomizePlugin+extension%3A.go&access_token=access_token`

Godoc will not show much, since most packages will be using package main, but
using pluginator we can make it a properly named package such that Godoc would
actually generate the relevant documentation.
BIN
internal/tools/pictures/github_token.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 53 KiB |
BIN
internal/tools/pictures/sys_arch.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 44 KiB |
BIN
internal/tools/pictures/token_config.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 32 KiB |