Refactor Performance Tests and add Grafana, Pyroscope and Prometheus monitoring stack #561

Open
rendre-greyling wants to merge 9 commits into kptdev:main from nokia:perf-test-refactor

@rendre-greyling
Contributor

Refactor Performance Tests and add Grafana, Pyroscope and Prometheus monitoring stack

Description

New Performance Tests and Suite

The previous performance tests were split between two packages: test/performance and test/e2e/performance (the Iterative test). These tests were outdated and did not collect several useful metrics. Their features have now been consolidated into a single test/performance folder.

Two new tests replace the previous tests and their functionality:

  • Load Test: Creates a preset number of repositories, packages and revisions to simulate load on the Porch system.
  • Maximum Package Revisions: Measures the maximum number of package revisions that Porch can handle in a single repository.

See test/performance/README.md for details on the data collected by the tests and the available customization arguments. The data is stored in CSV files for later analysis, and is also exposed to OTel/Prometheus if the -enable-prometheus flag is set.

New Performance Monitoring Stack

The OTel interface has been expanded to expose various new API metrics, as well as memory and CPU data. A new deploy-monitoring.sh script deploys Grafana, Prometheus and Pyroscope pods in the kind cluster for visual analysis of the data.

Grafana provides four dashboards:

  • Porch API: time taken per API call for PackageRevision (PR) and PackageRevisionResources (PRR) requests, plus requests per second per user account.
  • Porch Performance Test Dashboard: all metrics recorded by the performance tests.
  • Porch Resource Usage: memory and CPU usage of the porch-server and function-runner.
  • Pyroscope – Porch profiling: detailed CPU, memory and Go profiling data, visualised as flame graphs.

For more granular profiling data, the Pyroscope UI can also be accessed at http://localhost:4040/ after running deploy-monitoring.sh.


Type of Change

  • Bug fix
  • New feature
  • Enhancement
  • Refactor
  • Documentation
  • Tests
  • Other: ________

Checklist

  • Code follows project style guidelines
  • Self-reviewed changes
  • Tests added/updated
  • Documentation added/updated
  • All tests and gating checks pass

Testing Instructions (Optional)

  1. Uncomment the Pyroscope environment variables in 2-function-runner.yaml and 3-porch-server.yaml in deployments/porch.
  2. Set up a kind cluster.
  3. Follow the instructions in test/performance/README.md.

Additional Notes (Optional)

Gitea can now also be deployed through a Docker Hub mirror, configured with the DOCKERHUB_MIRROR environment variable.


AI Disclosure

[x] I have used AI in the creation of this PR.

  • Microsoft Copilot was used to analyse the code, create the Grafana dashboards, and implement tedious features such as the CSV generation.

@netlify

netlify Bot commented Apr 26, 2026

Deploy Preview for porch ready!

🔨 Latest commit: 80f65a5
🔍 Latest deploy log: https://app.netlify.com/projects/porch/deploys/69f355f82bd71a0008fcbb96
😎 Deploy Preview: https://deploy-preview-561--porch.netlify.app

@rendre-greyling
Contributor Author

Rebased onto the PR controller. I will add metrics to the controllers as well. Open to suggestions for further metrics to track and panels to add to the Grafana dashboards.

@sonarqubecloud

Quality Gate failed

Failed conditions
1 Security Hotspot
24.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

Comment thread internal/otel/otel.go
}

// Shutdown gracefully shuts down all OpenTelemetry resources.
func (r *OTelResources) Shutdown(ctx context.Context) error {
Contributor

Do we need a separate, explicit Shutdown in addition to the context cancellation in SetupOpenTelemetry? From what I've seen, cancelling the context already shuts down all providers cleanly.

Comment thread internal/otel/otel.go
return nil
}

func startMetricsServerIfConfigured(res *OTelResources) error {
Contributor

Why is this needed? If otelPort and the provider are configured correctly, the OTel SDK should be responsible for setting up the server.

Comment thread cmd/porch/main.go
}
}()

prof := &metrics.Profiling{}
Contributor

Could the Pyroscope and metrics initialisation be moved into the otel package? The package could then be renamed to telemetry.
