26 Pipeline Visualization and Monitoring

Pipelines are the heartbeat of your CI/CD workflows. A pipeline that runs successfully is invisible, but a failed pipeline needs immediate attention. GitLab provides visualization and monitoring tools for understanding pipeline status, diagnosing problems, and optimizing performance.

This chapter explains the GitLab UI for pipelines: how to read job logs, inspect artifacts, debug failures, and set up monitoring for production pipelines.

26.1 Pipeline Overview: The Dashboard

26.1.1 Pipelines List

Navigation: Project → CI/CD → Pipelines (Build → Pipelines in newer GitLab versions)

The pipelines list shows all pipelines of the project, newest first:

Pipeline  Status    Ref        Commit              Stages                Duration  Triggered
#123      ✓ passed  main       Update docs (7a8f9)  build • test • deploy  2m 34s   git push
#122      ✗ failed  feature-x  Add feature (3b4c5)  build • test          1m 15s   git push
#121      ⚙ running main       Fix bug (9d2e1)      build • test • deploy  45s      git push
#120      ⏸ manual  release-v2 Release 2.0 (6f7g8)  build • test • deploy  3m 02s   schedule

Status icons:
- ✓ passed (green): all jobs succeeded
- ✗ failed (red): at least one job failed
- ⚙ running (blue): the pipeline is currently running
- ⏸ manual (orange): waiting for a manual action
- ⊗ canceled (gray): the pipeline was canceled
- ⚠ warning (yellow): jobs with allow_failure: true failed

Other columns:
- Ref: the branch or tag that triggered the pipeline
- Stages: visual indicators for each stage
- Duration: total runtime (wall-clock time)
- Triggered: who or what triggered the pipeline

26.1.2 Pipeline Details

Click a pipeline to open its detail view:

Pipeline #123 (main)
✓ passed in 2 minutes 34 seconds

Triggered by: Andreas Mueller (@andreas)
Commit: 7a8f9b0c - Update documentation
Created: 2 hours ago

[Retry] [Cancel]

Actions:
- Retry: re-run the failed and canceled jobs of the pipeline
- Cancel: stop a running pipeline
- Download: download all artifacts as a ZIP

26.2 Pipeline Graph: Visualizing Stages and Jobs

26.2.1 Standard Graph (Stage-Based)

┌─────────────────────────────────────────────────────────┐
│                     Stage: build                         │
│  ┌────────────────────────────────────────────────────┐ │
│  │ build-app                                    ✓ 45s │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                     Stage: test                          │
│  ┌──────────────────┐  ┌──────────────────┐            │
│  │ test-unit  ✓ 15s │  │ test-e2e   ✓ 30s │            │
│  └──────────────────┘  └──────────────────┘            │
│  ┌──────────────────┐                                   │
│  │ lint       ✓ 10s │                                   │
│  └──────────────────┘                                   │
└─────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                    Stage: deploy                         │
│  ┌────────────────────────────────────────────────────┐ │
│  │ deploy-staging                           ⏸ manual │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

In this diagram:
- Parallel jobs: side by side within a stage
- Sequential stages: stacked vertically, one below the other
- Manual jobs: marked in orange

26.2.2 DAG Graph (Needs-Based)

When jobs use needs, GitLab shows a directed acyclic graph (DAG):

# .gitlab-ci.yml with needs
stages:
  - build
  - test
  - deploy

build-frontend:
  stage: build
  script: npm run build

build-backend:
  stage: build
  script: go build

test-frontend:
  stage: test
  needs: [build-frontend]
  script: npm test

test-backend:
  stage: test
  needs: [build-backend]
  script: go test

deploy:
  stage: deploy
  needs: [test-frontend, test-backend]
  script: ./deploy.sh

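DAG visualization:

build-frontend ──▶ test-frontend ──┐
                                   ├──▶ deploy
build-backend  ──▶ test-backend  ──┘

Advantages:
- Real dependencies: not artificially limited by stage boundaries
- Faster pipelines: a job starts as soon as the jobs it needs have finished
- Clarity: the graph shows which job needs which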

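Toggle: the pipeline view lets you switch between the stage-based graph and the needs-based view (labeled “Graph” and “Needs” here; the exact labels depend on the GitLab version).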
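To see why needs can shorten a pipeline, the following sketch compares finish times under stage ordering and under needs ordering for the five jobs in the config above. The durations are made-up values for illustration, and the scheduling model (enough runners, no queueing) is a simplifying assumption, not a description of GitLab's scheduler.

```python
# Hypothetical durations in seconds; only the job names come from the config above.
DURATION = {
    "build-frontend": 60,
    "build-backend": 20,
    "test-frontend": 30,
    "test-backend": 90,
    "deploy": 10,
}
STAGES = [
    ["build-frontend", "build-backend"],
    ["test-frontend", "test-backend"],
    ["deploy"],
]
NEEDS = {
    "build-frontend": [],
    "build-backend": [],
    "test-frontend": ["build-frontend"],
    "test-backend": ["build-backend"],
    "deploy": ["test-frontend", "test-backend"],
}


def stage_finish_times(stages, duration):
    """Each stage starts only after every job of the previous stage is done."""
    finish, clock = {}, 0
    for jobs in stages:
        for job in jobs:
            finish[job] = clock + duration[job]
        clock = max(finish[job] for job in jobs)
    return finish


def needs_finish_times(needs, duration):
    """Each job starts as soon as the jobs it needs are done."""
    finish = {}

    def visit(job):
        if job not in finish:
            start = max((visit(dep) for dep in needs[job]), default=0)
            finish[job] = start + duration[job]
        return finish[job]

    for job in needs:
        visit(job)
    return finish


by_stage = stage_finish_times(STAGES, DURATION)
by_needs = needs_finish_times(NEEDS, DURATION)
print(by_stage["deploy"], by_needs["deploy"])
```

With these assumed durations, deploy finishes after 160 seconds under stage ordering and after 120 seconds under needs ordering, because test-backend no longer waits for the slower build-frontend job.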

26.3 Job Details: Deep Dive

26.3.1 Job Overview

Click a job in the pipeline graph to open its detail page:

Job #456: test-unit
✓ passed in 15 seconds

Stage: test
Runner: #789 (docker-runner-01)
Tags: docker, linux

[Retry] [Download artifacts] [Browse artifacts]

Coverage: 87.5%

Tabs:
1. Job log: command output in real time
2. Job artifacts: downloadable files
3. Tests: JUnit reports (if configured)
4. Coverage: code coverage report

26.3.2 Job Log: The Most Important Troubleshooting Source

Structure of a job log:

Running with gitlab-runner 16.5.0 (abc123)
  on docker-runner-01 xyz456

Preparing the "docker" executor
  Using Docker executor with image node:18-alpine ...
  Pulling docker image node:18-alpine ...
  Using docker image sha256:def789 for node:18-alpine

Preparing environment
  Running on runner-xyz456-project-123-concurrent-0 via runner-01...

Getting source from Git repository
  Fetching changes with git depth set to 20...
  Reinitialized existing Git repository in /builds/user/project/.git/
  Checking out 7a8f9b0c as detached HEAD (ref is main)...

Downloading artifacts
  Downloading artifacts for build-job (456)...
  Downloaded artifacts successfully

Executing "step_script" stage of the job script
  $ npm ci
  added 1347 packages in 8s
  
  $ npm test
  > project@1.0.0 test
  > jest
  
  PASS  src/components/Button.test.js
  PASS  src/utils/helpers.test.js
  
  Test Suites: 2 passed, 2 total
  Tests:       15 passed, 15 total
  Snapshots:   0 total
  Time:        3.456 s
  Ran all test suites.

Saving cache
  Creating cache default-protected...
  node_modules/: found 1347 matching files
  Created cache

Uploading artifacts...
  coverage/: found 234 matching files
  Uploading artifacts to coordinator... done

Cleaning up project directory and file based variables
  Job succeeded

Sections:
1. Runner info: which runner and which executor type
2. Image pull: the Docker image is pulled (Docker executor only)
3. Git fetch: the repository is cloned or, as in this log, fetched into an existing checkout
4. Artifact download: artifacts from the jobs this job depends on
5. Script execution: before_script, script, after_script
6. Cache save: the cache is uploaded
7. Artifact upload: the job's artifacts are uploaded
8. Cleanup: the working directory is cleaned up

Expandable sections: click a section header to expand or collapse it.

Live streaming: while the job is running, the log streams in real time.

26.3.3 Reading the Job Log: Troubleshooting

Scenario 1: npm ci fails

$ npm ci
npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path /builds/user/project/package-lock.json
npm ERR! errno -2
npm ERR! enoent ENOENT: no such file or directory, open '/builds/user/project/package-lock.json'

Job failed

Diagnosis:
- package-lock.json is missing from the repository
- Fix: run npm install locally and commit the generated package-lock.json

Scenario 2: a test fails

$ npm test

FAIL  src/utils/helpers.test.js
  ● calculateTotal › should return correct sum

    expect(received).toBe(expected) // Object.is equality

    Expected: 15
    Received: 10

      12 |   it('should return correct sum', () => {
      13 |     const result = calculateTotal([5, 5]);
    > 14 |     expect(result).toBe(15);
      15 |   });

Test Suites: 1 failed, 1 passed, 2 total
Tests:       1 failed, 14 passed, 15 total

Job failed

Diagnosis:
- calculateTotal([5, 5]) returns 10, but the test expects 15
- Fix: either the test expectation is wrong (15 → 10) or the function has a bug

Scenario 3: Docker image not found

Pulling docker image registry.gitlab.com/user/project/custom-image:latest ...
ERROR: Job failed: failed to pull image "registry.gitlab.com/user/project/custom-image:latest": Error response from daemon: pull access denied for registry.gitlab.com/user/project/custom-image, repository does not exist or may require 'docker login'

Diagnosis:
- The image does not exist, or the runner lacks permission to pull it
- Fix: build and push the image, or set DOCKER_AUTH_CONFIG so the runner can authenticate against the registry

26.3.4 Collapsible Sections: Creating Your Own Sections

job:
  script:
    - echo -e "\e[0Ksection_start:$(date +%s):dependencies[collapsed=true]\r\e[0KInstalling dependencies"
    - npm ci
    - echo -e "\e[0Ksection_end:$(date +%s):dependencies\r\e[0K"
    
    - echo -e "\e[0Ksection_start:$(date +%s):tests\r\e[0KRunning tests"
    - npm test
    - echo -e "\e[0Ksection_end:$(date +%s):tests\r\e[0K"

In the log, “Installing dependencies” and “Running tests” appear as collapsible sections.
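Because each marker carries a Unix timestamp, the same markers can be used to measure how long a section took. The sketch below extracts section durations from a raw log; the sample log and its timestamps are invented for illustration, and the regular expression assumes section names made of letters, digits, underscores, dots, and hyphens.

```python
import re

# Matches section_start:<unix_ts>:<name> and section_end:<unix_ts>:<name>.
MARKER = re.compile(r"section_(start|end):(\d+):([A-Za-z0-9_.\-]+)")


def section_durations(raw_log: str) -> dict:
    """Return the seconds spent in each section, keyed by section name."""
    started, durations = {}, {}
    for kind, timestamp, name in MARKER.findall(raw_log):
        if kind == "start":
            started[name] = int(timestamp)
        elif name in started:
            durations[name] = int(timestamp) - started.pop(name)
    return durations


# Invented sample log using the marker format from the job definition above.
sample_log = (
    "\x1b[0Ksection_start:1700000000:dependencies[collapsed=true]\r\x1b[0KInstalling dependencies\n"
    "added 1347 packages in 8s\n"
    "\x1b[0Ksection_end:1700000008:dependencies\r\x1b[0K\n"
    "\x1b[0Ksection_start:1700000008:tests\r\x1b[0KRunning tests\n"
    "\x1b[0Ksection_end:1700000012:tests\r\x1b[0K\n"
)
print(section_durations(sample_log))
```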

26.4 Artifacts: Download and Browse

26.4.1 Artifacts Download

Job page: [Download artifacts] button

Pipeline page: [Download all artifacts] button (ZIP with all job artifacts)

Direct URL:

https://gitlab.com/user/project/-/jobs/456/artifacts/download

Via API:

curl --location --output artifacts.zip \
  --header "PRIVATE-TOKEN: $TOKEN" \
  "https://gitlab.com/api/v4/projects/:id/jobs/:job_id/artifacts"

26.4.2 Artifacts Browse

Job page: [Browse artifacts] button → file browser

artifacts/
├── build/
│   ├── index.html
│   ├── assets/
│   │   ├── main.js
│   │   └── style.css
│   └── images/
│       └── logo.png
├── coverage/
│   ├── index.html
│   └── lcov.info
└── test-results/
    └── junit.xml

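File view: click a file to view it directly (text files and images).

26.4.3 Artifacts in Merge Requests

When a job generates reports, they are displayed in the merge request. Jest does not write JUnit XML by default, so the example assumes that a reporter such as jest-junit is configured to produce junit.xml: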

test:
  script:
    - npm test
  artifacts:
    reports:
      junit: junit.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura.xml

In MR:

Merge Request #123
Pipeline: ✓ passed

Test summary: 15 passed, 0 failed
Coverage: 87.5% (+2.3%)

[View full report]

Click “View full report” for the detailed test and coverage view.
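The test summary is derived from the JUnit XML file. As a rough illustration of what such a file contains, the following sketch counts passed, failed, and skipped test cases. The sample XML is invented, and the counting rules are a simplification, not GitLab's actual report parser.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Count test cases by outcome in a JUnit-style XML document."""
    root = ET.fromstring(xml_text)
    summary = {"passed": 0, "failed": 0, "skipped": 0}
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            summary["failed"] += 1
        elif case.find("skipped") is not None:
            summary["skipped"] += 1
        else:
            summary["passed"] += 1
    return summary


# Invented sample report; test names are illustrative only.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="helpers" tests="3">
    <testcase classname="calculateTotal" name="should return correct sum">
      <failure message="Expected: 15, Received: 10"/>
    </testcase>
    <testcase classname="calculateTotal" name="handles empty list"/>
    <testcase classname="formatPrice" name="adds currency symbol"/>
  </testsuite>
</testsuites>"""

print(summarize_junit(SAMPLE_REPORT))
```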

26.5 Pipeline Status in Different Contexts

26.5.1 Commit View

Repository → Commits:

7a8f9b0c  Update documentation        ✓ Pipeline #123 passed
3b4c5d6e  Add new feature            ✗ Pipeline #122 failed
9d2e1f0a  Fix critical bug           ⚙ Pipeline #121 running

Hovering over the status shows a tooltip with the stage breakdown.

26.5.2 Branch View

Repository → Branches:

main           Last updated 2 hours ago   ✓ Pipeline #123 passed
feature-x      Last updated 5 hours ago   ✗ Pipeline #122 failed
release-v2.0   Last updated 1 day ago     ⏸ Manual deploy pending

26.5.3 Merge Request

Merge Request #123:

Pipeline: ✓ All checks passed

✓ build-job passed in 45s
✓ test-unit passed in 15s
✓ test-e2e passed in 30s
✓ lint passed in 10s
⏸ deploy-staging (manual)

[Merge when pipeline succeeds]

Merge button states:
- Enabled: the pipeline passed and the MR can be merged
- Disabled: the pipeline is running or failed (when the project requires pipelines to succeed)
- “Merge when pipeline succeeds”: auto-merge as soon as the pipeline passes

26.5.4 Environments

Deployments → Environments:

production     v2.1.0  Deployed 3 days ago   ✓ Healthy
staging        v2.2.0  Deployed 2 hours ago  ✓ Healthy
review/feat-x  v2.2.0  Deployed 5 hours ago  ⚙ Deploying

Click an environment to see its deployment history with links to the pipelines.

26.6 Monitoring and Metrics

26.6.1 GitLab's Built-in Metrics

Project → Analytics → CI/CD Analytics:

Pipeline Success Rate: 94.2%
Average Pipeline Duration: 3m 24s
Total Pipelines (last 30 days): 847

Pipeline Duration Trend:
   ┌─────────────────────────────┐
 5m│        ╭─╮                   │
   │       ╭╯ ╰╮      ╭╮          │
 4m│      ╭╯   ╰╮    ╭╯╰╮         │
   │     ╭╯     ╰╮  ╭╯  ╰╮        │
 3m│────╯       ╰──╯    ╰────────│
   └─────────────────────────────┘
    Week 1  Week 2  Week 3  Week 4

Metrics:
- Pipeline success rate
- Average duration
- Pipeline runs per day
- Most frequently failed jobs
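The first two metrics are simple aggregates over finished pipelines. The sketch below computes them from a small list of records; the "status" and "duration" keys are modeled loosely on pipeline objects and are assumptions for this example, and the durations are those of pipelines #120, #122, and #123 from the list in 26.1.1, converted to seconds.

```python
from statistics import mean


def pipeline_stats(pipelines):
    """Success rate and average duration over finished pipelines only."""
    finished = [p for p in pipelines if p["status"] in ("success", "failed")]
    successes = [p for p in finished if p["status"] == "success"]
    return {
        "success_rate": len(successes) / len(finished) if finished else 0.0,
        "avg_duration": mean(p["duration"] for p in finished) if finished else 0.0,
    }


# Sample data; the running pipeline has no final duration yet and is ignored.
sample_pipelines = [
    {"id": 123, "status": "success", "duration": 154},
    {"id": 122, "status": "failed", "duration": 75},
    {"id": 121, "status": "running", "duration": None},
    {"id": 120, "status": "success", "duration": 182},
]

stats = pipeline_stats(sample_pipelines)
print(stats)
```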

26.6.2 Prometheus Integration

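Self-managed GitLab instances can expose metrics for Prometheus. GitLab Runner exposes its own metrics separately once its metrics listener is enabled.

Metrics endpoint (typically reachable only on self-managed instances; replace gitlab.com with your own host):

https://gitlab.com/-/metrics?token=$PROMETHEUS_TOKEN

Important metrics (exact names and labels depend on the GitLab version and the exporter in use, so verify them against the output of your own endpoint):

# Pipeline metrics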
gitlab_ci_pipeline_duration_seconds{status="success"}
gitlab_ci_pipeline_duration_seconds{status="failed"}
gitlab_ci_pipeline_size{status="success"}

# Job metrics
gitlab_ci_job_duration_seconds{stage="build"}
gitlab_ci_job_duration_seconds{stage="test"}
gitlab_ci_job_duration_seconds{stage="deploy"}

# Runner metrics
gitlab_runner_jobs{state="running"}
gitlab_runner_jobs{state="pending"}

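Prometheus scrape configuration:

# prometheus.yml
scrape_configs:
  - job_name: 'gitlab-ci'
    metrics_path: '/-/metrics'
    scheme: https
    # Prometheus does not expand environment variables in this file;
    # insert the real token here or template the file during deployment.
    params:
      token: ['<PROMETHEUS_TOKEN>']
    static_configs:
      - targets: ['gitlab.com']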

With Prometheus as a data source, Grafana can then visualize:
- Pipeline duration over time
- Job failure rate per stage
- Runner queue length
- Build throughput (pipelines per hour)

26.6.3 Custom Metrics with Dotenv Reports

build:
  script:
    - npm run build
    - echo "BUILD_TIME=$(date +%s)" >> build.env
    - echo "BUILD_SIZE=$(du -sh dist | cut -f1)" >> build.env
  artifacts:
    reports:
      dotenv: build.env

deploy:
  needs: [build]
  script:
    - echo "Build was created at $BUILD_TIME"
    - echo "Build size is $BUILD_SIZE"

Variables from the dotenv report are available in subsequent jobs; here deploy receives them because it lists build under needs.
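The report file itself is plain KEY=VALUE text. The sketch below shows a simplified reading of that format; it is not GitLab's parser, which applies additional rules, and the sample values are invented.

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and lines without '=' are ignored."""
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        variables[key.strip()] = value.strip()
    return variables


# Invented content of build.env as the build job above might write it.
build_env = "BUILD_TIME=1700000000\nBUILD_SIZE=4.2M\n"
print(parse_dotenv(build_env))
```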

26.7 Notifications

26.7.1 Email Notifications

Global default: User Settings → Notifications → Global notification level

Custom per project: Project → Settings → Notifications → Override

26.7.2 Slack Integration

Project → Settings → Integrations → Slack notifications:

Slack webhook URL: https://hooks.slack.com/services/...
Channel: #ci-notifications
Username: GitLab CI

Triggers:
☑ Pipeline events
☑ Job events (failures only)

Slack message on pipeline failure:

GitLab CI [12:34 PM]
❌ Pipeline #123 failed on main
Project: awesome-project
Commit: 7a8f9b0c - Update documentation
Failed jobs: test-e2e (1m 23s)

View pipeline: https://gitlab.com/user/project/-/pipelines/123

26.7.3 Webhook Integration

Project → Settings → Webhooks:

URL: https://my-service.com/gitlab-webhook
Trigger:
☑ Pipeline events

Secret token: abc123
SSL verification: ☑ Enable

Payload for a pipeline event:

{
  "object_kind": "pipeline",
  "object_attributes": {
    "id": 123,
    "ref": "main",
    "status": "failed",
    "duration": 94,
    "stages": ["build", "test", "deploy"]
  },
  "project": {
    "name": "awesome-project",
    "web_url": "https://gitlab.com/user/project"
  },
  "commit": {
    "id": "7a8f9b0c",
    "message": "Update documentation",
    "author": {
      "name": "Andreas Mueller"
    }
  }
}
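A receiving service usually checks the secret token first and then reacts to the status. GitLab sends the configured secret in the X-Gitlab-Token request header. The sketch below is framework-independent and assumes the headers arrive as a plain dict with exactly that key; the function name and the alert text are hypothetical.

```python
import hmac
import json


def handle_pipeline_event(headers: dict, body: bytes, secret: str):
    """Return a one-line alert for failed pipelines, or None for anything else."""
    received = headers.get("X-Gitlab-Token", "")
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(received.encode(), secret.encode()):
        raise PermissionError("secret token mismatch")
    event = json.loads(body)
    if event.get("object_kind") != "pipeline":
        return None
    attrs = event["object_attributes"]
    if attrs["status"] != "failed":
        return None
    return (
        f"Pipeline #{attrs['id']} failed on {attrs['ref']} "
        f"({event['project']['name']}, {attrs['duration']}s)"
    )


# Reduced version of the payload shown above.
example_body = json.dumps({
    "object_kind": "pipeline",
    "object_attributes": {"id": 123, "ref": "main", "status": "failed", "duration": 94},
    "project": {"name": "awesome-project"},
}).encode()

print(handle_pipeline_event({"X-Gitlab-Token": "abc123"}, example_body, "abc123"))
```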

26.8 Retry and Manual Jobs

26.8.1 Retry Mechanisms

1. Manual retry (UI):
- Pipeline page: [Retry] button (re-runs the failed and canceled jobs of the pipeline)
- Job page: [Retry] button (single job)

2. Automatic retry (config):

job:
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
      - script_failure  # also retries genuine code failures; use only for known-flaky jobs
  script:
    - npm test

Retry conditions:
- runner_system_failure: the runner crashed
- stuck_or_timeout_failure: the job got stuck or timed out
- script_failure: a command returned a non-zero exit code
- api_failure: GitLab API error
- always: retry on any failure (use with care)

Best practice: retry only on transient failures (runner problems), not on code failures.
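The decision behind retry:max and retry:when can be modeled in a few lines. This is only a mental model of the semantics described above, not GitLab's implementation; the function and the policy list are hypothetical.

```python
def should_retry(failure_reason, retries_done, max_retries, retry_when):
    """Decide whether a failed job gets another attempt.

    retries_done is the number of retries already used (0 after the first failure).
    """
    if retries_done >= max_retries:
        return False
    return "always" in retry_when or failure_reason in retry_when


# A policy that follows the best practice: transient failures only.
TRANSIENT_ONLY = ["runner_system_failure", "stuck_or_timeout_failure"]

print(should_retry("runner_system_failure", 0, 2, TRANSIENT_ONLY))
print(should_retry("script_failure", 0, 2, TRANSIENT_ONLY))
```

Under this policy a crashed runner triggers a retry, while a failing script does not, so real test failures surface immediately instead of being repeated.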

26.8.2 Manual Jobs

deploy:production:
  stage: deploy
  script:
    - ./deploy.sh production
  when: manual
  environment:
    name: production
    url: https://example.com

In the UI:

Pipeline #123

Stage: deploy
  deploy:production  ▶ Play  (manual)

“Play” button: starts the manual job.

Blocking manual job:

deploy:production:
  stage: deploy
  script:
    - ./deploy.sh production
  when: manual
  allow_failure: false  # The pipeline is blocked until the job is run

26.9 Troubleshooting Workflows

26.9.1 Workflow 1: Pipeline Failed – Where Is the Problem?

1. Look at the pipeline overview:

Pipeline #123 ✗ failed
└─ Stage: test
   ├─ test-unit ✓ passed
   └─ test-e2e ✗ failed  ← here

2. Open the failed job: click test-e2e

3. Search the job log: scroll to the “Job failed” marker

4. Find the error message:

Error: Timeout waiting for element #submit-button
    at Timeout._onTimeout (test/e2e/checkout.spec.js:42:15)

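5. Reproduce and test the fix locally:

# Use the commit SHA shown in the pipeline; $CI_COMMIT_SHA only exists inside the CI job
git checkout <commit-sha>
# npm ci installs exactly what package-lock.json specifies, as the CI job does
npm ci
npm run test:e2e

6. Commit and push the fix: the pipeline runs again.

26.9.2 Workflow 2: Slow Pipeline – Optimizing Performance

1. Look at the analytics: CI/CD Analytics → Pipeline Duration Trend

2. Identify the bottleneck: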

build-job:    45s
test-unit:    15s
test-e2e:     3m 30s  ← Bottleneck!
lint:         10s
deploy:       20s

3. Optimizations:
- Add caching (dependencies)
- Parallelization: split test-e2e into several jobs
- Hardware: a faster runner, selected for example via tags: [fast]

4. Apply and measure:

test-e2e:
  parallel: 3  # Runs 3 instances of the job; the test runner must split the tests across them

Result: 3m 30s → 1m 15s (about 2.8× faster)
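The bottleneck and the speedup can be checked with a few lines of arithmetic. The sketch below converts the duration strings from step 2 to seconds; the helper name is hypothetical.

```python
import re


def to_seconds(text: str) -> int:
    """Convert strings like '3m 30s' or '45s' to seconds."""
    minutes = re.search(r"(\d+)m", text)
    seconds = re.search(r"(\d+)s", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    return total + (int(seconds.group(1)) if seconds else 0)


# Job durations from step 2 above.
jobs = {
    "build-job": "45s",
    "test-unit": "15s",
    "test-e2e": "3m 30s",
    "lint": "10s",
    "deploy": "20s",
}

bottleneck = max(jobs, key=lambda name: to_seconds(jobs[name]))
speedup = to_seconds("3m 30s") / to_seconds("1m 15s")
print(bottleneck, round(speedup, 2))
```

The slowest job is test-e2e at 210 seconds, and going from 210 to 75 seconds is a speedup of 210 / 75 = 2.8.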

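26.9.3 Workflow 3: Flaky Tests – Debugging Unstable Tests

Symptom: a test sometimes fails and sometimes passes.

1. Look at the job history: compare the logs of earlier runs of the same job. Each retry shows up as a separate job run, and “Show complete raw” on the job page opens the full log of a single run.

2. Recognize patterns: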

Run #1: ✓ passed
Run #2: ✗ failed (Timeout)
Run #3: ✓ passed
Run #4: ✗ failed (Timeout)

3. Retry with logging:

test-e2e:
  retry:
    max: 2
    when: script_failure
  variables:
    DEBUG: "cypress:*"
  script:
    - npm run test:e2e

4. Fix the flakiness:
- Increase timeouts
- Wait for elements explicitly
- Mock external APIs
- Seed test data properly
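A simple heuristic for spotting flaky jobs in a run history: a job that both passed and failed on the same commit did not fail because of a code change. The sketch below applies that rule to a list of records; the record layout ("job", "sha", "status") and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def find_flaky_jobs(runs):
    """Return the names of jobs that both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run["job"], run["sha"])].add(run["status"])
    return sorted(
        {job for (job, _), seen in outcomes.items() if {"success", "failed"} <= seen}
    )


# Invented run history.
history = [
    {"job": "test-e2e", "sha": "7a8f9b0c", "status": "success"},
    {"job": "test-e2e", "sha": "7a8f9b0c", "status": "failed"},
    {"job": "test-unit", "sha": "7a8f9b0c", "status": "success"},
    {"job": "lint", "sha": "3b4c5d6e", "status": "failed"},
]

print(find_flaky_jobs(history))
```

Here only test-e2e is reported: lint failed as well, but consistently, which points to a real problem rather than flakiness.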

26.10 Performance Tips for Large Pipelines

26.10.1 Maximize Parallel Jobs

stages:
  - test

test:
  parallel: 10  # 10 parallel jobs
  script:
    - npm test -- --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL

Effect: 10 jobs run in parallel, and each runs one tenth of the tests. The --shard flag requires a test runner that supports it, such as Jest 28 or later.
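How a runner might divide the work can be pictured with a round-robin split. The sketch below is an illustration of the idea, not Jest's actual sharding algorithm; the index is 1-based like CI_NODE_INDEX, and the file names are invented.

```python
def shard(files, node_index, node_total):
    """Return the files for one shard; node_index is 1-based like CI_NODE_INDEX."""
    ordered = sorted(files)  # a stable order keeps every node's view consistent
    return ordered[node_index - 1::node_total]


# Seven invented test files split across three parallel jobs.
tests = [f"test_{n:02d}.js" for n in range(1, 8)]
shards = [shard(tests, index, 3) for index in range(1, 4)]

for index, files in enumerate(shards, start=1):
    print(index, files)
```

Every file lands in exactly one shard, so the union of all shards is the full test suite and no test runs twice.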

26.10.2 Use the Cache Wisely

cache:
  key: ${CI_COMMIT_REF_SLUG}-${CI_JOB_NAME}
  paths:
    - node_modules/
    - .npm/
  policy: pull-push  # Default

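Keys: branch + job name → separate caches for different jobs. Because npm ci deletes node_modules before installing, it is the .npm/ download cache (used via npm ci --cache .npm) that speeds up installs.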

26.10.3 Minimize Artifacts

build:
  artifacts:
    paths:
      - dist/  # Only what is needed
    exclude:
      - dist/**/*.map  # Exclude source maps
    expire_in: 1 week

Why: uploading and downloading artifacts is slow for large files.

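26.10.4 Docker Image Optimization

# ✗ Bad: full Debian-based image
FROM node:18

# ✓ Good: Alpine-based image, considerably smaller
# (Dockerfile comments must be on their own line, not after an instruction)
FROM node:18-alpine

Image pull: smaller images → faster pull.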

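26.10.5 Interruptible Jobs

test:
  interruptible: true  # May be canceled when a newer pipeline starts
  script:
    - npm test

Use case: a new push to the branch cancels the old pipeline so the new one starts immediately. This requires the project's auto-cancel setting for redundant pipelines to be enabled.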

26.11 Summary

GitLab pipelines are not a black box; they are fully transparent:

Visualization:
- Pipelines list for the overview
- Pipeline graph (stage/DAG) for the structure
- Job logs for the details
- Artifacts for the outputs

Monitoring:
- CI/CD Analytics for trends
- Prometheus/Grafana for metrics
- Notifications (email, Slack, webhook)

Troubleshooting:
- Read job logs
- Use retry mechanisms
- Identify flaky tests
- Optimize performance

Best practices:
- Collapsible sections for readable logs
- Artifacts browser for quick inspection
- Manual jobs for production deployments
- Interruptible jobs for branches
- Cache for dependency performance

Master pipeline visualization and monitoring, and you turn CI/CD from “Why is it failing?!” into “I see exactly what happened and how to fix it”.