Non-Functional Requirements

Status: Final
Version: 1.0


Purpose

Define performance, scalability, observability, backup, disaster recovery, and operational requirements for the ESG platform.


Performance SLAs

| Metric | Target | Measurement |
|---|---|---|
| API Response Time (p95) | < 500ms | GET requests, non-report generation |
| API Response Time (p99) | < 1s | GET requests |
| Submission Upload | < 2s | POST /api/v1/collector/submissions |
| Evidence Upload (10MB) | < 5s | POST evidence endpoint |
| Report Generation | < 5min | Async job, full GRI report (PDF) |
| Page Load Time | < 2s | Web dashboard (admin UI) |
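
To make the percentile targets concrete, the sketch below computes p95 and p99 from a list of latency samples using the nearest-rank method and compares them with the API targets above. The function names and the sample values are illustrative assumptions; in production these percentiles would normally be read from the metrics backend rather than computed in application code.

```kotlin
import kotlin.math.ceil

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
fun percentile(samplesMs: List<Long>, p: Double): Long {
    require(samplesMs.isNotEmpty()) { "no samples" }
    require(p > 0.0 && p <= 100.0) { "p must be in (0, 100]" }
    val sorted = samplesMs.sorted()
    val rank = ceil(p * sorted.size / 100.0).toInt() // 1-based rank
    return sorted[rank - 1]
}

// Targets from the table: p95 < 500 ms and p99 < 1 s for GET requests.
fun meetsApiSla(samplesMs: List<Long>): Boolean =
    percentile(samplesMs, 95.0) < 500 && percentile(samplesMs, 99.0) < 1000

fun main() {
    // Hypothetical samples: 100 requests, 96 fast and 4 slow.
    val samples = List(96) { 120L } + listOf(700L, 800L, 900L, 950L)
    println("p95=${percentile(samples, 95.0)} ms, p99=${percentile(samples, 99.0)} ms")
    println("SLA met: ${meetsApiSla(samples)}")
}
```

Because the targets are percentiles, a handful of slow requests can still pass as long as they stay within the slowest 5% (or 1%) of traffic.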

Scalability Assumptions

| Dimension | v1 Assumptions | Scaling Strategy |
|---|---|---|
| Concurrent Users | 100 simultaneous collectors | Horizontal scaling (load balancer + multiple app servers) |
| Submissions/Day | 10,000 submissions | Queue workers scale horizontally |
| Data Volume | 100 GB/year (submissions + evidence) | S3 auto-scales, database vertical scaling |
| Tenants | 50 tenants | Multi-tenant DB with tenant_id sharding (vNext) |
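
As a rough sanity check on these assumptions, the daily and yearly figures can be converted into per-second and per-day rates. The sketch below is back-of-the-envelope arithmetic only; the 8-hour busy window is a hypothetical assumption and not a figure from this document.

```kotlin
const val SUBMISSIONS_PER_DAY = 10_000
const val DATA_GB_PER_YEAR = 100.0

// Average arrival rate if submissions were spread evenly over 24 hours.
fun averageSubmissionsPerSecond(perDay: Int = SUBMISSIONS_PER_DAY): Double =
    perDay / 86_400.0

// Rate if the whole daily volume arrived within a shorter busy window.
// The window length is an assumption, not a documented requirement.
fun busyWindowSubmissionsPerSecond(perDay: Int, windowHours: Double): Double =
    perDay / (windowHours * 3_600.0)

// Average daily storage growth in megabytes (1 GB = 1,000 MB).
fun dailyGrowthMb(gbPerYear: Double = DATA_GB_PER_YEAR): Double =
    gbPerYear * 1_000.0 / 365.0

fun main() {
    println("avg: %.3f submissions/s".format(averageSubmissionsPerSecond()))
    println("8h window: %.3f submissions/s".format(busyWindowSubmissionsPerSecond(SUBMISSIONS_PER_DAY, 8.0)))
    println("growth: %.1f MB/day".format(dailyGrowthMb()))
}
```

Even when the daily volume is compressed into a working day, the arrival rate stays below one submission per second, so the v1 load is more likely to be shaped by bursts around collection deadlines than by sustained throughput.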

Queue Configuration

| Queue Name | Worker Count | Memory Limit | Retry Limit | Timeout |
|---|---|---|---|---|
| validations | 5 workers | 512 MB | 3 | 60s |
| processing | 3 workers | 512 MB | 3 | 120s |
| reporting | 2 workers | 1 GB | 2 | 600s (10 min) |
| default | 2 workers | 256 MB | 3 | 90s |
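
One way to keep these limits in a single place is to model each row as a typed value that workers read at startup. The sketch below is an assumption about how that could look: `QueueSpec`, `queueSpecs`, and `runWithRetries` are hypothetical names, and it assumes the retry limit counts retries after the first attempt, which the table does not state. Timeout enforcement is left out to keep the example short.

```kotlin
import java.time.Duration

// One row of the queue configuration table (hypothetical type).
data class QueueSpec(
    val name: String,
    val workers: Int,
    val memoryLimitMb: Int,
    val retryLimit: Int,
    val timeout: Duration,
)

val queueSpecs = listOf(
    QueueSpec("validations", 5, 512, 3, Duration.ofSeconds(60)),
    QueueSpec("processing", 3, 512, 3, Duration.ofSeconds(120)),
    QueueSpec("reporting", 2, 1024, 2, Duration.ofSeconds(600)),
    QueueSpec("default", 2, 256, 3, Duration.ofSeconds(90)),
).associateBy { it.name }

// Runs a job and retries on failure, up to spec.retryLimit retries.
// Assumption: retryLimit counts retries after the first attempt.
fun <T> runWithRetries(spec: QueueSpec, job: (attempt: Int) -> T): T {
    var lastError: Exception? = null
    for (attempt in 0..spec.retryLimit) {
        try {
            return job(attempt)
        } catch (e: Exception) {
            lastError = e // remember the failure and try again
        }
    }
    throw IllegalStateException("${spec.name}: gave up after ${spec.retryLimit} retries", lastError)
}

fun main() {
    val spec = queueSpecs.getValue("reporting")
    var calls = 0
    val result = runWithRetries(spec) { attempt ->
        calls++
        if (attempt < 2) error("transient failure") else "done"
    }
    println("$result after $calls attempts")
}
```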

Quarkus Async Configuration:

# application.properties
# Worker thread pool used for blocking work.
# Virtual threads (Project Loom) are opt-in: they require Java 21+ and
# @RunOnVirtualThread on the entry-point method.
quarkus.thread-pool.core-threads=2
quarkus.thread-pool.max-threads=10
quarkus.thread-pool.queue-size=100

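import jakarta.enterprise.context.ApplicationScoped
import jakarta.enterprise.inject.Produces // CDI producer, not jakarta.ws.rs.Produces
import org.eclipse.microprofile.context.ManagedExecutor

@ApplicationScoped
class AsyncConfig {
    // Bounded executor: at most 10 tasks run concurrently, at most 100 wait in the queue.
    @Produces
    @ApplicationScoped
    fun managedExecutor(): ManagedExecutor {
        return ManagedExecutor.builder()
            .maxAsync(10)
            .maxQueued(100)
            .build()
    }
}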

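// Using virtual threads (opt-in, requires Java 21+).
// @RunOnVirtualThread is honored on framework entry points such as REST
// endpoints or messaging consumers, and the annotated method must be
// blocking, so it returns the result directly rather than a Uni.
import io.smallrye.common.annotation.RunOnVirtualThread
import jakarta.ws.rs.GET
import jakarta.ws.rs.Path

@Path("/some")
class SomeResource {
    @GET
    @RunOnVirtualThread
    fun blockingOperation(): Result {
        // Runs on a virtual thread; blocking calls here do not pin a worker thread
        return performWork()
    }
}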

Background Jobs

| Job | Schedule | Purpose |
|---|---|---|
| EscalateOverdueReviewsJob | Hourly | Flag reviews overdue by 3+ days |
| MarkEvidenceForDeletionJob | Daily (midnight) | Mark expired evidence (7+ years) |
| GeneratePeriodicReportsJob | Weekly (Sunday) | Auto-generate draft reports for open periods |
| PruneAuditLogsJob | Monthly | Archive audit logs older than 2 years to cold storage |
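
The thresholds in the table translate into simple date checks. The following sketch shows the selection logic for the first two jobs only. The function names are hypothetical, and it assumes that "overdue by 3+ days" is measured from a review due date and that the 7-year retention clock starts at upload, neither of which is specified above.

```kotlin
import java.time.Duration
import java.time.Instant
import java.time.LocalDate

// EscalateOverdueReviewsJob: escalate once a review is at least 3 days past due.
fun isReviewOverdue(dueAt: Instant, now: Instant): Boolean =
    Duration.between(dueAt, now) >= Duration.ofDays(3)

// MarkEvidenceForDeletionJob: evidence expires once 7 full years have passed.
fun isEvidenceExpired(uploadedOn: LocalDate, today: LocalDate): Boolean =
    !today.isBefore(uploadedOn.plusYears(7))

fun main() {
    val now = Instant.parse("2026-01-10T00:00:00Z")
    println(isReviewOverdue(Instant.parse("2026-01-06T12:00:00Z"), now))
    println(isEvidenceExpired(LocalDate.of(2019, 1, 10), LocalDate.of(2026, 1, 10)))
}
```

Passing the current time in as a parameter keeps the checks deterministic, so each job's boundary cases can be tested without waiting for a schedule to fire.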

Observability

Logging

Quarkus Logging Configuration (JBoss Logging):

  • daily: Application logs (7-day rotation)
  • error: Application errors, exceptions
  • syslog: Critical errors sent to external SIEM

Configuration:

# Quarkus logging configuration
quarkus.log.level=INFO
quarkus.log.category."com.example.esg".level=INFO
quarkus.log.console.json=true
quarkus.log.console.json.pretty-print=false

# File logging
quarkus.log.file.enable=true
quarkus.log.file.path=/var/log/esg-platform/application.log
quarkus.log.file.rotation.max-file-size=10M
quarkus.log.file.rotation.max-backup-index=7
quarkus.log.file.rotation.file-suffix=.yyyy-MM-dd

Structured Logging:

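import jakarta.enterprise.context.ApplicationScoped
import org.jboss.logging.Logger

@ApplicationScoped
class SubmissionService {
    private val logger = Logger.getLogger(SubmissionService::class.java)

    fun processSubmission(submission: Submission) {
        val startedAt = System.nanoTime()
        // ... process the submission ...
        val processingTimeMs = (System.nanoTime() - startedAt) / 1_000_000

        // key=value pairs keep the message machine-parseable
        logger.infof(
            "Submission processed: submission_id=%s, tenant_id=%s, state=%s, processing_time_ms=%d",
            submission.id,
            submission.tenantId,
            submission.state,
            processingTimeMs
        )
    }
}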

Metrics

Quarkus Observability Stack:

  • Micrometer metrics: /q/metrics endpoint in Prometheus format
  • MicroProfile Health (SmallRye Health): /q/health/live, /q/health/ready, /q/health/started
  • OpenTelemetry Tracing: Distributed tracing with Jaeger/Tempo
  • Dev UI: Development-time observability at /q/dev

Production Monitoring (Prometheus + Micrometer + Grafana):

  • API response times (histogram)
  • Queue depth (gauge)
  • Submission throughput (counter)
  • Error rates (counter)
  • JVM metrics (heap, threads, GC) for JVM mode
  • Native metrics (RSS memory) for native mode

Configuration:

# Enable Micrometer metrics
quarkus.micrometer.enabled=true
quarkus.micrometer.export.prometheus.enabled=true
quarkus.micrometer.binder.http-server.enabled=true
quarkus.micrometer.binder.jvm=true

# MicroProfile Health
quarkus.smallrye-health.root-path=/q/health
quarkus.smallrye-health.liveness-path=/q/health/live
quarkus.smallrye-health.readiness-path=/q/health/ready

# OpenTelemetry tracing
quarkus.otel.enabled=true
quarkus.otel.exporter.otlp.endpoint=http://jaeger:4317
quarkus.otel.traces.exporter=otlp
quarkus.otel.metrics.exporter=none

Custom Application Metrics:

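import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Tag
import io.micrometer.core.instrument.Timer
import jakarta.enterprise.context.ApplicationScoped
import jakarta.inject.Inject
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

@ApplicationScoped
class SubmissionMetrics {
    @Inject
    lateinit var registry: MeterRegistry

    // A gauge only holds a weak reference to the object it observes, so keep
    // one strongly referenced AtomicInteger per queue and update it in place.
    private val queueDepths = ConcurrentHashMap<String, AtomicInteger>()

    fun recordSubmissionProcessed(state: String, processingTimeMs: Long) {
        registry.counter(
            "esg_submissions_total",
            "state", state
        ).increment()

        Timer.builder("esg_submission_processing_time")
            .tag("state", state)
            .publishPercentiles(0.95) // exposes the quantile="0.95" series
            .register(registry)
            .record(processingTimeMs, TimeUnit.MILLISECONDS)
    }

    fun updateQueueDepth(queue: String, depth: Int) {
        val holder = queueDepths.computeIfAbsent(queue) { name ->
            // Register the gauge once per queue; later calls only update the value
            AtomicInteger(0).also {
                registry.gauge("esg_queue_depth", listOf(Tag.of("queue", name)), it)
            }
        }
        holder.set(depth)
    }
}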

Key Metrics:

esg_submissions_total{state="approved"} 5000
http_server_requests_seconds{endpoint="/submissions",quantile="0.95"} 0.45
esg_queue_depth{queue="validations"} 12
esg_submission_processing_time_seconds{state="processed",quantile="0.95"} 0.15
process_resident_memory_bytes 104857600
jvm_memory_used_bytes{area="heap"} 268435456

Distributed Tracing

Quarkus OpenTelemetry Integration:

  • Automatic span creation for JAX-RS endpoints
  • Database query tracing with Hibernate
  • Message queue tracing with SmallRye Reactive Messaging
  • Custom span instrumentation for business logic

Configuration:

# OpenTelemetry tracing
quarkus.otel.enabled=true
quarkus.otel.exporter.otlp.endpoint=http://jaeger-collector:4317
quarkus.otel.traces.exporter=otlp
quarkus.otel.traces.sampler=parentbased_traceidratio
quarkus.otel.traces.sampler.arg=1.0

# Service identification
quarkus.application.name=esg-platform
quarkus.otel.resource.attributes=service.name=esg-platform,service.version=1.0.0,deployment.environment=production

Custom Span Instrumentation:

import io.opentelemetry.api.trace.Tracer
import io.opentelemetry.api.trace.Span
import io.opentelemetry.api.trace.StatusCode
import jakarta.enterprise.context.ApplicationScoped
import jakarta.inject.Inject

@ApplicationScoped
class SubmissionService {
    @Inject
    lateinit var tracer: Tracer

    fun processSubmission(submission: Submission) {
        val span = tracer.spanBuilder("process-submission")
            .setAttribute("submission.id", submission.id.toString())
            .setAttribute("tenant.id", submission.tenantId.toString())
            .setAttribute("metric.type", submission.metricType)
            .startSpan()

        try {
            span.makeCurrent().use {
                // Business logic here
                validateSubmission(submission)
                persistSubmission(submission)
                span.setStatus(StatusCode.OK)
            }
        } catch (e: Exception) {
            span.setStatus(StatusCode.ERROR, e.message ?: "Unknown error")
            span.recordException(e)
            throw e
        } finally {
            span.end()
        }
    }
}

Trace Visualization:

  • Use Jaeger UI to visualize end-to-end request flows
  • Identify performance bottlenecks across services
  • Correlate logs with traces using trace IDs
  • Monitor error rates and latency distributions

Alerting

Alerts (PagerDuty/Slack):

  • API p95 latency > 1s for 5 minutes
  • Queue depth > 100 for 10 minutes
  • Error rate > 5% for 5 minutes
  • Disk usage > 80%
  • Health check failures (3 consecutive failures)
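
Most of these rules share the same shape: a threshold that must be exceeded continuously for a given duration. The sketch below illustrates that evaluation in plain Kotlin. `Sample` and `breachSustained` are hypothetical names, and in practice the rule would more likely be expressed in the alerting system itself (for example as a Prometheus or CloudWatch alarm) than in application code.

```kotlin
import java.time.Duration
import java.time.Instant

// A single observation of a metric (hypothetical type).
data class Sample(val at: Instant, val value: Double)

// Returns true when the metric has stayed above the threshold for the whole
// trailing window. Requires a sample at or before the window start, so a
// short burst of recent samples cannot trigger the alert on its own.
fun breachSustained(
    samples: List<Sample>,
    threshold: Double,
    window: Duration,
    now: Instant,
): Boolean {
    val start = now.minus(window)
    val sorted = samples.sortedBy { it.at }
    // The latest sample at or before the window start anchors the window.
    val anchor = sorted.lastOrNull { !it.at.isAfter(start) } ?: return false
    val relevant = sorted.filter { !it.at.isBefore(anchor.at) && !it.at.isAfter(now) }
    return relevant.all { it.value > threshold }
}

fun main() {
    val now = Instant.parse("2026-01-03T12:00:00Z")
    // Hypothetical p95 latency in seconds, sampled once per minute.
    val p95 = (0..5).map { Sample(now.minus(Duration.ofMinutes(it.toLong())), 1.2) }
    println(breachSustained(p95, threshold = 1.0, window = Duration.ofMinutes(5), now = now))
}
```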

Development Observability

Quarkus Dev UI (http://localhost:8080/q/dev in dev mode):

  • Configuration Editor: Browse and modify application properties
  • Arc Container: Inspect CDI beans and dependencies
  • Health Checks: Test liveness, readiness, and startup probes
  • Metrics: View application metrics in real time
  • OpenAPI: Interactive Swagger UI for API testing
  • Database Console: Query datasources (H2, PostgreSQL)
  • Scheduler: View and trigger scheduled jobs
  • Continuous Testing: Run and monitor tests on code changes

Dev Services (Automatic Testcontainers):

  • PostgreSQL: Auto-started on port 5432 (or random)
  • RabbitMQ: Auto-started for message queue testing
  • Redis: Auto-started for caching (if configured)
  • No manual setup required: Quarkus starts the containers automatically

Development Mode Features:

# Start dev mode with hot reload and continuous testing
./mvnw quarkus:dev

# Console commands:
# r - Re-run tests
# s - Force restart
# d - Open Dev UI in browser
# h - Show help
# q - Quit

Reflection Configuration for Native Images:

# Quarkus detects most reflection needs automatically; register any remaining
# classes with @RegisterForReflection.
# Additional native-image build arguments, if needed:
quarkus.native.additional-build-args=\
  -H:+ReportExceptionStackTraces,\
  --initialize-at-run-time=io.netty.handler.ssl.ReferenceCountedOpenSslEngine


Backups

| Resource | Frequency | Retention | RTO | RPO |
|---|---|---|---|---|
| PostgreSQL | Continuous (WAL archiving) | 30 days | 1 hour | 5 minutes |
| S3 Evidence Bucket | Versioning enabled | 7 years | N/A (immutable) | Real-time |
| Application Code | Git (on push) | Indefinite | 15 minutes | N/A |
| Secrets | AWS Secrets Manager backup | 90 days | 30 minutes | N/A |

Database Backup:

# Automated via AWS RDS automated backups
# Snapshot retention: 30 days
# Point-in-time recovery (PITR): 5-minute granularity


Disaster Recovery

RTO & RPO Targets

  • RTO (Recovery Time Objective): 4 hours (critical period: data collection deadlines)
  • RPO (Recovery Point Objective): 1 hour (max acceptable data loss)

DR Strategy

Multi-AZ Deployment (AWS):

  • RDS: Multi-AZ automatic failover (< 2 min)
  • Application servers: Auto Scaling Group across 2+ AZs
  • S3: Cross-region replication (CRR) to DR region

Failover Procedure:

  1. Detect outage (CloudWatch alarms)
  2. Promote RDS standby (automatic)
  3. Route 53 DNS failover to DR region load balancer
  4. Notify team via PagerDuty
  5. Verify application health checks (/q/health/live, /q/health/ready)


Data Residency

GDPR Requirement: EU customer data must remain in EU regions.

Implementation:

  • Tenant metadata includes data_region (e.g., eu-west-1, us-east-1)
  • S3 buckets per region: evidence-eu-west-1, evidence-us-east-1
  • Database: Regional RDS instances (future multi-region support)

val evidenceBucket = "evidence-${tenant.dataRegion}"
storageService.putFile(bucket = evidenceBucket, key = path, file = file.bytes)
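
Because the bucket name is derived directly from tenant metadata, a misconfigured data_region would silently route evidence to the wrong region. A guard along the following lines could reject such configurations before any upload. This is a minimal sketch under stated assumptions: the `Tenant` shape, the `euCustomer` flag, and the `evidenceBucketFor` helper are hypothetical, and the supported-region set is taken only from the examples above.

```kotlin
// Regions assumed to be supported, based on the examples in this section.
val supportedRegions = setOf("eu-west-1", "us-east-1")

// Hypothetical tenant model; euCustomer marks tenants subject to the EU residency rule.
data class Tenant(val id: String, val dataRegion: String, val euCustomer: Boolean)

// Resolves the evidence bucket and refuses configurations that would
// store EU customer data outside an EU region.
fun evidenceBucketFor(tenant: Tenant): String {
    require(tenant.dataRegion in supportedRegions) {
        "unsupported region: ${tenant.dataRegion}"
    }
    require(!tenant.euCustomer || tenant.dataRegion.startsWith("eu-")) {
        "EU tenant ${tenant.id} must use an EU region"
    }
    return "evidence-${tenant.dataRegion}"
}

fun main() {
    println(evidenceBucketFor(Tenant("t-1", "eu-west-1", euCustomer = true)))
    // A misconfigured EU tenant is rejected instead of being routed to a US bucket.
    println(runCatching { evidenceBucketFor(Tenant("t-2", "us-east-1", euCustomer = true)) }.isFailure)
}
```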

Acceptance Criteria

  • API p95 latency < 500ms under normal load (100 users)
  • Queue workers configured with correct retry/timeout limits
  • PostgreSQL automated backups enabled (30-day retention)
  • S3 versioning enabled for evidence bucket
  • CloudWatch alarms configured for latency, queue depth, errors
  • RDS Multi-AZ enabled for high availability
  • Scheduled jobs run as expected (cron verified)

Change Log

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-03 | Senior Product Architect | Initial NFR specification |