Non-Functional Requirements
Status: Final
Version: 1.0
Purpose
Define performance, scalability, observability, backup, disaster recovery, and operational requirements for the ESG platform.
Performance SLAs
| Metric | Target | Measurement |
|---|---|---|
| API Response Time (p95) | < 500ms | GET requests, non-report generation |
| API Response Time (p99) | < 1s | GET requests |
| Submission Upload | < 2s | POST /api/v1/collector/submissions |
| Evidence Upload (10MB) | < 5s | POST evidence endpoint |
| Report Generation | < 5min | Async job, full GRI report (PDF) |
| Page Load Time | < 2s | Web dashboard (admin UI) |
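
The p95 and p99 targets above depend on how percentiles are computed. The following is a minimal sketch that assumes the nearest-rank method over a list of latency samples in milliseconds. The function names `percentile` and `meetsApiSla` are illustrative and not part of the platform; in production these figures would more likely come from the metrics backend.

```kotlin
import kotlin.math.ceil

/** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
fun percentile(samplesMs: List<Long>, p: Double): Long {
    require(samplesMs.isNotEmpty()) { "need at least one sample" }
    require(p > 0.0 && p <= 100.0) { "p must be in (0, 100]" }
    val sorted = samplesMs.sorted()
    val rank = ceil(p * sorted.size / 100.0).toInt() // 1-based rank
    return sorted[rank - 1]
}

/** True when both API latency targets from the table are met (hypothetical helper). */
fun meetsApiSla(samplesMs: List<Long>): Boolean =
    percentile(samplesMs, 95.0) < 500 && percentile(samplesMs, 99.0) < 1000

fun main() {
    val samples = (1L..100L).map { it * 4 } // 4 ms, 8 ms, ..., 400 ms
    println("p95=${percentile(samples, 95.0)} ms, p99=${percentile(samples, 99.0)} ms")
    println("meets SLA: ${meetsApiSla(samples)}")
}
```

Nearest-rank always returns an observed sample, which keeps the check easy to reason about for small sample sets; interpolating methods give slightly different values.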
Scalability Assumptions
| Dimension | v1 Assumptions | Scaling Strategy |
|---|---|---|
| Concurrent Users | 100 simultaneous collectors | Horizontal scaling (load balancer + multiple app servers) |
| Submissions/Day | 10,000 submissions | Queue workers scale horizontally |
| Data Volume | 100 GB/year (submissions + evidence) | S3 auto-scales, database vertical scaling |
| Tenants | 50 tenants | Multi-tenant DB with tenant_id sharding (vNext) |
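
To put the v1 assumptions in perspective, the arithmetic below converts them into average rates. This is a rough sketch: the `peakFactor` of 10 is an assumption chosen for illustration (for example, bursts near collection deadlines) and does not come from this specification.

```kotlin
/** Back-of-envelope load figures derived from the v1 assumptions in the table above. */
data class LoadEstimate(
    val avgSubmissionsPerSecond: Double,
    val peakSubmissionsPerSecond: Double,
    val avgGrowthMbPerDay: Double,
)

fun estimateLoad(
    submissionsPerDay: Int = 10_000,
    dataGbPerYear: Int = 100,
    peakFactor: Double = 10.0, // assumption, not taken from the spec
): LoadEstimate {
    val avgPerSecond = submissionsPerDay / 86_400.0 // seconds per day
    return LoadEstimate(
        avgSubmissionsPerSecond = avgPerSecond,
        peakSubmissionsPerSecond = avgPerSecond * peakFactor,
        avgGrowthMbPerDay = dataGbPerYear * 1024.0 / 365.0,
    )
}

fun main() {
    println(estimateLoad())
}
```

With the defaults this works out to roughly 0.12 submissions per second on average and about 280 MB of new data per day.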
Queue Configuration
| Queue Name | Worker Count | Memory Limit | Retry Limit | Timeout |
|---|---|---|---|---|
| validations | 5 workers | 512 MB | 3 | 60s |
| processing | 3 workers | 512 MB | 3 | 120s |
| reporting | 2 workers | 1 GB | 2 | 600s (10 min) |
| default | 2 workers | 256 MB | 3 | 90s |
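
The table can be mirrored in code so that workers read their limits from one place. The sketch below is illustrative only: `QueueConfig`, `queueConfigs`, and `runWithRetries` are hypothetical names, and it assumes that "retry limit" counts retries after the first attempt. Timeout enforcement is left to the queue runtime and is not shown.

```kotlin
import java.time.Duration

/** One row of the queue table above. */
data class QueueConfig(
    val name: String,
    val workers: Int,
    val memoryLimitMb: Int,
    val retryLimit: Int,
    val timeout: Duration,
)

val queueConfigs: Map<String, QueueConfig> = listOf(
    QueueConfig("validations", 5, 512, 3, Duration.ofSeconds(60)),
    QueueConfig("processing", 3, 512, 3, Duration.ofSeconds(120)),
    QueueConfig("reporting", 2, 1024, 2, Duration.ofSeconds(600)),
    QueueConfig("default", 2, 256, 3, Duration.ofSeconds(90)),
).associateBy { it.name }

/**
 * Runs [job] until it succeeds or the attempts are used up.
 * Assumption: a retry limit of 3 allows up to 4 attempts in total.
 */
fun <T> runWithRetries(config: QueueConfig, job: (attempt: Int) -> T): T {
    var lastError: Exception? = null
    for (attempt in 1..config.retryLimit + 1) {
        try {
            return job(attempt)
        } catch (e: Exception) {
            lastError = e // remember the failure and try again
        }
    }
    throw IllegalStateException(
        "${config.name}: job failed after ${config.retryLimit} retries", lastError
    )
}
```

If the team instead treats the limit as the total number of attempts, the loop bound becomes `1..config.retryLimit`.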
Quarkus Async Configuration:
# application.properties
# Worker thread pool for blocking and async tasks.
# Virtual threads (Project Loom) are opt-in via @RunOnVirtualThread, not the default.
quarkus.thread-pool.core-threads=2
quarkus.thread-pool.max-threads=10
quarkus.thread-pool.queue-size=100
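import io.smallrye.common.annotation.RunOnVirtualThread
import io.smallrye.mutiny.Uni
import jakarta.enterprise.context.ApplicationScoped
import jakarta.enterprise.inject.Produces
import org.eclipse.microprofile.context.ManagedExecutor

@ApplicationScoped
class AsyncConfig {
    // Bounded executor: at most 10 concurrent tasks and 100 queued tasks
    @Produces
    @ApplicationScoped
    fun managedExecutor(): ManagedExecutor {
        return ManagedExecutor.builder()
            .maxAsync(10)
            .maxQueued(100)
            .build()
    }
}

// Using virtual threads (opt-in, requires Java 21+)
@ApplicationScoped
class SomeService {
    @RunOnVirtualThread
    fun asyncOperation(): Uni<Result> {
        // Takes effect when Quarkus invokes the method as a supported entry point (e.g. a REST endpoint)
        return Uni.createFrom().item { performWork() }
    }
}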
Background Jobs
| Job | Schedule | Purpose |
|---|---|---|
| EscalateOverdueReviewsJob | Hourly | Flag reviews overdue by 3+ days |
| MarkEvidenceForDeletionJob | Daily (midnight) | Mark expired evidence (7+ years) |
| GeneratePeriodicReportsJob | Weekly (Sunday) | Auto-generate draft reports for open periods |
| PruneAuditLogsJob | Monthly | Archive audit logs older than 2 years to cold storage |
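
The thresholds in the first two jobs can be expressed as small pure functions, which makes the boundary cases easy to test independently of the scheduler. This is a sketch under stated assumptions: the function names are hypothetical, "3+ days" is read as 72 hours or more since the review became pending, and "7+ years" is counted in calendar years from the upload date.

```kotlin
import java.time.Duration
import java.time.Instant
import java.time.LocalDate

/** A review counts as overdue once it has been pending for 3 or more full days. */
fun isReviewOverdue(pendingSince: Instant, now: Instant): Boolean =
    Duration.between(pendingSince, now) >= Duration.ofDays(3)

/** Evidence is eligible for deletion once its 7-year retention period has elapsed. */
fun isEvidenceExpired(uploadedOn: LocalDate, today: LocalDate): Boolean =
    !today.isBefore(uploadedOn.plusYears(7))

fun main() {
    val pendingSince = Instant.parse("2026-01-01T00:00:00Z")
    println(isReviewOverdue(pendingSince, pendingSince.plus(Duration.ofDays(4))))
    println(isEvidenceExpired(LocalDate.of(2019, 1, 3), LocalDate.of(2026, 1, 3)))
}
```

Passing `now` and `today` as parameters, rather than reading the clock inside the function, keeps the job logic deterministic in tests.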
Observability
Logging
Quarkus Logging Configuration (JBoss Logging):
- daily: Application logs (7-day rotation)
- error: Application errors, exceptions
- syslog: Critical errors sent to external SIEM
Configuration:
# Quarkus logging configuration
quarkus.log.level=INFO
quarkus.log.category."com.example.esg".level=INFO
quarkus.log.console.json=true
quarkus.log.console.json.pretty-print=false
# File logging
quarkus.log.file.enable=true
quarkus.log.file.path=/var/log/esg-platform/application.log
quarkus.log.file.rotation.max-file-size=10M
quarkus.log.file.rotation.max-backup-index=7
quarkus.log.file.rotation.file-suffix=.yyyy-MM-dd
Structured Logging:
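import jakarta.enterprise.context.ApplicationScoped
import org.jboss.logging.Logger

@ApplicationScoped
class SubmissionService {
    private val logger = Logger.getLogger(SubmissionService::class.java)

    fun processSubmission(submission: Submission) {
        val startedAt = System.nanoTime()
        // ... process the submission ...
        val processingTime = (System.nanoTime() - startedAt) / 1_000_000 // elapsed milliseconds

        // key=value pairs keep the message machine-parseable
        logger.infof(
            "Submission processed: submission_id=%s, tenant_id=%s, state=%s, processing_time_ms=%d",
            submission.id,
            submission.tenantId,
            submission.state,
            processingTime
        )
    }
}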
Metrics
Quarkus Observability Stack:
- Micrometer Metrics: /q/metrics endpoint in Prometheus format
- MicroProfile Health: /q/health/live, /q/health/ready, /q/health/started
- OpenTelemetry Tracing: Distributed tracing with Jaeger/Tempo
- Dev UI: Development-time observability at /q/dev
Production Monitoring (Prometheus + Micrometer + Grafana):
- API response times (histogram)
- Queue depth (gauge)
- Submission throughput (counter)
- Error rates (counter)
- JVM metrics (heap, threads, GC) for JVM mode
- Native metrics (RSS memory) for native mode
Configuration:
# Enable Micrometer metrics
quarkus.micrometer.enabled=true
quarkus.micrometer.export.prometheus.enabled=true
quarkus.micrometer.binder.http-server.enabled=true
quarkus.micrometer.binder.jvm.enabled=true
# MicroProfile Health
quarkus.smallrye-health.root-path=/q/health
quarkus.smallrye-health.liveness-path=/q/health/live
quarkus.smallrye-health.readiness-path=/q/health/ready
# OpenTelemetry tracing
quarkus.otel.enabled=true
quarkus.otel.exporter.otlp.endpoint=http://jaeger:4317
quarkus.otel.traces.exporter=otlp
quarkus.otel.metrics.exporter=none
Custom Application Metrics:
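import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Tag
import io.micrometer.core.instrument.Timer
import jakarta.enterprise.context.ApplicationScoped
import jakarta.inject.Inject
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

@ApplicationScoped
class SubmissionMetrics {
    @Inject
    lateinit var registry: MeterRegistry

    // Gauges sample a live object, so keep one mutable holder per queue
    private val queueDepths = ConcurrentHashMap<String, AtomicInteger>()

    fun recordSubmissionProcessed(state: String, processingTime: Long) {
        // Count processed submissions, tagged by resulting state
        registry.counter(
            "esg_submissions_total",
            "state", state
        ).increment()

        // Record processing duration (milliseconds in, exported as seconds)
        Timer.builder("esg_submission_processing_time")
            .tag("state", state)
            .register(registry)
            .record(processingTime, TimeUnit.MILLISECONDS)
    }

    fun updateQueueDepth(queue: String, depth: Int) {
        // Register the gauge once per queue, then update the held value
        queueDepths.computeIfAbsent(queue) { q ->
            AtomicInteger(0).also {
                registry.gauge("esg_queue_depth", listOf(Tag.of("queue", q)), it)
            }
        }.set(depth)
    }
}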
Key Metrics:
esg_submissions_total{state="approved"} 5000
http_server_requests_seconds{uri="/submissions",quantile="0.95"} 0.45
esg_queue_depth{queue="validations"} 12
esg_submission_processing_time_seconds{state="processed",quantile="0.95"} 0.15
process_resident_memory_bytes 104857600
jvm_memory_used_bytes{area="heap"} 268435456
Distributed Tracing
Quarkus OpenTelemetry Integration:
- Automatic span creation for JAX-RS endpoints
- Database query tracing with Hibernate
- Message queue tracing with SmallRye Reactive Messaging
- Custom span instrumentation for business logic
Configuration:
# OpenTelemetry tracing
quarkus.otel.enabled=true
quarkus.otel.exporter.otlp.endpoint=http://jaeger-collector:4317
quarkus.otel.traces.exporter=otlp
quarkus.otel.traces.sampler=parentbased_traceidratio
quarkus.otel.traces.sampler.arg=1.0
# Service identification
quarkus.application.name=esg-platform
quarkus.otel.resource.attributes=service.name=esg-platform,service.version=1.0.0,deployment.environment=production
Custom Span Instrumentation:
import io.opentelemetry.api.trace.Span
import io.opentelemetry.api.trace.StatusCode
import io.opentelemetry.api.trace.Tracer
import jakarta.enterprise.context.ApplicationScoped
import jakarta.inject.Inject

@ApplicationScoped
class SubmissionService {
    @Inject
    lateinit var tracer: Tracer

    fun processSubmission(submission: Submission) {
        val span = tracer.spanBuilder("process-submission")
            .setAttribute("submission.id", submission.id.toString())
            .setAttribute("tenant.id", submission.tenantId.toString())
            .setAttribute("metric.type", submission.metricType)
            .startSpan()
        try {
            span.makeCurrent().use {
                // Business logic here
                validateSubmission(submission)
                persistSubmission(submission)
                span.setStatus(StatusCode.OK)
            }
        } catch (e: Exception) {
            span.setStatus(StatusCode.ERROR, e.message ?: "Unknown error")
            span.recordException(e)
            throw e
        } finally {
            span.end()
        }
    }
}
Trace Visualization:
- Use Jaeger UI to visualize end-to-end request flows
- Identify performance bottlenecks across services
- Correlate logs with traces using trace IDs
- Monitor error rates and latency distributions
Alerting
Alerts (PagerDuty/Slack):
- API p95 latency > 1s for 5 minutes
- Queue depth > 100 for 10 minutes
- Error rate > 5% for 5 minutes
- Disk usage > 80%
- Health check failures (3 consecutive failures)
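
Several of the alerts above fire only when a condition holds for a sustained period. The sketch below shows one way to model that "above threshold for N minutes" behaviour. The class `SustainedThresholdAlert` is hypothetical, and in practice this logic would normally live in the alerting system's rule engine rather than in application code.

```kotlin
import java.time.Duration
import java.time.Instant

/** Fires only after the value has stayed above the threshold for the whole hold period. */
class SustainedThresholdAlert(
    private val threshold: Double,
    private val holdFor: Duration,
) {
    private var breachedSince: Instant? = null

    /** Feed one observation; returns true when the alert should be firing. */
    fun observe(value: Double, at: Instant): Boolean {
        if (value <= threshold) {
            breachedSince = null // condition cleared, reset the clock
            return false
        }
        val since = breachedSince ?: at.also { breachedSince = it }
        return Duration.between(since, at) >= holdFor
    }
}

fun main() {
    // "Error rate > 5% for 5 minutes"
    val alert = SustainedThresholdAlert(threshold = 0.05, holdFor = Duration.ofMinutes(5))
    val start = Instant.parse("2026-01-03T10:00:00Z")
    println(alert.observe(0.08, start))
    println(alert.observe(0.09, start.plusSeconds(300)))
}
```

Resetting the clock on any observation at or below the threshold means a brief recovery restarts the hold period, which avoids paging on short spikes.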
Development Observability
Quarkus Dev UI (http://localhost:8080/q/dev in dev mode):
- Configuration Editor: Browse and modify application properties
- Arc Container: Inspect CDI beans and dependencies
- Health Checks: Test liveness, readiness, and startup probes
- Metrics: View Micrometer metrics in real time
- OpenAPI: Interactive Swagger UI for API testing
- Database Console: Query datasources (H2, PostgreSQL)
- Scheduler: View and trigger scheduled jobs
- Continuous Testing: Run and monitor tests on code changes
Dev Services (Automatic Testcontainers):
- PostgreSQL: Auto-started on port 5432 (or random)
- RabbitMQ: Auto-started for message queue testing
- Redis: Auto-started for caching (if configured)
- No manual setup required: Quarkus starts the containers automatically
Development Mode Features:
# Start dev mode with hot reload and continuous testing
./mvnw quarkus:dev
# Console commands:
# r - Re-run tests
# s - Force restart
# d - Open Dev UI in browser
# h - Show help
# q - Quit
Reflection Configuration for Native Images:
# Quarkus automatically detects most reflection needs
# Manual registration if needed:
quarkus.native.additional-build-args=\
-H:+ReportExceptionStackTraces,\
--initialize-at-run-time=io.netty.handler.ssl.ReferenceCountedOpenSslEngine
Backups
| Resource | Frequency | Retention | RTO | RPO |
|---|---|---|---|---|
| PostgreSQL | Continuous (WAL archiving) | 30 days | 1 hour | 5 minutes |
| S3 Evidence Bucket | Versioning enabled | 7 years | N/A (immutable) | Real-time |
| Application Code | Git (on push) | Indefinite | 15 minutes | N/A |
| Secrets | AWS Secrets Manager backup | 90 days | 30 minutes | N/A |
Database Backup:
# Automated via AWS RDS automated backups
# Snapshot retention: 30 days
# Point-in-time recovery (PITR): 5-minute granularity
Disaster Recovery
RTO & RPO Targets
- RTO (Recovery Time Objective): 4 hours (critical period: data collection deadlines)
- RPO (Recovery Point Objective): 1 hour (max acceptable data loss)
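
To make the two targets concrete: RTO is measured from the start of the outage to service restoration, and RPO from the last recoverable write to the start of the outage. The sketch below checks a recovery timeline against the targets above. `RecoveryTimeline` and the helper functions are hypothetical names for illustration, for example to score a DR drill.

```kotlin
import java.time.Duration
import java.time.Instant

/** Timeline of one outage or DR drill (hypothetical record shape). */
data class RecoveryTimeline(
    val lastRecoverableWrite: Instant, // newest data that survived the outage
    val outageStart: Instant,
    val serviceRestored: Instant,
)

val rtoTarget: Duration = Duration.ofHours(4)
val rpoTarget: Duration = Duration.ofHours(1)

/** Time the service was unavailable. */
fun achievedRto(t: RecoveryTimeline): Duration =
    Duration.between(t.outageStart, t.serviceRestored)

/** Window of writes that were lost. */
fun achievedRpo(t: RecoveryTimeline): Duration =
    Duration.between(t.lastRecoverableWrite, t.outageStart)

fun meetsTargets(t: RecoveryTimeline): Boolean =
    achievedRto(t) <= rtoTarget && achievedRpo(t) <= rpoTarget
```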
DR Strategy
Multi-AZ Deployment (AWS):
- RDS: Multi-AZ automatic failover (< 2 min)
- Application servers: Auto Scaling Group across 2+ AZs
- S3: Cross-region replication (CRR) to DR region
Failover Procedure:
1. Detect outage (CloudWatch alarms)
2. Promote RDS standby (automatic)
3. Route 53 DNS failover to DR region load balancer
4. Notify team via PagerDuty
5. Verify application health checks (/q/health/live, /q/health/ready)
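
Step 5 can be automated as a bounded polling loop. The sketch below is a minimal illustration with a hypothetical `waitUntilHealthy` helper: the probe is passed in as a function so the loop can be tested without a running server. In a real runbook the probe would be an HTTP GET against /q/health/ready that returns true on a 200 response.

```kotlin
/**
 * Polls a health probe until it passes, or gives up after maxAttempts.
 * Returns true as soon as one probe succeeds.
 */
fun waitUntilHealthy(
    maxAttempts: Int,
    pauseMillis: Long,
    probe: () -> Boolean,
): Boolean {
    repeat(maxAttempts) { attempt ->
        if (probe()) return true
        // Pause between attempts, but not after the final one
        if (attempt < maxAttempts - 1) Thread.sleep(pauseMillis)
    }
    return false
}

fun main() {
    var calls = 0
    // Stand-in probe that becomes healthy on the third call
    val healthy = waitUntilHealthy(maxAttempts = 5, pauseMillis = 10) { ++calls >= 3 }
    println("healthy=$healthy after $calls probes")
}
```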
Data Residency
GDPR Requirement: EU customer data must remain in EU regions.
Implementation:
- Tenant metadata includes data_region (e.g., eu-west-1, us-east-1)
- S3 buckets per region: evidence-eu-west-1, evidence-us-east-1
- Database: Regional RDS instances (future multi-region support)
val evidenceBucket = "evidence-${tenant.dataRegion}"
storageService.putFile(bucket = evidenceBucket, key = path, file = file.bytes)
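
Because the bucket name is derived directly from tenant metadata, a misconfigured `data_region` could silently route EU data elsewhere. The sketch below adds a guard before the bucket name is built. It is illustrative only: the `Tenant` shape, the `euResident` flag, and `evidenceBucketFor` are assumptions, and the EU check relies on AWS EU region names starting with `eu-`.

```kotlin
/** Regions that have an evidence bucket, taken from the list above. */
val regionsWithEvidenceBuckets = setOf("eu-west-1", "us-east-1")

/** Hypothetical tenant shape; euResident marks tenants whose data must stay in the EU. */
data class Tenant(val id: String, val dataRegion: String, val euResident: Boolean)

/** Resolves the evidence bucket, refusing regions that would break residency. */
fun evidenceBucketFor(tenant: Tenant): String {
    require(tenant.dataRegion in regionsWithEvidenceBuckets) {
        "No evidence bucket for region '${tenant.dataRegion}'"
    }
    require(!tenant.euResident || tenant.dataRegion.startsWith("eu-")) {
        "Tenant ${tenant.id} requires an EU region, got '${tenant.dataRegion}'"
    }
    return "evidence-${tenant.dataRegion}"
}

fun main() {
    println(evidenceBucketFor(Tenant("t1", "eu-west-1", euResident = true)))
}
```

Failing fast with an exception is a deliberate choice here: an upload that cannot be placed in a compliant bucket should be rejected rather than stored in a fallback region.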
Acceptance Criteria
- API p95 latency < 500ms under normal load (100 users)
- Queue workers configured with correct retry/timeout limits
- PostgreSQL automated backups enabled (30-day retention)
- S3 versioning enabled for evidence bucket
- CloudWatch alarms configured for latency, queue depth, errors
- RDS Multi-AZ enabled for high availability
- Scheduled jobs run as expected (cron verified)
Cross-References
Change Log
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-03 | Senior Product Architect | Initial NFR specification |