Troubleshooting Guide

Common errors, their root causes, and solutions for the Verity platform.


Common Errors

PostgreSQL INET Returns IPv4Address

Symptom: Pydantic validation error when loading AccessEvent from the database:

ValidationError: 1 validation error for AccessEvent
source_ip
  Input should be a valid string [type=string_type]

Cause: asyncpg returns PostgreSQL INET columns as Python ipaddress objects (IPv4Address or IPv6Address), not strings. Pydantic's strict validation rejects them.

Solution: The AccessEvent model includes a field_validator that coerces IPv4Address to string:

@field_validator("source_ip", mode="before")
@classmethod
def _coerce_source_ip(cls, v: object) -> Optional[str]:
    if v is None:
        return None
    return str(v)

If you encounter this with a new model, add the same validator pattern.
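The coercion itself is just str() over the ipaddress object, and it round-trips cleanly for IPv6 values as well. A minimal stdlib-only sketch of what the validator does (no Pydantic required):

```python
from ipaddress import ip_address

def coerce_source_ip(v):
    """Mirror of the model validator: pass None through, stringify everything else."""
    if v is None:
        return None
    return str(v)

# asyncpg hands back ipaddress objects for INET columns
assert coerce_source_ip(ip_address("10.0.0.1")) == "10.0.0.1"
assert coerce_source_ip(ip_address("::1")) == "::1"
assert coerce_source_ip(None) is None
```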


SQLAlchemy metadata Reserved Name

Symptom: SQLAlchemy error or missing data when the Pydantic model has a metadata field:

AttributeError: 'Principal' object has no attribute 'metadata'

Cause: SQLAlchemy's declarative base uses metadata as a reserved class attribute for table metadata. The ORM column must be named metadata_ to avoid the conflict.

Solution: The domain models use AliasChoices to accept both names:

metadata: dict = Field(
    default_factory=dict,
    validation_alias=AliasChoices("metadata_", "metadata")
)

With from_attributes=True and populate_by_name=True on VerityModel, Pydantic reads metadata_ from the ORM object and exposes it as metadata in the API.
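As a sketch of how the alias resolves (assuming Pydantic v2; the Principal field set and OrmRow here are illustrative stand-ins, not the real models):

```python
from pydantic import AliasChoices, BaseModel, ConfigDict, Field

class Principal(BaseModel):
    model_config = ConfigDict(from_attributes=True, populate_by_name=True)
    metadata: dict = Field(
        default_factory=dict,
        validation_alias=AliasChoices("metadata_", "metadata"),
    )

# Stand-in for a SQLAlchemy row, where the column must be named metadata_
class OrmRow:
    metadata_ = {"team": "platform"}

# The alias picks up metadata_ from the ORM attribute...
p = Principal.model_validate(OrmRow())
assert p.metadata == {"team": "platform"}

# ...while plain dicts using the public name still validate
assert Principal.model_validate({"metadata": {"a": 1}}).metadata == {"a": 1}
```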


ClickHouse Nullable(LowCardinality()) Ordering

Symptom: ClickHouse error when ordering or grouping by a Nullable(LowCardinality(String)) column:

DB::Exception: Argument at index 1 for function less must not be Nullable

Cause: ClickHouse does not support direct comparison or ordering on Nullable(LowCardinality(...)) columns in some contexts (notably ORDER BY and GROUP BY), because comparison functions such as less reject the Nullable-wrapped argument.

Solution: Use assumeNotNull() or coalesce() in your queries:

-- Option 1: assumeNotNull
SELECT * FROM audit_events
ORDER BY assumeNotNull(risk_level);

-- Option 2: coalesce with a default
SELECT * FROM audit_events
ORDER BY coalesce(risk_level, 'UNKNOWN');

When creating new ClickHouse tables, prefer LowCardinality(String) without Nullable where possible, using an empty string as the default.
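For example, a hypothetical table definition following that guidance (table and column names are illustrative):

```sql
CREATE TABLE audit_events
(
    event_id   UUID,
    -- Non-Nullable LowCardinality with an empty-string default avoids the
    -- comparison restriction entirely
    risk_level LowCardinality(String) DEFAULT '',
    ts         DateTime
)
ENGINE = MergeTree
ORDER BY (ts, risk_level);
```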


Temporal DB=postgres12 (Not postgresql)

Symptom: Temporal fails to start with database connection errors:

unable to connect to database: invalid dsn

Cause: The Temporal auto-setup image requires DB=postgres12 as the database type environment variable, not DB=postgresql or DB=postgres.

Solution: Ensure the docker-compose or Helm configuration uses:

environment:
  DB: postgres12          # NOT "postgresql" or "postgres"
  DB_PORT: "5432"
  POSTGRES_USER: verity
  POSTGRES_PWD: verity_dev
  POSTGRES_SEEDS: postgres

Python Dashed Module Names in Containers

Symptom: ModuleNotFoundError when importing a service in a Docker container:

ModuleNotFoundError: No module named 'api-gateway'

Cause: Python does not allow hyphens in module/package names. Directory names like api-gateway/ cannot be imported directly.

Solution: Verity services use underscored package names internally (e.g., api_gateway/) while the directory and Docker image names use hyphens. Ensure your Dockerfile WORKDIR and CMD reference the correct Python module path:

# Correct
CMD ["python", "-m", "api_gateway.main"]

# Wrong
CMD ["python", "-m", "api-gateway.main"]
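The restriction follows directly from Python's identifier rules, which you can check from the REPL:

```python
# Module names must be valid identifiers; a hyphen disqualifies them,
# which is why "import api-gateway" can never work.
assert not "api-gateway".isidentifier()
assert "api_gateway".isidentifier()
```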

structlog add_logger_name Crash with PrintLoggerFactory

Symptom: Application crashes on startup with:

AttributeError: 'PrintLogger' object has no attribute 'name'

Cause: The structlog.stdlib.add_logger_name processor expects a stdlib Logger object with a name attribute. When using structlog.PrintLoggerFactory (which creates PrintLogger instances), this processor fails because PrintLogger has no name attribute.

Solution: Verity's logging configuration intentionally omits add_logger_name from the processor chain and uses PrintLoggerFactory(sys.stdout) for direct JSON output:

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.stdlib.add_log_level,        # ← This is fine
        # structlog.stdlib.add_logger_name,    # ← DO NOT use with PrintLoggerFactory
        structlog.processors.TimeStamper(fmt="iso"),
        ...
        structlog.processors.JSONRenderer(),
    ],
    logger_factory=structlog.PrintLoggerFactory(sys.stdout),
)

If you need the logger name, switch to structlog.stdlib.LoggerFactory() and use the stdlib logging integration.
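The failure mode is easy to reproduce without structlog: any processor that reads logger.name breaks on a logger object that lacks the attribute. A simplified re-creation (FakePrintLogger and the processor body are illustrative, not structlog's actual code):

```python
class FakePrintLogger:
    """Stands in for structlog.PrintLogger, which has no .name attribute."""

class StdlibStyleLogger:
    """Stands in for a stdlib logging.Logger, which does have .name."""
    name = "api_gateway"

def add_logger_name(logger, method_name, event_dict):
    # Simplified: the real processor ultimately reads logger.name
    event_dict["logger"] = logger.name
    return event_dict

# Works with a stdlib-style logger...
assert add_logger_name(StdlibStyleLogger(), "info", {})["logger"] == "api_gateway"

# ...but raises AttributeError with a PrintLogger-style factory
try:
    add_logger_name(FakePrintLogger(), "info", {})
except AttributeError:
    pass
else:
    raise AssertionError("expected AttributeError")
```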


Health Check Endpoints

All Verity services expose the following endpoints:

Endpoint       Method  Purpose             Expected Response
/health        GET     Liveness probe      200 OK with {"status": "ok"}
/health/ready  GET     Readiness probe     200 OK when all dependencies are healthy
/v1/metrics    GET     Prometheus metrics  Prometheus text format

Check Health from Inside the Cluster

# API Gateway health
kubectl exec -n verity -it deploy/verity-api-gateway -- \
  curl -s http://localhost:8000/health | python -m json.tool

# Check readiness
kubectl exec -n verity -it deploy/verity-api-gateway -- \
  curl -s http://localhost:8000/health/ready | python -m json.tool

Check via Port Forward

kubectl port-forward -n verity svc/verity-api-gateway 8000:8000
curl http://localhost:8000/health

Log Analysis Commands

View Structured Logs

# Recent logs from a service (JSON format)
kubectl logs -n verity -l app.kubernetes.io/component=decay-engine --tail=50

# Parse JSON logs with jq
kubectl logs -n verity -l app.kubernetes.io/component=decay-engine --tail=100 | \
  jq -r '. | "\(.timestamp) [\(.level)] \(.event)"'

# Filter for errors
kubectl logs -n verity -l app.kubernetes.io/component=api-gateway --tail=500 | \
  jq 'select(.level == "error")'

# Search for a specific trace ID
kubectl logs -n verity --all-containers --tail=1000 | \
  jq 'select(.trace_id == "abc-123-trace")'

Follow Logs in Real-Time

# Follow a single service
kubectl logs -n verity -l app.kubernetes.io/component=ingestion -f

# Follow with jq formatting
kubectl logs -n verity -l app.kubernetes.io/component=ingestion -f | \
  jq -r '. | "\(.timestamp) [\(.level)] \(.service): \(.event)"'

Aggregate Logs Across Services

# All errors in the last 5 minutes
kubectl logs -n verity --all-containers --since=5m | \
  jq 'select(.level == "error")' | \
  jq -r '. | "\(.service): \(.event)"'

Database Connection Issues

PostgreSQL Connection Refused

Symptom: Service logs show:

ConnectionRefusedError: [Errno 111] Connection refused

Diagnosis:

# Check PostgreSQL pod
kubectl get pods -n verity -l app.kubernetes.io/component=postgresql

# Check PostgreSQL logs
kubectl logs -n verity -l app.kubernetes.io/component=postgresql --tail=50

# Test connectivity from a service pod
kubectl exec -n verity -it deploy/verity-api-gateway -- \
  python -c "import asyncio, asyncpg; asyncio.run(asyncpg.connect('postgresql://verity:verity@verity-postgres:5432/verity'))"

Common causes:

  1. PostgreSQL pod not running or restarting
  2. Network policy blocking the connection
  3. Incorrect DB_HOST in ConfigMap
  4. Connection pool exhausted (max_connections reached)

ClickHouse Connection Timeout

# Test ClickHouse connectivity
kubectl exec -n verity -it deploy/verity-audit-writer -- \
  curl -s "http://verity-clickhouse:8123/?query=SELECT%201"

Kafka Consumer Group Debugging

Consumer Not Joining Group

Symptom: Service logs show:

WARNING: Consumer group decay-engine is rebalancing...

Diagnosis:

# Describe the consumer group
kubectl exec -n verity -it deploy/verity-ingestion -- \
  kafka-consumer-groups.sh \
  --bootstrap-server $KAFKA_BOOTSTRAP_SERVERS \
  --group decay-engine \
  --describe

# Check for partition assignments
kubectl exec -n verity -it deploy/verity-ingestion -- \
  kafka-consumer-groups.sh \
  --bootstrap-server $KAFKA_BOOTSTRAP_SERVERS \
  --group decay-engine \
  --members --verbose

Common causes:

  1. Too many consumers for the number of partitions
  2. Consumer session timeout too short
  3. Consumer processing too slow (exceeds max.poll.interval.ms)

Messages Not Being Consumed

# Check topic exists and has messages
kubectl exec -n verity -it deploy/verity-ingestion -- \
  kafka-topics.sh \
  --bootstrap-server $KAFKA_BOOTSTRAP_SERVERS \
  --describe \
  --topic verity.events.normalised

# Check end offsets
kubectl exec -n verity -it deploy/verity-ingestion -- \
  kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list $KAFKA_BOOTSTRAP_SERVERS \
  --topic verity.events.normalised \
  --time -1
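A quick way to interpret the output: per-partition lag is the log-end offset minus the group's committed offset. A small helper for eyeballing it (the offset numbers below are made up):

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log-end offset minus the group's committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }

end = {0: 1500, 1: 1480, 2: 900}
committed = {0: 1500, 1: 1200}  # partition 2 has no committed offset yet
lag = consumer_lag(end, committed)
assert lag == {0: 0, 1: 280, 2: 900}
```

Zero lag on every partition means the consumer is caught up; a persistently growing lag means it is not keeping pace with producers.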

Quick Diagnostic Checklist

When investigating an issue, work through this checklist:

  1. Check pod status: kubectl get pods -n verity
  2. Check recent events: kubectl get events -n verity --sort-by=.metadata.creationTimestamp | tail -20
  3. Check service logs: kubectl logs -n verity deploy/verity-<service> --tail=100
  4. Check health endpoints: curl http://localhost:<port>/health
  5. Check Prometheus alerts: Port-forward Prometheus and check /alerts
  6. Check database connectivity: Test from within a pod
  7. Check Kafka consumer lag: Use kafka-consumer-groups.sh --describe
  8. Check resource utilisation: kubectl top pods -n verity
  9. Check recent deployments: helm history verity -n verity
  10. Check network policies: kubectl get networkpolicies -n verity