Python Style Guide
Target: Python 3.12+ | Formatter/Linter: Ruff | Config: backend/pyproject.toml
This guide codifies the conventions used across the backend, and prescribes the target style for code written during the refactor (Phases 3-6). Existing code should be migrated incrementally -- don't reformat entire files in unrelated PRs.
Formatting
Enforced by ruff format (Black-compatible).
- Line length: 120 characters.
- Indent: 4 spaces. No tabs.
- Trailing commas: Required on multi-line function signatures, arguments, collections.
- Quotes: Double quotes (") for strings. Single quotes are acceptable in f-string expressions and dict keys inside f-strings where avoiding escapes improves readability.
Run: ruff format backend/
Imports
Enforced by ruff's isort rules (rule set I).
Grouping -- three blocks separated by a blank line:
import asyncio  # 1. stdlib
from pathlib import Path

import numpy as np  # 2. third-party
from fastapi import APIRouter, HTTPException
from sqlalchemy.orm import Session

from backend.config import get_data_dir  # 3. local (absolute)
from .database import get_db  # or relative
Rules:
- Within the backend package, use relative imports for sibling/child modules: from .database import get_db, from ..utils.audio import load_audio.
- Absolute imports are fine for top-level references from entry points (main.py, server.py).
- Never use wildcard imports (from module import *).
- One import per line for from X import Y when there are 4+ names; below that, comma-separated is fine.
- Lazy imports are acceptable for heavy dependencies (torch, transformers, mlx) inside functions to reduce startup time. Add a comment: # lazy: heavy import.
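The lazy-import rule could look roughly like this. To keep the sketch runnable without GPU libraries, the stdlib statistics module stands in for a heavy dependency such as torch; mean_level is a hypothetical function name:

```python
def mean_level(values: list[float]) -> float:
    """Return the mean of values, importing the dependency only when called."""
    import statistics  # lazy: heavy import (stdlib stand-in for torch/transformers/mlx)

    return statistics.mean(values)


result = mean_level([1.0, 2.0, 3.0])
```

The import cost is paid on the first call instead of at module import time. Later calls reuse the module cached in sys.modules.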
Type Annotations
Python 3.12 means we use built-in generics and union syntax natively. No from __future__ import annotations, no typing.List/typing.Dict.
# Yes
def process(items: list[str], config: dict[str, int] | None = None) -> tuple[int, str]: ...
# No
from typing import List, Dict, Optional, Tuple
def process(items: List[str], config: Optional[Dict[str, int]] = None) -> Tuple[int, str]: ...
What to annotate:
- All public function signatures (parameters + return type).
- Private functions: parameters at minimum; return type encouraged.
- Module-level variables: only when the type isn't obvious from the assignment.
- Route handlers: parameters are annotated via FastAPI's dependency injection. Add explicit -> SomeResponse return types when the route doesn't use response_model.
Imports from typing that are still needed (no built-in equivalent):
Literal, TypeAlias, Protocol, runtime_checkable, Any, ClassVar, TypeVar, overload, TYPE_CHECKING.
Use collections.abc for abstract types: Sequence, Mapping, Iterable, Iterator, Generator, Callable. (typing.Callable is a deprecated alias of collections.abc.Callable.)
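As an illustration of how these imports combine, here is a minimal sketch. Device, SupportsDuration, Clip, total_duration, and pick_device are all hypothetical names, not part of the backend:

```python
from collections.abc import Iterable, Mapping
from typing import Literal, Protocol

Device = Literal["cpu", "cuda", "mps"]


class SupportsDuration(Protocol):
    """Anything that exposes a duration in seconds."""

    duration: float


class Clip:
    """Minimal concrete type that satisfies SupportsDuration structurally."""

    def __init__(self, duration: float) -> None:
        self.duration = duration


def total_duration(clips: Iterable[SupportsDuration]) -> float:
    """Sum the duration of every clip."""
    return sum(clip.duration for clip in clips)


def pick_device(available: Mapping[str, bool], *, default: Device = "cpu") -> str:
    """Return the first device flagged as available, or the default."""
    for name, ok in available.items():
        if ok:
            return name
    return default


seconds = total_duration([Clip(1.5), Clip(2.5)])
device = pick_device({"cuda": False, "mps": True})
```

Accepting Iterable and Mapping rather than list and dict lets callers pass generators, tuples, or read-only views without conversion.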
Naming
| Thing | Convention | Example |
|---|---|---|
| Module | snake_case | task_queue.py |
| Class | PascalCase | ProgressManager |
| Function / method | snake_case | create_profile |
| Variable | snake_case | sample_rate |
| Constant | UPPER_SNAKE_CASE | DEFAULT_SAMPLE_RATE |
| Private | _leading_underscore | _generation_queue |
| Type alias | PascalCase | EffectChain = list[dict[str, Any]] |
Specific conventions:
- Database ORM models imported with DB prefix alias: from .database import VoiceProfile as DBVoiceProfile.
- Pydantic models use descriptive suffixes: VoiceProfileCreate, VoiceProfileResponse, GenerationRequest.
- Backend classes use engine-name prefix: MLXTTSBackend, PyTorchSTTBackend.
Docstrings
Google style. Required on all public functions, classes, and modules.
def combine_voice_prompts(
    profile_dir: Path,
    *,
    target_sr: int = 24000,
) -> tuple[np.ndarray, int]:
    """Load and concatenate all voice prompt files for a profile.

    Reads .wav/.mp3/.flac files from the profile directory, resamples to
    the target sample rate, normalizes, and concatenates into a single array.

    Args:
        profile_dir: Path to the voice profile directory containing audio files.
        target_sr: Target sample rate for the output. Defaults to 24000.

    Returns:
        Tuple of (concatenated audio array, sample rate).

    Raises:
        FileNotFoundError: If profile_dir does not exist.
        ValueError: If no valid audio files are found.
    """
Short form is fine for simple functions:
def get_db_path() -> Path:
    """Get the path to the SQLite database file."""
When to skip: Private helpers under ~5 lines where the name and signature make intent obvious.
Module docstrings: A single sentence at the top of every file describing its purpose.
"""Voice profile CRUD operations."""
Comments
Comments explain why, not what. If the code needs a comment to explain what it does, the code should be rewritten to be clearer. The exceptions are non-obvious performance choices, external constraints, and concurrency/race-condition reasoning -- those always deserve a comment.
No section dividers
Do not use ASCII dividers to create visual sections in files:
# No -- any of these:
# ============================================
# GENERATION ENDPOINTS
# ============================================
# ---------------------------------------------------------------------------
# Device detection
# ---------------------------------------------------------------------------
# --- Load model --------------------------------------------------
If a file needs section dividers to be navigable, the file is too long. Split it into modules. Within a function, if you need labeled sections to follow the logic, extract those sections into named functions.
Inline comments
Inline comments (end-of-line) are fine when they add information the code can't express:
# Yes -- explains a non-obvious constraint or gives context:
audio, sr = load_audio(path, sr=24000) # Qwen expects 24kHz mono
_generation_queue: asyncio.Queue = None # type: ignore # initialized at startup
"tauri://localhost", # Tauri webview (macOS)
# No -- restates the code:
# Check if profile name already exists
existing = db.query(DBVoiceProfile).filter_by(name=data.name).first()
# Delete from database
db.delete(sample)
# Update fields
profile.name = data.name
Delete comments that narrate what the next line of code obviously does. If the function name, variable name, or method call already communicates intent, the comment is noise.
Block comments
Use block comments for why explanations -- constraints, workarounds, non-obvious decisions:
# PyInstaller + multiprocessing: child processes re-execute the frozen binary
# with internal arguments. freeze_support() handles this and exits early.
multiprocessing.freeze_support()
# Mark any stale "generating" records as failed -- these are leftovers
# from a previous process that was killed mid-generation.
db.query(Generation).filter_by(status="generating").update({"status": "failed"})
Keep block comments tight. Two to three lines is normal. If you need a paragraph, it probably belongs in the docstring or a design doc.
Linter/type-checker suppression
Always add a reason after noqa and type: ignore:
import intel_extension_for_pytorch # noqa: F401 -- side-effect import enables XPU
_queue: asyncio.Queue = None # type: ignore[assignment] # initialized at startup
Bare # noqa or # type: ignore with no explanation are not allowed.
TODO / FIXME
Use sparingly. Every TODO must include a brief description of what needs doing. Don't use them as a substitute for tracking work properly:
# TODO: replace with async SQLAlchemy once CRUD modules are migrated (Phase 5)
result = await asyncio.to_thread(profiles.get_profile, profile_id, db)
Never commit HACK, XXX, or FIXME -- fix the problem or file an issue.
Commented-out code
Delete it. That's what git is for. If you need to document that something was intentionally removed, a short tombstone comment is acceptable:
# Removed config.json-only check -- too lenient, doesn't confirm weights exist.
Error Handling
The refactor is standardizing on a two-layer pattern:
1. Domain layer -- raise plain exceptions
CRUD modules and services raise ValueError, FileNotFoundError, or (post-refactor) custom exceptions defined in backend/errors.py:
# backend/errors.py (to be created in Phase 4)
class NotFoundError(Exception):
    """Raised when a requested resource does not exist."""

class ConflictError(Exception):
    """Raised on uniqueness constraint violations."""
# In a service or CRUD module:
raise NotFoundError(f"Profile {profile_id} not found")
2. Route layer -- translate to HTTPException
Route handlers catch domain exceptions and convert:
@router.post("/profiles")
async def create_profile(data: VoiceProfileCreate, db: Session = Depends(get_db)):
    try:
        return await profiles.create_profile(data, db)
    except ConflictError as e:
        raise HTTPException(status_code=409, detail=str(e))
Background tasks catch Exception broadly, log with logger.exception(), and update the task status to "failed".
Never: silently swallow exceptions, use bare except:, or catch BaseException.
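The background-task rule above might be sketched as follows. This is a minimal, self-contained illustration: task_status is a hypothetical in-memory stand-in for the database row, run_generation is a made-up name, and the "completed" status is an assumption (the guide only names "generating" and "failed"):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

# Hypothetical in-memory status store; real code would update a database record.
task_status: dict[str, str] = {}


async def run_generation(task_id: str, *, should_fail: bool = False) -> None:
    """Run one background job and record its final status."""
    task_status[task_id] = "generating"
    try:
        await asyncio.sleep(0)  # stand-in for the real work
        if should_fail:
            raise RuntimeError("model crashed")
        task_status[task_id] = "completed"
    except Exception:
        # Broad catch is deliberate: no caller awaits a background task,
        # so an escaped exception would otherwise go unnoticed.
        logger.exception("Generation %s failed", task_id)
        task_status[task_id] = "failed"


async def main() -> None:
    await asyncio.gather(
        run_generation("a"),
        run_generation("b", should_fail=True),
    )


asyncio.run(main())
```

Catching Exception rather than BaseException keeps cancellation and KeyboardInterrupt propagating normally.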
Async
Rules for the refactor
- Don't declare async def unless the function awaits something. Several service modules still declare async def without awaiting -- these should be migrated to sync functions with asyncio.to_thread() at the route layer, or to real async SQLAlchemy.
- CPU-bound work (audio processing, numpy operations) goes through asyncio.to_thread():
    audio, sr = await asyncio.to_thread(load_audio, source_path)
- GPU-bound TTS inference is serialized through the generation queue (services/task_queue.py). Never call a backend's generate() directly from a route handler.
- Fire-and-forget tasks: use asyncio.create_task() and track the task reference to prevent garbage collection:
    task = asyncio.create_task(some_coro())
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
Logging
Use the logging module. Not print().
import logging
logger = logging.getLogger(__name__)
logger.info("Loading model %s on %s", model_name, device)
logger.warning("Cache miss for %s, downloading", repo_id)
logger.exception("Generation %s failed", generation_id)  # logs traceback automatically
Rules:
- Use %s-style placeholders in log calls (not f-strings). This avoids formatting the string if the log level is filtered out.
- Use logger.exception() inside except blocks -- it captures the traceback.
- Logger name should be __name__ (yields backend.utils.audio, etc.).
- Existing print() calls should be migrated to logging as files are touched during the refactor.
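The lazy-formatting point can be checked with a short experiment. The Expensive class is a hypothetical helper that counts how often it is rendered to a string:

```python
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # DEBUG messages are filtered out
logger.addHandler(logging.NullHandler())


class Expensive:
    """Counts how often it is rendered to a string."""

    renders = 0

    def __str__(self) -> str:
        Expensive.renders += 1
        return "expensive"


value = Expensive()

logger.debug("state: %s", value)  # filtered, so __str__ is never called
lazy_renders = Expensive.renders

logger.debug(f"state: {value}")  # the f-string is built before the call is filtered
eager_renders = Expensive.renders
```

With %s placeholders the argument is only formatted if a record is actually emitted. The f-string pays the formatting cost on every call regardless of level.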
Constants
- Define at module level in the file where they're primarily used.
- Use UPPER_SNAKE_CASE.
- Shared/cross-cutting constants (sample rates, file size limits, CORS origins) go in backend/config.py after Phase 6 consolidation.
- Magic numbers in function bodies should be extracted to named constants:
    # No
    if len(audio) > 24000 * 60 * 10:
    # Yes
    MAX_AUDIO_DURATION_SAMPLES = SAMPLE_RATE * 60 * 10
    if len(audio) > MAX_AUDIO_DURATION_SAMPLES:
Function Signatures
- Keyword-only arguments (after *) for functions with 3+ parameters, especially when several share the same type:
    def is_model_cached(
        hf_repo: str,
        *,
        weight_extensions: tuple[str, ...] = (".safetensors", ".bin"),
        required_files: list[str] | None = None,
    ) -> bool: ...
- Parameters on separate lines when the signature exceeds ~100 characters or has 3+ params.
- Trailing comma after the last parameter in multi-line signatures.
- Default values inline with the parameter.
String Formatting
- f-strings for runtime string construction.
- %s-style for logging calls (lazy evaluation).
- .format(): avoid; f-strings are preferred.
Testing
Framework: pytest with pytest-asyncio.
- Test files: test_<module>.py in backend/tests/.
- Use conftest.py for shared fixtures (db sessions, test client, mock backends).
- Group related tests in classes: class TestProfileCRUD:.
- Use @pytest.mark.asyncio for async tests.
- Use @pytest.mark.parametrize to reduce repetition.
- Manual integration scripts stay in tests/ but are clearly marked (filename prefix manual_ or documented in tests/README.md).
Project Layout
backend/
  app.py       # FastAPI app factory, CORS, lifecycle events
  main.py      # Entry point (imports app, runs uvicorn)
  config.py    # Data directory paths
  models.py    # Pydantic request/response schemas
  server.py    # Tauri sidecar launcher, parent-pid watchdog
  routes/      # Thin HTTP handlers (validation, delegation, response formatting)
  services/    # Business logic, CRUD, orchestration
  backends/    # TTS/STT engine implementations
  database/    # ORM models, session management, migrations, seeds
  utils/       # Shared utilities (audio, effects, caching, progress)
  tests/       # pytest suite
Ruff Adoption
pyproject.toml configures ruff for linting and formatting. Run:
# Lint (check)
ruff check backend/
# Lint (auto-fix)
ruff check backend/ --fix
# Format
ruff format backend/
Introduce ruff fixes file-by-file as you touch them. Don't run --fix across the entire codebase in one shot -- that creates unreviewable diffs.