Initial commit
This commit is contained in:
404
backend/STYLE_GUIDE.md
Normal file
404
backend/STYLE_GUIDE.md
Normal file
@@ -0,0 +1,404 @@
|
||||
# Python Style Guide
|
||||
|
||||
Target: **Python 3.12+** | Formatter/Linter: **Ruff** | Config: `backend/pyproject.toml`
|
||||
|
||||
This guide codifies the conventions used across the backend, and prescribes the target style for code written during the refactor (Phases 3-6). Existing code should be migrated incrementally -- don't reformat entire files in unrelated PRs.
|
||||
|
||||
---
|
||||
|
||||
## Formatting
|
||||
|
||||
Enforced by `ruff format` (Black-compatible).
|
||||
|
||||
- **Line length**: 120 characters.
|
||||
- **Indent**: 4 spaces. No tabs.
|
||||
- **Trailing commas**: Required on multi-line function signatures, arguments, collections.
|
||||
- **Quotes**: Double quotes (`"`) for strings. Single quotes are acceptable in f-string expressions and dict keys inside f-strings where avoiding escapes improves readability.
|
||||
|
||||
Run: `ruff format backend/`
|
||||
|
||||
---
|
||||
|
||||
## Imports
|
||||
|
||||
Enforced by ruff's `isort` rules (rule set `I`).
|
||||
|
||||
**Grouping** -- three blocks separated by a blank line:
|
||||
|
||||
```python
|
||||
import asyncio # 1. stdlib
|
||||
from pathlib import Path
|
||||
|
||||
import numpy as np # 2. third-party
|
||||
from fastapi import APIRouter, HTTPException
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from backend.config import get_data_dir # 3. local (absolute)
|
||||
from .database import get_db # or relative
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
- Within the `backend` package, use **relative imports** for sibling/child modules: `from .database import get_db`, `from ..utils.audio import load_audio`.
|
||||
- Absolute imports are fine for top-level references from entry points (`main.py`, `server.py`).
|
||||
- Never use wildcard imports (`from module import *`).
|
||||
- One import per line for `from X import Y` when there are 4+ names; below that, comma-separated is fine.
|
||||
- **Lazy imports** are acceptable for heavy dependencies (torch, transformers, mlx) inside functions to reduce startup time. Add a comment: `# lazy: heavy import`.
|
||||
|
||||
---
|
||||
|
||||
## Type Annotations
|
||||
|
||||
Python 3.12 means we use **built-in generics and union syntax natively**. No `from __future__ import annotations`, no `typing.List`/`typing.Dict`.
|
||||
|
||||
```python
|
||||
# Yes
|
||||
def process(items: list[str], config: dict[str, int] | None = None) -> tuple[int, str]: ...
|
||||
|
||||
# No
|
||||
from typing import List, Dict, Optional, Tuple
|
||||
def process(items: List[str], config: Optional[Dict[str, int]] = None) -> Tuple[int, str]: ...
|
||||
```
|
||||
|
||||
**What to annotate:**
|
||||
- All public function signatures (parameters + return type).
|
||||
- Private functions: parameters at minimum; return type encouraged.
|
||||
- Module-level variables: only when the type isn't obvious from the assignment.
|
||||
- Route handlers: parameters are annotated via FastAPI's dependency injection. Add explicit `-> SomeResponse` return types when the route doesn't use `response_model`.
|
||||
|
||||
**Imports from `typing` that are still needed** (no built-in equivalent):
|
||||
`Literal`, `TypeAlias`, `Protocol`, `runtime_checkable`, `Callable`, `Any`, `ClassVar`, `TypeVar`, `overload`, `TYPE_CHECKING`.
|
||||
|
||||
Use `collections.abc` for abstract types: `Sequence`, `Mapping`, `Iterable`, `Iterator`, `Generator`.
|
||||
|
||||
---
|
||||
|
||||
## Naming
|
||||
|
||||
| Thing | Convention | Example |
|
||||
|-------|-----------|---------|
|
||||
| Module | `snake_case` | `task_queue.py` |
|
||||
| Class | `PascalCase` | `ProgressManager` |
|
||||
| Function / method | `snake_case` | `create_profile` |
|
||||
| Variable | `snake_case` | `sample_rate` |
|
||||
| Constant | `UPPER_SNAKE_CASE` | `DEFAULT_SAMPLE_RATE` |
|
||||
| Private | `_leading_underscore` | `_generation_queue` |
|
||||
| Type alias | `PascalCase` | `EffectChain = list[dict[str, Any]]` |
|
||||
|
||||
**Specific conventions:**
|
||||
- Database ORM models imported with `DB` prefix alias: `from .database import VoiceProfile as DBVoiceProfile`.
|
||||
- Pydantic models use descriptive suffixes: `VoiceProfileCreate`, `VoiceProfileResponse`, `GenerationRequest`.
|
||||
- Backend classes use engine-name prefix: `MLXTTSBackend`, `PyTorchSTTBackend`.
|
||||
|
||||
---
|
||||
|
||||
## Docstrings
|
||||
|
||||
**Google style**. Required on all public functions, classes, and modules.
|
||||
|
||||
```python
|
||||
def combine_voice_prompts(
|
||||
profile_dir: Path,
|
||||
*,
|
||||
target_sr: int = 24000,
|
||||
) -> tuple[np.ndarray, int]:
|
||||
"""Load and concatenate all voice prompt files for a profile.
|
||||
|
||||
Reads .wav/.mp3/.flac files from the profile directory, resamples to
|
||||
the target sample rate, normalizes, and concatenates into a single array.
|
||||
|
||||
Args:
|
||||
profile_dir: Path to the voice profile directory containing audio files.
|
||||
target_sr: Target sample rate for the output. Defaults to 24000.
|
||||
|
||||
Returns:
|
||||
Tuple of (concatenated audio array, sample rate).
|
||||
|
||||
Raises:
|
||||
FileNotFoundError: If profile_dir does not exist.
|
||||
ValueError: If no valid audio files are found.
|
||||
"""
|
||||
```
|
||||
|
||||
**Short form** is fine for simple functions:
|
||||
|
||||
```python
|
||||
def get_db_path() -> Path:
|
||||
"""Get the path to the SQLite database file."""
|
||||
```
|
||||
|
||||
**When to skip**: Private helpers under ~5 lines where the name and signature make intent obvious.
|
||||
|
||||
**Module docstrings**: A single sentence at the top of every file describing its purpose.
|
||||
|
||||
```python
|
||||
"""Voice profile CRUD operations."""
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Comments
|
||||
|
||||
Comments explain **why**, not **what**. If the code needs a comment to explain what it does, the code should be rewritten to be clearer. The exceptions are non-obvious performance choices, external constraints, and concurrency/race-condition reasoning -- those always deserve a comment.
|
||||
|
||||
### No section dividers
|
||||
|
||||
Do not use ASCII dividers to create visual sections in files:
|
||||
|
||||
```python
|
||||
# No -- any of these:
|
||||
# ============================================
|
||||
# GENERATION ENDPOINTS
|
||||
# ============================================
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Device detection
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# --- Load model --------------------------------------------------
|
||||
```
|
||||
|
||||
If a file needs section dividers to be navigable, the file is too long. Split it into modules. Within a function, if you need labeled sections to follow the logic, extract those sections into named functions.
|
||||
|
||||
### Inline comments
|
||||
|
||||
Inline comments (end-of-line) are fine when they add information the code can't express:
|
||||
|
||||
```python
|
||||
# Yes -- explains a non-obvious constraint or gives context:
|
||||
audio, sr = load_audio(path, sr=24000) # Qwen expects 24kHz mono
|
||||
_generation_queue: asyncio.Queue = None # type: ignore # initialized at startup
|
||||
"tauri://localhost", # Tauri webview (macOS)
|
||||
|
||||
# No -- restates the code:
|
||||
# Check if profile name already exists
|
||||
existing = db.query(DBVoiceProfile).filter_by(name=data.name).first()
|
||||
|
||||
# Delete from database
|
||||
db.delete(sample)
|
||||
|
||||
# Update fields
|
||||
profile.name = data.name
|
||||
```
|
||||
|
||||
Delete comments that narrate what the next line of code obviously does. If the function name, variable name, or method call already communicates intent, the comment is noise.
|
||||
|
||||
### Block comments
|
||||
|
||||
Use block comments for **why** explanations -- constraints, workarounds, non-obvious decisions:
|
||||
|
||||
```python
|
||||
# PyInstaller + multiprocessing: child processes re-execute the frozen binary
|
||||
# with internal arguments. freeze_support() handles this and exits early.
|
||||
multiprocessing.freeze_support()
|
||||
|
||||
# Mark any stale "generating" records as failed -- these are leftovers
|
||||
# from a previous process that was killed mid-generation.
|
||||
db.query(Generation).filter_by(status="generating").update({"status": "failed"})
|
||||
```
|
||||
|
||||
Keep block comments tight. Two to three lines is normal. If you need a paragraph, it probably belongs in the docstring or a design doc.
|
||||
|
||||
### Linter/type-checker suppression
|
||||
|
||||
Always add a reason after `noqa` and `type: ignore`:
|
||||
|
||||
```python
|
||||
import intel_extension_for_pytorch # noqa: F401 -- side-effect import enables XPU
|
||||
_queue: asyncio.Queue = None # type: ignore[assignment] # initialized at startup
|
||||
```
|
||||
|
||||
Bare `# noqa` or `# type: ignore` with no explanation are not allowed.
|
||||
|
||||
### TODO / FIXME
|
||||
|
||||
Use sparingly. Every `TODO` must include a brief description of what needs doing. Don't use them as a substitute for tracking work properly:
|
||||
|
||||
```python
|
||||
# TODO: replace with async SQLAlchemy once CRUD modules are migrated (Phase 5)
|
||||
result = await asyncio.to_thread(profiles.get_profile, profile_id, db)
|
||||
```
|
||||
|
||||
Never commit `HACK`, `XXX`, or `FIXME` -- fix the problem or file an issue.
|
||||
|
||||
### Commented-out code
|
||||
|
||||
Delete it. That's what git is for. If you need to document that something was intentionally removed, a short tombstone comment is acceptable:
|
||||
|
||||
```python
|
||||
# Removed config.json-only check -- too lenient, doesn't confirm weights exist.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Error Handling
|
||||
|
||||
The refactor is standardizing on a **two-layer pattern**:
|
||||
|
||||
### 1. Domain layer -- raise plain exceptions
|
||||
|
||||
CRUD modules and services raise `ValueError`, `FileNotFoundError`, or (post-refactor) custom exceptions defined in `backend/errors.py`:
|
||||
|
||||
```python
|
||||
# backend/errors.py (to be created in Phase 4)
|
||||
class NotFoundError(Exception):
|
||||
"""Raised when a requested resource does not exist."""
|
||||
|
||||
class ConflictError(Exception):
|
||||
"""Raised on uniqueness constraint violations."""
|
||||
```
|
||||
|
||||
```python
|
||||
# In a service or CRUD module:
|
||||
raise NotFoundError(f"Profile {profile_id} not found")
|
||||
```
|
||||
|
||||
### 2. Route layer -- translate to HTTPException
|
||||
|
||||
Route handlers catch domain exceptions and convert:
|
||||
|
||||
```python
|
||||
@router.post("/profiles")
|
||||
async def create_profile(data: VoiceProfileCreate, db: Session = Depends(get_db)):
|
||||
try:
|
||||
return await profiles.create_profile(data, db)
|
||||
except ConflictError as e:
|
||||
raise HTTPException(status_code=409, detail=str(e))
|
||||
```
|
||||
|
||||
**Background tasks** catch `Exception` broadly, log with `logger.exception()`, and update the task status to `"failed"`.
|
||||
|
||||
**Never**: silently swallow exceptions, use bare `except:`, or catch `BaseException`.
|
||||
|
||||
---
|
||||
|
||||
## Async
|
||||
|
||||
### Rules for the refactor
|
||||
|
||||
1. **Don't declare `async def` unless the function awaits something.** Several service modules still declare `async def` without awaiting -- these should be migrated to sync functions with `asyncio.to_thread()` at the route layer, or to real async SQLAlchemy.
|
||||
2. **CPU-bound work** (audio processing, numpy operations) goes through `asyncio.to_thread()`:
|
||||
```python
|
||||
audio, sr = await asyncio.to_thread(load_audio, source_path)
|
||||
```
|
||||
3. **GPU-bound TTS inference** is serialized through the generation queue (`services/task_queue.py`). Never call a backend's `generate()` directly from a route handler.
|
||||
4. **Fire-and-forget tasks**: use `asyncio.create_task()` and track the task reference to prevent garbage collection:
|
||||
```python
|
||||
task = asyncio.create_task(some_coro())
|
||||
_background_tasks.add(task)
|
||||
task.add_done_callback(_background_tasks.discard)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Logging
|
||||
|
||||
Use the `logging` module. Not `print()`.
|
||||
|
||||
```python
|
||||
import logging
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
logger.info("Loading model %s on %s", model_name, device)
|
||||
logger.warning("Cache miss for %s, downloading", repo_id)
|
||||
logger.exception("Generation %s failed") # logs traceback automatically
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
- Use `%s`-style placeholders in log calls (not f-strings). This avoids formatting the string if the log level is filtered out.
|
||||
- Use `logger.exception()` inside `except` blocks -- it captures the traceback.
|
||||
- Logger name should be `__name__` (yields `backend.utils.audio`, etc.).
|
||||
- Existing `print()` calls should be migrated to logging as files are touched during the refactor.
|
||||
|
||||
---
|
||||
|
||||
## Constants
|
||||
|
||||
- Define at **module level** in the file where they're primarily used.
|
||||
- Use `UPPER_SNAKE_CASE`.
|
||||
- Shared/cross-cutting constants (sample rates, file size limits, CORS origins) go in `backend/config.py` after Phase 6 consolidation.
|
||||
- Magic numbers in function bodies should be extracted to named constants:
|
||||
```python
|
||||
# No
|
||||
if len(audio) > 24000 * 60 * 10:
|
||||
|
||||
# Yes
|
||||
MAX_AUDIO_DURATION_SAMPLES = SAMPLE_RATE * 60 * 10
|
||||
if len(audio) > MAX_AUDIO_DURATION_SAMPLES:
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Function Signatures
|
||||
|
||||
- **Keyword-only arguments** (after `*`) for functions with 3+ parameters, especially when several share the same type:
|
||||
```python
|
||||
def is_model_cached(
|
||||
hf_repo: str,
|
||||
*,
|
||||
weight_extensions: tuple[str, ...] = (".safetensors", ".bin"),
|
||||
required_files: list[str] | None = None,
|
||||
) -> bool:
|
||||
```
|
||||
- Parameters on **separate lines** when the signature exceeds ~100 characters or has 3+ params.
|
||||
- **Trailing comma** after the last parameter in multi-line signatures.
|
||||
- Default values inline with the parameter.
|
||||
|
||||
---
|
||||
|
||||
## String Formatting
|
||||
|
||||
- **f-strings** for runtime string construction.
|
||||
- **`%s`-style** for `logging` calls (lazy evaluation).
|
||||
- **`.format()`**: avoid; f-strings are preferred.
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
Framework: **pytest** with `pytest-asyncio`.
|
||||
|
||||
- Test files: `test_<module>.py` in `backend/tests/`.
|
||||
- Use `conftest.py` for shared fixtures (db sessions, test client, mock backends).
|
||||
- Group related tests in classes: `class TestProfileCRUD:`.
|
||||
- Use `@pytest.mark.asyncio` for async tests.
|
||||
- Use `@pytest.mark.parametrize` to reduce repetition.
|
||||
- Manual integration scripts stay in `tests/` but are clearly marked (filename prefix `manual_` or documented in `tests/README.md`).
|
||||
|
||||
---
|
||||
|
||||
## Project Layout
|
||||
|
||||
```
|
||||
backend/
|
||||
app.py # FastAPI app factory, CORS, lifecycle events
|
||||
main.py # Entry point (imports app, runs uvicorn)
|
||||
config.py # Data directory paths
|
||||
models.py # Pydantic request/response schemas
|
||||
server.py # Tauri sidecar launcher, parent-pid watchdog
|
||||
routes/ # Thin HTTP handlers (validation, delegation, response formatting)
|
||||
services/ # Business logic, CRUD, orchestration
|
||||
backends/ # TTS/STT engine implementations
|
||||
database/ # ORM models, session management, migrations, seeds
|
||||
utils/ # Shared utilities (audio, effects, caching, progress)
|
||||
tests/ # pytest suite
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Ruff Adoption
|
||||
|
||||
`pyproject.toml` configures ruff for linting and formatting. Run:
|
||||
|
||||
```bash
|
||||
# Lint (check)
|
||||
ruff check backend/
|
||||
|
||||
# Lint (auto-fix)
|
||||
ruff check backend/ --fix
|
||||
|
||||
# Format
|
||||
ruff format backend/
|
||||
```
|
||||
|
||||
Introduce ruff fixes file-by-file as you touch them. Don't run `--fix` across the entire codebase in one shot -- that creates unreviewable diffs.
|
||||
Reference in New Issue
Block a user