Building Scalable Python Projects: What I Wish I Knew Earlier


Memoization. Linked lists. Bubble sort. Dynamic Programming…

These were the fundamentals we drilled in school — coding puzzles wrapped in neat little functions. But no one warned us what happens when your Python project grows beyond a single script… when you’re no longer solving toy problems, but juggling APIs, routing logic, async calls, and cloud deployment.

Recently, I built a full-stack project — a marketing analytics dashboard powered by a Large Language Model (LLM). It was ambitious, exciting… and chaotic. I found myself staring at spaghetti code, debugging dead ends, and asking, “Is this still Python, or am I writing my own worst nightmare?”

In this post, I’ll walk you through the real lessons I wish I knew earlier: how to organise your project for scale, why tools like Flask and Uvicorn are more than just buzzwords, and what it takes to write Python code that doesn’t just work — but lasts.

Photo by charlesdeluvio on Unsplash

Master the Fundamentals (Flask, Uvicorn, and APIs)

Before diving into complex pipelines or fancy AI integrations, it's crucial to have a solid grasp of the basics. I learned quickly that debugging an advanced feature is a nightmare if you’re shaky on fundamental concepts. Key things to understand include:

  • Flask and Web Servers: Flask is a WSGI-based web framework for Python – essentially, it handles HTTP requests and responses. Know how Flask works under the hood (routing, request/response cycle, etc.) and how to run it in development vs. production. For example, in production you might use Gunicorn or uWSGI as the WSGI server, whereas in development you use the built-in development server. If your app uses asynchronous features or frameworks like FastAPI, understand that those rely on ASGI instead of WSGI.

  • Uvicorn (and ASGI vs WSGI): Uvicorn is a lightning-fast ASGI server for Python web apps. Unlike WSGI (which Flask uses), where each worker handles one request at a time, ASGI allows a single worker to handle many connections asynchronously (ideal for WebSockets or long-polling). Even if you're using Flask, it’s good to know why something like Uvicorn exists – it excels at handling many simultaneous connections by leveraging async code. In my project, I stuck with Flask, but I was mindful that heavy concurrent workloads might require an ASGI-based approach.


  • APIs and HTTP Basics: Since our app exposes RESTful endpoints, I made sure I was comfortable with HTTP methods (GET, POST, etc.), status codes, and JSON data formatting. Understanding how to design an API (what each endpoint should do and what it returns) is foundational. If you're building a pipeline that will be accessed via an API, think about the input/output schema for those API calls early on.


Grasping these fundamentals paid off. For example, when something went wrong in my app's request handling, I could discern whether it was an issue with Flask routing, the web server, or my own code.
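
To make the WSGI idea concrete: a WSGI application is just a callable that takes a request environ and a start_response callback. The sketch below uses only the standard library's wsgiref helpers (the /health endpoint and response shape are illustrative, not from my project) to show the contract Flask implements for you:

```python
import json
from wsgiref.util import setup_testing_defaults

# A bare WSGI application: a callable that receives the request environ
# and a start_response callback, and returns an iterable of bytes.
# Flask layers routing, request objects, etc. on top of this contract.
def app(environ, start_response):
    if environ.get("PATH_INFO") == "/health":
        start_response("200 OK", [("Content-Type", "application/json")])
        return [json.dumps({"status": "ok"}).encode()]
    start_response("404 Not Found", [("Content-Type", "application/json")])
    return [json.dumps({"error": "not found"}).encode()]

# Simulate a request without starting a server:
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/health"
status_holder = []
response = app(environ, lambda status, headers: status_holder.append(status))
print(status_holder[0])  # "200 OK"
```

Seeing this once made it much easier to reason about where Flask ends and the web server (Gunicorn, uWSGI) begins.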

Structuring a Flask App Cleanly (Routes, Models, Services, Utils)

One of the biggest lessons was the importance of a clean project structure. Initially, I threw everything into a single app.py file. But as the project grew, it became unmanageable. Here's how I structured my Flask application instead:

  • Routes (Views/Controllers): These define the API endpoints. Each route should remain thin, handling request parsing and response formatting, but delegating heavy lifting to other layers. In my project, I created a /routes package with modules like data_routes.py and user_routes.py, each containing related endpoints. Using Flask Blueprints is a great way to organise routes by feature or resource – e.g., a user blueprint for user-related routes, a data blueprint for data-related routes.
  • Models: This layer includes data representations, such as database models or Pydantic schemas. Keeping models in a /models module ensures you have a single source of truth for the data structure. For instance, if using an ORM like SQLAlchemy, define your User model class in the models module. This way, both your routes and services can import the model definitions as needed.
  • Services: Business logic goes here. Services act as an intermediary between routes and models (or external APIs, ML models, etc.). In my app, the /services package held the core logic, like crunching analytics or orchestrating multi-step operations (more on pipelines soon). The routes would call functions in the service layer to perform actions. Following a service (or service/Data Access Object (DAO)) pattern keeps your routes clean:
    • Routes handle HTTP and call service functions.
    • Service functions implement the logic (and might call the data access layer or external APIs).
    • This separation makes it easier to test and modify logic without touching the routing code.
  • Utils: Lastly, a /utils module for utility functions or helpers that don’t belong in services or models. These could be small helpers for formatting, calculations, or common tasks used across the app. For example, I had some data formatting functions and date helpers in utils. By centralising them, I avoided duplicating code in multiple services.
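
The layering above can be sketched without any framework at all. All names below (format_currency, build_spend_report, report_endpoint) are hypothetical stand-ins, not code from my project; the point is that the "route" stays thin while the service owns the logic:

```python
# utils/helpers.py -- small shared helper
def format_currency(value: float) -> str:
    return f"${value:,.2f}"

# services/report_service.py -- business logic, no HTTP concerns
def build_spend_report(campaign_ids: list[str]) -> dict:
    # Stand-in for a real DB/API lookup
    spend = {cid: 100.0 * (i + 1) for i, cid in enumerate(campaign_ids)}
    total = sum(spend.values())
    return {"per_campaign": spend, "total": format_currency(total)}

# routes/report_routes.py -- thin handler: validate, delegate, format
def report_endpoint(request_json: dict) -> tuple[dict, int]:
    ids = request_json.get("campaign_ids")
    if not ids:
        return {"error": "campaign_ids is required"}, 400
    return build_spend_report(ids), 200

body, status = report_endpoint({"campaign_ids": ["a", "b"]})
```

Swapping the plain functions for Flask view functions doesn't change the shape: the route still only validates, delegates, and formats.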

By structuring the Flask app into these parts, the codebase remains modular and readable. If someone asks, "Where is the code that handles user login?", it's clear to look under routes/user_routes.py for the endpoint and maybe services/auth_service.py for the logic, rather than combing through one giant file. This also means that as the project grows, each piece remains fairly self-contained.

Here's a simplified example of what a Flask project structure might look like:

myapp/
├── app.py                    # Flask app entry point (could use FLASK_APP env var)
├── myapp/                    # Application package
│   ├── __init__.py           # Initialize Flask app (maybe using Application Factory)
│   ├── routes/               # API endpoint definitions (Flask Blueprints or routes)
│   │   ├── __init__.py
│   │   └── example_routes.py  # e.g., routes for a resource
│   ├── models/               # Data models (ORM or schemas)
│   │   ├── __init__.py
│   │   └── example_model.py   # e.g., a SQLAlchemy model or Pydantic schema
│   ├── services/             # Business logic layer
│   │   ├── __init__.py
│   │   └── example_service.py # e.g., functions to implement core logic
│   └── utils/                # Utility functions
│       ├── __init__.py
│       └── helpers.py         # e.g., common helper functions
└── requirements.txt          # Project dependencies

This structure is similar to how I organised the LLM dashboard's backend. Each folder corresponds to a concern, following the principle of separation of concerns. It not only makes the project cleaner but also helps multiple developers work on different parts without stepping on each other’s toes.

A note on Flask app initialisation:  

In the structure above, myapp/__init__.py might use the Application Factory pattern. This means instead of creating a global Flask app, you provide a function (like create_app) that initialises and returns a Flask app instance. This is useful for testing and different environments. For example, in myapp/__init__.py:

from flask import Flask

def create_app():
    app = Flask(__name__)
    # Initialize configs, extensions (DB, etc.)
    # Register blueprints: e.g., app.register_blueprint(routes.example_bp)
    return app

Then app.py (at project root) or a WSGI server can call create_app() to get the app. This decouples configuration from the app creation and plays nicely with Blueprints.

Designing Clear Multi-Step Pipelines

In an LLM-powered app (or any complex app), you often have pipelines – multi-step processes where data flows through a series of transformations or actions. In my case, a user’s query might go through steps like: classify query intent, fetch relevant data, generate a response (maybe using the LLM), and format the result. One crucial lesson: ensure clarity on the input and output of the main pipeline function.

When building a pipeline, I found it helpful to have one main function that orchestrates the steps. For example, a simplified version of an LLM query pipeline could look like:

def answer_query(query: str) -> dict:
    """
    Process a user query through multiple steps and return an answer.
    """
    # Step 1: Classify query type (e.g., "needs chart", "needs textual answer")
    query_type = classify_query(query)
    # Step 2: Fetch data if needed
    data = fetch_relevant_data(query, query_type) if query_type == "chart" else None
    # Step 3: Generate answer (text or chart) using LLM or ML model
    result = generate_response(query, query_type, data)
    # Step 4: Format the result into a dictionary (e.g., {"type": ..., "content": ...})
    return {"type": query_type, "content": result}

The inputs and outputs of answer_query are clear: it takes a query (string) and returns a dictionary with a certain structure. Defining this contract up front was a game-changer. Why? Because it forces you to think about what you need at the start and what you must deliver at the end.

A few tips I picked up for building clear pipelines:

  • Define responsibilities of each step: Each helper, like classify_query, fetch_relevant_data, and generate_response, should do one thing (single responsibility principle). This makes debugging easier – if the data is weird, you know which stage to inspect.
  • Document and enforce the interface: The example shows a docstring for answer_query indicating what it returns. I also used type hints (discussed next) to make it explicit. If every function in the pipeline declares what it expects and returns, you can insert or remove steps with minimal ripple effect.
  • Test steps individually and together: I wrote simple unit tests for each helper (e.g., does classify_query correctly identify a chart request?), and an integration test for the whole answer_query pipeline. This way, if something breaks, I know whether it's the logic in a step or the wiring between steps.
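
Here's what a per-step unit test can look like. The classify_query below is a simplified stand-in for a real classifier (my actual one used the LLM), but the testing pattern is the same:

```python
# Simplified stand-in for the pipeline's first step
def classify_query(query: str) -> str:
    chart_keywords = ("plot", "chart", "graph", "trend")
    return "chart" if any(k in query.lower() for k in chart_keywords) else "text"

# Unit tests pin down the step's contract in isolation
def test_classify_query_detects_chart_requests():
    assert classify_query("Plot monthly revenue") == "chart"

def test_classify_query_defaults_to_text():
    assert classify_query("What was our best campaign?") == "text"

test_classify_query_detects_chart_requests()
test_classify_query_defaults_to_text()
```

With each step covered like this, the integration test for answer_query only has to verify the wiring, not the logic.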

Clarity in pipeline design saved me from a lot of "Why is this output empty?" moments. It also made the code more approachable; a new contributor can read the main function and immediately understand the high-level flow.

Embrace Type Hints for Clarity and Modularity

When I started adding type hints to all my functions, I felt a bit like I was overdoing it. But trust me, always declaring types for function inputs and outputs is worth it. It makes the code self-documenting and prevents a ton of confusion down the line.

  • Clarity: With type hints, anyone reading the function can see at a glance what types to pass in and what to expect back. For example, if I have def get_user(id: str) -> User: ..., it's clear that id is a string (maybe a username or UUID) and that it returns a User object (likely a model instance or dataclass). This improved readability and served as inline documentation for the team.
  • Easier Refactoring: Types make refactoring safer. At one point, I wanted to insert an intermediate step into the pipeline (to do additional data cleaning). Because I had type hints, I could quickly see what type was flowing between steps and ensure my new function matched the expected input/output. Essentially, type hints establish a clear interface contract that helps when modifying or extending code.
  • Modularity: If down the road I decide to split a function into two, or combine two into one, the type hints guide me on what the combined function’s signature should be. Also, if I expose some functions for external use (public API), I know those need stable, well-documented signatures, whereas purely internal ones can evolve more freely.

In Python, adding type hints doesn't change runtime behaviour (unless you use a tool like mypy to enforce them), so they’re low-risk to adopt. The payoff in maintainability is huge – my future self and my collaborators have thanked me multiple times for using type hints, as it made the code base much easier to navigate.
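
As a small illustration (function names and data are hypothetical), explicit types make the contract between pipeline steps visible, which is exactly what made inserting a cleaning step safe:

```python
# Both steps declare the same list[dict[str, float]] shape, so a new
# step slotted between them must match that interface.
def clean_data(rows: list[dict[str, float]]) -> list[dict[str, float]]:
    # Drop rows containing NaN values (NaN != NaN, so v == v filters them)
    return [r for r in rows if all(v == v for v in r.values())]

def summarise(rows: list[dict[str, float]]) -> dict[str, float]:
    return {"count": float(len(rows))}

raw = [{"spend": 10.0}, {"spend": float("nan")}]
summary = summarise(clean_data(raw))
# A checker like mypy would flag e.g. summarise("not a list") before runtime.
```

The hints cost nothing at runtime, but a static checker can verify the whole chain end to end.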

Using _ Prefix for Internal Helper Functions

Python has a simple convention: a name prefixed with a single underscore (e.g. _process_data) is treated as "internal" or private to the module. While Python won’t stop you from accessing a _helper function from outside, the underscore is a signal saying “this is not part of the public API; use at your own risk.”

In our project, we applied this to helper functions that were not meant to be used outside their module or file. For example, inside campaign_service.py, I had something like:

def get_campaign_report(campaign_id: str) -> dict:
    # Public function, part of the service's interface
    campaign = Campaign.find_by_id(campaign_id)
    stats = _calculate_statistics(campaign.data)
    return {"id": campaign_id, "stats": stats}

def _calculate_statistics(data: list[float]) -> dict:
    # Internal helper, not to be used outside this module
    ...

Here, _calculate_statistics is a detail of how get_campaign_report works. By naming it with an underscore, I'm indicating that other modules shouldn’t call _calculate_statistics directly – if they need that functionality, they should call the public get_campaign_report, which uses it. This convention helped define a clear API boundary within the code. When multiple developers were working, nobody accidentally used an internal function from another module, because the naming made its intended scope obvious.

Another benefit: when using from module import *, Python will not import names that start with _ by default. So it’s a mild safety mechanism to prevent namespace pollution. But mainly, it’s about communication to other developers (and your future self).
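
You can check that behaviour directly. This sketch writes a throwaway module to a temp directory and replicates the name filter that a bare `import *` applies when no __all__ is defined (module and function names are made up for the demo):

```python
import importlib.util
import os
import tempfile

source = (
    "def public_helper():\n"
    "    return 'ok'\n"
    "def _internal_helper():\n"
    "    return 'hidden'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")
    with open(path, "w") as f:
        f.write(source)
    spec = importlib.util.spec_from_file_location("demo_mod", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    # What `from demo_mod import *` would pick up (no __all__ defined):
    star_names = [n for n in vars(mod) if not n.startswith("_")]

print(star_names)  # ['public_helper']
```

Defining __all__ explicitly gives you even finer control, but the underscore convention alone already keeps internals out of star imports.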

In summary, prefix internal helper functions (and even module-global variables that are internal) with _ to make your code’s intended usage clear. It’s a small habit that leads to more maintainable code, especially in larger projects where not everything is meant to be used everywhere.

Good Python Packaging Practices (__init__.py and Project Structure)

When your application grows beyond a single file, you’ll start creating packages (directories with multiple modules). Python’s packaging conventions are another area where a little effort upfront makes a big difference.

  • Understand __init__.py: This special file is placed inside directories to mark them as Python packages. In our project, the backend/app/ directory had an __init__.py, making it a package we could import from elsewhere. Essentially, if Python sees an __init__.py in a folder, it treats that folder as a package of modules. This allows you to do imports like from myapp.routes import data_routes seamlessly. In older Python versions, __init__.py was required; now you can have implicit namespace packages without it, but I strongly prefer including it for clarity and for any package initialisation code.
    • You can also use __init__.py to expose certain parts of your package. For instance, in myapp/__init__.py, you might import certain classes or functions (from myapp.services.example_service import important_function) so that whoever does import myapp can directly access myapp.important_function without digging into submodules. This essentially defines the public API of your package.
  • Package and Module Structure: Organise code by functionality. Each folder in the example structure (routes, models, etc.) is a subpackage. Within those, each file is a module focusing on a specific topic. This modular approach means you can avoid very large files and make locating things intuitive. For example, if all database models are in models/, it’s easy to find and manage them. If utilities are all in utils/, you won’t confuse a utility function with core logic.
  • Avoid Circular Imports: A tip on packaging – plan dependencies between modules to avoid circular imports. If routes need something from services and vice versa, that’s a design smell. In our case, routes depend on services, but services did not depend on routes, which breaks the cycle. Sometimes breaking things into packages (like moving a commonly needed util or model to a shared place) can resolve circular import issues.
  • Initialising Packages Appropriately: If your package needs some setup (like configuring a logger, or establishing a database connection), you can do that in __init__.py or a dedicated initialisation module. In the Flask app context, this is often where the Application Factory lives (as mentioned earlier) or where Blueprints are registered.
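
The __init__.py re-export trick is easy to verify. This sketch builds a throwaway package on disk (the mypkg/important_function names are illustrative) and shows the function being promoted to the package's top level:

```python
import os
import sys
import tempfile

# Build a minimal package: mypkg/services/example_service.py
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mypkg")
os.makedirs(os.path.join(pkg, "services"))

open(os.path.join(pkg, "services", "__init__.py"), "w").close()
with open(os.path.join(pkg, "services", "example_service.py"), "w") as f:
    f.write("def important_function():\n    return 42\n")

# The package __init__.py promotes the function to the top level:
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from mypkg.services.example_service import important_function\n")

sys.path.insert(0, tmp)
import mypkg
result = mypkg.important_function()  # no need to dig into submodules
```

Callers now depend on mypkg's public surface, and the services layout underneath can be reorganised without breaking them.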

To illustrate, think of how you might distribute your app as a library: you’d want a clear public interface and hide internal details. Even if you’re not actually publishing it, treating your app code as a package with sub-packages enforces a cleaner separation of concerns.

Good packaging also pays dividends when writing tests. In our project, we had a tests/ directory separate from the app code. Thanks to the package structure, writing from myapp.services.example_service import some_function in tests worked without hacks. Python knew where to find our code, because it was properly packaged and installed (we used a pyproject.toml to define it as a module). Even if you don’t go that far, having the __init__.py files and a logical hierarchy will make running and importing code in different contexts much easier.
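
For reference, the pyproject.toml needed for that is small. A minimal sketch (the exact metadata here is illustrative, not our real file) that lets pip install -e . make the package importable from anywhere, including tests:

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "myapp"
version = "0.1.0"
dependencies = ["flask"]

[tool.setuptools.packages.find]
include = ["myapp*"]
```

With this in place, tests import the code the same way production does, with no sys.path hacks.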

Closing Thoughts

Building an LLM-powered full-stack app was both challenging and rewarding. Along the way, I learned that writing clean Python isn’t just about pleasing linters or adhering to style for its own sake – it has very practical benefits. By understanding the fundamentals (Flask, Uvicorn, REST, etc.), I avoided a lot of blind stumbling. By structuring the app into clear sections, the project remained manageable and collaborative. Designing the pipeline with clear inputs/outputs and using type hints made the complex logic easier to reason about. And using naming conventions (like the _ prefix) and solid packaging practices set up the project for long-term maintainability.

In a casual conversation with a fellow developer, I described these lessons and they remarked how it sounded like I was future-proofing my code. I hadn’t thought of it that way, but it’s a good description – the code is now in a state where adding features or onboarding new contributors is relatively painless. If you’re embarking on a similar project (whether it involves LLMs or not), I hope these insights help you build a clean Python workflow from the start. Trust me, your future self will thank you!

👋 Connect with Me

I’m Javian Ng, an aspiring Full-Stack Infrastructure Architect & LLM Solutions Engineer based in Singapore. I love building scalable infrastructure and AI systems.
Feel free to reach out or explore more about my projects and experiences.