
Integrating LLMs


Introduction

In the last lesson you learned what LLMs are and how to communicate with them through a browser interface. That's useful for experimentation — but it's not how real applications work. Real applications send prompts programmatically, receive structured responses, and incorporate AI as a feature inside a larger system.

In this lesson you'll move from the browser to the terminal. You'll learn how to authenticate with a hosted LLM provider, choose the right model for your use case, construct a prompt as code, and send it over an API — all using Python and the Gemini API.

By the end of this lesson you'll have a working Python script that accepts user input, sends it to Gemini, and prints the model's response. That's the foundation for every AI-powered application you build in this phase.


Lesson

End-State

Before writing any code, it helps to be clear about what you're building and why each piece exists. Here's the complete picture of what this lesson produces:

A Python script that:

  1. Defines a mission for the AI via a system prompt — telling it who it is, what it knows, and how it should respond
  2. Selects a specific model from the Gemini family
  3. Authenticates with Google's API using a secret key
  4. Builds a prompt by combining the system context with a user message
  5. Sends a request to the Gemini API and prints the response

Each step maps to a decision you'll make every time you integrate an LLM into an application. Let's work through them in order.

Actions List

  1. Define the mission for the AI
  2. Choose your model
  3. Authenticate with the API
  4. Build your prompt
  5. Communicate with the AI

Defining the Mission for AI

The single most important thing you do before writing any API code is deciding what your AI is for. This is what separates a useful, focused assistant from a generic chatbot that produces inconsistent, unpredictable output.

Defining the mission means answering three questions:

What do we want it to do?

Be specific. "Help users with coding questions" is better than "be a helpful assistant." "Answer questions about our product's Python SDK and nothing else" is better still. The more specific the mission, the more consistent the outputs.

How do we want it to behave?

Tone, persona, level of detail, what it should and shouldn't say. Should it ask clarifying questions or make its best attempt? Should it respond formally or conversationally? Should it always include code examples?

How should it respond?

Format and structure. Should it use bullet points or prose? Return JSON? Keep responses under a certain length? Include disclaimers? Proactively offer next steps?

The System Prompt is How You Define the Mission

In the browser, you typed prompts manually for each conversation. In the API, you define the mission once in a system prompt — a special instruction block that persists across the entire session. Every user message is interpreted through the lens of the system prompt.

Here's a well-structured system prompt for a coding assistant:

SYSTEM_PROMPT = """
You are a Python programming assistant for junior software developers.

Your role:
- Answer questions about Python syntax, standard library, and common patterns
- Review code snippets and suggest improvements
- Explain error messages with clear, actionable fixes

Your constraints:
- Keep responses concise — under 300 words unless a longer explanation is clearly necessary
- Always include a code example when explaining a concept
- If a question is outside Python programming, politely redirect

Your tone:
- Friendly and encouraging, but technically precise
- Do not add excessive warnings or caveats
"""

Notice how this answers all three questions: what it does (Python assistant), how it behaves (concise, code examples, redirect off-topic), and how it responds (under 300 words, code blocks).

A system prompt is a promise you make to the model about what context it's operating in. The more clearly you define that context, the more reliably it operates within it.
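In code, the system prompt and each user message are combined into a single prompt string before being sent to the API. Here's a minimal sketch of that step (the `build_prompt` helper name is my own, not part of any SDK):

```python
SYSTEM_PROMPT = "You are a Python programming assistant for junior software developers."


def build_prompt(system_prompt: str, user_message: str) -> str:
    """Combine the persistent system context with a single user question."""
    return f"{system_prompt}\n\nUser question: {user_message}"


prompt = build_prompt(SYSTEM_PROMPT, "What does a list comprehension do?")
print(prompt.splitlines()[0])  # the system context always comes first
```

Keeping this as a separate function makes the prompt structure easy to test and reuse once the script grows beyond a single call.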


Choosing your Model


Gemini is a family of models, not a single model. Each variant in the family makes a different trade-off between capability, speed, and cost. Choosing the right one for your use case matters — especially when you're paying per token or optimizing for latency.

The Gemini Model Family

As of this writing, the main Gemini models you'll encounter are:

| Model | Best For | Context Window | Notes |
| --- | --- | --- | --- |
| gemini-2.5-flash | Production apps, fast responses | 1M tokens | Best balance of speed and quality |
| gemini-2.5-flash-lite | High-volume, cost-sensitive tasks | 1M tokens | Fastest and cheapest; lower reasoning quality |
| gemini-2.5-pro | Complex reasoning, long documents | 1M tokens | Most capable in the stable family; higher cost and latency |

For this module, gemini-2.5-flash is the right choice. It's fast, capable, and free within the API's generous rate limits — which makes it ideal for learning.

MODEL_NAME = "gemini-2.5-flash"

How to Think About Model Selection

The decision tree for model selection is simple:

  1. Does the task involve very long documents or complex multi-step reasoning? → Use gemini-2.5-pro
  2. Is cost or throughput the primary constraint? → Use gemini-2.5-flash-lite
  3. Everything else → Use gemini-2.5-flash

As you build more complex applications, you may run the same request against multiple models and compare outputs — a practice called model evaluation. For now, gemini-2.5-flash handles everything in this module.
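The decision tree above can be sketched as a small helper function (the `pick_model` name is hypothetical, not part of the SDK):

```python
def pick_model(complex_reasoning: bool = False, cost_sensitive: bool = False) -> str:
    """Apply the decision tree: pro for heavy reasoning, lite for cost, flash otherwise."""
    if complex_reasoning:
        return "gemini-2.5-pro"
    if cost_sensitive:
        return "gemini-2.5-flash-lite"
    return "gemini-2.5-flash"


print(pick_model())                        # gemini-2.5-flash
print(pick_model(complex_reasoning=True))  # gemini-2.5-pro
```

Centralizing the choice in one function also makes it easy to swap models later without hunting through your code for hardcoded names.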

Model availability and pricing change frequently, so check the Gemini API docs for the latest information.


Installing the Required Packages

Before writing any code, install these two dependencies into your Python environment:

pip install -q -U google-genai python-dotenv
  • google-genai — Google's official Python SDK for Gemini
  • python-dotenv — loads .env files into environment variables

Handling Authentication


To use the Gemini API, you need an API key — a credential that identifies your application to Google's servers and authorizes API calls against your account.

Getting a Gemini API Key

  1. Go to Google AI Studio
  2. Sign in with a Google account
  3. Click "Get API key" in the top nav
  4. Create a new key (or use an existing one from a project)
  5. Copy the key — it starts with AIza...

Storing Your Key Safely

Never hardcode your API key in source code. If you push a hardcoded key to GitHub, automated scanners will find it within minutes and it will be compromised.

The correct approach is to store it as an environment variable within a .env file and read it at runtime.

The Gemini SDK expects the key in an environment variable named exactly GEMINI_API_KEY. Use that name, or authentication will fail.

Step 1 — Create a .env file in your project root:

# .env
GEMINI_API_KEY=AIzaSy...your-key-here

Step 2 — Add .env to your .gitignore:

# .gitignore
.env

Step 3 — Read the key in Python using python-dotenv:

import os

from dotenv import load_dotenv
from google import genai

load_dotenv()  # reads .env into os.environ

api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise ValueError("GEMINI_API_KEY not set. Check your .env file.")

client = genai.Client(api_key=api_key)

This pattern — load from environment, validate it exists, raise clearly if it doesn't — is the standard approach across all API integrations, not just Gemini.
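Because the pattern is identical for every provider, it's worth factoring into a small reusable helper (the `require_env` name is my own, not a library function):

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, raising a clear error if unset."""
    value = os.getenv(name)
    if not value:
        raise ValueError(f"{name} not set. Check your .env file.")
    return value


os.environ["GEMINI_API_KEY"] = "AIza-example"  # simulated; normally loaded from .env
print(require_env("GEMINI_API_KEY"))
```

Failing loudly at startup is far easier to debug than an authentication error buried deep inside an API call.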


Communicating with the AI


With a model, a system prompt, and an API key in place, you can send requests to the Gemini API and receive responses from your selected model.

client = genai.Client(api_key=api_key)  # authenticated client from the previous step

user_input = input("What is your question?\n")
response = client.models.generate_content(
    model=MODEL_NAME,
    contents=SYSTEM_PROMPT + "\n\nUser question: " + user_input,
)
print(response.text)

Our Python script can now:

  1. Load the API key from a safe location outside the source code
  2. Create a Client instance authenticated with that key
  3. Capture the user's Python question from the terminal
  4. Send a request to the Gemini API with the chosen model, system prompt, and user input
  5. Print the model's response

Conclusion

You've moved from typing prompts in a browser to sending them programmatically — and that shift matters. Every AI-powered application, from a customer support bot to a code reviewer to a document summarizer, is built on exactly this foundation: a system prompt that defines the mission, a model chosen for the task, secure authentication, and a call that returns a response.

What you built is minimal by design. A single-turn script that takes one question and returns one answer. That's intentional — understanding the simplest version of the loop before adding complexity is how you avoid building on shaky ground.

In the next lesson you'll extend this foundation. You'll learn how to maintain conversation history across multiple turns, adjust model parameters to control response behavior, and structure your code so the AI component is a clean, reusable piece of a larger application — not just a script that runs once and exits.