5.3. Remote MLflow Tracking

While local MLflow storage is convenient for individual work, remote MLflow servers enable team collaboration, centralized experiment management, and production deployments.

5.3.1. Overview

By default, mloptimizer uses local file-based MLflow tracking. However, you can configure it to use remote MLflow tracking servers by setting the tracking URI before creating your GeneticSearch instance.

mloptimizer fully supports remote MLflow servers - no special configuration is needed beyond setting the tracking URI using the MLflow API.

5.3.2. Configuration Methods

5.3.2.2. Method 2: Environment Variable

Set the MLFLOW_TRACKING_URI environment variable:

export MLFLOW_TRACKING_URI=http://mlflow-server.company.com:5000
python your_optimization_script.py

In your script, no tracking URI configuration is needed:

from mloptimizer.interfaces import GeneticSearch

# Will automatically use MLFLOW_TRACKING_URI
opt = GeneticSearch(..., use_mlflow=True)
opt.fit(X, y)

5.3.2.3. Configuration Priority

MLflow resolves the tracking URI in this priority order:

  1. Explicit mlflow.set_tracking_uri() call (highest priority)

  2. MLFLOW_TRACKING_URI environment variable

  3. Default local file-based (./mlruns/)
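The priority order above can be illustrated with a small standalone sketch. This is not MLflow's own code: the function name `resolve_tracking_uri` and the constant `DEFAULT_URI` are hypothetical, and the function only mimics the documented order so you can reason about which setting wins.

```python
import os

# Assumed default, taken from the list above (local file-based store).
DEFAULT_URI = "./mlruns"


def resolve_tracking_uri(explicit_uri=None, environ=None):
    """Mimic the documented order: explicit call > environment variable > local default."""
    environ = os.environ if environ is None else environ
    if explicit_uri:
        # Corresponds to a prior mlflow.set_tracking_uri(...) call.
        return explicit_uri
    env_uri = environ.get("MLFLOW_TRACKING_URI")
    if env_uri:
        return env_uri
    return DEFAULT_URI


env = {"MLFLOW_TRACKING_URI": "http://mlflow-server.company.com:5000"}
print(resolve_tracking_uri("http://127.0.0.1:5000", env))  # explicit value wins
print(resolve_tracking_uri(None, env))                     # falls back to the variable
print(resolve_tracking_uri(None, {}))                      # falls back to the local default
```

In a real script, `mlflow.get_tracking_uri()` reports the value MLflow actually resolved.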

5.3.3. Starting a Local MLflow Server

You can run MLflow in server mode on your local machine to test a remote configuration:

5.3.3.1. Basic Local Server

mlflow server --host 127.0.0.1 --port 5000

5.3.3.2. With Database Backend

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 127.0.0.1 \
    --port 5000

Then configure your code to use it:

import mlflow
mlflow.set_tracking_uri("http://127.0.0.1:5000")

5.3.4. Production MLflow Server Setup

For production environments, use a robust database backend and remote artifact storage.

5.3.4.1. PostgreSQL Backend with S3 Artifacts

mlflow server \
    --backend-store-uri postgresql://user:password@host:5432/mlflow_db \
    --default-artifact-root s3://your-bucket/mlflow-artifacts \
    --host 0.0.0.0 \
    --port 5000

Then in your code:

import mlflow
import os
from mloptimizer.interfaces import GeneticSearch

# Configure AWS credentials (if needed).
# Placeholders only: do not hardcode real secrets in source files.
os.environ['AWS_ACCESS_KEY_ID'] = 'your-key'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'your-secret'

# Point to production server
mlflow.set_tracking_uri("http://mlflow-prod.company.com:5000")

# Run optimization
opt = GeneticSearch(..., use_mlflow=True)
opt.fit(X, y)

5.3.4.2. MySQL Backend with Azure Artifacts

mlflow server \
    --backend-store-uri mysql://user:password@host:3306/mlflow_db \
    --default-artifact-root wasbs://container@account.blob.core.windows.net/mlflow \
    --host 0.0.0.0 \
    --port 5000

5.3.5. Cloud-Based MLflow

5.3.5.1. Databricks MLflow

Databricks provides managed MLflow hosting:

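import mlflow
from mloptimizer.interfaces import GeneticSearch

# Configure Databricks MLflow
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Users/your-name/model-optimization")

# Requires Databricks authentication to be configured, for example a CLI profile
# or the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables
opt = GeneticSearch(..., use_mlflow=True)
opt.fit(X, y)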

5.3.5.2. AWS Managed MLflow

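If using Amazon SageMaker managed MLflow, the tracking URI is typically the ARN of the tracking server rather than an HTTPS endpoint, and the client needs the sagemaker-mlflow plugin and valid AWS credentials:

import mlflow
from mloptimizer.interfaces import GeneticSearch

# Requires: pip install sagemaker-mlflow
# Replace the placeholders with your region, account ID, and tracking server name
mlflow.set_tracking_uri("arn:aws:sagemaker:<region>:<account-id>:mlflow-tracking-server/<server-name>")
opt = GeneticSearch(..., use_mlflow=True)
opt.fit(X, y)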

5.3.6. Team Collaboration Example

Setting up MLflow for team collaboration:

5.3.6.1. Server Setup (DevOps)

# On dedicated MLflow server
mlflow server \
    --backend-store-uri postgresql://mlflow:password@db.company.com:5432/mlflow \
    --default-artifact-root s3://company-mlflow/artifacts \
    --host 0.0.0.0 \
    --port 80

5.3.6.2. Team Member Configuration

Each team member configures the same tracking URI:

import mlflow
from sklearn.ensemble import RandomForestClassifier
from mloptimizer.interfaces import GeneticSearch

# Point to shared server
mlflow.set_tracking_uri("http://mlflow.company.com")

# Use team experiment
mlflow.set_experiment("team_model_optimization")

# Run optimization - visible to whole team
# (space, X and y are assumed to be defined earlier in your script)
opt = GeneticSearch(
    estimator_class=RandomForestClassifier,
    hyperparam_space=space,
    use_mlflow=True
)

opt.fit(X, y)

All runs are logged to the central server and visible to the entire team via the MLflow UI at http://mlflow.company.com

5.3.7. Viewing Results from Remote Server

5.3.7.1. MLflow UI from Remote Server

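The MLflow server serves its web UI by default, so you can open the server URL in a browser:

http://mlflow-server.company.com:5000

A local mlflow ui process cannot proxy a remote HTTP tracking server, because --backend-store-uri accepts only file paths and database URIs. If you have direct access to the server's database, you can point a local UI at the same backend store:

mlflow ui --backend-store-uri postgresql://user:password@host:5432/mlflow_db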

5.3.7.2. Python API with Remote Server

import mlflow

# Configure remote server
mlflow.set_tracking_uri("http://mlflow-server.company.com:5000")

# Query runs from remote server
runs = mlflow.search_runs(experiment_ids=["1"])
print(f"Found {len(runs)} runs on remote server")
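To look at only the best runs of an experiment instead of all of them, `mlflow.search_runs` also accepts `experiment_names`, `order_by`, and `max_results`. The sketch below is a minimal illustration under assumptions: the helper names `order_clause` and `top_runs` are hypothetical, and the metric name you pass must match whatever your optimization actually logged.

```python
def order_clause(metric_name, descending=True):
    """Build an order_by entry such as 'metrics.accuracy DESC'."""
    direction = "DESC" if descending else "ASC"
    return f"metrics.{metric_name} {direction}"


def top_runs(tracking_uri, experiment_name, metric_name, n=5):
    """Return the n best runs of one experiment as a pandas DataFrame."""
    # Imported lazily so that order_clause stays usable without MLflow installed.
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=[order_clause(metric_name)],
        max_results=n,
    )
```

For example, `top_runs("http://mlflow-server.company.com:5000", "team_model_optimization", "accuracy")` would return up to five rows sorted by a metric called `accuracy`, assuming such a metric exists in that experiment.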

5.3.8. Verification Test

The existing MLflow test demonstrates remote server usage:

# From mloptimizer/test/interfaces/api/test_genetic_search_mlflow_tracking.py

MLFLOW_PORT = 5001
MLFLOW_TRACKING_URI = f"http://127.0.0.1:{MLFLOW_PORT}"

def test_genetic_search_creates_mlflow_runs(mlflow_server):
    # Configure remote server
    mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

    # Run optimization - logs to remote server
    search = GeneticSearch(
        estimator_class=DecisionTreeClassifier,
        hyperparam_space=space,
        use_mlflow=True
    )
    search.fit(X_train, y_train)

    # Verify runs logged to remote server
    client = MlflowClient(tracking_uri=MLFLOW_TRACKING_URI)
    runs = client.search_runs([experiment.experiment_id])
    assert len(runs) > 0  # Runs found on remote server

Run this test to verify remote MLflow works:

pytest mloptimizer/test/interfaces/api/test_genetic_search_mlflow_tracking.py -v
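The `mlflow_server` fixture used by the test is not shown in the excerpt. As a rough idea of what such a fixture has to do, the sketch below starts `mlflow server` as a child process and waits until the port accepts connections. The function names `wait_for_port` and `start_mlflow_server` are hypothetical and this is not the project's actual fixture.

```python
import socket
import subprocess
import time


def wait_for_port(host, port, timeout=30.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Server not listening yet; retry after a short pause.
            time.sleep(interval)
    return False


def start_mlflow_server(port=5001, host="127.0.0.1"):
    """Launch `mlflow server` and block until it accepts connections."""
    proc = subprocess.Popen(
        ["mlflow", "server", "--host", host, "--port", str(port)]
    )
    if not wait_for_port(host, port):
        proc.terminate()
        raise RuntimeError(f"MLflow server did not start on {host}:{port}")
    return proc  # the caller is responsible for calling proc.terminate()
```

Inside a pytest fixture, you would typically call `start_mlflow_server()`, `yield`, and then terminate the returned process during teardown.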

5.3.9. Security Considerations

When using remote MLflow servers:

5.3.9.1. Authentication

For production servers, enable authentication:

mlflow server \
    --backend-store-uri postgresql://... \
    --default-artifact-root s3://... \
    --host 0.0.0.0 \
    --port 5000 \
    --app-name basic-auth

Configure credentials:

import os
import mlflow

# Placeholders: in practice, set these in the shell or a secret manager, not in code
os.environ['MLFLOW_TRACKING_USERNAME'] = 'your-username'
os.environ['MLFLOW_TRACKING_PASSWORD'] = 'your-password'

mlflow.set_tracking_uri("http://mlflow-server.company.com:5000")

5.3.9.2. HTTPS

Use HTTPS for production:

mlflow.set_tracking_uri("https://mlflow-server.company.com")
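A script can guard against accidentally sending credentials over plain HTTP before it sets the tracking URI. The following is a minimal sketch, assuming that plain HTTP is acceptable only for servers on the local machine. The name `uses_safe_transport` and the `LOCAL_HOSTS` set are hypothetical choices for illustration.

```python
from urllib.parse import urlparse

# Hosts for which unencrypted HTTP is tolerated (local testing only).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def uses_safe_transport(uri):
    """Return True for HTTPS, or for plain HTTP that stays on the local machine."""
    parsed = urlparse(uri)
    if parsed.scheme == "https":
        return True
    if parsed.scheme == "http":
        return parsed.hostname in LOCAL_HOSTS
    # File paths, "databricks", ARNs and similar are outside the scope of this check.
    raise ValueError(f"Not an HTTP(S) tracking URI: {uri!r}")


print(uses_safe_transport("https://mlflow-server.company.com"))
print(uses_safe_transport("http://mlflow-server.company.com:5000"))
print(uses_safe_transport("http://127.0.0.1:5000"))
```

Calling this check first and refusing to continue when it returns `False` keeps a misconfigured URI from silently downgrading a production run to HTTP.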

5.3.9.3. Network Security

  • Configure firewalls to restrict MLflow server access

  • Use VPN for accessing internal MLflow servers

  • Implement proper database access controls

5.3.10. Troubleshooting

5.3.10.1. Cannot Connect to Server

# Test connection
import mlflow
mlflow.set_tracking_uri("http://mlflow-server.company.com:5000")

try:
    experiments = mlflow.search_experiments()
    print(f"✓ Connected successfully. Found {len(experiments)} experiments.")
except Exception as e:
    print(f"✗ Connection failed: {e}")

Common issues:

  • Server not running: Check mlflow server process

  • Firewall: Verify port is open

  • Wrong URL: Check hostname and port

  • Network: Verify connectivity with ping or curl
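To separate network problems from MLflow client problems, the checks above can be approximated without MLflow at all by requesting the server's `/health` endpoint with the standard library. This is a sketch under assumptions: it presumes your server version exposes `/health`, and the helper names `health_url` and `server_is_healthy` are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(tracking_uri):
    """Build the health-check URL for a tracking server base URI."""
    return tracking_uri.rstrip("/") + "/health"


def server_is_healthy(tracking_uri, timeout=5.0):
    """Return True if the server answers the health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(tracking_uri), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP error codes.
        return False
```

If `server_is_healthy("http://mlflow-server.company.com:5000")` returns `False` while the server process is running, the cause is more likely the firewall, hostname, or port than mloptimizer or the MLflow client.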

5.3.10.2. Different Results on Different Machines

Ensure all team members use the same tracking URI:

import mlflow
print(f"Current tracking URI: {mlflow.get_tracking_uri()}")

5.3.10.3. Slow Performance

For large artifacts, use appropriate storage:

  • Local server: Use SSD storage

  • Remote server: Use S3/Azure/GCS for artifacts

  • Database: Use PostgreSQL/MySQL instead of SQLite

Tip

Start with a local MLflow server for development, then migrate to a production server with database backend and cloud storage as your needs grow.

Note

Remote MLflow servers require network connectivity. Ensure your optimization runs can reach the server, or use local fallback for offline development.

Warning

Storing credentials in code is insecure. Use environment variables or secure secret management for production deployments.
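One way to follow this advice is to export the credentials in the shell or inject them from a secret manager, and have the script only verify that they are present. The sketch below assumes the basic-auth variables shown in the Authentication section. The helper names `require_env` and `check_tracking_credentials` are hypothetical.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or fail with a clear message."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the optimization")
    return value


def check_tracking_credentials(environ=None):
    """Verify that both basic-auth variables are present, without storing them in code."""
    for name in ("MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD"):
        require_env(name, environ)
```

Calling `check_tracking_credentials()` at the top of a script makes a missing secret fail early with a readable error instead of an authentication failure in the middle of a run.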