# Performance Optimization Summary

## 🎯 Achievement

**7-8x performance improvement** achieved through systematic optimizations!
- Sequential: 5.12s
- Optimized: 0.67s
- Speedup: 7.68x
- Time saved: 87%
## 📊 Final Benchmark Results

### Comprehensive Comparison (15 requests)
| Method | Time | Req/s | Apps/s | Speedup |
|---|---|---|---|---|
| Asyncio+Executor | 0.67s | 22.4 | 2,248 | 7.68x 🥇 |
| Batch-ALL | 0.69s | 21.7 | 2,175 | 7.43x 🥈 |
| Batch-5chunk | 1.24s | 12.1 | 1,214 | 4.15x |
| ThreadPool-10 | 1.24s | 12.1 | 1,205 | 4.12x |
| ThreadPool-5 | 1.32s | 11.4 | 1,136 | 3.88x |
| Sequential | 5.12s | 2.9 | 293 | 1.00x 🐌 |
### Smaller Scale (9 requests)
| Method | Time | Apps | Speedup |
|---|---|---|---|
| Batch-ALL | 0.68s | 900 | 4.98x 🏆 |
| Batch-3chunk | 1.08s | 900 | 3.14x |
| Sequential | 3.39s | 900 | 1.00x |
### Larger Scale (25 requests)
| Method | Time | Req/s | Speedup |
|---|---|---|---|
| Batch-ALL | 1.25s | 20.05 | 7.97x 🏆 |
| Batch-5chunk | 2.82s | 8.85 | 3.52x |
| Sequential | 9.94s | 2.51 | 1.00x |
## 🔧 Implemented Optimizations

### 1. Batch Processing Functions ⚡⚡⚡ (CRITICAL)
Problem: multiple `block_on` calls create overhead.

```python
# ❌ Before: 15 block_on calls
for req in requests:
    result = fetch_and_parse_list(...)  # each call: runtime enter/exit
```
Solution: a single `block_on` that drives all futures in parallel.

```rust
// ✅ After: 1 block_on call
runtime.block_on(async {
    let futures = requests.iter().map(|req| fetch(...));
    try_join_all(futures).await // true parallelism!
})
```

Impact: 7-8x faster!
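For context, here is a minimal, self-contained sketch of the same pattern. It assumes plain `tokio`, `futures`, and `reqwest` rather than playfast's internal types, so the names are illustrative only:

```rust
use futures::future::try_join_all;

// Sketch: fetch a whole batch of URLs under a single block_on.
fn fetch_all(urls: Vec<String>) -> Result<Vec<String>, reqwest::Error> {
    let runtime = tokio::runtime::Runtime::new().expect("failed to build runtime");
    // Enter the runtime once for the whole batch, not once per request.
    runtime.block_on(async {
        let client = reqwest::Client::new();
        let futures = urls.into_iter().map(|url| {
            let client = client.clone();
            async move { client.get(url).send().await?.text().await }
        });
        // Drive every request concurrently; fail fast on the first error.
        try_join_all(futures).await
    })
}
```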
### 2. Global HTTP Client ⚡⚡ (HIGH)

Before: a new HTTP client (and its connection pool) was constructed for every call.

After: one lazily-initialized global client is shared by all requests.

```rust
use once_cell::sync::Lazy;

static HTTP_CLIENT: Lazy<PlayStoreClient> = Lazy::new(|| {
    PlayStoreClient::new(30).expect("Failed to create HTTP client")
});
```
Benefits:
- TCP connection reuse
- Connection pooling
- ~10% performance improvement
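The same idea as a standalone sketch with a plain `reqwest` client (the 30-second timeout mirrors `PlayStoreClient::new(30)` above; the rest is illustrative):

```rust
use std::time::Duration;

use once_cell::sync::Lazy;

// One process-wide client: the connection pool outlives individual calls.
static CLIENT: Lazy<reqwest::Client> = Lazy::new(|| {
    reqwest::Client::builder()
        .timeout(Duration::from_secs(30))
        .build()
        .expect("failed to build HTTP client")
});

async fn fetch(url: &str) -> Result<String, reqwest::Error> {
    // Reuses pooled TCP/TLS connections instead of re-handshaking per request.
    CLIENT.get(url).send().await?.text().await
}
```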
### 3. CPU-Aware Tokio Runtime ⚡⚡ (HIGH)

Before: a fixed worker-thread count, regardless of the host machine.

After: the worker-thread count is derived from the available cores.

```rust
let num_cpus = std::thread::available_parallelism()
    .map(|n| n.get())
    .unwrap_or(4);
let worker_threads = (num_cpus / 2).clamp(2, 8);
```
Benefits:
- Adapts to system resources
- Leaves CPU for Python threads
- ~5% performance improvement
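Continuing the snippet above, wiring that count into the runtime uses the standard tokio builder (a sketch; the sizing logic is the part playfast contributes):

```rust
// Sketch: build a multi-threaded runtime sized from the host CPU count.
let runtime = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(worker_threads) // half the cores, clamped to [2, 8]
    .enable_all() // enable the timer and I/O drivers
    .build()
    .expect("Failed to build Tokio runtime");
```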
### 4. Memory Optimization ⚡ (Nice-to-have)

String interning (15 requests; see the illustration below):

- Without: ~3,870 bytes
- With: ~630 bytes
- Savings: 83.7%
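The effect is easy to reproduce with `sys.intern` alone; this standalone illustration (not playfast code) shows duplicates collapsing into one shared object:

```python
import sys

parts = ("GAME", "ACTION")
raw = ["_".join(parts) for _ in range(1000)]  # 1,000 distinct but equal strings
interned = [sys.intern(s) for s in raw]  # 1,000 references to one object

assert raw[0] == raw[1] and raw[0] is not raw[1]  # equal, yet separate objects
assert interned[0] is interned[1]  # one shared object

per_copy = sys.getsizeof(raw[0])
print(f"~{per_copy * len(raw):,} bytes as copies vs ~{per_copy:,} bytes interned")
```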
BatchRequestBuilder:

```python
from playfast import BatchRequestBuilder

builder = BatchRequestBuilder(
    collection="topselling_free", lang="en", intern_strings=True
)
requests = list(builder.build_list_requests(countries=countries, categories=categories))
```
## 📈 Performance Analysis

### Key Insights

#### 1. block_on Count is Critical
```text
Sequential   (15x block_on): 5.12s
Batch-5chunk  (3x block_on): 1.24s (4.1x faster)
Batch-ALL     (1x block_on): 0.69s (7.4x faster)
```

Conclusion: minimize `block_on` calls!
#### 2. Batch vs Asyncio: Nearly Identical

Why?

- Both use Python 3.14t free-threading
- Both achieve true parallelism
- Asyncio has slightly less overhead
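The benchmark's "Asyncio+Executor" row corresponds to a pattern like the following sketch (the real script lives in the `benchmarks/` folder; the grouping here is illustrative):

```python
import asyncio

from playfast import fetch_category_lists


async def fetch_groups(country_groups, categories):
    # Each blocking batch call runs on the default thread-pool executor;
    # gather drives the groups concurrently on the free-threaded build.
    tasks = [
        asyncio.to_thread(fetch_category_lists, countries=group, categories=categories)
        for group in country_groups
    ]
    return await asyncio.gather(*tasks)


results = asyncio.run(fetch_groups([["us", "kr"], ["jp", "de"]], ["GAME_ACTION"]))
```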
#### 3. Chunking Trade-off

Use chunking when (see the sketch below):

- You have memory constraints
- You need progress updates
- You want finer-grained error handling
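A sketch of chunked fetching with progress reporting, built from the low-level batch call described later in this document (error handling elided):

```python
from playfast import BatchRequestBuilder
from playfast._core import fetch_and_parse_list_batch


def fetch_chunked(countries, categories, chunk_size=5):
    builder = BatchRequestBuilder()
    requests = list(builder.build_list_requests(countries=countries, categories=categories))
    results = []
    for start in range(0, len(requests), chunk_size):
        chunk = requests[start : start + chunk_size]
        results.extend(fetch_and_parse_list_batch(chunk))  # one block_on per chunk
        print(f"progress: {min(start + chunk_size, len(requests))}/{len(requests)}")
    return results
```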
#### 4. Memory vs Performance
| Optimization | Performance Impact | Memory Savings |
|---|---|---|
| Batch functions | +700% 🚀 | - |
| block_on reduction | +700% 🚀 | - |
| Global HTTP client | +10% | Connection pool |
| String interning | < 5% | 83.7% 💾 |
| Generators | 0% | Lazy allocation |
Priority: Performance > Memory (but we got both!)
## 💡 Usage Guide

### When to Use What

#### Small Batches (< 10 requests) - Simple API
```python
# NEW: high-level API - the simplest and most intuitive option
from playfast import fetch_category_lists

results = fetch_category_lists(
    countries=["us", "kr", "jp"], categories=["GAME_ACTION", "SOCIAL"], num_results=100
)
```
#### Medium Batches (10-50 requests) - Organized Results
```python
# NEW: results organized by country and category
from playfast import fetch_top_apps

organized = fetch_top_apps(
    countries=["us", "kr", "jp"], categories=["GAME_ACTION", "SOCIAL"], num_results=100
)

# Easy access
us_games = organized["us"]["GAME_ACTION"]
kr_social = organized["kr"]["SOCIAL"]
```
#### Large Batches (50+ requests) - Builder Pattern
```python
# NEW: use BatchFetcher for shared defaults and reuse
from playfast import BatchFetcher

fetcher = BatchFetcher(
    lang="en", default_num_results=100, default_collection="topselling_free"
)

# Multiple fetches with shared defaults
batch1 = fetcher.category_lists(
    countries=["us", "kr", "jp"], categories=["GAME_ACTION", "SOCIAL"]
)
batch2 = fetcher.category_lists(
    countries=["de", "gb", "fr"], categories=["PRODUCTIVITY", "ENTERTAINMENT"]
)
```
#### Complex Workflows - AsyncIO Integration
```python
# Asyncio for integration with other async code
import asyncio

from playfast import fetch_category_lists


async def complex_workflow():
    # Mix with other async operations
    other_data = await fetch_something_else()

    # Use asyncio.to_thread for the blocking batch functions
    rust_results = await asyncio.to_thread(
        fetch_category_lists,
        countries=["us", "kr"],
        categories=["GAME_ACTION"],
        num_results=100,
    )
    return combine(other_data, rust_results)
```
#### Advanced: Low-Level API (if needed)
```python
# For advanced users who need fine-grained control
from playfast import BatchRequestBuilder
from playfast._core import fetch_and_parse_list_batch

builder = BatchRequestBuilder()
requests = list(builder.build_list_requests(countries, categories))
results = fetch_and_parse_list_batch(requests)
```
## 🎯 Best Practices

### DO ✅
- Use the high-level batch functions (NEW!)

  ```python
  # ✅ Best: simple and intuitive
  from playfast import fetch_category_lists

  results = fetch_category_lists(
      countries=["us", "kr"], categories=["GAME_ACTION"], num_results=100
  )
  ```
- Use organized results for easy access

  ```python
  # ✅ Good: easy to navigate
  from playfast import fetch_top_apps

  organized = fetch_top_apps(countries, categories)
  us_games = organized["us"]["GAME_ACTION"]
  ```
- Use BatchFetcher for multiple batches

  ```python
  # ✅ Good: reuse defaults
  from playfast import BatchFetcher

  fetcher = BatchFetcher(lang="en", default_num_results=100)
  batch1 = fetcher.category_lists(countries1, categories)
  batch2 = fetcher.category_lists(countries2, categories)
  ```
- Use asyncio.to_thread for complex workflows

  ```python
  # ✅ Good: integrates with other async code
  results = await asyncio.to_thread(fetch_category_lists, countries, categories)
  ```
### DON'T ❌

- Don't access `_core` directly (unless needed)

  ```python
  # ❌ Bad: low-level API (harder to use)
  from playfast._core import fetch_and_parse_list_batch

  requests = build_requests_manually()
  results = fetch_and_parse_list_batch(requests)

  # ✅ Good: high-level API (easier)
  from playfast import fetch_category_lists

  results = fetch_category_lists(countries, categories)
  ```
- Don't call single-request functions in a loop

  ```python
  # ❌ Bad: multiple block_on calls
  from playfast._core import fetch_and_parse_list

  for country in countries:
      result = fetch_and_parse_list(...)  # slow!

  # ✅ Good: a single batch call
  from playfast import fetch_category_lists

  results = fetch_category_lists(countries, categories)
  ```
- Don't use executors for simple cases

  ```python
  # ❌ Bad: unnecessary complexity
  with ThreadPoolExecutor() as executor:
      futures = [executor.submit(...) for ...]

  # ✅ Good: just use the batch functions
  results = fetch_category_lists(countries, categories)
  ```
## 📚 Documentation

- `examples/03_batch_api.py` - high-level API examples
- MEMORY_OPTIMIZATION.md - memory analysis and BatchRequestBuilder
- PERFORMANCE_IMPROVEMENTS.md - detailed technical implementation
- `benchmarks/` folder - all benchmark scripts
## 🏆 Summary

### What We Achieved
| Metric | Before | After | Improvement |
|---|---|---|---|
| Time (15 req) | 5.12s | 0.67s | 87% faster |
| Throughput | 293 apps/s | 2,248 apps/s | 7.68x |
| block_on calls | 15 | 1 | 93% reduction |
| Memory (10K req) | ~5 MB | ~0.9 MB | 83% reduction |
| Code complexity | Same | Same | No regression |
### Key Takeaways

1. The new high-level API is easier 🎉
    - No `_core` access needed
    - No manual request building
    - Organized results available
    - Example: `fetch_category_lists(countries, categories)`
2. Batch processing = 7-8x speedup 🚀
    - The most critical optimization
    - Simple to use with the new API
    - Works out of the box
3. block_on count matters most ⚡
    - 15 calls → 1 call = 7.4x faster
    - Even chunking (3 calls) = 4.1x faster
4. Memory optimizations are free 💾
    - Python auto-interns literals
    - BatchRequestBuilder adds < 5% overhead
    - 83.7% memory savings
5. API levels for different needs 🎯
    - High-level: `fetch_category_lists()` - easy
    - Mid-level: `BatchRequestBuilder` - flexible
    - Low-level: `_core.*` - advanced control
### Future Improvements

- Automatic batch-size optimization
- Per-request error handling (return a `Result` for each request)
- Request prioritization within batches
- Streaming batch results
- Further Python 3.14t optimizations
Last Updated: 2025-10-13

Benchmark Environment: Windows, 16-core CPU, Python 3.14t