# Performance Optimization Summary

## 🎯 Achievement

**7-8x performance improvement** achieved through systematic optimizations!
- Sequential: 5.12s
- Optimized: 0.67s
- Speedup: 7.68x
- Time saved: 87%
## 📊 Final Benchmark Results

### Comprehensive Comparison (15 requests)
| Method | Time | Req/s | Apps/s | Speedup |
|---|---|---|---|---|
| Asyncio+Executor | 0.67s | 22.4 | 2,248 | 7.68x 🥇 |
| Batch-ALL | 0.69s | 21.7 | 2,175 | 7.43x 🥈 |
| Batch-5chunk | 1.24s | 12.1 | 1,214 | 4.15x |
| ThreadPool-10 | 1.24s | 12.1 | 1,205 | 4.12x |
| ThreadPool-5 | 1.32s | 11.4 | 1,136 | 3.88x |
| Sequential | 5.12s | 2.9 | 293 | 1.00x 🐌 |
### Smaller Scale (9 requests)
| Method | Time | Apps | Speedup |
|---|---|---|---|
| Batch-ALL | 0.68s | 900 | 4.98x 🏆 |
| Batch-3chunk | 1.08s | 900 | 3.14x |
| Sequential | 3.39s | 900 | 1.00x |
### Larger Scale (25 requests)
| Method | Time | Req/s | Speedup |
|---|---|---|---|
| Batch-ALL | 1.25s | 20.05 | 7.97x 🏆 |
| Batch-5chunk | 2.82s | 8.85 | 3.52x |
| Sequential | 9.94s | 2.51 | 1.00x |
## 🔧 Implemented Optimizations

### 1. Batch Processing Functions ⚡⚡⚡ (CRITICAL)
Problem: multiple `block_on` calls create overhead.

```python
# ❌ Before: 15 block_on calls
for req in requests:
    result = fetch_and_parse_list(...)  # each call: runtime enter/exit
```
Solution: a single `block_on` that drives all futures in parallel.

```rust
// ✅ After: 1 block_on call
runtime.block_on(async {
    let futures = requests.iter().map(|req| fetch(...));
    try_join_all(futures).await // true parallelism!
})
```

Impact: 7-8x faster!
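For context, here is a minimal, self-contained sketch of the same pattern. It assumes plain `tokio`, `futures`, and `reqwest` rather than playfast's internal types, so the names are illustrative only:

```rust
use futures::future::try_join_all;

// Sketch: fetch a whole batch of URLs under a single block_on.
fn fetch_all(urls: Vec<String>) -> Result<Vec<String>, reqwest::Error> {
    let runtime = tokio::runtime::Runtime::new().expect("failed to build runtime");
    // Enter the runtime once for the whole batch, not once per request.
    runtime.block_on(async {
        let client = reqwest::Client::new();
        let futures = urls.into_iter().map(|url| {
            let client = client.clone();
            async move { client.get(url).send().await?.text().await }
        });
        // Drive every request concurrently; fail fast on the first error.
        try_join_all(futures).await
    })
}
```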
### 2. Global HTTP Client ⚡⚡ (HIGH)

Before: a new HTTP client (and its connection pool) was constructed for every call.

After: one lazily-initialized global client is shared by all requests.

```rust
use once_cell::sync::Lazy;

static HTTP_CLIENT: Lazy<PlayStoreClient> = Lazy::new(|| {
    PlayStoreClient::new(30).expect("Failed to create HTTP client")
});
```
Benefits:
- TCP connection reuse
- Connection pooling
- ~10% performance improvement
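The same idea as a standalone sketch with a plain `reqwest` client (the 30-second timeout mirrors `PlayStoreClient::new(30)` above; the rest is illustrative):

```rust
use std::time::Duration;

use once_cell::sync::Lazy;

// One process-wide client: the connection pool outlives individual calls.
static CLIENT: Lazy<reqwest::Client> = Lazy::new(|| {
    reqwest::Client::builder()
        .timeout(Duration::from_secs(30))
        .build()
        .expect("failed to build HTTP client")
});

async fn fetch(url: &str) -> Result<String, reqwest::Error> {
    // Reuses pooled TCP/TLS connections instead of re-handshaking per request.
    CLIENT.get(url).send().await?.text().await
}
```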
### 3. CPU-Aware Tokio Runtime ⚡⚡ (HIGH)

Before: a fixed worker-thread count, regardless of the host machine.

After: the worker-thread count is derived from the available cores.

```rust
let num_cpus = std::thread::available_parallelism()
    .map(|n| n.get())
    .unwrap_or(4);
let worker_threads = (num_cpus / 2).clamp(2, 8);
```
Benefits:
- Adapts to system resources
- Leaves CPU for Python threads
- ~5% performance improvement
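Continuing the snippet above, wiring that count into the runtime uses the standard tokio builder (a sketch; the sizing logic is the part playfast contributes):

```rust
// Sketch: build a multi-threaded runtime sized from the host CPU count.
let runtime = tokio::runtime::Builder::new_multi_thread()
    .worker_threads(worker_threads) // half the cores, clamped to [2, 8]
    .enable_all() // enable the timer and I/O drivers
    .build()
    .expect("Failed to build Tokio runtime");
```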
### 4. Memory Optimization ⚡ (Nice-to-have)

String interning (15 requests; see the illustration below):

- Without: ~3,870 bytes
- With: ~630 bytes
- Savings: 83.7%
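The effect is easy to reproduce with `sys.intern` alone; this standalone illustration (not playfast code) shows duplicates collapsing into one shared object:

```python
import sys

parts = ("GAME", "ACTION")
raw = ["_".join(parts) for _ in range(1000)]  # 1,000 distinct but equal strings
interned = [sys.intern(s) for s in raw]  # 1,000 references to one object

assert raw[0] == raw[1] and raw[0] is not raw[1]  # equal, yet separate objects
assert interned[0] is interned[1]  # one shared object

per_copy = sys.getsizeof(raw[0])
print(f"~{per_copy * len(raw):,} bytes as copies vs ~{per_copy:,} bytes interned")
```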
BatchRequestBuilder:

```python
from playfast import BatchRequestBuilder

builder = BatchRequestBuilder(
    collection="topselling_free", lang="en", intern_strings=True
)
requests = list(builder.build_list_requests(countries=countries, categories=categories))
```
## 📈 Performance Analysis

### Key Insights

#### 1. block_on Count is Critical
```text
Sequential   (15x block_on): 5.12s
Batch-5chunk  (3x block_on): 1.24s (4.1x faster)
Batch-ALL     (1x block_on): 0.69s (7.4x faster)
```

Conclusion: minimize `block_on` calls!
#### 2. Batch vs Asyncio: Nearly Identical

Why?

- Both use Python 3.14t free-threading
- Both achieve true parallelism
- Asyncio has slightly less overhead
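The benchmark's "Asyncio+Executor" row corresponds to a pattern like the following sketch (the real script lives in the `benchmarks/` folder; the grouping here is illustrative):

```python
import asyncio

from playfast import fetch_category_lists


async def fetch_groups(country_groups, categories):
    # Each blocking batch call runs on the default thread-pool executor;
    # gather drives the groups concurrently on the free-threaded build.
    tasks = [
        asyncio.to_thread(fetch_category_lists, countries=group, categories=categories)
        for group in country_groups
    ]
    return await asyncio.gather(*tasks)


results = asyncio.run(fetch_groups([["us", "kr"], ["jp", "de"]], ["GAME_ACTION"]))
```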
#### 3. Chunking Trade-off

Use chunking when (see the sketch below):

- You have memory constraints
- You need progress updates
- You want finer-grained error handling
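A sketch of chunked fetching with progress reporting, built from the low-level batch call described later in this document (error handling elided):

```python
from playfast import BatchRequestBuilder
from playfast._core import fetch_and_parse_list_batch


def fetch_chunked(countries, categories, chunk_size=5):
    builder = BatchRequestBuilder()
    requests = list(builder.build_list_requests(countries=countries, categories=categories))
    results = []
    for start in range(0, len(requests), chunk_size):
        chunk = requests[start : start + chunk_size]
        results.extend(fetch_and_parse_list_batch(chunk))  # one block_on per chunk
        print(f"progress: {min(start + chunk_size, len(requests))}/{len(requests)}")
    return results
```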
#### 4. Memory vs Performance
| Optimization | Performance Impact | Memory Savings |
|---|---|---|
| Batch functions | +700% 🚀 | - |
| block_on reduction | +700% 🚀 | - |
| Global HTTP client | +10% | Connection pool |
| String interning | < 5% | 83.7% 💾 |
| Generators | 0% | Lazy allocation |
Priority: Performance > Memory (but we got both!)
## 💡 Usage Guide

### When to Use What

#### Small Batches (< 10 requests) - Simple API
```python
# NEW: high-level API - the simplest and most intuitive option
from playfast import fetch_category_lists

results = fetch_category_lists(
    countries=["us", "kr", "jp"], categories=["GAME_ACTION", "SOCIAL"], num_results=100
)
```
#### Medium Batches (10-50 requests) - Organized Results
```python
# NEW: results organized by country and category
from playfast import fetch_top_apps

organized = fetch_top_apps(
    countries=["us", "kr", "jp"], categories=["GAME_ACTION", "SOCIAL"], num_results=100
)

# Easy access
us_games = organized["us"]["GAME_ACTION"]
kr_social = organized["kr"]["SOCIAL"]
```
#### Large Batches (50+ requests) - Builder Pattern
```python
# NEW: use BatchFetcher for shared defaults and reuse
from playfast import BatchFetcher

fetcher = BatchFetcher(
    lang="en", default_num_results=100, default_collection="topselling_free"
)

# Multiple fetches with shared defaults
batch1 = fetcher.category_lists(
    countries=["us", "kr", "jp"], categories=["GAME_ACTION", "SOCIAL"]
)
batch2 = fetcher.category_lists(
    countries=["de", "gb", "fr"], categories=["PRODUCTIVITY", "ENTERTAINMENT"]
)
```
#### Complex Workflows - AsyncIO Integration
```python
# Asyncio for integration with other async code
import asyncio

from playfast import fetch_category_lists


async def complex_workflow():
    # Mix with other async operations
    other_data = await fetch_something_else()

    # Use asyncio.to_thread for the blocking batch functions
    rust_results = await asyncio.to_thread(
        fetch_category_lists,
        countries=["us", "kr"],
        categories=["GAME_ACTION"],
        num_results=100,
    )
    return combine(other_data, rust_results)
```
#### Advanced: Low-Level API (if needed)
```python
# For advanced users who need fine-grained control
from playfast import BatchRequestBuilder
from playfast._core import fetch_and_parse_list_batch

builder = BatchRequestBuilder()
requests = list(builder.build_list_requests(countries, categories))
results = fetch_and_parse_list_batch(requests)
```
## 🎯 Best Practices

### DO ✅
- Use the high-level batch functions (NEW!)

  ```python
  # ✅ Best: simple and intuitive
  from playfast import fetch_category_lists

  results = fetch_category_lists(
      countries=["us", "kr"], categories=["GAME_ACTION"], num_results=100
  )
  ```
- Use organized results for easy access

  ```python
  # ✅ Good: easy to navigate
  from playfast import fetch_top_apps

  organized = fetch_top_apps(countries, categories)
  us_games = organized["us"]["GAME_ACTION"]
  ```
- Use BatchFetcher for multiple batches

  ```python
  # ✅ Good: reuse defaults
  from playfast import BatchFetcher

  fetcher = BatchFetcher(lang="en", default_num_results=100)
  batch1 = fetcher.category_lists(countries1, categories)
  batch2 = fetcher.category_lists(countries2, categories)
  ```
- Use asyncio.to_thread for complex workflows

  ```python
  # ✅ Good: integrates with other async code
  results = await asyncio.to_thread(fetch_category_lists, countries, categories)
  ```
### DON'T ❌

- Don't access `_core` directly (unless needed)

  ```python
  # ❌ Bad: low-level API (harder to use)
  from playfast._core import fetch_and_parse_list_batch

  requests = build_requests_manually()
  results = fetch_and_parse_list_batch(requests)

  # ✅ Good: high-level API (easier)
  from playfast import fetch_category_lists

  results = fetch_category_lists(countries, categories)
  ```
- Don't call single-request functions in a loop

  ```python
  # ❌ Bad: multiple block_on calls
  from playfast._core import fetch_and_parse_list

  for country in countries:
      result = fetch_and_parse_list(...)  # slow!

  # ✅ Good: a single batch call
  from playfast import fetch_category_lists

  results = fetch_category_lists(countries, categories)
  ```
- Don't use executors for simple cases

  ```python
  # ❌ Bad: unnecessary complexity
  with ThreadPoolExecutor() as executor:
      futures = [executor.submit(...) for ...]

  # ✅ Good: just use the batch functions
  results = fetch_category_lists(countries, categories)
  ```
## 📚 Documentation

- `examples/03_batch_api.py` - high-level API examples
- MEMORY_OPTIMIZATION.md - memory analysis and BatchRequestBuilder
- PERFORMANCE_IMPROVEMENTS.md - detailed technical implementation
- `benchmarks/` folder - all benchmark scripts
## 🏆 Summary

### What We Achieved
| Metric | Before | After | Improvement |
|---|---|---|---|
| Time (15 req) | 5.12s | 0.67s | 87% faster |
| Throughput | 293 apps/s | 2,248 apps/s | 7.68x |
| block_on calls | 15 | 1 | 93% reduction |
| Memory (10K req) | ~5 MB | ~0.9 MB | 83% reduction |
| Code complexity | Same | Same | No regression |
### Key Takeaways

1. The new high-level API is easier 🎉
    - No `_core` access needed
    - No manual request building
    - Organized results available
    - Example: `fetch_category_lists(countries, categories)`
2. Batch processing = 7-8x speedup 🚀
    - The most critical optimization
    - Simple to use with the new API
    - Works out of the box
3. block_on count matters most ⚡
    - 15 calls → 1 call = 7.4x faster
    - Even chunking (3 calls) = 4.1x faster
4. Memory optimizations are free 💾
    - Python auto-interns literals
    - BatchRequestBuilder adds < 5% overhead
    - 83.7% memory savings
5. API levels for different needs 🎯
    - High-level: `fetch_category_lists()` - easy
    - Mid-level: `BatchRequestBuilder` - flexible
    - Low-level: `_core.*` - advanced control
### Future Improvements

- Automatic batch-size optimization
- Per-request error handling (return a `Result` for each request)
- Request prioritization within batches
- Streaming batch results
- Further Python 3.14t optimizations
Last Updated: 2025-10-13

Benchmark Environment: Windows, 16-core CPU, Python 3.14t