HTML Report Generation Prompt
# ============================================================================
# PLAYBOOK CONSTRUCTION SKELETON - BEST PRACTICES (GENERALIZED)
# ============================================================================
# A comprehensive guide for building data-driven HTML reports with interactivity
# ============================================================================
## OVERALL STRUCTURE
1. CONFIGURATION & SETUP
2. DATA LOADING (External Files)
3. DATA QUERYING (Primary Data)
4. DATA QUERYING (Detail/Subcategory Data)
5. HELPER FUNCTIONS
6. DATA PROCESSING & TRANSFORMATION
7. METRIC DEFINITIONS
8. TABLE/REPORT BUILDING
9. DATA SERIALIZATION (to JavaScript)
10. HTML GENERATION
11. FILE OUTPUT
# ============================================================================
# STEP 1: CONFIGURATION & SETUP
# ============================================================================
Purpose: Define constants, parameters, and report scope
- Organization/entity identifier
- Data segment combinations (e.g., categories, regions, products)
- Date ranges
- Business logic constants
Best Practices:
- Use descriptive constant names in UPPER_CASE
- Group related configurations together
- Document any non-obvious business rules
- Make it easy to add/remove segments
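The configuration step above might look like the following sketch. All names and values here are illustrative placeholders, not part of any required schema:

```python
# Hypothetical configuration block; identifiers and values are illustrative.
ORG_ID = "ORG-001"                       # organization/entity identifier
SEGMENTS = [                             # data segment combinations (easy to add/remove)
    ("Commercial", "North"),
    ("Commercial", "South"),
    ("Medicare", "North"),
]
REPORT_YEAR = 2024
DATE_RANGE = ("2024-01-01", "2024-12-31")
TARGET_TOLERANCE = 0.05                  # business rule: flag metrics >5% off target
```

Keeping segments as a flat list of tuples means adding a new combination is a one-line change.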
# ============================================================================
# STEP 2: LOAD EXTERNAL DATA (Excel/CSV files)
# ============================================================================
Purpose: Load reference data from uploaded files (budgets, targets, benchmarks)
- Use try/except for error handling
- Create mappings between file data and database values
- Filter to relevant scope (organization, year, etc.)
- Handle missing files gracefully
Best Practices:
- Always use try/except when loading external files
- Create empty DataFrames as fallback
- Print clear success/error messages
- Validate data after loading (check for required columns)
- Map file names to database names explicitly
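A minimal loader following these practices might look like this. The function name and column names are assumptions for illustration:

```python
import pandas as pd

def load_reference_file(path, required_cols):
    """Load an external reference file, falling back to an empty frame."""
    try:
        df = pd.read_csv(path)  # use pd.read_excel(path) for .xlsx files
        missing = [c for c in required_cols if c not in df.columns]
        if missing:
            print(f"WARNING: {path} is missing columns: {missing}")
            return pd.DataFrame(columns=required_cols)
        print(f"Loaded {len(df)} rows from {path}")
        return df
    except FileNotFoundError:
        # Missing file is non-fatal: return an empty fallback frame
        print(f"WARNING: {path} not found; continuing without it")
        return pd.DataFrame(columns=required_cols)
```

Downstream code can then check `df.empty` rather than special-casing a load failure.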
# ============================================================================
# STEP 3: QUERY PRIMARY DATA
# ============================================================================
Purpose: Pull main metrics from database
- Loop through each data segment combination
- Store results in dictionary keyed by combination
- Print progress for transparency
- Handle empty results gracefully
Best Practices:
- Use enumerate() for progress tracking
- Store in dictionary with tuple keys for multi-dimensional grouping
- Skip empty results but log them
- Print row counts for validation
- Use consistent filter structure across all queries
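The query loop can be sketched as below. `query_metrics()` is a hypothetical stand-in for the real database call:

```python
# Placeholder for a parameterized database query; returns [] for empty segments.
def query_metrics(org, category, region):
    return [{"metric": "cost", "value": 100.0}] if region == "North" else []

SEGMENTS = [("Commercial", "North"), ("Commercial", "South")]
primary_data = {}
for i, (category, region) in enumerate(SEGMENTS, start=1):
    rows = query_metrics("ORG-001", category, region)
    # Print progress and row counts for transparency/validation
    print(f"[{i}/{len(SEGMENTS)}] {category}/{region}: {len(rows)} rows")
    if not rows:
        continue  # skip empty results (already logged above)
    # Tuple key supports multi-dimensional grouping with O(1) lookup
    primary_data[(category, region)] = rows
```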
# ============================================================================
# STEP 4: QUERY DETAIL/SUBCATEGORY DATA
# ============================================================================
Purpose: Pull granular breakdowns (subcategories, line items, etc.)
- Same loop structure as the primary query
- Include additional dimensions for granularity
- CRITICAL: Include parent category dimension to prevent duplication
Best Practices:
- Only query details for segments with primary data
- Always include parent category to prevent row duplication
- Use consistent filter structure across queries
- Log when detail data is missing
- Verify no duplicate rows in results
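The parent-category rule can be demonstrated with a small pandas example (data values are illustrative):

```python
import pandas as pd

# Detail rows for one segment. Two different parents both contain a
# "Surgical" subcategory, so subcategory alone is not a unique key.
detail = pd.DataFrame({
    "parent_category": ["Inpatient", "Inpatient", "Outpatient"],
    "subcategory":     ["Surgical",  "Medical",   "Surgical"],
    "amount":          [100.0,       50.0,        30.0],
})

# Grouping on (parent, sub) yields one row per true combination;
# grouping on subcategory alone would wrongly merge the two "Surgical" rows.
grouped = detail.groupby(
    ["parent_category", "subcategory"], as_index=False
)["amount"].sum()
```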
# ============================================================================
# STEP 5: DEFINE HELPER FUNCTIONS
# ============================================================================
Purpose: Reusable functions for calculations and formatting
Key functions to include:
- format_value() - Format numbers as currency, percent, decimal, etc.
- get_value() - Extract single period value
- get_annual_value() - Calculate annual aggregate (weighted if needed)
- get_ytd_value() - Calculate year-to-date aggregate (weighted if needed)
- get_detail_value() - Extract subcategory values
- get_reference_value() - Lookup from external file (targets, benchmarks)
Best Practices:
- Add docstrings to all functions
- Handle None/NaN values gracefully
- Use consistent parameter names
- Include type hints for clarity
- Test edge cases (empty data, division by zero)
- Return None for missing data, not 0
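A sketch of `format_value()` following these practices (format-type names are assumptions):

```python
import pandas as pd

def format_value(value, format_type="decimal"):
    """Format a number for display; return '-' for missing data."""
    if value is None or pd.isna(value):
        return "-"  # missing data renders as a dash, never as 0
    if format_type == "currency":
        return f"${value:,.2f}"
    if format_type == "percent":
        return f"{value:.1%}"
    return f"{value:,.2f}"
```

The other helpers (`get_value()`, `get_ytd_value()`, etc.) follow the same contract: return `None` when data is absent and let `format_value()` render the dash.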
# ============================================================================
# STEP 6: PROCESS & TRANSFORM DATA
# ============================================================================
Purpose: Clean, calculate derived metrics, merge datasets
- Convert date columns to datetime
- Calculate derived metrics (ratios, normalized values)
- Merge detail data with primary data
- Store in organized dictionary structure
Best Practices:
- Convert dates early in the process
- Handle division by zero explicitly
- Use .copy() to avoid SettingWithCopyWarning
- Store related datasets together in nested dictionaries
- Validate calculations with spot checks
- Create all derived metrics before building tables
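A compact example of these transforms (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "month":   ["2024-01-01", "2024-02-01"],
    "cost":    [1000.0, 4500.0],
    "members": [100, 300],
})
df["month"] = pd.to_datetime(df["month"])  # convert dates early
df = df.copy()  # if df were a slice of a larger frame, .copy() avoids SettingWithCopyWarning

# Derived metric with explicit division-by-zero handling
df["cost_pmpm"] = df.apply(
    lambda r: r["cost"] / r["members"] if r["members"] > 0 else None, axis=1
)
```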
# ============================================================================
# STEP 7: DEFINE COMPREHENSIVE METRICS LIST
# ============================================================================
Purpose: Create master list of all metrics to display
Structure: (display_name, column_name, format_type, is_indent, data_source, filter)
This creates a single source of truth for report structure.
Best Practices:
- Extract subcategories from actual data (don't hardcode)
- Use consistent naming conventions
- Group related metrics together
- Use indentation for hierarchical metrics
- Document any special calculation requirements
- Keep list in logical display order
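An illustrative metrics list in the tuple shape described above; every name and filter here is a placeholder:

```python
# (display_name, column_name, format_type, is_indent, data_source, filter)
METRICS = [
    ("Total Cost",      "total_cost", "currency", False, "primary", None),
    ("  Inpatient",     "cost",       "currency", True,  "detail",  {"parent_category": "Inpatient"}),
    ("  Outpatient",    "cost",       "currency", True,  "detail",  {"parent_category": "Outpatient"}),
    ("Cost per Member", "cost_pmpm",  "currency", False, "primary", None),
]
```

In practice the indented subcategory rows would be generated from the actual detail data rather than hardcoded, per the first best practice above.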
# ============================================================================
# STEP 8: BUILD TABLES WITH DATA POPULATION
# ============================================================================
Purpose: Create final data tables for each segment combination
- Determine available time periods dynamically
- Loop through metrics list
- Populate each column (years, months, YTD, targets)
- Apply conditional formatting logic
- Store as DataFrame
Best Practices:
- Determine time periods dynamically from data
- Use consistent column naming patterns
- Handle parsing errors gracefully
- Consider metric directionality for conditional formatting
- Validate table structure before proceeding
- Apply conditional formatting after all data is populated
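Determining time periods dynamically might look like this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-02-01"])
})

# Derive available periods from the data instead of hardcoding a month range
latest = df["date"].max()
months = sorted(df["date"].dt.to_period("M").unique())
column_names = [str(m) for m in months]  # consistent 'YYYY-MM' column names
```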
# ============================================================================
# STEP 9: CONVERT TO JAVASCRIPT DATA STRUCTURE
# ============================================================================
Purpose: Serialize Python data to JSON for HTML interactivity
- Organize by hierarchical structure (Category → Segment)
- Convert DataFrames to column-oriented dictionaries
- Serialize to JSON string for embedding
Best Practices:
- Use column-oriented format for efficient rendering
- Indent JSON for readability during debugging
- Validate JSON structure before embedding
- Consider data size (compress if needed)
- Test with special characters in data
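The serialization step, sketched with a one-row table (the Category → Segment keys are illustrative):

```python
import json
import pandas as pd

table = pd.DataFrame({"Metric": ["Total Cost"], "YTD": [1234.5]})

# Column-oriented dict: {'Metric': [...], 'YTD': [...]} renders efficiently in JS
report_data = {"Commercial": {"North": table.to_dict(orient="list")}}
report_json = json.dumps(report_data, indent=2)  # indented for easier debugging
```

`json.dumps()` also handles the escaping of special characters mentioned in the pitfalls section below.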
# ============================================================================
# STEP 10: GENERATE HTML REPORT
# ============================================================================
Purpose: Create interactive HTML with embedded data and styling
Key Components:
1. Version/Timestamp generation
2. CSS styling (colors, layout, responsive design)
3. HTML structure (header, filters, tables, footer)
4. JavaScript for interactivity
5. Embedded data
Best Practices:
- Use semantic HTML5 elements
- Include meta tags for caching and viewport
- Embed CSS and JavaScript for portability
- Use CSS custom properties for theming
- Make responsive with media queries
- Add accessibility features (ARIA labels)
- Include version in multiple places
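A stripped-down template showing how the pieces fit together; the markup and data are placeholders, and doubled braces escape literal `{}` inside the f-string:

```python
from datetime import datetime, timezone

version = datetime.now(timezone.utc).strftime("v%Y%m%d-%H%M")
report_json = '{"example": true}'  # placeholder for the serialized data

html = f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Report {version}</title>
<style>:root {{ --accent: #2a6; }} body {{ font-family: sans-serif; }}</style>
</head>
<body>
<h1>Report <small>{version}</small></h1>
<div id="tables"></div>
<script>const DATA = {report_json};</script>
</body>
</html>"""
```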
# ============================================================================
# STEP 11: FILE OUTPUT
# ============================================================================
Purpose: Save HTML file with proper naming convention
Best Practices:
- Use descriptive filenames with version
- Include UTF-8 encoding for special characters
- Log file size and key statistics
- Provide clear completion message
- Include next steps or usage instructions
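The output step can be as simple as the sketch below (the temp directory is used here only so the example is self-contained):

```python
import os
import tempfile

version = "v20240101-0000"  # would normally come from the version step
filename = f"report_{version}.html"  # descriptive filename with version
html = "<!DOCTYPE html><html><body>ok</body></html>"

path = os.path.join(tempfile.gettempdir(), filename)
with open(path, "w", encoding="utf-8") as f:  # UTF-8 for special characters
    f.write(html)

size_kb = os.path.getsize(path) / 1024
print(f"Wrote {path} ({size_kb:.1f} KB). Open it in a browser to review.")
```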
# ============================================================================
# KEY DESIGN PATTERNS
# ============================================================================
1. Separation of Concerns
- Data acquisition (Steps 2-4)
- Data processing (Steps 5-6)
- Presentation (Steps 9-10)
- Each step has a single responsibility
2. Error Handling
- Try/except for external dependencies
- Check for empty results before proceeding
- Graceful degradation (continue without optional data)
- Clear error messages for debugging
3. Progress Tracking
- Print statements at each major step
- Row counts and success indicators
- Enumeration for loop progress
- Final summary with statistics
4. Reusability
- Helper functions for common operations
- Parameterized queries
- Modular HTML/CSS/JS components
- Configuration at the top
5. Maintainability
- Clear section headers with separators
- Inline comments for complex logic
- Consistent naming conventions
- Version tracking throughout
6. Performance
- Batch queries by segment
- Store results in dictionaries (O(1) lookup)
- Minimize redundant calculations
- Efficient data structures (column-oriented for JS)
7. Data Integrity
- Include parent categories to prevent duplication
- Use weighted averages for rates
- Validate calculations with spot checks
- Handle division by zero explicitly
# ============================================================================
# COMMON PITFALLS TO AVOID
# ============================================================================
## 1. DATA DUPLICATION
Problem: Querying subcategories without parent category causes duplicate rows
Solution: Always include parent_category dimension in detail queries
Problem: Using simple averages for rates/ratios
Solution: Use weighted averages by volume/member months
## 2. DATE HANDLING
Problem: Assuming dates are datetime objects
Solution: Convert to datetime early with pd.to_datetime()
Problem: Hardcoding month ranges
Solution: Determine dynamically from data (df['date'].max())
## 3. FORMATTING
Problem: Not checking for None/NaN before formatting
Solution: Always check: if value is None or pd.isna(value): return '-'
Problem: Inconsistent decimal places
Solution: Use format_type parameter consistently
## 4. CALCULATIONS
Problem: Division by zero errors
Solution: Check denominator > 0 before dividing
Problem: Wrong aggregation method for rates
Solution: Weighted average by volume, not simple average
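The weighted-average point is easy to see numerically. Two months with rates of 10 and 15 per member should not contribute equally when one month has three times the volume:

```python
costs   = [1000.0, 4500.0]
members = [100,    300]

# Simple average of the monthly rates treats both months equally
simple = sum(c / m for c, m in zip(costs, members)) / 2          # (10 + 15) / 2

# Weighted average pools numerators and denominators, with a zero guard
weighted = sum(costs) / sum(members) if sum(members) > 0 else None
```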
## 5. HTML/JAVASCRIPT
Problem: Not escaping special characters in strings
Solution: Use json.dumps() for safe serialization
Problem: Hardcoding column names in JavaScript
Solution: Iterate through data keys dynamically
## 6. CONDITIONAL FORMATTING
Problem: Assuming lower is always better
Solution: Consider metric directionality (cost vs revenue)
Problem: Applying formatting to wrong column
Solution: Target specific columns (e.g., Current YTD only)
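Directionality-aware conditional formatting can be captured in one helper (the function and class names are illustrative):

```python
def variance_class(actual, target, lower_is_better):
    """Return a CSS class name based on variance to target and metric direction."""
    if target == 0:
        return "neutral"  # avoid division by zero
    variance = (actual - target) / target
    # Under target is favorable for costs; over target is favorable for revenue
    favorable = variance < 0 if lower_is_better else variance > 0
    return "good" if favorable else "bad"
```

The same actual/target pair then formats oppositely for a cost metric versus a revenue metric.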
# ============================================================================
# QUICK REFERENCE
# ============================================================================
Step 1: Configure (constants, segments)
Step 2: Load external files (budgets, targets)
Step 3: Query primary data (main metrics)
Step 4: Query detail data (subcategories with parent category)
Step 5: Define helpers (format, calculate, lookup)
Step 6: Process data (transform, derive, merge)
Step 7: Define metrics (master list with formatting)
Step 8: Build tables (populate, format, conditionally style)
Step 9: Serialize to JS (column-oriented JSON)
Step 10: Generate HTML (embed data, CSS, JavaScript)
Step 11: Output file (save with version)
Critical Success Factors:
- Include parent category in detail queries (prevents duplication)
- Use weighted averages for rates (not simple averages)
- Handle None/NaN before formatting (prevents errors)
- Test with multiple segments (ensures flexibility)
- Version everything (enables tracking)