cyberneticlibrary

Analyze categorical data with charts

comparison-analysis skill setup L31,354
OpenSenseNova/SenseNova-Skills
What it does

Analyze categorical data with comparison stats and charts

Best for

Quick comparisons of categorical data spread across multiple Excel sheets, where bar and pie charts help reveal distribution patterns.

Inputs
  • · Excel file (.xlsx) with data across multiple sheets
  • · two categorical dimensions to compare
  • · row count per sheet (to assess whether large-file optimization is needed)
Outputs
  • · total row count across all sheets
  • · cleaned data: merged cells filled (ffill), empty values dropped, placeholder rows excluded
  • · categorization statistics: count per category, difference, percent distribution
  • · multi-dimensional comparison table
  • · bar chart (matplotlib, colorized, labeled)
  • · pie chart (matplotlib, percentage labels)
  • · Excel export of analysis report
  • · download link to report
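The categorization statistics and comparison table listed above could be computed along these lines. This is a minimal sketch, not the skill's actual code: the function names, the column names `dept` and `status`, and the sample rows are all hypothetical. "Difference" is interpreted here as each category's gap to the largest category, which is an assumption; the source does not define it.

```python
import pandas as pd

def category_stats(df, col):
    """Count per category, percent of total, and gap to the largest category."""
    counts = df[col].value_counts()
    stats = pd.DataFrame({"count": counts})
    stats["percent"] = (counts / counts.sum() * 100).round(1)
    # Assumption: "difference" means distance from the most frequent category.
    stats["diff_from_top"] = counts - counts.max()
    return stats

def comparison_table(df, dim_a, dim_b):
    """Cross-tabulate two categorical dimensions, with row and column totals."""
    return pd.crosstab(df[dim_a], df[dim_b], margins=True, margins_name="Total")

# Hypothetical sample data standing in for the cleaned, combined sheets.
df = pd.DataFrame({
    "dept":   ["Sales", "Sales", "Sales", "Ops", "Ops"],
    "status": ["Open", "Closed", "Open", "Open", "Closed"],
})
stats = category_stats(df, "dept")
table = comparison_table(df, "dept", "status")
```

`pd.crosstab` with `margins=True` keeps the totals next to the per-cell counts, which makes a sparse or unaggregated table easy to spot.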
Requires
  • · pandas (read_excel, ffill, groupby, count)
  • · matplotlib (bar chart, pie chart, Chinese font config)
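The Chinese font configuration mentioned above is commonly done through `plt.rcParams`. The sketch below assumes SimHei is installed on the machine and uses "DejaVu Sans" as the fallback family name; the exact font list the skill uses is not given in the source.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Fonts are tried in order; SimHei must be installed for Chinese glyphs to render.
plt.rcParams["font.sans-serif"] = ["SimHei", "DejaVu Sans"]
plt.rcParams["font.family"] = "sans-serif"
# Use the ASCII hyphen for negative numbers, since some CJK fonts lack the Unicode minus sign.
plt.rcParams["axes.unicode_minus"] = False
```

These settings should run before any figure is created so that every chart picks them up.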
Preconditions
  • · Excel file (.xlsx) with multiple sheets
  • · Two categorical dimensions identified for comparison
  • · File size assessed (total row count determines optimization strategy)
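The file-size precondition can be sketched as a row count over all sheets. The threshold value and helper names below are hypothetical, since the source does not state where the cutoff for "large" lies. The `sheets` dict mirrors what `pd.read_excel(path, sheet_name=None)` returns (one DataFrame per sheet name).

```python
import pandas as pd

# Hypothetical cutoff; the source does not give a specific row threshold.
LARGE_FILE_ROWS = 100_000

def count_rows(sheets):
    """Return per-sheet row counts and the total across all sheets."""
    per_sheet = {name: len(df) for name, df in sheets.items()}
    return per_sheet, sum(per_sheet.values())

def needs_optimization(total_rows, threshold=LARGE_FILE_ROWS):
    """Flag workbooks whose total row count meets or exceeds the threshold."""
    return total_rows >= threshold

# In-memory stand-in for pd.read_excel(path, sheet_name=None).
sheets = {
    "Sheet1": pd.DataFrame({"region": ["East", "West", "East"]}),
    "Sheet2": pd.DataFrame({"region": ["North", "West"]}),
}
per_sheet, total = count_rows(sheets)
```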
Failure modes
  • · Merged cells not handled (data cells treated as empty)
  • · Placeholder rows ('代码' "code", '名称' "name") not excluded (inflates counts)
  • · Empty values not dropped (distorts statistics)
  • · Comparison table is sparse or missing (no aggregation)
  • · Charts lack labels or units (hard to interpret)
  • · Chinese font not configured in matplotlib (Chinese labels render as empty boxes)
  • · Large files processed without streaming (memory exhaustion)
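The first three failure modes are all cleaning problems. One way to guard against them is sketched below, under stated assumptions: the parameter names `merged_cols` and `required_cols`, the column names, and the sample rows are hypothetical, and the order shown (forward-fill, then exclude placeholders, then drop empties) is one reasonable reading of the workflow rather than a confirmed implementation.

```python
import pandas as pd

def clean_sheet(df, merged_cols, required_cols, exclude_vals=("代码", "名称")):
    """Forward-fill merged cells, drop placeholder rows, then drop rows with empty values."""
    out = df.copy()
    # A merged cell loads as one value followed by missing cells; ffill repeats the value.
    out[merged_cols] = out[merged_cols].ffill()
    # Drop rows where any checked column holds a placeholder such as a repeated header.
    cols = list(dict.fromkeys(list(merged_cols) + list(required_cols)))
    is_placeholder = out[cols].isin(list(exclude_vals)).any(axis=1)
    out = out[~is_placeholder]
    # Drop rows that still lack a value in a required column.
    return out.dropna(subset=required_cols).reset_index(drop=True)

# Hypothetical raw sheet: a placeholder row, a merged "dept" cell, and one empty status.
raw = pd.DataFrame({
    "dept":   ["代码", "Sales", None, None, "Ops"],
    "status": ["名称", "Open", "Closed", None, "Open"],
})
cleaned = clean_sheet(raw, merged_cols=["dept"], required_cols=["status"])
```

Forward-filling only the columns known to contain merged cells avoids papering over values that are genuinely missing elsewhere.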
Trust signals
  • · Five-step workflow: count rows → clean (ffill + exclude) → aggregate → visualize → export
  • · Chinese font configuration explicit (plt.rcParams for SimHei/DejaVu)
  • · DataFrame operations named: ffill (merged cells), dropna, groupby, count
  • · Two visualization types: bar chart (with value labels) + pie chart (with percentages)
  • · Merged cell handling documented (ffill strategy)
  • · Placeholder exclusion pattern ('代码', "code", as the example exclude_val)
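The two chart types named above (a bar chart with value labels and a pie chart with percentages) might look like the following. This is a sketch only: `plot_distribution`, the axis labels, the `tab10` palette, and the sample counts are hypothetical, and `Axes.bar_label` requires matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def plot_distribution(counts, title="Category distribution"):
    """Draw a labeled bar chart and a percentage pie chart side by side."""
    labels = list(counts.keys())
    values = list(counts.values())
    cmap = plt.get_cmap("tab10")
    colors = [cmap(i) for i in range(len(labels))]

    fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

    bars = ax_bar.bar(labels, values, color=colors)
    ax_bar.bar_label(bars)        # print each bar's value above it
    ax_bar.set_xlabel("Category")
    ax_bar.set_ylabel("Rows")     # state the unit so the chart is interpretable
    ax_bar.set_title(title)

    ax_pie.pie(values, labels=labels, colors=colors, autopct="%1.1f%%")
    ax_pie.set_title(title + " (share)")

    fig.tight_layout()
    return fig

# Hypothetical counts, e.g. taken from a groupby/count step.
fig = plot_distribution({"Sales": 3, "Ops": 2})
```

Calling `fig.savefig("report.png")` would write the figure to disk; the file name is only an example.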