Build: #11 was successful Changes by Shawn Booth
Code commits
Pipeline
Shawn Booth 66ed1f7d7b56284a5e1a80628300687d878de367
Implement Strategy A: Transient fields for result pickle reduction (92.3% savings)
SUMMARY:
Strategy A marks large flagging view dictionaries and nested result components
as transient fields, preventing them from being pickled. This delivers 92.3%
reduction on large multi-EB datasets (1.24 GB → 94.8 MB) with zero functional
regressions.
VALIDATION RESULTS:
- Small dataset (48 stages): 7.96 MB → 4.47 MB (43.9% reduction)
- Large multi-EB dataset (48 stages): 1.24 GB → 94.8 MB (92.3% reduction)
- Stage 4 (hif_rawflagchans): 1,022 MB → 9.5 KB (99.999% reduction)
- Stage 7 (hifa_tsyscal): 69 MB → 7.0 KB (99.99% reduction)
CORE CHANGES:
1. Result Classes with Transient Fields:
- FlaggableViewResults: Base class marking .view as transient
- TsysflagResults: Marks .components container as transient
- WvrgcalResult: Marks .view as transient
- Affects 9 result classes across 7 flagging tasks
2. Critical Infrastructure Fix:
- ResultsList.slim_for_pickle(): Recursive slimming of nested results
- Without this fix, transient fields on nested results were ignored
- Single most important enabler of 99%+ reductions
3. Renderer Hardening (5 renderer files):
- Added defensive getattr() guards for missing transient attributes
- Prevents AttributeError when rendering weblogs after pickling
- Pattern: getattr(obj, 'attr', default) for all transient field access
4. Investigation & Validation Tools:
- profile_result_objects.py: Analyze pickle contents and attribute sizes
- survey_result_objects.py: Survey all result classes in codebase
- Enhanced estimate_results_slimming.py for validation
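The kind of per-attribute size analysis described for the profiling tool can be sketched as follows. This is a minimal illustration, not the contents of profile_result_objects.py: the helper `attribute_pickle_sizes` and the `DemoResult` class are hypothetical names introduced here.

```python
import pickle


def attribute_pickle_sizes(obj):
    """Return (attribute name, pickled size in bytes) pairs, largest first."""
    sizes = []
    for name, value in vars(obj).items():
        try:
            size = len(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))
        except (pickle.PicklingError, TypeError, AttributeError):
            size = -1  # marks an attribute that cannot be pickled on its own
        sizes.append((name, size))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


class DemoResult:
    """Hypothetical stand-in; real result classes carry many more attributes."""

    def __init__(self):
        self.vis = 'demo.ms'
        self.view = {'spw16': [float(i) for i in range(50_000)]}
        self.qa_score = 0.93


# The large view dictionary should dominate the listing.
for name, size in attribute_pickle_sizes(DemoResult()):
    print(f'{name:10s} {size:>10,d} bytes')
```

Sorting by size makes the attributes worth marking transient stand out at the top of the listing.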
TECHNICAL DETAILS:
Files Modified:
- pipeline/infrastructure/basetask.py: ResultsList.slim_for_pickle() override
- pipeline/h/tasks/common/flaggableviewresults.py: Base class _transient_fields
- pipeline/h/tasks/tsysflag/resultobjects.py: TsysflagResults._transient_fields
- pipeline/hifa/tasks/wvrgcal/resultobjects.py: WvrgcalResult._transient_fields
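How `_transient_fields` and a recursive `ResultsList.slim_for_pickle()` could fit together is sketched below. Only those two names come from this commit; the class bodies, the `FlagViewResult` stand-in, and the `payload_size` helper are assumptions made for illustration and may differ from the actual code in basetask.py.

```python
import copy
import pickle


class Result:
    """Hypothetical base: attributes named in _transient_fields are not pickled."""

    _transient_fields = ()

    def slim_for_pickle(self):
        # Shallow copy, so the live object keeps its views for rendering.
        slim = copy.copy(self)
        for name in self._transient_fields:
            slim.__dict__.pop(name, None)
        return slim


class FlagViewResult(Result):
    """Stand-in for a flagging result that carries a large .view dictionary."""

    _transient_fields = ('view',)

    def __init__(self, vis, view):
        self.vis = vis
        self.view = view


class ResultsList(list):
    """Container of per-measurement-set results."""

    def slim_for_pickle(self):
        # Slim every element, not just the container. Without this recursion
        # the nested results would be pickled with their views intact.
        slim = ResultsList(
            item.slim_for_pickle() if hasattr(item, 'slim_for_pickle') else item
            for item in self
        )
        slim.__dict__.update(self.__dict__)  # keep the list's own attributes
        return slim


def payload_size(results):
    # Default pickling stores each instance's __dict__, so the pickled
    # attribute dictionaries are a reasonable proxy for the pickle size.
    return len(pickle.dumps([vars(item) for item in results]))


results = ResultsList(
    FlagViewResult(f'eb{i}.ms', {'spw16': [0.0] * 20_000}) for i in range(3)
)
slim = results.slim_for_pickle()
print(payload_size(results), '->', payload_size(slim), 'bytes')
```

Returning a slimmed copy rather than mutating the result in place is one way to keep the views available to the renderer while the pipeline is still running.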
Renderers Hardened:
- pipeline/h/tasks/tsysflag/renderer.py (5 locations)
- pipeline/hif/tasks/rawflagchans/renderer.py
- pipeline/hif/tasks/lowgainflag/renderer.py
- pipeline/hifa/tasks/wvrgcalflag/renderer.py (2 locations)
- pipeline/hifa/tasks/gfluxscaleflag/renderer.py
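The guard pattern applied in these renderers can be illustrated with a small sketch. The classes and the `component_names` helper are hypothetical, and the component keys are placeholders; the point is only that a result restored from a slimmed pickle may lack its transient attribute.

```python
class LiveResult:
    """Hypothetical result as it exists during pipeline execution."""

    def __init__(self):
        self.vis = 'demo.ms'
        self.components = {'nmedian': object(), 'derivative': object()}


class RestoredResult:
    """Hypothetical result after unpickling: .components was never stored."""

    def __init__(self):
        self.vis = 'demo.ms'


def component_names(result):
    # getattr with a default avoids AttributeError on restored results;
    # "or {}" also covers an attribute that is present but set to None.
    components = getattr(result, 'components', None) or {}
    return sorted(components)


print(component_names(LiveResult()))
print(component_names(RestoredResult()))
```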
Documentation Added:
- STRATEGY_A_IMPLEMENTATION.md: Complete implementation guide with validation
- LARGE_RESULT_INVESTIGATION.md: Investigation findings and root cause analysis
- RECOMMENDATIONS_LARGE_RESULTS.md: Optimization strategies and trade-offs
- Updated: EXECUTIVE_SUMMARY.md, COMPLETE_IMPLEMENTATION_SUMMARY.md,
CONTEXT_STORAGE_INVESTIGATION.md
KNOWN LIMITATIONS:
- Weblogs cannot be regenerated from a pickled context, because the views are excluded
- Weblog plots for flagging stages will not render from a restored (post-pickle) context
- Acceptable trade-off: weblogs are generated during pipeline execution
- Opt-out mechanism available via environment variable (already implemented)
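An environment-variable opt-out of this kind is commonly implemented along the following lines. The commit message does not name the variable, so `PIPELINE_KEEP_TRANSIENT_FIELDS` and `slimming_enabled` below are purely hypothetical placeholders, not the pipeline's actual names.

```python
import os

# Hypothetical variable name; the real one is not given in this commit message.
OPT_OUT_VAR = 'PIPELINE_KEEP_TRANSIENT_FIELDS'


def slimming_enabled(environ=None):
    """Return False when the user has opted out of result slimming."""
    if environ is None:
        environ = os.environ
    value = environ.get(OPT_OUT_VAR, '')
    return value.strip().lower() not in {'1', 'true', 'yes', 'on'}


print(slimming_enabled({}))
print(slimming_enabled({OPT_OUT_VAR: '1'}))
```

With a check like this, slimming stays on by default and full pickles remain available when weblog regeneration from a saved context is needed.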
BACKWARD COMPATIBILITY:
- Full backward compatibility maintained
- Old pickles with views load correctly
- Missing attributes handled gracefully by renderer guards
- No breaking changes to existing workflows
NEXT OPPORTUNITIES:
- Stage 14 (hifa_bandpass): 55 MB - calibration solutions storage
- Stage 45 (hif_makeimlist): 15 MB - imaging metadata verbosity
- Stage 2 (hifa_flagdata): 11 MB - manual flagging storage
- Potential additional 87% reduction on remaining pickles
TESTING:
- Validated on small ALMA-IF dataset (48 stages)
- Validated on large multi-EB production dataset (48 stages)
- All stages complete successfully
- Weblogs render correctly during pipeline execution
- Zero functional regressions observed
- QA scores and flag commands preserved
Closes: Strategy A implementation (Phase 4 - Results Slimming)

- docs/source/context_serialization/COMPLETE_IMPLEMENTATION_SUMMARY.md
- docs/source/context_serialization/CONTEXT_STORAGE_INVESTIGATION.md
- docs/source/context_serialization/EXECUTIVE_SUMMARY.md
- docs/source/context_serialization/LARGE_RESULT_INVESTIGATION.md
- docs/source/context_serialization/RECOMMENDATIONS_LARGE_RESULTS.md
- docs/source/context_serialization/STRATEGY_A_IMPLEMENTATION.md
- pipeline/config.yaml
- pipeline/h/tasks/common/flaggableviewresults.py
- pipeline/h/tasks/tsysflag/renderer.py
- pipeline/h/tasks/tsysflag/resultobjects.py
- pipeline/hif/tasks/lowgainflag/renderer.py
- pipeline/hif/tasks/rawflagchans/renderer.py
- pipeline/hifa/tasks/gfluxscaleflag/renderer.py
- pipeline/hifa/tasks/wvrgcal/resultobjects.py
- pipeline/hifa/tasks/wvrgcalflag/renderer.py
- pipeline/infrastructure/basetask.py
- pipeline/recipes/procedure_hifa_calimage.xml
- tools/audit_storage_reduction.py
- tools/check_dataset_size.py
- tools/debug_pickle_contents.py
- tools/drill_imaging_products.py
- tools/estimate_results_slimming.py
- tools/profile_result_objects.py
- tools/result_object_survey.json
- tools/survey_result_objects.py