Coverage for /wheeldirectory/casa-6.7.0-12-py3.10.el8/lib/py/lib/python3.10/site-packages/casatasks/statwt.py: 56%
27 statements
coverage.py v7.6.4, created at 2024-10-31 17:39 +0000
1##################### generated by xml-casa (v2) from statwt.xml ####################
2##################### de160979a5e31e9f22ba9e7cfd7db8f9 ##############################
3from __future__ import absolute_import
4import numpy
5from casatools.typecheck import CasaValidator as _val_ctor
6_pc = _val_ctor( )
7from casatools.coercetype import coerce as _coerce
8from casatools.errors import create_error_string
9from .private.task_statwt import statwt as _statwt_t
10from casatasks.private.task_logging import start_log as _start_log
11from casatasks.private.task_logging import end_log as _end_log
12from casatasks.private.task_logging import except_log as _except_log
14class _statwt:
15 """
16 statwt ---- Compute and set weights based on variance of data.
18 --------- parameter descriptions ---------------------------------------------
20 vis Name of measurement set
21 selectdata Enable data selection parameters
22 field Selection based on field names or field index numbers. Default is all.
23 spw Selection based on spectral windows:channels. Default is all.
24 intent Selection based on intents. Default is all.
25 array Selection based on array IDs. Default is all.
26 observation Selection based on observation IDs. Default is all.
27 scan Select data by scan numbers.
28 combine Ignore changes in these columns (scan, field, and/or state) when aggregating samples to compute weights. The value "corr" is also supported to aggregate samples across correlations.
29 timebin Length for binning in time to determine statistics. Can be either an integer, which is multiplied by the representative integration time, or a quantity (string) in time units.
30 slidetimebin Use a sliding window for time binning, as opposed to time block processing?
31 chanbin Channel bin width for computing weights. Can be either an integer, in which case it is interpreted as the number of channels to include in each bin, or a string: "spw" or a quantity with frequency units.
32 minsamp Minimum number of unflagged visibilities required for computing weights in a sample. Must be >= 2.
33 statalg Statistics algorithm to use for computing variances. Supported values are "chauvenet", "classic", "fit-half", and "hinges-fences". Minimum match is supported, although the full string must be specified for the subparameters to appear in the inputs list.
34 fence Fence value for statalg="hinges-fences". A negative value means use the entire data set (i.e., default to the "classic" algorithm). Ignored if statalg is not "hinges-fences".
35 center Center to use for statalg="fit-half". Valid choices are "mean", "median", and "zero". Ignored if statalg is not "fit-half".
36 lside For statalg="fit-half", are the real data <= center? If False, the real data are >= center. Ignored if statalg is not "fit-half".
37 zscore For statalg="chauvenet", this is the target maximum number of standard deviations data may have to be included. If negative, use Chauvenet\'s criterion. Ignored if statalg is not "chauvenet".
38 maxiter For statalg="chauvenet", this is the maximum number of iterations to attempt. Iterating will stop when either this limit is reached, or the zscore criterion is met. If negative, iterate until the zscore criterion is met. Ignored if statalg is not "chauvenet".
39 fitspw Channels to include in the computation of weights. Specified as an MS select channel selection string.
40 excludechans If True: invert the channel selection in fitspw and exclude the fitspw selection from the computation of the weights.
41 wtrange Range of acceptable weights. Data with weights outside this range will be flagged. Empty array (default) means all weights are good.
42 flagbackup Back up the state of flags before the run?
43 preview Preview mode. If True, no data is changed, although the amount of data that would have been flagged is reported.
44 datacolumn Data column to use to compute weights. Supported values are "data", "corrected", "residual", and "residual_data" (case insensitive, minimum match supported).
46 --------- examples -----------------------------------------------------------
49 IF NOT RUN IN PREVIEW MODE, THIS APPLICATION WILL MODIFY THE WEIGHT, WEIGHT_SPECTRUM, FLAG,
50 AND FLAG_ROW COLUMNS OF THE INPUT MS. IF YOU WANT A PRISTINE COPY OF THE INPUT MS TO BE
51 PRESERVED, YOU SHOULD MAKE A COPY OF IT BEFORE RUNNING THIS APPLICATION.
53 This application computes weights for the WEIGHT and WEIGHT_SPECTRUM (if present) columns
54 based on the variance of values in the CORRECTED_DATA or DATA column. If the MS does not
55 have the specified data column, the application will fail. The following algorithm is used:
57 1. For unflagged data in each sample, create two sets of values, one set is composed solely
58 of the real part of the data values, the other set is composed solely of the imaginary
59 part of the data values.
60 2. Compute the variance of each of these sets, vr and vi.
61 3. Compute veq = (vr + vi)/2.
62 4. The associated weight is just the reciprocal of veq. The weight will have unit
63 of (data unit)^(-2), e.g. Jy^(-2).
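As a minimal sketch of steps 1-4 above (not the application's actual implementation), the weight for one sample could be computed as follows. The helper name sample_weight is hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption, since the text above does not state which variance estimator is used. The zero-variance branch follows the rule given under OTHER CONSIDERATIONS.

```python
import statistics

def sample_weight(visibilities):
    # Hypothetical helper: visibilities is a list of unflagged complex values
    # belonging to one sample (one time/channel/correlation bin).
    real_parts = [v.real for v in visibilities]
    imag_parts = [v.imag for v in visibilities]
    # Assumption: sample variance (n - 1 denominator); needs at least 2 values.
    vr = statistics.variance(real_parts)
    vi = statistics.variance(imag_parts)
    veq = (vr + vi) / 2
    if veq == 0:
        # Zero variance: the weight is set to 0 (and the data would be flagged).
        return 0.0
    return 1 / veq

weight = sample_weight([1 + 1j, 3 + 5j])  # vr = 2, vi = 8, veq = 5, weight = 0.2
```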
65 Data are aggregated on a per-baseline, per-data description ID basis. Data are aggregated
66 in bins determined by the specified values of the timebin and chanbin parameters. By default,
67 data for separate correlations are aggregated separately. This behavior can be overridden
68 by specifying combine="corr" (see below).
70 RULES REGARDING CREATING/INITIALIZING WEIGHT_SPECTRUM COLUMN
72 1. If run in preview mode (preview=True), no data are modified and no columns are added.
73 2. Else if the MS already has a WEIGHT_SPECTRUM and this column has been initialized (has values),
74 it will always be populated with the new weights. The WEIGHT column will be populated with
75 the corresponding median values of the associated WEIGHT_SPECTRUM array.
76 3. Else if the frequency range specified for the sample is not the default ("spw"), the
77 WEIGHT_SPECTRUM column will be created (if it doesn't already exist) and the new weights
78 will be written to it. The WEIGHT column will be populated with the corresponding median
79 values of the WEIGHT_SPECTRUM array.
80 4. Otherwise the single value for each spectral window will be written to the WEIGHT column;
81 the WEIGHT_SPECTRUM column will not be added if it doesn't already exist, and if it does,
82 it will remain uninitialized (no values will be written to it).
84 TIME BINNING
86 One of two algorithms can be used for time binning. If slidetimebin=True, then
87 a sliding time bin of the specified width is used. If slidetimebin=False, then
88 block time processing is used. The sliding time bin algorithm will generally be
89 both more memory intensive and take longer than the block processing algorithm.
90 Each algorithm is discussed in detail below.
92 If the value of timebin is an integer, it means that the specified value should be
93 multiplied by the representative integration time in the MS. This integration is the
94 median value of all the values in the INTERVAL column. Flags are not considered in
95 the integration time computation. If either extrema in the INTERVAL column differs from
96 the median by more than 25%, the application will fail because the values vary too much
97 for there to be a single, representative, integration time. The timebin parameter can
98 also be specified as a quantity (string) that must have time conformant units.
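The integer case can be sketched as below. This is an illustration only: the helper name timebin_seconds is hypothetical, the INTERVAL values are passed in as a plain list, and "differs by more than 25%" is assumed to mean more than 25% of the median value.

```python
import statistics

def timebin_seconds(timebin, intervals):
    # Hypothetical helper: intervals holds the INTERVAL column values (seconds);
    # flags are not consulted, matching the description above.
    representative = statistics.median(intervals)
    for extreme in (min(intervals), max(intervals)):
        # Assumption: the 25% tolerance is taken relative to the median.
        if abs(extreme - representative) > 0.25 * representative:
            raise ValueError("INTERVAL values vary too much for a representative integration time")
    return timebin * representative

width = timebin_seconds(3, [10, 10, 10, 11])  # median is 10, so width is 30
```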
100 Block Time Processing
102 The data are processed in blocks. This means that all weight spectrum values will be set to
103 the same value for all points within the same time bin/channel bin/correlation bin
104 (see the section on channel binning and the description of combine="corr" for more details on
105 channel binning and correlation binning).
106 The time bins are not necessarily contiguous and are not necessarily the same width. The start
107 of a bin is always coincident with a value from the TIME column. So, for example, if values
108 from the time column are [20, 60, 100, 140, 180, 230], and the width of the bins is chosen
109 to be 110s, the first bin would start at 20s and run to 130s, so that data from timestamps
110 20, 60, and 100 will be included in the first bin. The second bin would start at 140s, so that
111 data for timestamps 140, 180, and 230 would be included in the second bin. Also, time binning
112 does not span scan boundaries, so that data associated with different scan numbers will
113 always be binned separately; changes in SCAN_NUMBER will cause a new time bin to be created,
114 with its starting value coincident with the time of the new SCAN_NUMBER. Similar behavior can
115 be expected for changes in FIELD_ID and ARRAY_ID. One can override this behavior for some
116 columns by specifying the combine parameter (see below).
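The worked example above can be reproduced with a short sketch. The helper name block_time_bins is hypothetical, the input is assumed to be the sorted TIME values of a single scan/field/array block, and treating the upper edge of a bin as exclusive is an assumption.

```python
def block_time_bins(times, width):
    # times: sorted TIME values (seconds) within one scan/field/array block.
    bins = []
    bin_start = None
    for t in times:
        # A new bin always starts on an actual TIME value.
        # Assumption: the upper edge (bin_start + width) is exclusive.
        if bin_start is None or t >= bin_start + width:
            bins.append([])
            bin_start = t
        bins[-1].append(t)
    return bins

bins = block_time_bins([20, 60, 100, 140, 180, 230], 110)
# bins == [[20, 60, 100], [140, 180, 230]]
```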
118 Sliding Time Window Processing
120 In this case, the time window is always centered on the timestamp of the row in question
121 and extends +/-timebin/2 around that timestamp, subject to the time block boundaries.
122 Rows with the same baselines and data description IDs which are included in that window
123 are used for determining the weight of that row. The boundaries of the time block to which
124 the window is restricted are determined by changes in FIELD_ID, ARRAY_ID, and SCAN_NUMBER.
125 One can override this behavior for FIELD_ID and/or SCAN_NUMBER by specifying the combine
126 parameter (see below). Unlike the time block processing algorithm, this sliding time window
127 algorithm requires that details of all rows for the time block in question are kept in memory,
128 and thus the sliding window algorithm in general requires more memory than the block
129 processing method. Also, unlike the block processing method which computes a single value
130 for all weights within a single bin, the sliding window method requires that each row
131 (along with each channel and correlation bin) be processed individually, so in general
132 the sliding window method will take longer than the block processing method.
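A minimal sketch of the window selection for one row follows. The helper name sliding_window_rows is hypothetical, the list is assumed to hold the timestamps of rows with the same baseline and data description ID inside one time block, and treating the window edges as inclusive is an assumption.

```python
def sliding_window_rows(times, index, timebin):
    # times: timestamps (seconds) of the rows in one time block.
    # index: position of the row whose weight is being determined.
    half = timebin / 2
    center = times[index]
    # Assumption: rows exactly at center +/- timebin/2 are included.
    return [t for t in times if abs(t - center) <= half]

window = sliding_window_rows([20, 60, 100, 140], 1, 100)
# window == [20, 60, 100]
```

Because every row gets its own window, the selection is repeated once per row, which illustrates why this approach generally takes longer than block processing.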
134 CHANNEL BINNING
136 The width of channel bins is specified via the chanbin parameter. Channel binning occurs within
137 individual spectral windows; bins never span multiple spectral windows. Each channel will
138 be included in exactly one bin.
140 The default value "spw" indicates that all channels in each spectral window are to be
141 included in a single bin.
143 Any other string value is interpreted as a quantity, and so should have frequency units, eg
144 "1MHz". In this case, the channel frequencies from the CHAN_FREQ column of the SPECTRAL_WINDOW
145 subtable of the MS are used to determine the bins. The first bin starts at the channel frequency
146 of the 0th channel in the spectral window. Channels with frequencies that differ by less than
147 the value specified by the chanbin parameter are included in this bin. The next bin starts at
148 the frequency of the first channel outside the first bin, and the process is repeated until all
149 channels have been binned.
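The frequency-width case can be sketched as follows. The helper name frequency_bins is hypothetical, and the channel frequencies and the bin width are assumed to be given in the same unit.

```python
def frequency_bins(chan_freqs, width):
    # chan_freqs: CHAN_FREQ values of one spectral window, in channel order.
    bins = []
    bin_start = None
    for chan, freq in enumerate(chan_freqs):
        # A channel stays in the current bin while its frequency differs from
        # the first channel of that bin by less than the bin width.
        if bin_start is None or abs(freq - bin_start) >= width:
            bins.append([])
            bin_start = freq
        bins[-1].append(chan)
    return bins

bins = frequency_bins([1000, 1400, 1800, 2200, 2600], 1000)
# bins == [[0, 1, 2], [3, 4]]
```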
151 If specified as an integer, the value is interpreted as the number of channels to include in
152 each bin. The final bin in the spectral window may not necessarily contain this number of
153 channels. For example, if a spectral window has 15 channels, and chanbin is specified to be 6,
154 then channels 0-5 will comprise the first bin, channels 6-11 the second, and channels 12-14 the
155 third, so that only three channels will comprise the final bin.
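The integer case reduces to slicing the channel indices, as in this sketch (the helper name channel_bins is hypothetical):

```python
def channel_bins(nchan, chanbin):
    # Split channel indices 0..nchan-1 into consecutive bins of chanbin
    # channels; the final bin may hold fewer channels.
    return [list(range(start, min(start + chanbin, nchan)))
            for start in range(0, nchan, chanbin)]

bins = channel_bins(15, 6)
# bins[0] is channels 0-5, bins[1] is channels 6-11, bins[2] is channels 12-14
```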
157 MINIMUM REQUIRED NUMBER OF VISIBILITIES
159 The minsamp parameter allows the user to specify the minimum number of unflagged visibilities that
160 must be present in a sample for that sample's weight to be computed. If a sample has fewer than
161 this number of unflagged points, the associated weights of all the points in the sample are
162 set to zero, and all the points in the sample are flagged.
164 AGGREGATING DATA ACROSS BOUNDARIES
166 By default, data are not aggregated across changes in values in the columns ARRAY_ID,
167 SCAN_NUMBER, STATE_ID, FIELD_ID, and DATA_DESC_ID. One can override this behavior for
168 SCAN_NUMBER, STATE_ID, and FIELD_ID by specifying the combine parameter. For example,
169 specifying combine="scan" will ignore scan boundaries when aggregating data. Specifying
170 combine="field, scan" will ignore both scan and field boundaries when aggregating data.
172 Also by default, data for separate correlations are aggregated separately. Data for all
173 correlations within each spectral window can be aggregated together by specifying
174 "corr" in the combine parameter.
176 Any combination and permutation of "scan", "field", "state", and "corr" are supported
177 by the combine parameter. Other values will be silently ignored.
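How such a combine string might be interpreted is sketched below. This is an assumption about the parsing, not the application's code: the helper name parse_combine is hypothetical, and values are assumed to be comma separated with optional surrounding whitespace and matched exactly as written.

```python
SUPPORTED_COMBINE = {"scan", "field", "state", "corr"}

def parse_combine(combine):
    # Split on commas, trim whitespace, and keep only the supported values.
    # Unsupported values are dropped without raising, mirroring the
    # "silently ignored" behavior described above.
    requested = {item.strip() for item in combine.split(",")}
    return requested & SUPPORTED_COMBINE

boundaries_to_ignore = parse_combine("field, scan")
# boundaries_to_ignore == {"field", "scan"}
```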
179 STATISTICS ALGORITHMS
181 The supported statistics algorithms are described in detail in the imstat and ia.statistics()
182 help. For the current application, these algorithms are used to compute vr and vi (see above),
183 such that the set of the real parts of the visibilities and the set of the imaginary parts of
184 the visibilities are treated as independent data sets.
186 RANGE OF ACCEPTABLE WEIGHTS
188 The wtrange parameter allows one to specify the acceptable range (inclusive, except for zero)
189 for weights. Data with weights computed to be outside this range will be flagged. If not
190 specified (empty array), all weights are considered to be acceptable. If specified, the array
191 must contain exactly two nonnegative numeric values. Note that data with weights of zero are
192 always flagged.
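The flagging rule can be sketched as a small predicate (the helper name weight_is_flagged is hypothetical):

```python
def weight_is_flagged(weight, wtrange=()):
    # Weights of zero are always flagged, even if zero lies inside wtrange.
    if weight == 0:
        return True
    # An empty wtrange means every nonzero weight is acceptable.
    if len(wtrange) == 0:
        return False
    low, high = wtrange
    # The range is inclusive at both ends.
    return not (low <= weight <= high)

flagged = weight_is_flagged(11.0, [1.0, 10.0])
# flagged is True because 11.0 lies outside [1.0, 10.0]
```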
194 EXCLUDING CHANNELS
196 Channels can be excluded from the computation of the weights by specifying a valid MS channel
197 selection string in the fitspw parameter and setting excludechans=True. Data associated with
198 the selected channels will then not be used in computing the weights.
200 PREVIEW MODE
202 By setting preview=True, the application is run in "preview" mode. In this mode, no data
203 in the input MS are changed, although the amount of data that the application would have
204 flagged is reported.
206 DATA COLUMN
208 The datacolumn parameter can be specified to indicate which data column should be used
209 for computing the weights. The values "corrected" for the CORRECTED_DATA column and "data"
210 for the DATA column are supported (minimum match, case insensitive).
212 OTHER CONSIDERATIONS
214 Flagged values are not used in computing the weights, although the associated weights of
215 these values are updated.
217 If the variance for a set of data is 0, all associated flags for that data are set to True,
218 and the corresponding weights are set to 0.
220 EXAMPLE
222 # update the weights of an MS using time binning of 300s
223 statwt("my.ms", timebin="300s")
227 """
229 _info_group_ = """manipulation"""
230 _info_desc_ = """Compute and set weights based on variance of data."""
232 def __call__( self, vis='', selectdata=True, field='', spw='', intent='', array='', observation='', scan='', combine='', timebin=int(1), slidetimebin=False, chanbin='spw', minsamp=int(2), statalg='classic', fence=float(-1), center='mean', lside=True, zscore=float(-1), maxiter=int(-1), fitspw='', excludechans=False, wtrange=[ ], flagbackup=True, preview=False, datacolumn='corrected' ):
233 schema = {'vis': {'type': 'cReqPath', 'coerce': _coerce.expand_path}, 'selectdata': {'type': 'cBool'}, 'field': {'type': 'cStr', 'coerce': _coerce.to_str}, 'spw': {'type': 'cStr', 'coerce': _coerce.to_str}, 'intent': {'type': 'cStr', 'coerce': _coerce.to_str}, 'array': {'type': 'cStr', 'coerce': _coerce.to_str}, 'observation': {'type': 'cStr', 'coerce': _coerce.to_str}, 'scan': {'type': 'cStr', 'coerce': _coerce.to_str}, 'combine': {'type': 'cStr', 'coerce': _coerce.to_str}, 'timebin': {'anyof': [{'type': 'cStr', 'coerce': _coerce.to_str}, {'type': 'cInt'}]}, 'slidetimebin': {'type': 'cBool'}, 'chanbin': {'anyof': [{'type': 'cStr', 'coerce': _coerce.to_str}, {'type': 'cInt'}]}, 'minsamp': {'type': 'cInt'}, 'statalg': {'type': 'cStr', 'coerce': _coerce.to_str}, 'fence': {'type': 'cFloat', 'coerce': _coerce.to_float}, 'center': {'type': 'cStr', 'coerce': _coerce.to_str}, 'lside': {'type': 'cBool'}, 'zscore': {'type': 'cFloat', 'coerce': _coerce.to_float}, 'maxiter': {'type': 'cInt'}, 'fitspw': {'type': 'cStr', 'coerce': _coerce.to_str}, 'excludechans': {'type': 'cBool'}, 'wtrange': {'type': 'cFloatVec', 'coerce': [_coerce.to_list,_coerce.to_floatvec]}, 'flagbackup': {'type': 'cBool'}, 'preview': {'type': 'cBool'}, 'datacolumn': {'type': 'cStr', 'coerce': _coerce.to_str}}
234 doc = {'vis': vis, 'selectdata': selectdata, 'field': field, 'spw': spw, 'intent': intent, 'array': array, 'observation': observation, 'scan': scan, 'combine': combine, 'timebin': timebin, 'slidetimebin': slidetimebin, 'chanbin': chanbin, 'minsamp': minsamp, 'statalg': statalg, 'fence': fence, 'center': center, 'lside': lside, 'zscore': zscore, 'maxiter': maxiter, 'fitspw': fitspw, 'excludechans': excludechans, 'wtrange': wtrange, 'flagbackup': flagbackup, 'preview': preview, 'datacolumn': datacolumn}
235 assert _pc.validate(doc,schema), create_error_string(_pc.errors)
236 _logging_state_ = _start_log( 'statwt', [ 'vis=' + repr(_pc.document['vis']), 'selectdata=' + repr(_pc.document['selectdata']), 'field=' + repr(_pc.document['field']), 'spw=' + repr(_pc.document['spw']), 'intent=' + repr(_pc.document['intent']), 'array=' + repr(_pc.document['array']), 'observation=' + repr(_pc.document['observation']), 'scan=' + repr(_pc.document['scan']), 'combine=' + repr(_pc.document['combine']), 'timebin=' + repr(_pc.document['timebin']), 'slidetimebin=' + repr(_pc.document['slidetimebin']), 'chanbin=' + repr(_pc.document['chanbin']), 'minsamp=' + repr(_pc.document['minsamp']), 'statalg=' + repr(_pc.document['statalg']), 'fence=' + repr(_pc.document['fence']), 'center=' + repr(_pc.document['center']), 'lside=' + repr(_pc.document['lside']), 'zscore=' + repr(_pc.document['zscore']), 'maxiter=' + repr(_pc.document['maxiter']), 'fitspw=' + repr(_pc.document['fitspw']), 'excludechans=' + repr(_pc.document['excludechans']), 'wtrange=' + repr(_pc.document['wtrange']), 'flagbackup=' + repr(_pc.document['flagbackup']), 'preview=' + repr(_pc.document['preview']), 'datacolumn=' + repr(_pc.document['datacolumn']) ] )
237 task_result = None
238 try:
239 task_result = _statwt_t( _pc.document['vis'], _pc.document['selectdata'], _pc.document['field'], _pc.document['spw'], _pc.document['intent'], _pc.document['array'], _pc.document['observation'], _pc.document['scan'], _pc.document['combine'], _pc.document['timebin'], _pc.document['slidetimebin'], _pc.document['chanbin'], _pc.document['minsamp'], _pc.document['statalg'], _pc.document['fence'], _pc.document['center'], _pc.document['lside'], _pc.document['zscore'], _pc.document['maxiter'], _pc.document['fitspw'], _pc.document['excludechans'], _pc.document['wtrange'], _pc.document['flagbackup'], _pc.document['preview'], _pc.document['datacolumn'] )
240 except Exception as exc:
241 _except_log('statwt', exc)
242 raise
243 finally:
244 task_result = _end_log( _logging_state_, 'statwt', task_result )
245 return task_result
247statwt = _statwt( )