Coverage for /wheeldirectory/casa-6.7.0-12-py3.10.el8/lib/py/lib/python3.10/site-packages/casatasks/statwt.py: 56%

##################### generated by xml-casa (v2) from statwt.xml ####################
##################### de160979a5e31e9f22ba9e7cfd7db8f9 ##############################
from __future__ import absolute_import
import numpy
from casatools.typecheck import CasaValidator as _val_ctor
_pc = _val_ctor( )
from casatools.coercetype import coerce as _coerce
from casatools.errors import create_error_string
from .private.task_statwt import statwt as _statwt_t
from casatasks.private.task_logging import start_log as _start_log
from casatasks.private.task_logging import end_log as _end_log
from casatasks.private.task_logging import except_log as _except_log

class _statwt:
    """
    statwt ---- Compute and set weights based on variance of data.

    --------- parameter descriptions ---------------------------------------------

    vis           Name of measurement set
    selectdata    Enable data selection parameters
    field         Selection based on field names or field index numbers. Default is all.
    spw           Selection based on spectral windows:channels. Default is all.
    intent        Selection based on intents. Default is all.
    array         Selection based on array IDs. Default is all.
    observation   Selection based on observation IDs. Default is all.
    scan          Select data by scan numbers.
    combine       Ignore changes in these columns (scan, field, and/or state) when aggregating samples to compute weights. The value "corr" is also supported to aggregate samples across correlations.
    timebin       Length for binning in time to determine statistics. Can be either an integer, to be multiplied by the representative integration time, or a quantity (string) in time units.
    slidetimebin  Use a sliding window for time binning, as opposed to time block processing?
    chanbin       Channel bin width for computing weights. Can either be integer, in which case it is interpreted as number of channels to include in each bin, or a string "spw" or quantity with frequency units.
    minsamp       Minimum number of unflagged visibilities required for computing weights in a sample. Must be >= 2.
    statalg       Statistics algorithm to use for computing variances. Supported values are "chauvenet", "classic", "fit-half", and "hinges-fences". Minimum match is supported, although the full string must be specified for the subparameters to appear in the inputs list.
    fence         Fence value for statalg="hinges-fences". A negative value means use the entire data set (ie default to the "classic" algorithm). Ignored if statalg is not "hinges-fences".
    center        Center to use for statalg="fit-half". Valid choices are "mean", "median", and "zero". Ignored if statalg is not "fit-half".
    lside         For statalg="fit-half", real data are <= center? If false, real data are >= center. Ignored if statalg is not "fit-half".
    zscore        For statalg="chauvenet", this is the target maximum number of standard deviations data may have to be included. If negative, use Chauvenet\'s criterion. Ignored if statalg is not "chauvenet".
    maxiter       For statalg="chauvenet", this is the maximum number of iterations to attempt. Iterating will stop when either this limit is reached, or the zscore criterion is met. If negative, iterate until the zscore criterion is met. Ignored if statalg is not "chauvenet".
    fitspw        Channels to include in the computation of weights. Specified as an MS select channel selection string.
    excludechans  If True: invert the channel selection in fitspw and exclude the fitspw selection from the computation of the weights.
    wtrange       Range of acceptable weights. Data with weights outside this range will be flagged. Empty array (default) means all weights are good.
    flagbackup    Back up the state of flags before the run?
    preview       Preview mode. If True, no data is changed, although the amount of data that would have been flagged is reported.
    datacolumn    Data column to use to compute weights. Supported values are "data", "corrected", "residual", and "residual_data" (case insensitive, minimum match supported).

    --------- examples -----------------------------------------------------------

    IF NOT RUN IN PREVIEW MODE, THIS APPLICATION WILL MODIFY THE WEIGHT, WEIGHT_SPECTRUM, FLAG,
    AND FLAG_ROW COLUMNS OF THE INPUT MS. IF YOU WANT A PRISTINE COPY OF THE INPUT MS TO BE
    PRESERVED, YOU SHOULD MAKE A COPY OF IT BEFORE RUNNING THIS APPLICATION.

    This application computes weights for the WEIGHT and WEIGHT_SPECTRUM (if present) columns
    based on the variance of values in the CORRECTED_DATA or DATA column. If the MS does not
    have the specified data column, the application will fail. The following algorithm is used:

    1. For unflagged data in each sample, create two sets of values, one set is composed solely
       of the real part of the data values, the other set is composed solely of the imaginary
       part of the data values.
    2. Compute the variance of each of these sets, vr and vi.
    3. Compute veq = (vr + vi)/2.
    4. The associated weight is just the reciprocal of veq. The weight will have unit
       of (data unit)^(-2), eg Jy^(-2).
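
The four steps above can be sketched in a few lines of numpy. This is only an
illustration, not the code used by the application: the function name
weight_from_sample and the choice of the sample variance (ddof=1) are assumptions,
since this help text does not state which variance estimator is used. The zero
results follow the rules for too few samples and for zero variance described
elsewhere in this help.

```python
import numpy as np

def weight_from_sample(vis, flags, ddof=1):
    # vis: complex visibilities of one sample; flags: True marks a flagged value.
    # ddof=1 (sample variance) is an assumption made for this sketch.
    good = np.asarray(vis)[~np.asarray(flags, dtype=bool)]
    if good.size < 2:
        return 0.0                        # too few unflagged points to form a variance
    vr = np.var(good.real, ddof=ddof)     # step 2: variance of the real parts
    vi = np.var(good.imag, ddof=ddof)     # step 2: variance of the imaginary parts
    veq = (vr + vi) / 2.0                 # step 3
    if veq == 0:
        return 0.0                        # zero variance gives zero weight
    return 1.0 / veq                      # step 4: unit is (data unit)^(-2)
```

For example, the unflagged values 1+2j, 3+6j and 5+10j have vr = 4 and vi = 16,
so veq = 10 and the weight is 0.1.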

    Data are aggregated on a per-baseline, per-data description ID basis. Data are aggregated
    in bins determined by the specified values of the timebin and chanbin parameters. By default,
    data for separate correlations are aggregated separately. This behavior can be overridden
    by specifying combine="corr" (see below).

    RULES REGARDING CREATING/INITIALIZING WEIGHT_SPECTRUM COLUMN

    1. If run in preview mode (preview=True), no data are modified and no columns are added.
    2. Else if the MS already has a WEIGHT_SPECTRUM and this column has been initialized (has values),
       it will always be populated with the new weights. The WEIGHT column will be populated with
       the corresponding median values of the associated WEIGHT_SPECTRUM array.
    3. Else if the frequency range specified for the sample is not the default ("spw"), the
       WEIGHT_SPECTRUM column will be created (if it doesn't already exist) and the new weights
       will be written to it. The WEIGHT column will be populated with the corresponding median
       values of the WEIGHT_SPECTRUM array.
    4. Otherwise the single value for each spectral window will be written to the WEIGHT column;
       the WEIGHT_SPECTRUM column will not be added if it doesn't already exist, and if it does,
       it will remain uninitialized (no values will be written to it).

    TIME BINNING

    One of two algorithms can be used for time binning. If slidetimebin=True, then
    a sliding time bin of the specified width is used. If slidetimebin=False, then
    block time processing is used. The sliding time bin algorithm will generally be
    both more memory intensive and take longer than the block processing algorithm.
    Each algorithm is discussed in detail below.

    If the value of timebin is an integer, it means that the specified value should be
    multiplied by the representative integration time in the MS. This integration time is the
    median value of all the values in the INTERVAL column. Flags are not considered in
    the integration time computation. If either extremum in the INTERVAL column differs from
    the median by more than 25%, the application will fail because the values vary too much
    for there to be a single, representative, integration time. The timebin parameter can
    also be specified as a quantity (string) that must have time conformant units.
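
The integer form of timebin can be illustrated with the short sketch below. The
function names are hypothetical, and the sketch assumes that "differs from the
median by more than 25%" means a difference larger than 0.25 times the median;
the exact comparison used by the application is not spelled out here.

```python
import numpy as np

def representative_interval(intervals, tolerance=0.25):
    # Median of the INTERVAL values; flags are deliberately not considered.
    values = np.asarray(intervals, dtype=float)
    median = float(np.median(values))
    # Largest distance of either extremum (minimum or maximum) from the median.
    worst = max(median - values.min(), values.max() - median)
    if worst > tolerance * median:        # assumed reading of the 25% rule
        raise ValueError("INTERVAL values vary too much for a representative integration time")
    return median

def timebin_seconds(timebin, intervals):
    # An integer timebin is a multiple of the representative integration time.
    return timebin * representative_interval(intervals)
```

With INTERVAL values of 10, 10, 10 and 12 seconds the representative integration
time is 10 seconds, so timebin=3 corresponds to a 30 second bin.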

    Block Time Processing

    The data are processed in blocks. This means that all weight spectrum values will be set to
    the same value for all points within the same time bin/channel bin/correlation bin (see
    the section on channel binning and the description of combine="corr" for more details on
    channel binning and correlation binning).
    The time bins are not necessarily contiguous and are not necessarily the same width. The start
    of a bin is always coincident with a value from the TIME column. So for example, if values
    from the time column are [20, 60, 100, 140, 180, 230], and the width of the bins is chosen
    to be 110s, the first bin would start at 20s and run to 130s, so that data from timestamps
    20, 60, and 100 will be included in the first bin. The second bin would start at 140s, so that
    data for timestamps 140, 180, and 230 would be included in the second bin. Also, time binning
    does not span scan boundaries, so that data associated with different scan numbers will
    always be binned separately; changes in SCAN_NUMBER will cause a new time bin to be created,
    with its starting value coincident with the time of the new SCAN_NUMBER. Similar behavior can
    be expected for changes in FIELD_ID and ARRAY_ID. One can override this behavior for some
    columns by specifying the combine parameter (see below).
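
The worked example above can be reproduced with the following sketch. It is an
illustration only: block_time_bins is a hypothetical helper, the bins are assumed
to be half-open (a timestamp exactly one bin width after the bin start opens a new
bin), and scan, field and array boundaries are left out.

```python
def block_time_bins(times, width):
    # Group timestamps into bins that each start at an actual TIME value.
    bins = []
    bin_start = None
    for t in sorted(times):
        # Open a new bin when the timestamp falls outside the current one.
        if bin_start is None or t >= bin_start + width:
            bin_start = t
            bins.append([])
        bins[-1].append(t)
    return bins
```

Calling block_time_bins([20, 60, 100, 140, 180, 230], 110) gives the two bins
[20, 60, 100] and [140, 180, 230] described above. Note that the second bin starts
at 140, not at 130, so the bins are not contiguous.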

    Sliding Time Window Processing

    In this case, the time window is always centered on the timestamp of the row in question
    and extends +/-timebin/2 around that timestamp, subject to the time block boundaries.
    Rows with the same baselines and data description IDs which are included in that window
    are used for determining the weight of that row. The boundaries of the time block to which
    the window is restricted are determined by changes in FIELD_ID, ARRAY_ID, and SCAN_NUMBER.
    One can override this behavior for FIELD_ID and/or SCAN_NUMBER by specifying the combine
    parameter (see below). Unlike the time block processing algorithm, this sliding time window
    algorithm requires that details of all rows for the time block in question be kept in memory,
    and thus the sliding window algorithm in general requires more memory than the block
    processing method. Also, unlike the block processing method which computes a single value
    for all weights within a single bin, the sliding window method requires that each row
    (along with each channel and correlation bin) be processed individually, so in general
    the sliding window method will take longer than the block processing method.
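
A minimal sketch of the window membership for a single time block follows. The
helper name sliding_window_members is hypothetical, and treating the window edges
as inclusive is an assumption. All timestamps are taken to belong to the same
baseline, data description ID and time block.

```python
def sliding_window_members(times, timebin):
    # For each row, collect the timestamps within +/- timebin/2 of that row.
    half = timebin / 2.0
    return [[u for u in times if abs(u - t) <= half] for t in times]
```

Every row gets its own window, which is why each row has to be processed
individually and why all rows of the time block must be available at once.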

    CHANNEL BINNING

    The width of channel bins is specified via the chanbin parameter. Channel binning occurs within
    individual spectral windows; bins never span multiple spectral windows. Each channel will
    be included in exactly one bin.

    The default value "spw" indicates that all channels in each spectral window are to be
    included in a single bin.

    Any other string value is interpreted as a quantity, and so should have frequency units, eg
    "1MHz". In this case, the channel frequencies from the CHAN_FREQ column of the SPECTRAL_WINDOW
    subtable of the MS are used to determine the bins. The first bin starts at the channel frequency
    of the 0th channel in the spectral window. Channels with frequencies that differ by less than
    the value specified by the chanbin parameter are included in this bin. The next bin starts at
    the frequency of the first channel outside the first bin, and the process is repeated until all
    channels have been binned.
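
The frequency-based binning just described can be sketched as below. This is an
illustration under stated assumptions: frequency_bins is a hypothetical helper,
frequencies and bin width are in the same unit, and the absolute difference is
used so that spectral windows with decreasing frequencies are handled too.

```python
def frequency_bins(chan_freqs, width):
    # Return lists of channel indices; each bin starts at a real channel frequency.
    bins = []
    bin_start = None
    for chan, freq in enumerate(chan_freqs):
        # A channel at least `width` away from the bin start opens the next bin.
        if bin_start is None or abs(freq - bin_start) >= width:
            bin_start = freq
            bins.append([])
        bins[-1].append(chan)
    return bins
```

For channel frequencies of 0, 400, 800, 1200, 1600 and 2000 kHz and a width of
1000 kHz (chanbin="1MHz"), channels 0-2 form the first bin and channels 3-5 the
second.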

    If specified as an integer, the value is interpreted as the number of channels to include in
    each bin. The final bin in the spectral window may not necessarily contain this number of
    channels. For example, if a spectral window has 15 channels, and chanbin is specified to be 6,
    then channels 0-5 will comprise the first bin, channels 6-11 the second, and channels 12-14 the
    third, so that only three channels will comprise the final bin.
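
The integer case amounts to slicing the channel range into fixed-size chunks, as
in this short sketch (channel_bins is a hypothetical name used for illustration).

```python
def channel_bins(nchan, chanbin):
    # Split channels 0..nchan-1 into consecutive bins of chanbin channels each;
    # the last bin keeps whatever channels remain.
    return [list(range(start, min(start + chanbin, nchan)))
            for start in range(0, nchan, chanbin)]
```

channel_bins(15, 6) gives three bins, the last of which holds only channels 12,
13 and 14.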

    MINIMUM REQUIRED NUMBER OF VISIBILITIES

    The minsamp parameter allows the user to specify the minimum number of unflagged visibilities that
    must be present in a sample for that sample's weight to be computed. If a sample has fewer than
    this number of unflagged points, the associated weights of all the points in the sample are
    set to zero, and all the points in the sample are flagged.

    AGGREGATING DATA ACROSS BOUNDARIES

    By default, data are not aggregated across changes in values in the columns ARRAY_ID,
    SCAN_NUMBER, STATE_ID, FIELD_ID, and DATA_DESC_ID. One can override this behavior for
    SCAN_NUMBER, STATE_ID, and FIELD_ID by specifying the combine parameter. For example,
    specifying combine="scan" will ignore scan boundaries when aggregating data. Specifying
    combine="field, scan" will ignore both scan and field boundaries when aggregating data.

    Also by default, data for separate correlations are aggregated separately. Data for all
    correlations within each spectral window can be aggregated together by specifying
    "corr" in the combine parameter.

    Any combination and permutation of "scan", "field", "state", and "corr" are supported
    by the combine parameter. Other values will be silently ignored.

    STATISTICS ALGORITHMS

    The supported statistics algorithms are described in detail in the imstat and ia.statistics()
    help. For the current application, these algorithms are used to compute vr and vi (see above),
    such that the set of the real parts of the visibilities and the set of the imaginary parts of
    the visibilities are treated as independent data sets.

    RANGE OF ACCEPTABLE WEIGHTS

    The wtrange parameter allows one to specify the acceptable range (inclusive, except for zero)
    for weights. Data with weights computed to be outside this range will be flagged. If not
    specified (empty array), all weights are considered to be acceptable. If specified, the array
    must contain exactly two nonnegative numeric values. Note that data with weights of zero are
    always flagged.
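
The flagging rule can be sketched as follows. The helper flag_by_weight is
hypothetical and only mirrors the rules stated above: an empty range accepts every
nonzero weight, a two-element range is inclusive at both ends, and a weight of
zero is flagged in all cases.

```python
def flag_by_weight(weights, wtrange=()):
    # Return one flag per weight; True means the data would be flagged.
    if len(wtrange) not in (0, 2) or any(v < 0 for v in wtrange):
        raise ValueError("wtrange must be empty or hold exactly two nonnegative values")
    flags = []
    for w in weights:
        bad = (w == 0)                    # zero weights are always flagged
        if len(wtrange) == 2:
            low, high = min(wtrange), max(wtrange)
            bad = bad or w < low or w > high
        flags.append(bad)
    return flags
```

With wtrange=[1.0, 2.0], the weights 0.0, 0.5, 1.0, 2.0 and 5.0 yield the flags
True, True, False, False and True.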

    EXCLUDING CHANNELS

    Channels can be excluded from the computation of the weights by specifying the fitspw
    parameter together with excludechans=True. The fitspw parameter accepts a valid MS channel
    selection string. With excludechans=True, data associated with the selected channels will
    not be used in computing the weights. With excludechans=False (the default), only the
    selected channels are used.

    PREVIEW MODE

    By setting preview=True, the application is run in "preview" mode. In this mode, no data
    in the input MS are changed, although the amount of data that the application would have
    flagged is reported.

    DATA COLUMN

    The datacolumn parameter can be specified to indicate which data column should be used
    for computing the weights. The values "corrected" for the CORRECTED_DATA column and "data"
    for the DATA column are supported, as are "residual" and "residual_data" (minimum match,
    case insensitive).

    OTHER CONSIDERATIONS

    Flagged values are not used in computing the weights, although the associated weights of
    these values are updated.

    If the variance for a set of data is 0, all associated flags for that data are set to True,
    and the corresponding weights are set to 0.

    EXAMPLE

    # update the weights of an MS using time binning of 300s
    statwt("my.ms", timebin="300s")

    """

    _info_group_ = """manipulation"""
    _info_desc_ = """Compute and set weights based on variance of data."""

    def __call__( self, vis='', selectdata=True, field='', spw='', intent='', array='', observation='', scan='', combine='', timebin=int(1), slidetimebin=False, chanbin='spw', minsamp=int(2), statalg='classic', fence=float(-1), center='mean', lside=True, zscore=float(-1), maxiter=int(-1), fitspw='', excludechans=False, wtrange=[ ], flagbackup=True, preview=False, datacolumn='corrected' ):
        schema = {'vis': {'type': 'cReqPath', 'coerce': _coerce.expand_path}, 'selectdata': {'type': 'cBool'}, 'field': {'type': 'cStr', 'coerce': _coerce.to_str}, 'spw': {'type': 'cStr', 'coerce': _coerce.to_str}, 'intent': {'type': 'cStr', 'coerce': _coerce.to_str}, 'array': {'type': 'cStr', 'coerce': _coerce.to_str}, 'observation': {'type': 'cStr', 'coerce': _coerce.to_str}, 'scan': {'type': 'cStr', 'coerce': _coerce.to_str}, 'combine': {'type': 'cStr', 'coerce': _coerce.to_str}, 'timebin': {'anyof': [{'type': 'cStr', 'coerce': _coerce.to_str}, {'type': 'cInt'}]}, 'slidetimebin': {'type': 'cBool'}, 'chanbin': {'anyof': [{'type': 'cStr', 'coerce': _coerce.to_str}, {'type': 'cInt'}]}, 'minsamp': {'type': 'cInt'}, 'statalg': {'type': 'cStr', 'coerce': _coerce.to_str}, 'fence': {'type': 'cFloat', 'coerce': _coerce.to_float}, 'center': {'type': 'cStr', 'coerce': _coerce.to_str}, 'lside': {'type': 'cBool'}, 'zscore': {'type': 'cFloat', 'coerce': _coerce.to_float}, 'maxiter': {'type': 'cInt'}, 'fitspw': {'type': 'cStr', 'coerce': _coerce.to_str}, 'excludechans': {'type': 'cBool'}, 'wtrange': {'type': 'cFloatVec', 'coerce': [_coerce.to_list,_coerce.to_floatvec]}, 'flagbackup': {'type': 'cBool'}, 'preview': {'type': 'cBool'}, 'datacolumn': {'type': 'cStr', 'coerce': _coerce.to_str}}
        doc = {'vis': vis, 'selectdata': selectdata, 'field': field, 'spw': spw, 'intent': intent, 'array': array, 'observation': observation, 'scan': scan, 'combine': combine, 'timebin': timebin, 'slidetimebin': slidetimebin, 'chanbin': chanbin, 'minsamp': minsamp, 'statalg': statalg, 'fence': fence, 'center': center, 'lside': lside, 'zscore': zscore, 'maxiter': maxiter, 'fitspw': fitspw, 'excludechans': excludechans, 'wtrange': wtrange, 'flagbackup': flagbackup, 'preview': preview, 'datacolumn': datacolumn}
        assert _pc.validate(doc,schema), create_error_string(_pc.errors)
        _logging_state_ = _start_log( 'statwt', [ 'vis=' + repr(_pc.document['vis']), 'selectdata=' + repr(_pc.document['selectdata']), 'field=' + repr(_pc.document['field']), 'spw=' + repr(_pc.document['spw']), 'intent=' + repr(_pc.document['intent']), 'array=' + repr(_pc.document['array']), 'observation=' + repr(_pc.document['observation']), 'scan=' + repr(_pc.document['scan']), 'combine=' + repr(_pc.document['combine']), 'timebin=' + repr(_pc.document['timebin']), 'slidetimebin=' + repr(_pc.document['slidetimebin']), 'chanbin=' + repr(_pc.document['chanbin']), 'minsamp=' + repr(_pc.document['minsamp']), 'statalg=' + repr(_pc.document['statalg']), 'fence=' + repr(_pc.document['fence']), 'center=' + repr(_pc.document['center']), 'lside=' + repr(_pc.document['lside']), 'zscore=' + repr(_pc.document['zscore']), 'maxiter=' + repr(_pc.document['maxiter']), 'fitspw=' + repr(_pc.document['fitspw']), 'excludechans=' + repr(_pc.document['excludechans']), 'wtrange=' + repr(_pc.document['wtrange']), 'flagbackup=' + repr(_pc.document['flagbackup']), 'preview=' + repr(_pc.document['preview']), 'datacolumn=' + repr(_pc.document['datacolumn']) ] )
        task_result = None
        try:
            task_result = _statwt_t( _pc.document['vis'], _pc.document['selectdata'], _pc.document['field'], _pc.document['spw'], _pc.document['intent'], _pc.document['array'], _pc.document['observation'], _pc.document['scan'], _pc.document['combine'], _pc.document['timebin'], _pc.document['slidetimebin'], _pc.document['chanbin'], _pc.document['minsamp'], _pc.document['statalg'], _pc.document['fence'], _pc.document['center'], _pc.document['lside'], _pc.document['zscore'], _pc.document['maxiter'], _pc.document['fitspw'], _pc.document['excludechans'], _pc.document['wtrange'], _pc.document['flagbackup'], _pc.document['preview'], _pc.document['datacolumn'] )
        except Exception as exc:
            _except_log('statwt', exc)
            raise
        finally:
            task_result = _end_log( _logging_state_, 'statwt', task_result )
        return task_result

statwt = _statwt( )