Coverage for /wheeldirectory/casa-6.7.0-12-py3.10.el8/lib/py/lib/python3.10/site-packages/casatasks/statwt.py: 56%

27 statements  

coverage.py v7.6.4, created at 2024-10-31 18:48 +0000

##################### generated by xml-casa (v2) from statwt.xml ####################
##################### de160979a5e31e9f22ba9e7cfd7db8f9 ##############################
from __future__ import absolute_import
import numpy
from casatools.typecheck import CasaValidator as _val_ctor
_pc = _val_ctor( )
from casatools.coercetype import coerce as _coerce
from casatools.errors import create_error_string
from .private.task_statwt import statwt as _statwt_t
from casatasks.private.task_logging import start_log as _start_log
from casatasks.private.task_logging import end_log as _end_log
from casatasks.private.task_logging import except_log as _except_log

class _statwt:
    """
    statwt ---- Compute and set weights based on variance of data.

    --------- parameter descriptions ---------------------------------------------

    vis          Name of measurement set
    selectdata   Enable data selection parameters
    field        Selection based on field names or field index numbers. Default is all.
    spw          Selection based on spectral windows:channels. Default is all.
    intent       Selection based on intents. Default is all.
    array        Selection based on array IDs. Default is all.
    observation  Selection based on observation IDs. Default is all.
    scan         Select data by scan numbers.
    combine      Ignore changes in these columns (scan, field, and/or state) when aggregating samples to compute weights. The value "corr" is also supported to aggregate samples across correlations.
    timebin      Length for binning in time to determine statistics. Can be either an integer, to be multiplied by the representative integration time, or a quantity (string) in time units.
    slidetimebin Use a sliding window for time binning, as opposed to time block processing?
    chanbin      Channel bin width for computing weights. Can be either an integer, in which case it is interpreted as the number of channels to include in each bin, or a string: "spw" or a quantity with frequency units.
    minsamp      Minimum number of unflagged visibilities required for computing weights in a sample. Must be >= 2.
    statalg      Statistics algorithm to use for computing variances. Supported values are "chauvenet", "classic", "fit-half", and "hinges-fences". Minimum match is supported, although the full string must be specified for the subparameters to appear in the inputs list.
    fence        Fence value for statalg="hinges-fences". A negative value means use the entire data set (i.e. default to the "classic" algorithm). Ignored if statalg is not "hinges-fences".
    center       Center to use for statalg="fit-half". Valid choices are "mean", "median", and "zero". Ignored if statalg is not "fit-half".
    lside        For statalg="fit-half", real data are <= center? If false, real data are >= center. Ignored if statalg is not "fit-half".
    zscore       For statalg="chauvenet", this is the target maximum number of standard deviations data may have to be included. If negative, use Chauvenet\'s criterion. Ignored if statalg is not "chauvenet".
    maxiter      For statalg="chauvenet", this is the maximum number of iterations to attempt. Iterating will stop when either this limit is reached, or the zscore criterion is met. If negative, iterate until the zscore criterion is met. Ignored if statalg is not "chauvenet".
    fitspw       Channels to include in the computation of weights. Specified as an MS select channel selection string.
    excludechans If True: invert the channel selection in fitspw and exclude the fitspw selection from the computation of the weights.
    wtrange      Range of acceptable weights. Data with weights outside this range will be flagged. Empty array (default) means all weights are good.
    flagbackup   Back up the state of flags before the run?
    preview      Preview mode. If True, no data is changed, although the amount of data that would have been flagged is reported.
    datacolumn   Data column to use to compute weights. Supported values are "data", "corrected", "residual", and "residual_data" (case insensitive, minimum match supported).

    --------- examples -----------------------------------------------------------

    IF NOT RUN IN PREVIEW MODE, THIS APPLICATION WILL MODIFY THE WEIGHT, WEIGHT_SPECTRUM, FLAG,
    AND FLAG_ROW COLUMNS OF THE INPUT MS. IF YOU WANT A PRISTINE COPY OF THE INPUT MS TO BE
    PRESERVED, YOU SHOULD MAKE A COPY OF IT BEFORE RUNNING THIS APPLICATION.

    This application computes weights for the WEIGHT and WEIGHT_SPECTRUM (if present) columns
    based on the variance of values in the CORRECTED_DATA or DATA column. If the MS does not
    have the specified data column, the application will fail. The following algorithm is used:

    1. For unflagged data in each sample, create two sets of values: one set is composed solely
       of the real part of the data values, the other set is composed solely of the imaginary
       part of the data values.
    2. Compute the variance of each of these sets, vr and vi.
    3. Compute veq = (vr + vi)/2.
    4. The associated weight is just the reciprocal of veq. The weight has units
       of (data unit)^(-2), e.g. Jy^(-2).
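The four steps above can be sketched in plain Python. This is only an illustration, not the task's implementation: the function name sample_weight is hypothetical, and the use of the sample (n-1) variance is an assumption, since the text above does not say which estimator is used.

```python
from statistics import variance

def sample_weight(visibilities):
    # visibilities: the unflagged complex values of one sample (hypothetical input)
    vr = variance([v.real for v in visibilities])  # step 2, real parts
    vi = variance([v.imag for v in visibilities])  # step 2, imaginary parts
    veq = (vr + vi) / 2                            # step 3
    # Step 4: reciprocal of veq. A zero variance yields a weight of 0,
    # as described under OTHER CONSIDERATIONS.
    return 0.0 if veq == 0 else 1.0 / veq

print(sample_weight([1 + 1j, 3 + 5j, 5 + 3j]))  # 0.25
```

Here the real parts (1, 3, 5) and the imaginary parts (1, 5, 3) both have a sample variance of 4, so veq is 4 and the weight is 1/4.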

    Data are aggregated on a per-baseline, per-data description ID basis. Data are aggregated
    in bins determined by the specified values of the timebin and chanbin parameters. By default,
    data for separate correlations are aggregated separately. This behavior can be overridden
    by specifying combine="corr" (see below).

    RULES REGARDING CREATING/INITIALIZING WEIGHT_SPECTRUM COLUMN

    1. If run in preview mode (preview=True), no data are modified and no columns are added.
    2. Else if the MS already has a WEIGHT_SPECTRUM and this column has been initialized (has values),
       it will always be populated with the new weights. The WEIGHT column will be populated with
       the corresponding median values of the associated WEIGHT_SPECTRUM array.
    3. Else if the frequency range specified for the sample is not the default ("spw"), the
       WEIGHT_SPECTRUM column will be created (if it doesn't already exist) and the new weights
       will be written to it. The WEIGHT column will be populated with the corresponding median
       values of the WEIGHT_SPECTRUM array.
    4. Otherwise the single value for each spectral window will be written to the WEIGHT column;
       the WEIGHT_SPECTRUM column will not be added if it doesn't already exist, and if it does,
       it will remain uninitialized (no values will be written to it).
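The four rules above form a simple decision chain, which can be sketched as follows. The function name and its arguments are hypothetical, and it assumes that "the frequency range specified for the sample" refers to the chanbin parameter.

```python
def weight_columns_written(preview, has_initialized_wtspec, chanbin="spw"):
    # Returns the names of the columns that would receive new weights.
    if preview:
        return []                                # rule 1: nothing is modified
    if has_initialized_wtspec:
        return ["WEIGHT_SPECTRUM", "WEIGHT"]     # rule 2: WEIGHT gets the medians
    if chanbin != "spw":
        return ["WEIGHT_SPECTRUM", "WEIGHT"]     # rule 3: column created if absent
    return ["WEIGHT"]                            # rule 4: one value per spectral window

print(weight_columns_written(False, False, chanbin=8))
```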

    TIME BINNING

    One of two algorithms can be used for time binning. If slidetimebin=True, then
    a sliding time bin of the specified width is used. If slidetimebin=False, then
    block time processing is used. The sliding time bin algorithm will generally be
    both more memory intensive and take longer than the block processing algorithm.
    Each algorithm is discussed in detail below.

    If the value of timebin is an integer, it means that the specified value should be
    multiplied by the representative integration time in the MS. This integration time is the
    median value of all the values in the INTERVAL column. Flags are not considered in
    the integration time computation. If either extremum in the INTERVAL column differs from
    the median by more than 25%, the application will fail because the values vary too much
    for there to be a single, representative, integration time. The timebin parameter can
    also be specified as a quantity (string) that must have time conformant units.
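As a rough sketch of how an integer timebin could be turned into a width in seconds, assuming the 25% tolerance is measured relative to the median. The function names and the ValueError are hypothetical stand-ins for whatever the application actually does on failure.

```python
from statistics import median

def representative_interval(intervals):
    # intervals: all values of the INTERVAL column, flags ignored
    rep = median(intervals)
    for extremum in (min(intervals), max(intervals)):
        if abs(extremum - rep) > 0.25 * rep:
            raise ValueError("INTERVAL values vary too much")
    return rep

def timebin_seconds(timebin, intervals):
    # An integer timebin is multiplied by the representative integration time
    return timebin * representative_interval(intervals)

print(timebin_seconds(3, [10.0, 10.0, 10.1, 9.9]))  # 30.0
```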

    Block Time Processing

    The data are processed in blocks. This means that all weight spectrum values will be set to
    the same value for all points within the same time bin/channel bin/correlation bin
    (see the section on channel binning and the description of combine="corr" for more details on
    channel binning and correlation binning).
    The time bins are not necessarily contiguous and are not necessarily the same width. The start
    of a bin is always coincident with a value from the TIME column. So, for example, if values
    from the TIME column are [20, 60, 100, 140, 180, 230], and the width of the bins is chosen
    to be 110s, the first bin would start at 20s and run to 130s, so that data from timestamps
    20, 60, and 100 will be included in the first bin. The second bin would start at 140s, so that
    data for timestamps 140, 180, and 230 would be included in the second bin. Also, time binning
    does not span scan boundaries, so that data associated with different scan numbers will
    always be binned separately; changes in SCAN_NUMBER will cause a new time bin to be created,
    with its starting value coincident with the time of the new SCAN_NUMBER. Similar behavior can
    be expected for changes in FIELD_ID and ARRAY_ID. One can override this behavior for some
    columns by specifying the combine parameter (see below).
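The worked example above can be reproduced with a short sketch. The function name is hypothetical, the timestamps are assumed to belong to a single scan, field, and array, and treating each bin as the half-open interval [start, start + width) is an assumption.

```python
def block_time_bins(times, width):
    # Each bin starts at a value from the TIME column and holds every
    # timestamp that falls before start + width.
    bins = []
    start = None
    for t in sorted(times):
        if start is None or t >= start + width:
            start = t          # a new bin starts on an actual timestamp
            bins.append([])
        bins[-1].append(t)
    return bins

print(block_time_bins([20, 60, 100, 140, 180, 230], 110))
# [[20, 60, 100], [140, 180, 230]]
```

Note that the second bin starts at 140, not at 130, which is why bins need not be contiguous.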

    Sliding Time Window Processing

    In this case, the time window is always centered on the timestamp of the row in question
    and extends +/-timebin/2 around that timestamp, subject to the time block boundaries.
    Rows with the same baselines and data description IDs which are included in that window
    are used for determining the weight of that row. The boundaries of the time block to which
    the window is restricted are determined by changes in FIELD_ID, ARRAY_ID, and SCAN_NUMBER.
    One can override this behavior for FIELD_ID and/or SCAN_NUMBER by specifying the combine
    parameter (see below). Unlike the time block processing algorithm, this sliding time window
    algorithm requires that details of all rows for the time block in question be kept in memory,
    and thus the sliding window algorithm in general requires more memory than the block
    processing method. Also, unlike the block processing method which computes a single value
    for all weights within a single bin, the sliding window method requires that each row
    (along with each channel and correlation bin) be processed individually, so in general
    the sliding window method will take longer than the block processing method.
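A minimal sketch of the window selection, assuming all timestamps belong to one time block and one baseline/data description ID, and assuming the window endpoints are inclusive. The function name is hypothetical.

```python
def sliding_window_rows(times, timebin):
    # For each row timestamp, collect the timestamps within +/- timebin/2 of it.
    half = timebin / 2
    return {t: [u for u in times if abs(u - t) <= half] for t in times}

windows = sliding_window_rows([20, 60, 100, 140], 100)
print(windows[60])  # [20, 60, 100]
```

Every row gets its own window, so the whole time block has to be held in memory and each row is processed individually.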

    CHANNEL BINNING

    The width of channel bins is specified via the chanbin parameter. Channel binning occurs within
    individual spectral windows; bins never span multiple spectral windows. Each channel will
    be included in exactly one bin.

    The default value "spw" indicates that all channels in each spectral window are to be
    included in a single bin.

    Any other string value is interpreted as a quantity, and so should have frequency units, e.g.
    "1MHz". In this case, the channel frequencies from the CHAN_FREQ column of the SPECTRAL_WINDOW
    subtable of the MS are used to determine the bins. The first bin starts at the channel frequency
    of the 0th channel in the spectral window. Channels with frequencies that differ by less than
    the value specified by the chanbin parameter are included in this bin. The next bin starts at
    the frequency of the first channel outside the first bin, and the process is repeated until all
    channels have been binned.
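The frequency-based binning just described might look like the sketch below. The function name is hypothetical, and the frequencies and width are assumed to be already expressed in the same unit (kHz here).

```python
def frequency_bins(chan_freqs, width):
    # chan_freqs: CHAN_FREQ values of one spectral window, in channel order
    bins = []
    start = None
    for chan, freq in enumerate(chan_freqs):
        if start is None or abs(freq - start) >= width:
            start = freq       # first channel outside the previous bin
            bins.append([])
        bins[-1].append(chan)
    return bins

# Five channels spaced by 400 kHz, chanbin equivalent to "1MHz"
print(frequency_bins([100000, 100400, 100800, 101200, 101600], 1000))
# [[0, 1, 2], [3, 4]]
```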

    If specified as an integer, the value is interpreted as the number of channels to include in
    each bin. The final bin in the spectral window may not necessarily contain this number of
    channels. For example, if a spectral window has 15 channels, and chanbin is specified to be 6,
    then channels 0-5 will comprise the first bin, channels 6-11 the second, and channels 12-14 the
    third, so that only three channels will comprise the final bin.
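The 15-channel example can be checked with a one-line sketch (the function name is hypothetical):

```python
def channel_bins(nchan, chanbin):
    # Consecutive groups of chanbin channels; the last group may be shorter.
    return [list(range(s, min(s + chanbin, nchan))) for s in range(0, nchan, chanbin)]

bins = channel_bins(15, 6)
print([(b[0], b[-1]) for b in bins])  # [(0, 5), (6, 11), (12, 14)]
```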

    MINIMUM REQUIRED NUMBER OF VISIBILITIES

    The minsamp parameter allows the user to specify the minimum number of unflagged visibilities that
    must be present in a sample for that sample's weight to be computed. If a sample has fewer than
    this number of unflagged points, the associated weights of all the points in the sample are
    set to zero, and all the points in the sample are flagged.
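A sketch of the minsamp rule for a single sample. The function name and the return convention are hypothetical: a weight of None simply means the weight would be computed as usual.

```python
def apply_minsamp(flags, minsamp=2):
    # flags: one boolean per visibility in the sample, True meaning flagged
    n_unflagged = sum(1 for f in flags if not f)
    if n_unflagged < minsamp:
        # Too few good points: flag everything and set the weight to zero
        return [True] * len(flags), 0.0
    return list(flags), None

print(apply_minsamp([False, True, True], minsamp=2))
# ([True, True, True], 0.0)
```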

    AGGREGATING DATA ACROSS BOUNDARIES

    By default, data are not aggregated across changes in values in the columns ARRAY_ID,
    SCAN_NUMBER, STATE_ID, FIELD_ID, and DATA_DESC_ID. One can override this behavior for
    SCAN_NUMBER, STATE_ID, and FIELD_ID by specifying the combine parameter. For example,
    specifying combine="scan" will ignore scan boundaries when aggregating data. Specifying
    combine="field, scan" will ignore both scan and field boundaries when aggregating data.

    Also by default, data for separate correlations are aggregated separately. Data for all
    correlations within each spectral window can be aggregated together by specifying
    "corr" in the combine parameter.

    Any combination and permutation of "scan", "field", "state", and "corr" are supported
    by the combine parameter. Other values will be silently ignored.
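One way to picture how a combine string is interpreted is the sketch below. The function name is hypothetical, and splitting on commas with surrounding whitespace stripped is an assumption based on the "field, scan" example above.

```python
SUPPORTED = ("scan", "field", "state", "corr")

def parse_combine(combine):
    # Keep only the supported tokens; anything else is silently ignored.
    tokens = [t.strip() for t in combine.split(",")]
    return {t for t in tokens if t in SUPPORTED}

print(sorted(parse_combine("field, scan")))  # ['field', 'scan']
```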

    STATISTICS ALGORITHMS

    The supported statistics algorithms are described in detail in the imstat and ia.statistics()
    help. For the current application, these algorithms are used to compute vr and vi (see above),
    such that the set of the real parts of the visibilities and the set of the imaginary parts of
    the visibilities are treated as independent data sets.

    RANGE OF ACCEPTABLE WEIGHTS

    The wtrange parameter allows one to specify the acceptable range (inclusive, except for zero)
    for weights. Data with weights computed to be outside this range will be flagged. If not
    specified (empty array), all weights are considered to be acceptable. If specified, the array
    must contain exactly two nonnegative numeric values. Note that data with weights of zero are
    always flagged.
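The wtrange check can be sketched as follows. The function name and the ValueError are hypothetical, and accepting the two values in either order is an assumption.

```python
def weight_is_acceptable(weight, wtrange=()):
    if weight == 0:
        return False                     # zero weights are always flagged
    if len(wtrange) == 0:
        return True                      # empty range: every weight is good
    if len(wtrange) != 2 or min(wtrange) < 0:
        raise ValueError("wtrange needs exactly two nonnegative values")
    low, high = min(wtrange), max(wtrange)
    return low <= weight <= high         # inclusive at both ends

print(weight_is_acceptable(0.5, [0.1, 2.0]))  # True
```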

    EXCLUDING CHANNELS

    Channels can be excluded from the computation of the weights by using the fitspw and
    excludechans parameters together. The fitspw parameter accepts a valid MS channel selection
    string. If excludechans=True, that selection is inverted, so that data associated with the
    channels selected by fitspw are not used in computing the weights.

    PREVIEW MODE

    By setting preview=True, the application is run in "preview" mode. In this mode, no data
    in the input MS are changed, although the amount of data that the application would have
    flagged is reported.

    DATA COLUMN

    The datacolumn parameter can be specified to indicate which data column should be used
    for computing the weights. The supported values are "data" for the DATA column, "corrected"
    for the CORRECTED_DATA column, "residual", and "residual_data" (minimum match, case insensitive).

    OTHER CONSIDERATIONS

    Flagged values are not used in computing the weights, although the associated weights of
    these values are updated.

    If the variance for a set of data is 0, all associated flags for that data are set to True,
    and the corresponding weights are set to 0.

    EXAMPLE

    # update the weights of an MS using time binning of 300s
    statwt("my.ms", timebin="300s")
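Because the task modifies the input MS, one possible workflow is to run it in preview mode first and only then for real. The helper below is a hypothetical sketch, not part of casatasks; it takes the task callable as an argument so that it can be tried without an MS.

```python
def preview_then_apply(statwt_task, vis, **kwargs):
    # First pass: nothing is changed, only the would-be flagging is reported
    statwt_task(vis, preview=True, **kwargs)
    # Second pass: weights and flags are actually written
    return statwt_task(vis, preview=False, **kwargs)
```

With the real task this could be called as preview_then_apply(statwt, "my.ms", timebin="300s").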

    """

    _info_group_ = """manipulation"""
    _info_desc_ = """Compute and set weights based on variance of data."""

    def __call__( self, vis='', selectdata=True, field='', spw='', intent='', array='', observation='', scan='', combine='', timebin=int(1), slidetimebin=False, chanbin='spw', minsamp=int(2), statalg='classic', fence=float(-1), center='mean', lside=True, zscore=float(-1), maxiter=int(-1), fitspw='', excludechans=False, wtrange=[ ], flagbackup=True, preview=False, datacolumn='corrected' ):
        schema = {'vis': {'type': 'cReqPath', 'coerce': _coerce.expand_path}, 'selectdata': {'type': 'cBool'}, 'field': {'type': 'cStr', 'coerce': _coerce.to_str}, 'spw': {'type': 'cStr', 'coerce': _coerce.to_str}, 'intent': {'type': 'cStr', 'coerce': _coerce.to_str}, 'array': {'type': 'cStr', 'coerce': _coerce.to_str}, 'observation': {'type': 'cStr', 'coerce': _coerce.to_str}, 'scan': {'type': 'cStr', 'coerce': _coerce.to_str}, 'combine': {'type': 'cStr', 'coerce': _coerce.to_str}, 'timebin': {'anyof': [{'type': 'cStr', 'coerce': _coerce.to_str}, {'type': 'cInt'}]}, 'slidetimebin': {'type': 'cBool'}, 'chanbin': {'anyof': [{'type': 'cStr', 'coerce': _coerce.to_str}, {'type': 'cInt'}]}, 'minsamp': {'type': 'cInt'}, 'statalg': {'type': 'cStr', 'coerce': _coerce.to_str}, 'fence': {'type': 'cFloat', 'coerce': _coerce.to_float}, 'center': {'type': 'cStr', 'coerce': _coerce.to_str}, 'lside': {'type': 'cBool'}, 'zscore': {'type': 'cFloat', 'coerce': _coerce.to_float}, 'maxiter': {'type': 'cInt'}, 'fitspw': {'type': 'cStr', 'coerce': _coerce.to_str}, 'excludechans': {'type': 'cBool'}, 'wtrange': {'type': 'cFloatVec', 'coerce': [_coerce.to_list,_coerce.to_floatvec]}, 'flagbackup': {'type': 'cBool'}, 'preview': {'type': 'cBool'}, 'datacolumn': {'type': 'cStr', 'coerce': _coerce.to_str}}
        doc = {'vis': vis, 'selectdata': selectdata, 'field': field, 'spw': spw, 'intent': intent, 'array': array, 'observation': observation, 'scan': scan, 'combine': combine, 'timebin': timebin, 'slidetimebin': slidetimebin, 'chanbin': chanbin, 'minsamp': minsamp, 'statalg': statalg, 'fence': fence, 'center': center, 'lside': lside, 'zscore': zscore, 'maxiter': maxiter, 'fitspw': fitspw, 'excludechans': excludechans, 'wtrange': wtrange, 'flagbackup': flagbackup, 'preview': preview, 'datacolumn': datacolumn}
        assert _pc.validate(doc,schema), create_error_string(_pc.errors)
        _logging_state_ = _start_log( 'statwt', [ 'vis=' + repr(_pc.document['vis']), 'selectdata=' + repr(_pc.document['selectdata']), 'field=' + repr(_pc.document['field']), 'spw=' + repr(_pc.document['spw']), 'intent=' + repr(_pc.document['intent']), 'array=' + repr(_pc.document['array']), 'observation=' + repr(_pc.document['observation']), 'scan=' + repr(_pc.document['scan']), 'combine=' + repr(_pc.document['combine']), 'timebin=' + repr(_pc.document['timebin']), 'slidetimebin=' + repr(_pc.document['slidetimebin']), 'chanbin=' + repr(_pc.document['chanbin']), 'minsamp=' + repr(_pc.document['minsamp']), 'statalg=' + repr(_pc.document['statalg']), 'fence=' + repr(_pc.document['fence']), 'center=' + repr(_pc.document['center']), 'lside=' + repr(_pc.document['lside']), 'zscore=' + repr(_pc.document['zscore']), 'maxiter=' + repr(_pc.document['maxiter']), 'fitspw=' + repr(_pc.document['fitspw']), 'excludechans=' + repr(_pc.document['excludechans']), 'wtrange=' + repr(_pc.document['wtrange']), 'flagbackup=' + repr(_pc.document['flagbackup']), 'preview=' + repr(_pc.document['preview']), 'datacolumn=' + repr(_pc.document['datacolumn']) ] )
        task_result = None
        try:
            task_result = _statwt_t( _pc.document['vis'], _pc.document['selectdata'], _pc.document['field'], _pc.document['spw'], _pc.document['intent'], _pc.document['array'], _pc.document['observation'], _pc.document['scan'], _pc.document['combine'], _pc.document['timebin'], _pc.document['slidetimebin'], _pc.document['chanbin'], _pc.document['minsamp'], _pc.document['statalg'], _pc.document['fence'], _pc.document['center'], _pc.document['lside'], _pc.document['zscore'], _pc.document['maxiter'], _pc.document['fitspw'], _pc.document['excludechans'], _pc.document['wtrange'], _pc.document['flagbackup'], _pc.document['preview'], _pc.document['datacolumn'] )
        except Exception as exc:
            _except_log('statwt', exc)
            raise
        finally:
            task_result = _end_log( _logging_state_, 'statwt', task_result )
        return task_result

statwt = _statwt( )