SFF++ library: reading and writing SFF from C++
The following format definition is obtained from sff.doc, which comes along with libsff or libstuff. The SFF was invented in 1996.
The SFF (Stuttgart File Format) is an attempt to reconcile different demands on the way seismic data used at the Institute of Geophysics at Stuttgart University should be archived.
A single data format allows the standardization of software used to perform common tasks on the data, such as reading, writing, processing and plotting. Software has to be written only once, may be used by many people and may be kept in a single place in the computer system.
The general structure of the file format is a header block followed by one or more data blocks. Within the header and the data blocks optional blocks containing additional information are allowed. Each data block is structured as described by the GSE2.0 format. The data are compressed using second differences and are encoded into pure ASCII characters using a six bit encoding scheme (CM6) also described by the GSE2.0 format. The ASCII encoding ensures portability of the data across different operating systems and computer architectures. Moreover, it allows sending data via e-mail.
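
The second-difference step mentioned above can be sketched as follows. This is a minimal illustration, assuming the usual definition in which the first-difference operator is applied twice and the first sample is kept unchanged. The function names are hypothetical and are not part of libsff or SFF++; the CM6 character encoding itself is not shown.

```cpp
#include <cstddef>
#include <vector>

// One pass of first differences, in place: y[0] = x[0], y[i] = x[i] - x[i-1].
// The loop runs backwards so that each x[i-1] is still the original value.
void difference(std::vector<long long>& x) {
    for (std::size_t i = x.size(); i-- > 1; ) {
        x[i] -= x[i - 1];
    }
}

// Inverse of difference(): a running sum, in place.
void integrate(std::vector<long long>& x) {
    for (std::size_t i = 1; i < x.size(); ++i) {
        x[i] += x[i - 1];
    }
}

// Second differences: apply the first-difference operator twice.
std::vector<long long> second_differences(std::vector<long long> x) {
    difference(x);
    difference(x);
    return x;
}

// Recover the original samples from their second differences.
std::vector<long long> undo_second_differences(std::vector<long long> x) {
    integrate(x);
    integrate(x);
    return x;
}
```

For a smooth trace the second differences are much smaller than the samples themselves, which is what makes the subsequent six bit encoding compact. For a trace that alternates between +M and -M they grow to 4M.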
The whole datafile is ASCII readable with any text editor and is therefore transferable from any system to any system via e-mail. You can extract valid GSE2.0 data blocks from the files by just using a text editor to delete additional lines.
The whole file consists of one file header block and one or more data blocks:
block |
---|
File Header |
Data Block |
. . . |
The File Header consists of a STAT line, which is obligatory. There may be an optional FREE block and/or an optional SRCE line:
element | status |
---|---|
STAT line | obligatory |
FREE block | optional |
SRCE line | optional |
Each Data Block has to start with an obligatory DAST line and a WID2 line as defined in the GSE2.0 format. The encoded data samples have to follow between a DAT2 identifier and a CHK2 checksum. These lines may be followed by an optional FREE block and/or an optional INFO line.
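element | status | note |
---|---|---|
DAST line | obligatory | |
WID2 line | obligatory | part of the GSE2.0 data block |
DAT2 identifier | obligatory | part of the GSE2.0 data block |
dataset | obligatory | part of the GSE2.0 data block |
CHK2 line | obligatory | part of the GSE2.0 data block |
FREE block | optional | |
INFO line | optional | |

The GSE2.0 data block consists of these four elements: WID2 line, DAT2 identifier, dataset and CHK2 line.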
The STAT line provides general information about the data file.
position | format | contents |
---|---|---|
1-5 | a5 | STAT (identifier) |
6-12 | f6.2 | library version; minor versions are counted in 0.01 steps, major versions are counted in 1.0 steps |
14-26 | a13 | timestamp of file creation: yymmdd.hhmmss |
28-37 | a10 | code with a combination of two possible characters: F: there follows a FREE block S: there follows a SRCE line |
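
The column layout in the table above can be read with plain substring operations. The following is a minimal sketch under the assumption that the line is laid out exactly as tabulated. `StatLine`, `field` and `parse_stat` are hypothetical names chosen for illustration and are not the SFF++ interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical container for the STAT fields; not a type from libsff or SFF++.
struct StatLine {
    double version;         // columns 6-12
    std::string timestamp;  // columns 14-26, yymmdd.hhmmss
    bool has_free;          // code contains 'F': a FREE block follows
    bool has_srce;          // code contains 'S': a SRCE line follows
};

// Returns columns first..last (1-based, inclusive, as in the table).
// Lines shorter than the requested range yield a shorter or empty field.
std::string field(const std::string& line, std::size_t first, std::size_t last) {
    if (first > line.size()) return "";
    return line.substr(first - 1, last - first + 1);
}

StatLine parse_stat(const std::string& line) {
    if (field(line, 1, 4) != "STAT") {
        throw std::runtime_error("not a STAT line");
    }
    StatLine stat;
    stat.version = std::stod(field(line, 6, 12));  // throws if the field is not numeric
    stat.timestamp = field(line, 14, 26);
    const std::string code = field(line, 28, 37);
    stat.has_free = code.find('F') != std::string::npos;
    stat.has_srce = code.find('S') != std::string::npos;
    return stat;
}
```

A reader would call `parse_stat` on the first line of the file and use the two flags to decide whether a FREE block and a SRCE line have to be read before the first data block.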
A FREE block is a block of arbitrary lines, each 80 characters wide. The start of this block is indicated by a single line containing FREE in the first 5 positions. Another line with this content indicates the end of the FREE block. A FREE block may contain any information useful to the user and has to follow no other standard than a line length of 80 characters.
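
Reading such a block amounts to collecting lines between two marker lines. The sketch below assumes that a marker is a line starting with `FREE` followed by a blank or the end of the line; `is_free_marker` and `read_free_block` are hypothetical helper names, not functions of libsff or SFF++.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this on in-memory text
#include <stdexcept>
#include <string>
#include <vector>

// Assumption: a marker line holds "FREE" in columns 1-4 and a blank
// (or nothing at all) in column 5.
bool is_free_marker(const std::string& line) {
    return line.compare(0, 4, "FREE") == 0 && (line.size() == 4 || line[4] == ' ');
}

// Reads one FREE block. The next line in the stream must be the opening
// marker; the closing marker is consumed, the content lines are returned.
std::vector<std::string> read_free_block(std::istream& is) {
    std::string line;
    if (!std::getline(is, line) || !is_free_marker(line)) {
        throw std::runtime_error("FREE block expected");
    }
    std::vector<std::string> lines;
    while (std::getline(is, line)) {
        if (is_free_marker(line)) return lines;
        lines.push_back(line);
    }
    throw std::runtime_error("unterminated FREE block");
}
```

Note that a content line which itself starts with `FREE` followed by a blank would end the block early. This follows from the format definition and cannot be repaired by the reader.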
The SRCE line holds information on the source that caused the seismic signal.
position | format | contents |
---|---|---|
1-5 | a5 | SRCE (identifier) |
6-25 | a20 | type of source (any string like "earthquake") |
27 | a1 | type of coordinate system: C: cartesian S: spherical |
29-43 | f15.6 | c1: x, latitude (see also Coordinate Specification) |
44-58 | f15.6 | c2: y, longitude (see also Coordinate Specification) |
59-73 | f15.6 | c3: z, height (see also Coordinate Specification) |
75-80 | a6 | date of source event: yymmdd |
82-91 | a10 | time of source event: hhmmss.sss |
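
Writing a line with this layout can be sketched with `std::snprintf`, whose field widths map directly onto the Fortran-style formats in the table. This is an illustration only: `format_srce` is a hypothetical function, and left-justified, blank-padded character fields are an assumption of this sketch.

```cpp
#include <cstdio>
#include <string>

// Builds a SRCE line from the column layout in the table above:
// a5, a20, blank, a1, blank, three times f15.6, blank, a6, blank, a10.
// Character fields are cut to their column width; numeric values that do
// not fit into 15 characters would widen the line and are not handled here.
std::string format_srce(const std::string& type, char coordinate_system,
                        double c1, double c2, double c3,
                        const std::string& date,    // yymmdd
                        const std::string& time) {  // hhmmss.sss
    char buffer[128];
    std::snprintf(buffer, sizeof buffer,
                  "%-5s%-20.20s %c %15.6f%15.6f%15.6f %-6.6s %-10.10s",
                  "SRCE", type.c_str(), coordinate_system,
                  c1, c2, c3, date.c_str(), time.c_str());
    return buffer;
}
```

With values that fit their columns the resulting line is 91 characters long, matching the last column given in the table.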
The DAST line holds information on the current dataset.
position | format | contents |
---|---|---|
1-5 | a5 | DAST (identifier) |
7-16 | i10 | number of characters in encoded dataset. From library version 1.10 on, this field may be -1. In this case the reading program has to determine the number of characters itself by detecting the CHK2 line. This change was necessary to implement the C++ version of libsff, since it starts writing before the whole trace has been encoded. |
18-33 | e16.6 | ampfac: a factor to scale the (floating point) dataset to a desirable dynamic range before converting it to Fortran integer values. After reading the dataset, decoding it and converting it to floating point, you have to multiply each sample by ampfac to get back the original values. As integer values range from -(2.**31) to (2.**31)-1, you might like to adjust the maximum integer value to 0x7FFFFFFF. This may cause problems, as the second differences compression algorithm may increase the dynamic range of your data by a factor of four in the worst case. It is safe to adjust the largest absolute value in the dataset to (2.**23)-1, which is 0x7FFFFF. |
35-44 | a10 | code with a combination of three possible characters indicating possible optional blocks and a following further dataset: F: a FREE block follows after dataset I: an INFO line follows after dataset D: there is another Data Block following in this file (this must be the last character in code) |
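
The role of ampfac can be illustrated with a short sketch that scales a floating point trace so that its largest absolute value becomes (2.**23)-1, as recommended in the table. `ScaledTrace` and `scale_to_integers` are hypothetical names, and rounding to the nearest integer is an assumption of this sketch rather than documented library behaviour.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical result type: integer samples plus the factor that restores them.
struct ScaledTrace {
    double ampfac;
    std::vector<std::int32_t> samples;
};

ScaledTrace scale_to_integers(const std::vector<double>& data) {
    const double target = 8388607.0;  // (2**23)-1 = 0x7FFFFF
    double peak = 0.0;
    for (double value : data) {
        peak = std::max(peak, std::fabs(value));
    }
    ScaledTrace out;
    // For an all-zero trace any factor works; 1.0 avoids a division by zero.
    out.ampfac = (peak > 0.0) ? peak / target : 1.0;
    out.samples.reserve(data.size());
    for (double value : data) {
        out.samples.push_back(
            static_cast<std::int32_t>(std::lround(value / out.ampfac)));
    }
    return out;
}
```

A reader reverses the step by multiplying each decoded integer sample by ampfac. The restored values differ from the originals by at most half a quantization step, that is by ampfac/2.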
The WID2 line is the waveform identification line. It holds information on the dataset as defined in GSE2.0 format. Note that this line is 132 characters wide.
position | name | format | contents |
---|---|---|---|
1-4 | id | a4 | WID2 (identifier) |
6-15 | date | i4,a1,i2,a1,i2 | date of first sample: yyyy/mm/dd |
17-28 | time | i2,a1,i2,a1,f6.3 | time of first sample: hh:mm:ss.sss |
30-34 | station | a5 | for a valid GSE2.0 block use ISC station code |
36-38 | channel | a3 | for a valid GSE2.0 block use FDSN channel designator |
40-43 | auxid | a4 | auxiliary identification code |
45-47 | datatype | a3 | must be CM6 in SFF |
49-56 | samps | i8 | number of samples |
58-68 | samprat | f11.6 | data sampling rate in Hz |
70-79 | calib | e10.2 | calibration factor |
81-87 | calper | f7.3 | calibration period where calib is valid |
89-94 | instype | a6 | instrument type (as defined in GSE2.0) |
96-100 | hang | f5.1 | horizontal orientation of sensor, measured in degrees clockwise from North (-1.0 if vertical) |
102-105 | vang | f4.1 | vertical orientation of sensor, measured in degrees from vertical (90.0 if horizontal) |
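
Because the WID2 line has many fields, a table-driven splitter is a convenient way to take it apart. The sketch below copies the column positions from the table above and returns every field as a trimmed string. `Column`, `wid2_columns`, `trim` and `split_wid2` are hypothetical names introduced for this example only.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Name and 1-based, inclusive column range of one field.
struct Column {
    const char* name;
    std::size_t first;
    std::size_t last;
};

// Column positions as given in the WID2 table.
const Column wid2_columns[] = {
    {"id", 1, 4},          {"date", 6, 15},      {"time", 17, 28},
    {"station", 30, 34},   {"channel", 36, 38},  {"auxid", 40, 43},
    {"datatype", 45, 47},  {"samps", 49, 56},    {"samprat", 58, 68},
    {"calib", 70, 79},     {"calper", 81, 87},   {"instype", 89, 94},
    {"hang", 96, 100},     {"vang", 102, 105},
};

// Removes leading and trailing blanks.
std::string trim(const std::string& s) {
    const std::size_t begin = s.find_first_not_of(' ');
    if (begin == std::string::npos) return "";
    const std::size_t end = s.find_last_not_of(' ');
    return s.substr(begin, end - begin + 1);
}

// Splits a WID2 line into its named fields; fields beyond the end of a
// short line come back as empty strings.
std::map<std::string, std::string> split_wid2(const std::string& line) {
    std::map<std::string, std::string> fields;
    for (const Column& c : wid2_columns) {
        const std::string raw = (c.first <= line.size())
            ? line.substr(c.first - 1, c.last - c.first + 1)
            : std::string();
        fields[c.name] = trim(raw);
    }
    return fields;
}
```

Numeric fields can then be converted on demand, for example with `std::stol` for samps and `std::stod` for samprat.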
The DAT2 line indicates the beginning of the encoded dataset. The dataset follows in lines that are 80 characters wide.
The CHK2 line provides a checksum for the dataset. The checksum has to be calculated as defined in GSE2.0:
position | format | contents |
---|---|---|
1-4 | a4 | CHK2 (identifier) |
6-13 | i8 | checksum |
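
The sketch below follows the checksum procedure as it is commonly described for GSE2.0: the integer samples, before differencing and encoding, are summed with every intermediate result reduced modulo 10**8, and the absolute value of the sum is stored. This description is an assumption of the sketch and should be verified against the GSE2.0 specification. `gse2_checksum` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Checksum over the integer samples (assumed: before taking differences).
// The modulus 10**8 keeps the result within the eight digits of the i8 field.
std::int64_t gse2_checksum(const std::vector<std::int32_t>& samples) {
    const std::int64_t modulo = 100000000;
    std::int64_t checksum = 0;
    for (std::int32_t sample : samples) {
        // The C++ remainder truncates toward zero, so the sign is kept.
        const std::int64_t value = sample % modulo;
        checksum = (checksum + value) % modulo;
    }
    return checksum < 0 ? -checksum : checksum;
}
```

A reader would compute this value from the decoded samples and compare it with the number found in columns 6-13 of the CHK2 line.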
The INFO line holds additional information on the seismometer position.
position | format | contents |
---|---|---|
1-5 | a5 | INFO (identifier) |
6 | a1 | type of coordinate system: C: cartesian S: spherical |
8-22 | f15.6 | c1: x, latitude (see also Coordinate Specification) |
23-37 | f15.6 | c2: y, longitude (see also Coordinate Specification) |
38-52 | f15.6 | c3: z, height (see also Coordinate Specification) |
54-57 | i4 | number of stacks done during acquisition (a value of zero and a value of one both mean a single shot) |
Notice that given coordinates imply a spatial relation between the source location and the receiver locations. While spherical coordinates refer to a fixed reference frame on the earth, cartesian coordinates refer to an arbitrary origin. The creator of the datafile is responsible for ensuring that coordinate information is consistent between the SRCE line and the several possible INFO lines.
x, y and z are vector components in a right handed cartesian reference frame. While x and y are arbitrarily oriented in the horizontal plane, z is counted positive upwards from an arbitrary reference level (preferably the free surface). All three coordinate values are measured in meters.
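
For cartesian coordinates the spatial relation between source and receiver reduces to simple vector arithmetic. The following sketch computes the horizontal source-receiver offset from the coordinates of a SRCE line and an INFO line. `Position` and `horizontal_offset` are hypothetical names, and refusing anything but two cartesian positions is a design choice of this example.

```cpp
#include <cmath>
#include <stdexcept>

// Coordinates as found in a SRCE or INFO line.
struct Position {
    char system;  // 'C': cartesian, 'S': spherical
    double c1;    // x in meters (or latitude in degrees)
    double c2;    // y in meters (or longitude in degrees)
    double c3;    // z in meters (or height in meters)
};

// Horizontal distance in meters between source and receiver.
// Only meaningful if both positions use the same cartesian frame.
double horizontal_offset(const Position& source, const Position& receiver) {
    if (source.system != 'C' || receiver.system != 'C') {
        throw std::invalid_argument("both positions must be cartesian");
    }
    return std::hypot(receiver.c1 - source.c1, receiver.c2 - source.c2);
}
```

The check on the coordinate system reflects the consistency requirement stated above: an offset computed from one cartesian and one spherical position would be meaningless.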
Latitude and longitude are given in the geographical reference frame and measured in degrees. The height value gives the height above the geoid and is measured in meters.