13.2 File organisation and access

# Methods of file organisation
## Serial File Organisation
- records are stored one after the other
- need to be accessed one after the other
- Serial files are stored in chronological order
- In serial files, new records are added in the next available space / records are appended to the file
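A minimal sketch of serial organisation, assuming the file is modelled as a Python list of `(key, data)` records. The function name `add_serial` and the sample records are hypothetical. New records are simply appended, so the file ends up in arrival (chronological) order, not key order.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (key field, data), an assumed record layout


def add_serial(serial_file: List[Record], record: Record) -> None:
    """Append the record in the next available space (end of the file)."""
    serial_file.append(record)


serial_file: List[Record] = []
for record in [(307, "Khan"), (112, "Adams"), (250, "Lee")]:  # arrival order
    add_serial(serial_file, record)

print(serial_file)  # [(307, 'Khan'), (112, 'Adams'), (250, 'Lee')]
```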
## Sequential File Organisation
- records are stored in a particular order
- the order is determined based on the value in a key field
- records are accessed one after the other
- records can be found by searching from the beginning of the file, record by record, until the required record is found or the key field value of the target record is exceeded.
- In sequential files, new records are inserted in the correct position.

- **Records** (in the file) are ordered
- …based on the **key field**
- A new version (of the file) has to be created to update the file  
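A minimal sketch of updating a sequential file by creating a new version, assuming records are `(key, data)` tuples held in a Python list. `insert_sequential` and the sample data are hypothetical. Records are copied from the old version to the new one, and the new record is written as soon as a larger key is met, so key order is preserved.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (key field, data), an assumed record layout


def insert_sequential(old_file: List[Record], new_record: Record) -> List[Record]:
    """Copy old_file to a new version, inserting new_record in key-field order."""
    new_file: List[Record] = []
    inserted = False
    for record in old_file:
        # Write the new record just before the first record with a larger key
        if not inserted and new_record[0] < record[0]:
            new_file.append(new_record)
            inserted = True
        new_file.append(record)
    if not inserted:  # new key is larger than every existing key
        new_file.append(new_record)
    return new_file


old = [(112, "Adams"), (250, "Lee"), (307, "Khan")]
new = insert_sequential(old, (200, "Brown"))
print(new)  # [(112, 'Adams'), (200, 'Brown'), (250, 'Lee'), (307, 'Khan')]
```

The old version is left unchanged, which mirrors the point that a new version of the file has to be created.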
## Random File Organisation
- **Records** are stored in no particular order within the file // There is no sequencing in the placement of the **records**
- There is a relationship between the key of the **record** and its location within the file // a hashing algorithm is used to find the location of the **record**
- Updates to the file can be carried out directly.  
- records of data are physically stored in a file in any available position
  - The location of any record in the file is found by using a hash table/hashing algorithm on the key field of a record
    - Hash Table = table of data in which the position of each record is not determined by the order of the key field, but is calculated from the value of the key field
      - Hence, data can be directly accessed by hashing the key field, rather than having to look through each record one by one
      - Hashing Algorithm = a mathematical formula used to perform a calculation on the key field of the record, the result of the calculation gives the address where the record should be found
        - e.g. divides the value in the key field by a suitable number (such as the number of storage locations) and uses the remainder as the address
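A minimal sketch of the remainder (division) hashing algorithm, assuming integer key fields and a file of 11 record slots. `NUM_SLOTS` and `hash_address` are hypothetical names, and 11 is an arbitrary choice for illustration.

```python
NUM_SLOTS = 11  # assumed number of storage locations in the file


def hash_address(key: int, num_slots: int = NUM_SLOTS) -> int:
    """Divide the key field value by num_slots; the remainder is the address."""
    return key % num_slots


for key in (112, 250, 307, 123):
    print(key, "->", hash_address(key))
# 112 -> 2
# 250 -> 8
# 307 -> 10
# 123 -> 2
```

Keys 112 and 123 both hash to address 2, so they would compete for the same location.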
##### Compare sequential and serial methods of file organisation
- In both serial and sequential files records are stored one after the other …
- … and need to be accessed one after the other
- In serial files, records are stored in chronological order
- In sequential files, records are stored in order of the key field
- In serial files, new records are added in the next available space / records are appended to the file
- In sequential files, new records are inserted in the correct position.

# Methods of file access
## Direct access
- Direct access allows a record to be found in a file without other records being read.
- Records are found by using the key field of the target record // the location of the record is found using a hashing algorithm.
#### Direct access is used to locate a specific record in sequential files and random files.
**Sequential files**
- In sequential files, an index of all key fields is kept
- The index is searched for the address of the file location where the target record is stored.
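A minimal sketch of direct access through an index, assuming the index is a list of `(key, address)` pairs kept in key order. The names `index`, `lookup` and `records` are hypothetical, and a binary search (Python's `bisect_left`) is one possible way to search the index. A linear search of the index would also work.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Assumed index: (key field, address in file), sorted by key
index: List[Tuple[int, int]] = [(112, 0), (200, 1), (250, 2), (307, 3)]
index_keys = [key for key, _ in index]

# The file itself, with records stored at the addresses the index points to
records = [(112, "Adams"), (200, "Brown"), (250, "Lee"), (307, "Khan")]


def lookup(target_key: int) -> Optional[int]:
    """Search the index for target_key; return its file address, or None."""
    pos = bisect_left(index_keys, target_key)
    if pos < len(index_keys) and index_keys[pos] == target_key:
        return index[pos][1]
    return None


address = lookup(250)
print(records[address])  # (250, 'Lee')
```

Only the index is searched. The file is then read at one address, so no other records are read.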
**Random files**
- A hashing algorithm is used on the key field of the record to calculate the address of the memory location where the target record is expected to be stored.
- A method is needed to find a record if it is not at the expected location, e.g. linear probing or searching an overflow area.
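A minimal sketch of direct access to a random file with linear probing, assuming an 11-slot file, the remainder hashing algorithm, `None` for an empty slot, and no deleted records. The names `find_record` and `table` are hypothetical. Wrapping round from the last slot to slot 0 is also an assumption.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]
NUM_SLOTS = 11  # assumed file size


def hash_address(key: int) -> int:
    return key % NUM_SLOTS


def find_record(table: List[Optional[Record]], key: int) -> Optional[Record]:
    """Go straight to the hashed address; if the key there differs, probe linearly."""
    address = hash_address(key)
    for step in range(len(table)):
        slot = table[(address + step) % len(table)]  # wrap round to slot 0 (assumption)
        if slot is None:
            return None  # empty slot ends the probe: record is not in the file
        if slot[0] == key:
            return slot
    return None  # every slot checked: record is not in the file


table: List[Optional[Record]] = [None] * NUM_SLOTS
table[2] = (112, "Adams")  # 112 % 11 = 2
table[3] = (123, "Patel")  # 123 % 11 = 2 as well, so it was stored in the next free slot
table[8] = (250, "Lee")    # 250 % 11 = 8

print(find_record(table, 250))  # (250, 'Lee')
print(find_record(table, 123))  # (123, 'Patel')
print(find_record(table, 134))  # None
```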
##### Describe what happens, in relation to the storage or retrieval of a record in the file, when the calculated hash value is a duplicate of a previously calculated hash value for a different record key.
- A collision has occurred: the key of the record already at the calculated location does not match the record key being stored or searched for
- This means the calculated storage location has already been used for another record.
- **If the record is to be stored**
- Search the file linearly to find the next available storage space (closed hash)
- Search the overflow area linearly to find the next available storage space (open hash)
- **If the record is to be found**
- search the overflow area linearly (open hash) until the matching record key is found
- search linearly from where you are (closed hash) until the matching record key is found
- If the matching record key is not found, the record is not in the file
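A minimal sketch of the overflow-area (open hash) approach, assuming an 11-slot main area, the remainder hashing algorithm, and an overflow area modelled as a Python list that is searched linearly. `home_area`, `overflow_area`, `store` and `retrieve` are hypothetical names.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]
NUM_SLOTS = 11  # assumed size of the main (home) area


def hash_address(key: int) -> int:
    return key % NUM_SLOTS


home_area: List[Optional[Record]] = [None] * NUM_SLOTS
overflow_area: List[Record] = []  # hypothetical overflow area


def store(record: Record) -> None:
    address = hash_address(record[0])
    if home_area[address] is None:
        home_area[address] = record
    else:
        overflow_area.append(record)  # collision: next available space in the overflow area


def retrieve(key: int) -> Optional[Record]:
    candidate = home_area[hash_address(key)]
    if candidate is not None and candidate[0] == key:
        return candidate
    for record in overflow_area:  # search the overflow area linearly
        if record[0] == key:
            return record
    return None  # not found: the record is not in the file


for r in [(112, "Adams"), (250, "Lee"), (123, "Patel")]:
    store(r)

print(home_area[2], overflow_area)  # (112, 'Adams') [(123, 'Patel')]
print(retrieve(123))                # (123, 'Patel')
print(retrieve(134))                # None
```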
#### Outline what is meant by the term collision in this context.
- A collision occurs when the values / data items in the key field of two different records (pass through a hashing algorithm and) result in the same hash value  
- …so the location identified (by the hashing algorithm) may already be in use // two records cannot occupy the same address  
#### Explain how a collision can be dealt with when writing records to a random file.
- A process of collision resolution is used
- Start at the original hashed storage space
- go through the following spaces in a linear fashion
- …and **store the data item** in the first available slot.  
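A minimal sketch of writing a record with linear probing (closed hash), assuming an 11-slot file, the remainder hashing algorithm and `None` for an empty slot. `write_record` is a hypothetical name. Wrapping round to slot 0 and raising an error when the file is full are assumptions the notes do not specify.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]
NUM_SLOTS = 11  # assumed file size


def hash_address(key: int) -> int:
    return key % NUM_SLOTS


def write_record(table: List[Optional[Record]], record: Record) -> int:
    """Start at the hashed slot, then store the record in the first free slot found linearly."""
    start = hash_address(record[0])
    for step in range(len(table)):
        address = (start + step) % len(table)  # wrap round to slot 0 (assumption)
        if table[address] is None:
            table[address] = record
            return address
    raise RuntimeError("no free slot: the file is full")


table: List[Optional[Record]] = [None] * NUM_SLOTS
print(write_record(table, (112, "Adams")))  # 2
print(write_record(table, (123, "Patel")))  # 3  (slot 2 in use)
print(write_record(table, (134, "Jones")))  # 4  (slots 2 and 3 in use)
```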
## Serial and Sequential access
### The process of sequential access for serial and sequential files
- Start at the beginning of the file
- …check records linearly // search for records one after the other
- …until the desired record is found // … processing / updating records as required // … EOF found.
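A minimal sketch of sequential access on a real text file, assuming one record per line in the hypothetical form `key,data`. `sequential_search` and the sample data are illustrative only. Iterating over a Python file object reads the records one after the other from the start of the file, and the loop ends at EOF.

```python
import os
import tempfile
from typing import Iterable, Optional, Tuple


def sequential_search(lines: Iterable[str], target_key: int) -> Optional[Tuple[int, str]]:
    """Read records one after the other until the target is found or EOF is reached."""
    for line in lines:
        key, data = line.rstrip("\n").split(",", 1)
        if int(key) == target_key:
            return int(key), data
    return None  # EOF reached without finding the record


# Build a small serial file to search (sample data)
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w") as f:
        f.write("307,Khan\n112,Adams\n250,Lee\n")
    with open(path) as f:
        print(sequential_search(f, 112))  # (112, 'Adams')
    with open(path) as f:
        print(sequential_search(f, 999))  # None
finally:
    os.remove(path)
```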

**Sequential method of file access**

The sequential access method searches for records one after the other, from the physical start of the file until the record is found or the end of the file is reached.

### Sequential Access Applied
#### to files with serial organisation 
- For serial files, records are stored in chronological order
- every record needs to be checked until the record is found, or all records have been checked.
#### to files with sequential organisation
- For sequential files, records are stored in order of a key field/index, and it is the key field/index that is compared.
- every record is checked until the record is found, or the key field of the current record is greater than the key field of the target record.
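A minimal sketch contrasting the two stopping rules, assuming records are `(key, data)` tuples in Python lists. `search_serial`, `search_sequential` and the sample data are hypothetical. Each function also returns how many records were read, so the early stop in a sequential file is visible.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]


def search_serial(records: List[Record], target_key: int) -> Tuple[Optional[Record], int]:
    """Check every record until found or all records have been checked."""
    reads = 0
    for record in records:
        reads += 1
        if record[0] == target_key:
            return record, reads
    return None, reads


def search_sequential(records: List[Record], target_key: int) -> Tuple[Optional[Record], int]:
    """Records are in key order: stop once the current key exceeds the target key."""
    reads = 0
    for record in records:
        reads += 1
        if record[0] == target_key:
            return record, reads
        if record[0] > target_key:
            return None, reads  # target would have appeared earlier: not in file
    return None, reads


serial_file = [(307, "Khan"), (112, "Adams"), (250, "Lee"), (200, "Brown")]
sequential_file = sorted(serial_file)  # same records, in key-field order

print(search_serial(serial_file, 150))          # (None, 4)
print(search_sequential(sequential_file, 150))  # (None, 2)
```

For a key that is absent, the serial search must read every record. The sequential search stops at the first key greater than the target.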