2. Pre-ingest

This workflow describes how to prepare and process the content, including appraisal, so that it is ready to be preserved in the next stage.

2.1 Format migration and attachments

If your email content is not in an open format, you may wish to migrate it to your preferred preservation format using software (see below). Open formats include EML and MBOX (see glossary).
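If your content is already in MBOX, one simple migration is to split the mailbox into one EML file per message. The sketch below uses only Python's standard `mailbox` module; the function name `split_mbox_to_eml` and the `message_000000.eml` naming scheme are hypothetical choices. Python's standard library has no PST reader, so PST content would still need one of the tools listed below.

```python
import mailbox
from pathlib import Path


def split_mbox_to_eml(mbox_path: str, out_dir: str) -> list[Path]:
    """Write each message in an MBOX file to its own .eml file.

    Returns the paths written, in the order the messages appear in the mailbox.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    box = mailbox.mbox(mbox_path, create=False)  # fail rather than create an empty mailbox
    try:
        for index, message in enumerate(box):
            # as_bytes() serialises the message itself; the mbox "From " separator
            # line is held separately by the mailbox module and is not included.
            target = out / f"message_{index:06d}.eml"
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written
```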

Although PST is a proprietary Microsoft Outlook format, some organisations use it as a preservation format.

For attachments (see glossary), you will need to decide whether to store them with the email as MIME-encoded data or separately in their original format.
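If you decide to store attachments separately, you can save decoded copies while leaving the original message, with its MIME-encoded attachments, untouched. A minimal sketch using Python's standard `email` package; the function name `extract_attachments` and the numbered-filename scheme are hypothetical.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def extract_attachments(eml_path: str, out_dir: str) -> list[Path]:
    """Save a decoded copy of each attachment in an .eml file to out_dir.

    The .eml file is only read, so the MIME-encoded originals stay intact.
    """
    with open(eml_path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    for index, part in enumerate(msg.iter_attachments()):
        # Keep only the last path component of the sender-supplied filename so a
        # name like "../x" cannot write outside out_dir; invent one if missing.
        name = Path(part.get_filename() or f"attachment_{index}").name
        # Prefix with the attachment's position so identical names do not collide.
        target = out / f"{index:03d}_{name}"
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            # Multipart payloads (e.g. an attached message/rfc822 email) are not
            # decoded by get_payload(); serialise the attached message instead.
            payload = part.get_payload(0).as_bytes()
        target.write_bytes(payload)
        saved.append(target)
    return saved
```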

Some emails may contain shareable internal links to documents held elsewhere (e.g. in a SharePoint or Google Drive folder), as well as external links. At present, none of the email preservation tools can capture these linked documents automatically.
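Because linked documents are not captured automatically, it may help to list the links in each message so they can be reviewed or captured by hand. The sketch below is illustrative only: the URL regular expression is deliberately rough, and the host names used to label SharePoint and Google Drive links are assumptions rather than a complete list.

```python
import re
from email import policy
from email.parser import BytesParser
from urllib.parse import urlparse

# Deliberately rough: stops at whitespace, quotes, angle brackets and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def classify_link(url: str) -> str:
    """Label a URL by host; the host lists here are illustrative assumptions."""
    host = (urlparse(url).hostname or "").lower()
    if host == "sharepoint.com" or host.endswith(".sharepoint.com"):
        return "SharePoint"
    if host in {"drive.google.com", "docs.google.com"}:
        return "Google Drive"
    return "other"


def find_links(raw_message: bytes) -> list[tuple[str, str]]:
    """Return (label, url) pairs for links in the message's text parts, without duplicates."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    found: dict[str, str] = {}  # url -> label, in first-seen order
    for part in msg.walk():
        # Only look at inline text bodies (plain text or HTML), not attachments.
        if part.get_content_maintype() != "text" or part.is_attachment():
            continue
        for url in URL_PATTERN.findall(part.get_content()):
            url = url.rstrip(".,;:")  # drop trailing sentence punctuation
            found.setdefault(url, classify_link(url))
    return [(label, url) for url, label in found.items()]
```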

Further guidance and software tools

  • Module 5.1 (‘Processing Email for Ingest’) of the Novice to Know How: Email Preservation online training provides an overview of format conversion, attachments and links.
  • Commercial software such as Emailchemy and Aid4Mail Converter can be used for migration. You can also buy a reduced-price Emailchemy for ePADD licence, which works with ePADD.
  • The free Email2PDF tool takes an email file as input and outputs an archival PDF that conforms to the EA-PDF (PDF/mail) specification. However, it requires some technical skill to set up.

2.2 Appraisal and sensitivity review

The account holder may have already carried out some pre-appraisal as part of Step 1.2: ‘Selection’. At this stage, you may wish to carry out further appraisal.

Software can help with the appraisal process (see ‘Further guidance and software’ below).

Relatively quick wins can include dealing with spam and marketing emails, and deduplication.
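Both quick wins can be partly automated. The sketch below treats messages that share a `Message-ID` as duplicates (falling back to a hash of the raw message when that header is missing) and flags probable bulk or marketing mail by a `List-Unsubscribe` header or a `Precedence` value of `bulk`, `list` or `junk`. These are heuristics, not the method of any particular tool, and the hypothetical `triage` function only flags messages for human review; it does not delete anything.

```python
import hashlib
from email import policy
from email.parser import BytesParser


def triage(raw_messages: list[bytes]) -> dict[str, list[int]]:
    """Sort message positions into 'keep', 'duplicate' and 'bulk' for human review."""
    parser = BytesParser(policy=policy.default)
    seen: set[str] = set()
    result: dict[str, list[int]] = {"keep": [], "duplicate": [], "bulk": []}
    for index, raw in enumerate(raw_messages):
        msg = parser.parsebytes(raw, headersonly=True)
        # Messages sharing a Message-ID are treated as copies; without that header,
        # only byte-for-byte identical messages count as duplicates.
        message_id = str(msg.get("Message-ID", "")).strip()
        key = message_id or "sha256:" + hashlib.sha256(raw).hexdigest()
        if key in seen:
            result["duplicate"].append(index)
            continue
        seen.add(key)
        # Heuristic for mailing-list or marketing mail. Flag only: deciding what
        # to discard remains an appraisal decision.
        precedence = str(msg.get("Precedence", "")).strip().lower()
        if msg.get("List-Unsubscribe") is not None or precedence in {"bulk", "list", "junk"}:
            result["bulk"].append(index)
        else:
            result["keep"].append(index)
    return result
```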

As part of this step, you may also wish to carry out a sensitivity review, identifying content that contains personal, sensitive or confidential information.

This can be resource-intensive, so some organisations postpone the review until later, for example until an access request is received or an embargo period has elapsed.
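Simple pattern matching can support a first pass of a sensitivity review by flagging messages for a person to read. The patterns and keyword list below are hypothetical examples. They will miss some sensitive content and flag some harmless content, so they cannot replace the review itself.

```python
import re

# Hypothetical starting patterns; a real review would tune them to the collection.
SENSITIVITY_PATTERNS = {
    "email_address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 10 to 13 digits, optionally separated by spaces or hyphens, with an optional +country prefix.
    "phone_number": re.compile(r"(?:\+\d{1,3}[\s-]?)?(?:\d[\s-]?){9,12}\d"),
    "keyword": re.compile(r"\b(?:confidential|medical|salary|disciplinary)\b", re.IGNORECASE),
}


def flag_sensitive(text: str) -> dict[str, list[str]]:
    """Return the matches for each pattern that occurs in text (empty dict if none)."""
    hits: dict[str, list[str]] = {}
    for label, pattern in SENSITIVITY_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```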

Further guidance and software

2.3 Capture metadata and describe

Capturing metadata is important for preserving the content and making it accessible. This can include contextual metadata (e.g. structure, arrangement, provenance, intellectual property rights, appraisal decisions) and preservation metadata (e.g. preservation actions, checksums and an audit trail of integrity checks).

Email headers (see glossary) contain metadata such as the sender, recipients, date and subject. For messages with attachments, the header of each attachment's MIME part records details such as its filename and content type.
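Header metadata can also be extracted with a short script. This sketch uses Python's standard `email` package to read the sender, recipients, date, subject and Message-ID from the top-level headers, and the filename and content type from each attachment's MIME part headers. The function names and output field names are hypothetical.

```python
from email import policy
from email.parser import BytesParser
from email.utils import parsedate_to_datetime
from typing import Optional


def iso_date(value) -> Optional[str]:
    """Convert an RFC 5322 Date header to ISO 8601, or None if missing or unparseable."""
    if value is None:
        return None
    try:
        return parsedate_to_datetime(str(value)).isoformat()
    except (TypeError, ValueError):
        return None


def header_metadata(raw: bytes) -> dict:
    """Return a flat metadata record for one message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "cc": str(msg.get("Cc", "")),
        "date": iso_date(msg.get("Date")),
        "subject": str(msg.get("Subject", "")),
        "message_id": str(msg.get("Message-ID", "")),
        # Attachment details come from each attachment's own MIME part headers.
        "attachments": [
            {"filename": part.get_filename(), "content_type": part.get_content_type()}
            for part in msg.iter_attachments()
        ],
    }
```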

Software can help with extracting some metadata and creating a catalogue (see below).

A key part of preservation metadata is creating or updating checksums for the content. ePADD can do this for you, or you can use other software; see Section 1.5 (‘Create checksums’) of the Digital preservation workflows guidance.
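If you are not using ePADD, checksums can also be generated with a short script. This sketch writes a SHA-256 manifest in the two-space `checksum  path` layout used by `sha256sum`, then re-checks the files against it as an integrity check. The manifest filename is an assumption, and SHA-256 is one common choice rather than a requirement of this guidance.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "manifest-sha256.txt"  # hypothetical filename


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large mailbox files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: str) -> Path:
    """Record a checksum for every file under root, except the manifest itself."""
    base = Path(root)
    manifest = base / MANIFEST_NAME
    files = sorted(p for p in base.rglob("*") if p.is_file() and p != manifest)
    lines = [f"{sha256_file(p)}  {p.relative_to(base).as_posix()}" for p in files]
    manifest.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return manifest


def verify_manifest(root: str) -> list[str]:
    """Return listed paths that are missing or whose checksum has changed.

    Files added after the manifest was written are not reported.
    """
    base = Path(root)
    failures = []
    for line in (base / MANIFEST_NAME).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split("  ", 1)
        target = base / rel_path
        if not target.is_file() or sha256_file(target) != expected:
            failures.append(rel_path)
    return failures
```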

Cataloguing can be resource-intensive, so you may wish to focus on collection- or series-level descriptions.
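For a collection- or series-level description, a few aggregate figures drawn from the headers can be more useful than item-level records. The sketch below derives a message count, a date range and the most frequent correspondents. The function name `collection_summary` and its output fields are hypothetical, and treating dates without a time zone as UTC is an assumption made only so that they can be compared.

```python
from collections import Counter
from datetime import timezone
from email import policy
from email.parser import BytesParser
from email.utils import getaddresses, parsedate_to_datetime


def collection_summary(raw_messages: list[bytes], top_n: int = 5) -> dict:
    """Summarise message count, date range and top correspondents for a collection."""
    parser = BytesParser(policy=policy.default)
    dates = []
    correspondents: Counter = Counter()
    for raw in raw_messages:
        msg = parser.parsebytes(raw, headersonly=True)
        date_header = msg.get("Date")
        if date_header is not None:
            try:
                dt = parsedate_to_datetime(str(date_header))
            except (TypeError, ValueError):
                dt = None  # unparseable dates are left out of the range
            if dt is not None:
                if dt.tzinfo is None:
                    # Assumption: treat zone-less dates as UTC so they can be compared.
                    dt = dt.replace(tzinfo=timezone.utc)
                dates.append(dt)
        # Count every address appearing as sender or recipient.
        for field in ("From", "To", "Cc"):
            values = [str(v) for v in msg.get_all(field, [])]
            for _name, address in getaddresses(values):
                if address:
                    correspondents[address.lower()] += 1
    return {
        "message_count": len(raw_messages),
        "earliest": min(dates).date().isoformat() if dates else None,
        "latest": max(dates).date().isoformat() if dates else None,
        "top_correspondents": correspondents.most_common(top_n),
    }
```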

Further guidance and software

Section 3: Preserve