This workflow describes how you prepare and process the content, including appraisal, so that it is ready for preservation at the next stage.
2.1 Format migration and attachments
If email content is not in an open format, you may wish to migrate it to your preferred preservation format using software (see ‘Further guidance and software tools’ below). Open formats include EML and MBOX (eighth definition on first page and third definition on third page of same PDF, 165 KB).
Although PST is a Microsoft Outlook proprietary format, some use it as a preservation format.
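As an illustration of migration between the two open formats, the following minimal sketch splits an MBOX file into one EML file per message using only the Python standard library. The function name `mbox_to_eml` and the output file naming are assumptions made for this example; it is not how any of the tools listed in this guidance work internally, and it does not handle proprietary formats such as PST.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path, out_dir):
    """Write each message in an MBOX file to its own .eml file.

    Returns the list of paths written, in mailbox order.
    (Hypothetical helper for illustration only.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # create=False stops the library from creating an empty mailbox
    # if the path is wrong.
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for index, key in enumerate(box.keys(), start=1):
            # get_bytes() returns the message as stored, without the
            # MBOX "From " separator line that precedes it.
            target = out / f"message_{index:06d}.eml"
            target.write_bytes(box.get_bytes(key))
            written.append(target)
    finally:
        box.close()
    return written
```

This sketch does not reverse the ‘>From ’ quoting that some MBOX variants apply to message bodies, so the output should be checked against the source before the original is disposed of.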
For attachments (third definition on first page of PDF, 165 KB) you will need to decide whether to store them with the email (as MIME-encoded data (sixth definition on third page of PDF, 165 KB)) or store them separately in their original format.
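If you choose to store attachments separately, the decoding step can be sketched as follows, assuming the emails are already held as EML files. The helper name `save_attachments` and the numbered file naming are hypothetical; the sketch uses only the Python standard library and leaves the original EML file untouched.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def save_attachments(eml_path, out_dir):
    """Decode the attachments of one EML file into out_dir.

    Returns the list of files written. (Hypothetical helper.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(eml_path, "rb") as handle:
        message = BytesParser(policy=policy.default).parse(handle)
    saved = []
    for index, part in enumerate(message.iter_attachments(), start=1):
        fallback = f"attachment_{index}"
        # Keep only the final path component of the declared filename.
        name = Path(part.get_filename() or fallback).name or fallback
        # decode=True reverses the MIME transfer encoding (e.g. base64).
        payload = part.get_payload(decode=True)
        if payload is None:
            # Multipart attachments, such as a forwarded email, have no
            # single payload and are skipped in this sketch.
            continue
        target = out / f"{index:03d}_{name}"
        target.write_bytes(payload)
        saved.append(target)
    return saved
```

The numeric prefix is a design choice for the example: it keeps attachments with identical filenames from overwriting each other.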
Some emails may contain shareable internal links to documents held elsewhere (e.g. on a SharePoint folder or a Google Drive folder) or external links. None of the email preservation tools can capture these types of documents automatically at present.
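Although the linked documents themselves cannot be captured automatically, it may help to list the links an email contains so that someone can follow them up manually. The sketch below is one possible approach using the Python standard library; the function name `list_links` and the simple URL pattern are assumptions for illustration, and the pattern will not catch every link form.

```python
import re
from email import policy
from email.parser import BytesParser

# Deliberately simple pattern (an assumption): http(s) URLs up to the
# next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def list_links(raw_message):
    """Return the distinct URLs found in the text parts of one email."""
    message = BytesParser(policy=policy.default).parsebytes(raw_message)
    links = []
    for part in message.walk():
        # Only look at body text (plain or HTML), not at attachments.
        if part.get_content_maintype() != "text" or part.is_attachment():
            continue
        for url in URL_PATTERN.findall(part.get_content()):
            url = url.rstrip(".,;:)")  # drop trailing punctuation
            if url not in links:
                links.append(url)
    return links
```

The resulting list only records where the documents were; retrieving them still has to be arranged with the account holder or the system owner.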
Further guidance and software tools
- Module 5.1: Processing Email for Ingest of the Novice to Know How: Email Preservation online training provides an overview of format conversion, attachments and links.
- Commercial software such as Emailchemy and Aid4Mail Converter can be used for migration. A reduced-price licence is available for a version of Emailchemy that works with ePADD.
- Email2PDF tool (free) will take an email file as input and create an archival PDF file that conforms to the EA-PDF (PDF/mail) specification as output. However, it requires some technical skills to implement.
2.2 Appraisal and sensitivity review
The account holder may have already carried out some pre-appraisal as part of Step 1.2: ‘Selection’. At this stage, you may wish to carry out further appraisal.
Software can be used to help facilitate the appraisal process (see the ‘Further guidance and software’ section immediately below).
Relative ‘quick wins’ can include dealing with spam/marketing emails and deduplication.
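To show what deduplication involves, here is a minimal sketch that treats two messages as duplicates when they share a Message-ID header, falling back to a SHA-256 hash of the whole message when that header is missing. This matching rule and the function names are assumptions for illustration; ePADD and other tools may use different criteria.

```python
import email
import hashlib


def dedupe_key(raw):
    """Return a key that is equal for messages treated as duplicates."""
    message_id = email.message_from_bytes(raw).get("Message-ID")
    if message_id and str(message_id).strip():
        return "id:" + str(message_id).strip()
    # No Message-ID: only byte-identical copies will match.
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def deduplicate(raw_messages):
    """Keep the first copy of each message, preserving input order."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = dedupe_key(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```

Matching on Message-ID catches copies whose transport headers differ, for example the same email held in two folders, but it relies on the sending system having generated unique identifiers.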
As part of this step, you may also wish to carry out a sensitivity review, identifying content that contains personal, sensitive or confidential information.
This can be a resource-intensive step, so some organisations defer it until a later date, for example when an access request is received or an embargo period has elapsed.
Further guidance and software
- Module 5.2: Appraisal Decisions of the Novice to Know How: Email Preservation online training provides an overview of this step, including a guide to using the ePADD appraisal module which allows you to review the email mailbox, add metadata, appraise, identify sensitive information, and export for the processing module. ePADD will also undertake some deduplication.
- The Good Practices for Acquiring Email Archives: a community guide has a section on appraisal.
- Appraisal Rubric was developed for the Carcanet Press email collection by John Rylands Library and identifies what types of records will be kept.
- Protecting Sensitive Email: Archival Views on Challenges and Opportunities (PDF, 30 KB) by Amy Wickner et al (2017) is a good introduction to the challenges of sensitive information in email archives.
- Palladium: appraisal and sensitivity review of the Carcanet email archive by Paul Carlyle at the University of Manchester explains how they use ePADD to support appraisal and sensitivity reviews.
- Other free email preservation tools that facilitate appraisal and processing include RATOM and DArcMail, but they require some technical skills to implement. For RATOM, see The Other BCC: Appraising and Processing Email by Cal Lee & Kam Woods.
- If you are using BitCurator (free) then the Bulk Extractor tool (free) can be used to identify sensitive information. See BitCurator: Using Bulk Extractor to Locate Potentially Sensitive Information (video).
2.3 Capture metadata and describe
Capturing metadata is important for preserving the content and making it accessible. This can include contextual metadata (e.g. structure, arrangement, provenance, intellectual property rights, appraisal decisions) and preservation metadata (e.g. preservation actions, checksum information, audits of integrity checks).
Email header sections (third definition on the second page of PDF, 165 KB) contain metadata such as the email sender, email recipient, date created and details of attachments.
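As a rough illustration of the metadata held in the header section, the sketch below reads the sender, recipient, date, subject and attachment filenames from a single message using the Python standard library. The function name `header_metadata` and the dictionary keys are hypothetical, chosen for this example only.

```python
from email import policy
from email.parser import BytesParser


def header_metadata(raw_message):
    """Return basic descriptive metadata from one email's headers."""
    message = BytesParser(policy=policy.default).parsebytes(raw_message)
    return {
        "from": str(message.get("From", "")),
        "to": str(message.get("To", "")),
        "date": str(message.get("Date", "")),
        "subject": str(message.get("Subject", "")),
        # Filenames come from each attachment's own MIME headers.
        "attachments": [
            part.get_filename() for part in message.iter_attachments()
        ],
    }
```

Values are returned as they appear in the message, so names and dates may still need normalising before they are used in a catalogue.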
Software can help with extracting some metadata and creating a catalogue (see below).
A key part of the preservation metadata should include creating or updating the checksums of the content – ePADD can do this for you or you can use other software. See Section 1.5 – ‘Create checksums’ – of the Digital preservation workflows guidance.
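For those using other software, creating and later re-checking checksums can be sketched as below. This is a minimal example using SHA-256 from the Python standard library; the choice of algorithm, the function names and the manifest structure (relative path mapped to checksum) are assumptions for illustration, not a format prescribed by this guidance.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(folder, manifest):
    """List files that were added, removed or altered since manifest."""
    current = build_manifest(folder)
    names = manifest.keys() | current.keys()
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

An empty list from `changed_files` indicates that every file still matches the checksum recorded earlier; the manifest itself needs to be stored somewhere safe for the comparison to be meaningful.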
Cataloguing can be resource-intensive, so focus on creating collection-level or series-level descriptions.
Further guidance and software
- Module 5.5: ‘Capturing Metadata’ and Module 7.1: ‘Facilitating Discovery for Preserved Email’ of the Novice to Know How: Email Preservation online training provide an overview of this step, including a guide to using the ePADD processing module to add and edit the metadata required for a collection-level description and to link names with authority records from other sources.
- Processing and Providing Access to Email Collections with ePADD (video)
- Email archive preservation which uses Preservica and ePADD (video) – Jessica Smith of the University of Manchester on using the processing module of ePADD with the Carcanet archive
- Other free email preservation tools that can harvest metadata include RATOM and DArcMail, but they require some technical skills to implement (the use of a database and Python scripting). For RATOM, see The Other BCC: Appraising and Processing Email by Cal Lee & Kam Woods (2020).