Publication Date


Document Type

Conference Presentation


Library and Information Science | Statistics and Probability


A presentation made at the Wrangling Library Data virtual conference organized by Amigos Library Services, 22 February 2018. Slide deck and spreadsheets demonstrated during the conference session are available here.


Microsoft Excel was first released on the Windows platform 30 years ago and has since become widely used. Although new tools for manipulating, analyzing, and visualizing data are constantly emerging, Excel remains a potent tool—and not just because of newer features. Simple functions such as TRIM, MID, SUBSTITUTE, FIND, ROUNDDOWN, and VLOOKUP can be used to manipulate data sets in powerful ways.

This workshop applies selected functions to realistic library data sets. Demonstrations include:

  • deriving time-series categories from date and time stamps
  • pre-coding survey comments based on keywords
  • dealing with messy data points such as call numbers and publisher names
  • removing extraneous punctuation and symbols from textual data
  • identifying keywords that are misspelled or on a stopword list

Spreadsheets available for download provide simple utilities for performing these tasks. For those interested in creating or modifying formulas, presentation slides illustrate the basics of using selected functions to transform data in Excel.

1. Date & time manipulation.xlsx (1129 kB)
1. Date & time manipulation

2. Pre-coding of survey comments.xlsx (609 kB)
2. Pre-coding of survey comments

3. Publisher name manipulation.xlsx (415 kB)
3. Publisher name manipulation

4. LC call number manipulation.xlsx (3817 kB)
4. LC call number manipulation

5. Punctuation, symbol, & extra text removal.xlsx (816 kB)
5. Punctuation, symbol, & extra text removal

6. Keyword generation.xlsx (2559 kB)
6. Keyword generation

7. Spell check.xlsx (2422 kB)
7. Spell check

8. Stopword identification.xlsx (1799 kB)
8. Stopword identification