Day 12

For next time

Ethics in text mining and analysis

Guest speaker Dr. Erhardt Graeff

Alignment and bias

Let’s examine a reflection write-up from a previous semester’s MP3, adapted to serve as a prompt for building up a range of “ask-analyze-assess” alignment possibilities. As we do so today, we will keep an eye out for biases that could have played a role in the data and algorithms used by the systems this project incorporated, or in the people who generated that data and those algorithms.

Work with one or two people near you as you read the reflection below and do exercises 1 through 5 related to alignment. Examining an already-completed MP3 can give you practice considering limitations and biases in complex systems before you reflect upon your own assignment. The exercise also enables instructors to introduce some considerations that might be less obvious.


Text caching class example

We’ve created an example program to demonstrate a) caching text data as local files, and b) the utility of custom classes.

You are free to build on this program in your MP3, but as with all code you didn’t write, you must make sure you understand how it works (and ask questions if you don’t). Since this code was provided by course staff, you don’t need to cite its source.

Exercise: Try adding a lines method to the Text class that returns a list of all the individual lines in the text file, so that you can write code like:

example = Text(my_url)
for line in example.lines():
    do_something(line)
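
For intuition about the two ideas the example program demonstrates (caching text as local files, and wrapping that behavior in a custom class), here is a minimal sketch. It is not the course's example program: the class name CachedText, the fetch_url helper, the cache_dir and fetch parameters, and the choice to name cache files after a hash of the URL are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url):
    """Download a URL and return its body decoded as UTF-8 text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class CachedText:
    """Hypothetical sketch: text from a URL, cached as a local file."""

    def __init__(self, url, cache_dir="text_cache", fetch=fetch_url):
        self.url = url
        # Assumption: name the cache file after a hash of the URL,
        # so that any URL maps to a safe, unique filename.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt"
        self.path = Path(cache_dir) / name

        # Only download if there is no cached copy yet.
        if not self.path.exists():
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.write_text(fetch(url), encoding="utf-8")

        # Always read from the local file, whether it was just written or not.
        self.content = self.path.read_text(encoding="utf-8")

    def word_count(self):
        """Return the number of whitespace-separated words in the text."""
        return len(self.content.split())
```

Creating two CachedText objects for the same URL should trigger only one download, since the second object finds the file already on disk. Passing the download function in as a parameter is one way to try the class out without a network connection.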