Personal Data Preservation, Inspired by Ancient Writing - Will Byrd
Will Byrd presented a talk on his continuing efforts to preserve his work for at least 5,000 years. What are the challenges? Why would you want to do that?
Please forgive the poor audio quality. We had technical trouble with the audio recording and had to use the backup audio. Please use the captions (CC) in the video player.
Eric: [00:00] Our next speaker probably knows more Egyptian hieroglyphics than he does Clojure.
William Byrd: [00:10] This is true.
Eric: [00:15] This is part of the context track. There are three tracks in this conference. The tracks are really themes. One is context.
[00:29] Another is code, so all about coding and programming. The third I call career, [inaudible]. It's business. The first talk was in the business theme.
Eric: [00:42] For the context track, I asked Will if he could give a talk, because software is changing the world. It's a new medium that hasn't been around that long. We don't know exactly how it's going to change our lives.
[00:59] What we can do is look at other media and how they changed the world and history. We might go all the way back to the beginning of history, to the invention of writing. That is why I asked Will to speak.
[01:16] Will Byrd is a friend of the coding community. He created miniKanren, which is what core.logic is based on. He's also working on Barliman, which will replace us all in the future: a programmer that's better than us.
[01:33] Will Byrd, thank you.
Will: [01:36] I should point out that all the work I've done is with other people, except for what I'm going to show you today. It's all me.
Will: [01:50] The miniKanren's for you, and the other people [inaudible]. Now for something different. Eric has known for a while that I'm interested in the history of writing.
[02:06] My language skills are pretty weak. Some of my friends make fun of me. They say I'm a programming language researcher, but I only know one programming language [inaudible]. I know other languages, but I refuse to program [inaudible].
Will: [02:21] The only natural language I know is English. I'm very interested, not so much in language, but in writing. Alan [inaudible] and many other people claim that writing is the most important invention humans have ever come up with. I believe that's probably true.
[02:44] I'm really fascinated by the history of writing, and in particular the origins of writing. I also think the writing in some of these systems is extremely beautiful. I really love ancient Egyptian. I really love cuneiform, Sumerian, [inaudible], that kind of thing.
[03:05] When I went to college, I was at the University of Chicago, and I didn't want to take Spanish because I had done so poorly in it, so I decided to take an advanced graduate course in Babylonian, which went about as well as you might expect.
Will: [03:22] The other students in the class, like the grad student majoring in comparative linguistics, would say, "Oh, is that like the case in Hebrew or Aramaic?" and I'd say, "Is that like Spanish?"
Will: [03:36] Her husband was like, "No. It's not like Spanish." But anyway, it hasn't stopped me from being interested in writing systems.
[03:47] I've taught myself a little bit of Egyptian and a little bit of Babylonian. Babylonian as a language isn't that hard. The writing system is very difficult to read but the language itself is, it's a Semitic language and it's not that different from learning Hebrew, basically.
[04:06] So Eric asked me to talk about the history of writing and programming, and what in the world do you talk about? I had no idea, so I just started reading as much as I could. I used it as an excuse to geek out. Also, I've been in England a fair amount recently, so every time I go to England now I get the Goodfellow room next to the British Museum. I've also been to Oxford and Cambridge recently.
[04:36] I've been to the Ashmolean Museum in Oxford a few times. This is some cuneiform writing from the British Museum. Let's see, I'm not sure how old this particular one is, but it's pretty old; it's carved in stone. By the way, my slides are just photos I've taken and stuff like that, so they're not the most organized. I'll just talk around them.
[05:13] Here is another piece. I'm also very interested in ceramics. Obviously, if you have writing on stone, that lasts a long time. If you have ceramics, you can have complicated artwork that lasts a long time too. That's about 2,500 years old.
[05:32] Here you see some ceramic pieces. You can see whoever made the ceramics, or, actually I'm not sure if this is the person who made the ceramics or it's someone afterwards carving into it, but they could write something, inscribe it in the pottery and this will last thousands of years. Here's some cuneiform tablets. These are close to 4,000 years old. This is a Sumerian king list, so that's almost 4,000 years old.
[06:09] Again, you just go into the Ashmolean Museum and it's sitting there on display. The Sumerians and Babylonians didn't just write on flat tablets; they also had these prism structures and some [inaudible]-looking type things.
[06:27] These are some inscriptions talking about mathematics. They have inscriptions talking about astronomy. These were preserved for a very long period of time.
[06:40] Then this is very, very old writing from the Sumerians. This is sort of accounting information. This is about 5,200 years old, I think. I was thinking a lot about these ancient writings, and also another fun thing...Oh yeah, this is really, really old writing. This is apparently the proto-cuneiform sign for beer, I think.
Will: [07:11] Yeah, once again, this is like 5,000 years old. You can also see these sorts of master works. This is a Greek vase. It looks like it was made yesterday, but this is thousands of years old. Now we come to the modern world.
Will: [07:33] While I'm doing these things, reading up, learning, thinking, and visiting these museums, I also get emails from Google saying that they're going to shut down my account: "If you don't want this Google Apps account and don't want to save any of your data, you don't need to do anything." It'll take care of itself.
[08:10] They give their justification for doing it, but what happens to your account? Basically all your information is gone forever. It's gone forever. You can find some interesting websites. This is from "Le Monde," but this is a Google memorial to all of the services that Google has shut down.
Will: [08:30] They couldn't all fit on the screen easily. I don't particularly mean to pick on Google. This is not just a Google phenomenon. Archive Team, run by Jason Scott, keeps a deathwatch of all the services and websites that are likely to go down.
[08:54] Here's their "Likely to Die" list. They're keeping an eye on these websites and they're archiving them. This is a partial list of all the sites and services that shut down in 2014. I couldn't fit it on one page, of course.
[09:11] What got Jason Scott really involved in this was GeoCities shutting down. If you remember GeoCities, that was a big chunk of the early Web that people had in their GeoCities pages, and Yahoo! just shut it down.
[09:29] Apparently their notice was a little aside in an FAQ saying, "This service is going to be shut down in two months," a few months, or whatever. Jason Scott and Archive Team managed to pull down much of GeoCities, and several other teams also tried archiving it.
[09:55] If you go to the Internet Archive, you can find their "Special Collection for 2009" for GeoCities. Actually, if you go back and look at "Sunset Happy," yeah: "Archive Team officially proclaims Yahoo! the least trustable host and its arch enemy.
Will: [10:17] Prove us different, or not." This is the kind of thing I'm seeing while I'm reading about these documents that are thousands of years old.
[10:34] I know a little Egyptian. Let me see if I can find that little bit. These aren't too well organized.
[10:43] Anyway, the British Museum has writing in Egyptian where I can read king names just fine, things like that. I can still read writing that's thousands of years old. And this was actually forgotten; the language was forgotten. People didn't know how to read Egyptian, people didn't know how to read Sumerian, people didn't know how to read Babylonian, so this information was lost for a long time.
[11:15] Only relatively recently was it rediscovered. We're still trying to figure out how to read Sumerian better. There are scripts like Linear B that were eventually deciphered. Then there are scripts like Linear A that we haven't been able to decipher. Why haven't we been able to decipher Linear A, does anybody know?
Audience Member: [11:40] Sample sizes.
Will: [11:40] Yeah, the sample size is really small. There's not enough of it. There's this saying that 90 percent of life is showing up. I guess that's true of history too: 90 percent of history is showing up. If your language is forgotten but there are enough pieces left, enough fragments, people in the future, maybe thousands of years in the future, at least have a chance to try to recover it.
[12:11] But if you don't leave those pieces, it's gone, there's nothing you can do about it. I started thinking a lot about how when I was in elementary school I would write something down and my parents still have my writings from elementary school and some of my report cards from elementary school. But everything I did in high school and college is all gone because I did it on computer, I did it on a word processor.
[12:41] Some people say, "Well, now we have the Web, we have the cloud." But I think the people of Archive Team would say that doesn't give us any safety guarantee. Maybe for a couple of years, but certainly not over time scales of even 10 years, and definitely not over time scales of 2,000 years.
[13:03] We can put our data on GitHub, and GitHub's probably not going away tomorrow, but can you really say that in the next 30 years it won't be acquired by some company, or by some company that buys that company?
Audience Member: [13:19] Yahoo!
Will: [13:19] Yeah, Yahoo! is going to buy GitHub.
Will: [13:23] They'll take all that Alibaba money.
Will: [13:31] Obviously people are aware of this problem. They're also aware that they have to be very careful with their media: media can fail, hard drives fail, all these sorts of things. I think there's an awareness of it, but for most of us it doesn't go much beyond awareness.
[13:53] We have some awareness that this could be a problem, but I don't think we're acting on what is a problem, or a potential problem. When I'm looking at this writing that's 5,000 years old, I can't help but wonder how much of the stuff we're doing today will be around in 5,000 years, how much of it is going to be lost forever.
[14:16] That inspired me to start changing my practices and thinking about new ones. I'm going to share some of the changes I've made and some of the things I'm thinking about and trying to work on.
[14:35] Interestingly enough, I thought this would just be some weird rabbit hole I'd go down that no one cares about, but many of the programmers I talk to think this is really interesting. I said, "Why? Why do you think this is interesting?" They said, "Well, programmers get excited about keyboard switches...
Will: [14:50] or whatever." If you make your living by entering text, then it's natural to care about your input devices, or your screens, or whatever.
[15:10] To my surprise, a number of programmers seem to be interested in this, so I'm going to speak up a little and tell you about what I do and invite you to join me, if you are interested. This is definitely a project I'm working on actively, and I will be working on it for the rest of my life.
[15:36] What have I decided to do? What I've decided to do is to take it as a personal challenge that I want my research notes and things like that to last at least 5,000 years and hopefully longer. That's my task for myself that everything I do that I care about I want to last at least 5,000 years, and it has the potential of lasting 5,000 years.
[16:00] I want the equivalent of clay tablets, or carving in stone, that kind of thing. The oldest surviving papyrus, Egyptian papyrus with writing on it that we can still read, is about 4,600 years old. That, I think, is a reasonable target to shoot at.
[16:25] How do you do that? How do you make sure that your research notes last 5,000 years or whatever? It's going to take some doing, maybe.
[16:36] The first thing I thought about was, "I could try to store everything on like GitHub. Maybe that will last 5,000 years."
Will: [16:45] I think Jason Scott has disabused me of that notion. What do I do instead? I've become like the anti-Ted Nelson. Ted Nelson worked on the Xanadu Project for a long time and he is not a big fan of paper, of emulating paper on the computer.
[17:04] I understand the reasons for this, so I'm not really anti-Ted Nelson, Ted Nelson's work is great. But it will look like I'm anti-Ted Nelson, because now I'm a computer scientist and I've gone back to paper. For anything I care about, it has to be on paper. It can also be on a computer, in fact it will also be on a computer.
[17:24] But it also has to be on paper, so I've also gotten interested not just in super high-quality paper, but I've gotten very interested in inks and pens and papyrus, and things like that. Here's some papyrus. I've got lots of stuff for people to check out afterwards if you're interested in this.
[17:42] This is a papyrus I bought online from Egypt. I don't think it's very high quality, so I bought a kit to make my own. But that also isn't really sufficient, I think, because I also want to understand how these things are made. Did you know, I think it's at Lowe's, you can buy papyrus plants? I'm going to start growing my own papyrus.
Will: [18:09] My understanding is that ancient Egyptian papyrus was actually very high quality. I don't know if they did this intentionally, but they effectively did selective breeding of papyrus, and we've lost that. I'm going to start a breeding program to try to get back to something that's decent. That's one thing.
[18:38] But papyrus is actually annoying to write on and has some other issues. I want some writings on papyrus, but I also want writings on other media. One of the things I'm using now is paper. The paper I use is not typical paper. It's something you can buy, but you have to look for it.
[19:04] This is what one of my notebooks looks like today. I sort of make my own. Basically, it's a spring-loaded thesis binder. Let me get these [inaudible]. Then I have paper which is 100-percent cotton, acid-free, 24-pound weight paper made by a company called Strathmore, which makes very good paper for artists.
[19:31] Artists are people who care about things lasting a long time. If you're getting into materials, you often find that the artists are people who care about it. The other people who care about it are conservators, and librarians, and people like that dealing with old manuscripts or old art collections, and so forth. This is the paper I use.
[19:53] What do you write on? I'm sorry, that's what you write on, but what do you write with?
[20:02] What I've decided to do is, "Let's use fountain pens." Basically I carry a roll of fountain pens around. My go-to fountain pen is the Pilot 823 vacuum filler. I've got two of these.
[20:17] They're not cheap, but the reason I use these pens is that they're safe to carry on an airplane. Not to use on an airplane but to carry on an airplane, because they're resistant to atmospheric pressure changes.
[20:29] If you don't have a pen like that, the ink can explode and go all over the place. I've got a couple of special pens, and then I use some special ink. The ink I use is Platinum brand Carbon Black ink.
[20:48] This is basically the same sort of ink that the Egyptians used on papyrus. That black writing is from 4,600 years ago. This sort of ink should last not just thousands of years but tens of thousands of years. It will last way longer than the paper will.
[21:08] The thing that I'm not sure about is that you have to suspend the carbon in a binder and I'm not sure what's in the binder and most of these ink manufacturers will not tell you. I've got some friends who run mass spec experiments, so I've thought about asking them...
Will: [21:27] if they could run a mass spec on the ink to try to...Basically my theory is paranoia.
Will: [21:34] This is like security. If you want to play this game, you have the ultimate enemy, which is time. Time will find some way to destroy these things. For example, if you look at the history of ink, during the Middle Ages there was a type of ink called iron gall ink that was used in the Western world.
[21:58] There are many formulas for iron gall ink, but many of those formulas turn out to be quite acidic, and iron gall ink changes with exposure to the atmosphere. There are manuscripts where the paper is in perfect condition, but where the writing was, the paper is gone; or someone drew a rectangle and now there's a rectangular hole in the paper, that kind of thing. So I've become extremely conservative.
[22:24] You can get archival pens with archival ink, but they are only certified to a hundred years. You can get Sakura art pens, for example. They age test them, accelerated age test them to a hundred years, but that's child's play.
Will: [22:41] I don't trust it. The reason I don't trust it is that they don't tell you what the formula is. Carbon ink is just tiny particles of carbon, and I think we understand, to some extent, how that behaves.
[22:53] If you've ever read Neal Stephenson's novel "Zodiac," the main character will not take any drug whose chemical formula is too complicated, if he feels like he can't understand it. That's how I feel about these things.
[23:07] It's like a hundred-percent cotton paper, all right, I can understand that. A hundred-percent carbon ink with a little bit of binder, OK, we can look at the binder and try to figure that out. Now it comes down to things like storage. How do you store that paper?
[23:23] I have an archival storage facility which used to be my bedroom closet. But I became a little worried about air circulation, so now I've moved my setup outside of the closet, but I have a monitoring system. I monitor temperature and humidity.
[23:44] I actually have a wireless network in my apartment covering my paper storage facility, my papyrus storage facility, and the notes I've written. On my phone I've got an app where I can check at any time and get graphs of the temperature and humidity for my paper [inaudible].
[24:08] I live in Alabama now, and I'm very worried about the humidity. Alabama has a lot of humidity, and New Orleans would also be a problem. You can also do things like send the paper to a salt mine. That's how they store film stock, that kind of thing.
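The talk doesn't say which monitoring app or sensors are used, so the following is only a minimal sketch of the kind of check such a setup might run over logged readings. The `Reading` type, the function names, and especially the threshold values are illustrative assumptions, not numbers from the talk; pick targets appropriate to your own materials and climate. It also reports the largest humidity swing between consecutive readings, since cycling between more and less humid conditions is the stress on paper fibers that comes up later in the talk.

```python
from dataclasses import dataclass

# Illustrative target band only (an assumption, not a figure from the talk).
MIN_RH = 30.0        # lowest acceptable relative humidity, percent
MAX_RH = 50.0        # highest acceptable relative humidity, percent
MAX_TEMP_C = 21.0    # highest acceptable temperature, degrees Celsius


@dataclass
class Reading:
    timestamp: str      # label from a hypothetical sensor log
    temp_c: float       # temperature in degrees Celsius
    rh_percent: float   # relative humidity in percent


def out_of_range(readings):
    """Return (timestamp, reason) pairs for readings outside the target band."""
    alerts = []
    for r in readings:
        if r.rh_percent > MAX_RH:
            alerts.append((r.timestamp, "humidity too high"))
        elif r.rh_percent < MIN_RH:
            alerts.append((r.timestamp, "humidity too low"))
        if r.temp_c > MAX_TEMP_C:
            alerts.append((r.timestamp, "temperature too high"))
    return alerts


def rh_swing(readings):
    """Largest change in relative humidity between consecutive readings."""
    return max(
        (abs(b.rh_percent - a.rh_percent) for a, b in zip(readings, readings[1:])),
        default=0.0,
    )


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    log = [
        Reading("08:00", 20.0, 45.0),
        Reading("12:00", 22.5, 58.0),
        Reading("16:00", 21.0, 48.0),
    ]
    print(out_of_range(log))  # the 12:00 reading is too humid and too warm
    print(rh_swing(log))      # 13.0
```

A real setup would read from whatever sensor log the app exports; the point of the sketch is that "is the storage safe?" reduces to a range check plus a check on how fast conditions change.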
[24:25] You can do salt mines. The places I've looked at it's like, "Call us for an estimate."
Will: [24:31] It's never been cited. One thing is I want my paper to be here; the other thing is I want to figure out how to do this on a budget. Right now I haven't been too worried about the budget. I was like, "All right, I'll spend a little money. Playing around with fountain pens is a lot cheaper than buying a boat or something."
[24:53] As I'm trying to figure it out, I'm willing to spend a little of my money, but I want to try and figure out, "What's a low-cost way of doing this? How can we do it for $50, or something like that?" These are the sorts of things I'm thinking about.
[25:13] Then there's another wrinkle, which is, "Well, this is fine for my research notes, but what about the code I write?" What do you do with the code you write?
[25:28] Fortunately, or unfortunately, depending on who you talk to, I do all my work in Scheme or Racket, and in those languages you tend to write domain-specific languages using macros, similar to what you can do in Clojure. That means I can often get very good compression ratios for my code.
[25:49] I was thinking about it. All the projects I've worked on for the last 13 years as a researcher, you could probably take all of that code and squeeze it into 5,000 lines [inaudible]. At that point you can actually print it out.
[26:04] This, for example, is a printout of my paper of micro [inaudible]. It fits on these two pages. Now, what sort of ink do you use?
[26:19] You could try to print it out with a laser printer. I have a LaserJet printer, and that actually uses carbon toner, so you'd think it's very stable. However, there seem to be problems with aging as the humidity of the paper changes.
[26:39] Basically the paper's being stressed when it becomes more and less humid, hotter and then cooler, the paper fibers are expanding and contracting and the carbon that lays on top, the toner that lays on top of the paper can start flaking off, so that's a possible failure.
[26:58] The people who really care about longevity with printing, are people who do things like fine art prints and they want them to be archival, so those people are pretty serious about it. What they do is they use inkjet printers.
[27:13] This is printed with a relatively inexpensive Epson printer and the reason I got that particular printer is not because I care about inkjet printers or Epson printers, but because you can get special ink.
[27:27] There is one person in the world who makes this ink, which is basically the same carbon ink I have for my fountain pens. It's a similar formula, although once again I'm paranoid, because they changed the formula to version 1.1 when a supplier couldn't provide something.
[27:43] That makes me very nervous. That sounds like something that should be "mass spec-ed."
Will: [27:49] This ink should be very similar in composition to the ink in my fountain pens, so it should last tens of thousands of years, unless they're doing something funny or weird with the binder.
[28:01] This is part of the idea: I can go from digital to analog and preserve at least the things I care about most. It can also go the other way. I've got a flatbed scanner, so now I scan all my notebooks and I'll have digital copies of those. I want to do bi-directional everything. I want digital copies of everything, and I want analog copies of everything on very high-quality artifacts.
[28:29] Another thing I've started getting into is making my own paper, so with my parents I started making my own 100-percent cotton paper and I can talk to you about how you do that. It's actually not very difficult. This is 100-percent cotton paper and you can make your own.
[28:49] Once again, this is a way to both learn about these technologies that we often take for granted, but also to try to control the ingredients. It's just interesting to learn about paper and the history of paper and so forth, and be able to try to control the medium that you're using.
[29:09] I've also started doing some pottery. My mom does pottery, so I started doing some too. I've seriously considered trying to make my own clay tablets and things like that. I've also looked into quartz glass and that kind of thing.
Will: [29:30] I'm interested in nanotechnology, so I'm building a scanning atomic microscope and there's a...You can run that in atomic force mode so you can move atoms around, so maybe at some point you can do a nanotech version of things. This is something I want to work on for fun, but also it's extremely interesting, because you can start getting into failure modes of medieval manuscripts for example.
[29:55] This is a book, "Introduction to Manuscript Studies," which I highly recommend. If you want to check it out, I can show you. It's full of all of the ways that manuscripts go bad and how you fix them, how you recognize them, how you store them.
[30:10] There's this whole area of knowledge that humans have accumulated having to do with the preservation of analog artifacts, which we know can last for a very, very long period of time, but I don't think we know how to do that very well for digital artifacts. I'm still trying to figure that out, and until we do, we're in a danger period where we could lose a lot of our history.
[30:43] The other part of this is, let's say we come up with some relatively inexpensive ways to preserve things like research notes and code, things we care about and want to preserve for posterity. Particularly as a programming language designer, as someone who designs programming systems, I want to record that information for myself in the future and for other people, to try to give people some context. What were we trying to do? Why did we make those decisions?
[31:20] That's great; however, there's another aspect of it, which is: how do you organize this knowledge? I've been spending my nights scanning old notebooks. I have probably hundreds of notebooks, research notebooks going back quite a long time, all written on paper of dubious quality with ink of dubious quality. I'm trying to scan all of those things.
[31:46] Now I have all of these JPEG images. 100 dpi JPEG images. What do we do with them? More generally, if you think about how our knowledge is spread out, where do I have knowledge captured?
[31:59] I've got notes on my phone. I've got notes on my computer, and I've got random Emacs files. I tried messing around with Org mode.
[32:07] I could never figure out how that's supposed to work. I've got scripts of documents written with [inaudible].
[32:13] I've got drafts of books. I've got stuff in GitHub, I've got stuff in Bitbucket. I have bookmarks in my browsers, in different browsers. I have YouTube playlists. If you think about that, that's a way to capture knowledge.
[32:27] All the YouTube videos that currently exist, here are the videos that I'm most interested in. Here I'm going to organize them.
[32:34] Here's a fun thing about the YouTube playlist. I've got some YouTube playlists, for example, music I like to listen to while I'm programming. Great.
[32:44] Sometimes an account goes away or a video gets taken down and then the YouTube playlist just has a, "Video removed." It doesn't have the title of the video. It's gone.
[32:57] This is actually a hole in the knowledge. I've got all of these bits of information spread out all over the place and I've no way to search over those really, organize them, [inaudible] them or whatever.
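The talk doesn't describe a fix for the "Video removed" problem, but one low-tech mitigation is to keep your own local record of each playlist entry's title, so the title survives after the service drops it. The sketch below assumes you can already export a playlist as a list of `(video_id, title)` pairs by some means; it calls no YouTube API, and the function name, the archive layout, and the exact placeholder string are hypothetical.

```python
# Placeholder text the service shows for a removed entry (taken from the talk's
# description; the exact string a given service uses is an assumption).
REMOVED_PLACEHOLDER = "Video removed"


def merge_snapshot(archive, snapshot):
    """Merge a fresh playlist snapshot into a local title archive.

    archive:  dict mapping video_id -> last known real title (kept locally)
    snapshot: list of (video_id, title) pairs as currently listed online
    Returns the playlist with titles restored from the archive wherever the
    service now shows only the placeholder. Updates `archive` in place.
    """
    restored = []
    for video_id, title in snapshot:
        if title == REMOVED_PLACEHOLDER:
            # Fall back to the last real title we saved, if we ever saw one.
            restored.append((video_id, archive.get(video_id, REMOVED_PLACEHOLDER)))
        else:
            archive[video_id] = title  # remember the current real title
            restored.append((video_id, title))
    return restored


if __name__ == "__main__":
    archive = {}
    merge_snapshot(archive, [("a1", "Song A"), ("b2", "Song B")])
    # Later, the second video has been taken down:
    print(merge_snapshot(archive, [("a1", "Song A"), ("b2", "Video removed")]))
```

The design choice is simply that the local archive only ever learns real titles and never overwrites one with the placeholder, so the record of what the playlist meant outlives the service's copy. Anything never snapshotted before removal is still lost.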
[33:11] For the next stage, in addition to developing better technologies and techniques, or refining the practices I've come up with for my own personal analog preservation, I'm also trying to figure out how we organize this knowledge. How do I want to organize my knowledge personally?
[33:36] One of the things I currently use is a program called TiddlyWiki, which is actually a decent program. I have a whole bunch of notes in TiddlyWiki as well. It has some linking and tabs and things like that but I also find it just doesn't meet my needs.
[34:01] After I talked a lot with my friends, many of them pointed out that a lot of interface work went into something like TiddlyWiki, and this is absolutely true. TiddlyWiki is great, and I shouldn't underestimate the amount of time it took to develop a nice interface.
[34:19] At the same time, I'm a programmer and I have particular needs; it's the same reason I use Lisp. Lisp is a recognition that whoever designed the programming language doesn't know your problem as well as you do, so you may have to change the language, or create a new language, to solve your particular problem.
[35:02] In fact, I'm probably going to build many of these things to try to [inaudible] but I've decided if I actually want to be serious about this, I'm going to have to take ownership of this problem and try to be better than just having all of my knowledge spread out on various devices, various computers, bookmarks, places where I don't even necessarily think of knowledge organization like YouTube playlists.
[35:25] That represents my organization of my knowledge. That's a project I'm working on seriously. For that part of it, what I'd say is, if this interests you, I'll very happily share everything I know. In fact, I'm going to create a [inaudible] thing, right? I guess I'll create a GitHub page to share my practices.
Will: [36:00] The practices are ephemeral. The technology we use is ephemeral but I want the data to last a long time. That's the idea. Anything we do is a snapshot and we're going to have to keep working on it and improving it but if you're interested in this, I'm happy to share anything I know. I've got samples of all sorts of stuff you can play with.
[36:21] There's actually a really nice fountain pen store around the corner, which I actually ran into yesterday. Turned out to be an expensive mistake.
Will: [36:27] If you're interested in this, we can even take a field trip there, show you the stuff but the other part is digital organization. How do we design that? I've read a whole lot about things like the [inaudible] and Engelbart's work and Ted Nelson's work.
[36:49] All these people who were interested in trying to organize knowledge in sophisticated ways.
[36:54] There's been a lot more recent work also but now I'm at the point where it's like, I'm just going to have to start building things. What I build is probably going to look weird because it's going to be for me but maybe over time, I can figure out ways to develop things that are useful more generally or at least find ways of building specific tools that are useful to people.
[37:16] I think we need to do that. If you are not somehow recording the things that you're doing, the design decisions you're making, whatever you're thinking about, I would encourage you to do that and to try to save that. It doesn't have to be expensive. It doesn't have to be super fancy.
[37:36] It can be simple. Even if you're doing it on sucky paper with sucky ink, at least there's a chance that at some point you can scan it. Even sucky paper today tends to last quite a while.
[37:49] It's much better to record it than not and if you don't do that, then the future is going to be, I think, horrible. There's this very interesting book called "Playing at the World," by Jon Peterson. He's a researcher. He was interested in the history of Dungeons and Dragons.
[38:06] He's written this 722-page book on the history of Dungeons and Dragons and he has a blog. He finds all of these documents. He's a collector. He's found all of these original design documents with the original campaigns for Dungeons and Dragons and the original character sheets and things like that.
[38:24] You can track over time how the game changed and how different people had different ideas about how the game should work. In fact, I didn't really understand the Advanced Dungeons and Dragons rules from 1977 to '80 or whatever, the ones I grew up with, until I saw those sheets. It was like, "Oh, that's the sort of campaign that we ran."
[38:43] This is an example with Dungeons and Dragons, but you can also see this with programming language design. There's a series of conferences, History of Programming Languages, HOPL. There have been three HOPLs. HOPL IV is coming up.
[38:54] Hopefully there will be many more HOPLs. If you haven't read any of the HOPL proceedings, I recommend HOPL I and II. They're just completely fascinating to me. I love them. As a programming language designer, I want to start capturing the intent and the ideas and design decisions and context.
[39:15] For [inaudible], for example, we didn't do that. Now it's like, how did we come up with that? I don't really remember, so we kind of have to make up stories.
[39:24] If you're designing something, you're making decisions about what you're building. I would encourage you to write those down and save them. Don't fill up notebooks and throw them away. Save them. Maybe scan them.
[39:38] Then there's a whole set of other practices, which are -- what do you do with this information? What do you do if you get hit by a bus? What do you make public?
[39:49] There are things in my notebooks where it's like, maybe I'm talking about the ideas with someone else and I don't want to scoop them unintentionally by putting it online.
[39:59] Maybe I've written something about a paper, just to myself, in a nasty tone. There's also a whole set of practices around what you make available and when you make it available. There's a long tradition of this in the humanities and in libraries.
[40:20] I think it's important, as people who design things, that we think very hard about the future of the decisions we make. Will people be able to recover that information? What can we do now to help? Also, there are all these giants in the field of organizing knowledge.
[40:42] There are people like Engelbart and [inaudible] and Ted Nelson. All these folks did fantastic work, but I think there's too much of a tendency to just go read a book about them, about the good old days, instead of saying: yes, we need to learn the history, but we also need to be building modern versions of these things for ourselves.
[41:06] I do feel like, for the programming languages I enjoy, those were all languages designed by the designers for themselves, for their own purposes.
[41:18] I would like to see more systems being designed by users for themselves. Whatever system I design, it's not going to be like TiddlyWiki. It's going to be for my own needs, but I think that's one way to explore the space much more and come up with new approaches. I think we desperately need that.
[41:38] Anyway, I guess my basic message is that because we are living in this hybrid analog/digital world, we're in the worst of both worlds. We're not paying serious attention to the analog things we make or record, and we haven't figured out how to do digital preservation. Just putting it on the web, or putting it in the cloud, is totally insufficient.
[42:08] Learn about the Internet Archive. Learn about archival [inaudible]. Look at libraries, look at these sorts of things, and think deliberately about how long the decisions you're making and the artifacts you're making will last.
[42:24] What can we do to make sure we capture history, so that 5,000 years from now, when people have forgotten English, they could put the pieces together and actually recover something about what it's like to be here in 2018? That's it.
Eric: [42:49] I'd like to ask for the questions to come up here. Will, if you have some time...
Will: [43:03] Yeah, sure. Also, I've got a couple more pictures. I made that.
Eric: [43:08] That'll totally last 5,000 years.
Will: [43:14] Yeah. Here's something else I made, by the way. This is a turtle I made in eighth grade. That turtle will last longer than English perhaps, right?
[43:28] That's the thing we have to keep in mind: if you want to be serious about these sorts of time scales, you have to think about the very serious possibility that people will forget English and English will have to be rediscovered, or the fact that on this little clay thing I put my name on the bottom, and that inscription of my name will maybe last longer than civilization.
[43:55] If we have some sort of terrible scenario.
Eric: [43:59] Cool. A lot of great questions here. I'm going to start with one of mine. We're producing a lot of data now. Basically, the more we produce, the more time it's going to take to read the data. If we all start producing all this data, when are we ever going to have time to read it?
[44:28] Why are we going to want it in 5,000 years? It's just going to be exponentially growing. Why would we want this data?
Will: [44:36] Why do we want it now?
Eric: [44:39] For short term, we want to have a conversation with someone, like, what was I thinking 2 years ago, 10 years ago? But in 5,000 years with millions and billions of people, are they going to worry about what you were thinking 5,000 years ago?
Will: [45:01] It's an obvious problem, right? The amount of data we're generating is huge. Obviously, we can't just put everything on clay tablets right now. Big tablets. [inaudible].
Will: [45:17] That's a problem. I think one thing is we can be somewhat selective about the things we consider extremely high value. Like I said, I think you could boil down the last 15 years of my work to like 5,000 lines of code, or maybe 10,000 lines of code.
[45:36] There are certain core ideas that I'm willing to hand-curate, ideas that I think are particularly important. Things like conversations between people, I don't actually think that information...
Will: [45:45] ...that well. I can only send a finite number of emails in a day. People used to write huge numbers of letters and stuff like that. I think at the individual level, me communicating something, sure, I'm on IM or whatever, but that's actually pretty small. And look at the heroic efforts people have made to try to uncover biographical information.
[46:09] Read Robert Caro's magnificent book "The Power Broker," about Robert Moses in New York City, and look at the amount of effort he put into uncovering what was going on at that time. It's true that most of the data we're collecting, people aren't going to be [inaudible].
[46:27] It's also true that we're going to have to figure out how to store these things at all. Fortunately, storage capacity is still increasing, but at some point we'll probably have to be a little judicious. At the same time, I feel like I personally owe it to people in the future not to make that decision for them.
[46:49] It's like, "Oh, sorry, I [inaudible] these things for you. Sorry, for 2018 you don't get to learn anything about what it was like."
Audience Member: [47:01] Zero.
Will: [47:02] 02018, there you go. That's right. 00 2018.
Audience Member: [47:05] So you talk about boiling down your academic output. Do you think there's a conflict with the way we do scientific discourse, where you have to get it past peer review and explain all this stuff, but then in the end there are only 3,000 lines that would need to be preserved? Is there some conflict there?
Will: [47:33] Yeah, I think there's some. Of course, I was being a little facetious when I said that everything's just 5,000 lines of code. We also have papers, and we've written books and things like that, but on the other hand...
[47:47] If you really want to know about the work we've been doing, what you really need to do is look at all the rejected papers that haven't appeared anywhere. They're on my laptop -- rejected, rejected, rejected, rejected -- and see how we changed the ideas, how we tried to improve them. Maybe the papers got accepted not because the ideas were better.
[48:08] Maybe it's just because we presented them in a different way, in a way people could more easily understand, or that seemed sexy or whatever. Actually, what I'm also interested in is recording all of the stuff that never saw the light of day because it got rejected, or, for example, the book "The Reasoned Schemer."
[48:28] For the first edition of that book, which I worked on with Dan Friedman and Oleg Kiselyov, Dan's motto was, "If you're not sure how to write a chapter, if there are two ways to write the chapter, you write it both ways and then you throw away at least one of the two, maybe both."
[48:44] That book had 10 chapters in it. We had at least 10 chapters that we threw away. They were finished chapters, but they never saw the light of day.
[48:56] I think that's also part of it. You're trying to collect information so people have more context, so they can see, like with D&D, what the alternatives were. What rules did people try out and reject?
[49:07] I want to think very seriously about how to capture that. The other part of it is, whenever you're trying to do any sort of curation, there's also the temptation of, I want to make myself look good. I don't want to show all the scummy things. I want to show the great stuff that makes me look brilliant.
[49:28] I'm also thinking, how can I capture a bigger, more accurate picture, where it's sort of like -- at some point in the future -- here's all the stuff, go through it, come to your own conclusions. That's [inaudible].
Eric: [49:44] This is an interesting question. This is from Dr. Sussman. It's more of a comment, but I'll make it into a question. One of the things that is, I think, really interesting about Egypt as a culture is that they seem to be very interested in preservation. With the pyramids, you just make something that'll never be destroyed.
[50:13] You have mummification to try to make bodies last forever, and if they didn't care so much... I guess papyrus was pretty good, but we don't have so much paper. It doesn't last. What about new technologies for encoding stuff in genomes? This is from Dr. Sussman.
[50:37] Making something that will make its own copies, reproduce and be around hopefully a little longer.
Will: [50:47] A couple thoughts there. One is, yes, Egypt was interested in preserving things but we should also think about the Sumerians and the Babylonians...
Eric: [50:54] We still have Egyptian DNA, right? In the mummies. We could clone an Egyptian king.
Will: [51:03] Your words, not mine.
Eric: [51:07] They made it, is what I'm saying.
Will: [51:12] The Egyptians were interested in preservation. The Babylonians and Sumerians were also extremely interested in preservation and actually the first archivists and librarians were from Mesopotamia, as far as I can tell.
[51:26] Akkadian has two dialects, Babylonian and Assyrian. The scribes writing Akkadian used a system that mixed Akkadian words and Sumerian words.
[51:45] Sumerian is a totally different language from Akkadian. They had to study Sumerian, so you had people creating dictionaries of Sumerian and things like that thousands of years ago. They were trying to capture linguistic knowledge: here's how Sumerian works, right?
[52:04] They had libraries and archives and things like that many thousands of years ago. This isn't just a recent thing.
[52:12] As far as new technologies like DNA go, there's actually a big project Microsoft has sponsored where they encode data in a whole bunch of DNA and read it back by sequencing. The idea is that DNA, if it's been dried and kept in a nice environment, is actually extremely stable. It'll last a very long time.
[52:34] You can fit a whole lot of it into a small area, and then you can do sequencing, for example, to read that information back in a very different fashion. You can use error-correcting codes and things like that because there's some sort of chemical reaction with the [inaudible] or something.
[52:52] Microsoft has been very serious about this. I think they've [inaudible] $50 million [inaudible] Bill Gates trying to clone himself or something like that.
Will: [53:04] That's a long-term data storage thing. Then there's the other idea of trying to encode the information in a living creature's genome, right? [inaudible] try to pass [inaudible] time.
[53:15] One of the things I find interesting is this question: when we were building the [inaudible] repository for highly radioactive waste, how do you tell people in the future, after English isn't spoken anymore, not to go near this area? How do you tell people 10,000 years from now, "Don't go into this area"? A whole bunch of architects came up with interesting ideas.
[53:40] One idea was to genetically engineer cats so they would change color when near radioactive waste. Then you'd have this rural folklore that if the cat changes color, don't go there.
Will: [53:54] That's maybe one way to encode knowledge. Anyway, these are interesting possibilities.
[54:04] Even if the media is going to last thousands of years, you are still going to have the issues of what you record, how you find it, how you organize that knowledge, privacy, and when you release it. How do you make it easier for people in the future to tell what may be of interest to them?
[54:29] Alan Kay and, in particular, his students have this paper, "The Cuneiform Tablets of 2015," where they tell more of these data-loss stories.
[54:40] They propose using virtual machines and various storage media so that people 5,000 years from now could bootstrap [inaudible] or something like that. You go through the whole process: given enough information, here's a virtual machine, and once you have that, you can run the software on it. Hopefully their computers are faster than ours today.
[55:04] They [inaudible] this story. One great story in this [inaudible] paper, which is a very interesting read, is the story about the "Domesday Book." You know about the Domesday Book in England? After the Battle of Hastings, the Normans conquered England and created this book of all the holdings in England.
[55:23] This book can still be read today. It was created in the 1080s or so.
[55:31] The BBC apparently decided in the mid-'80s that they were going to create a modern Domesday Book using [inaudible] technology: the BBC Micro with a special laser optical disc thing.
[55:45] They solicited all sorts of entries from people around England to represent the state of England: what it's like to be an English person in the 1980s. Of course, within a few years no one could read this disc. The machine broke.
[56:00] There was a big preservation effort. An academic team got together to try to read this information. They were going to put it on the Web. Then they started running out of funding. The project leader apparently died. As of now, the website's down.
[56:16] That was their example: we can read these clay tablets, we can read the Domesday Book, but we can't read the archival digital project that the BBC launched only a few decades ago.
Eric: [56:30] You mentioned moving out of your closet into an office.
Will: [56:35] Yeah, [inaudible].
Eric: [56:36] Do you have backups? I mean, one fire will...
Will: [56:44] A fire alarm went off in my new apartment. I ran outside, grabbed my laptop bag, and forgot my archival documents. I went back in. Is this really a fire or what?
Will: [57:00] That's what got me serious: I'd better scan this stuff. I'm going to do all this work, and then a fire starts, or there's water damage. In my last apartment, the bottom fell out of the water heater and flooded my entire apartment, and then it got mold [inaudible].
[57:16] That's the closet we're behind. I moved it out of the closet because there's better airflow. You can see the WiFi station, and you can see one of the temperature/humidity sensors; it's wireless.
[57:28] You can see one of the regular temperature/humidity sensors. There is my archival box. You can see that.
[57:36] That's one of the temperature and humidity sensors. That's 48 percent relative humidity, 71 degrees Fahrenheit, which is too hot.
[57:44] Then inside the box, I've got special paper. This is my storage area for the reams of paper I'll use in the future. I've got a temperature/humidity sensor on that too.
[57:59] A lot of this is actually trying to figure out practices. I can tell you about the materials, but the practices: how do you order this paper? It turns out not all 100-percent cotton paper is the same quality. I like the Strathmore paper, but maybe they will stop making it someday. That's one problem.
[58:16] Another problem is shipping. First of all, the paper is not cheap. And if you get it shipped, guess what: Amazon or the paper mill or wherever you order it from, all the places I've seen just put it in a cardboard box.
[58:32] What ends up happening is the cardboard box, the corners get smashed in during shipping. That stack of paper on top, those are all pieces of paper that are basically unusable because they've been folded so much in shipping. It's stuff like that.
[58:45] What I did is I had the bright idea to order three reams, so the ream in the middle would be usable. That's not the best long-term option.
[58:55] A lot of it is figuring out stupid stuff like that. How do I get paper where the edges aren't all crushed in already? Maybe I figure out how to un-crush it or something.
[59:07] I watched an awesome YouTube video last night. Seven ways to hide a lavaliere microphone for people doing film industry stuff. Seven different ways to thread a mic through a shirt to hide it.
[59:22] It's an example where you have the lavaliere microphone, you have tape, but there's a whole set of practices that people build up over time.
[59:31] It's the same thing here. Most of my effort is trying to figure out which practices are useful. How do I refill my pens in the best way? How do I make sure my paper doesn't [inaudible]?
[59:45] I had a household emergency where it rained non-stop in Alabama. I'd left my archival notebook in my laptop bag, where it sat soaking wet for a week.
[59:57] By the time I pulled it out, it had this really weird smell. Uh-oh, I'd better scan that. Now I've got it in an isolation area. I put it in my freezer. I froze it to try to...
[60:07] Anyway, there's all sorts of stuff like that I'm trying to figure out.
[60:12] I'll give you one more...
Eric: [60:14] Do you have backups that aren't digital? I'm thinking of...
Will: [60:20] That's why I got this printer. The reason I got the printer... here's my fade test, my window exposure test, so you can tell how the inks would do.
[60:32] Here's another test, by the way. This is with the special Epson printer and a special ink. I printed out this page and this from my cotton pad. Then I soaked the paper in water to see whether it would smear. It actually held up pretty well.
[60:49] Now that I've got the paper and the printer, I can actually print that out, and then I can scan it again and print it out again. This is a scan of me printing out the digital document. I want to have this workflow where, now that I've scanned in all my notebooks, I print them all out so I'll have another physical backup. Maybe I'll send that to the salt mine.
Eric: [61:12] Maybe you just have another printer in the salt mine.
Will: [61:18] You're right. I don't have any trust in my digital [inaudible].
Eric: [61:22] I'm thinking of literacy during the Dark Ages: it was scribes, copying and copying all the time, who preserved it.
Will: [61:33] That's right. If you haven't read "A Canticle for Leibowitz," that's inspirational reading material. That, and [inaudible]. You'll see them somewhere else...
Eric: [61:44] Before we break for lunch, what are the big hopes that you have for [inaudible] preservation?
Will: [61:50] This is me [inaudible]. I've got like a thousand photos in it, videos.
[61:57] What was that?
[61:59] [off-mic comment]
Will: [62:01] Oh, yeah, for the photos? Yeah. I can show you a picture of you that's [inaudible].
[62:03] Anyway, sorry, what was...?
Eric: [62:06] You started talking about digital preservation at the end of your talk, but you said you still have to get into that. What are your hopes for that?
Will: [62:18] For many, many, many, many years, as long as I can remember, I've been frustrated by trying to organize my information -- index cards, a zillion other ways, and I've never been very successful. I think, ultimately, I'll never be super happy with what I come up with, but I can come up with something better than the current ways of organizing things.
[62:37] I've been extremely interested in new media studies, in digital preservation, and in reading about knowledge organization, [inaudible] work, Engelbart, Nelson, and all these people.
[62:53] But now I'm at the point where I've just decided I'm going to have to build my own system. One of the reasons I've gotten the confidence is that now...
[63:03] I'm working on this project for the National Institutes of Health, with Greg Rosenblatt and Matt Might and the Hugh Kaul Precision Medicine Institute at the University of Alabama at Birmingham. We've been building our own biomedical reasoning system, from scratch.
[63:22] We were originally just going to take some off-the-shelf software and cobble some stuff together, but we're actually building it entirely from scratch. We're using miniKanren logic programming and things like that.
[63:32] The system is actually only a few thousand lines of code. Once again, we've tried to keep it very, very small while doing sophisticated things through [inaudible] language [inaudible] and stuff like that, and building interfaces for that.
[63:47] I'm very much in this mindset of both trying to build reasoning tools and knowledge organization tools, but also thinking about interface, thinking about how people use this, watching people use these tools, and coming to the conclusion that no one can build the tool I need better than I can.
[64:09] That's why I learned how to become a part of it. That's what I'm going to do. It may not be useful to anyone else, but it's going to be useful to me eventually. Maybe not tomorrow, but it will be.
Eric: [64:22] Thank you very much, Will.
Will: [64:23] Thank you.