7 Falsehoods Programmers Believe about Place & Time - Emily Ashley
What if I told you everything you know about location and time was wrong? Geospatial software has come a long way, and everything works great . . . that is, until humans join the scene. Using off-the-shelf geospatial building blocks, MapStory.org has engaged a global user community and challenged them to collaboratively organize knowledge about the world both spatially and temporally . . . and found that users' mental models of maps over time do not match our software data model. This session will address assumptions you might be making about the nature of spatiotemporal data. There's a significant disconnect between the data model and user knowledge — is software rich enough to support the whole story? Can we reconcile the two?
Video
Please forgive the poor audio quality. We had technical trouble with the audio recording and used the backup audio. Please use the captions (CC) in the video player.
Transcript
Eric Normand: [00:00] Our next speaker loves maps. Maps are a really cool example of a representation of human knowledge, something practical, you find your way with them, [inaudible] whatever you do with maps, it's up to you.
[00:24] They're just quintessential to even use the word map to talk about different data models. I thought that some things happened when they translate it into digital, data into digital. We've forgotten that there's a human side to it, as long as the data fits in the database, we're fine.
[00:53] I really appreciate what Emily is doing — she's our next speaker — because she's really a champion for taking the complexity of human-centered problems and trying to find a good solution for it. Our next speaker is Emily Ashley. She works on open source, she's from the Gulf Coast, and she's a champion for the human side of computing. Please welcome Emily to the stage.
[01:27] [applause]
Emily Ashley: [01:32] Can everybody hear…Whoops. Hi, I'm Emily Ashley. I'm a software engineer and UX architect at Atlas Go. My team builds open source geospatial software. The title of this talk is a misnomer. It's, [inaudible] about time and place. I got really, really excited about time, because I always work with place, so [laughs] the rest of this talk is going to focus on time.
[01:59] I work at an open source, open data historical geography platform called MapStory. Everything works great until humans join the scene. That is the story with technology, everything works great until we try to use it. Part of me still finds it hard to believe that I landed such a dream project. These are the social media things for my company and project [inaudible] .
[02:25] With MapStory, we've engaged a global user community, challenged them to collaborate. Then we organize knowledge they know about the world, both spatially and temporally. As you know, organizing knowledge is hard, whether it's text knowledge, space knowledge, or temporal knowledge. [laughs] Trying to mix and match those is a very, very fun problem.
[02:45] We found that most users' mental models of maps over time — there's an informal understanding — doesn't always match a formal data model we can provide them with. I'm going to provide you with a quick background of me. Trust me. It's relevant context. It's a lot of buzz words, [laughs] but I'm sure you can keep up.
[03:03] My academic background is urban morphology which is the study of the form of human settlements and the process of their formation and transformation, a lot of cadastral systems, historic preservations, how things change over time.
[03:16] My practical background, what I did with that degree, was I worked in guerrilla urbanism and tactical urbanism, and I've been in New Orleans volunteering post Katrina and I've been here ever since. I worked on a lot of neighborhood groups, a lot of corporate responsibility projects, people trying to get a bang for their buck for volunteering in quality of life.
[03:32] It don't mean more [inaudible] roles. I mean renovating school libraries, getting bus stop benches put in, helping people understand how to build a better life in the community that often forgets them. My GMs would be a lot of volunteer efforts in the intersection of online and real life communities.
[03:50] It's funny, as I've gotten older I've realized there is no BRE anymore, so the IRL communities are the online communities. They're the same thing now. It's very weird. Shout out from this background into what I do now, shout out to operations [inaudible] . I know you all are…
[04:06] [applause]
Emily: [04:06] I've been with operations [inaudible] from July 2015, so I've been [inaudible] programmer for about two years now. All of this centers around what drives me, which is the core belief that access to readable, explorable data, local data is a profoundly unmet need in our communities.
[04:27] I believe the Walgreen's has more information about how my neighborhood is changing than my neighborhood association does. It's messed up. Hence this dream project. It's a citizen cartography project. It's open source, open data with a map of our communities.
[04:42] [inaudible] is the atlas of change that everyone can edit, no big deal. You can upload data sets, what we call story layers, or layers. You can create new ones. If you don't have any you can start from scratch, and you can edit and collaborate by curating data and making it as complete as possible.
[05:02] We have a geospatial version control called GeoGit, GeoGig, sorry the T is not allowed. The G is, GeoGig [laughs] . It is about keeping track of how data changes over time, so not only do we have data that represents change over time, we're keeping track of how that data changes over time.
[05:20] A lot of time here, but the cherry on top of this is the lagniappe of the platform, which is the storytelling and narrative elements: composing MapStories by compiling the data layers with text and images for human context, so people know what they're seeing. No big deal, we're just trying to map all of the world through an online platform of all change over time.
[05:46] No big deal, dream big. As with most big dreams, it's been broken and [inaudible] again a few times. Some of our favorite most dear things are broken things, aren't they? It's a broken dream project. We'll get there with a great team and love it.
[05:59] About those [inaudible] facets, just to get a feel for the room, how many of you have tried to collect data input from users? All right. How many of you have worked with crowdsourced bigger data? All right. How many of you've worked with spatial data? Temporal data? Spatio-temporal data?
[06:23] [laughter]
Emily: [06:23] How many of you've worked on an application that provides cartography or styling tools? Are there any cartographers in the room? I see you. How many of you've worked on visualizing or animating change over time? You're my new best friends, all of you, let's talk. Cool.
[06:39] We're going to focus on time for this one because I do recommend geospatial field, we take many geospatial knowledge for granted.
[06:48] What I will tell you is what we're not talking about, so there are no expectations here. We're not talking about time zones or daylight savings time. We're not talking about leap seconds, leap days, leap years. We're not talking about anything smaller than a second or non-Gregorian calendars. We're not talking about relativity.
[07:05] [laughter]
Emily: [07:05] Just to scope down our problems that we've had. This is what we're working with. We've got 45 minutes, time's hard enough without those things.
[07:14] What we are trying to talk about is problems we're trying to solve, problems I had that surprisingly, things that have caught me off guard, questions I don't know how to answer on how to learn get volunteer information from technical and nontechnical people. We've got to work together to do this.
[07:32] Arguably there's four steps to this kind of platform and spatio-temporal data. There's collection, there's storage, there's request, and visualization. Some people that I work with would argue that requests and visualization are the same step, that once you get it from the server, that's the same thing. Those are two very distinct steps, and I want to point that out.
[07:55] What we're talking about today are two of those steps. That's the collection and the visualization, the parts that interact with humans. The storage and the requests, we're not covering today, scoping down this problem set even further. We're starting with a falsehood, easy pickings.
[08:10] [laughter]
Emily: [08:12] Falsehood number zero, nothing worth computing happened before January 1, 1970. It seems like a silly thing that I'm teasing, right? But decisions people have made would make you believe this is true.
[08:26] [inaudible] , I try to see as an attempt to take a reasonable-sized bite of a problem. You've got to start somewhere. However, there are side effects based on the placement and size of these bites, these assumptions that people made in the beginning that we're still living with today.
[08:46] I'm kind of a spaz, and this is going to be a spazzy combination. Deal with that. I don't have time to be somebody I'm not, and I'm a spaz. A month ago, a teammate asked me to break down the conceptual data model of what we were dealing with. What were the pieces we were trying to place together?
[09:06] I wanted to make sure that we understood the problem we were working with, and that we were using the same language. It was funny. This is what they said. They said what we're dealing with is layers, which are data on a plane with an optional time component (they separated that into instant or range) and either raster or vector geometries.
[09:30] Stories, which are the ordered series of the first chapter to those human elements I talked about, which are the media we add to it, and chapters and annotations, right? They're like, “Oh, right.” Notice that time is just an optional time component. It's just [inaudible] on there. Data on the plane, and a time component. This is a huge oversimplification of what we're trying to build, what we're trying to collect.
[09:55] On this one, I'm going to go ahead to assumption number one, which is the tools that you work with should take care of that for you. We've already figured out time. We're trying to get users to map the things they know, to share and collaborate, to contribute their data. Every cartography problem is a data problem. Clean data is a myth. If you find it, [laughs] you're magical.
[10:15] We've got some standards and utilities that help us take care of that for you. We use PostgreSQL and PostGIS, and there's even the ISO 8601 time standard. There's time support, and then we use GeoServer. All this is open source. Explaining this to someone, I found out that ISO 8601 is not technically an open standard. The rest of these are open source tools.
[10:40] Then we use the OSGeo importer to support our GeoNode, to import that data. We have tools, and they can handle time, but I'm finding we have to be intentional in how we use them. We have to be intentional in what we're giving them and what we're expecting back.
[10:53] Falsehood number two: to add time to geographic data, just add a time field or column. They're like, “You already have the map figured out. We just need to add a field where people put a date and maybe a timeline on the bottom.” There you go, you've got space geo [inaudible] . Mm-mm, no.
[11:14] [laughter]
Emily: [11:14] The importer will parse it, PostGIS can save and invest it, and GeoServer will send us bits that match the query. They're handling that, but they're not doing it right. Here's a scene, this is a scene.
[11:28] I'm not sure this is an art in Texas for some reason. There's two bits of user knowledge here. Start with users. I'm going to fix that. User A knows the post office opened in 1932. User B knows the [inaudible] in Texas was founded December 12, 1932.
[11:48] It goes into our date parser and when it comes out we have January 1st, midnight, 1932 and December 12th, midnight, 1932. Let me show you, nope, why this can be a problem. This is an image. This is our website live demo time, yay.
[12:11] [laughter]
Emily: [12:11] Whenever we plot these things on a map, you can see that the time slider, it's moving forward, you can see the date time, and you can see the different pieces. We also have a timeline where it plots the pieces on a timeline, you can see that visualization. Then someone some years ago provided us some playground settings, and I'm not sure what they do. [laughs] We don't want them.
[12:38] [laughter]
Emily: [12:38] The problem is [inaudible] and the timeline. We have these two dates we've been given together. Someone told us 1932, and that has become January 1st, midnight, 1932, and someone told us December 12th, 1932. When I plot these on a timeline it then appears as if the post office was founded a whole 11 months before the town.
[12:59] That's not what our users told us. That's not what happened at all. When you put it in and it needs to be stored, and you need to give it back, this is what happens to that data you are given. We've now just fabricated some precision, we've compromised our primary source data, our user data, which is firsthand knowledge of what happened, and we've made our map and timeline lie.
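A minimal Python sketch of the fabricated-precision problem described above. The function names and the `(value, precision)` return shape are illustrative assumptions, not MapStory's actual importer code.

```python
from datetime import datetime

# Formats tried from most to least precise, with the precision each implies.
FORMATS = (("%Y-%m-%d", "day"), ("%Y-%m", "month"), ("%Y", "year"))


def naive_parse(text):
    """Parse like a typical date parser: missing parts get default values."""
    for fmt, _ in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def parse_with_precision(text):
    """Return (datetime, precision) so the stated granularity is kept."""
    for fmt, precision in FORMATS:
        try:
            return datetime.strptime(text, fmt), precision
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


post_office = naive_parse("1932")     # the user only said "1932"
town = naive_parse("1932-12-12")      # the user said "December 12, 1932"

# The naive result claims the post office opened at midnight on January 1st,
# which now sorts before the town's founding.
print(post_office, town, post_office < town)

# Keeping the precision lets later steps know that "1932" means "some time in 1932".
print(parse_with_precision("1932"))
```

Storing the precision next to the value is only one possible design; the point is that the parser's default fill-in should not be mistaken for something the user said.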
[13:24] How did we make it lie? What happened there? There's two main ways to represent time. This is all of the roller coasters in the US, by the way, someone contributed this for our platform. You can represent time as a line, which is a very traditional way to do things. Plot it out a line, flat.
[13:46] You can also represent time as time and that's where the playback comes in. If you see this moves along, the time slider is going as the time goes. That's why [inaudible] . Representing time linearly, when we're taking this data, it's trying to find out where should we draw the post office opening on the map?
[14:10] Where should it be on the map? How long should it be on the map? How does this time slider move forward to show that post office was there? When you're doing it on a timeline, where do we put it on the timeline? These are questions you have to answer when visualizing data, whether you're doing it as time is time which is animation, series of snapshots.
[14:30] We're not animating. We're not doing any slides in between them, series of animation, a series of snapshots or on a line. I'm telling you this to illustrate the point that collection, storage and visualization all have different needs. I know our database needs a tool like this, the story. When I visualize it, I want to know where to put it and how to interpret it.
[14:58] We need to be intentional about how we request and visualize the data we've been storing, and what we do right before we store it. Here's another sign, [laughs] another 2.5. There's an assumption that timelines move from left to right. How many of you read from left to right? How many of you might read some languages that are right to left?
[15:19] In languages where words move from right to left, timelines can also be visualized from right to left. There are also cultures that visualize the future as forward and the past as behind you. There are cultures that orient the future as east and the past as west, so depending on which way my time is facing, [laughs] and whether you're in the timeline.
[15:42] Then there's also the idea of the cyclical time. It's not always a line. It could be a line that has become a circle. Just assuming that moves from left to right is an assumption that a lot of programmers, I feel, make. I wanted to let you know outside. Timelines are not universally visualized, but I will talk about something that is universal such as a birthday.
[15:59] How many of you know your birthday? Does anybody not know their birthday? All right, that's good. We're good, OK. How many know the general time of day you were born, morning, afternoon, night? How many of you know the hour?
[16:16] If I were going to try to collect this data from you, how would I build a timeline from that given that some of you know the hour and some of you know the day? Falsehood number three is that time is precise, or that once collected, your time-enabled data will have a uniform level of precision. Regardless of how it needs to be stored, we've already established that collection, storage and visualization have different needs.
[16:39] If I wanted to put this out and communicate the normal, goal precision, what would that look like? Can our model and storage tools allow me to retrieve your births and place them in a [laughs] reasonably accurate order given this mixed precision or are we going to match that? What if we turn the time sliders forward? What is the playback rate for that?
[17:01] Are some points on the map longer because they don't know if you were at two o'clock or towards the afternoon, I keep it there a little after noon? If I'm putting it on the line, do some parts of the timeline take up more space to visualize that granularity? How do I communicate visually the nature of the data and its uncertainty?
[17:20] We're going to talk about my favorite word to co-opt. I use the word chronon, but I probably don't use it how it's supposed to be used. I use it to communicate a very certain thing.
[17:32] Chronon is the length of time it takes for quantum energy to push one electron from an electrical orbit to the next. On our team, we use chronon to describe the smallest level of granularity on your layer or on your compilation of layers.
[17:46] Whatever it is that is the smallest piece between all of those, that is what your chronon's going to be. This illustration could be, the top one could be years, the middle one could be seasons, and the bottom one could be weeks.
[17:57] Then you're putting three layers together, and one of them has years, and one of them has weeks. Playback might be in weeks, or you might try to temporally bin differently, pushing things together so your playback is in years and several points of weeks are shown at the same time.
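A small Python sketch of the team's use of "chronon" as the finest granularity among the layers being combined. The granularity names and their ordering are assumptions made for illustration.

```python
# Granularities ordered from finest to coarsest (an assumed, simplified list).
GRANULARITIES = ["second", "minute", "hour", "day", "week", "month", "season", "year"]


def chronon(layer_granularities):
    """Return the finest granularity found among the given layers."""
    return min(layer_granularities, key=GRANULARITIES.index)


# Three layers as in the illustration: years, seasons, and weeks.
layers = {"top": "year", "middle": "season", "bottom": "week"}
print(chronon(layers.values()))  # the finest one drives week-by-week playback
```

Playing back at the chronon shows every layer at full detail; binning up to a coarser unit is the alternative the talk mentions.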
[18:20] Back to your birthday, how many of you know your birthday, and the hour? How many know the minute? How many of you are you astrology fans that have to figure out the minute to figure out their signs? [laughs] This is silly. This is a very serious question. How many births do you think happen in less than one minute?
[18:37] [crosstalk]
Emily: [18:41] Falsehood number four, events in time are instantaneous or rather time is always a point-based domain. Beginnings and endings are often processes, not instantaneous events. Rome wasn't built in a day, but I bet you we can find a data set that [laughs] says it was.
[19:00] We all know, inherently, that in order to model data it requires compromising reality in some way to store the information. At a particular data system or data architect, they can make arbitrary rules as to how they're going to compromise their data.
[19:19] Reality never fits into a database, you have to compromise it. They make this decision based on what they want to use the data for, because they thought it could be useful or it corresponds to the information they're interested in getting back later, probably want to digest this information or what kind of information they're willing to maintain in their system.
[19:37] The weird and fun thing about MapStory is we have a lot of systems, a lot of layers. Each of our user layers is its own system where people might try to communicate what decisions they're all trying to make about their scheme, what defines a start and what defines an end.
[19:55] When there's doctor here to declare time of death, how does a community decide what point of the life cycle was dying? How do you think death is an instantaneous process? Good, we found it.
[20:14] We're going to try mix and matching some concepts here. Let's sit down. [inaudible] . We're going to mix and match the concepts of granularity and the idea of interval-based events so that they're not instantaneous. At any time there are a ton of different kinds of domains. How many of you have a background in any kind of statistics or in that domain space?
[20:35] Point-based domain is the idea that something takes up no space. It's just a quant. In spatial data, we call these points. Alternate would be an interval-based domain, which in spatial data we would call a line or a polygon, but it takes up some sort of area more than an infinitesimal wee small point. This is time as a point-based domain where this piece of data takes up zero space and it is midnight.
[21:00] It is on the second midnight, it has a zero space at the time of midnight. Interval-based domain, this is a day. It starts at midnight and ends at 23:59:59. That's an illustration of how to visualize possibly or communicate about point-based and interval-based domains of time.
[21:21] An assumption we have to make in order to [laughs] compute with data is that entities have a finite existence, a beginning and an end, that there will be a start and end point. Here is an illustration of an interval in a point-based domain. This is saying that something happened from January 1st to January 4th, and January 1st to January 4th took up zero space but know that this event was that span.
[21:49] This is an interval on a point-based domain. This is the way that I might try to illustrate an instant in an interval-based domain. This is saying that let's plot the time, it's January 1st, and something takes up no space on that day that we know as an interval.
[22:06] This is the way that we can deal with instant in a point-based domain. Both the domain and the point take up zero space, not a lot, infinitesimally small. Are ya'll following? Everyone good? Is this interesting? You like this?
[22:24] [crosstalk]
[22:24] [laughter]
Emily: [22:24] Affirmations. Thank you. I should have put more kitty cats on this.
[22:30] [laughter]
Emily: [22:30] Here's a thing for you. Say I'm a user that wanted to say an event took place from 1919 to 1932. Could someone tell me what the precision or the granularity, the chronon, is from 1919 to 1932?
Group: [22:46] Years.
Emily: [22:48] The years. With this granularity, we also know it's an interval-based, it takes up time. There's a start year, there's an end year. On a start-start date and an end-start date the precision is the year and the end year.
[23:05] If we're trying to map that onto something with a different granularity, we would then break it up into a start-start date and start-end date. It might be January 1st, 1919, to December 31st, 1919. It could take up a whole span to show that chronon of the year.
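A minimal Python sketch of expanding a reduced-precision value into the span it covers, as with a year becoming January 1st to December 31st. The `expand` function and its signature are hypothetical, not part of any MapStory tool.

```python
import calendar
from datetime import date


def expand(year, month=None, day=None):
    """Return the (first_day, last_day) span covered by the stated precision."""
    if month is None:
        # Year precision: the whole year is the chronon.
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        # Month precision: ask the calendar for the month's last day.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    # Day precision: the span is that single day.
    exact = date(year, month, day)
    return exact, exact


print(expand(1919))           # start of the start year to the end of the start year
print(expand(1932, 12, 12))   # a fully specified day
```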
[23:25] This is my illustration. This is a silly illustration of trying to visualize how these queries might happen from when I'm trying to get into your server. Here I know that that's 1993 to my chronon is years. The timeline is a conservative estimate. If I want to say, OK, the end of the first year, beginning of the second year, I want the data between that.
[23:51] The second one is the general assessment of, OK, I want the beginning of the first year and the end of the fifth year. Then this is a displacement of what data might or might not come in depending on how I build those queries. I'm pleased, when I figured this out I felt like, ah, it can do what I want.
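A Python sketch of the two query styles described above for a range given in years: a conservative one (end of the first year to the beginning of the last year) and a generous one (beginning of the first year to the end of the last year). The function names and the sample years are assumptions for illustration.

```python
from datetime import date


def conservative_bounds(start_year, end_year):
    """Only the span that is certainly inside the stated range."""
    return date(start_year, 12, 31), date(end_year, 1, 1)


def generous_bounds(start_year, end_year):
    """Everything that could possibly be inside the stated range."""
    return date(start_year, 1, 1), date(end_year, 12, 31)


def within(day, bounds):
    """True when the day falls inside the inclusive bounds."""
    lower, upper = bounds
    return lower <= day <= upper


early_record = date(1919, 6, 1)   # inside the first year
middle_record = date(1925, 3, 1)  # well inside the range

for record in (early_record, middle_record):
    print(record,
          within(record, conservative_bounds(1919, 1932)),
          within(record, generous_bounds(1919, 1932)))
```

Records that fall inside the first or last year are exactly the ones that appear or disappear depending on which query is built. Note that the conservative bounds invert when the start and end year are the same, so a real implementation would need to handle that case.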
[24:12] Let's look at our docs. This is our GeoServer docs, and it looks like they have reduced accuracy of time. They can handle these things. This can happen. [inaudible] Look at this, they have intervals. [laughs] They can do this. The tools can do this. No one had done this with the tools on our project before.
[24:31] I wanted to stop acting and say that we have the tools to do this. We have to be intentional of how we're using them and think about the problem set we're working on.
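A hedged Python sketch of building a WMS `time` parameter in the two forms mentioned above: a reduced-accuracy value (just a year) and an explicit `start/end` interval. The endpoint and layer name are hypothetical, the query is deliberately incomplete (a real GetMap request also needs parameters such as bbox, width, height, and format), and the exact server behavior should be checked against the GeoServer time support documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name, for illustration only.
WMS_URL = "https://example.org/geoserver/wms"
LAYER = "demo:post_offices"


def year_time_value(start_year, end_year=None):
    """Build a time value from year-precision input."""
    if end_year is None:
        # Reduced accuracy: send only the year and let the server treat it as a span.
        return f"{start_year:04d}"
    # Explicit interval from the start of the first year to the end of the last.
    return (f"{start_year:04d}-01-01T00:00:00.000Z/"
            f"{end_year:04d}-12-31T23:59:59.999Z")


def time_query(start_year, end_year=None):
    """Return a partial GetMap URL carrying only the time-related pieces."""
    params = {
        "service": "WMS",
        "request": "GetMap",
        "layers": LAYER,
        "time": year_time_value(start_year, end_year),
    }
    # Keep "/" and ":" readable instead of percent-encoding them.
    return WMS_URL + "?" + urlencode(params, safe="/:")


print(time_query(1932))
print(time_query(1919, 1932))
```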
[24:43] Just a side note, a dangerous but probably necessary notion or assumption we have to make is that entities have a finite existence. We can't change that, but it doesn't mean we shouldn't acknowledge that that's a compromise we're making, acknowledge that you're compromising reality and mapping as you move forward with these.
[25:02] Falsehood number five, timeline is on a continuous scale. Back to this statistics background, scales being ordinal, discrete, continuous. Continuous means, theoretically, you can measure an infinitely small unit. Discrete means it can only take discrete numerical values. In our project where we're focusing on a discrete scale, second is the smallest value we can operate [inaudible] and anything like.
[25:35] When programmers are working with time, they often work on a discrete scale, but when humans think about time, they can easily bounce between those scales based on their experience or on whether that specificity is needed. Part of the effort with MapStory and the inspiration for these topics is: how do we model data in a way that maps to human knowledge?
[25:57] Humans join the scene and shit breaks. It just happens. We want to build a platform, it's collaborative, it has volunteer engagement, user retention, let people volunteer in the smallest way possible, contribute to things that they know whatever precision it is, if somebody can come in later and make it more precise.
[26:15] You need to find your niche, minimize the buy-in, all those user engagement buzzwords you've heard before. When it comes to contributing space, you have to control data. I'm going to throw this out to ya'll. This is the problem I don't know how to solve: how do we enable users to contribute knowledge they have that might be on an ordinal scale?
[26:35] Often human knowledge is on an ordinal scale. The post office was built after the town was founded. Mark Twain was born before me. I don't know how to explain but that's the first example I could come up with. [laughs]
[26:47] If somebody has that knowledge, how do you capture it and how do you build up other folks' knowledge to compile all this experience on each other? If anyone has an answer to that, later, we should talk about ordinal space, if that's for you.
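The talk leaves this question open. As one possible direction only, here is a Python sketch that records "A happened before B" statements as pairs and derives a consistent ordering with a topological sort from the standard library's `graphlib` (Python 3.9+). The event labels are made up for illustration, and this is not something MapStory is described as doing.

```python
from graphlib import TopologicalSorter


def order_events(before_after_pairs):
    """Order events so every 'earlier' item comes before its 'later' item."""
    sorter = TopologicalSorter()
    for earlier, later in before_after_pairs:
        # graphlib expects add(node, *predecessors): 'later' depends on 'earlier'.
        sorter.add(later, earlier)
    # Raises graphlib.CycleError if contributors gave contradictory statements.
    return list(sorter.static_order())


statements = [
    ("town founded", "post office built"),   # contributor A
    ("post office built", "school opened"),  # contributor B
]
print(order_events(statements))
```

No dates are needed: the output is purely ordinal, and a contradiction between contributors surfaces as an error instead of a silently wrong timeline.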
[27:03] Number six is not a falsehood but it's worth a number. When organizing information, one of the problems that data architect, data users, have to make is determining what makes an instance unique within each entity type.
[27:17] I have a background in preservation or planning rate. There's this idea between eastern and western philosophies. I'm not an expert on philosophies, but there is a temple that is rebuilt every 20 years with the same materials, from the same forest, and it's built in the same way. People around the temple believe it's been the same temple for 10,000 years. This temple's 10,000 years old.
[27:39] Someone from my class might walk in and be like, “No, no, no, that's 20 years old. You built it 20 years ago.” If you've been building it the same way over time, is it the same instance? Is it the same temple?
[27:47] That's one of those Ship of Theseus kind of problems. Here in New Orleans, there's a bar in the neighborhood that seems to change ownership and names every two years. [inaudible] at 1135, Santos, Rubyfruit Jungle, it's all the same address. They even have some of the same weekly details.
[28:05] [laughter]
Emily: [28:06] How long has that guy been there? Can it be closed for a day, a month or a couple of years? What makes it continually open and [inaudible] ? What's going to happen is instantaneously there is need to discuss these kinds of decisions when collecting their data.
[28:23] One of the things I focus on is the idea of creating new data on this platform. It's one thing to import a data set and try to curate it and make it better, it's another thing to start with a blank schema and say what fields do we need? What are we trying to map here? One of the initiatives we have on MapStory is to map the history of beer. Not to play favorites but beer is probably one of my favorite drinks.
[28:49] In curating this, I find how much people have opinions about what makes something a brewery. Can it change owners? Can it change location? Can it change staff? Can it close for 20 years, change its logo but reopen with the same recipe? Is it still the same brewery? Everyone has these decisions, and we need to provide mechanisms for people to communicate about them and to disagree with them.
[29:09] We have version control. People can fork data sets. People can change that, but they need to determine what makes something unique within a data set, disagree or work together and become good stewards, and make a data of this. This is a falsehood that has to do with place, place names and naming more than anything, more than time.
[29:33] Number seven is the assumption we already know what we're doing. We've figured out time years ago. That's why they have standards. This ties into zero and what date impacts. This time I'm not kidding. It's something we struggle with as programmers, the ramifications and effects of how this was set up is something we're still solving.
[29:53] If you think about it, the less modern your data, often the less precise it will be. There is a built human environment, human history, carbon dating. Modern data has nanoseconds from servers recording data. It's very precise. You're concerned about problems such as time zones, you need seconds.
[30:14] When you're dealing with things 100 years ago, you might be concerned more about what were the time zones there at that time. When did the time zones change? You're looking at data from 1905 and it's a full data. Did it even have time zones? If you go back 1,000 years and 100,000 years, the granularity gets bigger and bigger, plus or minus millions of years is the margin of error.
[30:43] I've read a lot about the [inaudible] in modern data, but I'm building a historical geography platform so it's a different set of problems, and I think that it's problems that a lot more people will run into as we try to digitize and collect data about our history. I ended up going to cartography conferences and historian conferences to find the digital humanities buffs.
[31:05] Who is working on how to organize data this old? The main takeaway from the two years I've been on this project is that there's a significant disconnect between the data model and user knowledge. We're still figuring it out, and we should talk about it so that we don't solve the same problems over and over. [laughs]
[31:25] A lot of the resources from this talk came from the “Visualization of Time-oriented Data,” from the Human and Computer Interaction series. It's been a real help for me. There's a lot of diagrams and illustrations. It's a huge book. I recommend it, and also from the book, “Data and Reality — A Timeless Perspective on Perceiving and Managing Information in Our Imprecise World.”
[31:50] As a PS that I'm able to share, I'd given this talk at [inaudible] here locally a couple of months ago. In some instance, I come up with user solutions from these kinds of problems. We haven't implemented these yet, but I figured I could share the requirements I've come up with. If you're going to build a timeline and a time slider, what are the kinds of information you need?
[32:14] Can you all see this? Is that big enough? These are what I've given our team for assumptions. The assumptions we're making is that multiple layers can be displayed on the same map canvas, they'll be controlled by a single time slider, and the playback option selected will drive all current layers.
[32:31] We're only addressing records in a time span, so these are not requirements for a timeline, rather for a time slider. We're working on a timeline of interval-based time. Entities can have multiple attributes such as start-date, end-date, so you can make date ranges.
[32:47] We're not going to allow users to have a start start-date and a start end-date. Those are things that we will use to match the level of precision given when storing it, so we will have those fields internally to make sure they only give us a year. Layers and entities can have mixed levels of precision, and we're not addressing time zones.
[33:14] For the playback scale options, I've decided to include ordinal and discrete playback, which means right now, you can see this, the playback is on the ordinal scale. There is no difference in the amount of time between ticks. Every tick is either happening before this event or after this event, and it's tick, tick, tick, tick, tick. If we interact through the [inaudible] .
[33:47] I suggest we give the users the option, the viewer the option to play it at the ordinal or a discrete, so when you're playing it, do you want the time in real-life to have some sort of relation to the time of your data. Do you want it so then for three years, you'll have to wait three seconds, or do you just want to see this series of snapshots? It'll give the user these exclusive options, probably a toggle.
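The difference between the two playback scales can be sketched in a few lines of Python. The function name `playback_delays` and its parameters are assumptions made for illustration, not part of any MapStory API. In ordinal mode every frame waits the same amount of time; in discrete mode the wait is proportional to the gap in the data.

```python
from typing import List, Sequence


def playback_delays(times: Sequence[float],
                    mode: str,
                    seconds_per_unit: float = 1.0,
                    tick_seconds: float = 1.0) -> List[float]:
    """Return the wait, in seconds, before each frame after the first.

    times: one numeric timestamp per frame (for example, years).
    mode:  "ordinal"  -> constant wait, only the order of frames matters;
           "discrete" -> wait scales with the gap between timestamps.
    """
    ordered = sorted(times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if mode == "ordinal":
        return [tick_seconds for _ in gaps]
    if mode == "discrete":
        return [gap * seconds_per_unit for gap in gaps]
    raise ValueError(f"unknown playback mode: {mode!r}")
```

For frames at 1900, 1903, and 1904 with one second per year, discrete playback waits three seconds and then one second, while ordinal playback waits one second each time.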
[34:12] The duration options, so the question about how long should something stay on the map, given the time and the data. “Instant,” which would be it just shows up for the instant that it's recorded for. “Cumulative,” meaning that as instants pass, it stays on the map, so you can see something build over time. Let me show you an example of that.
[34:33] These are just the different ways we can visualize a time-series, so this you'll see, as it goes, you'll see them accumulate, right, so once it's on the map, it should stay on the map. Then there is “Range,” which means you honor the start-date and end-date of the data, and the “Cumulative Range,” which is where you honor the start-date but you ignore the end-date, you just [inaudible] .
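The four duration options can be read as four visibility rules for a feature at the slider's current position. The following Python sketch uses hypothetical names (`Duration`, `is_visible`) and assumes timestamps are comparable numbers such as years. Note that in this simplified form "Cumulative" and "Cumulative Range" reduce to the same test; they differ only in whether the underlying data has an end-date to ignore.

```python
from enum import Enum
from typing import Optional


class Duration(Enum):
    INSTANT = "instant"
    CUMULATIVE = "cumulative"
    RANGE = "range"
    CUMULATIVE_RANGE = "cumulative_range"


def is_visible(start: float, end: Optional[float],
               now: float, mode: Duration) -> bool:
    """Decide whether a feature is drawn at slider position `now`."""
    if mode is Duration.INSTANT:
        # Shown only at the instant it is recorded for.
        return start == now
    if mode is Duration.CUMULATIVE:
        # Once on the map, it stays on the map.
        return start <= now
    if mode is Duration.RANGE:
        # Honor both start-date and end-date; a missing end is open-ended.
        return start <= now and (end is None or now <= end)
    if mode is Duration.CUMULATIVE_RANGE:
        # Honor the start-date, ignore the end-date.
        return start <= now
    raise ValueError(f"unknown duration mode: {mode!r}")
```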
[35:01] The playback rate option should include speed and span. This idea of a playback span is how I've come up with to address the ideas of temporal binning. Let's go back in time, this is temporal binning.
[35:21] No, I want the current one. They look the same! This one, right. If I have data that's in weeks and I want to display it in years, I want to give my user the option to say, like, “OK, this data [inaudible] but every tick of the timeline should be a year.” Everything that happened that year may happen at once. The idea that the viewers of data should be able to regulate the temporal binning and playback of that data.
[35:52] When I say speed and span, span is the one that's the temporal binning. Those are some of the solutions I've come up with on how to model time and what options to provide users in visualizing and collecting data. If anybody has any questions, I'd love to [inaudible] .
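A playback span that acts as temporal binning might look like the Python sketch below. The function `bin_by_span` and the span names are assumptions made for illustration. Weekly records are grouped so that each tick of the slider shows everything that falls in one year, month, or day.

```python
from collections import defaultdict
from datetime import date
from typing import Any, Dict, Iterable, List, Tuple


def bin_by_span(records: Iterable[Tuple[date, Any]],
                span: str) -> Dict[Tuple[int, ...], List[Any]]:
    """Group (date, value) records into one bin per playback tick.

    span: "year", "month", or "day". The viewer chooses this; it does
    not have to match the precision the data was collected at.
    """
    bins: Dict[Tuple[int, ...], List[Any]] = defaultdict(list)
    for when, value in records:
        if span == "year":
            key: Tuple[int, ...] = (when.year,)
        elif span == "month":
            key = (when.year, when.month)
        elif span == "day":
            key = (when.year, when.month, when.day)
        else:
            raise ValueError(f"unknown span: {span!r}")
        bins[key].append(value)
    # Sort the bins chronologically so they can be played in order.
    return dict(sorted(bins.items()))


weekly = [
    (date(2015, 12, 21), "a"),
    (date(2015, 12, 28), "b"),
    (date(2016, 1, 4), "c"),
]
by_year = bin_by_span(weekly, "year")
```

Here `by_year` has two ticks: one for 2015 holding "a" and "b", and one for 2016 holding "c". Playback speed stays a separate setting that controls how long each tick is shown.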
[36:12] Where did [inaudible] ? There he is. Hi!
Eric: [36:14] Let's give Emily a big…
[36:17] [applause]
Eric: [36:17] keep going. [inaudible] volunteers.
Audience Member: [36:20] I have a question. Can you tell me a little bit more about the project that you're working on, what it's called, how people can get involved?
Emily: [36:41] Oh, yeah! First of all, it's broken.
[36:44] [laughter]
Emily: [36:45] But I love it. OK, it's a broken link. It's MapStory.org. Here's our [inaudible] , make this full screen, so that's MapStory but it's also GitHub. MapStory is all open source, so you can look at all the sub-repos we're using to build this. Most of it is JavaScript and Python. There are layers and stories you can check out.
[37:14] We have some initiatives we're working on, such as gerrymandering. Let me find it. This is some cool work that [inaudible] have been putting together. He lives in DC [inaudible] has been trying to map gerrymandering over time, which is an interesting time-topic because did you know that congressional districts come into effect at noon?
[37:37] They come into effect noon, DC time, so if you're in California, they come into effect at 9:00, so it just means the timing that's given, that time zone actually applies to a specific place and not to all of your place data, so that's a fun tidbit.
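The noon-in-DC point can be checked with Python's standard library. This sketch uses fixed winter offsets (UTC-5 for Eastern, UTC-8 for Pacific) to stay self-contained, and the date is an arbitrary January day chosen for illustration; a real system would more likely use a time zone database so that daylight saving rules are applied.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets; an assumption that only holds in winter.
EASTERN_STANDARD = timezone(timedelta(hours=-5), "EST")
PACIFIC_STANDARD = timezone(timedelta(hours=-8), "PST")

# Hypothetical effective moment: noon, DC time, on an illustrative date.
effective_dc = datetime(2013, 1, 3, 12, 0, tzinfo=EASTERN_STANDARD)

# The same instant as seen on a clock in California.
effective_california = effective_dc.astimezone(PACIFIC_STANDARD)
```

`effective_california` reads 9:00 on the same day, and the two values compare equal because they are one instant. The offset belongs to the place the rule is defined in, not to every feature in the data set.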
[37:55] He's putting together all these [inaudible] and how things have changed over time. You can see here, he's only given us years, so one of the problems with the platform is its fabricated precision. You'll see that it's just between 6:00 PM and 7:00 PM, some of that is due to time zones. I realized that recently, but some of that is due to our date parser, having to do with daylight saving time.
[38:19] It's like one point is from the same place but from a different season, and he didn't give a time at all so this made it midnight and then provided some offsets. [inaudible] trying to rectify how we take that data and then visualize it.
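A small Python sketch of how this kind of fabricated precision can arise. It is not the MapStory date parser; the function names and the UTC-6 and UTC-5 offsets are assumptions chosen only to show the mechanism. A year-only value is padded to midnight UTC, and rendering it at a local offset then produces an evening time on the previous day that nobody ever supplied.

```python
from datetime import datetime, timedelta, timezone


def parse_year_naively(year: int) -> datetime:
    """Pad a bare year to a full timestamp: January 1, midnight UTC."""
    return datetime(year, 1, 1, tzinfo=timezone.utc)


def render(instant: datetime, utc_offset_hours: int) -> str:
    """Format an instant as local wall-clock time at a fixed offset."""
    local = instant.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.strftime("%Y-%m-%d %H:%M")


padded = parse_year_naively(1950)
shown_at_minus_6 = render(padded, -6)
shown_at_minus_5 = render(padded, -5)
```

The contributor only said "1950", yet the display shows 18:00 or 19:00 on December 31, 1949, depending on the offset applied. Storing the given precision alongside the value is one way to avoid rendering detail that was never in the data.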
[38:35] You can create an account. There's a journal where people ask questions, like how do I use KML files, as well as talk to many, the decisions they're making about the data, in terms of whether or not we're going to use a time zone, day or year, and that's it.
[38:56] You can upload and then can compose [inaudible] . Officially, we handle CSVs and SHP files. You can sneak in some GeoJSON and KMLs as well.
[39:04] I don't list them as officially supported because GeoJSON is not strongly typed for spatial data, so most data sets you can have either points or you can have lines or you can have polygons. GeoJSON, you can have all three, so we don't handle that correctly, but you can try and sneak it in. It'll work.
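To make the mixed-geometry issue concrete, here is a minimal Python sketch that inspects a GeoJSON FeatureCollection before import. The function name `geometry_families` and the grouping into point, line, and polygon families are assumptions for illustration; the geometry type names themselves come from the GeoJSON format.

```python
from typing import Any, Dict, Set

# Map GeoJSON geometry types onto the three families a typical
# single-type layer expects.
_FAMILY = {
    "Point": "point", "MultiPoint": "point",
    "LineString": "line", "MultiLineString": "line",
    "Polygon": "polygon", "MultiPolygon": "polygon",
}


def geometry_families(feature_collection: Dict[str, Any]) -> Set[str]:
    """Return the set of geometry families present in a FeatureCollection."""
    found: Set[str] = set()
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry")
        if geometry is None:
            continue  # GeoJSON allows features with a null geometry
        kind = geometry["type"]
        # Unrecognized types (e.g. GeometryCollection) are reported as-is.
        found.add(_FAMILY.get(kind, kind))
    return found


mixed = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [-77.0, 38.9]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString",
                      "coordinates": [[-77.0, 38.9], [-76.6, 39.3]]}},
    ],
}
```

An importer could accept a file when `geometry_families` returns a single family and otherwise either reject it or split it into one layer per family.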
[39:24] Then we have a version history so you can join and edit other people's data. We haven't implemented a UI on how to approve or reject changes because implementing something like a GitHub for geospatial data is a very big problem and there's six people in our team. [laughs]
Eric: [39:42] Besides signing up and using it, if someone's interested, how would they get started coding on this on GitHub?
Emily: [39:56] For the coding?
Eric: [39:57] Yeah.
Emily: [39:57] Yeah, GitHub. We have a local board as well, where you can have issues. This MapStory/master is our main issue tracker, and so right now, we've [inaudible] another layer-editing [inaudible] to make sure that if someone edits something, that those changes persist in the way that they're expected.
Eric: [40:12] And this is all open source, all systems?
Emily: [40:18] Yeah, everything. It's built on GeoNode, if you know how to use GeoNode or GeoServer, one of those geospatial tools.
Eric: [40:27] I have some questions that are more conceptual about the human side of data. People on this team, on other teams you've worked on, do you find that they're not thinking about the human side, that they [inaudible] , like the timestamp is all they need.
Emily: [40:57] [inaudible] ?
Eric: [40:58] Yeah. What are the kinds of things that they're ignoring? Because that's what I would do, I would add a time field [inaudible] , figure out later the UI. What am I ignoring by doing that?
Emily: [41:12] I think that with geospatial data, a lot of the kind of standards and ideas [inaudible] , is like the idea of a point, and a line and a polygon. When you're working with time, it's often the same thing. A timeline is a map of time. It's a map. A lot of the same kind of concepts you would think through, of how do I collect and visualize data.
[41:38] Like talk to a cartographer, talk to a data visualist. See the experts in your field and say, “Hey, like what kind of data should I be expecting? Who's working with this data?” I ran into someone that works with tree-related data and he was like, “Oh, do you allow me to put things by season?” because they can't tell what month it happened but they can tell you if those months were summer. Right?
Eric: [41:56] Oh, right.
Emily: [41:57] Their data can go back 10,000 years of petrified life, but they can't tell you a month. Can I put that into your system? How do I put that into your system?
Eric: [42:09] Is it a problem of we need more formalizations or maybe less? What would you say?
Emily: [42:18] In my experience, the [inaudible] have handled every problem I've been asked to solve, where I at least had the intention of solving it.
Eric: [42:24] OK.
Emily: [42:24] In the end, it comes to just connecting with the users and the people you expect to use it, which is why I went to AHA, the American Historical Association, to talk to people that are into humanities.
[42:36] It's an interesting conference because if you ask them about their technical savvy, they'll be like, “Oh, yeah. I PowerPoint. I PowerPoint so good.” They don't always work with digital sets. Some of it is. [inaudible] those sets and see if we can put them into what we're building.
Eric: [42:54] Someone is asking to see the books that you recommended again.
Emily: [43:03] Data and Reality. This was written in the '70s and it's still relevant, so it's not so much about the technology use and more about things like [inaudible] . Then there's also the Visualization of Time-Oriented Data. I can look up the author of that. I'm sorry but I can't leave my phone more than like three feet from me. [inaudible].
Eric: [43:24] We also have a question from Tom. I don't quite understand the question but I'll read it and maybe you can understand and help us. How do we handle interpolating data? How does this translate into visual [inaudible] ? Do you [inaudible]? Do you interpolate this data at all?
Emily: [43:58] What do you mean, Tom?
Audience Member: [44:01] Can you hear me?
Emily: [44:03] Yeah.
Audience Member: [44:03] Whenever you visualize this stuff, I've dealt with some of this as well, and you're going into the domain where you're trying to animate stuff over time, showing things in real time, a lot of these, you have these discrete jumps, right.
[44:14] You have these points where you've got your interval data or whatever, and there's no definition of how that transforms and changes, but in reality, when the user wants to see something, it's like, well, there's actually, we should see like a fading-in or some sort of movement of the past things moving across geographically.
[44:30] That data, that interval turns in an actual complex visualization, that's implicit. Again, it's implicit, right.
Emily: [44:37] Yeah, yeah. How to create maps, right, yeah.
Audience Member: [44:38] I was curious what you guys were doing with that, because that's a whole another issue along with sampling, right. Your different basis for your…was it chronons? Figuring out a sampling basis, but you also have this issue of well, how do you interpolate between those samples consistently. Have you guys already thought about that?
Emily: [44:53] The data, all we have basically is a series of snapshots of how the data was collected. Interpolating between those snapshots is something I personally am not willing to do or incorporate on this project. I'm possibly fighting a single-woman fight against people on this project, who're on this project [inaudible] maps.
[45:12] I don't intend to go into the cartography of the animation but rather a series of snapshots, with those snapshots aligning to the measurements that were taken. I do want to confine things to the temporal domain so that people can kind of change what is being visualized by the amount of space they're looking at or the amount of time they're looking at as a second.
[45:34] But things like heat-maps, [laughs] heat-maps are an example I used to have to. We used heat-maps all the time [inaudible] and there isn't data of great vary in between two points. I feel the same way about the animation. I think that is a cartographer-specific question for custom maps but not something I can address easily in a platform.
Eric: [46:01] Another question is, you were talking about ordinal time data and you said that you don't have a solution for that.
Emily: [46:12] I don't know if I agree with that [inaudible] .
Eric: [46:16] Yeah, is formalizing that something that you have on your roadmap? Is that something that you're willing to attempt?
Emily: [46:22] My personal fun-and-happy roadmap, yes. Our project roadmap, probably not.
Eric: [46:27] Are you collecting the data, in any way? Because I mean it seems like you should be able to turn in some stuff, with the right binning. I don't know, like you could know the year because you know that something happened between this and that, and you know the dates of those two things. Just an idea. Are you thinking about it all?
Emily: [46:52] Just in the aspect of something [inaudible] and at conferences to see if people have also been thinking about this. I don't think it's something that I can incorporate to this platform in the next year.
[47:02] I will show you, I want to go to the next slide. Basically, what we collect from users right now, is they can import a file, whether it's a SHP file or a CSV and then they tell us which of their columns is the start-date and which column's the end-date, and that is all we gather from them.
[47:23] Let me show you how it works. [inaudible] .
[47:24] [pause]
Emily: [47:33] It says, upload it with features, with [inaudible] attributes, [inaudible] a name, that's the start-date and that's the end-date. This is what it was like when I came on to the project. We're still working on that. I want to do time correctly. There's not many cartography platforms that can handle time-series data. It's something that you have to hire a custom data visualization or cartographer to handle.
[48:02] CartoDB, our partner and friend [inaudible] , is another open source mapping company and they handle time-series point-data, so they have a library called Torque, T-O-R-Q-U-E, and it handles time-series point-data but they don't address lines and polygons. We're one of the only people right now working on addressing lines and polygons with time-series.
[48:24] I think questions like interpolation and animation are like hard ones, which is why they stuck to points I think. [laughs]
Eric: [48:34] Any other questions? Yeah, we'll just, because he didn't write it down.
Audience Member: [48:44] I'm sorry. I had a question about how far that MapStories can go, like do you have a timescale or something?
Emily: [48:54] We talked a lot about negative time. I haven't tested it. My personal metric is I would like to go back to the end of the Holocene era so to cover every [inaudible] data. It's a bite that I want to take, [inaudible] data, we're doing a good job.
[49:08] I've seen things [inaudible] pre-1917, pre-1918, so we have stuff from like the 1700s, 1600s, but I haven't seen any user-supplied data sets from before that so I'm not sure if our date parser or the data [inaudible] can handle that. Does that answer your question? Sure, if we can handle [inaudible] data, that'll be really swell. [laughs]
Eric: [49:30] What kind of binning would you look at then?
Emily: [49:35] Oh my gosh! I have a lot of geologist friends. We have arguments on that. I never run out of interesting questions for this platform. There's so many rooms in the house that I can go into and solve problems in. It's great. [inaudible] .
Eric: [49:48] Is that Brian?
Audience Member: [49:54] Yeah. What about the movement of land, like the tectonic plates?
[50:04] [laughter]
Emily: [50:05] This is a good question. This is a kind of UX architecture question. How many of you know what map tiles are, how these maps work in tiles? I could give a five-minute presentation on that [laughs] if you want.
[50:22] We used to have open tile-sets. One way to advance that is to have a tile-set that has the ground data of when you're looking at. I showed one of my favorite [inaudible] map of Istanbul changing over time and the first thing they said was, “That street wasn't there in 1200,” like they already knew that the tile-set was incorrect.
[50:41] We're working on importing time-series raster data, so raster is the images, so like the satellite images or scanned photos of historic maps. One way we can address this is to have a scanned image of a historic map and allow you to put that as a couple of tiled [inaudible] .
[50:58] I can't give you a complete tile [inaudible] of the world at 1100 but if you have a scanned map, we can georectify it using the New York Public Library's tile-maker [inaudible] .
[51:13] This georectifies historic images and it's an open source project as well that we have a fork of and are trying to incorporate into our UI, so you can work those images and place them on your [inaudible] data set to have a more accurate or a more robust and descriptive, like if your data's a Civil War data set or something, to have additional information that might not be encoded in a vector layer.
Audience Member: [51:35] You don't have like Pangaea, like [inaudible] ?
[51:37] [laughter]
Audience Member: [51:38] You play it and the [inaudible] move away from each other kind of thing.
Audience Member: [51:44] [off-mic comment] ?
Audience Member: [51:45] Yes.
Emily: [51:46] That is the goal. The founder of the [inaudible] foundation, [inaudible] , that is what he wanted. He wants it for his daughter to have in school to display all of these things and they can't build it fast enough for him.
[51:58] [laughter]
Emily: [51:58] Any more questions?
Eric: [52:07] We've still got a few minutes. Yes.
Audience Member: [52:13] You mentioned that [inaudible] was somehow in a bit of trouble. At some point, you said it failed. What do you mean by that?
Emily: [52:21] The project has been around for about five years, I've been on it for two years. At one point we had 3,000 users. This is a migration of all those users. Right now, on a weekly basis level we have about six active users. We lost a lot of user data about three years ago in one of the main regions. Our servers were down for a while, simply to open a historical [inaudible] .
[52:45] Have you heard of OpenStreetMap? There is an OpenHistoricalMap that is solving some of these same problems. We had some server issues, lost a lot of users. We have six active people in a month. That's an [inaudible] problem to go from 3,000 to 6. It's a lot of people.
[53:06] It was nice to go to the North American Cartography Conference. Somebody [inaudible] from Toronto came. He was giving a presentation on what kind of historical geography tools are out there for people to use for historic data. It was nice to hear him saying that he had heard of MapStory, and he had used it before, but a lot of people in the community were wondering what happened to it.
[53:28] That's before my time, so I'm not sure what happened. All I know is that it's my dream project I've been wanting to work.
[53:34] [laughter]
Eric: [53:34] All right. Yes?
Audience Member: [53:40] Getting back to your birth date, you [inaudible] .
Emily: [53:52] How would you put that on a platform for users?
[53:54] [laughter]
Audience Member: [53:54] [off-mic speech]
Emily: [53:54] I know. That's a good question.
Audience Member: [53:59] I don't know. Maybe somehow just [inaudible] .
Emily: [54:04] It's stepping inside a little bit with the diagram of conservative [inaudible] estimates. That's one of the ways I thought that as a user-friendly [inaudible] …Where did my presentation go? There we are.
[54:18] It's not exactly my choice in [inaudible] , but it's a way for people to choose [inaudible] how generous they want to be in their query of what became from this [inaudible].
[54:27] [pause]
Eric: [54:31] That brings up a question of mine. What if someone said, “I think it was in December of 1950,” but it's an “I think.” They're thinking plus or minus three months, and then someone queries for everything that happened in 1951.
[54:55] Did that come back? Because you binned it to the year, but then the year is uncertain [inaudible] .
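The December 1950 example can be framed as interval matching with a choice of how generous the query is. The Python sketch below uses a hypothetical function `matches`; treating "plus or minus three months" as September 1950 through March 1951 is an assumption for illustration. A generous query returns anything that might fall in the window; a conservative one returns only what certainly does.

```python
from datetime import date


def matches(estimate_start: date, estimate_end: date,
            query_start: date, query_end: date,
            generous: bool) -> bool:
    """Test an uncertain date, held as an interval, against a query window.

    generous=True  -> any overlap counts (it might have happened then).
    generous=False -> the whole estimate must lie inside the window
                      (it certainly happened then).
    """
    if generous:
        return estimate_start <= query_end and query_start <= estimate_end
    return query_start <= estimate_start and estimate_end <= query_end


# "I think it was December 1950", read as plus or minus three months.
estimate = (date(1950, 9, 1), date(1951, 3, 31))
# "Everything that happened in 1951."
query_1951 = (date(1951, 1, 1), date(1951, 12, 31))

generous_hit = matches(*estimate, *query_1951, generous=True)
conservative_hit = matches(*estimate, *query_1951, generous=False)
```

Under these assumptions the record comes back for a generous 1951 query but not for a conservative one, which is one way to let the person querying decide how much uncertainty to accept.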
Emily: [55:02] Was there [inaudible] trying to get? I feel like this is a very big use case for Behavioral [inaudible] Development. [laughs] In trying to implement standards of what we're expecting and what we want to provide for, because we can't provide for everything. If we can choose a few simple things and do them well in the expected and reliable way, that's a good thing.
[55:24] I think that, following that line of question I have, when you're looking at historic documents, sometimes you have the date and the year, but not the month, it might be illegible or unreadable. It's not specificity, it's literally you know the date, which is specific, but you don't know of which month and you know the year.
Eric: [55:43] Right. You might know that one has a C in it, if you can read the C, or maybe [inaudible] 80 percent there's a C in it.
Emily: [55:49] A lot of these problems are addressed on an individual level by a cartographer, data visualist, or researcher who is putting together their data, curating it, and making a visualization. Designing and working with a platform to provide these tools is a really fun, [inaudible] of becoming a UX architect. These are all decisions that have meaningful impact.
Eric: [56:10] There's a human designing these stories, so it's not like you're importing this thing and expecting it to be perfect [inaudible] .
Emily: [56:20] One way I thought to address that, as far as, “I think it's this, but I'm not sure,” Wikipedia is an example of crowdsource data and people creating the data, and there's an idea of community citation. One of the questions I've been asking around in our team is can we implement something like a link citation. If we do that, is it per layer? Is it per row? Per feature? Per attribute?
[56:44] At what level of granularity do we want to have data source citations and for people to flag it? Then possibly you could have, “OK, I'm going to use this layer, but I only want the things that are cited.” To say you can have control over the quality of data as far as citation or non-citation that you're getting back from the server.
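If citations were attached per feature, which is only one of the granularities raised above, the "only the things that are cited" request could be a simple filter. This Python sketch is hypothetical: the `citations` key and the `cited_only` function are invented for illustration and do not describe an existing MapStory field.

```python
from typing import Any, Dict, List


def cited_only(features: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only features that carry at least one source citation.

    Assumes each feature is a dict with an optional "citations" list;
    a missing or empty list counts as uncited.
    """
    return [feature for feature in features if feature.get("citations")]


layer = [
    {"name": "District 1", "citations": ["https://example.org/source"]},
    {"name": "District 2", "citations": []},
    {"name": "District 3"},
]
cited = cited_only(layer)
```

Per-row or per-attribute citations would need a richer structure than this, since a single feature could then be partly cited and partly not.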
[57:05] Does that answer your question?
Eric: [57:08] Well, thank you so much, Emily.
[57:10] [applause]