T makes detailed schedule data public. Can you do anything with it?

August 17, 2009 By David

This is pretty cool.

The Executive Office of Transportation (EOT) today announced an historic step toward openness and transparency by releasing to the public the detailed scheduling and geographic data the MBTA submitted for use in the Google Transit Trip Planner…. The data includes full schedules and geographic information for all MBTA bus, rail, and ferry routes, along with several Regional Transit Authorities. Similar efforts in other areas, such as Washington, D.C., San Francisco, and Portland, have allowed third-party developers to create useful applications and scheduling tools for riders at no cost to the city, agency, or state, vastly improving customer service….

The data will be located on the EOT Developers’ Page ( http://www.mass.gov/eot/develo… ), which is the one-stop place to open up to the public useful transportation data across the state. The page will now include nine sets of GTFS data from transit authorities, making Massachusetts a global leader in embracing this open standard for transit data.

Google Transit will already tell you how to get from point A to point B using the T. Also, if you’re a tech-savvy sort, you might glean some interesting info from the data the T has released. Here it is. If you find something, let us know!
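For anyone who wants a starting point, here is a minimal sketch of reading scheduled headways out of a GTFS feed. The column names (`trip_id`, `departure_time`, `stop_id`) come from the GTFS `stop_times.txt` file; the function names and the stop id `STOP_A` are made up for illustration, and the sample rows are not real MBTA data. It also lumps together every route, direction, and service day that serves the stop, so a real analysis would first filter trips using `trips.txt` and `calendar.txt`.

```python
import csv
import io


def to_minutes(hms):
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to minutes."""
    h, m, s = (int(part) for part in hms.strip().split(":"))
    return h * 60 + m + s / 60


def scheduled_headways(stop_times_file, stop_id):
    """Return the gaps, in minutes, between consecutive scheduled departures at one stop.

    stop_times_file is any open text file laid out like GTFS stop_times.txt.
    Rows with an empty departure_time are skipped.
    """
    reader = csv.DictReader(stop_times_file)
    times = sorted(
        to_minutes(row["departure_time"])
        for row in reader
        if row["stop_id"] == stop_id and row["departure_time"].strip()
    )
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical sample rows, not taken from the MBTA feed.
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,07:00:00,07:00:00,STOP_A,1
t1,07:05:00,07:05:00,STOP_B,2
t2,07:07:00,07:07:00,STOP_A,1
t3,07:14:00,07:14:00,STOP_A,1
"""

print(scheduled_headways(io.StringIO(SAMPLE), "STOP_A"))
```

With a real feed you would pass `open("stop_times.txt", newline="")` instead of the in-memory sample.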

Comments

somervilletom says

August 17, 2009 at 10:49 pm

I haven’t looked inside the files yet, but this is totally awesome data. This makes my job (as a web developer) much easier. It also makes me really glad I didn’t jump through hoops to laboriously transform older datasets into Google formats (like KML).

I absolutely can do a ton of things with this data. This enables many new community-oriented websites.

This is fabulous news, it makes my day.
eaboclipper says

August 18, 2009 at 5:06 am

but since nothing on the T runs on time, totally useless, no?
- hrs-kevin says
 
 August 18, 2009 at 7:55 am
 
 The subway doesn’t seem to have an exact schedule for each train, but the commuter rail and bus system do, and they are usually within 5 minutes of their scheduled times in my experience. The Red Line does seem to be increasingly beset with mechanical failures, and it seems like something goes wrong almost every day.
 
 What I would really like to see is not just their schedule data but complete logs of subway and station entry/exit data for some representative days. Then I could play around with different train scheduling heuristics to see how they work out. I have often thought that the way they handle trains when there are backups is suboptimal and would like to try some simulations to show it.
 - stomv says
 
 August 18, 2009 at 8:05 am
 
 Most things on the T run on time at any given time. Rush hour buses tend to get delayed… but there are two parts to that equation.
 
 1. Headway time. That’s the time in between vehicles; it’s effectively the maximum amount of time you’ll have to wait for a vehicle to arrive.
 
 2. Travel time. That’s the time it takes you, once on the vehicle, to get to where you’re going.
 
 If there’s lots of congestion, oftentimes (2) increases but (1) doesn’t. So, you may still only wait a few minutes for the bus, but once on it will take you longer than usual to get there. For many riders, this isn’t nearly as big a problem: for one thing, you’re out of the rain/snow/cold/heat and you might even have a seat.
 
 
 
 In any case, knowing the headway, plus having an interactive iPhone “I just got picked up by a 66 bus at Brookline Village” button for users, the estimates for all the vehicles on that route could be updated in real time.
 
 This is how the NC State Wolfline uses the data. It’s pretty cool; more importantly, it’s useful for riders.
 - somervilletom says
 
 August 18, 2009 at 11:33 am
 
 I agree that the subway has no schedule. That is a huge problem.
 
 There is absolutely no valid reason why every origination departure should not be scheduled. There is similarly no valid reason why every departure on every line from Park Street should not be scheduled.
 
 How on earth can the public be expected to rely on a subway system with no schedule?
 
 How on earth can on-time performance be measured on a subway system with no schedule?
 - petr says
 
 August 18, 2009 at 12:32 pm
 
 “How on earth can the public be expected to rely on a subway system with no schedule?”

 “How on earth can on-time performance be measured on a subway system with no schedule?”
 
 I take the Orange, Red and Green lines on a regular basis and without a schedule. I don’t recall waiting any more than 8 minutes, and most often much less time, for any train on any of these lines.
 
 I sometimes take buses and, though the schedule be publicized, I find that traffic concerns often throw those schedules out of whack. Congress Street is carrying too much car traffic. It only takes one ambulance or fire truck to hold up what is normally a 3 minute drive upwards of 20 minutes or more.

 Now the commuter rail publishes their schedules and, by and large, isn’t much of a problem. In North Station there are three different places to see the schedule and the trains are announced via PA before boarding.
 - stomv says
 
 August 18, 2009 at 12:57 pm
 
 One way to have a schedule is to have arrival/departure times. Another way is to simply use headway times. That’s what the subway uses to my knowledge. The D Line comes within 7 minutes, for example**. If you arrive at the station and it takes more than 7 minutes, it’s late. Less than 7 minutes, it’s “on time”.
 
 This is not the same thing as stating that the D Line will arrive at the Fenway stop at
 7:00
 7:07
 7:14
 7:21
 7:28
 7:35
 7:42
 7:49
 etc. When measuring performance, the two systems can yield dramatically different results. Let’s say, for example, that the first train arrives at 7:00, but the second train stalls for 10 minutes. The MBTA, responding, holds up following trains for a while to keep things spread out. The actual arrival time at Fenway becomes
 7:00
 7:17
 7:23
 7:29
 7:35
 7:41
 7:49
 etc. Now, how many trains were late? Using the arrival/departure method, every single train starting with the second one was late. The first late one was late by 10 minutes, the next by 9 minutes, etc. However, if you just use interarrival time (headway), only two were late: the 7:07 and the 7:14, because people arriving any time between 7:00:01 and 7:09:59 had to wait more than 7 minutes for a train. Which is a better metric of rider expectations and service? I think it’s the second… because after 7:10am, nobody waited more than 7 minutes for a train, just like normal.
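The two ways of counting can be sketched in a few lines of Python. This is only one possible reading of each metric, not how the MBTA measures anything: the schedule metric pairs the n-th listed arrival with the n-th scheduled time, and the headway metric flags any gap between consecutive arrivals that exceeds 7 minutes. The function names are made up for illustration. Note that, taken literally, the listed times also contain an 8 minute gap between 7:41 and 7:49, so this headway count flags that gap as well as the 17 minute one.

```python
HEADWAY = 7  # promised minutes between trains (assumed, per the footnote)

scheduled = ["7:00", "7:07", "7:14", "7:21", "7:28", "7:35", "7:42", "7:49"]
actual = ["7:00", "7:17", "7:23", "7:29", "7:35", "7:41", "7:49"]


def minutes(hhmm):
    """Convert an H:MM clock time to minutes after midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def schedule_lateness(scheduled, actual):
    """Minutes late for each arrival, pairing the n-th arrival with the n-th slot."""
    return [minutes(a) - minutes(s) for s, a in zip(scheduled, actual)]


def headway_gaps(actual):
    """Minutes between consecutive actual arrivals."""
    times = [minutes(a) for a in actual]
    return [later - earlier for earlier, later in zip(times, times[1:])]


lateness = schedule_lateness(scheduled, actual)
gaps = headway_gaps(actual)

late_by_schedule = sum(1 for x in lateness if x > 0)
late_by_headway = sum(1 for g in gaps if g > HEADWAY)

print("minutes late per train:", lateness)
print("gaps between trains:   ", gaps)
print("late by schedule:", late_by_schedule, "| gaps over headway:", late_by_headway)
```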
 
 Note that we didn’t look at the travel time at all — just the arrival time. If you were already on the “7:14” train that got held up to 7:23 that means that you spent an extra 9 minutes inside the train. Your commute ran 9 minutes longer. Naturally, this gets complex pretty quickly.
 
 Furthermore, using headway times but no schedule allows for more flexibility. Special events (Fenway Park, Boston Garden, Esplanade, The Common, etc.) may call for more trains on specific routes, or even for “specialty routes” which don’t begin at a typical station (e.g. extra inbound D Line trains from Fenway when Red Sox games get out). A schedule isn’t so useful if there are caveats for over 100 days of the year.
 
 
 
 I agree that reducing the variance of interarrival times is really important for QoS (and a schedule, when kept perfectly, has zero variance). But ultimately, as long as interarrival times are kept within bounds, the mean, variance, and worst 10% of cases are all kept in very good shape.
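To put a number on why variance matters: if riders show up uniformly at random over time, the average wait is the sum of the squared gaps divided by twice the sum of the gaps, so long gaps count disproportionately. The sketch below applies that formula to perfectly even 7 minute gaps and to the gaps from the delayed example above. The uniform-arrival assumption and the function name are mine, not anything the MBTA publishes.

```python
def mean_wait(gaps):
    """Average wait, in minutes, for riders arriving uniformly at random in time.

    A rider landing in a gap of length g waits g / 2 on average, and the chance
    of landing in that gap is proportional to g, hence sum(g^2) / (2 * sum(g)).
    """
    return sum(g * g for g in gaps) / (2 * sum(gaps))


even = [7, 7, 7, 7, 7, 7, 7]      # a perfectly kept 7 minute headway
bunched = [17, 6, 6, 6, 6, 8]     # gaps between the actual arrivals listed above

print("even headways:   ", mean_wait(even))
print("delayed example: ", round(mean_wait(bunched), 2))
```

Both lists cover the same 49 minutes, so the difference in average wait comes entirely from how unevenly the trains are spread.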
 
 
 
 ** I think it’s every 7 minutes, M-F, 7am-7pm or somesuch. Still, the idea is solid regardless of actual MBTA scheduled interarrival time.
 - somervilletom says
 
 August 18, 2009 at 2:57 pm
 
 Perhaps we are measuring (and therefore optimizing) different things.
 
 When a train is late, more capacity is needed to handle the delayed passengers of the tardy train. A headway-only system doesn’t do that, and inaccurately (in my opinion) describes the subsequent trains (after the first late one) as “on time” — even though in practice the system has started to choke.
 
 Perhaps a better analytic approach is to keep track of the passenger “flux” — the total number of passengers moved through a given point in a given interval.
 
 First, I’m proposing that departure times from Park Street and the originating stations be scheduled (as a beginning). Most times, when the 7:07 at Fenway is delayed, it’s because the train didn’t leave Park Street (or got diverted to BC).
 
 Second, when I’m trying to get to a meeting, the headway doesn’t matter (except that it forces me to get to the platform early, wasting more travel time). If the trip from Fenway to my meeting takes 20m, and my meeting is at 7:45, then the difference between 7:07 and 7:17 matters. The fact that the 7:07 train is delayed should have minimal impact on the 7:14 and the 7:21. If I really need to arrive at my meeting on-time, I’ll arrive early enough so that I can miss the 7:07, catch the 7:14 or 7:21, and still arrive to my meeting on time. In the scenario you describe, there are no trains between 7:00 and 7:17. By then, I’m late — and the train that does arrive will be full, the platform will be full, and everyone will be unhappy.
 
 The algorithm that works better, I think, is to start each train at a specific time, and reward drivers for arriving (and departing) on time at each station on their route. The timetable has to distribute slack time among the stops, and the drivers need to either arrive a bit early and wait for the scheduled time or drive faster or slower en route to stay on schedule.
 
 I’d also be very interested in seeing the actual, rather than claimed, headway times for the Green Line. I have a minor hobby of timing trains (especially D-line trains) when I’m waiting. The variance I see is huge, and the “7 minutes” is a fiction.
 
 Perhaps we should be measuring the aggregate on-time arrival times of passengers, rather than the trains that carry them.
 
 stomv says
 
 August 18, 2009 at 5:12 pm
 
 Buses do slow down considerably when they get a bit behind because of this problem of more people to pick up at each station providing positive feedback.
 
 Subways, when underground, don’t suffer this problem nearly as much, because they can open up all doors and because passengers have already paid fare to be in the station — they just have to get on, not get on and pay.
 
 Green Line above ground is somewhere between the two, depending on how cooperative the driver is about back doors.
 
 RE: “First”. You can’t schedule trains while ignoring headway because it’s track capacity that is the constraint. You simply can’t fit any more Green Line runs on the track between Gov’t Center and Copley. If you’ve just released a D Line train from Gov’t Center (toward Kenmore) 4 minutes late you can’t release another one in 3 minutes to keep schedule because there isn’t capacity to also slot in the B, C, and E runs. By Federal Law there must be headway maintained between the trains… so trying to cram them onto a schedule is in fact impossible unless you simply ghost-train (remove the train that’s running 10 minutes late altogether).
 
 There are a few stations that allow green line trains to “pass” each other — Park and Kenmore, for example. These stations can sometimes allow two sets of cars to “switch places” on the line, which can allow a slight increase in the frequency of one train (and an equal decrease in the frequency of the other) in an effort to balance passenger demand while maintaining the headways.
 
 RE: “Second”. When you are trying to get to a meeting, you do two calculations: (a) how long will I wait for the train, and then (b) how long, once on the train, will the ride take? If you know the interarrival time is 7 minutes and the ride itself is 22 minutes, you know that if you arrive at the station 29 minutes before you need to be at the destination station, you’ll get there on time. In fact, on average you’ll be there 3.5 minutes early. What about delays? Moot, since a delay means you’ll be late with headway or arrive/depart scheduling. If you want to guard against delay, you simply aim to take the train prior — which means arriving at the station 22+7+7 minutes before you need to arrive instead of just 22+7.
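The arithmetic in that paragraph, written out as a small sketch. It assumes trains really do arrive every `headway_min` minutes and that you reach the platform at a random moment within the gap; the function names are made up for illustration.

```python
def station_arrival_budget(ride_min, headway_min, spare_trains=0):
    """Minutes before the deadline you should reach the platform.

    Worst case you just missed a train, so allow one full headway plus the ride,
    plus one more headway for each spare train you want as a cushion.
    """
    return ride_min + headway_min * (1 + spare_trains)


def average_minutes_early(headway_min):
    """You budget a full headway of waiting but wait half of it on average."""
    return headway_min / 2


print(station_arrival_budget(22, 7))                  # 22 + 7
print(station_arrival_budget(22, 7, spare_trains=1))  # 22 + 7 + 7
print(average_minutes_early(7))
```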
 
 
 
 
 
 “The algorithm that works better, I think, is to start each train at a specific time, and reward drivers for arriving (and departing) on time at each station on their route.”
 
 This algorithm is effectively illegal for subway/light rail and ill-advised for bus. For rail, the drivers simply can’t make up time because of the headway requirements. That’s why the drivers will coast (or even stop) in between stations… the yellow/red lights require that they maintain sufficient distance between the rear of the train ahead and the front of their train. For Orange and Blue, you could likely do this, but what’s the advantage — when late, the driver will leave immediately in either case, and when early the schedule system suggests that they just put the train in park to not get ahead of schedule. Everyone on the train already is delayed an additional x seconds so that a few riders who would have otherwise waited up to 7 minutes will instead just catch this one. The green and red can’t do this, because there are multiple sub-lines sharing the same tracks: B,C,D,E for green, Ashmont,Braintree for red. One driver waiting because he’s ahead of schedule blocks the other line(s) from moving.
 
 With low interarrival times and required headways and multiple lines sharing the same track (with no passing possible), running an arrive/depart schedule is a recipe for massive disaster.
 
 For buses, arrive/depart scheduling can be done, and in fact is done. Rewarding drivers for keeping schedule is probably a bad idea though. Running late? Speed. Blow a yellow-orange-red light. Not notice the one person waiting at the stop and pass it to make up time. I don’t think bus drivers driving like taxi drivers is particularly good public policy.
 
 
 
 “I’d also be very interested in seeing the actual, rather than claimed, headway times for the Green Line. I have a minor hobby of timing trains (especially D-line trains) when I’m waiting. The variance I see is huge, and the ‘7 minutes’ is a fiction.”
 
 Really? You sit at a station and measure the amount of time between trains arriving? That’s some (minor) hobby. If you’re only measuring the time until the train you are waiting for arrives, may I humbly suggest you always measure… methinks it’s much easier to remember the rare instance when the wait was 12 minutes than the most recent 3 or 4 times when you were only waiting a few seconds. I don’t mean to poke fun at your (minor) hobby — I’ve been known to sit at a corner and count the number of bicycles that have ridden by in 15 minute intervals for 2 hours.
 
 
 
 “Perhaps we should be measuring the aggregate on-time arrival times of passengers, rather than the trains that carry them.”
 
 Absolutely right in theory. How do you actually do it in practice?
 
 somervilletom says
 
 August 18, 2009 at 6:35 pm
 
 On the Green line, can’t the late train run express until it catches up to its schedule, at least once it gets past Kenmore? The E trains branch from the B, C, and D trains after Copley, the B trains branch from the C and D trains just before Kenmore, and the C and D trains branch just after Kenmore. Most Green line delays are above ground, so underground arrival times are generally more regular (when the trains leave Park Street on time). The Red line only has one branch, at Andrew, where the Ashmont and Braintree lines diverge.
 
 Don’t forget that the Green line has a loop that, like the loop at Andrews, allows the B, C and D lines to have more frequent above-ground service (looping at Kenmore) without clogging the underground Kenmore/Park Street segment — inbound trains can loop at Kenmore and proceed back outbound. There is an existing siding (between the tracks) on the B line, above ground, just inbound from Warren Towers — capable of holding multiple cars for just such use (it used to be used to reserve extra cars on game nights). There’s ample real estate for similar capacity at the Fenway stop on the D line.
 
 I would love to know how the subways of Vienna solve this problem — and they’ve clearly solved it.
 
 Regarding my hobby, I find myself waiting to meet friends at stations like Longwood and Brookline Village from time to time. More frequently than I like, I find myself timing inbound trains while waiting for outbound and vice-versa. I’m a street photographer, and so I’ve also spent more time than I like to admit counting and timing C-line arrivals at Coolidge Corner.
 
 
 
 “How do you actually do it in practice?”
 
 I think this is a queing-theory problem (a la Kleinrock). Treat each passenger as an element to be transferred, analogous to a byte awaiting transfer across a network. Treat each station as a passenger source, analogous to an Ethernet connection that collects and delivers packets from and to a local machine. Treat each train as a packet sent across the network.
 
 There’s a reasonably well-established body of theory and practice that describes how to measure “passenger flux”, analogous to effective data transfer rate.
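As a rough illustration of what “passenger flux” could mean in code, here is a sketch that simply totals the passengers carried past one point in each fixed interval. It is plain bucketed counting, not a queueing model, and the train times and passenger counts are invented for the example; getting real per-train passenger counts is the hard part.

```python
from collections import Counter


def passenger_flux(train_passes, interval_min=15):
    """Total passengers moved past a point in each interval.

    train_passes is an iterable of (minute_of_day, passengers_on_board) pairs,
    one per train passing the point. Returns {interval_start_minute: passengers}.
    """
    flux = Counter()
    for minute, passengers in train_passes:
        bucket_start = (minute // interval_min) * interval_min
        flux[bucket_start] += passengers
    return dict(sorted(flux.items()))


# Hypothetical observations: trains at 7:00, 7:17 and 7:23 with made-up loads.
observations = [(420, 180), (437, 260), (443, 90)]
print(passenger_flux(observations))
```

Comparing these totals against the totals a normal day would produce in the same intervals is one way to see the system “starting to choke” even when every individual headway looks acceptable.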
 
 Since passengers are not bytes, the solutions that work for a network (random-interval retry on collision, etc.) probably won’t work for a transportation system. On the other hand, the math that describes the dynamics should still hold true I think. This is just up your alley, isn’t it? 🙂
 
 stomv says
 
 August 19, 2009 at 8:09 am
 
 “On the Green line, can’t the late train run express until it catches up to its schedule [once above ground]”
 
 Sure, and they do sometimes. It’s pretty miserable when you’re waiting for the train and you get passed (which makes the train “behind schedule” for you now). It’s even more miserable when the expressed train is going to pass your stop and so now you’ve got to get out and wait for the next one, which will now have an awful lot of people trying to get on to it.
 
 A better system for express trains? The system used in NYC — but they have a 4 track system, so they just label some trains local and others express from the outset, so folks rarely find themselves on the wrong one.
 
 
 
 “Don’t forget that the Green line has a loop that, like the loop at Andrews, allows the B, C and D lines to have more frequent above-ground service”
 
 Sure, but it’s not clear how helpful that really is, since most of the folks getting on above ground want to get off past Kenmore. The idea of taking the B Line (for example) to Kenmore, getting off, and then hopping on a different train inbound once in Kenmore seems a bit silly for a few reasons. (1) You’ve just substantially added to the length of the trip: there’s an additional debark, an additional wait, and an additional embark, all to run more trains above ground. (2) The train was full entering Kenmore. If two trains enter Kenmore full (a B and a D for example), then how are four cars’ worth of people expected to embark on a two car train? (3) What about the E Line? All trains arriving at Copley will be 75%-ish full since a full car arriving at Kenmore had debarks/embarks there and at Hynes. How will two full cars arriving at Copley on the E fit onto the 75%-ish full train?
 
 The only way this system could possibly work — and that’s with a lot of help — is if they could run 3 car trains underground but not above ground. In fact, this system may arrive within the next year or two. One of the major reasons for the station work below ground on the Green Line was to extend all the platforms to be at least three cars in length, allowing the underground trains to increase their capacity by 50% (minus a very small percentage due to headway requirements).
 
 Still, the amount of time and hassle of an extra debark and embark process seems like it would kill any efficiency. The D Line stations above ground have also had work done to make them three cars long (I don’t know about all, but certainly the first handful have)… so I expect the D Line to start running three car trains eventually. B, C, and E are unlikely because it’s tough to make it through a stop light cycle time when three cars long, and some stations are so close to the traffic light that the third car would obstruct the road when stopped at the station. These problems can be worked around with priority signalization, but if we can’t even get the traffic lights to change cycle so a train of 200 people doesn’t have to wait for four autos making a left turn across the tracks, then the signal prioritization necessary for three car kits, while technically possible, is politically about as likely as putting a bike lane on every Boston street by removing a lane of parking wherever necessary.
 
 The benefits of arrival/departure scheduling aren’t so important when the interarrival times are small. If there was a train every minute, would you care if it always arrived at the :08 or would you just show up and expect to wait about 30 seconds +/- 30 seconds? How about if the interarrival time was 2 minutes? 3 minutes? Switching to an arrival/departure schedule on the Green Line may be technically possible, but the overall QoS would have to drop considerably for that to happen, and if the QoS gets worse, why do it?
 
 
 
 “I would love to know how the subways of Vienna solve this problem — and they’ve clearly solved it.”
 
 By never building the U5 ;). I’m not sure exactly what problem the U-Bahn has solved, except that they don’t try to cram so many subway kits so close together on one track. With the brief exception of one bit of the U2 & U4, they all roll their own track. If the Green B, C, D, and E all had their own tracks, the MBTA would be able to do all sorts of load balancing, scheduling, and emergency rerouting that they simply can’t do now with the limited track they’ve got. Trenching Boylston St from Hynes on in to the Boylston Stop (and then up to Park) to add a third (or 3rd and 4th!) parallel line to the Green Line underground would do wonders. Tunneling from Kenmore to Hynes would complete the job. Then you could run scheduled express trains (for example, the B Line would only ever stop at Kenmore, Copley, Park) and you could run the above ground lines more frequently because they wouldn’t have headway problems (if only 3 tracks this would be in the direction of rush hour only).
 
 
 
 “I think this is a queing-theory problem…”
 
 It’s actually a Queueing Theory problem, but really it’s network theory. In any case, my point is measurement, not modeling. You can model the MBTA like a data network all you like, but there’s a key difference: in a data network, I can capture the statistics for any and every packet. From when it arrived at the first switch to when it was delivered, complete with routing information, times at each buffer, etc. We can’t do that with people. We can’t measure when each person arrived at a station, which route they took (Copley to State: Green to Gov’t Center to Blue to State, or Green to Park to walking-in-system to Downtown Crossing to Orange to State. Brookline Village to the airport: D Line to Gov’t Center to Blue to Airport to bus-shuttle to Gate or D Line to Park to Red to South Station to Silver Line to Gate?), and we can’t measure arrival times. It’s really easy to measure what time each vehicle arrived at each station (and trivial to derive travel times), but it’s impossible to measure the “bits” (riders) very well.
 
 So yeah, the modeling is right up my alley — but you just can’t apply it to people very well. Interestingly, it’s easier to apply to buses than to trains because with trains the transfers don’t involve a second swipe of Charlie, and because when you go to a train station, your Charlie-ing into the station gives no information on direction, whereas on a bus you swipe as you get onto that actual bus.
- petr says
 
 August 18, 2009 at 9:54 am
 
 “but since nothing on the T runs on time, totally useless, no?”
 
 As a longtime commuter on the Fitchburg line, I can say that the commuter rail has been running on time. Further, the time I take from arrival at North Station to my office (across from South Station) has remained fairly consistent, whether I take the #4 bus from North Station, or if I take the Orange line to the Red line and get off at South Station, or if I walk.
 
 Interestingly, walking is actually the most hassle as nobody, neither pedestrians nor cars, pays much attention to traffic lights and signs along Congress street and other streets along that route.
 
 But, even if the T was as abysmally off-schedule as you (incorrectly) allege, one could take this data and derive the schedule one ought to see, so there is usefulness there.
stomv says

August 20, 2009 at 5:33 pm

It seems that the New York MTA has threatened a blogger and asserted copyright over schedule data, which, as I’m sure David can explain, is asinine given Feist Publications, Inc. v. Rural Tel. Service Co., 499 U.S. 340 (1991).

What a bunch of fools. Let the geeks create software for your users to make better use of your service, thereby driving QoS and revenue. Like the T just did 🙂
