Description of the CTA Railroad Network

Bruce Peterson
Oak Ridge National Laboratory
2000 May 8, rev 2003 January 6

The CTA Railroad Network is a geographically-based representation of the North American railroad system. It is designed foremost to support analytic transportation studies, such as traffic assignment, capacity, optimal investment, intermodal terminals, and cost estimation. But it will also trivially support mapping and GIS display functions. It may be treated as a standard link-node network for routing and traffic assignment, following procedures similar to those detailed below that ORNL has actually used. However, several aspects of its data model could potentially support operational simulations as well.

The network is highly heterogeneous in almost every element and measure. As with all networks, it is an evolving and expanding entity. In the following, I will often describe an ideal or design form, and then caution that the ideal is often "underachieved" and offer advice for living with its limitations. It is important to remember that the fundamental objective in its construction was accurate intercity routing, and effort was spent on the network roughly in order of the benefit/cost relative to that objective.

COMPONENT MODULES

The entire rail data system, known as network system "Q", consists of:

1. A set of raw network files for each geographical module (the US states, DC, Canada, and Mexico, called for brevity "states"). Each state contains (a) a file of nodes with locations and attributes (extension .NDR), (b) a file of link attributes (extension .LLR), and (c) a file of link locations (extension .LCR). The states are tied into a unified network by a set of boundary nodes at state borders contained in a separate national file. That is, links in two different states may share the same boundary node as link endpoints. In this way, states are relatively independent of each other and may be independently edited off-line.
In fact, multiple networks of each state that are plug-in replacements may be maintained for different purposes. Since the state files contain no duplicate elements or IDs, states may be concatenated to form unified national files. The raw networks that ORNL has edited have file names beginning "QN" (Q-format, specified below, and the N data model), and derivative networks with filtered or modified data content will also have a leading "Q" when using the Q-format, such as "QC" for the current operational network. Derivative networks will typically have unified national files rather than state files.

2. An interline file (QNxx.ILN) contains a list of the locations where traffic may be transferred between railroad companies, plus any attributes affecting that transfer. In its raw form, the file is related to network modules by geolocation only. It does not reference network elements by ID, so a model is required to establish this reference. This was done so that an independent interline file could be used with any network at minimal maintenance cost. The attachment model will typically vary with the nature of the network. Derivative interline files mated to a specific network may contain node numbers, effectively making the file a list of logical links.

3. A railroad ancestry file (WCONV.DAT) contains each railroad's reporting marks and a list of its ancestors and descendants with dates of transition. This makes it possible in many cases to accommodate corporate changes without making any changes to the railroad identifications used in network element attributes. But more importantly, it means a common network can accurately represent the system over a broad range of dates. However, this versatility and ease of maintenance are purchased at the expense of requiring a translation of raw railroad identifications into target date identifications for every separate application.

4.
Though not data, part of the Q-network system is a set of programs to convert the raw data into an analytic network suitable for standard network programs. These include separating the links and nodes into railroad-specific national subnetworks for a given target date, converting interlines into logical links connecting subnetwork nodes to form a unified national network, and reformatting data into a form usable by commercial GIS systems. Although Fortran source codes are included along with their output, it is likely that many users will wish to code variants of these models themselves from the descriptions provided. The major programs are:

   (a) RHA reformats network files into the input form required by some GIS systems. (RHA is common to all CTA networks, and a switch determines whether to use processing rules for the Q-network (railroad), H (highways), W (waterways), C (intermodal), or S-network (shorelines/boundaries).)

   (b) RQCFS writes the derivative QC network from raw QN files for a specific target date, an input parameter. Network QC2C represents the freight network as of 2002 December and is available for download in a variety of formats. (Be aware that most logical interline links, which are elements in the link list of derived networks, are geographically of zero length, and several GIS systems will refuse to accommodate them. In particular, they are not included in the GIS-ready versions.)

5. Auxiliary data files aid in interpreting the network attributes.

   (a) SUBDIV.TXT lists the 4-character abbreviations of subdivisions used in the SB link attribute field.

   (b) TQCONV.DAT lists railroad names found in TIGER files (helpful for TIGER users but not needed for QN).

SOURCES

The base data for the US portion of the network is the Federal Railroad Administration's National Atlas-based strategic rail network. That is, network QN was built by starting with that network, and then modifying it.
The preponderance of the data included in QN is therefore directly carried over from the FRA network. However, different paths were often taken in that modification. The table below shows the development path taken in different states.

"UTTC" means that we started with the FRA network of 1994. Dave Clarke of the Univ of Tennessee Transportation Center edited all states by making minor changes in topology and attributes and adding main line classes necessary for the routing model requirements of the 1993 Commodity Flow Survey (CFS). ORNL has subsequently maintained these states to reflect ownership, closures, and changes in traffic patterns.

"FRA" means that we started with the FRA network of Spring 1997, including its trackage rights model and traffic density classes. The previous UTTC network was used for reference and comparison only once it became apparent that updating it would be of comparable expense to modifying the newer FRA network into CFS-compatible form.

Both UTTC and FRA unenhanced networks can generally be characterized by 1500 m geographic accuracy and 1000 m topological resolution. While FRA state networks generally have more abandoned lines from before 1993, there are no longer any practical operational differences between them due to subsequent editing.

   UTTC states are  AK AZ AR CA CO ID DC KS ME MD MI MN MT NJ NM NY ND OR TN TX WA VT Canada & Mexico

   FRA states are   AL CT DE FL GA IL IN IA KY LA MA MO MS NE NV NH NC OH OK PA RI SC SD UT VA WV WI WY

All US states except Alaska have had locational and topological enhancements to most mainlines and in urban terminal areas. Enhanced locations were generally taken from the BTS rail network derived from 1:100,000 scale Digital Line Graphs. Geographic accuracies are now generally 100 m (excepting numerous unenhanced branch lines and abandoned lines) and topological resolution is about 200 m throughout the US.
Canadian and Mexican matchstick networks were constructed de novo by Dave Clarke for the 1993 CFS, and are similarly maintained by ORNL. The Canadian network has been locationally enhanced by using alignments from FRA's Canadian network, itself derived from the Digital Chart of the World. Most distances are transcribed rather than geographically calculated. The Mexican network does not include the late 1990s privatization.

Beyond the digital data, the most useful sources I had for current status and topological detail were:

   Harry Ladd, US Railroad Traffic Atlas (numerous editions).

   Mike Walker, Railroad Atlas of North America (all volumes), published by Steam Powered Video.

   Miscellaneous local urban street maps, county highway maps, topographic maps, and employee timetables.

THE DATA MODEL

1. Topology

The QN network is fundamentally a link-node representation. Nodes are points on the Earth's surface, with a longitude and latitude, and links are one-dimensional polylines connecting two nodes, defined by a chain of vertices with lon/lat coordinates approximating the shape of a line that idealizes real rails.

Traditionally in national strategic rail networks, nodes represent an undifferentiated area where various activities occur: stations, industrial leads and switching operations, yards and car classification, interlining, or a complex of turning or transfer leads. This is opposed to an operational representation where each node is a switchpoint, where each separate path a vehicle can take, including dual track and sidings, is a separate link, and link locations halfway between the rails are unambiguous down to the centimeter. This network is more the former than the latter, but does have some operational pretensions.

Links are intended to represent lines used for through line-haul and branch movements. The individual lines used in yards and passing sidings do not, in general, have explicit representations.
Even important traffic generators may be located in the middle of a long link with no nearby nodes. In the ideal, however, turnouts at junctions used by through trains are intended to be represented.

In the data design, each node with 3 incident links is a switchpoint, and the directions of movement possible are determined by incident angles at the node. Where triangles exist (that is, all three directions of movement are permitted through the junction of three routes), the ultimate intent is to explicitly show all three sides of the triangle or to flag the node's "crossing" attribute.

If there are 4 incident links, the node is either a diamond (no turns between 2 distinct through routes) or equivalent to a double slip, in which case turns should be allowed through sufficiently obtuse angles. (In reality, the switches may be tens of meters apart.) If the node has an explicit crossing flag, follow that logic regardless. Otherwise, crossings between lines of different railroads may be assumed to be diamonds, and of the same railroad to be double slips. If interlines are permitted at the junction, naturally there must be a physical connection, whether or not shown.

As a rule no nodes are used at grade separated overcrossings, unless there are ancillary connecting tracks I have not had the energy to explicitly add. Adjacent lines of different railroads will not share nodes, even if their lines are but 10 m apart. A common node will be used only if the traffic on one line can obstruct the operations of the other.

There are notional connectors (track type 'Z') in the network that represent operationally costless movements between two nearby routes. Most often they represent a pair of crossovers between parallel lines or a transfer track between two nearby railroads, and angle of incidence has no meaning for them.
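
The default rule for nodes with 4 incident links described above could be sketched roughly as follows. This is only an illustration, not code from the Q-network programs: the function name, the returned labels, and the idea of passing the owner marks of the four incident links as a list are assumptions made for the example, and the values of the explicit crossing flag are not interpreted here.

```python
from typing import Optional, Sequence


def classify_four_way(owners: Sequence[str], cross_flag: Optional[str] = None) -> str:
    """Label a node that has exactly 4 incident links.

    owners     -- owner marks of the four incident links (hypothetical input form)
    cross_flag -- the node's explicit crossing code, if any (not decoded here)
    """
    if len(owners) != 4:
        raise ValueError("rule applies only to nodes with 4 incident links")
    # An explicit crossing flag overrides the default assumptions.
    if cross_flag is not None and cross_flag.strip():
        return "explicit"
    marks = {o.strip().upper() for o in owners}
    # Different railroads: assume a diamond (no turns between the two routes).
    if len(marks) > 1:
        return "diamond"
    # Same railroad: assume the equivalent of a double slip (turns allowed
    # through sufficiently obtuse angles).
    return "double_slip"
```

A caller would still need to apply its own angle test before allowing a turn at a node labeled "double_slip".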
Suggestion: Because this data is incomplete or surmised, and because reverse movements do occur, my advice would be to penalize but not prohibit acute angle turns between links that are flagged "geographically accurate." For nodes explicitly flagged or having highly acute angles (say < 60 deg) the turning penalty would be on the order of 10 mi, and half as much otherwise. The angle of incidence should be averaged over a 50 m radius of the node. For the continental US I am by and large satisfied to calculate angle of incidence as though longitude and latitude are a cartesian coordinate system, but of course users are free to use spherical trigonometry, corrections, or a suitable map projection.

2. Geography

The reasons for being concerned about geographic accuracy are:

   a. Link lengths are generally derived from geographic lengths, that is, the sum of lengths of chords between vertices that define polylines. The more accurate the shapes, the more accurate the lengths and everything that depends on them, such as ton-miles and distance dependent route choices.

   b. Evaluating the interaction between the network and ancillary features often depends on proximity measures between them. Examples are traffic generators, population distributions, environmental features, elevation data, and political boundaries when those locations are known only by geodetic coordinates, which is typical. Geographic location is therefore the means of relating a network to the rest of the world.

   c. Accurate renditions of shape make human interpretation of functional relationships easier, and so aid editing and maintenance.

   d. Aesthetics.

The primary concern in the construction of this network was routing and network analyses--what generally falls under the meaning of the "analyticity" of a network--rather than geography per se. Therefore improvements in geography were concentrated where they would have the greatest impact on analyticity.

There are three basic classes of link shapes.

   a.
   First are original shapes from the National Atlas, digitized at a scale of 1:2 million. During editing, endpoints may be adjusted with corresponding displacements in the link interior. These adjustments generally, if modestly, increased the link's overall geographic accuracy, as characterized by the difference between the location of an arbitrary railroad feature and the point on the network that represents it. I estimate root mean square errors of these links to be 1300 m. These links have source codes of "m" or "n."

   b. Second are shapes taken from 100 m digital sources: the BTS rail network, in turn taken from 1:100K USGS digital line graphs, and TIGER.

   c. Third are freehand additions to cover lines not found in available digital cartography. Accuracy is highly variable. Typically, short links will have small deviations, except that I may have considerable uncertainty about where such connectors join a known location route. Longer routes may simply be straight line representations with deviations comparable to the length of the link. Such links are generally recognizable by the small number of vertices relative to their length, with source codes of 'J' or blank.

The preponderance of QN network mileage is in the second class (100 m) when weighted by traffic, but a significant proportion of branch lines and abandoned lines have accuracies worse than 1000 m, and all lines are worse than that outside the US.

All coordinates are stored in decimal degrees, datum NAD27.

3. Line ownership and identification

In the ideal case, lines are identified by an owning railroad plus a unique route identifier, which attempts to use the owner's own administrative method, typically division (generally an areal feature) plus subdivision, branch, or route (lineal features within the area). Division-wide routes are sometimes identified by one- or two-letter designations inserted in the subdivision field. Separate tracks may also be identified when they use distinct alignments.
In addition, other railroads that may jointly own the line, and those that have trackage or haulage rights on it, are identified. Separate flags are used for switching or trackage rights with or without traffic generation privileges. Previous owning railroads of a link may also be listed.

In principle, each railroad's presence on a link is for a specific time period. That range may be indicated explicitly as a link attribute, or implicitly through the railroad ancestry file. For instance, CSXT's Toledo Subdivision from Cincinnati to Toledo is identified in the raw network as owned by CHD (Cincinnati, Hamilton & Dayton). Consulting the railroad ancestry, this line was successively inherited by BO (Baltimore & Ohio, 1920-1986), CO (Chessie System, 1986-1987), and finally CSXT up to the present. In general, the most specific railroad that has an unambiguous line of ascent or descent is chosen to identify the "owner" of the line.

Some reporting marks are independently defined in this network with special meanings, for example:

   CNWU - Chicago & North Western lines north of Green Bay, inherited from Union Pacific by Sault Ste Marie Bridge (WCL subsidiary), and later into Canadian National.
   WCL  - Recent Wisconsin Central Ltd not part of the original Wisconsin Central RR (WC).
   CRNS - Conrail lines inherited by Norfolk Southern.
   CRCS - Conrail lines inherited by CSX.
   CRJT - Conrail lines in the Shared Assets areas (the new Conrail).
   NSRY - Original Norfolk Southern Ry, inherited by Southern Ry, to distinguish it from the current NS.
   MC2  - Michigan Central, to distinguish it from the Maine Coast.
   CKRU - Central Kansas lines inherited from Missouri Pacific, as distinguished from the initial CKRY from Santa Fe.
   ICRR - Original Illinois Central lines that preceded the ICG merger.
   CPSH - Former Canadian Pacific lines inherited by St Lawrence & Hudson (and subsequently re-unified with CP).
   null - No railroad, as when an entire railroad is abandoned without a successor.
   X    - Any mark beginning with an "X" represents a period of no operations, including suspension or abandonment. Typically intended to change a line's STATUS rather than logical ownership.

This identification strategy is not usable for dismembered railroads, such as the Rock Island. Explicit link histories must be used instead. In any event, explicit link histories take precedence over railroad ancestries.

Conrail (CR) has a special ancestry status. Many of Conrail's ancestors have been predominantly incorporated into either Norfolk Southern or CSX. Where this has happened, I have indicated the ancestor in the link ownership field, and only exceptions will have explicit link histories. For example, the Lake Shore & Michigan Southern (LSMS) shows descendants NYC, PC, CR, and finally NS. Therefore, the Lake Shore main across northern Indiana shows owner LSMS only, with no explicit link histories. But the Lake Shore main between Cleveland and Buffalo was inherited by CSX, and so shows owner CRCS with explicit predecessor LSMS. For a complete link history, old owner LSMS and its successors would be used before the initial creation date of CRCS, 1999.6.

Leading CR components predominantly assigned to a single successor are:

   To CSX:  NH; NYC core and components BA, CCCS, TOC, WSH.
   To NS:   PRR; ERIE; DLW; LV; RDG; NYC components LSMS, MC2.
   To CRJT: PRSL; CNJ.

Although the ancestry system is complicated, it has substantially reduced maintenance costs. To avoid the complication, users not interested in history will use a derivative network, either QC or one they construct on their own using program RQCFS. Derivative networks contain only contemporary reporting marks for the target date, and no interpretation is required.

4. Distances

Link lengths throughout the networks, excepting Canada and Mexico, are most commonly geographically imputed from the digital location data itself. Geographically imputed lengths may differ from actual due to

   a.
   Generalization of shapes, usually smoothing but sometimes exaggerating fine features.
   b. Representation of arcs by a sequence of internal chords.
   c. Digitization noise.
   d. Cartographic offsets.

Typically in mountainous areas generalization errors dominate, leading to imputed distances that underestimate actual distances by 1% in the 100 m lines, and 5-20% from 1:2M sources. Conversely, in level terrain, noise and generalization cause opposite estimation errors in the vicinity of 0.2% and 2%, respectively. Since virtually all mainlines now have 100 m geography, the estimation of distances should be good with little user effort needed for adjustment.

5. Interlines

Operationally the rail system is a collection of individual railroad companies, each operating its own trains over a subset of the network--those lines that it owns or on which it has trackage rights. Each company can directly collect and distribute cars only between points on its own lines. Each company's lines may therefore be thought of as a subnetwork where trains can operate, but where trains typically cannot pass the subnetwork boundaries. Individual cars and shipments, however, may be passed from one railroad to another, in effect jumping between subnets. These transfers are called "interlines," and occur only at an enumerated set of locations. The interline file provides this set.

Interlines occur in several possible ways. The fastest is a run-through, where a train stops on a section of track and the forwarding railroad's locomotives (or at least crew) are replaced by the receiving railroad's, and the train continues on its way. More commonly in large rail centers, entering trains are broken up at receiving yards and the cars are reassembled into blocks by receiving railroad. At regular intervals the blocks are transferred to the assembly yards of the receiving railroads.
The transfer may be made by a local transfer or switching run of either railroad, or by a third railroad paid for the service. When volumes are smaller, cars may be left at an agreed upon local siding or small interchange yard (ie, several adjacent local sidings) by one railroad, and later picked up by the receiving railroad.

Each interline entry indicates two locations: on the forwarding and receiving railroads. If the locations are different, it indicates that a transfer run occurs between points of through train disassembly and assembly. Those points are usually yards of the two railroads. If the same, they indicate a common point where cars are set off by the forwarding railroad and later picked up by the recipient.

In general, node numbers are blank or ignored in the raw interline file, meaning that geographic location alone is used to locate interlines on any particular subnetwork. Part of the construction process for any derived network, however, is to assign node numbers to interlines, after which lon/lat may be ignored.

Suggestion: As a practical matter, for a common interline point, I select the nearest node shared by the two railroads, provided it is not too far away (generally 4 km). If no such node exists, then the closest nodes on each of the two railroads are selected independently to be the endpoints of the interline movement. If either is too far away, the interline is presumed inactive. (Currently inactive interlines are retained in the list because of the possible need to simulate historical routes.) Remember that interlines have often been located independently of current network locations. They could be better, but are more often worse, even beyond the vagueness inherent in their definition.

The expense of the interline is intended to be expressed in its impedance rating, run-throughs receiving the lowest impedance. If a third railroad is used for the transfer, its identity may be indicated in the entry.
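
The attachment suggestion above for a common interline point can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure coded in RQCFS: the function names, the dictionaries mapping node ID to (lon, lat), the haversine distance on a sphere of radius 6371 km, and the reading of "too far away" as a fixed 4 km cutoff applied to every candidate are all choices made for the example.

```python
import math
from typing import Dict, Optional, Tuple

LonLat = Tuple[float, float]
MAX_KM = 4.0  # "generally 4 km" in the suggestion; treated here as a hard cutoff


def dist_km(a: LonLat, b: LonLat) -> float:
    """Great-circle distance between two (lon, lat) points in degrees (haversine)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest(point: LonLat, nodes: Dict[int, LonLat]) -> Optional[Tuple[int, float]]:
    """Return (node ID, distance in km) of the node closest to point, or None."""
    if not nodes:
        return None
    nid = min(nodes, key=lambda n: dist_km(point, nodes[n]))
    return nid, dist_km(point, nodes[nid])


def attach_common_interline(point: LonLat,
                            nodes_a: Dict[int, LonLat],
                            nodes_b: Dict[int, LonLat],
                            max_km: float = MAX_KM) -> Optional[Tuple[int, int]]:
    """Pick (forwarding node, receiving node) for an interline, or None if inactive."""
    # First choice: the nearest node that lies on both railroads' subnetworks.
    shared = {n: nodes_a[n] for n in nodes_a.keys() & nodes_b.keys()}
    hit = nearest(point, shared)
    if hit is not None and hit[1] <= max_km:
        return hit[0], hit[0]
    # Otherwise choose the closest node on each railroad independently.
    hit_a, hit_b = nearest(point, nodes_a), nearest(point, nodes_b)
    if hit_a is None or hit_b is None or hit_a[1] > max_km or hit_b[1] > max_km:
        return None  # presumed inactive
    return hit_a[0], hit_b[0]
```

Here nodes_a and nodes_b would hold the nodes of the forwarding and receiving railroads' subnetworks for the target date. A brute-force nearest-node search is used for clarity; a spatial index would be a natural substitute on a national network.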
FILE DESCRIPTIONS

In all files, records with a '#' sign in the first character position are comments (intended for human readers). In some files, text after a '#' sign anywhere in the record is a comment.

1. Link location (extension .LCR)

Fixed format, 80-byte records. For each link, a header record contains the 8-digit link ID and the number of vertex coordinate pairs that follow, in format (I8,I4). Then the longitude and latitude of each vertex are written in format (8F10.6), four vertices per record, using as many records as required. Normally, the decimal point is implicit, but an explicit decimal point takes precedence. Units are decimal fractions of a degree, using the convention that longitudes west of Greenwich are negative. In this dataset, all longitudes are in the range [-180, 0] and latitudes in the range [0, +90]. Location (0,0) means unlocated or unused.

2. Link attributes (extension .LLR)

Fixed format, variable length records (current maximum of 149 bytes), one record per link. Fields:

   Columns  Format  Name    Attribute
   1-8      I8      LID     Link ID
   9        A1      LTYP    Record type
   10
   11-17    I7      JA      A-node
   18-24    I7      JB      B-node
   25-30    F6.2    MILES   Length (mi)
   31       A1      HDNG    Heading
   32       A1      ONEWAY  One-way
   33-36    A4      W0      Owner
   37-39    A3      DV      Division
   40-43    A4      SB      Subdivision
   44       A1      RT      Route (Line)
   45       A1      TKID    Track number
   46       A1      MLC     Main line class
   47       A1      WT      Weight
   48       A1      HT      Height
   49       A1      GAUGE   Gauge
   50       A1      TRKTYP  Track type
   51       A1      GRADE   Access control
   52       A1      STATUS  Status
   53       A1      DENSTY  Traffic density
   54
   55       A1      SIGNAL  Signal system
   56       A1      PASNGR  Passenger service
   57       A1      MILIT   Military subsystem
   58-59
   60       A1      LSRC    Source
   61       A1      LUPDAT  Edit history
   62-73    A12     MPRAW   Raw milepoint field
   62       A1      PTYP    Milepoint type (derivative network only)
   63-68    F6.2    PA      Beginning milepoint
   69-73    F5.2    PDEL    Milepoint increment

   Variable length usage section:
   74       Z1      NFLD    Number of usage fields
   75-79    A5      FLD1    First usage field
   ...
   145-149  A5      FLD15   Last possible usage field

Usage fields are defined in the field descriptions below.

3.
Node file (extension .NDR)

Fixed format, 72-byte record length. Fields:

   Columns  Format  Name    Attribute
   1        A1      JTYP    Record type
   2-8      I7      JID     Node ID
   9-32     2F12.6  JZ      Node location (lon/lat)
   33-34    A2      JSTATE  State
   35-57    A23     STN     Station name
   58
   59-64    I6      SPLC    Std point location code
   65       A1      CROSS   Crossing code
   66
   67-68    A2      JSRC    Source/history
   69-72    A4      DIRCTN  Directional connections

4. Interline file (extension .ILN)

Fixed format, two 72-byte records per entry. In the common fields the two records describe the respective railroads that exchange traffic. Currently, all interlines are symmetric. The first two fields together define a unique interline ID; they may be read as a single 7-character field with embedded blanks. Fields:

   (Rec)  Columns  Format       Name    Attribute
   (1)    2-6      A5           IIDNAM  Interline (Rule 260) junction code
   (1)    7-8      I2           IIDQ    Interline ID number
   (1)    10-13    A4           WA      Forwarding RR reporting mark
   (1)    15-38    A2,A23       NAMEA   Forwarding location (State/place)
   (1)    39-57    F10.6,F9.6   ZA      Forwarding location (lon/lat)
   (1)    59-65    I7           IJA     Forwarding RR node number
   (1)    68-72    A5           ALIAS   Alternate junction code
   (2)    2-8      F7.0         IMPED   Impedance rating
   (2)    10-13    A4           WB      Receiving RR reporting mark
   (2)    15-38    A2,A23       NAMEB   Receiving location (State/place)
   (2)    39-57    F10.6,F9.6   ZB      Receiving location (lon/lat)
   (2)    59-65    I7           IJB     Receiving RR node number
   (2)    66       A1           ITYP    Interline type
   (2)    68-71    A4           WTRM    Intermediate transfer RR

Notes:

   a. As with link owners, interline railroads must be translated into current values.
   b. If the receiving RR location is blank, it defaults to the forwarding location.
   c. A made-up junction code ending with the character '$' is used for locations where interchanges are known to occur, but where I cannot find a formal code.

Examples (somewhat shortened):

   BLUIS 1 CR   ILBlue Island Yd  -87.6480 41.6403  CHGO
       300 CSXT ILBarr Yd         -87.6535 41.6488  IHB

The alias of "CHGO" means that on waybills indicating a Chicago interline between Conrail and CSX, this Blue Island transfer is allowed to be used if otherwise convenient.
Indiana Harbor Belt is presumed to handle the transfer between yards, but is not mentioned on the waybills' railroad sequence.

   DURHM 1 SAL  NCEast Durham     -78.8765 35.9782
       370 SOU  NCEast Durham     -78.8765 35.9782

The high impedance discourages use of this little used interline in route selection models. For current use, Seaboard Air Line would be mapped into CSXT, and Southern into Norfolk Southern.

5. Ancestry file (WCONV.DAT)

Fixed fields in the beginning of the record, with optional delimited fields at the end. Variable length records at most 255 bytes long. Dates are floating point Julian years, eg, 1998 October 1 is approx 1998.75. If there are only 2 digits before the decimal point, a century window (eg, 1940-2039) is used to expand the year, except that year '0' means the indefinite past, while '00.x' means the year 2000. (The link attribute file format I chose long ago limits me to 4 characters for a date.) Fixed fields:

   Columns  Format  Name    Attribute
   1        A1      WTYP    Record type
   2-5      I4      WNUM    AAR railroad numeric code
   8-11     A4      WMARK   Reporting mark (standard RR ID)
   13-14    A2      FAMILY  Railroad family
   16+      A64     WNAM    Railroad text name, terminated by delimiter or end-of-record

Delimited fields occur after the name. In general, each delimited field is of the form:

   open delimiter, first transition date, first RR list, second transition date, second RR list, ..., last RR list, last transition date, close delimiter

End-of-record or the beginning of a comment ('#') may also terminate the field if there is no close delimiter. A missing first transition date means the indefinite past, a missing last transition date means the indefinite future. Between bracketing transition dates, the railroads in the list are active. Dates always start with a numeric digit, railroads in a list always start with a letter, and all are separated by blanks within the field.

Delimited fields, with their open and close delimiters, are:

   @ @ Operated by.
       The railroad described by the record had its trains operated by another railroad during the period, even though it may have retained formal control of the lines or plant. (For my routing purposes, this is the same as a transition of ownership.) Example (without number, etc):

       " LI Long Island Rail Road @1997.4 NYA @ "

       The newly formed New York & Atlantic Railway operated all freight service on the Long Island from 1997 May onward, but it does not own the lines.

   [ ] Subsidiary of, or otherwise owned by. I generally ignore corporate ownership or subsidiary relationships, until operations are integrated. Example:

       " TCT Texas City Terminal Ry [0 ATSF SP ] "

   % % Railroad transition (merger, purchase, lease). One railroad replaces another. You often must follow a chain of transitions to get the current operator. Examples:

       " MRL Montana Rail Link %0 NP 1987.8% "
       " NP Northern Pacific Ry %1970.2 BN % "
       " BN Burlington Northern Inc %1996 BNSF % "
       " BNSF Burlin ... & Santa Fe %0 BN ATSF 1996% "

       For a target date of 1986, MRL would map into BN in two jumps.

   ^ ^ Billing railroad. Intended for shortlines that may rely on their parent for billing and collection, and therefore may not appear on waybills for traffic they generate. Applies, for instance, to the Conrail Express and Norfolk Southern Thoroughbred programs.

Railroad and division may be indicated in a railroad list in the format "RR/Div". Eg, "WCRC/Mos" for the Moses Lake division of the Washington Central, which became the Columbia Basin Ry.

6. Subdivision list (SUBDIV.TXT)

Since 4-character abbreviations are used for subdivision identifiers in the link file, this file provides the full name of the subdivision. An attempt has been made to use unique abbreviations for different subs even when the names are the same.
Fixed fields in the beginning of the record, with optional explanatory text:

   Columns  Format  Name    Attribute
   1-4      A4      WMARK   Owning railroad (as ancestral as possible)
   8-11     A4      SB      Subdivision abbreviation
   14       A1      SBTYP   Subdivision type, subjective for now, explained in comments at the file beginning.
   16-32    A17     SBNAME  Expanded name
   33+      A*      SBSTAT  List of states containing SB, delimited by braces, eg, {MD DE PA}. Comments follow the closing brace.

Examples (shortened):

   MP   RivM s River        {MO} Jefferson City W> Neff Yd
   MILW Rivr s River        {MN} River Jct W 288.0 w> 407.4 St Paul Yd
   SLSF ThaS s Thayer South {MO AR TN} Thayer > Tennessee Yd

I have not yet figured out how to make this data temporal. In cases where a subdivision is inherited by a successor railroad, the SB abbreviation will often be preserved. The ancestral owner is important, because independent milepoint sequences from the ancestors will typically be preserved even though combined in a single sub. For example, the UP La Grande sub includes OWRN mileposts numbered east from Portland to Huntington, OR, and OSL mileposts numbered west from Granger to Huntington. That is, subdivision and milepoint alone are insufficient to uniquely identify a location.

LINK ATTRIBUTES (FIXED)

Variables are rated according to extent of coverage and reliability.

1. LID - Link ID. An 8-digit integer. First 2 digits are the state number, the following 5 a sequence number, and the last a subdivision number (almost always 0 in the current network because I do not yet see a need to preserve link ID ancestry between versions).

2. LTYP - Record type. Always blank in the current network. "D" used for deleted links in the raw network.

3. JA - A-node. The node number at the beginning of the link, a 7-digit integer.

4. JB - B-node. The endpoint node number.

5. MILES - The length of the link in miles. Invariably written with an explicit decimal point.
See the discussion of distances in the data model section for estimation techniques and reliability.

6. HDNG - Heading. Usually a compass heading, N, S, E, or W, from the endpoints, but it may match the timetable heading of the route the link is a part of.

7. ONEWAY. If the link primarily carries traffic in a single direction from A- to B-node, this flag will be a "1", otherwise blank. Directionality is seldom a complete prohibition against reverse movements; my practice is to penalize contra-flow movements with 20% greater impedance.

8. W0 - Owner. This is the railroad that administratively "owns" or controls, and is responsible for, the link. The most specific unambiguous ancestor of the current railroad is generally selected.

9. DV - Division. A 3-character abbreviation of the CURRENT owning railroad's division name. There is at present no list of these abbreviations, but they will eventually be included in the railroad ancestry file.

10. SB - Subdivision. The name of the linear route that the link is a part of, a 4-character abbreviation. The route may be called variously a subdivision, branch, line, or lead.

11. RT - Route. A 1-character line identifier of the separate milepoint sequences used within the subdivision.

12. TKID - Track number. Since this network does not attempt to resolve double track, TKID is only used when there are different alignments for multiple tracks of the same route.

13. MLC - Main line class. A subjective rating of line importance, and implicitly quality. It is primarily based on tonnage (eg, A-mainlines were originally defined to carry more than 20 million gross tons/year), but discounts slow or cheap commodities. It is the primary component of impedance measures that differentiate between routes. Codes in descending order are:

    A - A-main    G - A-branch
    B - B-main    H - B-branch
    C - C-main    X - non-freight

14. WT - Weight. Class of standard freight cars allowed in normal service.

    G - 220K lb    K - 241K    Q - 263K    R - 286K

15. HT - Height.
Overhead clearance adequacy.

    O - double stack
    F - plate F
    U - under plate F

16. GAUGE - Gauge.

    blank - standard
    E - electrified, standard
    C - cog
    N - narrow
    R - transit

17. TRKTYP - Track type.

    number (1 or higher) - number of multiple tracks on a through line
    A - main, number unknown
    S - siding
    P - spur (eg, industrial lead off a through line)
    F - car ferry
    Y - yard track
    B - main through yard
    R - transfer tracks
    T - station tracks
    Z - notional connector

18. GRADE - Access control.

    G - at grade
    F - controlled access
    S - in street
    E - uncontrolled
    T - tunnel(s)
    I - grade separated
    B - bridge
    U - underground
    H - snowshed

19. STATUS.

    K - active
    A - abandoned
    M - embargoed (rails exist)
    P - suspended (out of service, but reopenable)
    number - FRA rail class of active line

20. DENSTY - Traffic density. Only used in FRA-derived states, preserved 1997 data.

    0 - unknown
    1-7 - FRA assigned density class.

21. SIGNAL - Signal system, roughly descending order.

    S - automatic control system
    T - automatic train control
    U - automatic train stop
    I - ITC (AMTK incremental train control)
    C - centralized traffic control
    B - automatic block signals
    M - manual
    O - timetable and train order
    blank - unknown (primarily Mexico)

22. PASNGR - Passenger service.

    A - Amtrak
    C - commuter
    B - both Amtrak and commuter
    V - VIA (Amtrak takes precedence)
    R - transit
    S - scenic
    T - tourist (eg, dinner train)
    O - other
    X - former Amtrak route

23. MILIT - Military subsystem.

    S - STRACNET system
    C - STRACNET connector

24. LSRC - Source.

    blank - matchstick (Canada and Mexico)
    n - original QN network, 1:2M shape
    N - original QN, modified shape
    m - original QM network, 1:2M shape
    M - original QM, modified or transferred to QN (In Canada, 1:1M shape.)
    G - Directly from 100 m source (DLG, BTS)
    Q - 100 m accuracy, possibly modified
    T - TIGER/Line
    J - Surmised or estimated shape, generally superior to 1000 m.

25. LUPDAT - Edit history.

    blank - original
    C - edited in 1997-2001
    E - 2002-2003
    D - 2001-2002

26.
MPRAW - Raw milepoint field. In the raw network QN, milepoints are entered with codes that are easy to enter but hard to interpret. It is the task of an independent program to convert the codes into the uniform mileposting scheme seen in fields 27-29. The following schemes are often used, where

    a - A-end milepoint        > - increasing MP seq
    b - B-end milepoint        < - MP seq reverse of link dir
    v - milepoint increment

    "a > b"   "a < b"   "a b"     "a < "     end MP(s) given
    "a +v "   "a -v "   " -v b"   " +v b"    end MP w/ increment

A number without an explicit decimal point implies 2 decimal digits.

27. PTYP - Milepoint type.

28. PA - Beginning milepoint.

29. PDEL - Milepoint increment.

LINK ATTRIBUTES (USAGE)

At the tail of each record are a variable number of usage fields, each 5 characters long. The first character indicates the type of information, and the final 4 the information content. The fields are used to indicate dates of status change, and other railroads which may use the link. The types of usage data are:

1. d - date removed from service. If STATUS indicates the link is not active, the date following the "d" indicates when the link was last active.

2. e - date restored to service.

3. w - joint owner. The railroad following the "w" jointly owns or shares control of the line with the railroad in the W0 field. In the operational network QC, the W0 field is unchanged and used for identification only. The reporting marks of the current owning railroad (which will typically be a descendant of W0) will be added in a usage field.

4. u - effective operator. Used exclusively for freight lines that have been formally acquired by a public transit organization, but where the former owning railroad continues to operate freight service.

5. o - old owner. The owner of the link before the present one.
If the date of transition is not explicit, it may be assumed to be the last day of the old owner's corporate existence, before transition to its successor, or the first day of the present owner's, whichever is later.

6. s - switching rights with traffic generation, but no line haul trains operated.

7. t - trackage rights with traffic generation rights. Also used for passenger railroads that operate over the line.

8. r - trackage rights without traffic generation. That is, the railroad may run its through trains over the line, but may not switch, pick up, or deliver cars along the line.

9. h - haulage rights. One of the host railroads hauls the client railroad's trains with its own crews (and probably locomotives). Ultimately, it would make more sense to enumerate haulage rights trains with a list of their stops and hauling and client railroad in a separate file, rather than decorate the raw network with these difficult-to-maintain filigrees.

10. +/- - dates on which the preceding railroad's usage begins (+) and ends (-). For instance, if 3 consecutive usage fields read "rATSF", "+1993", "-96.5" it would indicate that the Santa Fe had trackage rights between 1993 (month indeterminate) and 1996 June. Under standard usage, a missing beginning or end date means the indefinite past or future, although as a practical matter there is no real attempt to track the history of these rights before 1993.

Since derivative networks describe a single date, this history is not carried over from the raw network. The railroad appears or is excluded from the usage list depending on whether it was active on that date.

NODE ATTRIBUTES

1. JTYP - Record type. Normally blank. "U" indicates a node unused by any link.

2. JID - Node ID. A seven digit integer. If internal to a state, the first two digits will be the FIPS code of the state. Otherwise (on a boundary), the number will start '00'.
International boundary nodes use the form '0009ssq', where 'ss' is the adjacent US state FIPS code and 'q' is a unique digit. Pseudo-FIPS codes of "88" are used for all of Canada and "91" for Mexico.

3. JZ - Node location.

4. JSTATE - Postal abbreviation of state. Used only if the node has been assigned a name.

5. STN - Station name. Not necessarily official; or, if it is an official station, the node may be merely near the actual location. The name may be used multiple times on different nearby railroads, or even at multiple nearby locations on the same railroad.

6. SPLC - Standard point location code. Generally carried over from the FRA network. Unlike the FRA data model, the same SPLC may be used at multiple nearby locations.

7. CROSS - Crossing code. Added on an exception basis.

    X - Diamond, no turns.
    T - Triangle. Turns allowed between all three incident links. If two incident links, trackage exists to allow train reversal.
    A - Turns allowed through any obtuse angle. (Usually used when multiple lines meet in a fan, or two lines from different railroads cross at a shallow angle.)
    E - All turns allowed.

8. JSRC - Source/history. Similar to links. Seldom used.

9. DIRCTN - Directional connections. Not used.

ROUTING NETWORK CONSTRUCTION (TYPICAL)

Addressing routing or traffic assignment problems essentially requires representing the rail system as a network, but one designed to capture the crucial features of the system's operation that are different from other modes. The following describes the steps we used to construct the railroad component of the CFS multimode network from the derived QC6C network.

1. Construct the logical link list. Pass through the physical link list, noting each railroad that has usage rights. For each railroad, add one more railroad-specific link to the logical link list. This is most conveniently done by appending the railroad's 4-character reporting mark to the physical link ID.
This is also an appropriate time to evaluate the impedance function for each link, since it may depend on the type of trackage rights. Crucial information about the logical link must be retained when it is different from the physical link. For instance, traffic generation rights will differ for each railroad. In general, the logical link's distance and geographic shape will need to be known as well, but this can be handled by a pointer to the block of attributes for the physical link, and so need not be repeated for each railroad.

2. Node list. Nodes must also be logically distinguished by the subnetwork they are in. The easiest means is again to append railroad marks to the node numbers to form new logical network node IDs and retain them in a list. The resulting network is a set of independent unconnected subnetworks, one for each railroad.

3. Interlines. Each active interline is represented by a single logical two-way link added to the link list. The link's 2 endpoints are in the subnetworks of the two railroads the interline connects. The interline attachment model I use selects an existing node in each subnet for attachment points, but a more sophisticated model could easily select a point interior to a link, and then divide (segment) that link.

4. Turn restrictions. If desired, angles can now be measured between incident links in each subnet. [For strategic or national-level problems these restrictions are rarely of much consequence, and were not used in the CFS.]

The resultant logical link list now describes a national routable network.

If you wish to use this logical network in the network module of a commercial GIS system, you must keep the following limitations in mind. Many GISes will construct network topology based on the assumption that links whose endpoints coincide geographically must be connected at that point. In this network it is very common for a physical node to be shared by multiple logical nodes on different railroads' subnets.
If the different railroads' links were directly connected, traffic could pass between railroads costlessly, short circuiting interlines. To avoid this, it may be possible to slightly dither the individual railroads' node positions. (And of course you must snap logical link locations as well. Now a common coordinate chain cannot be used for each daughter logical link.) This will also solve another common GIS problem: Refusal to accept logical interline links when they are of zero geographic length. The dither, hopefully, will be large enough to be outside the GIS's resolution limit, but not so large as to be obvious on displays or to corrupt proximity measures. Otherwise, you must avoid disrupting topology. In particular, do not CLEAN these networks--they're definitely non-planar.

FILES INCLUDED IN THIS DISTRIBUTION

State numbers use FIPS codes for US states, S01 for Alabama through S56 for Wyoming, plus S88 for Canada and S91 for Mexico. There are no files in this network for Hawaii (S15), Puerto Rico (S72), or any other US territory. Consequently there are 52 "state" divisions. The "2C" in the file names below is the version: 2002 December.

1. Raw QN network files, by state:

    a. Boundary nodes: QN2CS00.NDR
    b. State nodes: QN2CS01.NDR through QN2CS91.NDR
    c. State link locations: QN2CS01.LCR through QN2CS91.LCR
    d. State link attributes: QN2CS01.LLR through QN2CS91.LLR

Note that no state will be complete without the boundary node file, since state node files contain only nodes internal to the state. Unified North American files can be produced by simple concatenation.

2. Railroad ancestry file: WCONV.DAT

3. List of subdivision names: SUBDIV.TXT

4. Raw interline file: QN2C.ILN

5. Derived North American freight network current as of 2002 December ('2C'). This can be used as input for CFS intermodal network creation.

    QC2C.NDR  QC2C.LCR  QC2C.LLR  QC2C.ILN

6.
Network QC2C reformatted for:

    Arc/Info                            ESRI
    generate   Maptitude   MapInfo     Shapefiles
    QCU.NCA    QCUN.GEO    QCUN.MIF    QCUN.SHP    Node locations
    QCU.NDA    QCUN.TXT    QCUN.TXT    QCUN.DBF    Node attributes
                                       QCUN.SHX    Node SHP index
    QCU.LCA    QCUL.GEO    QCUL.MIF    QCUL.SHP    Link shapes
    QCU.LLA    QCUL.TXT    QCUL.TXT    QCUL.DBF    Link attributes
                                       QCUL.SHX    Link SHP index

The data dictionary required by Maptitude (*.DCC), which identifies attributes by position and type, may be constructed from the attribute list in the RHA program's .CFG file.

7. Network QN2C reformatted for:

    Arc/Info                            ESRI
    generate   Maptitude   MapInfo     Shapefiles
    QNU.NCA    QNUN.GEO    QNUN.MIF    QNUN.SHP    Node locations
    QNU.NDA    QNUN.TXT    QNUN.TXT    QNUN.DBF    Node attributes
                                       QNUN.SHX    Node SHP index
    QNU.LCA    QNUL.GEO    QNUL.MIF    QNUL.SHP    Link shapes
    QNU.LLA    QNUL.TXT    QNUL.TXT    QNUL.DBF    Link attributes
                                       QNUL.SHX    Link SHP index

Again, the RHAQN.CFG file is included to help users construct Maptitude's .DCC dictionary file for reading attributes.

8. Program source codes:

    RHA.FOR      Input: Q-format network
                 Output: reformatted for GIS input
    RHA.CFG      Configuration or command file to control RHA output (example). Contains a list of attributes to include.
    RQCFS.FOR    Input: Q-format raw network, interlines,
    RQCFS.CFG    Output: Q-format current network, interlines
    SUBS folder  Utility Fortran subroutines (incl RR ancestry mapping).
    QC.AML       Arc/Info macro for cover generation (example).

NOTES ON GIS IMPORTATION

1. Interlines are ignored in GIS reformatting.

2. Arc/Info. In Arc/Info, run the macro QC.AML. File names are hard coded in the macro, and assume the 4 input files (QCU.??A) are in the current directory. It will produce covers QCUNVD for nodes and QCULVD for links, with no map projection (decimal degrees). By replacing file names containing 'QCU' with 'QNU', the same AML will import the raw network provided the attribute lists are changed to correspond.

3. Maptitude.
For both nodes and links separately, build a map layer from the GEO file, an attribute table from the TXT files, and join them using the first fields (sequence numbers) for the join key. The user should construct the data dictionary file x.DCC that Maptitude uses to interpret the attribute fields in x.TXT. That list is contained in the RHA configuration file used to direct the creation of x.TXT.

4. MapInfo. For both nodes and links separately, import a layer from the MIF file. Besides geography, the MIF contains a description of the attribute table, but not its contents. Add the contents of the TXT file to populate the attribute table. (Warning: Since I have seen no documentation for the MIF, I used other MIFs as templates.)

5. The GIS files contain only selected attributes of the original data, but also include a number of derived attributes calculated by RHA from raw attributes.

    (a) FAMxx gives an integer "occupancy value" for railroad family "xx" in the operational network QC. (The list of families is at the top of WCONV.DAT.) For example, FAMCX is for lines used by CSXT. Values range from 8 for lines controlled by CSXT to 2 for CSXT haulage rights to 0 if unused by CSXT.
    (b) Character variables W1 and W2 in network QC give the reporting marks of current owners, and variables T1 through T4 give railroads with trackage rights (of any kind) on the link.
    (c) INCID is a count of the number of links incident to a node.
    (d) All character variables in DBF and comma-separated TXT files will have a leading underscore ("_") if they would otherwise be all blank.

ACKNOWLEDGEMENTS

Support for the development of this data model has come in small increments over a decade of evolutionary development. In most cases, the sponsor of the work was interested in the answers provided by an application rather than the data itself; this network was an attempt to find the means to answer the questions.
The Bureau of Transportation Statistics provided support for the initial development in 1994, enhancements in 1997, and a maintenance update in 2002 to simulate intermodal movements in the Commodity Flow Surveys of those years. The Federal Highway Administration also contributed to 1997 enhancements. The Federal Railroad Administration has contributed data resources, and in fact the basic architecture of the network, as it was built on their foundation. Finally, support from the Department of Energy, Civilian Radioactive Waste Management, in the 1980's provided the intellectual base that made it possible for me to design a model of the rail system.
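
ILLUSTRATIVE SKETCH: LOGICAL NETWORK CONSTRUCTION

Steps 1 through 3 under ROUTING NETWORK CONSTRUCTION (TYPICAL) can be sketched as follows. This is a minimal illustration in Python, not a transcription of RQCFS or of any ORNL code. The tuple layouts, the function names build_logical_network and shortest_impedance, the use of link miles as the impedance, and the interline penalty of 50 are all assumptions made for the example; the real impedance function depends on MLC and on the type of trackage rights. The node and link numbers are invented.

```python
import heapq
from collections import defaultdict

def build_logical_network(physical_links, interlines):
    """Build {logical_node: [(neighbor, impedance, logical_link_id), ...]}.

    physical_links: (lid, ja, jb, miles, [marks of railroads using the link])
    interlines:     (node_a, rr_a, node_b, rr_b, penalty)
    Both layouts are hypothetical simplifications of the Q-format records.
    """
    adj = defaultdict(list)
    for lid, ja, jb, miles, railroads in physical_links:
        for mark in railroads:
            # Step 1: one logical link per railroad with usage rights,
            # named by appending the 4-character mark to the physical link ID.
            logical_id = f"{lid:08d}{mark:<4}"
            # Step 2: logical nodes are (physical node, railroad) pairs, so
            # each railroad's links form an independent subnetwork.
            a, b = (ja, mark), (jb, mark)
            adj[a].append((b, miles, logical_id))
            adj[b].append((a, miles, logical_id))
    for node_a, rr_a, node_b, rr_b, penalty in interlines:
        # Step 3: each interline is a single two-way logical link joining
        # the subnetworks of the two railroads.
        a, b = (node_a, rr_a), (node_b, rr_b)
        logical_id = f"IL:{rr_a}-{rr_b}@{node_a}"  # made-up naming scheme
        adj[a].append((b, penalty, logical_id))
        adj[b].append((a, penalty, logical_id))
    return adj

def shortest_impedance(adj, origin, destination):
    """Plain Dijkstra search; returns total impedance, or None if unreachable."""
    best = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, impedance, _ in adj.get(node, ()):
            new_cost = cost + impedance
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return None

# Two physical links; UP has rights on both, BNSF only on the first.
physical_links = [
    (30000010, 3000001, 3000002, 10.0, ["BNSF", "UP"]),
    (30000020, 3000002, 3000003, 5.0, ["UP"]),
]
# One interline between BNSF and UP at the shared physical node 3000002.
interlines = [(3000002, "BNSF", 3000002, "UP", 50.0)]

adj = build_logical_network(physical_links, interlines)
print(shortest_impedance(adj, (3000001, "BNSF"), (3000003, "UP")))
```

In this toy network, traffic starting on BNSF can reach the UP-only link only by way of the interline, so the route costs 10 + 50 + 5 = 65. Removing the interline leaves the two subnetworks unconnected even though they share physical node 3000002, which is the behavior a GIS would destroy by connecting links at coincident endpoints.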