I love data. And for those who love to tell stories, myself included, data can be a wonderful tool. Especially when that data is freely available to the public.
Sometimes, it isn't easy to see the bigger picture, among the mess of arbitrary column names and arcane tech specs. But records are like threads in a tapestry: once you understand how they complement one another, the vision they weave becomes clear.
Ordinarily, I reserve this blog for technical topics. Today, however, I wanted to take some time to talk about public data. It also happens that I love NYC & real estate, which I've had the privilege of working with over the last two years. So I thought I'd share a little about the wonderful world of NYC public data, and how it might inspire your next project.
The background of NYC public data
With a growing emphasis on transparency and public access to data under the Bloomberg administration, the New York City government has grown its offerings of public datasets and services under its Open Data initiative. This program represents a collaboration of various city departments, from which both single-source and composite datasets are released to the public, typically on a semi-annual basis. From its website, you can access a huge number of datasets in a variety of different formats and verticals.
But to understand the data, let's first talk about the players, and what part they play in providing it.
Department of Information Technology & Telecommunications (DoITT)
DoITT is the technical arm of the New York City government. They run the servers, provide technical access to the public data, and manage many of the web services the NYC government provides (e.g. the city map.)
However, they are not responsible for the actual data itself, which comes from the respective government departments. All the same, in my experience, the folks at DoITT have been very helpful and knowledgeable about much of the data they service. Contacting them with questions about the data is usually a good place to start.
Mayor's Office of Data Analytics (MODA)
Complementing the technical side of the operation is the city's data team, MODA. This department, established by the Bloomberg administration, tends to focus primarily on policy research and data sharing within the New York City government. Although I never personally crossed paths with anyone in this department whilst working with NYC Open Data, they do play a significant role in the program.
I like stories. But what stories can this data tell?
If you're like me, you're probably more interested in what you can actually use this data for. It turns out there are many uses: particularly for research, pet projects, and startups. In fact, there are too many specific applications to cover in this post (that I leave to you, dear reader, and your imagination.)
However, I do have a lot of experience working with public data on the real-estate side. Drawing on that experience, I'd like to cover several broad categories, the datasets related to each, and how they might be relevant to your interests.
If you're interested in buildings, you'll want to take a look at the datasets from the Department of City Planning (DCP) and the Department of Buildings (DOB).
- List of buildings: The DOB assigns every building a building identification number (BIN). Take a look at PAD, issued by the DCP, which contains a list of all structures, their BINs, lots and addresses.
- Building details: Although the DOB has a wealth of information as a part of the Building Information Search (BIS) system, I haven't found a copy of building details available for download. However, the PLUTO dataset does offer basic descriptions for lots that might prove useful.
- Building shapes: Check out this dataset from Open Data. It could be useful for maps or other visual aids.
- Lot shapes: Similarly, PLUTO optionally includes shape data for each lot it describes.
- Permits & construction: A list of permits by individual property is available in BIS, but not for download. However, new permits and status changes to existing permits are issued in weekly and monthly digests from the DOB, which could be downloaded and merged to create a rolling collection.
- Complaints: Also available via BIS, and also not available for download. Yet also available in weekly and monthly digests from the DOB.
- Violations: These are not typically managed by the DOB, but by Housing Preservation & Development (HPD), which has a search page for violations. This dataset is available for download, and is updated monthly.
- Affordable housing: Also known as "inclusionary housing". Zones for eligibility are roughly described in this dataset provided by the DCP.
- Zoning: A comprehensive list of zones is available from the DCP, which is updated monthly. You can also find some really awesome explanations and visual aids about the nuances of NYC zoning law on the DCP website.
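For the permits idea above, here is a minimal sketch of how a rolling collection might be built from successive digests. The field names (`permit_id`, `status`, `updated`) and the sample records are made up for illustration; the real DOB digests have their own layout, so treat this as a pattern rather than a parser.

```python
def merge_digests(collection, digest):
    """Fold one digest into the rolling collection.

    Keeps the most recently updated record for each permit. Dates are
    assumed to be ISO strings (YYYY-MM-DD), which sort correctly as text.
    """
    for record in digest:
        key = record["permit_id"]
        current = collection.get(key)
        if current is None or record["updated"] >= current["updated"]:
            collection[key] = record
    return collection


# Two hypothetical weekly digests: P-2 changes status in the second week.
week1 = [
    {"permit_id": "P-1", "status": "ISSUED", "updated": "2015-01-05"},
    {"permit_id": "P-2", "status": "IN PROCESS", "updated": "2015-01-06"},
]
week2 = [
    {"permit_id": "P-2", "status": "ISSUED", "updated": "2015-01-13"},
    {"permit_id": "P-3", "status": "IN PROCESS", "updated": "2015-01-14"},
]

rolling = {}
for digest in (week1, week2):
    merge_digests(rolling, digest)
```

After both digests are merged, `rolling` holds one record per permit, with P-2 reflecting its later status. The same approach should work for the complaints digests, given a suitable key.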
For those interested in mapping an address or set of coordinates to nearby structures or zones, you're in luck! Well, kind of anyways. Data support for geocoding as a public service is a more recent development by DoITT and the DCP.
The official variety of geocoding comes in two major flavors:
- Web API: Through its Developer Portal, DoITT has provided a basic geocoding API service. Web developers rejoice! For small scale use, and simple needs, this may do the trick. But if you plan on hammering it with many requests, get in touch with the folks at DoITT first.
- Desktop application: This application was originally written for a mainframe system. Yes, you heard me: a mainframe. It's that old. And while the mainframe version still powers the API, they finally released a desktop version in both Windows and Linux flavors for general use. It comes loaded with a GUI and all the data you need for geocoding; however, it wasn't really written with services in mind. (But if you do manage to make it through its tome of a user guide, and manage to make it usable as a service, please let me know!)
However, if your needs are not satisfied by the official solutions, you may need to make your own. It isn't simple work, but it is possible!
- Finding specific buildings: To find a building from an address, one could combine the SND and PAD datasets.
- Verifying addresses: Using the same strategy, if you find a matching record, you know the address is legitimate! You can also use PAD to detail specific qualities of an address (e.g. address ranges, parity, vanity, etc.)
- Plotting addresses on a map: If you could find a lot using the above strategy, then you could also cross reference it against the PLUTO dataset to receive a set of map coordinates. (Although, you'd likely have to translate those coordinates to lat/long first.)
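To make the SND-plus-PAD strategy a little more concrete, here is a rough sketch of the lookup. Everything in it is an assumption for illustration: the record layouts, field names, street codes, and sample values are invented, and house numbers are treated as plain integers, which is a simplification. The idea is simply that a street name resolves to a street code, and the street code plus house number resolves to a lot and building.

```python
# Hypothetical, simplified stand-in for SND: street name (or alias) -> street code.
SND = {
    "BROADWAY": "S001",
    "BWAY": "S001",  # an alias resolving to the same street code
    "EXAMPLE STREET": "S002",
}

# Hypothetical, simplified stand-in for PAD: address ranges per street code.
# parity is "O" for odd house numbers and "E" for even ones.
PAD_ADDR = [
    {"street_code": "S001", "low": 1, "high": 9, "parity": "O",
     "bbl": "1000010001", "bin": "B0000001"},
    {"street_code": "S001", "low": 2, "high": 10, "parity": "E",
     "bbl": "1000020001", "bin": "B0000002"},
]


def normalize(street):
    """Uppercase the street name and collapse repeated whitespace."""
    return " ".join(street.upper().split())


def find_address(house_number, street):
    """Return the matching PAD record, or None if the address is unknown."""
    code = SND.get(normalize(street))
    if code is None:
        return None
    parity = "E" if house_number % 2 == 0 else "O"
    for record in PAD_ADDR:
        if (record["street_code"] == code
                and record["parity"] == parity
                and record["low"] <= house_number <= record["high"]):
            return record
    return None
```

A result of `None` doubles as a failed address verification, while a match hands back the lot (BBL) that could then be looked up in PLUTO for coordinates.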
And here we find ourselves squarely in DOF territory. Huzzah.
- Assessments: Basic assessment values can be found in the PLUTO dataset if you only need a casual assessment value per lot. For individual, detailed assessments, one could use the property search feature. For bulk data, some information is available on NYC Open Data.
- Exemptions & abatements: Similarly, basic exemptions are in PLUTO, and exemptions on an individual basis can be found via search. Beyond that, however, I have not found a dataset including all tax exemptions.
- Condominiums: The best dataset I've seen for a list of all condominiums is the DOF's Real Property Assessment Data (RPAD). Unfortunately, the last copy I found in the wild was years old, and I fear it might now even be extinct.
Mortgages & Closings
Also in the ballpark of DOF, look no further than the Automated City Register Information System (ACRIS) for many of the financial documents for property within the city. Its search feature can be used to find deeds, mortgage papers, and closing documents that offer a wealth of both financial & ownership information.
Let's talk about the actual data...
Under the realm of property & real estate, there are tons of rich data sets provided by the NYC government, primarily from the Department of City Planning (DCP), the Department of Buildings (DOB), and the Department of Finance (DOF).
However, these departments typically don't speak the same language when it comes to property, and certainly not the same language as the average NYC resident. They instead relate to one another in terms all government bodies understand...
...which means we have to talk about tax.
So for us to understand this data, we need to briefly cover how the government taxes property in NYC. (I'll try to spare you most of the details.)
The city bases all of its property tax billing, and consequently its description of property, around lots. Each of these lots is assigned a unique Borough-Block-Lot (BBL) number: a 10-digit code that looks like 1-00001-0001.
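Since BBLs turn up in nearly every dataset, it helps to have a small helper for moving between the dashed and the compact 10-digit forms. Here is a minimal sketch, assuming the 1-5-4 digit split shown in the example above; the `BBL` type and `parse_bbl` function are names I've made up for illustration.

```python
from typing import NamedTuple


class BBL(NamedTuple):
    borough: int  # 1 digit
    block: int    # 5 digits, zero-padded
    lot: int      # 4 digits, zero-padded

    def dashed(self) -> str:
        """Display form, e.g. 1-00001-0001."""
        return f"{self.borough}-{self.block:05d}-{self.lot:04d}"

    def compact(self) -> str:
        """10-digit form with no separators, e.g. 1000010001."""
        return f"{self.borough}{self.block:05d}{self.lot:04d}"


def parse_bbl(text: str) -> BBL:
    """Parse either the dashed or the compact form into a BBL."""
    digits = text.strip().replace("-", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected a 10-digit BBL, got {text!r}")
    return BBL(int(digits[0]), int(digits[1:6]), int(digits[6:10]))
```

For example, `parse_bbl("1-00001-0001").compact()` gives `"1000010001"`, which is handy when different files store the same lot in different forms.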
But, as with all tax code, it isn't that straightforward. There are two major structures for real-estate tax, each of which changes how lot numbers are assigned:
- Traditional: This is your age-old tax scheme. A plot of land is assigned a BBL, and regardless of what's on it, or who lives on it, the city bills the owner of the land (typically a co-op LLC.) This means under this tax scheme, all structures on this plot of land have the same BBL.
- Condominium: This is the new school tax scheme. Instead of one BBL, the land is assigned a single billing BBL, then each individual condominium unit on that land is assigned its own unit BBL. Taxes are billed to each individual unit's owner.
Tax code is boring. What about those datasets?
Among the many offered datasets, let's break down some of the most critical ones, which are generally useful in understanding the landscape of NYC real estate.
Street Name Dictionary (SND)
The Street Name Dictionary (SND), managed by the DCP, contains a list of every NYC street. Within it, each street is assigned a name and a code, which can be linked to addresses (see PAD.)
The data itself illustrates an interesting challenge, though: some streets have multiple aliases, change names, or have preferred abbreviation schemes. The DCP addresses this with a clever set of rules built on street codes, detailed in their user guide.
Digression aside, this brings us to...
Property Address Dictionary (PAD)
The PAD dataset, managed by the DCP, describes every known NYC address, and links them to buildings, lots, streets, and zip codes. It acts as the 'glue' for many of NYC's property data sets, and for this reason, it's the most important. It can be used to power a geocoder, address verification system, or other NYC-specific property tech. Unfortunately, PAD does not include latitude/longitude coordinate pairs for plotting addresses on a map, but this can easily be rectified by comparing an address to the PLUTO dataset (more on this later.)
PAD itself consists of two major files:
- PAD ADDR: The file containing actual addresses and their references to those buildings, lots and streets mentioned earlier.
- PAD BBL: Another file that lists all the city's lots, including the composition of condominium unit lots into condominium billing lots. Useful for translating lot numbers.
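As a rough illustration of that lot translation, here is a sketch that maps condominium unit lots to their billing lot. The column names (`unit_bbl`, `billing_bbl`) and the sample rows are invented; the actual PAD BBL file has its own layout, so the loading step would need adapting.

```python
import csv
import io

# Made-up sample rows. An empty billing_bbl means the lot is not a condo unit.
SAMPLE = """unit_bbl,billing_bbl
1000011001,1000017501
1000011002,1000017501
1000020001,
"""


def load_billing_index(handle):
    """Map each condominium unit BBL to its billing BBL."""
    index = {}
    for row in csv.DictReader(handle):
        if row["billing_bbl"]:
            index[row["unit_bbl"]] = row["billing_bbl"]
    return index


def billing_bbl_for(bbl, index):
    """Return the billing BBL for a unit lot, or the BBL itself otherwise."""
    return index.get(bbl, bbl)


index = load_billing_index(io.StringIO(SAMPLE))
```

With this in place, both sample units resolve to the same billing lot, while a traditional lot simply maps to itself.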
Rumor has it, there's also a mystery file called TPAD (or transactional PAD), which contains small updates to this city data between its semi-annual releases. But I've yet to get my hands on it, as it would seem the DOB is playing this card close to its chest.
Primary Land Use Tax Lot Output (PLUTO)
PLUTO is a Frankenstein of the NYC lot data sets, which contains lot and some building descriptions sourced from the DCP, DOB, and DOF. Unlike some of the other data sets, it describes lots and buildings in more detail, with things like lot type, zoning, assessed taxes, height, number of units, etc. Another awesome feature of this data set is that it optionally includes geographic shape data for its lots, which can be very useful for those with visual applications in mind.
Unfortunately, this data set only describes property at the lot level: some of the richer information about buildings can be inaccurate, or just plain wrong, when it describes a lot with multiple buildings, so take it with a grain of salt.
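One way to act on that caveat is to separate lots whose building-level fields can be taken at face value from those that need a second look. This is only a sketch: the field names (`num_buildings`, `num_floors`) and the sample records are assumptions, not the actual PLUTO column names.

```python
# Hypothetical lot records with a per-lot building count.
lots = [
    {"bbl": "1000010001", "num_buildings": 1, "num_floors": 5},
    {"bbl": "1000020001", "num_buildings": 3, "num_floors": 12},
]


def building_fields_reliable(lot):
    """Building-level fields are only unambiguous when a lot holds one building."""
    return lot["num_buildings"] == 1


trusted = [lot["bbl"] for lot in lots if building_fields_reliable(lot)]
needs_review = [lot["bbl"] for lot in lots if not building_fields_reliable(lot)]
```

Here the three-building lot lands in `needs_review`, since a single floor count can't describe all three structures.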
Real Property and Assessment Data (RPAD)
This data set is published by the Department of Finance, and contains rich information about condominium units, mostly with regard to tax assessments (which themselves often contain lots of really interesting facts.)
Unfortunately, this particular data set is rather elusive still, and I have only seen one very old copy of the data. (I will update this section with more information if I find it.)
Other useful resources around the web
If you're interested in more of what NYC has to offer, check out some links to other useful resources I've found & used:
- NYC Open Data Blog: The NYC Open Data program runs a blog with some interesting stories, which can offer inspiration.
- City Map: The official city map from the NYC government. It's kept up to date regularly, and allows you to search by address, lot or building identification numbers (BINs.)
- City Map Blog: The guys that maintain the NYC map also run a blog with lots of rich, interesting info about NYC maps and data.
- Building Information Search (BIS): The DOB's searchable database of every structure in New York City.
- Property Tax Roll: This system is great for looking up a full tax history on any NYC lot.
- ACRIS: Search for mortgage, closing, and other financial documents linked to a property. Great for the savvy homebuyer.
- Exemptions & Abatements: Searchable list of properties with tax exemptions and abatements.