The Low Income Investment Fund (LIIF) and Enterprise Community Partners (Enterprise) are partnering with Living Cities to better understand how to preserve affordability and diversity in high-opportunity, transit-accessible areas, and we’ve turned to Big Data to help us answer this question.

This is the second in a series of four posts about how to navigate the realm of data, big and small, to connect low-income people to the jobs and amenities they need. Read Blog 1.

There is no shortage of data. At every level (federal, state, county, city, and even within our own organizations), we are collecting data and trying to make use of it. “Data” is a catch-all term that suggests universal access and easy use. The problem? In reality, data is often expensive, difficult to access, created for a single purpose, quickly changing, and difficult to weave together. To aid and inform future data-dependent research initiatives, we’ve outlined the common barriers the community development field faces when working with data and identified three ways to overcome them.

Common barriers include:

  • Data often comes at a hefty price. County tax assessors’ property information alone can cost $10,000 or more—a prohibitive cost, especially for low-budget nonprofits and community-development organizations.

  • Data can come with restrictions and regulations. Public agencies can run into legal restrictions that prevent them from sharing certain information, like owner name. Moreover, use policies sometimes restrict sharing datasets with professional peers, making the process of getting data cumbersome.

  • Data is built for a specific purpose, meaning information isn’t always in the same place. The county tax assessment roll offers some incredibly useful property information, including the number of units, building square footage, and year built. But the dataset is built for tax appraisal and billing, which means that different parts of a mixed-use building may not appear on the same tax rolls (if they are appraised by different formulas) and that tax-exempt properties (like publicly or church-owned buildings) may have missing information.

  • Data can actually be too big. National datasets, like CoStar or RealFacts, provide great information, but at a scale that’s more useful to institutional investors and national organizations. Private firms like these often stop collecting information on properties with fewer than 50 units, meaning they don’t provide fine-grained local or neighborhood intel.

  • Data gaps exist. Even if you gain access to a dataset, there may be missing data fields, broken links, or different, if not contradictory, information across comparable datasets. It takes time (and money) to clean up data and to piece together information from distinct jurisdictions (see the sketch after this list).

  • Data can be too old. The subset of data that’s accessible and useful to community-development organizations is often public and/or old. But even data just one year old can be obsolete in today’s age of rapid neighborhood change. Our largest challenge in the affordable-housing field is that we’re competing against quickly moving market forces. Private actors with sophisticated data staff and robust budgets can access the most up-to-date, comprehensive information, giving them an advantage in this work. In resource-constrained environments, however, we must rely on cheaper, older data and/or on partners who share data for free.
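
As a concrete (if simplified) illustration of that cleanup work, here is a minimal sketch in Python using the pandas library. The file names and column names are invented for this example; real assessor rolls vary widely by county.

```python
import pandas as pd

# Hypothetical example: stitching together assessor rolls from two
# counties whose schemas differ. File and column names are invented.
county_a = pd.read_csv("county_a_roll.csv")  # columns: APN, Units, SqFt, YrBuilt
county_b = pd.read_csv("county_b_roll.csv")  # columns: parcel_id, unit_count, bldg_sqft, year_built

# Step 1: normalize the two schemas to shared column names.
county_a = county_a.rename(columns={
    "APN": "parcel_id", "Units": "units",
    "SqFt": "sqft", "YrBuilt": "year_built",
})
county_b = county_b.rename(columns={"unit_count": "units", "bldg_sqft": "sqft"})

# Step 2: combine the two rolls into one dataset.
combined = pd.concat([county_a, county_b], ignore_index=True)

# Step 3: flag gaps (e.g., tax-exempt parcels with blank unit counts)
# rather than silently dropping them, so missing data stays visible.
combined["units_missing"] = combined["units"].isna()
print(combined["units_missing"].mean())  # share of parcels with no unit count
```

Even in this toy case, nothing lines up until someone maps the schemas by hand; multiply that across dozens of jurisdictions, each with its own quirks, and the time and cost add up quickly.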

Let’s say you are able to identify data that defies the challenges outlined above. Now you have to think about the barriers the public sector may face in accessing and sharing it. For example, when we needed access to public datasets in the Bay Area for our project, we relied on the relationships we had built with government officials through our local offices and years of working in the area. One of those relationships was with Wayne Chen from the City of San Jose, and even though Wayne is the Acting Division Manager for the Department of Housing, he still had to go on a scavenger hunt to get us the information we were looking for. He had to find the right contact in the assessor’s office, identify the specific data fields from the roll that we wanted (which had changed since the last set), put in a new request, create an account to access the data, and then get permission from his legal team to share it with us. On top of all that, government officials are often doing this work pro bono alongside their other responsibilities, which can delay the process.

As you can tell, there can be many complications when it comes to working with data, but there is still great value in having and using it. We’ve found a few ways to overcome these barriers when scoping a research project:

1) Prepare to move to “Plan B” when the answers you need aren’t readily available in the data. It is incredibly important to be able to react to unexpected data conditions and to use proxy datasets when necessary in order to efficiently answer the core research question.

2) Build a data budget for your work. You shouldn’t anticipate that public entities or private firms will give you free data (nor that community development partners will be able to share datasets used for previous studies).

3) Identify partners early, including local governments, brokers, and community development or CDFI organizations; they are crucial to collecting the information you’ll need.

Keep an eye out for our third post, coming tomorrow, which will dive into why people and human knowledge are still a crucial component of research. And follow along on social media with #ConnectUS!


Special thanks to Erin Austin for her contributions to this blog post.

Photo: Sound Transit System Map by Oran Viriyincy, Flickr. CC BY-SA 2.0.