How much is your entertainment time worth to you?

Cost of media

  • Go see a 2-hour movie: it’ll cost you about $13 = $6.50/hour
  • Download episode 1 of Mad Men Season 7 (47:44) in standard definition for $1.99 from Apple = $2.50/hour
  • Listen to 7,800 minutes of Spotify in 2015: my $9.99/month subscription costs $119.88/year ÷ (7,800 minutes ÷ 60) = $0.92/hour
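The per-hour figures above are just price divided by running time. Here is a small Python sketch of that arithmetic (the function name is made up for illustration). It also shows where the Mad Men number comes from: 47:44 is about 47.73 minutes, which works out to roughly $2.50/hour, while rounding the episode down to 47 minutes gives about $2.54/hour.

```python
def cost_per_hour(price_usd: float, minutes: float) -> float:
    """Price divided by running time expressed in hours."""
    return price_usd / (minutes / 60)


# Figures from the list above; the movie ticket is the post's "about $13"
examples = {
    "2-hour movie": cost_per_hour(13.00, 120),
    "Mad Men S7E1 (47:44)": cost_per_hour(1.99, 47 + 44 / 60),
    "Spotify, 2015": cost_per_hour(9.99 * 12, 7800),
}

for name, rate in examples.items():
    print(f"{name}: ${rate:.2f}/hour")
# 2-hour movie: $6.50/hour
# Mad Men S7E1 (47:44): $2.50/hour
# Spotify, 2015: $0.92/hour
```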

Suppose you listen to 1 hour of podcasts each day (a 30-minute commute each way). Over a typical 5-day American work week, with the standard paltry 2 weeks of vacation, your listening would total 250 hours/year. If you paid $9.99/month for that entertainment, it would work out to about $0.48/hour, which is about 5% of the pre-tax hourly wage of someone making $20,000 a year.
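The same arithmetic for the podcast scenario, as a sketch. The 2,080-hour work year (40 hours × 52 weeks) used to turn a $20,000 salary into an hourly wage is an assumption, not something stated above. With 250 listening hours the subscription costs about $0.48/hour; spreading the same $119.88 over 52 weeks (260 hours) would give about $0.46/hour.

```python
HOURS_PER_DAY = 1                 # a 30-minute commute each way
WORKDAYS_PER_WEEK = 5
WORK_WEEKS_PER_YEAR = 52 - 2      # two weeks of vacation

listening_hours = HOURS_PER_DAY * WORKDAYS_PER_WEEK * WORK_WEEKS_PER_YEAR   # 250
annual_cost = 9.99 * 12                                                     # $119.88
cost_per_listening_hour = annual_cost / listening_hours                     # ~$0.48

# Assumption: a full-time year of 40 hours/week for 52 weeks (2,080 hours)
salary = 20_000
hourly_wage = salary / (40 * 52)                                            # ~$9.62
share_of_wage = cost_per_listening_hour / hourly_wage                       # ~5%

print(f"{listening_hours} h/yr, ${cost_per_listening_hour:.2f}/h, "
      f"{share_of_wage:.1%} of hourly wage")
# 250 h/yr, $0.48/h, 5.0% of hourly wage
```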

Just stuff that I end up thinking about on a random Saturday night.

Christmas Card

“Brush your teeth, get your jammies on and then you can open one of your Christmas cards.” A simple, reasonable request, not complicated and one that you might not expect would leave your heart hurting, and so tired that you just want to give up.

Oh I wish she would have brushed her teeth. Screaming, rage directed at Kim, Ella, everyone, no one. Violent. Be calm, yelling at her amplifies the situation, she’ll cry more, I’ll cry, everyone will cry, I’ll hurt. Walk toward her, approach gently, at her level, with a soft understanding face. Offer comfort, try and guide her out of the corner. Hateful language, directed at me.

As gently as possible get her to bed. Lift surprisingly strong fingers from their grip of door frames, hoping that small hands aren’t pinched, just making it worse. Hope that her flailing head doesn’t smash into a wall. Wish that you could just sit down and watch Jeopardy, that you could exhale and enjoy a conversation with an adult. Know that the level of stress in your partner is increasing along with yours. Restrain yourself, wish that this small terror and life irritant would please stop, knowing that it will impact the remainder of your evening, those precious 2 hours that you get between kid bedtime and yours.

Wonder why we even bothered buying any Christmas presents, attempting to bring joy and excitement to someone who has no interest in you having either.

Place her in bed. Tell her to stay, knowing that she won’t. Gently bring her back into bed 30+ times, not saying anything, realizing that it has become a game for her, figuring out how to make it one for myself. Knowing that I will win, but I won’t enjoy it. She understands that she has lost, but we aren’t done.

I rub her head, listening to her tell me how she doesn’t have any friends at school. Is this the source of tonight’s anger? She tells me how this friend and that are friends, how no one likes her. Inaccurate, of course, but telling her that would just make it worse. Continue to listen, rub, console, comfort. Feel for her, know that her problems are real. Want to be out in the living room. Ask her to do something for her buddy, to go to sleep, eyes closing, long lashes fluttering, feeling hopeful about this promise to sleep.

1 hour. Handled that well, proud of myself. I’m a good parent. Allowing myself to relax, come down from the stress. “Dad!” She’s back up. Back to the bed game again. Repeat. She understands she has lost again and her attitude shifts, anger.

Faced with pure malice, evil, I somehow find it hard to recall the fun we had playing video games earlier in the day. 1.5 hours in I sit at the threshold of her door, then crumple to the floor, telling her how much my heart hurts. It does, a crushing, squeezing feeling, and I clutch my chest. I make the mistake of telling her, as I lie in the fetal position, that she is hurting my heart. Her reply: “Your heart doesn’t really hurt, you aren’t going to die, but if you do, I don’t care.”

I roll over, on my back and just give up. She covers me with a blanket, gets her own pillow and blanket and curls up with me on the floor. She’s asleep.

Marketing your podcast, thoughts

This is a repost of a comment that I left on the /r/podcasts subreddit about how to spend $500 to market a podcast. My response assumed that the primary source of revenue for the show is advertising. The ROI does not work out if that is your only source of revenue, though if you are using your podcast as lead generation for more expensive services, the customer acquisition costs might be worth it.


I’m assuming that you will continue to focus on improving the quality of your show, production and guests, and that this money is for marketing, advertising and promotion purposes only.

I’m also assuming that you are making more than $500/month in advertising revenue. At a $25 CPM (i.e., $25 per thousand downloads), you are earning $0.025 per listener download. Each listener would need to download 100 episodes before you’d make back a $2.50 listener acquisition ad spend.
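To make the CPM arithmetic explicit, here is a sketch in Python (the function names are illustrative). CPM is the rate per thousand downloads, so a $25 CPM is $0.025 per download; the same numbers also imply that earning $500/month in ad revenue takes about 20,000 monthly downloads.

```python
def revenue_per_download(cpm_usd: float) -> float:
    """CPM is quoted per 1,000 downloads, so divide by 1,000."""
    return cpm_usd / 1000


def downloads_to_recoup(acquisition_cost_usd: float, cpm_usd: float) -> float:
    """Downloads one listener must generate to pay back what it cost to acquire them."""
    return acquisition_cost_usd / revenue_per_download(cpm_usd)


CPM = 25.0
per_download = revenue_per_download(CPM)               # $0.025
downloads_needed = downloads_to_recoup(2.50, CPM)      # 100 episodes per listener
downloads_for_500_a_month = 500 / per_download         # 20,000 downloads

print(f"${per_download:.3f}/download, {downloads_needed:.0f} downloads to recoup, "
      f"{downloads_for_500_a_month:,.0f} downloads for $500/month")
# $0.025/download, 100 downloads to recoup, 20,000 downloads for $500/month
```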

Given that you might be willing to spend $500 each month on advertising I’d guess that you have somewhere in the range of 15,000-20,000 monthly downloads. Encouraging your existing listener base to promote your show might be the most effective method to grow the audience. Find a way to identify your most dedicated and passionate listeners and provide them with shareable marketing materials. Help those dedicated listeners build a street team that promotes your show to others.

I could provide more specific advice if I knew more about your motivations for creating the show, but some broad things to consider:

  1. Who is the potential audience for your show? Really nail this down to a small set of archetypes. “Whitetail deer hunters who are looking for advice.” or “Librarians who are looking for book reviews.” The more specific, the easier it will be to target a potential new listener with ad copy that addresses their needs.
  2. Does that audience listen to podcasts? If you are attracting a new audience, will you need to explain what a podcast even is? Advertising or cross-promotion on other podcasts means that you don’t need to explain what a podcast is, but then you are competing with other podcasts for that potential listener. Facebook and Google Ads are going to reach an audience that might not be familiar with podcasting; asking them to subscribe is a pretty high bar.
  3. Partner with adjacent media properties/companies. Are there established companies that are already doing something similar, but have yet to develop a podcast? Develop a relationship with those companies, pitch them on why a partnership and cross-promotion might amplify both brands.
  4. Are there blogs that are relevant to your topics and show? Find bloggers that might be interested in the hosts or topics. Give them advance notice of the show, write mini-press releases for them so that they have less to write on their own.


  1. Invest in the quality of your show. No brainer.
  2. Invest in visual branding. Good design signals quality. Quality develops trust. Advertising to a new audience means that you have to establish enough trust that a potential listener will be willing to take a chance on your show. The goal is to get them to listen to your show, guide them into that pathway with strong visual elements and copy. Do you have a strong wordmark and logo? Are there high quality photographs of hosts, guests?
  3. Create video teasers/trailers/commercials. This isn’t the same thing as just posting the audio for your show on YouTube. Create a commercial, a 30-second clip that highlights why someone should visit your website. Use YouTube annotations (https://creatoracademy.withgoogle.com/page/lesson/get-your-viewers-to-act-with-annotations-and-descriptions?hl=en) to drive people to your website, where you can encourage them to listen to your program.
  4. Ensure that you have your own website and domain. Telling someone to visit www.yourshow.com is much easier than telling them to search for your show and subscribe on iTunes, Overcast and the dozens of other apps. Once you have them there you have much more control over how you communicate with them. When I worked at PRX I spoke with the Radiotopia producers about how to convert website visitors into listeners: http://www.slideshare.net/prx/radiotopia-convert-your-visitors-slides-49743460 The slides were well received.
  5. Create landing pages that are tailored to new audiences and are specific to the ad campaigns. I’ve used Unbounce http://unbounce.com/ as the landing page location for my Google Ads.

Suggested approach for ad copy: for any ads that you decide to run, have the call to action encourage people to visit your website. Don’t tell them to subscribe to your podcast. Subscribing to podcasts is still a pain, and it’s confusing.

Turn and face the strange

It’s time for a ch-ch-ch-ch-change. After an amazing 10 years at PRX I’m leaving my job as Chief Product Officer to rest up a bit, reflect for a while and explore other opportunities. My last day will be November 6th.

Follow your interests

Many people know that I really enjoy taking photographs, but what might not be as well known is that I was a heavy and early Flickr user back in 2004–2006. I was an active and engaged participant in the community before Yahoo bought it and things took a different turn, and before Instagram was even dreaming of its first filter. I loved Flickr and the community that it helped foster.

In January of 2006 I posted a photo of my six-month-old Ella’s tummy. She ended up with a nasty skin rash after wearing an American Apparel outfit. The support from the community and increased visibility for the issue changed how the company interacted with me. That event helped me understand how power shifts when people are able to speak directly with each other and can reach a large audience without intermediaries.

While I was taking thousands of photos, I was hanging out creating software at The MITRE Corporation, which operates multiple Federally Funded Research and Development Centers, or FFRDCs (what a mouthful). A former co-worker and good friend, Nathan, kept telling me about an awesome organization called PRX, where he had just started, and how it might align with my interests.

Embrace change

I met with Jake and Steve to learn about what PRX was doing and it stoked my imagination. I remember long conversations with my wife Kim, describing PRX, the potential, the opportunities and the risk. We had a six-month-old, and Kim was at home taking care of Ella, with me in the role of breadwinner, working at MITRE in a reliable, well-paying job. The kind of job that people rarely left.

For Audrey

The benefits at MITRE were amazing, the work could be interesting and challenging, and many people felt connected to it. The quarterly retirement statements displaying how many times over a millionaire I might be certainly made leaving tough. It was hard looking at a chart with eight figures on it and walking away to the unknown. My father called them the Chains of Gold, and breaking them was hard.

Leaving a 5,000 person company to take a pay-cut at a small, newly formed, mission driven non-profit that operated like a start-up to support public radio producers, while being solely responsible for the support of my family seemed like a crazy change to some of my co-workers. It probably was.

Millions of retirement dollars or meet Ira Glass?

Try, fail, learn. Repeat.

It could have gone all wrong, and early on there were times when it felt like it had, that maybe I had made a poor decision, one that was irreparable. I experienced crashes, both of the application server, disk and data center kind but also those that were very personal and emotional. I came to realize and understand that failure can be an opportunity for learning and growth, but that lesson doesn’t come without bruises, bumps and some scars.

The PRX Marketplace, Public Radio Talent Quest, PRX REMIX on XM, Matter, Dovetail, PubCatcher, Radiotopia, apps for This American Life, Radiolab, taking on the public radio satellite system with SubAuto, dozens of station apps, the Public Radio Player and oddities like Flu Portal and Economy Story. Coming up with a complete list would be exhausting, but they were all opportunities to try something, learn and improve.

Jake, Kerri, John and the board have been creative and resourceful, finding funders who are interested in supporting the organization and that have allowed us to explore a large number of opportunities and research projects. Thank you for giving us the space to experiment and try.

What did this cat learn?


There are probably dozens of other important or interesting projects and people that I am forgetting to mention. What is most important to me is that PRX has been an organization filled with people who care. People who care about the mission, people who care about their fellow employees and people who care about the success of the producers.

Working here also showed me that politicking isn’t required, that you can care for the people you work with, and consider the personal impact of professional decisions, and still create something successful.

I learned how valuable it is to understand how people might be feeling.

Hugs are important, for those that enjoy them.

Value your time

I found it weird that turning 40 this year mattered. I didn’t think it would. There was something about that round number and marker of passing time that caused me to look to the past and to the future. This past year was filled with moments that gave me pause and opportunities to think about how I wanted to spend my remaining time.

Recently I was organizing photos at home and came across two that caused me to sit back and think. One is of my daughter Ella, from just a week or two after I joined this crazy talented crew at PRX, and the other is one that I took of her only a few weeks ago. The difference between those two photos amazed me and marked the passing of time in a way that was a stark reminder of just how much had changed.

My version of the before and after presidential photos. My oldest daughter the week after I started at PRX and one from just a few weeks ago.

I’ve spent almost 25% of my adult life working at PRX and 50% of my career. When I think about all the change that has happened here, it’s overwhelming. It can be hard to see that change when you are in it, though, so when I took a giant step back and recognized just how much really has changed, it was amazing. Those two photos helped solidify it for me: it’s time to try something new.

Beyond broadcast

I’m so excited about this moment for audio as a medium to reach and connect people. It’s crazy to think that my first conference at PRX in May of 2006 was Beyond Broadcast where innovators and industry experts were imagining how to shift the power from entities and organizations to the people.

Inexpensive production tools, podcast distribution, mobile devices and first-class HTML audio support mean producers with initiative are able to reach an audience directly, without gatekeepers deciding what we should hear. Spin up a WordPress or Squarespace website, post to PRX, SoundCloud or YouTube and bam, their voice can be heard. Doing it well, and creating something that people want to hear, will always be hard.

A fresh crop of people without a radio background are starting to imagine and create audio. Writers, actors and creative people from many disciplines are starting to understand that audio and the human voice are a powerful way to reach and connect with people. Distribution and an audience are now accessible to them in a way they haven’t been before. It’ll be so neat to see what they create.

Podcasts are just the start.

What is next?

Honestly I really don’t know what my future holds, and that is both scary and exhilarating at the same time. Once again I have found myself in a moment and position where I’m doing something less typical, unexpected and a bit risky. My hope is to spend an appropriate amount of time to reflect on my 10 years working at PRX in digital media, and my nearly 20 years creating and helping design software products before making any decisions. I’m hopeful that things will work out this time too.


The internet is amazing and so is PRX. Both have provided me with opportunities to meet and work with a splendid array of talented individuals. People that I never expected to meet. Working at PRX opened my eyes. I have been dreaming of a period where I could ramp down and unplug, work with my hands, focus on the physical world and my family and hit the big pause button on life for a moment.

So, my immediate plans are to spend time with my family, build some ambitious blanket forts, read books with my kids and do some writing. I’ll also continue training to get my sub 00:20:00 5K time (come cheer me on!) and see how hard I can push my physical fitness. After that, it’ll be time to explore what I might want to work on next. I hear PRX has some pretty neat things bubbling in the laboratory too… Drop Jake a line if that piques your interest. 😉

Data and metrics

I spend a crazy amount of time looking at Google Analytics and spreadsheets, tracking numbers related to PRX. How many licenses, published pieces, producers publishing, listening time, duration, and so on. So I thought I’d wrap up by sharing some of the most important metrics about my time here.

  • 155,000 pieces published (actual)
  • 1,460 chocolate chip cookies eaten (estimated)
  • 8,400 metaphors used (estimated)
  • 241 times Rebecca Black’s Friday was played in the office (estimated)
  • 180,000,000 toe taps (estimated)
  • 550 whiskey sours consumed at holiday parties (estimated)
  • 243 times that the website was down, when it wasn’t (estimated)
  • 323 custom meme photos created and shared (estimated)
  • 3 Filipe’s Burrito locations (actual)
  • 9 children born (estimated)
  • $12 lost playing SpotIt (current estimate)
  • 520 amusing photoshop graphics created for newsletters (estimated)
  • 54,000 happy tail wags (estimated)
  • 2,300 wet dog kisses (actual)

When I see you again…

Jake says you never really leave PRX, so how about instead of a cold goodbye, we make it more of a warm see you later. If we’ve worked together I’d love if we could keep in touch.

I’m always interested in meeting people who want to talk about media, broadcast, disintermediation, open government, municipal government, city planning, Internet and digital distribution. If any or all of that interests you, please say hello. If you also happen to be into bodyweight fitness, I may have found my doppelgänger. You can follow me on Twitter, email me at matt.macdonald@gmail.com or find my code hacking on GitHub.

There ain’t no such thing as a free (open government data) lunch

There's no such thing as a free lunch.
Credit: Dana Fradon

There ain’t no such thing as a free lunch. I like that saying. It’s important to remember that it costs money to provide access to information, even public government or municipal information. Someone must pay. But who pays, and how much? I like Philip Ashlock’s Tweets, so I follow him on Twitter; he also has a nifty proposal called DemocracyMap in the Knight News Challenge. Our NearbyFYI proposal is also in the semi-final round with Philip’s, so when I saw his recent Tweet about “Paying for Public Data” it piqued my interest.

“There are certain elements of our democratic system of government that are so essential to its freedoms and principles that we have to make them as accessible as possible and provide them free of charge.”

That line got my attention and guaranteed that I’d read his entire article. I realize the piece is mostly a response to David Eaves’ techPresident post about grant-funded projects potentially destabilizing for-profit organizations. I’m not going to weigh in on that discussion as I don’t have strong opinions, but I do want to unpack Philip’s statement that some government information should be free of charge. It’s something that I’ve written about before, but I think it’s worth teasing out more.

The government data ecosystem

I’ve spent a lot of time thinking, reading and doing in the Open Government space and I think the following four statements pretty accurately represent the current government data ecosystem:

  1. Citizens must have access to information about their municipality or government.
  2. Access to information has a cost.
  3. Corporations derive revenue from government data.
  4. Citizens pay corporations for services built with government data.

I’m going to focus on statement #2. I have plenty of thoughts about #1, #3 and #4, but those can wait for a later time.

Access to government information has a cost

There are real costs when we interact with our government and request access to information about it. In the non-digital world we need government employees to pull documents from filing cabinets, answer phone calls and handle in-person information requests. Municipal employees attend public meetings, give presentations and summarize reports for us. The human cost to access analog government information is real.

The Open Government movement usually focuses on the digital world, but digital access to government information has real costs too. Proprietary software solutions require licensing fees and support contracts, and an open source approach requires knowledgeable IT staff, dev-ops and probably developers. There are support costs for open source too: people to help answer questions. There are costs to digitize documents and set up new data publishing workflows, and smaller fees for servers (cloud or not), bandwidth and electricity.

Who is paying the costs now?

I think we often miss an important point when we talk about free, public or open government data: when we require our government to provide better access to information, it creates a new expense line item. This means that we as citizens pay for improved access, but are we getting a good return on that investment? Larger cities currently pay vendors to provide Open Data portals and information publishing solutions for them. There are open source alternatives like CKAN too. These solutions can work in larger metropolitan areas where the benefits of sharing open data can be realized downstream, but smaller communities will have a harder time justifying the expense. What new services derived from open data would a town of 3,000 people value enough to offset the expense?

I live in the Boston area, and when the MBTA opened up its transit data my commute improved as Google and others stepped in, using taxpayer-funded data to provide us with useful services. My experience with the MBTA improved as a result of that investment of my tax dollars, and I think those dollars were used wisely. I live in Watertown, Massachusetts (pop. 31,915), and if the town decided to use a vendor like Socrata to provide open data, a very small portion of my property tax dollars would be used to pay for that service. I’ve spent a long time thinking about what services would be created from an Open Data portal in a community of this size, and I haven’t been able to think of one that would be worth our tax dollars. When budgets are being trimmed and it comes down to more teachers or more open data portals, the teachers are going to win, and probably should.

My hometown of Millinocket, Maine (pop. 4,506) and Watertown, Massachusetts are probably more like your town than Boston is. Millinocket doesn’t have a public transit system or other large-scale services, so it’s harder to explain how Open Government, specifically “Open Data,” will help it. When 80% of our towns have fewer than 10,000 people living in them, we should be looking at different models and incentives to improve access to municipal information. Without economic incentives (meaning cost savings, not a new expense), smaller communities will be hard-pressed to adopt Open Data tools.

There is another information cost that isn’t often highlighted. I am paying for a portion of a City Clerk’s time when an out-of-state RFP bid monitoring company calls my town office and spends 5 minutes asking for information about new bids, and that company does not pay property taxes in my town. When a fellow citizen emails the Town Manager to ask a budget question, it takes time to find the information and provide a response. I understand that these examples account for a very tiny portion of my tax bill, but I think it is important to highlight that we’re already paying for data access; it’s just poor quality and inefficient.

So, wait who should pay then?

The Internet is littered with services and software vendors that cost taxpayers money, where local government is the customer and our tax dollars help their businesses profit. Code for America references a GovWin report stating that local and state governments will spend $60B on IT in 2013. There is a long list of companies suckling from that wealthy government IT teat, from simple website vendors like CivicPlus, GovOffice and Virtual Town Hall to older, established companies like Tyler Technologies, IBM and Microsoft. There are thousands of small mom-and-pop vendors, mid-tier $5M-$20M companies and Fortune 500 companies that provide solutions for building permitting, parking tickets, accounting and payroll. All of these solutions cost us taxpayers money.

So you may be wondering: how do we get access to municipal information if we’re not paying these companies to provide tools and services? Good question.

Companies like BidClerk.com, CrimeReports.com and eRepublic broker access to municipal information for other companies. They recognized the value in the data that our cities and towns generate and spent time and money building methods to collect this difficult-to-access information. The Adam and Eve of this approach are Westlaw and LexisNexis. I know that there are serious, important discussions taking place about these types of companies and their practice of paywalling access to important documents. To be clear, I believe citizens MUST have access to these documents and information without having to pay a second time. What is important, though, is that these companies figured out that other entities derive business value from having access to this type of information. What Westlaw, LexisNexis and, until recently, JSTOR couldn’t figure out is how to broker access to data for those who derive business value from it while still providing access to the public.

To re-appropriate William Gibson’s famous quote “Access to Government data is already here — it’s just not very evenly distributed.”

What are we proposing that is different then?

The approach that we are promoting at NearbyFYI blends both of these realities: there are companies that derive value from better government data, and citizens must have access to it. Access to information has a cost; we just want to shift it, more efficiently, onto those who benefit financially. We see a better way to get high-quality data into the hands of companies willing to pay for it: by providing better tools and services for our government. We don’t see expensive “Open Data Portals” as being that solution. We’re going to treat the government as a user, not a customer, helping government employees do their jobs more efficiently. We see a wealth of valuable “dark data” in our municipalities, data that other companies will pay to access; we just need to give municipalities better tools to provide it.



Local government is sexy


Local government is sexy.

It isn’t really, but it could be made much simpler to understand. Most of the work in local government is done in Select Boards, Town Councils and Sub-Committees like Finance and Public Works. Most people couldn’t care less what goes on in these meetings until their property tax bill arrives. Most cities and towns live in the stone age when it comes to the processes for their meetings; it’s like the Internet never even existed. Word documents scanned as images, then turned into PDFs that require OCR, are state of the art: http://www.cityofwestfield.org/Files/AgendaCenter/Agendas/68/Archives/57/01-08-13%20Conservation%20PH%20Niemiec.pdf.

I’ve been working on a set of tools to collect meeting minutes, agendas and reports from hundreds of cities and towns in Vermont. We have over 150,000 documents now. We’re doing the best that we can to extract meaningful, structured data from blobs of PDFs, Word documents and the most poorly formed HTML you’ve ever seen. We’re finding useful, interesting bits of data in this local legislative soup. Vermont Public Radio is using the information we’re finding to write stories that have been picked up by NPR and the Associated Press.
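The post doesn’t describe how those tools work, so the following is only a hypothetical sketch of one small step: pulling meeting dates out of extracted text and normalizing them to `datetime.date` values. The regular expression and helper names are assumptions, and real documents would need many more patterns (numeric dates, lowercase month names and so on).

```python
import re
from datetime import date
from typing import List, Optional

MONTH_NAMES = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

# A capitalized word (optionally abbreviated with a period), a day and a 4-digit year,
# e.g. "November 27, 2012" or "Dec. 11 2012"
DATE_RE = re.compile(r"\b([A-Z][a-z]+)\.?\s+(\d{1,2}),?\s+(\d{4})\b")


def month_number(word: str) -> Optional[int]:
    """Map 'Nov', 'Sept' or 'November' to a month number; anything else to None."""
    w = word.lower()
    if len(w) < 3:
        return None
    for number, name in enumerate(MONTH_NAMES, start=1):
        if name.startswith(w):
            return number
    return None


def extract_dates(text: str) -> List[date]:
    """Find month-name dates in free text and return them as datetime.date objects."""
    found = []
    for word, day, year in DATE_RE.findall(text):
        month = month_number(word)
        if month is None:
            continue  # e.g. "Article 12, 2012" is not a date
        try:
            found.append(date(int(year), month, int(day)))
        except ValueError:
            continue  # impossible dates such as "February 30, 2012"
    return found


sample = "Selectboard Meeting Minutes, November 27, 2012. Next meeting: Dec. 11, 2012."
print([d.isoformat() for d in extract_dates(sample)])  # ['2012-11-27', '2012-12-11']
```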

We’re never going to win the battle, though. The upstream source is so polluted; we need to clean things up. I’m starting to flesh out an open source meeting management tool (Muni Meeting) that is specifically designed for municipalities and how they run meetings. The primary benefits to a town would be:

* Reduce meeting taker and organizer time
* Real time publishing of notes – zero publish time
* eDelivery of Meeting Packets (Police Officers usually hand these out manually)
* No need to convert Word docs and flatbed scanner documents to PDF
* View voting histories, profiles
* Record meeting audio via iPhone and Android apps
* Meeting topic trends
* Searchable meetings
* Low-cost or FREE tool
* Open source, open APIs for data
* Useful, structured data and information for analysis
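Several of those benefits (voting histories, open APIs, structured data) depend on storing a meeting as structured records rather than as a document. Here is a minimal, hypothetical Python sketch of what that could look like. Muni Meeting’s actual design isn’t described here; the class names, field names and example data are invented for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class Vote:
    motion: str
    yes: List[str] = field(default_factory=list)
    no: List[str] = field(default_factory=list)
    abstain: List[str] = field(default_factory=list)


@dataclass
class Meeting:
    town: str
    body: str                     # e.g. "Selectboard", "Planning Commission"
    date: str                     # ISO 8601 date, e.g. "2012-11-27"
    attendees: List[str] = field(default_factory=list)
    agenda: List[str] = field(default_factory=list)
    votes: List[Vote] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the whole meeting, nested votes included, e.g. for an open API."""
        return json.dumps(asdict(self), indent=2)


def voting_history(meetings: List[Meeting], member: str) -> List[Tuple[str, str, str]]:
    """Return (date, motion, choice) for every vote a member cast, in meeting order."""
    history = []
    for meeting in meetings:
        for vote in meeting.votes:
            for choice in ("yes", "no", "abstain"):
                if member in getattr(vote, choice):
                    history.append((meeting.date, vote.motion, choice))
    return history


# Made-up example data, for illustration only
example = Meeting(
    town="Exampleville",
    body="Selectboard",
    date="2012-11-27",
    attendees=["Member A", "Member B", "Member C"],
    agenda=["Approve prior minutes", "Road salt contract"],
    votes=[Vote(motion="Approve road salt contract",
                yes=["Member A", "Member C"], no=["Member B"])],
)
print(voting_history([example], "Member B"))
# [('2012-11-27', 'Approve road salt contract', 'no')]
```

Keeping votes as data rather than prose is what makes per-member histories and searchable topics a simple query instead of a text-mining problem.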

There is a closed source, commercial vendor in this space called Granicus. They build decent tools and have an API (limited, non-public access), but their product is expensive, closed and complicated. It is a tool for larger cities and towns: towns with closed-circuit camera systems and $40,000,000+ budgets. Most towns in the United States have fewer than 30,000 residents. These are the towns where a Selectboard meeting might take place in a library or in a member’s kitchen. These smaller towns pass important laws and ordinances that are rarely noticed in our busy lives. Democracy is happening in public view, but we just don’t see it.

If you have gotten this far, it’s likely that you’d be interested in talking with us about what we’re dreaming up. I’d love to collaborate with others on this.

“My Administration is committed to creating an unprecedented level of openness in Government. We will work together to ensure the public trust and establish a system of transparency, public participation, and collaboration. Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” — BARACK OBAMA

Anyone interested? matt.macdonald@gmail.com.

Matt MacDonald

3 ways cities can improve citizen access to online meeting information

Hi there,

I posted this over at our NearbyFYI blog and thought it would be useful to cross post it here.

At NearbyFYI we review online information and documents from hundreds of city and town websites. Our CityCrawler service has found and extracted text from over 100,000 documents for the 170+ Vermont cities and towns that we track. We’re adding new documents and municipal websites all the time and we wanted to share a few tips that make it easier for citizens to find your meeting minutes, permit forms and documents online. The information below is written for a non-technical audience but some of the changes might require assistance from your webmaster, IT department or website vendor.
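CityCrawler’s internals aren’t described in this post. As a rough illustration of the kind of first step a crawler like it might take, this hypothetical Python sketch uses only the standard library to collect links to document files (PDF, Word, text) from a page’s HTML. Fetching pages, respecting robots.txt and rate limiting are left out, and the extension list is an assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse

# Assumed list of file types worth collecting; extend as needed
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".docx", ".rtf", ".txt")


class DocumentLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that point at document files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)
        # Check the extension on the path only, ignoring ?query and #fragment
        if urlparse(url).path.lower().endswith(DOCUMENT_EXTENSIONS) and url not in self.links:
            self.links.append(url)


def find_document_links(html: str, base_url: str) -> List[str]:
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = """
<a href="/minutes/2012-11-27.pdf">Minutes</a>
<a href="about.html">About</a>
<A HREF="agenda.DOC?v=2">Agenda</A>
"""
print(find_document_links(page, "http://www.example.org/selectboard/"))
```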

Create a unique web page for each meeting

Each city or town meeting should have its own unique web page for the agenda and meeting minutes. We often see cities and towns creating a single, very large web page that contains an entire year of meeting minutes. This may be convenient for the person posting the minutes online, but it presents a number of challenges for the citizen who is trying to find a specific meeting agenda or the minutes from that meeting.
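One way to give each meeting its own page is a predictable URL for every board, date and document type. The scheme below is a hypothetical example, not a standard, and the function names are invented; if a board meets twice on the same day, you would need to add a suffix to keep paths unique.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase text and collapse every run of non-alphanumeric characters into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def meeting_path(body: str, meeting_date: date, doc_type: str) -> str:
    """Build one stable, human-readable path per board, meeting date and document type."""
    return f"/{slugify(body)}/{meeting_date.isoformat()}/{slugify(doc_type)}"


print(meeting_path("Selectboard", date(2012, 11, 27), "Meeting Minutes"))
# /selectboard/2012-11-27/meeting-minutes
print(meeting_path("Planning Commission", date(2012, 12, 4), "Agenda"))
# /planning-commission/2012-12-04/agenda
```

Using the full ISO date (YYYY-MM-DD) in the path also means the pages sort chronologically and answer "when was the meeting?" at a glance.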

Here is an example of meeting minutes kept on a single page, which requires the citizen to scroll and scroll to find what they are looking for: http://www.shrewsburyvt.org/sbminsarchive.php This long archive page structure also presents challenges to web crawlers and tools that try to create structured information from the text. Proctor, VT provides a good example of what we look for in a unique meeting minutes document. We like that this document can answer the following questions:

  1. Which town created the document? (Proctor)
  2. What type of document is this? (Meeting Minutes)
  3. Which legislative body is responsible for the document? (Selectboard)
  4. When was the meeting? (November 27, 2012 – it’s better to use a full date format like this)
  5. Which board members attended the meeting? (Eric, Lloyd, Vincent, Bruce, William)

The only thing that could improve access to this document is if it were saved as a plain text file rather than a PDF file. Creating a single web page or document for each meeting means that citizens don’t have to scan very large documents to find what they are looking for.
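The full date format matters to software as well as to people. Here is a minimal sketch, written as a hypothetical Ruby helper rather than CityCrawler’s actual code. A date written as “November 27, 2012” can be found and parsed without guessing. A short form such as 11/27/12 leaves the day/month order and century open to interpretation, so the helper deliberately ignores it:

```ruby
require "date"

# Matches a spelled-out date such as "November 27, 2012".
# Hypothetical pattern for this sketch; abbreviations like "Nov." are ignored.
FULL_DATE = /\b(?:January|February|March|April|May|June|July|August|
               September|October|November|December)\s+\d{1,2},\s+\d{4}\b/x

# Returns the first full date found in a minutes document as a Date, or nil.
def meeting_date(text)
  match = text[FULL_DATE]
  return nil unless match

  Date.strptime(match.gsub(/\s+/, " "), "%B %d, %Y")
end

puts meeting_date("Town of Proctor\nSelectboard Meeting Minutes\nNovember 27, 2012")
# prints 2012-11-27
```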

Save PDFs as text not images

After running our CityCrawler for several months, it’s clear that cities and towns love the PDF file format for sharing information online. While PDF files can be a quick way to post information online, cities often publish documents as scanned images, which is no better than taking a photo of the document and posting it on Instagram.

The challenge here is that search engines must use OCR (Optical Character Recognition) software to try to extract the text from the image. Anything that makes it harder for search engines to index your documents means that fewer citizens are going to find your published information. At NearbyFYI we often see documents that could easily have been saved as text PDFs but were scanned as images. Here is how a scanned document can be converted to a format that search engines can use.

Steps to convert your PDF to text

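  1. Scan the meeting minutes document and save it as a PDF.
  2. Open the scanned document in Adobe Acrobat.
  3. If Acrobat hasn’t already recognized the text, run its text recognition (OCR) tool so the PDF contains real text rather than just a picture of the page.
  4. Select “File > Export > Text > Text Plain.”
  5. Name the document and click “Save.”
  6. Open the saved file and review it for conversion errors.
  7. Save the corrected document and post it to your website.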
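Before posting, it can help to check whether a PDF already contains real text. This sketch assumes the pdftotext command-line tool (from Poppler or Xpdf) is installed. The 20-character threshold is an arbitrary assumption, not a standard:

```ruby
require "open3"

# A PDF counts as "scanned" when its extracted text is nearly empty.
# The 20-character cutoff is an arbitrary assumption for this sketch.
def looks_scanned?(extracted_text, min_chars: 20)
  extracted_text.gsub(/\s+/, "").length < min_chars
end

# Runs pdftotext and sends the extracted text to stdout ("-").
def pdf_text(path)
  out, status = Open3.capture2("pdftotext", path, "-")
  raise "pdftotext failed on #{path}" unless status.success?

  out
end

if __FILE__ == $PROGRAM_NAME
  ARGV.each do |path|
    label = looks_scanned?(pdf_text(path)) ? "image only, needs OCR" : "has text"
    puts "#{path}: #{label}"
  end
end
```

Save it under a name of your choosing, for example check_pdfs.rb (a hypothetical name). You can then run it as ruby check_pdfs.rb *.pdf. Any file reported as image only is a candidate for the OCR step.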


Allow web crawlers in Robots.txt

Most information online is found via search tools like Google or Bing. Google uses what is called a web crawler, or web page indexer, to review your website documents so that it can add your content to its search index. This is a good thing: you want the search companies to find your content, since search is likely the way most citizens are going to look for information about your community.

Robots.txt files contain a simple set of rules that web crawlers follow. Some websites are set up to allow crawlers; others aren’t. This is an example of a robots.txt file from a city whose online information won’t be found with a Google search:

User-agent: *
Disallow: /

This means that when a citizen searches Google for “how to get a zoning permit in pownal, vt” they won’t find this page: http://www.pownalvt.org/planning-commission/town-plan/zoning-rules/zoning-permit-application/. Ensuring that web crawlers can access the documents you post on your website is likely the simplest thing you can do to improve citizen access.
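A quick way to spot this problem is to check whether the rules for all crawlers (User-agent: *) disallow the root path. This Ruby sketch is deliberately simplified and is not a full robots.txt parser. It ignores Allow lines, wildcards and crawler-specific groups such as Googlebot:

```ruby
# Returns true when the group that applies to every crawler ("User-agent: *")
# contains "Disallow: /", i.e. the whole site is blocked. Simplified on purpose.
def blocks_everything?(robots_txt)
  agents = []       # user agents named by the group currently being read
  in_rules = false  # true once the current group has seen a rule line
  blocked = false

  robots_txt.each_line do |raw|
    line = raw.sub(/#.*/, "").strip # drop comments and surrounding spaces
    next if line.empty?

    field, value = line.split(":", 2).map(&:strip)
    if field.casecmp?("user-agent")
      # A User-agent line that follows rule lines starts a new group.
      agents = [] if in_rules
      in_rules = false
      agents << value.to_s
    else
      in_rules = true
      blocked = true if field.casecmp?("disallow") && value == "/" && agents.include?("*")
    end
  end
  blocked
end

puts blocks_everything?("User-agent: *\nDisallow: /\n") # prints true
```

For comparison, a robots.txt that lets every crawler in is just a User-agent: * line followed by an empty Disallow: line.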


Jason and I have been working on CityCrawler so that we can extract structured information and provide an API to the public documents and data that our cities and towns create each day. We’ve encountered a number of other issues with city and town websites that make it harder for both citizens and web crawlers to get access to this online information and we think that we can help.

If you are interested in learning how your city can improve access to meeting minutes and other public documents please contact us at info@nearbyfyi.com.

Matt & Jason

What if adding more money to our school budget doesn’t help our children as much as we hope?

Someone questioned why I posted an article from The Onion in the Watertown Residents for Strong Schools Facebook group. Thought it might be useful to cross post it here.


I was hoping that posting the satirical article from The Onion about our education crisis might prompt further discussion. Plus it was just damn funny and I didn’t see the harm in bringing a little levity to the seriousness of the discussion.

When I read The Onion article it reminded me of a common topic on this group about how much money the school department should receive and also of a segment I saw recently on the TODAY Show about Unschooling.

Dialog in this Facebook group reinforces the message that the national media covers (http://www.educationnation.com/ and http://www.pbs.org/wgbh/pages/frontline/education/): our current public education system is no longer producing the results we expect. Many of our conversations in this group have focused on increasing the funding our school department receives from the town, on the assumption that doing so will better support our students and teachers.

This morning I Googled “does increasing spending on education help?” and landed on this article, which does a pretty great job of dealing with the question: http://www.thefiscaltimes.com/Articles/2011/06/06/School-Budgets-The-Worst-Education-Money-Can-Buy.aspx#page1 Basically, just spending more money won’t help as much as we think.

Maybe we should ask the tough question:
What if adding more money to our school budget doesn’t help our children as much as we hope?

When I met with former Superintendent Ann Koufman last spring, she handed me a book called Rethinking Education in the Age of Technology (http://www.amazon.com/Rethinking-Education-Technology-Education-Connections-Education-Connections/dp/0807750026), which spends the first few chapters describing how our current educational system came to be. Reading how the industrial revolution shaped the education system was enlightening, and it also makes it clear that the 150-year-old structure of our education system is ripe for radical transformation.

Self-directed learning is so easy now. With just a few mouse clicks you can learn about almost any topic on the Internet, attend FREE in-person computer programming classes (http://meetup.bostonpython.com/events/17433132/?eventId=17433132&action=detail) or take university courses (https://www.ai-class.com/). The Khan Academy (http://www.khanacademy.org/) is educating millions. TED talks (http://www.ted.com/) inspire us to learn more about our world and spark conversations and jumping-off points for learning. Go to a Maker Faire with your kids (http://makerfaire.com/), or better yet, start one.

When you examine our town budget and add in the town appropriation, health care, pensions, and state and federal grants, you see that our school department budget is north of $50,000,000 per year. With our 2,800 students, that comes to nearly $18,000 per student. What if all of that money, and I mean all of it, went to teachers and facilitators who interact with our kids? Imagine a world in which each educator could earn $108,000 per year to work with only 6 kids.
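The per-student and per-educator figures work out as follows, using the rounded numbers from the paragraph above:

```latex
\begin{aligned}
\frac{\$50{,}000{,}000}{2{,}800\ \text{students}} &\approx \$17{,}857 \approx \$18{,}000\ \text{per student} \\
6\ \text{students} \times \$18{,}000\ \text{per student} &= \$108{,}000\ \text{per educator}
\end{aligned}
```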

Could we become more involved in educating both ourselves and our children if we shook up the institution? Could we better engage with our local scientists, artists, engineers, bakers, welders and electricians if we didn’t carry the baggage of a 150-year-old institution?

I believe that our government should fund the education of its citizens; it benefits all of us to have a more engaged and informed society. I just want to know: can we do it better and more efficiently?


He Said, She Said – Extracting data from 5 years of Watertown Town Council meetings


I think I found a way to raise the visibility of some pretty boring civic data and present it to the public in a more useful manner. Like most cities and towns, Watertown, MA keeps detailed minutes for each Town Council meeting. Since 2006 those minutes have been posted in the Document Center as PDFs. I doubt that many people are reviewing them; I’d honestly be surprised if anyone but me was looking through them. If you do review them, please say hi in the comments! Check out He Said, She Said if you want to skip the details and get to the data extracted from 5 years of Town Council meetings.

Project Goal

For this project I wanted to see if I could take the very dry, hard-to-read minutes from the Town Council meetings and present them in a way that might make them more interesting to people in Watertown. I’ll consider this project a success if just a handful of people explore some of these meeting minutes.


Once again my primary challenge is working with information that is locked up in PDF documents. So what could I do to take these vanilla, boring documents produced by our civic employees and create a more engaging experience?

Data for this project

Tools used in this project

  • We’ll need the trusty Typhoeus and nifty Nokogiri Ruby gems to help us screen scrape and download the PDFs
  • The wonderful pdftotext tool to liberate all that juicy information from the PDFs
  • Open Calais will help us find interesting people, quotes and businesses mentioned in the minutes, and we will use the Calais Ruby gem to help us here.
  • A number of Ruby scripts to use all these fine tools
  • Ruby on Rails running on Heroku will serve up the now super awesome meeting notes
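The screen-scraping step can be sketched with Ruby’s standard library alone. This is not the town-council-parse.rb script, and it swaps Nokogiri and Typhoeus for a regex over quoted href attributes plus Net::HTTP. That is cruder than a real HTML parser and does not follow redirects:

```ruby
require "net/http"
require "uri"

# Collects absolute URLs for every PDF linked from a page's HTML.
# Stand-in for Nokogiri: matches only quoted href values ending in .pdf.
def pdf_links(html, base_url)
  hrefs = html.scan(/href\s*=\s*["']([^"']+\.pdf)["']/i).flatten
  # Document Center file names often contain spaces, which URI.join rejects.
  hrefs.map { |href| URI.join(base_url, href.gsub(" ", "%20")).to_s }.uniq
end

# Downloads one PDF into dir, keeping its URL-encoded file name and
# skipping files that were already fetched on an earlier run.
def download(url, dir = ".")
  uri = URI(url)
  path = File.join(dir, File.basename(uri.path))
  File.binwrite(path, Net::HTTP.get(uri)) unless File.exist?(path)
  path
end
```

Usage would look like pdf_links(Net::HTTP.get(URI(page_url)), page_url).each { |url| download(url) }, where page_url is a placeholder for the Document Center page that lists the minutes.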

The Result?

He Said, She Said (thanks for the name, Kimmi!) is a basic website that shows quotes from the Town Council meetings, and I’m pretty happy with it. I was able to pull down the PDFs, convert them to text, run them through the Open Calais web service and present the results in a Rails app in a much more compelling way. I’ve found it a fun way to look at quotes and dive into older meetings. There is obviously much more that could be done, but I’d like to get it into the hands of more people in Watertown to collect feedback.

“Councilor Lawn indicated that a few business owners contacted him and stated that they would be willing to pay more for the service than lose the service altogether.”

Steps to reproduce this project in your town

  1. Locate your town meeting notes. Hopefully they are online and if you are lucky they are at least in a PDF format.
  2. Download one PDF and try to use the scripts and tools mentioned to extract the text. You may even want to copy the text from the PDF to try the Open Calais Viewer. Using their free viewer, you’ll get a sense of how useful their entity extraction is before you commit all the way.
  3. If Open Calais seems to provide useful results for your town meeting notes, you can use, modify or improve the Ruby script I have provided (town-council-parse.rb) to screen scrape your town website and download the PDFs so you can work with them locally.
  4. After you have downloaded the PDFs using the script, you’ll need to convert them to plain text so Open Calais can extract meaning from them. Take a look at the convert_to_pdf.rb script. It is very, very simple: it just converts all the PDFs it finds in your current directory into text files. If pdftotext can do batch jobs, I didn’t find it right away. After running this script you should see a .txt file with the same name as each PDF.
  5. Once you have your text files, you’ll need to send the extracted text to the Open Calais web service so it can generate meaning and structure from your documents. The extract-entities.rb script uses the Calais Ruby gem to make these requests much easier. The script will take a while to generate the files from the plain text documents. I like working with the JSON data format, so that is what we get back from Open Calais.
  6. JSON isn’t the most user-friendly format for people to look at, so let’s convert it to Comma Separated Values (CSV), a format that is friendlier to Excel and lay people. The create-quotes-csv.rb script uses the Siren Ruby gem to parse the JSON returned by Open Calais and store the quotes in a new CSV file.
  7. At this point you should have 3 files for each meeting in your file system: the PDF, the .txt file and the .txt.json file. You should also have a new file called quotes.csv containing all of the quotes from the meetings that Open Calais located.
  8. Upload the PDFs to Scribd so people can more easily read them on the web. So far I have moved 93 of the 126 documents; then I hit the Scribd rate limit. Hopefully I will remember to go back and add the remaining documents. For my own reminder, Scribd shut me off at TC%20Minutes%204.24.2007.pdf.
  9. This may be a good place to stop. The CSV file of quotes is easy to load into a relational database, where you can run fun and interesting queries. I wanted to give this information a more visible face, so I used Heroku to run a very, very simple Rails server that currently only shows a single random quote from the meetings.
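Steps 4 and 6 can be sketched with Ruby’s standard library. This is not the convert_to_pdf.rb or create-quotes-csv.rb script mentioned above. It swaps the Siren gem for the built-in JSON parser. The Open Calais field names (_type, Quotation, quotation) are assumptions, so check them against your own .txt.json files:

```ruby
require "csv"
require "json"
require "open3"

# Step 4: convert every PDF in dir with pdftotext. When no output file is
# given, pdftotext writes "name.txt" next to "name.pdf".
def convert_all(dir = ".")
  Dir.glob(File.join(dir, "*.pdf")).sort.each do |pdf|
    _output, status = Open3.capture2e("pdftotext", pdf)
    warn "pdftotext failed on #{pdf}" unless status.success?
  end
end

# Step 6: pull quotations out of one saved Open Calais JSON response.
# ASSUMPTION: the response is a JSON object whose values are hashes, and
# quotation entries look like {"_type" => "Quotation", "quotation" => "..."}.
def quotes_from(json_text)
  JSON.parse(json_text).values
      .select { |entry| entry.is_a?(Hash) && entry["_type"] == "Quotation" }
      .map { |entry| entry["quotation"] }
      .compact
end

# Writes one CSV row per quote, tagged with the meeting it came from
# (the file name without ".txt.json").
def write_quotes_csv(dir = ".", out = "quotes.csv")
  CSV.open(out, "w") do |csv|
    csv << %w[meeting quote]
    Dir.glob(File.join(dir, "*.txt.json")).sort.each do |path|
      meeting = File.basename(path, ".txt.json")
      quotes_from(File.read(path)).each { |quote| csv << [meeting, quote] }
    end
  end
end
```

Keeping the meeting name in its own column means each quote can later be linked back to its PDF.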

Next Steps

I have known about ScraperWiki for a while but haven’t really tried it out yet. I’d like to see if I can migrate the collection of scripts that I’ve been using over to that tool. I also need to finish uploading documents to Scribd.


Some of the scripts

[gist id=1176346]

Mapping the location of building permit submissions in Watertown, MA


Each week I’m trying to identify and hack on data published in the Watertown, MA document center so that I can provide it in a form that people might find more useful than a PDF, Excel or PowerPoint presentation. Today I decided to focus on building permits. We’re about to do some water damage work in my daughter’s room, so this felt like a relevant area to hack on.

Hack Goal

Determine the effort involved in creating an automated process to place issued building permits from Watertown, MA on a Google Map.

Data for this hack

Tools used in this hack

My Hack Results and a description

Overall I’m pretty happy with how this hack turned out. I was able to take a very boring presentation of this pretty interesting information that was locked up in a PDF and display it on a map. Here are the hoops that I had to jump through to get this working.

  1. Download the PDF from the Watertown, MA website.
  2. Open up DeskUNPDF and use its conversion tool to identify tabular data in the PDF so it can be extracted as a comma separated value (CSV) file.
  3. Write a little Ruby script to further clean up the data and prepare the addresses for the Google Geocoding API, which returns latitude and longitude coordinates. After getting the lat/long coordinates from Google, write them out as a new CSV file for Socrata to take over.
  4. Upload a new data set to Socrata and combine the lat/long fields into a location field.
  5. Use Socrata to generate the Google Map view.
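Step 3 can be sketched as follows. The endpoint and the results[0].geometry.location response fields follow Google’s published Geocoding API format, and an API key is required. The "address" column name, the appended “Watertown, MA” and the clean-up rule are assumptions for illustration, not the original script:

```ruby
require "csv"
require "json"
require "net/http"
require "uri"

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

# Hypothetical clean-up: collapse stray spaces and add the town and state,
# since the permit list only contains street addresses.
def full_address(street, town: "Watertown, MA")
  "#{street.strip.squeeze(' ')}, #{town}"
end

def geocode_url(address, api_key)
  uri = URI(GEOCODE_ENDPOINT)
  uri.query = URI.encode_www_form(address: address, key: api_key)
  uri
end

# Pulls [lat, lng] out of a Geocoding API JSON response, or nil if no match.
def lat_lng(body)
  data = JSON.parse(body)
  return nil unless data["status"] == "OK"

  location = data["results"].first["geometry"]["location"]
  [location["lat"], location["lng"]]
end

# Reads the cleaned permit CSV (ASSUMED to have an "address" column) and
# writes a copy with latitude/longitude columns for Socrata.
def add_coordinates(in_csv, out_csv, api_key)
  table = CSV.read(in_csv, headers: true)
  CSV.open(out_csv, "w") do |csv|
    csv << table.headers + %w[latitude longitude]
    table.each do |row|
      body = Net::HTTP.get(geocode_url(full_address(row["address"]), api_key))
      csv << row.fields + (lat_lng(body) || [nil, nil])
      sleep 0.2 # pause between requests to stay under rate limits
    end
  end
end
```

Rows that Google cannot match get empty coordinate cells instead of stopping the run. They can then be fixed by hand before uploading to Socrata.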

How to improve this


  1. Automate the downloading of building permit PDFs from the Watertown website
  2. Detect when new PDFs are available for download
  3. Script the PDF to CSV conversion using the DeskUNPDF command line interface rather than the UI
  4. Automate the updating of the dataset on Socrata
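Detecting new PDFs (item 2) can be as simple as remembering which PDF links have already been seen. This minimal sketch assumes a small JSON state file whose name is a hypothetical choice. current_urls stands for whatever list of permit PDF links your scraper finds:

```ruby
require "json"

# Returns the PDF URLs that were not present on any earlier run and
# records them in a JSON state file so the next run skips them.
def new_pdfs(current_urls, state_path = "seen_permits.json")
  seen = File.exist?(state_path) ? JSON.parse(File.read(state_path)) : []
  fresh = current_urls - seen
  File.write(state_path, JSON.generate((seen + fresh).sort))
  fresh
end
```

Run on a schedule (for example from cron), an empty result means there is nothing new to convert or upload.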

Better data to start with

Extracting this data from a PDF is something that we shouldn’t have to do. This data resides in a computer database somewhere, so why should we need to write scripts to scrape and cobble it together? If you feel like joining me in writing to Ken Thompson, CBO, LCS Inspector of Buildings at kthompson@watertown-ma.gov to see if there is a better way to get this information, I’d be grateful.

Data integrity

One major sticking point is that DeskUNPDF isn’t picking up the first two rows in each PDF, so I need to ask the DeskUNPDF folks if they know what might be going on. Missing the first two rows isn’t a huge deal, but I’d like the dataset to be accurate.


While I think seeing the individual permits on a map is better than looking at a list in a PDF, I would like to see this data in graph form, with permits plotted over time. Summer months are presumably busy for permits, but I would be quite interested to see how weeks and months compare with previous years.
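Plotting permits over time starts with counting them per month. In this minimal sketch, the "issued" column name and the MM/DD/YYYY date format are assumptions about the cleaned CSV, so adjust both to match the real file:

```ruby
require "csv"
require "date"

# Counts permits per "YYYY-MM" key so they can be charted over time.
# ASSUMPTIONS: an "issued" column holding dates like 07/05/2011.
def permits_per_month(csv_text, column: "issued", format: "%m/%d/%Y")
  counts = Hash.new(0)
  CSV.parse(csv_text, headers: true).each do |row|
    value = row[column]
    next if value.nil? || value.strip.empty? # skip rows with no issue date

    date = Date.strptime(value.strip, format)
    counts[date.strftime("%Y-%m")] += 1
  end
  counts.sort.to_h
end
```

To compare the same month across years, group these keys by their month part before charting.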


Not a ton of code here, but you can find the Ruby script below. If others are interested in helping me write scrapers for this data, the scripts will be updated on GitHub.

[gist id=1169914]