Making Digital Newspaper Collections Easy
March 2017 • Veridian Newsletter
What's in this Newsletter?
2017 Veridian Development Roadmap
Delivering the Veridian Promise
Most Popular Articles
 
Dear Veridian users and newsletter subscribers,

We hope all is well and you are all enjoying 2017 so far!

This year is a special one for us: it’s the 15th anniversary of DL Consulting, and 10 years since the Veridian software was first launched. Thank you all for your support over the years. We will continue to deliver on our Veridian promise while growing our systems and services to take your collections even further.

 
2017 Veridian Development Roadmap
This year, we are planning to launch several new services and system enhancements that will benefit your digitization projects in many different ways. Please follow our newsletters to get the latest information.

Below are the three main items that we have been (and will be) working on this year. Once work has progressed a little further we plan to send out more detailed articles for each of them. In the meantime though, if you’re interested in finding out more please feel free to contact us.

Item #1 2017 Veridian User Interface Update
Veridian has been constantly changing and evolving for the past decade, but many of those changes are “under the hood” and aren’t immediately obvious to the user. Even the changes we’ve made to Veridian’s user interface over the past few years (and there have been many!) have been fairly subtle. For example, Veridian has recently become HTML5 compliant and has incorporated much-improved support for accessibility guidelines.

This year though we will be launching a brand new user interface with much improved support for small screen devices (to allow friendly mobile access to your collections), better accessibility (aided by our experience working with the American Foundation for the Blind), and a cleaner and more modern look. We hope that through this upgrade your users will be able to access your collections more easily and more frequently, from a broader range of devices.

Item #2 METS/ALTO Quality Assurance through Veridian QA
Veridian QA is a new service that will allow you to visualize METS/ALTO data before you sign off on it with your OCR vendor. METS/ALTO is a well-defined format, and there are already many tools (e.g. those provided by the Library of Congress for the National Digital Newspaper Program) for ensuring it is technically valid. What we often find though is that METS/ALTO that is technically fine may have underlying “hidden” problems. For example, often the “textblocks” that are defined in the ALTO files aren’t in the correct order, or some of the text on the page isn’t recognized as text at all (i.e. it isn’t inside a “textblock”), or some lines of text have been “cut off” during the OCR process. Another problem we regularly see is “textblocks” that run across two or more columns of a newspaper page.
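To make one of these “hidden” problems concrete, here is a minimal sketch of how a textblock that runs across more than one column might be flagged. It is only an illustration and not how Veridian QA works. It assumes the standard ALTO `TextBlock` element with its `ID`, `HPOS`, `VPOS` and `WIDTH` attributes; the function names, the sample page and the "1.5 times the median width" rule are all hypothetical.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical four-block page: three single-column blocks and one wide block.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="B1" HPOS="100" VPOS="100" WIDTH="400" HEIGHT="900"/>
    <TextBlock ID="B2" HPOS="520" VPOS="100" WIDTH="400" HEIGHT="900"/>
    <TextBlock ID="B3" HPOS="100" VPOS="1020" WIDTH="820" HEIGHT="300"/>
    <TextBlock ID="B4" HPOS="940" VPOS="100" WIDTH="400" HEIGHT="1220"/>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip any XML namespace so the sketch works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def text_block_widths(alto_xml):
    """Return (ID, WIDTH) for every TextBlock, in file order."""
    root = ET.fromstring(alto_xml)
    return [
        (element.get("ID", ""), float(element.get("WIDTH", 0)))
        for element in root.iter()
        if local_name(element.tag) == "TextBlock"
    ]


def blocks_spanning_columns(alto_xml, ratio=1.5):
    """Flag blocks much wider than the typical block on the page.

    Assumption: the median block width approximates one column. Headlines
    legitimately span columns, so a flag means "look at this", not "error".
    """
    widths = text_block_widths(alto_xml)
    if not widths:
        return []
    typical = statistics.median(width for _, width in widths)
    return [block_id for block_id, width in widths if width > ratio * typical]


print(blocks_spanning_columns(SAMPLE))  # prints ['B3']
```

A check like this can only suggest where to look. Deciding whether a wide textblock is a genuine headline or an OCR zoning mistake still means looking at the page, which is why visualization matters.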

For page-level METS/ALTO (such as that produced for NDNP), the textblock areas and ordering are usually not visible on a live system, so there’s no easy way to detect these sorts of problems. Nor do they cause any serious functional issues when the newspapers are displayed online in a system like Veridian or Chronicling America. They do cause a number of less obvious problems though, as follows:

  • In many cases some of the text on the page has not been OCR’ed at all, and in some cases we’ve seen the majority of the text on the page was excluded from the OCR process!

  • More advanced features like Veridian’s crowdsourced text correction rely on the underlying data being as correct as possible. For example, if entire lines of text have been excluded from the page it makes the correction process much more difficult.

  • Ensuring the “textblocks” on the page are correctly encoded in “reading order” is important! For example, even if all the text on the page is perfect (e.g. if users have corrected it) those using a screenreader to access the collection won’t be able to make sense of it if all the text isn’t in the correct order. Likewise the text is of much less value when exported and re-used in other systems if it isn’t in correct reading order.
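The reading-order point can be illustrated with a small sketch as well. It assumes that the order of `TextBlock` elements in the ALTO file is the order in which the text is read out or exported, and it flags any place where that order jumps upwards within the same column. The helper names, the sample page and the "overlap by more than half" column test are hypothetical, and this is a rough heuristic rather than a description of any existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical page where the lower block of column one is listed first.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="B2" HPOS="100" VPOS="900" WIDTH="400" HEIGHT="700"/>
    <TextBlock ID="B1" HPOS="100" VPOS="100" WIDTH="400" HEIGHT="780"/>
    <TextBlock ID="B3" HPOS="520" VPOS="100" WIDTH="400" HEIGHT="1500"/>
  </PrintSpace></Page></Layout>
</alto>"""


def read_blocks(alto_xml):
    """Return each TextBlock's ID, left/right edges and top edge, in file order."""
    root = ET.fromstring(alto_xml)
    blocks = []
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] != "TextBlock":
            continue
        left = float(element.get("HPOS", 0))
        blocks.append({
            "id": element.get("ID", ""),
            "left": left,
            "right": left + float(element.get("WIDTH", 0)),
            "top": float(element.get("VPOS", 0)),
        })
    return blocks


def same_column(a, b):
    """True when two blocks overlap horizontally by more than half the narrower one."""
    overlap = min(a["right"], b["right"]) - max(a["left"], b["left"])
    narrower = min(a["right"] - a["left"], b["right"] - b["left"])
    return narrower > 0 and overlap > 0.5 * narrower


def suspicious_order(alto_xml):
    """Return (earlier_id, later_id) pairs where file order moves up a column."""
    blocks = read_blocks(alto_xml)
    return [
        (previous["id"], current["id"])
        for previous, current in zip(blocks, blocks[1:])
        if same_column(previous, current) and current["top"] < previous["top"]
    ]


print(suspicious_order(SAMPLE))  # prints [('B2', 'B1')]
```

Moving from the bottom of one column to the top of the next is normal, so only upward jumps within a column are reported. Real newspaper layouts are messier than this, so the output is best treated as a list of places worth checking by eye.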

For us, though, the biggest problem with these types of “hidden” errors is their potential to limit what may be possible with your collections in future. As an example, we hope to eventually add a crowd-sourced article segmentation feature to Veridian, to allow page-level newspaper data (like that produced for NDNP) to be converted to article-level data. The greatest challenge in doing that is not the technical implementation, but the relatively poor quality of a lot of the page-level METS/ALTO data we see.

Our short-term goal then is to build a system to help improve the quality of the page-level METS/ALTO being produced, by allowing data owners to visualize the METS/ALTO and see the “hidden errors” for themselves.

Item #3 Improved Veridian Digitization Services
We have now been producing what we’ve been calling “low-cost” METS/ALTO data for several years, for several of our Veridian customers. That data is valid page-level METS/ALTO, but to keep the per-page cost as low as possible it is produced with a completely automated process, so quality is variable.

The next step in the evolution of our digitization services is to improve that existing process (and incorporate Veridian QA) to produce consistent, high-quality METS/ALTO, to NDNP standard.

By adding Veridian QA to the process we will not only produce NDNP data that is accepted by the Library of Congress validation process, but also make sure hidden errors are greatly reduced. The end result is data that we hope is much more suitable for future expansion, such as with crowd-sourced article segmentation.

Again, we plan to keep everyone informed as we make progress on these three key items. If you’d like more information in the meantime, we’re always happy to discuss.
 
Delivering the Veridian Promise
Over the past decade we have dedicated ourselves to building digital newspaper collections, and with your help, have made millions of historic newspapers available online. In addition we have done our best to help build strong user communities behind those collections. Today when we look back we are proud of what we have done. Thank you all for giving us the opportunity to work with you. It has been our pleasure to be a part of your journey.

However, we are not stopping here!! We see the past decade as just a beginning, and a test to help ensure we’re doing things right.

  • Rapid response
  • Transparent communication
  • Quality systems
  • Always striving to improve

With all the above we aim to deliver on our Veridian promise - “Making Digital Newspaper Collections Easy”. We are the system experts and you are the collection masters. Leave the system to us so you can focus on building your collections and your user communities!!

 
Most Popular Articles
Here are our most-read articles from the past few months. These are the articles that others in the field are interested in, so you might find them interesting too.

 
 
 

DL Consulting Ltd., Waikato Innovation Park, 1 Melody Ln, Hamilton East, Hamilton 3216, New Zealand

