CIRCA:Venice Time Machine Project

From CIRCA


Revision as of 23:49, 24 October 2021

What is Venice Time Machine?

The Venice Time Machine is a large international project launched in 2012 by the École Polytechnique Fédérale de Lausanne (EPFL) and the Ca' Foscari University of Venice. It aims to transform the ‘Archivio di Stato’ (80 km of archival records documenting every aspect of 1000 years of Venetian history, including maps, monographs, manuscripts and sheet music) into an open-access digital information bank.

This project has now been extended to dozens of cities in Europe thanks to funding from the European Commission under a larger project called Time Machine.

Its purpose

  • The project aims to trace the circulation of news, money, commercial goods, migration, artistic and architectural patterns amongst others to create a Big Data of the Past.[2] Its fulfillment would represent the largest database ever created on Venetian documents.
  • It promises not only to open up reams of hidden history to scholars, but also to enable researchers to search and cross-reference the information, thanks to advances in machine-learning technologies.
  • If it succeeds, it will pave the way for an even more ambitious project to link similar time machines in Europe’s historic centres of culture and commerce, revealing in unprecedented detail how social networks, trade and knowledge have developed over centuries across the continent. It would serve as a Google and Facebook for generations long past, says Kaplan, who directs the Digital Humanities Laboratory at the Swiss Federal Institute of Technology in Lausanne (EPFL).

Who created it?

  • The Venice Time Machine Project was launched in 2012 by EPFL, the Ca' Foscari University of Venice and the State Archives of Venice.
  • Frederic Kaplan is the project’s director. He currently holds the Digital Humanities Chair at the Swiss Federal Institute of Technology in Lausanne (EPFL) and directs the EPFL Digital Humanities Institute (DHI), which comprises 5 research laboratories. He is also the president of the Time Machine Organisation.
  • The project also includes collaboration with major Venetian patrimonial institutions: the Marciana Library, the Istituto Veneto and the Cini Foundation.
  • The project is currently supported by the READ (Recognition and Enrichment of Archival Documents) European e-Infrastructure project, the SNF project Linked Books and the ANR-SNF project GAWS.
  • The international board includes renowned scholars from Stanford, Columbia, Princeton, and Oxford. In 2014, the Lombard Odier Foundation joined the Venice Time Machine project as a financial partner.

History:

  • 2013:

The Venice Time Machine project officially begins with an initial agreement signed on February 23, 2013, between the École Polytechnique Fédérale de Lausanne (EPFL) and Ca’ Foscari University. For this signing, the Italian Minister of Education and Research, Francesco Profumo, and the Swiss Secretary of State for Education, Research and Innovation, Mauro Dell’Ambrogio, make the trip to Venice, thus highlighting the collaboration within the context of good Switzerland-Italy relations. A joint training program between EPFL and Ca’ Foscari University is set up, taking the form of regular autumn schools: joint activity weeks organized by the partners. The objective of these training courses, attended not only by students from EPFL and Ca’ Foscari University but also by young researchers from several other European and American institutions, is to develop interdisciplinary training around archival material and new technologies.

  • 2014:

The State Archives, Ca’ Foscari University and EPFL sign a first formal collaboration document framing a joint program of actions for the future. The aim of the project is to transform the documentary heritage of the archives into an online information system that is available to the community of researchers and specialists, but also to the general public. The agreement specifies that “the digitization of ancient documents is an essential step for the conservation and enhancement of cultural heritage, two of the fundamental missions of archives,” and that “digital images .. make research possible worldwide allowing thus the creation of ambitious international projects.” The text continues: “For these projects to be carried out, it is important to create a freely accessible database of images of documents associated with related instruments and records of archival descriptions.” Finally, to avoid any ambiguity, the agreement specifies: “In addition to viewing the images, it will be possible to download them in accordance with standards of the Code of Cultural Property and Landscape.” The objective of the project is described as “helping the Venice State Archives to make rapid progress in the digitization of the documents it stores and in making these documents available to the international research community.” It is for this reason that the images will be “distributed globally with an open license.” EPFL provides the scanners, servers, computers and all the necessary equipment for the creation of a first digitization space to be installed within the State Archives of Venice. A pre-study phase in close collaboration with the archivists takes place from June to September 2014 in order to test the performance of the digitization chain, primarily in terms of speed for each category of document considered. The choice of series and the configuration of the teams are to be established on the basis of this preliminary study.
In June 2014, the official inauguration of the digitization center takes place in the presence of Patrick Aebischer (President of EPFL), Carlo Carraro (Rector of Ca’ Foscari), Raffaele Santoro (Director of ASVe) and Thierry Lombard (main sponsor of the project). EPFL hires and trains five Italian specialists to operate the scanners and annotate the digitized documents, as well as a qualified team leader with a paleography and archival background (trained at the Archive’s internal school of archival studies and paleography): Fabio Bortoluzzi. One of the archivists on this team will join the State Archives of Venice a few years later, and Fabio Bortoluzzi himself will go on to become director of the Vicenza State Archives.

  • 2015

The protocol for digitization, metadata creation and annotation is established by the team leader and the archivists of the Venice State Archives on the basis of the results of 2014’s pre-study phase. An estimate of the number of hours is produced for the digitization and description of various document series. On the basis of these estimates, it is decided to carry out a basic description of the registers and to concentrate efforts on research facilitating the automatic extraction of information. In July 2015, a first version of an annotation software is deployed within the State Archives. 2015 is also a period of intensification of the collaboration between the EPFL teams and the other Venetian institutions. Several parallel projects are launched. The “Garzoni” project, a partnership between EPFL, the University of Lille (Valentina Sapienza) and the University of Rouen (Anna Bellavitis) funded by the Swiss National Science Foundation and the French National Research Agency, aims to build an information system for conducting historical research on apprenticeship from the perspectives of the economy, family, gender, art and architecture. It focuses on the “Giustizia Vecchia” collections, which had already been digitized by the University of Lille in partnership with the State Archives, and is coordinated by Maud Ehrmann for EPFL, involving a dozen other researchers. A second project funded by the Swiss National Science Foundation, Linked Books, begins on September 1, 2015. The project explores the “history of history” of Venice using new algorithmic approaches, based on networks of citations and full-text analyses of publications. The project is coordinated by Giovanni Colavizza and Matteo Romanello and concerns a corpus of more than 2,000 monographs and 5,000 newspaper articles published over the past 200 years and dealing with all aspects of Venetian history.
For this project, several specific contracts are established to supervise the digitization of the necessary collections of secondary sources, notably with the Marciana Library, the Istituto Veneto and the Ca’ Foscari University Library. Finally, also in 2015, EPFL and the Giorgio Cini Foundation sign an agreement for the launch of the Replica project, coordinated by Isabella di Lenardo, which aims to digitize the foundation’s photo library (one million images) and to build a search engine for finding morphological patterns. The agreement specifies that the digital photo library and the search engine will be open access. A new type of scanner is developed by Adam Lowe’s team at Factum Arte. It is designed as a rotary table that moves continuously during a scanning session, simultaneously photographing both sides of each document and automatically uploading the images to a computer. The project will also give rise to a doctoral thesis by Benoit Seguin, who will propose a new way to train neural networks using deep learning to detect recurrences of patterns on media as diverse as drawings, paintings, engravings or photographs.

  • 2016

EPFL begins its participation in the READ project in January 2016 to accelerate progress in handwriting recognition. Venice Time Machine is one of the large-scale demonstrators of the project. On the EPFL campus in Lausanne, a new building designed by Kengo Kuma, Artlab, is inaugurated in November. A permanent exhibition on the Venice Time Machine is presented in the “Datasquare” pavilion. Director Raffaele Santoro is interviewed several times and his explanations are presented on the pavilion screens, along with other testimonials from historians and researchers working on the project.

  • 2017

In 2017, EPFL makes sharing images over the network more effective by creating a first version of the Time Machine Box. It is a server installed at the scanning location, that is to say directly at the archives, on which all the scanned documents and their metadata are hosted and easily accessible via the IIIF protocol, which defines international standards for image exchange. The Time Machine Box is not a simple storage space: it allows any research organization to analyze the hosted images using document segmentation or handwriting recognition algorithms, provided these are compatible with the IIIF standard. In October of the same year, EPFL, Ca’ Foscari University, the State Archives of Venice and the Giorgio Cini Foundation publish a joint press release, which gives rise to several articles publicly announcing the first results of the project and of the digitization campaign, including 190,000 digitizations of archival documents, 720,000 photographic documents and 3,000 books covering 200 years of Venetian historiography, making a total of more than 2 million digitized images. On this basis, 160,000 manual transcriptions of names, locations and keywords were performed by archivists. A search engine using a handwriting recognition system based on these annotations is announced. To mark this occasion, Michele Bugliesi, the Rector of Ca’ Foscari University, declares: “The digitization of archival holdings opens up new avenues for the study and understanding of the history of the cultural evolution of past and contemporary civilizations. With this project, Venice is at the forefront of Europe, demonstrating the enormous potential that digital technologies offer for the enhancement of cultural heritage and their ability to develop research methods in the fields of history, art history and more generally for research in the humanities and socio-economic sciences.”
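
To illustrate what access via the IIIF protocol looks like in practice, the following is a minimal sketch of how a client could build request URLs following the IIIF Image API URI pattern (identifier/region/size/rotation/quality.format). The server address and the document identifier are hypothetical placeholders, not the actual Time Machine Box endpoints, and the size keyword "max" assumes a server implementing version 2.1 or later of the Image API.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an image request URL following the IIIF Image API pattern:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers must be percent-encoded, including any "/" they contain.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


def iiif_info_url(base_url, identifier):
    """Build the URL of the info.json document describing an image."""
    encoded_id = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded_id}/info.json"


if __name__ == "__main__":
    # Hypothetical server and identifier, for illustration only.
    base = "https://box.example.org/iiif"
    page = "archive/register-12/page-0042"

    # Full image at the largest size the server offers.
    print(iiif_image_url(base, page))
    # A 1024x1024 pixel region from the top-left corner, scaled to 512 px wide.
    print(iiif_image_url(base, page, region="0,0,1024,1024", size="512,"))
    # Technical metadata (dimensions, tiles, supported features).
    print(iiif_info_url(base, page))
```

Because a region and a size can be requested directly in the URL, a segmentation or handwriting recognition tool can fetch only the part of a page it needs rather than downloading the full scan.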

  • 2018

In June 2018, a joint research center established between the Cini Foundation, Factum Arte and the DHLAB of EPFL is inaugurated. The center is named ARCHiVe – Analysis and Recording of Cultural Heritage in Venice and is funded by the Helen Hamlyn Trust. As planned, an automatic handwriting recognition system is developed from the annotated Venetian documents within the framework of the European READ project. The results obtained by researcher Sofia Ares Oliveira at EPFL are very encouraging: the recognition performance of this system exceeds the reading skills of an Italian person without archival training. The system is presented for the first time in Mexico City at the Digital Humanities 2018 conference. The same summer, a generic document segmentation system (dhSegment), initially developed to solve the segmentation problem of the Replica project, is also made available open-source. In just a few months, this free and open system will be used by dozens of archives around the world, including the National Archives in Paris. The search engine, announced in 2017, combining text search, visual search and geo-historical navigation to allow efficient access to the sources of the Venice State Archive and the Cini Foundation, is unveiled to the public during the Time Machine 2018 conference. Indeed, EPFL and Ca’ Foscari University become founding members of a project submitted to the European Commission for the establishment of a “European Time Machine”, along with 31 other European institutions. Thanks to the extraction methodologies and open technologies developed, the Venetian model can now be exported as a generic format to understand the past of European cities. A large exhibition at the Venice Biennale of Architecture presents the project in the Padiglione Venezia.

  • 2019

The pan-European Horizon 2020 Time Machine Coordination and Support Action is funded by the European Commission. The number of supporting partners continues to grow and reaches more than 400 institutions, confronting Europe’s challenge to build an open database of information that has thus far been segmented into silos. The Venice Time Machine now becomes one among 20 other Local Time Machines. EPFL wins the Parcels of Venice project to continue research on computational methods for extracting information from cadastral sources.

  • 2020

EPFL publishes results based on newly collected and digitized daily death records, or necrologies, from the city’s Patriarchal Archives in the open-access Nature Research journal, Scientific Reports. The article uses data science techniques to analyze the spread of the bubonic plague, which is caused by the bacterium Yersinia pestis, in Venice between 1630 and 1631. The team observed that the deaths appeared to follow a novel pattern: a first peak in 1630 that reached over 400 deaths per day at its worst, followed by a less acute, but longer-lasting, peak in 1631. They note that this is the first description of such a “long tail of high mortality” in the literature on the subject.

Technology

The State Archives of Venice contains a massive amount of hand-written documentation in languages evolving from medieval times to the 20th century. An estimated 80 km of shelves are filled with over a thousand years of administrative documents, from birth registrations, death certificates and tax statements, all the way to maps and urban planning designs. These documents are often very delicate and are occasionally in a fragile state of conservation. The diversity, amount and accuracy of the Venetian administrative documents are unique in Western history. By combining this mass of information, it is possible to reconstruct large segments of the city's past: complete biographies, political dynamics, or even the appearance of buildings and entire neighborhoods.

Scanning

Paper documents are turned into high-resolution digital images with the help of scanning machines. Different types of documents impose various constraints on the type of scanning machines that can be used and on the speed at which a document can be scanned. In partnership with industry, EPFL is working on a semi-automatic, robotic scanning unit capable of digitizing about 1000 pages per hour. Multiple units of this kind will be built to create an efficient digitization pipeline adapted to ancient documents. Another solution currently being explored at EPFL involves scanning books without turning the pages at all. This technique uses X-ray synchrotron radiation produced by a particle accelerator. Video: https://youtu.be/XwwuhCd-CqM

Transcription - Information extraction

The automatic reading of old handwritten manuscripts is a major challenge. Standard character-recognition software allows printed books to be read letter by letter despite variations in fonts, and thus rendered searchable. But this doesn’t work for handwritten manuscripts, where the shapes of individual letters can vary enormously between scribes and can evolve over time. Various approaches to solving the problem are being developed in a European Union collaboration called Recognition and Enrichment of Archival Documents (READ), using machine learning to recognize the shapes of whole words. The algorithms transform images into probable words. The images are automatically broken down into sub-images that potentially represent words. Each sub-image is compared to other sub-images and classified according to the shape of the word and its features. Each time a new word is transcribed, it allows millions of other occurrences of that word to be recognized in the database.
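
The idea that one manual transcription can be propagated to visually similar word images can be sketched as a nearest-neighbour comparison between feature vectors. This is an illustrative simplification, not the READ system: the feature vectors, the cosine similarity measure, the threshold value and all names below are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_transcriptions(labelled, unlabelled, threshold=0.95):
    """Assign a word to each unlabelled sub-image whose shape features are
    close enough to an already transcribed sub-image.

    labelled:   {image_id: (feature_vector, transcribed_word)}
    unlabelled: {image_id: feature_vector}
    Returns {image_id: word} for the sub-images that found a match.
    """
    result = {}
    for image_id, features in unlabelled.items():
        best_word, best_score = None, threshold
        for known_features, word in labelled.values():
            score = cosine(features, known_features)
            if score >= best_score:
                best_word, best_score = word, score
        if best_word is not None:
            result[image_id] = best_word
    return result


if __name__ == "__main__":
    # Toy 3-dimensional features; a real system would use learned descriptors.
    labelled = {"img-001": ([1.0, 0.0, 0.2], "Venezia")}
    unlabelled = {
        "img-002": [0.9, 0.05, 0.2],   # similar shape, inherits the word
        "img-003": [0.0, 1.0, 0.0],    # different shape, left untranscribed
    }
    print(propagate_transcriptions(labelled, unlabelled))
```

Under these assumptions, every newly transcribed word enlarges the labelled set, so the number of sub-images that can be matched automatically grows with each manual annotation.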

Connecting data

The real wealth of the Venetian archives lies in the connectedness of its documentation. The information extracted from these diverse sources is organized in a semantic graph of linked data and unfolded in space and time as part of a historical-geographical information system, based on high-resolution scanning of the city itself. Algorithms find recurring patterns in handwritten documentation and maps, but also in paintings and musical scores, extracting information about people, places and artworks. The information items extracted from the documents are interwoven and linked together into giant graphs. Combining this mass of information makes it possible to reconstruct large segments of the city’s past and allows new aspects of information to emerge.
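
A semantic graph of linked data is commonly represented as subject-predicate-object triples. The sketch below shows, under that assumption, how facts extracted from different documents could be connected and then queried from either end. The class, the predicate names and the sample entities are invented for illustration and do not reflect the project's actual data model.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory graph of (subject, predicate, object) statements."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one statement and index it in both directions."""
        self._by_subject[subject].add((predicate, obj))
        self._by_object[obj].add((subject, predicate))

    def about(self, subject):
        """All (predicate, object) pairs stated about a subject."""
        return sorted(self._by_subject.get(subject, ()))

    def linked_to(self, obj):
        """All (subject, predicate) pairs pointing at an object."""
        return sorted(self._by_object.get(obj, ()))


if __name__ == "__main__":
    graph = TripleStore()
    # Invented sample statements, as if extracted from two different sources.
    graph.add("person:example-merchant", "mentioned_in", "doc:tax-register-A")
    graph.add("person:example-merchant", "lived_at", "place:example-parish")
    graph.add("person:example-apprentice", "mentioned_in", "doc:tax-register-A")
    graph.add("doc:tax-register-A", "dated", "year:1740")

    # Everything known about one person, across sources.
    print(graph.about("person:example-merchant"))
    # Everyone connected to the same document.
    print(graph.linked_to("doc:tax-register-A"))
```

Indexing each statement by both subject and object is what lets a query start from a person, a place or a document and follow links in either direction.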

4D modelling

4D multiscale geohistorical simulator and procedural methods for reconstructing possible pasts compatible with digitized sources.

Phase

The efforts are organized in several consecutive phases of increasing scale:

  • Phase I of the project (2012-2019) included major Venetian patrimonial institutions: the State Archive in Venice, the Marciana Library, the Istituto Veneto and the Giorgio Cini Foundation. The project was supported by the READ European eInfrastructure project, the SNF project Linked Books and the ANR-SNF project GAWS. The international board of the project includes scholars from Princeton, Stanford, Columbia and London Universities. Three hundred researchers and students from different disciplines (Basic Sciences, Engineering, Computer Science, Architecture, History and History of Art) have already contributed to the programme. A doctoral school was organized every year in Venice, and several bachelor and master courses already use the data produced in the context of the project.
  • Phase II of the project (2020-2028) will focus on developing the Venice Mirror World, a 4D model of Venice overlaid on the city itself, directly connecting information about its past with those who have to decide its future.

Venice is just a starting point. The Venice Time Machine has applied, with partners around Europe, to become one of the next billion-euro flagship programmes funded by the European Union. If it wins, it will create time machines in other cities with similarly important archives, and link them together.

Suspended

In September 2019, the project was suspended because of an oversight in the original 2014 agreement.

Penzo Doria, the current director of the State Archive of Venice, claims that "these files are useless" from an archival point of view because the digitization work did not conform to the archival guidelines established by the InterPARES (International Research on Permanent Authentic Records) project. These guidelines require careful recording of information that confirms the origin of each document, and require that this information be stored in the metadata that comes with each file. This serves as a kind of electronic signature that ensures the long-term retention and validation of a digital file. According to Penzo Doria, the EPFL researchers who did the scans did not document how they gathered such information or, if they did, they did not share this documentation with staff in the archive.

Kaplan says the researchers have collected metadata. However, their methodology was based on a different set of rules – the ISAD (International Standard Archival Description) guidelines of the International Council on Archives. He says that the EPFL researchers have followed the procedures established by the staff of the State Archives. Kaplan also said that he provided metadata documentation to the predecessor of Penzo Doria, Giovanna Giubbini, in February 2019 in an e-mail. Penzo Doria and Giubbini reported to Nature that they had never received these documents.

In the meantime, the fate of the 8 terabytes of digital files collected from around 190,000 documents over the last 5 years is unclear. No update has been published since the suspension.
