Why use the Universal File Interface?
One of the main objections we hear from technologists who avoid using databases to manage their data is that their data files are very large and too cumbersome, slow, and costly to load into a database.
Database management systems typically have built-in mechanisms for loading flat text files, but the sorts of files used by technologists are different: they are often structured and binary. Examples include HDF5/BioHDF, NetCDF, GRIB, NITFS, and FITS. Because these files are structured and binary, they must be read through special software interfaces, so loading them into a database means writing custom code against those often complex interfaces.
The size of files worked on by technologists is growing exponentially. It is not uncommon, for example, for next-generation DNA sequencing instruments and LOFAR astronomy instruments to generate terabytes of data in a single day. Loading such volumes into a traditional database is not an option!
Technologists rarely want to analyze a single file in isolation; rather, they may wish to compare different files or analyze file data in the context of other information, such as information stored in a database. Carrying out such analyses often requires writing complex code to retrieve and merge information.
At BCS we have developed a solution to these challenges, which we call the Universal File Interface (UFI). UFI allows the contents of files to be queried just as if they were stored in database tables, while avoiding database load times and large space requirements. File data can still be queried alongside, and joined with, actual database table data.
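In UFI this join happens inside the Informix engine through ordinary SQL. The sketch below only illustrates the underlying idea in plain Python, under stated assumptions: an in-memory `sqlite3` database stands in for the DBMS, a small CSV string stands in for a scientific data file, and the names `scan_file` and `join_file_with_db` are hypothetical, not part of UFI. File rows are streamed one at a time and matched against database rows (a simple hash join), so the file is never loaded into a table.

```python
import csv
import io
import sqlite3
from typing import Iterator

# Hypothetical stand-in for a data file; a real adapter would stream from disk.
FILE_TEXT = """sample_id,reads
S1,1200
S2,875
S3,40
"""

def scan_file(text: str) -> Iterator[dict]:
    """Yield one row at a time from the file, never copying it into a table."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"sample_id": row["sample_id"], "reads": int(row["reads"])}

def join_file_with_db(conn: sqlite3.Connection, text: str) -> list[tuple]:
    # Build an in-memory lookup from the database table (assumed small here).
    meta = dict(conn.execute("SELECT sample_id, organism FROM samples"))
    results = []
    for row in scan_file(text):
        organism = meta.get(row["sample_id"])
        if organism is not None:  # inner join: drop file rows with no match
            results.append((row["sample_id"], organism, row["reads"]))
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, organism TEXT)")
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [("S1", "E. coli"), ("S2", "yeast")])
print(join_file_with_db(conn, FILE_TEXT))
```

Sample `S3` exists only in the file, so the inner join leaves it out; the database side is read into a dictionary because it is assumed to be the smaller input.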
How does UFI work?
UFI is based on the IBM Informix Virtual Table Interface (VTI), a technology that makes external data sets appear as tables to SQL queries and statements. The UFI server communicates with adapter programs, each specific to one file structure. The following diagram illustrates how UFI, VTI, and the adapter programs work together.
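The division of labor between a server and per-format adapters can be sketched as follows. This is a hypothetical illustration, not the UFI or VTI API: the `Adapter` contract, the `ADAPTERS` registry, and `VirtualTable` are invented names, and in UFI the adapters are separate programs the UFI server talks to, whereas here they are plain classes in one process for brevity. Each adapter knows one file format and only has to describe its columns and stream its rows; the server side picks an adapter by file type and presents the rows as a table.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Iterator

class Adapter(ABC):
    """Hypothetical adapter contract: describe the columns, then stream rows."""

    @abstractmethod
    def columns(self) -> list[str]: ...

    @abstractmethod
    def rows(self) -> Iterator[tuple]: ...

class CsvAdapter(Adapter):
    """Adapter for one file structure (CSV text held in memory here)."""

    def __init__(self, text: str):
        self._text = text

    def columns(self) -> list[str]:
        return next(csv.reader(io.StringIO(self._text)))

    def rows(self) -> Iterator[tuple]:
        reader = csv.reader(io.StringIO(self._text))
        next(reader)  # skip the header row
        for record in reader:
            yield tuple(record)

# "Server" side: map each file type to the adapter that understands it.
ADAPTERS: dict[str, type[Adapter]] = {"csv": CsvAdapter}

class VirtualTable:
    """Presents whatever an adapter returns as a table-like object."""

    def __init__(self, file_type: str, payload: str):
        try:
            self._adapter = ADAPTERS[file_type](payload)
        except KeyError:
            raise ValueError(f"no adapter registered for {file_type!r}") from None

    def scan(self) -> Iterator[dict]:
        # Loosely analogous to a VTI table scan: rows are fetched one at a
        # time from the adapter rather than loaded up front.
        names = self._adapter.columns()
        for values in self._adapter.rows():
            yield dict(zip(names, values))

table = VirtualTable("csv", "station,temp\nA,12.5\nB,9.0\n")
for row in table.scan():
    print(row)
```

Supporting a new format in this sketch means writing one more `Adapter` subclass and registering it; the table-facing code does not change.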
But what if I don't have an Informix DBMS?
In 2010 IBM released the Innovator-C edition, a free version of its easy-to-administer Informix product. This edition limits the number of cores and the amount of memory available to the Informix engine, but because much of UFI's work happens outside the database server (i.e., in the adapter programs), these limits are not really an issue.
But is it fast?
File formats such as HDF5 and NetCDF are often used to store large multidimensional arrays that support direct access. UFI can take advantage of such structures and translate SQL WHERE clauses into efficient direct accesses. In addition, UFI includes facilities for building indexes on file data.
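How UFI performs this translation internally is not described here, so the following is only a sketch of the general idea, assuming a 2-D array stored contiguously in row-major order (as an HDF5 or NetCDF variable may be laid out). A predicate on the array's dimensions, such as `WHERE row BETWEEN 1 AND 2 AND col = 2`, becomes offset arithmetic that reads only the matching cells, instead of a scan that tests every cell. A sorted `(value, offset)` index answers value predicates with binary search. All names (`DATA`, `full_scan`, `direct_access`, `index_range`) are hypothetical.

```python
from bisect import bisect_left, bisect_right

NROWS, NCOLS = 4, 3
# Flat, row-major buffer standing in for a 4x3 array stored in a file.
DATA = [float(10 * r + c) for r in range(NROWS) for c in range(NCOLS)]

def full_scan(row_lo: int, row_hi: int, col: int) -> list[float]:
    """Naive plan: visit every cell and test the predicate."""
    out = []
    for offset, value in enumerate(DATA):
        r, c = divmod(offset, NCOLS)
        if row_lo <= r <= row_hi and c == col:
            out.append(value)
    return out

def direct_access(row_lo: int, row_hi: int, col: int) -> list[float]:
    """Translated plan for WHERE row BETWEEN row_lo AND row_hi AND col = col:
    compute each matching cell's offset and read only those cells."""
    if not 0 <= col < NCOLS:
        return []
    row_lo, row_hi = max(row_lo, 0), min(row_hi, NROWS - 1)
    return [DATA[r * NCOLS + col] for r in range(row_lo, row_hi + 1)]

# A sorted (value, offset) index supports predicates on the cell values,
# e.g. WHERE value BETWEEN lo AND hi, without scanning every cell.
INDEX = sorted((v, i) for i, v in enumerate(DATA))
KEYS = [v for v, _ in INDEX]

def index_range(lo: float, hi: float) -> list[int]:
    """Return offsets of cells whose value lies in [lo, hi]."""
    start, stop = bisect_left(KEYS, lo), bisect_right(KEYS, hi)
    return [offset for _, offset in INDEX[start:stop]]

print(direct_access(1, 2, 2))  # [12.0, 22.0]
print(index_range(11, 21))     # [4, 5, 6, 7]
```

Both plans return the same cells, but `direct_access` touches two cells where `full_scan` touches all twelve; on a terabyte-scale file that difference dominates query time.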
What if my other data is in a non-Informix DBMS?
UFI virtual tables behave like normal Informix tables, meaning that products such as Oracle Bridge can be used to query these "tables" alongside Oracle tables.
What file types does UFI work with?
We have written adapters for HDF5, NetCDF, CSV, GDAL, and DBF. In addition, we have developed an easy-to-use UFI Adapter SDK with which adapters for other file formats can be written.
To find out how to use UFI with your application, contact BCS.