Data Lakes are very large collections of information that are unorganized, uncategorized, and unclassified, and to which no value has yet been assigned.
Data Farms use predictive analytics and other methods to create data in between known, observed data points.
Data warehouse technologies allow organizations to incorporate data from databases, data warehouses, data lakes, or other environments from across the internet. The combined data is collectively called big data.
Database Management Systems, or DBMSs, are a category of applications and programs. Each typically has a core called the database engine, which handles all data creation, storage, retrieval, update, and deletion activities. The DBMS engine also provides the interface to the host OS for system services, and provides a client-facing command interface, typically a command line interpreter for direct user interaction.
A DBMS implements one specific type of database model. Each model presents different security and integrity challenges.
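As a small illustration of an engine handling creation, retrieval, update, and deletion, here is a minimal sketch using Python's built-in sqlite3 module, which embeds a relational engine. The table name, column names, and sample values are hypothetical and chosen only for this example.

```python
import sqlite3

# In-memory database; "member", "name", and "plan" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT, plan TEXT)")

# Create: the engine stores a new record.
conn.execute("INSERT INTO member (name, plan) VALUES (?, ?)", ("Ada", "basic"))

# Retrieve: the engine locates and returns the record.
row = conn.execute(
    "SELECT name, plan FROM member WHERE name = ?", ("Ada",)
).fetchone()

# Update: the engine changes a stored value in place.
conn.execute("UPDATE member SET plan = ? WHERE name = ?", ("premium", "Ada"))
updated = conn.execute(
    "SELECT plan FROM member WHERE name = ?", ("Ada",)
).fetchone()

# Delete: the engine removes the record.
conn.execute("DELETE FROM member WHERE name = ?", ("Ada",))
remaining = conn.execute("SELECT COUNT(*) FROM member").fetchone()[0]

conn.close()
```

The parameter placeholders (?) keep user-supplied values separate from the SQL text, which is one of the integrity and security concerns that differ from model to model.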
Network Database Management Model
The network database management model, also called the network architecture model, represents data as a network of records, or sets of records, that are related to each other. These sets form a network of linkages. “Network” does not refer to network topologies or network terminology found in Domain 4, but rather to the “network” of data and record linkages that can be associated as part of the architecture. Network databases organize data elements (or records) in sets of linked lists.
The network model is also known as the CODASYL model, after the Conference on Data Systems Languages, the consortium that was formed in 1959 and also led to the COBOL programming language.
Records are sets of related data values that store the name of the record type, the attributes associated with it, and the format for these attributes. For example, a website “member” record type could contain the name, address, and member’s account number or username related to the member. Record types are sets of records of the same type. These are the equivalent of tables in the relational model. Set types are the relationships between two record types, such as a member’s membership plan and the courses they are enrolled in (or services ordered, etc.).
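The record and set type ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Record class, the set names ("has_plan", "enrolled_in"), and the sample values are hypothetical, and a plain Python list stands in for the linked list a real network DBMS would maintain.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_type: str   # name of the record type, e.g. "member"
    attributes: dict   # attribute names and values for this record
    # Each named set links this owner record to an ordered list of member records.
    sets: dict = field(default_factory=dict)

    def connect(self, set_name, other):
        """Add another record to one of this record's sets."""
        self.sets.setdefault(set_name, []).append(other)

    def members(self, set_name):
        """Follow the links of a set from the owner to its member records."""
        return self.sets.get(set_name, [])


# Hypothetical records for a website member, a plan, and two courses.
member = Record("member", {"name": "Ada", "username": "ada01"})
plan = Record("plan", {"title": "Premium"})
course_a = Record("course", {"title": "Cryptography"})
course_b = Record("course", {"title": "Networking"})

# Set types: relationships between two record types.
member.connect("has_plan", plan)
member.connect("enrolled_in", course_a)
member.connect("enrolled_in", course_b)

# Navigation follows the links rather than joining tables.
titles = [c.attributes["title"] for c in member.members("enrolled_in")]
```

Retrieval here is navigational: the application starts at an owner record and walks the set to reach related records.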
Network databases are considered an improvement over the hierarchical database model, but not over the relational model, which is more flexible.
The network model has two powerful applications:
1. High-performance, high-volume storage management, or large-scale parallel processing, which enables cloud data architectures, search engines, and other applications to use network database architectures to meet design and performance needs. Parallel processing divides a large task into many smaller tasks and executes them concurrently on multiple nodes, so the large task completes more quickly. A “node” in this context is a separate processor, typically a separate machine.
2. Graph databases, which use network database architectures to capture complex patterns of meaningful connections or associations between data elements of disparate types. One example is Neo4j, a graph database often used for insider threat detection and anti-money laundering investigations. Graph databases also play an important role in COVID-19 contact tracing systems.
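The parallel processing idea in the first application above (split a large task, run the pieces concurrently, combine the results) can be sketched as follows. This is a minimal sketch, assuming worker threads on one machine stand in for separate nodes; the helper names split and count_matches and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Divide a list into at most `parts` chunks of near-equal size."""
    size = -(-len(items) // parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_matches(chunk):
    """The smaller task: count values in one chunk that are divisible by 3."""
    return sum(1 for value in chunk if value % 3 == 0)


data = list(range(1, 1001))   # the large task operates on all of this
chunks = split(data, 4)       # four smaller tasks

# Worker threads stand in for separate nodes in this sketch.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_counts = list(pool.map(count_matches, chunks))

# Combine the partial results into the answer for the large task.
total = sum(partial_counts)
```

In a real deployment the chunks would be sent to separate machines, and the speedup would depend on how evenly the work divides and on the cost of moving data between nodes.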
Non-relational Databases (NoSQL)
Non-relational models are commonly referred to as NoSQL. These are logical data models that don’t follow relational algebra for the storage and manipulation of data. Unstructured data types, such as text, image, video, and relationship data, are increasing in popularity, volume, and prominence. While modern relational databases may support these data types, their ability to analyze, index, and process them is limited and relies on nonstandard SQL extensions. The need to analyze unstructured or semi-structured data has been around for many years. However, there is now strong demand for extracting value from unstructured and related data, and strong interest in the engineering methods that can potentially handle such data more efficiently.
Big data engineering refers to the ways in which data is stored in records. In some cases, the records still follow the concept of a table structure. One storage method uses a key-value structure, in which a record consists of a key and a string of data held together in the value. The data is retrieved through the key, and the non-relational database software handles accessing the data in the value. You can view it as a simplified relational database table with a single index field and a single column. A variation on this is the document store, where the document has value fields, any of which can be used as the index or key. The difference from the relational table model is that the documents in a set don’t need to have all the same value fields.
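The key-value and document store structures described above can be sketched with plain Python dictionaries. This is an illustration only, not how any particular NoSQL product works; the key format, field names, and the helpers find and build_index are hypothetical.

```python
import json

# Key-value store: one key, one opaque string of data in the value.
kv_store = {}
kv_store["member:1001"] = json.dumps({"name": "Ada", "plan": "premium"})

# Retrieval is by key only; interpreting the value is a separate step.
raw = kv_store["member:1001"]
decoded = json.loads(raw)

# Document store: documents in the same set need not share the same fields.
documents = [
    {"_id": 1, "name": "Ada", "plan": "premium"},
    {"_id": 2, "name": "Grace", "courses": ["Cryptography"]},
]


def find(docs, field_name, value):
    """Look up documents by any field; documents lacking the field are skipped."""
    return [d for d in docs if d.get(field_name) == value]


def build_index(docs, field_name):
    """Build an index on any field, mapping each value to matching document ids."""
    index = {}
    for d in docs:
        if field_name in d:
            index.setdefault(d[field_name], []).append(d["_id"])
    return index


by_plan = build_index(documents, "plan")
grace = find(documents, "name", "Grace")
```

Note that the second document has no "plan" field and simply does not appear in the index on that field, which a relational table with a fixed set of columns would not allow without a null value.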
The graphical model is another newer type of big data record storage. A graphical model represents the relationships between data elements. Data elements are called nodes (so many contexts for this term!), and each relationship is represented as a link between nodes. Graph storage models represent data elements as a series of subject, predicate, and object triples. Often, the available types of objects and relationships can be described through ontologies, as discussed earlier.
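A minimal sketch of subject, predicate, and object triples follows, assuming an in-memory list in place of a real graph store. The node names, the predicate names, and the match helper are hypothetical and serve only to show the shape of the data.

```python
# Each triple is (subject, predicate, object): two nodes joined by a named link.
triples = [
    ("alice", "manages", "bob"),
    ("bob", "transferred_funds_to", "acct_42"),
    ("carol", "transferred_funds_to", "acct_42"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Which nodes are linked to acct_42 by a funds transfer?
senders = sorted(
    s for s, _, _ in match(triples, predicate="transferred_funds_to", obj="acct_42")
)
```

Queries of this pattern-matching kind are what make graph storage useful for questions about connections, such as finding every party linked to the same account.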
Complexity between the data elements is another concept to be aware of. There are systems where some data elements cannot be analyzed unless there is context with other data elements. The term “complexity” is frequently attributed to big data, but it actually refers to an interrelationship between data elements, or across records, and is independent of whether or not the dataset has characteristics of big data.