Friday, April 12, 2019

Tutorial 07 – Data Persistence

PROGRAMMING APPLICATIONS AND FRAMEWORKS
Tutorial 07

01. Discuss the role of data in information systems indicating the need for data persistence 

An information system (IS) is a set of components that work together to manage data processing and storage. Its role is to support the key aspects of running an organization, such as communication, record-keeping, decision making, data analysis and more. Companies use this information to improve their business operations, make strategic decisions and gain a competitive edge.

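All information systems require data as input in order to perform organizational activities. Data, as described by Stair and Reynolds (2006), is made up of raw facts such as employee information, wages, hours worked, barcode numbers, tracking numbers, or sale numbers. The scope of data collected depends on what information the organization needs to extract from it. Because this data must remain available after the program that created or processed it has stopped (across restarts, power failures, and later reporting periods), it has to be persisted. In other words, it must be written to non-volatile storage such as files or databases rather than kept only in memory.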
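Here is a minimal sketch of what persistence means in practice. The Java code writes a few employee records (the kind of raw facts mentioned above) to a plain CSV file and reads them back, as a later run of the program would. The `Employee` record, its fields, and the file name are hypothetical. The sketch assumes Java 16+ for records, and its naive comma splitting would break on names that contain commas.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class PersistenceDemo {
    // Hypothetical raw facts about an employee (name, hourly wage, hours worked)
    public record Employee(String name, double wage, int hoursWorked) {}

    // Persist: write one comma-separated line per employee to non-volatile storage
    public static void save(List<Employee> employees, Path file) {
        List<String> lines = employees.stream()
                .map(e -> e.name() + "," + e.wage() + "," + e.hoursWorked())
                .collect(Collectors.toList());
        try {
            Files.write(file, lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restore: rebuild the objects from the file, as a later run of the program would
    public static List<Employee> load(Path file) {
        try {
            return Files.readAllLines(file).stream()
                    .map(line -> line.split(","))
                    .map(p -> new Employee(p[0], Double.parseDouble(p[1]), Integer.parseInt(p[2])))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Path.of(System.getProperty("java.io.tmpdir"), "employees.csv");
        save(List.of(new Employee("Nimal", 12.5, 40), new Employee("Kamala", 15.0, 35)), file);
        // The data now outlives the in-memory list and can be loaded again later
        System.out.println(load(file));
    }
}
```

A flat file is the simplest persistent store. The database options discussed below add querying, concurrency, and transactional guarantees on top of this basic idea.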


02. Explain the terms: Data, Database, Database Server, and Database Management System 

  • What is Data?

In simple words, data is facts related to any object under consideration. For example, your name, age, height, and weight are data related to you. A picture, an image, a file, or a PDF can also be considered data.

  • What is a Database?

A database is a systematic collection of data. Databases support the storage and manipulation of data, and they make data management easy. Let's discuss a few examples.
An online telephone directory would use a database to store data about people, phone numbers, other contact details, etc.
Your electricity service provider uses a database to manage billing, client-related issues, fault data, etc.
Let's also consider Facebook. It needs to store, manipulate, and present data related to members, their friends, member activities, messages, advertisements, and a lot more.
We could provide countless examples of database usage.

  • What does Database Server mean?

The term database server may refer to both hardware and software used to run a database, according to the context. As software, a database server is the back-end portion of a database application, following the traditional client-server model. This back-end portion is sometimes called the instance. It may also refer to the physical computer used to host the database. When mentioned in this context, the database server is typically a dedicated higher-end computer that hosts the database.
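Note that the database server is independent of the database architecture. Relational databases, flat files, and non-relational databases can all be accommodated on database servers.

  • What is a Database Management System?

A Database Management System (DBMS) is a collection of programs that enables its users to access a database, manipulate its data, and report on or present that data. MySQL, Oracle Database, and Microsoft SQL Server are examples of DBMSs.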


03. Compare Files and Databases, discussing pros and cons of them 

File System
A file is a collection of related records stored on a storage medium such as a hard disk or optical disc.
Let’s see some pros and cons involved in saving files in the file system.

Pros of the File System

  • Performance can be better than when files are stored in a database. If you store large files in a database, a simple query to retrieve the list of files or file names may also load the file contents if you use SELECT * in the query, which slows it down. In a file system, accessing a file is simple and lightweight.
  • Saving files to the file system and downloading them from it is much simpler than with a database, since a simple "Save As" function is enough. Downloading can be done by addressing a URL that points to the location of the saved file.
  • Migrating the data is easy. You can copy the folder to the desired destination, as long as write permissions are granted on the destination.
  • In most cases it is more cost-effective to expand your web server than to pay for certain databases.
  • It is easy to migrate the files to cloud storage (e.g. Amazon S3) or CDNs in the future.

Cons of the File System


  • Loosely packed. There are no ACID (Atomicity, Consistency, Isolation, Durability) guarantees tying a file to the database record that refers to it. Consider a scenario in which your files are deleted from their location, either manually or by an attacker. The record still points to the file, and you might not know whether the file exists. Painful, right?
  • Low security. Because the files have to be saved in a folder with write permissions, they are prone to security issues and invite trouble, such as hacking. It is best to avoid saving files in the file system if you cannot afford to compromise on security.
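The "loosely packed" problem means the application has to check for orphaned references itself. The Java sketch below assumes a hypothetical setup in which each database record stores only a file path. It reports which records point to files that no longer exist, something a database would never let happen to data stored inside it. The record IDs and file names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OrphanCheck {
    // Returns the IDs of records whose file is no longer on disk (sorted for stable output)
    public static List<String> findMissing(Map<String, Path> storedPaths) {
        return storedPaths.entrySet().stream()
                .filter(entry -> !Files.exists(entry.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // A file that really exists on disk
        Path kept = Files.createTempFile("report-1", ".pdf");
        kept.toFile().deleteOnExit();
        // Hypothetical records: each database row keeps only the path, not the file itself
        Map<String, Path> storedPaths = Map.of(
                "report-1", kept,
                "report-2", kept.resolveSibling("deleted-report-2.pdf")); // assumed deleted by someone
        System.out.println("Missing files for: " + findMissing(storedPaths));
    }
}
```

Nothing in the file system keeps the path column and the file in step. A scan like this can only detect the damage after the fact.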

Database

A database is a collection of data organized in a manner that allows access, retrieval, and use of that data. Let’s see some pros and cons involved in saving files in the database.

Pros of Database

  • ACID consistency, including rollback of an update, which is complicated when files are stored outside the database.
  • Files stay in sync with the database and cannot be orphaned, which gives you the upper hand in tracking transactions.
  • Backups automatically include the file binaries.
  • It's more secure than saving in a file system.

Cons of Database

  • You may have to convert the files to BLOBs (binary large objects) in order to store them in the database.
  • Database backups will be larger and heavier.
  • Memory use is inefficient. RDBMSs are often RAM-driven, so all data has to pass through RAM first. Consider what happens when an RDBMS has to find and sort data. It tracks each data page, even for the smallest amount of data read or written. It also has to track whether each page is in memory or on disk, whether it is indexed, whether it is sorted physically, and so on.

04. Discuss different arrangements of data, giving examples for each 

Structured data usually resides in relational databases (RDBMS). Fields store length-delineated data such as phone numbers, Social Security numbers, or ZIP codes. Even variable-length text strings like names are contained in records, which makes searching simple. Data may be human- or machine-generated, as long as it is created within an RDBMS structure. This format is highly searchable, both with human-generated queries and with algorithms that use the data type and field names, such as alphabetical or numeric, currency, or date.

Unstructured data is essentially everything else. Unstructured data may have internal structure, but it is not structured via pre-defined data models or schemas. It may be textual or non-textual, and human- or machine-generated. It may also be stored within a non-relational (NoSQL) database.
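To make the contrast concrete, here is a small Java sketch that assumes Java 16+ and US-style five-digit ZIP codes. The structured `Customer` records can be filtered by a named field. The same facts buried in free text can only be found by scanning the content, here with a regular expression. The class and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DataShapes {
    // Structured: every record has the same named, typed fields, like a row in a table
    public record Customer(String name, String phone, String zip) {}

    // Searching structured data is a simple comparison on a known field
    public static List<String> namesInZip(List<Customer> customers, String zip) {
        return customers.stream()
                .filter(c -> c.zip().equals(zip))
                .map(Customer::name)
                .collect(Collectors.toList());
    }

    // Unstructured: free text has no fields, so the content itself must be scanned
    // (assumes US-style five-digit ZIP codes)
    public static List<String> zipCodesMentioned(String text) {
        Matcher m = Pattern.compile("\\b\\d{5}\\b").matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
                new Customer("Ann", "555-0100", "10001"),
                new Customer("Ben", "555-0101", "94105"));
        System.out.println(namesInZip(customers, "10001"));                      // [Ann]
        System.out.println(zipCodesMentioned("Ann moved from 10001 to 94105.")); // [10001, 94105]
    }
}
```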


05. Explain different types of databases, providing examples for their use 


Relational Database
The relational database is the most common and widely used type of database. A relational database stores data in tables made up of rows and columns.
Operational Database
An operational database, which is popular with many organizations, stores the data needed for day-to-day operations. Typical examples are customer, inventory, and personnel databases.
Data Warehouse
Many organizations need to keep all their important data for a long span of time. This is where the data warehouse comes into play.
Distributed Database
As its name suggests, a distributed database is meant for organizations that have several workplace locations and need a separate database at each location.
End-user Database
To meet the needs of the end users of an organization, the end-user database is used.
Hierarchical Databases
In the hierarchical database management system (hierarchical DBMS) model, data is stored as nodes in parent-child relationships. In a hierarchical database, records contain the actual data along with information about their parent/child relationships.

Network Databases
Network database management systems (network DBMSs) use a network structure to create relationships between entities. Network databases are mainly used on large digital computers. Network databases resemble hierarchical databases. However, in a hierarchical database one node can have only one parent, whereas a network node can have relationships with multiple entities. A network database looks more like a cobweb or an interconnected network of records.
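The difference between the hierarchical and network models is easiest to see in the shape of the links. The Java sketch below is a hypothetical in-memory model, not how any particular DBMS stores records. A `TreeNode` has exactly one parent, while a `NetworkNode` can be owned by several records at once.

```java
import java.util.ArrayList;
import java.util.List;

public class DataModels {
    // Hierarchical model: every record has at most one parent, so the data forms a tree
    public static class TreeNode {
        public final String name;
        public TreeNode parent;                          // exactly one parent (null for the root)
        public final List<TreeNode> children = new ArrayList<>();

        public TreeNode(String name) { this.name = name; }

        public TreeNode addChild(TreeNode child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    // Network model: a record may be linked to several owner records, so the data forms a graph
    public static class NetworkNode {
        public final String name;
        public final List<NetworkNode> owners = new ArrayList<>();   // many "parents" allowed
        public final List<NetworkNode> members = new ArrayList<>();

        public NetworkNode(String name) { this.name = name; }

        public void link(NetworkNode member) {
            members.add(member);
            member.owners.add(this);
        }
    }

    public static void main(String[] args) {
        TreeNode company = new TreeNode("Company");
        TreeNode sales = company.addChild(new TreeNode("Sales"));
        TreeNode alice = sales.addChild(new TreeNode("Alice"));
        System.out.println(alice.parent.name);        // Sales

        NetworkNode databases = new NetworkNode("Databases");
        NetworkNode networks = new NetworkNode("Networks");
        NetworkNode bob = new NetworkNode("Bob");
        databases.link(bob);                          // Bob is enrolled in two courses,
        networks.link(bob);                           // so he has two owner records
        System.out.println(bob.owners.size());        // 2
    }
}
```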


06. Compare and contrast data warehouse with Big data 

Data warehousing has been a common term for the last 10-20 years, whereas big data has been a hot trend for the last 5-10 years. Both hold a lot of data, are used for reporting, and are managed by electronic storage devices. So many people assume that big data will soon replace traditional data warehousing. However, big data and data warehousing are not interchangeable, because they are used for entirely different purposes. Let us compare Big Data and Data Warehouse in detail.



Data Warehouse vs. Big Data

Meaning
  • Data Warehouse: Mainly an architecture, not a technology. It extracts data from a variety of SQL-based data sources and helps generate analytic reports. By definition, a data repository that is built by one process and used for analytic reports is a data warehouse.
  • Big Data: Mainly a technology, characterized by the volume, velocity, and variety of data. Volume is the amount of data coming from different sources, velocity is the speed of data processing, and variety is the number of types of data.
Preferences
  • Data Warehouse: When an organization wants to make informed decisions, it prefers data warehousing, because this kind of report needs reliable, trustworthy data from the sources.
  • Big Data: When an organization needs to analyze large amounts of data that contain valuable information and help it make better decisions (more profitability, more customers), it prefers the big data approach.
Accepted data sources
  • Data Warehouse: Accepts one or more homogeneous or heterogeneous data sources.
  • Big Data: Accepts any kind of source, including business transactions, social media, and sensor or machine-generated data. The data may or may not come from a DBMS product.
Accepted formats
  • Data Warehouse: Handles mainly structured data.
  • Big Data: Accepts all types of formats: structured data, relational data, and unstructured data including text documents, email, video, audio, stock ticker data, and financial transactions.
Subject oriented
  • Data Warehouse: Subject oriented, because it provides information on a specific subject rather than on the organization's ongoing operations. It mainly focuses on analyzing or displaying data that helps in decision making.
  • Big Data: Also subject oriented. The main difference is the source of data, as big data can accept and process data from all sources, including social media and sensor or machine-generated data. It also focuses on providing precise, subject-oriented analysis of the data.
Distributed file system
  • Data Warehouse: Processing huge amounts of data is time-consuming, and sometimes the process takes an entire day to complete.
  • Big Data: This is one of the big advantages of big data. HDFS (Hadoop Distributed File System) stores huge data sets across a distributed system, and MapReduce programs process them in parallel.




07. Explain how the application components communicate with files and databases 

  • File – file path or URL
    • Using a file path or URL, the application/software can access a particular resource and add to or modify it.
  • DB – connection string
    • The application must supply a connection string before it can connect to the database. Once the connection between the database and the application has been established, the application can use any database functionality on the data.
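Here is a minimal Java sketch of the file side, assuming Java 11+. The same file is reached once through a `Path` and once through a `file:` URL. The class, method, and file names are hypothetical. The database side is not run here because it needs a JDBC driver. There, the connection string is a JDBC URL (for example, a hypothetical `jdbc:mysql://localhost:3306/school`) passed to `DriverManager.getConnection(url, user, password)`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourceAccess {
    // Access by file path: write (create or overwrite) the file, then read it back
    public static String writeAndRead(Path path, String content) {
        try {
            Files.writeString(path, content);
            return Files.readString(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Access by URL: the same file addressed as a file: URL and read through a stream
    public static String readViaUrl(Path path) {
        try (InputStream in = path.toUri().toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Path.of(System.getProperty("java.io.tmpdir"), "notes.txt");
        System.out.println(writeAndRead(file, "hello"));   // hello
        System.out.println(file.toUri());                   // the file: URL, e.g. file:///tmp/notes.txt
        System.out.println(readViaUrl(file));               // hello
    }
}
```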

08. Differentiate the SQL statements, Prepared statements, and Callable statements 

SQL Statements
Execute standard SQL statements from the application

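                         Statement stmt = con.createStatement();
                         // Values are concatenated into the SQL text, so the string value
                         // must be quoted by hand (and the query is open to SQL injection)
                         stmt.executeUpdate("update STUDENT set NAME = '" + name +
                                 "' where ID = " + id);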



Prepared statements
The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters.
                         PreparedStatement pstmt = con.prepareStatement(
                                 "update STUDENT set NAME = ? where ID = ?");
                         pstmt.setString(1, "MyName");   // first ? -> NAME
                         pstmt.setInt(2, 111);           // second ? -> ID
                         pstmt.executeUpdate();

Callable statements
Execute stored procedures
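                         CallableStatement cstmt = con.prepareCall("{call anyProcedure(?, ?, ?)}");
                         // set the three procedure parameters (cstmt.setInt, cstmt.setString, ...)
                         // or register OUT parameters before executing
                         cstmt.execute();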
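A practical reason to prefer prepared statements over concatenated SQL is SQL injection. The small Java sketch below does not touch a database. It only builds the SQL text the way the `Statement` example does, to show how a crafted `name` value (a hypothetical input) changes the meaning of the query. With a `PreparedStatement`, the same value would be sent as a parameter and treated purely as data.

```java
public class InjectionDemo {
    // Builds the SQL text exactly the way the Statement example does: by string concatenation
    public static String concatenated(String name, int id) {
        return "update STUDENT set NAME = '" + name + "' where ID = " + id;
    }

    public static void main(String[] args) {
        // Normal input produces the intended query
        System.out.println(concatenated("MyName", 111));
        // prints: update STUDENT set NAME = 'MyName' where ID = 111

        // Hypothetical malicious input closes the quote and turns the WHERE clause into a comment,
        // so the database would rename every student, not just ID 111
        System.out.println(concatenated("x' -- ", 111));
        // prints: update STUDENT set NAME = 'x' -- ' where ID = 111
    }
}
```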



09. Argue the need for ORM, explaining the development with and without ORM 

Object-relational mapping (ORM) is a mechanism that makes it possible to address, access and manipulate objects without having to consider how those objects relate to their data sources. 
PROS
  • Facilitates implementing the domain model pattern.
  • Huge reduction in code.
  • Takes care of vendor-specific code by itself.
  • Cache management: entities are cached in memory, thereby reducing load on the DB.
CONS
  • Increased startup time due to metadata preparation (not good for desktop applications).
  • Steep learning curve for developers without prior ORM experience.
  • Relatively hard to fine-tune and debug the generated SQL.
  • Not suitable for applications without a clean domain object model.
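The difference between developing with and without an ORM is mostly about who writes the mapping code. The hypothetical `MiniOrm` below is a toy, not how Hibernate or any real ORM is implemented. Without an ORM, the developer hand-writes the INSERT statement for every class and keeps it in sync with the class's fields. An ORM derives the same SQL, and the parameter values to bind to a `PreparedStatement`, from the class's metadata. This sketch does that with Java reflection.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class MiniOrm {
    // A plain domain class we want to persist (hypothetical)
    public static class Student {
        int id;
        String name;
        public Student(int id, String name) { this.id = id; this.name = name; }
    }

    // Without ORM: SQL is written by hand for every class and must be kept in sync with its fields
    public static String handWrittenInsert() {
        return "INSERT INTO Student (id, name) VALUES (?, ?)";
    }

    // The persistent fields of a class: non-static, non-synthetic, sorted by name
    // (getDeclaredFields() does not guarantee any particular order)
    private static List<Field> columns(Class<?> type) {
        return Arrays.stream(type.getDeclaredFields())
                .filter(f -> !Modifier.isStatic(f.getModifiers()) && !f.isSynthetic())
                .sorted(Comparator.comparing(Field::getName))
                .collect(Collectors.toList());
    }

    // With an ORM-style mapper: the INSERT statement is derived from class metadata via reflection
    public static String generatedInsert(Class<?> type) {
        List<Field> fields = columns(type);
        String names = fields.stream().map(Field::getName).collect(Collectors.joining(", "));
        String marks = fields.stream().map(f -> "?").collect(Collectors.joining(", "));
        return "INSERT INTO " + type.getSimpleName() + " (" + names + ") VALUES (" + marks + ")";
    }

    // The parameter values are read from the object in the same column order
    public static List<Object> valuesOf(Object entity) {
        List<Object> values = new ArrayList<>();
        for (Field f : columns(entity.getClass())) {
            try {
                f.setAccessible(true);
                values.add(f.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(generatedInsert(Student.class));          // INSERT INTO Student (id, name) VALUES (?, ?)
        System.out.println(valuesOf(new Student(111, "MyName")));    // [111, MyName]
    }
}
```

If a field is added to `Student`, the generated SQL follows automatically, while the hand-written string silently goes stale. That is the code reduction listed under PROS. The reflection step is also a small-scale version of the "metadata preparation" cost listed under CONS.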
10. Discuss the POJO, Java Beans, and JPA, indicating their similarities and differences 
JPA
  • it is EJB 3.0-compliant;
  • it is light-weight;
  • it manages persistent data in concert with a JPA entity manager;
  • it performs complex business logic;
  • it potentially uses several dependent Java objects;
  • it can be uniquely identified by a primary key.
POJO
  • It doesn’t have special restrictions other than those forced by Java language.
  • It doesn’t provide much control over members.
  • It can implement the Serializable interface.
  • Fields can be accessed by their names.
  • Fields can have any visibility.
  • There may be a no-arg constructor.
  • It is used when you don’t want to restrict your members and want to give users complete access to your entity.
JAVA BEAN
  • It is a special POJO that has some restrictions.
  • It provides complete control over members.
  • It should implement the Serializable interface.
  • Fields are accessed only by getters and setters.
  • Fields have only private visibility.
  • It must have a no-arg constructor.
  • It is used when you want to expose your entity to users, but only part of it.
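Here is a minimal Java sketch of the difference, with hypothetical class names. `StudentPojo` is a plain object whose fields are accessed directly. `StudentBean` follows the JavaBean conventions listed above (private fields, a public no-arg constructor, getters/setters, `Serializable`), so it can validate what is stored. A JPA entity is typically a class like the bean with `@Entity` and `@Id` annotations added (from `jakarta.persistence`, or `javax.persistence` in older versions). That part is not shown because it needs the JPA API on the classpath.

```java
import java.io.Serializable;

public class BeanDemo {
    // POJO: no rules beyond the Java language; fields can be public and accessed directly
    public static class StudentPojo {
        public int id;
        public String name;
    }

    // JavaBean: private fields, public no-arg constructor, getters/setters, Serializable
    public static class StudentBean implements Serializable {
        private static final long serialVersionUID = 1L;
        private int id;
        private String name;

        public StudentBean() {}                      // required no-arg constructor

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }

        public String getName() { return name; }
        // Because access goes through the setter, the bean controls what callers may store
        public void setName(String name) {
            if (name == null || name.isBlank()) {
                throw new IllegalArgumentException("name must not be blank");
            }
            this.name = name;
        }
    }

    public static void main(String[] args) {
        StudentPojo pojo = new StudentPojo();
        pojo.name = "";                              // nothing stops an invalid value
        StudentBean bean = new StudentBean();
        bean.setName("MyName");                      // access goes through the setter
        System.out.println(bean.getName());          // MyName
    }
}
```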
11. Identify the ORM tools available for different development platforms (Java, PHP, and .Net) 


  • PHP :- CakePHP, CodeIgniter, Doctrine, FuelPHP
  • Python :- Django, SQLAlchemy, SQLObject, Storm
  • C++ :- ODB, QxOrm
  • Java :- ActiveJDBC, ActiveJPA, Apache Cayenne, Apache Gora, Athena Framework, Carbonado
  • .NET :- Base One Foundation Component Library, DatabaseObjects, DataObjects.NET, Dapper, ECO, Entity Framework

12. Discuss the need for NoSQL indicating the benefits, also explain different types of NoSQL databases

Benefits
  • Schemaless data representation: records can change shape without schema migrations
  • Shorter development time
  • Speed
  • Easier to plan ahead for scalability
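NoSQL Database Management Systems (by type)
  • MongoDB (document store)
  • Redis (key-value store)
  • CouchDB (document store)
  • RavenDB (document store)
  • MemcacheDB (key-value store)
  • Riak (key-value store)
  • Neo4j (graph database)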
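To illustrate the schemaless benefit, here is a toy in-memory document store in Java. It is hypothetical and is not the API of MongoDB, CouchDB, or any other product. Documents in the same collection are plain maps and need not share the same fields, so a new field can be added without changing any schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToyDocumentStore {
    // collection name -> documents; each document is just a map of field -> value
    private final Map<String, List<Map<String, Object>>> collections = new HashMap<>();

    // No schema is declared up front: any set of fields is accepted
    public void insert(String collection, Map<String, Object> document) {
        collections.computeIfAbsent(collection, k -> new ArrayList<>()).add(document);
    }

    // Find documents whose field equals the value; documents without that field are skipped
    public List<Map<String, Object>> find(String collection, String field, Object value) {
        return collections.getOrDefault(collection, List.of()).stream()
                .filter(doc -> value.equals(doc.get(field)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ToyDocumentStore store = new ToyDocumentStore();
        // Two "students" with different fields live side by side in one collection
        store.insert("students", Map.of("name", "Ann", "age", 22));
        store.insert("students", Map.of("name", "Ben", "email", "ben@example.com"));
        System.out.println(store.find("students", "email", "ben@example.com").size()); // 1
    }
}
```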
13. Discuss what Hadoop is, explaining the core concepts of it 

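Hadoop is an open-source framework for storing and processing very large data sets across clusters of computers. It is based on the concepts of the Google File System and MapReduce. Its core concepts are:
  • HDFS (Hadoop Distributed File System): splits large files into blocks and stores replicated copies across the nodes of the cluster.
  • MapReduce: a programming model that processes the data in parallel. A map step turns input records into key-value pairs, and a reduce step combines all values that share a key.
  • YARN: manages cluster resources and schedules jobs.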
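The MapReduce idea behind Hadoop can be sketched in plain Java, without any Hadoop APIs; the class and method names are hypothetical. Word count is the classic example. The map step emits a (word, 1) pair for each word, a shuffle step groups the pairs by word, and the reduce step sums each group. In Hadoop these steps run in parallel on many machines. Here they run in one process.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountMapReduce {
    // Map phase: each input line is turned into (word, 1) pairs independently of other lines
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1))
                .collect(Collectors.toList());
    }

    // Shuffle phase: group all emitted values by key (Hadoop does this between map and reduce)
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(Map.Entry::getKey, TreeMap::new,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    // Reduce phase: combine all values that share one key
    static int reduce(List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = lines.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("the cat sat", "the cat ran")));
        // {cat=2, ran=1, sat=1, the=2}
    }
}
```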

14. Explain the concept of IR, identifying tools for IR

Information retrieval, as the name implies, is concerned with retrieving relevant information from databases. It is basically concerned with facilitating users' access to large amounts of (predominantly textual) information. The process of information retrieval involves the following stages:
  1. Representing Collections of Documents - how to represent, identify and process the collection of documents.
  2. User-initiated querying - understanding and processing of the queries.
  3. Retrieval of the appropriate documents - the searching mechanism used to obtain and retrieve the relevant documents
Tools
  • Apache Solr
  • Elasticsearch
  • Algolia
  • Sphinx (search engine)
  • Site Search 360
  • OpenSearchServer
  • Xapian
  • Manticore search
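The three IR stages listed above can be sketched with an inverted index, the data structure at the heart of tools such as Apache Solr and Elasticsearch (both built on Apache Lucene). This Java sketch is a simplified, hypothetical illustration. It indexes documents by lowercase terms and answers a query by returning the documents that contain all query terms, without any ranking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndex {
    // term -> IDs of the documents containing it (TreeSet keeps IDs sorted)
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stage 1, representing the collection: index every term of the document
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Stages 2 and 3: process the query the same way, then retrieve the documents
    // that contain every query term (a Boolean AND)
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Set.of() : result;
    }

    // Lowercase the text and split it on anything that is not a word character
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "Data persistence with databases");
        index.add(2, "Big data and Hadoop");
        index.add(3, "Hadoop stores data in HDFS");
        System.out.println(index.search("hadoop data"));   // [2, 3]
    }
}
```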



References

Q1
Lecture 08

Q2
https://www.toolsqa.com/sql/data-database-and-database-management-system/

Q3
https://dzone.com/articles/which-is-better-saving-files-in-database-or-in-fil


Q5
https://www.quora.com/What-are-the-different-types-of-databases

Q6
https://www.educba.com/big-data-vs-data-warehouse/

Q7
https://www.oreilly.com/library/view/web-database-applications/0596005431/ch01.html

Q8
https://javaconceptoftheday.com/statement-vs-preparedstatement-vs-callablestatement-in-java/

Q9
https://medium.com/building-the-system/dont-be-a-sucker-and-stop-using-orms-190add65add4
