File Based Data-Processing System
Task 1 – Database systems
a. Critically assess the different database models that you know about and summarise their respective advantages and disadvantages.
File based data-processing system A computer system store data in the form of bits and bytes over files that are structured in a tree structure (Shroff, 2010).
Bit – The smallest form of data.
- Byte – a collection of several bits which represents one character (e.g. letter, number).
- Field – When combining bytes together into a while word then we create a field.
- Record – When we group together fields like a student ,first and last name, his modules, dates and his programme we create a record.
- File – A group of similar records makes a file. a group of files makes a whole database system.The Data Hierarchy Louden & Louden (2017)
Advantages of using a File based data-processing system
- No equipment/material – Eliminates the need to use equipment and/or materials for storing information (pages, pencil, pen) and to use hand writing which consumes time and considered costly.
- Organized structure – Provides a way to insert, delete and update data through a hierarchal file structure (Shroff, 2010) where all files are kept within folders.
- Easy Data processing – Provides the ability to search for data within files by either using an application or the command line tool of the operating system in some cases.
- Disadvantages of using a File based data-processing system
- Data redundancy – As each file is related to a specific program and cannot be shared with other programs data will begin to grow as we will store the same data several times on different files.
- Data inconsistency – Because that the same data is spread on different files, some information might be up-to-date while other files will not contain the latest information.
- Program data dependence – Each application will use its own file as its source of data making it very hard to commit changes.
- Lack of flexibility – Since the data is coupled with each application there is no way to retrieve data easily and such an operation can take weeks (if not months).
- Poor security – There is a minimum amount of control for security and it is very hard to manage.
- Lack of data sharing and availability – It is hard to share the same data with other departments in other regions in a timely manner.
- Hierarchical based database system
The Hierarchical database contains:
- Segment – The smallest form of data that IMS could store.
- Fields – Segment contain fields (the smallest piece of data the an application can manipulate). Each field contains a unique key field that can be used to locate a segment within the database
- Record – The tree-structure of all segments is the database record while the root segment identifies a database record with depended segment which contain pieces of data that relevant for the record.
- Example of a hierarchical structure (Powell, 2006):
Advantages of using a hierarchical based database system
- Data Integrity – The IMS database provides a locking mechanism where only 1 application can write into the database at a time. Additionally the IMS is using a logger that store changes made to the database by snapshots, which help restore data.
- Security – By using the DRBC component the database can authorize requests from an application.
- Data sharing and availability – The DRBC also support concurrencies and allows to share the data between several applications by using the DML interface.
- Disadvantages of using a hierarchical based database system
- Limited data independence – Same like with a file-based system, each application will create its own depended data source and by that create challenges to update information on a record for each record.
- Lack of flexibility – Searching for information through the records will have to go all the way from the root to the child’s in a recursive manner which will be a time consuming operation.(Hoffer, et al., 2015) , additionally segments have fixed length
- Network model database systemThis system uses pointers that identifies the physical address of each record on the storage. (Rai & Singh, 2015).The model was designed to solve the lack of flexibility in the hierarchical database.
Advantages of using a Network model database system
- Conceptual simplicity – easy to use and design
- Support more relational types – Many-to-Many and one-to-many.
- Ease of access – It is more easier to access data then in the hierarchical model.
- Data independence – The network model separates the data from complex physical storage information.
- Meet Database Standards – The network model met the ANSI/SPARC(Thakur, n.d.)
- Disadvantages of using a Network model database system
- System complexity – All the system relies on the pointers which makes it very complex to manage.
- Operational anomalies – Every insert, update and delete operation will require to change many pointers which makes every operation difficult to perform.
- Absence of structural independence – Making a change of the structural level is very hard since the applications would have to be changed as well in order to work with the database after such a change which raise another new data independence challenges.
Using an example, discuss the principles and structure of the relational database mode.
Relational Database system
Was invented In the early 70s by a Dr. Edgar Codd (‘Ted’ Codd), who was a British scientist formulized the conceptual data modeling (Teorey, 2011) which used the Entity-relationship (ER) approach and it is based on the relational algebra and uses unique values defined as keys to make a relationship between two or more tables.
A database record consists of the following:
- Tables – A grid of rows and columns which contains data on entities and its attributes.
- Columns – Contains simple and atomic data value (e.g. integer, text) for a record and has a unique name.
- Row – Provides number of columns.
- Primary Key – A Unique key that represents a row.
- Foreign Key – A pointer that leads to another primary key on a different table.
Example of an RDBMS structure (Laudon & Laudon, 2017) :
In the example above we have two tables – SUPPLIER and PART which consists of columns and rows.Each column represents an entity and its attributes for example the entity “Part_Number” has three different attributes which are “Part_Name”, “Unit_Price” and “Supplier_Number”.The column “Supplier_Number” is inherited in from the “SUPPLIER” table which provides an example to the relational mode.
Advantages of using a relational database system
- Data independence – The format of the data can be easily changed through a schema located in the system catalog (Adrienne Watt and Watt, 2014).
- Views – Allows to expose information differently and selectively to users.
- Data sharing and availability – The database can be shared with users at the same time through concurrency control.
- Control of data redundancy – If designed properly, Each data item is stored in only one place in the database and can be controlled by application programming.
- Data Integrity – Controlled by constraints by rules that dictates what can be entered into a column (e.g. size, type) and ACID compliancy (Pokorny, 2013).
- Security – Provides the ability to control unauthorized access.
- No Coding is needed – SQL uses its own language called SEQUEL.
- Disadvantages of using a relational data base system
- Scaling – RDBMS can only scale up and cannot be scaled out due to the complexity of the consistency model used.
- Price – The SQL server requires lots of resources hence it is expensive.
- Difficult interface – The interface syntaxes are complex which can make it difficult to use.
The conceptual model describes the contents of the system. Through the figure 1, it can be understood that staff, retailer, order, product and manufacturer are the main contents of the system. The business stakeholders like staff, retailer and manufacturer and data architects like order and product is mentioned in the diagram. The proposed model consists of one-to-many, many-to-one and many-to-many relationship. The diagram shows multiplicity, cardinality and optionality.
The conceptual model illustrates that table has five tables that are interconnected with each other. One staff will take the order and store it. A staff can store many order but it is not necessary that all the registered staff to will take orders. One retailer can order many time. Each order must be done by single retailer. Not every retailer will order products. A retailer can order many products at one time. It is also true that many products can be ordered by various retailers simultaneously. That is why the product and order has many-to-many relationship among them. An order must contain the products that are listed. A product may not be ordered in a whole lifetime of business. A manufacturer can provide many products. A product is supplied by only one retailer.
Table Name: Retailer |
|||||
Attribute |
Data Type |
Value |
Key |
Description |
Example |
retailerNumber |
INT |
5 |
Primary Key |
The unique retailer number through which each row in this table can be identified uniquely |
00000 |
retailerName |
Varchar |
40 |
No |
The name of the retailer, two retailer can have same name |
ABC Wholesaler |
retailerAddress |
Varchar |
40 |
No |
The physical address of the retailer that consists of building number, street name, city, State and zip |
769 Amethyst Drive |
retailerTelephoneNumber |
INT |
10 |
No |
The contact number of the retailer. The code will be identified based on the state mentioned in address |
5173218505 |
maximumCredit |
Decimal |
10,2 |
No |
The maximum amount of credit a retailer can have from selected organization |
15000.00 |
joinDate |
Date |
Default |
No |
The Join date of the retailer. This indicates the on which retailer signed contract with Nova Cosmetics |
November 29, 2005 |
Advantages of Using File Based Data-Processing System
Table Name: Staff |
|||||
Attribute |
Data Type |
Value |
Key |
Description |
Example |
staffNumber |
INT |
5 |
Primary Key |
The unique staff number through which each row in this table can be identified uniquely |
00000 |
staffName |
Varchar |
40 |
No |
The name of the staff. Two staff can have same name |
Mamie B. McLane |
staffAddress |
Varchar |
40 |
No |
The physical address of the staff that consists of building number, street name, city, State and zip. This is the current address of the staff. |
2579 Skinner Hollow Road Huntington, OR 97907 |
staffPosition |
Varchar |
40 |
No |
The designation of the staff. |
5418692501 |
staffSalary |
Decimal |
10,2 |
No |
Salary of the staff per annum. |
5000.00 |
joinDate |
Date |
Default |
No |
The Join date of the staff. This indicates the on which staff started working with Nova Cosmetics |
December 25, 1993 |
Table Name: Order |
|||||
Attribute |
Data Type |
Value |
Key |
Description |
Example |
orderNumber |
INT |
5 |
Primary Key |
The unique order number through which each row in this table can be identified uniquely |
00000 |
orderDate |
Date |
Default |
No |
The date on which, order is placed. |
December 25, 2017 |
productNumber |
INT |
5 |
Primary Key |
The unique product number through which each row in this table can be identified uniquely |
00000 |
productType |
Varchar |
40 |
No |
The type of the product. A product can of single type. Many products can be categorized into one type. |
Suns cream |
unitPrice |
Decimal |
10,2 |
No |
Cost of individual product. |
35.30 |
productQuantity |
INT |
5 |
No |
The quantity of each product bought |
3 |
totalPrice |
Decimal |
10,2 |
No |
Total price for each product bought |
105.90 |
totalCost |
Decimal |
10,2 |
No |
Total amount a customer will pay. The order cost is a derived attribute. The cost is calculated using unit price of product and total products bought. |
205.50 |
Table Name: Manufacturer |
|||||
Attribute |
Data Type |
Value |
Key |
Description |
Example |
manufacturerNumber |
INT |
5 |
Primary Key |
The unique manufacturer number through which each row in this table can be identified uniquely |
00000 |
manufacturerName |
Varchar |
40 |
No |
The name of the manufacturer. Two manufacturer can have same name |
Mamie B. McLane |
manufacturerAddress |
Varchar |
40 |
No |
The physical address of the manufacturer that consists of building number, street name, city, State and zip. |
2579 Skinner Hollow Road Huntington, OR 97907 |
manufacturerTelephoneNumber |
INT |
10 |
No |
The contact number of the manufacturer. The code will be identified based on the state mentioned in address |
5418692501 |
Task 3: Logical Model
Normalization Decisions: The proposed database is normalized up to BCNF. As per the protocol of first normal form, the database cannot have any attribute that consists of multiple values. The attributes must contain only atomic values. The second normal states that database cannot have any partial dependency. The partial dependency occurs in a table when a non-key attribute is dependent on another non-key attribute. The third normal form states that the database must be in second normal form and should not have any transitive dependency. Transitive dependency occurs in a table if attribute Y can be found using attribute Y but X cannot be found using Y and Z can be found using Y. The Boyce Code Normal Form states that Y can be found using the X and X is the super key (primary key) of the table.
First Normal Form: If all the data are in a single table then the table must have all the atomic valued attributes. The table is as following.
retailerNumber |
00000 |
retailerName |
ABC Wholesaler |
retailerAddress |
769 Amethyst Drive |
retailerTelephoneNumber |
5173218505 |
maximumCredit |
15000.00 |
retailJoinDate |
November 29, 2005 |
staffName |
Mamie B. McLane |
staffAddress |
2579 Skinner Hollow Road Huntington, OR 97907 |
staffPosition |
5418692501 |
staffSalary |
5000.00 |
staffJoinDate |
December 25, 1993 |
productNumber |
00000 |
productType |
Suns cream |
unitPrice |
35.30 |
productQuantity |
|
totalPrice |
|
orderDate |
December 25, 2017 |
totalCost |
205.50 |
manufacturerName |
Mamie B. McLane |
manufacturerAddress |
2579 Skinner Hollow Road Huntington, OR 97907 |
manufacturerTelephoneNumber |
5418692501 |
The product number, quantity, totalPrice and productType attributes are having multiple values. In the new table all the order details will be copied for each product bought.
Second Normal Form: The partial dependencies can be seen for staff details. The staffAddress, staffPosition, staffSalary and staffJoinDate are partially dependent on the staffName. A new staff table will be created to remove this partial depdency.
staffNumber |
00000 |
staffName |
Mamie B. McLane |
staffAddress |
2579 Skinner Hollow Road Huntington, OR 97907 |
staffPosition |
5418692501 |
staffSalary |
5000.00 |
staffJoinDate |
December 25, 1993 |
Another partial dependency can be seen in the manufacturer related attributes. The manufacturerAddress and manufacturerTelephoneNumber is dependent on the manufacturerNumber. The new manufacturer table is created remove this partial dependency.
manufacturerNumber |
00000 |
manufacturerName |
Mamie B. McLane |
manufacturerAddress |
2579 Skinner Hollow Road Huntington, OR 97907 |
manufacturerTelephoneNumber |
5418692501 |
The other table is as below.
retailerNumber |
00000 |
retailerName |
ABC Wholesaler |
retailerAddress |
769 Amethyst Drive |
retailerTelephoneNumber |
5173218505 |
maximumCredit |
15000.00 |
retailJoinDate |
November 29, 2005 |
orderNumber |
00000 |
productNumber |
00000 |
productType |
Suns cream |
unitPrice |
35.30 |
productQuantity |
3 |
totalPrice |
105.90 |
orderDate |
December 25, 2017 |
totalCost |
205.50 |
Third Normal Form: There is a transitive dependency between the order details and productType and quantity. The productType and unitPrice is dependent on productNumber. The orderDate and totalCost is dependent orderNumber. The orderNumber is dependent on productNumber. A new table named product is created to remove transitive dependency.
productNumber |
00000 |
productType |
Suns cream |
unitPrice |
35.30 |
Now the product quantity and totalPrice is dependent productNumber. So a bridge table is created.
productNumber |
00000 |
orderNumber |
00000 |
productQuantity |
3 |
totalPrice |
105.90 |
The new table is as following.
retailerNumber |
00000 |
retailerName |
ABC Wholesaler |
retailerAddress |
769 Amethyst Drive |
retailerTelephoneNumber |
5173218505 |
maximumCredit |
15000.00 |
retailJoinDate |
November 29, 2005 |
orderNumber |
00000 |
orderDate |
December 25, 2017 |
totalCost |
205.50 |
BCNF: the orderNumber and retailerNumber has trivial functional dependency among them. And retailerNumber is the super key.
retailerNumber |
00000 |
retailerName |
ABC Wholesaler |
retailerAddress |
769 Amethyst Drive |
retailerTelephoneNumber |
5173218505 |
maximumCredit |
15000.00 |
retailJoinDate |
November 29, 2005 |
orderNumber |
00000 |
orderDate |
December 25, 2017 |
totalCost |
205.50 |
Final Database Tables:
retailerNumber |
00000 |
retailerName |
ABC Wholesaler |
retailerAddress |
769 Amethyst Drive |
retailerTelephoneNumber |
5173218505 |
maximumCredit |
15000.00 |
retailJoinDate |
November 29, 2005 |
orderNumber |
00000 |
orderDate |
December 25, 2017 |
totalCost |
205.50 |
productNumber |
00000 |
productType |
Suns cream |
unitPrice |
35.30 |
productNumber |
00000 |
orderNumber |
00000 |
productQuantity |
3 |
totalPrice |
105.90 |
manufacturerNumber |
00000 |
manufacturerName |
Mamie B. McLane |
manufacturerAddress |
2579 Skinner Hollow Road Huntington, OR 97907 |
manufacturerTelephoneNumber |
5418692501 |
staffNumber |
00000 |
staffName |
Mamie B. McLane |
staffAddress |
2579 Skinner Hollow Road Huntington, OR 97907 |
staffPosition |
5418692501 |
staffSalary |
5000.00 |
staffJoinDate |
December 25, 1993 |
Task 4: Physical Model
Selection of DBMS (SQL Server): The reasons of selecting the SQL Server database ate as listed below.
- Streamlined Installation: Setup wizard to download this DBMS is available for free in the Microsoft website. The required updates are also detected and downloaded automatically by the installer itself. The complexity of the installation is reduced through automatic installation of updates and other required components. The database and analytical services can be separately installed later as per the user requirement.
- Better Security Features: The SQL Server database utilize the policy-based management for detecting security associated policies. These policies are non-compliant. Only the authorized users ate allowed to access the features of the database management system. The user can easily write the security events and audits into the log files.
- Better Performance: The transparent data compression feature is built-in into the SQL server database. The database management system also has encryption method installed in it. This eliminates the need to update programs for integrating the data encryption technique. The user access control is associated with the permission management tools that perfectly does the work of user management. The data collection process of SQL server is significantly improved than other DBMSs.
- Recued Ownership Cost: Effective and efficient data mining and management tools are available along with another tool named disk partitioning.
As the database suggests in Figure 3, the business of Nova Cosmetics is dependent heavily on the retailers and manufacturers. The staff of the organization is required to be registered before start working. The staff will provide their name and address (current) details to organization. The organization will store staff’s position, salary and join date. The unique number of staff which is staffNumber is automatically stored by the system. The staff details are required for various internal audits. The retailer’s data are very important for the business. The organization needs the retailer’s data as the retailers are the customer of the business. The organization needs retailer name, address, contact number and credit level is important.
Advantages of Using a Network Model Database System
The organization will decide the maximum credit level of the retailer. The retailer cannot buy product worth more than that. The staff will register a retailer into the organization database. That is why the staffNumber will be used as a foreign key in the database. This allows the database to find which staff has done the work of registering retailer. The manufacturer data is also stored in the database. The manufacturer can supply many products but a single product is bought from a single manufacturer. In real life, the retailer can order for various products in one order. Simultaneously, various products can be bought by different retailers. This is presented through a relational database method name bridge table. The bridge table store the order table primary key and product table primary key as foreign key. The order details like total cost and data is stored in the database. The retailer number is used as the foreign key to detect which realer has done the order.
Relational schema of final database are as following. The underlined attributes are primary key and italic attributes are foreign key.
Staff (staffNumber, staffName, staffAddress, staffPosition, staffSalary, staffJoinDate)
Retailer (retailerNumber, retailerName, retailerAddress, retailerTelephoneNumber, maximumCredit, retailJoinDate, staffNumber)
Order (orderNumber, orderDate, totalCost, retailerNumber)
OrderProduct (orderNumber, productNumber, productQuantity, totalPrice)
Product (productNumber, productType, unitPrice, manufacturerNumber)
Manufacturer (manufacturerNumber, manufacturerName, manufacturerAddress, manufacturerTelephoneNumber)
Attributes |
Properties |
retailerNumber |
It cannot hold null value |
retailerName |
It cannot hold null value. It should be indexed |
retailerAddress |
It cannot hold null value |
retailerTelephoneNumber |
It cannot hold null value |
maximumCredit |
It can hold null value |
retailJoinDate |
It cannot hold null value |
orderNumber |
It cannot hold null value. It should be indexed |
orderDate |
It cannot hold null value. It should be indexed |
totalCost |
It can hold null value. It should be indexed |
productNumber |
It cannot hold null value. It should be indexed |
productType |
It cannot hold null value. It should be indexed |
unitPrice |
It cannot hold null value. It should be indexed |
productQuantity |
It cannot hold null value. |
totalPrice |
It cannot hold null value. |
manufacturerNumber |
It cannot hold null value. |
manufacturerName |
It cannot hold null value. It should be indexed |
manufacturerAddress |
It cannot hold null value. |
manufacturerTelephoneNumber |
It cannot hold null value. |
staffNumber |
It cannot hold null value. |
staffName |
It cannot hold null value. It should be indexed |
staffAddress |
It cannot hold null value. |
staffPosition |
It can hold null value. |
staffSalary |
It can hold null value. |
staffJoinDate |
It can hold null value. |
Adrienne Watt and Watt, A., 2014. Database Design – 2nd Edition. 2nd ed. s.l.:BCampus.
Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Datar, M., Ito, K., Motwani, R., Srivastava, U. and Widom, J., 2016. Stream: The stanford data stream management system. In Data Stream Management (pp. 317-336). Springer, Berlin, Heidelberg.
Batkovskiy, A.M., Batkovskiy, M.A., Semenova, E.G., Fomina, A.V. and Khrustalev, ?.I., 2015. Linguistic Analysis of High-Tech Production Complex. Mediterranean Journal of Social Sciences, 6(4), p.130.
Blackman, K. R., 1998. Technical note: IMS celebrates thirty years as an IBM product. IBM Systems Journal, 37(4), pp. 596-603.
Chambers, M. L., 2009. Build Your Own PC Do-It-Yourself For Dummies. Hoboken: For Dummies.
Coronel, C. and Morris, S., 2016. Database systems: design, implementation, & management. Cengage Learning.
Coronel, C. and Morris, S., 2016. Database systems: design, implementation, & management. Cengage Learning.
Elmasri, R. and Navathe, S., 2016. Fundamentals of database systems. London: Pearson.
Gouhar, A., 2017. Database Management System. International Journal of Engineering Science, 11766.
Guèye, M., Ndiaye-Guèye, M.D., Kane-Guèye, S.M., Diallo, M., Cissé, M., Fall, K., Daff, H.M.B., Abdoulaye, M., Gahungu, S., Ka, S. and Moreau, J.C., 2016. E_Breast: A Computerized Database Management System for Breast Diseases Patients in a Low Income Country. Journal of Obstetrics and Gynecology, 6, pp.754-760.
Relational Database System
Haigh, T., 2016. How Charles Bachman Invented the DBMS, a Foundation of Our Digital World. Communications of the ACM, 59(7), pp. 25-30.
Hoffer, J., Venkataraman, R. & Topi, H., 2015. Modern Database Management. Global Edition ed. s.l.:Pearson Education Limited.
Jamal, A. and Pramila, R.S., 2018. An Overlay Database Management System. network, 5(05).
Jamsheer, K., 2018. An overview of SQL and NoSQL with it’s pros and cons.
Laudon, K. C. & Laudon, J. P., 2017. Management information systems managing the digital firm Global. Harlow: Pearson Global edition.
Laudon, K.C. and Laudon, J.P., 2016. Management information system. Pearson Education India.
Lee, H., Chapiro, J., Schernthaner, R., Duran, R., Wang, Z., Gorodetski, B., Geschwind, J.F. and Lin, M., 2015. How I do it: a practical database management system to assist clinical research teams with data collection, organization, and reporting. Academic radiology, 22(4), pp.527-533.
MariaDB, n.d. Understanding the Network Database Model.
Nidzwetzki, J.K. and Güting, R.H., 2016. DISTRIBUTED SECONDO: An extensible highly available and scalable database management system. FernUniversität, Fakultät für Mathematik und Informatik.
Pokorny, J., 2013. NoSQL databases: a step to database scalability in web environments. International Journal of Web Information Systems, 9(1), pp. 69-78.
Powell, G., 2006. Beginning Database Design. Indianapolis: Wiley Publishing Inc..
Rai, D. P. K. & Singh, P., 2015. International Journal of Computer Science and Mobile Computing. A Monthly Journal of Computer Science and Information Technology, 4(5), p. 834 – 838.
Raza, B., Kumar, Y.J., Malik, A.K., Anjum, A. and Faheem, M., 2018. Performance prediction and adaptation for database management system workload using case-based reasoning approach. Information Systems, 76, pp.46-58.
Reddy, T.B., Thomas, A.D., Stamatis, D., Bertsch, J., Isbandi, M., Jansson, J., Mallajosyula, J., Pagani, I., Lobos, E.A. and Kyrpides, N.C., 2014. The Genomes OnLine Database (GOLD) v. 5: a metadata management system based on a four level (meta) genome project classification. Nucleic acids research, 43(D1), pp.D1099-D1106.
Savage, N., 2015. Forging Relationships. Communications of the ACM, 58(6), pp. 22-23.
Shroff, R., 2010. Computer Systems and Applications. Rev ed. ed. Mumbai: Himalaya Publishing House.
Souza, R., Silva, V., Oliveira, D., Valduriez, P., Lima, A.A. and Mattoso, M., 2015. Parallel execution of workflows driven by a distributed database management system. In ACM/IEEE Conference on Supercomputing, Poster.
Tamboli, A., Shinde, P., Pariskar, P., Sonawane, M. and Anaokar, C., 2017. Institute Administration Automation and Student Database Management System.
Teorey, T. L. S. N. T. &. J. H., 2011. Database Modeling and Design : Logical Design. San Francisco: Elsevier Science & Technology.
Thakur, D., n.d. Network Model.
UKEssays, 2013. The Hierarchical Model.
Ung, M. & Philip, K. a. C.-Y.-S., 1999. DATABASE LANGUAGES. Encyclopedia Of Electrical & Electronics Engineering, Volume 4, pp. 568-576.
Zhang, B., Van Aken, D., Wang, J., Dai, T., Jiang, S., Lao, J., Sheng, S., Pavlo, A. and Gordon, G.J., 2018. A Demonstration of the OtterTune Automatic Database Management System Tuning Service. Proceedings of the VLDB Endowment, 11(12).