Big Data Analytics using Hadoop
Discuss the hardware technology options for organizations.
Based on the requirements of the proposed business structure, the recommended IT solution is Big Data analytics.
For big data analytics, Hadoop is an excellent choice because of its affordability and versatility. Hadoop is a software framework used to store data and run applications on clusters of commodity hardware.
Hardware requirements:
Resources | Medium | High End
CPU | 8 physical cores | 12 physical cores
Memory | 16 GB | 48 GB
Disk | 4 disks x 1 TB = 4 TB | 12 disks x 3 TB = 36 TB
Network | 1 Gb Ethernet | 10 Gb Ethernet
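The disk figures in the hardware table are raw capacity, while HDFS stores several copies of every block. The sketch below estimates how much logical data a cluster built from each node profile could hold. The replication factor of 3 is the HDFS default (`dfs.replication`); the 25% reserve for non-HDFS use and the 10-node cluster size are hypothetical assumptions, not figures from this report.

```python
# Rough usable-capacity estimate for the two Hadoop node profiles in the table.
# Assumptions: HDFS default replication factor of 3, plus a hypothetical 25%
# of raw disk kept free for the OS, logs and intermediate (shuffle) data.

REPLICATION_FACTOR = 3    # HDFS default (dfs.replication)
NON_HDFS_RESERVE = 0.25   # hypothetical fraction of raw disk not given to HDFS

def raw_capacity_tb(disks: int, disk_size_tb: float) -> float:
    """Total raw disk capacity of one node in TB."""
    return disks * disk_size_tb

def usable_capacity_tb(nodes: int, disks: int, disk_size_tb: float) -> float:
    """Logical data (TB) a cluster can hold after the reserve and replication."""
    raw = nodes * raw_capacity_tb(disks, disk_size_tb)
    return raw * (1 - NON_HDFS_RESERVE) / REPLICATION_FACTOR

if __name__ == "__main__":
    profiles = {"Medium": (4, 1.0), "High End": (12, 3.0)}
    for name, (disks, size) in profiles.items():
        # Hypothetical 10-node cluster built from each profile.
        print(f"{name}: raw per node = {raw_capacity_tb(disks, size):.0f} TB, "
              f"usable on 10 nodes = {usable_capacity_tb(10, disks, size):.1f} TB")
```

Under these assumptions only about a quarter of the raw disk ends up holding unique data, which is worth keeping in mind when sizing a cluster from the table.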
Software specifications:
Operating system:
The following Linux-based operating systems are commonly used for Hadoop:
- RedHat Enterprise Linux (RHEL)
- CentOS
- Ubuntu
Java:
Hadoop is written in Java. Oracle JDK 1.6 was historically the most popular Java version for Hadoop; current Hadoop 3.x releases require Java 8 or later.
Benefits of big data analytics:
- Cost-effective data analysis when affordable frameworks such as Hadoop are used.
- Large amounts of data can be analyzed quickly.
- Insights from the data help design and predict effective marketing strategies for better business and customer service.
Drawbacks of big data analytics:
- Data management problems caused by the huge volume of data being processed.
- Risk of flawed results due to misunderstanding or misinterpretation of the data.
- Risk of data being hacked.
Cost estimation:
Area of cost | Estimated cost per month
Database storage of 3 TB per month based on leading platforms | $134,000
Developers and eight data scientists to analyze data | $800,000
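The two cost items can be combined into a monthly and annual budget. This is a small arithmetic sketch using the figures from the table; the annual total assumes, as a simplification, that the monthly cost stays flat for 12 months.

```python
# Monthly and annual totals for the big data cost estimate.
# Figures come from the cost table; the annual total assumes a flat monthly spend.

monthly_costs_usd = {
    "Database storage (3 TB)": 134_000,
    "Developers and eight data scientists": 800_000,
}

def total_monthly(costs: dict[str, int]) -> int:
    """Sum of all monthly cost items."""
    return sum(costs.values())

def annual(monthly_total: int, months: int = 12) -> int:
    """Annual cost assuming the same spend every month."""
    return monthly_total * months

if __name__ == "__main__":
    month = total_monthly(monthly_costs_usd)
    print(f"Monthly total: ${month:,}")          # Monthly total: $934,000
    print(f"Annual total:  ${annual(month):,}")  # Annual total:  $11,208,000
```

Staff costs dominate the estimate, so the choice of storage platform has a comparatively small effect on the total.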
Proposed IT solution:
Based on the requirements of the proposed business structure, the recommended IT solution is cloud services.
Hardware requirements:
- x64 platform with AMD-V/VT-x hardware virtualization support enabled (CentOS 5 or 6)
- Quad-core 2 GHz+ CPU
- 8 GB+ RAM
- 3 x Gigabit network interface cards
- 30 GB of free disk space
Software requirements:
- Ubuntu 14.04.2 LTS: operating system for the server and VMs
- Contrail 2.21 with OpenStack (Icehouse), or VMware ESXi 5.5.0: hypervisor on the Contrail Service Orchestration node
- Additional software: Secure File Transfer Protocol (SFTP)
Benefits of cloud service:
- More data security
- More data storage
- Energy efficient hardware for advanced computing
Drawbacks of cloud services:
- Threat of data theft
- Capacity of storage depends on bandwidth allowance
- Huge power requirements for running remote data servers, which increase costs for the service provider.
Cost estimation:
Cloud server provider | Estimated cost per month
Amazon EC2 | $53.29
OpSource Server | $74.46
Rackspace Server | $55.48
GoGrid Server | $52.56
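To compare the providers, the quotes can simply be ranked by monthly price. This sketch treats each table figure as dollars and cents (so Amazon EC2 is taken as $53.29 per month); that reading of the decimal separator is an assumption about the original notation.

```python
# Rank the cloud providers from the cost table, cheapest first.
# Assumption: table figures are dollars and cents per month (e.g. 53.29).

monthly_price_usd = {
    "Amazon EC2": 53.29,
    "OpSource Server": 74.46,
    "Rackspace Server": 55.48,
    "GoGrid Server": 52.56,
}

def rank_providers(prices: dict[str, float]) -> list[tuple[str, float]]:
    """Return (provider, price) pairs sorted by ascending monthly price."""
    return sorted(prices.items(), key=lambda item: item[1])

if __name__ == "__main__":
    for name, price in rank_providers(monthly_price_usd):
        print(f"{name:<18} ${price:.2f}")
```

Price alone is only one criterion; the ranking would normally be weighed against the security and storage trade-offs listed above.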
The integration of voice and data communication over the internet has numerous benefits for businesses. Because voice and data are integrated, organizations no longer need separate experts to manage the telephone and data networks, as phones that support Voice over Internet Protocol (VoIP) can operate on the data network (Singh et al. 2014). This reduces costs for the organization by lowering employment, infrastructure and system maintenance costs.
VoIP (voice over IP) is a transmission technique in which voice and multimedia content are transmitted over the same network using the Internet Protocol (IP).
The audio is first encoded by a codec and encapsulated into data packets, which are then sent over the IP network (Amir et al. 2006). The device at the other end of the connection decodes these packets back into audio. Because VoIP does not need a traditional circuit-switched network for voice communication, network complexity is reduced significantly.
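Packetizing audio adds header overhead to every packet, which is why a VoIP call uses more bandwidth than the codec's nominal rate. The sketch below works this out for one common configuration; the G.711 codec (64 kbit/s), 20 ms of audio per packet, and IPv4 + UDP + RTP headers are assumptions chosen for illustration, and link-layer overhead is ignored.

```python
# Per-call VoIP bandwidth on the IP network.
# Assumptions: G.711 codec (64 kbit/s), 20 ms of audio per packet,
# IPv4 (20 B) + UDP (8 B) + RTP (12 B) headers; link-layer overhead ignored.

CODEC_BITRATE_BPS = 64_000    # G.711 audio bit rate
PACKET_INTERVAL_MS = 20       # audio carried in each packet
HEADER_BYTES = 20 + 8 + 12    # IPv4 + UDP + RTP headers

def payload_bytes_per_packet(bitrate_bps: int, interval_ms: int) -> int:
    """Bytes of encoded audio produced during one packetization interval."""
    return bitrate_bps * interval_ms // 8_000   # (bits/s * ms) / 1000 / 8 = bytes

def ip_bandwidth_bps(bitrate_bps: int, interval_ms: int, header_bytes: int) -> float:
    """Bit rate on the IP network, including per-packet header overhead."""
    packet_bytes = payload_bytes_per_packet(bitrate_bps, interval_ms) + header_bytes
    packets_per_second = 1_000 / interval_ms
    return packet_bytes * 8 * packets_per_second

if __name__ == "__main__":
    payload = payload_bytes_per_packet(CODEC_BITRATE_BPS, PACKET_INTERVAL_MS)
    rate = ip_bandwidth_bps(CODEC_BITRATE_BPS, PACKET_INTERVAL_MS, HEADER_BYTES)
    print(f"Payload per packet: {payload} bytes")               # 160 bytes
    print(f"IP bandwidth per call: {rate / 1000:.0f} kbit/s")   # 80 kbit/s
```

In this configuration the headers add 25% on top of the 64 kbit/s codec rate, which matters when sizing a data network that must also carry voice.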
Apart from cost, several other factors make VoIP attractive for business solutions. VoIP can offer call quality comparable to or better than traditional telephone service when sufficient bandwidth is available. VoIP-enabled handsets can handle both voice and data communication. VoIP can also secure communication with advanced encryption, a feature that is essential for business communication (Devi et al. 2015).
ADSL (Asymmetric Digital Subscriber Line) is a broadband communication technology used for internet connections. Over the same standard copper telephone lines, it supports much higher data rates (upstream 16 to 640 kbps and downstream 1.5 to 9 Mbps) than standard dial-up modems (Bachy et al. 2015).
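The practical effect of these rates can be shown by comparing download times. This sketch uses the downstream range quoted for ADSL and assumes 56 kbit/s as the "standard modem" speed; the 10 MB file size is hypothetical and protocol overhead is ignored, so real transfers would be somewhat slower.

```python
# Time to download a file at the ADSL downstream rates quoted in the text,
# compared with an assumed 56 kbit/s dial-up modem. Protocol overhead is ignored.

def transfer_time_s(size_megabytes: float, rate_bps: float) -> float:
    """Seconds to move size_megabytes (1 MB = 10**6 bytes) at rate_bps."""
    return size_megabytes * 1_000_000 * 8 / rate_bps

rates_bps = {
    "Dial-up modem (56 kbit/s, assumed)": 56_000,
    "ADSL low end (1.5 Mbit/s)": 1_500_000,
    "ADSL high end (9 Mbit/s)": 9_000_000,
}

if __name__ == "__main__":
    for label, rate in rates_bps.items():
        print(f"{label}: {transfer_time_s(10, rate):.1f} s for a 10 MB file")
```

Even at the low end of the ADSL range, the download is more than 25 times faster than over the assumed dial-up modem.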
The term next generation network (NGN) refers to the changes and modifications made to the architecture of standard telecommunication networks. Next generation networks are designed around the Internet Protocol (Liang and Yu 2015); hence, the term IP network is often used to describe this type of network. These networks support both voice and data communication.
The NGN has brought major changes to the architecture and design principles of the core telecommunication network.
NGN integrates the different transport networks, previously designed for separate services, into one core transport network. This network is usually built on IP, and sometimes Ethernet is also used (Liang and Yu 2015). The circuit-switched network previously used in the public switched telephone network (PSTN) for voice transmission has been merged with the internet service network to support Voice over Internet Protocol (VoIP). This integration has eliminated the need for separate voice and data networks.
Physical layer | Max data rate
Diffuse infrared | 1 Mbit/s
Frequency-hopping spread spectrum (FHSS) | 1 Mbit/s or 2 Mbit/s
Direct-sequence spread spectrum (DSSS) | 1 Mbit/s or 2 Mbit/s
Ad hoc mode is a Wi-Fi networking mode in which devices communicate directly with one another. One of its prime benefits is that it does not require a central wireless router to operate, which makes the infrastructure less complex and easier to maintain; reduced network complexity also means reduced cost. However, the technology has certain limitations and drawbacks that should be considered before adopting an ad hoc solution for wireless internet connection (Akyildiz 2018). Devices in an ad hoc network offer very limited protection against unauthorized connections, and the network is also slower than infrastructure mode, with a maximum speed of 11 Mbps.
Infrastructure mode is the 802.11 networking mode in which devices communicate through an access point (AP): for two devices in the network to communicate, traffic must first pass through the AP. Because the AP can connect wireless devices to each other or to a wired network, infrastructure mode is very popular in wireless LANs (Huang et al. 2015).
SSID stands for service set identifier. It is the primary name used to identify a wireless network that follows the 802.11 standard.
The SSID is only a network name, not a password. Its primary use is to distinguish one wireless network from other wireless networks within range, and for this reason hiding the SSID has no meaningful effect on network security (Aziz, Razak and Ghani 2017).
An access control list (ACL) is a set of rules used to filter the packets that pass through a network. Packets are filtered on the basis of factors such as source, destination and protocol (Chate and Chirchi 2015). Based on this inspection, each packet is either allowed to pass through the network or denied completely. Although firewalls play an important role in securing a network, a firewall alone cannot ensure effective network security, and adding ACLs makes the network more secure.
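The filtering described above can be sketched as a list of rules checked against each packet's source, destination and protocol. This is a minimal illustration, assuming first-match evaluation with an implicit deny when no rule matches (the behaviour of many router ACLs, such as Cisco IOS); the `Rule` and `Packet` fields and the example addresses are hypothetical.

```python
# Minimal ACL-style packet filter (illustrative sketch, hypothetical rule format).
# Rules are evaluated top to bottom; the first match decides; no match -> deny.

from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass(frozen=True)
class Rule:
    action: str                     # "permit" or "deny"
    source: str                     # CIDR block, e.g. "10.0.0.0/8"
    destination: str                # CIDR block
    protocol: Optional[str] = None  # "tcp", "udp", ... or None for any protocol
    port: Optional[int] = None      # destination port, or None for any port

@dataclass(frozen=True)
class Packet:
    source: str
    destination: str
    protocol: str
    port: int

def matches(rule: Rule, pkt: Packet) -> bool:
    """True if every field set in the rule matches the packet."""
    return (
        ip_address(pkt.source) in ip_network(rule.source)
        and ip_address(pkt.destination) in ip_network(rule.destination)
        and (rule.protocol is None or rule.protocol == pkt.protocol)
        and (rule.port is None or rule.port == pkt.port)
    )

def filter_packet(acl: list[Rule], pkt: Packet) -> str:
    """Return the action of the first matching rule, or 'deny' if none match."""
    for rule in acl:
        if matches(rule, pkt):
            return rule.action
    return "deny"  # implicit deny at the end of the list

if __name__ == "__main__":
    acl = [
        Rule("deny", "192.168.1.50/32", "0.0.0.0/0"),               # block one host
        Rule("permit", "192.168.1.0/24", "0.0.0.0/0", "tcp", 443),  # allow HTTPS out
    ]
    print(filter_packet(acl, Packet("192.168.1.20", "203.0.113.5", "tcp", 443)))  # permit
    print(filter_packet(acl, Packet("192.168.1.50", "203.0.113.5", "tcp", 443)))  # deny
    print(filter_packet(acl, Packet("192.168.1.20", "203.0.113.5", "udp", 53)))   # deny
```

Because the first match wins, rule order matters: placing the host-specific deny after the broader permit would let that host's HTTPS traffic through.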
WEP (Wired Equivalent Privacy) is a security protocol standard intended to make Wi-Fi and other 802.11 wireless networks more secure.
To encrypt data, WEP combines the secret key configured by the user with a system-generated value, a 24-bit initialization vector (IV) (Wright and Cache 2015). On a Wi-Fi connection, each data packet is encrypted so that it cannot be read by devices that do not hold the key. WEP originally supported 64-bit encryption (a 40-bit key plus the 24-bit IV), which was later extended to 128-bit encryption (a 104-bit key plus the IV). WEP is now considered insecure and has been superseded by WPA2.
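The key combination described above can be sketched in code. WEP concatenates the 24-bit IV with the shared secret key, uses the result to seed the RC4 stream cipher, and XORs the keystream with the data followed by a CRC-32 integrity check value (ICV). This is a simplified sketch: the key-ID byte carried next to the IV in real frames is omitted, and the ICV byte order is chosen for illustration. It is shown only to explain the mechanism; WEP is broken and should not be used to protect data.

```python
# Simplified WEP-style encryption: RC4 seeded with IV || key, applied to data || CRC-32.
# Illustration only; omits the key-ID byte of real frames. Do not use for security.

import zlib

def rc4_keystream(key: bytes, length: int) -> bytes:
    """Generate `length` bytes of RC4 keystream for `key`."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm (KSA)
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(length):                   # pseudo-random generation algorithm (PRGA)
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

def xor(data: bytes, stream: bytes) -> bytes:
    """Byte-wise XOR of data with keystream."""
    return bytes(a ^ b for a, b in zip(data, stream))

def wep_encrypt(iv: bytes, secret_key: bytes, plaintext: bytes) -> bytes:
    """Return IV || RC4(IV || key) XOR (plaintext || ICV)."""
    if len(iv) != 3:
        raise ValueError("WEP uses a 24-bit (3-byte) IV")
    icv = zlib.crc32(plaintext).to_bytes(4, "little")
    body = plaintext + icv
    return iv + xor(body, rc4_keystream(iv + secret_key, len(body)))

def wep_decrypt(frame: bytes, secret_key: bytes) -> bytes:
    """Undo wep_encrypt and check the ICV; raise ValueError on mismatch."""
    iv, ciphertext = frame[:3], frame[3:]
    body = xor(ciphertext, rc4_keystream(iv + secret_key, len(ciphertext)))
    plaintext, icv = body[:-4], body[-4:]
    if zlib.crc32(plaintext).to_bytes(4, "little") != icv:
        raise ValueError("ICV mismatch: frame corrupted or wrong key")
    return plaintext

if __name__ == "__main__":
    key = bytes.fromhex("0102030405")        # hypothetical 40-bit ("64-bit WEP") key
    frame = wep_encrypt(b"\x00\x00\x01", key, b"hello")
    print(wep_decrypt(frame, key))           # b'hello'
```

Because the IV is only 24 bits and is sent in clear, IVs repeat on a busy network and the RC4 keystream can be recovered, which is a large part of why WEP was replaced.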
The following are four different methods of data collection (Matthews and Ross 2014):
- Questionnaires and Surveys.
- Documents or records
Quantitative Research | Quantitative research is driven by the measurement of objects; the data collected through polls, surveys or interviews are mathematical, statistical or numerical in nature (Matthews and Ross 2014). |
Qualitative Research | Qualitative research is mainly exploratory in nature. It aims to understand the research topic through reasoning and by exploring various opinions through interviews, surveys and polls. The data collected by this method are not numerical or statistical in nature (Matthews and Ross 2014). |
Key Verifying | Key verifying is a network security technique for securing communication over a network. It is used with end-to-end encryption tools such as PGP and OTR (Akyildiz 2018) and ensures that the person one is communicating with is the intended recipient. |
Secondary Research | Secondary research is research conducted with secondary data, that is, data collected by someone other than the person conducting the research. Collecting data from the internet is an example of secondary research (Matthews and Ross 2014). |
Validation | Validation is the process of establishing documented evidence that the methodology followed for a research project or activity is valid and fully compatible with the existing standards for that research or activity (Wright and Cache 2015). |
Boolean Operators | Boolean operators (AND, OR and NOT) connect search terms and define the relationships between them. When searching electronic databases, Boolean operators can be used to narrow or broaden the search results. |
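The narrowing and broadening effect of Boolean operators can be illustrated with set operations over a search index. In this sketch the index and record IDs are hypothetical: each term maps to the set of records containing it, so AND is set intersection (narrower), OR is union (broader) and NOT is set difference (exclusion).

```python
# Boolean search over a tiny, hypothetical index of article records.
# AND -> intersection (narrows), OR -> union (broadens), NOT -> difference (excludes).

index = {
    "hadoop": {1, 2, 5},
    "cloud": {2, 3, 4, 5},
    "security": {3, 5, 6},
}

def records(term: str) -> set[int]:
    """Record IDs containing the term (empty set if the term is unknown)."""
    return index.get(term, set())

def search_and(a: str, b: str) -> set[int]:
    return records(a) & records(b)

def search_or(a: str, b: str) -> set[int]:
    return records(a) | records(b)

def search_not(a: str, b: str) -> set[int]:
    """Records containing a but NOT b."""
    return records(a) - records(b)

if __name__ == "__main__":
    print(sorted(search_and("hadoop", "cloud")))     # [2, 5]
    print(sorted(search_or("hadoop", "security")))   # [1, 2, 3, 5, 6]
    print(sorted(search_not("cloud", "security")))   # [2, 4]
```

Real database search engines add ranking, phrase matching and field restrictions, but the underlying AND/OR/NOT logic behaves like these set operations.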
Data analysis is the process of inspecting, cleansing, transforming and modelling the collected data (Taylor, Bogdan and DeVault 2015).
There are three different types of data analytics:
Descriptive Analytics: raw data is summarized to make it interpretable by humans.
Predictive Analytics: statistical models and forecasting techniques are used to estimate likely future outcomes.
Prescriptive Analytics: it is used to quantify the effect of future decisions and to recommend strategies based on the analysis.
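The difference between descriptive and predictive analytics can be shown on a small series. This sketch uses hypothetical monthly sales figures: the descriptive step summarizes what happened, and the predictive step fits an ordinary least-squares trend line and extrapolates one month ahead. The data and the choice of a linear model are assumptions for illustration; prescriptive analytics, which would compare candidate decisions, is not shown.

```python
# Descriptive vs predictive analytics on hypothetical monthly sales data.

from statistics import mean, median

monthly_sales = [120, 135, 150, 160, 178, 190]  # hypothetical units sold per month

def describe(values: list[float]) -> dict[str, float]:
    """Descriptive analytics: condense raw data into a few summary figures."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

def linear_forecast(values: list[float], steps_ahead: int = 1) -> float:
    """Predictive analytics: extrapolate an ordinary least-squares trend line."""
    xs = list(range(len(values)))               # month index 0, 1, 2, ...
    x_bar, y_bar = mean(xs), mean(values)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept + slope * (len(values) - 1 + steps_ahead)

if __name__ == "__main__":
    print(describe(monthly_sales))
    print(f"Forecast for next month: {linear_forecast(monthly_sales):.1f}")
```

A straight-line trend is the simplest possible predictive model; real forecasts would also account for seasonality and uncertainty.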
To gain insight into a process, collecting data is not enough; to get the most out of the data, it must be analyzed. Data analysis cleans, transforms and remodels data to provide the necessary insights by explaining various concepts, theories and frameworks (Matthews and Ross 2014).
Exploratory research is used to discover the ideas that are essential for conducting the research. Once it is completed, it provides insight into the topic (Matthews and Ross 2014).
Descriptive research describes a population in terms of its variables. It also helps determine how frequently something occurs and how the variables are related to each other (Matthews and Ross 2014).
Causal research follows a different approach from exploratory and descriptive research. It helps determine cause-and-effect relationships between variables (Matthews and Ross 2014).
The first step in ensuring the validity of research is to choose an experienced moderator. An experienced moderator helps identify various issues related to the research and gives valuable suggestions for overcoming them (Taylor, Bogdan and DeVault 2015).
Triangulation is another method that helps ensure the validity of research. It explores the research topic from various perspectives so that valid and effective research can be carried out.
Respondent validation is another technique for research validation, in which the findings are checked with the participants who provided the data.
To ensure that research is reliable, there should be complete transparency in the method of data collection. In addition to transparency, a systematic approach to data collection should be followed, as it increases the reliability of the collected data. An audit trail should be maintained to clearly document every process related to the research, and the data should be checked by several research team members to ensure its reliability (Taylor, Bogdan and DeVault 2015).
- Online – the site is reputable
- Online – the article is recent and well structured
- Online – the article is well documented and its data is supported by various references
- Print – the article is peer reviewed
- Print – the article is from a well-recognized author or publisher
- Print – all information about the author, publication date and version of the article is properly stated
In the planning phase, factors such as procedures, research methodology and the required tools are outlined. The plan takes into account the organizational infrastructure and requirements. In the evaluation phase, the results obtained from the research are analysed to understand how successful the research was and how effectively it achieved its objectives (Spector et al. 2014).
The performance testing procedure starts with identifying the test environment. Next, the performance acceptance criteria are identified. Once the criteria are identified, the test environment is configured and the test design is implemented, after which the test is executed (Ammann and Offutt 2016).
Performance testing is a type of software testing used to assess the reliability of software. It checks whether the software is fit for public use and how well it performs under workload (Lewis 2016).
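The steps of the procedure (set acceptance criteria, run the workload, measure, compare) can be sketched as a small test harness. In this sketch the workload function, the number of requests and the 50 ms 95th-percentile threshold are hypothetical placeholders for a real system under test and its real acceptance criteria.

```python
# Minimal performance test: run a workload, measure latency, compare with criteria.
# The workload, request count and p95 threshold are hypothetical placeholders.

import time
from statistics import quantiles

ACCEPTANCE_P95_MS = 50.0   # hypothetical acceptance criterion
REQUESTS = 200             # hypothetical load for one test run

def workload() -> None:
    """Stand-in for the operation under test (e.g. one request to a service)."""
    sum(i * i for i in range(10_000))

def run_test(fn, requests: int) -> list[float]:
    """Call fn repeatedly and return each call's latency in milliseconds."""
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

def p95(latencies: list[float]) -> float:
    """95th-percentile latency (quantiles with n=100 returns 99 percentile cut points)."""
    return quantiles(latencies, n=100)[94]

if __name__ == "__main__":
    results = run_test(workload, REQUESTS)
    observed = p95(results)
    verdict = "PASS" if observed <= ACCEPTANCE_P95_MS else "FAIL"
    print(f"p95 latency = {observed:.2f} ms -> {verdict}")
```

A percentile criterion is used here rather than an average because averages can hide a small number of very slow responses; a real test would also vary the load to see how performance degrades.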
References:
Akyildiz, I.F., 2018. Ad Hoc Networks editorial (2017).
Amir, Y., Danilov, C., Goose, S., Hedqvist, D. and Terzis, A., 2006. An overlay architecture for high-quality VoIP streams. IEEE Transactions on Multimedia, 8(6), pp.1250-1262.
Ammann, P. and Offutt, J., 2016. Introduction to software testing. Cambridge University Press.
Aziz, T.A.T., Razak, M.R.A. and Ghani, N.E.A., 2017, September. The performance of different IEEE 802.11 security protocol standard on 2.4 GHz and 5 GHz WLAN networks. In Engineering Technology and Technopreneurship (ICE2T), 2017 International Conference on (pp. 1-7). IEEE.
Bachy, Y., Nicomette, V., Alata, E., Kaâniche, M. and Courrege, J.C., 2015, September. Security of ISP Access Networks: practical experiments. In Dependable Computing Conference (EDCC), 2015 Eleventh European (pp. 205-212). IEEE.
Chate, A.B. and Chirchi, V.R., 2015. Access Control List Provides Security in Network. International Journal of Computer Applications, 121(22).
Devi, G.U., Kaushik, K.V., Sreeveer, B. and Prasad, K.S., 2015. VoIP over Mobile Wi-Fi Hotspot. Indian Journal of Science and Technology, 8(S2), pp.195-199.
Huang, H., Li, P., Guo, S. and Zhuang, W., 2015. Software-defined wireless mesh networks: architecture and traffic orchestration. IEEE Network, 29(4), pp.24-30.
Lewis, W.E., 2016. Software testing and continuous quality improvement. CRC Press.
Liang, C. and Yu, F.R., 2015. Wireless virtualization for next generation mobile cellular networks. IEEE Wireless Communications, 22(1), pp.61-69.
Matthews, B. and Ross, L., 2014. Research methods. Pearson Higher Ed.
Singh, H.P., Singh, S., Singh, J. and Khan, S.A., 2014. VoIP: State of art for global connectivity—A critical review. Journal of Network and Computer Applications, 37, pp.365-379.
Spector, J.M., Merrill, M.D., Elen, J. and Bishop, M.J. eds., 2014. Handbook of research on educational communications and technology (pp. 439-451). New York, NY: Springer.
Taylor, S.J., Bogdan, R. and DeVault, M., 2015. Introduction to qualitative research methods: A guidebook and resource. John Wiley & Sons.
Wright, J. and Cache, J., 2015. Hacking exposed wireless: wireless security secrets & solutions. McGraw-Hill Education Group.