Working of the Deep Web
Question:
Discuss the sampling strategies for information extraction.
The deep web, also known as the invisible web, is the part of the internet that is not accessible to search engines. Content that belongs to the deep web includes chat messages, email messages, social media files and electronic health records (EHRs). Such content can be reached directly over the internet, but it is not crawled and indexed by search engines such as Yahoo, Google, DuckDuckGo and Bing. Security in the cyber world therefore plays a vital role here: the deep web is one of the places where illegal activity can be carried out without interference from any third party. The key point about the working of the deep web is that using it is not in itself illegal, but the benefit or service obtained from it should be legal. Tor is one of the best-known portals to the deep web; it provides a virtual pathway that lets users communicate and navigate anonymously over the internet.
The main aim of the report is to consider the different concepts that apply to the deep web. To keep the topic focused, the major emphasis is placed on the working of the deep web. The main security challenges that arise from the concept are discussed, with particular attention to the currency used to exchange items over it. The role the deep web is likely to play in the near future is also considered in the report.
The concept of the deep web has existed almost since the invention of the internet. The dark web refers to websites that sit on top of darknets: overlay networks that are off limits without specific authorisation, software or hardware configuration. In conversation, people often use the terms dark web, deep web and shadow web interchangeably (Liu & Xiang, 2016).
The term “darknet” was coined in the 1970s and originally referred to networks secluded from the ARPANET (the original network users now know as the internet). The exact focus in creating such private networks was a setting in which activity, including illegal activity, could be conducted while users were kept secure, so that nobody had direct access to who was accessing what. Tor is one example of a darknet: it is an anonymity network that obscures the actual IP address of the user, which it achieves by bouncing and encrypting communication across relays all over the world. Tor was far from the first darknet, however; research has found that students at Stanford University and MIT were the first to use the ARPANET to sell cannabis, so in that sense those students can be said to have invented the “dark web”. Other examples of darknets in use today are the I2P network and Freenet.
Security Challenges of the Deep Web
In this section, the main emphasis is on how large the deep web is. The following statistics help put the deep web in perspective:
- The information contained in the deep web amounts to almost 7,500 terabytes.
- Compared with the surface web, the deep web contains between 400 and 550 times more information.
- More than 200,000 deep web sites currently exist.
- Around 550 billion individual documents are found on the deep web, whereas only 1 billion individual documents are found on the normal surface web.
- Almost 95% of the deep web is publicly accessible, meaning users do not have to pay any money to access it and gain advantage from it.
- Together, the 60 largest deep web sites contain around 750 terabytes of data. Surprisingly, that alone is about 40 times the size of the entire surface web.
In information security, the protection of any data, whether it is advantageous for the user or not, rests on three concepts: confidentiality, integrity and availability. When a user who is not authorised accesses certain data, it is termed a loss of confidentiality. Gaining unauthorised access can be an easy task; catching the intruders, on the other hand, is a difficult job. On the deep web, users cannot access or alter data they are not authorised for, so the factors of authentication and authorisation are involved. At the same time, the deep web bypasses security in the sense that it prevents any third party from learning who accessed which data, which can be considered a direct breach of information security from a monitoring point of view (Zhao et al., 2016).
The focal point of the security challenges of the deep web is its nature. The deep web can easily be accessed using a tool named the Tor browser. The Tor browser applies layers of encryption to outgoing and incoming data, which is why it is also known as “the onion router”. The obvious question is what this encryption achieves. The idea may seem foreign given the general lack of privacy elsewhere on the internet, but on the deep web, anonymity of everything that is searched, viewed and traded is the rule, and that anonymity can have severe implications.
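To make the layering idea concrete, the following Python sketch wraps a message in one encryption layer per relay and peels the layers off again in path order. It is a minimal illustration only, assuming the third-party cryptography package; Tor's real protocol uses different ciphers, key exchange and circuit construction, and the relay keys here are invented for the example.

```python
# Minimal sketch of onion-style layered encryption (illustrative only;
# Tor's actual protocol differs in ciphers, key exchange and circuits).
from cryptography.fernet import Fernet

# Each simulated relay holds its own symmetric key, in path order:
# [guard, middle, exit].
relay_keys = [Fernet.generate_key() for _ in range(3)]

def wrap(message: bytes, keys: list[bytes]) -> bytes:
    """Apply the exit relay's layer first and the guard's last,
    so the outermost layer belongs to the first relay on the path."""
    for key in reversed(keys):
        message = Fernet(key).encrypt(message)
    return message

def unwrap(ciphertext: bytes, keys: list[bytes]) -> bytes:
    """Each relay, in path order, strips exactly one layer; no single
    relay ever sees both the sender and the plaintext destination."""
    for key in keys:
        ciphertext = Fernet(key).decrypt(ciphertext)
    return ciphertext

onion = wrap(b"request for a hidden service", relay_keys)
assert unwrap(onion, relay_keys) == b"request for a hidden service"
```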
Everything that is done on the deep web is essentially untraceable, and criminals can easily take advantage of this. Simply put, the deep web is becoming a corrupt hub for many criminal activities: illegal weapons, drug transfers and the hiring of contract killers are daily occurrences. Illegal marketplaces with bidding, similar to ordinary shopping websites, have been set up on the deep web to sell illegal goods. Many laws have been enforced, but stopping this criminal activity completely has not been possible. Such marketplaces are highly efficient in that they provide a user-friendly interface and a search bar that help criminals save time and find illegal goods easily. The currency used to complete transactions is Bitcoin, a cryptocurrency, which gives criminals the extra advantage of being nearly impossible to trace (Calì & Straccia, 2015).
Criminal Activities in the Deep Web
The deep web has been around for a long period, but awareness of stopping its activity only began around 2013, after the first major deep web marketplaces were observed. The main tool used here is Tor. Tor is widely used to access the deep web because it routes traffic through a network of nodes, which makes it very difficult for a third party to learn who accessed which site and when. .onion sites can only be reached through Tor. The main risk associated with using Tor is that a user who downloads it may be added to an NSA watch list (Das, 2015); the same applies to Tails. In this context, if everyone downloaded and used Tor, no single user would look particularly suspicious, so the more people use Tor, the lower the risk associated with it becomes. There is no specific risk in using the Tor tool itself; the point is what is being done with its help (Calì & Straccia, 2015).
The importance attached to monitoring has increased recently, with a focus on the Tor network, and this can soon extend to other fields. Monitoring would not only reduce the access points into the deep web but also play a vital role in safeguarding the internet and its peripherals. The interactive design and interconnected nature of the dark web make monitoring it a challenge. The following measures can be applied to safeguard against the issue.
Mapping the hidden service directory: Tor uses a domain database built on a distributed system known as a “distributed hash table” or DHT. A DHT works by making the nodes in the system responsible for maintaining and storing subsets of the database (Orsolini et al., 2017). Given the distributed architecture of hidden service domain resolution, it is quite possible to deploy nodes in the DHT that monitor the requests that come in for a particular domain.
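The sketch below illustrates, in greatly simplified form, how a DHT assigns responsibility for keys to nodes: a monitoring node inserted into the ring would naturally see the lookups that land in its portion of the keyspace. The node names and the .onion key are invented, and this toy ring is not Tor's actual hidden service directory protocol.

```python
# Toy DHT ring: each node owns the arc of the hash space up to its ID.
# A node added to the ring observes lookups that fall in its arc --
# the monitoring idea described above, greatly simplified.
import bisect
import hashlib

def ring_position(value: str) -> int:
    """Map a string onto a 32-bit hash ring."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** 32)

class ToyDHT:
    def __init__(self, node_names):
        # Nodes sorted by their position on the ring.
        self.ring = sorted((ring_position(n), n) for n in node_names)

    def responsible_node(self, key: str) -> str:
        """The first node clockwise from the key's position owns it."""
        pos = ring_position(key)
        idx = bisect.bisect_left(self.ring, (pos,))
        return self.ring[idx % len(self.ring)][1]

dht = ToyDHT(["node-a", "node-b", "monitor-node"])
# A lookup for a (hypothetical) hidden-service descriptor:
print(dht.responsible_node("abcdefghij234567.onion"))
```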
Social site monitoring: Sites such as Pastebin are often used to exchange addresses and contact information for new hidden services. These sites should be kept under constant surveillance to spot exchanged messages that contain new dark web domains (Kaczmarek & Węckowski, 2018). Once illegal activity is detected, the site should be blocked so that no further illegal activity can originate from it.
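As a sketch of the detection step, the regular expression below matches the 16-character (v2) and 56-character (v3) .onion address formats in scraped text; the sample paste text and the address in it are invented for illustration.

```python
# Sketch: spotting .onion addresses in scraped paste-site text.
import re

# v2 onion addresses are 16 base32 characters, v3 addresses are 56.
ONION_RE = re.compile(r"\b([a-z2-7]{56}|[a-z2-7]{16})\.onion\b")

def extract_onion_domains(text: str) -> set[str]:
    """Return every distinct .onion address found in the text."""
    return {match.group(0) for match in ONION_RE.finditer(text)}

sample = "new market mirror at qwertyuiopasdfgh.onion, old one is gone"
print(extract_onion_domains(sample))  # {'qwertyuiopasdfgh.onion'}
```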
The Role of Tor in Accessing the Deep Web
Monitoring of hidden services: Most hidden services are very volatile; they tend to go offline often and come back under a new domain name. It is therefore essential to take a snapshot of every site when it is seen, so that its online activity can be analysed later (Sharma & Sharma, 2017).
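A minimal snapshotting sketch is given below. It assumes a local Tor client exposing its default SOCKS5 proxy on port 9050 and the requests package installed with SOCKS support (requests[socks]); the commented-out .onion URL is a placeholder, not a real service.

```python
# Sketch: snapshot a hidden service page for later analysis.
# Assumes a local Tor client on its default SOCKS5 port (9050) and
# requests installed with SOCKS support (pip install requests[socks]).
import datetime
import pathlib
import requests

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h: resolve names via Tor
    "https": "socks5h://127.0.0.1:9050",
}

def snapshot(url: str, out_dir: str = "snapshots") -> pathlib.Path:
    """Fetch the page through Tor and save a timestamped copy."""
    response = requests.get(url, proxies=TOR_PROXIES, timeout=60)
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime(
        "%Y%m%dT%H%M%SZ")
    path = pathlib.Path(out_dir)
    path.mkdir(exist_ok=True)
    out_file = path / f"{stamp}.html"
    out_file.write_text(response.text, encoding="utf-8")
    return out_file

# snapshot("http://exampleonionplaceholder.onion/")  # placeholder URL
```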
Semantic analysis: Once data is retrieved from the hidden services, it should be stored in a semantic database so that it can be tracked in future. This is done to build a picture of the activities and where they originate.
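The sketch below stands in for the semantic database with a plain SQLite table, purely for illustration: each retrieved page is recorded with its domain, fetch time and extracted keywords, so repeated appearances of the same domain can be tracked over time. The table layout and the sample domain are invented.

```python
# Sketch: record retrieved pages in SQLite so recurring domains and
# content can be tracked over time (a stand-in for a semantic store).
import sqlite3
import time

conn = sqlite3.connect("darkweb_monitor.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS sightings (
        domain     TEXT,
        fetched_at REAL,
        keywords   TEXT
    )
""")

def record_sighting(domain: str, keywords: list[str]) -> None:
    """Store one observation of a hidden service and what it mentioned."""
    conn.execute(
        "INSERT INTO sightings VALUES (?, ?, ?)",
        (domain, time.time(), ",".join(keywords)),
    )
    conn.commit()

def history(domain: str):
    """All recorded observations of a domain, oldest first."""
    return conn.execute(
        "SELECT fetched_at, keywords FROM sightings "
        "WHERE domain = ? ORDER BY fetched_at",
        (domain,),
    ).fetchall()

record_sighting("qwertyuiopasdfgh.onion", ["market", "mirror"])
print(history("qwertyuiopasdfgh.onion"))
```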
The future trends relating to the deep web are stated below:
- It will be more secure than in the past: The technology behind the deep web will keep advancing, making it even more difficult to detect activity performed over the Tor platform. It is one of the technological fields that is advancing at a rapid rate.
- Marketplaces will be stronger than before: Transactions made with electronic currency will become more efficient and will not involve any liability, enabling fully decentralised marketplaces without any single point of failure. As marketplaces develop at this rapid rate, they will incorporate more items, so that more and more products can be accessed through them.
- Gauging reputation will be easier: Despite the high anonymity, trust and reputation among buyers and sellers can be established without relying on an external authority.
- Tracking of bitcoins will be harder: Cryptocurrency goes hand in hand with the deep web. In the near future, more advanced techniques will be applied that make transactions even less traceable. Malware can take advantage of the blockchain, and blockchain technology can reduce the risk of users being detected when they access a site.
- More people will be involved: A recent report stated that most users of the normal web do not have much time to get involved in the deep web. If awareness increases among users, they may become far more involved (Barrio & Gravano, 2017). It is one of those fields where, the more people access it, the faster the risk of detection is minimised.
Conclusion
The report can be concluded by noting that, despite the information the dark web holds, it remains an ambiguous part of the digital world. Many users in the web sphere have never heard the term or understood how it operates, and believe that what Google delivers is all there is; others regard it as an underground world of crime and unethical behaviour. In the near future, the deep web is likely to play a very vital role and to grow stronger as a sector. To gain advantage from the deep web, Tor has to be utilised, as it is its best-known portal. Tor enhances web browsing security and privacy while allowing information to be shared in a highly secret manner, and it may soon be so advanced that users who access the deep net through it cannot be detected at all. As innovation drives the advancement of technology, the deep web can also be considered a field playing a role in that context.
References
Barrio, P., & Gravano, L. (2017). Sampling strategies for information extraction over the deep web. Information Processing & Management, 53(2), 309-331.
Calì, A., & Straccia, U. (2015). A Framework for Conjunctive Query Answering over Distributed Deep Web Information Resources. In SEBD (pp. 358-365).
Das, G. (2015, May). Principled Optimization Frameworks for Query Reformulation of Database Queries. In Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web (pp. 2-2). ACM.
Jiang, L., Kalantidis, Y., Cao, L., Farfade, S., Tang, J., & Hauptmann, A. G. (2017, February). Delving deep into personal photo and video search. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (pp. 801-810). ACM.
Kaczmarek, T., & Węckowski, D. G. (2018). Harvesting deep web data through produser involvement. In The Dark Web: Breakthroughs in Research and Practice (pp. 175-198). IGI Global.
Liu, B., & Xiang, J. (2016, August). Extraction and management of meta information on the domain-oriented Deep Web. In 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS) (pp. 787-790). IEEE.
Massouh, N., Babiloni, F., Tommasi, T., Young, J., Hawes, N., & Caputo, B. (2017). Learning deep visual object models from noisy web data: How to make it work. arXiv preprint arXiv:1702.08513.
Memon, M. H., Khan, A., Li, J. P., Shaikh, R. A., Memon, I., & Deep, S. (2014, December). Content based image retrieval based on geo-location driven image tagging on the social web. In 2014 11th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP) (pp. 280-283). IEEE.
Orsolini, L., Papanti, D., Corkery, J., & Schifano, F. (2017). An insight into the deep web; why it matters for addiction psychiatry?. Human Psychopharmacology: Clinical and Experimental, 32(3).
Pavai, G., & Geetha, T. V. (2014). A Bootstrapping Approach to Classification of Deep Web Query Interfaces. International Journal on Recent Trends in Engineering & Technology, 11(2), 1.
Qiang, B., Zhang, R., Wang, Y., He, Q., Li, W., & Wang, S. (2014). Research on Deep Web Query Interface Clustering Based on Hadoop. JSW, 9(12), 3057-3062.
Sharma, D. K., & Sharma, A. K. (2017). Deep Web information retrieval process. The Dark Web: Breakthroughs in Research and Practice, 114.
Su, H., Gong, S., Zhu, X., Popescu, A., Ginsca, A., Le Borgne, H., & Loh, Y. P. (2017, October). Weblogo-2m: Scalable logo detection by deep learning from the web. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 270-279).
Wang, Y., Lin, X., Wu, L., & Zhang, W. (2017). Effective multi-query expansions: Collaborative deep networks for robust landmark retrieval. IEEE Transactions on Image Processing, 26(3), 1393-1404.
Zhao, F., Zhou, J., Nie, C., Huang, H., & Jin, H. (2016). SmartCrawler: a two-stage crawler for efficiently harvesting deep-web interfaces. IEEE transactions on services computing, 9(4), 608-620.