portotim: PDF searcher

The search engine works quite well. I tried a couple of searches and in both cases found what I was looking for. Sometimes it gives a direct PDF link; in other cases you might need to download the torrent and open it with a torrent client. Overall, it’s a very useful engine. Highly recommended.
PDF is the Portable Document Format used by Adobe Acrobat. It is designed for brochures, magazines, forms, reports and other materials with complex visual designs which will be printed on PostScript (tm) printers. The format was created to remove machine and platform dependence for the documents, and its goals include design fidelity and typographic control. It was never designed for interactive online reading. However, many word processors, page layout and other programs can create PDF files easily, so many sites are now serving them online.
Adobe has a PDF Plug-In for browsers and some development tools that let servers send a PDF in chunks ("byte-range serving") rather than making the browser download the entire file first. This improves the user experience of receiving PDF files, but they still lack the speed, simplicity and user control of HTML.
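To make byte-range serving a little more concrete, here is a minimal sketch in Python of how a server might answer an HTTP Range request for a PDF. The function name serve_range and its return shape are made up for illustration, and it handles only a single range. It is not Adobe's implementation, just the general idea.

```python
import re
from typing import Dict, Optional, Tuple

# Matches a single byte range such as "bytes=0-1023", "bytes=500-" or "bytes=-200".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def serve_range(data: bytes, range_header: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Hypothetical helper: return (status, headers, body) for one request."""
    total = len(data)
    full = (200, {"Content-Length": str(total), "Accept-Ranges": "bytes"}, data)
    if not range_header:
        return full

    m = _RANGE_RE.match(range_header.strip())
    if not m or (m.group(1) == "" and m.group(2) == ""):
        # Malformed or multi-range header: this sketch just sends the whole file.
        return full

    first, last = m.group(1), m.group(2)
    unsatisfiable = (416, {"Content-Range": f"bytes */{total}"}, b"")

    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(last)
        if length == 0 or total == 0:
            return unsatisfiable
        start, end = max(total - length, 0), total - 1
    else:
        start = int(first)
        if start >= total:
            return unsatisfiable
        end = min(int(last), total - 1) if last else total - 1
        if end < start:
            # Invalid range (end before start): ignore the header.
            return full

    body = data[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(body)),
        "Accept-Ranges": "bytes",
    }
    return 206, headers, body
```

A viewer can then ask for, say, "bytes=0-65535" to show the first pages while the rest of the file is still on the server.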
PDF files have a specified page size, for example, and do not reflow in smaller windows, so people with small screens spend a lot of time scrolling around the window. In addition, copying text from a PDF file is very difficult, as sidebar text is included, and selections cannot cross page breaks.
If at all possible, you should serve both HTML and PDF versions of files, designing the HTML for onscreen use and searching, and PDF for printing only. That provides your users with the best format for their task, rather than making too many compromises on one side or the other. HTML files are better for searching as well!
PDF files usually have both text and graphical representations of the text, with indications of exactly where that text should be displayed. However, there are several cases where this does not work for searching:
Documents which were scanned directly into PDF may only have the graphic portion: there may be no computer-readable text at all. These documents are not searchable.
Documents that were scanned and converted from graphic display to digital text using OCR (optical character recognition) may have significant numbers of errors. This is more common if the original document is old or was not perfectly aligned. In this case, many search terms will not be matched although the words were in the original printed or typed text, because they were not correctly interpreted. Some search terms may be falsely matched if the OCR software incorrectly interpreted the original text.
Documents with multiple columns which were converted to PDF by some layout programs will display correctly and contain the correct digital text, but they lose the text flow: the words don't come in the correct sequence. Search engines will therefore fail to match phrase queries where a phrase wrapped onto the next line of the column in the original, because that relationship was not stored in the PDF.
Documents generated by some applications will contain partial words due to hyphenation, incorrect coding of ligatures and extended characters (diacriticals and letters beyond the basic 26), and other unusual situations. These mangled words will not match queries, although the words were in the original text.
You'll find more information about the PDF searcher on our site.
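Some of the problems above (hyphenated partial words, ligatures, diacriticals, phrases wrapped across lines) can be reduced by cleaning up the extracted text before indexing it. The following is only a rough sketch in Python, using the standard library; the function name normalize_for_search is hypothetical, and nothing here is claimed to be what the PDF searcher actually does. It cannot repair OCR errors or a scrambled column order.

```python
import re
import unicodedata


def normalize_for_search(text: str) -> str:
    """Hypothetical cleanup step for text extracted from a PDF."""
    # NFKC expands compatibility characters, e.g. the single-character
    # ligature "ﬁ" (U+FB01) becomes the two letters "fi".
    text = unicodedata.normalize("NFKC", text)

    # Rejoin words split by end-of-line hyphenation: "docu-\nment" -> "document".
    # Caveat: this also joins genuine compounds that happen to break at the
    # hyphen ("well-\nknown" -> "wellknown").
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)

    # Fold diacriticals: decompose, then drop the combining marks,
    # so "résumé" can match a query for "resume".
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Collapse line breaks and runs of spaces so that a phrase wrapped
    # onto the next line still matches a phrase query.
    text = re.sub(r"\s+", " ", text).strip()

    return text.casefold()


if __name__ == "__main__":
    print(normalize_for_search("The ﬁnal docu-\nment was a résumé"))
```

The same function would have to be applied to the user's query as well, otherwise a query typed with accents would no longer match the folded index.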

Thursday, 24 November 2011
