US Patent 7519621 Extracting information from Web pages

Is a

Patent

Patent attributes

Current Assignee

PageBites

Patent Jurisdiction

United States Patent and Trademark Office

Patent Number

7519621

Date of Patent

April 14, 2009

Patent Application Number

10838982

Date Filed

May 4, 2004

Patent Citations Received

US Patent 12105763 Generation of a sequence of related text-based search queries

US Patent 11995613 Search extraction matching, draw attention-fit modality, application morphing, and informed apply apparatuses, methods and systems

US Patent 12106317 System and method for classifying relevant competitors

Patent Primary Examiner

Charles Rones

Patent abstract

Methods and apparatus, including computer program products, for identifying Web page content with a granularity finer than individual Web pages, e.g., finer than individual HTML documents. The invention provides a computer-implemented method for identifying Web page content. The method includes receiving a string of markup language source code that includes tags. The method includes identifying sub-sequences in which tags occur in the string. Each sub-sequence is associated with the portion of the string that starts with the first tag of the sub-sequence and ends with the last tag of the sub-sequence. The sub-sequences identified are ones that satisfy criteria for being classified as associated with a portion of the string that defines Web page content constituting an entire listing. The criteria include a requirement that an identified sub-sequence be repeated in tandem, either exactly or approximately, in the string. The method includes returning the identified sub-sequences.
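
The tandem-repeat idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it handles only exact repeats (the abstract also allows approximate ones), it uses tandem repetition as its sole criterion for a "listing", and the names `TAG_RE`, `tag_sequence`, `find_tandem_repeats`, `min_len`, and `min_repeats` are hypothetical, chosen here for illustration.

```python
import re

# Simplified tag matcher (an assumption of this sketch, not a full HTML parser).
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def tag_sequence(source):
    """Return (tag_name, start_offset, end_offset) for every tag in source."""
    tags = []
    for m in TAG_RE.finditer(source):
        name = ("/" if m.group(1) else "") + m.group(2).lower()
        tags.append((name, m.start(), m.end()))
    return tags


def find_tandem_repeats(source, min_len=2, min_repeats=2):
    """Return the source portions spanned by tag sub-sequences that repeat
    back to back (exact matches only)."""
    tags = tag_sequence(source)
    names = [t[0] for t in tags]
    n = len(names)
    results = []
    i = 0
    while i < n:
        best = None  # (unit_length, repeat_count) covering the most tags
        for length in range(min_len, (n - i) // min_repeats + 1):
            unit = names[i:i + length]
            count = 1
            # Count how many times the unit repeats immediately after itself.
            while names[i + count * length:i + (count + 1) * length] == unit:
                count += 1
            if count >= min_repeats and (
                best is None or count * length > best[0] * best[1]
            ):
                best = (length, count)
        if best is None:
            i += 1
            continue
        length, count = best
        for k in range(count):
            first = tags[i + k * length]
            last = tags[i + (k + 1) * length - 1]
            # Portion from the start of the first tag to the end of the last tag.
            results.append(source[first[1]:last[2]])
        i += length * count
    return results


html = (
    '<ul><li><a href="/a">A</a></li>'
    '<li><a href="/b">B</a></li>'
    '<li><a href="/c">C</a></li></ul>'
)
listings = find_tandem_repeats(html)
```

Here the tag sub-sequence `li, a, /a, /li` occurs three times in tandem, so `listings` holds the three `<li>…</li>` portions of the string, each running from the first tag of a repeat to its last tag.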

