Instant analysis of small-to-medium quantities of text
Lingo3G organizes collections of text documents into clearly labeled hierarchical folders: in real time, fully automatically and without external knowledge bases.
Get a concise summary of the subjects discussed in a set of documents.
More efficient browsing
Navigate straight to the documents you need using clearly labeled folders.
Refine the initial query and "drill down" on a specific subject based on cluster labels.
Painless integration into any environment using:
Java API, C# API, REST API, CLI tools, Solr or Elasticsearch plugins.
Accurate, blazing-fast, stateless
Useful hierarchical clusters
Lingo3G aims to produce clusters with concise, varied, relevant and human-readable labels.
No external taxonomies or knowledge bases are needed: Lingo3G categorizes documents based only on their text.
On a desktop machine, Lingo3G clusters 100 search results in about 5 ms. Clustering 10,000 abstracts takes about 1 s.
You can add synonym definitions, such as photos = pictures = pics = photographs, to increase the quality of clustering.
You can boost or suppress specific cluster labels to highlight product names or remove abusive language.
You can tune clustering characteristics and performance in a dedicated GUI application called Lingo3G Workbench.
Numeric or enumerated document fields, such as price or tags, can be optionally used to guide clustering.
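To illustrate why synonym definitions such as photos = pictures = pics = photographs can help, here is a minimal Python sketch. It is not Lingo3G code and does not use the Lingo3G API; the names `SYNONYM_GROUPS`, `build_canonical_map` and `count_terms` are hypothetical. It only shows the general idea that mapping variants to one canonical term lets their occurrences add up instead of being counted separately.

```python
from collections import Counter

# Hypothetical synonym groups; the first term of each group is treated as canonical.
SYNONYM_GROUPS = [["photos", "pictures", "pics", "photographs"]]


def build_canonical_map(groups):
    """Map every term in a synonym group to the group's first term."""
    return {term: group[0] for group in groups for term in group}


def count_terms(texts, groups):
    """Count whitespace-separated tokens after collapsing synonyms."""
    canonical = build_canonical_map(groups)
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            counts[canonical.get(token, token)] += 1
    return counts


texts = ["holiday pics", "holiday photographs", "family pictures"]
counts = count_terms(texts, SYNONYM_GROUPS)
print(counts["photos"])  # 3: all three variants count toward one term
```

Without the synonym group, each of the three variants would be counted once and none would stand out as a shared subject.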
19 languages supported
Including English, German, French, Chinese Simplified, Thai and Arabic. All with automatic language detection.
Lingo3G is a stateless system: data-in, clusters-out. This makes horizontal scaling a breeze.
Pure Java library
Lingo3G works on any system supporting Java 1.8 or higher and has no platform-specific dependencies.
Native C# API
No Java runtime is needed to integrate and call Lingo3G through its C# API.
Open source foundation
Lingo3G is based on the Carrot2 framework. If you've used Carrot2, switching to Lingo3G will be a breeze.
Dozens of happy customers around the globe
AcclaimIP is the fastest, most intuitive patent analytics and landscape solution available.
AcclaimIP uses Lingo3G to cluster patent documents. Not only are the clusters a great way to organize and visualize large sets of patents, but the theme extraction doubles as a keyword tool giving our customers yet another way to discover important search terms.
Matt Troyer, President at AcclaimIP, USA
eTools is the transparent Metasearch Engine built in Switzerland.
We chose Lingo3G because it exceeded our expectations in several ways: easy to integrate, many configuration options and extremely fast and lightweight. Besides that, we are always pleased with the professional care and responsiveness of the Carrot Search team.
Stephan Schmid, CEO at Comcepta, Switzerland
EPPI-Centre uses Lingo3G in its systematic reviewing software, EPPI-Reviewer.
A recent evaluation found overwhelming support for using Lingo3G, enabling users to make connections that they had not been able to predict in advance, “broadening understanding”, and so leading them to important new places.
Dr James Thomas, Associate Director, EPPI-Centre, Social Science Research Unit, Institute of Education, London
Questions & Answers
Lingo3G is stateless and processes all data in memory. This makes it particularly suitable for clustering data coming from highly dynamic collections, such as search results or social conversations.
Having said that, Lingo3G will be appropriate for processing any collection of texts where the total size does not exceed a few tens of megabytes.
Lingo3G was designed to perform real-time in-memory clustering of small and medium collections of documents, which roughly corresponds to about 5,000 documents, a few kilobytes each.
The upper limit very much depends on the characteristics of your documents. Some of our customers report that they successfully use Lingo3G with as many as 100,000 documents. Please contact us for an evaluation license and performance tuning advice.
For collections spanning millions of documents and gigabytes of text, consider Lingo4G.
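The sizing guidance above can be checked with simple arithmetic. The sketch below is only an illustration: the 4 KB average document size and the 30 MB threshold are assumptions chosen to stand in for "a few kilobytes each" and "a few tens of megabytes", and the function names are hypothetical.

```python
# Assumed reading of "a few tens of megabytes"; adjust to your own judgment.
ASSUMED_LIMIT_MB = 30.0


def estimated_size_mb(num_documents, avg_document_kb):
    """Rough total text size in megabytes (1 MB = 1024 KB)."""
    return num_documents * avg_document_kb / 1024


def fits_in_memory_clustering(num_documents, avg_document_kb, limit_mb=ASSUMED_LIMIT_MB):
    """True if the estimated collection size stays within the assumed limit."""
    return estimated_size_mb(num_documents, avg_document_kb) <= limit_mb


# 5,000 documents at an assumed 4 KB each.
size = estimated_size_mb(5_000, 4)
print(f"{size:.1f} MB")  # 19.5 MB
print(fits_in_memory_clustering(5_000, 4))      # True
print(fits_in_memory_clustering(1_000_000, 4))  # False
```

Under these assumptions, 5,000 documents come to roughly 20 MB, which matches the "few tens of megabytes" figure, while a million documents of the same size run to several gigabytes. Actual limits depend on the characteristics of your documents.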
No. Lingo3G is a software component intended for use and embedding in other applications. Some programming experience is required to apply Lingo3G to custom data.
Lingo3G does come with a clustering tuning GUI called Clustering Workbench, which may serve end-user needs to a limited extent. However, our development efforts concentrate primarily on the core clustering algorithms; developing and supporting user-facing applications has lower priority.
We require one Lingo3G license per physical or virtual server that runs Lingo3G binaries, regardless of the number of cores on the server or the number of users or requests it handles.
For large-scale or non-typical deployment scenarios, such as OEM distribution, please get in touch.
Absolutely! Please get in touch for a free evaluation package.
The best place to start would be the Lingo3G Manual. For an in-depth introduction to search results clustering algorithms and engines, please see:
A survey of Web clustering engines. ACM Computing Surveys (CSUR), Volume 41, Issue 3 (July 2009), Article No. 17, ISSN: 0360-0300 (PDF).
The paper reports on the evaluation of a number of search results clustering engines, including Lingo3G.