How does Sphinx differ from Elasticsearch?

Posted: 2018-07-30 07:42

Commonly used features when migrating from Sphinx to Elasticsearch
Operations that had to be implemented on ES before migrating the project
Pagination:
size=x&from=x
Sorting:
"sort": { "title": { "order": "asc" }}
Matching different terms in different fields
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "邪气" }},
        { "match": { "author": "七星" }}
      ]
    }
  }
}
Matching across multiple fields (OR)
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "War and Peace" }},
        { "match": { "author": "Leo Tolstoy" }}
      ]
    }
  }
}
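The two should-queries above share one shape, so the body can be generated rather than written by hand. A minimal sketch, assuming the request body is built as a plain Python dict and serialized to JSON; the helper name should_match is hypothetical and not part of any Elasticsearch client:

```python
import json


def should_match(**field_terms: str) -> dict:
    """Build a bool/should query: each field=term pair becomes one optional match clause."""
    clauses = [{"match": {field: term}} for field, term in field_terms.items()]
    return {"query": {"bool": {"should": clauses}}}


body = should_match(title="War and Peace", author="Leo Tolstoy")
# Serialize the body the way it would be sent to the _search endpoint.
print(json.dumps(body, indent=2))
```

Because every clause sits under should, a document matching only one of the fields is still returned; documents matching more clauses simply score higher.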
POST /book/fulltext/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "青春" }},
        { "match": { "author": "..." }}
      ]
    }
  }
}
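How dis_max combines the scores of its sub-queries is easier to see with numbers. The sketch below follows the documented rule (take the best matching clause, then add tie_breaker times the scores of the other matching clauses). The function name and the sample scores are made up for illustration; real Lucene scores come from the index, not from this formula alone.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine per-clause scores the way a dis_max query does.

    clause_scores: scores of the individual sub-queries for one document
                   (0 means the clause did not match).
    tie_breaker:   0.0 keeps only the best clause; 1.0 adds all clauses up.
    """
    matched = [s for s in clause_scores if s > 0]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


# Hypothetical scores: title clause scored 2.0, author clause scored 1.0.
print(dis_max_score([2.0, 1.0]))        # best clause only
print(dis_max_score([2.0, 1.0], 0.5))   # best clause plus half of the rest
```

A tie_breaker close to 1 makes dis_max behave almost like a bool/should sum, while a value near 0 rewards the single best-matching field.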
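Setting search weights
POST /book/fulltext/_search
{
  "query": {
    "multi_match": {
      "query": "青春",
      "type": "best_fields",
      "fields": [ "title^2", "author" ],
      "tie_breaker": 0.99999,
      "minimum_should_match": "30%"
    }
  }
}
With best_fields, exact matches get a higher weight:
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "我的宝马多少马力",
      "fields": ["title", "content"]
    }
  }
}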
Simple filtering
{
  "query": {
    "multi_match": {
      "query": "青春",
      "type": "best_fields",
      "fields": [ "title^2", "author" ],
      "tie_breaker": 0.99999,
      "minimum_should_match": "30%"
    }
  },
  "post_filter": {
    "term": { "title": "青春期" }
  }
}
A filtered query affects both search results and aggregations.
A filter bucket affects only aggregations.
post_filter affects only search results.
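The difference between filtering inside the query and using post_filter can be shown with a toy model in plain Python. This is only an illustration of the rule stated above, not Elasticsearch code: the sample books, author names, and the search function are all hypothetical, and the filter bucket (which narrows a single aggregation) is not modeled.

```python
from collections import Counter

# Hypothetical sample data; only the titles echo the examples in this post.
BOOKS = [
    {"title": "青春", "author": "author_a"},
    {"title": "青春期", "author": "author_a"},
    {"title": "青春期", "author": "author_b"},
]


def title_is(title):
    """Return a predicate that keeps documents with exactly this title."""
    return lambda doc: doc["title"] == title


def search(docs, query_filter=None, post_filter=None, agg_field="author"):
    """Toy search: query_filter narrows hits AND buckets, post_filter narrows hits only."""
    matched = [d for d in docs if query_filter is None or query_filter(d)]
    buckets = Counter(d[agg_field] for d in matched)   # aggregations see the query scope
    hits = [d for d in matched if post_filter is None or post_filter(d)]
    return hits, dict(buckets)


# Filter inside the query: hits and author buckets both shrink.
hits_q, aggs_q = search(BOOKS, query_filter=title_is("青春期"))
# post_filter: same hits, but the author buckets still count every book.
hits_p, aggs_p = search(BOOKS, post_filter=title_is("青春期"))
print(len(hits_q), aggs_q)
print(len(hits_p), aggs_p)
```

This is why post_filter suits faceted navigation: the result list is narrowed while the facet counts still describe the whole query.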
Using ES from PHP
require_once('vendor/autoload.php');

// Connect to a local node (constructor style of the older elasticsearch-php client).
$params = array('hosts' => array(
    '127.0.0.1:9200',
));
$client = new Elasticsearch\Client($params);

// A simple match query on a single field.
$params = array();
$params['index'] = 'book';
$params['type'] = 'log_type';
$params['body']['query']['match']['src_ip'] = '1.122.33.141';

// A weighted multi_match query; this replaces the parameters above.
$params = array(
    'index' => 'book',
    'type' => 'fulltext',
    'body' => array(
        'query' => array(
            'multi_match' => array(
                'query' => '青春',
                'type' => 'best_fields',
                'fields' => array(
                    0 => 'title^2',
                    1 => 'author',
                ),
                'tie_breaker' => 0.99999,
                'minimum_should_match' => '30%',
            ),
        ),
    ),
);

// Time the search call.
$start_time = microtime(true);
$res = $client->search($params);
$end_time = microtime(true);
echo "cost time:" . ($end_time - $start_time);
print_r($res);

// Prepare a document with an explicit id.
$params = array();
$params['body'] = array(
    'title' => '征婚女人',
    'author' => '青青翠竹',
);
$params['id'] = 32;
$params['index'] = 'book';
$params['type'] = 'fulltext';
// The source breaks off here, mid-call: $ret = $client->c
ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage? - Stack Overflow
I'm currently looking at other search methods rather than having a huge SQL query.
recently and played with
(a Python implementation of a search engine).
Can you give reasons for your choice(s)?
As the creator of ElasticSearch, maybe I can give you some reasoning on why I went ahead and created it in the first place :).
Using pure Lucene is challenging. There are many things you need to take care of if you want it to perform really well. Also, it's a library, so there is no distributed support; it's just an embedded Java library that you need to maintain.
In terms of Lucene usability, way back when (almost 6 years now), I created Compass. Its aim was to simplify using Lucene and make everyday Lucene simpler. What I came across time and time again was the requirement to have Compass distributed. I started to work on it from within Compass, by integrating with data grid solutions like GigaSpaces, Coherence and Terracotta, but it's not enough.
At its core, a distributed Lucene solution needs to be sharded. Also, with the advancement of HTTP and JSON as ubiquitous APIs, a solution can easily be used by many different systems written in different languages.
This is why I went ahead and created ElasticSearch. It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through a JSON DSL.
Solr is also a solution for exposing an indexing/search server over HTTP, but I would argue that ElasticSearch provides a much superior distributed model and ease of use (though it currently lacks some of the search features, but not for long, and in any case the plan is to get all Compass features into ElasticSearch). Of course, I am biased, since I created ElasticSearch, so you might need to check for yourself.
As for Sphinx, I have not used it, so I can't comment. What I can refer you is to
which I think proves the superior distributed model of ElasticSearch.
Of course, ElasticSearch has many more features than just being distributed. It is actually built with the cloud in mind. You can check the feature list on the site.
We use Lucene regularly to index and
search tens of millions of documents.
Searches are quick enough, and we use
incremental updates that do not take
a long time. It did take us some time
to get here. The strong points of
Lucene are its scalability, a large
range of features and an active
community of developers. Using bare
Lucene requires programming in Java.
If you are starting afresh, the tool for you in the Lucene family is Solr, which is much easier to set up than bare Lucene, and has almost all of Lucene's power. It can import database documents easily. Solr is written in Java, so any modification of Solr requires Java knowledge, but you can do a lot just by tweaking configuration files.
I have also heard good things about Sphinx, especially in conjunction with a MySQL database. Have not used it, though.
IMO, you should choose according to:
The required functionality - e.g. do you need a French stemmer? Lucene and Solr have one, I do not know about the others.
Proficiency in the implementation language - Do not touch Java Lucene if you do not know Java. You may need C++ to do stuff with Sphinx. Lucene has also been ported into other languages. This is mostly important if you want to extend the search engine.
Ease of experimentation - I believe Solr is best in this aspect.
Interfacing with other software - Sphinx has a good interface with MySQL. Solr supports ruby, XML and JSON interfaces as a RESTful server. Lucene only gives you programmatic access through Java.
are wrappers of Lucene that integrate it into larger frameworks.
I have used Sphinx, Solr and Elasticsearch. Solr and Elasticsearch are built on top of Lucene and add a lot of common functionality: web server API, faceting, caching, etc.
If you want to just have a simple full text search setup, Sphinx is a better choice.
If you want to customize your search at all, Elasticsearch and Solr are the better choices. They are very extensible: you can write your own plugins to adjust result scoring.
Some example usages:
Sphinx: craigslist.org
Solr: Cnet, Netflix, digg.com
Elasticsearch: Foursquare, Github
We use Sphinx in a vertical search project with more than 10,000,000 MySQL records and more than 10 different databases.
It has excellent support for MySQL and high indexing performance; searching is fast, but maybe a little slower than Lucene.
However, it's the right choice if you need fast indexing every day and use a MySQL database.
An experiment to
My sphinx.conf
source post_source
{
    type = mysql
    sql_host = localhost
    sql_user = ***
    sql_pass = ***
    sql_port = 3306
    sql_query_pre = SET NAMES utf8
    # query before fetching rows to index
    sql_query = SELECT *, id AS pid, CRC32(safetag) as safetag_crc32 FROM hb_posts
    sql_attr_uint = pid
    # pid (as 'sql_attr_uint') is necessary for sphinx
    # this field must be unique
    # that is why I like sphinx
    # you can store custom string fields into indexes (memory) as well
    sql_field_string = title
    sql_field_string = slug
    sql_field_string = content
    sql_field_string = tags
    sql_attr_uint = category
    # integer fields must be defined as sql_attr_uint
    sql_attr_timestamp = date
    # timestamp fields must be defined as sql_attr_timestamp
    sql_query_info_pre = SET NAMES utf8
    # if you need unicode support for sql_field_string, you need to patch the source
    # this param. is not supported natively
    sql_query_info = SELECT * FROM my_posts WHERE id = $id
}

index posts
{
    source = post_source
    # source above
    path = /var/data/posts
    # index location
    charset_type = utf-8
}
Test script:
require "sphinxapi.php";
// Keep only letters, digits, '-' and '_' in the requested slug.
$safetag = $_GET["my_post_slug"];
$safetag = preg_replace("/[^a-z0-9\-_]/i", "", $safetag);
$conf = getMyConf();
$cl = new SphinxClient();
$cl->SetServer($conf["server"], $conf["port"]);
$cl->SetConnectTimeout($conf["timeout"]);
$cl->setMaxQueryTime($conf["max"]);
# set search params
$cl->SetMatchMode(SPH_MATCH_FULLSCAN);
$cl->SetArrayResult(TRUE);
$cl->setLimits(0, 1, 1);
# looking for the post (not searching a keyword)
$cl->SetFilter("safetag_crc32", array(crc32($safetag)));
# fetch results
$post = $cl->Query(null, "post_1");
echo "<pre>";
var_dump($post);
echo "</pre>";
exit("done");
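The script above finds a post by filtering on an integer attribute: the CRC32 of the slug, computed once in SQL at indexing time and once in PHP at query time. A minimal Python sketch of the same key derivation follows. It assumes that MySQL's CRC32(), PHP's crc32() and Python's zlib.crc32 all compute the standard CRC-32 checksum; the helper names are hypothetical.

```python
import re
import zlib


def sanitize_slug(raw: str) -> str:
    """Drop everything except letters, digits, '-' and '_', like the preg_replace call."""
    return re.sub(r"[^a-z0-9\-_]", "", raw, flags=re.IGNORECASE)


def slug_key(slug: str) -> int:
    """Unsigned 32-bit CRC-32 of the slug, usable as an integer filter value."""
    return zlib.crc32(slug.encode("utf-8"))


slug = sanitize_slug("my post!/slug_1")
print(slug, slug_key(slug))
```

CRC-32 values can collide, so a lookup like this should still compare the slug stored on the returned row with the requested one before trusting the match.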
Sample result:
[array] =>
"id" => 123,
"title" => "My post title.",
"content" => "My <p>post</p> content.",
[ and other fields ]
Sphinx query time:
0.001 sec.
Sphinx query time (1k concurrent):
=> 0.346 sec. (average)
=> 0.340 sec. (average of last 10 query)
MySQL query time:
"SELECT * FROM hb_posts WHERE id = 123;"
=> 0.001 sec.
MySQL query time (1k concurrent):
"SELECT * FROM my_posts WHERE id = 123;"
=> 1.612 sec. (average)
=> 1.920 sec. (average of last 10 query)
The only elasticsearch vs solr performance comparison I've been able to find so far is here:
Lucene is nice and all, but their stop word set is awful. I had to manually add a ton of stop words to StopAnalyzer.ENGLISH_STOP_WORDS_SET just to get it anywhere near usable.
I haven't used Sphinx but I know people swear by its speed and near-magical "ease of setup to awesomeness" ratio.
Try indextank.
As for Elasticsearch, it was conceived to be much easier to use than Lucene/Solr. It also includes a very flexible scoring system that can be tweaked without reindexing.
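Solr vs. Elasticsearch
Search engines: a comparative analysis of Solr and Elasticsearch
Elasticsearch is a real-time distributed search and analytics engine. It lets you process large-scale data at a speed that was not possible before.
It can be used for full-text search, structured search, and analytics, and you can combine all three.
Elasticsearch is a search engine built on top of the full-text search library Apache Lucene(TM). Lucene is arguably the most advanced and efficient full-featured open-source search framework available today.
But Lucene is only a framework. To take full advantage of it you have to use Java and integrate Lucene into your program. It takes a lot of study to understand how it works; Lucene really is complex.
Elasticsearch uses Lucene as its internal engine, but for full-text search you only need its uniform API and do not have to understand the complex Lucene internals behind it.
Of course, Elasticsearch is more than just Lucene. Besides full-text search, it also provides:
Distributed real-time document storage, with every field indexed and searchable.
A distributed search engine with real-time analytics.
Scaling to hundreds of servers and petabytes of structured or unstructured data.
All of this is packaged into a single server that you can talk to through the ES RESTful API, using a client or any programming language you like.
Getting started with Elasticsearch is very simple. It ships with sensible defaults, so beginners do not have to face complex theory right away.
It works as soon as it is installed, and you can become productive with very little learning effort.
As you learn more, you can use the more advanced features. The whole engine is flexibly configurable and can be tailored to your own needs.
Use cases:
Wikipedia uses Elasticsearch for full-text search with highlighted keywords, and for search-as-you-type and did-you-mean suggestions.
The Guardian uses Elasticsearch to process visitor logs so that editors get real-time feedback on how the public responds to different articles.
StackOverflow combines full-text search with geolocation and related information, and uses more-like-this to surface related questions.
GitHub uses Elasticsearch to search more than 130 billion lines of code.
Goldman Sachs uses it to index 5 TB of data every day, and many investment banks use it to analyze movements in the stock market.
But Elasticsearch is not only for large enterprises; it has also helped startups such as DataDog and Klout extend their functionality.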
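Pros and cons of Elasticsearch:
Pros: Elasticsearch is distributed. It needs no other components, and distribution happens in real time, which is called "push replication". Elasticsearch fully supports Apache Lucene's near-real-time search. Multitenancy needs no special configuration, whereas Solr requires more advanced setup. Elasticsearch uses the Gateway concept, which makes full backups simpler. The nodes form a peer-to-peer network; when some nodes fail, other nodes are automatically assigned to take over their work.
Cons: Only one developer (this is no longer the case: the Elasticsearch GitHub organization now has quite active maintainers). Not automated enough (this does not apply to the current, new Index Warmup API).
Solr (pronounced "solar") is the open-source enterprise search platform of the Apache Lucene project. Its main features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich-document (e.g. Word, PDF) handling. Solr is highly scalable and provides distributed search and index replication. Solr is the most popular enterprise search engine, and Solr 4 also added NoSQL support.
Solr is a standalone full-text search server written in Java that runs in a servlet container (such as Apache Tomcat or Jetty). It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs. Solr's powerful external configuration lets you adapt it to many kinds of applications without writing Java code. Solr also has a plugin architecture to support more advanced customization.
Because the Apache Lucene and Apache Solr projects merged in 2010, both are produced by the same Apache Software Foundation development team. When referring to the technology or product, Lucene/Solr and Solr/Lucene mean the same thing.
Pros and cons of Solr
Pros: Solr has a larger and more mature community of users, developers, and contributors. It supports indexing many formats, such as HTML, PDF, Microsoft Office formats, and plain-text formats such as JSON, XML, and CSV. Solr is mature and stable. When no indexing is running at the same time as searching, it is faster.
Cons: While an index is being built, search efficiency drops; searching a real-time index is inefficient.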
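Elasticsearch compared with Solr
When only searching existing data, Solr is faster.
When indexing in real time, Solr suffers from I/O blocking and its query performance is poor; Elasticsearch has a clear advantage.
As the data volume grows, Solr's search efficiency drops, while Elasticsearch shows no obvious change.
In summary, Solr's architecture is not suited to real-time search applications.
Test in a real production environment
[Figure: average query speed improved 50-fold after the search engine was switched from Solr to Elasticsearch.]
Summary of the Elasticsearch vs. Solr comparison
Both are easy to install. Solr relies on ZooKeeper for distributed management, while Elasticsearch has built-in distributed coordination. Solr supports more data formats, while Elasticsearch supports only JSON. Solr officially provides more features, while Elasticsearch focuses on core functionality and leaves many advanced features to third-party plugins. Solr performs better than Elasticsearch in traditional search applications, but is clearly less efficient in real-time search applications.
Solr is a strong solution for traditional search applications, but Elasticsearch is better suited to newer real-time search applications.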
What can Elasticsearch do in big data?
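To improve the search quality of our project, I recently looked into Elasticsearch, an excellent distributed search program. My first notes are on GitHub; this is only a summary.

First, why use Elasticsearch? At the beginning our project used MySQL for simple search, and a single LIKE statement that cannot use an index dragged MySQL's performance down. Later we considered Sphinx, which we had deployed successfully in earlier projects. But given the current data volume, multiple MySQL instances, HA for the search service itself, and later capacity expansion, we felt Sphinx was not the best choice, so we naturally turned to Elasticsearch.

According to its own website, Elasticsearch is a distributed search service that provides a RESTful API, is built on Lucene, uses multiple shards to keep data safe, and offers automatic resharding. Since large sites such as GitHub also use Elasticsearch as their search service, we decided to use it in our project.

To use Elasticsearch in a project, the following problems have to be solved:

1. Indexing: how to build suitable indexes for the data to be searched, including using different analyzers for specific languages.
2. Searching: Elasticsearch provides very powerful search features; how do we write efficient queries?
3. Data source: all of our data lives in MySQL, which is the only data source; how do we import MySQL data into Elasticsearch?

For 1 and 2, because all our data comes from MySQL and the fields of the index are fixed, the main work is to design the mapping and the search statements for each business scenario. In practice it cannot be that simple, of course, and it needs continuous tuning. For 3, we need a tool that imports MySQL data into Elasticsearch. Because we need search to be very close to real time, MySQL's incremental changes have to be imported in real time, and the only way I could think of is the row-based binlog. My recent work has therefore been to implement a service that syncs MySQL increments to Elasticsearch.

Lucene

Elasticsearch is built on Lucene, an excellent search library that I had admittedly never used before. :-)

Key Lucene concepts:

Document: the main data unit for indexing and searching. It contains one or more Fields, and these Fields contain the data we exchange with Lucene.
Field: a component of a Document, made up of two parts, a name and a value.
Term: an indivisible word, the smallest unit of search.
Token: one occurrence of a Term, containing the Term's text, its start and end position in the document, and its type.

Lucene uses an inverted index to store the mapping from terms to their positions in documents. Take the following documents:

Elasticsearch Server 1.0 (document 1)
Mastering Elasticsearch (document 2)
Apache Solr 4 Cookbook (document 3)

Stored as an inverted index, a simple mapping looks like this:

Term           Count  Document
1.0            1      1
4              1      3
Apache         1      3
Cookbook       1      3
Elasticsearch  2      1, 2
Mastering      1      2
Server         1      1
Solr           1      3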
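The inverted index for the three example documents can be reproduced in a few lines. This is a minimal sketch that assumes a plain whitespace tokenizer with no lowercasing or stemming, which is much simpler than a real Lucene analyzer; the names DOCS and build_inverted_index are illustrative only.

```python
from collections import defaultdict

# The three example documents, keyed by document id.
DOCS = {
    1: "Elasticsearch Server 1.0",
    2: "Mastering Elasticsearch",
    3: "Apache Solr 4 Cookbook",
}


def build_inverted_index(docs):
    """Map each token to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():  # whitespace tokenizer only
            index[token].add(doc_id)
    return index


index = build_inverted_index(DOCS)
# Print term, number of documents containing it, and the document ids.
for term in sorted(index):
    print(term, len(index[term]), sorted(index[term]))
```

Looking up a term is then a single dictionary access, which is why search cost depends on the number of matching documents rather than on the size of the whole collection.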
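For the example above, we first split each document into tokens with a tokenization algorithm, then record the mapping between each token and the documents it occurs in, along with the total number of occurrences of the token. The result is a simple inverted index.

Key Elasticsearch concepts

To use Elasticsearch, I think you only need to understand a few basic concepts.

At the data level:

Index: the logical area where Elasticsearch stores data, similar to a db in a relational database. An index can live on one or more shards, and a shard may have several replicas.
Document: the entity data stored in Elasticsearch, similar to a row of a table in a relational database. A document consists of multiple fields, and fields with the same name in different documents must have the same type. A field can occur more than once in a document, that is, a field can hold multiple values (multivalued).
Document type: for querying purposes, an index may hold several kinds of documents, that is, document types. Note that fields with the same name in different documents must still have the same type.
Mapping: stores the mapping information of the fields; different document types have different mappings.

For those familiar with MySQL, roughly: an index is a db, a document is a row, a field is a column of a table, a mapping is the table definition, and a document type is a table.

The document type concept confused me at first. It exists to make querying easier. As a simple example, within one index we may want to query part of the data one way and another part a different way, which gives two types. That situation should not arise in our project, so an index will usually have only one type.

At the service level:

Node: a server instance.
Cluster: several nodes form a cluster.
Shard: a data partition. An index may be spread over several shards, and different shards may be on different nodes.
Replica: a copy of a shard. One copy is the primary shard and the rest are called replica shards.

Elasticsearch can move data around dynamically because the number of primary shards is fixed when an index is created (5 by default in Elasticsearch versions of that period, not 1024), and data is then migrated between nodes in units of whole shards. This approach is very common in distributed systems; codis, for example, uses 1024 slots for data migration.

Because any index can be configured with multiple replicas, redundant copies keep the data safe, and replicas also share the read load, much like a slave in MySQL.

RESTful API

Elasticsearch provides a RESTful API that uses JSON, which makes it easy to interact with from outside. Although there are many Elasticsearch clients, I was still able to write a simple client for our project without much effort, which again shows how easy Elasticsearch is to use.

The RESTful interface is simple: one URL represents one specific resource. For example, /blog/article/1 denotes the document with index blog, type article, and id 1. We operate on these resources with the standard HTTP methods: POST to create, PUT to update, GET to retrieve, DELETE to delete, and HEAD to check existence.

I also recommend httpie, a very powerful HTTP tool that I find even nicer to use than curl and that is almost a perfect match for debugging Elasticsearch from the command line. Some httpie examples:

# create
http POST :9200/blog/article/1 title="hello elasticsearch" tags:='["elasticsearch"]'
# get
http GET :9200/blog/article/1
# update
http PUT :9200/blog/article/1 title="hello elasticsearch" tags:='["elasticsearch", "hello"]'
# delete
http DELETE :9200/blog/article/1
# exists
http HEAD :9200/blog/article/1

Indexing and searching

Although Elasticsearch can detect field types automatically and build suitable indexes, I still recommend setting the indexing rules yourself so that they serve later searches better. We set the indexing rules of individual fields by customizing the mapping. As for searching, Elasticsearch provides too many search options to list here.

Indexing and searching are two very important aspects of Elasticsearch and directly affect the search experience of the product, but at this stage I only have a rough understanding of them and will cover them in detail later.

Syncing MySQL data

Elasticsearch is powerful, but only when it has enough data. Our data is all in MySQL, so how to import MySQL data into Elasticsearch is what I have been studying recently.

There are some existing implementations, such as elasticsearch-river-jdbc and elasticsearch-river-mysql, but I do not plan to use them.

elasticsearch-river-jdbc is powerful, but it does not handle incremental updates well: it requires the table to be append-only, which is almost impossible in a real project.

elasticsearch-river-mysql is well done. It uses python-mysql-replication to read changed data from the binlog and apply incremental updates, but it does not seem to handle importing existing data from a MySQL dump, although I still need to confirm this properly. Incidentally, I once submitted a pull request to python-mysql-replication that fixed a minimal row image problem, so I have a good impression of the elasticsearch-river-mysql project. I simply decided to write my own.

I decided to write my own not because I like reinventing the wheel, but mainly because this kind of MySQL syncer service (fetching incremental MySQL data and applying it to related systems) is useful not only for Elasticsearch but also for other services, such as a cache. What I really want to build is a general-purpose MySQL syncer component; it just focuses on Elasticsearch for now.

The project code is at go-mysql-elasticsearch. The first development phase is complete and it is in internal integration testing.

The principle of go-mysql-elasticsearch is simple: first use mysqldump to fetch the current MySQL data, then fetch incremental data starting from the binlog name and position at that moment.

Some limitations:

The binlog must use the row-based format. There is no need to worry about this format taking too much disk space: since MySQL 5.6, row-based format is recommended with GTID mode anyway, and we usually control the quality of SQL statements and do not allow changing too many rows at once.
Tables to be synced should preferably use the InnoDB engine, so that mysqldump does not block writes.
Tables to be synced must have a primary key. If a table has no primary key, I would seriously doubt the programming skills of whoever designed it. Multi-column primary keys are not recommended either, and I do not plan to support them at this stage.
Never change the structure of a synced table dynamically. Elasticsearch only supports adding fields dynamically, not removing or changing them. Generally, when ALTER TABLE is involved, it usually shows that the earlier design was flawed and future growth was underestimated.

I will add more detail once go-mysql-elasticsearch is finished and has been tested in production.

Summary

Over the past week I spent a lot of time on Elasticsearch and can now say I have the basics. For a technology you do not know, I think the way to go is to find reliable material (official documentation or an introductory book), painstakingly type out the code from it, ask Google about what you do not understand, and finally use it in a real project. At that point you have a first grasp of the technology, though mastery takes more work.

For now Elasticsearch looks great to me. There will certainly be pitfalls once it goes live, and we will have to fill them in one by one. Come to think of it, maybe I should learn Java so that I am not stuck when I cannot read the code. :-)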
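The incremental half of such a sync service comes down to turning row changes into index and delete requests. The sketch below shows one way to render row-change events as a request body for the Elasticsearch bulk API (one JSON object per line). It is not taken from go-mysql-elasticsearch: the event shape (an action plus a row dict with an id primary key) and the function name are assumptions made for illustration, and reading the binlog itself is out of scope.

```python
import json


def to_bulk_body(events, index):
    """Render row-change events as newline-delimited JSON for the bulk API.

    Each event is a dict like {"action": "insert", "row": {"id": 1, ...}}
    (a hypothetical shape, standing in for decoded row-based binlog events).
    """
    lines = []
    for event in events:
        row = event["row"]
        meta = {"_index": index, "_id": str(row["id"])}  # primary key becomes the document id
        if event["action"] in ("insert", "update"):
            # "index" creates the document or replaces it if the id already exists.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(row, ensure_ascii=False))
        elif event["action"] == "delete":
            lines.append(json.dumps({"delete": meta}))
        else:
            raise ValueError("unknown action: %r" % event["action"])
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


events = [
    {"action": "insert", "row": {"id": 1, "title": "hello elasticsearch"}},
    {"action": "delete", "row": {"id": 2}},
]
bulk_body = to_bulk_body(events, "blog")
print(bulk_body)
```

Using the primary key as the document id is what makes replaying the same binlog events harmless: an update simply overwrites the document it produced earlier, which is one reason the post insists that every synced table has a primary key.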