I have a Spark job that reads from Cassandra, processes/transforms/filters the data, and writes the results to Elasticsearch. I use Docker for integration tests, and I am running into trouble writing from Spark to Elasticsearch.
Dependencies:
"joda-time" % "joda-time" % "2.9.4", "javax.servlet" % "javax.servlet-api" % "3.1.0", "org.elasticsearch" % "elasticsearch" % "2.3.2", "org.scalatest" %% "scalatest" % "2.2.1", "com.github.nscala-time" %% "nscala-time" % "2.10.0", "cascading" % "cascading-hadoop" % "2.6.3", "cascading" % "cascading-local" % "2.6.3", "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.2", "com.datastax.cassandra" % "cassandra-driver-core" % "2.1.5", "org.elasticsearch" % "elasticsearch-hadoop" % "2.3.2" excludeall(exclusionrule("org.apache.storm")), "org.apache.spark" %% "spark-catalyst" % "1.4.0" % "provided"
In unit tests I can connect to Elasticsearch using a TransportClient to set up a template and index, i.e. this works:
val conf = new SparkConf().setAppName("test_reindex").setMaster("local")
  .set("spark.cassandra.input.split.size_in_mb", "67108864")
  .set("spark.cassandra.connection.host", cassandraHostString)
  .set("es.nodes", elasticsearchHostString)
  .set("es.port", "9200")
  .set("http.publish_host", "")
sc = new SparkContext(conf)
esClient = TransportClient.builder().build()
esClient.addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(elasticsearchHostString), 9300))
esClient.admin().indices().preparePutTemplate(testTemplate)
  .setSource(Source.fromInputStream(getClass.getResourceAsStream("/mytemplate.json")).mkString)
  .execute().actionGet()
esClient.admin().indices().prepareCreate(esTestIndex).execute().actionGet()
esClient.admin().indices().prepareAliases().addAlias(esTestIndex, "hot").execute().actionGet()
However, when I try to run

EsSpark.saveToEs(
  myRDD,
  "hot/mytype",
  Map("es.mapping.id" -> "id", "es.mapping.parent" -> "parent_id")
)
I receive this stack trace:
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[172.17.0.2:9200]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
    at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:442)
    at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:518)
    at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:524)
    at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:491)
    at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:412)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:400)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/08/08 12:30:46 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[172.17.0.2:9200]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
    at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:442)
    at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:518)
    at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:524)
    at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:491)
    at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:412)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:400)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
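One difference between the two clients may be relevant here (this is an assumption about the cause, not something the logs confirm): elasticsearch-hadoop by default discovers the cluster's nodes and writes directly to their publish addresses, whereas the TransportClient only talks to the address it was given. If Elasticsearch advertises its container-internal IP (172.17.0.2), that address may not be routable from the host. A minimal sketch of the commonly suggested workaround, the `es.nodes.wan.only` setting from the elasticsearch-hadoop configuration; `elasticsearchHostString` is the same host variable used in the snippet above:

```scala
import org.apache.spark.SparkConf

// Sketch: restrict the connector to the configured address instead of the
// node publish addresses it discovers from the cluster (useful when those
// are container-internal IPs the host cannot reach).
val conf = new SparkConf()
  .setAppName("test_reindex")
  .setMaster("local")
  .set("es.nodes", elasticsearchHostString)
  .set("es.port", "9200")
  .set("es.nodes.wan.only", "true") // disable node discovery; talk only to es.nodes
```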
I can verify using 'docker network inspect bridge' that I am trying to connect to the correct IP address.
docker network inspect bridge
[
    {
        "Name": "bridge",
        "Id": "ef184e3be3637be28f854c3278f1c8647be822a9413120a8957de6d2d5355de1",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Containers": {
            "0c79680de8ef815bbe4bdd297a6f845cce97ef18bb2f2c12da7fe364906c3676": {
                "Name": "analytics_rabbitmq_1",
                "EndpointID": "3f03fdabd015fa1e2af802558aa59523f4a3c8c72f1231d07c47a6c8e60ae0d4",
                "MacAddress": "02:42:ac:11:00:04",
                "IPv4Address": "172.17.0.4/16",
                "IPv6Address": ""
            },
            "9b1f37c8df344c50e042c4b3c75fcb2774888f93fd7a77719fb286bb13f76f38": {
                "Name": "analytics_elasticsearch_1",
                "EndpointID": "fb083d27aaf8c0db1aac90c2a1ea2f752c46d8ac045e365f4b9b7d1651038a56",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": ""
            },
            "ed0cfad868dbac29bda66de6bee93e7c8caf04d623d9442737a00de0d43c372a": {
                "Name": "analytics_cassandra_1",
                "EndpointID": "2efa95980d681b3627a7c5e952e2f01980cf5ffd0fe4ba6185b2cab735784df6",
                "MacAddress": "02:42:ac:11:00:03",
                "IPv4Address": "172.17.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]
I am running this locally on a MacBook/OSX. I am at a loss as to why I can connect to the Docker container using the TransportClient and through my browser, yet the function EsSpark.saveToEs(...) fails.