JanusGraph Bulk Data Import: Code Summary
1. Importing JSON into a local TinkerGraph
2. Importing CSV into a local TinkerGraph
3. Importing JSON into distributed storage (berkeleyje-es)
The code in this article is demonstrated against JanusGraph 0.3.1. All data files are the sample files shipped with the JanusGraph distribution.
1. Importing JSON into a local TinkerGraph
1.1 Configuration
conf/hadoop-graph/hadoop-load-json.properties is configured as follows:
#
# Hadoop Graph Configuration
#
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
1.2 Sample JSON
{"id":1,"label":"song","inE":{"followedBy":[{"id":3059,"outV":153,"properties":{"weight":1}},{"id":276,"outV":5,"properties":{"weight":2}},{"id":3704,"outV":3,"properties":{"weight":2}},{"id":4383,"outV":62,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":0,"inV":2,"properties":{"weight":1}},{"id":1,"inV":3,"properties":{"weight":2}},{"id":2,"inV":4,"properties":{"weight":1}},{"id":3,"inV":5,"properties":{"weight":1}},{"id":4,"inV":6,"properties":{"weight":1}}],"sungBy":[{"id":7612,"inV":340}],"writtenBy":[{"id":7611,"inV":527}]},"properties":{"name":[{"id":0,"value":"HEY BO DIDDLEY"}],"songType":[{"id":2,"value":"cover"}],"performances":[{"id":1,"value":5}]}}
{"id":2,"label":"song","inE":{"followedBy":[{"id":0,"outV":1,"properties":{"weight":1}},{"id":323,"outV":34,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":6190,"inV":123,"properties":{"weight":1}},{"id":6191,"inV":50,"properties":{"weight":1}}],"sungBy":[{"id":7666,"inV":525}],"writtenBy":[{"id":7665,"inV":525}]},"properties":{"name":[{"id":3,"value":"IM A MAN"}],"songType":[{"id":5,"value":"cover"}],"performances":[{"id":4,"value":1}]}}
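Each line of grateful-dead.json is a self-contained GraphSON "adjacency list" record: one vertex, its properties, and its incident edges under inE/outE. A minimal Python sketch of reading that format (purely illustrative, not part of the Spark import pipeline; the line below is an abbreviated version of the sample):

```python
import json

# One (abbreviated) line in the adjacency-list GraphSON format shown above:
# a complete JSON document per vertex, with incident edges under inE/outE.
line = ('{"id":1,"label":"song",'
        '"outE":{"followedBy":[{"id":0,"inV":2,"properties":{"weight":1}}],'
        '"sungBy":[{"id":7612,"inV":340}]},'
        '"properties":{"name":[{"id":0,"value":"HEY BO DIDDLEY"}]}}')

vertex = json.loads(line)
print(vertex["label"])                           # song
print(vertex["properties"]["name"][0]["value"])  # HEY BO DIDDLEY

# Out-degree = total edge count across all outgoing edge labels.
out_degree = sum(len(edges) for edges in vertex["outE"].values())
print(out_degree)                                # 2
```

Because every line is independent, GraphSONInputFormat can hand lines to Spark tasks without any cross-record coordination.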
1.3 Code
readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json.properties')
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/csv-graph.kryo")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph(writeGraphConf).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
1.4 File verification
The newly generated file:
[root@vm03 data]# ls -l /tmp/csv-graph.kryo
-rw-r--r--. 1 root root 726353 May 29 04:09 /tmp/csv-graph.kryo
2. Importing CSV into a local TinkerGraph
2.1 Configuration
conf/hadoop-graph/hadoop-load-csv.properties is configured as follows:
#
# Hadoop Graph Configuration
#
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=./
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.scriptInputFormat.script=./script-input-grateful-dead.groovy

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
2.2 Sample CSV
1,song,HEY BO DIDDLEY,cover,5	followedBy,2,1|followedBy,3,2|followedBy,4,1|followedBy,5,1|followedBy,6,1|sungBy,340|writtenBy,527	followedBy,3,2|followedBy,5,2|followedBy,62,1|followedBy,153,1
2,song,IM A MAN,cover,1	followedBy,50,1|followedBy,123,1|sungBy,525|writtenBy,525	followedBy,1,1|followedBy,34,1
3,song,NOT FADE AWAY,cover,531	followedBy,81,1|followedBy,86,5|followedBy,127,10|followedBy,59,1|followedBy,83,3|followedBy,103,2|followedBy,68,1|followedB
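Each record holds the vertex fields, the out-edges, and the in-edges, separated by tabs; individual edges are separated by "|" and their fields by ",". An edge has either two fields (label, other vertex id) or three (plus a weight). A hedged Python sketch of this line format (the Groovy parse script in 2.3 is what actually runs; names here are illustrative):

```python
# Illustrative parser for the tab/pipe/comma record format consumed by
# ScriptInputFormat; mirrors the logic of the Groovy parse script.
line = ("1,song,HEY BO DIDDLEY,cover,5\t"
        "followedBy,2,1|sungBy,340|writtenBy,527\t"
        "followedBy,3,2|followedBy,5,2")

# Tab-separated: vertex fields, out-edges, in-edges.
vertex_part, out_part, in_part = line.split("\t", 2)
v_id, v_label, v_props = vertex_part.split(",", 2)

def parse_edges(part):
    edges = []
    for edge in filter(None, part.split("|")):
        fields = edge.split(",")
        if len(fields) == 2:            # label,otherVertexId  (no weight)
            e_label, other_v = fields
            weight = None
        else:                           # label,otherVertexId,weight
            e_label, other_v, weight = fields
        edges.append((e_label, int(other_v),
                      int(weight) if weight is not None else None))
    return edges

print(v_id, v_label)          # 1 song
print(parse_edges(out_part))  # [('followedBy', 2, 1), ('sungBy', 340, None), ('writtenBy', 527, None)]
print(parse_edges(in_part))   # [('followedBy', 3, 2), ('followedBy', 5, 2)]
```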
2.3 Code
The parse script (referenced by gremlin.hadoop.scriptInputFormat.script) is as follows:
def parse(line) {
    def (vertex, outEdges, inEdges) = line.split(/\t/, 3)
    def (v1id, v1label, v1props) = vertex.split(/,/, 3)
    def v1 = graph.addVertex(T.id, v1id.toInteger(), T.label, v1label)
    switch (v1label) {
        case "song":
            def (name, songType, performances) = v1props.split(/,/)
            v1.property("name", name)
            v1.property("songType", songType)
            v1.property("performances", performances.toInteger())
            break
        case "artist":
            v1.property("name", v1props)
            break
        default:
            throw new Exception("Unexpected vertex label: ${v1label}")
    }
    [[outEdges, true], [inEdges, false]].each { def edges, def out ->
        edges.split(/\|/).grep().each { def edge ->
            def parts = edge.split(/,/)
            def otherV, eLabel, weight = null
            if (parts.size() == 2) {
                (eLabel, otherV) = parts
            } else {
                (eLabel, otherV, weight) = parts
            }
            def v2 = graph.addVertex(T.id, otherV.toInteger())
            def e = out ? v1.addOutEdge(eLabel, v2) : v1.addInEdge(eLabel, v2)
            if (weight != null) e.property("weight", weight.toInteger())
        }
    }
    return v1
}
JanusGraph (Gremlin console) code:
readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-csv.properties')
writeGraphConf = new BaseConfiguration()
writeGraphConf.setProperty("gremlin.graph", "org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph")
writeGraphConf.setProperty("gremlin.tinkergraph.graphFormat", "gryo")
writeGraphConf.setProperty("gremlin.tinkergraph.graphLocation", "/tmp/csv-graph2.kryo")
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph(writeGraphConf).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
g = GraphFactory.open(writeGraphConf).traversal()
g.V().valueMap(true)
2.4 File verification
The newly generated file:
[root@vm03 data]# ls -l /tmp/csv-graph2.kryo
-rw-r--r--. 1 root root 339939 May 29 04:56 /tmp/csv-graph2.kryo
3. Importing JSON into distributed storage (berkeleyje-es)
3.1 Configuration
conf/hadoop-graph/hadoop-load-json-ber-es.properties is configured as follows:
#
# Hadoop Graph Configuration
#
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/grateful-dead.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true

#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
./conf/janusgraph-berkeleyje-es-bulkload.properties is configured as follows:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=../db/berkeley
index.search.backend=elasticsearch
3.2 Sample JSON
{"id":1,"label":"song","inE":{"followedBy":[{"id":3059,"outV":153,"properties":{"weight":1}},{"id":276,"outV":5,"properties":{"weight":2}},{"id":3704,"outV":3,"properties":{"weight":2}},{"id":4383,"outV":62,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":0,"inV":2,"properties":{"weight":1}},{"id":1,"inV":3,"properties":{"weight":2}},{"id":2,"inV":4,"properties":{"weight":1}},{"id":3,"inV":5,"properties":{"weight":1}},{"id":4,"inV":6,"properties":{"weight":1}}],"sungBy":[{"id":7612,"inV":340}],"writtenBy":[{"id":7611,"inV":527}]},"properties":{"name":[{"id":0,"value":"HEY BO DIDDLEY"}],"songType":[{"id":2,"value":"cover"}],"performances":[{"id":1,"value":5}]}}
{"id":2,"label":"song","inE":{"followedBy":[{"id":0,"outV":1,"properties":{"weight":1}},{"id":323,"outV":34,"properties":{"weight":1}}]},"outE":{"followedBy":[{"id":6190,"inV":123,"properties":{"weight":1}},{"id":6191,"inV":50,"properties":{"weight":1}}],"sungBy":[{"id":7666,"inV":525}],"writtenBy":[{"id":7665,"inV":525}]},"properties":{"name":[{"id":3,"value":"IM A MAN"}],"songType":[{"id":5,"value":"cover"}],"performances":[{"id":4,"value":1}]}}
3.3 Code
outputGraphConfig = './conf/janusgraph-berkeleyje-es-bulkload.properties'
readGraph = GraphFactory.open('conf/hadoop-graph/hadoop-load-json-ber-es.properties')
blvp = BulkLoaderVertexProgram.build().writeGraph(outputGraphConfig).create(readGraph)
readGraph.compute(SparkGraphComputer).workers(1).program(blvp).submit().get()
g = GraphFactory.open(outputGraphConfig).traversal()
g.V().valueMap(true)
3.4 Verification